Machine learning

Machine Learning algorithms



Machine Learning sample algorithms

This page describes my implementations of machine learning algorithms, focusing on supervised learning methods. The implementations are carried out mostly in MATLAB. The core notion is that, given a set of data containing relations, the system should be able to learn those relations from the data and apply them to new data.

Terminologies

Vector

A set of data grouped together in a structured format, usually in matrix form, which simplifies operations on it.

Training

The process where training vectors with known labels or classifications are fed to the system for data aggregation. Each training example consists of an input vector and its corresponding label vector. This is the data from which the system learns and builds a model for generalizing to other data.

Testing

The process where the data to be classified is fed to the system. The testing set consists only of data whose classification is decided by the system.
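Since the original MATLAB scripts are not shown here, the training/testing split described above can be sketched in Python. The dataset values and the 50% split ratio are illustrative assumptions, not taken from the projects below:

```python
import random

# Hypothetical toy dataset: each entry is (feature vector, known label).
data = [([5.1, 3.5], "setosa"), ([7.0, 3.2], "versicolor"),
        ([6.3, 3.3], "virginica"), ([4.9, 3.0], "setosa"),
        ([6.4, 3.2], "versicolor"), ([5.8, 2.7], "virginica")]

random.seed(0)
random.shuffle(data)

# Use half of the data for training; the rest becomes the testing set.
split = len(data) // 2
train, test = data[:split], data[split:]

train_vectors = [x for x, _ in train]   # training vectors
train_labels = [y for _, y in train]    # their corresponding labels
test_vectors = [x for x, _ in test]     # labels withheld; the system decides
```

During testing, only `test_vectors` is handed to the classifier; the withheld labels are used afterwards to count misclassifications.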

Features and Weight

The classification of an input vector may depend on different parameters; those parameters are called features. In a real-world scenario, not all features are given equal preference, and the relative importance of each feature is expressed by its weight. Together, features and weights determine how the vector is classified.
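A minimal sketch of the idea above, where a weighted sum of feature values decides the class. The feature values, weights, and threshold here are all made-up illustrative numbers:

```python
# Hypothetical example: classify a flower as "large" or "small" from two
# features (petal length, petal width). Each feature carries a weight
# reflecting how strongly it influences the decision.
features = [4.7, 1.4]   # input vector: [petal length, petal width]
weights = [0.6, 0.4]    # assumed weights; not all features count equally
threshold = 2.0         # assumed decision boundary

# Weighted sum of the features shapes the classification.
score = sum(f * w for f, w in zip(features, weights))
label = "large" if score > threshold else "small"
```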

Some pointers

The notion seems simple, but care should be taken that the training does not result in the classifier over-fitting the data. For example, with a large testing set, an ideal system would learn from the existing data and then apply what it learned to new data, with the discovered classes growing in proportion to the amount of test data. In other words, the system should not force the test set into the pre-existing classifiers.

Projects

Naive Bayes classifier

Refer here for how this works.

What's special about it?

Training was carried out on random trials of 10%, 30% and 50% of the Fisher iris dataset. The bin width was also adjusted and the results recorded. It can be observed that agents trained on less data make more misclassifications than those with relatively more training. Additionally, bin widths of 4 and 8 were tested; the inference is that a bin width of 8 yields fewer misclassifications than 4, because the larger bin width spreads the incoming data more uniformly.
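The original MATLAB implementation is linked below; as an illustration only, a binned (histogram-based) Naive Bayes classifier can be sketched in Python. The bin count and the value range `[0, 8)` are assumptions chosen to roughly cover iris measurements, not parameters from the project:

```python
import math
from collections import defaultdict

def train_binned_nb(vectors, labels, n_bins=8, lo=0.0, hi=8.0):
    """Estimate per-class, per-feature histograms over fixed-width bins.
    Counts start at 1 (Laplace smoothing) so empty bins keep nonzero mass."""
    width = (hi - lo) / n_bins
    counts = defaultdict(lambda: [[1] * n_bins for _ in range(len(vectors[0]))])
    priors = defaultdict(int)
    for x, y in zip(vectors, labels):
        priors[y] += 1
        for j, v in enumerate(x):
            b = min(int((v - lo) / width), n_bins - 1)  # clamp to last bin
            counts[y][j][b] += 1
    return counts, priors, lo, width, n_bins

def predict(x, model):
    """Pick the class maximizing log prior + sum of log bin probabilities."""
    counts, priors, lo, width, n_bins = model
    total = sum(priors.values())
    best, best_lp = None, -math.inf
    for y, hists in counts.items():
        lp = math.log(priors[y] / total)
        for j, v in enumerate(x):
            b = min(int((v - lo) / width), n_bins - 1)
            lp += math.log(hists[j][b] / sum(hists[j]))
        if lp > best_lp:
            best, best_lp = y, lp
    return best
```

With fewer, wider bins, each bin aggregates more of the incoming data, which is the smoothing effect the bin-width comparison above explores.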

Project link

Implemented in MATLAB. The project is here.

Least squares classifier

The least squares discriminant function is used to classify based on a 1-of-K binary coding scheme. The merit of this method is that it normalizes the conditional expectation for a given target class. The demerit is obvious in that it is tied to the 1-of-K coding scheme.

What's special about it?

The lambda value for regularizing the matrix is chosen from a user parameter and then tested on 10%, 30% and 50% of the data. It can be observed that the greater the lambda value, the greater the number of misclassifications. The implementation is straightforward, but the choice of lambda matters.
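As a rough Python sketch of the approach described above (the MATLAB project linked below is the actual implementation): targets are encoded with the 1-of-K scheme, and the weight matrix is obtained from the regularized normal equations, with lambda as the user-chosen regularizer. The bias column and the default `lam` value are assumptions:

```python
import numpy as np

def fit_least_squares(X, labels, classes, lam=0.1):
    """Regularized least squares with 1-of-K target coding:
    solve (X^T X + lam * I) W = X^T T for the weight matrix W."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a bias column
    T = np.array([[1.0 if y == c else 0.0 for c in classes] for y in labels])
    W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ T)
    return W

def predict_ls(W, X, classes):
    """Classify each row by the largest discriminant value."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])
    return [classes[i] for i in np.argmax(X @ W, axis=1)]
```

A large `lam` shrinks the weights toward zero, flattening the discriminant functions, which is consistent with the observation above that larger lambda values produce more misclassifications.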

Project link

Implemented in MATLAB. The project is here.

Contact author

Maintained by @sudharsannr. Contact the author if any edits are required.