63 Machine Learning Algorithms — Introduction

Data science and analytics are transforming businesses. They have reached every department, be it Finance, Marketing, Operations, HR or Design. It is becoming increasingly important for B-school students to have analytical skills and to be well versed in Machine Learning and Statistics. Data is being called the new gold. The fastest-growing companies in the coming years will be the ones that can make the most sense of the data they collect, because with data a business can do targeted marketing and transform the way it converts sales and satisfies demand.

But there is a catch: Machine Learning is complex. For those meeting it for the first time in B-school, it is tough to grasp these concepts alongside a hectic schedule. For a B-school student with no prior coding experience, machine learning is difficult, and it is easy to get lost among the different algorithms and the branches of supervised and unsupervised learning. The mathematics behind them is hard to understand and has a steep learning curve. To begin with, Python or R can itself feel like a rough sea that takes dedicated practice to cross. Still, it is critically important for a business manager to know these things. The new generation of MBAs is learning it, and the older generation should learn it too.

My blog series aims to explain these algorithms in a simple, easy-to-understand manner, so that someone with basic knowledge of Python can implement them and benefit from them in their life and business.

So, I decided to ditch the mathematics and dive right into how each algorithm works, why it is different from the others, and why I, as a businessman, should care about it. In this article I will briefly introduce each of the 11 branches of machine learning. In the upcoming articles we will look at each branch in detail, at the differences among them, and at the use cases of each.

What is Machine Learning?

Machine Learning is the sub-field of computer science that gives “computers the ability to learn without being explicitly programmed.” ~ Arthur Samuel

It is Netflix telling you which movie to watch next, it is Spotify playing good songs without you touching your phone, it is the keyboard on your phone, and it is how companies predict next year's sales. Machine learning in its simplest form is learning from data and then either predicting from it or dividing it into meaningful parts, so that it becomes easier to understand and use.

Your computer can learn from data using algorithms that rely on mathematics and statistics to perform the required function. These algorithms look for patterns in the data. For each candidate pattern they measure how much accuracy is lost in the predictions, try to minimize that loss, and then give us back the best pattern they could learn from the data.
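To make "minimizing the loss" concrete, here is a minimal sketch in plain Python. The numbers are made up for illustration, the "pattern" is assumed to be a straight line, and the loss is assumed to be the average squared error; real algorithms use many other patterns and loss functions.

```python
# Hypothetical data, invented for illustration: ad spend (x) and sales (y).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]


def mean_squared_error(slope, intercept, xs, ys):
    """Loss: average squared gap between the line's predictions and the data."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_line(xs, ys):
    """Return the slope and intercept that minimize the squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# A pattern guessed by hand versus the pattern learned from the data.
guess_loss = mean_squared_error(1.0, 1.0, spend, sales)
slope, intercept = fit_line(spend, sales)
best_loss = mean_squared_error(slope, intercept, spend, sales)

print("loss of hand-picked line:", guess_loss)
print("loss of learned line:", best_loss)
```

The learned line has a lower loss than the hand-picked one, which is all that "the best pattern" means here.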

If you tell your algorithm what each data point means by giving it labels, then it is called a supervised learning algorithm. If you do not give it any labels, the algorithm tries to find patterns by itself, and it is called an unsupervised learning algorithm.
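The difference can be sketched in a few lines of plain Python. Everything here is an assumption for illustration: the customer spend values are invented, the supervised part is a simple nearest-average classifier, and the unsupervised part is a tiny two-group version of k-means. Neither is taken from a particular library.

```python
# Hypothetical monthly spend per customer, invented for illustration.
spend = [12.0, 15.0, 14.0, 80.0, 85.0, 90.0]
labels = ["low", "low", "low", "high", "high", "high"]  # used only when supervised


def train_supervised(values, value_labels):
    """Supervised: learn one average spend per label from labelled data."""
    groups = {}
    for value, label in zip(values, value_labels):
        groups.setdefault(label, []).append(value)
    return {label: sum(vs) / len(vs) for label, vs in groups.items()}


def predict(averages, value):
    """Assign a new value to the label whose average is closest."""
    return min(averages, key=lambda label: abs(averages[label] - value))


def cluster_unsupervised(values, rounds=10):
    """Unsupervised: split values into two groups without any labels."""
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(rounds):
        groups = [[], []]
        for value in values:
            nearest = 0 if abs(value - centers[0]) <= abs(value - centers[1]) else 1
            groups[nearest].append(value)
        # Move each center to the average of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups


averages = train_supervised(spend, labels)
print(predict(averages, 20.0))      # the label comes from the training labels
print(cluster_unsupervised(spend))  # two groups found without any labels
```

The supervised model can name its answer ("low" or "high") because we told it the names. The unsupervised one only returns groups, and it is up to us to interpret what they mean.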

63 Machine Learning Algorithms

The 11 Branches

Machine Learning algorithms can be divided into 11 branches, based on the underlying mathematical model:

  1. Bayesian — Bayesian machine learning models are based on Bayes' theorem, which is simply the calculation of the probability of something happening given that something else has happened, e.g. the probability that Yuvraj (the cricketer) will hit six sixes given that he ate curry-rice today. These models start with some belief about the data and then update that belief as they see more data. Some of them, such as Naive Bayes, also assume that the input variables are independent of one another. Bayesian statistics has various applications in classification, as in my Twitter project, where I used a Naive Bayes classifier. In business, it can be used to calculate the probability that a marketing plan will succeed, based on data points and the historical results of other marketing strategies.

  2. Decision Tree — A decision tree, as the name suggests, is used to reach a decision using a tree. It uses estimates and probabilities to work out the likely outcomes. The tree's structure has a root node, which splits into internal nodes and then into leaves. Each node holds a variable on which the data is split. The model learns from our labelled data and finds the best variables to split on so as to minimize the classification error. It can either classify our data points or predict a value for them, based on what it learned from the training data. Decision trees are used in finance for option pricing, and in marketing and business planning to find the best plan or to assess the overall impact of various possibilities on the business.

  3. Dimensionality Reduction — Imagine you have data with 1000 features, or you conducted a survey with 25 questions and are now having a hard time making sense of which question is answering what. That is where the family of dimensionality reduction algorithms comes into the picture. As the name suggests, they help us reduce the number of dimensions in our data. This in turn can reduce over-fitting, that is, a model with high variance that fits the training set too closely, so that we can make better predictions on our test set. In market research it is often used to group survey questions into topics that are easier to make sense of.

  4. Instance Based