
Three Month Plan to Learn Mathematics Behind Machine Learning

A 3-month plan to learn about the mathematics behind machine learning

In this article, I share a 3-month plan to learn the mathematics for machine learning. Almost all machine learning algorithms make use of concepts from Linear Algebra, Calculus, and Probability & Statistics. Some advanced algorithms and techniques also draw on subjects such as Measure Theory (the foundation on which rigorous probability theory is built), convex and non-convex optimization, and much more. To understand machine learning algorithms and to conduct research in machine learning and related fields, knowledge of this mathematics is a requirement.

The plan shared in this article can be used to prepare for data science interviews, to strengthen mathematical concepts, or to start doing research in machine learning. The mathematics it covers not only helps in understanding the intuition behind machine learning but also applies to many other advanced fields such as statistical signal processing and computational electrodynamics.

After following the plan, I aced the Microsoft interview for a Data Science Internship and received an offer from Microsoft for 2021. My interview experience is described in a separate article. In addition, after following the plan, one should be able to understand the papers presented at top conferences and journals, which might have been overwhelming earlier. It can also help in starting a research career.

The plan is divided into four main parts:

  1. Linear Algebra

  2. Probability Theory

  3. Multivariable Calculus

  4. Multivariate Statistics

Linear Algebra

Linear Algebra is one of the most important subjects required in machine learning and deep learning. The best available course to learn linear algebra is a collection of 35 lectures by Dr. Gilbert Strang on MIT OCW. This course should take a complete beginner at most one month to finish. Since most people doing machine learning already know introductory linear algebra and matrices, they should be able to get through it at about 2 hours per lecture. In addition to the lectures, one should also try to complete the homework and exams given in the course for proper practice. The most important skill to gain from this course is the ability to visualize multidimensional vectors and understand the relations between them. Visualizing vectors is one of the most important skills in data science.

To better understand the visual side of Linear Algebra, one can watch the playlist “Essence of Linear Algebra” uploaded by the YouTube channel “3blue1brown”. The channel also has other videos describing the beauty of mathematics through visualizations.
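As a small illustration of what "understanding the relation between vectors" can mean in practice, here is a minimal sketch that computes the angle between two vectors from their dot product. The function names are my own and are not taken from any of the courses mentioned above.

```python
import math

def dot(u, v):
    """Dot product of two vectors given as equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def angle_degrees(u, v):
    """Angle between u and v, from cos(theta) = u.v / (|u| |v|)."""
    cos_theta = dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))
    # Clamp to [-1, 1] so floating-point rounding cannot break acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))

# Perpendicular vectors have a zero dot product, so the angle is a right angle.
print(angle_degrees([1, 0], [0, 1]))
# The same formula works unchanged in any number of dimensions.
print(angle_degrees([1, 2, 3, 4], [2, 4, 6, 8]))
```

The second call uses two parallel four-dimensional vectors, which cannot be drawn directly, yet the angle still tells us how they relate.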

Probability Theory


Probability is a set of principles for reasoning about uncertainty. With enough observations, uncertainties can be modeled using deterministic principles. The best course on probability theory is the set of lectures by Dr. John Tsitsiklis on MIT OCW. The course covers the basics of probability theory as well as the Poisson Process, Markov Chains, the Central Limit Theorem, and much more. The assignments and exams are also available on MIT OCW, with solutions, and one should do all of them to test the concepts learned during the course. If you are someone who likes to learn from a textbook, then you can go through the book “Introduction to Probability, 2nd ed.” by Dimitri Bertsekas and John Tsitsiklis, which accompanies this course.
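The idea that enough observations make uncertainty predictable can be seen in a short simulation. The sketch below is my own illustration, not material from the course: it rolls a fair die many times and shows that the average of the rolls settles near 3.5, with a spread that shrinks as the number of rolls grows. The seed and the trial counts are arbitrary choices.

```python
import random
import statistics

def sample_mean_of_die_rolls(n, rng):
    """Average of n rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

def center_and_spread(n, trials, rng):
    """Mean and standard deviation of `trials` independent sample means."""
    means = [sample_mean_of_die_rolls(n, rng) for _ in range(trials)]
    return statistics.mean(means), statistics.stdev(means)

rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (10, 100, 1000):
    center, spread = center_and_spread(n, 200, rng)
    print(f"n={n:5d}  center={center:.3f}  spread={spread:.3f}")
```

The center stays close to the expected value of 3.5 for every n, while the spread falls roughly by a factor of the square root of ten each time n grows tenfold, which is the behavior the Central Limit Theorem describes.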

Multivariable Calculus

Knowing Multivariable Calculus is crucial for understanding many machine learning algorithms. Most of them have more than one parameter (millions in deep learning), so computing gradients and carrying out backpropagation cannot be done with single-variable calculus alone.
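To make the role of the gradient concrete, here is a minimal sketch with just two parameters. The loss function and the three data points are made up for illustration. It computes the partial derivatives by hand and checks them against a finite-difference approximation.

```python
# Hypothetical toy data lying exactly on the line y = 2x.
DATA = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

def loss(w, b):
    """Sum of squared errors of the line y = w*x + b on DATA."""
    return sum((w * x + b - y) ** 2 for x, y in DATA)

def analytic_gradient(w, b):
    """Partial derivatives of the loss with respect to w and b."""
    dw = sum(2 * (w * x + b - y) * x for x, y in DATA)
    db = sum(2 * (w * x + b - y) for x, y in DATA)
    return dw, db

def numeric_gradient(f, w, b, h=1e-6):
    """Central-difference approximation, varying one parameter at a time."""
    dw = (f(w + h, b) - f(w - h, b)) / (2 * h)
    db = (f(w, b + h) - f(w, b - h)) / (2 * h)
    return dw, db

print(analytic_gradient(0.0, 0.0))
print(numeric_gradient(loss, 0.0, 0.0))
```

Both functions return one number per parameter. With millions of parameters the idea is the same, only the gradient becomes a much longer vector.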

Before starting any course on multivariable calculus, revise the concepts of single-variable calculus first. One of the best courses available on this topic is the set of lectures by Dr. Denis Auroux on MIT OCW. As always, practice the assignments and exams from this course. The course discusses Lagrange Multipliers, Partial Differential Equations, Vector Fields, Flux, etc. There is also a playlist of short (5–10 min) videos by “3blue1brown” that uses visualization to explain this topic. It is strongly advisable to watch the videos for a better understanding.

Multivariate Statistics

Many machine learning algorithms use the concepts of statistics in a multivariate setting. These concepts extend linear algebra and probability & statistics to high-dimensional feature spaces. There is no specific course for studying this topic. However, it is advised to follow any one of the following resources to learn it:

  • Chapters 4, 7, and 8 of the book by R. Johnson and D. Wichern: Applied Multivariate Statistical Analysis.

  • Chapters 2, 3, and 11 of the book by T. W. Anderson: An Introduction to Multivariate Statistical Analysis.
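Two basic objects of multivariate statistics are the sample mean vector and the sample covariance matrix. The sketch below is my own small example with made-up numbers, not an excerpt from either book. It computes both for a dataset where each row is one observation and each column is one feature.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length observations."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance_matrix(rows):
    """Sample covariance matrix (divides by n - 1) of the observations."""
    n = len(rows)
    mu = mean_vector(rows)
    # Subtract the mean from every observation.
    centered = [[x - m for x, m in zip(row, mu)] for row in rows]
    d = len(mu)
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

# Hypothetical data: the second feature is exactly twice the first.
observations = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
print(mean_vector(observations))
print(covariance_matrix(observations))
```

The diagonal entries are the variances of the individual features, and the off-diagonal entries measure how pairs of features vary together.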


Now, you can sit back and look back at what you have learned during the three-month plan. But this is not the end.

Now take up any conference paper, advanced machine learning book, or online course. Most people have already watched the Coursera course on Machine Learning by Dr. Andrew Ng. Now, you can try to watch the original lectures of CS229 at Stanford and try to understand machine learning concepts in depth with mathematical rigor.

Also, you can go for any of the resources mentioned below to learn machine learning:

  1. Pattern Recognition and Machine Learning by Christopher Bishop.

  2. Introduction to Computational Thinking and Data Science.

  3. Machine Learning by Prof. Tommi Jaakkola.

If anybody has any questions regarding this, feel free to discuss.



