Machine Learning
Machine learning is a branch of artificial intelligence that allows computers to learn from data and make decisions without being explicitly programmed. This lets machines handle complex tasks like diagnosing diseases, driving cars, controlling home temperatures, and even predicting the stock market. But how does this work? The answer lies in math.
To explain the role of math in machine learning, think of machine learning as a chef using recipes (algorithms) to turn raw ingredients (data) into final meals (predictions). To understand how the chef works, we need to learn about four key topics: linear algebra, calculus, probability, and statistics.
Linear algebra helps represent data using vectors and matrices. Vectors are like lists of numbers, where each number represents a feature, such as the length or rating of a movie. When we stack these vectors together, we get a matrix, which is like a table where each row is a movie and each column is a feature. We can change and manipulate these matrices to make predictions and find patterns. Linear algebra is the foundation that allows us to handle and manipulate data efficiently.
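The idea of stacking feature vectors into a matrix can be sketched in a few lines of NumPy. The movie features and numbers below are made up for illustration:

```python
import numpy as np

# Hypothetical movie data: each vector holds one movie's features
# (runtime in minutes, average rating, release year).
movie_a = np.array([142.0, 8.8, 2010.0])
movie_b = np.array([112.0, 7.5, 2014.0])
movie_c = np.array([169.0, 8.6, 2014.0])

# Stacking the vectors row-wise gives a data matrix:
# each row is a movie, each column a feature.
X = np.vstack([movie_a, movie_b, movie_c])
print(X.shape)  # (3, 3): 3 movies, 3 features

# A simple manipulation: the mean of each feature across all movies.
feature_means = X.mean(axis=0)
```

Most ML libraries expect data in exactly this rows-as-samples, columns-as-features layout.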
Calculus comes into play when we need to optimize our models. Two important concepts in calculus are derivatives and partial derivatives, which we can picture as the slope on a mountain. Derivatives tell us the slope at any point, guiding us to the quickest way down. In machine learning, we use gradients to minimize errors and find the best parameters for our models through a process called gradient descent. This is crucial for training models because we start with a randomly initialized model and use calculus to slowly adjust it to minimize errors, making predictions more accurate.
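Gradient descent can be sketched on a one-dimensional toy function. Here the "cost" is f(x) = (x - 3)², whose derivative 2(x - 3) points uphill, so we step in the opposite direction; the function and learning rate are chosen purely for illustration:

```python
# Minimal gradient descent sketch on f(x) = (x - 3)**2.
# The derivative f'(x) = 2 * (x - 3) is the slope; the minimum is at x = 3.
def gradient_descent(start, learning_rate=0.1, steps=100):
    x = start
    for _ in range(steps):
        grad = 2 * (x - 3)         # slope at the current point
        x -= learning_rate * grad  # step "downhill", against the slope
    return x

x_min = gradient_descent(start=10.0)
print(round(x_min, 4))  # converges close to 3.0
```

Training a neural network works the same way in principle, just with millions of parameters and a gradient computed by backpropagation.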
Probability helps us deal with uncertainties and make predictions. For example, to forecast weather, we use probability to quantify the chances of rain based on past data. Bayesian methods are particularly useful because they allow us to update our predictions as new information becomes available. The model starts with an initial guess and improves as more data comes in, making predictions more reliable.
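The Bayesian updating described above boils down to Bayes' rule. A tiny sketch for the weather example, with all probabilities invented for illustration:

```python
# Bayes' rule sketch: update P(rain) after observing clouds.
#   P(rain | clouds) = P(clouds | rain) * P(rain) / P(clouds)
def bayes_update(prior, likelihood, likelihood_given_not):
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence

prior_rain = 0.2  # initial guess: 20% chance of rain
# Assumed: clouds appear on 90% of rainy days, 30% of dry days.
posterior = bayes_update(prior_rain, likelihood=0.9, likelihood_given_not=0.3)
print(round(posterior, 3))  # the chance of rain rises after seeing clouds
```

Each new observation can feed the posterior back in as the next prior, which is exactly the "improves as more data comes in" behavior described above.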
Statistics helps us make sense of the data. There are two main subtopics in statistics: descriptive and inferential. Descriptive statistics summarize the main characteristics of the data, while inferential statistics allow us to draw conclusions from sample data. For example, to test if a new drug is effective, we use statistics on a sample population and infer that the same results would apply to a larger population. Confidence intervals and p-values help us determine the significance and reliability of our results.
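The split between describing a sample and inferring something about the population can be sketched with Python's standard `statistics` module. The data are made up, and the interval uses the normal approximation (z ≈ 1.96) for brevity; a t-based interval would be more appropriate for a sample this small:

```python
import statistics

# Hypothetical sample measurements (e.g. recovery times in days).
sample = [12.1, 11.4, 13.0, 12.6, 11.9, 12.3, 12.8, 11.7]

# Descriptive statistics: summarize the sample itself.
mean = statistics.mean(sample)
stdev = statistics.stdev(sample)

# Inferential statistics: a rough 95% confidence interval
# for the *population* mean, based on this sample.
margin = 1.96 * stdev / len(sample) ** 0.5
interval = (mean - margin, mean + margin)
```

The interval quantifies how far the true population mean could plausibly be from the sample mean, which is the reliability question the paragraph above raises.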
Bringing it all together, machine learning is like a chef using recipes (algorithms) to transform ingredients (data) into final meals (predictions). Linear algebra provides the tools to chop, blend, and prepare the raw ingredients. Calculus helps optimize the cooking process by adjusting the heat and timing. Probability allows the chef to make educated guesses about how the final dish will turn out. Statistics ensures that each dish meets high standards of flavor and consistency, making sure the final predictions are reliable and accurate.
This was a high-level introduction to some of the math topics in machine learning. While this covers some important topics, there is much more to explore. If you're interested in learning more, there are many great resources available.
I think there are three basic areas that should be covered if you want to be a competent ML engineer: mathematics, concrete ML knowledge, and programming skills. These are rough categories and I don't want to give the impression that this is all there is to it, but I notice the biggest progress in myself when I practice all three areas regularly over a longer period of time. I recommend that anyone who wants to learn ML professionally take all three areas equally seriously and study them simultaneously if possible.
Mathematics
Not all areas of mathematics are relevant to ML. The most important ones are Linear Algebra, Probability Theory, and Multivariable Calculus, for which there are many good online courses; I've listed some of them below.
We need linear algebra because the data we deal with in ML contexts are stored in n-dimensional arrays. An understanding of the interplay between scalars, vectors, matrices and tensors is therefore essential for everything we hope to do with our data. Linear Algebra will be the language we will be speaking when doing ML of any kind.
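The "interplay between scalars, vectors, matrices and tensors" mentioned above is concretely just the number of dimensions of an array. A minimal NumPy sketch, with shapes chosen arbitrarily:

```python
import numpy as np

scalar = np.array(5.0)                       # 0-d: a single number
vector = np.array([1.0, 2.0, 3.0])           # 1-d: e.g. one sample's features
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])  # 2-d: e.g. a dataset
tensor = np.zeros((2, 3, 4))                 # 3-d and up: e.g. a batch of images

print(scalar.ndim, vector.ndim, matrix.ndim, tensor.ndim)  # 0 1 2 3
```

Almost every operation in an ML pipeline is some transformation of these n-dimensional arrays, which is why linear algebra is the shared language.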
We often need probability theory to explain why a particular learning algorithm works, where certain cost functions come from, and so on. Often it is about maximizing probabilities, which cannot be understood without an understanding of basic stochastic concepts.
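"Maximizing probabilities" usually means maximum likelihood estimation. A tiny sketch for coin flips, where the numbers and the brute-force grid search are chosen purely for illustration:

```python
import math

# For k heads in n flips with unknown head probability p,
# the log-likelihood is log L(p) = k*log(p) + (n - k)*log(1 - p),
# which is maximized at p = k / n.
def log_likelihood(p, k, n):
    return k * math.log(p) + (n - k) * math.log(1 - p)

k, n = 7, 10  # assumed observation: 7 heads in 10 flips
candidates = [i / 100 for i in range(1, 100)]
p_hat = max(candidates, key=lambda p: log_likelihood(p, k, n))
print(p_hat)  # the maximum sits at k / n = 0.7
```

Many cost functions in ML (cross-entropy, for instance) are exactly a negative log-likelihood in disguise, which is where probability theory earns its keep.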
Everywhere in the ML context we encounter multidimensional functions, and handling them confidently is necessary to understand how our algorithms work. If there is no closed-form solution for the optimal parameters of a learning algorithm, for example, iterative methods are often used to learn them. In this case, the gradient of a cost function is often used, which is obtained from the partial derivatives of a multidimensional function.
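The relationship between partial derivatives and the gradient can be made concrete with a numerical approximation. The cost function below is an arbitrary example chosen so the true gradient is easy to check by hand:

```python
# The gradient of a multidimensional function is the vector of its
# partial derivatives, approximated here with central differences.
def numerical_gradient(f, point, h=1e-6):
    grad = []
    for i in range(len(point)):
        plus, minus = list(point), list(point)
        plus[i] += h   # nudge only coordinate i upward ...
        minus[i] -= h  # ... and downward, holding the rest fixed
        grad.append((f(plus) - f(minus)) / (2 * h))
    return grad

# Example cost: f(w) = w0^2 + 3*w1^2, with true gradient (2*w0, 6*w1).
cost = lambda w: w[0] ** 2 + 3 * w[1] ** 2
grad = numerical_gradient(cost, [1.0, 2.0])  # approximately [2.0, 12.0]
```

In practice frameworks compute these partial derivatives exactly via automatic differentiation, but numerical gradients like this are a standard way to sanity-check them.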
For general intuition, I think 3Blue1Brown is the best place to start:
After you've got the intuition and refreshed the basics of high school mathematics, I can recommend the following more in-depth courses:
After these courses there should be little mathematics that will surprise you later.
Concrete ML Knowledge
Andrew Ng has, in my opinion, taken the undisputed position of offering the world the Holy Trinity of basic ML education. The courses build on one another and are more practice-oriented than theoretical. Although mathematical notation is used, everything remains comparatively basic.
- Andrew Ng's ML course
- Andrew Ng's Deep Learning Specialization
- Andrew Ng's TensorFlow in Practice Specialization
After that you are ready for advanced courses:
I also like this set of online lectures: Cornell CS4780/5780: Machine Learning Fall 2018. Somehow I always come across these lectures while doing my own research. I think they are very good, and the focus is fairly probabilistic. The only catch, in my opinion, is that the lecturer's handwriting is very bad, which is why I always keep the handout open in parallel. His accent is also funny.
Programming Skills
Here I had to start from zero, but I learned a lot in my complementary computer science studies, so I can't recommend as many resources as in the other sections. When studying Python, though, I referred to Corey Schafer a lot, who has many wonderful Python tutorials on YouTube: Corey Schafer's YouTube Channel.
There are also two books that I can recommend for learning Python, especially for ML and Data Science applications:
- Python Data Science Handbook by Jake VanderPlas
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron
Always try to apply the knowledge directly, i.e. reimplement the algorithms in Python. If that's too hard, google the code and try to understand it; we all do. Get familiar with the Scikit-Learn library and TensorFlow, and try to complete small projects that you can upload to GitHub. You won't get any better without practice. Learn app and web development to embed your projects into applications that are actually useful to users.