How do machine learning algorithms work?

by Sajjad Najafi · November 5, 2021

In machine learning, we give a machine the ability to make decisions or predictions by feeding it a set of data.

Machine learning is generally divided into three types according to the input and output data: supervised learning, unsupervised learning, and reinforcement learning.

We will discuss unsupervised learning and reinforcement learning in subsequent articles; here we focus on supervised learning.

In supervised learning, the data table has one column that we call the “label” column. The values in this column depend on the other columns in the table, which we call “features”.

A feature may have a large effect on the values in the label column, or it may have no effect on them at all.

Two common types of supervised learning algorithms:

1- Regression

2- Classification

  • Regression:

Regression algorithms are used to predict continuous values in the label column, for example the price of a house or a car based on the parameters specific to each.

For a house, consider this simple example:

Price parameters of a house: number of rooms, number of floors, basement, elevator, infrastructure, area, neighborhood

The label in this example is the price of the house, which an expert at a real estate firm determines from these parameters.

These parameters, together with the price, are stored in a table with one row per house. Note the columns:

ID area elevator basement floors rooms neighborhood Lot size price

In supervised learning, we give the machine a part of the table above, including the price column. Using the algorithm we have chosen, the machine learns how the price follows from the other values (the features). Then, for new values we give it, the machine acts as a pricing expert for us: it predicts the price from the information about the house.

Earlier in this article we noted that some features do not affect the label column. Now look at the table again. Which feature does not affect the house price? That's right: the ID column.

We won't use the ID column in the machine learning algorithm, because it doesn't affect the price.
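As a minimal sketch of this step in Python, assume the table has been loaded as a list of dictionaries. The two rows below are hypothetical placeholders made up for illustration, and the names `LABEL`, `IGNORED`, and `split_features_and_label` are not from the article; only the column names follow the header shown above.

```python
# Hypothetical rows: the values are made up purely for illustration.
houses = [
    {"ID": 1, "area": 120, "elevator": 1, "basement": 0, "floors": 2,
     "rooms": 3, "neighborhood": "A", "lot_size": 9600, "price": 106000},
    {"ID": 2, "area": 150, "elevator": 0, "basement": 1, "floors": 1,
     "rooms": 4, "neighborhood": "B", "lot_size": 11200, "price": 138000},
]

LABEL = "price"    # the column we want to predict
IGNORED = {"ID"}   # columns that carry no information about the price


def split_features_and_label(rows):
    """Return (features, labels), dropping the label and ignored columns."""
    features = [
        {name: value for name, value in row.items()
         if name != LABEL and name not in IGNORED}
        for row in rows
    ]
    labels = [row[LABEL] for row in rows]
    return features, labels


features, labels = split_features_and_label(houses)
print(sorted(features[0]))  # feature names, without ID and price
print(labels)
```

The machine would then be trained on `features` and `labels`, never on the ID.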

  • Classification:

In classification problems, the label column values are discrete rather than continuous. Classification has many applications in medicine, including the diagnosis of various types of cancer.

In a classification problem, the label column contains a small set of values such as 0 and 1.

For example, if a person has cancer, “1” is placed in this column, and if not, “0”. Another example is a system that predicts the weather and prints whether today will be cold, hot, sunny, rainy, snowy, and so on.

How do machine learning algorithms work?

Each machine learning algorithm learns in its own way to predict label values, and each has its own mathematical formulas for solving problems.

To see how a very simple machine learning algorithm works, we will walk through the simple linear regression algorithm.

Simple linear regression algorithm

In this algorithm, it is assumed that we have only one feature column (the independent variable) and one label column (the dependent variable).

During the learning process, the simple linear regression algorithm obtains the predicted value ŷ with the following formula:

$$\hat{y} = \beta_0 + \beta_1 X$$

X is the value we give to the machine from the feature column, and ŷ is the value the machine predicts and prints at the output.

But what are β0 and β1 in the formula?

A brief explanation of these two terms:

β0 is the intercept and β1 is the slope of the line.

Consider this example:

Suppose our data set is a table with two columns:

LotArea: Lot size in square feet

SalePrice: the property's sale price in dollars.

If we want to draw a chart of this data set, we will have:

If you look at the figure above, you will notice that as the area of a house increases, so does its price.

A line can be drawn that follows this upward trend, and the machine will use this line to make predictions for new data.

This line is drawn to be as close as possible to all the points, which means that the prediction error is as low as possible.

For example, if the input LotArea is 9000, the output will be approximately 100,000.

But what is the prediction error?

The prediction error is the difference between the predicted value and the actual label value. As we said, the line is drawn so that the prediction error is as low as possible, but there will still be errors.

Let's calculate the prediction error for one of the points, and then we will discuss the total error.

Pay attention to the red point. The LotArea value at this point is 10791 and the actual price of this house is 106,000, but the machine's prediction for the price of this house is about 127,000, so:

Error = |yi - ŷi| = |106,000 - 127,000| = 21,000

Our error value, in this case, is 21,000.
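The same calculation as a tiny Python sketch, using the two prices quoted above (the variable names are only illustrative):

```python
actual_price = 106_000     # label value of the red point
predicted_price = 127_000  # prediction quoted in the article for LotArea = 10791

# The absolute value keeps the error positive whether the point lies
# above or below the line.
error = abs(actual_price - predicted_price)
print(error)  # 21000
```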

Mean Absolute Error (MAE):

MAE is the average of the absolute errors over all n points:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$

Mean Square Error (MSE):

To give larger errors more weight, a penalty is applied: each error is squared before the errors are averaged:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$

Root Mean Square Error (RMSE):

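The problem with MSE is that its unit is the square of the unit of our label values (here, dollars squared).

Taking the square root of the MSE solves this problem:

$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$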

In regression problems, we can evaluate our model with MAE, MSE, and RMSE.

So, when we said that the line is drawn to give the lowest prediction error, we meant the line that keeps these error measures as small as possible. More precisely, the formulas below minimize the MSE, and therefore the RMSE as well.
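The three metrics can be sketched in plain Python as follows. This is an illustration rather than a reference implementation: the function names are assumptions, the data points are the LotArea and SalePrice values from this article's example, and `predict` uses the rounded line that the article arrives at further down.

```python
import math

# (LotArea, SalePrice) pairs from the article's data set.
data = [
    (11924, 143000), (7024, 62000), (10652, 123000), (9600, 106000),
    (10382, 121000), (10084, 115000), (11341, 149000), (11550, 139000),
    (9350, 120000), (10791, 106000), (14260, 160000), (12968, 160000),
    (13815, 150000), (7120, 78000), (10920, 137000), (8420, 80000),
    (11200, 138000),
]


def predict(lot_area):
    """The article's final line: y_hat = -19825 + 13.36 * X."""
    return -19825 + 13.36 * lot_area


def mean_absolute_error(actual, predicted):
    """Average of |y - y_hat| over all points."""
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average of (y - y_hat)^2 over all points."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the MSE, back in the unit of the label."""
    return math.sqrt(mean_squared_error(actual, predicted))


actual = [price for _, price in data]
predicted = [predict(area) for area, _ in data]

print("MAE :", round(mean_absolute_error(actual, predicted)))
print("MSE :", round(mean_squared_error(actual, predicted)))
print("RMSE:", round(root_mean_squared_error(actual, predicted)))
```

MAE and RMSE come out in dollars, while MSE is in dollars squared, which is why it looks so much larger.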

To find this line, we calculate β0 and β1 with these formulas:

$$\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \beta_0 = \bar{y} - \beta_1 \bar{x}$$

x̄ is the average of all xs.

ȳ is the average of all ys.
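As a rough sketch, the standard least-squares calculation can be written as one short Python function: the slope β1 is the sum of the products of the x and y deviations from their averages divided by the sum of the squared x deviations, and the intercept β0 is ȳ minus β1 times x̄. The function name and the toy data are assumptions for illustration, not part of the article.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (beta0, beta1) for the least-squares line through the points."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n

    # Numerator and denominator of the beta1 formula.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)

    beta1 = sxy / sxx
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1


# Toy check (hypothetical data): points that lie exactly on y = 1 + 2x.
beta0, beta1 = fit_simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(beta0, beta1)  # 1.0 2.0
```

Note that the division fails if every x value is identical, because the denominator is then zero; a single vertical column of points has no slope to estimate.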

Let's apply these formulas to our example data:

The sum of all ys is 2087000

The sum of all xs is 181401

n = 17 (the number of rows)

So x̄ = 181401 / 17 ≈ 10671 and ȳ = 2087000 / 17 ≈ 122765.


For β1 we need to calculate (xi-x̄), (yi-ȳ), (xi-x̄)(yi-ȳ) and (xi-x̄)² for every row:

x y (xi-x̄) (yi-ȳ) (xi-x̄)(yi-ȳ) (xi-x̄)²
11924 143000 1253 20235 25354455 1570009
7024 62000 -3647 -60765 221609955 13300609
10652 123000 -19 235 -4465 361
9600 106000 -1071 -16765 17955315 1147041
10382 121000 -289 -1765 510085 83521
10084 115000 -587 -7765 4558055 344569
11341 149000 670 26235 17577450 448900
11550 139000 879 16235 14270565 772641
9350 120000 -1321 -2765 3652565 1745041
10791 106000 120 -16765 -2011800 14400
14260 160000 3589 37235 133636415 12880921
12968 160000 2297 37235 85528795 5276209
13815 150000 3144 27235 85626840 9884736
7120 78000 -3551 -44765 158960515 12609601
10920 137000 249 14235 3544515 62001
8420 80000 -2251 -42765 96264015 5067001
11200 138000 529 15235 8059315 279841
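The deviation columns in the table above can be reproduced with a few lines of Python. This sketch assumes the averages were rounded to whole numbers (10671 and 122765) before subtracting, which is what the values in the table imply; the variable names are illustrative.

```python
# (x, y) = (LotArea, SalePrice) rows from the table above.
rows = [
    (11924, 143000), (7024, 62000), (10652, 123000), (9600, 106000),
    (10382, 121000), (10084, 115000), (11341, 149000), (11550, 139000),
    (9350, 120000), (10791, 106000), (14260, 160000), (12968, 160000),
    (13815, 150000), (7120, 78000), (10920, 137000), (8420, 80000),
    (11200, 138000),
]

n = len(rows)

# Assumption: averages rounded to whole numbers, as the table suggests.
x_mean = round(sum(x for x, _ in rows) / n)  # 10671
y_mean = round(sum(y for _, y in rows) / n)  # 122765

# Column totals: sum of (xi - x_mean)(yi - y_mean) and of (xi - x_mean)^2.
sum_xy = sum((x - x_mean) * (y - y_mean) for x, y in rows)
sum_xx = sum((x - x_mean) ** 2 for x, _ in rows)

print(sum_xy)                     # 875092590
print(sum_xx)                     # 65487402
print(round(sum_xy / sum_xx, 2))  # 13.36
```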

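Sum of (xi-x̄)(yi-ȳ) = 875092590

Sum of (xi-x̄)² = 65487402

β1 = 875092590 / 65487402 ≈ 13.36

β0 = ȳ - β1x̄ ≈ -19825 (calculated with the unrounded values of β1, x̄, and ȳ)

Now that we have β0 and β1, our equation is complete:

ŷ = β0 + β1X = -19825 + 13.36X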

The machine will use this formula for every new prediction: if we give it any LotArea, it substitutes that value for X and prints the result of the calculation at the output.
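Read as code, the final equation is a one-line function. The sketch below uses the rounded coefficients from the equation above; the names `BETA0`, `BETA1`, and `predict_sale_price` are hypothetical.

```python
BETA0 = -19825  # intercept from the equation above
BETA1 = 13.36   # slope from the equation above


def predict_sale_price(lot_area):
    """Estimate SalePrice (dollars) from LotArea (square feet)."""
    return BETA0 + BETA1 * lot_area


print(round(predict_sale_price(9000)))  # 100415
```

A LotArea of 9000 gives roughly 100,000, which matches the example read from the chart earlier.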

I hope you enjoyed this tutorial.
