In machine learning, we give a machine the ability to make decisions or predictions by feeding it a series of data.
There are three general types of machine learning, according to the input and outcome data: Supervised Learning, Unsupervised Learning, and Reinforcement Learning.
We will discuss Unsupervised Learning and Reinforcement Learning in subsequent articles; for now, we want to learn Supervised Learning.
In supervised learning, there is a column that we call the “label” column. The values in this column depend on the other columns in our table, which we call “features”.
A feature may have a large effect on the values in the label column, or it may not affect them at all.
Two common types of supervised learning algorithms:
1- Regression
2- Classification
- Regression:
Regression algorithms are used to predict continuous values in the label column, for example the price of a house or a car based on the attributes of each.
In the case of a house, consider this simple example.
Price parameters of a house: number of rooms, number of floors, basement, elevator, infrastructure, area, neighborhood.
The label for this example is the price of the house, which is determined by an expert at a real estate firm based on these parameters.
These parameters, along with the house information, are stored in a table. Note the columns:

| ID | area | elevator | basement | floors | rooms | neighborhood | Lot size | price |
In supervised learning, we give the machine part of the table above, including the price column. The machine learns the pricing method from these values (the features) through the algorithm we have given it, and then, for new rows that we give it, it acts as a pricing expert for us! (It will predict the price based on the house information.)
Earlier it was noted that some features do not affect the label column. Now look at the table again: which feature does not affect the house price? Yes, that's right: the ID column.
We won't use the ID column in the machine learning algorithm, because it doesn't affect the price.
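The split into features and a label, with ID dropped, can be sketched in a few lines of Python. The column names and values below are invented for illustration:

```python
# A sketch of splitting a supervised-learning table into features (X) and
# a label (y), deliberately dropping the ID column because it carries no
# pricing information. Column names and values here are invented examples.

rows = [
    {"ID": 1, "area": 120, "rooms": 3, "floors": 1, "price": 250_000},
    {"ID": 2, "area": 200, "rooms": 5, "floors": 2, "price": 420_000},
]

feature_names = ["area", "rooms", "floors"]  # ID is intentionally left out
X = [[row[name] for name in feature_names] for row in rows]
y = [row["price"] for row in rows]

print(X)  # [[120, 3, 1], [200, 5, 2]]
print(y)  # [250000, 420000]
```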
- Classification:
In classification problems, the values in the label column are discrete rather than continuous. Classification has many applications in medicine, including the diagnosis of various types of cancer.
With this kind of algorithm, the label column contains values such as zero and one.
For example, if a person has cancer, a “1” is placed in this column, and if he is not infected, a “0”. Another example is a system that predicts the weather and prints at the output that today is cold, hot, sunny, rainy, snowy, and so on.
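As a toy illustration of discrete labels, here is a minimal nearest-neighbour classifier in plain Python; the tumour sizes and labels below are made up for the example:

```python
# A toy classifier to illustrate discrete labels: predict whether a tumour
# is cancerous (1) or not (0) by copying the label of the closest training
# example. The sizes and labels below are invented for illustration.

def nearest_neighbor_predict(train, new_size):
    """Return the label of the training example whose size is closest."""
    closest = min(train, key=lambda pair: abs(pair[0] - new_size))
    return closest[1]

# (size in mm, label): small tumours labelled 0, large ones labelled 1
train = [(2, 0), (3, 0), (5, 0), (12, 1), (15, 1), (20, 1)]

print(nearest_neighbor_predict(train, 4))   # 0
print(nearest_neighbor_predict(train, 14))  # 1
```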
How do machine learning algorithms work?
Each machine learning algorithm learns to predict label values in its own specific way, and each has mathematical formulas behind it for solving problems.
But to see how a very simple machine learning algorithm works, we will explain the simple linear regression algorithm.
Simple linear regression algorithm
In this algorithm, it is assumed that we have only one feature column (the independent variable) and one label column (the dependent variable).
During the learning process, the simple linear regression algorithm obtains the predicted y values with the following formula:
Ŷ = β0 + β1X
X is the value we give the machine from the feature column; in return, the machine predicts the Ŷ value and prints it at the output.
But what are β0 and β1 in this formula?
A brief explanation of the two:
β0 is the intercept of the line and β1 is its slope.
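Once β0 and β1 are known, prediction is a single evaluation of the line. A minimal sketch, using as defaults the coefficient values that the worked example later in this article arrives at:

```python
# A sketch of how the machine answers new queries: evaluating the fitted
# line Ŷ = β0 + β1X. The default coefficients below are the values the
# worked example later in this article arrives at.

def predict(x, b0=-19825.0, b1=13.36):
    """Predict the label (Ŷ) for a feature value x on a fitted line."""
    return b0 + b1 * x

print(predict(9000))  # ≈ 100,415
```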
Consider this example:
Suppose our data set is this table:
LotArea: Lot size in square feet
SalePrice: the property's sale price in dollars.
If we want to draw a chart of this data set, we will have:
If you look at the figure above, you will notice that as the area of a house increases, so does its price.
A line can be drawn that captures the slope of this increase, and based on this line the machine will decide for new data.
This line is drawn so as to be as close as possible to all the points, which means that the prediction error is at its lowest value.
For example, if the input is LotArea 9000, the output will be approximately equal to 100,000:
But what is the prediction error?!
The prediction error is the difference between the predicted value and the actual label value. As we said, the line is plotted in such a way that the lowest prediction error occurs, but we will still have errors.
Let's compute the prediction error for one of the points, and then we will discuss the total error.
Pay attention to the red point. The LotArea value at this point is 10791 and the actual price of this house is 106,000, but the machine's prediction for the price of this house is about 127,000, so:
Error = |yi − ŷi|
Error = |106,000 − 127,000| = 21,000
Our error value, in this case, is 21,000.
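The single-point calculation above, written as code:

```python
# The single-point prediction error: the absolute difference between the
# actual label and the line's prediction for the red point.

actual = 106_000     # real sale price at LotArea 10791
predicted = 127_000  # the line's prediction for that house

error = abs(actual - predicted)
print(error)  # 21000
```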
Mean Absolute Error (MAE):
MAE is the average of the absolute errors over all points:
MAE = (1/n) Σ|yi − ŷi|
Mean Square Error (MSE):
To highlight larger errors, data scientists introduced a penalty: each error is squared before averaging:
MSE = (1/n) Σ(yi − ŷi)²
Root Mean Square Error (RMSE):
MSE has a problem: its unit is different from the unit of our label values (squared dollars instead of dollars).
Data scientists take the square root of MSE to solve this problem:
RMSE = √MSE
In supervised learning algorithms, we can evaluate our model with MAE, MSE, and RMSE.
So, when we said that we draw the line in such a way that we have the lowest amount of prediction errors, we meant the minimum amount of MAE, MSE, and RMSE.
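All three metrics are one-liners in code. A minimal sketch on an invented pair of actual/predicted values, one prediction perfect and one off by 21,000:

```python
import math

# A sketch of MAE, MSE, and RMSE on an invented pair of actual vs.
# predicted prices: one perfect prediction and one off by 21,000.

y_true = [106_000, 143_000]
y_pred = [127_000, 143_000]

n = len(y_true)
mae = sum(abs(a - p) for a, p in zip(y_true, y_pred)) / n
mse = sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / n
rmse = math.sqrt(mse)  # back in the same units (dollars) as the labels

print(mae)   # 10500.0
print(mse)   # 220500000.0
print(rmse)  # ≈ 14849.2
```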
Data scientists have shown that, for this purpose, we need to calculate β0 and β1 with these formulas:
β1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
β0 = ȳ − β1x̄
x̄ is the average of all x values.
ȳ is the average of all y values.
Let's apply these formulas to our example data:
The sum of all y values is 2,087,000.
The sum of all x values is 181,401.
n = 17 (the number of rows)
So:
x̄ = 181401 / 17 ≈ 10671
ȳ = 2087000 / 17 ≈ 122765
For β1 we need to calculate (xi − x̄), (yi − ȳ), (xi − x̄)(yi − ȳ), and (xi − x̄)² for every row:
| x | y | xi − x̄ | yi − ȳ | (xi − x̄)(yi − ȳ) | (xi − x̄)² |
|---|---|---|---|---|---|
| 11924 | 143000 | 1253 | 20235 | 25354455 | 1570009 |
| 7024 | 62000 | -3647 | -60765 | 221609955 | 13300609 |
| 10652 | 123000 | -19 | 235 | -4465 | 361 |
| 9600 | 106000 | -1071 | -16765 | 17955315 | 1147041 |
| 10382 | 121000 | -289 | -1765 | 510085 | 83521 |
| 10084 | 115000 | -587 | -7765 | 4558055 | 344569 |
| 11341 | 149000 | 670 | 26235 | 17577450 | 448900 |
| 11550 | 139000 | 879 | 16235 | 14270565 | 772641 |
| 9350 | 120000 | -1321 | -2765 | 3652565 | 1745041 |
| 10791 | 106000 | 120 | -16765 | -2011800 | 14400 |
| 14260 | 160000 | 3589 | 37235 | 133636415 | 12880921 |
| 12968 | 160000 | 2297 | 37235 | 85528795 | 5276209 |
| 13815 | 150000 | 3144 | 27235 | 85626840 | 9884736 |
| 7120 | 78000 | -3551 | -44765 | 158960515 | 12609601 |
| 10920 | 137000 | 249 | 14235 | 3544515 | 62001 |
| 8420 | 80000 | -2251 | -42765 | 96264015 | 5067001 |
| 11200 | 138000 | 529 | 15235 | 8059315 | 279841 |

Sum of (xi − x̄)(yi − ȳ) = 875,092,590
Sum of (xi − x̄)² = 65,487,402
So:
β1 = 875092590 / 65487402 ≈ 13.36
β0 = ȳ − β1x̄ ≈ −19825
Now that we have β0 and β1, our equation is solved:
Ŷ = β0 + β1X = −19825 + 13.36X
The machine will solve subsequent problems with this formula. In fact, if we give the machine any LotArea, it will substitute it for X and print the result of the calculation at the output.
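The whole worked example can be reproduced in a few lines of plain Python, fitting β1 and β0 to the 17 rows of the table with the least-squares formulas from this article:

```python
# Fit the simple linear regression line to the 17-row LotArea / SalePrice
# table from this article, using the least-squares formulas for β1 and β0.

lot_area = [11924, 7024, 10652, 9600, 10382, 10084, 11341, 11550, 9350,
            10791, 14260, 12968, 13815, 7120, 10920, 8420, 11200]
price = [143000, 62000, 123000, 106000, 121000, 115000, 149000, 139000,
         120000, 106000, 160000, 160000, 150000, 78000, 137000, 80000, 138000]

n = len(lot_area)
x_bar = sum(lot_area) / n   # ≈ 10671
y_bar = sum(price) / n      # ≈ 122765

b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(lot_area, price))
      / sum((x - x_bar) ** 2 for x in lot_area))
b0 = y_bar - b1 * x_bar

print(round(b1, 2))  # ≈ 13.36
print(round(b0))     # ≈ -19825

def predict(x):
    """Predict SalePrice for a given LotArea."""
    return b0 + b1 * x

print(round(predict(9000)))  # ≈ 100,440, close to the 100,000 read off the chart
```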
I hope you enjoyed this tutorial.