An example of linear regression
From our previous study, we can easily draw the conclusion that the model described by the table in the picture above is a supervised learning model. Moreover, it is an example of a regression problem.
More formally, in supervised learning we have a data set called the training set. The algorithm's job is to learn from this data how to predict the "right answer" (such as predicting the prices of houses).
Here is some notation that will help us (see the sketch after this list):
- m: the number of training examples (the number of rows in the picture above).
- x^(i): the "input" variable / features of the i-th training example.
- y^(i): the "output" variable / "target" variable of the i-th training example.
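As a minimal sketch of this notation in Python, assuming a tiny made-up housing data set (the numbers are illustrative only):

```python
import numpy as np

# A tiny made-up training set: house sizes and prices (illustrative numbers).
# Each training example is a pair (x^(i), y^(i)).
x = np.array([2104.0, 1416.0, 1534.0, 852.0])  # "input" variable / feature
y = np.array([460.0, 232.0, 315.0, 178.0])     # "output" / "target" variable

m = len(x)  # m: the number of training examples
# Note: the notes use 1-based indices, while Python arrays are 0-based.
print(f"m = {m}, x^(1) = {x[0]}, y^(1) = {y[0]}")
```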
To describe the supervised learning problem more formally, our goal is, given a training set, to learn a function h: x -> y so that h(x) is a "good" predictor for the corresponding value of y. For historical reasons, this function h is called a hypothesis function. Seen pictorially, the process looks like this:
So in this example h is a function that maps from x (the size of a house) to y (the estimated price), and for this kind of supervised learning problem we can build an h like this: h(x) = θ0 + θ1x.
This is also called linear regression with one variable or univariate linear regression, which is the basic building block for learning other, more complicated models.
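As a quick sketch, this hypothesis can be written as a plain Python function (the parameter values below are arbitrary placeholders, not learned values):

```python
def h(x, theta0, theta1):
    """Univariate linear regression hypothesis: h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# With the arbitrary parameters theta0 = 50 and theta1 = 0.1,
# a 1000-square-foot house would be predicted to cost 150:
print(h(1000, 50, 0.1))  # 150.0
```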
Cost function
Here is a model with a training set, together with its hypothesis:
The θi in the hypothesis are the parameters of the model.
The task of the algorithm is to choose values for these two parameters (θ0 and θ1) so that the straight line we get out of this fits the data well.
For example:
What can be seen clearly is that different parameter values (θ0 and θ1) give different hypotheses. Therefore, in the example of predicting housing prices, we need to predict the prices as accurately as possible by choosing appropriate θ0 and θ1.
If we want to choose θ0 and θ1 to minimize the difference between h(x) and y, what we need to do is minimize the squared difference between the output of the hypothesis function and the real price of the house (that is, make the cost function as small as possible), which can be expressed mathematically as:

J(θ0, θ1) = (1/2m) * Σ (h(x^(i)) - y^(i))^2, where the sum runs over i = 1 to m.
This is called the cost function or squared error function; we can measure the accuracy of our hypothesis function by using it.
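A minimal sketch of this cost function in Python, reusing the made-up data from the earlier sketch (the particular numbers are only illustrative):

```python
import numpy as np

def compute_cost(x, y, theta0, theta1):
    """Squared error cost J(theta0, theta1) = (1/2m) * sum of (h(x^(i)) - y^(i))^2."""
    m = len(x)
    predictions = theta0 + theta1 * x       # h(x^(i)) for every training example
    squared_errors = (predictions - y) ** 2
    return squared_errors.sum() / (2 * m)

x = np.array([2104.0, 1416.0, 1534.0, 852.0])
y = np.array([460.0, 232.0, 315.0, 178.0])
print(compute_cost(x, y, 50.0, 0.1))  # cost for one particular (theta0, theta1)
```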
To understand the cost function intuitively Ⅰ
Firstly, we use a simplified model like this (only one parameter θ1 -- hypothesis functions that pass through the origin):
So through different values of θ1 we get different hypothesis functions, and the resulting value of the cost function J is different as well:
And if we compute more values of J for different θ1, we can plot J as a function of θ1, like this:
Our goal is to minimize the cost function J (in the ideal situation, the line h(x) passes through all the points of our training set). In this case, θ1 = 1 is our global minimum, which is the minimum value of the cost function.
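As a sketch of this picture, assume the toy training set (1, 1), (2, 2), (3, 3), whose points all lie on the line y = x; evaluating J at several values of θ1 shows the minimum at θ1 = 1:

```python
import numpy as np

# Toy training set whose points lie exactly on the line y = x.
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
m = len(x)

def J(theta1):
    """Cost of the simplified hypothesis h(x) = theta1 * x."""
    return ((theta1 * x - y) ** 2).sum() / (2 * m)

for theta1 in np.linspace(-0.5, 2.5, 7):
    print(f"theta1 = {theta1:5.2f}  ->  J = {J(theta1):.4f}")
# J reaches its global minimum, 0, at theta1 = 1.0,
# where the line passes through every training point.
```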
To understand the cost function intuitively Ⅱ
When it comes to h(x) = θ0 + θ1x, the cost function J has two variables (θ0 and θ1),
which makes the graph of J a three-dimensional surface. We can also use a contour plot to show it. A contour plot is a graph that contains many contour lines; a contour line of a two-variable function has a constant value at all points on the same line.
And through different values of θ0 and θ1 we get different hypothesis functions, and the resulting value of the cost function J is different as well (the results lie on different contour lines):
Our goal is to minimize the cost function J (the line h(x) should pass as close as possible to all the points of our training set), like this:
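As a sketch of how such a contour plot could be generated with matplotlib (same illustrative data as in the earlier sketches; the grid ranges are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt

# The same made-up housing data as in the earlier sketches.
x = np.array([2104.0, 1416.0, 1534.0, 852.0])
y = np.array([460.0, 232.0, 315.0, 178.0])
m = len(x)

# Evaluate J(theta0, theta1) on a grid of parameter values.
theta0_vals = np.linspace(-100.0, 300.0, 100)
theta1_vals = np.linspace(-0.2, 0.4, 100)
T0, T1 = np.meshgrid(theta0_vals, theta1_vals)

# Accumulate the squared error of each training example over the whole grid.
J_vals = np.zeros_like(T0)
for xi, yi in zip(x, y):
    J_vals += (T0 + T1 * xi - yi) ** 2
J_vals /= 2 * m

# Each contour line connects (theta0, theta1) pairs that have the same cost.
plt.contour(T0, T1, J_vals, levels=30)
plt.xlabel("theta0")
plt.ylabel("theta1")
plt.title("Contour plot of the cost function J")
plt.show()
```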