The least squares method

The least squares method (abbreviated MNC, from the Russian МНК; English: Ordinary Least Squares, OLS) is a mathematical method used to solve various problems, based on minimizing the sum of squared deviations of certain functions from the sought variables. It can be used to "solve" overdetermined systems of equations (when the number of equations exceeds the number of unknowns), to find a solution of ordinary (not overdetermined) nonlinear systems of equations, and to approximate point values by some function. OLS is one of the basic methods of regression analysis for estimating the unknown parameters of regression models from sample data.

The essence of the least squares method

Let `x = (x_1, ..., x_n)` be a set of unknown variables (parameters), and let `f_i(x), i = 1, ..., m` be a set of functions of these variables. The task is to select values of `x` such that the values of these functions are as close as possible to certain target values `y_i`. Essentially, we are talking about a "solution" of the overdetermined system of equations `f_i(x) = y_i` in the indicated sense of maximum closeness of the left- and right-hand sides of the system. The essence of the least squares method is to take as the "measure of closeness" the sum of squared deviations of the left- and right-hand sides, `S(x) = sum_(i=1)^(m) (f_i(x) - y_i)^2`. Thus, the essence of OLS can be expressed as follows: `hat x = arg min_x S(x)`.

If the system of equations has a solution, then the minimum of the sum of squares is zero, and exact solutions of the system can be found analytically or, for example, by various numerical optimization methods. If the system is overdetermined, that is, loosely speaking, the number of independent equations exceeds the number of sought variables, then the system has no exact solution, and the least squares method allows one to find an "optimal" vector `x` in the sense of maximum closeness of the vectors `f(x)` and `y`, or maximum closeness of the deviation vector `f(x) - y` to zero (closeness understood in the sense of Euclidean distance).

Example: a system of linear equations

In particular, the least squares method can be used to "solve" the system of linear equations

`Ax = b`,

where the matrix `A` is not square but rectangular, of size `m xx n` with `m > n` (more precisely, the rank of `A` equals the number of sought variables).

Such a system of equations in the general case has no solution. Therefore, this system can be "solved" only in the sense of choosing a vector `x` that minimizes the "distance" between the vectors `Ax` and `b`. To do this, one can apply the criterion of minimizing the sum of squared differences between the left- and right-hand sides of the system's equations, that is, `(Ax - b)^T (Ax - b) -> min_x`. It is easy to show that solving this minimization problem leads to solving the following system of equations (the normal equations):

`A^T A x = A^T b`.

Using the pseudoinverse operator, the solution can be rewritten as follows:

`x = A^+ b`,

where `A^+ = (A^T A)^(-1) A^T` is the pseudoinverse matrix of `A`.
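As an illustration (a minimal sketch that is not part of the original text; the small 4x2 system below is made up), three equivalent ways of computing this least squares "solution" in Python with NumPy:

```python
import numpy as np

# Overdetermined system: 4 equations, 2 unknowns (hypothetical numbers)
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([2.1, 2.9, 4.2, 4.8])

# 1) Normal equations A^T A x = A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# 2) Pseudoinverse: x = A^+ b
x_pinv = np.linalg.pinv(A) @ b

# 3) Dedicated least-squares routine
x_lstsq, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)

print(x_normal, x_pinv, x_lstsq)  # all three coincide up to rounding
```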

This problem can also be "solved" using the so-called weighted least squares method (see below), when different equations of the system are given different weights for theoretical reasons.

A strict justification and establishment of the boundaries of the substantive applicability of the method were given by A. A. Markov and A. N. Kolmogorov.

OLS in regression analysis (data approximation)

Let there be `n` values of some variable `y` (these can be results of observations, experiments, etc.) and the corresponding values of the variables `x`. The task is to approximate the relationship between `y` and `x` by some function `f(x, b)` known up to unknown parameters `b`, that is, in fact, to find the values of the parameters `b` that bring the values `f(x_t, b)` as close as possible to the actual values `y_t`. In fact, this reduces to the case of "solving" an overdetermined system of equations with respect to `b`:

`f(x_t, b) = y_t, t = 1, ..., n`.

In regression analysis, and in particular in econometrics, probabilistic models of the dependence between the variables are used:

`y_t = f(x_t, b) + ε_t`,

where `ε_t` are the so-called random errors of the model.

Accordingly, deviations of the observed values `y_t` from the model values `f(x_t, b)` are assumed in the model itself. The essence of the (ordinary, classical) least squares method is to find the parameters `b` for which the sum of squared deviations (errors; for regression models they are often called regression residuals) `e_t` is minimal:

`hat b_(OLS) = arg min_b RSS(b)`,

where `RSS` (Residual Sum of Squares) is defined as:

`RSS(b) = e^T e = sum_(t=1)^(n) e_t^2 = sum_(t=1)^(n) (y_t - f(x_t, b))^2`.

In the general case, this problem can be solved by numerical optimization (minimization) methods. In this case one speaks of nonlinear least squares (NLS or NLLS, Non-Linear Least Squares). In many cases an analytical solution can be obtained. To solve the minimization problem, it is necessary to find the stationary points of the function `RSS(b)` by differentiating it with respect to the unknown parameters `b`, equating the derivatives to zero, and solving the resulting system of equations:

`sum_(t=1)^(n) (y_t - f(x_t, b)) (partial f(x_t, b))/(partial b) = 0`.
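For the nonlinear case, here is a minimal sketch of NLS with SciPy's numerical optimizer; the exponential model and the data points are hypothetical, chosen only to illustrate the idea:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical nonlinear model f(x, b) = b0 * exp(b1 * x)
def model(x, b0, b1):
    return b0 * np.exp(b1 * x)

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.1, 4.9, 8.2])   # made-up observations

# curve_fit numerically minimizes sum((y - model(x, b))^2) over b0, b1
b_hat, b_cov = curve_fit(model, x, y, p0=(1.0, 0.5))
print(b_hat)   # estimated parameters
print(b_cov)   # their estimated covariance matrix
```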

OLS in the case of linear regression

Let the regression dependence be linear:

`y_t = sum_(j=1)^(k) b_j x_(tj) + ε_t = x_t^T b + ε_t`.

Let `y` be the column vector of observations of the explained variable, and let `X` be the `n xx k` matrix of factor observations (the rows of the matrix are the vectors of factor values in a given observation, and the columns are the vector of values of a given factor in all observations). The matrix representation of the linear model is:

`y = Xb + ε`.

Then the vector of estimates of the explained variable and the vector of regression residuals will be equal to

`hat y = X hat b`, `e = y - hat y = y - X hat b`.

Accordingly, the sum of squares of the regression residuals will be equal to

`RSS = e^T e = (y - X hat b)^T (y - X hat b)`.

Differentiating this function with respect to the vector of parameters `hat b` and equating the derivatives to zero, we obtain a system of equations (in matrix form):

`(X^T X) hat b = X^T y`.

In expanded (element-wise) form, this system of equations looks like this:

`sum_t x_(t1) x_(t1) hat b_1 + sum_t x_(t1) x_(t2) hat b_2 + ... + sum_t x_(t1) x_(tk) hat b_k = sum_t x_(t1) y_t`
`...`
`sum_t x_(tk) x_(t1) hat b_1 + sum_t x_(tk) x_(t2) hat b_2 + ... + sum_t x_(tk) x_(tk) hat b_k = sum_t x_(tk) y_t`

where all sums are taken over all `t = 1, ..., n`.

If a constant is included in the model (as usual), then `x_(t1) = 1` for all `t`; therefore, in the upper left corner of the matrix of the system of equations there is the number of observations `n`, in the remaining elements of the first row and first column there are simply the sums of the values of the variables, `sum_t x_(tj)`, and the first element of the right-hand side of the system is `sum_t y_t`.

The solution of this system of equations gives the general formula for the least squares estimates of a linear model:

`hat b_(OLS) = (X^T X)^(-1) X^T y = ((1/n) X^T X)^(-1) (1/n) X^T y`.

For analytical purposes, the last representation of this formula turns out to be useful (in the system of equations, when dividing by n, arithmetic means appear instead of sums). If in the regression model the data are centered, then in this representation the first matrix has the meaning of the sample covariance matrix of the factors, and the second is the vector of covariances of the factors with the dependent variable. If, in addition, the data are also normalized by the standard deviation (that is, ultimately standardized), then the first matrix has the meaning of the sample correlation matrix of the factors, and the second vector is the vector of sample correlations of the factors with the dependent variable.
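A small sketch of the general formula `hat b = (X^T X)^(-1) X^T y` in Python/NumPy; the data here are simulated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: y = 2 + 3*x + noise
n = 100
x = rng.uniform(0, 10, size=n)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, size=n)

# Design matrix with a constant in the first column
X = np.column_stack([np.ones(n), x])

# OLS estimates: solve (X'X) b = X'y rather than inverting X'X explicitly
b_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(b_hat)   # close to [2, 3]
```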

An important property of OLS estimates for models with a constant is that the constructed regression line passes through the center of gravity of the sample data, that is, the equality holds:

`bar y = bar x^T hat b`.

In particular, in the extreme case when the only regressor is a constant, we find that the OLS estimate of the single parameter (the constant itself) equals the mean value of the explained variable. That is, the arithmetic mean, known for its good properties from the laws of large numbers, is also a least squares estimate: it satisfies the criterion of minimum sum of squared deviations from it.

The simplest special cases

In the case of paired linear regression `y_t = a + b x_t + ε_t`, when the linear dependence of one variable on another is estimated, the calculation formulas simplify (one can do without matrix algebra). The system of equations has the form:

`a n + b sum x_t = sum y_t`
`a sum x_t + b sum x_t^2 = sum x_t y_t`

From here it is easy to find the coefficient estimates:

`hat b = (bar(xy) - bar x bar y)/(bar(x^2) - (bar x)^2)`, `hat a = bar y - hat b bar x`.

Although in general models with a constant are preferable, in some cases it is known from theoretical considerations that the constant `a` should be zero. For example, in physics the relationship between voltage and current is `U = I * R`; when measuring voltage and current, it is necessary to estimate the resistance. In this case, we are talking about the model `y = b x`. Then, instead of a system of equations, we have the single equation

`b sum x_t^2 = sum x_t y_t`.

Therefore, the formula for estimating the single coefficient has the form

`hat b = (sum_(t=1)^(n) x_t y_t)/(sum_(t=1)^(n) x_t^2)`.
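A tiny sketch of this no-constant case with made-up current/voltage measurements (the single coefficient plays the role of the resistance):

```python
import numpy as np

# Hypothetical measurements for the model U = R*I (no constant term)
I = np.array([0.5, 1.0, 1.5, 2.0, 2.5])   # current, A
U = np.array([1.1, 1.9, 3.2, 4.1, 4.9])   # voltage, V

# Single-coefficient OLS estimate: R_hat = sum(I*U) / sum(I^2)
R_hat = np.sum(I * U) / np.sum(I ** 2)
print(R_hat)   # estimated resistance, Ohm
```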

Statistical properties of OLS estimates

First of all, we note that for linear models OLS estimates are linear estimates, as follows from the formula above. For OLS estimates to be unbiased, it is necessary and sufficient that the most important condition of regression analysis hold: the mathematical expectation of the random error conditional on the factors must be equal to zero. This condition is satisfied, in particular, if the mathematical expectation of the random errors is zero and the factors and the random errors are independent random variables.

The first condition can be considered always satisfied for models with a constant, since the constant absorbs any nonzero mathematical expectation of the errors (which is why models with a constant are generally preferable).

The second condition, the condition of exogeneity of the factors, is fundamental. If this property is not satisfied, then we can assume that almost any estimates will be extremely unsatisfactory: they will not even be consistent (that is, even a very large volume of data does not allow obtaining good estimates in this case). In the classical case, a stronger assumption is made about the determinism of the factors, as opposed to the random error, which automatically means that the exogeneity condition is satisfied. In the general case, for consistency of the estimates it is sufficient that the exogeneity condition hold together with convergence of the matrix `(1/n) X^T X` to some non-singular matrix as the sample size increases to infinity.

In order for the (ordinary) OLS estimates to be, in addition to consistent and unbiased, also efficient (the best in the class of linear unbiased estimates), additional properties of the random error must hold:

Constant (identical) variance of the random errors in all observations (no heteroskedasticity): `V(ε_t) = σ^2`;

Absence of correlation (autocorrelation) of the random errors in different observations: `Cov(ε_t, ε_s) = 0` for `t != s`.

These assumptions can be formulated for the covariance matrix of the random error vector: `V(ε) = σ^2 I_n`.

A linear model that satisfies these conditions is called classical. OLS estimates for classical linear regression are unbiased, consistent and the most efficient estimates in the class of all linear unbiased estimates (in the English-language literature the abbreviation BLUE, Best Linear Unbiased Estimator, is sometimes used; in the Russian-language literature the Gauss-Markov theorem is more often cited). It is easy to show that the covariance matrix of the vector of coefficient estimates equals:

`V(hat b) = σ^2 (X^T X)^(-1)`.

Efficiency means that this covariance matrix is "minimal" (any linear combination of the coefficients, and in particular the coefficients themselves, have minimal variance), that is, in the class of linear unbiased estimators the OLS estimators are the best. The diagonal elements of this matrix, the variances of the coefficient estimates, are important parameters of the quality of the obtained estimates. However, it is not possible to calculate this covariance matrix directly, because the variance of the random errors is unknown. It can be proven that an unbiased and consistent (for the classical linear model) estimate of the variance of the random errors is the quantity:

`s^2 = RSS/(n - k)`.

Substituting this value into the formula for the covariance matrix, we obtain an estimate of the covariance matrix: `hat V(hat b) = s^2 (X^T X)^(-1)`. The resulting estimates are also unbiased and consistent. It is also important that the estimate of the error variance (and hence of the variances of the coefficients) and the estimates of the model parameters are independent random variables, which makes it possible to obtain test statistics for testing hypotheses about the model coefficients.
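A sketch of these quantities in code, under the classical assumptions stated above (the function below and its name are illustrative, not a standard library API):

```python
import numpy as np

def ols_with_se(X, y):
    """OLS estimates with classical standard errors and t statistics."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b_hat = XtX_inv @ X.T @ y
    resid = y - X @ b_hat
    s2 = resid @ resid / (n - k)        # unbiased estimate of the error variance
    cov_b = s2 * XtX_inv                # estimated covariance matrix of b_hat
    se_b = np.sqrt(np.diag(cov_b))      # standard errors of the coefficients
    t_stats = b_hat / se_b              # t statistics for H0: b_j = 0
    return b_hat, se_b, t_stats
```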

It should be noted that if the classical assumptions are not met, OLS estimates of the parameters are not the most efficient estimates (while remaining unbiased and consistent). However, the estimate of the covariance matrix deteriorates even more: it becomes biased and inconsistent. This means that statistical conclusions about the quality of the constructed model can in this case be extremely unreliable. One way to solve the latter problem is to use special estimates of the covariance matrix that are consistent under violations of the classical assumptions (White standard errors and Newey-West standard errors). Another approach is to use the so-called generalized least squares method.

Generalized OLS

Main article: Generalized least squares

The least squares method allows broad generalization. Instead of minimizing the sum of squares of the residuals, one can minimize some positive definite quadratic form of the residual vector, `e^T W e`, where `W` is some symmetric positive definite weight matrix. Ordinary least squares is the special case of this approach in which the weight matrix is proportional to the identity matrix. As is known from the theory of symmetric matrices (operators), such matrices admit a decomposition `W = P^T P`. Therefore, the specified functional can be represented as

`e^T W e = e^T P^T P e = (Pe)^T (Pe)`,

that is, this functional can be represented as the sum of squares of some transformed "residuals". Thus, we can distinguish a class of least squares methods: LS methods (Least Squares).

It has been proven (Aitken's theorem) that for a generalized linear regression model (in which no restrictions are imposed on the covariance matrix of the random errors), the most efficient (in the class of linear unbiased estimates) are the so-called generalized least squares estimates (GLS, Generalized Least Squares), that is, the LS method with the weight matrix equal to the inverse covariance matrix of the random errors: `W = V_ε^(-1)`.

It can be shown that the formula for the GLS estimates of the parameters of a linear model has the form

`hat b_(GLS) = (X^T V_ε^(-1) X)^(-1) X^T V_ε^(-1) y`.

The covariance matrix of these estimates will accordingly be equal to

`V(hat b_(GLS)) = (X^T V_ε^(-1) X)^(-1)`.

In fact, the essence of GLS lies in a certain (linear) transformation (P) of the original data and the application of ordinary OLS to the transformed data. The purpose of this transformation is that for the transformed data the random errors already satisfy the classical assumptions.

Weighted OLS

In the case of a diagonal weight matrix (and hence a diagonal covariance matrix of the random errors) we have the so-called weighted least squares (WLS, Weighted Least Squares). In this case, the weighted sum of squares of the model residuals is minimized, that is, each observation receives a "weight" that is inversely proportional to the variance of the random error in that observation:

`e^T W e = sum_(t=1)^(n) (e_t^2)/(σ_t^2)`.

In fact, the data are transformed by weighting the observations (dividing by a quantity proportional to the assumed standard deviation of the random errors), and ordinary OLS is applied to the weighted data.
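A minimal sketch of WLS as exactly this transformation; the assumed error standard deviations `sigma` are hypothetical inputs:

```python
import numpy as np

def wls(X, y, sigma):
    """Weighted least squares: sigma[i] is the assumed standard deviation
    of the random error in observation i."""
    w = 1.0 / np.asarray(sigma)      # weight each observation by 1/sigma_i
    Xw = X * w[:, None]
    yw = y * w
    # Ordinary OLS applied to the transformed (weighted) data
    b_hat, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return b_hat
```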

The essence of the least squares method is to find the parameters of a trend model that best describes the tendency of development of some random phenomenon in time or space (a trend is a line characterizing the tendency of this development). The task of the least squares method (LSM) comes down to finding not just some trend model, but the best, or optimal, model. The model is optimal if the sum of squared deviations between the observed actual values and the corresponding calculated trend values is minimal (smallest):

`sum_(i=1)^(n) (y_i - hat y_i)^2 -> min`,

where `(y_i - hat y_i)^2` is the squared deviation between the observed actual value and the corresponding calculated trend value,

`y_i` is the actual (observed) value of the phenomenon being studied,

`hat y_i` is the calculated value of the trend model,

`n` is the number of observations of the phenomenon being studied.

OLS is used quite rarely on its own. As a rule, it is most often used only as a necessary technical device in correlation studies. It should be remembered that the information basis of OLS can only be a reliable statistical series, and the number of observations should not be less than 4, otherwise the smoothing procedures of OLS may lose their meaning.

The MNC toolkit boils down to the following procedures:

First procedure. It is established whether there is any tendency at all for the resultant attribute to change when the selected factor-argument changes, or, in other words, whether there is a connection between "y" and "x".

Second procedure. It is determined which line (trajectory) can best describe or characterize this trend.

Third procedure. The parameters of the regression equation characterizing this line are calculated.

Example. Let's say we have information about the average sunflower yield for the farm under study (Table 9.1).

Table 9.1

Columns of Table 9.1: observation number; productivity, c/ha (the yield values for the 10 observed years are not reproduced here).

Since the level of technology in sunflower production in our country has remained virtually unchanged over the past 10 years, it means that, apparently, fluctuations in yield during the analyzed period were very much dependent on fluctuations in weather and climatic conditions. Is this really true?

First OLS procedure. The hypothesis about the existence of a trend in sunflower yield changes depending on changes in weather and climatic conditions over the analyzed 10 years is tested.

In this example, it is advisable to take the sunflower yield as "y" and the number of the observed year in the analyzed period as "x". Testing the hypothesis about the existence of some relationship between "x" and "y" can be done in two ways: manually and using computer programs. Of course, with computer technology available, this problem practically solves itself. But in order to better understand the OLS tools, it is advisable to test the hypothesis about the existence of a relationship between "x" and "y" manually, when only a pen and an ordinary calculator are at hand. In such cases, the hypothesis about the existence of a trend is best checked visually, by the arrangement of the graphical image of the analyzed series of dynamics, the correlation field:

The correlation field in our example is located around a slowly increasing line. This in itself indicates the existence of a certain trend in the changes of the sunflower yield. One cannot speak of the presence of any tendency only when the correlation field looks like a circle, a ring, a strictly vertical or strictly horizontal cloud, or consists of chaotically scattered points. In all other cases the hypothesis of a relationship between "x" and "y" is confirmed, and the research is continued.

Second OLS procedure. It is determined which line (trajectory) can best describe or characterize the trend of changes in sunflower yield over the analyzed period.

If computer technology is available, the selection of the optimal trend occurs automatically. When processing manually, the optimal function is chosen, as a rule, visually, by the arrangement of the correlation field. That is, based on the type of graph, the equation of the line that best fits the empirical trend (the actual trajectory) is selected.

As is known, in nature there is a huge variety of functional dependencies, so it is extremely difficult to visually analyze even a small part of them. Fortunately, in real economic practice most relationships can be described quite accurately by a parabola, a hyperbola or a straight line. In this regard, with the "manual" option of selecting the best function, one can limit oneself to these three models.

Straight line: `y = a + bx`

Hyperbola: `y = a + b/x`

Second-order parabola: `y = a + bx + cx^2`

It is easy to see that in our example, the trend in sunflower yield changes over the analyzed 10 years is best characterized by a straight line, so the regression equation will be the equation of a straight line.

Third procedure. The parameters of the regression equation characterizing this line are calculated, or, in other words, an analytical formula describing the best trend model is determined.

Finding the values of the parameters of the regression equation, in our case the parameters `a` and `b`, is the core of OLS. This process reduces to solving a system of normal equations:

`a n + b sum x = sum y`
`a sum x + b sum x^2 = sum xy`   (9.2)

This system of equations can be solved quite easily by the Gauss method. Recall that as a result of the solution the numerical values of the parameters `a` and `b` are found, and thus the sought regression equation (the straight-line trend) is obtained.
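Since the yield figures of Table 9.1 are not reproduced in this text, the following sketch of the third procedure uses purely hypothetical data; only the structure of the normal system (9.2) matters here:

```python
import numpy as np

# Sketch of the third OLS procedure for a straight-line trend y = a + b*x.
# The yield figures from Table 9.1 are not given in the text, so the
# values below are purely hypothetical.
x = np.arange(1, 11)                               # year number, 1..10
y = np.array([9.5, 10.1, 9.8, 10.6, 11.0,
              10.7, 11.4, 11.2, 11.9, 12.3])       # yield, c/ha (made up)

# Normal system (9.2):  a*n + b*sum(x) = sum(y);  a*sum(x) + b*sum(x^2) = sum(xy)
A = np.array([[len(x),      x.sum()],
              [x.sum(), (x ** 2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])
a, b = np.linalg.solve(A, rhs)                     # Gaussian elimination inside
print(f"y = {a:.3f} + {b:.3f}*x")
```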

Having chosen the type of regression function, i.e. the type of the considered model of the dependence of Y on X (or X on Y), for example the linear model `y_x = a + bx`, it is necessary to determine the specific values of the model coefficients.

For different values of a and b one can construct an infinite number of dependencies of the form `y_x = a + bx`, i.e. there is an infinite number of straight lines on the coordinate plane, but we need the dependence that corresponds to the observed values in the best way. Thus, the task comes down to selecting the best coefficients.

We look for the linear function `a + bx` based only on a certain number of available observations. To find the function with the best fit to the observed values, we use the least squares method.

Let us denote: `Y_i` is the value calculated from the equation `Y_i = a + bx_i`; `y_i` is the measured value; `ε_i = y_i - Y_i` is the difference between the measured value and the value calculated from the equation, i.e. `ε_i = y_i - a - bx_i`.

The least squares method requires that the `ε_i`, the differences between the measured `y_i` and the values `Y_i` calculated from the equation, be minimal in the sense of the sum of their squares. Consequently, we find the coefficients a and b so that the sum of the squared deviations of the observed values from the values on the straight regression line is the smallest:

`F(a, b) = sum_(i=1)^(n) ε_i^2 = sum_(i=1)^(n) (y_i - a - bx_i)^2 -> min`

By examining the extremum of this function of the arguments a and b using derivatives, one can prove that the function takes its minimum value if the coefficients a and b are solutions of the system:

`a n + b sum x_i = sum y_i`
`a sum x_i + b sum x_i^2 = sum x_i y_i`   (2)

If we divide both sides of the normal equations by n, we get:

`a + b bar x = bar y`,
`a bar x + b bar(x^2) = bar(xy)`,

considering that `bar x = (sum x_i)/n`, `bar y = (sum y_i)/n`, `bar(x^2) = (sum x_i^2)/n`, `bar(xy) = (sum x_i y_i)/n`. (3)

From the first equation we get `a = bar y - b bar x`; substituting this value of a into the second equation, we get:

`b = (bar(xy) - bar x bar y)/(bar(x^2) - (bar x)^2)`. (4)

In this case, b is called the regression coefficient; a is called the free term (intercept) of the regression equation and is calculated using the formula:

`a = bar y - b bar x`. (5)

The resulting straight line is an estimate of the theoretical regression line. We have:

`Y = a + bX`.

So, `Y = a + bX` is the linear regression equation.

Regression can be direct (b > 0) or inverse (b < 0).

Example 1. The results of measuring the values of X and Y are given in the table:

x i -2 0 1 2 4
y i 0.5 1 1.5 2 3

Assuming that there is a linear relationship between X and Y y=a+bx, determine the coefficients a and b using the least squares method.

Solution. Here n = 5;
∑x_i = -2 + 0 + 1 + 2 + 4 = 5;
∑x_i^2 = 4 + 0 + 1 + 4 + 16 = 25;
∑x_i y_i = (-2)·0.5 + 0·1 + 1·1.5 + 2·2 + 4·3 = 16.5;
∑y_i = 0.5 + 1 + 1.5 + 2 + 3 = 8,

and the normal system (2) has the form

5a + 5b = 8,
5a + 25b = 16.5.

Solving this system, we get: b=0.425, a=1.175. Therefore y=1.175+0.425x.
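This result is easy to check with a short script; NumPy's `polyfit` performs the same least squares fit:

```python
import numpy as np

x = np.array([-2, 0, 1, 2, 4], dtype=float)
y = np.array([0.5, 1, 1.5, 2, 3])

# polyfit returns the coefficients of the fitted polynomial, highest degree first
b, a = np.polyfit(x, y, deg=1)
print(a, b)   # 1.175, 0.425 -- matches the hand calculation above
```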

Example 2. There is a sample of 10 observations of economic indicators (X) and (Y).

x i 180 172 173 169 175 170 179 170 167 174
y i 186 180 176 171 182 166 182 172 169 177

You need to find a sample regression equation of Y on X. Construct a sample regression line of Y on X.

Solution. 1. Let's sort the data according to the values ​​x i and y i . We get a new table:

x i 167 169 170 170 172 173 174 175 179 180
y i 169 171 166 172 180 176 177 182 182 186

To simplify the calculations, we will draw up a calculation table in which we will enter the necessary numerical values.

x_i    y_i    x_i^2    x_i·y_i
167    169    27889    28223
169    171    28561    28899
170    166    28900    28220
170    172    28900    29240
172    180    29584    30960
173    176    29929    30448
174    177    30276    30798
175    182    30625    31850
179    182    32041    32578
180    186    32400    33480
∑x_i = 1729    ∑y_i = 1761    ∑x_i^2 = 299105    ∑x_i y_i = 304696
bar x = 172.9    bar y = 176.1    bar(x^2) = 29910.5    bar(xy) = 30469.6

Using formula (4), we calculate the regression coefficient:

b = (30469.6 - 172.9·176.1)/(29910.5 - 172.9^2) = 21.91/16.09 ≈ 1.3617,

and using formula (5):

a = 176.1 - 1.3617·172.9 ≈ -59.34.

Thus, the sample regression equation is y = -59.34 + 1.3617x.
Let's plot the points (x_i; y_i) on the coordinate plane and draw the regression line.
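A possible sketch of this calculation and plot in Python (Matplotlib is assumed to be available); it reproduces the coefficients by formulas (4) and (5):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([167, 169, 170, 170, 172, 173, 174, 175, 179, 180], dtype=float)
y = np.array([169, 171, 166, 172, 180, 176, 177, 182, 182, 186], dtype=float)

# Formulas (4) and (5): b = (mean(xy) - mean(x)*mean(y)) / (mean(x^2) - mean(x)^2)
b = (np.mean(x * y) - x.mean() * y.mean()) / (np.mean(x ** 2) - x.mean() ** 2)
a = y.mean() - b * x.mean()
print(a, b)            # about -59.34 and 1.3617

plt.scatter(x, y, label="observations")
plt.plot(x, a + b * x, color="red", label="regression line")
plt.legend()
plt.show()
```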


Fig 4

Figure 4 shows how the observed values are located relative to the regression line. To numerically assess the deviations of y_i from Y_i, where y_i are the observed values and Y_i are the values determined by the regression, let us create a table:

x_i    y_i    Y_i    Y_i - y_i
167 169 168.055 -0.945
169 171 170.778 -0.222
170 166 172.140 6.140
170 172 172.140 0.140
172 180 174.863 -5.137
173 176 176.225 0.225
174 177 177.587 0.587
175 182 178.949 -3.051
179 182 184.395 2.395
180 186 185.757 -0.243

The Y_i values are calculated from the regression equation.

The noticeable deviation of some observed values ​​from the regression line is explained by the small number of observations. When studying the degree of linear dependence of Y on X, the number of observations is taken into account. The strength of the dependence is determined by the value of the correlation coefficient.

Let us approximate the given function by a polynomial of degree 2, `y = a_0 + a_1 x + a_2 x^2`. To do this, we calculate the coefficients of the normal system of equations: the sums `sum x_i`, `sum x_i^2`, `sum x_i^3`, `sum x_i^4`, `sum y_i`, `sum x_i y_i`, `sum x_i^2 y_i` over the given points.

The normal least squares system then has the form:

`a_0 n + a_1 sum x_i + a_2 sum x_i^2 = sum y_i`
`a_0 sum x_i + a_1 sum x_i^2 + a_2 sum x_i^3 = sum x_i y_i`
`a_0 sum x_i^2 + a_1 sum x_i^3 + a_2 sum x_i^4 = sum x_i^2 y_i`

The solution of this system is easy to find; it gives the coefficients `a_0`, `a_1`, `a_2` (the numerical values of this example are not reproduced here).

Thus, the polynomial of degree 2 is found.
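Since the numerical data of this example are not reproduced above, here is only a sketch with made-up points; `numpy.polyfit` solves the same normal system for a degree-2 polynomial:

```python
import numpy as np

# Hypothetical data points (the example's own table is not reproduced)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 1.8, 4.1, 8.9, 16.2])

# Degree-2 least squares fit; polyfit solves the same normal equations internally
a2, a1, a0 = np.polyfit(x, y, deg=2)
print(a0, a1, a2)   # coefficients of y = a0 + a1*x + a2*x^2
```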


Example 2. Finding the optimal degree of a polynomial.


Example 3. Derivation of a normal system of equations for finding the parameters of the empirical dependence.

Let us derive the system of equations for determining the coefficients a and b of a function `f(x; a, b)` that provides the root-mean-square approximation of the given function by the points `(x_i, y_i)`. We compose the function

`Φ(a, b) = sum_(i=1)^(n) (y_i - f(x_i; a, b))^2`

and write down the necessary condition for its extremum: `(partial Φ)/(partial a) = 0`, `(partial Φ)/(partial b) = 0`.

Then the normal system takes the form:

`sum_(i=1)^(n) (y_i - f(x_i; a, b)) (partial f(x_i; a, b))/(partial a) = 0`
`sum_(i=1)^(n) (y_i - f(x_i; a, b)) (partial f(x_i; a, b))/(partial b) = 0`

We obtain a system of equations for the unknown parameters a and b, which (for a dependence linear in the parameters) is a linear system and is easily solved.


Example.

Experimental data on the values of the variables x and y are given in the table.

As a result of their alignment, the function is obtained

Using the least squares method, approximate these data by a linear dependence y = ax + b (find the parameters a and b). Find out which of the two lines better (in the sense of the least squares method) aligns the experimental data. Make a drawing.

The essence of the least squares method (LSM).

The task is to find the coefficients a and b of the linear dependence at which the function of two variables a and b, `F(a, b) = sum_(i=1)^(n) (y_i - (ax_i + b))^2`, takes the smallest value. That is, for the found a and b the sum of squared deviations of the experimental data from the found straight line will be the smallest. This is the whole point of the least squares method.

Thus, solving the example comes down to finding the extremum of a function of two variables.

Deriving formulas for finding coefficients.

A system of two equations in two unknowns is composed and solved. We find the partial derivatives of the function `F(a, b)` with respect to the variables a and b and equate these derivatives to zero.

We solve the resulting system of equations by any method (for example, by substitution or by Cramer's method) and obtain formulas for finding the coefficients by the least squares method (LSM):

`a = frac(n sum x_i y_i - sum x_i sum y_i)(n sum x_i^2 - (sum x_i)^2)`, `b = frac(sum y_i - a sum x_i)(n)`.

For these a and b the function takes the smallest value. The proof of this fact is given below, at the end of this section.

That is the whole least squares method. The formula for finding the parameter a contains the sums `sum x_i`, `sum y_i`, `sum x_i y_i`, `sum x_i^2` and the parameter n, the number of experimental data points. We recommend calculating the values of these sums separately. The coefficient b is found after calculating a.

It's time to remember the original example.

Solution.

In our example n = 5. We fill in the table for the convenience of calculating the sums that enter the formulas for the required coefficients.

The values in the fourth row of the table are obtained by multiplying the values of the 2nd row by the values of the 3rd row for each number i.

The values in the fifth row of the table are obtained by squaring the values of the 2nd row for each number i.

The values in the last column of the table are the sums of the values across the rows.

We use the least squares formulas to find the coefficients a and b, substituting the corresponding values from the last column of the table into them:

Hence, y = 0.165x + 2.184 is the desired approximating straight line.

It remains to find out which of the two lines, y = 0.165x + 2.184 or the previously obtained function, better approximates the original data, that is, to make the assessment using the least squares method.

Error estimation of the least squares method.

To do this, you need to calculate the sums of squared deviations of the original data from each of these two lines; the smaller value corresponds to the line that better approximates the original data in the sense of the least squares method.

Since the corresponding sum is smaller for it, the straight line y = 0.165x + 2.184 better approximates the original data.

Graphic illustration of the least squares (LS) method.

Everything is clearly visible on the graphs. The red line is the found straight line y = 0.165x + 2.184, the blue line is the previously obtained function, and the pink dots are the original data.

Why is this needed, why all these approximations?

I personally use it to solve problems of data smoothing, interpolation and extrapolation (in the original example one might be asked to find the value of the observed quantity y at x = 3 or at x = 6 using the least squares method). But we'll talk more about this later in another section of the site.


Proof.

In order for the function to take its smallest value at the found a and b, it is necessary that at this point the matrix of the quadratic form of the second-order differential of the function `F(a, b)` be positive definite. Let's show it.

The second-order differential has the form:

`d^2 F(a, b) = (partial^2 F)/(partial a^2) da^2 + 2 (partial^2 F)/(partial a partial b) da db + (partial^2 F)/(partial b^2) db^2`

That is,

`d^2 F(a, b) = 2 sum_(i=1)^(n) x_i^2 da^2 + 4 sum_(i=1)^(n) x_i da db + 2n db^2`

Therefore, the matrix of the quadratic form has the form

`M = ((2 sum_(i=1)^(n) x_i^2, 2 sum_(i=1)^(n) x_i), (2 sum_(i=1)^(n) x_i, 2n))`

and the values of its elements do not depend on a and b.

Let us show that the matrix is positive definite. For this, its angular (leading principal) minors must be positive.

First-order angular minor: `2 sum_(i=1)^(n) x_i^2 > 0`. The inequality is strict, because the points do not coincide. In what follows we will assume this.

Second-order angular minor:

`det M = 4 (n sum_(i=1)^(n) x_i^2 - (sum_(i=1)^(n) x_i)^2)`

Let us prove that `n sum_(i=1)^(n) x_i^2 - (sum_(i=1)^(n) x_i)^2 > 0` by the method of mathematical induction.

Conclusion: the found values a and b correspond to the smallest value of the function `F(a, b)` and therefore are the required parameters for the least squares method.


Developing a forecast using the least squares method. Example of problem solution

Extrapolation is a method of scientific research based on extending past and present trends, patterns and relationships to the future development of the forecast object. Extrapolation methods include the moving average method, the exponential smoothing method and the least squares method.

The essence of the least squares method consists in minimizing the sum of squared deviations between the observed and calculated values. The calculated values are found from the selected equation, the regression equation. The smaller the distance between the actual values and the calculated ones, the more accurate the forecast based on the regression equation.

A theoretical analysis of the essence of the phenomenon being studied, whose change is reflected by the time series, serves as the basis for choosing the curve. Sometimes considerations about the nature of the growth of the levels of the series are taken into account. Thus, if output growth is expected in an arithmetic progression, then smoothing is performed with a straight line. If growth turns out to follow a geometric progression, then smoothing must be done with an exponential function.

Working formula of the least squares method: `Y_(t+1) = a*X + b`, where t+1 is the forecast period; `Y_(t+1)` is the forecast indicator; a and b are coefficients; X is the conventional designation of time.

The coefficients a and b are calculated using the following formulas (which follow from the OLS normal equations):

`a = frac(n sum y_f x - sum y_f sum x)(n sum x^2 - (sum x)^2)`, `b = frac(sum y_f - a sum x)(n)`,

where `y_f` are the actual values of the time series and n is the number of levels of the time series.

Smoothing time series by the least squares method serves to reflect the pattern of development of the phenomenon being studied. In the analytical expression of the trend, time is considered as the independent variable, and the levels of the series act as a function of this independent variable.

The development of a phenomenon does not depend on how many years have passed since the starting point, but on what factors influenced its development, in what direction and with what intensity. From here it is clear that the development of a phenomenon over time is the result of the action of these factors.

Correctly establishing the type of curve, the type of analytical dependence on time, is one of the most difficult tasks of pre-forecast analysis.

The selection of the type of function describing the trend, whose parameters are determined by the least squares method, is carried out in most cases empirically, by constructing a number of functions and comparing them with each other by the value of the root-mean-square error, calculated by the formula:

`S = sqrt(frac(sum (y_f - y_r)^2)(n - p))`

where `y_f` are the actual values of the time series; `y_r` are the calculated (smoothed) values of the time series; n is the number of levels of the time series; p is the number of parameters defined in the formulas describing the trend (the development trend).

Disadvantages of the least squares method:

  • when trying to describe the economic phenomenon being studied using a mathematical equation, the forecast will be accurate for a short period of time, and the regression equation should be re-estimated as new information becomes available;
  • the complexity of selecting a regression equation that is solvable using standard computer programs.

An example of using the least squares method to develop a forecast

Task. There are data characterizing the unemployment rate in the region, %.

  • Construct a forecast of the unemployment rate in the region for November, December, January using the following methods: moving average, exponential smoothing, least squares.
  • Calculate the errors in the resulting forecasts using each method.
  • Compare the results and draw conclusions.

Least squares solution

To solve this, let's create a table in which we will perform the necessary calculations:

ε = 28.63/10 = 2.86%; the forecast accuracy is high.

Conclusion: Comparing the results obtained by the moving average method, the exponential smoothing method and the least squares method, we can say that the average relative error in calculations by the exponential smoothing method falls within the range of 20-50%. This means that the forecast accuracy in this case is only satisfactory.

In the first and third cases the forecast accuracy is high, since the average relative error is less than 10%. But the moving average method made it possible to obtain more reliable results (forecast for November: 1.52%, for December: 1.53%, for January: 1.49%), since the average relative error when using this method is the smallest, 1.13%.


MNC program


Data and approximation y = a + b x

i - number of the experimental point;
x_i - value of the fixed parameter at point i;
y_i - value of the measured parameter at point i;
ω_i - weight of the measurement at point i;
y_i, calc. - value of y at point i calculated from the regression;
Δy_i - difference between the measured value and the value calculated from the regression at point i;
S_x_i (x_i) - estimate of the error of x_i when measuring y at point i.

Data and approximation y = k x


User's manual for the MNC online program.

In the data field, enter on each separate line the values ​​of `x` and `y` at one experimental point. Values ​​must be separated by a whitespace character (space or tab).

The third value can be the weight of the point `w`. If the weight of a point is not specified, it is set equal to one. In the vast majority of cases the weights of the experimental points are unknown or not calculated, i.e. all experimental data are considered equivalent. Sometimes the weights in the studied range of values are definitely not equivalent and can even be calculated theoretically. For example, in spectrophotometry weights can be calculated using simple formulas, although this is mostly neglected to reduce labor costs.

Data can be pasted via the clipboard from a spreadsheet in an office suite such as Excel from Microsoft Office or Calc from Open Office. To do this, in the spreadsheet, select the range of data to copy, copy to the clipboard, and paste the data into the data field on this page.

For a calculation by the least squares method, at least two points are needed to determine the two coefficients: `b`, the tangent of the angle of inclination of the line, and `a`, the value intercepted by the line on the `y` axis.

To estimate the errors of the calculated regression coefficients, the number of experimental points must be greater than two.

Least squares method (LSM).

The greater the number of experimental points, the more accurate the statistical estimate of the coefficients (due to the decrease of the Student coefficient) and the closer this estimate is to the estimate for the general population.

Obtaining values at each experimental point is often associated with significant labor costs, so a compromise number of experiments is often carried out that gives a manageable estimate and does not lead to excessive labor costs. As a rule, the number of experimental points for a linear least squares dependence with two coefficients is chosen in the region of 5-7 points.

A Brief Theory of Least Squares for Linear Relationships

Let's say we have a set of experimental data in the form of pairs of values ​​[`y_i`, `x_i`], where `i` is the number of one experimental measurement from 1 to `n`; `y_i` - the value of the measured quantity at point `i`; `x_i` - the value of the parameter we set at point `i`.

As an example, consider Ohm's law. By changing the voltage (potential difference) between sections of an electrical circuit, we measure the current passing through this section. Physics gives us a dependence found experimentally:

`I = U/R`,
where `I` is the current strength; `R` - resistance; `U` - voltage.

In this case, `y_i` is the current value being measured, and `x_i` is the voltage value.

As another example, consider the absorption of light by a solution of a substance. Chemistry gives us the formula:

`A = ε l C`,
where `A` is the optical density of the solution; `ε` is the molar absorption coefficient of the solute; `l` is the path length of the light passing through the cuvette with the solution; `C` is the concentration of the dissolved substance.

In this case, `y_i` is the measured value of optical density `A`, and `x_i` is the concentration value of the substance that we specify.

We will consider the case when the relative error in the assignment `x_i` is significantly less than the relative error in the measurement `y_i`. We will also assume that all measured values ​​`y_i` are random and normally distributed, i.e. obey the normal distribution law.

In the case of linear dependence of `y` on `x`, we can write theoretical dependence:
`y = a + b x`.

From a geometric point of view, the coefficient `b` denotes the tangent of the angle of inclination of the line to the `x` axis, and the coefficient `a` the value of `y` at the point of intersection of the line with the `y` axis (at `x = 0`).

Finding the regression line parameters.

In an experiment, the measured values `y_i` cannot lie exactly on the theoretical straight line because of measurement errors, which are always inherent in real life. Therefore, the linear equation must be represented by a system of equations:
`y_i = a + b x_i + ε_i` (1),
where `ε_i` is the unknown measurement error of `y` in the `i`-th experiment.

Dependence (1) is also called regression, i.e. the dependence of two quantities on each other with statistical significance.

The task of restoring the dependence is to find the coefficients `a` and `b` from the experimental points [`y_i`, `x_i`].

To find the coefficients `a` and `b`, the least squares method (LSM) is usually used. It is a special case of the maximum likelihood principle.

Let's rewrite (1) in the form `ε_i = y_i - a - b x_i`.

Then the sum of squared errors will be
`Φ = sum_(i=1)^(n) ε_i^2 = sum_(i=1)^(n) (y_i - a - b x_i)^2`. (2)

The principle of least squares (least squares) is to minimize the sum (2) with respect to parameters `a` and `b`.

The minimum is achieved when the partial derivatives of the sum (2) with respect to the coefficients `a` and `b` are equal to zero:
`frac(partial Φ)(partial a) = frac(partial sum_(i=1)^(n) (y_i - a - b x_i)^2)(partial a) = 0`
`frac(partial Φ)(partial b) = frac(partial sum_(i=1)^(n) (y_i - a - b x_i)^2)(partial b) = 0`

Expanding the derivatives, we obtain a system of two equations with two unknowns:
`sum_(i=1)^(n) (2a + 2bx_i - 2y_i) = sum_(i=1)^(n) (a + bx_i - y_i) = 0`
`sum_(i=1)^(n) (2bx_i^2 + 2ax_i - 2x_iy_i) = sum_(i=1)^(n) (bx_i^2 + ax_i - x_iy_i) = 0`

We open the brackets and move the sums that do not depend on the required coefficients to the other side, obtaining a system of linear equations:
`sum_(i=1)^(n) y_i = a n + b sum_(i=1)^(n) x_i`
`sum_(i=1)^(n) x_iy_i = a sum_(i=1)^(n) x_i + b sum_(i=1)^(n) x_i^2`

Solving the resulting system, we find formulas for the coefficients `a` and `b`:

`a = frac(sum_(i=1)^(n) y_i sum_(i=1)^(n) x_i^2 - sum_(i=1)^(n) x_i sum_(i=1)^(n) x_iy_i)(n sum_(i=1)^(n) x_i^2 - (sum_(i=1)^(n) x_i)^2)` (3.1)

`b = frac(n sum_(i=1)^(n) x_iy_i - sum_(i=1)^(n) x_i sum_(i=1)^(n) y_i)(n sum_(i=1)^(n) x_i^2 - (sum_(i=1)^(n) x_i)^2)` (3.2)

These formulas have solutions when `n > 1` (the line can be constructed from at least 2 points) and when the determinant `D = n sum_(i=1)^(n) x_i^2 - (sum_(i=1)^(n) x_i)^2 != 0`, i.e. when the `x_i` points in the experiment are different (i.e. when the line is not vertical).
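Formulas (3.1) and (3.2) translate directly into code; a small sketch (the function name is arbitrary):

```python
import numpy as np

def fit_line(x, y):
    """Coefficients of y = a + b*x by formulas (3.1) and (3.2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    D = n * np.sum(x ** 2) - np.sum(x) ** 2          # must be nonzero
    a = (np.sum(y) * np.sum(x ** 2) - np.sum(x) * np.sum(x * y)) / D
    b = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / D
    return a, b
```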

Estimation of errors of regression line coefficients

For a more accurate estimate of the error in calculating the coefficients `a` and `b`, a large number of experimental points is desirable. When `n = 2` it is impossible to estimate the error of the coefficients, because the approximating line passes uniquely through the two points.

The error of the random variable `V` is determined by law of error accumulation
`S_V^2 = sum_(i=1)^p (frac(partial f)(partial z_i))^2 S_(z_i)^2`,
where `p` is the number of parameters `z_i` with error `S_(z_i)`, which affect the error `S_V`;
`f` is a function of the dependence of `V` on `z_i`.

Let us write down the law of error accumulation for the error of coefficients `a` and `b`
`S_a^2 = sum_(i=1)^(n)(frac(partial a)(partial y_i))^2 S_(y_i)^2 + sum_(i=1)^(n)(frac(partial a )(partial x_i))^2 S_(x_i)^2 = S_y^2 sum_(i=1)^(n)(frac(partial a)(partial y_i))^2 `,
`S_b^2 = sum_(i=1)^(n)(frac(partial b)(partial y_i))^2 S_(y_i)^2 + sum_(i=1)^(n)(frac(partial b )(partial x_i))^2 S_(x_i)^2 = S_y^2 sum_(i=1)^(n)(frac(partial b)(partial y_i))^2 `,
because `S_(x_i)^2 = 0` (we previously made a reservation that the error `x` is negligible).

`S_y^2 = S_(y_i)^2` is the error (variance, squared standard deviation) in the measurement of `y`, assuming that the error is uniform for all values of `y`.

Substituting the formulas for calculating `a` and `b` into the resulting expressions, we get

`S_a^2 = S_y^2 frac(sum_(i=1)^(n) (sum_(i=1)^(n) x_i^2 - x_i sum_(i=1)^(n) x_i)^2)(D^2) = S_y^2 frac((n sum_(i=1)^(n) x_i^2 - (sum_(i=1)^(n) x_i)^2) sum_(i=1)^(n) x_i^2)(D^2) = S_y^2 frac(sum_(i=1)^(n) x_i^2)(D)` (4.1)

`S_b^2 = S_y^2 frac(sum_(i=1)^(n) (n x_i - sum_(i=1)^(n) x_i)^2)(D^2) = S_y^2 frac(n (n sum_(i=1)^(n) x_i^2 - (sum_(i=1)^(n) x_i)^2))(D^2) = S_y^2 frac(n)(D)` (4.2)

In most real experiments, the value of `S_y` is not measured. To measure it, it would be necessary to carry out several parallel measurements (experiments) at one or several points of the plan, which increases the time (and possibly the cost) of the experiment. Therefore, it is usually assumed that the deviation of `y` from the regression line can be considered random. The estimate of the variance of `y` in this case is calculated using the formula

`S_y^2 = S_(y, rest)^2 = frac(sum_(i=1)^n (y_i - a - b x_i)^2) (n-2)`.

The `n-2` divisor appears because our number of degrees of freedom has decreased due to the calculation of two coefficients using the same sample of experimental data.

This estimate is also called the residual variance relative to the regression line `S_(y, rest)^2`.

The significance of the coefficients is assessed using Student's t-test:

`t_a = frac(|a|) (S_a)`, `t_b = frac(|b|) (S_b)`

If the calculated criteria `t_a`, `t_b` are less than the tabulated criteria `t(P, n-2)`, then it is considered that the corresponding coefficient is not significantly different from zero with a given probability `P`.

To assess the quality of the description of a linear relationship, you can compare `S_(y, rest)^2` and `S_(bar y)` relative to the mean using the Fisher criterion.

`S_(bar y) = frac(sum_(i=1)^n (y_i - bar y)^2)(n-1) = frac(sum_(i=1)^n (y_i - (sum_(i=1)^n y_i)/n)^2)(n-1)` - the sample estimate of the variance of `y` relative to the mean.

To assess the effectiveness of the regression equation to describe the dependence, the Fisher coefficient is calculated
`F = S_(bar y) / S_(y, rest)^2`,
which is compared with the tabular Fisher coefficient `F(p, n-1, n-2)`.

If `F > F(P, n-1, n-2)`, the difference between the description of the dependence `y = f(x)` by the regression equation and the description by the mean is considered statistically significant with probability `P`. That is, the regression describes the dependence better than the spread of `y` around the mean.
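A sketch that gathers the error and significance estimates described above into one routine (SciPy is assumed for the tabulated Student and Fisher quantiles; the function name and return format are arbitrary):

```python
import numpy as np
from scipy import stats

def line_fit_stats(x, y, P=0.95):
    """Error and significance estimates for y = a + b*x, as described above."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    D = n * np.sum(x**2) - np.sum(x)**2
    a = (np.sum(y) * np.sum(x**2) - np.sum(x) * np.sum(x*y)) / D
    b = (n * np.sum(x*y) - np.sum(x) * np.sum(y)) / D

    resid = y - (a + b * x)
    S_y_rest2 = np.sum(resid**2) / (n - 2)          # residual variance
    S_a = np.sqrt(S_y_rest2 * np.sum(x**2) / D)     # from (4.1)
    S_b = np.sqrt(S_y_rest2 * n / D)                # from (4.2)
    t_a, t_b = abs(a) / S_a, abs(b) / S_b           # Student's t statistics
    t_crit = stats.t.ppf(1 - (1 - P) / 2, n - 2)    # tabulated t(P, n-2)

    S_mean = np.sum((y - y.mean())**2) / (n - 1)    # variance about the mean
    F = S_mean / S_y_rest2                          # Fisher ratio
    F_crit = stats.f.ppf(P, n - 1, n - 2)
    return dict(a=a, b=b, S_a=S_a, S_b=S_b, t_a=t_a, t_b=t_b,
                t_crit=t_crit, F=F, F_crit=F_crit)
```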


Least squares method

The least squares method refers to the determination of the unknown parameters a, b, c, ... of an accepted functional dependence

y = f(x,a,b,c,…),

which would provide a minimum of the mean square (variance) of the error

`bar(ε^2) = (1/n) sum_(i=1)^(n) (y_i - f(x_i, a, b, c, ...))^2`, (24)

where `x_i, y_i` are the pairs of numbers obtained from the experiment.

Since the condition for an extremum of a function of several variables is that its partial derivatives equal zero, the parameters a, b, c, ... are determined from the system of equations:

`(partial bar(ε^2))/(partial a) = 0`; `(partial bar(ε^2))/(partial b) = 0`; `(partial bar(ε^2))/(partial c) = 0`; ... (25)

It must be remembered that the least squares method is used to select the parameters after the form of the function y = f(x) has been defined.

If no conclusions can be drawn from theoretical considerations about what the empirical formula should be, then one has to be guided by visual representations, first of all by a graphical image of the observed data.

In practice, they are most often limited to the following types of functions:

1) linear: `y = a + bx`;

2) quadratic: `y = a + bx + cx^2`.
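A sketch of this empirical selection for the linear and quadratic cases, using made-up data. Note that a higher-degree polynomial never increases the in-sample mean square error (24), so in practice the comparison is usually made with an error measure that penalizes the number of parameters, such as the root-mean-square error with the n - p divisor given earlier:

```python
import numpy as np

# Hypothetical experimental pairs (x_i, y_i)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 4.1, 6.3, 8.1, 9.8])

for deg, name in [(1, "linear"), (2, "quadratic")]:
    coeffs = np.polyfit(x, y, deg)                     # least squares fit
    mse = np.mean((y - np.polyval(coeffs, x)) ** 2)    # mean square error, cf. (24)
    print(name, np.round(coeffs, 4), round(mse, 5))
```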

