ENT 258 Numerical Analysis Lab Laboratory Module
`
EXPERIMENT 5
CURVE FITTING: LEAST-SQUARES REGRESSION
1.0 OBJECTIVES
1.1. To regress data using least-squares line
1.2. To linearize non-linear functions
1.3. To curve fit data using polynomial and other functions
2.0. EQUIPMENT
Computers and Matlab program.
3.0 INTRODUCTION & THEORY
In many practical situations, data are available at discrete points, and we are required to fit a smooth and continuous function (curve) to this data. The data may be exact or approximate. To determine the function, we need to find certain coefficient or parameters. The curve fitting to a set of data points can be done using two approaches. In the first approach is called collocation. This approach is used either when the data are known to be accurate or when the data are generated by evaluating a complicated function at a discrete set of points. Usually, a polynomial, a trigonometric function, or an exponential function is used to approximate a set of data points.
In the second approach, the curve is made to represent the general trend of the data. This approach is useful when there are more data points than the number of unknown coefficient or when data appear to have significant error or noise
3.1. MATLAB FUNCTION
MATLAB has a built-in function polyfit that fits a least-squares nth-order polynomial to data. It can be applied as in
> p = polyfit (x, y, n)
where x and y are the vectors of the independent and the dependent variable, respectively, and n = the order of the polynomial. The function returns a vector p containing the polynomial’s coefficients. We should note that it represents the polynomial using decreasing powers of x as in the following representation:
Because a straight line is a first-order polynomial, polyfit(x,y,1) will return the slope and the intercept of the best-fit straight line.
> x=[10 20 30 40 50 60 70 80];
> y=[25 70 380 550 610 1220 830 1450];
> a=polyfit(x,y,1)
a =
19.4702 -234.2857
> xp=linspace(min(x),max(x),2);
> yp=a(1)*xp+a(2);
> plot(x,y,'o',xp,yp)
Thus, the slope is 19.4702 and the intercept is -234.2857
Another function, polyval, can then be used to compute a value using the coefficients. It has the general format:
> y = polyval (p, x)
Where p = the polynomial coefficients, and y = the best-fit value at x. For example,
> y=polyval(a,45)
y =
641.8750
3.2. LINEAR REGRESSION
Regression is the method of obtaining the best fit to a given set of data. The procedure of fitting the best straight line(linear equation) to the data is known as linear regression. Let the data points be given by , where x is the independent variable and y is the dependent variable. The equation of straight line can be expressed as
(5.1)
where and are constants to be determined. Since the linear Equation(5.1) is only an approximating function, there will be an error between the model, Eq(5.1), and the data points (true values). The error or residual at the data point (xi, yi) is given by
(5.2)
The most common approach used to fit the best straight line is by using the method of least squares. For this, we formulate the sum of squares of error (S) as
(5.3)
And minimize S with respect to the parameters and . For minimizing S with respect to and , we differentiate S with respect to and and set them equal to zero. This gives
(5.4)
(5.5)
Equation (5.4) and (5.5) can be rewritten as follows:
(5.6)
(5.7)
Noting that we can express Eqs.(5.6) and (5.7) as two simultaneous llinear equations in the unknowns and as
(5.8)
(5.9)
These equations can be solved for the unknowns and to obtain
(5.10)
(5.11)
Accuracy of linear Regression can be improved with to define a quantity r, known as the correlation coefficient, as
(5.12)
Flow Chart Linear Regression
Example 5.1
Use linear regression to fit the following data points:
i / 1 / 2 / 3 / 4 / 5xi / 1 / 2 / 3 / 4 / 5
yi / 0.7 / 2.2 / 2.8 / 4.4 / 4.9
Procedures-MATLAB Program
1. Start a new MatLab file by clicking on File, New and M-file that opens an empty file in the Editor/Debugger window.
2. Write the program given below in M-file
function f=line(x,y)
%linregr(x,y):
%Least squares fit of a straight line to data
%by solving the normal equations.
%input:
%x=independent variable
%y=dependent variabl
%a=vector of slope, a1, and intercept, a0
%r2=coefficient of determination
n=length(x);
if length(y)~=n
error('x and y must be same length');
end
x=x(:); y=y(:); %convert to column vectors.
sx=sum(x);
sy=sum(y);
sx2=sum(x.*x);
sxy=sum(x.*y);
sy2=sum(y.*y);
a1=(n*sxy-sx*sy)/(n*sx2-sx^2);
a0=sy/n-a1*sx/n;
r2=((n*sxy-sx*sy)/sqrt(n*sx2-sx^2)/sqrt(n*sy2-sy^2))^2;
3. Click on Save As to save it as line.m.
4. Define values of x=[1 2 3 4 5] and y=[0.7 2.2 2.8 4.4 4.9] in Command Window.
5. To see how it works, type line(x,y) in MatLab Command Window.
6. Create a Plot of data and best line in Command Window
xp=1:1:5;
yp=a(1)*xp+a(2);
plot(x,y,'o',xp,yp)
grid on
3.3. POLYNOMIAL REGRESSION
As seen in the previous section, linear regression provides the best straight-line fit for a given set of data. The use of linear regression implies that the relationship between the independent and dependent variable is linear. In many engineering applications, the independent and dependent variables may be related nonlinearly
For data points exhibiting a nonlinear behavior, we can use a polynomial regression by assuming the following relationship:
(5.1)
If there are n data points (the error, ei, for the ith data point is defined by
The sum of squares error (S) is given by
For the least-squares fit of polynomial, we minimize S with respect to the m+1 unknown coefficient. For this, we set partial derivatives of S with respect to the coefficients equal to zeros:
(5.2)
(5.3)
.
.
.
(5.4)
Equations (2.31) through (2.33) can be rewritten as a system of m+1 simultaneous linear equations in the unknowns as
(5.5)
(5.6)
(5.7)
The solution of Eqs.(5.5) through (5.7) gives the coefficient that defines the polynomial fit of Eqs.(5.1).
Flow Chart Polynomial Regression
k = 1
k = k +1
Example 5.2
The velocities measured at various points along the boundary layer in free convection over a vertical plate are given below (in nondimensional form);
i / 1 / 2 / 3 / 4 / 5 / 6xi(thickness)
yi(velocity) / 0.0
0.00 / 0.2
1.05 / 0.4
0.85 / 0.6
0.35 / 0.8
0.10 / 1.0
1.00
Fit a second-order polynomial to the data
Procedures-MATLAB Program
1. Start a new MatLab file by clicking on File, New and M-file that opens an empty file in the Editor/Debugger window.
2. Write the program given below in M-file
function C = poly(X,Y,M)
%Input - X is the 1xn abscissa vector
% - Y is the 1xn ordinate vector
% - M is the degree of the least-squares polynomial
%Output - C is the coefficient list for the polynomial
n=length(X);
%Fill the columns of F with the powers of X
for k=1:M+1
F(:,k)=X'.^(k-1);
end
%Solve the linear system [A][B]=[C]using elimination method
A=F'*F;
B=F'*Y';
C=A\B;
3. Click on Save As to save it as poly.m.
4. Define values of X= [0.0 1 2 3 4 5], Y= [2.1 7.7 13.6 27.2 40.9 61.1] and M=2 in Command Window
5. To see how it works, type poly(X,Y,M) in MatLab Command Window.
6. Create a Plot of data and best line in Command Window
yp=C(1)+(C(2).*X)+(C(3).*(X.^2))
plot(X,yp)
hold on
plot(X,Y,'*')
grid on
3.4. LINEARIZATION OF NONLINEAR RELATIOSHIPS
Certain types of nonlinear relations can be linearized by using suitable transformation. In such cases, linear regression can be used to yield, effectively, a nonlinear curve fit. For example, for the regression we have
This equation can be linearized by taking logarithms on both sides:
By defining a new variable Y = ln y, we can use a linear regression with the relation
where a0 = ln a and a1 = b are the new unknown coefficients.
Table 1: Nonlinear relations that can be reduced to linear form
Original Relation Transformation and linear relation
1.
2.
3.
Flow Chart Linearization of Nonlinear Relationship
Example 5.3
The vibration amplitude of machine (xi) is measured at different instant of time (ti) and the results are as follows:
i / 1 / 2 / 3 / 4Ti (sec) / 0 / 2 / 4 / 6
Xi (mm) / 5 / 3.7 / 2.7 / 2
Fit a curve of the form using the data given.
Solution
The linear equation
(1)
By defining y = lnx, c = lna, and d = -b, can write Eq(1) as
y = c + dt
The sum of the squares of error, S, can be expressed as
Procedures-MATLAB Program
1. Start a new MatLab file by clicking on File, New and M-file that opens an empty file in the Editor/Debugger window.
2. Write the program given below in M-file
function f=line(x,y)
%linregr(x,y):
%Least squares fit of a straight line to data
%by solving the normal equations.
%input:
%x=independent variable-T
%y=dependent variable-X
%a=vector of slope, ao, and intercept, a1
%r2=coefficient of determination
y1=log(y)
n=length(x);
if length(y1)~=n
error('x and y1 must be same length');
end
x=x(:); y1=y1(:); %convert to column vectors.
sx=sum(x);
sy=sum(y1);
sx2=sum(x.*x);
sxy=sum(x.*y1);
sy2=sum(y1.*y1);
a0=((sy*sx2)-(sx*sxy))/((n*sx2)-(sx^2));
a1=((n*sxy)-(sx*sy))/((n*sx2)-(sx^2));
r2=((n*sxy-sx*sy)/sqrt(n*sx2-sx^2)/sqrt(n*sy2-sy^2))^2;
%ln a =c
%a=e(c)
c=exp(a0);
d=a1;
3. Click on Save As to save it as line.m.
4. Define values of X,Y in Command Window
5. To see how it works, type line(X,Y) in MatLab Command Window.
The solution of this problem c = 1.6106 and d = -0.1532, which imply that a = ec = 5.0060 and b = 0.1532. Thus, the least-squares fit leads to the curve
mm
Create a Plot of data and best fit line in Command Window
yp=a0*exp(a1*x)
plot(x,y,'o',xp,yp)
grid on
4.0. EXERCISES
6.1. The heat-transfer coefficient (h) in a forced convection heat transfer in cross-flow past a cylinder at room temperature is found to vary with the velocity of fluid(v) flowing past the cylinder as follows:
V (m/s) / 2 / 4 / 6 / 8h (w/m2-K) / 6000 / 10000 / 13000 / 15000
Fit a linear equation between h and V.
6.2. Experiments conducted during the machining of AISI-4140 steel with fixed values of depth of cut and feed rate yielded the following results:
Cutting speed, V(m/min) / Tool life, T
(min)
160 / 7
180 / 5.5
200 / 5
220 / 3.5
240 / 2
Determine the tool life equation, , where a and b are constants, using the method of lest squares.
6.3. The variation of heat transfer per unit area (q) during the boiling of water under pressure (p) has been found to be as follows:
q (MW/m2) / 1.1 / 2.4 / 3.4 / 3.9 / 4.0 / 3.8 / 3.0 / 1.2P
(MPa) / 0 / 1 / 2 / 4 / 6 / 10 / 15 / 20
Develop a suitable polynomial relation between q and p.
6.4. The vertical displacement of a large electric motor mounted on isolators due to the forced vibration caused by the rotating unbalance in the rotor is shown in the following table:
Speed of motor, vi (rpm) / 100 / 200 / 300 / 400 / 500Displacement, di (mm) / 0.10 / 0.35 / 0.70 / 0.40 / 0.35
Develop a suitable equation between v and d.
Note: Show all the steps of the computation
5