Andrew Pownuk - Math 4329 (Numerical Analysis)
Table of Contents
1Solution of systems of linear equations
1.1(*) Matrix notation
1.2(*) Matrix algebra
1.1(*) Solvability
1.2.1Introduction
1.1.1Unique solution
1.2.2Inconsistent system (no solution)
1.2.3Infinitely many solutions
1.2.4Infinitely many solutions
1.3(*) Reduced Row Echelon Form
1.3.1Calculate echelon form for and find the solution
1.3.2Calculate echelon form for and find the solution
1.3.3Calculate echelon form for and find the solution
1.4(*) Null space
1.4.1Example
1.5(*) Cramer's rule
1.5.1Solve the following system of equations by using Cramer's rule .
1.5.2Solve the following system of equations by using Cramer's rule .
1.5.3Solve the following system of equations by using the inverse matrix .
1.6(*) Inverse matrix
1.6.1Solve the following system of equations by using the inverse matrix .
1.6.2Solve the following system of equations by using the inverse matrix.
1.7Gauss elimination
1.7.1Solve the following system of equations by using Gauss elimination
1.1.1Example
1.1.2Example
1.7.2Solve the following system of equations
1.7.3Example
1.7.4Solve the following system of equations
1.7.5Solve the following system of equations by using Gauss elimination
1.7.6Solve the following system of equations by using Gauss elimination
1.7.7Solve the following system of equations by using Gauss elimination
1.7.8Solve the following system of equations by using Gauss elimination
1.1.3Example
1.7.9(*) General method
1.7.10(*) Matlab code
1.2(*) Gauss-Jordan elimination
1.2.1Introduction
1.2.2Example
1.2.3Example
1.2.4Example
1.2.5Example
1.8LU decomposition
1.8.1Introduction
1.8.2Solve the following system of equations by using LU decomposition
1.8.3(*) Solve the following system of equations by using LU decomposition
1.8.4Example
1.8.5Example
1.8.6Solve the following system of equations by using LU decomposition
1.8.7Solve the following system of equations by using LU decomposition
1.8.8Solve the following system of equations by using LU decomposition
1.8.9Solve the following system of equations by using LU decomposition
1.8.10Solve the following system of equations by using LU decomposition
1.8.11Example
1.9LU decomposition with partial pivoting
1.9.1Solve the following system of equations by using LU decomposition
1.9.2Example
1.9.3Solve the following system of equations by using LU decomposition
1.9.4Solve the following system of equations by using LU decomposition
1.9.5Solve the following system of equations by using LU decomposition
1.10Inverse matrix and LU decomposition
1.10.1Introduction
1.10.2Example
1.10.3Find the inverse of the matrix . Use the LU decomposition.
1.11Elementary matrix operations
1.11.1Switching the rows
1.11.2Add rows multiplied by the number
1.11.3Multiplication by number
1.11.4Properties of the elementary matrices
1.12Matrix representation of the LU decomposition
1.13Matrix representation of the LU decomposition with partial pivoting
1.13.1Example
1.14(*) Software
1.15(***) LU decomposition in parallel - General algorithm
1.15.1General algorithm
1.15.2Test example
1.15.3upper_solve(M10, M00, nby2, n)
1.15.4lower_solve(M10, M00, nby2, n);
1.15.5schur(B11, B01, L10, nby2, N);
1.15.6LU(M11, nby2, n);
1.15.7Verification in Mathematica
1.15.8Upper solve serial
1.15.9Lower solve serial
1.15.10C-code - LU-decomposition – serial version
1.15.11C-code - LU-decomposition – parallel version
1.15.12Results
1.16Iterative methods
1.16.1Introduction
1.16.2(*) Convergence analysis – Jacobi method
1.16.3(*) Convergence of the iterative methods
1.16.4Jacobi method
1.16.5Introduction
1.16.6Example
1.16.7Gauss-Seidel method
1.16.8(*) Successive Over-Relaxation Method - SOR
1.16.9(*) Conjugate gradient method
1.16.10(*) Steepest descent and conjugate gradient method
1.16.11(*) Preconditioning for PCG method
1.16.12(*) GMRES (Generalized Minimal Residual) – nonsymmetrical systems
2Review
2.1Test 2 - Summer 2016
1Solution of systems of linear equations
1.1(*) Matrix notation
1.2(*) Matrix algebra
1.1(*) Solvability
1.2.1Introduction
Theorem
The system $Ax = b$ is solvable if $\operatorname{rank}(A) = \operatorname{rank}([A \mid b])$.
The system is inconsistent (no solution) if $\operatorname{rank}(A) < \operatorname{rank}([A \mid b])$.
Heuristics!!!
------
if
number of equations = number of variables
then
the solution is unique
------
if
number of equations < number of variables
then
we have infinitely many solutions.
------
if
number of equations > number of variables
then
we have no solutions.
1.1.1Unique solution
Graphical solution
Plot[{-x+3,x-1},{x,-1,5}]
Analytical solution
Method 1
Method 2
1.2.2Inconsistent system (no solution)
Plot[{x+3,x-1},{x,-1,5}]
Overdetermined system of equations
1.2.3Infinitely many solutions
Plot[{x+3},{x,-1,5}]
1.2.3.1Graphical method
gr1=ContourPlot3D[{(x+y+z==1)},{x,-2,2},{y,-2,2},{z,-2,2}]
gr2=ContourPlot3D[{2x-y+z==0},{x,-2,2},{y,-2,2},{z,-2,2}]
gr3=ContourPlot3D[{4x+y+3z==3},{x,-2,2},{y,-2,2},{z,-2,2}]
Show[gr1,gr2,gr3]
1.2.3.2Analytical method
1.2.4Infinitely many solutions
Plot[{-x+3,-x+3},{x,-1,5}]
Underdetermined system of equations
1.2.4.1Graphical method
gr1=ContourPlot3D[{(x+y+z==1)},{x,-2,2},{y,-2,2},{z,-2,2}]
gr2=ContourPlot3D[{(1/4)x-(1/2)y+(3/4)z==0},{x,-2,2},{y,-2,2},{z,-2,2}]
gr3=ContourPlot3D[{x+7y-3z==3},{x,-2,2},{y,-2,2},{z,-2,2}]
Show[gr1,gr2,gr3]
1.2.4.2Analytical method
1.3(*) Reduced Row Echelon Form
Definition (Reduced Row Echelon Form)
1) The first nonzero entry (called the leading entry) in each row is a 1.
2) Each leading 1 is the only nonzero entry in its column.
3) The leading entry in each row lies to the right of the leading entry in the row above it, and any rows consisting entirely of zeros are at the bottom.
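For illustration (an example added here, not taken from the exercises below), the following matrix satisfies all three conditions:
\[
\begin{bmatrix}
1 & 0 & 2 & 0 \\
0 & 1 & -1 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0
\end{bmatrix}
\]
The leading 1's are in columns 1, 2, and 4, each of these columns has no other nonzero entry, and the single zero row is last.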
1.3.1Calculate echelon form for and find the solution
1.3.2Calculate echelon form for and find the solution
1.3.3Calculate echelon form for and find the solution
Solution in parametric form
1.4(*) Null space
Definition
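The standard statement, given here for reference: the null space of a matrix $A \in \mathbb{R}^{m \times n}$ is the set of all vectors that $A$ maps to zero,
\[
N(A) = \{\, x \in \mathbb{R}^{n} : Ax = 0 \,\}.
\]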
1.4.1Example
Alternative method
then the null space can be defined in the following way
1.5(*) Cramer's rule
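As a small worked illustration (the $2 \times 2$ system below is chosen here for reference and is not one of the exercises that follow), Cramer's rule replaces one column of the coefficient matrix by the right-hand side:
\[
\begin{cases} 2x + y = 5 \\ x - y = 1 \end{cases}
\qquad
x = \frac{\begin{vmatrix} 5 & 1 \\ 1 & -1 \end{vmatrix}}{\begin{vmatrix} 2 & 1 \\ 1 & -1 \end{vmatrix}} = \frac{-6}{-3} = 2,
\qquad
y = \frac{\begin{vmatrix} 2 & 5 \\ 1 & 1 \end{vmatrix}}{\begin{vmatrix} 2 & 1 \\ 1 & -1 \end{vmatrix}} = \frac{-3}{-3} = 1.
\]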
1.5.1Solve the following system of equations by using Cramer's rule .
1.5.2Solve the following system of equations by using Cramer's rule .
1.5.3Solve the following system of equations by using the inverse matrix .
1.6(*) Inverse matrix
1.6.1Solve the following system of equations by using the inverse matrix .
1.6.2Solve the following system of equations by using the inverse matrix.
1.7Gauss elimination
1.7.1Solve the following system of equations by using Gauss elimination
1.1.1Example
Method 2
Solution
1.1.2Example
1.7.2Solve the following system of equations
In[49]:= Det[{{2, 2, 1}, {1, -2, 2}, {2, 1, -1}}]
Inverse[{{2, 2, 1}, {1, -2, 2}, {2, 1, -1}}].{{7}, {2}, {4}}
Out[49]= 15
Out[50]= {{2}, {1}, {1}}
1.7.3Example
1.7.4Solve the following system of equations
1.7.5Solve the following system of equations by using Gauss elimination
1.7.6Solve the following system of equations by using Gauss elimination
1.7.7Solve the following system of equations by using Gauss elimination
Verification
A={{2,1,1},{1,2,1},{4,3,1}};
b={{8},{9},{18}};
MatrixForm[A]
MatrixForm[b]
Inverse[A].b
MatrixForm[A]: {{2, 1, 1}, {1, 2, 1}, {4, 3, 1}}
MatrixForm[b]: {{8}, {9}, {18}}
Inverse[A].b: {{2}, {3}, {1}}
1.7.8Solve the following system of equations by using Gauss elimination
Verification
A={{1,2,0,1},
{1,1,0,1},
{0,2,0,1},
{2,2,1,1}
}
b={{5},{4},{3},{8}}
Inverse[A].b
Det[A]
{{1,2,0,1},{1,1,0,1},{0,2,0,1},{2,2,1,1}}
{{5},{4},{3},{8}}
{{2},{1},{1},{1}}
1
1.1.3Example
1.7.9(*) General method
Generalization
1.7.10(*) Matlab code
%Gauss elimination - data
A=[ 1 1 1;
    2 -1 1;
    1 3 2]
b=[ 4;
    4;
    7]
n=3;
%Gauss elimination - calculations
%iteration
for iteration=1:n-1
    %row
    for row=iteration+1:n
        m = (A(row,iteration)/A(iteration,iteration));
        %column
        for col=iteration:n
            A(row,col) = A(row,col) - A(iteration,col)*m;
        end
        b(row) = b(row) - b(iteration) * m;
    end
    A
    b
end
%solution
x=zeros(n,1);
x(n)=b(n)/A(n,n);
for i=n-1:-1:1
    sum=0;
    for j=i:n
        sum=sum+A(i,j)*x(j);
    end
    x(i)=(b(i)-sum)/A(i,i);
end
disp('solutions')
x
LAPACK
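For reference, a minimal C++ sketch of solving the same example system through the LAPACKE C interface to LAPACK's dgesv routine (LU factorization with partial pivoting). The availability of lapacke.h and the link flags are assumptions about the local installation; this is an illustration, not part of the course code.
// Solve A x = b with LAPACK's dgesv via the LAPACKE interface.
// A and b are the same data as in the MATLAB example above.
// Assumed build: g++ solve.cpp -llapacke -llapack -lblas
#include <cstdio>
#include <lapacke.h>
int main()
{
    lapack_int n = 3, nrhs = 1;
    double A[9] = { 1,  1, 1,     /* row-major storage */
                    2, -1, 1,
                    1,  3, 2 };
    double b[3] = { 4, 4, 7 };
    lapack_int ipiv[3];
    lapack_int info = LAPACKE_dgesv(LAPACK_ROW_MAJOR, n, nrhs, A, n, ipiv, b, nrhs);
    if (info == 0)
        printf("x = (%g, %g, %g)\n", b[0], b[1], b[2]);   /* b now holds the solution */
    else
        printf("dgesv failed, info = %d\n", (int)info);
    return 0;
}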
1.2(*) Gauss-Jordan elimination
1.2.1Introduction
1.2.2Example
1.2.3Example
Method 2
1.2.4Example
Method 2
1.2.5Example
1.8LU decomposition
1.8.1Introduction
1.8.2Solve the following system of equations by using LU decomposition
U={{1,1,1},{0,-1,-1},{0,0,-1}};
L={{1,0,0},{2,1,0},{1,-1,1}};
MatrixForm[U]
MatrixForm[L]
MatrixForm[L.U]
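For comparison, a minimal C++ sketch (an illustration, not the course code) of the Doolittle LU factorization without pivoting, followed by the two triangular solves, applied to the system already verified in 1.7.2 (A = {{2,2,1},{1,-2,2},{2,1,-1}}, b = {7,2,4}, solution {2,1,1}):
// Doolittle LU factorization (unit diagonal in L) without pivoting,
// then forward substitution L y = b and back substitution U x = y.
#include <cstdio>
int main()
{
    const int n = 3;
    double A[3][3] = { {2, 2, 1}, {1, -2, 2}, {2, 1, -1} };
    double b[3]    = { 7, 2, 4 };
    double L[3][3] = {{0}}, U[3][3] = {{0}}, y[3], x[3];
    for (int k = 0; k < n; k++)
    {
        L[k][k] = 1.0;
        for (int j = k; j < n; j++)          /* row k of U */
        {
            double s = 0;
            for (int m = 0; m < k; m++) s += L[k][m] * U[m][j];
            U[k][j] = A[k][j] - s;
        }
        for (int i = k + 1; i < n; i++)      /* column k of L */
        {
            double s = 0;
            for (int m = 0; m < k; m++) s += L[i][m] * U[m][k];
            L[i][k] = (A[i][k] - s) / U[k][k];
        }
    }
    for (int i = 0; i < n; i++)              /* forward substitution: L y = b */
    {
        double s = 0;
        for (int j = 0; j < i; j++) s += L[i][j] * y[j];
        y[i] = b[i] - s;
    }
    for (int i = n - 1; i >= 0; i--)         /* back substitution: U x = y */
    {
        double s = 0;
        for (int j = i + 1; j < n; j++) s += U[i][j] * x[j];
        x[i] = (y[i] - s) / U[i][i];
    }
    printf("x = (%g, %g, %g)\n", x[0], x[1], x[2]);   /* prints x = (2, 1, 1) */
    return 0;
}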
1.8.3(*) Solve the following system of equations by using LU decomposition
1.8.4Example
U={{2,2,1},{0,-2,-1/2},{0,0,5/4}};
L={{1,0,0},{3/2,1,0},{1/2,-1/2,1}};MatrixForm[L.U]
1.8.5Example
L = {{1, 0, 0}, {3/2, 1, 0}, {1/2, -1, 1}};
U = {{2, 2, 1}, {0, -1, -1/2}, {0, 0, 1}};
MatrixForm[L.U]
1.8.6Solve the following system of equations by using LU decomposition
1.8.7Solve the following system of equations by using LU decomposition
1.8.8Solve the following system of equations by using LU decomposition
1.8.9Solve the following system of equations by using LU decomposition
Verification
In[779]:= L={{1,0,0},{2,1,0},{1,0,1}};
U={{1,2,1},{0,1,1},{0,0,1}};
MatrixForm[L.U]
Result: {{1, 2, 1}, {2, 5, 3}, {1, 2, 2}}
1.8.10Solve the following system of equations by using LU decomposition
1.8.11Example
Verification
Calculate
Calculate
1.9LU decomposition with partial pivoting
1.9.1Solve the following system of equations by using LU decomposition
Verification
The algorithm
Calculations
1.9.2Example
Verification
Solve
Solve
1.9.3Solve the following system of equations by using LU decomposition
Verification
The algorithm
Calculations
1.9.4Solve the following system of equations by using LU decomposition
Solution
1.9.5Solve the following system of equations by using LU decomposition
Solution
1.10Inverse matrix and LU decomposition
1.10.1Introduction
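The idea, stated here for reference in standard notation: each column $x_i$ of $A^{-1}$ solves a system whose right-hand side is the unit vector $e_i$, so a single LU factorization is reused for all $n$ right-hand sides:
\[
A x_i = e_i \;\Longleftrightarrow\; L y_i = e_i,\ \ U x_i = y_i, \qquad
A^{-1} = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}.
\]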
1.10.2Example
Verification of the results
A={{1,2,1},{2,5,3},{1,2,2}}
Inverse[A]
{{1,2,1},{2,5,3},{1,2,2}}
{{4,-2,1},{-1,1,-1},{-1,0,1}}
1.10.3Find the inverse of the matrix . Use the LU decomposition.
Solution
From the previous problem we know the LU decomposition of the matrix .
Inverse matrix
1.11Elementary matrix operations
1.11.1Switching the rows
Switch rows 2 and 3
or ,
Example
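For instance (a $3 \times 3$ illustration assumed here), switching rows 2 and 3 is the same as multiplying on the left by the permutation matrix
\[
P_{23} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}, \qquad
P_{23}
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
=
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}.
\]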
1.11.2Add rows multiplied by the number
Operation
can be represented by the following matrix
Example
Operation
can be represented by the following matrix
Example
1.11.3Multiplication by number
Operation
can be represented by the following matrix
Example
1.11.4Properties of the elementary matrices
Example
1.12Matrix representation of the LU decomposition
LU decomposition
1.13Matrix representation of the LU decomposition with partial pivoting
We know that
We know that
- is a new lower triangular matrix after switching the rows.
Let’s perform further row operations
and let’s perform a new permutation
then
We know that where is a new lower triangular matrix after appropriate transpositions of the rows (according to ), then
where
On the other hand, if we perform appropriate operations on the matrix A we will get the upper triangular matrix .
From the previous calculations we know that , then
The inverse of a lower triangular matrix is a lower triangular matrix, so
Finally the decomposition has the following form.
If we know the LU decomposition of the matrix , then it is possible to calculate the solution.
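In summary (standard notation assumed): with the factorization $PA = LU$, the system $Ax = b$ is solved by one permutation and two triangular solves,
\[
PA = LU, \qquad L y = P b, \qquad U x = y.
\]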
Computational method for LU decomposition with partial pivoting.
Algorithm
Let us consider.
Calculate the upper triangular matrix, lower triangular matrix , and .
1.13.1Example
Pivoting
The matrix L
Numerical results
The algorithm
Calculations
1.14(*) Software
LAPACK
Intel® Math Kernel Library (Intel® MKL)
1.15(***) LU decomposition in parallel - General algorithm
1.15.1General algorithm
1.15.2Test example
1.15.2.1LU(M00, nby2, n)
Input
Output (after LU decomposition)
1.15.3upper_solve(M10, M00, nby2, n)
M00 – LU decomposition
M10 – RHS
upper_solve(B, LU, nby2, n)
LU – LU decomposition
B – RHS (overwritten by the solution B’)
B’*U=B
B’=B*U^(-1)
1.15.3.1top_upper_half_solve(B00, B01, B, nby2, n);
upper_solve(B00, U00, n, N);
schur(B01, B00, U01, n, N);
upper_solve(B01, U11, n, N);
Final result: top_upper_half_solve(B00, B01, B, 1, 4);
1.15.3.2bottom_upper_half_solve(B10, B11, LU, nby2, N);
upper_solve(B10, U00, n, N);
schur(B11, B10, U01, n, N);
upper_solve(B11, U11, n, N);
Final result: bottom_upper_half_solve(B10, B11, LU, nby2, N);
1.15.4lower_solve(M10, M00, nby2, n);
lower_solve(B’, LU, nby2, n);
1.15.4.1left_half_lower_solve(B00, B10, LU, nby2, N);
lower_solve(B00, L00, n, N);
schur(B10, B00, L10, n, N);
lower_solve(B10, L11, n, N);
1.15.4.2right_half_lower_solve(B00, B11, LU, nby2, N);
lower_solve(B01, L00, n, N);
schur(B11, B01, L10, n, N);
lower_solve(B11, L11, n, N);
1.15.5schur(B11, B01, L10, nby2, N);
1.15.6LU(M11, nby2, n);
1.15.7Verification in Mathematica
M = {{1, 2, 3, 4}, {2, 9, 12, 15}, {3, 26, 41, 49}, {5, 40, 107, 135}};
Print["Matrix M"];
MatrixForm[M]
MatrixForm[LUDecomposition[M]]
LU = {{1, 2, 3, 4}, {2, 5, 6, 7}, {3, 4, 8, 9}, {5, 6, 7, 10}};
Print["LU decomposition"]
MatrixForm[LU]
1.15.8Upper solve serial
void upper_solve_serial(double *B, double* LU, int n, int N)
{
/* solve */
/* B'*U=B */
int i, j, k;
double sum;
for (i = 0; i < n; i++)
{
B[i*N + 0] = B[i * N + 0] / LU[0 * N + 0];
for (j = 1; j < n; j++)
{
sum = 0;
for (k = 0; k < j; k++)
sum += B[i*N + k] * LU[k*N + j];
B[i*N + j] = (B[i*N + j] - sum) / LU[j*N + j];
}
}
}
1.15.9Lower solve serial
void lower_solve_serial(double *B, double* LU, int n, int N)
{
/* solve */
/* L*B'=B */
int i, j, k;
double sum;
for (i = 0; i < n; i++)
{
B[0 * N + i] = B[0 * N + i];
for (j = 1; j < n; j++)
{
sum = 0;
for (k = 0; k < j; k++)
sum += LU[j*N + k] * B[k*N + i];
B[j*N + i] = (B[j*N + i] - sum);
}
}
}
1.15.10C-code - LU-decomposition – serial version
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#define DebugMode 0
int min_block_size = 2;
void LU(double *A, int n, int N)
{
int iteration, row, col;
double m;
for (iteration = 0; iteration < n - 1; iteration++)
{
for (row = iteration + 1; row < n; row++)
{
m = A[row*N + iteration] / A[iteration*N + iteration];
col = iteration;
A[row*N + col] = m;
for (col = iteration + 1; col < n; col++)
A[row*N + col] = A[row*N + col] - A[iteration*N + col] * m;
}
}
}
void PrintM(double *M, char *matrixId, int nby2, int N)
{
int i, j;
for (i = 0; i < nby2; i++)
{
for (j = 0; j < nby2; j++)
printf("%s[%d,%d]=%5.1f ", matrixId, i, j, M[i*N + j]);
printf("\n");
}
}
void schur(double *M, double *V, double *W, int n, int N);
void top_upper_half_solve(double *B00, double *B01, double *U, int n, int N);
void bottom_upper_half_solve(double *B10, double *B11, double *U, int n, int N);
void left_half_lower_solve(double *B00, double *B10, double *L, int n, int N);
void right_half_lower_solve(double *B01, double *B11, double *L, int n, int N);
void upper_solve_serial(double *B, double* LU, int n, int N)
{
/* solve */
/* B'*U=B */
int i, j, k;
double sum;
for (i = 0; i < n; i++)
{
B[i*N + 0] = B[i * N + 0] / LU[0 * N + 0];
for (j = 1; j < n; j++)
{
sum = 0;
for (k = 0; k < j; k++)
sum += B[i*N + k] * LU[k*N + j];
B[i*N + j] = (B[i*N + j] - sum) / LU[j*N + j];
}
}
}
void lower_solve_serial(double *B, double* LU, int n, int N)
{
/* solve */
/* L*B'=B */
int i, j, k;
double sum;
for (i = 0; i < n; i++)
{
B[0 * N + i] = B[0 * N + i];
for (j = 1; j < n; j++)
{
sum = 0;
for (k = 0; k < j; k++)
sum += LU[j*N + k] * B[k*N + i];
B[j*N + i] = (B[j*N + i] - sum);
}
}
}
void upper_solve(double *B, double* LU, int n, int N)
{
/* Solve B'*U=B */
/* U - given */
/* B - given */
/* solve for B' */
double *B00, *B01, *B10, *B11;
if (n > min_block_size)
{
int nby2 = n / 2;
B00 = B;
B01 = B + nby2;
B10 = B + nby2*N;
B11 = B + nby2*N + nby2;
top_upper_half_solve(B00, B01, LU, nby2, N);
bottom_upper_half_solve(B10, B11, LU, nby2, N);
}
else if (n == min_block_size)
{
upper_solve_serial(B, LU, n, N);
}
else
{
printf("Error : [upper_solve] : n <= 0");
}
}
void top_upper_half_solve(double *B00, double *B01, double *U, int n, int N)
{
double *U00, *U01, *U11;
U00 = U;
U01 = U + n;
U11 = U + n*N + n;
upper_solve(B00, U00, n, N);
schur(B01, B00, U01, n, N);
upper_solve(B01, U11, n, N);
}
void bottom_upper_half_solve(double *B10, double *B11, double *U, int n, int N)
{
double *U00, *U01, *U11;
U00 = U;
U01 = U + n;
U11 = U + n*N + n;
upper_solve(B10, U00, n, N);
schur(B11, B10, U01, n, N);
upper_solve(B11, U11, n, N);
}
void schur(double *M, double *V, double *W, int n, int N)
{
double sum = 0;
int i, j, k;
for (i = 0; i < n; i++)
for (j = 0; j < n; j++)
{
sum = 0;
for (k = 0; k < n; k++)
sum += V[i*N + k] * W[k*N + j];
M[i*N + j] = M[i*N + j] - sum;
}
}
void lower_solve(double *B, double* LU, int n, int N)
{
/* Solve L*B'=B */
/* L - given */
/* B - given */
/* solve for B' */
double *B00, *B01, *B10, *B11;
int nby2;
if (n > min_block_size)
{
nby2 = n / 2;
B00 = B;
B01 = B + nby2;
B10 = B + nby2*N;
B11 = B + nby2*N + nby2;
left_half_lower_solve(B00, B10, LU, nby2, N);
right_half_lower_solve(B01, B11, LU, nby2, N);
}
else if (n == min_block_size)
{
lower_solve_serial(B, LU, n, N);
}
else
{
printf("Error : [lower_solve] : n <= 0");
}
}
void left_half_lower_solve(double *B00, double *B10, double *LU, int n, int N)
{
double *L00, *L10, *L11;
L00 = LU;
L10 = LU + n*N;
L11 = LU + n*N + n;
lower_solve(B00, L00, n, N);
schur(B10, B00, L10, n, N);
lower_solve(B10, L11, n, N);
}
void right_half_lower_solve(double *B01, double *B11, double *L, int n, int N)
{
double *L00, *L10, *L11;
L00 = L;
L10 = L + n*N;
L11 = L + n*N + n;
lower_solve(B01, L00, n, N);
schur(B11, B01, L10, n, N);
lower_solve(B11, L11, n, N);
}
int main(int argc, char** argv)
{
int n = 16;
int nby2 = n / 2;
int i, j, k;
int errorFlag;
double sum;
double *M1, *M2, *L, *U;
double *M00, *M01, *M10, *M11;
if(argc==2)
{
n = atoi(argv[1]);
nby2 = n / 2;
}
else if(argc==3)
{
n = atoi(argv[1]);
nby2 = n / 2;
min_block_size = atoi(argv[2]);
}
printf("Start: n=%d min_block_size=%d\n", n, min_block_size);
M1 = (double*)malloc(n*n*sizeof(double));
M2 = (double*)malloc(n*n*sizeof(double));
L = (double*)calloc(n*n,sizeof(double));
U = (double*)calloc(n*n,sizeof(double));
for (i = 0; i < n; i++)
for (j = 0; j <= i; j++)
{
L[i*n + j] = 1.0;
M2[i*n + j]=L[i*n + j];
}
for (i = 0; i < n; i++)
for (j = i; j < n; j++)
{
U[i*n + j] = 2.0;
M2[i*n + j] = U[i*n + j];
}
for (i = 0; i < n; i++)
for (j = 0; j < n; j++)
{
sum = 0;
for (k = 0; k < n; k++)
sum += L[i*n + k] * U[k*n + j];
M1[i*n + j] = sum;
};
#if DebugMode == 1
PrintM(L, "L", n, n);
printf("\n");
PrintM(U, "U", n, n);
printf("\n");
PrintM(M1, "M", n, n);
printf("\n");
PrintM(M2, "LU", n, n);
#endif
M00 = M1;
M01 = M1 + nby2;
M10 = M1 + nby2*n;
M11 = M1 + nby2*n + nby2;
LU(M00, nby2, n);
upper_solve(M10, M00, nby2, n);
lower_solve(M01, M00, nby2, n);
schur(M11, M10, M01, nby2, n);
LU(M11, nby2, n);
#if DebugMode == 1
printf("\n");
PrintM(M1, "M", n, n);
#endif
errorFlag = 0;
for (i = 0; (i < n) & (errorFlag == 0); i++)
for (j = 0; (j < n) & (errorFlag==0); j++)
if ( fabs( M1[i*n + j] - M2[i*n + j] ) > 0.0001 )
{
printf("Error: %d %d", i, j);
errorFlag = 1;
}
if ( errorFlag == 0 )
{
printf("\nSuccess\n");
}
else
{
printf("\nFailure\n");
}
}
1.15.11C-code - LU-decomposition – parallel version
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <cilk/cilk.h>
#define DebugMode 0
int min_block_size = 2;
void LU(double *A, int n, int N)
{
int iteration, row, col;
double m;
for (iteration = 0; iteration < n - 1; iteration++)
{
for (row = iteration + 1; row < n; row++)
{
m = A[row*N + iteration] / A[iteration*N + iteration];
col = iteration;
A[row*N + col] = m;
for (col = iteration + 1; col < n; col++)
A[row*N + col] = A[row*N + col] - A[iteration*N + col] * m;
}
}
}
void PrintM(double *M, char *matrixId, int nby2, int N)
{
int i, j;
for (i = 0; i < nby2; i++)
{
for (j = 0; j < nby2; j++)
printf("%s[%d,%d]=%5.1f ", matrixId, i, j, M[i*N + j]);
printf("\n");
}
}
void schur(double *M, double *V, double *W, int n, int N);
void top_upper_half_solve(double *B00, double *B01, double *U, int n, int N);
void bottom_upper_half_solve(double *B10, double *B11, double *U, int n, int N);
void left_half_lower_solve(double *B00, double *B10, double *L, int n, int N);
void right_half_lower_solve(double *B01, double *B11, double *L, int n, int N);
void upper_solve_serial(double *B, double* LU, int n, int N)
{
/* solve */
/* B'*U=B */
int i, j, k;
double sum;
for (i = 0; i < n; i++)
{
B[i*N + 0] = B[i * N + 0] / LU[0 * N + 0];
for (j = 1; j < n; j++)
{
sum = 0;
for (k = 0; k < j; k++)
sum += B[i*N + k] * LU[k*N + j];
B[i*N + j] = (B[i*N + j] - sum) / LU[j*N + j];
}
}
}
void lower_solve_serial(double *B, double* LU, int n, int N)
{
/* solve */
/* L*B'=B */
int i, j, k;
double sum;
for (i = 0; i < n; i++)
{
B[0 * N + i] = B[0 * N + i];
for (j = 1; j < n; j++)
{
sum = 0;
for (k = 0; k < j; k++)
sum += LU[j*N + k] * B[k*N + i];
B[j*N + i] = (B[j*N + i] - sum);
}
}
}
void upper_solve(double *B, double* LU, int n, int N)
{
/* Solve B'*U=B */
/* U - given */
/* B - given */
/* solve for B' */
double *B00, *B01, *B10, *B11;
if (n > min_block_size)
{
int nby2 = n / 2;
B00 = B;
B01 = B + nby2;
B10 = B + nby2*N;
B11 = B + nby2*N + nby2;
cilk_spawn top_upper_half_solve(B00, B01, LU, nby2, N);
bottom_upper_half_solve(B10, B11, LU, nby2, N);
}
else if (n == min_block_size)
{
upper_solve_serial(B, LU, n, N);
}
else
{
printf("Error : [upper_solve] : n <= 0");
}
}
void top_upper_half_solve(double *B00, double *B01, double *U, int n, int N)
{
double *U00, *U01, *U11;
U00 = U;
U01 = U + n;
U11 = U + n*N + n;
upper_solve(B00, U00, n, N);
schur(B01, B00, U01, n, N);
upper_solve(B01, U11, n, N);
}
void bottom_upper_half_solve(double *B10, double *B11, double *U, int n, int N)
{
double *U00, *U01, *U11;
U00 = U;
U01 = U + n;
U11 = U + n*N + n;
upper_solve(B10, U00, n, N);
schur(B11, B10, U01, n, N);
upper_solve(B11, U11, n, N);
}
void schur(double *M, double *V, double *W, int n, int N)
{
double sum = 0;
int i, j, k;
for (i = 0; i < n; i++)
for (j = 0; j < n; j++)
{
sum = 0;
for (k = 0; k < n; k++)
sum += V[i*N + k] * W[k*N + j];
M[i*N + j] = M[i*N + j] - sum;
}
}
void lower_solve(double *B, double* LU, int n, int N)
{
/* Solve L*B'=B */
/* L - given */
/* B - given */
/* solve for B' */
double *B00, *B01, *B10, *B11;
int nby2;
if (n > min_block_size)
{
nby2 = n / 2;
B00 = B;
B01 = B + nby2;
B10 = B + nby2*N;
B11 = B + nby2*N + nby2;
cilk_spawn left_half_lower_solve(B00, B10, LU, nby2, N);
right_half_lower_solve(B01, B11, LU, nby2, N);
}
else if (n == min_block_size)
{
lower_solve_serial(B, LU, n, N);
}
else
{
printf("Error : [lower_solve] : n <= 0");
}
}
void left_half_lower_solve(double *B00, double *B10, double *LU, int n, int N)
{
double *L00, *L10, *L11;
L00 = LU;
L10 = LU + n*N;
L11 = LU + n*N + n;
lower_solve(B00, L00, n, N);
schur(B10, B00, L10, n, N);
lower_solve(B10, L11, n, N);
}
void right_half_lower_solve(double *B01, double *B11, double *L, int n, int N)
{
double *L00, *L10, *L11;
L00 = L;
L10 = L + n*N;
L11 = L + n*N + n;
lower_solve(B01, L00, n, N);
schur(B11, B01, L10, n, N);
lower_solve(B11, L11, n, N);
}
int main(int argc, char** argv)
{
int n = 16;
int nby2 = n / 2;
int i, j, k;
int errorFlag;
double sum;
double *M1, *M2, *L, *U;
double *M00, *M01, *M10, *M11;
if(argc==2)
{
n = atoi(argv[1]);
nby2 = n / 2;
}
else if(argc==3)
{
n = atoi(argv[1]);
nby2 = n / 2;
min_block_size = atoi(argv[2]);
}
printf("Start: n=%d min_block_size=%d\n", n, min_block_size);
M1 = (double*)malloc(n*n*sizeof(double));
M2 = (double*)malloc(n*n*sizeof(double));
L = (double*)calloc(n*n,sizeof(double));
U = (double*)calloc(n*n,sizeof(double));
for (i = 0; i < n; i++)
for (j = 0; j <= i; j++)
{
L[i*n + j] = 1.0;
M2[i*n + j]=L[i*n + j];
}
for (i = 0; i < n; i++)
for (j = i; j < n; j++)
{
U[i*n + j] = 2.0;
M2[i*n + j] = U[i*n + j];
}
for (i = 0; i < n; i++)
for (j = 0; j < n; j++)
{
sum = 0;
for (k = 0; k < n; k++)
sum += L[i*n + k] * U[k*n + j];
M1[i*n + j] = sum;
};
#if DebugMode == 1
PrintM(L, "L", n, n);
printf("\n");
PrintM(U, "U", n, n);
printf("\n");
PrintM(M1, "M", n, n);
printf("\n");
PrintM(M2, "LU", n, n);
#endif
M00 = M1;
M01 = M1 + nby2;
M10 = M1 + nby2*n;
M11 = M1 + nby2*n + nby2;
LU(M00, nby2, n);
upper_solve(M10, M00, nby2, n);
cilk_spawn lower_solve(M01, M00, nby2, n);
cilk_sync;
schur(M11, M10, M01, nby2, n);
LU(M11, nby2, n);
#if DebugMode == 1
printf("\n");
PrintM(M1, "M", n, n);
#endif
errorFlag = 0;
for (i = 0; (i < n) & (errorFlag == 0); i++)
for (j = 0; (j < n) & (errorFlag==0); j++)
if ( fabs( M1[i*n + j] - M2[i*n + j] ) > 0.0001 )
{
printf("Error: %d %d", i, j);
errorFlag = 1;
}
if ( errorFlag == 0 )
{
printf("\nSuccess\n");
}
else
{
printf("\nFailure\n");
}
}
1.15.12Results
Test results
DebugMode 1
./lab2-serial 4 1
Start: n=4 min_block_size=1
L[0,0]= 1.0 L[0,1]= 0.0 L[0,2]= 0.0 L[0,3]= 0.0
L[1,0]= 1.0 L[1,1]= 1.0 L[1,2]= 0.0 L[1,3]= 0.0
L[2,0]= 1.0 L[2,1]= 1.0 L[2,2]= 1.0 L[2,3]= 0.0
L[3,0]= 1.0 L[3,1]= 1.0 L[3,2]= 1.0 L[3,3]= 1.0
U[0,0]= 2.0 U[0,1]= 2.0 U[0,2]= 2.0 U[0,3]= 2.0
U[1,0]= 0.0 U[1,1]= 2.0 U[1,2]= 2.0 U[1,3]= 2.0
U[2,0]= 0.0 U[2,1]= 0.0 U[2,2]= 2.0 U[2,3]= 2.0
U[3,0]= 0.0 U[3,1]= 0.0 U[3,2]= 0.0 U[3,3]= 2.0
M[0,0]= 2.0 M[0,1]= 2.0 M[0,2]= 2.0 M[0,3]= 2.0
M[1,0]= 2.0 M[1,1]= 4.0 M[1,2]= 4.0 M[1,3]= 4.0
M[2,0]= 2.0 M[2,1]= 4.0 M[2,2]= 6.0 M[2,3]= 6.0
M[3,0]= 2.0 M[3,1]= 4.0 M[3,2]= 6.0 M[3,3]= 8.0
LU[0,0]= 2.0 LU[0,1]= 2.0 LU[0,2]= 2.0 LU[0,3]= 2.0
LU[1,0]= 1.0 LU[1,1]= 2.0 LU[1,2]= 2.0 LU[1,3]= 2.0
LU[2,0]= 1.0 LU[2,1]= 1.0 LU[2,2]= 2.0 LU[2,3]= 2.0
LU[3,0]= 1.0 LU[3,1]= 1.0 LU[3,2]= 1.0 LU[3,3]= 2.0
M[0,0]= 2.0 M[0,1]= 2.0 M[0,2]= 2.0 M[0,3]= 2.0
M[1,0]= 1.0 M[1,1]= 2.0 M[1,2]= 2.0 M[1,3]= 2.0
M[2,0]= 1.0 M[2,1]= 1.0 M[2,2]= 2.0 M[2,3]= 2.0
M[3,0]= 1.0 M[3,1]= 1.0 M[3,2]= 1.0 M[3,3]= 2.0
Performance analysis
n      min_block_size   Serial time [s]   Parallel time [s]
512    16               0.139             0.229
1024   16               1.881             1.732
2048   16               23.470            19.877
4096   16               240.611           186.932
How to run the program:
time ./lab2-parallel 2048 16
1.16Iterative methods
1.16.1Introduction
For the linear system.
1.16.2(*) Convergence analysis – Jacobi method
Theorem
Let us consider:
1) the system of equations in the form,
2) are eigenvalues and eigenvectors of the matrix i.e. ,
3) for ,
4),
5) for
then
where for any starting point .
Proof
The eigenvectors are linearly independent, so for any it is possible to find such that
.
For arbitrary
By assumption , then
and
is a solution of the equation
Theorem
Let us consider:
1) the system of equations in the form,
2) are eigenvalues and eigenvectors of the matrix i.e. ,
3) for ,
4),
5) for
then
where for any starting point .
Proof
The eigenvectors are linearly independent, so for any it is possible to find such that
.
For arbitrary
By assumption then in the limit case
where , …., .
Theorem
Let us consider:
1) the system of equations in the form,
2),
3) for
then
where for any starting point .
Proof
For arbitrary
In the limit case
Theorem
Let us consider:
1) the system of equations in the form,
2),
3) for
then
where for any starting point .
Proof
Theorem
Let $B$ be a square matrix.
If $\|B\| < 1$, then the series $I + B + B^2 + \cdots$
converges to $(I - B)^{-1}$.
Proof
Let then
then
We know that
Then
In the limit case
Theorem
Let us consider:
1) the system of equations in the form,
2) (matrix is diagonally dominant),
3) for
then
where for any starting point .
Proof
then
The system has the following form
Matrix norm
By assumption then
and
We know that the norm is smaller than 1, so the method converges by the previous theorem.
Example
1.16.3(*) Convergence of the iterative methods
The method converges for every starting point if the spectral radius of the iteration matrix is smaller than 1.
1.16.4Jacobi method
1.16.5Introduction
The $i$-th component of the new approximation is
\[
x_i^{(k+1)} = \frac{1}{a_{ii}}\Bigl(b_i - \sum_{j \ne i} a_{ij}\,x_j^{(k)}\Bigr).
\]
1.16.6Example
Starting point
1.16.6.1Matrix form of Jacobi method (*)
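With the standard splitting $A = D + L + U$ into the diagonal, strictly lower, and strictly upper triangular parts (notation assumed here), one Jacobi step reads
\[
x^{(k+1)} = D^{-1}\bigl(b - (L + U)\,x^{(k)}\bigr).
\]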
1.16.6.2Code for Jacobi method (*)
#include <iostream>
int main(int argc, char** argv) {
double x,y,x0,y0;
x0=10;
y0=10;
int n=20;
for(int i=0;i<n;i++)
{
x=(4-y0)/3;
y=(4-x0)/3;
x0=x;
y0=y;
std::cout << "x= " << x << "\n";
std::cout << "y= " << y << "\n\n";
}
return 0;
}
1.16.6.3Solve the following system of equations by using the Jacobi method .
The matrix is (strictly row) diagonally dominant if $|a_{ii}| > \sum_{j \ne i} |a_{ij}|$ for every row $i$.
1.16.6.4Example ,
1.16.6.5Example
1.16.6.6Example
1.16.6.7Solve the following system of equations by using the Jacobi method
1.16.7Gauss-Seidel method
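For reference, the componentwise Gauss-Seidel update in its standard form; unlike the Jacobi method, components computed in the current sweep are used immediately:
\[
x_i^{(k+1)} = \frac{1}{a_{ii}}\Bigl(b_i - \sum_{j<i} a_{ij}\,x_j^{(k+1)} - \sum_{j>i} a_{ij}\,x_j^{(k)}\Bigr).
\]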
1.16.7.1Code for Gauss-Seidel Method
C++ code
#include <iostream>
int main(int argc, char** argv) {
double x,y;
x=10;
y=10;
int n=20;
for(int i=0;i<n;i++)
{
x=(4-y)/3;
y=(4-x)/3;
std::cout << "x= " << x << "\n";
std::cout << "y= " << y << "\n\n";
}
return 0;
}
1.16.7.2Solve the following system of equations by using the Gauss-Seidel method .
1.16.7.3Example ,
Mathematica script
------
x=10;
y=10;
z=10;
x=1/4 (6-(y+z));
y=1/4 (6-(x+z));
z=1/4 (6-(x+y));
x
y
z
x=1/4 (6-(y+z));
y=1/4 (6-(x+z));
z=1/4 (6-(x+y));
x
y
z
x=1/4 (6-(y+z));
y=1/4 (6-(x+z));
z=1/4 (6-(x+y));
N[x]
N[y]
N[z]
------
-(7/2)
-(1/8)
77/32
119/128
341/512
2255/2048
1.05823
0.960175
0.995399
1.16.7.4Solve the following system of equations by using Gauss-Seidel method
1.16.7.5Solve the following system of equations by using Gauss-Seidel method
1.16.7.6Write the system of equations in the form. Find one iteration of the Gauss-Seidel method. Calculate the norm (e.g. maximum absolute row sum norm) of the matrix.
Solution
One iteration of the Gauss-Seidel method.
Matrix form of the system of equations
Maximum Absolute Row Sum Norm
1.16.8(*) Successive Over-Relaxation Method - SOR
In this case we have
If $0 < \omega < 1$, the method is called under-relaxation.
If $\omega > 1$, the method is called over-relaxation.
If $\omega = 1$, the method reduces to the Gauss-Seidel method.
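For reference, the standard SOR update with relaxation parameter $\omega$; it blends the previous iterate with the Gauss-Seidel update:
\[
x_i^{(k+1)} = (1-\omega)\,x_i^{(k)} + \frac{\omega}{a_{ii}}\Bigl(b_i - \sum_{j<i} a_{ij}\,x_j^{(k+1)} - \sum_{j>i} a_{ij}\,x_j^{(k)}\Bigr).
\]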
Going back to the example
After multiplication
1.16.9(*) Conjugate gradient method
Example
Next iteration
1.16.10(*) Steepest descent and conjugate gradient method
Consider a system of equations where A is an SPD (symmetric positive definite) matrix and .
Definition
If is twice continuously differentiable, then
Definition
Let be a local minimizer of ; then and is PSD (positive semidefinite).
Theorem
Let be twice differentiable in a neighborhood of the point and and is PSD, then is a local minimizer of .
Theorem
Solving is equivalent to minimizing the quadratic form .
Proof
It is possible to show that
and .
Part 1
Let us assume that then
and
which is PD (by assumption), then is a minimizer.
Part 2
Let us assume that is a minimizer, i.e., and is PSD
then and is a solution of the system Ax=b.
Minimize by using iterative methods (Krylov subspace methods).
(1)Choose an initial guess .
(2)For k=0,1,…,n we know the approximate solutions .
- If is optimal, then stop.
- Determine the search directions and the step size .
- Set
Steepest descent
i) - search direction is a residual
ii)Determine by using the line search
- Find a along the line where f is a minimum
After the calculations we have.
Update the solution
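Written out in standard notation (stated here for reference), one steepest descent step is
\[
r_k = b - A x_k, \qquad \alpha_k = \frac{r_k^{T} r_k}{r_k^{T} A r_k}, \qquad x_{k+1} = x_k + \alpha_k r_k.
\]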
Level curve (contour lines)
Let us assume that
Search direction is normal to the level curve.
Consecutive search directions are orthogonal to each other.
Energy norm
Spectral condition number
We see slow convergence for large condition numbers .
Definition
Let be SPD.
Two vectors are called conjugate, or A-orthogonal, if .
Conjugate Gradient (CG) as a direct method
are mutually conjugate directions.
They are linearly independent and form a basis for
Let be a solution of , then
In this way it is possible to find the exact solution as
The conjugate gradient method as an iterative method.
(1)Choose and
(2)Take
(3)For
- are mutually conjugate
- Then are mutually conjugate.
In general
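To make the steps above concrete, here is a minimal C++ sketch of the conjugate gradient iteration for a small dense SPD system. The test matrix, right-hand side, and tolerance are chosen here for illustration only and are not taken from the notes.
// Conjugate gradient method for a dense symmetric positive definite system A x = b.
#include <cstdio>
#include <cmath>
int main()
{
    const int n = 3;
    double A[3][3] = { {4, 1, 0}, {1, 4, 1}, {0, 1, 4} };   /* SPD test matrix */
    double b[3] = { 6, 6, 6 };
    double x[3] = { 0, 0, 0 }, r[3], p[3], Ap[3];
    for (int i = 0; i < n; i++) { r[i] = b[i]; p[i] = r[i]; }   /* r0 = b - A*x0, p0 = r0 */
    double rs_old = 0;
    for (int i = 0; i < n; i++) rs_old += r[i] * r[i];
    for (int k = 0; k < n; k++)                 /* at most n steps in exact arithmetic */
    {
        for (int i = 0; i < n; i++)             /* Ap = A * p */
        {
            Ap[i] = 0;
            for (int j = 0; j < n; j++) Ap[i] += A[i][j] * p[j];
        }
        double pAp = 0;
        for (int i = 0; i < n; i++) pAp += p[i] * Ap[i];
        double alpha = rs_old / pAp;            /* step size along p */
        for (int i = 0; i < n; i++) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        double rs_new = 0;
        for (int i = 0; i < n; i++) rs_new += r[i] * r[i];
        if (std::sqrt(rs_new) < 1e-12) break;   /* residual small enough */
        double beta = rs_new / rs_old;          /* keeps the new direction A-conjugate */
        for (int i = 0; i < n; i++) p[i] = r[i] + beta * p[i];
        rs_old = rs_new;
    }
    printf("x = (%g, %g, %g)\n", x[0], x[1], x[2]);
    return 0;
}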