Andrew Pownuk - Math 4329 (Numerical Analysis)

Table of Contents

1Solution of systems of linear equations

1.1(*) Matrix notation

1.2(*) Matrix algebra

1.1(*) Solvability

1.2.1Introduction

1.1.1Unique solution

1.2.2Inconsisten system (no solution)

1.2.3Infinitely many solutions

1.2.4Infinitely many solutions

1.3(*) Reduced Row Echelon Form

1.3.1Calculate echelon form for and find the solution

1.3.2Calculate echelon form for and find the solution

1.3.3Calculate echelon form for and find the solution

1.4(*) Null space

1.4.1Example

1.5(*) Cramer's rules

1.5.1Solve the following system of equation by using the Cramer's rule .

1.5.2Solve the following system of equation by using the Cramer's rule .

1.5.3Solve the following system of equation by using the inverse matrix .

1.6(*) Inverse matrix

1.6.1Solve the following system of equation by using the inverse matrix .

1.6.2Solve the following system of equation by using the inverse matrix.

1.7Gauss elimination

1.7.1Solve the following system of equations by using the Gauss elimination

1.1.1Example

1.1.2Example

1.7.2Solve the following system of equations

1.7.3Example

1.7.4Solve the following system of equations

1.7.5Solve the following system of equations by using Gauss elimination

1.7.6Solve the following system of equations by using Gauss elimination

1.7.7Solve the following system of equations by using Gauss elimination

1.7.8Solve the following system of equations by using the Gauss elimination

1.1.3Example

1.7.9(*) General method

1.7.10(*) Matlab code

1.2(*) Gauss-Jordan elimination

1.2.1Introduction

1.2.2Example

1.2.3Example

1.2.4Example

1.2.5Example

1.8LU decomposition

1.8.1Introduction

1.8.2Solve the following system of equation by using LU decomposition

1.8.3(*) Solve the following system of equation by using LU decomposition

1.8.4Example

1.8.5Example

1.8.6Solve the following system of equations by using LU decomposition

1.8.7Solve the following system of equations by using LU decomposition

1.8.8Solve the following system of equations by using LU decomposition

1.8.9Solve the following system of equations by using LU decomposition

1.8.10Solve the following system of equations by using LU decomposition

1.8.11Example

1.9LU decomposition with partial pivoting

1.9.1Solve the following system of equations by using LU decomposition

1.9.2Example

1.9.3Solve the following system of equations by using LU decomposition

1.9.4Solve the following system of equations by using LU decomposition

1.9.5Solve the following system of equations by using LU decomposition

1.10Inverse matrix and LU decomposition

1.10.1Intoduction

1.10.2Example

1.10.3Find an inverse matrix of the matrix . Use the decomposition.

1.11Elementary matrix operations

1.11.1Switching the rows

1.11.2Add rows multiplied by the number

1.11.3Multiplication by number

1.11.4Properties of the elementary matrices

1.12Matrix representation of the LU decomposition

1.13Matrix representation of the LU decomposition with partial pivoting

1.13.1Example

1.14(*) Software

1.15(***) LU decomposition in parallel - General algorithm

1.15.1General algorithm

1.15.2Test example

1.15.3upper_solve(M10, M00, nby2, n)

1.15.4lower_solve(M10, M00, nby2, n);

1.15.5schur(B11, B01, L10, nby2, N);

1.15.6LU(M11, nby2, n);

1.15.7Verification in Mathmatica

1.15.8Upper solve serial

1.15.9Lower solve serial

1.15.10C-code - LU-decomposition – serial version

1.15.11C-code - LU-decomposition – parallel version

1.15.12Results

1.16Iterative methods

1.16.1Introduction

1.16.2(*) Convergence analysis – Jacobi method

1.16.3(*) Convergence of the iterative methods

1.16.4Jacobi method

1.16.5Introduction

1.16.6Example

1.16.7Gauss-Seidel method

1.16.8(*) Successive Over-Relaxation Method - SOR

1.16.9(*) Conjugent gradient method

1.16.10(*) Steepest descent and conjugant gradient method

1.16.11(*) Preconditioning for PCG method

1.16.12(*) GMRES (Generalized Minimal Residual) – nonsymmetrical systems

2Review

2.1Test 2 - Summer 2016

1Solution of systems of linear equations

1.1(*) Matrix notation

1.2(*) Matrix algebra

1.1(*) Solvability

1.2.1Introduction

Theorem

System is solvable if .

System is inconsistent if

Heuristics!!!

------

if

number of equations = number of variables

then

the solution is unique

------

if

number of equations < number of variables

then

we have infinitely many solutions.

------

if

number of equations > number of variables

then

we have no solutions.

1.1.1Unique solution

Graphical solution

Plot[{-x+3,x-1},{x,-1,5}]

Analytical solution

Method 1

Method 2

1.2.2Inconsisten system (no solution)

Plot[{x+3,x-1},{x,-1,5}]

Overdetermined system of equations

1.2.3Infinitely many solutions

Plot[{x+3},{x,-1,5}]

1.2.3.1Graphical method

gr1=ContourPlot3D[{(x+y+z1)},{x,-2,2},{y,-2,2},{z,-2,2}]

gr2=ContourPlot3D[{2x-y+z0},{x,-2,2},{y,-2,2},{z,-2,2}]

gr3=ContourPlot3D[{4x+y+3z3},{x,-2,2},{y,-2,2},{z,-2,2}]

Show[gr1,gr2,gr3]

1.2.3.2Analytical method

1.2.4Infinitely many solutions

Plot[{-x+3,-x+3},{x,-1,5}]

Underdetermined system of equations

1.2.4.1Graphical method

gr1=ContourPlot3D[{(x+y+z==1)},{x,-2,2},{y,-2,2},{z,-2,2}]

gr2=ContourPlot3D[{(1/4)x-(1/2)y+(3/4)z==0},{x,-2,2},{y,-2,2},{z,-2,2}]

gr3=ContourPlot3D[{x+7y-3z==3},{x,-2,2},{y,-2,2},{z,-2,2}]

Show[gr1,gr2,gr3]

1.2.4.2Analytical method

1.3(*) Reduced Row Echelon Form

Definition (Reduced Row Echelon Form)

1) The first nonzero entry (called the leading entry) in each row is a 1.

2) The column of the leading entries are clear

3) The leading entry in each row is to the right of the leading entry above and any rows of zeros are at the bottom.

1.3.1Calculate echelon form for and find the solution

1.3.2Calculate echelon form for and find the solution

1.3.3Calculate echelon form for and find the solution

Solution in parametric form

1.4(*) Null space

Definition

1.4.1Example

Alternative method

then the null space can be defined in the following way

1.5(*) Cramer's rules

1.5.1Solve the following system of equation by using the Cramer's rule .

1.5.2Solve the following system of equation by using the Cramer's rule .

1.5.3Solve the following system of equation by using the inverse matrix .

1.6(*) Inverse matrix

1.6.1Solve the following system of equation by using the inverse matrix .

1.6.2Solve the following system of equation by using the inverse matrix.

1.7Gauss elimination

1.7.1Solve the following system of equations by using the Gauss elimination

1.1.1Example

Method 2

Solution

1.1.2Example

1.7.2Solve the following system of equations

In[49]:= Det[{{2, 2, 1}, {1, -2, 2}, {2, 1, -1}}]

Inverse[{{2, 2, 1}, {1, -2, 2}, {2, 1, -1}}].{{7}, {2}, {4}}

Out[49]= 15

Out[50]= {{2}, {1}, {1}}

1.7.3Example

1.7.4Solve the following system of equations

1.7.5Solve the following system of equations by using Gauss elimination

1.7.6Solve the following system of equations by using Gauss elimination

1.7.7Solve the following system of equations by using Gauss elimination

Veryfication

A={{2,1,1},{1,2,1},{4,3,1}};

b={{8},{9},{18}};

MatrixForm[A]

MatrixForm[b]

Inverse[A].b

(_{

{2, 1, 1},

{1, 2, 1},

{4, 3, 1}

}_)

(_{

{8},

{9},

{18}

}_)

{{2},{3},{1}}

1.7.8Solve the following system of equations by using the Gauss elimination

Verification

A={{1,2,0,1},

{1,1,0,1},

{0,2,0,1},

{2,2,1,1}

}

b={{5},{4},{3},{8}}

Inverse[A].b

Det[A]

{{1,2,0,1},{1,1,0,1},{0,2,0,1},{2,2,1,1}}

{{5},{4},{3},{8}}

{{2},{1},{1},{1}}

1

1.1.3Example

1.7.9(*) General method

Generalization

1.7.10(*) Matlab code

%Gauss elimination - data

A=[ 1 1 1;

2 -1 1;

1 3 2]

b=[ 4;

4;

7]

n=3;

%Gauss elimination - calculations

%iteration

for iteration=1:n-1

%row

for row=iteration+1:n

m = (A(row,iteration)/A(iteration,iteration));

%column

for col=iteration:n

A(row,col) = A(row,col) - A(iteration,col)*m;

end

b(row) = b(row) - b(iteration) * m;

end

A

b

end

%solution

x=zeros(n,1);

x(n)=b(n)/A(n,n);

for i=n-1:-1:1

sum=0;

for j=i:n

sum=sum+A(i,j)*x(j);

end

x(i)=(b(i)-sum)/A(i,i);

end

disp('solutions')

x

------

------

------

LAPack

1.2(*) Gauss-Jordan elimination

1.2.1Introduction

1.2.2Example

1.2.3Example

Method 2

1.2.4Example

Method 2

1.2.5Example

1.8LU decomposition

1.8.1Introduction

1.8.2Solve the following system of equation by using LU decomposition

U={{1,1,1},{0,-1,-1},{0,0,-1}};

L={{1,0,0},{2,1,0},{1,-1,1}};

MatrixForm[U]

MatrixForm[L]

MatrixForm[L.U]

1.8.3(*) Solve the following system of equation by using LU decomposition

1.8.4Example

U={{2,2,1},{0,-2,-1/2},{0,0,5/4}};

L={{1,0,0},{3/2,1,0},{1/2,-1/2,1}};MatrixForm[L.U]

1.8.5Example

L = {{1, 0, 0}, {3/2, 1, 0}, {1/2, -1, 1}};

U = {{2, 2, 1}, {0, -1, -1/2}, {0, 0, 1}};

MatrixForm[L.U]

1.8.6Solve the following system of equations by using LU decomposition

1.8.7Solve the following system of equations by using LU decomposition

1.8.8Solve the following system of equations by using LU decomposition

1.8.9Solve the following system of equations by using LU decomposition

Verification

In[779]:= L={{1,0,0},{2,1,0},{1,0,1}};

U={{1,2,1},{0,1,1},{0,0,1}};

MatrixForm[L.U]

Result:

{"1", "2", "1"},

{"2", "5", "3"},

{"1", "2", "2"}

},

1.8.10Solve the following system of equations by using LU decomposition

1.8.11Example

Verification

Calculate

Calculate

1.9LU decomposition with partial pivoting

1.9.1Solve the following system of equations by using LU decomposition

Verification

The algorithm

Calculations

1.9.2Example

Verification

Solve

Solve

1.9.3Solve the following system of equations by using LU decomposition

Veryfication

The algorithm

Calculations

1.9.4Solve the following system of equations by using LU decomposition

Solution

1.9.5Solve the following system of equations by using LU decomposition

Solution

1.10Inverse matrix and LU decomposition

1.10.1Intoduction

1.10.2Example

Verification of the results

A={{1,2,1},{2,5,3},{1,2,2}}

Inverse[A]

{{1,2,1},{2,5,3},{1,2,2}}

{{4,-2,1},{-1,1,-1},{-1,0,1}}

1.10.3Find an inverse matrix of the matrix . Use the decomposition.

Solution

From the previous problem we know the LU decomposition of the matrix .

Inverse matrix

1.11Elementary matrix operations

1.11.1Switching the rows

Switch the rows 2 i 3

or ,

Example

1.11.2Add rows multiplied by the number

Operation

can be represented by the following matrix

Example

Operation

can be represented by the following matrix

Example

1.11.3Multiplication by number

Operation

can be represented by the following matrix

Example

1.11.4Properties of the elementary matrices

Example

1.12Matrix representation of the LU decomposition

LU decomposition

1.13Matrix representation of the LU decomposition with partial pivoting

We know that

We know that

- is a new lower triangular matrix after switching the rows.

Let’s perform another rows operations

and let’s perform a new permuation

then

We know that where is a new lower triangular matrix after appropriate transpositions of the rows (according to ), then

where

On the other hand, if we perform appropriate operations on the matrix A we will get the upper triangular matrix .

From the previus calculations we know that , then

Inverse matrix to the lower triangular matrix is a lower triangular matrix, then

Finally the decomposition has the following form.

If we know the LU decomposition of the matrix , then it is possible to calculate the solution.

Computational method for LU decomposition with the partial pivoting.

Algorithm

Let us consider.

Calculate the upper triangular matrix, lower triangular matrix , and .

1.13.1Example

Pivoting

The matrix L

Numerical results

The algorithm

Calculations

1.14(*) Software

LAPACK

Intel® Math Kernel Library (Intel® MKL)

1.15(***) LU decomposition in parallel - General algorithm

1.15.1General algorithm

1.15.2Test example

1.15.2.1LU(M00, nby2, n)

Input

Output (after LU decomposition)

1.15.3upper_solve(M10, M00, nby2, n)

M00 – LU decomposition

M10 – RHS

upper_solve(B, LU, nby2, n)

B – LU decomposition

B’ – RHS

B’*U=B

B’=B*U^(-1)

1.15.3.1top_upper_half_solve(B00, B01, B, nby2, n);

upper_solve(B00, U00, n, N);

schur(B01, B00, U01, n, N);

upper_solve(B01, U11, n, N);

Final result: top_upper_half_solve(B00, B01, B, 1, 4);

1.15.3.2bottom_upper_half_solve(B10, B11, LU, nby2, N);

upper_solve(B10, U00, n, N);

schur(B01, B00, U01, n, N);

upper_solve(B11, U11, n, N);

Final result: bottom_upper_half_solve(B10, B11, LU, nby2, N);

1.15.4lower_solve(M10, M00, nby2, n);

lower_solve(B’, LU, nby2, n);

1.15.4.1left_half_lower_solve(B00, B10, LU, nby2, N);

lower_solve(B00, L00, n, N);

schur(B10, B00, L10, n, N);

lower_solve(B10, L11, n, N);

1.15.4.2right_half_lower_solve(B00, B11, LU, nby2, N);

lower_solve(B01, L00, n, N);

schur(B11, B01, L10, n, N);

lower_solve(B11, L11, n, N);

1.15.5schur(B11, B01, L10, nby2, N);

1.15.6LU(M11, nby2, n);

1.15.7Verification in Mathmatica

M = {{1, 2, 3, 4}, {2, 9, 12, 15}, {3, 26, 41, 49}, {5, 40, 107, 135}};

Print["Matrix M"];

MatrixForm[M]

MatrixForm[LUDecomposition[M]]

LU = {{1, 2, 3, 4}, {2, 5, 6, 7}, {3, 4, 8, 9}, {5, 6, 7, 10}};

Print["LU decomposition"]

MatrixForm[LU]

1.15.8Upper solve serial

void upper_solve_serial(double *B, double* LU, int n, int N)

{

/* solve */

/* B'*U=B */

int i, j, k;

double sum;

for (i = 0; i < n; i++)

{

B[i*N + 0] = B[i * N + 0] / LU[0 * N + 0];

for (j = 1; j < n; j++)

{

sum = 0;

for (k = 0; k < j; k++)

sum += B[i*N + k] * LU[k*N + j];

B[i*N + j] = (B[i*N + j] - sum) / LU[j*N + j];

}

}

}

1.15.9Lower solve serial

void lower_solve_serial(double *B, double* LU, int n, int N)

{

/* solve */

/* L*B'=B */

int i, j, k;

double sum;

for (i = 0; i < n; i++)

{

B[0 * N + i] = B[0 * N + i];

for (j = 1; j < n; j++)

{

sum = 0;

for (k = 0; k < j; k++)

sum += LU[j*N + k] * B[k*N + i];

B[j*N + i] = (B[j*N + i] - sum);

}

}

}

1.15.10C-code - LU-decomposition – serial version

#include <stdio.h>

#include <stdlib.h>

#include <math.h>

#define DebugMode 0

int min_block_size = 2;

void LU(double *A, int n, int N)

{

int iteration, row, col;

double m;

for (iteration = 0; iteration < n - 1; iteration++)

{

for (row = iteration + 1; row < n; row++)

{

m = A[row*N + iteration] / A[iteration*N + iteration];

col = iteration;

A[row*N + col] = m;

for (col = iteration + 1; col < n; col++)

A[row*N + col] = A[row*N + col] - A[iteration*N + col] * m;

}

}

}

void PrintM(double *M, char *matrixId, int nby2, int N)

{

int i, j;

for (i = 0; i < nby2; i++)

{

for (j = 0; j < nby2; j++)

printf("%s[%d,%d]=%5.1f ", matrixId, i, j, M[i*N + j]);

printf("\n");

}

}

void schur(double *M, double *V, double *W, int n, int N);

void top_upper_half_solve(double *B00, double *B01, double *U, int n, int N);

void bottom_upper_half_solve(double *B10, double *B11, double *U, int n, int N);

void left_half_lower_solve(double *B00, double *B10, double *L, int n, int N);

void right_half_lower_solve(double *B01, double *B11, double *L, int n, int N);

void upper_solve_serial(double *B, double* LU, int n, int N)

{

/* solve */

/* B'*U=B */

int i, j, k;

double sum;

for (i = 0; i < n; i++)

{

B[i*N + 0] = B[i * N + 0] / LU[0 * N + 0];

for (j = 1; j < n; j++)

{

sum = 0;

for (k = 0; k < j; k++)

sum += B[i*N + k] * LU[k*N + j];

B[i*N + j] = (B[i*N + j] - sum) / LU[j*N + j];

}

}

}

void lower_solve_serial(double *B, double* LU, int n, int N)

{

/* solve */

/* L*B'=B */

int i, j, k;

double sum;

for (i = 0; i < n; i++)

{

B[0 * N + i] = B[0 * N + i];

for (j = 1; j < n; j++)

{

sum = 0;

for (k = 0; k < j; k++)

sum += LU[j*N + k] * B[k*N + i];

B[j*N + i] = (B[j*N + i] - sum);

}

}

}

void upper_solve(double *B, double* LU, int n, int N)

{

/* Solve B'*U=B */

/* U - given */

/* B - given */

/* solve for B' */

double *B00, *B01, *B10, *B11;

if (n > min_block_size)

{

int nby2 = n / 2;

B00 = B;

B01 = B + nby2;

B10 = B + nby2*N;

B11 = B + nby2*N + nby2;

top_upper_half_solve(B00, B01, LU, nby2, N);

bottom_upper_half_solve(B10, B11, LU, nby2, N);

}

else if (n == min_block_size)

{

upper_solve_serial(B, LU, n, N);

}

else

{

printf("Error : [upper_solve] : n <= 0");

}

}

void top_upper_half_solve(double *B00, double *B01, double *U, int n, int N)

{

double *U00, *U01, *U11;

U00 = U;

U01 = U + n;

U11 = U + n*N + n;

upper_solve(B00, U00, n, N);

schur(B01, B00, U01, n, N);

upper_solve(B01, U11, n, N);

}

void bottom_upper_half_solve(double *B10, double *B11, double *U, int n, int N)

{

double *U00, *U01, *U11;

U00 = U;

U01 = U + n;

U11 = U + n*N + n;

upper_solve(B10, U00, n, N);

schur(B11, B10, U01, n, N);

upper_solve(B11, U11, n, N);

}

void schur(double *M, double *V, double *W, int n, int N)

{

double sum = 0;

int i, j, k;

for (i = 0; i < n; i++)

for (j = 0; j < n; j++)

{

sum = 0;

for (k = 0; k < n; k++)

sum += V[i*N + k] * W[k*N + j];

M[i*N + j] = M[i*N + j] - sum;

}

}

void lower_solve(double *B, double* LU, int n, int N)

{

/* Solve L*B'=B */

/* U - given */

/* B - given */

/* solve for B' */

double *B00, *B01, *B10, *B11;

int nby2;

if (n > min_block_size)

{

nby2 = n / 2;

B00 = B;

B01 = B + nby2;

B10 = B + nby2*N;

B11 = B + nby2*N + nby2;

left_half_lower_solve(B00, B10, LU, nby2, N);

right_half_lower_solve(B01, B11, LU, nby2, N);

}

else if (n == min_block_size)

{

lower_solve_serial(B, LU, n, N);

}

else

{

printf("Error : [upper_solve] : n <= 0");

}

}

void left_half_lower_solve(double *B00, double *B10, double *LU, int n, int N)

{

double *L00, *L10, *L11;

L00 = LU;

L10 = LU + n*N;

L11 = LU + n*N + n;

lower_solve(B00, L00, n, N);

schur(B10, B00, L10, n, N);

lower_solve(B10, L11, n, N);

}

void right_half_lower_solve(double *B01, double *B11, double *L, int n, int N)

{

double *L00, *L10, *L11;

L00 = L;

L10 = L + n*N;

L11 = L + n*N + n;

lower_solve(B01, L00, n, N);

schur(B11, B01, L10, n, N);

lower_solve(B11, L11, n, N);

}

int main(int argc, char** argv)

{

int n = 16;

int nby2 = n / 2;

int i, j, k;

int errorFlag;

double sum;

double *M1, *M2, *L, *U;

double *M00, *M01, *M10, *M11;

if(argc==2)

{

n = atoi(argv[1]);

nby2 = n / 2;

}

else if(argc==3)

{

n = atoi(argv[1]);

nby2 = n / 2;

min_block_size = atoi(argv[2]);

}

printf("Start: n=%d min_block_size=%d\n", n, min_block_size);

M1 = (double*)malloc(n*n*sizeof(double));

M2 = (double*)malloc(n*n*sizeof(double));

L = (double*)calloc(n*n,sizeof(double));

U = (double*)calloc(n*n,sizeof(double));

for (i = 0; i < n; i++)

for (j = 0; j <= i; j++)

{

L[i*n + j] = 1.0;

M2[i*n + j]=L[i*n + j];

}

for (i = 0; i < n; i++)

for (j = i; j < n; j++)

{

U[i*n + j] = 2.0;

M2[i*n + j] = U[i*n + j];

}

for (i = 0; i < n; i++)

for (j = 0; j < n; j++)

{

sum = 0;

for (k = 0; k < n; k++)

sum += L[i*n + k] * U[k*n + j];

M1[i*n + j] = sum;

};

#if DebugMode == 1

PrintM(L, "L", n, n);

printf("\n");

PrintM(U, "U", n, n);

printf("\n");

PrintM(M1, "M", n, n);

printf("\n");

PrintM(M2, "LU", n, n);

#endif

M00 = M1;

M01 = M1 + nby2;

M10 = M1 + nby2*n;

M11 = M1 + nby2*n + nby2;

LU(M00, nby2, n);

upper_solve(M10, M00, nby2, n);

lower_solve(M01, M00, nby2, n);

schur(M11, M10, M01, nby2, n);

LU(M11, nby2, n);

#if DebugMode == 1

printf("\n");

PrintM(M1, "M", n, n);

#endif

errorFlag = 0;

for (i = 0; (i < n) & (errorFlag == 0); i++)

for (j = 0; (j < n) & (errorFlag==0); j++)

if ( fabs( M1[i*n + j] - M2[i*n + j] ) > 0.0001 )

{

printf("Error: %d %d", i, j);

errorFlag = 1;

}

if ( errorFlag == 0 )

{

printf("\nSuccess\n");

}

else

{

printf("\nFailure\n");

}

}

1.15.11C-code - LU-decomposition – parallel version

#include <stdio.h>

#include <stdlib.h>

#include <math.h>

#include <cilk/cilk.h>

#define DebugMode 0

int min_block_size = 2;

void LU(double *A, int n, int N)

{

int iteration, row, col;

double m;

for (iteration = 0; iteration < n - 1; iteration++)

{

for (row = iteration + 1; row < n; row++)

{

m = A[row*N + iteration] / A[iteration*N + iteration];

col = iteration;

A[row*N + col] = m;

for (col = iteration + 1; col < n; col++)

A[row*N + col] = A[row*N + col] - A[iteration*N + col] * m;

}

}

}

void PrintM(double *M, char *matrixId, int nby2, int N)

{

int i, j;

for (i = 0; i < nby2; i++)

{

for (j = 0; j < nby2; j++)

printf("%s[%d,%d]=%5.1f ", matrixId, i, j, M[i*N + j]);

printf("\n");

}

}

void schur(double *M, double *V, double *W, int n, int N);

void top_upper_half_solve(double *B00, double *B01, double *U, int n, int N);

void bottom_upper_half_solve(double *B10, double *B11, double *U, int n, int N);

void left_half_lower_solve(double *B00, double *B10, double *L, int n, int N);

void right_half_lower_solve(double *B01, double *B11, double *L, int n, int N);

void upper_solve_serial(double *B, double* LU, int n, int N)

{

/* solve */

/* B'*U=B */

int i, j, k;

double sum;

for (i = 0; i < n; i++)

{

B[i*N + 0] = B[i * N + 0] / LU[0 * N + 0];

for (j = 1; j < n; j++)

{

sum = 0;

for (k = 0; k < j; k++)

sum += B[i*N + k] * LU[k*N + j];

B[i*N + j] = (B[i*N + j] - sum) / LU[j*N + j];

}

}

}

void lower_solve_serial(double *B, double* LU, int n, int N)

{

/* solve */

/* L*B'=B */

int i, j, k;

double sum;

for (i = 0; i < n; i++)

{

B[0 * N + i] = B[0 * N + i];

for (j = 1; j < n; j++)

{

sum = 0;

for (k = 0; k < j; k++)

sum += LU[j*N + k] * B[k*N + i];

B[j*N + i] = (B[j*N + i] - sum);

}

}

}

void upper_solve(double *B, double* LU, int n, int N)

{

/* Solve B'*U=B */

/* U - given */

/* B - given */

/* solve for B' */

double *B00, *B01, *B10, *B11;

if (n > min_block_size)

{

int nby2 = n / 2;

B00 = B;

B01 = B + nby2;

B10 = B + nby2*N;

B11 = B + nby2*N + nby2;

cilk_spawn top_upper_half_solve(B00, B01, LU, nby2, N);

bottom_upper_half_solve(B10, B11, LU, nby2, N);

}

else if (n == min_block_size)

{

upper_solve_serial(B, LU, n, N);

}

else

{

printf("Error : [upper_solve] : n <= 0");

}

}

void top_upper_half_solve(double *B00, double *B01, double *U, int n, int N)

{

double *U00, *U01, *U11;

U00 = U;

U01 = U + n;

U11 = U + n*N + n;

upper_solve(B00, U00, n, N);

schur(B01, B00, U01, n, N);

upper_solve(B01, U11, n, N);

}

void bottom_upper_half_solve(double *B10, double *B11, double *U, int n, int N)

{

double *U00, *U01, *U11;

U00 = U;

U01 = U + n;

U11 = U + n*N + n;

upper_solve(B10, U00, n, N);

schur(B11, B10, U01, n, N);

upper_solve(B11, U11, n, N);

}

void schur(double *M, double *V, double *W, int n, int N)

{

double sum = 0;

int i, j, k;

for (i = 0; i < n; i++)

for (j = 0; j < n; j++)

{

sum = 0;

for (k = 0; k < n; k++)

sum += V[i*N + k] * W[k*N + j];

M[i*N + j] = M[i*N + j] - sum;

}

}

void lower_solve(double *B, double* LU, int n, int N)

{

/* Solve L*B'=B */

/* U - given */

/* B - given */

/* solve for B' */

double *B00, *B01, *B10, *B11;

int nby2;

if (n > min_block_size)

{

nby2 = n / 2;

B00 = B;

B01 = B + nby2;

B10 = B + nby2*N;

B11 = B + nby2*N + nby2;

cilk_spawn left_half_lower_solve(B00, B10, LU, nby2, N);

right_half_lower_solve(B01, B11, LU, nby2, N);

}

else if (n == min_block_size)

{

lower_solve_serial(B, LU, n, N);

}

else

{

printf("Error : [upper_solve] : n <= 0");

}

}

void left_half_lower_solve(double *B00, double *B10, double *LU, int n, int N)

{

double *L00, *L10, *L11;

L00 = LU;

L10 = LU + n*N;

L11 = LU + n*N + n;

lower_solve(B00, L00, n, N);

schur(B10, B00, L10, n, N);

lower_solve(B10, L11, n, N);

}

void right_half_lower_solve(double *B01, double *B11, double *L, int n, int N)

{

double *L00, *L10, *L11;

L00 = L;

L10 = L + n*N;

L11 = L + n*N + n;

lower_solve(B01, L00, n, N);

schur(B11, B01, L10, n, N);

lower_solve(B11, L11, n, N);

}

int main(int argc, char** argv)

{

int n = 16;

int nby2 = n / 2;

int i, j, k;

int errorFlag;

double sum;

double *M1, *M2, *L, *U;

double *M00, *M01, *M10, *M11;

if(argc==2)

{

n = atoi(argv[1]);

nby2 = n / 2;

}

else if(argc==3)

{

n = atoi(argv[1]);

nby2 = n / 2;

min_block_size = atoi(argv[2]);

}

printf("Start: n=%d min_block_size=%d\n", n, min_block_size);

M1 = (double*)malloc(n*n*sizeof(double));

M2 = (double*)malloc(n*n*sizeof(double));

L = (double*)calloc(n*n,sizeof(double));

U = (double*)calloc(n*n,sizeof(double));

for (i = 0; i < n; i++)

for (j = 0; j <= i; j++)

{

L[i*n + j] = 1.0;

M2[i*n + j]=L[i*n + j];

}

for (i = 0; i < n; i++)

for (j = i; j < n; j++)

{

U[i*n + j] = 2.0;

M2[i*n + j] = U[i*n + j];

}

for (i = 0; i < n; i++)

for (j = 0; j < n; j++)

{

sum = 0;

for (k = 0; k < n; k++)

sum += L[i*n + k] * U[k*n + j];

M1[i*n + j] = sum;

};

#if DebugMode == 1

PrintM(L, "L", n, n);

printf("\n");

PrintM(U, "U", n, n);

printf("\n");

PrintM(M1, "M", n, n);

printf("\n");

PrintM(M2, "LU", n, n);

#endif

M00 = M1;

M01 = M1 + nby2;

M10 = M1 + nby2*n;

M11 = M1 + nby2*n + nby2;

LU(M00, nby2, n);

upper_solve(M10, M00, nby2, n);

cilk_spawn lower_solve(M01, M00, nby2, n);

cilk_sync;

schur(M11, M10, M01, nby2, n);

LU(M11, nby2, n);

#if DebugMode == 1

printf("\n");

PrintM(M1, "M", n, n);

#endif

errorFlag = 0;

for (i = 0; (i < n) & (errorFlag == 0); i++)

for (j = 0; (j < n) & (errorFlag==0); j++)

if ( fabs( M1[i*n + j] - M2[i*n + j] ) > 0.0001 )

{

printf("Error: %d %d", i, j);

errorFlag = 1;

}

if ( errorFlag == 0 )

{

printf("\nSuccess\n");

}

else

{

printf("\nFailure\n");

}

}

1.15.12Results

Test results

DebugMode 1

./lab2-serial 4 1

Start: n=4 min_block_size=1

L[0,0]= 1.0 L[0,1]= 0.0 L[0,2]= 0.0 L[0,3]= 0.0

L[1,0]= 1.0 L[1,1]= 1.0 L[1,2]= 0.0 L[1,3]= 0.0

L[2,0]= 1.0 L[2,1]= 1.0 L[2,2]= 1.0 L[2,3]= 0.0

L[3,0]= 1.0 L[3,1]= 1.0 L[3,2]= 1.0 L[3,3]= 1.0

U[0,0]= 2.0 U[0,1]= 2.0 U[0,2]= 2.0 U[0,3]= 2.0

U[1,0]= 0.0 U[1,1]= 2.0 U[1,2]= 2.0 U[1,3]= 2.0

U[2,0]= 0.0 U[2,1]= 0.0 U[2,2]= 2.0 U[2,3]= 2.0

U[3,0]= 0.0 U[3,1]= 0.0 U[3,2]= 0.0 U[3,3]= 2.0

M[0,0]= 2.0 M[0,1]= 2.0 M[0,2]= 2.0 M[0,3]= 2.0

M[1,0]= 2.0 M[1,1]= 4.0 M[1,2]= 4.0 M[1,3]= 4.0

M[2,0]= 2.0 M[2,1]= 4.0 M[2,2]= 6.0 M[2,3]= 6.0

M[3,0]= 2.0 M[3,1]= 4.0 M[3,2]= 6.0 M[3,3]= 8.0

LU[0,0]= 2.0 LU[0,1]= 2.0 LU[0,2]= 2.0 LU[0,3]= 2.0

LU[1,0]= 1.0 LU[1,1]= 2.0 LU[1,2]= 2.0 LU[1,3]= 2.0

LU[2,0]= 1.0 LU[2,1]= 1.0 LU[2,2]= 2.0 LU[2,3]= 2.0

LU[3,0]= 1.0 LU[3,1]= 1.0 LU[3,2]= 1.0 LU[3,3]= 2.0

M[0,0]= 2.0 M[0,1]= 2.0 M[0,2]= 2.0 M[0,3]= 2.0

M[1,0]= 1.0 M[1,1]= 2.0 M[1,2]= 2.0 M[1,3]= 2.0

M[2,0]= 1.0 M[2,1]= 1.0 M[2,2]= 2.0 M[2,3]= 2.0

M[3,0]= 1.0 M[3,1]= 1.0 M[3,2]= 1.0 M[3,3]= 2.0

Performance analysis

n / min_block_size / Serial / Parallel
Time[s] / Time[s]
512 / 16 / 0.139 / 0.229
1024 / 16 / 1.881 / 1.732
2048 / 16 / 23.470 / 19.877
4096 / 16 / 240.611 / 186.932

How to run the program:

time ./lab2-parallel 2048 16

1.16Iterative methods

1.16.1Introduction

For the linear system.

1.16.2(*) Convergence analysis – Jacobi method

Theorem

Let us consider:

1)system of equation in the form,

2) are eigenvalues and eigenvectors of the matrix i.e. ,

3) for ,

4),

5) for

then

where for any starting point .

Proof

Eigenvectors are linearly independent, then for any it is possible to find such that

.

For arbitrary

By assumption , then

and

is a solution of the equation

Theorem

Let us consider:

1)the system of equation in the form,

2) are eigenvalues and eigenvectors of the matrix i.e. ,

3) for ,

4),

5) for

then

where for any starting point .

Proof

Eigenvectors are linearly independent, then for any it is possible to find such that

.

For arbitrary

By assumption then in the limit case

where , …., .

Theorem

Let us consider:

1)system of equation in the form,

2),

3) for

then

where for any starting point .

Proof

For arbitrary

In the limit case

Theorem

Let us consider:

1)system of equation in the form,

2),

3) for

then

where for any starting point .

Proof

Theorem

Let be a square matrix.

If , then the series

converges to .

Proof

Let then

then

We know that

Then

In the limit case

Theorem

Let us consider:

1)system of equation in the form,

2) (matrix is diagonally dominant),

3) for

then

where for any starting point .

Proof

then

The system has the following form

Matrix norm

By assumption then

and

We know that the norm is smaller than 1, then the method converges according to the previous theorem.

Example

1.16.3(*) Convergence of the iterative methods

The method converges if

1.16.4Jacobi method

1.16.5Introduction

I-th component of the approximation is

1.16.6Example

Starting point

1.16.6.1Matrix form of Jacobi method (*)
1.16.6.2Code for Jacobi method (*)

#include <iostream>

int main(int argc, char** argv) {

double x,y,x0,y0;

x0=10;

y0=10;

int n=20;

for(int i=0;i<n;i++)

{

x=(4-y0)/3;

y=(4-x0)/3;

x0=x;

y0=y;

std::cout<"x= "<x<"\n";

std::cout<"y= "<y<"\n\n";

}

return 0;

}

1.16.6.3Solve the following system of equations by using Jacobi method .

Matrix is diagonally dominant if

1.16.6.4Example ,
1.16.6.5Example
1.16.6.6Example

1.16.6.7Solve the following system of equations by using Jacob i method

1.16.7Gauss-Seidel method

1.16.7.1Code for Gauss-Seidel Method

C++ code

#include <iostream>

int main(int argc, char** argv) {

double x,y;

x=10;

y=10;

int n=20;

for(int i=0;i<n;i++)

{

x=(4-y)/3;

y=(4-x)/3;

std::cout<"x= "<x<"\n";

std::cout<"y= "<y<"\n\n";

}

return 0;

}

1.16.7.2Solve the following system of equation by using Gauss-Seidel method .

1.16.7.3Example ,

Mathematica stript

------

x=10;

y=10;

z=10;

x=1/4 (6-(y+z));

y=1/4 (6-(x+z));

z=1/4 (6-(x+y));

x

y

z

x=1/4 (6-(y+z));

y=1/4 (6-(x+z));

z=1/4 (6-(x+y));

x

y

z

x=1/4 (6-(y+z));

y=1/4 (6-(x+z));

z=1/4 (6-(x+y));

N[x]

N[y]

N[z]

------

-(7/2)

-(1/8)

77/32

119/128

341/512

2255/2048

1.05823

0.960175

0.995399

1.16.7.4Solve the following system of equations by using Gauss-Seidel method

1.16.7.5Solve the following system of equations by using Gauss-Seidel method

1.16.7.6Write the system of equations in the form. Find one iteration of the Gauss-Seidel method. Calculate the norm (e.g. maximum absolute row sum norm) of the matrix.

Solution

One iteration of the Gauss-Seidel method.

Matrix form of the system of equation

Maximum Absolute Row Sum Norm

1.16.8(*) Successive Over-Relaxation Method - SOR

In this case we have

If then we call the method under relaxation.

If then we call the method over relaxation.

If then - reduce to the Gauss-Seidel method.

Going back to the example

After multiplication

1.16.9(*)Conjugent gradient method

Example

Next iteration


1.16.10(*) Steepest descent and conjugant gradient method

Consider a system of equation where A is SPD matrix and .

Definition

If is twice continuously differentiable, then

Definition

Let be a local minimizer of of if and is PSD (positive semidefinite).

Theorem

Let be twice differentiable in a neighborhood of the point and and is PSD, then is a local minimizer of .

Theorem

Solving is equivalent to minimizing the quadratic form .

Proof

It is possible to show that

and .

Part 1

Let us assume that then

and

which is PD (by assumption), then is a minimizer.

Part 2

Let assume that is minimizer i.e. and is PSD

then and is a solution of the system Ax=b.

Minimize by using the iterative methods (Krylow space method).

(1)Choose an initial guess .

(2)For x=0,1,…,n we know the solutions .

  1. If is optimal, then stop.
  2. Determine the search directions and the step size .
  3. Set

Steepest descent

i) - search direction is a residual

ii)Determine by using the line search

  1. Find a along the line where f is a minimum

After the calculations we have.

Update the solution

Level curve (contour lines)

Let assume that

Search direction is normal to the level curve.

Search directions are orthogonal to each other (????)

Energy norm

Spectral conditional norm

We see slow convergence for large condition numbers .

Definition

Let is SPD.

Two vector are called conjugate or A-orthogonal of .

Conjugant Gradient (CG) as a direct method

are mutually conjugate directions.

They are linearly independent and form a basis for

Let is a solution of then

In this way it is possible to find the exact solution as

Conjugate Method as an iterative method.

(1)Choose and

(2)Take

(3)For

  1. are mutually conjugate
  2. Then are mutually conjugate.

In general