Matrix Algebra and Chemometrics Using Matlab

Lab #7 - Matrix Algebra and Chemometrics

PART A: Basic Matrix Calculations

Enter these two matrices:

3 0 3 1 1 1

A =4-1-1 B = -2 1 6

1 2 5 3 4 5

Give the result of adding(A + B) and multiplying (A*B) the two matrices together:

Now calculate B*A and write the results. Are the results of the two multiplications the same? Is matrix multiplication commutative?

Calculate the transpose of matrix A:

Calculate the inverse of matrix A:

Calculate A*A-1 – what matrix is the result?

Calculate A-1*A. Is the result the same as in Step 6?

PART B: Solving Simultaneous Equations

Let's say that you have the following three simultaneous equations, and you want to solve for x, y, and z -

x + 2y + 7z = 1

5x + y - z = 6

2x - 3y + 4z = 3

The problem can easily be solved using matrix methods. First recast the system of equations in matrix form -

Solve for x, y, and z using an inverse and show your work here.

PART C: Linear Least Squares Revisited

Remember those complicated least squares equations you used in Program #5? They are equivalent to the matrix solution to the problem, and of course using matrices is so much easier! Here's how to do it. We have a data matrix Y, which consists of a single row: Y = [y1 y2 y3 ....yn]

For example, this could be the data you fit before in the file ls.dat. Now, the equation for a straight line is y = mx + b, but we're going to write it a little differently as:

Y = ao + a1xwhere ao = b and a1 = m.

The corresponding xaxis data is placed into a matrix X, where the first row of X is all ones, and the second row is the xaxis data. The whole problem can be re-written as -

Y = AXEquation 1

Expanding the matrices results in this -

[y1 y2 y3 ....yn] = [ao a1] 1 1 1 ...... 1

x1 x2 x3...... xn

We want to solve for the fitting coefficients in matrix A, which can be done by using the "right pseudoinverse". If X was a square matrix, the problem could be solved simply by right multiplying both sides of Equation 1 by the inverse of X, so that the solution would be-

a = Y X-1where the "-1" indicates taking the inverse.

But X is most definitely not square. So we right multiply both sides of Equation 1 by the transpose of X, call it X' -

YX' = (a X)X'which can be rewritten as -

YX' = a  (XX')because matrix multiplication is associative.

Now the matrix product (XX') is just another matrix, so it will have an inverse, call it (XX')-1. We now right-multiply by this new inverse -

YX'  (XX')-1 = a  (XX')  (XX')-1which simplifies to

Equation 2 is the least squares solution to the problem.

YX'  (XX')-1 = aEquation 2

Load the file ls.dat into Matlab. Find the least squares best slope and y-intercept using matrices. Plot the best-fit line over the data (plot the data as points and the fit as a line). Print out the graph and hand-in with the exercise.

Enter values for b and m here: ______

PART D: Linear Models

Consider a matrix D containing spectra of an acid-base indicator. Each column contains the spectrum obtained at a particular pH. The pH of each spectrum increases from left to right in the matrix. An example of this kind of data is shown in Figure 1. Let's say that the matrix D has m rows and n columns. The dimensions of matrix D and subsequent matrices can be found in Figure 2 on the next page.

Figure 1

Acid-base indicators exist in two forms, a protonated and deprotonated species -

HIn  H+ + In-Equation 3

(acidic (basic

form) form)

The acidic form of the indicator is found as the left-hand peak in Figure 1; the magnitude of this peak decreases with increasing pH. Conversely, the basic form of the indicator is found as the right-hand peak; the magnitude of this peak increases with increasing pH.

Now let's say that we have the spectra for the pure acidic form of the indicator (HIn) and the basic form (In-). Place these spectra as columns in another matrix, call it A. The dimensions of A are m rows by k columns as shown in Figure 2 (k = 2 in this example but let's keep it more general).

Define a third matrix C such that the rows of C vary according to how the peaks change in magnitude in the spectra. This matrix is called the design or model matrix. In our example, the first row of C controls how the pure spectra found in column 1 of matrix A changes. The second row of C controls the spectrum in column 2 of matrix A. Matrix C will have dimensions k x n as shown in Figure 2.

Now comes the fun part. Matrix D is equal to the product of matrix A times C, it's that simple! When matrix A is multiplied by C, each data point in D is a linear combination of pure spectrum 1 plus pure spectrum 2. For example, in matrix D element d11 = a11 x c11 + a12 x c21 (for the case where k = 2 again). The coefficients c11 and c21 tell us how much of each pure component to add together, which depends on the model we are using. The overall process is given by Equation 4 -

D = ACEquation 4

d11 d12 ...... d1na11 a12 ...... a1k c11 c12 ...... c1n

d21 d22 ...... d2na21 a22 ...... a2k c21 c22 ...... c2n

......

. . . . ck1 ck2 ...... ckn

. . . .

dm1 dm2 ...... dmnam1 am2 .....amk

D = m x n A = m x k C = k x n

Figure 2

Usually you are trying to obtain matrix A because you are interested in obtaining the spectra of the pure components of the mixture you are analyzing. Matrix A can be solved for by using the matrix techniques we have been using already. There is one major difference, and that is matrix C is usually never square. This means that we cannot solve Equation 4 by rightmultiplying both sides by the inverse of C. Instead we form the right pseudoinverse as follows. First right multiply both sides by the transpose of C -

DC' = A(CC')

Then right multiply by the inverse of (CC') -

DC'(CC')-1 = A(CC')(CC')-1

Since (CC')(CC')-1 = I, the equation simplifies to -

A = DC' (CC')-1Equation 5

This procedure is known as LINEAR MODELING. In general, linear modeling consists of the following steps -

1. Obtain data and place into the D matrix in column format.

2. Choose a model and place it into the C matrix in row format

3. Solve for A by using the right pseudoinverse

So how do you form matrix C in this problem? You place in the rows of C values describing the variation in the pure spectral components. Typically there is some experimental parameter being varied; in this example it is pH or the hydrogen ion concentration. For acid-base indicators, the forms of the indicator vary according to these expressions -

HIn = [H+]/([H+] + Ka)

In- = Ka/([H+] + Ka)

The term HIn gives the fraction of the indicator found in the acidic form; In- represents the fraction found as the basic form of the indicator. These functions are known as the Distribution Function of the species.

With these background topics in mind, follow the procedure on the next page for the acid-base indicator bromothymol blue. Bromothymol blue has a color change from yellow to blue in the pH region 6.2-7.6. The pKa of this indicator is 7.10. We are going to use linear modeling to extract the pure acid-base spectra of the indicator.

The spectra of bromothymol blue in Figure 1 were taken at pH's of 6.06, 6.43, 7.04, 7.21, and 7.62. These data are contained in the file brom.txt which you can download from the class webpage. The first column of this file contains the wavelengths; the remaining columns contain the spectra in order of increasing pH.

Write a program that calculates the pure spectra of the acidic and basic forms of the indicator, and plots (1) the spectra, (2) the C-matrix versus [H+], and (3) the A-matrix containing the pure spectra.

Print out these graphs and hand them in with the lab.