Undergraduate Research Opportunity Programme in Science

(UROPS)

Jordan Canonical Forms of Linear Operators

Submitted by

Teo Koon Soon

Supervised by

Dr. Victor Tan

Department of Mathematics

National University of Singapore

Academic Year 2001/2002 Semester 2

Table of Contents

Introduction

Chapter 0: Preliminaries

Chapter 1: Fundamentals of Jordan Canonical Form

Chapter 2: Relationship between Minimum Polynomial and Jordan Canonical Form

Chapter 3: Finding the Jordan Canonical Form and Basis

Conclusion

References

Introduction

Any linear transformation can be expressed by its matrix representation. Ideally, a linear operator is diagonalisable: such an operator has a diagonal matrix representation, and the advantage lies in the simplicity of that description. Unfortunately, not every linear operator is diagonalisable. However, a non-diagonalisable linear operator may still be represented by a ‘near-diagonal’ matrix, called a canonical form.

This report focuses on the underlying principles in constructing the Jordan canonical form of a linear operator, and on determining its associated Jordan canonical basis. We will deal only with one of the most common canonical forms, the Jordan canonical form, which can be used to represent a linear operator whenever its characteristic polynomial splits. In particular, every linear operator over the complex field has a Jordan canonical form.

0.  Preliminaries

Definition 0.1: A polynomial f(x) splits if there are scalars c, a1, … , an (not necessarily distinct) such that f(x) = c(x-a1)(x-a2)…(x-an).

Remark: When a polynomial splits over a particular field, the scalars c, a1, … , an are elements of that field.
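To illustrate the definition concretely, the following is a supplementary sketch (not part of the original report) using the sympy library: x^2 + 1 does not split over ℝ, but over ℂ it factors into linear terms.

```python
import sympy as sp

x = sp.symbols('x')

# f(x) = x^2 + 1 has no real roots, so it does not split over R.
f = sp.Poly(x**2 + 1, x)

# Over C it splits: f(x) = (x - i)(x + i), i.e. c = 1, a1 = i, a2 = -i.
roots = f.all_roots()
split_form = sp.expand(sp.prod(x - r for r in roots))

assert split_form == x**2 + 1
```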

Definition 0.2: Let T : V → V be a linear operator on a vector space V. A subspace W is T-invariant if T(v) ∈ W for all v ∈ W, i.e., T(W) ⊆ W.

Definition 0.3: Let T : V → V be a linear operator on a vector space V. Let W be a T-invariant subspace of V. The restriction of T to W is defined to be the function Tw : W → W where Tw(v) = T(v) for all v ∈ W.

Proposition 0.4: Let T : V → V be a linear operator. If f(T) and g(T) are any polynomials in T, then f(T)g(T) = g(T)f(T).

Proof: Let f(T) = a_kT^k + a_{k-1}T^{k-1} + … + a_0I and g(T) = b_nT^n + b_{n-1}T^{n-1} + … + b_0I.

First suppose g'(T) consists of a single term, say g'(T) = b_mT^m. Then

f(T)g'(T) = (a_kT^k + a_{k-1}T^{k-1} + … + a_0I)(b_mT^m)

= a_kb_mT^{k+m} + a_{k-1}b_mT^{k+m-1} + … + a_0b_mT^m

= (b_mT^m)(a_kT^k + a_{k-1}T^{k-1} + … + a_0I) = g'(T)f(T)

Using the above result, we can prove the proposition for an arbitrary polynomial g(T):

f(T)g(T) = f(T)(b_nT^n + b_{n-1}T^{n-1} + … + b_0I)

= f(T)(b_nT^n) + f(T)(b_{n-1}T^{n-1}) + … + f(T)(b_0I)

= (b_nT^n)f(T) + (b_{n-1}T^{n-1})f(T) + … + (b_0I)f(T)

= (b_nT^n + b_{n-1}T^{n-1} + … + b_0I)f(T)

= g(T)f(T), as required.
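As a quick numerical check, the following supplementary sympy sketch (the specific matrix and polynomials are arbitrary choices, not from the report) verifies that two polynomials in the same matrix commute, even though matrix multiplication in general does not:

```python
import sympy as sp

# An arbitrary non-symmetric matrix standing in for the operator T.
A = sp.Matrix([[1, 2], [0, 3]])
I = sp.eye(2)

f_A = 2*A**2 - A + 5*I   # f(T) = 2T^2 - T + 5I
g_A = A**3 + 4*I         # g(T) = T^3 + 4I

# Proposition 0.4: f(T)g(T) = g(T)f(T).
assert f_A * g_A == g_A * f_A
```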

Proposition 0.5: Let V be of finite dimension and T : V à V be a linear operator. Let W be a T-invariant subspace of V. Then the characteristic polynomial of Tw divides the characteristic polynomial of T.

Proof: Let β = {v_1, v_2, … , v_k} be a basis for W, and extend it to a basis S = {v_1, v_2, … , v_k, v_{k+1}, … , v_n} for V. Let A = [T]_S and B1 = [Tw]_β.

For i = 1, 2, … , k, since T(v_i) ∈ V, we may write T(v_i) = a_1v_1 + … + a_kv_k + a_{k+1}v_{k+1} + … + a_nv_n, where a_1, … , a_n are scalars.

Also, T(v_i) ∈ W because W is T-invariant, and β is a basis for W, so T(v_i) can be expressed as a linear combination of the elements of β. Subtracting the two expressions for T(v_i) gives a_{k+1}v_{k+1} + a_{k+2}v_{k+2} + … + a_nv_n = 0.

But {v_{k+1}, … , v_n} is linearly independent, therefore a_{k+1} = a_{k+2} = … = a_n = 0.

Recall that [T(v_i)]_S is the ith column of [T]_S. Hence

A = [ B1  B2 ]
    [ O   B3 ]

where O is the (n-k) × k zero matrix, and B2 and B3 are matrices of suitable sizes.

Let f(t) be the characteristic polynomial of T and g(t) be that of Tw. Then

f(t) = | A - tI_n |, and since A - tI_n is block upper-triangular with diagonal blocks B1 - tI_k and B3 - tI_{n-k},

f(t) = | B1 - tI_k | · | B3 - tI_{n-k} | = g(t) · | B3 - tI_{n-k} |

Hence g(t) divides f(t). 
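The divisibility in Proposition 0.5 can be checked on a concrete block upper-triangular matrix; the following supplementary sympy sketch uses arbitrary example entries:

```python
import sympy as sp

t = sp.symbols('t')

# A = [[B1, B2], [O, B3]] with B1 = [[2,1],[0,2]] (the matrix of Tw),
# B2 = [[7],[3]], and B3 = [[5]].
A = sp.Matrix([[2, 1, 7],
               [0, 2, 3],
               [0, 0, 5]])
B1 = A[:2, :2]

f = A.charpoly(t).as_expr()    # characteristic polynomial of T
g = B1.charpoly(t).as_expr()   # characteristic polynomial of Tw

# g(t) divides f(t): the remainder on polynomial division is zero.
assert sp.rem(f, g, t) == 0
```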

1.  Fundamentals of Jordan Canonical Form

For the purpose of this report, we will assume that the characteristic polynomial of a transformation T always splits, i.e., we assume that the transformation is performed over the complex field.

Recall that T is diagonalisable if and only if T has n linearly independent eigenvectors, where n = dim(V). However, if T has fewer than n linearly independent eigenvectors, we can still construct the Jordan canonical form of T by extending the eigenspaces to generalised eigenspaces, from which we select ordered bases whose union is a basis for V.
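For instance, in the following supplementary sympy sketch (the matrix is an arbitrary example), a 2 × 2 matrix with a single repeated eigenvalue has only one linearly independent eigenvector, so it is not diagonalisable, yet it still admits a Jordan canonical form:

```python
import sympy as sp

A = sp.Matrix([[2, 1], [0, 2]])

# Only one linearly independent eigenvector, so A is not diagonalisable.
assert not A.is_diagonalizable()
assert len((A - 2*sp.eye(2)).nullspace()) == 1

# Nevertheless A = P J P^{-1} for a matrix J in Jordan canonical form.
P, J = A.jordan_form()
assert A == P * J * P.inv()
```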

In this chapter, we will prove that T has a Jordan canonical form. We will also demonstrate how to select the Jordan canonical basis from the generalised eigenspaces, and why they form a basis.

But before we can go into the important theorems, we need to first define a few basic concepts. In particular, we need to know what a Jordan canonical form looks like, and thus understand why it is ‘almost diagonal’.

Definition 1.1: A square matrix is called a Jordan block corresponding to a scalar λ if it has λ along the diagonal, 1 along the superdiagonal, and 0 everywhere else.

Definition 1.2: A matrix is in Jordan canonical form if it is a block diagonal matrix with Jordan blocks along the diagonal.

2 1 0 0 0
0 2 1 0 0
0 0 2 0 0
0 0 0 4 1
0 0 0 0 4

As an example, the above matrix is in Jordan canonical form, consisting of two Jordan blocks: a 3 × 3 block corresponding to the eigenvalue 2 and a 2 × 2 block corresponding to the eigenvalue 4.
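The example can be assembled block by block; this supplementary sympy sketch uses the library's Jordan-block constructor:

```python
import sympy as sp

# One 3x3 Jordan block for eigenvalue 2 and one 2x2 block for eigenvalue 4,
# placed along the diagonal of a block diagonal matrix.
J1 = sp.Matrix.jordan_block(size=3, eigenvalue=2)
J2 = sp.Matrix.jordan_block(size=2, eigenvalue=4)
J = sp.diag(J1, J2)

assert J == sp.Matrix([
    [2, 1, 0, 0, 0],
    [0, 2, 1, 0, 0],
    [0, 0, 2, 0, 0],
    [0, 0, 0, 4, 1],
    [0, 0, 0, 0, 4],
])
```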

We will next introduce the concept of generalised eigenvectors, which lie in generalised eigenspaces; these will eventually be used to form the required Jordan canonical basis.

Definition 1.3: Let T : V → V be a linear operator. A non-zero vector x ∈ V is a generalised eigenvector of T corresponding to the scalar λ if (T - λI)^p(x) = 0 for some positive integer p.

Remark: If p is the smallest positive integer satisfying the above equation, then v = (T - λI)^{p-1}(x) is non-zero and is an eigenvector of T corresponding to λ, since (T - λI)(v) = 0. Therefore λ is an eigenvalue of T.

Definition 1.4: Let T : V → V be a linear operator, and let λ be an eigenvalue of T. The generalised eigenspace of T corresponding to λ, denoted by Kλ(T), is the subset of V given by Kλ(T) = { x ∈ V | (T - λI)^p(x) = 0 for some positive integer p }.
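In matrix terms, Kλ(T) can be computed as the kernel of successive powers of (A - λI); for a defective eigenvalue it is strictly larger than the eigenspace Eλ. This is a supplementary sympy sketch with an arbitrary example matrix:

```python
import sympy as sp

A = sp.Matrix([[2, 1], [0, 2]])   # single eigenvalue 2, defective
N = A - 2*sp.eye(2)

eigenspace = N.nullspace()           # E_2: solutions of (A - 2I)x = 0
gen_eigenspace = (N**2).nullspace()  # K_2: here (A - 2I)^2 = 0 already

assert len(eigenspace) == 1       # only one independent eigenvector
assert len(gen_eigenspace) == 2   # generalised eigenspace is all of C^2
```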

We will first prove that a vector space V is a direct sum of the generalised eigenspaces corresponding to each eigenvalue of T. Then we can proceed to prove that the union of the bases that we select from these generalised eigenspaces forms a basis for V.

Prior to that, the following theorem and proposition (Theorem 1.5 and Proposition 1.6) have to be established. They will be needed in the crucial proofs that follow.

Theorem 1.5: Let T : V à V be a linear operator, and let λ be an eigenvalue of T. Then

(a) Kλ(T) is a T-invariant subspace of V containing Eλ (the eigenspace of T corresponding to λ).

(b) For any scalar μ ≠ λ, the restriction of T - μI to Kλ(T) is one-to-one.

Proof: (a) We will prove this by showing that (i) Kλ(T) is a subspace, (ii) Kλ(T) is T-invariant, and (iii) Kλ(T) contains Eλ.

(i) First of all, (T - λI)^p(0) = 0 for any positive integer p. So 0 ∈ Kλ(T) and hence Kλ(T) is non-empty.

Now let u, v ∈ Kλ(T) and let k be a scalar. Then (T - λI)^p(u) = (T - λI)^q(v) = 0 for some positive integers p and q.

(T - λI)^{p+q}(u+v) = (T - λI)^{p+q}(u) + (T - λI)^{p+q}(v) = 0 + 0 = 0. So u+v ∈ Kλ(T), and Kλ(T) is closed under addition.

Also, (T - λI)^p(ku) = k(T - λI)^p(u) = 0. So ku ∈ Kλ(T), and Kλ(T) is closed under scalar multiplication.

Hence Kλ(T) is a subspace.

(ii) Let v ∈ Kλ(T). Then (T - λI)^p(v) = 0 for some positive integer p.

By Proposition 0.4, (T - λI)^p[T(v)] = T[(T - λI)^p(v)] = T(0) = 0.

Hence T(v) ∈ Kλ(T) and Kλ(T) is T-invariant.

(iii) Let v ∈ Eλ. Then (T - λI)(v) = 0, and so v ∈ Kλ(T). Hence Eλ ⊆ Kλ(T).

(b) We need to show that ker(T - μI) = {0}, where the domain of T - μI is Kλ(T).

Suppose, by way of contradiction, that there is a non-zero vector v ∈ Kλ(T) such that (T - μI)(v) = 0.

Let p be the smallest positive integer for which (T - λI)^p(v) = 0, and let u = (T - λI)^{p-1}(v). Then (T - λI)(u) = (T - λI)^p(v) = 0, and therefore u ∈ Eλ.

Also, (T - μI)(u) = (T - μI)(T - λI)^{p-1}(v) = (T - λI)^{p-1}(T - μI)(v) = (T - λI)^{p-1}(0) = 0. Therefore u ∈ Eμ.

Hence T(u) = λu = μu. This implies that u = 0, since λ ≠ μ.

We now have (T - λI)^{p-1}(v) = 0, which contradicts the hypothesis that p is the smallest integer for which (T - λI)^p(v) = 0.

Hence no such v exists, and the restriction of T - μI to Kλ(T) is one-to-one.

Proposition 1.6: Let V be of finite dimension. Let T : V à V be a linear operator. Suppose that λ is an eigenvalue of T with multiplicity m. Then

(a) dim(Kλ(T)) ≤ m

(b) Kλ(T) = Ker((T - λI)^m)

Proof: (a) Let W = Kλ(T) and h(t) be the characteristic polynomial of Tw.

Claim: λ is the only eigenvalue of Tw.

Proof of claim:

Since λ is an eigenvalue of T, (T - λI)(v) = 0 for some non-zero v ∈ V. Then v ∈ W, so λ is an eigenvalue of Tw.

Suppose some scalar μ ≠ λ is also an eigenvalue of Tw, with a corresponding eigenvector w ≠ 0 in W. Then (T - μI)(w) = 0.

By Theorem 1.5(b), the restriction of T - μI to W is one-to-one, so w = 0 is the only solution. We have arrived at a contradiction, and therefore μ is not an eigenvalue of Tw. This proves the claim.

Since λ is the only eigenvalue of Tw, we have h(t) = (t - λ)^d, where d = dim(W).

By Proposition 0.5, h(t) = (t - λ)^d divides the characteristic polynomial of T, in which λ occurs with multiplicity m. Hence we conclude that d ≤ m.

(b) We will show that Ker((T - λI)^m) ⊆ Kλ(T) and Kλ(T) ⊆ Ker((T - λI)^m).

Let v ∈ Ker((T - λI)^m); then (T - λI)^m(v) = 0. Therefore v ∈ Kλ(T), and hence Ker((T - λI)^m) ⊆ Kλ(T).

Now let v ∈ Kλ(T). Recall that W = Kλ(T) and h(t) is the characteristic polynomial of Tw. So h(Tw) = 0 by the Cayley-Hamilton theorem.

From (a), h(Tw) = (Tw - λI)^d = 0, so (T - λI)^d(v) = 0. Since d ≤ m, we have (T - λI)^m(v) = 0. Hence v ∈ Ker((T - λI)^m), and therefore Kλ(T) ⊆ Ker((T - λI)^m).

Hence we have proven that Kλ(T) = Ker((T - λI)^m).
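Proposition 1.6(b) can be observed numerically: the kernel of (A - λI)^p grows until p reaches the multiplicity m and then stabilises. This is a supplementary sympy sketch with an arbitrary example matrix:

```python
import sympy as sp

# Eigenvalue 2 with algebraic multiplicity m = 3 (one 3x3 Jordan block),
# plus an unrelated eigenvalue 7.
A = sp.diag(sp.Matrix.jordan_block(size=3, eigenvalue=2), sp.Matrix([[7]]))
N = A - 2*sp.eye(4)

# Dimension of Ker((A - 2I)^p) for p = 1, ..., 5.
dims = [len((N**p).nullspace()) for p in range(1, 6)]

# The kernel stops growing at p = m = 3: K_2 = Ker((A - 2I)^3).
assert dims == [1, 2, 3, 3, 3]
```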

Now we can prove in detail that V is the direct sum of the generalised eigenspaces corresponding to each of the eigenvalues of T.

Theorem 1.7: Let T : V à V be a linear operator on a finite dimensional vector space V. Let λ1, … , λk be distinct eigenvalues of T. Then

(i) V = Kλ1(T) + … + Kλk(T), and

(ii) Kλi(T) ∩ Kλj(T) = {0} if i ≠ j

Proof: (i) The proof is by mathematical induction on k. Let f(t) be the characteristic polynomial of T.

When k = 1, f(t) = (t - λ1)^m, where m is the multiplicity of λ1. By the Cayley-Hamilton theorem, f(T) = (T - λ1I)^m = 0. So (T - λ1I)^m(v) = 0 for all v ∈ V. Therefore every v ∈ V lies in Kλ1(T), and hence V = Kλ1(T).

Suppose the result is true when T has k-1 distinct eigenvalues.

Now suppose T has k distinct eigenvalues. Then f(t) = (t - λk)^m g(t), where m is the multiplicity of λk and g(t) is not divisible by (t - λk). Let W be the range of (T - λkI)^m, denoted by R((T - λkI)^m).

Claim 1: W is T-invariant.

Suppose v ∈ W. Then v = (T - λkI)^m(x) for some x ∈ V, and by Proposition 0.4, T(v) = T(T - λkI)^m(x) = (T - λkI)^m(T(x)). Therefore T(v) ∈ W, and hence W is T-invariant.

Claim 2: (T - λkI)^m maps Kλi(T) onto itself for i ≠ k.

Let x be any vector belonging to Kλi(T). Then (T - λiI)^p(x) = 0 for some positive integer p.

(T - λiI)^p[(T - λkI)^m(x)] = (T - λkI)^m[(T - λiI)^p(x)] = (T - λkI)^m(0) = 0. Hence (T - λkI)^m(x) ∈ Kλi(T), so (T - λkI)^m maps Kλi(T) into itself for i ≠ k.

Since λk ≠ λi, by Theorem 1.5(b) the restriction of T - λkI to Kλi(T) is one-to-one, and hence so is the restriction of (T - λkI)^m. A one-to-one linear map of the finite dimensional space Kλi(T) into itself is also onto, so (T - λkI)^m maps Kλi(T) onto itself for i ≠ k. This proves the claim.

Let x be a non-zero vector in Kλi(T), where i ≠ k, and let p be the smallest positive integer with (T - λiI)^p(x) = 0. By Claim 2, Kλi(T) ⊆ W, so x ∈ W. And since W is T-invariant, (T - λiI)^{p-1}(x) ∈ W and is non-zero. Therefore, for i ≠ k, λi is an eigenvalue of Tw, since (T - λiI)[(T - λiI)^{p-1}(x)] = 0. Hence Tw has at least k - 1 eigenvalues.

Now we want to show that λk is not an eigenvalue of Tw.

Suppose k is an eigenvalue of Tw. Then (T - kI)(v) = 0 for some non zero v  W. Now v = (T - kI)m(y) for some y  V. But (T - kI)(v) = (T - kI)m+1(y) = 0. Therefore y  Kλk(T). Since Kλk(T) = Ker((T- λkI)m), we have v = (T - kI)m(y) = 0 by Proposition 1.6(b). This is a contradiction, and hence Tw has exactly k-1 distinct eigenvalues, λ1, … , λk-1.

Let x ∈ V; then (T - λkI)^m(x) ∈ W.

Since Tw has exactly k - 1 distinct eigenvalues, the induction hypothesis applies to Tw. Hence there exist wi ∈ Kλi(Tw), i = 1, … , k - 1, such that (T - λkI)^m(x) = w1 + … + w_{k-1}