Effects of Block Size on the Block Lanczos Algorithm

by

Christopher Hsu

2003

Advisor

Professor James Demmel


Contents

  1. Introduction
  2. Single-Vector Lanczos Method
  3. Block Lanczos Method
  4. BLZPACK
  5. Integrating SPARSITY Into BLZPACK
  6. Nasasrb Matrix
  7. Bcsstk, Crystk, and Vibrobox Matrices
  8. Protein Matrices
  9. Conclusion
  10. Bibliography


Acknowledgements

First, I would like to thank Professor Demmel and Professor Yelick for supporting me as a new member of a research group whose members were already familiar with material I had to spend weeks learning. Their patience allowed me to catch up and complete my research for the semester. I would especially like to thank Professor Demmel for his advice specific to my project, and for editing this report. I would also like to thank Rich Vuduc for all his help in getting everything started and with programming issues. I owe much of my results to Benjamin Lee, who provided me with the information I needed to run all my tests. I am grateful for the correspondence of Osni Marques, whose suggestions pointed me in the right direction for obtaining the desired results. Finally, I thank the BEBOP group members for all their support and insight.


1. Introduction

The problem of computing eigenvalues occurs in countless physical applications. For a matrix A, an eigenvalue is a scalar λ such that Ax = λx for some nonzero vector x. The vector x is called the corresponding eigenvector. We will refer to the eigenvalue and eigenvector collectively as an eigenpair. The set of all eigenvalues is called the spectrum. Such a problem is an example of a standard eigenvalue problem (SEP). In other situations, such as structural problems, we are given two matrices, a mass matrix M and a stiffness matrix K. Then an eigenvalue is a scalar λ such that Kx = λMx. This type of problem is known as a generalized eigenvalue problem (GEP). In my experiments I deal with large, sparse, real symmetric matrices. Real symmetric matrices have applications in clustering analysis (power systems), physics (arrangements of atoms in a disordered material), chemistry (calculation of bond energies in molecules), and much more. The matrices I work with range in dimension from 8070 to 54870.
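To make the two problem types concrete, the following toy sketch solves a 2x2 SEP and GEP with SciPy's dense symmetric solver (scipy.linalg.eigh); the matrices here are illustrative only, since the problems in this report are large and sparse:

    import numpy as np
    from scipy.linalg import eigh

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])        # real symmetric matrix
    lam, X = eigh(A)                  # SEP: A x = lambda x
    print(lam)                        # [1. 3.]; columns of X are eigenvectors

    K = A                             # "stiffness" matrix
    M = np.diag([1.0, 2.0])           # "mass" matrix (symmetric positive definite)
    lam_g, X_g = eigh(K, M)           # GEP: K x = lambda M x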

For different eigenproblems, there are many possible algorithms to choose from when deciding how to approach solving for eigenvalues. Templates for the Solution of Algebraic Eigenvalue Problems [3] classifies eigenproblems by three characteristics. The first is the mathematical properties of the matrix being examined. This includes whether the matrix is Hermitian (or real symmetric), and whether it is a standard or generalized problem. The second characteristic to consider is the desired spectral properties of the problem. This includes the required accuracy of the eigenvalues, whether we want the associated eigenvectors, and which and how many of the eigenvalues from the spectrum of the matrix we desire. The third characteristic is the available operations and their costs. Implementation of a specific algorithm may depend on a particular data structure and the cost of operations on that structure; constraints on how the data is represented may limit the choice of available algorithms.

Depending on the properties of the problem we are solving, Templates [3] provides a recommended algorithm to find eigenvalues. Classes of algorithms include direct methods, Jacobi-Davidson methods, and Arnoldi methods. My research will examine a particular variant of the Lanczos method, namely the block Lanczos method. BLZPACK [8] (Block Lanczos Package) is a Fortran 77 implementation of the block Lanczos algorithm, written by Osni Marques. My project consists of integrating BLZPACK with SPARSITY [7], a toolkit that generates efficient matrix-vector multiplication routines for matrices stored in a sparse format. I will give a description of SPARSITY as well as BLZPACK and its interface in later sections. BLZPACK is capable of solving both the standard and the generalized eigenvalue problems; however, my research focuses on the standard problem.

The purpose of this project in particular is to examine the effects of changing the block size (explained later) when applying the block Lanczos algorithm. My goal is to decrease the overall execution time of the algorithm by increasing the block size. This involves observing the time spent in different sections of the algorithm, as well as carefully choosing optimal matrix subroutines. This work complements that of the BEBOP group at Berkeley, whose interests include software performance tuning. In particular, BEBOP explores SPARSITY performance in great detail. The relevance to BLZPACK lies in matrix-vector multiplication unrolled across multiple vectors. Depending on the matrix, the time spent on the operation A·[x_1, ..., x_k] (i.e. a matrix A times k vectors at once) may be much less than k times the time spent on Ax (i.e. a matrix A times one vector, done k times). BLZPACK can be used with A·[x_1, ..., x_k] for any k (known as the block size); my work explores whether we can accelerate eigenvalue computation by using a block size greater than 1. The work performed by BLZPACK depends in a complicated fashion on the matrix, the block size, the number of desired eigenvalues, and other parameters; hence the answer to that question is not immediately obvious. As I will show, there are cases where a block size of 1 performs best, and also cases where a larger block size is better.
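The following rough timing sketch illustrates the idea with SciPy's stock sparse routines rather than SPARSITY itself (the matrix size, density, and choice of k are arbitrary; whether the one-shot product wins depends on the matrix and the machine):

    import time
    import numpy as np
    import scipy.sparse as sp

    n, k = 50000, 8
    A = sp.random(n, n, density=1e-4, format='csr')  # random sparse test matrix
    X = np.random.rand(n, k)                         # block of k vectors

    t0 = time.perf_counter()
    for j in range(k):
        _ = A @ X[:, j]              # k separate matrix-vector products
    t1 = time.perf_counter()
    _ = A @ X                        # one matrix-times-k-vectors product
    t2 = time.perf_counter()
    print(f"k SpMVs: {t1 - t0:.4f} s   one SpMM: {t2 - t1:.4f} s")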

A related problem, that of using a blocked procedure to solve linear systems of equations, prompted the question of whether the analogous approach is useful for computing eigenvalues. GMRES [2] is a method for solving Ax = b, where A is large, sparse, and non-symmetric. A blocked version, LGMRES, instead solves AX = B, incorporating fast matrix-multivector-multiply routines. BLZPACK uses a similar approach involving Krylov subspaces, which will be explained in greater detail in Section 2.

The first part of this report deals with the tools I work with: the Lanczos method (with a description of the algorithm), BLZPACK, and SPARSITY. The second part is a collection of the data gathered from my experiments and the results and conclusions drawn from them.


2. Single-Vector Lanczos Method

The class of Lanczos methods refers to procedures for computing eigenvalues that rely on Lanczos recursion. Lanczos recursion (or tridiagonalization) was introduced by Cornelius Lanczos in 1950. In this section I will describe the algorithm for the single-vector Lanczos method [1].

The simplest Lanczos method uses single-vector Lanczos recursion. Given a real symmetric (square) matrix A of dimension n and an initial unit vector v_1 (usually generated randomly), the Lanczos matrices T_j for j = 1,2,…,m are defined recursively as follows. Taking β_0 ≡ 0 and v_0 ≡ 0, for i = 1,2,...,j:

Let z := Av_i

α_i := v_i^T z

z := z - α_i v_i - β_{i-1} v_{i-1}

β_i := ||z||_2

v_{i+1} := z / β_i

The Lanczos matrix T_j is defined to be the real symmetric, tridiagonal matrix with diagonal entries α_i for i = 1,2,...,j and subdiagonal and superdiagonal entries β_i for i = 1,2,...,j-1.
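The recursion translates almost line for line into code. Below is a minimal NumPy sketch (the function name and interface are my own); it assumes exact arithmetic in the sense that no reorthogonalization is performed, a point revisited at the end of this section:

    import numpy as np

    def lanczos(A, v1, m):
        """Run m steps of single-vector Lanczos recursion on symmetric A.
        Returns the diagonal alpha and off-diagonal beta of T_m, plus the
        Lanczos vectors as columns of V. Sketch only: no reorthogonalization."""
        n = A.shape[0]
        V = np.zeros((n, m + 1))
        alpha, beta = np.zeros(m), np.zeros(m)
        V[:, 0] = v1 / np.linalg.norm(v1)
        for i in range(m):
            z = A @ V[:, i]                      # z := A v_i
            alpha[i] = V[:, i] @ z               # alpha_i := v_i^T z
            z -= alpha[i] * V[:, i]              # orthogonalize against v_i
            if i > 0:
                z -= beta[i - 1] * V[:, i - 1]   # ... and against v_{i-1}
            beta[i] = np.linalg.norm(z)          # beta_i := ||z||_2
            V[:, i + 1] = z / beta[i]            # v_{i+1} := z / beta_i
        return alpha, beta[:m - 1], V[:, :m]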

The vectors α_i v_i and β_{i-1} v_{i-1} are the orthogonal projections of the vector Av_i onto v_i and v_{i-1}, respectively. For each i, the Lanczos vector v_{i+1} is determined by orthogonalizing Av_i with respect to v_i and v_{i-1}. The Lanczos matrices are assembled from the scalar coefficients α_i and β_i obtained in these orthogonalizations.

In all Lanczos methods, solving for an eigenvalue of a matrix A is simplified by replacing A with one or more of the Lanczos matrices T_j. Real symmetric tridiagonal matrices have small storage requirements, and algorithms for computing their eigenpairs are efficient [4].

The basic Lanczos procedure follows these steps:

  1. Given a real symmetric matrix A, construct (using Lanczos recursion) a family of real symmetric tridiagonal matrices Tj for j = 1,2,...,M.
  2. For some m ≤ M compute the relevant eigenvalues of the Lanczos matrix T_m. (Relevance refers to which eigenvalues from the spectrum are desired, e.g. the eigenvalue with largest absolute value.)
  3. Select some or all of these eigenvalues as approximations to eigenvalues of the given matrix A.
  4. For each selected eigenvalue μ for which an eigenvector of A is required, compute a unit eigenvector u such that T_m u = μu. Map u to the vector y ≡ V_m u (where V_m is the matrix whose kth column is the kth Lanczos vector), which is used as an approximation to an eigenvector of A. Such a vector y is known as a Ritz vector; eigenvalues of Lanczos matrices are called Ritz values of A. (A code sketch of steps 2-4 follows this list.)
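Steps 2 through 4 can be sketched as a continuation of the earlier lanczos function; scipy.linalg.eigh_tridiagonal computes eigenpairs of a real symmetric tridiagonal matrix directly from its diagonals (the test matrix and starting vector below are placeholders):

    import numpy as np
    from scipy.linalg import eigh_tridiagonal

    # Placeholder problem: any large sparse symmetric A would do.
    rng = np.random.default_rng(0)
    B = rng.standard_normal((500, 500))
    A = (B + B.T) / 2                          # symmetric test matrix
    alpha, beta, V = lanczos(A, rng.standard_normal(500), m=80)

    theta, U = eigh_tridiagonal(alpha, beta)   # step 2: eigenpairs of T_m
    y = V @ U[:, -1]                           # step 4: Ritz vector for largest theta
    residual = np.linalg.norm(A @ y - theta[-1] * y)
    print(theta[-1], residual)                 # extreme Ritz value converges first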

In the Lanczos recursion formulas, the original matrix A is used only to form the product Av_i; hence it is never modified. For large sparse matrices, this enables storage optimizations, since a user needs only a subroutine that computes Ax for any vector x, and the recursion itself requires workspace linear in the dimension of the matrix. Furthermore, the number of arithmetic operations required to generate a Lanczos matrix is proportional to the number of nonzero entries of A, as opposed to O(n^3) (where n is the size of A) for procedures that completely transform A into a real symmetric tridiagonal matrix by performing an orthogonal similarity transformation.
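SciPy captures this "matrix known only through Ax" idea with scipy.sparse.linalg.LinearOperator; the sketch below wires up a matrix-free matvec for a 1-D Laplacian-like stencil (the operator itself is just an illustration of the interface):

    import numpy as np
    from scipy.sparse.linalg import LinearOperator

    n = 10000

    def matvec(x):
        # Tridiagonal stencil applied matrix-free: O(n) time and O(n) space,
        # with no explicit matrix ever stored or modified.
        y = 2.0 * x
        y[:-1] -= x[1:]
        y[1:] -= x[:-1]
        return y

    A_op = LinearOperator((n, n), matvec=matvec, dtype=np.float64)
    z = A_op.matvec(np.ones(n))   # all the Lanczos recursion ever asks of A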

That the Lanczos matrices possess eigenvalues that can reasonably approximate those of A is not immediately clear. The key facts (which I state without proof [2]) are that the Lanczos vectors form an orthonormal set of vectors, and that the eigenvalues of the Lanczos matrices are the eigenvalues of A restricted to the family of subspaces K_j ≡ sp{v_1, Av_1, A^2 v_1, ..., A^{j-1} v_1}, known as Krylov subspaces. If j is sufficiently large, the eigenvalues of T_j should be good approximations to the eigenvalues of A. Continuing the Lanczos recursion until j = n (where n is the size of A), T_n is an orthogonal similarity transformation of A, and therefore has the same eigenvalues as A. A Ritz vector V_j u obtained from an eigenvector u of a given T_j is an approximation to a corresponding eigenvector of A.

Error analysis is given in terms of the angle between a vector and a subspace, as explained in Chapter 2 of Lanczos Algorithms [4]. It turns out that Krylov subspaces are very good subspaces on which to compute eigenpair approximations. An important result is that the error bound increases as we proceed into the spectrum; that is, extreme eigenvalues and corresponding eigenvectors have higher expected accuracy than eigenvalues in the interior of the spectrum. (Depending on the particular implementation, however, this may not be the case.) For this reason the basic Lanczos method is often a good algorithm to use when we desire a few extreme eigenvalues.

The above description of the Lanczos algorithm assumes exact arithmetic; in practice this is not the case. In computer implementations, finite precision causes roundoff errors at every arithmetic operation, so computed quantities differ from their theoretical values. When constructing the Lanczos matrices T_j, the Lanczos vectors lose their orthogonality (and even linear independence) as j increases. The Lanczos matrices are then no longer orthogonal projections of A onto the subspaces sp{V_j}, and the theoretical relationship between T_j and A, along with the error estimates, no longer applies. It was originally assumed that a total reorthogonalization of the Lanczos vectors was required. There are Lanczos methods which in fact use no reorthogonalization and employ modified recursion formulas. However, the variant I will be working with uses selective orthogonalization and modified partial reorthogonalization to preserve orthogonality of the Lanczos vectors, as will be described later.
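The loss of orthogonality is easy to observe with the earlier lanczos sketch: the Gram matrix of the computed Lanczos vectors drifts away from the identity as the iteration proceeds (a hypothetical continuation, reusing the test matrix A from the sketch above):

    # Reusing A and lanczos from the sketches above.
    alpha, beta, V = lanczos(A, np.ones(A.shape[0]), m=120)
    G = V.T @ V                       # the identity, in exact arithmetic
    drift = np.abs(G - np.eye(V.shape[1])).max()
    print(drift)                      # grows far beyond machine epsilon with m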


3. Block Lanczos Method

Many eigenvalue algorithms have variants in which blocks of vectors are used instead of single vectors. A block of vectors is itself a matrix, so instead of performing matrix-vector operations (Level 2 BLAS routines [5]), matrix-matrix operations (Level 3 BLAS routines [5]) are used. This allows for possible optimizations in the execution of the algorithm, and I/O costs are decreased by essentially a factor of the block size [1].

Lanczos methods in particular benefit from a blocked variant in another way. Depending on the implementation, single-vector methods sometimes have difficulty computing the multiplicities of eigenvalues, and for a multiple eigenvalue they may not directly compute a complete basis for the corresponding subspace. A block Lanczos method may be better at computing multiplicities and bases for the invariant subspaces corresponding to such eigenvalues.

As an alternative to the recurrence relations of the single-vector variant, we use the following approach [3]. Define matrices B_0 ≡ 0 and Q_0 ≡ 0. Let Q_1 be an n×q matrix whose columns are orthonormalized, randomly generated vectors, where n is the size of the original matrix A and q is the block size. For i = 1,2,...,s define the Lanczos blocks Q_i according to the following recursive equations:

Let Z := AQ_i

A_i := Q_i^T Z

Z := Z - Q_i A_i - Q_{i-1} B_{i-1}^T

Factor Z = Q_{i+1} B_i by Gram-Schmidt

The blocks Q_j for j = 1,2,...,s form an orthonormal basis for the Krylov subspace K_s(Q_1, A) ≡ sp{Q_1, AQ_1, ..., A^{s-1} Q_1} corresponding to the first block. The Lanczos matrices T_s are defined as the block tridiagonal matrices with A_1, A_2, ..., A_s along the diagonal, B_1, B_2, ..., B_{s-1} along the subdiagonal, and B_1^T, B_2^T, ..., B_{s-1}^T along the superdiagonal. Analogously to the single-vector case, we approximate eigenvalues of A by computing eigenvalues of the T_s matrices.
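A minimal sketch of the block recursion, with np.linalg.qr standing in for the Gram-Schmidt factorization Z = Q_{i+1} B_i (the names and interface are mine, and as before no reorthogonalization is attempted):

    import numpy as np

    def block_lanczos(A, Q1, s):
        """s steps of block Lanczos with block size q = Q1.shape[1]. Sketch only."""
        n, q = Q1.shape
        Q_prev, B_prev = np.zeros((n, q)), np.zeros((q, q))
        Q = Q1
        A_blocks, B_blocks = [], []
        for i in range(s):
            Z = A @ Q                             # one product of A with q vectors
            Ai = Q.T @ Z                          # A_i := Q_i^T Z
            Z = Z - Q @ Ai - Q_prev @ B_prev.T    # remove components along Q_i, Q_{i-1}
            Q_next, Bi = np.linalg.qr(Z)          # factor Z = Q_{i+1} B_i
            A_blocks.append(Ai)
            B_blocks.append(Bi)
            Q_prev, B_prev, Q = Q, Bi, Q_next
        return A_blocks, B_blocks[:s - 1]         # diagonal and subdiagonal of T_s

Assembling the blocks returned here into the block tridiagonal T_s, and computing its eigenvalues, would then give the approximations to the eigenvalues of A described above.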

When A is large enough, the cost of this algorithm should be dominated by the multiplication AQ_i, which is precisely the operation for which we have specially tuned routines.