AAS 03-261

GENERALIZED GRADIENT SEARCH AND NEWTON’S METHODS FOR MULTILINEAR ALGEBRA ROOT-SOLVING AND OPTIMIZATION APPLICATIONS

James D. Turner, Ph.D.

ABSTRACT

A standard problem in optimization involves solving for the roots of nonlinear functions defined by f(x) = 0, where x is the unknown variable. Classical algorithms consist of first-order gradient search and Newton-Raphson methods. This paper generalizes the Newton-Raphson Method for multilinear algebra root-finding problems by introducing a non-iterative multilinear reversion of series approximation. The series solution is made possible by introducing an artificial independent variable through an embedding process. Automatic Differentiation techniques are defined for evaluating the generalized iteration algorithms. Operator-overloading strategies use hidden tools for redefining the computer's intrinsic mathematical operators and library functions for building high-order sensitivity models. Exact partial derivatives are computed for first through fourth order, where the numerical results are accurate to the working precision of the machine. The analyst is completely freed from having to build, code, and validate partial derivative models. Accelerated convergence rates are demonstrated for scalar and vector root-solving problems. An integrated generalized gradient search and Newton-Raphson algorithm is presented for rapidly optimizing the very challenging classical Rosenbrock's Banana function. The integration of generalized algorithms and automatic differentiation is expected to have broad potential for impacting the design and use of mathematical programming tools for knowledge discovery applications in science and engineering.

INTRODUCTION

Many applications in engineering, science, and mathematics require solutions for equations of the form f(x) = 0, where x is the unknown root of the equation. Given an initial guess, x, the classical Newton-Raphson strategy seeks a correction, \Delta x, by assuming that the function can be expanded as the following Taylor series

f(x + \Delta x) = f(x) + f'(x)\,\Delta x + \frac{1}{2!}\,f''(x)\,\Delta x^2 + \cdots = 0    (1)

where f'(x) denotes the derivative with respect to x. Eliminating terms of O(\Delta x^2) and higher provides the following correction model for the improved root guess: \Delta x = -f(x)/f'(x), with x := x + \Delta x. Under well-defined conditions for f(x) and its derivatives, the solution accuracy is improved by repeating the procedure until |f(x)| < \epsilon, where \epsilon is a prescribed solution tolerance. Even though each iteration doubles the number of accurate digits, many algorithms have been proposed for accelerating the convergence rate15. All of these algorithms retain O(\Delta x^2) and higher terms in Eq. (1), and generate higher-order approximations for \Delta x that cancel errors through a specified approximation order. Scalar equations have been successfully handled; unfortunately, vector and higher-order algebraic problems have met with limited success.
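The classical iteration just described can be sketched in a few lines of Python (a minimal illustration; the function names and tolerance value are assumptions, not part of the paper's formulation):

```python
import math

def newton_raphson(f, fprime, x, tol=1e-12, max_iter=50):
    """Classical Newton-Raphson: repeat x := x + dx, with dx = -f(x)/f'(x),
    until |f(x)| < tol (the prescribed solution tolerance)."""
    for _ in range(max_iter):
        if abs(f(x)) < tol:
            return x
        x = x - f(x) / fprime(x)
    return x

# Root of f(x) = x^2 - 2 (i.e., sqrt(2)), starting from the guess x = 1.
root = newton_raphson(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0)
```

The iterates 1, 1.5, 1.41666..., 1.4142157..., ... exhibit the digit-doubling behavior noted above.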

This paper generalizes the Newton-Raphson Method for multilinear algebra root-finding problems by introducing a non-iterative multilinear reversion of series approximation. The proposed series expansion handles scalar, vector, matrix, and tensor problem formulations for root solving and optimization applications. Four assumptions are required for developing the multilinear series approximation:

  • Introducing an artificial independent variable (i.e., s) by defining an embedding function, H(x(s), s) = 0,
  • Assuming that the unknown root is a function of s, leading to x = x(s),
  • Developing an analytic continuation model for x(s) as

x(s) = x\big|_{s_0} + \frac{dx}{ds}\Big|_{s_0}(s - s_0) + \frac{1}{2!}\,\frac{d^2 x}{ds^2}\Big|_{s_0}(s - s_0)^2 + \cdots, and

  • Differentiating the embedding function to build models for d^n x/ds^n.

The Multilinear Problem

The standard approach for multilinear root-solving applications consists of expanding the necessary condition f(x + \Delta x) = 0 as the following Taylor series:

f(x + \Delta x) = f(x) + \nabla f(x)\cdot\Delta x + \frac{1}{2!}\,\nabla^2 f(x)\cdot\Delta x\,\Delta x + \cdots = 0    (2)

where f(*) denotes the multilinear object, x denotes the current guess for the solution, \Delta x denotes the unknown correction term, \nabla^n denotes an nth-order gradient tensor operator, and (\cdot) denotes an n-dimensional dot product operator. The right-hand side of Eq. (2) defines a highly nonlinear algebraic necessary condition for the unknown \Delta x. Successive approximation strategies are normally required for solving for \Delta x.

A major contribution of this paper is that the iterative solution for \Delta x is replaced with an analytic series model. A series expansion solution is made possible by introducing an artificial independent variable through an embedding process. Mathematically, the embedding process replaces the original problem necessary condition with an embedding function of the form H(x(s), s) = 0, where s denotes an artificial independent variable. The multilinear reversion of series solution for the correction term is shown to be:

\Delta x = \sum_{n=1}^{N} \frac{(-1)^n}{n!}\,\frac{d^n x}{ds^n}\bigg|_{s=1}    (3)

where the partial derivatives of x(s) are obtained by implicitly differentiating the embedding function H(x(s), s) = 0. The solution accuracy is adjusted by varying the number of terms retained in Eq. (3). For example, retaining n terms and introducing the result into Eq. (2), one obtains a solution with the following accuracy:

f(x + \Delta x) = O\!\left(\|\Delta x\|^{n+1}\right)

The residual errors are one order higher than the number of terms retained in the series approximation.

Computational Issues for Generating High-Order Partial Derivative Models

High-order partial derivative models are required for successfully applying the multilinear reversion of series solution. To this end, both theoretical and computational issues are addressed. The theoretical issue is addressed by introducing a coordinate embedding strategy, which provides an embedding function that is processed for building analytic partial derivative models. The computational issue is addressed by introducing Automatic Differentiation (AD) techniques. These tools enable the automatic derivation and evaluation of arbitrarily complex partial derivative models. For all derivative orders, the AD-generated models are accurate to the working precision of the machine. The analyst is completely freed from having to build, code, and validate partial derivative models.

Object-Oriented Kernel Capabilities. Object-oriented language features and operator-overloading computational capabilities provide the foundation for all AD technologies presented in this paper. An Object-Oriented Coordinate Embedding Method (OCEA) is introduced for providing a conceptual framework for developing AD-based application toolkits. Arbitrarily complex and large problem sizes are easily handled. OCEA data structures are ideally suited for computing the rates appearing in multilinear reversion of series algorithms. The proposed generalized gradient search/Newton-Raphson methods only become practical when linked to automated capabilities for computing high-order partial derivative models. The integration of multilinear reversion of series and AD tools is expected to impact both real-world applications and problems on the frontiers of computational science and engineering.

HISTORICAL BACKGROUND

Sir Isaac Newton (1643-1727) presented a new algorithm for solving a polynomial equation in 16691, where an initial guess plus an assumed small correction is used to transform the polynomial function. By linearizing the transformed polynomial for the small correction term and repeating the process several times, he effectively solved the polynomial equation. Newton did not define a derivative for the polynomial as part of his algorithm. The notion of a derivative was introduced into what has become known as Newton's method in 1690 by Joseph Raphson (c. 1648-1715)2. By including the derivative, the method is also known as the Newton-Raphson method, where the iteration formula is given by x_{n+1} = x_n - f(x_n)/f'(x_n). Given a starting guess x_0 for the root, the sequence x_1, x_2, ..., x_n converges to the desired root under well-defined conditions for f(x) and its derivatives. For a convergent sequence x_1, x_2, ..., x_n, the number of accurate digits doubles at each stage of the iteration.

Accelerated Convergence for Scalar Newton-Raphson Problems. Many authors have proposed methods to accelerate the convergence rate for Newton's Method. For example, the astronomer E. Halley (1656-1742) in 1694 developed the second-order expansion

x_{n+1} = x_n - \frac{2\,f(x_n)\,f'(x_n)}{2\,f'(x_n)^2 - f(x_n)\,f''(x_n)}
where the number of accurate digits triples at each stage of the iteration2. Householder3 later developed the general iteration formula

x_{n+1} = x_n + (p+1)\,\frac{(1/f)^{(p)}(x_n)}{(1/f)^{(p+1)}(x_n)}
where p is an integer and (1/f)^{(p)} denotes the derivative of order p of the reciprocal of the function f. Remarkably, this iteration has a convergence rate of order (p+2). Consequently, when p = 0 the algorithm has quadratic convergence and reproduces the performance of the original Newton's Method, and when p = 1 the algorithm has cubic convergence that reproduces Halley's results. Several ad-hoc methods have been proposed for extending the accelerated convergence rates for scalar equations through eighth order15. Extensions for vector systems have been limited to cubic convergence rates using Chebyshev's method.
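Halley's second-order iteration above can be sketched as follows (a minimal illustration with assumed helper names; substituting p = 1 into the Householder formula and simplifying reproduces exactly this update):

```python
def halley(f, f1, f2, x, tol=1e-14, max_iter=25):
    """Halley's iteration x := x - 2 f f' / (2 f'^2 - f f''); the number of
    accurate digits roughly triples per stage (cubic convergence)."""
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        x = x - 2.0 * fx * f1(x) / (2.0 * f1(x) ** 2 - fx * f2(x))
    return x

# Root of f(x) = x^2 - 2 from the starting guess x = 1.
root = halley(lambda x: x * x - 2.0, lambda x: 2.0 * x, lambda x: 2.0, 1.0)
```

From x = 1 the iterates 1.4, 1.4142131..., ... show the digit-tripling behavior before settling at machine precision.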

On the Link Between Newton's And Reversion of Series Methods. All versions of the Newton-Raphson method and the classical reversion of series algorithm are shown to be identical. The key step involves introducing an embedding parameter that transforms the inequality constraint form of the Newton's method necessary condition into an equality constraint. To this end, recall that Eq. (1) is defined by the following nonlinear inequality constraint:

f(x) + f'(x)\,\Delta x + \frac{1}{2!}\,f''(x)\,\Delta x^2 + \cdots + \frac{1}{n!}\,f^{(n)}(x)\,\Delta x^n \ne 0
where \Delta x is the unknown. This nth-order polynomial equation is difficult to solve and multiple solutions can exist. The solution process is simplified by transforming the inequality constraint into an equality constraint by introducing an artificial embedding parameter, s, leading to:

s = f(x) + f'(x)\,\Delta x + \frac{1}{2!}\,f''(x)\,\Delta x^2 + \cdots

This equation defines a functional relationship of the form \Delta x = \Delta x(s), where the desired solution for \Delta x is recovered when s = 0. Classically, the solution for \Delta x is obtained by reverting the series, yielding an equation of the form \Delta x = \Delta x(s). Retaining fifth-order terms in the approximation for \Delta x(s), and using the symbol-manipulating computer algebra system MACSYMA, a classical series reversion method yields:

\Delta x(s) = \frac{s - f}{f'} - \frac{f''\,(s - f)^2}{2\,f'^3} + \left(\frac{f''^2}{2\,f'^5} - \frac{f'''}{6\,f'^4}\right)(s - f)^3 + \cdots

Introducing s = 0 into the equation above (i.e., \Delta x = \Delta x(0)), one can identify all previously reported algorithms for accelerated Newton-Raphson iterations for scalar equations, as follows:

\Delta x = -\frac{f}{f'} - \frac{f''\,f^2}{2\,f'^3} - \left(\frac{f''^2}{2\,f'^5} - \frac{f'''}{6\,f'^4}\right)f^3 - \cdots

where the convergence rate is n + 1 and n denotes the number of terms retained in the approximation for \Delta x. Extensions for arbitrary order are easily formulated. Unfortunately, the multilinear generalizations of these approximations are not immediately obvious. The success of the embedding strategy for scalar equations, however, suggests that an embedding strategy can be equally effective for multilinear problems.
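The truncated corrections identified above can be checked numerically; the sketch below hand-codes the first three terms of the reverted series for a scalar f (illustrative helper names; order 1 is Newton, order 2 adds the f'' term, order 3 the f''' term):

```python
def reversion_step(f, f1, f2, f3, x, order=3):
    """One non-iterative correction from the truncated reverted series:
    order=1 is Newton (quadratic convergence), order=2 adds the f'' term
    (cubic), order=3 adds the third-order term (quartic)."""
    fx, d1 = f(x), f1(x)
    dx = -fx / d1
    if order >= 2:
        dx -= f2(x) * fx ** 2 / (2.0 * d1 ** 3)
    if order >= 3:
        dx -= (f2(x) ** 2 / (2.0 * d1 ** 5) - f3(x) / (6.0 * d1 ** 4)) * fx ** 3
    return x + dx

# Single third-order step for f(x) = x^2 - 2 from x = 1.5.
x_new = reversion_step(lambda x: x * x - 2.0, lambda x: 2.0 * x,
                       lambda x: 2.0, lambda x: 0.0, 1.5)
```

A single order-3 step from 1.5 lands within about 1e-5 of sqrt(2), versus roughly 2e-3 for the plain Newton step.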

Generalized Newton Method for Non-Scalar Equations. A multilinear reversion of series solution is developed for Newton's Method. As in the scalar-equation case, an artificial independent variable, s, is introduced into the original problem, which allows a series expansion model to be developed. The previously successful scalar-function embedding approach, however, does not extend to multilinear systems because many simultaneous constraint conditions must be satisfied. A new embedding strategy is required for defining a scalar embedding parameter so that x = x(s).

EMBEDDING / REVERSION OF SERIES ALGORITHM

The original root-solving problem is embedded into a space of higher dimension by introducing an artificial independent variable. To this end, the original necessary condition is redefined as the following parameter embedding problem10:

H(x(s), s) \equiv f(x(s)) - s\,f(x_g) = 0, \qquad x(1) = x_g    (4)

where s is a scalar embedding parameter, x_g is the starting guess, and x = x(s). Like the scalar example, the embedding function transforms an inequality constraint into an equality constraint. Even though H(x(s), s) is a multilinear object, Eq. (4) is easily handled by introducing a scalar parameter. The initial and final states of the embedding function are presented in Table 1. As in the scalar case, the solution for the original problem is recovered by selecting s = 0, so that the original problem necessary condition is satisfied. Specifying how x changes along the curve defined by H(x(s), s) = 0 solves the root-solving problem as s varies from 1 to 0. The solution curve for x is known as a homotopy path10.

Table 1

EMBEDDING FUNCTION INITIAL AND FINAL STATES

Embedding Function Evaluation Point / Embedding Function / Comment
s = s_0 = 1 / H(x(1), 1) = f(x_g) - f(x_g) = 0 / Because x(1) = x_g (initial guess)
s = s_f = 0 / H(x(0), 0) = f(x(0)) / Original function returned
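The two limiting states of the embedding function can be verified numerically. The sketch below assumes the standard Newton (global) homotopy form H(x, s) = f(x) - s f(x_g), which is consistent with the states in Table 1 and with s entering linearly:

```python
def make_embedding(f, xg):
    """Embedding function H(x, s) = f(x) - s*f(xg): s enters linearly,
    H(xg, 1) = 0 by construction, and H(x, 0) recovers the original f(x)."""
    fg = f(xg)
    return lambda x, s: f(x) - s * fg

f = lambda x: x * x - 2.0
xg = 1.5                      # starting guess
H = make_embedding(f, xg)
```

Evaluating H(xg, 1) returns zero exactly, and H(x, 0) coincides with f(x) for any x, matching the two rows of Table 1.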

Series Expansion for the Root of H(x(s), s) = 0

Along the homotopy path defined by Eq. (4) (i.e., analytic continuation path), the solution for x(s) is generated by the following Taylor series

x(s) = x_g + \frac{dx}{ds}\bigg|_{s_0}(s - 1) + \frac{1}{2!}\,\frac{d^2 x}{ds^2}\bigg|_{s_0}(s - 1)^2 + \cdots    (5)

where s_0 = 1 and x_g denotes the starting guess for the root. Setting s = 0 in Eq. (5) eliminates the artificial independent variable from the series solution and restores the original problem dimension, yielding the non-iterative reversion of series solution:

x = x_g + \sum_{n=1}^{N} \frac{(-1)^n}{n!}\,\frac{d^n x}{ds^n}\bigg|_{s_0}    (6)

Implicit Solution for the Multilinear Partial Derivatives. The differential rates appearing in Eq. (6) are obtained by repeatedly differentiating H(x(s), s) = 0 w.r.t. s. For the second- and higher-order partials, the mathematical models are simplified by observing that s is contained linearly in Eq. (4). For example, computing the first partial derivative of H(x(s), s) = 0 leads to

\nabla H \cdot \frac{dx}{ds} - f(x_g) = 0

where \nabla H denotes the first-order gradient of H(*), \delta_{ij} denotes the standard Kronecker delta function (arising from \partial x_i/\partial x_j = \delta_{ij} when forming \nabla H), \hat{e}_i denotes an n \times 1 unit vector for i = 1, \ldots, n (so that f(x_g) = \sum_i \hat{e}_i\, f_i(x_g)), dx/ds denotes the desired implicit rate as a function of the artificial independent variable, and (\cdot) denotes an n-dimensional dot product. Extending the first-order partial derivative to higher order, one easily shows that the partial derivatives of x(s) satisfy a functional equation of the form

\nabla H \cdot \frac{d^n x}{ds^n} = -g_n\!\left(\nabla^2 H, \ldots, \nabla^n H;\ \frac{dx}{ds}, \ldots, \frac{d^{n-1} x}{ds^{n-1}}\right)    (7)

where g_n(*) collects the products of the previously computed lower-order rates with the symmetric higher-order gradient tensor functions for H(*), which are defined by

\nabla^n H = \frac{\partial^n H}{\partial x\,\partial x \cdots \partial x}
AD-based tools are used for numerically building and evaluating the gradient tensor functions for \nabla^n H. Analytic models for d^n x/ds^n are presented for the first four partial derivatives of Eq. (4), leading to

\nabla H \cdot \frac{dx}{ds} - f(x_g) = 0
\nabla H \cdot \frac{d^2 x}{ds^2} + \nabla^2 H \cdot \frac{dx}{ds}\frac{dx}{ds} = 0
\nabla H \cdot \frac{d^3 x}{ds^3} + 3\,\nabla^2 H \cdot \frac{d^2 x}{ds^2}\frac{dx}{ds} + \nabla^3 H \cdot \frac{dx}{ds}\frac{dx}{ds}\frac{dx}{ds} = 0
\nabla H \cdot \frac{d^4 x}{ds^4} + 4\,\nabla^2 H \cdot \frac{d^3 x}{ds^3}\frac{dx}{ds} + 3\,\nabla^2 H \cdot \frac{d^2 x}{ds^2}\frac{d^2 x}{ds^2} + 6\,\nabla^3 H \cdot \frac{d^2 x}{ds^2}\frac{dx}{ds}\frac{dx}{ds} + \nabla^4 H \cdot \frac{dx}{ds}\frac{dx}{ds}\frac{dx}{ds}\frac{dx}{ds} = 0

These nonlinear equations are easily manipulated to provide d^n x/ds^n as

\frac{dx}{ds} = \left[\nabla H\right]^{-1} \cdot f(x_g)    (8)

A multilinear inversion algorithm is required for evaluating \left[\nabla H\right]^{-1} in Eq. (9). Equation (9) is evaluated sequentially because higher-order calculations make use of previously computed lower-order solutions. The computer symbol-manipulating program MACSYMA 2.416 has been used to validate Eq. (9).

\frac{d^2 x}{ds^2} = -\left[\nabla H\right]^{-1}\cdot\left(\nabla^2 H \cdot \frac{dx}{ds}\frac{dx}{ds}\right)
\frac{d^3 x}{ds^3} = -\left[\nabla H\right]^{-1}\cdot\left(3\,\nabla^2 H \cdot \frac{d^2 x}{ds^2}\frac{dx}{ds} + \nabla^3 H \cdot \frac{dx}{ds}\frac{dx}{ds}\frac{dx}{ds}\right)
\frac{d^4 x}{ds^4} = -\left[\nabla H\right]^{-1}\cdot\left(4\,\nabla^2 H \cdot \frac{d^3 x}{ds^3}\frac{dx}{ds} + 3\,\nabla^2 H \cdot \frac{d^2 x}{ds^2}\frac{d^2 x}{ds^2} + 6\,\nabla^3 H \cdot \frac{d^2 x}{ds^2}\frac{dx}{ds}\frac{dx}{ds} + \nabla^4 H \cdot \frac{dx}{ds}\frac{dx}{ds}\frac{dx}{ds}\frac{dx}{ds}\right)    (9)
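For a scalar f, the sequential rate evaluation of Eq. (9) followed by the series summation of Eq. (6) can be sketched as below (a scalar specialization, assuming the embedding H(x(s), s) = f(x(s)) - s f(x_g) and retaining three rates; helper names are illustrative):

```python
def reversion_root(f, f1, f2, f3, xg):
    """Scalar sketch: evaluate the implicit rates sequentially (each uses
    the lower-order solutions), then sum the non-iterative series, Eq. (6):
    x = xg - x' + x''/2! - x'''/3!."""
    d1 = f1(xg)
    x1 = f(xg) / d1                                          # dx/ds at s = 1
    x2 = -f2(xg) * x1 ** 2 / d1                              # d2x/ds2 uses x1
    x3 = -(3.0 * f2(xg) * x2 * x1 + f3(xg) * x1 ** 3) / d1   # d3x/ds3 uses x1, x2
    return xg - x1 + x2 / 2.0 - x3 / 6.0

x_est = reversion_root(lambda x: x * x - 2.0, lambda x: 2.0 * x,
                       lambda x: 2.0, lambda x: 0.0, 1.5)
```

One non-iterative evaluation from the guess 1.5 lands within about 1e-5 of sqrt(2), reproducing the accelerated scalar corrections identified earlier.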

AUTOMATED PARTIAL DERIVATIVE MODELS

The partial derivative models appearing in Eq. (9) are extremely complicated. Automated numerical techniques are introduced for building and evaluating these models. AD methods have been an active research topic since the 1980s11-14. Examples of previously developed AD tools include:

  • ADIFOR11-12 for evaluating 1st-order partials,
  • AD018, which uses F90 operator-overloading techniques7-9 and computational graphs,
  • ADOL-C, which uses a graph-based C/C++ code for 1st-order partials,
  • ADMIT-1, which provides a MATLAB interface toolbox for linking different AD codes to the MATLAB environment,
  • AUTO_Deriv, which transforms FORTRAN codes, and
  • OCEA for building 1st- through 4th-order partials in FORTRAN 90 and MACSYMA.

AD-based applications have appeared for optimization, robotics, multibody dynamics algorithms, and molecular dynamics. A significant limitation of these tools has been that Hessian capabilities are only available for scalar objective functions, and vector objective function applications are limited to Jacobians.

OCEA-Based Automated Partial Derivative Capabilities

OCEA provides a powerful language extension for engineering and scientific software development. Arbitrary order partial derivative capabilities are enabled by embedding multiple levels of the chain rule of calculus in the transformed operators and functions. The embedded chain rules build the partial derivative models on-the-fly, without analyst intervention, using hidden computational resources. The analyst uses standard programming constructs for building computer-based models: hidden operator-overloading tools build and evaluate the partial derivative models.

Arbitrarily large and complex problems are handled because OCEA exploits generalized scalar operations for all numerical calculations. Operator-overloading techniques transform the computer's intrinsic operators and functions for enabling these advanced capabilities. A two-stage transformation process is employed. First, each OCEA-transformed scalar becomes an abstract compound data object, consisting of the original scalar plus hidden artificial dimensions for performing and storing the partial derivative calculations. Second, multiple levels of the chain rule of calculus are embedded in each mathematical operator and library function. An OCEA module consists of a linked set of intrinsic operators, math library functions, embedded chain rules, and composite function rules.
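The two-stage transformation can be illustrated with a minimal first-order sketch in Python (OCEA itself is implemented in FORTRAN 90 and MACSYMA and carries partials through fourth order; the class below is a simplified stand-in, not the OCEA API):

```python
class Dual:
    """Compound data object: the original scalar value plus a hidden
    first-derivative slot.  Overloaded operators embed one level of the
    chain rule, so derivatives accumulate without analyst intervention."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def _wrap(self, other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._wrap(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __sub__(self, other):
        other = self._wrap(other)
        return Dual(self.val - other.val, self.der - other.der)

    def __mul__(self, other):
        other = self._wrap(other)
        # Hidden product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

# One evaluation of f(x) = x*x - 2 yields f and the exact f' as a by-product.
x = Dual(1.5, 1.0)        # seed the independent variable with dx/dx = 1
fx = x * x - 2.0
```

The caller writes the function in standard form; the derivative value emerges from the hidden slot with no change to the algorithm's structure.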

For the root-solving and optimization problems considered in this paper, automated tools virtually eliminate the time-consuming, error-prone process of modeling, coding, and verifying sensitivity calculations. Many algorithms only require the insertion of a USE Module statement and some minor changes of variable types, including matrix operations4-6.

OCEA Algorithm

OCEA methodology is a transformational process that changes functional and partial derivative data into new forms during calculations. A single OCEA function evaluation generates exact numerical values for the function as well as hidden values for the Jacobian, and higher-order partial derivatives. All of the mathematical library functions require generalized composite function transformations for linking current calculations with all previous sensitivity calculations. Module functions support a mixed object/data type computational environment, where integer, real, double precision, complex, and OCEA data types can co-exist. The partial derivatives are extracted by utility routines as a post-processing step. Fortran 90 (F90) and Macsyma 2.4 OCEA prototype codes have been developed.

The development of an OCEA toolbox for engineering and scientific applications is addressing the following seven software issues: defining how independent variables are transformed to OCEA form; developing derived data types for vectors, tensors, and embedded variables; defining interface operators for supporting generalized operations; using Module functions to hide OCEA computational resources; defining OCEA-enhanced library routines that encode chain rule models; providing utility routines to access the OCEA partial derivative calculations; and developing application suites of software for solving broad classes of mathematical programming problems. Second-order OCEA models are presented for discussing each of these issues.

Data Structures. Each scalar variable is modeled as a compound data object consisting of a concatenation of the original scalar variable and its first- and higher-order partial derivatives. Compound data objects are created using derived data types. The partial derivative models are not visible to the user during calculations, and can be thought of as hidden artificial dimensions for the transformed scalar variables. A significant benefit of employing hidden artificial dimensions is that the structure and computational sequencing of standard algorithms is not impacted.
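A minimal second-order version of such a compound data object might look like the following sketch (illustrative only; OCEA's actual derived types and storage layout differ):

```python
class Dual2:
    """Second-order compound data object: a concatenation of the scalar
    value with its first and second derivatives (hidden artificial
    dimensions), so standard algorithm structure is unchanged."""
    def __init__(self, f, f1=0.0, f2=0.0):
        self.f, self.f1, self.f2 = f, f1, f2

    def __mul__(self, other):
        if not isinstance(other, Dual2):
            other = Dual2(other)
        # Product rule carried through second order:
        # (uv)'' = u''v + 2u'v' + uv''
        return Dual2(self.f * other.f,
                     self.f1 * other.f + self.f * other.f1,
                     self.f2 * other.f + 2.0 * self.f1 * other.f1
                     + self.f * other.f2)

x = Dual2(3.0, 1.0, 0.0)   # independent variable: value 3, dx/dx = 1
y = x * x * x              # y = x^3; hidden slots carry y' and y''
```

Evaluating y = x*x*x at x = 3 produces the function value 27 along with y' = 3x^2 = 27 and y'' = 6x = 18 in the hidden slots, with no change to how the expression is written.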