Kernel CMAC with Improved Capability
Gábor Horváth, Senior Member, IEEE, and Tamás Szabó
Abstract— The Cerebellar Model Articulation Controller (CMAC) has some attractive features: fast learning capability and the possibility of efficient digital hardware implementation. Although CMAC was proposed many years ago, several questions have remained open even to this day. Among them the most important ones concern its modeling and generalization capabilities. The limits of its modeling capability were addressed in the literature, and recently certain questions of its generalization property were also investigated. This paper deals with both the modeling and generalization properties of CMAC. First it introduces a new interpolation model, then it gives a detailed analysis of the generalization error and presents an analytical expression of this error for some special cases. It shows that this generalization error can be rather significant, and it proposes a simple regularized training algorithm to reduce this error. The results related to the modeling capability show that there are differences between the one-dimensional and the multidimensional versions of CMAC. This paper discusses the reasons for this difference and suggests a new kernel-based interpretation of CMAC. The kernel interpretation gives a unified framework; applying this approach, both the one-dimensional and the multidimensional CMACs can be constructed with similar modeling capability. Finally the paper shows that the regularized training algorithm can be applied to the kernel interpretations too, resulting in a network with significantly improved approximation capabilities.
Index Terms— Cerebellar Model Articulation Controller, neural networks, modeling, generalization error, kernel machines, regularization.
I. INTRODUCTION
Cerebellar Model Articulation Controller (CMAC) [1] - a special neural architecture which belongs to the family of feed-forward networks with a single linear trainable layer - has some attractive features. The most important ones are its extremely fast learning capability and the special architecture that makes efficient digital hardware implementation possible [2], [3]. The CMAC architecture was proposed by Albus in the middle of the seventies [1] and it is considered a real alternative to the MLP and other feed-forward neural networks [4]. Although the properties of the CMAC network were analyzed mainly in the nineties [5]-[14], some interesting features were only recognized in recent years. These results show that the attractive properties of the CMAC have a price: both its modeling and generalization capabilities are inferior to those of an MLP. This is especially true for multivariate cases, as multivariate CMACs can learn to reproduce the training points exactly only if the training data come from a function belonging to the additive function set [8].
The modeling capability can be improved if the complexity of the network is increased. More complex versions were proposed in [10], but as the complexity of the CMAC depends on the dimension of the input data, in multivariate cases the high complexity can be an obstacle to implementation in any case.
Concerning the generalization capability, the first results were presented in [15], where it was shown that the generalization error can be rather significant. Here we deal with both the generalization and the modeling properties. We derive an analytical expression of the generalization error. The derivation is based on a novel piecewise linear filter model of the CMAC network that connects the CMAC to polyphase multirate filters [16]. The paper also proposes a simple regularized training rule to reduce the generalization error.
The second main result of the paper is that a CMAC network can be interpreted as a kernel machine. This kernel interpretation gives a general framework: it shows that a CMAC corresponds to a kernel machine with second-order B-spline kernel functions. Moreover, the paper shows that using the kernel interpretation it is possible to improve the modeling capability of the network without increasing its complexity, even in multidimensional cases.
The kernel interpretation may suffer from the same poor generalization capability; however, it will be shown that the previously introduced regularization can be applied to the kernel CMAC too. This means that using a kernel CMAC with regularization, both the modeling and the generalization capabilities can be improved significantly. Moreover, the paper shows that, similarly to the original CMAC, even the regularized kernel versions can be trained iteratively, which may be important in applications where real-time on-line adaptation is required.
The paper is organized as follows. In Section II a brief summary of the original CMAC and its different extensions is given. This section surveys the most important features, advantages and drawbacks of these networks. In Section III the new interpolation model is introduced. Section IV deals with the generalization property and - for some special cases - gives an analytical expression of the generalization error. Section V presents the proposed regularized training rule. In Section VI the kernel interpretation is introduced. This section shows which kernel functions correspond to the classical binary CMACs. It also shows that with a properly constructed kernel function it is possible to improve the modeling capability for multivariate cases. Finally it presents the regularized version of the kernel CMAC, and shows that besides the analytical solutions both kernel CMAC versions can also be trained iteratively. Section VII discusses the network complexity and gives some examples to illustrate the improved properties of the proposed kernel CMACs. Some mathematical details are given in the Appendices.
II. A SHORT OVERVIEW OF THE CMAC
A. The classical binary CMAC
CMAC is an associative memory type neural network which performs two subsequent mappings. The first one - which is a nonlinear mapping - projects an input space point into an M-bit sparse binary association vector a which has only C < M active elements: C bits of the association vector are ones and the others are zeros. The second mapping calculates the output of the network as the scalar product of the association vector a and the weight vector w:
$y(\mathbf{u}) = \mathbf{a}(\mathbf{u})^{T}\mathbf{w}$   (1)
CMAC uses quantized inputs and there is a one-to-one mapping between the discrete input data and the association vectors, i.e. each possible input point has a unique association vector representation.
Another interpretation can also be given to the CMAC. In this interpretation, for an N-variate CMAC every bit in the association vector corresponds to a binary basis function with a compact N-dimensional hypercube support. The size of the hypercube is C quantization intervals in each dimension. This means that a bit of the association vector will be active if and only if the input vector is within the support of the corresponding basis function. This support is often called the receptive field of the basis function [1].
The two mappings are implemented in a two-layer network architecture (see Fig. 1). The first layer implements a special encoding of the quantized input data. This layer is fixed. The second layer is a linear combiner with the trainable weight vector w.
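To make the two mappings concrete, the following minimal Python sketch (an illustrative, hypothetical implementation, not code from any of the cited works) builds the sparse binary association vector of a one-dimensional binary CMAC and evaluates the linear second layer according to (1).

import numpy as np

class BinaryCMAC1D:
    """Illustrative one-dimensional binary CMAC with r discrete input values
    and generalization parameter C (C active bits per association vector)."""

    def __init__(self, r, C):
        self.r = r                  # number of quantized input values
        self.C = C                  # width of the receptive fields
        self.M = r + C - 1          # number of basis functions = size of the weight memory
        self.w = np.zeros(self.M)   # trainable linear second layer

    def association(self, u):
        """First, fixed layer: map the quantized input u (0 <= u < r) to an
        M-bit sparse binary vector with exactly C active elements."""
        a = np.zeros(self.M)
        a[u:u + self.C] = 1.0       # the C overlapping basis functions whose receptive fields cover u
        return a

    def output(self, u):
        """Second layer: scalar product of the association vector and the weight vector, Eq. (1)."""
        return self.association(u) @ self.w

Shifting the input by one quantization step changes only two of the C active bits, so neighbouring inputs share most of their association vectors; this overlap is the source of the CMAC's local generalization.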
B. The complexity of the network
The way of encoding - the positions of the basis functions in the first layer - and the value of C determine the complexity as well as the modeling and generalization properties of the network. Network complexity can be characterized by the number of basis functions, which is the same as the size of the weight memory. In the one-dimensional case with r discrete input values a CMAC needs $r + C - 1$ weight values. However, if we follow this rule in multivariate cases, the number of basis functions and the size of the weight memory will grow exponentially with the input dimension. If there are $r_i$ discrete values for the i-th input dimension, an N-dimensional CMAC needs $\prod_{i=1}^{N}(r_i + C - 1)$ weight values. In multivariate cases the weight memory can be so huge that practically it cannot be implemented. As an example consider a ten-dimensional binary CMAC where all input components are quantized into 10 bits. In this case the size of the weight memory would be enormous, approximately $2^{100}$.

Fig. 1. The mappings of a CMAC
To avoid this high complexity the number of basis functions must be reduced. In a classical multivariate CMAC this reduction is achieved by using basis functions positioned only along the diagonals of the quantized input space, as shown in Fig. 2.
Fig. 2. The receptive fields of a two-variable Albus CMAC
The shaded regions in the figure are the receptive fields of the different basis functions. As shown, the receptive fields are grouped into overlays. One overlay contains basis functions with non-overlapping supports, but the union of these supports covers the whole input space, that is, each overlay covers the whole N-dimensional input lattice. The different overlays have the same structure; they consist of similar basis functions in shifted positions. Every input point selects C basis functions, each on a different overlay, so in every overlay one and only one basis function is active for any input point.
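As a small illustration of this selection (a sketch only, assuming the diagonal Albus placement described in the next paragraph, in which overlay j is shifted by j quantization steps along every dimension), the unique active receptive field of each overlay can be located by integer division:

def active_basis_functions(u, C):
    """Return the C active basis functions of a binary Albus CMAC for the quantized
    input vector u: one per overlay, identified by the overlay index and the cell
    indices of its receptive field along each dimension."""
    active = []
    for j in range(C):                           # overlay j is shifted by j steps per dimension
        cell = tuple((ui - j) // C for ui in u)  # floor division locates the single covering cell
        active.append((j, cell))
    return active

# Two-variable example with C = 4: the input (5, 9) activates exactly one
# receptive field on each of the four overlays.
print(active_basis_functions((5, 9), C=4))
# [(0, (1, 2)), (1, (1, 2)), (2, (0, 1)), (3, (0, 1))]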
The positions of the overlays and the basis functions of one overlay are represented by definite points. In the original Albus scheme the overlay-representing points lie on the main diagonal of the input space, whereas the basis-function positions are represented by the sub-diagonal points, as shown in Fig. 2 (black dots). In the original Albus architecture the number of overlays does not depend on the dimension of the input vectors; it is always C. This is in contrast to the so-called full-overlay version, where $C^N$ overlays are used. In a multivariate Albus CMAC the number of basis functions will not grow exponentially with the input dimension; it will be approximately $C\prod_{i=1}^{N}\left(\lfloor (r_i - 1)/C \rfloor + 1\right)$, where $\lfloor\cdot\rfloor$ denotes the integer part. Using this architecture with C = 32 there would be “only” about $2^{55}$ weight values in the previous example. Although this is a great reduction, the size of the memory is still prohibitively large, so further complexity reduction is required.
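The two memory sizes can be checked with a few lines of code (a sketch; the example in the text does not fix the generalization parameter, so C = 32 is assumed here purely for illustration):

import math

def full_lattice_weights(r, C):
    """Weight-memory size when a basis function is placed at every lattice position:
    prod_i (r_i + C - 1)."""
    return math.prod(ri + C - 1 for ri in r)

def albus_weights(r, C):
    """Approximate weight-memory size of the Albus CMAC with C diagonally placed overlays:
    C * prod_i ceil(r_i / C)."""
    return C * math.prod(math.ceil(ri / C) for ri in r)

r = [2 ** 10] * 10   # ten inputs, each quantized to 10 bits
C = 32               # assumed generalization parameter
print(f"full lattice: about 2^{math.log2(full_lattice_weights(r, C)):.0f}")  # ~ 2^100
print(f"Albus CMAC:   about 2^{math.log2(albus_weights(r, C)):.0f}")         # ~ 2^55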
In the Albus binary CMAC hash-coding is applied for this purpose. It significantly reduces the size of the weight memory; however, it can result in collisions of the mapped weights and in some unfavorable effects on the convergence of CMAC learning [17], [18]. As will be seen later, the proposed new interpretation solves the complexity problem without the application of hashing, so we will not deal with this effect.
The complexity reduction has an unwanted consequence too, even without hashing. An arbitrary classical binary multivariate CMAC can reproduce the training points exactly only if they are obtained from an additive function [8]. For more general cases there will be a modeling error, i.e. an error at the training points. More details can be found in [9].
Another way of reducing the complexity is to decompose a multivariate problem into many smaller-dimensional ones [19]-[26]. References [19]-[22] propose modular architectures where a multivariate network is constructed as a sum of several submodules, and the submodules are formed from low-dimensional (one- or two-dimensional) CMACs. The output of a submodule is obtained as a product of the elementary low-dimensional CMACs (sum-of-product structure). In [23] a hierarchical architecture called HCMAC is proposed, which is a multilayer network with $\log_2 N$ layers. Each layer is built from two-variable elementary CMACs, where these elementary networks use finite-support Gaussian basis functions.
A similar multilayer architecture is introduced in [24]. The MS_CMAC, and its higher-order version HMS_CMAC [25], apply one-dimensional elementary networks. The resulting hierarchical, tree-structured network can be trained using a time-inversion technique [27]. Although all these modular CMAC variants are significantly less complex, the complexity of the resulting networks is still rather high. A further common drawback of all these solutions is that their training is more complex. Either a proper version of the backpropagation algorithm should be used (e.g. for sum-of-product structures and for HCMAC) or several one-dimensional CMACs should be trained independently (in the case of MS_CMAC). Moreover, in this latter case all of the simple networks except those in the first layer need training even in the recall phase, which greatly increases the recall time. A further deficiency of MS_CMAC is that it can be applied only if the training points are positioned at regular grid points [24].
C. The capability of CMAC
An important feature of the binary CMAC is that it applies rectangular basis functions. Consequently the output - even in the most favorable situations - will be piecewise linear, and derivative information is not preserved. For modeling continuous functions the binary basis functions should be replaced by smooth basis functions defined over the same finite supports (hypercubes) as the binary functions in the Albus CMAC. From the beginning of the nineties many such higher-order CMACs have been developed. For example, Lane et al. [10] use higher-order B-spline functions to replace the binary basis functions, while Chiang and Lin apply finite-support Gaussian basis functions [12]. Gaussian basis functions are used in the HCMAC network too [23].
When higher-order basis functions are applied, a CMAC can be considered as an adaptive fuzzy system. This approach is also frequently applied; see, e.g., [28]-[31]. Higher-order CMACs result in a smooth input-output mapping, but they do not reduce the network complexity, as the number of basis functions and the size of the weight memory are not decreased. Moreover, when higher-order basis functions are applied, the multiplierless structure of the original binary Albus CMAC is lost.
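To illustrate in one dimension what such a replacement means (a sketch under the simplifying assumption that each basis function is parameterized only by its center and the common support width C; it does not reproduce any of the cited architectures exactly), the rectangular basis function can be exchanged for a second-order (triangular) B-spline or for a truncated Gaussian defined over the same receptive field:

import numpy as np

def rectangular_basis(u, center, C):
    """Binary Albus basis function: 1 inside the receptive field of width C, 0 outside."""
    return 1.0 if abs(u - center) <= C / 2 else 0.0

def bspline2_basis(u, center, C):
    """Second-order B-spline over the same support: piecewise linear, peaking at the center."""
    return max(0.0, 1.0 - abs(u - center) / (C / 2))

def gaussian_basis(u, center, C, width_ratio=0.25):
    """Finite-support Gaussian: a Gaussian bump truncated to the receptive field of width C."""
    if abs(u - center) > C / 2:
        return 0.0
    return float(np.exp(-0.5 * ((u - center) / (width_ratio * C)) ** 2))

With either smooth choice the mapping becomes smooth, but every active basis function now contributes a real-valued product, which is exactly why the multiplierless evaluation of the binary CMAC is lost.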
Besides basis-function selection, CMACs can be generalized in many other respects. The idea of multi-resolution interpolation was suggested by Moody [32]. He proposed a hierarchical CMAC architecture with L levels, where quantization lattices of different resolution are applied at the different levels. In [33] Kim et al. proposed a method to adapt the resolution of the receptive fields according to the variance of the function to be learned. The generalized CMAC [13] assigns different generalization parameters C to the different input dimensions. The root idea behind all these proposals is that for functions with more variability a more accurate representation can be achieved by decreasing the generalization parameter and increasing the number of training points. But using a smaller C without having more training data will reduce the generalization capability. The multi-resolution solutions try to find a good compromise between these contradictory requirements.