Improvement of the Recognition Module of WinBank
6.199 Advanced Undergraduate Project
Daniel González
MIT EECS 2002
Advisors: Professor Amar Gupta
Dr. Rafael Palacios
Table of Contents
1. Introduction
2. Background
2.1 WinBank
2.1.1 Preprocessing Module
2.1.2 Recognition Module
2.1.3 Postprocessing Module
2.2 Neural Networks
3. Procedure
3.1 Creation
3.2 Training
3.3 Testing
3.4 Evaluation
4. Network Parameters
4.1 Neural Network Architecture
4.1.1 Hidden Layer Size
4.1.2 Network Type
4.1.3 Transfer Functions
4.2 Neural Network Training
4.2.1 Performance Functions
4.2.2 Training Algorithms
5. Results
5.1 Feed-Forward Network Results
5.1.1 Hidden Layer Sizes
5.1.2 Transfer Functions
5.1.3 Performance Functions
5.1.4 Training Algorithms
5.1.5 Total Network Analysis
5.2 LVQ Network Results
5.3 Elman Network Results
6. Conclusion
References
Appendix A: MATLAB Code
1. Introduction
More than 60 billion checks are written annually in the United States alone. The current system for processing these checks involves human workers who read the values from the checks and enter them into a computer system. Two readers are used for each check to increase accuracy. This method of processing checks requires an enormous amount of overhead. Because such a large number of checks are written annually, even a small reduction in the cost of processing a single check adds up to significant savings. WinBank is a program that is being created to automate check processing, drastically reducing the time and money spent processing checks.
WinBank receives the scanned image of a check as input, and outputs the value for which the check was written. This process of translating physical text (in this case, hand-written numerals) into data that can be manipulated and understood by a computer is known as Optical Character Recognition (OCR). WinBank implements OCR through heavy use of a concept from artificial intelligence known as neural networks. Neural networks can be used to solve a variety of problems and are a particularly good method for solving pattern recognition problems. The effectiveness of a neural network at solving problems depends on many different network parameters, including its architecture and the process by which a network is taught to solve problems (known as training).
This paper explores the different neural network architectures considered for use in WinBank and the processes used to train them. The following section presents background information on WinBank and neural networks, and is followed by a discussion of the procedures used to test the different types of neural networks considered. This procedural information is followed by an explanation of the different parameters (and their associated values) used for creating and training the networks. The next section presents the values obtained from evaluating the performance of the neural networks. The final section identifies the best neural network for use in WinBank, as well as other neural networks that may be useful in other problems.
2. Background
The main focus of this paper is the module of WinBank that uses neural networks to recognize handwritten numbers. However, a brief overview of the entire WinBank system and background information on neural networks are presented here for the readers’ benefit.
2.1 WinBank
The Productivity From Information Technology Initiatives (PROFIT) group at MIT’s Sloan School of Management is developing a program called WinBank in an effort to automate check processing in both the United States and Brazil. WinBank achieves this automation by implementing OCR with a heavy dependence on neural networks. The program is organized into three main modules that combine to implement OCR: the preprocessing module, the recognition module, and the postprocessing module.
2.1.1 Preprocessing Module
The preprocessing module takes the scanned image of a check as input, and outputs binary images in a format that is useful for the recognition module. The preprocessing module first analyzes the scanned image to determine the location of the courtesy amount block (CAB). The CAB is the location on the check that contains the dollar amount of the check in Arabic numerals (figure 1). After determining the location of the CAB, the preprocessing module next attempts to segment the value written in the CAB into individual digits. These segments are then passed through a normalization procedure designed to make all of the characters a uniform size and a uniform thickness. The preprocessed images are then individually output to the recognition module.
Figure 1: Courtesy Amount Block (CAB) Location
2.1.2 Recognition Module
The recognition module is the main engine that attempts to classify the number represented by each image received from the preprocessing module. The recognition module feeds the output obtained from the preprocessing module into a neural network. The neural network then attempts to identify the value represented by this input and outputs a number from zero to nine.
2.1.3 Postprocessing Module
The postprocessing module receives the output of the recognition module and gauges the strength of the recognition module’s guess. If the postprocessing module is not satisfied that the recognition module has output a correct value, then either the entire process begins again (making different decisions along the way) or the check is rejected and a human steps in to identify the value of the check. If the postprocessing module is satisfied with the recognition module’s output, then it outputs this value as the output of WinBank.
Figure 2: The three major modules of WinBank
2.2 Neural Networks
Artificial neural networks are modeled after the organic neural networks in the brain of an organism. The fundamental unit of an organic neural network is the neuron. Neurons receive input from one or more different neurons. The strength of the effect that each input has on a neuron depends on the neuron’s proximity to the neuron from which it received the input [3]. If the combined value of these inputs is strong enough, then the neuron receiving these signals outputs a brief pulse. When neurons combine with many other neurons (there are approximately 10^11 neurons in the human brain [3]) to form networks, an organism can learn to think and make decisions.
Figure 3: Real Neuron (left), Model of an Artificial Neuron (right)
Although much simpler, artificial neural networks perform much the same way as organic neural networks. Artificial neurons receive inputs from other neurons. The strength of the effect that each input has on a neuron is determined by a weight associated with the input. The receiving neuron then takes the sum of these weighted inputs and outputs a value according to its transfer function (and possibly a bias value). Neurons can be combined into sets of neurons called layers. The neurons in a layer do not interconnect with each other, but interconnect with neurons in other layers. A neural network is made up of one or more neurons, organized into one or more layers. The layer that receives the network input is called the input layer and the layer that outputs the network output is called the output layer. Neural networks can have one or more layers between the input and output layers. These layers are called hidden layers. Two major components that contribute to the effectiveness of a neural network at solving a particular problem are its architecture and the method by which it is trained.
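The neuron model described above can be sketched in a few lines of MATLAB. This is only an illustration: the inputs, weights, and bias are made-up values, and the logarithmic-sigmoidal transfer function is written out by hand rather than taken from the Neural Network Toolbox.

```matlab
% Illustrative single artificial neuron (all numeric values are hypothetical).
p = [1; 0; 1];          % inputs received from three other neurons
w = [0.5 -0.3 0.8];     % one weight per input
b = -0.2;               % bias value

n = w * p + b;          % net input: weighted sum of the inputs plus the bias
a = 1 / (1 + exp(-n));  % output after a logarithmic-sigmoidal transfer function
```

A larger weight makes the corresponding input contribute more to the net input n, which is how a weight plays the role of connection strength.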
Different neural networks can have different architectures. In this paper, the following parameters are considered when discussing neural network architecture: hidden layer size, the type of network, and the transfer function or functions used at each layer.
In order for a neural network to learn how to correctly solve a problem, appropriate network connections and their corresponding weights must be determined through a process called training. There are many different algorithms used for training a neural network. The various training procedures and neural network architectures considered for use in WinBank are presented in later sections.
Figure 4: Basic Neural Network Structure
3. Procedure
Many different types of neural networks were designed, created, trained, tested, and evaluated in an effort to find the appropriate neural network architecture and training method for use in WinBank. These networks were evaluated according to the main goal of WinBank: decrease the overhead involved in check processing as much as possible while achieving the highest possible degree of accuracy. Neural networks that decrease the overhead involved in check processing are fast and require little human intervention, while neural networks that achieve a high degree of accuracy make the fewest errors when classifying numbers. This section discusses the procedure used to create, train, test, and evaluate the various neural networks according to this goal.
The creation, training, and testing of each neural network was done using the MathWorks software package MATLAB. MATLAB contains a “Neural Network Toolbox” that facilitates rapid creation, training, and testing of neural networks. MATLAB was chosen for WinBank development because this toolbox saves an enormous amount of programming effort.
3.1 Creation
Creating a neural network is simply a matter of calling the appropriate MATLAB function and supplying it with the necessary information. For example, the following code creates a new feed-forward network that uses the logarithmic-sigmoidal transfer function in both layers and trains its neurons with the resilient backpropagation training algorithm:
net = newff(mm, [25 10], {'logsig' 'logsig'}, 'trainrp');
This network has an input layer, a hidden layer consisting of 25 neurons, and an output layer consisting of 10 neurons. mm is a matrix of size number_of_inputs x 2. Each row contains the minimum and maximum value that a particular input node can have. See appendix A for more MATLAB code that can be used to create and analyze other neural networks.
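As a rough sketch of how a matrix like mm could be assembled from sample data, the snippet below takes the row-wise minimum and maximum of a small data matrix. The matrix P and its contents are hypothetical and stand in for real input samples; only base MATLAB functions are used.

```matlab
% Build a min/max matrix from sample data (P is a hypothetical example).
% Each column of P is one input vector; each row corresponds to one input node.
P = [0 1 1 0;
     1 1 0 1;
     0 0 1 1];                        % 3 input nodes, 4 samples

mm = [min(P, [], 2), max(P, [], 2)];  % number_of_inputs x 2, each row is [min max]
```

For binary images every row of mm would simply be [0 1], so the matrix could also be written out directly.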
3.2 Training
Neural networks are useful for OCR because they can often generalize and correctly classify inputs they have not previously seen. In order to reach a solid level of generalization, large amounts of data must be used during the training process. We used data from the National Institute of Standards and Technology’s (NIST) Special Database 19: Handprinted Forms and Characters Database.
NIST Special Database 19 (SD19) contains Handwriting Sample Forms (HSFs) from 3699 different writers (figure 5). Each HSF has thirty-four fields used to gather samples of letters and numbers. Some fields were randomly generated for each HSF to obtain a larger variety of samples. Twenty-eight of the thirty-four fields are digit fields. SD19 contains scanned versions of each HSF (11.8 dots per millimeter) as well as segmented versions of the HSFs, allowing for easy access to specific samples.
Digit samples were obtained from SD19 for use in training and testing the neural networks. Once obtained, the samples were normalized so that each sample was upright and of the same thickness. Some of these samples were used to create a training set and others were used to create a validation set. A training set is used to update network weights and biases, while a validation set is used to help prevent overfitting. After training, each network went through a testing procedure to gather data for evaluation of its usefulness in WinBank.
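One way to divide samples into a training set and a validation set is sketched below. The split ratio, the variable names, and the random placeholder data are all assumptions made for illustration; the text does not state how the actual sets were chosen. The 117-row and 10-row sizes match the input and output sizes given in Section 4.1.1.

```matlab
% Hypothetical split of normalized digit samples into training and
% validation sets. P holds one sample per column, T the matching targets.
numSamples = 20;
P = rand(117, numSamples);          % placeholder for normalized samples
T = rand(10, numSamples);           % placeholder for target vectors

idx    = randperm(numSamples);      % shuffle the sample order
numVal = round(0.2 * numSamples);   % hold out 20% (assumed ratio)

valIdx   = idx(1:numVal);
trainIdx = idx(numVal+1:end);

trainP = P(:, trainIdx);  trainT = T(:, trainIdx);  % used to update weights and biases
valP   = P(:, valIdx);    valT   = T(:, valIdx);    % used to watch for overfitting
```

Keeping the two index sets disjoint matters: a validation sample that also appears in the training set cannot reveal overfitting.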
Figure 5: Handwriting Sample Form from SD19
3.3 Testing
Two different sets of data were obtained in order to test each network. The first set of data consisted of 10,000 samples from SD19 (1,000 samples per digit). These samples were presented to each network using the sim function of MATLAB. Network-specific procedures were then used to compare the output of each neural network against the desired outputs. The second set of data used to test each network was a set of multiples.
A multiple occurs when image segmentation fails to recognize two adjacent numbers as individual numbers and presents the recognition module with one image of two numbers (figure 6). Because a multiple is not a number, a multiple should be sent back to the preprocessing module for resegmentation. In order to test the different neural networks on multiples, multiples from several checks were used to create a testing set of multiples.
Figure 6: Example of a multiple (double zero)
3.4 Evaluation
Running a network simulation in MATLAB produces a matrix of outputs. This matrix of actual network outputs can be compared to a target matrix of desired network outputs to evaluate the performance of each network. Here, the main goal of WinBank should be divided into its two components: the accuracy of a network, and its ability to reduce processing overhead. Several parameters were obtained from each network test to evaluate the performance of each network according to these goals. The percentage of correct outputs (GOOD), the percentage of incorrect outputs (WRONG), and the percentage of rejected outputs (REJECT) were obtained from the SD19 test set. The ideal network maximizes GOOD while minimizing REJECT and WRONG. MULTIPLES REJECTED and NUMBER are two parameters obtained from testing the networks on the testing set of multiples. MULTIPLES REJECTED is the percentage of multiples rejected by the network, and should be maximized. NUMBER is the percentage of multiples classified as numbers, and should be minimized. Another useful value for network evaluation is the amount of time spent training the network.
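The comparison of an output matrix against a target matrix can be sketched as follows. The rejection rule used here (reject a sample when its strongest output falls below a fixed threshold) is an assumption for illustration only, since the actual network-specific procedures are not described in this section. The output values and the threshold are likewise made up.

```matlab
% Sketch of computing GOOD, WRONG, and REJECT from simulation outputs.
% Y is the 10 x N matrix of actual outputs, T the 10 x N target matrix.
% ASSUMPTION: a sample is rejected when its highest output is below a threshold.
Y = [0.9 0.1 0.4 0.2;
     0.1 0.8 0.3 0.7;
     zeros(8, 4)];                   % hypothetical outputs for 4 samples
T = zeros(10, 4);
T(1, 1) = 1; T(2, 2) = 1; T(1, 3) = 1; T(1, 4) = 1;   % desired outputs

threshold = 0.5;                     % assumed confidence threshold

[best, guess]     = max(Y, [], 1);   % strongest output node for each sample
[unused, desired] = max(T, [], 1);   % target node for each sample

rejected = best < threshold;
correct  = (guess == desired) & ~rejected;
wrong    = (guess ~= desired) & ~rejected;

N      = size(Y, 2);
GOOD   = 100 * sum(correct)  / N;    % percentage of correct outputs
WRONG  = 100 * sum(wrong)    / N;    % percentage of incorrect outputs
REJECT = 100 * sum(rejected) / N;    % percentage of rejected outputs
```

Under this rule the three percentages always sum to 100, so raising the threshold trades WRONG (and GOOD) for REJECT.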
Important data for each neural network trained and tested was maintained in a MATLAB struct array named netData. Each element of the netData struct array has a field for each important value, such as the training time (obtained using MATLAB’s tic and toc functions) and the hidden layer size of the network. This struct array allowed for easy storage of and access to important information.
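A record-keeping structure of this kind might look like the sketch below. The field names are hypothetical (the actual fields of netData are not listed here), and a dummy computation stands in for the call that trains a network.

```matlab
% Hypothetical netData struct array; the field names are illustrative only.
netData = struct('hiddenSize', {}, 'trainTime', {}, 'good', {});

hiddenSizes = [25 50];
for k = 1:numel(hiddenSizes)
    tic;                                 % start the timer
    dummy = sum(sum(rand(200)));         % stand-in for training a network
    elapsed = toc;                       % seconds elapsed since tic

    netData(k).hiddenSize = hiddenSizes(k);
    netData(k).trainTime  = elapsed;
    netData(k).good       = NaN;         % to be filled in after testing
end
```

Values can then be read back by index and field name, for example netData(2).hiddenSize.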
4. Network Parameters
The following parameters were varied during the creation and training of the neural networks:
- hidden layer size
  - 25
  - 50
  - 85
- network type
  - feed-forward
  - learning vector quantization
  - Elman
- transfer function used at network layers
  - logarithmic-sigmoidal
  - tangential-sigmoidal
  - hard limit
  - linear
  - competitive
- performance function
  - least mean of squared errors
  - least sum of squared errors
- training algorithm
  - batch gradient descent with momentum
  - resilient backpropagation
  - BFGS
  - Levenberg-Marquardt
  - random
4.1 Neural Network Architecture
4.1.1 Hidden Layer Size
Each neural network tested for use in WinBank had the same base structure. The input layer consisted of 117 nodes that receive input from the preprocessing module. These nodes correspond to the 13 x 9 pixels of the normalized binary image produced by the preprocessing module. The output layer consisted of 10 nodes, the output of which is ideally high at the output node corresponding to the appropriate digit, and low at every other output node. The hidden layer structure, however, is architecture dependent. The number of hidden layers is not an important factor in the performance of a network because it has been rigorously proven that one hidden layer can match the performance achieved with any number of hidden layers [2]. Because of this, all of the neural networks tested were implemented using only one hidden layer. The size of the hidden layer, however, is an important factor. Three values were tested for the number of nodes in the hidden layer of each neural network architecture: 25, 50, and 80. These values were chosen based on previous experience, and provide a diverse group of values without creating excessive computation.
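The input and output encoding described above can be illustrated with a short sketch. The image contents are made up, and two details are assumptions: the order in which pixels are mapped to input nodes (column by column here) and the mapping of output node k to digit k-1.

```matlab
% Sketch of the input and target encoding (image contents are made up).
img = zeros(13, 9);          % normalized 13 x 9 binary image
img(2:12, 5) = 1;            % crude vertical stroke resembling a "1"

p = reshape(img, 117, 1);    % 117 x 1 input vector, one element per pixel
                             % ASSUMPTION: pixels are taken column by column

digit = 1;                   % digit represented by the image
t = zeros(10, 1);            % one target value per output node
t(digit + 1) = 1;            % ASSUMPTION: node k stands for digit k-1;
                             % high at that node, low at every other node
```

Stacking many such p vectors as columns gives the input matrix for training, and stacking the matching t vectors gives the target matrix.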
Figure 7: Basic Neural Network Architecture
4.1.2 Network Type
There are a variety of network types that can be used when creating neural networks. The network type can determine various network parameters, such as the type of neurons that are present in each layer and the method by which network layers are interconnected. Past experience indicates that feed-forward networks work very well for OCR. Because of this, much more time was spent analyzing feed-forward networks than any other type of network. The following types were evaluated for use in WinBank:
- Feed-forward neural networks (also known as multi-layer perceptrons) are made up of two or more layers of neurons. The output of each layer is simply fed into the next layer, hence the name feed-forward networks. Each layer can have a different transfer function and size.
- Learning Vector Quantization (LVQ) networks consist of an input layer, a hidden competitive layer, and an output linear layer. Competitive layers output zero for all neurons except for the neuron that is associated with the most positive element of the net input, which outputs one. The linear layer transforms the competitive layer’s output into target classifications defined by the user [1].
- Elman networks are a type of recurrent network that consists of two feed-forward layers and has feedback from the first layer’s output to the first layer’s input. The neurons of the hidden layer have a tangential-sigmoidal transfer function, and the neurons of the output layer have a linear transfer function [1].
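To make the feed-forward case concrete, the sketch below pushes one input vector through a network with one hidden layer and one output layer. The weights are random placeholders rather than trained values, the variable names are hypothetical, and the sigmoid is written by hand instead of using the Toolbox.

```matlab
% Forward pass through a two-layer feed-forward network with made-up weights.
sigmoid = @(n) 1 ./ (1 + exp(-n));     % hand-written logarithmic sigmoid

numIn = 117; numHidden = 25; numOut = 10;
W1 = 0.1 * randn(numHidden, numIn);    % hidden layer weights (untrained)
b1 = zeros(numHidden, 1);              % hidden layer biases
W2 = 0.1 * randn(numOut, numHidden);   % output layer weights (untrained)
b2 = zeros(numOut, 1);                 % output layer biases

p  = double(rand(numIn, 1) > 0.5);     % hypothetical binary input vector

a1 = sigmoid(W1 * p + b1);             % hidden layer output
a2 = sigmoid(W2 * a1 + b2);            % network output, fed forward from a1
```

An Elman network differs in that the previous hidden output a1 would also be fed back as an extra input to the hidden layer on the next step.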
4.1.3 Transfer Functions
Each neuron uses a transfer function in order to determine its output based on its input. The following five transfer functions have been tested for use in WinBank:
- The logarithmic-sigmoidal transfer function takes an input valued between negative infinity and positive infinity and outputs a value between zero and positive one.
- The tangential-sigmoidal transfer function takes an input valued between negative infinity and positive infinity and outputs a value between negative one and positive one.
- The hard limit transfer function outputs zero if the net input of a neuron is less than zero, and outputs one if the net input of a neuron is greater than or equal to zero.
- The linear transfer function produces a linear mapping of input to output.
- The competitive transfer function is used in competitive learning and accepts a net input vector for a layer and returns neuron outputs of zero for all neurons except for the winner, the neuron associated with the most positive element of the net input [1].
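The five transfer functions listed above can be approximated with simple anonymous functions. These are hand-written stand-ins for illustration, not the Toolbox implementations; the function names are hypothetical, the linear function is assumed to be the identity mapping, and the competitive version below would mark more than one winner if the largest net input were tied.

```matlab
% Hand-written versions of the five transfer functions (not Toolbox code).
logsigFn  = @(n) 1 ./ (1 + exp(-n));    % output between 0 and 1
tansigFn  = @(n) tanh(n);               % output between -1 and 1
hardlimFn = @(n) double(n >= 0);        % 0 if n < 0, otherwise 1
linearFn  = @(n) n;                     % identity mapping (assumed)
competFn  = @(n) double(n == max(n));   % 1 for the most positive element, else 0

n = [-2; 0; 3];                         % hypothetical net input vector

aLog  = logsigFn(n);
aTan  = tansigFn(n);
aHard = hardlimFn(n);
aLin  = linearFn(n);
aComp = competFn(n);
```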
4.2 Neural Network Training
Two important training parameters that affect neural network performance are the performance function and the training algorithm.