
Pattern Recognition Using Artificial Neural Networks

Project Team

Eyal Ittah (60407301)

Ittai Doron (53084489)

Introduction

One of the uses of computational vision is the recognition of shapes, patterns and objects in an input image. Pattern recognition aims to classify data based on information extracted from the data. In our case, we chose to classify images of playing cards by their suit (spades, clubs, hearts and diamonds) and value (2-10, J, Q, K and A). While this problem is simple for a human brain, recognizing the shapes and numbers in the image is a difficult operation for a computer.

Our application uses relaxation labeling to extract the data needed for pattern recognition from the card images. It then feeds the processed image to an Artificial Neural Network that finds the suit and value of the card. Once the card information is extracted, the application presents the result to the user.

Pattern Recognition

A complete pattern recognition system consists of:

A sensor -
In our case we bypassed the sensor stage and supplied the application with images of the cards. The same system can be used with a camera continually taking photographs, saving them to the computer and having the application analyze them.
A feature extraction mechanism -
The image was pre-processed using a relaxation-labeling algorithm which received the color image and labeled each pixel with one of two labels: object or background. By reducing the incoming data from three 8-bit values (R, G, B, each in the range 0 to 255) per pixel to 1 bit (Boolean) per pixel, we reduced the noise of irrelevant information and made the Artificial Neural Network smaller (due to fewer input values) and more efficient.

Classification scheme -
In order to classify the data derived from the relaxation-labeling algorithm as a value and a suit, we used two Artificial Neural Networks:
Recognizing the value -
This Artificial Neural Network received an input of 20x40 pixels (800 input neurons) and returned the value true in one of 13 output neurons (representing the 13 classifications of the card's value).
Recognizing the suit -
This Artificial Neural Network received an input of 20x20 pixels (400 input neurons) and returned the value true in one of 4 output neurons (representing the 4 classifications of the card's suit).
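The mapping from a labeled image region to input neurons can be sketched as below. This is only an illustration: the function name `flatten`, the row-major ordering, and the reading of 20x40 as 20 pixels wide by 40 pixels high are assumptions, not details given in this report.

```python
def flatten(grid):
    """Row-major flattening of a 2-D boolean grid into a list of 0/1 inputs."""
    return [1 if cell else 0 for row in grid for cell in row]

# Assumed layout: each inner list is one row of pixels.
value_region = [[False] * 20 for _ in range(40)]  # 20 wide x 40 high (assumed)
suit_region = [[False] * 20 for _ in range(20)]   # 20 x 20

value_region[5][3] = True  # mark one pixel as "object"

value_inputs = flatten(value_region)  # one entry per input neuron
suit_inputs = flatten(suit_region)

print(len(value_inputs), len(suit_inputs))
```

Each entry of the flattened list feeds exactly one input neuron, which is why the network sizes follow directly from the region sizes.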

Relaxation Labeling

The relaxation labeling process used the following properties of the card images:

The objects used for labeling were the card image pixels.
The labels used were object and background. An object label marks a pixel that belongs to a card's suit symbol or value.
The world our application lives in is such that the background tends to be white, while objects are either black or red. This information was taken into consideration when determining the initial confidence for each label.
The initial confidence function was the amount of white in the pixel's RGB color representation: higher values in the red, green and blue channels signify a higher degree of white. We summed the three channel values and divided the result by 255 * 3. A lower sum gave a value closer to zero, which means a higher probability of being labeled object.
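The initial confidence calculation described above can be sketched as follows. The function names and the 0.5 cut-off used to turn a confidence into a label are assumptions made for illustration; the sketch covers only the initial confidences, not the iterative relaxation updates.

```python
def background_confidence(r, g, b):
    """Degree of white in [0, 1]: the sum of the channels divided by 255 * 3."""
    return (r + g + b) / (255 * 3)

def initial_labels(pixels, threshold=0.5):
    """Label a pixel True (object) when its background confidence is low.

    The threshold is a hypothetical value chosen for this example.
    """
    return [[background_confidence(*p) < threshold for p in row] for row in pixels]

image = [
    [(255, 255, 255), (0, 0, 0)],      # white background, black ink
    [(200, 30, 30), (250, 250, 250)],  # red ink, near-white background
]
labels = initial_labels(image)
print(labels)
```

Both black and red ink have a low channel sum compared with white, so a single "degree of white" measure is enough to separate them from the background in this setting.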

Artificial Neural Networks

An Artificial Neural Network (ANN) is a computational model based on the way neurons are connected in the brain. Each individual neuron is a simple calculation unit which is connected to numerous other neurons. The network itself is a DAG (Directed acyclic graph). The neurons are arranged in layers:

Input layer
Each neuron in this layer represents a single input variable.
In our project, each input neuron represents a single boolean value belonging to a pixel in the input image.
Output layer
Each neuron in this layer represents a single output variable.
In our project, each output neuron represents a single boolean value belonging to a specific class. For example, when classifying playing cards by their suit (spades, clubs, hearts and diamonds), 4 output neurons are needed, where each one represents the input being classified as a specific suit.
Hidden layers
An ANN without hidden layers is only able to learn linearly separable problems (problems where the classes can be separated from each other by a linear function of the inputs). Since our problem is more complex, we needed to add hidden layers between the input and output layers.
We used a single hidden layer in each of the ANNs.
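The limitation of a network without hidden layers can be seen in a small experiment. The sketch below is not part of our application: it trains a single threshold neuron with the classic perceptron rule (an assumption chosen for brevity) on the AND function, which is linearly separable, and on XOR, which is not.

```python
def train_perceptron(samples, epochs=50, rate=1.0):
    """Train one threshold neuron; return the number of errors in the last epoch run."""
    weights = [0.0, 0.0]
    bias = 0.0
    errors = 0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            output = 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0
            delta = target - output
            if delta != 0:
                # Wrong answer: nudge the weights toward the desired output.
                errors += 1
                weights[0] += rate * delta * x1
                weights[1] += rate * delta * x2
                bias += rate * delta
        if errors == 0:
            break  # a full pass with no mistakes: learning is finished
    return errors

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_errors = train_perceptron(AND)
xor_errors = train_perceptron(XOR)
print(and_errors, xor_errors)
```

The AND run ends with an error-free epoch, while the XOR run never does, because no single line separates the two XOR classes.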

Each neuron is connected by an edge to neurons in the next layer. Each edge has a weight which is chosen randomly in the beginning and then corrected throughout the learning process. These weights are the knowledge gained during the learning process and they allow the network to classify future inputs.

Each artificial neuron is a basic computing unit capable of simple calculations – it sums the incoming values and sets the outgoing value based on a threshold value or function.

In order to evaluate and classify an input, the input values are set to the input neurons. The values are then propagated through the network – each neuron's new value is the sum of incoming values, each multiplied by its weight. The output values can then be retrieved from the output neurons.
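The evaluation step can be sketched as below. This is a minimal illustration rather than our implementation: the sigmoid activation, the weight range, the hidden layer size of 10 and all function names are assumptions; only the 400 inputs and 4 outputs come from the suit network described earlier.

```python
import math
import random

def sigmoid(x):
    """Smooth threshold function (an assumed choice of activation)."""
    return 1.0 / (1.0 + math.exp(-x))

def random_weights(n_in, n_out, rng):
    """One weight per edge, chosen randomly before any learning."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]

def layer_forward(inputs, weights):
    """Each neuron sums its weighted incoming values and applies the activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def forward(inputs, hidden_weights, output_weights):
    hidden = layer_forward(inputs, hidden_weights)
    return layer_forward(hidden, output_weights)

rng = random.Random(0)
n_inputs, n_hidden, n_outputs = 400, 10, 4  # hidden size is hypothetical
hidden_weights = random_weights(n_inputs, n_hidden, rng)
output_weights = random_weights(n_hidden, n_outputs, rng)

pixels = [rng.choice([0, 1]) for _ in range(n_inputs)]  # stand-in for a labeled image
outputs = forward(pixels, hidden_weights, output_weights)
predicted = outputs.index(max(outputs))  # index of the most active output neuron
print(outputs, predicted)
```

With untrained random weights the prediction is meaningless; the point is only to show how values flow from the input neurons to the output neurons.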

The learning process

To be capable of classifying input, a neural network needs to improve its initial edge weights. We do this through supervised learning. In this process we present the network with input from a training set. In each iteration of the learning process (or epoch), we iterate through all the values in the training set in a random order. With each input we let the values propagate through the network and retrieve an output. If the output is wrong, we adjust the weights of the network using back-propagation.

In back-propagation we set the error values of the output neurons based on the difference between the desired and actual outputs. We then propagate the error values back through the network, updating the weights of the edges. A learning rate determines how quickly the weights change.

A smaller learning rate results in more subtle changes to the weights and more exploitation of good results. A larger learning rate, on the other hand, results in more drastic changes to the weights and more exploration.
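The learning process can be sketched for a network with a single hidden layer as below. This is a simplified illustration under stated assumptions, not our implementation: it uses sigmoid neurons and a squared-error measure, it updates the weights on every example rather than only on wrong outputs, and the toy training set, layer sizes and learning rate of 0.5 are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_hidden, w_output):
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in w_hidden]
    outputs = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_output]
    return hidden, outputs

def backprop_step(inputs, target, w_hidden, w_output, rate):
    """Update the weights in place; return the squared error before the update."""
    hidden, outputs = forward(inputs, w_hidden, w_output)
    # Output error terms: (desired - actual) scaled by the sigmoid derivative.
    delta_out = [(t - o) * o * (1 - o) for t, o in zip(target, outputs)]
    # Hidden error terms: output errors propagated back along the edges.
    delta_hid = [
        h * (1 - h) * sum(delta_out[k] * w_output[k][j] for k in range(len(w_output)))
        for j, h in enumerate(hidden)
    ]
    for k, row in enumerate(w_output):
        for j in range(len(row)):
            row[j] += rate * delta_out[k] * hidden[j]
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] += rate * delta_hid[j] * inputs[i]
    return sum((t - o) ** 2 for t, o in zip(target, outputs))

def train(training_set, w_hidden, w_output, rate, epochs, rng):
    """Visit the training set in random order each epoch; return total error per epoch."""
    history = []
    for _ in range(epochs):
        order = list(training_set)
        rng.shuffle(order)
        history.append(sum(backprop_step(x, t, w_hidden, w_output, rate) for x, t in order))
    return history

rng = random.Random(0)
# Toy data: two 4-pixel patterns, each with its own output neuron.
training_set = [([1, 1, 0, 0], [1, 0]), ([0, 0, 1, 1], [0, 1])]
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(3)]
w_output = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]

history = train(training_set, w_hidden, w_output, rate=0.5, epochs=200, rng=rng)
print(history[0], history[-1])
```

The `rate` argument scales every weight change, so rerunning the sketch with different values is a simple way to see how the learning rate affects the error per epoch.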

Results

After the training of the two ANNs has been completed, a card image can be loaded and the card's image and classification will be shown on the screen.

The following screenshot is an example of a classification of a card without noise:

The following screenshot is an example of a classification of a card with added noise (Gaussian noise added using Photoshop):

Conclusions

Using two ANNs instead of one
In our first attempts we used a single ANN that received an image part that included both the value and suit of the card. This was a much larger image and thus resulted in many more input neurons. This network didn't produce good results. In fact, during the training phase the error rate was usually around 50%.
Decreasing the learning rate of the ANN
During the back-propagation stage of training the ANN, the new weight of an edge is computed from the error and a learning rate. We found that rates of 0.1 and 0.05 produced sporadic results and the network couldn't finish its learning process with zero errors, whereas lowering the learning rate to 0.005 produced consistently better results and allowed the learning process to finish.
Using an output neuron for each class
There are two ways of using the output neurons:

Each output neuron is a boolean value indicating that the evaluated input belongs to this class.
For example, we will have output neurons O1..O4

Suit / O1 / O2 / O3 / O4
spades / + / - / - / -
clubs / - / + / - / -
hearts / - / - / + / -
diamonds / - / - / - / +

Treat the output neurons as bits and encode the answer using these bits. For example, for the different suits of cards, we can specify that
spades=0, clubs=1, hearts=2 and diamonds=3.
Then we can encode the output as follows:

Suit / O1 / O2
spades / - / -
clubs / - / +
hearts / + / -
diamonds / + / +

We have found that we get better results using the first method for the output neurons.
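The two output schemes can be compared side by side in a short sketch. The function names and the 0.5 threshold used to read a neuron as a bit are assumptions made for illustration; the suit order and bit patterns follow the two tables above.

```python
SUITS = ["spades", "clubs", "hearts", "diamonds"]

def one_hot(suit):
    """First method: one output neuron per class (O1..O4)."""
    return [suit == s for s in SUITS]

def binary_code(suit):
    """Second method: the class index encoded in two output neurons (O1, O2)."""
    index = SUITS.index(suit)
    return [bool(index & 2), bool(index & 1)]

def decode_one_hot(activations):
    """Pick the class whose output neuron is most active."""
    return SUITS[activations.index(max(activations))]

def decode_binary(activations, threshold=0.5):
    """Threshold each output neuron and read the two bits as a class index."""
    high, low = (a > threshold for a in activations)
    return SUITS[2 * high + low]

print(one_hot("hearts"), binary_code("hearts"))
print(decode_one_hot([0.2, 0.1, 0.6, 0.4]))
print(decode_binary([0.6, 0.4]), decode_binary([0.6, 0.55]))
```

In the first method the answer is read from the single most active neuron, while in the second method every neuron must land on the correct side of the threshold for the class index to be decoded correctly.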

Resources

Moshe Sipper, Evolutionary Computation and Artificial Life (course), Semester A, 2007/8
A. Tettamanzi & M. Tomassini, Soft Computing: Integrating Evolutionary, Neural, and Fuzzy Systems, Springer-Verlag, Heidelberg, 2001
