Computer Vision Workshop - Project Report

Real-Time Traffic Camera Analysis

Group members: Leron Fliess, Yifat Chernihov, Stav Ashuri

Workshop Instructor: Dr. Lior Wolf

Workshop TA: Mr. Assaf Zaritsky

Introduction

Our goal is to create an efficient traffic analysis application that, given a live feed from a static traffic camera, analyzes the traffic congestion and outputs relevant statistics.

For this workshop, we plan on using a live feed from one of the “Ayalon” cameras, available online.

Our working environment is Visual Studio + Cygwin on Windows 7, plus some extra platforms we will indicate throughout this report.

The main stages of our work:

First step: Data collection

In this step we collected the following data:

·  Positive Images

These are images that contain only objects of interest, in our case, cars.

We used a live feed from the “Ayalon” cameras; the camera that we worked with is the one that films the Herzliya Interchange. In order to capture frames which contain cars, we ran the video stream using VLC player and captured frames using the "take snapshot" option it supplies. To crop images manually from the frames we captured, we used "ImageClipper", a multi-platform tool we downloaded from http://code.google.com/p/imageclipper/ , which enabled us to mark a car object and save it in PNG format (OpenCV does not support GIF).

Eventually we created 834 positives.

·  Negative Images

These are background images that do not contain objects of interest; they are needed in order to train the Haar cascade classifier.

We used 1655 negatives (the number of negative samples should be roughly twice the number of positives). Our negative samples consist of 50 frames of an empty road (containing no cars) and 1605 photos of arbitrary backgrounds we collected from the database that can be found here:

http://tutorial-haartraining.googlecode.com/svn/trunk/data/negatives/

·  Ratio between positive and negative samples:

We used twice as many negative samples as positive samples, since the program will have to cope with blocks of the image without cars much more often than with blocks that contain cars.

·  Excel file

Using Photoshop, we estimated the relation between a car's location in the frame and its height in pixels, in order to determine future scaling.

In order to perform the next steps we used:

·  Mencoder.exe - a free command-line video decoder that can convert any video MPlayer can read into an uncompressed format that OpenCV can work with (since OpenCV handles frames as IplImage structures).

·  Cygwin - a Unix-like environment and command-line interface for Microsoft Windows, used in order to run UNIX commands.

·  ImageMagick toolkit - an open-source software suite for displaying, converting, and editing raster image files. We added it to the UNIX system directory in order to run the UNIX "find" command and produce the description files.

·  A pre-compiled haartraining directory that we downloaded from:

http://tutorial-haartraining.googlecode.com/svn/trunk/HaarTraining/

·  CMake - a cross-platform, open-source build system: a family of tools designed to build, test and package software. We used it to compile our project with the modified OpenCV components.

Second step: Samples Creation

In this step we created training samples from the images we cropped in the first step.

·  Positive description file and vec file:

In order to do so, we generated a description file using the appropriate UNIX command ("find"), and supplied this file, along with the name of the vec file we would like to create, to the cvCreateTestSamples function that OpenCV supplies (from the pre-compiled directory we downloaded), in order to get a vector file of positive samples.

The vec file starts with a 12-byte header that holds features such as the number of positive samples and the length of each sample in pixels; the samples themselves then follow one by one, 2 bytes per pixel, with 1 zero byte before each sample.
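For illustration, here is a minimal sketch of reading this header in plain C++. Splitting the last 4 header bytes into two reserved 16-bit fields is our assumption about the layout; the description above only names the sample count and the sample length.

#include <cstdio>
#include <cstdint>

// Sketch of inspecting a vec file under the layout described above
// (12-byte header: 32-bit sample count, 32-bit sample length in pixels,
// and - as our assumption - two reserved 16-bit fields; then each sample
// is one zero byte followed by 2 bytes per pixel).
int main(int argc, char** argv)
{
    if (argc < 2) { std::printf("usage: vecinfo <file.vec>\n"); return 1; }

    std::FILE* f = std::fopen(argv[1], "rb");
    if (!f) { std::printf("cannot open %s\n", argv[1]); return 1; }

    int32_t count = 0, sampleLen = 0;
    int16_t reserved1 = 0, reserved2 = 0;
    std::fread(&count, sizeof(count), 1, f);
    std::fread(&sampleLen, sizeof(sampleLen), 1, f);
    std::fread(&reserved1, sizeof(reserved1), 1, f);
    std::fread(&reserved2, sizeof(reserved2), 1, f);
    std::fclose(f);

    // With our 19x21 samples we expect sampleLen == 19 * 21 = 399 pixels,
    // and a file size of 12 + count * (1 + 2 * sampleLen) bytes.
    std::printf("samples: %d, pixels per sample: %d\n", (int)count, (int)sampleLen);
    return 0;
}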

·  Negative description file:

A list of all the negative sample names (located in the same directory as the samples).

Third step: Haar Training

In this step we generated an XML file as the result of the haartraining process.

We ran the haartraining command on the directory which contains the negative samples we collected in the first step, the negative description file, and the positive vec file that were generated in the second step. This process was controlled by the following parameters:

Parameter / meaning
data / directory name in which the trained classifier is stored
vec / vector file name (created in the second step)
bg / the negative description file
npos / number of positive samples used for each classifier stage. We used 834.
nneg / number of negative samples used for each classifier stage. We used 1655.
nstages / number of stages to be trained. We used 20, as recommended online.
nsplits / sets the complexity of the simple classifier, which consists of nsplits features. The training process keeps adding simple classifiers to a classifier stage until the quality requirements for the stage (max false alarm and min hit rate) are achieved. We used 2, which indicates that a CART classifier with 2 split nodes is used.
mem / memory in MB. The default is 200 MB; we used 1024.
sym / we used the -nonsym flag, which indicates that the object class does not have vertical symmetry.
minhitrate / minimal desired hit rate for each stage classifier. The overall hit rate may be estimated as (min_hit_rate^number_of_stages). We used 0.9999.
maxfalsealarm / maximal desired false alarm rate for each stage classifier. The overall false alarm rate is estimated as (max_false_alarm_rate^number_of_stages). We used 0.5.
weighttrimming / whether and how much weight trimming should be used. We used the default value of 0.95.
eqw / the difference between the number of positive and negative images.
mode / selects the type of Haar feature set used in training. BASIC uses only upright features, while ALL uses the full set of upright and 45-degree rotated features. We used the ALL mode.
w / sample width in pixels; we used 19.
h / sample height in pixels; we used 21.
bt / which AdaBoost variant to use: Real AB, Gentle AB, etc. We used the default.
err / misclass (default) | gini | entropy. We used the default.
maxtreesplits / maximum number of splits in the tree cascade
minpos / minimum number of positive samples per cluster
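For example, with minhitrate = 0.9999 and 20 stages the expected overall hit rate is roughly 0.9999^20 ≈ 0.998, and with maxfalsealarm = 0.5 the expected overall false alarm rate is roughly 0.5^20 ≈ 10^-6, i.e. about one false alarm per million scanned background windows.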

Fourth step: simple frame manipulation

Our algorithm captures the video frame by frame and transforms each frame into a data type which OpenCV understands.

The next step was to define a Region Of Interest (ROI) in the frame: an area in which all the "action" occurs. The size of the ROI directly affects the processing time of each frame.
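As an illustration, here is a minimal sketch of this step using OpenCV's C++ interface (our actual code works with the IplImage-based C interface; the file name and the ROI rectangle below are placeholders, not the values we used):

#include <opencv2/opencv.hpp>

int main()
{
    cv::VideoCapture capture("ayalon.avi");      // the decoded video file / stream (placeholder name)
    if (!capture.isOpened())
        return 1;

    // The Region Of Interest: the part of the frame where all the "action" occurs.
    cv::Rect roiRect(100, 200, 400, 200);        // x, y, width, height (illustrative values)

    cv::Mat frame;
    while (capture.read(frame))
    {
        cv::Mat roi = frame(roiRect);            // a view into the frame, no copy

        // ... all further processing (differencing, blobs, haar detect) uses 'roi'
    }
    return 0;
}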

Mid term results

We scanned each frame's ROI with a sliding window, which grows with each scan until it reaches a maximal size. We ran the Haar object detection on each window and checked whether it contains a car, using the given classifier.

Once an object is detected, it is immediately drawn on screen with a surrounding circle.

Average frame processing time using these methods was 448.336 ms per frame.

This was detection only, with no tracking.
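A minimal sketch of this detection stage, using the C++ cv::CascadeClassifier interface rather than the C functions we actually called ("cars.xml", the scale factor and the minimum window size are illustrative; detectMultiScale performs the multi-scale window scan internally, playing the role of our growing sliding window):

#include <opencv2/opencv.hpp>
#include <vector>

void detectAndMarkCars(cv::Mat& roi, cv::CascadeClassifier& cascade)
{
    // 'cascade' is assumed to have been loaded once, e.g. cascade.load("cars.xml")
    cv::Mat gray;
    cv::cvtColor(roi, gray, cv::COLOR_BGR2GRAY);
    cv::equalizeHist(gray, gray);

    std::vector<cv::Rect> cars;
    cascade.detectMultiScale(gray, cars, 1.1, 3, 0,
                             cv::Size(19, 21));        // smallest window = training sample size

    // Each detected object is immediately drawn with a surrounding circle.
    for (size_t i = 0; i < cars.size(); ++i)
    {
        cv::Point center(cars[i].x + cars[i].width / 2,
                         cars[i].y + cars[i].height / 2);
        int radius = (cars[i].width + cars[i].height) / 4;
        cv::circle(roi, center, radius, cv::Scalar(0, 0, 255), 2);
    }
}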

Fifth step: advanced frame manipulation

With detection working well, we decided to put more effort into accelerating the process.

The main idea was to quickly distinguish between static background and moving objects. The simplest method we thought of was comparing each frame with the previous one: we wrote code that takes the last two frames as matrices, converts them to grayscale, calculates the difference between them, and puts the result in a third matrix, which is in grayscale as well. We then turn it into a pure binary matrix, named "changedbits", by applying a threshold to it. This produced the following result:
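A minimal sketch of this differencing step (the threshold value below is illustrative, not the exact value we tuned):

#include <opencv2/opencv.hpp>

cv::Mat computeChangedBits(const cv::Mat& prevFrame, const cv::Mat& currFrame)
{
    cv::Mat prevGray, currGray, diff, changedBits;

    // Convert both frames to grayscale.
    cv::cvtColor(prevFrame, prevGray, cv::COLOR_BGR2GRAY);
    cv::cvtColor(currFrame, currGray, cv::COLOR_BGR2GRAY);

    // Per-pixel absolute difference between the two grayscale frames.
    cv::absdiff(prevGray, currGray, diff);

    // Apply a threshold to get a pure binary matrix:
    // 255 where the pixel changed enough, 0 elsewhere.
    cv::threshold(diff, changedBits, 30, 255, cv::THRESH_BINARY);
    return changedBits;
}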

Since the trees are moving, we needed a way to filter out what we called "dynamic background": background which contains small movement most of the time.

After trying many different methods to replace the one we just described, we found that the best way was simply to average the first few "changedbits" matrices (we found that averaging the first 30 frames produces a good result). After averaging the first 30 "changedbits" matrices of the video, we build a "mask" matrix, which contains '0' only in places where there was change throughout most of the first 30 frames.

Now all we have to do is multiply this mask with any "changedbits" matrix we get, and we get an improved "changedbits" matrix clean from "dynamic background".
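A minimal sketch of building and applying the mask (the 30-frame count is the one mentioned above; the 0.5 "most of the time" ratio is an illustrative choice, and a bitwise AND replaces the multiplication, which is equivalent for a binary mask):

#include <opencv2/opencv.hpp>

class DynamicBackgroundMask
{
public:
    DynamicBackgroundMask() : framesSeen(0) {}

    // Feed the first 30 "changedbits" matrices into the accumulator.
    void accumulate(const cv::Mat& changedBits)
    {
        if (framesSeen >= 30)
            return;
        if (sum.empty())
            sum = cv::Mat::zeros(changedBits.size(), CV_32F);
        cv::accumulate(changedBits, sum);   // sum += changedBits (as float)
        ++framesSeen;

        if (framesSeen == 30)
        {
            // Pixels that were "on" in most of the first 30 frames are
            // dynamic background: put 0 there, 255 everywhere else.
            cv::threshold(sum, mask, 0.5 * 30 * 255, 255, cv::THRESH_BINARY_INV);
            mask.convertTo(mask, CV_8U);
        }
    }

    // Remove the dynamic background from any later "changedbits" matrix.
    cv::Mat apply(const cv::Mat& changedBits) const
    {
        cv::Mat cleaned;
        cv::bitwise_and(changedBits, mask, cleaned);
        return cleaned;
    }

    bool ready() const { return framesSeen >= 30; }

private:
    cv::Mat sum, mask;
    int framesSeen;
};

Once ready() returns true, apply() is called on every new "changedbits" matrix before any further processing.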

This produced great results we felt we could work with, and with very little processing time.

Sixth step: finding blobs

The next goal was to use "changedbits" to reduce the area we scan with the Haar object detector, focusing it to scan for cars only in areas that might contain a moving object.

We used cvblobslib, which contains functions that handle amorphous blobs.

We took the SimpleBlobDetector::findBlobs function and made some changes to it to fit our needs: finding white blobs within a certain range of sizes in a frame.
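For illustration, here is a sketch of the same idea using OpenCV's contour functions instead of our modified blob-library routine: find the white blobs in the binary "changedbits" matrix whose area falls within a given range, and return their centers (the area bounds are left as parameters, not our tuned values):

#include <opencv2/opencv.hpp>
#include <vector>

std::vector<cv::Point> findCarSizedBlobs(const cv::Mat& changedBits,
                                         double minArea, double maxArea)
{
    std::vector<cv::Point> centers;
    std::vector<std::vector<cv::Point> > contours;

    cv::Mat work = changedBits.clone();   // older OpenCV versions modify findContours' input
    cv::findContours(work, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

    for (size_t i = 0; i < contours.size(); ++i)
    {
        double area = cv::contourArea(contours[i]);
        if (area < minArea || area > maxArea)      // keep only car-sized blobs
            continue;

        cv::Rect box = cv::boundingRect(contours[i]);
        centers.push_back(cv::Point(box.x + box.width / 2,
                                    box.y + box.height / 2));
    }
    return centers;
}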

This gave us good results, but also brought some problems: at some points we detect a few small blobs inside a single car.

A good result:

A bad result:

We get a pretty close detection of cars, less accurate than with haardetect, but 10 times faster.

Our next goal is to filter blobs that refer to the same car, to track each car and know when it leaves the screen, and to use haardetect to make sure the object we are tracking is actually a car, so we can count it.


Seventh step: tracking and recognition

In order to reach the goals mentioned in the previous step, we wrote the "blob" and "blob_manager" classes.

The class "blobManager" handles tracking and recognition of cars. This class holds the current location of cars and candidates which could be cars. It operates in this manner:

Blobdetect returns a list of blobs found in the current frame; this list contains the center point of each blob.

For each blob we found, we run blobManager's "testAndAddCandidate", which, given a specific blob, checks whether it should be added to the list of candidates.

We defined some conditions which a blob has to fulfill in order to be added to the list of candidates:

First, if no other candidate exists, the blob becomes a valid candidate.

Otherwise, we check whether the closest existing candidate is far enough from the blob for it to be considered a different car; if not, the blob might be an updated location of that candidate, and we determined criteria for that case as well.
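A minimal sketch of this candidate test (the Candidate fields, the free-function form and the minDist threshold are illustrative and do not reproduce our blob and blobManager classes exactly):

#include <opencv2/opencv.hpp>
#include <vector>
#include <cmath>

struct Candidate
{
    cv::Point center;        // last known location of the blob
    int       lastSeenFrame; // frame index of the last location update
    bool      confirmedCar;  // set once haardetect recognizes it as a car
};

void testAndAddCandidate(std::vector<Candidate>& candidates,
                         const cv::Point& blobCenter,
                         int frameIndex, double minDist)
{
    // Find the closest existing candidate (if any).
    int closest = -1;
    double best = 0.0;
    for (size_t i = 0; i < candidates.size(); ++i)
    {
        double dx = blobCenter.x - candidates[i].center.x;
        double dy = blobCenter.y - candidates[i].center.y;
        double d = std::sqrt(dx * dx + dy * dy);
        if (closest < 0 || d < best) { best = d; closest = (int)i; }
    }

    if (closest < 0 || best > minDist)
    {
        // No candidate yet, or the closest one is far enough away:
        // the blob becomes a new candidate (possibly a different car).
        Candidate cand;
        cand.center = blobCenter;
        cand.lastSeenFrame = frameIndex;
        cand.confirmedCar = false;
        candidates.push_back(cand);
    }
    else
    {
        // Otherwise it is treated as an updated location of that candidate.
        candidates[closest].center = blobCenter;
        candidates[closest].lastSeenFrame = frameIndex;
    }
}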

Using this class, we overcame the unwanted result of detecting a few candidates inside one car.

Once every frame, we scan over all the candidates that have still not been recognized as cars, and use haardetect to scan the small area around each of them for a car. If a car is recognized inside this small window, we mark the candidate as a car and continue tracking it, but we do not need to run haardetect on it again. Thus, once a candidate has been recognized as a car, we do not waste any CPU time on recognizing it again.

To count the number of cars that actually passed the camera, we check when the last location update of a candidate was made. If the candidate has not been seen for a predefined number of frames, and was recognized as a car at some point, we consider it a car that has left the frame, and count it.
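Continuing the sketch above (it reuses the illustrative Candidate struct; the search-window size and detection parameters are placeholders), the once-per-frame confirmation and the counting of cars that left the screen might look like this:

// Run the cascade only on a small window around a candidate, not the full ROI.
bool carDetectedAround(cv::CascadeClassifier& cascade,
                       const cv::Mat& frame, const cv::Point& center)
{
    int half = 60;                                      // half-size of the search window
    cv::Rect win(center.x - half, center.y - half, 2 * half, 2 * half);
    win &= cv::Rect(0, 0, frame.cols, frame.rows);      // clip to the frame
    if (win.area() == 0)
        return false;

    cv::Mat gray;
    cv::cvtColor(frame(win), gray, cv::COLOR_BGR2GRAY);

    std::vector<cv::Rect> cars;
    cascade.detectMultiScale(gray, cars, 1.1, 3);
    return !cars.empty();
}

// Once per frame: try to confirm candidates that are not yet known to be cars.
void confirmCandidates(std::vector<Candidate>& candidates,
                       cv::CascadeClassifier& cascade, const cv::Mat& frame)
{
    for (size_t i = 0; i < candidates.size(); ++i)
    {
        if (candidates[i].confirmedCar)
            continue;                  // already recognized - no more CPU spent on it
        if (carDetectedAround(cascade, frame, candidates[i].center))
            candidates[i].confirmedCar = true;
    }
}

// Count candidates that were confirmed as cars and have not been seen for
// maxFramesUnseen frames - they are cars that have left the screen.
int countCarsThatLeft(std::vector<Candidate>& candidates,
                      int frameIndex, int maxFramesUnseen)
{
    int counted = 0;
    for (size_t i = 0; i < candidates.size(); )
    {
        if (frameIndex - candidates[i].lastSeenFrame > maxFramesUnseen)
        {
            if (candidates[i].confirmedCar)
                ++counted;
            candidates.erase(candidates.begin() + i);   // candidate left the frame
        }
        else
            ++i;
    }
    return counted;
}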

Performance comparison between new and old versions

Average processing time – old version vs. new version

In the second part of the workshop, our main goal was to take the car detection mechanism and speed it up significantly, up to the point where we could run it on a real-time video feed. To achieve this, our goal was to maintain a processing time of no more than 1/25 of a second per frame (for a 25 fps feed), i.e. 0.04 seconds = 40 milliseconds per frame.

As can be seen in the following graph, the performance increased drastically.

This increase in performance was achieved due to the following reasons:

·  HaarDetect now only has to run on very small frames, and not the full ROI.

·  The HaarDetection algorithm only runs when it is needed, meaning that once we have determined that an object is indeed a car, we do not send it for recognition again.

The relatively small price to pay for these drastic improvements is the cost of the background-differencing and blob detection mechanisms. For instance, the "BackDiff" algorithm, which we wrote, performs only very inexpensive manipulations on the video.

ROI impact on the algorithm performance

The following graph shows the relative time taken for each task in the car counting algorithm, in two different scenarios:

·  One with a very large ROI of 480x640.

·  One with a small ROI, containing only the road.

Percentage of time during which a car is tracked – old version vs. new version

Another performance enhancement is the percentage of time during which a car is tracked – the old version had to run the recognition algorithm on every frame and naturally had some misses. The new algorithm locks onto a car and tracks it from the moment it is recognized until it leaves the screen.