Comparing Images Using the Hausdorff Distance

Mark Bouts

6 April 2006

Introduction

In pattern recognition and computer vision a central problem is determining the extent to which one shape resembles another. In this topic we consider using the Hausdorff distance to determine the extent to which a set of points of an ‘image’ resembles a set of points of a ‘model’. Efficient algorithms are presented for computing the Hausdorff distance over all possible relative positions of a model and an image.

Putting the emphasis on the case in which the model is only allowed to translate with respect to the image, two methods are discussed. First, measuring the distance between an image and a model (and vice versa) is treated. Second, the case of matching portions of the image and the model is treated. In addition, a brief glimpse is given of extending the set of transformations to rigid motion. Finally, some examples are given of how these algorithms can be made more efficient and with what results.

The Hausdorff distance

For two point sets A = {a1,…,ap} and B = {b1,…,bq} we can define the Hausdorff distance as

H(A,B) = max( h(A,B), h(B,A) ),

with the directed Hausdorff distance defined as

h(A,B) = max_{a ∈ A} min_{b ∈ B} ||a − b||.

This means that h(A,B) in effect ranks each point of A based on its distance to the nearest point of B, and then uses the largest ranked such distance as the value. In the end H(A,B) chooses the maximum of h(A,B) and h(B,A). The Hausdorff distance can be computed in O(pq) time, but as we have seen in one of the lectures this can be improved to O((p+q)log(p+q)).
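
As an illustration, the following is a minimal brute-force sketch of the O(pq) computation in Python (the point sets and the use of the L2 norm are chosen here only as an example):

    import numpy as np

    def directed_hausdorff(A, B):
        """h(A,B): for every point of A take the distance to its nearest
        point of B, and return the largest of these values."""
        # pairwise Euclidean distances, shape (len(A), len(B))
        diff = A[:, None, :] - B[None, :, :]
        dist = np.linalg.norm(diff, axis=2)
        return dist.min(axis=1).max()

    def hausdorff(A, B):
        """H(A,B) = max(h(A,B), h(B,A))."""
        return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

    A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    B = np.array([[0.0, 0.2], [1.1, 0.1]])
    print(hausdorff(A, B))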

In this context of shape comparison the Hausdorff distance measures the amount of mismatch between two sets of points. Normally the HD is defined for two fixed sets at a fixed position with respect to one another, but in the paper of [Huttenlocher et al.] the emphasis lies on the mismatch over all possible relative positions of the two sets. Therefore the minimum Hausdorff distance under a group of transformations G is defined as

M_G(A,B) = min_{g ∈ G} H(A, g(B)),

in which the metric properties of identity, positivity, symmetry and the triangle inequality are preserved. In this paper the focus is on the case where the relative position of the model with respect to the image is given by the group of translations. The minimal value of the Hausdorff distance under translation is then defined as

M_T(A,B) = min_t H(A, B ⊕ t),

where H is the Hausdorff distance and ⊕ is the standard Minkowski sum notation (i.e., B ⊕ t = { b + t | b ∈ B }).

How is the (minimal) Hausdorff distance computed? From the above definition of the Hausdorff distance we can derive the formula

H(A,B) = max( max_{a ∈ A} d(a), max_{b ∈ B} d'(b) ),

in which d(x) = min_{b ∈ B} ||x − b|| and d'(x) = min_{a ∈ A} ||x − a||. In other words, H(A,B) can be computed by computing d(a) for all a ∈ A and d'(b) for all b ∈ B. Here d(x) gives the distance from any point x to the nearest point of the set B, i.e. it is the distance transform of B. The graph of d(x) forms a surface that is called the Voronoi surface of B, in which cone shapes define the distance from a point x to its nearest neighbour in the set B. The nearest distance can then be read off by shooting a ray upwards from x: the distance the ray travels before it hits the lower envelope of the Voronoi surface is the distance to the nearest point of B. The same holds, of course, for d'(x).
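
A small sketch of how such a distance transform can be obtained for a binary raster; here scipy's Euclidean distance transform is used as a convenience, which is an assumption of this sketch and not the method of the paper:

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    # binary array: nonzero entries are the points of B
    B = np.zeros((8, 8), dtype=np.uint8)
    B[2, 3] = 1
    B[6, 6] = 1

    # d(x): Euclidean distance from every pixel to the nearest nonzero pixel of B
    # (distance_transform_edt measures the distance to the nearest zero element,
    #  so it is applied to the complement of B)
    d = distance_transform_edt(B == 0)

    # "shooting a ray upwards" at pixel (0, 0) simply reads the surface value there
    print(d[0, 0])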

Where the different cone shapes intersect lie the local maxima of the surface. These local maxima are equidistant from two or more local minima (the points of B themselves), just as the boundaries of a Voronoi diagram are, hence the name Voronoi surface.

In the case of the Hausdorff distance as a function of translation, the formula can be rewritten as

f(t) = H(A, B ⊕ t) = max( f_A(t), f_B(t) ).

Now we define f_A(t) = max_{a ∈ A} d(a − t), i.e. the upper envelope (pointwise maximum) of p copies of the function d(−t), each translated by one of the points a of A. This gives for each translation t the distance of the point of A that is farthest from any point of B ⊕ t; as a result the directed Hausdorff distance h(A, B ⊕ t) has been obtained. The second function f_B(t) = max_{b ∈ B} d'(b + t) is obtained analogously from the Voronoi surface d' of A. By taking the pointwise maximum of both functions f_A(t) and f_B(t), the HD f(t) = H(A, B ⊕ t) has been calculated.
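
A brute-force sketch of evaluating f(t) over a set of candidate translations (the candidate grid and the point sets are only illustrative assumptions; the paper instead constructs the upper envelope directly):

    import numpy as np

    def min_dists(X, Y):
        # for every point of X, the distance to its nearest point of Y
        return np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2).min(axis=1)

    def f(A, B, t):
        """f(t) = H(A, B + t): Hausdorff distance with the model B translated by t."""
        Bt = B + t
        return max(min_dists(A, Bt).max(), min_dists(Bt, A).max())

    A = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
    B = np.array([[1.0, 1.0], [3.0, 1.0]])

    # evaluate f on a small grid of translations and keep the best one
    ts = [np.array([x, y]) for x in np.arange(-3, 3.5, 0.5)
                            for y in np.arange(-3, 3.5, 0.5)]
    best = min(ts, key=lambda t: f(A, B, t))
    print(best, f(A, B, best))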

Huttenlocher et al. [HK] came up with efficient algorithms for computing f(t). This resulted in a run-time of O(pq(p+q) log pq), where ||·|| is either the L1, L2 or L∞ norm and where p and q are the numbers of points of A and B. For the L1 and L∞ norms this run-time can be further improved to O(pq log pq).

Comparing portions of shapes

In machine vision and pattern recognition applications it is important to be able to identify instances of a model that are only partly visible. Therefore the definition of the Hausdorff distance is extended to allow for the comparison of portions of two shapes. This is made possible by defining partial distances based on ranking. In the directed distance h(B,A), each point of B is in effect ranked by its distance to the nearest point of A, and the largest ranked point determines the distance. A natural way of defining a partial directed Hausdorff distance is therefore

h_K(B,A) = K-th_{b ∈ B} min_{a ∈ A} ||b − a||,

where K-th_{b ∈ B} denotes the K-th ranked value in the set of distances (one for each point of B). In order to compute the partial directed ‘distance’ h_K(B,A), the rank K is taken to be some fraction of the set size, K = ⌊f1 q⌋ with 0 ≤ f1 ≤ 1. The K-th ranked value can be computed in O(q) time (by rank selection rather than sorting). The bidirectional definition of the partial HD is now

H_LK(A,B) = max( h_L(A,B), h_K(B,A) ).

From this definition it becomes clear that the triangle inequality (H_LK(A,C) ≤ H_LK(A,B) + H_LK(B,C)) does not always hold: when two different portions of an image each have a small distance to the whole image, this does not mean that those portions are similar to each other. This means that the metric properties are no longer preserved. When the distance is instead regarded as a distance between subsets of A and B of sizes L and K respectively, these metric properties can be preserved again.
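
A sketch of the ranked partial directed distance; numpy's rank selection is used and the fraction is only an example value:

    import numpy as np

    def partial_directed_hausdorff(B, A, frac=0.7):
        """h_K(B,A): K-th ranked distance from points of B to their nearest
        point of A, with K a fraction of the number of points of B."""
        dists = np.linalg.norm(B[:, None, :] - A[None, :, :], axis=2).min(axis=1)
        K = max(1, int(frac * len(B)))
        # rank selection in linear expected time instead of a full sort
        return np.partition(dists, K - 1)[K - 1]

    A = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
    B = np.array([[0.1, 0.0], [1.1, 0.0], [9.0, 9.0]])   # one outlier point
    print(partial_directed_hausdorff(B, A, frac=0.7))     # the outlier is ignored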

The minimum Hausdorff Distance for Grid Points

So far we have considered the exact case; the focus now shifts to a specialized case in which the point sets lie on an integer grid. This is appropriate because for many computer vision and pattern recognition applications the data are derived from a raster device. In this case the point sets are represented by binary arrays, where the nonzero elements of the array correspond to the elements of the sets. So a set A = {a1,…,ap} is represented by a binary array A[k,l] in which A[k,l] is nonzero exactly when (k,l) is a point of A, and similarly for the set B. Now we can compute rasterized approximations D[x,y] and D'[x,y] of the respective Voronoi surfaces d(x) and d'(x). These distance arrays, or distance transforms, specify for each pixel location (x,y) the distance to the nearest nonzero pixel of B or A respectively. Thus D'[x,y] is zero wherever A[x,y] is one. This results in a rasterized formulation of the HD as a function of translation:

F[x,y] = max( max_{(k,l): A[k,l] ≠ 0} D[k − x, l − y], max_{(k,l): B[k,l] ≠ 0} D'[k + x, l + y] ).

Now the HD computation can be thought of as probing the Voronoi surface of the image at the locations corresponding to the transformed model points. The probe values are then maximized in order to compute the distance for that transformation.
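
A sketch of this probing for a single translation (x,y), assuming D' is the distance transform of the image array A and B is the binary model array:

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    A = np.zeros((16, 16), dtype=np.uint8); A[5, 5] = A[5, 9] = A[9, 7] = 1
    B = np.zeros((16, 16), dtype=np.uint8); B[0, 0] = B[0, 4] = B[4, 2] = 1

    D_prime = distance_transform_edt(A == 0)       # Voronoi surface of the image
    ks, ls = np.nonzero(B)                         # model points

    def probe(x, y):
        """Directed distance h(B + (x,y), A): probe D' at the translated
        model points and take the maximum probe value."""
        return D_prime[ks + x, ls + y].max()

    print(probe(5, 5))   # model translated so that it lies on the image points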

The value obtained in the rasterized case will not always equal the value of the exact case, but since the costs of nearby translations are nearly the same and lie close to the minimal value, this is not a big problem in practice.

How are the Voronoi surfaces, and in the end F[x,y], computed? Computing the distance transform, or Voronoi surface array, D[x,y] can be done by various methods. Among them are local distance transform algorithms, such as the two-pass serial algorithm, and methods that take advantage of available hardware. Computing D[x,y] and D'[x,y] amounts to computing the pointwise minimum, or lower envelope, of the cones that make up the Voronoi surface. By using rendering hardware such as z-buffers this process can be sped up considerably.
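
A minimal sketch of such a two-pass serial distance transform, here for the L1 (city-block) distance for simplicity; the choice of metric is an assumption of this sketch, not the exact algorithm used in the paper:

    import numpy as np

    def two_pass_distance_transform(B):
        """Distance to the nearest nonzero pixel of B (L1 metric),
        computed with one forward and one backward raster scan."""
        INF = 10**9
        rows, cols = B.shape
        D = np.where(B != 0, 0, INF).astype(np.int64)
        # forward pass: propagate distances from the top-left neighbours
        for i in range(rows):
            for j in range(cols):
                if i > 0:
                    D[i, j] = min(D[i, j], D[i - 1, j] + 1)
                if j > 0:
                    D[i, j] = min(D[i, j], D[i, j - 1] + 1)
        # backward pass: propagate distances from the bottom-right neighbours
        for i in range(rows - 1, -1, -1):
            for j in range(cols - 1, -1, -1):
                if i < rows - 1:
                    D[i, j] = min(D[i, j], D[i + 1, j] + 1)
                if j < cols - 1:
                    D[i, j] = min(D[i, j], D[i, j + 1] + 1)
        return D

    B = np.zeros((5, 5), dtype=np.uint8); B[2, 2] = 1
    print(two_pass_distance_transform(B))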

The process of computing the HD array can be seen as the maximization of, e.g., D'[x,y] shifted by each location where B[k,l] takes on a nonzero value, i.e.

F_B[x,y] = max_{(k,l): B[k,l] ≠ 0} D'_(k,l)[x,y], where D'_(k,l)[x,y] = D'[x + k, y + l] is the copy of D' shifted by (k,l).

This maximization can be done quickly either by using graphics hardware such as z-buffers, or by using standard array operations after slightly rewriting the calculation as

F_B[x,y] = max_{(k,l): B[k,l] ≠ 0} D'[x + k, y + l].

In this way the locations in the Voronoi surface of the image are probed, and F_B[x,y] is the maximum of these probe values for each position (x,y) of the model B[k,l].
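
A sketch of this computation with standard array operations, computing F_B for every translation that keeps the model inside the image (the array sizes are illustrative assumptions):

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    A = np.zeros((32, 32), dtype=np.uint8)
    A[10:14, 10] = 1                                    # a small 'image'
    B = np.zeros((4, 1), dtype=np.uint8); B[:, 0] = 1   # a small 'model'

    D_prime = distance_transform_edt(A == 0)            # Voronoi surface of A
    ks, ls = np.nonzero(B)

    H, W = A.shape
    h, w = B.shape
    FB = np.zeros((H - h + 1, W - w + 1))
    # F_B[x,y] = max over model points (k,l) of D'[x+k, y+l]
    for k, l in zip(ks, ls):
        FB = np.maximum(FB, D_prime[k:k + FB.shape[0], l:l + FB.shape[1]])

    x, y = np.unravel_index(np.argmin(FB), FB.shape)
    print((x, y), FB[x, y])   # best translation of the model and its distance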

When considering the partial distance between the image and the model we must take into account that only a small part of the image will have a small distance to the model. For the partial distance from the image to the model a fixed fraction is not ideal, since the right fraction would depend on how many of the objects that the model represents are present in the image. A natural way to cope with this is to consider only those image points that are near the current hypothesized position of the model, as the other points will probably belong to other objects. In practice this means considering only those points of the image that lie ‘underneath’ the current position of the model. This case of matching a portion of the image to the model can then be defined as

h_box(A, B ⊕ t) = max_{a ∈ A ∩ Box(t)} min_{b ∈ B} ||a − (b + t)||,

where Box(t) is the region covered by the translated model. This definition does not prevent the use of the rank-based distance measure: the rank L is simply adjusted to the number of image points ‘underneath’ the model, L = ⌊f2 r⌋ where r is that number of points, and this L-th ranked value is used instead of the max in the definition above.
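
A sketch of this box-restricted, rank-based distance for a single translation; the bounding box of the model array is taken as the region ‘underneath’ the model, and the fraction is only an example value:

    import numpy as np

    def box_partial_distance(A, B, x, y, frac=0.8):
        """Ranked distance from the image points that lie underneath the
        model (translated to (x, y)) to the translated model points."""
        h, w = B.shape
        under = A[x:x + h, y:y + w]                   # image points under the model
        a_pts = np.argwhere(under != 0)               # coordinates relative to the box
        b_pts = np.argwhere(B != 0)
        if len(a_pts) == 0:
            return 0.0
        dists = np.linalg.norm(a_pts[:, None, :] - b_pts[None, :, :],
                               axis=2).min(axis=1)
        L = max(1, int(frac * len(a_pts)))            # rank based on r = len(a_pts)
        return np.partition(dists, L - 1)[L - 1]

    A = np.zeros((16, 16), dtype=np.uint8); A[4, 4] = A[4, 6] = A[12, 12] = 1
    B = np.zeros((3, 3), dtype=np.uint8);   B[0, 0] = B[0, 2] = 1
    print(box_partial_distance(A, B, 4, 4))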

The Hausdorff distance under rigid motion

Up till now only the case of translation has been investigated. The case can be extended by adding rotation to the set of transformations, i.e. rigid motion. This limits the usable Lp distances to the L2 distance only, since only the L2 norm is invariant under rotation. As before, the set B is allowed to move and the set A is fixed. The minimal value of the Hausdorff distance under rigid motion, M_E(A,B), is now defined as

M_E(A,B) = min_{t,θ} H(A, (R_θ B) ⊕ t),

in which H is the same Hausdorff distance as before and R_θ is the standard rotation matrix for the angle θ. The distance is small exactly when there is a Euclidean transformation that brings every point of B near some point of A and vice versa. The minimal HD under rigid motion can be computed using an array which stores for each value of θ the minimal possible value of the HD. For each rotation θ the distance transform is probed and the probe values are maximized with the corresponding entries in the array. Once all points of B have been considered, the array gives the directed HD for each rotation. By performing this computation for each translation t the minimal HD under rigid motion can be computed.
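
A brute-force sketch of this search over a discretized set of rotations and translations (the discretization steps are assumptions; the actual algorithm maintains the per-rotation array described above instead of recomputing everything from scratch):

    import numpy as np

    def min_dists(X, Y):
        return np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2).min(axis=1)

    def hausdorff(X, Y):
        return max(min_dists(X, Y).max(), min_dists(Y, X).max())

    def min_hausdorff_rigid(A, B, thetas, translations):
        """M_E(A,B) approximated over discrete rotations and translations."""
        best = np.inf
        for theta in thetas:
            R = np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta),  np.cos(theta)]])
            RB = B @ R.T                      # rotate the model (L2 norm assumed)
            for t in translations:
                best = min(best, hausdorff(A, RB + t))
        return best

    A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    B = np.array([[0.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])   # A rotated by 90 degrees

    thetas = np.linspace(0, 2 * np.pi, 72, endpoint=False)
    translations = [np.array([x, y]) for x in np.linspace(-1, 1, 9)
                                      for y in np.linspace(-1, 1, 9)]
    print(min_hausdorff_rigid(A, B, thetas, translations))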

Efficient computation of F[x,y]

The article presents some pruning techniques to speed up the computation. The amount of speed-up depends on the images used; it is most effective when the image is sparse and the model contains many points.

The pruning techniques are based on the fact that many applications calculate F[x,y] and then only consider the translations for which F[x,y] ≤ τ, for some threshold τ. Techniques such as ruling out circles, early scan termination, skipping forward, and combinations of these lead to a time gain, because not all values of F[x,y] need to be computed in full but only those that can still fall beneath the threshold.
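
A sketch of early scan termination for a single translation, using the probing setup from before; as soon as one probe value exceeds the threshold τ the translation can be rejected:

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    A = np.zeros((32, 32), dtype=np.uint8); A[10:14, 10] = 1
    B = np.zeros((4, 1), dtype=np.uint8);   B[:, 0] = 1

    D_prime = distance_transform_edt(A == 0)
    ks, ls = np.nonzero(B)

    def accepted(x, y, tau):
        """Early scan termination: stop probing as soon as one model point
        is farther than tau from the image, since the max can only grow."""
        for k, l in zip(ks, ls):
            if D_prime[x + k, y + l] > tau:
                return False          # this translation can never beat tau
        return True                   # all probes are within the threshold

    print(accepted(10, 10, tau=1.0))  # True: the model lies on the image points
    print(accepted(0, 0, tau=1.0))    # False: rejected after an early probe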

Examples

For completeness the article provides some examples which show that the proposed speed-up measures yield a considerable gain compared with the naïve computation. When this technique is compared with binary correlation, in which a local peak is considered a ‘match’, the Voronoi-based technique clearly outperforms binary correlation.

References

[HK] Huttenlocher, D.P., Kedem, K., and Sharir, M., The Upper Envelope of Voronoi Surfaces and Its Applications, Proc. ACM Symposium on Computational Geometry, pp. 194-292, 1991.
