IMAGE PROCESSING
Construction of Holographic Images Using the Cross Sum Modified Laplacian Method
Abstract—This paper presents a shape-from-focus method, which is improved with regard to the mathematical operator used for contrast measurement, the selection of the neighborhood size, surface refinement through interpolation, and surface postprocessing. Three-dimensional models of living human faces are presented with such a high resolution that single hairs are visible.
Key words—Autofocusing, depth estimation, depth-from-focus, facial measurement, focus analysis, focusing, focus measure, holography, shape estimation, shape-from-focus, 3-D reconstruction.
I. INTRODUCTION
THE acquisition of 3-D information from 2-D images is one of the most important issues in computer vision. Techniques performing this task are often referred to as shape-from-x, where x is a visual cue such as shading, texture, contour, focus, stereo, or motion [1]. Blurring phenomena due to defocusing are among the important cues for depth recovery. Shape-from-focus is a method where a sequence of images is taken while changing the focus setting in small steps. Each image of such a sequence, referred to below as a focus series, contains a focused region of the imaged object as well as an unfocused background. By determining the focus setting that locally optimizes image focus, 3-D information about the imaged object is retrieved [2].
If the object is moving and cannot be expected to remain in a certain position during the capturing time, an alternative method for 3-D measurement is holography, where part of the 3-D information about the recorded object is stored by coherently superimposing the back-scattered light with a reference wave. In this way, the amplitude and phase information of the scattered light field is stored as an interference pattern, for example in an analogue photosensitive material.
This kind of acquisition of 3-D data has the tremendous advantage of storing the highly resolved information in an extremely short amount of time, namely the duration of the recording laser pulse (35 ns for the setup described here). However, the only way to access the stored information is to reconstruct the hologram.
Fig. 1. Optical reconstruction of the hologram and digitization of a set of 2-D projections at different axial positions.
This is done optically if the hologram was stored on an analogue recording material. It is realized by illuminating the hologram with the complex conjugate reference beam, which produces the so-called real image, a 3-D light field with one-to-one correspondence to the recorded object. It has to be digitized into a set of 2-D projections because of the 2-D nature of digital recording devices. The result of the holographic reconstruction is a set of images containing focused and unfocused regions, similar to a focus series gained through subsequent recording. This is why the same methods designed for shape-from-focus in microscopy are applicable to holographic reconstructions [3]–[7].
This paper discusses improvements of shape-from-focus methods applicable to all kinds of focus series. The application that provides the exemplary material for the discussion is the analogue holographic recording of living human faces. For this kind of application, the extremely short recording time shows its full advantage, since motion artifacts caused by changes of the face due to facial expression, breathing, or heartbeat can be avoided. The completely focused image, extracted from the focus series in combination with the 3-D information, serves as a precisely fitting texture. Such highly resolved textured 3-D models of living human faces are of great interest in many kinds of applications, among others in medicine and forensic science [8]–[14].
Fig. 1(a) shows an example image from the focus series, and Fig. 1(b) shows a screenshot of the resulting textured 3-D computer model of the face.
SHAPE-FROM-FOCUS
The considered focus series may consist of gray-scale images, each characterized by its intensity distribution in the (x, y) plane at an axial coordinate z. The distinction between focused and unfocused image parts is realized with a so-called focus criterion, a mathematical operator that measures the local image contrast at each pixel (x, y) in each image at axial position z with the aid of a local pixel neighborhood. These values are referred to below as focus values F(x, y, z). Maximizing the focus values along the z direction localizes the focused regions and produces information about the 3-D geometry of the imaged object (see Fig. 2). This process is called surface extraction in the following.
In addition to the topometrical data, the procedure can be used to generate an image with unlimited depth of field by compounding the detected sharp patches from each image into one image. This so-called texture image can be used for texturing the resulting 3-D computer models. The pixel-to-pixel correspondence between texture and geometry, which is inherent to the described process, makes any further registration between the model and the texture unnecessary [see Fig. 1(b)].
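The surface extraction and texture compounding described above can be sketched in a few lines. This is a minimal illustration, not the implementation used for the results in this paper: the function name `extract_surface` and the array layout (slice index first) are assumptions made for the example, and the focus values are taken as given.

```python
import numpy as np

def extract_surface(stack, focus, z_positions):
    """Pick, per pixel, the slice with the highest focus value.

    stack       : (N, H, W) array, the focus series (assumed layout)
    focus       : (N, H, W) array, focus values F(x, y, z) per slice
    z_positions : length-N sequence, axial position of each slice
    Returns the height map and the all-in-focus texture image.
    """
    stack = np.asarray(stack, dtype=float)
    focus = np.asarray(focus, dtype=float)
    idx = np.argmax(focus, axis=0)                      # best slice index per pixel
    height = np.asarray(z_positions, dtype=float)[idx]  # (H, W) height map
    # Gather the intensity of the best-focused slice for every pixel.
    texture = np.take_along_axis(stack, idx[None, ...], axis=0)[0]
    return height, texture
```

Because height map and texture are read out with the same index array, they share one pixel grid, which is the pixel-to-pixel correspondence mentioned above.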
Pioneering work in microscopic shape-from-focus was done by Nayar [2]. The method is used in a wide range of applications, such as the measurement of firearm bullets and cartridge cases and the inspection of machine tools [15], material structure analysis in mineralogical research [16], and the measurement of microstructures such as a micro-cogwheel [17].
The optimization of the depth-from-focus procedure depends strongly on the natural frequency content of the texture of the object to be recorded. For objects with low intrinsic contrast, active illumination can be used [18]. In the case of human faces, the intrinsic contrast of the skin is sufficient for shape-from-focus.
OPTIMIZED FOCUS CRITERIA
The essence of the depth-from-focus procedure is the detection of focused image parts, the same task that is performed by autofocus algorithms used in many different applications. The performance of the different autofocus algorithms depends strongly on the properties of the object to be focused on, as well as on those of the imaging system. A lot of research has been done to find suitable algorithms for the different applications [19]–[26].
The core of an autofocusing algorithm is a mathematical operator, a focus criterion, working on the pixel representation of the digital image and assigning a quantitative value to a whole image or a specified part. In shape-from-focus, a sharpness value is assigned to every pixel. It is calculated using a local pixel neighborhood.
To find a suitable focus criterion for holographic facial measurement, a quantitative comparison of the extracted facial surface was performed with the help of a simulated focus series with different amounts of added noise. The evaluated criteria were, among others, Tenengrad [31], [19], Squared Gradient, Absolute Gradient, Sum Modified Laplacian (SML) [2], Variance, and Cross Sum Modified Laplacian (XSML). The latter is a newly developed modification of Nayar's SML operator, presented for the first time in this paper. The Laplacian in the SML takes only direct neighbors in the x- and y-directions into account. In the XSML, the diagonal neighbors are included as well, weighted with a factor that compensates for their larger distance to the central point. The operator is evaluated over a square pixel neighborhood whose size parameter can be adjusted to the characteristic size of texture elements in the recorded scene.
A detailed discussion of the criteria and their performance would go beyond the scope of this paper. An in-depth study can be found in [7]. It demonstrates that the newly developed XSML criterion produces the lowest deviation between the extracted surface and a reference surface for any amount of added noise. The XSML operator shows a slightly higher robustness against noise than the SML criterion due to the increased number of points used for the calculation.
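To make the idea of the XSML criterion concrete, the following sketch adds diagonal second differences to a modified Laplacian and sums the result over a square neighborhood. It is an illustration under stated assumptions, not the definition from [7]: the diagonal weight `diag_weight = 1/sqrt(2)`, the replicated image border, and the names `xsml` and `_shift` are all choices made for this example.

```python
import numpy as np

def _shift(img, dy, dx):
    """Return img sampled at (y + dy, x + dx), replicating border pixels."""
    p = max(abs(dy), abs(dx))
    padded = np.pad(img, p, mode="edge")
    h, w = img.shape
    return padded[p + dy:p + dy + h, p + dx:p + dx + w]

def xsml(img, d=3, diag_weight=1 / np.sqrt(2)):
    """Cross-type sum modified Laplacian over a d x d neighborhood.

    diag_weight is an ASSUMED value for the diagonal weighting factor.
    """
    img = np.asarray(img, dtype=float)

    def second_diff(dy, dx):
        # Absolute second difference along the direction (dy, dx).
        return np.abs(2 * img - _shift(img, dy, dx) - _shift(img, -dy, -dx))

    # Direct neighbors in x and y, as in the SML ...
    ml = second_diff(0, 1) + second_diff(1, 0)
    # ... plus the two diagonal directions, down-weighted.
    ml = ml + diag_weight * (second_diff(1, 1) + second_diff(1, -1))

    # Sum the per-pixel values over the square neighborhood.
    r = d // 2
    out = np.zeros_like(ml)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out += _shift(ml, dy, dx)
    return out
```

Setting `diag_weight=0` reduces the sketch to a plain sum modified Laplacian, which makes it easy to compare both criteria on the same focus series.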
ADAPTIVE SELECTION OF NEIGHBORHOOD SIZE
The choice of the neighborhood size is a critical step in the procedure of surface extraction. The lateral resolution of the extracted surface is limited by the size of the neighborhood, since structures smaller than this size cannot be resolved. Thus, the neighborhood size should be chosen as small as possible.
Fig. 2. For each fixed lateral point (x, y), the focus values F(x, y, z) are maximized, which leads to 3-D geometry information about the object.
On the other hand, small neighborhoods are very sensitive to noise. The lower the signal-to-noise ratio, the larger the neighborhood should be. Even in the absence of noise, the minimal neighborhood size is limited to a certain extent by the feature size of the texture of the recorded object. With a neighborhood smaller than any feature, it is impossible to distinguish between focused and defocused image regions. When choosing a neighborhood size, one always has to compromise between lateral resolution and robustness against noise. In regions of high intrinsic contrast, like the mustache of the subject, a high-quality surface can be extracted using a neighborhood size of 3 pixels with a lateral resolution so high that even single hairs can be distinguished. In regions of lower intrinsic contrast, however, the surface extracted with a neighborhood size of 3 pixels is affected by noise. Using a neighborhood size of 17 pixels, on the other hand, allows for a noise-resistant surface extraction in regions of lower intrinsic contrast, while the lateral resolution is reduced.
Fig. 3. Exemplary height maps extracted from a holographic data set with a neighborhood size of 3 pixels (a) and 17 pixels (b).
Fig. 4. Histograms of confidence values for a typical facial data set calculated with a neighborhood size of 3 (a), 5 (b), and 11 pixels (c). An increasing neighborhood size leads to the appearance of a second peak belonging to reliable surface points.
To overcome this tradeoff, an adaptive algorithm was developed that locally chooses the smallest possible neighborhood size for which the surface is not corrupted by noise. It is based on the definition of a confidence value for each extracted surface point and for several neighborhood sizes d. The surface extraction is done by comparing the sharpness values calculated with a neighborhood size d along the z direction and finding the maximum. The higher the maximum in comparison to the mean sharpness value, the more reliable the generated surface point is. The confidence value is defined as follows:
(1)
with
(2)
The normalization leads to confidence values between zero and one; a higher value indicates a more reliable surface point. The confidence values of one surface point differ for different neighborhood sizes. The concept of the adaptive algorithm is to start with the smallest neighborhood size and accept extracted surface points only if they are reliable. If not, a larger neighborhood is considered. Reliability can be defined with the help of the confidence values by selecting a threshold for each neighborhood size. The thresholds are determined by inspecting the histograms of the confidence values created with different neighborhood sizes for a typical data set of a recorded face.
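The adaptive concept can be illustrated with a short sketch. The confidence formula used here, (F_max - F_mean) / F_max, is an assumption chosen only because it compares the maximum with the mean sharpness value and is normalized to the range between zero and one for non-negative focus values; it is not claimed to be identical to (1) and (2). The function names and the dictionary-based interface are likewise hypothetical.

```python
import numpy as np

def confidence(focus):
    """ASSUMED confidence: (max - mean) / max along the slice axis.

    focus: (N, H, W) array of non-negative focus values computed
    with one neighborhood size.
    """
    f_max = focus.max(axis=0)
    f_mean = focus.mean(axis=0)
    safe = np.where(f_max > 0, f_max, 1.0)  # avoid division by zero
    return np.where(f_max > 0, (f_max - f_mean) / safe, 0.0)

def adaptive_surface(focus_by_size, thresholds):
    """Accept, per pixel, the smallest neighborhood that is reliable.

    focus_by_size: dict {neighborhood size d: (N, H, W) focus values}
    thresholds   : dict {neighborhood size d: confidence threshold}
    Returns the slice index per pixel (-1 where no size was reliable)
    and the neighborhood size that was used (0 where none).
    """
    sizes = sorted(focus_by_size)
    shape = focus_by_size[sizes[0]].shape[1:]
    index = np.full(shape, -1, dtype=int)
    used = np.zeros(shape, dtype=int)
    for d in sizes:  # smallest neighborhood first
        f = focus_by_size[d]
        accept = (index < 0) & (confidence(f) >= thresholds[d])
        index[accept] = np.argmax(f, axis=0)[accept]
        used[accept] = d
    return index, used
```

Pixels that remain at index -1 after the largest neighborhood has been tried are the gaps for which no reliable surface point was found.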
Fig. 4 shows such histograms for a neighborhood size of 3 pixels (a), 5 pixels (b), and 11 pixels (c). For a small neighborhood size, very few reliable surface points are present. The histogram shows mainly a normal distribution of confidence values of unreliable points with a slight asymmetry caused by the reliable points. As the size of the neighborhood increases, the number of reliable points increases, so that a second peak appears, with its center shifting to higher confidence values. Unreliable points can be excluded if the threshold is chosen such that the first peak is eliminated. In this manner, the thresholds, which depend on the neighborhood size, were determined manually for 25 different data sets. (An automated selection is also possible, but is not necessary, as explained in the following.)
Fig. 5 shows the mean thresholds, as well as an exponential fit. The resulting curve is then used for automated threshold selection and for the distinction between reliable and unreliable surface points without manually analyzing the histograms.
Even when the maximal neighborhood size is limited to a certain value, there are still points where no reliable surface value could be found. In order to fill these gaps, the surface extracted with the maximal neighborhood can be smoothed by weighted averaging, with the confidence values acting as weights. The smoothed values are used to fill the gaps if desired. The result can be seen in Fig. 6. It shows a surface not corrupted by noise with the maximal achievable lateral resolution.
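The gap filling by confidence-weighted averaging can be sketched as follows. The box-shaped averaging window, its `radius`, and the function names are assumptions made for this illustration; the text only states that the confidence values act as weights.

```python
import numpy as np

def weighted_smooth(height, weights, radius=2):
    """Confidence-weighted box average of a height map (assumed window)."""
    height = np.asarray(height, dtype=float)
    weights = np.asarray(weights, dtype=float)
    h, w = height.shape
    hp = np.pad(height * weights, radius, mode="constant")
    wp = np.pad(weights, radius, mode="constant")
    num = np.zeros((h, w))
    den = np.zeros((h, w))
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            num += hp[dy:dy + h, dx:dx + w]
            den += wp[dy:dy + h, dx:dx + w]
    # Where all weights in the window vanish, keep the unsmoothed value.
    return np.where(den > 0, num / np.where(den > 0, den, 1.0), height)

def fill_gaps(adaptive_height, reliable, fallback_height, confidence, radius=2):
    """Keep reliable adaptive points; fill the rest from the smoothed
    surface extracted with the maximal neighborhood."""
    smooth = weighted_smooth(fallback_height, confidence, radius)
    return np.where(reliable, adaptive_height, smooth)
```

Points with low confidence contribute little to the average, so isolated outliers in the fallback surface have only a small influence on the filled values.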
Fig. 5. Threshold against neighborhood size with exponential fit. The thresholds were averaged over 25 values; the error bars represent the standard deviation.
Fig. 6. Height map created with the presented algorithm corresponding to the
height maps created with a fixed neighborhood size shown in Fig. 3.
Fig. 7. Section of the texture extracted from a holographically recorded face digitized with a resolution of 40 µm (a). A screenshot of the 3-D computer model of the marked section is shown in (b).
Another example can be seen in Fig. 7. It shows a section of the texture image extracted from a holographic recording, digitized with a resolution of 40 µm. The adaptive approach presented in this paper permitted the creation of a closed surface with a local resolution so high that single hairs are visible in the 3-D model.
SURFACE INTERPOLATION
The set of 2-D images from which the surface information is extracted represents the holographic image at discrete positions. Thus, focus information, and with it surface information, is available only at discrete positions. Hence, the choice of the interslice distance limits the maximal achievable axial resolution. An interpolation of the discrete focus values would provide continuous information and consequently a potentially higher resolution. Nayar [2] proposed an interpolation method based on a Gaussian distribution:
(3)
By choosing the maximal focus value and its two neighbors, the unknown parameters of the Gaussian can be calculated just by solving a system of algebraic equations. With this, continuous values are produced that are no longer restricted to the discrete slice positions.
Fig. 8. Exemplary focus profile with fitted Gaussian curve. The dashed line represents the threshold for points to be considered.
An alternative approach is to fit the Gaussian model in (3) to a chosen number of points. For the selection of points used for the fit, it was found best to first determine the height of the maximum above the average focus value and consider only points in the upper half of that interval (dashed line in Fig. 8). Additionally, the points have to lie within an interval of 20 tomograms before and behind the maximal value, to eliminate noisy data points coincidentally lying above the threshold but far away from the maximum. The fitting algorithm was implemented using the downhill simplex method in multidimensions [27, p. 408] proposed by Nelder and Mead [28]. Fig. 8 shows the whole profile with the fitted curve. The center of the Gaussian serves as the new z coordinate.
The interpolation method is based upon three focus values exactly met by the curve, while the fit method takes up to twenty points into account and finds the curve by minimizing the overall deviation. That makes the fit more accurate and more stable in the presence of noise. However, its computational effort is much higher, since an iterative optimization step is involved. In holographic facial measurement, where computational time is not as important as in real-time applications, the Gaussian fit proved to be a useful tool for surface refinement. A quantitative comparison of the interpolation methods mentioned above and an additional one can be found in [7].
In addition to the increased robustness against noise, the fit also eliminates step-like structures in the extracted surface, which appear especially when large neighborhood sizes are used. An example can be seen in the height profile in Fig. 9. The steps in the height map are much larger than the interslice distance. These steps are lessened considerably through the Gaussian fit, as can be seen in Fig. 9. The height profile gained with the fit is shifted for better comparability.
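The three-point interpolation can be written in closed form, because the logarithm of a Gaussian is a parabola whose vertex is the center of the Gaussian. The sketch below illustrates this for a single focus profile. It assumes equally spaced slices and strictly positive focus values; the function name `gaussian_peak` and the fallback to the discrete maximum are choices made for this example, and the iterative Gaussian fit is not shown.

```python
import numpy as np

def gaussian_peak(z, f):
    """Sub-slice peak position from the maximum and its two neighbors.

    z : equally spaced axial positions (assumption)
    f : strictly positive focus values along z (assumption)
    """
    z = np.asarray(z, dtype=float)
    f = np.asarray(f, dtype=float)
    m = int(np.argmax(f))
    if m == 0 or m == len(f) - 1:
        return z[m]  # no neighbor on one side: keep the discrete position
    y_prev, y_max, y_next = np.log(f[m - 1:m + 2])
    denom = y_prev - 2.0 * y_max + y_next  # curvature of the log-parabola
    if denom >= 0:
        return z[m]  # not a proper peak: keep the discrete position
    step = z[m + 1] - z[m]
    # Vertex of the parabola through the three log focus values.
    return z[m] + step * (y_prev - y_next) / (2.0 * denom)
```

For a noise-free Gaussian profile the three points determine the center exactly, whatever the slice spacing; with noise, the result depends on only three samples, which is why a fit over more points is more stable.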
SURFACE POSTPROCESSING
The previous sections presented new methods developed to improve the classical depth-from-focus method. However, despite sophisticated surface extraction methods, small noisy surface fluctuations cannot be avoided entirely. Denoising is often done with low-pass filtering, which reduces noise but also blurs sharp features and details. We found that a smoothing method proposed by Tasdizen [29], based on anisotropic diffusion of normals, worked very well for the postprocessing of facial models.
Anisotropic diffusion of normals is the natural extension of the anisotropic diffusion method for 2-D images introduced by Perona and Malik [30] to curved surfaces in 3-D space.
Smoothing in regions of high total curvature is suppressed through the introduction of a nonconstant diffusion coefficient that depends on the total curvature.
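The role of a nonconstant diffusion coefficient is easiest to see in the 2-D image case of Perona and Malik. The sketch below implements that image-domain scheme only; it is not the diffusion of surface normals from [29]. The exponential edge-stopping function is one of the forms proposed by Perona and Malik, and the parameter values `kappa`, `dt`, and `n_iter` are illustrative assumptions.

```python
import numpy as np

def perona_malik(img, n_iter=20, kappa=0.1, dt=0.2):
    """Anisotropic diffusion of a 2-D image (explicit scheme, dt <= 0.25)."""
    u = np.asarray(img, dtype=float).copy()

    def g(d):
        # Diffusion coefficient: close to 1 for small differences,
        # close to 0 across strong edges.
        return np.exp(-(d / kappa) ** 2)

    for _ in range(n_iter):
        p = np.pad(u, 1, mode="edge")  # replicated border: no flux outward
        # Differences to the four direct neighbors.
        dn = p[:-2, 1:-1] - u
        ds = p[2:, 1:-1] - u
        dw = p[1:-1, :-2] - u
        de = p[1:-1, 2:] - u
        u += dt * (g(dn) * dn + g(ds) * ds + g(dw) * dw + g(de) * de)
    return u
```

In this image-domain analogue the gradient magnitude plays the part that the total curvature plays on a surface: where it is large, the coefficient drops and smoothing is suppressed, so a step edge survives while small fluctuations in flat regions are averaged out.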
Fig. 10 shows a facial model gained with the presented adaptive algorithm and Gaussian fitting without any postprocessing (a), with isotropic smoothing (b), and with anisotropic diffusion of normals (c). While small features are almost obliterated in (b), they are nearly as pronounced in (c) as in the raw data.
The generation of continuous surface data through the Gaussian fit or Gaussian interpolation is obligatory for the application of the anisotropic diffusion of normals. For discrete surfaces, as they originate from classic shape-from-focus, the anisotropic smoothing is not stable.
With all improvements presented in this paper applied, the computation time on a standard PC for facial models as shown in this paper increases from 1 min for the standard procedure to approximately 30 min, with the Gaussian fitting requiring most of the additional time.
CONCLUSIONS
This paper demonstrated that the classical shape-from-focus method is well suited for the extraction of 3-D geometry and texture information from holographic reconstructions, especially for the creation of highly resolved facial models.
We evaluated several mathematical operators for contrast measurement and found that the best criterion for this application is the newly developed XSML criterion, a modification of the SML operator proposed by Nayar [2].
Additionally, we presented an algorithm for the automated adaptation of the neighborhood size to the conditions of the data. The neighborhood size is chosen as small as possible while preventing corruption through noise. With this, an optimal lateral resolution can be achieved. It was demonstrated that in facial measurement single hairs can be visualized through this procedure. It was also shown that a Gaussian fit to the focus profile improves the quality of the surface tremendously. Artifacts are