Text Extraction from Document Images Using Fourier Transform Based Method
1Vikas K. Yeotikar
Department of Computer Science,
SSESA, Science College,
Congress Nagar, Nagpur, (M. H.), India
2Manish T. Wanjari
Department of Computer Science,
SSESA, Science College,
Congress Nagar, Nagpur, (M. H.), India
3Mahendra P. Dhore
Department of Electronics & Computer Science,
R.T.M. Nagpur University Campus,
Nagpur, (M. H.), India
ABSTRACT
Text extraction in document images has been an important research area. Extraction of this information in the form of text involves detection, localization, tracking, extraction, enhancement, and recognition of the text from a given document image. A large number of techniques have been proposed to address this problem and the purpose of this paper is to review and implementation of Fourier Transform based methods such as Spatial domain filter, 2D-Fast Fourier Transform, 2D Fourier Transform, Image Enhancement and Edge Reinforcement for some of the Document images is carried out in this paper.
Keywords
Fourier Transform, Document Image Analysis (DIA), DFT, FFT, Text Extraction, Text Detection, Text Localization, Text Enhancement.
1. INTRODUCTION
Achieving human functions like reading, recognizing and thinking through machines is an ancient dream. Much of our interaction with environment requires recognition of `things' such as sounds, smells, shapes in a scene (text characters, faces, flowers, plants), etc. Analysis of document images for information extraction has gained immense importance in recent past. Wide variety of information, which has been conventionally stored on paper, is now being converted into electronic form for better storage and intelligent processing. This needs processing of documents using image analysis algorithms. Locating text image blocks and tables, and defining appropriate algorithm is the major challenge in document image analysis [1, 2].
In order to understand how different document image processing filters work, it is a good idea to begin by understanding what frequency has to do with document images. A document image is in essence a two dimensional collection of discrete signals. Therefore, the signals have frequencies associated with them. For instance, if there is relatively little change in grayscale values as you scan across an image, then there is lower frequency content contained within the document image. If there is wide variation in grayscale values across a document image then there will be more frequency content associated with the document image. This may seem somewhat confusing, so let us think about this in terms that are more familiar to us. From signal processing, we know that any signal can be represented by a collection of sine waves of differing frequencies, magnitudes and phases. This transformation of a signal into its constituent sinusoids is known as the Fourier Transform.[4]
This collection of sine waves can potentially be infinite, if the signal is difficult to represent, but is generally truncated at a point where adding more signals does not significantly improve the resolution of the recreation of the original signal. In digital systems, we use a Fourier Transform designed in such a way that we can enter discrete input values, specify our sampling rate, and have the computer generate discrete outputs. This is known as the Discrete Fourier Transform, or DFT.
MATLAB uses a fast algorithm for performing a DFT, which is called the Fast Fourier Transform, or FFT, whose MATLAB command is fft. The FFT can be performed in two dimensions, fft2 in MATLAB. This is very useful in image processing because we can then determine the frequency content of an image. Picture a document image as a two dimensional matrix of signals. If you plotted just one row, so that it showed the grayscale value stored within each pixel, you might end up with something that looks like a bar graph, with varying values in each pixel location. Each pixel value in this signal may appear to have no correlation to the next one. However, the Fourier Transform can determine which frequencies are present in the signal. In order to see the frequency content, it is useful to view the absolute value of the magnitude of the Fourier Transform, since the output of a Fourier Transform is complex in nature.
1.1. Document Image Analysis(DIA)
Over the last few decades, machine reading is changing from a dream to reality. Optical Character Recognition systems (OCR) deal with the recognition of printed or handwritten characters. Methodically, character recognition is a subset of the pattern recognition area. However, it was character recognition that gave the impetus to the Pattern Recognition and Image Analysis to become matured fields of science.
At present, reasonably efficient and inexpensive OCR packages are commercially available to recognize printed texts in languages such as English, Chinese, and Japanese. On the contrary, there is only limited research effort made in the recognition of Indian and other oriental languages. While research in OCR for printed Roman script has reached a point of diminishing returns, OCR for handwriting and for printed non-Roman scripts continues to be a very active field.
Apart from character recognition, recognizing the font of a printed document is not even attempted on Indian language documents; while some successful studies are made in English. Typographically, a font is a particular instantiation of a typeface design, often in a particular size, weight and style.
In many occasions, printed documents may contain words in various font faces and sizes. For Indian and many other oriental languages, OCR systems are not yet able to successfully recognize printed document images of varying scripts, quality, size, style and font.[9]
1.2 Purpose of texture analysis:
· To identify different textured and non-textured regions in an document image.
· To classify/segment different texture regions in an image.
· To extract boundaries between major texture regions.
· To describe the Texel unit.
· 2-D, 3-D shape from texture [3]
Larger number of approaches is available. In this paper we discuss some of Fourier Transform method:
1.3 Fourier Transform:
The Fourier transform is a representation of an image as a sum of complex exponentials of varying magnitudes, frequencies, and phases. The Fourier transform plays a critical role in a broad range of image processing applications, including enhancement, analysis, restoration, and compression.
Some Properties of Fourier transform:
· Frequency Response of Linear Filters:
· Fast Convolution:
· Locating Document Image Features[4]
·
Some Method of Fourier transforms:
· Spatial Domain Filter
· 2D Fourier Transform
· 2D-Fast Fourier Transform
· Image Enhancement and
· Edge Reinforcement
In this paper, we used some of Fourier Transform methods to remove the noise from the document image which consists of text. After that we used one of the method of Fourier transform on document image to extract the text.
2. APPLICATIONS OF THE FOURIER TRANSFORM
This paper presents a few of the many document image processing-related applications of the Fourier transform.
2.1. Filters
Image processing is based on filtering the content of images. Filtering is used to modify an image in some way. This could entail blurring, deblurring, locating certain features within an image, etc… Linear filtering is accomplished using convolution, as discussed above. A filter, or convolution kernel as it is also known, is basically an algorithm for modifying a pixel value, given the original value of the pixel and the values of the pixels surrounding it. There are literally hundreds of types of filters that are used in image processing. However, we will concentrate on several common ones.
The first filters we will talk about are low pass filters. These filters blur high frequency areas of images. This can sometimes be useful when attempting to remove unwanted noise from an image. However, these filters do not discriminate between noise and edges, so they tend to smooth out content that should not be smoothed out.
Median Filters can be very useful for removing noise from images. A median filter is like an averaging filter in some ways. The averaging filter examines the pixel in question and its neighbor’s pixel values and returns the mean of these pixel values. The median filter looks at this same neighborhood of pixels, but returns the median value. In this way noise can be removed, but edges are not blurred as much, since the median filter is better at ignoring large discrepancies in pixel values.
2.2. Frequency Response of Linear Filters
The Fourier transform of the impulse response of a linear filter gives the frequency response of the filter. The function freqz2 computes and displays a filter’s frequency response. The frequency response of the Gaussian convolution kernel shows that this filter passes low frequencies and attenuates high frequencies.
Fig. 1. Frequency Response of a Gaussian Filter
2.3. Fast Convolution
A key property of the Fourier transform is that the multiplication of two Fourier transforms corresponds to the convolution of the associated spatial functions. This property, together with the fast Fourier transform, forms the basis for a fast convolution algorithm.
Convolution is a linear filtering method commonly used in image processing. Convolution is the algebraic process of multiplying two polynomials. An image is an array of polynomials whose pixel values represent the coefficients of the polynomials. Therefore, two images can be multiplied together to produce a new image through the process of convolution. If the convolution kernel, or filter, is large, this can be a very tedious process involving many multiplication steps. However, the convolution theorem states that convolution is the same as the inverse Fourier Transform of the multiplication of two Fourier Transforms.
· Create two matrices.
· Zero-pad A and B so that they are at least (M+P-1)-by-(N+Q-1). (Often A and B are zero-padded to a size that is a power of 2 because fft2 is fastest for these sizes.) The example pads the matrices to be 8-by-8.
· Compute the two-dimensional DFT of A and B using fft2, multiply the two DFTs together, and compute the inverse two-dimensional DFT of the result using ifft2
· Extract the nonzero portion of the result and remove the imaginary part caused by roundoff error.
8.0000 9.0000 15.0000 7.0000 6.000011.0000 17.0000 30.0000 19.0000 13.0000
15.0000 30.0000 45.0000 30.0000 15.0000
7.0000 21.0000 30.0000 23.0000 9.0000
4.0000 13.0000 15.0000 11.0000 2.0000
Table: 1 Two by Two Matrices for testing convolution
2.4. Locating Document Image Features
The Fourier transform can also be used to perform correlation, which is closely related to convolution. Correlation can be used to locate features within an document image; in this context correlation is often called template matching.[7]
Fig: 2.a. Original Document Image from NIST database.
Fig: b. Text Extracted from above document image
3. DISCRETE FOURIER TRANSFORM
Working with the Fourier transform on a computer usually involves a form of the transform known as the discrete Fourier transform (DFT). A discrete transform is a transform whose input and output values are discrete samples, making it convenient for computer manipulation. There are two principal reasons for using this form of the transform:
• The input and output of the DFT are both discrete, which makes it convenient for computer manipulations.
• There is a fast algorithm for computing the DFT known as the fast Fourier transform (FFT).
Visualizing the Discrete Fourier Transform
3.1 Construct a matrix f that is similar to the function f(m,n) in the example in “Definition of Fourier Transform”. Remember that f(m,n) is equal to 1 within the rectangular region and 0 elsewhere. Use a binary image to represent f(m,n).
Fig.a. Rectangular Region by DFT
3.2 Compute and visualize the 30-by-30 DFT.
Fig. b. DFT computed without padding
3.3 This plot differs from the Fourier transform displayed in “Visualizing the Fourier Transform”. First, the sampling of the Fourier transform is much coarser. Second, the zero-frequency coefficient is displayed in the upper left corner instead of the traditional location in the center.
Fig. c. Color DFT computed without padding
3.4 To obtain a finer sampling of the Fourier transform, add zero padding to f when computing its DFT. The zero padding and DFT computation can be performed in a single step.
Fig. d. DFT computed with Padding
3.5 The zero-frequency coefficient, however, is still displayed in the upper left corner rather than the center. You can fix this problem by using the function fftshift, which swaps the quadrants of F so that the zero-frequency coefficient is in the center. [4,5]
Fig. e. Log of the Fourier Transform of a Rectangular Function
Fig. 3. Discrete Fourier transform
4. IMPLEMENTED METHODS OF THE FOURIER TRANSFORM 4.1 2D Fourier Transform
The Fourier transform is a representation of an image as a sum of complex exponentials of varying magnitudes, frequencies, and phases. The Fourier transform plays a critical role in a broad range of document image processing.
If f(m,n) is a function of two discrete spatial variables m and n, then the two-dimensional Fourier transform of f(m,n) is defined by the relationship:
The variables ω1 and ω2 are frequency variables; their units are radians per sample. F(ω1,ω2) is often called the frequency-domain representation of f(m,n).
F(ω1,ω2) is a complex-valued function that is periodic both in ω1 and ω2, with period 2.
4.2 2D-Fast Fourier Transform
The 2-D FFT block computes the fast Fourier transform (FFT) of a two-dimensional M-by-N input matrix in two steps. First it computes the one-dimensional FFT along one dimension (row or column). Then it computes the FFT of the output of the first step along the other dimension (column or row). The dimensions of the input matrix, M and N, must be powers of two.[8]
4.3. Spatial Domain Filter
Spatial domain filtering are generally defined as small windows that are convolved with the original document image. In the spatial domain, the action of a filter can be seen by looking the structure of filter. The concept of filtering has its roots in the use of the Fourier transform for signal processing in the so-called frequency domain. Spatial filtering term is the filtering operations that are performed directly on the pixels of an image.[9]