Identification of Leaves by Interior Shape and Texture
Veronica Dooly, Jennifer LeBlanc, R. Mitchell Parry
Department of Computer Science
Appalachian State University
Research Experience for Teachers Program
National Science Foundation
Abstract – Automating leaf identification from texture or pattern with image processing is difficult. Keypoints can be detected in database images with SURF, and their descriptors can then be grouped into clusters. This paper discusses a process that uses SURF to find keypoints and descriptors and MATLAB to group them with Kmeans and classify the resulting clusters. This research shows that examination of the interior leaf patterns does have merit because texture within the leaf varies less than other image properties such as color, size, and lighting.
I. Introduction
Leaf identification is a widely studied topic. Leaves can be classified by a variety of methods. The most readily available method is simply looking at the leaf to determine to which tree or plant it belongs. When the leaf is at its source, the process is easier because the bark is also available to narrow the choices.
Image processing can help with identification when a leaf is not at its source, such as with an image in a database. One procedure looks for the vein structure of the leaf (Xiaodong & Xiaojie, 2010), which is similar to finding the vein structure of a hand (Crisan et al., 2010). Leaves have also been identified by shape. Research has been conducted to facilitate this process by extracting a single leaf from a photo with multiple leaves (Henries & Tashakkori, 2012).
Leaves develop patterns in their interior shapes based on their growth (Xia, 2007). Recognizing differing patterns will help in the identification of similarly shaped leaves. Image processing also allows a computer program to identify points of interest (Raza et al., 2011), automating the process of identifying patterns.
II. Methodologies
- Magnification
In order to distinguish patterns of interior shapes and textures within sample leaves, a ProScope was used to magnify the leaves. The ProScope is a hand-held digital microscope that may be used to create images or videos of magnified items. Each leaf sample was also photographed without any magnification. Three magnifications were used for recording the leaves: 30x, 100x, and 400x, as seen in Figure 1. The images were saved for further analysis. At each level of magnification, images were taken of the center, the edge, and the point where the stem met the leaf.
(a) Non-magnified
(b) 30x magnification
(c) 100x magnification
(d) 400x magnification
Figure 1. Magnification of Leaves. Column 1: Leaf Type 1, Sample 1; Column 2: Leaf Type 2, Sample 2
Thirteen types of leaves were gathered for the sampling. After the initial images were taken, the evergreen leaves, which were Type 8 and Type 9, were removed from the group. This left eleven types with one to three samples of each type.
The first analysis was a simple visual comparison of the images. The researchers looked for patterns, shape, and textures. Each magnification was viewed and compared to each example of a particular leaf and to other leaf samples.
The leaves were then processed with ImageJ software to visualize the existing patterns. The images were first modified by changing the color threshold. By adjusting the red, green, and blue (RGB) color levels, some details of the patterns could be enhanced as shown in Figure 2. Also, if a consistent adjustment could be made to the RGB levels of all the samples, the process could be automated.
Figure 2. Leaf Type 3, Sample 1 after modification of color threshold
The images were then modified by adjusting the black and white (BW) threshold. In order to do this, the images were changed to grayscale. Again, some details could be enhanced. Consistent adjustments to the BW threshold across different leaf types, if found, could then be automated to speed processing in the future.
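The two thresholding steps described above can be illustrated with a minimal sketch. It is written in Python rather than ImageJ and is not the procedure used in the study. The function names are hypothetical, the grayscale weights are the common BT.601 luminance weights (an assumption, since the conversion used by the software is not stated), and the example ranges are borrowed from those reported later in the Results purely for illustration.

```python
def to_gray(pixel):
    """Convert one (R, G, B) pixel to an 8-bit gray level.

    The BT.601 weights used here are an assumption, not a documented setting.
    """
    r, g, b = pixel
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


def color_threshold(image, r_range, g_range, b_range):
    """Return a binary mask: 1 where every channel lies inside its inclusive range."""
    ranges = (r_range, g_range, b_range)
    return [
        [int(all(lo <= c <= hi for c, (lo, hi) in zip(px, ranges))) for px in row]
        for row in image
    ]


def bw_threshold(image, lo, hi):
    """Return a binary mask: 1 where the gray level lies inside [lo, hi]."""
    return [[int(lo <= to_gray(px) <= hi) for px in row] for row in image]


# Tiny hypothetical 2x2 image; each pixel is an (R, G, B) tuple.
image = [
    [(200, 100, 70), (30, 30, 30)],
    [(255, 255, 255), (40, 45, 50)],
]

color_mask = color_threshold(image, (0, 255), (67, 158), (66, 88))
bw_mask = bw_threshold(image, 34, 53)
```

If one fixed set of ranges produced useful masks for every sample, this step could be applied to the whole image set without manual adjustment, which is the kind of automation the paragraph above has in mind.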
- Computer Analysis with SURF
Leaf images viewed without magnification also contain many distinct texture patterns. Further classification of leaves by these unique patterns required automated image processing. This was accomplished using a three-step process, as seen in Figure 3.
Two leading algorithms that detect keypoints and compute descriptors on images are Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF). OpenSURF in MATLAB was preferred for this study because of its accessibility.
Figure 3. Analyzing clusters of leaf patterns with SURF
A database of 389 leaf images was first processed using SURF to find the keypoints on each leaf. During this process SURF creates a 64-element descriptor vector from the region around each keypoint, as shown in Figure 4. Although each descriptor held unique data, for speed and simplification it was beneficial to group the descriptors into clusters using Kmeans. Kmeans clustering partitions the descriptors into N clusters by assigning each descriptor to its nearest centroid (MathWorks, n.d.). This research study used 100 clusters for identification. The final step in processing the images used the CrossVal function in MATLAB. CrossVal was run with two different classifiers: Linear Discriminant Analysis (LDA) and Diagonal Linear Discriminant Analysis (DLDA). This function trains on one partition of the leaves and tests on a second, held-out partition for correct identification. This is repeated with each fold functioning as the test set (Schmitt & McCoy, 2011).
Figure 4. Keypoints found by SURF on Leaf Type 3, Sample 1
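The clustering step can be sketched as follows. This is a minimal pure-Python illustration of Lloyd's k-means algorithm followed by a per-leaf cluster histogram. It is not the MATLAB code used in the study, and the function names, the toy 2-D descriptors, and the choice of two clusters are assumptions made to keep the example small (the study used 64-element descriptors and 100 clusters).

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm: assign points to centroids, then recompute means."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cluster_histogram(descriptors, centroids):
    """Fraction of one leaf's descriptors that fall in each cluster."""
    counts = [0] * len(centroids)
    for d in descriptors:
        counts[nearest(d, centroids)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Hypothetical descriptors for two leaves.
leaf_a = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1)]
leaf_b = [(5.1, 5.0), (4.9, 5.0), (0.0, 0.0)]

centroids = kmeans(leaf_a + leaf_b, k=2)
hist_a = cluster_histogram(leaf_a, centroids)
hist_b = cluster_histogram(leaf_b, centroids)
```

Each leaf is then summarized by a fixed-length histogram regardless of how many keypoints it has, which is what makes a classifier such as LDA applicable. Note that k-means depends on its starting centroids, so different seeds can give different clusters.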
After the process was established, it was repeated on a much larger database containing 11,566 leaves. This set was divided into 8,416 training leaves and 3,150 test leaves. These groups were processed independently for more accurate testing. The descriptors from the training leaves were then grouped into 100 clusters using Kmeans. The test database was then processed using External Validation, which labels test leaves based on the Kmeans clusters of the training data. The LDA and DLDA classifiers were then applied to the separate directories of data, one for training and one for testing. CrossVal was also run on the training and test leaf databases for comparison with the first results.
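One plausible reading of the external validation step is that the test descriptors are not clustered again. Instead, each test descriptor is assigned to the nearest centroid learned from the training leaves, so that training and test leaves share one feature space. The Python sketch below illustrates that reading only. The function names and the three 2-D centroids are hypothetical stand-ins for the 100 centroids in 64 dimensions used in the study.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(descriptors, train_centroids):
    """Label each descriptor with the index of its nearest training centroid."""
    return [
        min(range(len(train_centroids)), key=lambda i: sq_dist(d, train_centroids[i]))
        for d in descriptors
    ]


def histogram(labels, k):
    """Count how many descriptors landed in each of the k clusters."""
    counts = [0] * k
    for label in labels:
        counts[label] += 1
    return counts


# Hypothetical centroids learned from the training leaves only.
train_centroids = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]

# Hypothetical descriptors from one test leaf.
test_leaf = [(0.1, 0.2), (0.9, 1.2), (3.5, 0.1), (4.2, -0.3)]

labels = quantize(test_leaf, train_centroids)
features = histogram(labels, len(train_centroids))
```

Keeping the centroids fixed means no information from the test leaves leaks into the feature definition, which is the point of separating the two directories.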
III. Results
- Magnification
The first analysis of the leaf images was through visual observation. On the non-magnified images general leaf shape was observed, such as heart-shaped or oval. When possible, patterns were also noted. Under magnification, features that were not visible to the naked eye appeared, such as the little circles on the Type 2 leaves seen in Figure 1. There appeared to be irregular polygons similar to a Voronoi diagram; an example is shown in Figure 5. Additionally, some leaves had shapes which were transformed by translation, rotation, or dilation.
Figure 5. Example of Voronoi Diagram
Image found at:
as_st=y&tbm=isch&hl=en&as_q=voronoi+polygons&as_epq=&as_oq=&as_eq=&cr=&as_sitesearch=&safe=images&tbs=sur:f&biw=1920&bih=932&sei=BDrwUam8KIna8AT-oYCgDw
There were limitations to visually observing the leaves. First, the process of magnification created a very limited view of the leaf. The image had to focus on one section, such as the center or the edge. Second, the magnification that yielded the best clarity varied greatly between leaves. This was also true of different samples of the same leaf type. Table 1 shows the best magnification for each sample.
Table 1. Best Visual Pattern from Images per Leaf
Leaf Type / Leaf Sample / Magnification for Best Visual Pattern
Type 1 / Sample 1 / 400x
Type 1 / Sample 2 / 100x
Type 1 / Sample 3 / 100x
Type 2 / Sample 1 / 100x
Type 2 / Sample 2 / 100x
Type 2 / Sample 3 / 30x
Type 3 / Sample 1 / 30x
Type 3 / Sample 2 / 30x
Type 4 / Sample 1 / Non-magnified
Type 4 / Sample 2 / 100x
Type 5 / Sample 1 / 30x
Type 6 / Sample 1 / Non-magnified
Type 7 / Sample 1 / 100x
Type 10 / Sample 1 / 30x
Type 11 / Sample 1 / 30x
Type 12 / Sample 1 / 100x
Type 12 / Sample 2 / 30x
Type 13 / Sample 1 / 30x
Type 13 / Sample 2 / 30x
After the visual examination of the leaf images, the images were processed with ImageJ to enhance the patterns. The first adjustment was to the color threshold. Some details were enhanced on the images while others were lost. Additionally, no common threshold was found to facilitate automation. Thresholds varied widely even between leaves of the same type. This was also true of different locations on the same leaf at the same magnification, as seen in Figure 6. For this leaf, the red threshold ranged from 0 to 255 for Sample 1 and from 210 to 255 for Sample 2. Green ranged from 67 to 158 for Sample 1 and from 147 to 255 for Sample 2. Blue ranged from 66 to 88 for Sample 1 and from 0 to 255 for Sample 2.
Figure 6. Adjusting the color threshold of Type 1, Sample 1 at 400x magnification
Next, the BW threshold was modified, as the example in Figure 7 shows. Again there was great variation in the ranges. The example, Type 12, Sample 2, shows a range of 34-53. Type 12, Sample 1 had a BW threshold range of 129-255. Type 10, Sample 1 had a range of 0-49.
Figure 7. Adjusting the BW threshold of Type 12, Sample 2
- Computer Analysis with SURF
The initial results were analyzed visually after the first two procedures, OpenSURF and Kmeans clustering, were completed. This was done in MATLAB by plotting the (x, y) coordinates of the keypoints in each of the one hundred clusters onto the matching leaf image, as shown in Figure 8. This procedure was repeated with individual clusters isolated to identify patterns within a particular leaf type. For example, Figure 9 shows that on the Common Hazel, cluster 38 is consistently located along the top right edge of all leaf samples. Clusters 3 and 76 were on the top left edge, cluster 42 on the bottom right edge, and cluster 55 on the bottom left edge. Figure 10 shows clusters 3, 42, 55, and 76 for one Common Hazel leaf. In addition, other observations were made about the random dispersal of individual clusters and the absence of certain clusters altogether.
Figure 8. Common Hazel leaves showing the cluster mapping of up to 100 clusters per leaf
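The visual check described above, whether a given cluster falls in the same region on every sample, could in principle be approximated in code. The Python sketch below is an illustration under stated assumptions rather than the study's method: keypoints are given as hypothetical (x, y, cluster) triples, image coordinates have y increasing downward, and the mean position of a cluster's keypoints stands in as a crude proxy for "where the cluster is".

```python
def cluster_quadrant(keypoints, cluster_id, width, height):
    """Return the image quadrant holding the mean position of one cluster.

    keypoints: list of (x, y, cluster) triples for a single leaf image.
    Returns None when the cluster does not occur on this leaf.
    """
    pts = [(x, y) for x, y, c in keypoints if c == cluster_id]
    if not pts:
        return None
    mean_x = sum(x for x, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    vertical = "top" if mean_y < height / 2 else "bottom"  # y grows downward
    horizontal = "left" if mean_x < width / 2 else "right"
    return f"{vertical}-{horizontal}"


# Hypothetical keypoints on two 100x100 images of the same leaf type.
leaf_1 = [(90, 10, 38), (85, 20, 38), (10, 80, 55)]
leaf_2 = [(70, 15, 38), (20, 90, 55)]

quadrants_38 = [cluster_quadrant(leaf, 38, 100, 100) for leaf in (leaf_1, leaf_2)]
quadrants_55 = [cluster_quadrant(leaf, 55, 100, 100) for leaf in (leaf_1, leaf_2)]
absent = cluster_quadrant(leaf_1, 3, 100, 100)
```

A cluster whose quadrant agrees across all samples of a species, as in this toy data, would correspond to the consistent placement noted for cluster 38. A cluster that is scattered across the leaf would need a measure of spread, which this sketch does not compute.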
After visually comparing multiple samples of the same types of leaves, the data pointed to possible trends that could lead to leaf identification. This led to modifying the image processing procedure to include the CrossVal function. The small size of the 389-leaf database placed several limitations on the CrossVal procedure. The lack of multiple samples of the same species limited the number of train/test candidates to 10 species. CrossVal was run using the LDA and DLDA classifiers on the species that had four or more samples. This resulted in a 20% correct identification rate, twice the roughly 10% expected from random guessing among 10 species.
Figure 9. Cluster 38 consistently found on top right edge
Figure 10. Clusters 3, 42, 55, and 76 on a single sample of a Common Hazel leaf
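The cross-validation procedure, including the restriction to species with four or more samples, can be sketched as below. This is a minimal Python illustration and not MATLAB's implementation: the fold assignment is a simple interleaving (MATLAB may partition differently), all function names are hypothetical, and a placeholder classifier that always predicts the most common training label stands in for LDA or DLDA.

```python
from collections import Counter


def keep_frequent_classes(labels, min_samples=4):
    """Indices of samples whose class has at least min_samples members."""
    counts = Counter(labels)
    return [i for i, y in enumerate(labels) if counts[y] >= min_samples]


def kfold_indices(n, k):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train, test


def cross_validate(features, labels, k, train_fn, predict_fn):
    """Fraction of samples identified correctly across all k folds."""
    correct = 0
    for train, test in kfold_indices(len(labels), k):
        model = train_fn([features[i] for i in train], [labels[i] for i in train])
        correct += sum(predict_fn(model, features[i]) == labels[i] for i in test)
    return correct / len(labels)


# Placeholder classifier (an assumption): always predict the majority training label.
def train_majority(features, labels):
    return Counter(labels).most_common(1)[0][0]


def predict_majority(model, feature):
    return model


species = ["hazel"] * 4 + ["oak"] * 4 + ["elm"] * 2  # hypothetical labels
kept = keep_frequent_classes(species)  # drops the two "elm" samples
labels = [species[i] for i in kept]
features = [[0.0] for _ in labels]  # dummy features, ignored by the placeholder

accuracy = cross_validate(
    features, labels, k=4, train_fn=train_majority, predict_fn=predict_majority
)
```

Because the placeholder ignores the features, its score here is the chance rate for two balanced classes. A baseline of this kind gives a reference point against which a reported identification rate can be judged.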
The second, much larger database, containing two separate directories of training and test leaves, was not visually analyzed due to file size. The CrossVal function was run using the LDA and DLDA classifiers on the training leaves and on the test leaves separately. These runs produced higher performance rates. The databases were then processed by external validation using the classify function, which used the classifiers built from the training leaves to identify the test leaves in the separate directory. The external validation results were significantly lower, as seen in Table 2.
Table 2. Performance Measures by Classifier
Classifying Method / Database Set / LDA / DLDA
CrossVal / Train Leaves / 61.7% / 52.7%
CrossVal / Test Leaves / 59.0% / 51.8%
External Validation / Train & Test Leaves / 24.5% / 19.7%
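For readers unfamiliar with DLDA, the sketch below shows the general idea in Python: it is LDA with the pooled covariance matrix restricted to its diagonal, so each feature is scaled by its own pooled within-class variance and correlations between features are ignored. This is comparable in spirit to the 'diaglinear' option of MATLAB's classify function, but it is an independent illustration and not the code used in the study. The function names and the tiny three-bin histograms are hypothetical.

```python
import math
from collections import defaultdict


def train_dlda(features, labels):
    """Estimate class means, pooled per-feature variances, and class priors.

    Requires more samples than classes so the pooled variance is defined.
    """
    by_class = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)
    means = {
        y: [sum(col) / len(rows) for col in zip(*rows)]
        for y, rows in by_class.items()
    }
    dof = len(features) - len(by_class)
    pooled = []
    for j in range(len(features[0])):
        ss = sum((x[j] - means[y][j]) ** 2 for x, y in zip(features, labels))
        pooled.append(max(ss / dof, 1e-9))  # floor avoids division by zero
    priors = {y: len(rows) / len(features) for y, rows in by_class.items()}
    return means, pooled, priors


def predict_dlda(model, x):
    """Pick the class with the smallest variance-scaled distance to its mean."""
    means, pooled, priors = model

    def score(y):
        dist = sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, means[y], pooled))
        return dist - 2.0 * math.log(priors[y])

    return min(means, key=score)


# Hypothetical cluster histograms (3 bins instead of 100) for training leaves.
train_x = [[8, 1, 1], [7, 2, 2], [1, 8, 2], [2, 7, 1]]
train_y = ["hazel", "hazel", "oak", "oak"]

model = train_dlda(train_x, train_y)
prediction_a = predict_dlda(model, [6, 3, 1])
prediction_b = predict_dlda(model, [1, 9, 2])
```

In external validation the model is fitted once on the training histograms and then applied unchanged to the test histograms, whereas cross-validation refits the model for every fold within a single set.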
IV. Conclusion
Identification of leaves using interior shapes and textures has a scope well beyond this research. This research shows that examination of the interior leaf patterns does have merit in the identification of leaves. The texture within the leaf varies less than other image properties such as color, size, and lighting.
This research shows that magnification, although helpful in visual analysis, proved difficult to automate. The inconsistency of optimum magnification across the species was the most significant barrier. Another issue was determining which segment of the leaf provided the best information under magnification. In addition, this study found no means of automating the process of adjusting the threshold.
Automated processing using OpenSURF rendered some positive results. The LDA and DLDA CrossVal runs on the training leaves and the test leaves all had a performance above fifty percent, while the theoretical probability of correct identification by chance was closer to one percent. The External Validation classification yielded much lower performance, at most about twenty-five percent, but still well above the theoretical probability.
The discrepancy between the CrossVal and External Validation results would be an area for future examination. The research might focus on whether the batches were gathered or photographed in different ways and the effects thereof. On further examination after processing, this study found that the test leaf files were of three different types: scan, photograph, and pseudoscan. This may have been a contributing factor to the lower performance in the external validation classification process. Test leaves that were photographs were identified at a significantly lower rate, as seen in Table 3.
Table 3. Performance Measures by Image Type
Classifier / Scan / Image / Pseudoscan
LDA Performance / 23.9% / 5.3% / 31.4%
DLDA Performance / 18.4% / 5.2% / 22.2%
There is much further study to be pursued in the identification of leaves by their interior shapes and texture. Several ideas prompted by this research include the relationship between Voronoi shapes and centroids and a leaf's interior shapes, the use of classifiers other than LDA and DLDA, and the causes of the discrepancies between CrossVal and External Validation. One logical extension of this study would be applying the OpenSURF automated processing to leaf images under magnification.
Acknowledgments
The authors wish to thank Dr. Rahman Tashakkori for leading the Research Experience for Teachers, and Appalachian Academy of Science Scholar Clint Guin for his help in obtaining the image database from ImageCLEF and for helping with some of the programming. We also thank the National Science Foundation and the Department of Computer Science at Appalachian State University for this opportunity. Thanks also to Asheville-Buncombe Technical Community College and West Wilkes High School for their support.
References
Crisan, S.; Tarnovan, I.; Crisan, T., “Radiation optimization and image processing algorithms in the identification of hand vein patterns.” Computer Standards & Interfaces, vol. 32, no. 3, pp. 130-140, March 2010. DOI: 10.1016/j.csi.2009.11.008.
Henries, D.; Tashakkori, R., “Extraction of leaves from herbarium images.” Electro/Information Technology (EIT), 2012 IEEE International Conference, vol. 1, no. 6, pp. 6-8, May 2012. DOI: 10.1109/EIT.2012.6220752.
MathWorks. (n.d.). Kmeans [Online]. Available:
Raza, S.; Parry, R.; Moffitt, R.; Young, A.; Wang, M., “An analysis of scale and rotation invariance in the bag-of-features method for histopathological image classification.” MICCAI 2011, Lecture Notes in Computer Science, vol. 6893, pp. 66-74. DOI: 10.1007/978-3-642-23626-6_9.
Schmitt, D.; McCoy, N., “Object classification and localization using surf descriptors.” December 2011. Available:
Xia, Q., “The formation of a tree leaf.” ESAIM: Control, Optimisation and Calculus of Variations, vol. 13, no. 2, pp. 359-377, 2007. DOI: 10.1051/cocv:2007016.
Xiaodong, Z.; Xiaojie, W., “Leaf vein extraction based on gray-scale morphology.” IJIGSP, vol. 2, no. 2, pp. 25-31, 2010. Available: