Introduction
In this paper we address this shortcoming by comparing a large set of commonly used features for block classification and include in the comparison three features that are known to yield good performance in content-based image retrieval (CBIR) and are applicable to binary images (Deselaers et al., 2004). Interestingly, we found that the single feature with the best performance is the Tamura texture histogram, which belongs to this latter class.
Related Work and Contribution
The widespread use of features based on connected components and run-length statistics, combined with the simplicity of implementing such features, led us to use these feature types in our experiments as well and to compare them with features used in content-based image retrieval. Our CBIR features are based on the open source image retrieval system FIRE (Deselaers et al., 2004). We restrict our analysis for zone classification to those features that are promising for the analysis of binary images.
The most recent and detailed overview of the progress in document zone classification, together with a very accurate system, is presented in (Wang et al., 2006). The authors use a decision tree classifier and model contextual dependencies for some zones. In our work we do not model zone context, although it is likely that a context model (which can be integrated in a similar way as presented by Wang et al.) would help the overall classification performance.
We expand on the work presented in (Wang et al., 2006) in the following ways:
• We include a detailed feature comparison, including a comparison with commonly used CBIR features. It turns out that the single best feature is the Tamura texture histogram, which was not previously used for zone classification.
• We present results both for a simple nearest neighbor classifier and for a very fast linear classifier based on logistic regression and the maximum entropy criterion.
• We introduce a new class of blocks containing speckles that has not been labeled in the UW-III database. This typical class of noise is important to detect during layout analysis, especially for images of photocopied documents.
• We present results for the part of the UW-III database without using duplicates and achieve a similar error rate of 1.5%.
• We introduce the use of histograms for the measurements of connected components and run lengths and show that this leads to a performance increase.
Feature Extraction
1. Tamura texture features histogram (TTFH)
2. Relational invariant feature histograms (RIFH)
3. Down-scaled images of size 32×32 (DSI)
4. The fill ratio, i.e. the ratio of the number of black pixels in a horizontally smeared (Wong et al., 1982) image to the area of the image (FR)
5. Run-length histograms of black and white pixels along horizontal, vertical, main diagonal, and side diagonal directions; each histogram uses eight bins, spaced apart as powers of 2, i.e. counting runs of length ≤ 1, 3, 7, 15, 31, 63, 127 and ≥ 128 (RL{B,W}{X,Y,M,S}H)
6. The vector formed by the total number, mean, and variance of the runs of black and white pixels along the horizontal, vertical, main diagonal, and side diagonal directions as used in (Wang et al., 2006) (RL{B,W}{X,Y,M,S}V)
7. Histograms (as in 5) of the widths and heights of connected components (CCXH, CCYH)
8. The joint distribution of the widths and heights of connected components as a 2-dimensional 64-bin histogram (CCXYH)
9. The histogram of the distances between a connected component and its nearest neighbor component (CCNNH)
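To make the binning in item 5 concrete, here is a minimal sketch of a horizontal and vertical run-length histogram with eight power-of-2 bins. It is an illustration, not the authors' implementation: the function names (run_lengths, bin_index, run_length_histogram), the convention that 1 marks a black pixel, and the choice to return raw counts rather than normalized frequencies are all assumptions made here. Diagonal directions are left out for brevity.

```python
def run_lengths(pixels, value):
    """Yield the lengths of maximal runs of `value` in a sequence of 0/1 pixels."""
    run = 0
    for pixel in pixels:
        if pixel == value:
            run += 1
        elif run:
            yield run
            run = 0
    if run:
        yield run


def bin_index(length, n_bins=8):
    """Map a run length to a power-of-2 bin.

    With eight bins the upper bounds are 1, 3, 7, 15, 31, 63, 127,
    and the last bin collects every run of length 128 or more.
    """
    return min(length.bit_length() - 1, n_bins - 1)


def run_length_histogram(image, value=1, direction="x", n_bins=8):
    """Count runs of `value` along rows ("x") or columns ("y").

    `image` is a list of equally long rows of 0/1 pixels.
    Assumption for this sketch: 1 is black, 0 is white.
    Returns raw counts; any normalization is left to the caller.
    """
    lines = image if direction == "x" else zip(*image)
    hist = [0] * n_bins
    for line in lines:
        for length in run_lengths(line, value):
            hist[bin_index(length, n_bins)] += 1
    return hist


if __name__ == "__main__":
    toy = [
        [1, 1, 0, 1],
        [0, 0, 0, 0],
        [1, 1, 1, 1],
    ]
    print(run_length_histogram(toy, value=1, direction="x"))
    print(run_length_histogram(toy, value=0, direction="x"))
    print(run_length_histogram(toy, value=1, direction="y"))
```

The same bin_index mapping could be reused for the connected-component width and height histograms of item 7, since those are described as using the same bins as item 5.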
Classification
Source: Daniel Keysers, Faisal Shafait, Thomas M. Breuel, "DOCUMENT IMAGE ZONE CLASSIFICATION A Simple High Performance Approach", VISAPP 2007, pages 44-51