Saturday, July 19, 2008

Training data for ocropus-tesseract

We have created training data for ocropus-tesseract, covering both Bangla and Devanagari. For Bangla, we tried to train every combination of the minimally segmented data units. For Devanagari, we trained only the most basic units.

We are testing recognition performance with the trained data units for Bangla script, and we will continue adding more data units to improve recognition accuracy. The Devanagari training data is useful only for testing basic character recognition. It should help guide anyone who is just starting to recognize Devanagari script using ocropus-tesseract.

The training data is freely available. Anyone can download it from the following links:

Download Training data for Bangla
Download Training data for Devanagari
