Saturday, November 1, 2008

Bangla Tesseract training data v-2.0 has been uploaded

I just uploaded the Bangla training data for the Tesseract engine. To be honest, there is a lot more work to do to improve the training data so that recognition performance increases. I hope we will be able to improve it, and that a newer version of the data will be available soon. If anyone wants to take part in this task (preparing training data) and needs any help, please feel free to contact me. The links to the training data are given below:
http://ocropus-bengali.googlecode.com/files/Bangla%20tesseract%20training%20data%20v-2.0.zip

or

http://mhasnat.googlepages.com/Bangla_training_data_v_2_0.zip

2 comments:

জয়ন্ত said...

How can I help with this?

Md. Abul Hasnat said...

Dear Joyonto da,
This data will be helpful for researchers and developers, as well as existing Tesseract OCR users. If you have Tesseract OCR installed on your computer, you can replace the existing training data with these files and observe the output.
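
Replacing the training data could look roughly like the sketch below. It is only an illustration under some assumptions: the zip has already been extracted into a folder, and your Tesseract installation reads its language files from a `tessdata` folder whose location you know. The function name `install_training_data` and the `.bak` backup suffix are hypothetical, chosen just for this example.

```python
import shutil
from pathlib import Path


def install_training_data(src_dir, tessdata_dir, backup_suffix=".bak"):
    """Copy every file from src_dir into tessdata_dir.

    Any file that would be overwritten is first saved next to the
    original with backup_suffix appended, so the old data can be restored.
    Both directory locations are assumptions supplied by the caller.
    """
    src = Path(src_dir)
    dest = Path(tessdata_dir)
    installed = []
    for item in sorted(src.iterdir()):
        if not item.is_file():
            continue  # skip sub-folders, only plain files are copied
        target = dest / item.name
        if target.exists():
            # keep a copy of the existing training file before replacing it
            shutil.copy2(target, target.with_name(target.name + backup_suffix))
        shutil.copy2(item, target)
        installed.append(item.name)
    return installed
```

After copying, you can run Tesseract on a test image with the `-l` option set to the language code used as the prefix of the copied file names, and compare the output with what the previous data produced.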