I have just uploaded the Bangla training data for the Tesseract engine. To be honest, there is a lot more work to do on the training data before recognition performance improves, so I hope we can keep refining it and release a newer version soon. If anyone wants to take part in this task (preparing training data) and needs any help, please feel free to contact me. The training data can be downloaded from either of the links below:
http://ocropus-bengali.googlecode.com/files/Bangla%20tesseract%20training%20data%20v-2.0.zip
or
http://mhasnat.googlepages.com/Bangla_training_data_v_2_0.zip
2 comments:
How can I help with this?
Dear Joyonto da,
This data will be helpful for researchers, developers, and existing Tesseract OCR users. If you have Tesseract OCR installed on your computer, you can replace the existing training data with these files and observe the output.
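For anyone who wants to try the swap described above, here is a minimal Python sketch. It assumes (hypothetically) that the zip contains files prefixed with a language code such as `ben.` (for example `ben.inttemp`) and that they belong in Tesseract's `tessdata` directory; check the actual file names in the archive and the location of `tessdata` on your system. The function name `install_training_data` is illustrative only. Existing files are backed up with a `.bak` suffix so the original data can be restored.

```python
import shutil
import zipfile
from pathlib import Path


def install_training_data(zip_path, tessdata_dir, lang="ben"):
    """Copy the training files for `lang` from the zip into tessdata_dir.

    Hypothetical helper: assumes the files are named like '<lang>.<part>'.
    Any existing file with the same name is first copied to '<name>.bak'.
    Returns the sorted list of installed file names.
    """
    tessdata = Path(tessdata_dir)
    installed = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            name = Path(info.filename).name  # drop any folder inside the zip
            if info.is_dir() or not name.startswith(lang + "."):
                continue  # skip folders, readmes, other languages
            target = tessdata / name
            if target.exists():
                # Back up the old training file before overwriting it.
                shutil.copy2(target, target.with_name(name + ".bak"))
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            installed.append(name)
    return sorted(installed)
```

After installing, you could run Tesseract on a scanned Bangla page with the matching language code, e.g. `tesseract page.tif out -l ben`, and compare the output text with what the previous data produced.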