CRBLP Bangla OCR: Best parameters for bpnet line training for Bangla scripts

Wednesday, July 30, 2008

nhidden - 500
epochs - 200
learningrate - 0.2
testportion - 0
normalize - 1
shuffle - 1
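
To make the roles of these settings concrete, here is a minimal sketch of a one-hidden-layer network trained with backpropagation, driven by the same six values. It is not the OCRopus bpnet code. The function names (train_mlp, predict) are hypothetical, and the meanings given to normalize, shuffle and testportion are assumptions based only on their names: per-feature standardization, reshuffling the sample order each epoch, and the fraction of samples held out from training.

```python
import numpy as np

# Values taken from the post; key names mirror the post's labels.
PARAMS = {
    "nhidden": 500,       # hidden units
    "epochs": 200,        # passes over the training samples
    "learningrate": 0.2,  # SGD step size
    "testportion": 0,     # assumed: fraction of samples held out
    "normalize": 1,       # assumed: standardize each input feature
    "shuffle": 1,         # assumed: reshuffle sample order every epoch
}


def _sigmoid(z):
    # Clip to avoid overflow in exp for large negative activations.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -50.0, 50.0)))


def train_mlp(X, y, nclasses, params=PARAMS, seed=0):
    """Hypothetical helper: per-sample backprop on a sigmoid-hidden MLP."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)

    mean, std = np.zeros(X.shape[1]), np.ones(X.shape[1])
    if params["normalize"]:
        mean, std = X.mean(axis=0), X.std(axis=0) + 1e-8
    X = (X - mean) / std

    # The last `testportion` fraction is left out of training (0 = use all).
    ntrain = len(X) - int(len(X) * params["testportion"])

    nh, lr = params["nhidden"], params["learningrate"]
    W1 = rng.normal(0.0, 0.1, (X.shape[1], nh))
    b1 = np.zeros(nh)
    W2 = rng.normal(0.0, 0.1, (nh, nclasses))
    b2 = np.zeros(nclasses)

    for _ in range(params["epochs"]):
        order = rng.permutation(ntrain) if params["shuffle"] else np.arange(ntrain)
        for i in order:
            # Forward pass: sigmoid hidden layer, softmax output.
            h = _sigmoid(X[i] @ W1 + b1)
            z = h @ W2 + b2
            p = np.exp(z - z.max())
            p /= p.sum()
            # Backward pass: cross-entropy gradient at the output,
            # propagated through the hidden sigmoid.
            d_out = p.copy()
            d_out[y[i]] -= 1.0
            d_hid = (W2 @ d_out) * h * (1.0 - h)
            W2 -= lr * np.outer(h, d_out)
            b2 -= lr * d_out
            W1 -= lr * np.outer(X[i], d_hid)
            b1 -= lr * d_hid

    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2, "mean": mean, "std": std}


def predict(model, X):
    """Return the most likely class index for each row of X."""
    X = (np.asarray(X, dtype=float) - model["mean"]) / model["std"]
    h = _sigmoid(X @ model["W1"] + model["b1"])
    return np.argmax(h @ model["W2"] + model["b2"], axis=1)
```

Calling train_mlp(X, y, nclasses) uses the values listed above. For a quick experiment, a smaller copy such as dict(PARAMS, nhidden=8, epochs=50) keeps the run short.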

Md. Abul Hasnat said...: Souro... thanks for writing something finally. Hope you will continue...; July 30, 2008 at 4:51 AM
Tom said...: Great! Please keep in mind that we'll be working on improving the MLP over the next 6-12 months, in particular with a view towards working on larger character sets (like the Indic characters).

Could you make your training data available for download somewhere? That would be very useful for us.

Also, you may find the document image degradation tools in OCRopus useful; they let you take a single scanned character image and simulate a wide variety of real-world degradations. Right now, you have to use them manually, but we will be incorporating them into the training procedure.; July 30, 2008 at 2:15 PM
Md. Abul Hasnat said...: Dear Thomas Breuel,
Thank you very much for your appreciation and advice.

We are concentrating on preparing a large-scale training dataset, and for this purpose we are focusing on training from line images. To make the training more realistic, we pass the basic training data through the segmenter and then use the segmenter's output as the training dataset.

Right now we are experimenting with this plan, and we will soon be able to generate the training data. We plan to upload the training data once it is complete. We also plan to use the degradation tool.; July 30, 2008 at 8:16 PM