Extracting text from grocery receipts
I recently tried to extract text from some receipts by taking pictures of them. Tesseract is a widely recommended tool for OCR tasks, and in my experience it performs well on scanned text. I was less confident about camera photos, though, since mine were not exactly taken with precision: receipts at an angle, sometimes crumpled, occasionally partly covered by my thumb, and so on.
Despite these obstacles, I decided to give Tesseract a try. Unfortunately, it got only about 50% of the content right. That sent me searching for ready-to-use ML models that could do better, but I couldn’t find anything suitable for my needs.
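For reference, a baseline run like mine can be reproduced in a few lines with the pytesseract wrapper. This is just a minimal sketch, not my exact invocation; the file name and the page-segmentation mode are assumptions:

```python
# Baseline OCR over a receipt photo, via the pytesseract wrapper.
# Assumes the tesseract binary is installed and on PATH;
# "receipt.jpg" is a placeholder file name.
from PIL import Image
import pytesseract

image = Image.open("receipt.jpg")

# PSM 6 treats the page as a single uniform block of text, which tends
# to suit the narrow single-column layout of receipts.
text = pytesseract.image_to_string(image, config="--psm 6")
print(text)
```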
My search led me to a GitHub page (https://github.com/lucasvianav/grocery-receipt/blob/main/docs/SETUP.md) describing a project that tackled a similar problem; notably, it had trained its own model. Encouraged by this, I followed a similar path using the Tesseract training workflow, tesstrain (https://github.com/tesseract-ocr/tesstrain?tab=readme-ov-file), which looked straightforward.
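In outline, tesstrain fine-tunes an existing model from pairs of single-line images and matching .gt.txt transcriptions, driven by a Makefile. Here is a sketch of how that step can be kicked off from Python; the model name "receipts", the tessdata path, and the iteration count are all illustrative, while the make variables come from the tesstrain README:

```python
# Drive tesstrain's Makefile from Python. Run inside a clone of
# https://github.com/tesseract-ocr/tesstrain with the training tools installed.
# Ground truth goes in data/receipts-ground-truth/ as pairs like
# line_0001.png + line_0001.gt.txt ("receipts" is an illustrative model name).
import subprocess

subprocess.run(
    [
        "make", "training",
        "MODEL_NAME=receipts",           # name of the new model
        "START_MODEL=eng",               # fine-tune from the stock English model
        "TESSDATA=/usr/share/tessdata",  # where eng.traineddata lives (path varies)
        "MAX_ITERATIONS=10000",          # illustrative; tune to taste
    ],
    check=True,
)
```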
Since tesstrain trains on individual line images paired with ground-truth text, and my receipts were photos rather than scans, I had to crop out the text lines manually in GIMP. This is far more manageable when the lines are evenly spaced, since a script can then split the image into uniform strips, as sketched below.
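Something like the following would do the even split, assuming a straightened image in which every text line occupies the same height; the file names and the line height are assumptions you would measure from your own image:

```python
# Split a straightened receipt image into equal-height line strips.
# Only works when line spacing is uniform; my crumpled photos weren't,
# hence the manual cropping in GIMP.
from PIL import Image

image = Image.open("receipt_straightened.png")
width, height = image.size

line_height = 40  # pixels per text line; measure this in your own image
for i, top in enumerate(range(0, height - line_height + 1, line_height)):
    strip = image.crop((0, top, width, top + line_height))
    strip.save(f"line_{i:04d}.png")
```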
Training on roughly 25 images took about 10 minutes on my machine. Unfortunately, the resulting model showed no significant improvement over stock Tesseract.
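For completeness, a model trained this way can be tested the same way as the baseline. If I understand the tesstrain output layout correctly, the fine-tuned receipts.traineddata lands under the repository's data/ directory; the model name and paths below match the assumptions in the training sketch above:

```python
# OCR with the custom model produced by tesstrain. "receipts" and the
# tessdata directory are the assumed names from the training sketch.
from PIL import Image
import pytesseract

text = pytesseract.image_to_string(
    Image.open("receipt.jpg"),
    lang="receipts",                         # the fine-tuned model
    config="--tessdata-dir ./data --psm 6",  # dir containing receipts.traineddata
)
print(text)
```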
I plan to continue searching for a more effective solution. However, for the time being, I will make do with what I have achieved so far.