Giving the image to Tesseract-OCR, it detects the names fine (I can't see any errors in any of the letters), but for some of the numbers it's doing a ridiculously bad job. Even after processing the image to increase contrast and make things completely unambiguous, I still get "ns" in place of "118", or when the numbers are "0 124 97 221" it gives me "Q 124 on eel".
Anyone know of OCR that can actually detect digits? Or even just something which behaves consistently - I can easily de-1337 if it simply mis-identifies - but not when "ee" might represent 77 or 84 or something else entirely! :@
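To show what I mean by de-1337ing, here's a rough Python sketch. The substitution table is my own guess built from the misreads quoted above (Q for 0, l for 1, e for 2), and `de_leet` is just a name I made up. It refuses anything it has no rule for, which is exactly where the inconsistent output falls over.

```python
# Hypothetical substitution table, guessed from the misreads quoted above.
# It only helps if the OCR makes the same mistake every time.
LEET_TO_DIGIT = {"Q": "0", "O": "0", "l": "1", "e": "2"}


def de_leet(token):
    """Replace known misreads with digits.

    Returns None if any character is neither a plain digit nor in the
    table, because guessing at that point would be making numbers up.
    """
    out = []
    for ch in token:
        if ch in "0123456789":
            out.append(ch)
        elif ch in LEET_TO_DIGIT:
            out.append(LEET_TO_DIGIT[ch])
        else:
            return None
    return "".join(out)


print([de_leet(t) for t in "Q 124 on eel".split()])
# → ['0', '124', None, '221']
```

The "on" stays unrecoverable, and a table like this can't fix "ns" for "118" either, since that isn't even one character per digit.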
I could probably ask for an emailed copy, but I don't understand how detecting numerical digits can be difficult.
What's even more frustrating is that it's supposed to be possible to limit Tesseract to only detecting digits, but it doesn't work - it turns out they removed the ability to blacklist/whitelist characters in the current version. :@
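For reference, this is roughly how the digits-only restriction is meant to be requested, as a Python sketch that just builds the command line. `tessedit_char_whitelist` is the Tesseract config variable for this, set with `-c`; whether it is actually honoured depends on the Tesseract version and engine in use, so treat it as something to try rather than a fix. The function names and the choice of `--psm 7` (treat the image as a single line of text, which suits a cropped row) are my own assumptions.

```python
import subprocess


def digits_only_command(image_path):
    """Build a tesseract command that asks for digits only.

    'stdout' as the output base makes tesseract print the text instead
    of writing a file. Whether the whitelist is respected depends on the
    installed Tesseract version/engine (assumption: it may be ignored).
    """
    return [
        "tesseract", image_path, "stdout",
        "--psm", "7",  # single text line, e.g. one cropped table row
        "-c", "tessedit_char_whitelist=0123456789",
    ]


def ocr_digits(image_path):
    """Run the command and return the recognised text (needs tesseract installed)."""
    result = subprocess.run(
        digits_only_command(image_path),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Something like `ocr_digits("row.png")` (hypothetical filename) would then be the thing to compare against the unrestricted output.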
It is annoying that OCR doesn't seem to have moved on in the past two decades - it should be possible to point OCR at anything, have it identify glyphs, then ask for feedback on which ones it got wrong, repeat until happy. Bleh.
For example, attached is a crop of the row that gave "Q 124 on eel" - on its own it produces "124 97 2el", and in the first image (fixed horizontal/verticals, but gridlines still present and no brightness/contrast changes), it came closest with "0 124 on 221".
The formatting it produced was all over the place, but it did a good job on the numbers - a handful of mistakes, mostly with zeroes. There were a couple of incorrect numbers (161->151 and 77->17), which showed up through the totals not matching, but compared to Tesseract it was brilliant.
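The totals check is easy to automate. A minimal Python sketch, assuming the last number in each row is the sum of the others (true of the "0 124 97 221" row quoted earlier; the second row below is made up to illustrate a 77->17 style misread, and `row_total_ok` is my own name):

```python
def row_total_ok(numbers):
    """True if the last number equals the sum of the ones before it."""
    *parts, total = numbers
    return sum(parts) == total


rows = [
    [0, 124, 97, 221],  # row quoted earlier in this post
    [17, 23, 100],      # hypothetical row where 77 was misread as 17
]
for row in rows:
    print(row, "ok" if row_total_ok(row) else "CHECK THIS ROW")
```

It only tells you which row is wrong, not which digit, but that narrows the manual checking down a lot.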
Happy Peter -> :)
We need to set Stallman on them all.