Concoctions of Caliginosity #2
I've been doing some more thinking about Word Verification parsing, and I think it is doable without it turning into a Ph.D. project. The biggest challenge would probably be dealing with letters that touch each other. Naughty, naughty letters.
First, something like libjpeg would suffice to decompress the image and read the scanlines, and it would probably be the only third-party library I'd have to use. I would also need to express each of the twenty-six letters as a set of lines and tolerances.
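To make "a set of lines and tolerances" a bit more concrete, here is a rough Python sketch of one way it could look. Everything in it is an assumption on my part: the names Segment, TEMPLATES and match_letter, the idea of scaling each candidate into a unit box before comparing, the single shared tolerance of 0.15, and the two toy letter shapes (a real table would need all twenty-six).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    x1: float
    y1: float
    x2: float
    y2: float

# Hypothetical templates: strokes in a unit box, x to the right, y downward.
TEMPLATES = {
    "L": [Segment(0, 0, 0, 1), Segment(0, 1, 1, 1)],
    "T": [Segment(0, 0, 1, 0), Segment(0.5, 0, 0.5, 1)],
}

def normalize(segments):
    """Scale a group of segments so its bounding box becomes the unit box."""
    xs = [v for s in segments for v in (s.x1, s.x2)]
    ys = [v for s in segments for v in (s.y1, s.y2)]
    min_x, min_y = min(xs), min(ys)
    width = (max(xs) - min_x) or 1.0   # avoid dividing by zero for a lone vertical line
    height = (max(ys) - min_y) or 1.0  # same for a lone horizontal line
    return [
        Segment((s.x1 - min_x) / width, (s.y1 - min_y) / height,
                (s.x2 - min_x) / width, (s.y2 - min_y) / height)
        for s in segments
    ]

def close(a, b, tol):
    """True if both endpoints agree within tol, in either direction."""
    same = max(abs(a.x1 - b.x1), abs(a.y1 - b.y1),
               abs(a.x2 - b.x2), abs(a.y2 - b.y2)) <= tol
    flipped = max(abs(a.x1 - b.x2), abs(a.y1 - b.y2),
                  abs(a.x2 - b.x1), abs(a.y2 - b.y1)) <= tol
    return same or flipped

def match_letter(segments, tol=0.15):
    """Return the first template letter whose strokes all pair up, else None."""
    if not segments:
        return None
    candidate = normalize(segments)
    for letter, strokes in TEMPLATES.items():
        if len(strokes) != len(candidate):
            continue
        unused = list(candidate)
        for stroke in strokes:
            hit = next((c for c in unused if close(stroke, c, tol)), None)
            if hit is None:
                break
            unused.remove(hit)
        else:
            return letter
    return None

# A slightly wobbly "L" in pixel coordinates still matches.
print(match_letter([Segment(10, 5, 11, 25), Segment(10, 25, 22, 24)]))  # L
```

The pairing is first-come-first-served rather than a proper best-fit assignment, which is probably fine for a handful of strokes per letter but would need rethinking for messier input.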
The first step of parsing would be to reduce all of the letters into line segments (i.e., straight and finite). Then, starting with the leftmost line segment, check to see if it looks like a letter (based on those aforementioned lines-and-tolerances). If not, add the next line segment to it, and check again. Once a match is made, remove those line segments and start again with the new leftmost line segment.
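The loop described above can be sketched in a few lines of Python. This assumes the segments have already been extracted (the hard part) and are plain (x1, y1, x2, y2) tuples. The function name parse_word, the "?" marker for unmatched leftovers, and the toy matcher that only looks at whether each segment is vertical or horizontal are all made up for illustration; the real matcher would be the lines-and-tolerances check.

```python
def left_to_right(seg):
    """Sort key: leftmost x first, then rightmost x to break ties."""
    return (min(seg[0], seg[2]), max(seg[0], seg[2]))

def parse_word(segments, match_letter):
    """Greedily grow a group from the leftmost segment until it matches a letter."""
    remaining = sorted(segments, key=left_to_right)
    letters = []
    while remaining:
        for count in range(1, len(remaining) + 1):
            letter = match_letter(remaining[:count])
            if letter is not None:
                letters.append(letter)
                remaining = remaining[count:]  # remove the matched segments
                break
        else:
            # No group starting at the leftmost segment matched anything.
            letters.append("?")
            break
    return "".join(letters)

# Toy stand-in for the real matcher: classify a group by stroke orientations.
TOY_SHAPES = {"VH": "L", "HV": "T"}

def toy_match(group):
    code = "".join("V" if s[0] == s[2] else "H" for s in group)
    return TOY_SHAPES.get(code)

word = [(0, 0, 0, 10), (0, 10, 6, 10),     # an "L"
        (10, 0, 20, 0), (15, 0, 15, 10)]   # a "T"
print(parse_word(word, toy_match))  # LT
```

One thing this makes obvious: because it stops at the first match, a letter whose opening strokes already look like some other letter would get split too early. That might need a "keep going and prefer the longer match" rule.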
Although I haven't seen this particular situation come up yet, it is possible that a 'd' and a 'b' could back into each other and share a common vertical line. I could deal with that situation by leaving the last line segment of a letter match in place and starting the next check both with and without it.
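Here is a hedged Python sketch of the "with and without" idea, written as a small recursive search. The name parse_with_sharing is hypothetical, segments are again assumed to be (x1, y1, x2, y2) tuples, and the toy matcher pretends a 'd' is a horizontal stroke followed by a vertical one and a 'b' is the reverse. It also goes a little further than the description: if neither choice leads to a full parse, it tries a longer group before giving up.

```python
def parse_with_sharing(segments, match_letter):
    """Parse left to right, letting a letter's last segment start the next letter."""
    ordered = sorted(segments,
                     key=lambda s: (min(s[0], s[2]), max(s[0], s[2])))

    def solve(start):
        if start >= len(ordered):
            return ""  # everything consumed
        for end in range(start + 1, len(ordered) + 1):
            letter = match_letter(ordered[start:end])
            if letter is None:
                continue
            # First try without the last segment (the normal case) ...
            next_starts = [end]
            # ... then with it, but only if that still makes progress.
            if end - start >= 2:
                next_starts.append(end - 1)
            for nxt in next_starts:
                rest = solve(nxt)
                if rest is not None:
                    return letter + rest
        return None  # no way to parse from this position

    return solve(0)

# Toy matcher: "HV" stands in for 'd', "VH" for 'b'.
TOY_SHAPES = {"HV": "d", "VH": "b"}

def toy_match(group):
    code = "".join("V" if s[0] == s[2] else "H" for s in group)
    return TOY_SHAPES.get(code)

# A 'd' and a 'b' backed into each other, sharing the vertical at x = 4.
touching = [(0, 5, 4, 5), (4, 0, 4, 10), (4, 5, 8, 5)]
print(parse_with_sharing(touching, toy_match))  # db
```

Trying the unshared case first means ordinary, non-touching letters behave exactly as before, and the shared case only kicks in when the plain parse gets stuck.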
I'll mull this over a little more before I start doing any coding. And it's also much easier to say, "reduce the letters to line segments" than it is to actually do that....
In the meantime, I give you "free cat", a beautiful piece of art.
