New package: tesseract-ocr-2.04-1 et al

Reini Urban rurban@x-ray.at
Mon May 10 11:27:00 GMT 2010


tesseract-ocr, a command-line OCR package, has been added to the Cygwin
distribution.

The Tesseract OCR engine was originally developed at HP between 1985 and
1995. It was open-sourced by HP and UNLV in 2005, and Google has led
further development.
The Tesseract OCR engine was one of the top 3 engines in the 1995 UNLV 
Accuracy test. Between 1995 and 2006 it had little work done on it, but 
it is probably one of the most accurate open source OCR engines 
available. It will read a binary, grey or color image and output text.

Homepage: http://code.google.com/p/tesseract-ocr/

Notes:
* Although it is built with libtiff, it accepts only certain
   TIFF image formats. convert with -depth from the ImageMagick
   package is my friend: I use convert <any> -depth 8 <any.tif>
* I haven't tried http://code.google.com/p/ocropus/
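
The conversion step from the first note can be wrapped in a small shell
helper. This is only a sketch under stated assumptions: the function names
tif_name and ocr_image and the "-8bit" suffix are made up for illustration,
the input file name is assumed to have an extension, and the tesseract call
assumes the usual "tesseract <image> <outputbase> -l <lang>" invocation.
Depending on the input, further conversion options may still be needed.

```shell
#!/bin/sh
# Hypothetical helper: derive the name of the converted TIFF from the
# input name by replacing the last extension with "-8bit.tif".
tif_name() {
    printf '%s-8bit.tif\n' "${1%.*}"
}

# Hypothetical helper: convert an image to an 8-bit TIFF with ImageMagick,
# then run tesseract on the result. The language defaults to "eng".
ocr_image() {
    ocr_src=$1
    ocr_lang=${2:-eng}
    ocr_tif=$(tif_name "$ocr_src")

    # Same conversion as in the note above: convert <any> -depth 8 <any.tif>
    convert "$ocr_src" -depth 8 "$ocr_tif" || return 1

    # tesseract appends ".txt" to the output base name itself.
    tesseract "$ocr_tif" "${ocr_tif%.tif}" -l "$ocr_lang"
}
```

For example, "ocr_image scan.png" would write scan-8bit.tif and then
scan-8bit.txt. A second argument such as "deu" selects another language,
provided the matching language package is installed.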

Packages:
tesseract-ocr
tesseract-ocr-devel

And the following language packages, named as in Debian:
tesseract-ocr-eng (default)
tesseract-ocr-deu
tesseract-ocr-deu-f (deutsch fraktur)
tesseract-ocr-fra
tesseract-ocr-ita
tesseract-ocr-nld
tesseract-ocr-por
tesseract-ocr-spa
tesseract-ocr-vie


If you have questions or comments, please send them to
the Cygwin mailing list at cygwin@cygwin.com.
I'll answer only there; I don't answer private mail.






