| Tesseract release notes June 30 2009 - V2.04 |
| Integrated patches for portability and to remove some of the |
| "access" macros. |
| Removed dependence on lua from the viewer making it a *lot* |
| faster. Also the viewer now compiles and works (on Linux.) |
| Fixed the following issues: |
| 1, 63, 67, 71, 76, 79, 81, 82, 84, 106, 108, 111, 112, 128, 129, 130, 133, 135, |
| 142, 143, 145, 146, 147, 153, 154, 160, 165, 169, 170, 175, 177, 187, 192, |
| 195, 199, 201, 205, 209. |
| This is the last version to support VC++6! |
| This may also be the last version to compile without leptonica! |
| Windows version now outputs to stderr by default, fixing a lot of the problems with lack of visible meaningful error messages. |
| |
| Tesseract release notes April 22 2008 - V2.03 |
| 2.02 was unrunnable, due to a last-minute "simple" change. |
| 2.03 fixes the problem and also adds an include check for leptonica |
| to make it more usable. |
| |
| Tesseract release notes April 21 2008 - V2.02 |
| Improvements to clustering, training and classifier. |
| Major internationalization improvements for large-character-set |
| languages, eg Kannada. |
| Removed some compiler warnings. |
| Added multipage tiff support for training and running. |
| Updated graphics output to talk to new java-based viewer. |
| Added ability to save n-best lists. |
| Added leptonica support for more file types. |
| Improved Init/End to make them safe. |
| Reduced memory use of dictionaries. |
| Added some new APIs to TessBaseAPI. |
| Fixed namespace collisions with jpeg library (INT32). |
| Portability fixes for Windows for new code. |
| Updates to autoconf system for new code. |
| |
| Tesseract release notes August 27 2007 - V2.01 |
| Fixed UTF8 input problems with box file reader. |
| Fixed various infinite loops and crashes in dawg code. |
| Removed include of config_auto.h from host.h. |
| Added automatic wctype encoding to unicharset_extractor. |
| Fixed dawg table too full error. |
| Removed svn files from tarball. |
| Added new functions to tessdll. |
| Increased maximum utf8 string in a classification result to 8. |
| |
| Tesseract release notes July 17, 2007 - V2.00 |
| |
| First release of the International version. |
| This version recognizes the following languages: |
| English - eng |
| French - fra |
| Italian - ita |
| German - deu |
| Spanish - spa |
| Dutch - nld |
| The language codes follow ISO 639-2. The default language is English. |
| To recognize another language: |
| tesseract inputimage outputbase -l langcode |
| |
| To train on a new language, see separate documentation. |
| More languages will be appearing over time. |
| |
| List of changes in this release: |
| Converted internal character handling to UTF8. |
| Trained with 6 languages. |
| Added unicharset_extractor, wordlist2dawg. |
| Added boxfile creation mode. |
| Added UNLV regression test capability. |
| Fixed problems with copyright and registered symbols. |
| Fixed extern "C" declarations problem. |
| Made some improvements to consistency of accuracy across platforms. |
| Added vc++ express support. |
| |
| Instructions for downloading and building version 2.00. |
| Things have changed quite a bit since the previous versions so please read carefully. |
| *All users* |
| The tarballs are split into pieces. |
| tesseract-2.00.tar.gz contains all the source code. |
| tesseract-2.00.<lang>.tar.gt contains the data files for <lang>. You need at least one of these or tesseract will not work. |
| tesseract-2.00.exe.tar.gz is not for the 'exe' language. It is windows executables. They are built with VC++ express and come with absolutely no warranty. If they work for you then great, otherwise get visual C++ express (and the platform sdk) and build from the source. |
| |
| *Non-windows users* |
| As with 1.04, this version works with make install. |
| *New* there is a tesseract.spec for making rpms. (Thanks to Andrew Ziem for the help.) |
| It might work with your OS if you know how to do that sort of thing. |
| If you are linking to the libraries, as with Ocropus, there is now a single master |
| library called libtesseract_full.a. |
| |
| *Windows users* |
| If you are building from the sources, there are still dsw and dsp files for vc++6 and also |
| sln and vcproj files for vc++ express. |
| The dll has been updated to allow input of non-binary images. (Thanks to Glen of Jetsoft.) |
| |
| |
| Tesseract release notes May 15, 2007 - V1.04. |
| |
| === Windows users only === |
| Added a dll interface for windows. Thanks to Glen at Jetsoft for contributing |
| this. To use the dll, include tessdll.h, import tessdll.lib and put tessdll.dll |
| somewhere where the system can find it. There is also a small dlltest program |
| to test the dll. Run with: |
| dlltest phototest.tif phototest.txt |
| It will output the text from phototest.tif with bounding box information. |
| **New for Windows** the distribution now includes tesseract.exe and tessdll.dll |
| which *might* work out of the box! There are no guarantees as you need |
| VC++6 versions of mfc and crt (at least) for it to work. (Batteries not |
| included, and certainly no installshield.) |
| |
| == Important note for anyone building with make: i.e. anyone except devstudio |
| users == |
| This release includes new standardization for the data directory. To enable |
| Tesseract to find its data files, you must either: |
| ./configure |
| make |
| make install |
| to move the data files to the standard place, or: |
| export TESSDATA_PREFIX="directory in which your tessdata resides/" |
| (or equivalent) in your .profile or whatever or setenv to set the environment |
| variable. Note that the directory must end in a / |
| HAVING tesseract and tessdata IN THE SAME DIRECTORY DOES NOT WORK ANY MORE. |
| |
| == All users == |
| Fixed a bunch of name collisions - mostly with stl. |
| Made some preliminary changes for unicode compatibility. Includes a new data |
| file (unicharset) and renaming of the other data files to eng.* to support |
| different languages. |
| There are also several other minor bug fixes and portability improvements |
| for 64 bit, the latest visual studio compiler etc. Thanks to all who have |
| contributed these fixes. |
| |
| NOTE: This is likely to be the last English-only release! |
| Apologies in advance to non-windows users for bloating the distribution with |
| windows executables. This will probably get fixed in the next release with |
| the multi-language capability, since that will also bloat the distribution. |
| |
| |
| Tesseract release notes Feb 2, 2007 - V1.03. |
| Added mftraining and cntraining. Using an image with a box file, tesseract |
| generates .tr output files. cntraining runs on the .tr files to make |
| normproto that lives in tessdata. mftraining runs on the .tr files to |
| make inttemp and pffmtable in tessdata. These are the main data files |
| that tesseract uses to recognize characters. At present, the code to make |
| dictionary files is not yet available, nor are any sample box files or |
| rebuilt inttemp or documentation to create any of these. Recognition is |
| still limited to the ASCII set, but when this problem is fixed, documentation |
| will follow. |
| |
| Added a new API with adaptive thresholding for grey and color images. |
| See ccmain/baseapi.h/cpp for details. The main program has been converted |
| to use the API as an example. See main() in ccmain/tesseractmain.cpp for |
| details. The API is designed to make it easy to add subclasses with ability |
| to output the bounding boxes etc from the internal structures. The adaptive |
| thresholding improves accuracy (most of the time) on non-binary images. |
| |
| Many memory leaks have been fixed. There are no known leaks left from using |
| the API correctly. |
| |
| The adaptive classifier was not operating correctly. This bug, and several |
| others have been fixed, including poor chopping, an indefinite (if not quite |
| infinite) loop in the number parser, and a couple of crash bugs. Thanks to |
| all that have contributed bugs and bug fixes. |
| |
| It is now possible to build without any of the graphics support to save code |
| size using #define GRAPHICS_DISABLED. There is also a new EMBEDDED define |
| for use on operating systems with limited library support. |
| |
| 64-bit and Mac OSX buildability is now included in the mainline source tree. |
| Thanks to all that have contributed patches and comments to help with that. |
| 1.03 is also endian-independent, apart from the tiff i/o, so if you use |
| libtiff, the code should run on all platforms, even if you get/create new |
| data files of a different endinanness. |
| |
| Some of the bug fixes improve accuracy, and so do some of the changes to |
| DangAmbigs and user-words. |
| |
| Tesseract release notes, Oct 4 2006 - V1.02. |
| Removed dependency on aspirin. *All* code is now licensed under Apache2.0. |
| |
| Tesseract release notes, Sep 7 2006 - V1.01. |
| |
| Fixes for this release: |
| Added mfcpch.cpp and getopt.cpp for VC++. |
| Fixed problem with greyscale images and no libtiff. |
| Stopped debug window from being used for the usage output. |
| Fixed load of inttemp for big-endian architectures. |
| Fixed some Mac compilation issues. |
| |
| This version should read uncompressed 8 bit grey and 24 bit color tiffs |
| without having to have libtiff. It does a dumb threshold though, so don't |
| expect good results from poor contrast or images of natural scenes etc. |
| |
| If you just run tesseract with no command line args you should now get a |
| sensible usage message on linux, with or without X-windows. |
| |
| If you can get it to compile on a PPC Mac, it may now run correctly, |
| although not all the build issues are fixed yet. |
| |
| Building Tesseract: |
| Windows: |
| Unpack the tar.gz archive |
| Open tesseract.dsw in DevStudio (preferably version 6, higher versions will be more difficult) |
| Set Win32 - Release as the active configuration. |
| Build. |
| Copy tesseract.exe from bin.rel up one directory level. |
| Run tesseract phototest.tif phototest |
| This will create phototest.txt. |
| |
| Linux: |
| Unpack the tar.gz archive |
| ./configure |
| make |
| Copy tesseract from ccmain up one directory level (or create a symbolic link) |
| Run tesseract phototest.tif phototest |
| This will create phototest.txt. |