OCR software for Linux and Windows - Cuneiform - with Cyrillic support (Serbian)
Saturday, 5. September 2009, 15:15:51
http://en.openocr.org/
Linux version - https://launchpad.net/cuneiform-linux/+download
Windows version - http://www.cuneiform.ru/downloads/cuneiform.zip
English translation on the forum - http://www.cuneiform.ru/forum/viewtopic.php?p=2492#2492
On 2nd April 2008, Cognitive Technologies opened the source code of the Cuneiform OCR.
Works in Wine on Gentoo (marked as a Gold app by Ubuntu Hardy and Gentoo users).
Softpedia tutorial (and the newest version with bug fixes): http://linux.softpedia.com/get/Text-Editing-Processing/Others/Cuneiform-41079.shtml
Compiling
Extract the source and go to its root folder (the top-level folder of the unpacked source).
Then type the following commands:
mkdir builddir
cd builddir
cmake -DCMAKE_BUILD_TYPE=debug ..
make
make install
By default Cuneiform installs to /usr/local. You can specify a different prefix by passing the command-line switch "-DCMAKE_INSTALL_PREFIX=/what/ever/you/want" to CMake.
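If you would rather not install into /usr/local (for example, to run "make install" without root), the build steps above can be wrapped in a small shell function that adds the prefix switch. This is only a sketch: build_cuneiform is a hypothetical helper name, and it assumes cmake, make and a C/C++ compiler are already installed.

```shell
# Hypothetical helper (not shipped with Cuneiform): run the README build steps
# with a custom install prefix.
# Usage: build_cuneiform SOURCE_DIR PREFIX
# The body is a subshell "( ... )", so the cd below does not change the
# caller's working directory.
build_cuneiform() (
    src_dir=$1
    prefix=$2
    if [ -z "$src_dir" ] || [ -z "$prefix" ]; then
        echo "usage: build_cuneiform SOURCE_DIR PREFIX" >&2
        exit 2
    fi
    mkdir -p "$src_dir/builddir" || exit 1
    cd "$src_dir/builddir" || exit 1
    # Same configure step as in the README, plus the install prefix switch.
    cmake -DCMAKE_BUILD_TYPE=debug -DCMAKE_INSTALL_PREFIX="$prefix" .. || exit 1
    make || exit 1
    # No sudo needed when PREFIX is a directory you own, e.g. $HOME/cuneiform.
    make install
)
```

For example, "build_cuneiform /path/to/cuneiform-source $HOME/cuneiform" would install under $HOME/cuneiform; you would then presumably add its bin subdirectory to your PATH.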
If you have ImageMagick's C++ library (Magick++) on your system, Cuneiform detects it automatically and builds against it. Cuneiform can then process any image that ImageMagick knows how to open; otherwise it can only read uncompressed BMP images.
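Without ImageMagick support it is worth checking that a file really is an uncompressed BMP before feeding it to Cuneiform. The sketch below reads the BMP header with od: the file must start with "BM", and the 4-byte little-endian compression field at offset 30 must be 0 (BI_RGB). The helper name is hypothetical, and treating only BI_RGB (plus old 12-byte OS/2 headers, which have no compression field) as "uncompressed" is an assumption about what Cuneiform accepts.

```shell
# Hypothetical helper (not part of Cuneiform): succeeds if FILE looks like an
# uncompressed BMP, i.e. "BM" magic and compression field BI_RGB (0).
# Usage: is_uncompressed_bmp FILE
# Subshell body, so "set --" does not clobber the caller's arguments.
is_uncompressed_bmp() (
    f=$1
    [ -r "$f" ] || exit 1
    # Bytes 0-1: magic "BM" (66 77 in decimal).
    set -- $(od -An -tu1 -N2 "$f")
    [ "$1" = 66 ] && [ "$2" = 77 ] || exit 1
    # Bytes 14-17: size of the DIB header, little-endian.
    set -- $(od -An -tu1 -j14 -N4 "$f")
    [ $# -eq 4 ] || exit 1
    hdr=$(( $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 ))
    # A 12-byte OS/2 core header has no compression field at all.
    [ "$hdr" -eq 12 ] && exit 0
    [ "$hdr" -ge 40 ] || exit 1
    # Bytes 30-33: compression method; 0 means BI_RGB (uncompressed).
    set -- $(od -An -tu1 -j30 -N4 "$f")
    [ $# -eq 4 ] || exit 1
    [ $(( $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 )) -eq 0 ]
)
```

If the check fails and ImageMagick's command-line tools are installed, something like "convert scan.png -compress None BMP3:scan.bmp" should usually give you a plain uncompressed BMP.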
If you want to run Cuneiform without installing it on your system, you have to point the CF_DATADIR environment variable to a directory containing the .dat files. These can be found in the "datafiles" directory of the source package.
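In a shell session this boils down to exporting one variable. Here is a small sketch that sets CF_DATADIR to the source package's "datafiles" directory, but only if .dat files are actually there; use_source_datafiles is a hypothetical helper name, not something that ships with Cuneiform.

```shell
# Hypothetical helper: point CF_DATADIR at the "datafiles" directory of an
# unpacked Cuneiform source tree so an uninstalled build finds its .dat files.
# Usage: use_source_datafiles SOURCE_DIR
use_source_datafiles() {
    cf_dir="$1/datafiles"
    # Positional parameters inside a function are local to it, so this
    # glob check does not affect the caller's "$@".
    set -- "$cf_dir"/*.dat
    if [ ! -e "$1" ]; then
        echo "no .dat files found in $cf_dir" >&2
        return 1
    fi
    CF_DATADIR=$cf_dir
    export CF_DATADIR
}
```

Because it changes the environment, the function has to run in the shell you will start Cuneiform from (define or source it there); after that, start the cuneiform binary from your build directory as usual.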
Running
After installation, simply run:
cuneiform [-l language -o result_file --html --dotmatrix --fax] <image_file>
Output is written to pumaout.txt. Cuneiform assumes that your image contains only a single column of text.
By default Cuneiform recognizes English text. To change the language, use the command-line switch -l followed by your language string. To get a list of supported languages, type "cuneiform -l".
By default Cuneiform outputs plain text. You can specify the "--html" switch to make it output in HTML format.
If you do not define an output file with the -o switch, Cuneiform writes the result to a file "cuneiform-out.[format]". The file extension is either "txt" or "html" depending on your output format.
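Since each run handles one image, OCR-ing a whole stack of scans is easiest with a loop. The sketch below uses only the -l, -o and --html switches described above and names each result after its image (scan.png becomes scan.txt or scan.html). ocr_batch is a hypothetical helper, and the language code you pass must be one listed by "cuneiform -l".

```shell
# Hypothetical helper: OCR every image with the same language, writing one
# result file per image next to it. Uses only -l, -o and --html.
# Usage: ocr_batch LANG FORMAT IMAGE...   (FORMAT is "txt" or "html")
ocr_batch() (
    if [ $# -lt 3 ]; then
        echo "usage: ocr_batch LANG FORMAT IMAGE..." >&2
        exit 2
    fi
    lang=$1
    format=$2
    shift 2
    case $format in
        txt)  fmt_opt= ;;
        html) fmt_opt=--html ;;
        *)    echo "ocr_batch: FORMAT must be txt or html" >&2; exit 2 ;;
    esac
    status=0
    for img in "$@"; do
        base=${img##*/}
        # Drop the extension only if the file name itself has one.
        case $base in
            *.*) out=${img%.*}.$format ;;
            *)   out=$img.$format ;;
        esac
        # $fmt_opt is left unquoted on purpose: when empty it adds no argument.
        cuneiform -l "$lang" $fmt_opt -o "$out" "$img" || status=1
    done
    exit "$status"
)
```

For example, "ocr_batch rus txt page1.png page2.png", assuming "rus" appears in the "cuneiform -l" list and Cuneiform was built with ImageMagick so it can read PNG. A failed page does not stop the loop; the function just returns a non-zero status at the end.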
What's New in This Release:
· This release adds hOCR output and all sorts of general tweaks.
· It also compiles on Windows using MinGW.
Installing on Ubuntu:
# Download CuneiForm for Windows: http://www.cuneiform.ru/downloads/
Download YAGF for Windows: http://symmetrica.net/cuneiform-linux/yagf-ru.html
# Installing CuneiForm and YAGF on Ubuntu / Kubuntu / Xubuntu:
# Install the additional packages YAGF needs to work fully:
# sudo apt-get install libmagick++1 aspell aspell-ru sane xsane sane-utils quiteinsane
Source: http://itshaman.ru/it-programmy-dlya-linux/50/raspoznavanie-teksta-v-linux-ubuntu-s-pomoshchyu-cuneiform-yagf
NOTE (added 07.10.2009): On Ubuntu, gscan2pdf works great with Cyrillic (Hardy Heron)! - http://my.opera.com/linuxadore/blog/2008/12/16/gscan2pdf-works-great-with-cyrillic-on-hardy-herron

michele # 1. October 2009, 11:07
I tried it and it recognizes text very well. I'm developing a Win32 app and would like to know whether Cuneiform also has a Win32 command-line utility... I couldn't find this information so far.
Mirjana Toskov # 3. October 2009, 09:28