
Open your heart, open your mind!

Open Opera - Open Source

OCR software for Linux and Windows - Cuneiform - with Cyrillic support (Serbian)


Scan Cyrillic content with Russian software...

http://en.openocr.org/

Linux version - https://launchpad.net/cuneiform-linux/+download

Windows version - http://www.cuneiform.ru/downloads/cuneiform.zip

English translation on the forum - http://www.cuneiform.ru/forum/viewtopic.php?p=2492#2492

On 2nd April 2008, Cognitive Technologies opened the source code of OCR Cuneiform.

The Windows version works in Wine on Gentoo (rated as a gold app by Ubuntu Hardy and Gentoo users).



Softpedia tutorial (and the newest version, with bug fixes): http://linux.softpedia.com/get/Text-Editing-Processing/Others/Cuneiform-41079.shtml

Compiling

Extract the source and go to the root folder (the one this file is in).
Then type the following commands:

mkdir builddir
cd builddir
cmake -DCMAKE_BUILD_TYPE=debug ..
make
make install

By default Cuneiform installs to /usr/local. You can specify a different prefix by passing the command line switch "-DCMAKE_INSTALL_PREFIX=/what/ever/you/want" to CMake.
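
As a rough sketch of the prefix switch in practice, the snippet below wraps the build steps in a small function. The prefix $HOME/opt/cuneiform, the function name build_with_prefix and the bin/ location of the executable are my own assumptions for illustration, not something the readme states.

```shell
# Hypothetical prefix inside the home directory, so "make install" needs no root.
PREFIX="$HOME/opt/cuneiform"

# Configure, build and install into $PREFIX; meant to be called from builddir.
build_with_prefix() {
    cmake -DCMAKE_BUILD_TYPE=debug -DCMAKE_INSTALL_PREFIX="$PREFIX" .. &&
    make &&
    make install
}

# Assumption: the install step places the executable in the prefix's bin/ directory.
CUNEIFORM_BIN="$PREFIX/bin/cuneiform"
echo "After build_with_prefix, try: $CUNEIFORM_BIN -l"
```

Call build_with_prefix from inside builddir. Nothing is built until you do, so the snippet is safe to paste and inspect first.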

If you have ImageMagick++ on your system, Cuneiform autodetects and builds against it. Then Cuneiform can process any image that ImageMagick knows how to open. Otherwise it can only read uncompressed BMP images.
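
If your build ended up without ImageMagick support, one possible workaround is to convert scans to uncompressed BMP yourself before running Cuneiform. This is only a sketch: it assumes the ImageMagick "convert" command line tool is installed, the helper name to_bmp is made up, and I have not checked which BMP variants Cuneiform's own reader accepts.

```shell
# Print the .bmp name for an input image and, when possible, create that file.
to_bmp() {
    # $1: input image; the output name swaps the extension for .bmp
    out="${1%.*}.bmp"
    # Convert only if ImageMagick's convert is available and the input exists.
    if command -v convert >/dev/null 2>&1 && [ -f "$1" ]; then
        convert "$1" -compress none "$out"
    fi
    echo "$out"
}
```

For example, to_bmp scan.png prints scan.bmp, and writes that file if scan.png is present.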

If you want to run Cuneiform without installing it on your system, you have to point the CF_DATADIR environment variable to a directory containing the .dat files. These can be found in the "datafiles" directory of the source package.
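
A minimal sketch of that setup follows. The source location $HOME/src/cuneiform-linux is a made-up example path, so point SRC_DIR at wherever you extracted the package.

```shell
# Hypothetical location of the extracted source tree; adjust to your setup.
SRC_DIR="${SRC_DIR:-$HOME/src/cuneiform-linux}"

# Point Cuneiform at the .dat files shipped in the source package.
CF_DATADIR="$SRC_DIR/datafiles"
export CF_DATADIR

# Warn early if the directory is missing or holds no .dat files.
if ls "$CF_DATADIR"/*.dat >/dev/null 2>&1; then
    echo "Using data files from $CF_DATADIR"
else
    echo "No .dat files found in $CF_DATADIR" >&2
fi
```

The variable only lasts for the current shell session, so it has to be set again (or added to your shell profile) before each use.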

Running

After installing, simply run:

cuneiform [-l language -o result_file --html --dotmatrix --fax] <image_file>

Cuneiform assumes that your image contains only a single column of text. Older releases write their output to pumaout.txt; for newer ones, see the note on the -o switch below.

By default Cuneiform recognizes English text. To change the language use the command line switch -l followed by your language string. To get a list of supported languages type "cuneiform -l".

By default Cuneiform outputs plain text. You can specify the "--html" switch to make it output in HTML format.

If you do not define an output file with the -o switch, Cuneiform writes the result to a file "cuneiform-out.[format]". The file extension is either "txt" or "html" depending on your output format.
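
Putting the switches together, a run could look like the sketch below. The file name page.bmp and the language string "srp" are assumptions on my part (check the output of "cuneiform -l" for the strings your build really supports), and default_output is just a hypothetical helper that spells out the default file name described above.

```shell
# Build the file name Cuneiform is described as using when -o is omitted.
default_output() {
    # $1 is the output format: "txt" (plain text) or "html"
    echo "cuneiform-out.$1"
}

# Hypothetical input file and language string.
IMAGE="page.bmp"
LANG_CODE="srp"

# Run only when cuneiform is installed and the image exists.
if command -v cuneiform >/dev/null 2>&1 && [ -f "$IMAGE" ]; then
    # Plain text with an explicit output file
    cuneiform -l "$LANG_CODE" -o page.txt "$IMAGE"
    # HTML with the default output file name
    cuneiform -l "$LANG_CODE" --html "$IMAGE"
    ls -l "$(default_output html)"
fi
```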

What's New in This Release:

· This release adds hOCR output and all sorts of general tweaks.
· It also compiles on Windows using MinGW.

Installation (Windows downloads and Ubuntu packages):
# Download CuneiForm for Windows: http://www.cuneiform.ru/downloads/
# Download YAGF for Windows: http://symmetrica.net/cuneiform-linux/yagf-ru.html
# Installing CuneiForm and YAGF on Ubuntu / Kubuntu / Xubuntu:
# Install the additional packages YAGF needs to work fully:
sudo apt-get install libmagick++1 aspell aspell-ru sane xsane sane-utils quiteinsane
Source: http://itshaman.ru/it-programmy-dlya-linux/50/raspoznavanie-teksta-v-linux-ubuntu-s-pomoshchyu-cuneiform-yagf

NOTE (added 07.10.2009): On Ubuntu, Gscan2pdf works great with Cyrillic (Hardy Heron)! - http://my.opera.com/linuxadore/blog/2008/12/16/gscan2pdf-works-great-with-cyrillic-on-hardy-herron


Comments

michele 1. October 2009, 11:07

Great software !
I tried and it recognize very well. I'm developing win32 app and i would like to know if Cuneiform has also win32 command line utility... i couldn't find this information until now.

Mirjana Toskov 3. October 2009, 09:28

Glad to hear that. Happy hacking! Wish you the best...
