Using the Tesseract OCR library
Although Tesseract OCR is already integrated with OpenCV 3.0, it's still worth studying Tesseract's own API, since it gives finer-grained control over the engine's parameters. The OpenCV integration is covered in Chapter 11, Text Recognition with Tesseract.
Creating an OCR function
We'll change the previous example to work with Tesseract. Start by adding tesseract/baseapi.h and fstream to the include list:
#include <opencv2/opencv.hpp>
#include <tesseract/baseapi.h>
#include <vector>
#include <fstream>
Then, we'll create a global TessBaseAPI object that represents our Tesseract OCR engine:
tesseract::TessBaseAPI ocr;
Note
The ocr engine is completely self-contained. If you want to build multithreaded OCR software, simply create a separate TessBaseAPI object in each thread, and execution will be fairly thread-safe. The only thing you need to guarantee is that threads don't write to the same file; if they must, you have to make that write operation thread-safe yourself.
Next, we will...