Identifying text in an image is a very popular application of computer vision. This process is commonly called optical character recognition (OCR), and is divided into the following steps:
- Text preprocessing and segmentation: During this step, the computer must deal with image noise and rotation (skewing), and identify which areas are candidate text.
- Text identification: This is the process of identifying each letter in the text. Although this is also a computer vision topic, we will not show you how to do it purely with OpenCV in this book. Instead, we will show you how to use the Tesseract library for this step, since it was integrated in OpenCV 3.0. If you are interested in learning how to do what Tesseract does yourself, take a look at Packt's Mastering OpenCV book, which presents a chapter on car plate recognition.
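To make the idea of finding candidate text areas a little more concrete, here is a minimal sketch in plain Python. It is a toy illustration only, not the OpenCV-based approach used in this book: it assumes a clean, unrotated grayscale image given as a list of rows, a fixed threshold of 128, and it only groups rows that contain dark pixels into bands. The function names `binarize` and `candidate_text_rows` are hypothetical and chosen just for this example.

```python
def binarize(gray, threshold=128):
    """Mark pixels darker than `threshold` as ink (1), the rest as background (0)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def candidate_text_rows(binary, min_ink=1):
    """Return (first_row, last_row) bands of consecutive rows that contain ink."""
    bands = []
    start = None  # first row of the band currently being tracked, if any
    for y, row in enumerate(binary):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y                      # a new band begins
        elif not has_ink and start is not None:
            bands.append((start, y - 1))   # the band ended on the previous row
            start = None
    if start is not None:                  # band runs to the bottom edge
        bands.append((start, len(binary) - 1))
    return bands


# Toy 6x6 grayscale page: two dark "text lines" on a light background.
page = [
    [255, 255, 255, 255, 255, 255],
    [255,  10,  20, 255,  15, 255],
    [255,  12, 255, 255,  18, 255],
    [255, 255, 255, 255, 255, 255],
    [255,  30,  25,  20, 255, 255],
    [255, 255, 255, 255, 255, 255],
]

bands = candidate_text_rows(binarize(page))
print(bands)  # [(1, 2), (4, 4)]
```

Real images need much more than this: noise removal, skew correction, and two-dimensional region detection rather than simple row bands. The sketch only shows the kind of output the segmentation step is expected to produce, namely a set of regions that are worth passing on to text identification.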
The preprocessing and segmentation...