In this section, we will extract text from images with Tesseract. As mentioned earlier, on Windows we can install Tesseract from a prebuilt binary package. On a UNIX-like system, we can use the system package manager, for example, apt-get on Debian or brew on macOS. On Debian, installing the libtesseract-dev and tesseract-ocr-all packages provides all the library and data files we need. However you install it, make sure that the installed version is 4.0.0.
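One way to confirm the installed version is to ask the command-line tool itself. The following is only a minimal sketch, assuming that the tesseract executable is on the PATH and that the first line of its `--version` output has the form `tesseract 4.0.0`. The helper names `parse_tesseract_version` and `installed_tesseract_version` are hypothetical and are not part of Tesseract:

```python
import re
import shutil
import subprocess


def parse_tesseract_version(output):
    """Return (major, minor, patch) parsed from `tesseract --version` output, or None."""
    # Assumed format: "tesseract 4.0.0"; an optional "v" prefix is tolerated.
    match = re.search(r"tesseract\s+v?(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def installed_tesseract_version():
    """Run `tesseract --version`; return the version tuple, or None if unavailable."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        ["tesseract", "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge streams in case the version goes to stderr
        universal_newlines=True,
    )
    return parse_tesseract_version(result.stdout)


if __name__ == "__main__":
    version = installed_tesseract_version()
    if version is None:
        print("tesseract not found on PATH, or its version could not be parsed")
    elif version == (4, 0, 0):
        print("found Tesseract 4.0.0")
    else:
        print("found Tesseract %d.%d.%d, but this section assumes 4.0.0" % version)
```

Keeping the parsing separate from the process call means the version check can be exercised on a sample string even on a machine where Tesseract is not installed yet.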
Although prebuilt packages are available, for pedagogical purposes we will build Tesseract from source on a Linux system to see which components it contains and how to use its command-line tool.