Tesseract is an optical character recognition (OCR) engine, originally developed at Hewlett-Packard and later sponsored by Google. It recognizes text in images and supports more than 100 languages. Tesseract is open source and available under the Apache 2.0 license. It can be used directly from the command line or from many programming languages through wrappers.
This tutorial shows how to install Tesseract OCR on a Raspberry Pi.
Connect to the Raspberry Pi via SSH and run the following commands to install Tesseract OCR:
sudo apt update
sudo apt install -y tesseract-ocr
After installation, we can check the Tesseract OCR version:
tesseract --version
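In a setup script, the same version check can guard later steps so they only run when the binary is actually present. A minimal sketch, assuming Bash; the function name `check_tesseract` is hypothetical and not part of Tesseract:

```shell
#!/usr/bin/env bash
# check_tesseract: hypothetical helper, not part of Tesseract itself.
# Prints the first line of the version banner if the binary is on PATH,
# otherwise prints an install hint and returns a non-zero status.
check_tesseract() {
    if command -v tesseract >/dev/null 2>&1; then
        # Some releases print the banner to stderr, so merge both streams.
        tesseract --version 2>&1 | head -n 1
    else
        echo "tesseract not found; install it with: sudo apt install -y tesseract-ocr" >&2
        return 1
    fi
}
```

Calling `check_tesseract || exit 1` near the top of a script stops it early when the installation step was skipped or failed.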
Now we can test Tesseract OCR. First, download an image that contains text from the Internet and save it as test.png. Then use the tesseract command to recognize the text in the image. The first argument is the name of the image. The second argument is the base name of the output file that will hold the recognized text. We don't need to provide the file extension (the .txt extension is appended automatically).
tesseract test.png result
cat result.txt
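The same two-argument form extends naturally to a whole folder of images. The following is a sketch under stated assumptions: Bash, PNG input files, and a hypothetical helper named `ocr_dir` that is not part of Tesseract.

```shell
#!/usr/bin/env bash
# ocr_dir: hypothetical helper that runs tesseract on every PNG in a directory.
# For each IMAGE.png it produces IMAGE.txt next to it, because tesseract
# appends the .txt extension to the output base name.
ocr_dir() {
    local dir="${1:-.}"
    local img
    for img in "$dir"/*.png; do
        # Skip the unexpanded pattern when the directory has no PNG files.
        [ -e "$img" ] || continue
        # The output base is the image path without its .png suffix.
        tesseract "$img" "${img%.png}"
    done
}
```

Running `ocr_dir` with no argument processes the current directory; passing a path processes that directory instead.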
Results can be written to standard output by passing stdout as the output name:
tesseract test.png stdout
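Writing to standard output makes it easy to pipe the recognized text into other tools. As an illustration, here is a small Bash sketch that checks whether an image contains a given word; the helper name `image_contains` is hypothetical, and the match is a simple case-insensitive substring search rather than anything OCR-aware.

```shell
#!/usr/bin/env bash
# image_contains: hypothetical helper. Succeeds (exit status 0) when the
# recognized text of IMAGE contains WORD, ignoring case.
image_contains() {
    local image="$1" word="$2"
    # Using "stdout" as the output name makes tesseract print the text
    # instead of writing a file; messages on stderr are discarded.
    tesseract "$image" stdout 2>/dev/null | grep -qiF -- "$word"
}

# Example call (commented out so that sourcing this file runs nothing):
# image_contains test.png "invoice" && echo "found"
```

Because OCR output can contain recognition errors, a missing match does not prove the word is absent from the image.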
If we want to completely remove every package whose name starts with tesseract, along with anything related to it, we can execute this command:
sudo apt purge -y 'tesseract.*'