How to Install Tesseract-OCR on macOS

Getting Started with Image Processing and Text Recognition

According to pytesseract's documentation on the Python Package Index (PyPI), Python-tesseract is an optical character recognition (OCR) tool for Python. That is, it will recognize and “read” the text embedded in images.

Python-tesseract is a wrapper for Google’s Tesseract-OCR Engine.

While working on image processing or text recognition with Python, you might have tried to use pytesseract. Of course, the first step is to install its dependencies.

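Installing pytesseract on macOS is as straightforward as running pip install pytesseract in your terminal. However, pytesseract is only a wrapper: it calls the separate tesseract command-line program, which pip does not install. So you will most likely hit an error as soon as you run code like this: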

from PIL import Image
import pytesseract

# To get text from image
print(pytesseract.image_to_string(Image.open('test.png')))

When you run the code above in your IDE, you will most likely get a TesseractNotFoundError, because pytesseract cannot find the tesseract executable.

Let's walk through how we can fix this error.

Step 1 (Optional)

Install MacPorts

If you already have MacPorts installed, you can skip this step.
Otherwise, visit the MacPorts website to find and install the right version for your macOS release.

Make sure your Mac is plugged in or has enough battery: installing MacPorts took me about 2 hours.

Step 2

Update the port tree

Run sudo port selfupdate in your terminal to update the port tree.

Step 3

Install Tesseract

Run sudo port install tesseract in your terminal.
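To confirm the install worked, the Python sketch below looks for the tesseract executable the way pytesseract does by default (on PATH) and then falls back to /opt/local/bin/tesseract. That fallback assumes MacPorts' default /opt/local prefix, and locate_tesseract is a hypothetical helper name. The fallback may help when an IDE launched from the Dock does not inherit your shell's PATH.

```python
import os
import shutil
from typing import Optional

# Assumed MacPorts default prefix; adjust if you installed MacPorts elsewhere.
MACPORTS_TESSERACT = "/opt/local/bin/tesseract"


def locate_tesseract() -> Optional[str]:
    """Return the path to a tesseract executable, or None if none is found."""
    # pytesseract runs the command "tesseract" by default, so check PATH first.
    on_path = shutil.which("tesseract")
    if on_path is not None:
        return on_path
    # Fall back to the MacPorts location in case PATH lacks /opt/local/bin.
    if os.path.isfile(MACPORTS_TESSERACT) and os.access(MACPORTS_TESSERACT, os.X_OK):
        return MACPORTS_TESSERACT
    return None


if __name__ == "__main__":
    path = locate_tesseract()
    if path is None:
        print("tesseract not found: pytesseract would raise TesseractNotFoundError")
    else:
        print(f"tesseract found at {path}")
```

If the binary is found but your IDE still raises TesseractNotFoundError, you can point pytesseract at it explicitly by setting pytesseract.pytesseract.tesseract_cmd to that path before calling image_to_string.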

Step 4

Install Imagemagick

You will need a graphics file converter and image-adjustment package to help Tesseract read JPEG files.
Run sudo port install ImageMagick in your terminal.

If you run the code block above again, you will no longer get a TesseractNotFoundError. However, a new error appears.

This error occurs because we have not yet installed Tesseract's English language data, and English is the default language setting.
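Without the language data, the failure only surfaces once OCR runs. The sketch below checks first. It assumes a recent pytesseract release that provides get_languages(), and missing_languages and ocr_image are hypothetical helper names. It compares the requested lang value against the languages Tesseract reports before calling image_to_string.

```python
from typing import Iterable, List


def missing_languages(lang: str, available: Iterable[str]) -> List[str]:
    """Return the codes in a Tesseract language spec such as 'eng+fra' that are not installed."""
    installed = set(available)
    # Tesseract combines several languages with '+', e.g. lang="eng+fra".
    return [code for code in lang.split("+") if code not in installed]


def ocr_image(path: str, lang: str = "eng") -> str:
    """OCR an image, failing early with a clear message if language data is missing."""
    # Imported here so missing_languages() can be used without these packages.
    import pytesseract
    from PIL import Image

    missing = missing_languages(lang, pytesseract.get_languages(config=""))
    if missing:
        raise RuntimeError(f"Tesseract language data not installed: {', '.join(missing)}")
    with Image.open(path) as image:
        return pytesseract.image_to_string(image, lang=lang)
```

For example, ocr_image("test.png") raises a RuntimeError naming eng until the English data from the next step is installed.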

Step 5

Install a language data support

Tesseract currently supports over 130 languages. For this article, we will install English: run sudo port install tesseract-eng in your terminal. The English language data is large, so the download could take 5 to 10+ minutes to complete.

Installing data for other languages follows the same pattern: run sudo port install tesseract-langcode, where langcode is the language's code, usually three letters.
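To check which language data Tesseract can see, you can run tesseract --list-langs. The sketch below wraps that command with Python's subprocess module. The helper names are hypothetical, and the parser assumes the usual output shape: one header line followed by one language code per line.

```python
import shutil
import subprocess
from typing import List

HEADER = "List of available languages"


def parse_list_langs(output: str) -> List[str]:
    """Turn the text printed by `tesseract --list-langs` into a list of codes."""
    codes = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header line, and any warning text (codes have no spaces).
        if not line or line.startswith(HEADER) or " " in line:
            continue
        codes.append(line)
    return codes


def installed_languages() -> List[str]:
    """Return the language codes tesseract reports, or [] if tesseract is not on PATH."""
    if shutil.which("tesseract") is None:
        return []
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr in case the list is written there
        text=True,
        check=False,
    )
    return parse_list_langs(result.stdout)


if __name__ == "__main__":
    langs = installed_languages()
    print("eng installed" if "eng" in langs else "eng missing")
```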

Run Code

We have successfully installed Tesseract-OCR and can now use pytesseract for optical character recognition. Running the code block above on the article's cover image as a test, we get this output.

[Screenshot: OCR output for the cover image]

You can reference Roel Van de Paar's solutions if this doesn't work for you.

That's it! We have successfully installed Tesseract-OCR on macOS. 😌