OCR with Deep Learning in PyTorch (EasyOCR)

Part 1: A Beginner's Guide to OCR with Python Code

Apr 30, 2023

1. Introduction

If your business workflow involves extracting text from images, you need a process called Optical Character Recognition (OCR). OCR detects and recognizes the text in images such as scanned documents, camera photos, image-only PDFs, posters, street signs or receipts.

Modern OCR uses machine learning techniques to train computers to read the text inside images. Specifically, deep Neural Networks (NN) are trained to analyze the image at several levels of abstraction and to combine the resulting features into the final text, a sequence of characters. In general, two types of networks are required: 1) a network that extracts features from the image, e.g. a Convolutional NN (CNN), and 2) a network that generates the output sequence of characters, e.g. a Recurrent NN (RNN).

Next, we briefly outline EasyOCR, one of the most straightforward Python packages for text recognition. Then, a simple Python code implementation is explained.

2. EasyOCR

EasyOCR is a ready-to-use OCR package that supports 80+ languages and all popular writing scripts, including Latin, Chinese and Arabic. All deep learning execution is based on PyTorch. The following framework outlines the pipeline of the package:

As shown, the project is based on research and code from several papers (e.g. CRAFT and CRNN) and open-source repositories. CRNN combines convolutional and recurrent neural networks. Its architecture consists of three parts: 1) convolutional layers, which extract a feature sequence from the input image; 2) recurrent layers, which predict a label distribution for each frame of that sequence; 3) a transcription layer, which translates the per-frame predictions into the final label sequence.

CRNN network architecture (LSTM in the recurrent layer stands for Long Short-Term Memory, and is a common example of RNNs).
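To make the transcription step more concrete, here is a minimal sketch in plain Python. It assumes CTC-style greedy decoding, which is the idea used in the CRNN paper: pick the best class per frame, merge consecutive repeats, then drop the special "blank" class. The function name, the toy alphabet and the scores are made up for illustration, and this is not EasyOCR's actual decoder.

```python
from itertools import groupby

BLANK = 0  # class index reserved for the CTC "blank" symbol (assumption of this sketch)


def greedy_ctc_decode(frame_scores, alphabet):
    """Turn per-frame class scores into a string.

    frame_scores: list of score lists, one list per frame, one score per class
                  (class 0 is the blank, class i + 1 is alphabet[i]).
    alphabet:     string of the recognizable characters.
    """
    # 1) most likely class for each frame
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    # 2) collapse consecutive repeats of the same class
    collapsed = [k for k, _ in groupby(best)]
    # 3) drop blanks and map class indices back to characters
    return "".join(alphabet[k - 1] for k in collapsed if k != BLANK)


# Toy example: made-up scores for 6 frames over the classes [blank, 'a', 'c', 't']
alphabet = "act"
frames = [
    [0.1, 0.1, 0.7, 0.1],    # c
    [0.1, 0.1, 0.7, 0.1],    # c (repeat, merged with the previous frame)
    [0.6, 0.2, 0.1, 0.1],    # blank
    [0.1, 0.8, 0.05, 0.05],  # a
    [0.1, 0.1, 0.1, 0.7],    # t
    [0.1, 0.1, 0.1, 0.7],    # t (repeat)
]
print(greedy_ctc_decode(frames, alphabet))  # prints: cat
```

The blank class is what allows genuine double letters: the frames "a, blank, a" decode to "aa", while "a, a" decode to a single "a".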

3. Code

The following lines of code recognize the text in a given image (image_filename) and display the results:

import cv2
import easyocr
import matplotlib.pyplot as plt

# This needs to run only once to load the model into memory
reader = easyocr.Reader(['en'])

# reading the image
img = cv2.imread(image_filename)

# run OCR
results = reader.readtext(img)

# show the image (OpenCV loads it as BGR, matplotlib expects RGB) and plot the results
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
for res in results:
    # corner coordinates of the box around the detected text
    xy = res[0]
    xy1, xy2, xy3, xy4 = xy[0], xy[1], xy[2], xy[3]
    # recognized text and its confidence
    det, conf = res[1], res[2]
    # show time :)
    plt.plot([xy1[0], xy2[0], xy3[0], xy4[0], xy1[0]], [xy1[1], xy2[1], xy3[1], xy4[1], xy1[1]], 'r-')
    plt.text(xy1[0], xy1[1], f'{det} [{round(conf, 2)}]')

# needed when running as a script; notebooks render the figure automatically
plt.show()

The code starts by importing the required packages (OpenCV to read the image, EasyOCR for the main job and Matplotlib to show the results). The pre-trained model used for text recognition is loaded into memory by easyocr.Reader(), where ‘en’ sets the required language to English. If you plan to run this script on several images, note that this line needs to run only once.
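As a rough illustration of the "load once, reuse many times" point, the sketch below runs one reader over several images. The helper name ocr_many and the file names in the usage comment are hypothetical. The helper only assumes that the reader has a readtext() method that accepts a file path, which easyocr.Reader does.

```python
def ocr_many(reader, image_paths):
    """Run OCR on several images with one already-loaded reader.

    reader:      any object with a readtext(path) method, e.g. an easyocr.Reader
    image_paths: iterable of image file paths
    Returns a dict mapping each path to its list of detections.
    """
    return {path: reader.readtext(path) for path in image_paths}


# Intended usage (not executed here; the file names are placeholders):
#   import easyocr
#   reader = easyocr.Reader(['en'])   # slow step: loads the models, do it once
#   all_results = ocr_many(reader, ['receipt1.jpg', 'receipt2.jpg'])
```

Creating the reader inside the loop instead would reload the models for every image, which is usually the slowest part of the script.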

After reading the image into img (using cv2.imread()), reader.readtext() runs OCR to detect and recognize all instances of text inside the image. The output is a list of detections (results). We then show the image (using plt.imshow()) before drawing the detected text on top of it.

For each detection res in results, the first item res[0] holds the xy image coordinates of the four corners of the bounding box containing the detected text. The next two items, res[1] and res[2], are the recognized text and its confidence, respectively. The last two lines of the loop plot a red box around each detection and print the text and its confidence at the top-left corner of the box.
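If you want to work with the detections rather than just plot them, a small helper along these lines may be useful. It is only a sketch: the function name summarize, the 0.5 threshold and the sample detections are made up, and it assumes the (bounding box, text, confidence) layout described above.

```python
def summarize(results, min_conf=0.5):
    """Keep detections with confidence >= min_conf and return
    (text, confidence, (x_min, y_min, x_max, y_max)) tuples."""
    rows = []
    for bbox, text, conf in results:
        if conf < min_conf:
            continue  # skip low-confidence detections
        xs = [point[0] for point in bbox]
        ys = [point[1] for point in bbox]
        # axis-aligned rectangle enclosing the four corner points
        rect = (min(xs), min(ys), max(xs), max(ys))
        rows.append((text, round(float(conf), 2), rect))
    return rows


# Made-up detections in the (bbox, text, confidence) layout
sample = [
    ([[10, 20], [110, 22], [109, 52], [9, 50]], "TOTAL", 0.97),
    ([[12, 60], [60, 60], [60, 80], [12, 80]], "l2.5O", 0.31),
]
print(summarize(sample))  # prints: [('TOTAL', 0.97, (9, 20, 110, 52))]
```

The right threshold depends on your images, so treat 0.5 as a starting point rather than a recommendation.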

References

EasyOCR on GitHub.

CRNN Paper.

OCR by Microsoft, IBM and Amazon.

Stay tuned for the next tutorial explaining more advanced OCR (with Python implementation).