Read Printed and Handwritten Text

Learn to read text from images and PDFs using OCR.

Using the OCR service, we can read the visible text in an image and convert it to a character stream. Using the service takes two steps:

  1. Call the Read API
  2. Get the read results

Let’s see both in detail.

Call the Read API

First, we’ll call the API with the image URL. We will use the read method as follows:

read(read_image_url, language=None, pages=None, raw=True)

This call starts an asynchronous operation to read the image and returns a response whose Operation-Location header contains the operation ID.

Other than the image URL, we can also specify the following parameters:

  • language (optional): The language of the text in the image. Supported languages differ for handwritten and printed text. See the Language Support page of the documentation for the full list.
  • pages (optional): The pages to process. This option is only used for multi-page PDF and TIFF documents.

Accepted inputs for pages include:

  • Single pages: 1, 2 will process pages 1 and 2.
  • Finite ranges: 2-5 will process pages 2 through 5.
  • Open-ended ranges: 5- will process all the pages beginning from page 5. Similarly, -10 will process pages from 1 to 10.
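
To make the page syntax above concrete, here is a small sketch that expands a specification into explicit page numbers. The expand_pages helper and the total_pages argument are hypothetical and exist only for illustration. They are not part of the OCR SDK, and the service may treat edge cases (such as out-of-range pages) differently.

```python
from typing import List


def expand_pages(spec: str, total_pages: int) -> List[int]:
    """Expand a page spec such as "1, 2", "2-5", "5-" or "-10" into page numbers.

    Hypothetical helper for illustration only; not part of the OCR SDK.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            # A missing start means "from page 1"; a missing end means "to the last page".
            start = int(start_text) if start_text.strip() else 1
            end = int(end_text) if end_text.strip() else total_pages
        else:
            start = end = int(part)
        # Assumption: pages outside 1..total_pages are silently ignored.
        for page in range(max(start, 1), min(end, total_pages) + 1):
            pages.add(page)
    return sorted(pages)


print(expand_pages("1, 2", 12))  # [1, 2]
print(expand_pages("2-5", 12))   # [2, 3, 4, 5]
print(expand_pages("5-", 8))     # [5, 6, 7, 8]
print(expand_pages("-10", 12))   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```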

The code for the read method is on lines 13–16 in the code below.

Get the read results

We’ll get an operation ID from the results returned by the read method. Using this ID, we query the results from the service.
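
As a minimal sketch of that step, the snippet below pulls the operation ID out of an Operation-Location value. It assumes the ID is the last path segment of the URL, which is what the demo's split("/")[-1] relies on. The URL and the helper name are made up for illustration.

```python
def operation_id_from_location(operation_location: str) -> str:
    """Return the last path segment of an Operation-Location URL.

    Assumes the operation ID is the final segment, as in the demo code.
    """
    # rstrip guards against a trailing slash, which would otherwise yield "".
    return operation_location.rstrip("/").split("/")[-1]


# Made-up example value; a real one comes from the response headers.
location = "https://example.invalid/vision/read/analyzeResults/1234-abcd-5678"
print(operation_id_from_location(location))  # 1234-abcd-5678
```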

We use the get_read_result method to fetch the results for the image. The code on lines 24–28 checks the operation status once per second until it is no longer notStarted or running. The loop is needed because the service might not have finished processing the image by the time we first call get_read_result.
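
The demo loop polls indefinitely. One possible variation, sketched below, adds a timeout so that a stuck operation does not block forever. The wait_for_result helper and its parameters are hypothetical; fetch_status stands in for a call such as get_read_result(operation_id).status, and the example uses a fake status sequence so that it runs without the service.

```python
import time


def wait_for_result(fetch_status, interval=1.0, timeout=30.0):
    """Poll fetch_status() until it leaves the in-progress states or time runs out.

    Hypothetical helper: fetch_status is any zero-argument callable returning
    a status string such as "notStarted", "running", "succeeded" or "failed".
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status not in ("notStarted", "running"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("operation did not finish within the timeout")
        time.sleep(interval)


# Fake status sequence standing in for the real service.
statuses = iter(["notStarted", "running", "running", "succeeded"])
print(wait_for_result(lambda: next(statuses), interval=0.01))  # succeeded
```

Using time.monotonic for the deadline keeps the timeout unaffected by system clock adjustments.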

The extracted data contains the following:

  • Text: This is the string value of a recognized word or line.
  • Bounding box: This is a list of eight numbers that represent the bounding box of a recognized region, line, or word. The values are x and y coordinate pairs ordered clockwise: they start at the top-left corner, continue to the top-right and bottom-right corners, and end at the bottom-left corner.
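
To illustrate the bounding box layout described above, the sketch below regroups the eight values into four (x, y) corner points and derives an axis-aligned rectangle that encloses them. Both helper names and the sample values are made up for illustration.

```python
def to_points(bounding_box):
    """Group [x1, y1, x2, y2, x3, y3, x4, y4] into four (x, y) corner points."""
    if len(bounding_box) != 8:
        raise ValueError("expected eight values: four (x, y) pairs")
    return [(bounding_box[i], bounding_box[i + 1]) for i in range(0, 8, 2)]


def enclosing_rectangle(bounding_box):
    """Return (left, top, right, bottom) of the rectangle enclosing all corners."""
    points = to_points(bounding_box)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


# Made-up values: top-left, top-right, bottom-right, bottom-left.
box = [10, 20, 110, 22, 108, 62, 8, 60]
print(to_points(box))            # [(10, 20), (110, 22), (108, 62), (8, 60)]
print(enclosing_rectangle(box))  # (8, 20, 110, 62)
```

The enclosing rectangle is useful because slightly rotated text produces corners that do not form an axis-aligned box.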

Now, let’s look at the complete demo to see how it actually works.

Complete demo

In this demo, we’ll take an image and print each recognized line of text along with its bounding box. For this, we’ll follow the process explained above.

  1. Call the read method. This will give us an operation ID.
  2. Send this ID to the get_read_result method to get the final results.

Let’s read the words written in the image given below.

As we can see, the image has three words: “GROW”, “GOOD”, and “VIBES”. The code given below prints each recognized line of text and its bounding box.

The image URL is provided at line 10. Feel free to change it and test the API on other images.

from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from msrest.authentication import CognitiveServicesCredentials
import time

computervision_client = ComputerVisionClient(
    "{{ENDPOINT}}", CognitiveServicesCredentials("{{SUBSCRIPTION_KEY}}"))

# Get an image with text
read_image_url = "https://images.pexels.com/photos/4577514/pexels-photo-4577514.jpeg"

# Call API with URL and raw response (allows you to get the operation location)
read_response = computervision_client.read(read_image_url,
                                           language=None,
                                           pages=None,
                                           raw=True)

# Get the operation location (URL with an ID at the end) from the response
read_operation_location = read_response.headers["Operation-Location"]
# Grab the ID from the URL
operation_id = read_operation_location.split("/")[-1]

# Call the "GET" API and wait for it to retrieve the results
while True:
    read_result = computervision_client.get_read_result(operation_id)
    if read_result.status not in ['notStarted', 'running']:
        break
    time.sleep(1)

# Print the detected text, line by line
if read_result.status == OperationStatusCodes.succeeded:
    for text_result in read_result.analyze_result.read_results:
        for line in text_result.lines:
            print(line.text)
            print("Bounding box: ", line.bounding_box)
print()

The output shows each recognized line of text followed by its bounding box.