Read printed and handwritten text

Using the OCR service, we can read the visible text in an image and convert it to a character stream. Using the service takes two steps:

Call the Read API
Get the read results

Let’s see both in detail.

Call the Read API

First, we’ll call the API with the image URL. We will use the read method as follows:

read(read_image_url, language = None, pages = None, raw=True)

This call starts an asynchronous read operation. Its response carries an Operation-Location header, a URL that ends with the operation ID.

Other than the image URL, we can also specify the following parameters:

Name	Description
`language` (optional):	The language of the text in the image. The supported languages differ for handwritten and printed text. See the Language Support page of the documentation for the full list.
`pages` (optional):	This option is only used for multi-page PDF and TIFF documents.

Accepted inputs for pages include:

Single pages: 1, 2 will process pages 1 and 2.
Finite ranges: 2-5 will process pages 2 to 5.
Open-ended ranges: 5- will process all the pages beginning from page 5. Similarly, -10 will process pages from 1 to 10.
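
To make the three input styles concrete, here is a minimal sketch of how such a page selection could be expanded into page numbers. The `expand_pages` helper and its `total_pages` argument are hypothetical and written only for illustration. They are not part of the SDK, and the service does this expansion itself.

```python
def expand_pages(selection, total_pages):
    """Expand a selection such as "1, 2", "2-5", "5-" or "-10" into page numbers.

    Hypothetical helper for illustration only; not part of the SDK.
    """
    pages = []
    for part in selection.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            # A missing start means "from page 1"; a missing end means "to the last page".
            start = int(start_text) if start_text.strip() else 1
            end = int(end_text) if end_text.strip() else total_pages
        else:
            start = end = int(part)
        # Clamp to the pages that actually exist in the document.
        pages.extend(range(max(start, 1), min(end, total_pages) + 1))
    return sorted(set(pages))


print(expand_pages("1, 2", 12))  # single pages
print(expand_pages("2-5", 12))   # finite range
print(expand_pages("5-", 8))     # open-ended range to the last page
print(expand_pages("-10", 12))   # open-ended range from the first page
```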

In the demo below, the read call is the `computervision_client.read(...)` statement that follows the image URL.

Get the read results

We’ll get an operation ID from the results returned by the read method. Using this ID, we query the results from the service.
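
The operation ID is the last path segment of the Operation-Location URL. The sketch below shows one way to extract it that also tolerates a trailing slash. The `operation_id_from_location` helper is hypothetical, and the example URL is made up to resemble the shape of such a header value.

```python
from urllib.parse import urlparse


def operation_id_from_location(operation_location):
    """Return the last path segment of an Operation-Location URL.

    Hypothetical helper for illustration only; not part of the SDK.
    """
    path = urlparse(operation_location).path
    return path.rstrip("/").split("/")[-1]


# Made-up URL, used only to show the shape of the header value.
example_location = (
    "https://example.cognitiveservices.azure.com/vision/v3.2/read/"
    "analyzeResults/00000000-0000-0000-0000-000000000000"
)
print(operation_id_from_location(example_location))
```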

We use the get_read_result method to get the results for the image. The while loop in the demo below polls the operation once per second until its status is no longer notStarted or running. The one-second pause is there because the service might not have finished processing the image by the time we first call get_read_result.
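
A polling loop that waits indefinitely can hang if the operation never finishes. The following is a minimal sketch of the same polling idea with a timeout added. The `poll_until_done` function and its parameters are assumptions made for illustration, and a fake status source stands in for the service so the example runs without credentials.

```python
import time


def poll_until_done(get_status, interval=1.0, timeout=30.0,
                    pending=("notStarted", "running")):
    """Call get_status() repeatedly until it returns a non-pending status.

    Hypothetical helper for illustration only; not part of the SDK.
    Raises TimeoutError if the status is still pending after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status not in pending:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("operation still pending after %.1f seconds" % timeout)
        time.sleep(interval)


# Stand-in for the service: reports two pending statuses, then success.
fake_statuses = iter(["notStarted", "running", "succeeded"])
final_status = poll_until_done(lambda: next(fake_statuses), interval=0.01)
print(final_status)
```

With the real client, `get_status` could be a small function that calls get_read_result with the operation ID and returns the status of the result.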

The extracted data contains the following:

Text: This is the string value of a recognized word or line.
Bounding box: This is a list of eight numbers that represent the bounding box of a recognized region, line, or word. The numbers are x and y coordinate pairs ordered clockwise: they start at the top-left corner, continue to the top-right and bottom-right corners, and end at the bottom-left corner.
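
Since the bounding box is a flat list, it is often easier to work with after regrouping it into corner points. The sketch below shows one way to do that and to derive an axis-aligned extent from the corners. Both helpers are hypothetical, and the sample values are invented.

```python
def box_corners(bounding_box):
    """Group a flat [x1, y1, ..., x4, y4] list into four (x, y) corner points.

    Hypothetical helper for illustration only; not part of the SDK.
    The order is top-left, top-right, bottom-right, bottom-left.
    """
    if len(bounding_box) != 8:
        raise ValueError("expected eight numbers, got %d" % len(bounding_box))
    return list(zip(bounding_box[0::2], bounding_box[1::2]))


def box_extent(bounding_box):
    """Return (min_x, min_y, max_x, max_y) covering all four corners."""
    corners = box_corners(bounding_box)
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


# Invented values for a slightly rotated line of text.
sample_box = [10, 20, 110, 22, 108, 60, 8, 58]
print(box_corners(sample_box))
print(box_extent(sample_box))
```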

Now, let’s look at the complete demo to see how it actually works.


from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from msrest.authentication import CognitiveServicesCredentials
import time
computervision_client = ComputerVisionClient("{{ENDPOINT}}", CognitiveServicesCredentials("{{SUBSCRIPTION_KEY}}"))
# Get an image with text
read_image_url = "https://images.pexels.com/photos/4577514/pexels-photo-4577514.jpeg"
# Call API with URL and raw response (allows you to get the operation location)
read_response = computervision_client.read(
    read_image_url,
    language=None,
    pages=None,
    raw=True,
)
# Get the operation location (URL with an ID at the end) from the response
read_operation_location = read_response.headers["Operation-Location"]
# Grab the ID from the URL
operation_id = read_operation_location.split("/")[-1]
# Call the "GET" API and wait for it to retrieve the results 
while True:
    read_result = computervision_client.get_read_result(operation_id)
    if read_result.status not in ['notStarted', 'running']:
        break
    time.sleep(1)
# Print the detected text, line by line
if read_result.status == OperationStatusCodes.succeeded:
    for text_result in read_result.analyze_result.read_results:
        for line in text_result.lines:
            print(line.text)
            print("Bounding box: ",line.bounding_box)
print()
