Read Printed and Handwritten Text
Learn to read text from images and PDFs using OCR.
Read printed and handwritten text
Using the OCR service, we can read the visible text in an image and convert it to a character stream. There are two steps to successfully using the OCR service:
- Call the Read API
- Get the read results
Let’s see both in detail.
Call the Read API
First, we'll call the API with the image URL. We will use the `read` method as follows:

```python
read(read_image_url, language=None, pages=None, raw=True)
```
It will start an asynchronous process to read the image and will return an operation ID.
Other than the image URL, we can also specify the following parameters:
Name | Description |
---|---|
`language` (optional) | The language of the text to read. Supported languages differ for handwritten and printed text. See the Language Support page of the documentation for the list of supported languages. |
`pages` (optional) | Used only for multi-page PDF and TIFF documents. |
Accepted inputs for `pages` include:
- Single pages: `1, 2` will process pages 1 and 2.
- Finite ranges: `2-5` will process pages 2 to 5.
- Open-ended ranges: `5-` will process all the pages beginning from page 5. Similarly, `-10` will process pages 1 to 10.
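To make the range syntax concrete, here is a minimal sketch of how such a `pages` value could be expanded into page numbers. The `expand_pages` helper and its `total_pages` argument are hypothetical and are not part of the Azure SDK. The service interprets the value itself, so this only illustrates the semantics described above.

```python
def expand_pages(spec, total_pages):
    """Expand a page selection such as "1, 2", "2-5", "5-" or "-10".

    Hypothetical helper for illustration only; not part of the Azure SDK.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            # A missing bound means "from the first page" or "to the last page"
            start = int(start_text) if start_text.strip() else 1
            end = int(end_text) if end_text.strip() else total_pages
        else:
            start = end = int(part)
        # Clamp to the pages that actually exist in the document
        start = max(start, 1)
        end = min(end, total_pages)
        pages.update(range(start, end + 1))
    return sorted(pages)


# Assume a 12-page document for the demonstration
print(expand_pages("1, 2", 12))
print(expand_pages("2-5", 12))
print(expand_pages("5-", 12))
print(expand_pages("-10", 12))
```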
The code for the `read` method is on lines 13–16 in the code below.
Get the read results
The response returned by the `read` method carries an operation ID: it is the last segment of the URL in the response's `Operation-Location` header. Using this ID, we query the service for the results.
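As a small sketch of that extraction step, the helper below pulls the ID out of an operation location URL. The function name `operation_id_from_location` is our own, and the example URL is a made-up placeholder rather than a real response from the service.

```python
def operation_id_from_location(operation_location):
    """Return the last path segment of an Operation-Location URL."""
    # rstrip guards against a trailing slash, which would yield an empty ID
    return operation_location.rstrip("/").split("/")[-1]


# Hypothetical header value; the host and the ID are placeholders
example_location = (
    "https://example.cognitiveservices.azure.com/vision/v3.2/read/"
    "analyzeResults/00000000-0000-0000-0000-000000000000"
)
example_id = operation_id_from_location(example_location)
print(example_id)
```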
We use the `get_read_result` method to get the results for the image. The code on lines 24–28 checks the operation status once per second until it is no longer `notStarted` or `running`. The one-second pause is needed because the service might not have finished processing the image by the time we call the `get_read_result` method.
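A polling loop with no upper bound waits forever if the operation never finishes. As one possible variation, the sketch below wraps the same status check in a helper with a timeout. The `wait_for_result` function, its parameters, and the `FakeResult` stand-in are our own illustrations and not part of the SDK. With the real client, `fetch` could be `lambda: computervision_client.get_read_result(operation_id)`.

```python
import time


def wait_for_result(fetch, interval=1.0, timeout=30.0):
    """Call fetch() until its status is final or the timeout expires.

    fetch is any zero-argument callable that returns an object
    with a `status` attribute.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        # Same check as the demo: stop once the status is final
        if result.status not in ("notStarted", "running"):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("Read operation did not finish in time")
        time.sleep(interval)


class FakeResult:
    """Stand-in for the SDK result, used only to demonstrate the helper."""

    def __init__(self, status):
        self.status = status


# Simulate a service that needs three polls to finish
statuses = iter(["notStarted", "running", "succeeded"])
final = wait_for_result(lambda: FakeResult(next(statuses)), interval=0.01)
print(final.status)
```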
The extracted data contains the following:
- Text: This is the string value of a recognized word or line.
- Bounding box: This is a list of eight numbers that represent the bounding box of a recognized region, line, or word. The numbers are four pairs of x and y coordinates ordered clockwise: they start at the top-left corner, continue to the top-right and bottom-right corners, and end at the bottom-left corner.
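The flat list is easier to work with once it is grouped into corner points. The sketch below shows one way to do that. The `bounding_box_to_points` helper is our own, and the coordinates in `example_box` are made up for illustration.

```python
def bounding_box_to_points(bounding_box):
    """Group eight coordinates into four (x, y) corner points."""
    if len(bounding_box) != 8:
        raise ValueError("expected eight coordinates")
    # Take the values two at a time: (x0, y0), (x1, y1), ...
    return [(bounding_box[i], bounding_box[i + 1]) for i in range(0, 8, 2)]


# Made-up box: top-left, top-right, bottom-right, bottom-left
example_box = [10, 20, 110, 20, 110, 60, 10, 60]
top_left, top_right, bottom_right, bottom_left = bounding_box_to_points(example_box)
print("Top-left:", top_left)
print("Bottom-right:", bottom_right)
```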
Now, let’s look at the complete demo to see how it actually works.
Complete demo
In this demo, we’ll take an image and print each word as well as its bounding box. For this, we’ll follow the process explained above.
- Call the `read` method. This will give us an operation ID.
- Send this ID to the `get_read_result` method to get the final results.
Let’s read the words written in the image given below.
As we can see, the image has three words: “GROW”, “GOOD”, and “VIBES”. The code given below prints each recognized line of text and its bounding box.
The image URL is provided at line 10. Feel free to change it and test the API on other images.
```python
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from msrest.authentication import CognitiveServicesCredentials
import time

# Create the client from the endpoint and subscription key
computervision_client = ComputerVisionClient("{{ENDPOINT}}", CognitiveServicesCredentials("{{SUBSCRIPTION_KEY}}"))

# Get an image with text
read_image_url = "https://images.pexels.com/photos/4577514/pexels-photo-4577514.jpeg"

# Call API with URL and raw response (allows you to get the operation location)
read_response = computervision_client.read(read_image_url,
                                           language=None,
                                           pages=None,
                                           raw=True)

# Get the operation location (URL with an ID at the end) from the response
read_operation_location = read_response.headers["Operation-Location"]
# Grab the ID from the URL
operation_id = read_operation_location.split("/")[-1]

# Call the "GET" API and wait for it to retrieve the results
while True:
    read_result = computervision_client.get_read_result(operation_id)
    if read_result.status not in ['notStarted', 'running']:
        break
    time.sleep(1)

# Print the detected text, line by line
if read_result.status == OperationStatusCodes.succeeded:
    for text_result in read_result.analyze_result.read_results:
        for line in text_result.lines:
            print(line.text)
            print("Bounding box: ", line.bounding_box)
            print()
```
The output shows all the words and their bounding boxes.
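The demo iterates over lines of text. If we want the individual words inside each line, a sketch like the following could be used. It assumes that each line exposes a `words` list whose items carry `text` and `bounding_box` attributes, which is our understanding of the SDK's line model. The `iter_words` helper is our own, and `fake_result` is a hand-built stand-in with made-up coordinates so that the example runs without calling the service.

```python
from types import SimpleNamespace as NS


def iter_words(read_result):
    """Yield (text, bounding_box) for every word in a read result."""
    for page in read_result.analyze_result.read_results:
        for line in page.lines:
            # Assumption: each line has a `words` list
            for word in line.words:
                yield word.text, word.bounding_box


# Stand-in object that mimics the shape of a read result (made-up values)
fake_result = NS(analyze_result=NS(read_results=[NS(lines=[
    NS(text="GOOD VIBES", words=[
        NS(text="GOOD", bounding_box=[0, 0, 40, 0, 40, 10, 0, 10]),
        NS(text="VIBES", bounding_box=[50, 0, 95, 0, 95, 10, 50, 10]),
    ]),
])]))

words = list(iter_words(fake_result))
for text, box in words:
    print(text, box)
```

With a real response, we would pass the `read_result` from the demo to `iter_words` after checking that its status is `succeeded`.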