Introduction to OCR using Computer Vision's Read API
Learn about optical character recognition and how it works with Azure Computer Vision's Read API.
Introduction to OCR
OCR stands for Optical Character Recognition. It addresses the problem of recognizing handwritten and printed characters in an image and converting them into a machine-readable, digital format. To do this efficiently and accurately, OCR is usually broken down into several sub-processes:
- Preprocessing of the image
- Text localization
- Character segmentation
- Character recognition
- Postprocessing
The exact steps can differ from case to case, but these are the ones typically needed to perform OCR on printed and handwritten characters.
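To make the sub-processes listed above concrete, here is a minimal toy sketch in Python. It is not how a production OCR engine (or the Read API) works: the 3x5 glyph templates, the `render` helper that builds a test image, the threshold value, and the small lexicon are all assumptions made up for illustration. Each function corresponds to one step of the list.

```python
# Toy 3x5 glyph templates ("#" = ink). Hypothetical, for illustration only.
TEMPLATES = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "L": ("#..", "#..", "#..", "#..", "###"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
    "O": ("###", "#.#", "#.#", "#.#", "###"),
}

def render(word, margin=2):
    """Build a grayscale test image (40 = dark ink, 220 = light paper)."""
    rows = []
    for r in range(5):
        line = ".".join(TEMPLATES[ch][r] for ch in word)  # one blank column between glyphs
        rows.append("." * margin + line + "." * margin)
    blank = "." * len(rows[0])
    rows = [blank] * margin + rows + [blank] * margin
    return [[40 if px == "#" else 220 for px in row] for row in rows]

def preprocess(gray, threshold=128):
    """Preprocessing: binarize, True where the pixel is dark enough to be ink."""
    return [[px < threshold for px in row] for row in gray]

def localize(binary):
    """Text localization: crop to the bounding box of all ink (assumes some ink exists)."""
    ink_rows = [r for r, row in enumerate(binary) if any(row)]
    ink_cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    return [row[ink_cols[0]:ink_cols[-1] + 1]
            for row in binary[ink_rows[0]:ink_rows[-1] + 1]]

def segment(region):
    """Character segmentation: split the region at columns that contain no ink."""
    glyphs, start = [], None
    width = len(region[0])
    for c in range(width + 1):
        has_ink = c < width and any(row[c] for row in region)
        if has_ink and start is None:
            start = c
        elif not has_ink and start is not None:
            glyphs.append([row[start:c] for row in region])
            start = None
    return glyphs

def recognize(glyph):
    """Character recognition: pick the same-sized template with the most matching pixels."""
    best_char, best_score = "?", -1
    for char, template in TEMPLATES.items():
        target = [[px == "#" for px in row] for row in template]
        if len(target) != len(glyph) or len(target[0]) != len(glyph[0]):
            continue
        score = sum(a == b for g_row, t_row in zip(glyph, target)
                    for a, b in zip(g_row, t_row))
        if score > best_score:
            best_char, best_score = char, score
    return best_char

def postprocess(text, lexicon=("LOT", "TOIL", "OIL")):
    """Postprocessing: snap to a lexicon word that differs in at most one character."""
    for word in lexicon:
        if len(word) == len(text) and sum(a != b for a, b in zip(word, text)) <= 1:
            return word
    return text

def ocr(gray):
    glyphs = segment(localize(preprocess(gray)))
    return postprocess("".join(recognize(g) for g in glyphs))

print(ocr(render("TOIL")))  # prints TOIL
```

Real systems replace each stage with something far more robust (for example, learned detectors and recognizers instead of column splitting and template matching), but the flow of data from raw pixels to corrected text follows the same outline.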
The Read API
The Azure Computer Vision service provides the Read API, which extracts both printed and handwritten text from images and multi-page PDFs. It is optimized for text-heavy content and multi-page PDFs.
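As a rough sketch of how a client might call the Read API over REST, the Python below submits an image URL and then polls for the result, since the operation is asynchronous. It assumes version 3.2 of the REST endpoint (the lesson does not name a version), and the function names `build_analyze_url`, `extract_lines`, and `read_text` are our own, not part of any Azure SDK. The endpoint and key must come from your own Computer Vision resource.

```python
import json
import time
import urllib.request

def build_analyze_url(endpoint):
    """Build the submit URL. Assumes Read API v3.2; adjust for other versions."""
    return endpoint.rstrip("/") + "/vision/v3.2/read/analyze"

def extract_lines(result):
    """Flatten a finished Read result into a list of text lines, page by page."""
    pages = result.get("analyzeResult", {}).get("readResults", [])
    return [line["text"] for page in pages for line in page.get("lines", [])]

def read_text(endpoint, key, image_url, poll_seconds=1.0, max_polls=30):
    """Submit an image URL, then poll until the service reports a final status."""
    submit = urllib.request.Request(
        build_analyze_url(endpoint),
        data=json.dumps({"url": image_url}).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # The service accepts the job and returns the URL to poll in a response header.
    with urllib.request.urlopen(submit) as response:
        operation_url = response.headers["Operation-Location"]

    poll = urllib.request.Request(
        operation_url, headers={"Ocp-Apim-Subscription-Key": key}
    )
    for _ in range(max_polls):
        with urllib.request.urlopen(poll) as response:
            result = json.load(response)
        status = result.get("status")
        if status == "succeeded":
            return extract_lines(result)
        if status == "failed":
            raise RuntimeError("Read operation failed")
        time.sleep(poll_seconds)  # status is still notStarted or running
    raise TimeoutError("Read operation did not finish in time")
```

A call would look like `read_text("https://<your-resource>.cognitiveservices.azure.com", "<your-key>", "<image-url>")`, where all three values are placeholders. Keeping `extract_lines` separate from the network code makes it easy to check the parsing against a saved response without contacting the service.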
Below is a snapshot taken from the official Microsoft Azure documentation to help us understand the functioning of this API with images and PDFs.