OCR and Read API
Learn about OCR and the Computer Vision Read API.
What is OCR?
Optical character recognition (OCR) is a technique for recognizing and extracting text characters from images. The text may be handwritten, as in photos of lecture notes, or printed, as in pictures of product labels or screenshots of online conversations. Microsoft’s OCR implementation can recognize text in many languages.
Read API
The Computer Vision Read API supports the extraction of handwritten and printed text, numerical digits, and currency symbols from images. The API recognizes only English text from handwritten sources but supports many languages for printed text. Additionally, it can extract multi-language text from large, text-heavy images and PDF documents.
Input requirements
The Read API accepts images or documents as input, subject to the following requirements:
- The files must be in one of the following formats: JPEG, PNG, BMP, PDF, or TIFF.
- For PDF and TIFF documents, the API processes up to 2,000 pages on the paid service, or only the first two pages on the free tier.
- The files must be smaller than 50 MB for the paid service or 6 MB for the free tier.
- The image dimensions must be between 50 x 50 pixels and 10000 x 10000 pixels.
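The requirements above can be checked locally before a file is uploaded. The following is a minimal sketch, not part of the Read API or its SDK: the function name `check_read_input`, the tier labels `"paid"` and `"free"`, and the use of the file extension to infer the format are all assumptions made for illustration. The limits themselves are the ones listed above.

```python
import os

MB = 1024 * 1024

# Limits taken from the requirements listed above.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".pdf", ".tif", ".tiff"}
MAX_BYTES = {"paid": 50 * MB, "free": 6 * MB}
MAX_PAGES = {"paid": 2000, "free": 2}
MIN_SIDE, MAX_SIDE = 50, 10000


def check_read_input(filename, size_bytes, width=None, height=None,
                     pages=1, tier="paid"):
    """Return a list of problems; an empty list means the file fits the limits.

    Hypothetical helper: the format is guessed from the file extension only,
    and width/height are checked only when the caller provides them.
    """
    problems = []

    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or 'no extension'}")

    # Files must be strictly smaller than the tier's size limit.
    if size_bytes >= MAX_BYTES[tier]:
        limit_mb = MAX_BYTES[tier] // MB
        problems.append(f"file must be smaller than {limit_mb} MB on the {tier} tier")

    # Pages beyond the tier's limit are not processed.
    if pages > MAX_PAGES[tier]:
        problems.append(f"only {MAX_PAGES[tier]} pages are processed on the {tier} tier")

    for name, side in (("width", width), ("height", height)):
        if side is not None and not MIN_SIDE <= side <= MAX_SIDE:
            problems.append(f"{name} must be between {MIN_SIDE} and {MAX_SIDE} pixels")

    return problems
```

For example, `check_read_input("notes.png", 2 * MB, 800, 600)` returns an empty list, while a three-page, 7 MB PDF on the free tier is reported with two problems: one for size and one for page count.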
Key features
The Read API enables us to do the following:
- Extract printed text in 73 languages, along with its location within the image. The language is detected automatically, so the user does not need to specify it.
- Extract handwritten text in English.
- Choose pages (or ranges of pages) to be processed from large documents.
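To make these features concrete, here is a hedged sketch of how a request for selected pages might be assembled and how text and locations might be pulled out of a result. It assumes the v3.2 REST interface as commonly documented: the `/vision/v3.2/read/analyze` path, the `Ocp-Apim-Subscription-Key` header, the `pages` query parameter, and a result shaped as `analyzeResult.readResults[].lines[]` with `text` and `boundingBox` fields. These details may differ in other API versions. The helper names `build_read_request` and `extract_lines` are hypothetical, and the sketch deliberately sends nothing over the network.

```python
import json
from urllib.parse import urlencode


def build_read_request(endpoint, key, image_url, pages=None, api_version="v3.2"):
    """Assemble the URL, headers, and JSON body for a Read request.

    Hypothetical helper. `pages` is a string such as "1-3,5" that selects
    pages or page ranges; when it is omitted, no page filter is sent.
    """
    url = f"{endpoint.rstrip('/')}/vision/{api_version}/read/analyze"
    if pages:
        url += "?" + urlencode({"pages": pages})
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # assumed authentication header name
        "Content-Type": "application/json",
    }
    body = json.dumps({"url": image_url})
    return url, headers, body


def extract_lines(result):
    """Return (page, text, bounding_box) tuples from a finished Read result.

    Assumes the v3.2-style result layout described in the lead-in.
    """
    lines = []
    for page in result.get("analyzeResult", {}).get("readResults", []):
        for line in page.get("lines", []):
            lines.append((page.get("page"), line["text"], line.get("boundingBox")))
    return lines
```

In practice, the request would be sent with an HTTP client, and the service would be polled until the operation finishes before `extract_lines` is applied to the returned JSON. Keeping request assembly and result parsing as pure functions makes both easy to test without a subscription key.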