Amazon Textract

Get a thorough understanding of Amazon Textract, its features, and how it works.

Optical Character Recognition (OCR) is a process that converts images of text into machine-readable text. Companies use OCR technologies to digitize text and data from documents such as PDFs, scanned images, and physical records. However, traditional OCR technologies had limitations: they struggled to extract text from some layouts, such as forms and tables. As a result, they did not meet companies' need to accurately identify and extract data from any file type.

Recognizing the shortcomings of traditional OCR technologies, Amazon introduced a machine learning service, Amazon Textract. It extracts text and structured data from a wide range of documents and layouts, including forms and tables. It can also detect both typed and handwritten text in records and reports, and it can be integrated into applications through the Textract API.
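To make the API integration concrete, here is a minimal sketch in Python. It assumes the AWS SDK for Python (boto3) is installed and that AWS credentials and a region are already configured; the helper names `extract_lines` and `detect_lines` are illustrative and not part of Textract itself.

```python
def extract_lines(response):
    """Return the text of every LINE block in a DetectDocumentText response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def detect_lines(image_bytes, client=None):
    """Send raw image bytes to Textract and return the detected lines of text.

    `client` is optional so that a stub can be passed in for testing.
    """
    if client is None:
        # Imported lazily so the parsing helper above works without the SDK.
        import boto3

        client = boto3.client("textract")
    # DetectDocumentText returns PAGE, LINE, and WORD blocks for the document.
    response = client.detect_document_text(Document={"Bytes": image_bytes})
    return extract_lines(response)
```

With credentials in place, a call such as `detect_lines(open("scan.png", "rb").read())` would return the lines found in the image, where `scan.png` is a placeholder file name.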

Primary features of Amazon Textract

Amazon Textract offers several features that go beyond plain text extraction. Let’s look at some of its primary features below:

Extract layout elements

One of the features of Amazon Textract is its ability to extract layout elements from documents. These elements include paragraphs, lists, page numbers, footers, headers, figures, tables, titles, and section headers. The layout feature can be used on its own in an application or combined with other document analysis features through the Amazon Textract AnalyzeDocument API.
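The layout feature can be sketched with the AnalyzeDocument operation and the `LAYOUT` feature type. The example below assumes boto3 and configured AWS credentials; the helper names `group_layout_text` and `analyze_layout` are hypothetical. It groups the text of each layout element by its block type (for example, `LAYOUT_TITLE` or `LAYOUT_TEXT`).

```python
from collections import defaultdict


def group_layout_text(response):
    """Group the text under each LAYOUT_* block by its block type."""
    blocks = response.get("Blocks", [])
    by_id = {block["Id"]: block for block in blocks}
    grouped = defaultdict(list)
    for block in blocks:
        block_type = block.get("BlockType", "")
        if not block_type.startswith("LAYOUT_"):
            continue
        # Follow CHILD relationships to the blocks this element contains.
        child_ids = [
            child_id
            for rel in block.get("Relationships", [])
            if rel.get("Type") == "CHILD"
            for child_id in rel.get("Ids", [])
        ]
        # Children without a Text field (such as nested layout blocks)
        # are skipped in this sketch.
        text = " ".join(
            by_id[cid]["Text"]
            for cid in child_ids
            if cid in by_id and "Text" in by_id[cid]
        )
        grouped[block_type].append(text)
    return dict(grouped)


def analyze_layout(image_bytes, client=None):
    """Call AnalyzeDocument with the LAYOUT feature and group the result."""
    if client is None:
        import boto3  # lazy import so the parser works without the SDK

        client = boto3.client("textract")
    response = client.analyze_document(
        Document={"Bytes": image_bytes},
        FeatureTypes=["LAYOUT"],  # can be combined with e.g. "TABLES", "FORMS"
    )
    return group_layout_text(response)
```

Adding other values to `FeatureTypes` in the same request is how the layout feature is combined with the other analysis features.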

Create custom queries

In addition to pretrained queries, Amazon Textract also provides Custom Queries. The ...