

Manipulating Scanned PDF Files

Learn to manipulate scanned PDF documents using the Google Tesseract OCR engine.


PDF documents are mainly created in two ways. A native PDF is generated directly from an electronic source, while a scanned PDF is produced by scanning paper documents.

Native PDF documents contain an internal structure that can be read and interpreted, whereas scanned PDFs consist of page images, so their content cannot be searched or edited directly.
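The distinction above can be checked programmatically. The following is a minimal sketch, assuming the text of each page has already been extracted with a PDF library of your choice and is passed in as a list of strings. The function name `classify_pdf` and the `min_chars` threshold are hypothetical and chosen for illustration only.

```python
def classify_pdf(page_texts, min_chars=20):
    """Guess whether a PDF is native, scanned, or mixed.

    page_texts: one extracted-text string per page (assumed to come from
                a PDF text-extraction library; not performed here).
    min_chars:  hypothetical threshold below which a page is treated as
                having no usable text.
    """
    if not page_texts:
        raise ValueError("page_texts must contain at least one page")

    # Count the pages that carry a meaningful amount of extractable text.
    pages_with_text = sum(
        1 for text in page_texts if len(text.strip()) >= min_chars
    )

    if pages_with_text == 0:
        return "scanned"   # no page yields text: likely image-only pages
    if pages_with_text == len(page_texts):
        return "native"    # every page yields text
    return "mixed"         # some pages are images, some contain text
```

A document classified as "scanned" or "mixed" by a heuristic like this would be a candidate for OCR. The threshold is a rough guess and may need tuning for documents with sparse pages.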

Performing OCR on a scanned PDF

Optical Character Recognition (OCR) is a technology that converts images of printed or handwritten text into machine-readable, character-based text using a visual recognition process.
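As a rough illustration of that process, the sketch below runs OCR over a list of page images. It assumes the third-party `pytesseract` wrapper and the Tesseract engine are installed, and that each page has already been rendered to an image (for example a PIL image). The helper name `ocr_pages` and its `recognize` parameter are hypothetical. The parameter lets you swap in another recognizer.

```python
def ocr_pages(page_images, recognize=None, lang="eng"):
    """Return the recognized text for each page image.

    page_images: iterable of page images (e.g. PIL.Image objects).
    recognize:   optional callable image -> str; defaults to Tesseract
                 via pytesseract (assumed to be installed).
    lang:        Tesseract language code used by the default recognizer.
    """
    if recognize is None:
        # Imported lazily so the function can be used with a custom
        # recognizer even when pytesseract is not available.
        import pytesseract

        def recognize(image):
            return pytesseract.image_to_string(image, lang=lang)

    # Run the recognizer on every page and trim surrounding whitespace.
    return [recognize(image).strip() for image in page_images]
```

Called as `ocr_pages(images)`, this returns one string per page, which can then be searched or written out as text.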

For ...