Metadata Handling
Learn how to gather, modify, and delete the various types of metadata embedded in a PDF file.
Introduction
Metadata is typically populated by the application that created or converted the PDF. It contains fairly common fields such as the document version, the creation date, and the program that created the file. Some of the less obvious attributes are worth a closer look if you want to go deeper into PDF analysis.
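Dates such as the creation date are stored as PDF date strings of the form `D:YYYYMMDDHHmmSSOHH'mm'`, as defined in the PDF specification (ISO 32000). The following is a minimal, standard-library-only sketch for turning such a string into a Python `datetime`. It assumes the raw string has already been extracted from the file, and `parse_pdf_date` is a hypothetical helper name, not part of either library used in this lesson.

```python
import re
from datetime import datetime, timedelta, timezone

# PDF date string: optional "D:" prefix, a 4-digit year, then optional
# month/day/hour/minute/second fields, then an optional UT offset
# written as Z, +HH'mm' or -HH'mm' (the apostrophes are optional).
_PDF_DATE = re.compile(
    r"(?:D:)?"
    r"(?P<year>\d{4})(?P<month>\d{2})?(?P<day>\d{2})?"
    r"(?P<hour>\d{2})?(?P<minute>\d{2})?(?P<second>\d{2})?"
    r"(?:(?P<tz>[+\-Z])(?:(?P<tz_hour>\d{2})'?(?:(?P<tz_minute>\d{2})'?)?)?)?"
)


def parse_pdf_date(value: str) -> datetime:
    """Convert a PDF date string into a datetime (hypothetical helper)."""
    match = _PDF_DATE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not a PDF date string: {value!r}")
    g = match.groupdict()
    # Missing fields default to month/day 01 and time 00:00:00.
    parsed = datetime(
        int(g["year"]),
        int(g["month"] or 1),
        int(g["day"] or 1),
        int(g["hour"] or 0),
        int(g["minute"] or 0),
        int(g["second"] or 0),
    )
    if g["tz"] == "Z":
        return parsed.replace(tzinfo=timezone.utc)
    if g["tz"] in ("+", "-"):
        offset = timedelta(
            hours=int(g["tz_hour"] or 0), minutes=int(g["tz_minute"] or 0)
        )
        return parsed.replace(tzinfo=timezone(-offset if g["tz"] == "-" else offset))
    # No offset in the string: the relation to UT is unknown, so stay naive.
    return parsed
```

For example, `parse_pdf_date("D:20210315123045+05'30'")` yields an aware `datetime` for 15 March 2021, 12:30:45 with a +05:30 offset. Strings without an offset are returned as naive datetimes rather than guessing a time zone.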
Scope
The objective of this lesson is to show how to extract, update, and delete the metadata of a PDF file using the Python programming language.
Prerequisites
We need two libraries for metadata manipulation:
PyPDF4
A pure-Python PDF library well suited to splitting, merging, cropping, and transforming the pages of a PDF file. It can also extract text and metadata from PDFs.
Pikepdf
A Python library, built on the C++ library qpdf, for creating, manipulating, and parsing PDF files. It supports reading and writing PDFs, including creating them from scratch.
Library | Version |
---|---|
PyPDF4 | 1.27.0 |
Pikepdf | 3.0.0 |
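Method names and behavior can differ between releases, so it may help to confirm that the installed packages match the versions in the table. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+) follows. It assumes the PyPI distribution names `PyPDF4` and `pikepdf`, and `check_versions` is a hypothetical helper name.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Versions from the table above, keyed by PyPI distribution name.
EXPECTED = {"PyPDF4": "1.27.0", "pikepdf": "3.0.0"}


def check_versions(expected: dict[str, str]) -> dict[str, tuple[str | None, bool]]:
    """Map each package to (installed version or None, matches expected?)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        report[name] = (installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions(EXPECTED).items():
        status = "ok" if ok else "mismatch"
        print(f"{name}: installed={installed} expected={EXPECTED[name]} ({status})")
```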