...

Metadata Treatment

Learn how to extract, modify, and delete the various types of metadata embedded in a PDF file.

Introduction

Metadata is typically populated by the application that creates or converts the PDF. It includes common fields such as the document version, the creation date, and the creating program, among others. Some frequently overlooked attributes merit a closer look if you want to dive into PDF analysis.
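As a concrete illustration of one such field, the creation date is usually stored as a text string of the form D:YYYYMMDDHHmmSS followed by an optional UTC offset. The sketch below uses only the standard library to turn such a string into a Python datetime. The helper name parse_pdf_date and the sample value are illustrative assumptions, not part of any library, and real files may contain dates that deviate from this pattern.

```python
import re
from datetime import datetime, timedelta, timezone

# PDF date strings look like D:YYYYMMDDHHmmSS+HH'mm'.
# Only the year is mandatory; every later component is optional.
_PDF_DATE = re.compile(
    r"^D:(?P<year>\d{4})(?P<month>\d{2})?(?P<day>\d{2})?"
    r"(?P<hour>\d{2})?(?P<minute>\d{2})?(?P<second>\d{2})?"
    r"(?P<sign>[+\-Z])?(?:(?P<off_h>\d{2})'?)?(?:(?P<off_m>\d{2})'?)?$"
)


def parse_pdf_date(value):
    """Hypothetical helper: convert a PDF date string to a datetime.

    Returns None when the string does not match the expected pattern.
    """
    match = _PDF_DATE.match(value)
    if match is None:
        return None
    parts = match.groupdict()

    # Build the time zone from the optional offset part.
    tz = None
    sign = parts["sign"]
    if sign == "Z":
        tz = timezone.utc
    elif sign in ("+", "-"):
        offset = timedelta(
            hours=int(parts["off_h"] or 0),
            minutes=int(parts["off_m"] or 0),
        )
        tz = timezone(offset if sign == "+" else -offset)

    # Missing components fall back to the start of the period.
    return datetime(
        int(parts["year"]),
        int(parts["month"] or 1),
        int(parts["day"] or 1),
        int(parts["hour"] or 0),
        int(parts["minute"] or 0),
        int(parts["second"] or 0),
        tzinfo=tz,
    )


if __name__ == "__main__":
    # Made-up sample value for illustration only.
    print(parse_pdf_date("D:20230115103000+01'00'"))
```

Returning None for unparseable input, rather than raising, is a design choice that suits analysis work, where malformed metadata is common and should not stop the run.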

Scope

The objective of this lesson is to show how to extract, update, and delete the metadata of a PDF file using the Python programming language.

Prerequisites

We need two libraries for metadata manipulation:

PyPDF4

It is a pure-Python PDF library best suited to splitting, merging, cropping, and transforming the pages of a PDF file. It can also retrieve text and metadata from PDFs.

Pikepdf

It is a library intended for developers who need to create, manipulate, and parse PDF files. It supports reading and writing PDFs, including creating them from scratch.

Library Version
PyPDF4 1.27.0
Pikepdf 3.0.0
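
To confirm that your environment matches the versions listed above, a small check along these lines may help. It is a minimal sketch that relies only on the standard library (importlib.metadata, available from Python 3.8). The names EXPECTED, installed_version, and check_versions are illustrative assumptions, and the distribution names are assumed to be PyPDF4 and pikepdf as published on PyPI.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions taken from the table above.
EXPECTED = {"PyPDF4": "1.27.0", "pikepdf": "3.0.0"}


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_versions(expected):
    """Map each package name to a pair (expected version, installed version)."""
    return {
        name: (wanted, installed_version(name))
        for name, wanted in expected.items()
    }


if __name__ == "__main__":
    for name, (wanted, found) in check_versions(EXPECTED).items():
        status = "ok" if found == wanted else "differs"
        print(f"{name}: expected {wanted}, installed {found} ({status})")
```

If a package is reported as missing or different, pinning it at install time (for example, pip install PyPDF4==1.27.0 pikepdf==3.0.0) should reproduce the versions used in this lesson.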
...