Discover how to manipulate PDFs using Python. Gain hands-on experience with real-life scenarios and broaden your knowledge in handling and processing PDF files efficiently.


mypdftoolbox.tar.gz

pdf_compare

pdf_did_metadata

pdf_xmp_metadata

pdf_compute_checksum

pdf_merger

pdf_pages_splitter

pdf_pages_rotator

pdf_pages_remover

pdf_pages_shuffler

pdf_pages_watermarker

pdf_convert2img

pdf_extract_tables

pdf_extract_images

pdf_extract_links

pdf_annotator

pdf_redactor

pdf_parser

pdf_convert2docx

pdf_convert2pptx

pdf_compress

pdf_secure

pdf_crack

pdf_create

pdf_sign

pdf_scan

pdf_comment

pdf_compare_files

pdf_attach

pdf_extract_attachments

pdf_embed_js

pdf_change_rights

This course gives you hands-on experience in PDF manipulation using the Python programming language. It works through the most common real-life scenarios and provides a practical "how to do it" framework for each.

This course is intended for Python programmers who want to broaden their knowledge of the language, and for anyone eager to gain in-depth experience in handling and processing PDF files, which are a large part of our day-to-day lives.

PDF Management in Python

## Introduction ##

Metadata is typically populated by PDF conversion applications. It includes common fields such as the document version, the creation date, and the creating program, among others. Some frequently overlooked attributes merit a closer look if you want to dive into PDF analysis.
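
As an illustration of one such field, date values in a PDF (for example the creation date) are conventionally stored as strings of the form `D:YYYYMMDDHHmmSSOHH'mm'`, where every part after the year is optional. The following is a minimal sketch, using only the standard library, of how such a string could be turned into a `datetime`. The function name `parse_pdf_date` is a hypothetical helper introduced here for illustration; it is not part of the lesson's toolbox.

```python
import re
from datetime import datetime, timedelta, timezone

# Pattern for PDF date strings such as "D:20230115093000+02'00'".
# Everything after the four-digit year is optional.
_PDF_DATE = re.compile(
    r"^D:(?P<year>\d{4})(?P<month>\d{2})?(?P<day>\d{2})?"
    r"(?P<hour>\d{2})?(?P<minute>\d{2})?(?P<second>\d{2})?"
    r"(?:(?P<tz>[Zz+\-])(?:(?P<tzh>\d{2})'?(?P<tzm>\d{2})?'?)?)?$"
)


def parse_pdf_date(value):
    """Convert a PDF date string into a datetime (hypothetical helper)."""
    match = _PDF_DATE.match(value.strip())
    if match is None:
        raise ValueError(f"not a PDF date string: {value!r}")
    parts = match.groupdict()

    # Work out the time zone, if one was given.
    tzinfo = None
    sign = parts["tz"]
    if sign in ("Z", "z"):
        tzinfo = timezone.utc
    elif sign in ("+", "-"):
        offset = timedelta(
            hours=int(parts["tzh"] or 0), minutes=int(parts["tzm"] or 0)
        )
        tzinfo = timezone(offset if sign == "+" else -offset)

    # Missing month/day default to 1, missing time parts default to 0.
    return datetime(
        int(parts["year"]),
        int(parts["month"] or 1),
        int(parts["day"] or 1),
        int(parts["hour"] or 0),
        int(parts["minute"] or 0),
        int(parts["second"] or 0),
        tzinfo=tzinfo,
    )
```

A value without a time zone yields a naive `datetime`, so callers that compare dates across documents may want to handle that case explicitly.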

## Scope ##
 
The objective of this lesson is to show how to extract, update, and delete the metadata of a PDF file using the Python programming language.

## Prerequisites ##

We need two libraries for metadata manipulation:

### PyPDF4 ### 
It is a pure-Python PDF library best suited to splitting, merging, cropping, and transforming the pages of a PDF file. Additionally, it can retrieve text and metadata from PDFs.

### Pikepdf ### 
It is a library intended for developers who need to create, manipulate, and parse PDF files. It supports reading and writing PDFs, including creating them from scratch.

|Library|Version|
|:-|:-|
|PyPDF4|1.27.0|
|Pikepdf|3.0.0|

> Unlike the **PyPDF4** library, the **Pikepdf** library supports editing a PDF's XMP metadata. Therefore, we will leverage this capability during this lesson.
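
Before running the lesson code, it can help to confirm that the installed libraries match the versions listed above. The following is a small sketch using the standard-library `importlib.metadata` module; the `PINNED` mapping and the `check_versions` function are names introduced here for illustration, and the distribution names are assumed to be `PyPDF4` and `pikepdf` as published on PyPI.

```python
from importlib import metadata

# Versions used in this lesson (distribution names as assumed on PyPI).
PINNED = {"PyPDF4": "1.27.0", "pikepdf": "3.0.0"}


def check_versions(pinned):
    """Report, per package, whether the installed version matches the pin."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if installed == wanted:
            report[name] = "ok"
        else:
            report[name] = f"installed {installed}, lesson uses {wanted}"
    return report


if __name__ == "__main__":
    for package, status in check_versions(PINNED).items():
        print(f"{package}: {status}")
```

A version mismatch is not necessarily a problem, but it is worth knowing about if a call behaves differently from what the lesson describes.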

## Let’s start coding ##

By harnessing the capabilities of the **PyPDF4** library, we will define the functions `collect_did_metadata`, `update_did_metadata` and `collect_xmp_metadata`.
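
To give a feel for the first of these functions, here is a minimal sketch of what `collect_did_metadata` could look like. The function name comes from the lesson, but the body is an assumption and not the course's own implementation; it reads "DID" as the PDF's document information dictionary. `clean_info` is a hypothetical helper added for readability. The sketch assumes an unencrypted file.

```python
def clean_info(info):
    """Copy a dict-like document information dictionary into a plain dict.

    PDF names carry a leading slash ('/Title'); it is dropped here.
    Hypothetical helper, not part of PyPDF4.
    """
    return {str(key).lstrip("/"): str(info[key]) for key in info}


def collect_did_metadata(pdf_path):
    """Return the document information dictionary of a PDF as a plain dict.

    A sketch under stated assumptions, not the lesson's actual code.
    """
    # Imported here so clean_info stays usable without PyPDF4 installed.
    from PyPDF4 import PdfFileReader

    with open(pdf_path, "rb") as handle:
        reader = PdfFileReader(handle, strict=False)
        info = reader.getDocumentInfo()  # None when the PDF has no /Info entry
        # Values are resolved while the file is still open.
        return clean_info(info) if info is not None else {}
```

Calling `collect_did_metadata("sample.pdf")` on a typical exported document would return entries such as `Title`, `Author`, `Producer`, and `CreationDate`, depending on what the producing application wrote; `sample.pdf` is a placeholder file name.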

Next, we will rely on the **Pikepdf** library to develop the functions `modify_metadata` and `delete_metadata`.

Afterward, we will utilize these functions in different scenarios to manipulate the metadata of sample PDF files.

Let's see what that looks like in code:
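
As a rough illustration of the editing side, the sketch below shows one way `modify_metadata` and `delete_metadata` could be written with Pikepdf. The function names come from the lesson; the signatures and bodies are assumptions made for this example and may differ from the course's implementation. XMP keys are given in prefixed form, such as `dc:title`.

```python
def modify_metadata(src_path, dst_path, updates):
    """Write XMP metadata updates (e.g. {"dc:title": "..."}) to a new file.

    A sketch under stated assumptions, not the lesson's actual code.
    """
    import pikepdf  # imported here so the module loads without pikepdf

    with pikepdf.open(src_path) as pdf:
        # open_metadata() applies the changes when the block exits and,
        # by default, mirrors supported fields into the docinfo dictionary.
        with pdf.open_metadata() as meta:
            for key, value in updates.items():
                meta[key] = value
        pdf.save(dst_path)


def delete_metadata(src_path, dst_path):
    """Save a copy of the PDF without docinfo entries or an XMP packet."""
    import pikepdf

    with pikepdf.open(src_path) as pdf:
        # Empty the document information dictionary key by key.
        for key in list(pdf.docinfo.keys()):
            del pdf.docinfo[key]
        # Drop the XMP metadata stream from the document catalog, if any.
        if "/Metadata" in pdf.Root:
            del pdf.Root["/Metadata"]
        pdf.save(dst_path)
```

Both functions write to a separate output path rather than overwriting the input, which keeps the original file available for comparison.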


Learn how to gather, modify, and delete the various types of metadata embedded within a PDF file.

Metadata Treatment

Introduction

PDF Management Core Functions

Pages Processing

Content Processing

Document Processing

Conclusion

Appendices
