In this exercise, we’ll practice using the COPY command to incrementally load data. With incremental loading, we load only the recent changes from the source into the destination repository. As always, the exact logic depends on the business requirements and context.
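As a quick reminder, COPY bulk-loads rows from a file into an existing table. Here’s a minimal sketch, assuming a hypothetical table `daily_changes` and a CSV file at `/tmp/changes.csv` (both names are illustrative, not part of this exercise):

```sql
-- Minimal sketch: bulk-load one day's worth of changes from a CSV file.
-- Table name, columns, and file path are illustrative assumptions.
COPY daily_changes (id, changed_at, payload)
FROM '/tmp/changes.csv'
WITH (FORMAT csv, HEADER true);
```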

Example

As data engineers working for a retail company, we’re tasked with creating an ETL pipeline for loading customers’ log activity data from a transactional database into a PostgreSQL database for analysis.

According to the business requirements, a new batch of data should be loaded at the end of each day. Also, to save on storage costs, the PostgreSQL database should only contain data from the last month. Having created the extract and transform steps of our pipeline, we’ll now focus on the load step. The extract and transform steps produce a CSV file containing the log activity data for the last day.

Our task is to create a table for storing the log activity data and write a PostgreSQL function that does the following (a possible sketch follows the list):

  1. Loads the data from the CSV file into the PostgreSQL database.

  2. Checks for outdated data in the database (data that is older than one month) and deletes it.
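Here’s a minimal sketch of one possible approach, assuming a hypothetical table name (`customer_logs`), column layout, and CSV path; the actual solution should follow the schema and file location used in this exercise. Note that COPY reads the file from the database server’s filesystem, so the executing role needs the appropriate file-read privileges.

```sql
-- Sketch only: table name, columns, and CSV path are assumptions.
CREATE TABLE customer_logs (
    log_id      BIGINT,
    customer_id BIGINT,
    activity    TEXT,
    logged_at   TIMESTAMP
);

CREATE OR REPLACE FUNCTION load_daily_logs(csv_path TEXT)
RETURNS void AS $$
BEGIN
    -- 1. Load the new daily batch from the CSV file produced by the
    --    extract and transform steps (dynamic SQL lets us pass the path
    --    as a parameter; %L quotes it as a literal).
    EXECUTE format(
        'COPY customer_logs FROM %L WITH (FORMAT csv, HEADER true)',
        csv_path
    );

    -- 2. Delete rows older than one month to keep storage costs down.
    DELETE FROM customer_logs
    WHERE logged_at < NOW() - INTERVAL '1 month';
END;
$$ LANGUAGE plpgsql;
```

The pipeline could then run `SELECT load_daily_logs('/tmp/daily_logs.csv');` at the end of each day, for example via a scheduler such as cron or pg_cron (the path is again an assumption).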

Here’s a sample of the log activity data:
