ETL Example—Load

Explore how to complete the ETL pipeline by loading cleaned data into PostgreSQL. Learn to create and use Bash scripts to manage the database environment and SQL scripts to create tables and import CSV data, ensuring a smooth transfer from processing to storage.

The last step in the ETL pipeline is to load the cleaned and processed data stored in clean_data.csv into a PostgreSQL database.

Connecting to the database

First, we create a Bash script called load_data.sh that switches from the “root” user to the “postgres” user, initializes and starts a PostgreSQL server, and runs an SQL script to load the data. The SQL script is stored in a file called load_data.sql.

Shell
#!/bin/bash
# Setting the project location as an environment variable
export WORKDIR='/usercode/ETL_Lottery'
# Setting Postgres bin and data directories as environment variables
export POSTGRES_BIN='/usr/lib/postgresql/16/bin'
export POSTGRES_DATA='/var/lib/postgresql/data'
# Create and start a new PostgreSQL database cluster.
# Connect to Postgres and run the file: load_data.sql
su - postgres -c "
${POSTGRES_BIN}/initdb -D ${POSTGRES_DATA}
${POSTGRES_BIN}/pg_ctl -D ${POSTGRES_DATA} start
psql postgres -f ${WORKDIR}/load_data.sql
"
Deploying a PostgreSQL database: load_data.sh
  • Lines 2–6: We set the project location and the Postgres bin and data directories as environment variables.

  • Lines 7–13: We switch to the “postgres” user, create a new database cluster with initdb, start the server with pg_ctl, and run load_data.sql against the default postgres database with psql.
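
The SQL side of this step, as the summary notes, creates a table and imports the CSV. A minimal sketch of what load_data.sql could look like is shown below; the table name lottery and its columns are hypothetical placeholders, since the actual schema of clean_data.csv isn't shown here.

SQL
-- Hypothetical table: the name and columns below are placeholders, not the
-- lesson's actual schema for clean_data.csv.
CREATE TABLE IF NOT EXISTS lottery (
    draw_date DATE,
    winning_numbers TEXT,
    jackpot NUMERIC
);
-- \copy runs client-side in psql, so the CSV is read with the permissions
-- of the psql process rather than the server process.
\copy lottery FROM '/usercode/ETL_Lottery/clean_data.csv' WITH (FORMAT csv, HEADER true)
Creating a table and importing the CSV: a sketch of load_data.sql

Once load_data.sh finishes, running SELECT COUNT(*) FROM lottery; in psql is a quick way to confirm that the rows were imported.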