ETL Example—Scheduling

Load the clean and transformed data into a PostgreSQL database and learn how to schedule tasks using cron.

We'll cover the following

The company wants us to schedule the ETL pipeline we built so it would run once a week on Monday at 09:00 a.m. That way, the data scientist will have updated data as new data about lottery numbers comes in without bothering us to deploy the ETL processes repeatedly. To schedule the ETL pipeline, we’ll use cron.

Cron

Cron is a command-line utility for scheduling jobs on Unix operating systems. Assuming we’re operating on one, we can easily schedule commands or shell scripts to run automatically and on a schedule. Tasks scheduled using cron are also known as cronjobs.

Cron is a very useful tool for repetitive tasks like the one we just built. To create cronjobs, we must insert the right syntax in the crontab, a file that stores all the scheduled jobs.

We can edit the crontab file and add tasks by using the command crontab -e. This will launch an interface provided by the system for users to manipulate their cronjobs and store them in /var/spool/cron/crontabs.

Syntax

The syntax for creating a cronjob is five characters representing the time interval, followed by the command to run.

Get hands-on with 1400+ tech skills courses.