ETL Transformation Example: Handling Missing Values and Data
Use Bash scripting to transform and process the raw lottery data before loading.
We'll cover the following...
Task 3: Delete rows with null values
As we might’ve noticed, some rows had null values. According to the data scientist, all rows with null values must be dropped.
Press + to interact
# *** *** ## *** Issue #3 - Get rid of all rows with null values *** ## *** *** #echo -e "\nTask #3 - Searching for Null Values"extract_nulls(){# Create an empty file called null_values.txt> null_values.txt# extract all lines with "null" and append to the filecat raw_data.csv | grep null >> null_values.txt}extract_nullsdelete_nulls(){if [ -s null_values.txt ]thenecho -e "\nFound Null Values:"cat null_values.txt# Grab the dates of linesnull_dates=$(cut --delimiter "|" --fields 1 null_values.txt)echo -e "\nRemoving Null Values..."# Remove all rows with null dates from raw_data.csvgrep -v "$null_dates" raw_data.csv > temp.csv; mv temp.csv raw_data.csvecho -e "Done."elseecho "There Are No Null Values"fi}delete_nulls
We’ve created a file called transform_data_3_null.sh
to perform these operations. Let’s walk through the code line by line.
The script contains two functions,
extract_nulls()
anddelete_nulls()
.Line 8: We create an empty ...