...

/

ETL Transformation Example: Handling Missing Values and Data

ETL Transformation Example: Handling Missing Values and Data

Use Bash scripting to transform and process the raw lottery data before loading.

Task 3: Delete rows with null values

As we might’ve noticed, some rows had null values. According to the data scientist, all rows with null values must be dropped.

Press + to interact
# *** *** #
# *** Issue #3 - Get rid of all rows with null values *** #
# *** *** #
echo -e "\nTask #3 - Searching for Null Values"
extract_nulls(){
# Create an empty file called null_values.txt
> null_values.txt
# extract all lines with "null" and append to the file
cat raw_data.csv | grep null >> null_values.txt
}
extract_nulls
delete_nulls(){
if [ -s null_values.txt ]
then
echo -e "\nFound Null Values:"
cat null_values.txt
# Grab the dates of lines
null_dates=$(cut --delimiter "|" --fields 1 null_values.txt)
echo -e "\nRemoving Null Values..."
# Remove all rows with null dates from raw_data.csv
grep -v "$null_dates" raw_data.csv > temp.csv; mv temp.csv raw_data.csv
echo -e "Done."
else
echo "There Are No Null Values"
fi
}
delete_nulls

We’ve created a file called transform_data_3_null.sh to perform these operations. Let’s walk through the code line by line. 

  • The script contains two functions, extract_nulls() and delete_nulls().

  • Line 8: We create an empty ...