ETL Pipeline Exercise: Transform
Learn how to build the transform step of the social media pipeline using Apache Airflow.
To continue our pipeline implementation, we’ll now focus on transforming the extracted data. According to the business requirements and the schema of the data warehouse, there are a few issues we need to fix with our extracted data. They are:
To change the month format of all date columns from numerical to text (for example, from 08 to Aug)
To remove tabs and new lines from columns comment_text and post_text
To bin the number of followers into three categories, ...
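The first two fixes above can be sketched in plain Python as follows. This is a minimal illustration, not the lesson's actual implementation: the helper names, the `YYYY-MM-DD` date layout, and the `post_date` column are assumptions, and only `comment_text` and `post_text` come from the requirements. Binning is left out because the follower categories are not spelled out here.

```python
import re

# English month abbreviations, indexed by month number minus one.
MONTH_ABBR = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def month_to_text(date_str: str) -> str:
    """Replace the numeric month in a date string with its abbreviation.

    Assumes a 'YYYY-MM-DD' layout (an assumption, not given in the lesson);
    adjust the split if the extracted data uses another format.
    """
    year, month, day = date_str.split("-")
    return f"{year}-{MONTH_ABBR[int(month) - 1]}-{day}"


def strip_tabs_newlines(text: str) -> str:
    """Collapse runs of tabs, newlines, and carriage returns into one space,
    then trim whitespace from both ends."""
    return re.sub(r"[\t\r\n]+", " ", text).strip()


def transform_row(row: dict, date_columns=("post_date",)) -> dict:
    """Return a cleaned copy of one extracted record.

    'post_date' is a hypothetical date column name used for illustration.
    """
    cleaned = dict(row)
    for col in date_columns:
        if cleaned.get(col):
            cleaned[col] = month_to_text(cleaned[col])
    for col in ("comment_text", "post_text"):
        if cleaned.get(col) is not None:
            cleaned[col] = strip_tabs_newlines(cleaned[col])
    return cleaned
```

In an Airflow pipeline, a function like `transform_row` would typically be called from the transform task on each extracted record. The text cleanup replaces tabs and new lines with a single space rather than deleting them, so that adjacent words do not run together.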