Data Typing and Structuring Using Python
Learn how to apply data typing and structuring as a part of the transform stage in an ETL pipeline.
Data typing and data structuring often go hand in hand and are among the first transformations we apply. In an ETL pipeline, the destination schema or repository largely dictates what typing and structuring the data needs.
Data typing
Data typing involves converting one or more columns of data to a standardized data type, such as integer, float, string, or boolean, or changing how the data is represented.
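To make the idea concrete, here is a minimal sketch of casting columns to standardized types using only the Python standard library. The rows, the column names (`order_id`, `amount`, `is_paid`), and the helper names `parse_bool` and `cast_row` are all hypothetical and chosen purely for illustration; a real pipeline would take its target types from the destination schema.

```python
# Hypothetical raw rows, as they might arrive from a CSV extract (every value is a string).
raw_rows = [
    {"order_id": "1001", "amount": "19.99", "is_paid": "true"},
    {"order_id": "1002", "amount": "5", "is_paid": "False"},
]


def parse_bool(value: str) -> bool:
    """Map common textual booleans to bool and raise on anything else."""
    normalized = value.strip().lower()
    if normalized in {"true", "1", "yes"}:
        return True
    if normalized in {"false", "0", "no"}:
        return False
    raise ValueError(f"Cannot interpret {value!r} as a boolean")


# Assumed target type for each column (illustrative, not taken from a real schema).
COLUMN_CASTS = {"order_id": int, "amount": float, "is_paid": parse_bool}


def cast_row(row: dict) -> dict:
    """Apply the configured cast to every column in a single row."""
    return {column: COLUMN_CASTS[column](value) for column, value in row.items()}


typed_rows = [cast_row(row) for row in raw_rows]
print(typed_rows)
```

Note that `bool("False")` evaluates to `True` in Python because any non-empty string is truthy, which is why this sketch uses an explicit parser for the boolean column instead of the built-in `bool`.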
An ETL pipeline usually extracts data from many different sources, so some inconsistencies in the extracted data are expected. For example, we often need to convert date formats from different sources, like DD/MM/YYYY or MM/DD/YYYY, to a common format, like YYYY-MM-DD, to enable easier comparison, processing, and loading of the data. This is an example of data typing that changes a column of data from one string representation (DD/MM/YYYY or MM/DD/YYYY) to another (YYYY-MM-DD).
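The date conversion described above could be sketched as follows with the standard library's `datetime` module. The source names (`eu_sales`, `us_sales`) and the function `normalize_date` are assumptions made for this example. The sketch also assumes we know which format each source uses, since a string such as 03/04/2023 is ambiguous on its own.

```python
from datetime import datetime

# Assumed mapping from a (hypothetical) source name to the date format it uses.
SOURCE_DATE_FORMATS = {
    "eu_sales": "%d/%m/%Y",  # DD/MM/YYYY
    "us_sales": "%m/%d/%Y",  # MM/DD/YYYY
}


def normalize_date(raw: str, source: str) -> str:
    """Parse a date string using its source's format and return YYYY-MM-DD."""
    parsed = datetime.strptime(raw.strip(), SOURCE_DATE_FORMATS[source])
    return parsed.date().isoformat()


print(normalize_date("03/04/2023", "eu_sales"))  # 2023-04-03
print(normalize_date("03/04/2023", "us_sales"))  # 2023-03-04
```

The same input string yields two different dates depending on the source, which is why the format should be tied to the source rather than guessed from the value. `strptime` raises a `ValueError` for strings that do not match the expected format, so malformed dates surface during the transform stage instead of at load time.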
Data typing helps ensure that the transformed data is consistent and is critical for meeting the data type constraints of the destination schema.
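One way to check that transformed rows meet the destination's type constraints before loading is sketched below. The schema `DESTINATION_SCHEMA`, its columns, and the helper `find_type_violations` are hypothetical; an actual destination would define its constraints in its own schema language, such as SQL column types.

```python
# Hypothetical destination schema: column name -> required Python type.
DESTINATION_SCHEMA = {"order_id": int, "amount": float, "is_paid": bool}


def find_type_violations(rows, schema):
    """Return (row_index, column, actual_type_name) for every type mismatch."""
    violations = []
    for index, row in enumerate(rows):
        for column, expected_type in schema.items():
            value = row.get(column)  # a missing column shows up as NoneType
            # Compare exact types, because bool is a subclass of int in Python.
            if type(value) is not expected_type:
                violations.append((index, column, type(value).__name__))
    return violations


rows = [
    {"order_id": 1001, "amount": 19.99, "is_paid": True},
    {"order_id": "1002", "amount": 5.0, "is_paid": False},  # order_id is still a string
]

violations = find_type_violations(rows, DESTINATION_SCHEMA)
print(violations)  # [(1, 'order_id', 'str')]
```

Running a check like this at the end of the transform stage reports rows that were not typed correctly, rather than letting the load step fail partway through.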