Fixing the Columns
Learn the steps to fix the columns of a dataset.
Understanding the dataset's columns
As a first step when cleaning data, we retrieve the columns and apply standard data wrangling techniques. The goal is to ensure column names are easy to read and reference later during analysis.
main.py
employees.csv
NAME,CITY,COUNTRY, HEIGHT ,WEIGHT, ACCOUNT A,ACCOUNT B,TOTAL ACCOUNT
Kevin Hart,MELBOURNE,AUSTRALIA,57,134,2392,4342,6734
Judith Elliot,MANCHESTER,UNITED KINGDOM,61,167,4502,34334,38836
Lydia Carrasco,Oslo,Norway,56,119,,5505,8950
Jane Mattew,AMSTERDAM,NEDERLANDS,59,123,4346,9000,400
Von Gard,Berlin,GERMANYY,,127,7002,19002,26004
Juio Hernade,Mexico City,MEXICO,67,168,5000,4000,3452
Lydia Carrasco,Oslo,Norway,,119,3445,5505,8950
Judith Elliot,MANCHESTER,UNITED KINGDOM,61,,4500,2300,6800
Juio Hernade,Mexico City,MEXICO,67,168,5000,4000,3452
Judith Elliot,MANCHESTER,UNITED KINGDOM,61,167,4502,34334,38836
Lydia Carrasco,Oslo,Norway,56,119,,5505,8950
Let’s review the code line by line:
Line 1: We import the pandas library.
Line 2: We load the employees.csv dataset.
Line 3: We retrieve column names from a DataFrame using the columns property and print them using the print() function.
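The three steps described above can be sketched as follows. This is a minimal sketch, not the lesson's original main.py. To keep it self-contained, it reads a short excerpt of employees.csv from an in-memory string (an assumption made for this example) rather than from the file on disk. The variable names csv_text and df are also illustrative.

```python
import io

import pandas as pd

# Assumption: a short excerpt of employees.csv, held in memory so the
# sketch runs on its own without the file being present.
csv_text = (
    "NAME,CITY,COUNTRY, HEIGHT ,WEIGHT, ACCOUNT A,ACCOUNT B,TOTAL ACCOUNT\n"
    "Kevin Hart,MELBOURNE,AUSTRALIA,57,134,2392,4342,6734\n"
    "Judith Elliot,MANCHESTER,UNITED KINGDOM,61,167,4502,34334,38836\n"
)

# With the file on disk, this would be: df = pd.read_csv("employees.csv")
df = pd.read_csv(io.StringIO(csv_text))

# The columns property holds the column labels as a pandas Index.
print(df.columns)
```

With the real file available, replacing the io.StringIO call with the file path gives the load step described for Line 2.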
As we can see, the output comprises a list of the DataFrame column names. We can also see that the columns HEIGHT, WEIGHT, and ACCOUNT A have spaces as part of the column names. We'll remove these spaces in the next section. ...
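Stray spaces are easy to miss in printed output. One way to make them visible is to print each name with repr(), which keeps the surrounding quotes, and to compare each name with its stripped version. The sketch below assumes the header row exactly as it appears in the employees.csv listing; the whitespace in the real file may differ, so the set of flagged names may differ too. The str.strip() call at the end is only one common way to drop the padding, and is not necessarily the approach the next section takes.

```python
import io

import pandas as pd

# Assumption: header row copied from the employees.csv listing, plus one data row.
csv_text = (
    "NAME,CITY,COUNTRY, HEIGHT ,WEIGHT, ACCOUNT A,ACCOUNT B,TOTAL ACCOUNT\n"
    "Kevin Hart,MELBOURNE,AUSTRALIA,57,134,2392,4342,6734\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# repr() keeps the quotes, so leading and trailing spaces become visible.
for name in df.columns:
    print(repr(name))

# Names that change when stripped carry leading or trailing whitespace.
padded = [name for name in df.columns if name != name.strip()]
print(padded)

# Index.str.strip() returns a new Index with the padding removed;
# df itself is left unchanged here.
cleaned = df.columns.str.strip()
print(list(cleaned))
```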