Indexing and Selection
This lesson will focus on how to view, add, and rename columns of a Pandas dataframe.
Indexing is the technique of efficiently retrieving records based on the criteria by which the data is arranged. As we saw in the previous lesson, a dataframe organizes data in rows and columns, so we can index data using the positions and names of those rows and columns. Now let’s see how to select rows and columns from the data.
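As a minimal sketch of indexing by name versus by position, the snippet below builds a tiny dataframe in memory rather than reading housing.csv. The row labels and values are made up for illustration; `.loc` selects by label and `.iloc` selects by integer position.

```python
import pandas as pd

# Hypothetical toy data; the values and row labels are invented for this sketch.
df = pd.DataFrame(
    {
        "population": [322, 2401, 496],
        "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND"],
    },
    index=["a", "b", "c"],
)

# Label-based indexing: the row labelled "b", the column named "population"
by_label = df.loc["b", "population"]

# Position-based indexing: second row, first column (positions start at 0)
by_position = df.iloc[1, 0]

print(by_label, by_position)
```

Both lookups return the same cell here, because the row labelled "b" is also the second row and "population" is the first column.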
Columns
To view the names of the columns, we use df.columns.values. We will be using the file housing.csv.
In Machine Learning terminology, a column in a spreadsheet is referred to as a feature, while in Statistics it is referred to as a variable. It is also referred to as an attribute. We will be using all of these terms interchangeably in this course.
import pandas as pd
df = pd.read_csv('housing.csv')

# Print column names
print(df.columns.values)

# Number of columns
num = len(df.columns)
print("number of columns: ", num)
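If housing.csv is not at hand, the same calls can be tried on a small dataframe built in memory. This is only a sketch: the three columns and their values are placeholders, not the contents of the real file. It also shows two closely related alternatives, `df.columns.tolist()` for a plain Python list and `df.shape[1]` for the column count.

```python
import pandas as pd

# Hypothetical stand-in for housing.csv; the values are placeholders.
df = pd.DataFrame(
    {
        "longitude": [-122.23, -122.22],
        "latitude": [37.88, 37.86],
        "population": [322, 2401],
    }
)

names_array = df.columns.values   # NumPy array of column names
names_list = df.columns.tolist()  # the same names as a plain Python list
num_columns = df.shape[1]         # equivalent to len(df.columns)

print(names_list)
print("number of columns: ", num_columns)
```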
Selecting columns
Let’s see how we can select the data of a few columns of housing.csv.
import pandas as pd
df = pd.read_csv('housing.csv')

new_df = df['population']
print(new_df.head())

# Make a list of the columns to select
col_to_select = ['longitude', 'latitude', 'population', 'ocean_proximity']

# Collect the specified columns and print them
new_df = df[col_to_select]
print('\n\n', new_df.head())
We view the values in a single column by indexing the dataframe with the column's name, as we did in line 4. In line 8, we create a list of the columns that we want to select. In line 11, we retrieve those columns from the dataframe df and save them as a new dataframe called new_df. Line 12 prints the head of the new dataframe. ...
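One detail worth seeing in code is the difference between indexing with a single name and indexing with a list of names. The sketch below uses a small in-memory dataframe with placeholder values instead of housing.csv: a single name returns a Series, while a list of names (even a list of one) returns a dataframe.

```python
import pandas as pd

# Hypothetical stand-in for housing.csv; the values are placeholders.
df = pd.DataFrame(
    {
        "longitude": [-122.23, -122.22],
        "latitude": [37.88, 37.86],
        "population": [322, 2401],
    }
)

single = df["population"]                  # one name -> Series
as_frame = df[["population"]]              # list with one name -> DataFrame
subset = df[["longitude", "population"]]   # list of names -> DataFrame

print(type(single).__name__)
print(type(as_frame).__name__)
print(subset.columns.tolist())
```

The columns of the result follow the order of the list, so the list can also be used to reorder columns.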