Indexing and Selection

This lesson will focus on how to view, add, and rename columns of a Pandas dataframe.

Indexing is the technique of efficiently retrieving records from data based on the criteria by which that data is arranged. As we saw in the previous lesson, a dataframe organizes data in rows and columns, so we can index the data using the positions and names of those rows and columns. Now let’s see how to select rows and columns from the data.
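
As a minimal sketch of indexing by name versus by position, the example below builds a tiny made-up dataframe and reads the same cell both ways. The values and the row labels "a", "b", "c" are illustrative assumptions and are not taken from housing.csv; .loc selects by label and .iloc selects by integer position.

```python
import pandas as pd

# Hypothetical toy data; values and row labels are invented for illustration
toy = pd.DataFrame(
    {"population": [322, 2401, 496], "households": [126, 1138, 177]},
    index=["a", "b", "c"],
)

# Select by name: row labelled "b", column named "population"
by_name = toy.loc["b", "population"]

# Select by position: second row, first column (positions start at 0)
by_position = toy.iloc[1, 0]

print(by_name, by_position)
```

Both lookups point at the same cell, so the two printed values should match.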

Columns

To view the names of the columns, we use df.columns.values, which returns them as a NumPy array. We will be using the file housing.csv.

In Machine Learning terminology, a column in a spreadsheet is referred to as a feature, while in Statistics it is referred to as a variable. It is also referred to as an attribute. We will be using all of these terms interchangeably in this course.

import pandas as pd
df = pd.read_csv('housing.csv')
# Print column names
print(df.columns.values)
# Number of columns
num = len(df.columns)
print("number of columns:", num)
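
If a plain Python list of names is more convenient than a NumPy array, the following sketch shows one alternative, along with reading the column count from the dataframe's shape. It uses a small made-up dataframe in place of housing.csv so that it runs on its own: the column names are borrowed from this lesson, but the values are invented.

```python
import pandas as pd

# Hypothetical stand-in for housing.csv; the values are invented
toy = pd.DataFrame(
    {
        "longitude": [-122.23, -122.22],
        "latitude": [37.88, 37.86],
        "population": [322, 2401],
    }
)

names_array = toy.columns.values    # NumPy array of column names
names_list = toy.columns.tolist()   # the same names as a Python list
n_rows, n_cols = toy.shape          # shape is (rows, columns)

print(names_list)
print("number of columns:", n_cols)
```

Here n_cols should agree with len(toy.columns), the approach used in the lesson's own example.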

Selecting columns

Let’s see how we can select a few columns from housing.csv.

import pandas as pd
df = pd.read_csv('housing.csv')

new_df = df['population']
print(new_df.head())

# Make a list of the columns to select
col_to_select = ['longitude','latitude','population', 'ocean_proximity']

# Collect the specified columns and print them
new_df = df[col_to_select]
print('\n\n',new_df.head())

We view the values in a single column by indexing the dataframe with the column's name, as we did with df['population']. Next, we create a list, col_to_select, of the columns that we want to select. Indexing df with that list retrieves those columns, and we save them as a new dataframe called new_df. The final print statement shows the head of that dataframe. ...
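
One detail worth checking for yourself: indexing with a single name returns a Series, while indexing with a list of names returns a DataFrame, even when the list holds only one name. The sketch below illustrates this on a made-up dataframe. The column names come from this lesson, the values are invented, and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for housing.csv; the values are invented
toy = pd.DataFrame(
    {
        "longitude": [-122.23, -122.22],
        "latitude": [37.88, 37.86],
        "population": [322, 2401],
    }
)

single = toy["population"]                  # one name: a Series
as_frame = toy[["population"]]              # list with one name: a DataFrame
subset = toy[["longitude", "population"]]   # list of names: a DataFrame

print(type(single).__name__)
print(type(as_frame).__name__)
print(subset.shape)
```

The columns in the result follow the order of the list passed in, not their order in the original dataframe.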