A DataFrame is a commonly used two-dimensional data structure in pandas. It is a table that consists of rows and columns.
DataFrames require the pandas library, as shown below.
import pandas as pd
A DataFrame can be created as shown below.
In this example, we create a DataFrame of countries that have been placed in different groups and given an a_score and a b_score. Both scores are imaginary values for this example.
import pandas as pd

a_score = [4, 5, 7, 8, 2, 3, 1, 6, 9, 10]
b_score = [1, 2, 3, 4, 5, 6, 7, 10, 8, 9]
country = ['Pakistan', 'USA', 'Canada', 'Brazil', 'India',
           'Belgium', 'Malaysia', 'Peru', 'England', 'Scotland']
groups = ['A', 'A', 'B', 'A', 'B', 'B', 'C', 'A', 'C', 'C']

# Build the DataFrame from a dictionary that maps column names to lists.
df = pd.DataFrame({'group': groups, 'country': country,
                   'a_score': a_score, 'b_score': b_score})
print(df)
loc and iloc functions
The loc and iloc functions (strictly speaking, indexers that are used with square brackets rather than called) allow the selection of rows and columns.
loc[]: selection by labels
iloc[]: selection by positions
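The difference between labels and positions is easiest to see when the index is not the default 0, 1, 2. The following is a minimal sketch under that assumption; the DataFrame `small` and its shuffled index are made up for illustration and are not part of the example above.

```python
import pandas as pd

# Hypothetical DataFrame whose index labels are deliberately out of order.
small = pd.DataFrame(
    {'country': ['Pakistan', 'USA', 'Canada']},
    index=[2, 0, 1],
)

# loc looks up the row whose index LABEL is 0 (the second row here).
by_label = small.loc[0, 'country']

# iloc looks up the row at POSITION 0 (the first row), column at position 0.
by_position = small.iloc[0, 0]

print(by_label)     # USA
print(by_position)  # Pakistan
```

With the default index, labels and positions happen to coincide, which is why the two can look interchangeable in simple examples.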
When slicing, the upper boundary is included with loc and excluded with iloc.
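As a quick check of this boundary rule, the sketch below applies the same slice, 1:3, with both indexers. The single-column DataFrame `demo` is an illustrative assumption that reuses a few of the a_score values.

```python
import pandas as pd

# Small illustrative DataFrame with the default index 0..4.
demo = pd.DataFrame({'a_score': [4, 5, 7, 8, 2]})

# loc slices by label and includes the upper boundary: labels 1, 2, 3.
with_loc = demo.loc[1:3]

# iloc slices by position and excludes the upper boundary: positions 1, 2.
with_iloc = demo.iloc[1:3]

print(len(with_loc))   # 3
print(len(with_iloc))  # 2
```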
The loc and iloc functions are typically used as follows.
df.loc[3:, ['country', 'a_score']]
df.iloc[2:, 3:]
loc: the labels you want to select
iloc: the positions you want to select
Both return the selected subset of the DataFrame: a DataFrame, a Series, or a single value, depending on the selection.
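What comes back depends on the shape of the selection. The sketch below illustrates this with a small, made-up DataFrame; the name `demo` and the variable names are assumptions for this illustration only.

```python
import pandas as pd

# Small illustrative DataFrame (not the one from the main example).
demo = pd.DataFrame({'country': ['Pakistan', 'USA', 'Canada'],
                     'a_score': [4, 5, 7]})

# Several rows and several columns give a DataFrame.
table = demo.loc[0:1, ['country', 'a_score']]

# A single column label gives a Series.
column = demo.loc[:, 'a_score']

# A single row position gives a Series as well.
row = demo.iloc[0]

# A single row and a single column give one value.
value = demo.iloc[0, 1]

print(type(table).__name__)   # DataFrame
print(type(column).__name__)  # Series
print(type(row).__name__)     # Series
print(value)                  # 4
```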
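The example below selects the first 3 rows with both loc and iloc. The loc call picks the country and group columns by label, and the iloc call picks the last 2 columns (a_score and b_score) by position.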
import pandas as pd

a_score = [4, 5, 7, 8, 2, 3, 1, 6, 9, 10]
b_score = [1, 2, 3, 4, 5, 6, 7, 10, 8, 9]
country = ['Pakistan', 'USA', 'Canada', 'Brazil', 'India',
           'Belgium', 'Malaysia', 'Peru', 'England', 'Scotland']
groups = ['A', 'A', 'B', 'A', 'B', 'B', 'C', 'A', 'C', 'C']

df = pd.DataFrame({'group': groups, 'country': country,
                   'a_score': a_score, 'b_score': b_score})

# loc selects by label: rows 0 through 2 (inclusive), columns 'country' and 'group'.
print("loc")
print(df.loc[:2, ['country', 'group']])

# iloc selects by position: rows 0 through 2, columns from position 2 onward.
print("iloc")
print(df.iloc[:3, 2:])