The pandas library in Python is used to work with dataframes that structure data in rows and columns. It is widely used in data analysis and machine learning.
The loc operator is used to select a portion of a dataframe. loc supports indexing both by row and column labels and by boolean expressions.
The loc operator can take two arguments: rows and columns.
Rows are specified by their index labels, which match the row numbers when the dataframe uses the default integer index, whereas columns are specified by name. The syntax is as follows:
dataframe.loc[rows, columns]
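As a minimal sketch of the rows-and-columns form, the example below uses a small dataframe with string index labels. The data and the labels "a", "b", and "c" are made up for illustration; the point is that loc looks rows up by label rather than by position.

```python
import pandas as pd

# Hypothetical example data; the index labels are chosen only for illustration.
df = pd.DataFrame(
    {"Player": ["Messi", "Afridi", "Chad"], "Rank": [1, 9, 7]},
    index=["a", "b", "c"],
)

# A single row label and a single column name return one value.
single_value = df.loc["b", "Player"]

# Lists of row labels and column names return a smaller dataframe.
subset = df.loc[["a", "c"], ["Player", "Rank"]]

print(single_value)  # Afridi
print(subset)
```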
Row ranges are inclusive in loc: both the start and the end of the range are part of the result.
We can specify rows as a range, such as 0:5, which selects rows 0 through 5. The syntax is as follows:
df.loc[0:5, "column1"]
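The inclusive behavior is easy to check by counting rows. The sketch below uses a made-up single-column dataframe and, for contrast only, the position-based iloc, which this article does not otherwise cover and which follows Python's usual end-exclusive slicing.

```python
import pandas as pd

# Hypothetical data with the default integer index 0..6.
df = pd.DataFrame({"column1": [10, 20, 30, 40, 50, 60, 70]})

# loc slices by label and includes both endpoints: labels 0, 1, 2, 3, 4, 5.
inclusive = df.loc[0:5, "column1"]

# iloc slices by position and excludes the end: positions 0, 1, 2, 3, 4.
exclusive = df["column1"].iloc[0:5]

print(len(inclusive), len(exclusive))  # 6 5
```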
We can also select specific rows by passing their labels as a list. The syntax is as follows:
df.loc[[2,4,5], "column1"]
Similarly, we can index a single column using the column name. If we do not enclose the column name within [], a series is returned. The syntax is as follows:
df.loc[[2,4,5], "column1"]
If we enclose the column name within [], a dataframe is returned. The syntax is as follows:
df.loc[[2,4,5], ["column1"]]
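The difference between the two forms can be confirmed by inspecting the type and shape of each result. The following is a small sketch on made-up data; the column names match the placeholders used above.

```python
import pandas as pd

# Hypothetical data with the default integer index 0..5.
df = pd.DataFrame({"column1": [1, 2, 3, 4, 5, 6], "column2": list("abcdef")})

as_series = df.loc[[2, 4, 5], "column1"]    # bare column name
as_frame = df.loc[[2, 4, 5], ["column1"]]   # column name inside a list

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_series.shape, as_frame.shape)  # (3,) (3, 1)
```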
The code snippet below shows how we can use the loc operator for rows and columns:
import pandas as pd

# Creating a dataframe
df = pd.DataFrame({
    'Sports': ['Football', 'Cricket', 'Baseball', 'Basketball',
               'Tennis', 'Table-tennis', 'Archery', 'Swimming', 'Boxing'],
    'Player': ["Messi", "Afridi", "Chad", "Johnny", "Federer",
               "Yong", "Mark", "Phelps", "Khan"],
    'Rank': [1, 9, 7, 12, 1, 2, 11, 1, 1]
})

print(df.loc[0:5, ['Player', 'Rank']])  # Using a row range and multiple columns
print('\n')
print(df.loc[[1, 2, 3], "Player"])  # Using specific rows and returning a series
print('\n')
print(df.loc[[1, 2, 3], ["Player"]])  # Using specific rows and returning a dataframe
We can also index the dataframe by placing boolean expressions within loc. The syntax is as follows:
dataframe.loc[expression]
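Boolean expressions use conditions and comparison operators, such as ==, >, and <. The expression evaluates to a series of True and False values, one per row, and loc returns the rows where the value is True.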
The code snippet below shows loc using boolean expressions:
import pandas as pd

# Creating a dataframe
df = pd.DataFrame({
    'Sports': ['Football', 'Cricket', 'Baseball', 'Basketball',
               'Tennis', 'Table-tennis', 'Archery', 'Swimming', 'Boxing'],
    'Player': ["Messi", "Afridi", "Chad", "Johnny", "Federer",
               "Yong", "Mark", "Phelps", "Khan"],
    'Rank': [1, 9, 7, 12, 1, 2, 11, 1, 1]
})

print(df.loc[df["Rank"] == 1])  # Rows where Rank equals 1
print('\n')
print(df.loc[df["Sports"] == "Football"])  # Rows where Sports is Football
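Boolean expressions can also be combined, and a boolean expression can be paired with a column selection. The sketch below goes slightly beyond the snippet above and uses a shortened, made-up version of the same data; the variable names are chosen only for illustration.

```python
import pandas as pd

# Shortened, illustrative version of the sports data.
df = pd.DataFrame({
    "Sports": ["Football", "Cricket", "Tennis", "Table-tennis", "Swimming"],
    "Player": ["Messi", "Afridi", "Federer", "Yong", "Phelps"],
    "Rank": [1, 9, 1, 2, 1],
})

# Combine conditions with & (and) or | (or). Each condition needs its own
# parentheses because & and | bind more tightly than == and !=.
top_non_football = df.loc[(df["Rank"] == 1) & (df["Sports"] != "Football")]

# A boolean expression for the rows can be followed by a column selection.
lower_ranked = df.loc[df["Rank"] > 1, ["Player", "Rank"]]

print(top_non_football)
print(lower_ranked)
```

Here top_non_football contains the rows for Federer and Phelps, and lower_ranked contains only the Player and Rank columns for Afridi and Yong.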