What is loc in pandas?

The pandas library in Python is used to work with dataframes that structure data in rows and columns. It is widely used in data analysis and machine learning.

The loc indexer selects a portion of a dataframe by label. It supports indexing by row and column labels as well as by boolean expressions.

Indexing using rows and columns

loc can take two arguments: rows and columns.

Rows are specified by their index labels, which are the integers 0, 1, 2, and so on when the dataframe uses the default index. Columns are specified by name. The syntax is as follows:

dataframe.loc[rows, columns]
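
Because loc matches labels rather than positions, the same syntax also works when the index is not numeric. The snippet below is a minimal sketch of this, assuming a hypothetical dataframe named players whose index holds player names (the names and data are illustrative only):

```python
import pandas as pd

# Hypothetical dataframe whose index holds player names instead of 0, 1, 2, ...
players = pd.DataFrame(
    {"Sport": ["Football", "Tennis", "Swimming"], "Rank": [1, 1, 1]},
    index=["Messi", "Federer", "Phelps"],
)

# Row label first, then column label: returns a single value
messi_rank = players.loc["Messi", "Rank"]
print(messi_rank)

# A list of row labels and a list of column labels: returns a dataframe
subset = players.loc[["Messi", "Phelps"], ["Sport", "Rank"]]
print(subset)
```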

Unlike ordinary Python slicing, a row slice in loc includes both its start and end labels.

We can select a range of rows with a slice such as 0:5, which returns rows 0 through 5. The syntax will be as follows:

df.loc[0:5, "column1"]
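
The inclusive end point is easy to check by counting rows. The snippet below is a minimal sketch, assuming a hypothetical dataframe named numbers with the default integer index, and it compares the loc slice with the same slice on a plain Python list:

```python
import pandas as pd

# Hypothetical dataframe with the default index 0..9
numbers = pd.DataFrame({"column1": range(10, 20)})

# loc slices by label and keeps the end label 5
by_label = numbers.loc[0:5, "column1"]

# A plain Python list slice stops before position 5
plain_list = list(range(10, 20))[0:5]

print(len(by_label))    # 6
print(len(plain_list))  # 5
```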

We can also select specific rows by passing their labels as a list. The syntax will be as follows:

df.loc[[2,4,5], "column1"]

Similarly, we can select a single column using the column name. If we pass the name on its own, without wrapping it in a list, a series is returned. The syntax will be as follows:

df.loc[[2,4,5], "column1"]

If we wrap the column name in a list, a dataframe is returned. The syntax is as follows:

df.loc[[2,4,5], ["column1"]]
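
The difference between the two return types can be confirmed by inspecting the results. The snippet below is a minimal sketch, assuming a hypothetical dataframe named df_demo with columns column1 and column2:

```python
import pandas as pd

# Hypothetical dataframe with the default index 0..5
df_demo = pd.DataFrame({
    "column1": [10, 20, 30, 40, 50, 60],
    "column2": ["a", "b", "c", "d", "e", "f"],
})

as_series = df_demo.loc[[2, 4, 5], "column1"]   # bare column name
as_frame = df_demo.loc[[2, 4, 5], ["column1"]]  # column name inside a list

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (3, 1)
```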

Example

The code snippet below shows how we can use the loc operator for rows and columns:

import pandas as pd
# Creating a dataframe
df = pd.DataFrame({
    'Sports': ['Football', 'Cricket', 'Baseball', 'Basketball',
               'Tennis', 'Table-tennis', 'Archery', 'Swimming', 'Boxing'],
    'Player': ["Messi", "Afridi", "Chad", "Johnny", "Federer",
               "Yong", "Mark", "Phelps", "Khan"],
    'Rank': [1, 9, 7, 12, 1, 2, 11, 1, 1],
})
print(df.loc[0:5, ['Player', 'Rank']]) # using row range and multiple columns
print('\n')
print(df.loc[[1,2,3], "Player"]) # Using specific rows and returning a series
print('\n')
print(df.loc[[1,2,3], ["Player"]]) # Using specific rows and returning a dataframe

Indexing using a boolean expression

We can also index the dataframe by placing boolean expressions within loc. The syntax is as follows:

dataframe.loc[expression]

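A boolean expression compares a column against a value using operators such as ==, >, and <. It produces one True or False value per row, and loc returns the rows where the value is True.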

Example

The code snippet below shows loc using boolean expressions:

import pandas as pd
# Creating a dataframe
df = pd.DataFrame({
    'Sports': ['Football', 'Cricket', 'Baseball', 'Basketball',
               'Tennis', 'Table-tennis', 'Archery', 'Swimming', 'Boxing'],
    'Player': ["Messi", "Afridi", "Chad", "Johnny", "Federer",
               "Yong", "Mark", "Phelps", "Khan"],
    'Rank': [1, 9, 7, 12, 1, 2, 11, 1, 1],
})
print(df.loc[df["Rank"]== 1])
print('\n')
print(df.loc[df["Sports"] == "Football"])
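
Conditions can also be combined with & (and) and | (or), provided each condition is wrapped in parentheses, and a boolean expression can be paired with a column selection. The snippet below is a minimal sketch of both on the same data. The variable names top_ranked and low_or_tennis are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    'Sports': ['Football', 'Cricket', 'Baseball', 'Basketball',
               'Tennis', 'Table-tennis', 'Archery', 'Swimming', 'Boxing'],
    'Player': ["Messi", "Afridi", "Chad", "Johnny", "Federer",
               "Yong", "Mark", "Phelps", "Khan"],
    'Rank': [1, 9, 7, 12, 1, 2, 11, 1, 1],
})

# Rows where Rank is 1 AND the sport is not Football
top_ranked = df.loc[(df["Rank"] == 1) & (df["Sports"] != "Football")]
print(top_ranked)

# Rows where Rank is above 10 OR the sport is Tennis, keeping two columns
low_or_tennis = df.loc[(df["Rank"] > 10) | (df["Sports"] == "Tennis"),
                       ["Player", "Rank"]]
print(low_or_tennis)
```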
