
Call Functions on Pandas DataFrame Values

Explore how to call functions on pandas DataFrame values, focusing on Unicode normalization issues that affect string comparisons. Understand why two city names that look identical can compare as unequal when they use different Unicode normalization forms, and learn practical methods, such as unicodedata.normalize and str.casefold, for normalized, case-insensitive string matching in pandas data.


Try it yourself

Try executing the code below to see the result.

import pandas as pd

cities = pd.DataFrame([
  ('Vienna', 'Austria', 1_899_055),
  ('Sofia', 'Bulgaria', 1_238_438),
  ('Tekirdağ', 'Turkey', 1_055_412),
], columns=['City', 'Country', 'Population'])

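def population_of(city):
  # Select the Population values of every row whose City equals `city`.
  # A single .loc call avoids chained indexing.
  return cities.loc[cities['City'] == city, 'Population']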

city = 'Tekirdağ'
print(population_of(city))
How to retrieve specific data from a DataFrame

Explanation

The output is an empty Series, which tells us that Tekirdağ couldn’t be found in the cities DataFrame. But it’s clearly there!
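A filter that matches no rows does not raise an error; it quietly returns an empty result. The sketch below shows one way to detect that case. The two-row frame and the city name 'Atlantis' are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical two-row frame, used only to illustrate an empty lookup.
frame = pd.DataFrame(
    [('Vienna', 1_899_055), ('Sofia', 1_238_438)],
    columns=['City', 'Population'],
)

# No row matches, so the boolean mask is all False
# and the selection contains no values.
missing = frame.loc[frame['City'] == 'Atlantis', 'Population']

print(missing.empty)  # True
print(len(missing))   # 0
```

Checking .empty (or the length) before using the result makes a failed lookup visible instead of letting it pass silently.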

Let’s investigate in an interactive session:

In [1]: city
Out[1]: 'Tekirdağ'
In [2]: city2 = cities.loc[2, 'City']
In [3]: city2
Out[3]: 'Tekirdağ'
In [4]: city2 == city
Out[4]: False
In [5]: len(city)
Out[5]: 9
In [6]: len(city2)
Out[6]: 8
...
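The differing lengths suggest that the two strings spell the same name with different code points: one uses the precomposed letter "g with breve" (U+011F), the other a plain "g" followed by a combining breve (U+0306). The sketch below reproduces this with explicit escapes and shows one possible fix, assuming that NFC normalization followed by casefolding is an acceptable matching rule for the data. The helper name normalize_key is hypothetical.

```python
import unicodedata

import pandas as pd

# Explicit escapes so the two forms cannot be confused on screen.
composed = 'Tekirda\u011f'     # 8 code points: precomposed g with breve
decomposed = 'Tekirdag\u0306'  # 9 code points: g + combining breve

print(len(composed), len(decomposed))  # 8 9
print(composed == decomposed)          # False


def normalize_key(text):
    """Hypothetical helper: NFC-normalize, then casefold for matching."""
    return unicodedata.normalize('NFC', text).casefold()


cities = pd.DataFrame([
    ('Vienna', 'Austria', 1_899_055),
    ('Sofia', 'Bulgaria', 1_238_438),
    (composed, 'Turkey', 1_055_412),
], columns=['City', 'Country', 'Population'])


def population_of(city):
    # Apply the same key function to the column and to the query,
    # so both sides of the comparison are in the same form.
    keys = cities['City'].map(normalize_key)
    return cities.loc[keys == normalize_key(city), 'Population']


result = population_of(decomposed)
print(result.tolist())  # [1055412]
```

Because the same function is called on every value in the City column and on the query, the lookup no longer depends on which normalization form or letter case either side happens to use. pandas also offers Series.str.normalize and Series.str.casefold for the column side if a vectorized spelling is preferred.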