pandas provides the useful value_counts() function for counting unique items: it returns a Series whose index contains the unique values and whose values are their counts, sorted in descending order by default.
First, let's look at a simple example without any parameters:
```python
import pandas as pd

# create a dataframe with one column
df = pd.DataFrame({"col1": ["a", "b", "a", "c", "a", "a", "a", "c"]})

# print the dataframe object
print(df)

# separator line
print("=" * 30)

# count the unique items in col1
item_counts = df["col1"].value_counts()
print(item_counts)
```
From the output of the last print() call, you can see the result: the count of each unique value in column col1, sorted in descending order.

Sometimes we don't care about the exact count of each item in a column, but about its relative frequency. Setting normalize=True returns proportions instead of counts.
```python
import pandas as pd

# create a dataframe with one column
df = pd.DataFrame({"col1": ["a", "b", "a", "c", "a", "a", "a", "c"]})

# normalize=True returns relative frequencies instead of counts
item_counts = df["col1"].value_counts(normalize=True)
print(item_counts)
Here, value_counts(normalize=True) is the only change. Comparing this output with the last demo, you can see the difference: the values are relative frequencies (proportions that sum to 1), not counts.
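If an actual percentage display is wanted, the proportions can simply be scaled by 100. This is a small sketch using the same toy column; the mul()/round() chain is just one convenient spelling:

```python
import pandas as pd

# same toy column as in the examples above
df = pd.DataFrame({"col1": ["a", "b", "a", "c", "a", "a", "a", "c"]})

# scale the proportions from normalize=True up to percentages
percentages = df["col1"].value_counts(normalize=True).mul(100).round(1)
print(percentages)
```

With this 8-row column, "a" appears 5 times, so its share is 62.5%.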
value_counts() can handle continuous values as well as categorical values. By setting the bins=n parameter, you can group continuous values into n equal-width intervals.
```python
import numpy as np
import pandas as pd

# create an array of 30 random values between 0 and 1
data = np.random.random((30,))

# create a DataFrame object from the array
df = pd.DataFrame(data, columns=["col1"])

# show the first five rows of this dataframe object
print(df.head())

# separator line
print("=" * 30)

# group the values into 8 equal-width bins
value_bins = df["col1"].value_counts(bins=8)
print(value_bins)
```
Here, value_counts(bins=8) does the grouping. In the output, you can see that the original values are grouped into 8 bins, with the interval covered by each bin shown as the row's index.
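Note that value_counts(bins=...) still sorts the result by count, like the categorical case. If you would rather read the bins in numerical order, calling sort_index() on the result restores interval order. A minimal sketch; the seeded generator is only there to make it reproducible:

```python
import numpy as np
import pandas as pd

# seeded RNG so the sketch is reproducible
rng = np.random.default_rng(42)
df = pd.DataFrame({"col1": rng.random(30)})

# bins=8 groups into 8 equal-width intervals; sort_index() orders by interval
value_bins = df["col1"].value_counts(bins=8).sort_index()
print(value_bins)
```

The resulting index is a pandas IntervalIndex, so each row is labeled with its (left, right] interval, and the counts across all bins add up to the 30 original values.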