Creating a Histogram
Learn how to generate and customize histograms using Plotly Express.
We want to see how a sample of data is distributed: where its values are concentrated and how much variability, or spread, it has. We’ll do this by creating a histogram.
As always, we’ll start with the simplest possible example.
    # Creating a subset of the poverty DataFrame
    import pandas as pd
    poverty = pd.read_csv('data/poverty.csv')
    df = poverty[poverty['is_country'] & poverty['year'].eq(2015)]
    # Generating the histogram
    import plotly.express as px
    gini = 'GINI index (World Bank estimate)'
    px.histogram(data_frame=df, x=gini)
- Lines 2–4: We open the poverty DataFrame and create a subset of it, containing only countries and data from the year 2015.
- Lines 6–8: We import Plotly Express and run the histogram function with df as the argument to the data_frame parameter and the indicator of our choice for the x parameter.
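To make the subsetting step more concrete, here is a minimal sketch on a tiny, made-up DataFrame. The rows, the country names, and the 'Country Name' column are assumptions for illustration only and are not taken from poverty.csv; only the is_country and year columns and the indicator name come from the code above.

```python
import pandas as pd

gini = 'GINI index (World Bank estimate)'

# Made-up rows for illustration only; these are not values from poverty.csv.
# 'Country Name' is an assumed column name.
toy = pd.DataFrame({
    'Country Name': ['Aland', 'Borduria', 'World', 'Aland'],
    'is_country': [True, True, False, True],
    'year': [2015, 2015, 2015, 2014],
    gini: [31.2, 44.5, 38.0, 30.9],
})

# Each condition is a boolean Series; & combines them element-wise,
# so a row is kept only if it is a country AND its year equals 2015.
mask = toy['is_country'] & toy['year'].eq(2015)
subset = toy[mask]

print(subset[['Country Name', 'year', gini]])
```

Only the first two rows survive the filter: the third row is an aggregate rather than a country, and the fourth belongs to a different year. The resulting subset plays the role of df in the histogram call.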
As a result, we get the histogram in the Jupyter Notebook set up below:
- The x-axis was named using the indicator we chose, and the y-axis was given the title count. Counting is the default aggregation that the histogram function uses, and it’s also clear from the hover box that we see when hovering over any of the bars.
- Here we learn that there are 18 countries with a Gini index in the interval (35, 39.9) in the year 2015. We have previously visualized this indicator by country (visualizing each and every country), but this time, we’re getting an idea of how many values fall in each bin and how those values are distributed.
- We can see that the majority of countries have a Gini index between 25 and 40 and that the number of countries gets progressively lower as the Gini index rises. This is valid for this particular year only, of course. ...
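To see what the count on the y-axis represents, we can reproduce the idea of binning by hand. The following is a rough sketch using NumPy rather than Plotly; the values and the bin edges are made up for illustration, and Plotly chooses its own bins automatically, so the numbers will not match the chart above.

```python
import numpy as np

# Made-up Gini-like values for illustration only; not taken from poverty.csv.
values = np.array([25.4, 27.1, 29.8, 31.0, 33.6, 34.9, 35.2,
                   36.7, 38.1, 39.9, 42.3, 47.5, 53.0])

# Hypothetical bin edges, 5 units wide: 25, 30, ..., 55.
edges = np.arange(25, 60, 5)

# np.histogram counts how many values fall between each pair of edges.
# Every bin includes its left edge and excludes its right edge,
# except the last bin, which includes both.
counts, _ = np.histogram(values, bins=edges)

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f'{left} to {right}: {n}')
```

Each printed count corresponds to the height of one bar: the histogram replaces the individual values with the number of observations that fall in each interval.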