Cumulative Univariate Analysis
Continue learning about plots that can be used for univariate analysis when the data has certain attributes.
Let’s extend our knowledge of univariate analysis with some advanced plots:
Cumulative histogram
A cumulative histogram is a graphical representation of the cumulative distribution of a numeric variable. For each value on the x-axis, it shows the cumulative frequency: the number of data points that are less than or equal to that value.
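To make the idea of cumulative frequency concrete, here is a minimal sketch in plain Python with no plotting library. The distances and thresholds are made-up illustrative values, not taken from the lesson's dataset, and `cumulative_counts` is a hypothetical helper name.

```python
from bisect import bisect_right


def cumulative_counts(values, thresholds):
    """For each threshold, count how many values are <= that threshold."""
    ordered = sorted(values)
    # On a sorted list, bisect_right gives the number of elements <= threshold
    return [bisect_right(ordered, t) for t in thresholds]


# Hypothetical driving distances in yards (illustrative only)
distances = [268, 275, 281, 281, 290, 296, 302, 310]
thresholds = [270, 280, 290, 300, 310]

counts = cumulative_counts(distances, thresholds)
print(counts)  # [1, 2, 5, 6, 8]
```

Each count is at least as large as the one before it, which is why the bars of a cumulative histogram never decrease from left to right.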
Advantages | Disadvantages |
Can help us identify percentile values for a numeric variable | Can be tricky to identify specific values within a data set, as the focus is on the overall distribution of the data |
Easy to interpret and aesthetically pleasing | -- |
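The percentile advantage listed in the table can be sketched as follows: convert the cumulative counts to percentages and find the first bin edge at which the cumulative share reaches the target. This is only an illustration under assumed data. The values, the bin edges, and the helper name `threshold_for_percentile` are all hypothetical.

```python
from bisect import bisect_right


def threshold_for_percentile(values, thresholds, pct):
    """Return the smallest threshold whose cumulative share reaches pct (0-100).

    Returns None if no threshold reaches it or if there are no values.
    """
    ordered = sorted(values)
    if not ordered:
        return None
    for t in sorted(thresholds):
        # Percentage of values that are <= t
        share = 100 * bisect_right(ordered, t) / len(ordered)
        if share >= pct:
            return t
    return None


# Hypothetical driving distances in yards (illustrative only)
distances = [268, 275, 281, 281, 290, 296, 302, 310]
bin_edges = [270, 280, 290, 300, 310]

median_edge = threshold_for_percentile(distances, bin_edges, 50)
p90_edge = threshold_for_percentile(distances, bin_edges, 90)
print(median_edge, p90_edge)  # 290 310
```

The answer is only as precise as the bin edges, which mirrors the disadvantage in the table: a cumulative histogram shows the overall distribution well but makes specific values harder to pin down.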
Cumulative histogram: Plotly Express
All we have to do is add the cumulative=True argument to px.histogram.
# Import libraries
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
import numpy as np

# Import dataset
golf = pd.read_csv('/usr/local/csvfiles/driving_distances.csv')

# Create the plot
fig = px.histogram(data_frame=golf, x='avg_drive_distance', cumulative=True)

# Show the plot
fig.show()
Cumulative histogram: Plotly graph objects
The process is similar. With graph objects, however, we pass cumulative_enabled=True to go.Histogram.
# Import libraries
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
import numpy as np

# Import dataset
golf = pd.read_csv('/usr/local/csvfiles/driving_distances.csv')

# Create the plot
trace = go.Histogram(x=golf['avg_drive_distance'],
                     histnorm='probability density',
                     cumulative_enabled=True)

# Add to the figure
fig = go.Figure(data=[trace])

# Show the plot
fig.show()
Empirical cumulative distribution function
An empirical cumulative distribution function (ECDF) is similar to a cumulative ...