Cumulative Univariate Analysis

Continue learning about plots that support univariate analysis for particular kinds of data attributes.

Let’s extend our knowledge of univariate analysis with some advanced plots:

Cumulative histogram

A cumulative histogram is a graphical representation of the cumulative distribution of a numeric variable. For each value x, it shows the cumulative frequency of the data points that are less than or equal to x.
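To make the definition concrete, here is a minimal sketch of how the bar heights of a cumulative histogram come about. It uses only the standard library, and the distances and bin edges are hypothetical values chosen for illustration, not taken from the lesson's dataset.

```python
from itertools import accumulate

# Hypothetical driving distances in yards (illustrative, not from the dataset)
distances = [262, 275, 281, 281, 290, 295, 301, 304, 310, 318]

# Assumed bin edges: each bin covers [edge, next_edge)
edges = [260, 280, 300, 320]


def bin_counts(values, edges):
    """Count how many values fall into each half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts


# Ordinary histogram: one count per bin
counts = bin_counts(distances, edges)

# Cumulative histogram: running total of the bin counts
cumulative = list(accumulate(counts))

print(counts)      # [2, 4, 4]
print(cumulative)  # [2, 6, 10]
```

The last cumulative bar always equals the total number of observations, which is why the bars of a cumulative histogram never decrease from left to right.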

Advantages:

Can help us identify percentile values for a numeric variable

Easy to interpret and aesthetically pleasing

Disadvantages:

Can be tricky to identify specific values within a data set, as the focus is on the overall distribution of the data
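The percentile advantage can be sketched in a few lines. The example below assumes the nearest-rank definition of a percentile (one of several conventions) and uses hypothetical distances. The helper names `cumulative_fraction` and `percentile_value` are made up for this illustration and are not part of Plotly or pandas.

```python
# Hypothetical driving distances in yards (illustrative, not from the dataset)
distances = [262, 275, 281, 281, 290, 295, 301, 304, 310, 318]


def cumulative_fraction(values, x):
    """Fraction of values that are less than or equal to x."""
    return sum(v <= x for v in values) / len(values)


def percentile_value(values, pct):
    """Nearest-rank percentile: smallest observed value such that at least
    pct percent of the data is less than or equal to it (pct is an int, 1-100)."""
    ordered = sorted(values)
    rank = -(-pct * len(ordered) // 100)  # integer ceiling of pct * n / 100
    return ordered[max(rank, 1) - 1]


print(cumulative_fraction(distances, 290))  # 0.5
print(percentile_value(distances, 50))      # 290
print(percentile_value(distances, 90))      # 310
```

On a cumulative histogram, this corresponds to finding the height that matches the desired share of the data and reading off the x value of the bar that first reaches it.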

Cumulative histogram: Plotly Express

All we have to do is add a cumulative=True argument.

# Import libraries
import plotly.express as px
import pandas as pd

# Import dataset
golf = pd.read_csv('/usr/local/csvfiles/driving_distances.csv')

# Create the plot; cumulative=True accumulates the bin counts from left to right
fig = px.histogram(data_frame=golf,
                   x='avg_drive_distance',
                   cumulative=True)

# Show the plot
fig.show()

Cumulative histogram: Plotly graph objects

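The process is the same, except that this time we build a go.Histogram trace and pass cumulative_enabled=True. The example also sets histnorm='probability density' to normalize the bars.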

# Import libraries
import plotly.graph_objects as go
import pandas as pd

# Import dataset
golf = pd.read_csv('/usr/local/csvfiles/driving_distances.csv')

# Create the trace; cumulative_enabled=True turns on accumulation
trace = go.Histogram(x=golf['avg_drive_distance'],
                     histnorm='probability density',
                     cumulative_enabled=True)

# Add to the figure
fig = go.Figure(data=[trace])

# Show the plot
fig.show()

Empirical cumulative distribution function

An empirical cumulative distribution function (ECDF) is similar to a cumulative ...