...

Continuous vs. Categorical Bivariate Analysis: ECDF & Violin Plot

Extend your knowledge of bivariate analysis by learning how to create more plots that visualize a continuous variable against a categorical variable.

Bar chart

Here we explore the concept of a bar chart and where it is most useful.

A bar chart is a type of graph used to display and compare the frequency, total, or average value for each category of a categorical variable. It consists of rectangular bars, where the height (or length, for horizontal bars) of each bar represents the value for one category. Bar charts are used to compare data across categories and to reveal patterns, trends, and relationships in the data.
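As a minimal sketch of the three kinds of values a bar can represent, the snippet below computes the frequency, total, and average per category with pandas. The small DataFrame and its column names (`region`, `sales`) are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical toy data: one row per transaction
df = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South", "North"],
    "sales": [10, 20, 30, 40, 50, 60],
})

# Frequency: how many rows fall into each category
frequency = df["region"].value_counts()

# Total: sum of the numeric column per category
total = df.groupby("region")["sales"].sum()

# Average: mean of the numeric column per category
average = df.groupby("region")["sales"].mean()

# Combine the three per-category series into one table (aligned on region)
summary = pd.DataFrame({"frequency": frequency, "total": total, "average": average})
print(summary)
```

Any one of these columns could serve as the bar heights, depending on which question the chart is meant to answer.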

Advantages:

- Extremely easy to interpret.
- Allow us to compare values across the categories of a categorical variable.
- Can be placed on the same figure as other visualizations, such as a scatterplot, for added interpretability.

Disadvantages:

- When the differences between categories are small, bar charts become difficult to interpret: the bars are too similar in size to distinguish the difference in value between them.
- Perhaps overly simplified. This can be viewed as both an advantage and a disadvantage.
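One common workaround for the small-differences problem is to narrow the value axis around the data. The helper below is a hypothetical sketch (the name `padded_range` and the 10% padding default are assumptions, not part of any library) that computes such a range; the result could then be passed to Plotly with `fig.update_yaxes(range=...)`. Keep in mind that an axis that does not start at zero visually exaggerates differences, so it should be labeled clearly.

```python
def padded_range(values, pad_frac=0.1):
    """Return [low, high] spanning the values plus a margin on each side.

    pad_frac is the margin as a fraction of the data's spread
    (assumed default: 10%).
    """
    lo, hi = min(values), max(values)
    spread = hi - lo
    if spread == 0:
        # All values equal: base the margin on the magnitude instead
        spread = abs(hi) or 1.0
    margin = spread * pad_frac
    return [lo - margin, hi + margin]


# Example with three nearly equal values
print(padded_range([100.0, 101.0, 102.0]))
```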

Credit scores by country

Suppose we wanted to get the average of the CreditScore variable per country.

We’ll use the pandas groupby functionality to compute the mean credit score for each country, sort the results in ascending order, and reset the index so that the result is a two-column DataFrame:

gb = churn.groupby('Geography')['CreditScore'].mean().sort_values(ascending=True).reset_index()

This creates the following table:

Geography CreditScore
France 649.668329
Spain 651.333872
Germany 651.453567
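To see what each method in the groupby chain contributes, here is a sketch that runs the same chain step by step on a tiny hypothetical DataFrame standing in for `churn`. The rows and scores are made up; only the column names match the dataset.

```python
import pandas as pd

# Hypothetical stand-in for the churn dataset, using the same column names
churn = pd.DataFrame({
    "Geography": ["France", "Spain", "Germany", "France", "Spain", "Germany"],
    "CreditScore": [600, 700, 650, 620, 680, 660],
})

# 1. groupby + mean: one average per country, indexed by Geography
means = churn.groupby("Geography")["CreditScore"].mean()

# 2. sort_values: order the averages from lowest to highest
sorted_means = means.sort_values(ascending=True)

# 3. reset_index: turn the Geography index back into a regular column,
#    giving a two-column DataFrame that plotting functions can use directly
gb = sorted_means.reset_index()
print(gb)
```

With these made-up scores, France averages 610, Germany 655, and Spain 690, so the rows come out in that order.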

Bar chart: Plotly Express

Let’s see what the bar charts of Plotly Express look like:

# Import libraries
import plotly.express as px
import pandas as pd
# Import dataset
churn = pd.read_csv("/usr/local/csvfiles/churn.csv")
# Average credit score per country, sorted from lowest to highest
gb = churn.groupby('Geography')['CreditScore'].mean().sort_values(ascending=True).reset_index()
# Plot one bar per country
fig = px.bar(data_frame=gb, x='Geography', y='CreditScore')
# Show the plot
fig.show()
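The averages in the table above differ by less than two points, a concrete instance of the small-differences drawback listed earlier: on a zero-based value axis, which Plotly uses by default for bar charts, the three bars look almost identical. A quick calculation using the values from the table quantifies the gap.

```python
# Average credit scores copied from the table above
averages = {
    "France": 649.668329,
    "Spain": 651.333872,
    "Germany": 651.453567,
}

lowest = min(averages.values())
highest = max(averages.values())

# Absolute and relative gap between the highest and lowest averages
gap = highest - lowest
relative_gap_pct = gap / lowest * 100

print(f"Gap: {gap:.2f} points ({relative_gap_pct:.2f}% of the lowest average)")
```

The gap works out to about 1.79 points, roughly 0.27% of the lowest average, which is far too small to see as a difference in bar height.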

Bar chart: Plotly graph objects

The process is just as simple with graph objects. We create a trace with the Bar class from the graph objects library, passing the desired columns as x and y, wrap it in a go.Figure, and then set the title and axis labels with update_layout.

# Import libraries
import plotly.graph_objects as go
import pandas as pd
# Import dataset
churn = pd.read_csv("/usr/local/csvfiles/churn.csv")
# Average credit score per country, sorted from lowest to highest
gb = churn.groupby('Geography')['CreditScore'].mean().sort_values(ascending=True).reset_index()
# Create a bar trace: countries on the x-axis, average scores on the y-axis
trace = go.Bar(x=gb['Geography'], y=gb['CreditScore'])
# Create the go.Figure() object from the trace
fig = go.Figure(data=[trace])
# Add a title and axis labels
fig.update_layout(title='Average Credit Score by Country',
                  xaxis_title='Country',
                  yaxis_title='Average Credit Score')
# Show the figure
fig.show()
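In both versions, Plotly draws categorical bars in the order the rows appear in `gb` by default, so the `sort_values(ascending=True)` call is what places them from lowest to highest. A minimal sketch, using the averages from the table above as a stand-in for `gb`, shows how sorting with `ascending=False` would reverse that order before plotting.

```python
import pandas as pd

# Stand-in for gb, built from the averages in the table above
gb = pd.DataFrame({
    "Geography": ["France", "Spain", "Germany"],
    "CreditScore": [649.668329, 651.333872, 651.453567],
})

# Descending order puts the highest average first,
# so its bar would be drawn leftmost
gb_desc = gb.sort_values("CreditScore", ascending=False).reset_index(drop=True)
print(gb_desc)
```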

Bivariate violin plot

Here we will create a violin plot of the customers’ estimated salary, grouped by country. ...