Continuous vs. Categorical Bivariate Analysis: ECDF & Violin Plot
Extend your knowledge on bivariate analysis, learning how to create more plots to visualize a continuous variable against a categorical variable.
Bar chart
Here we explore the concept of a bar chart and where it is most useful.
A bar chart is a type of graph used to display and compare the frequency, total, or average value of each category in categorical data. It consists of rectangular bars, with the height or length of each bar representing the value for a specific category. Bar charts are used to compare values between categories and to visualize patterns, trends, and relationships in data.
Advantages | Disadvantages |
---|---|
Extremely easy to interpret | When the differences between categories are small, bar plots become difficult to interpret: the bars are too similar in size, so the difference in value between them is hard to distinguish |
Allows us to compare values within a categorical variable | Can be overly simplified, which can be viewed as both an advantage and a disadvantage |
Can be placed on the same figure as other visualizations, such as a scatterplot, for added interpretability | -- |
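As a rough illustration of the three quantities a bar chart typically displays (frequency, total, and average), the sketch below computes all three per category with pandas. The `toy` DataFrame and its values are hypothetical stand-ins; only the column names mirror the churn dataset used in this lesson.

```python
import pandas as pd

# Hypothetical toy data (not the real churn dataset); column names mirror the lesson
toy = pd.DataFrame({
    "Geography": ["France", "France", "Spain", "Germany", "Germany", "Germany"],
    "CreditScore": [600, 700, 650, 620, 640, 660],
})

# One row per category, with the three values a bar's height could represent
summary = (
    toy.groupby("Geography")["CreditScore"]
    .agg(frequency="count", total="sum", average="mean")
    .reset_index()
)
print(summary)
```

Any one of the `frequency`, `total`, or `average` columns could then be used as the bar height, depending on the question being asked.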
Credit scores by country
Suppose we want the average of the `CreditScore` variable per country.
We'll use the pandas `groupby` functionality to compute the average credit score per country and store the results in a DataFrame:

```python
gb = churn.groupby('Geography')['CreditScore'].mean().sort_values(ascending=True).reset_index()
```
This creates the following table:
Geography | CreditScore |
---|---|
France | 649.668329 |
Spain | 651.333872 |
Germany | 651.453567 |
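To make the chained call easier to follow, here is a minimal sketch that splits it into separate steps. It runs on a small hypothetical DataFrame (`toy`, with made-up values) rather than the real `churn` data, so the numbers differ from the table above.

```python
import pandas as pd

# Hypothetical stand-in for the churn dataset (made-up values)
toy = pd.DataFrame({
    "Geography": ["Spain", "France", "Germany", "France", "Germany"],
    "CreditScore": [650, 600, 700, 620, 680],
})

# Step 1: mean per country; the result is a Series indexed by Geography
means = toy.groupby("Geography")["CreditScore"].mean()

# Step 2: order the countries from lowest to highest average
ordered = means.sort_values(ascending=True)

# Step 3: move Geography from the index back into a regular column,
# giving a two-column DataFrame that plotting functions can consume
gb_toy = ordered.reset_index()
print(gb_toy)
```

The final `reset_index()` matters for plotting: it turns the country names into an ordinary column, so they can be referenced by name as the x-axis.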
Bar chart: Plotly Express
Let’s see what a Plotly Express bar chart looks like:
```python
# Import libraries
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
import numpy as np

# Import dataset
churn = pd.read_csv("/usr/local/csvfiles/churn.csv")

# Create dataset
gb = churn.groupby('Geography')['CreditScore'].mean().sort_values(ascending=True).reset_index()

# Plot data
fig = px.bar(data_frame=gb, x='Geography', y='CreditScore')

# Show the plot
fig.show()
```
Bar chart: Plotly graph objects
The process is just as simple with graph objects. Here we create the plot with the `Bar` class from the graph objects library, passing in our desired columns as `x` and `y`.
```python
# Import libraries
import plotly.express as px
import plotly.graph_objects as go
import pandas as pd
import numpy as np

# Import dataset
churn = pd.read_csv("/usr/local/csvfiles/churn.csv")

# Create dataset
gb = churn.groupby('Geography')['CreditScore'].mean().sort_values(ascending=True).reset_index()

# Create our bar chart
trace = go.Bar(x=gb['Geography'], y=gb['CreditScore'])

# Create go.Figure() object
fig = go.Figure(data=[trace])

# Make changes to layout
fig.update_layout(
    title='Average Credit Score by Country',
    xaxis_title='Country',
    yaxis_title='Average Credit Score'
)

# Show figure to the screen
fig.show()
```
Bivariate violin plot
Here we will create a violin plot with the customers’ estimated salary and group it per country. ...