How to create a box plot with Plotly Express in Python

A box plot, also known as a box-and-whisker plot, is a graphical summary of the distribution of a dataset. Plotly Express, the high-level interface of the Plotly visualization library for Python, provides a concise way to create box plots.

In a box plot, a rectangular box represents the interquartile range (IQR), which contains the middle 50% of the data. A line inside the box marks the median and divides the box into two parts. The whiskers, drawn as lines extending from the box, show the range of the data, excluding any outliers. Outliers are displayed as individual points beyond the whiskers.
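To make this anatomy concrete, here is a minimal sketch that computes the pieces of a box plot by hand with NumPy. It assumes the common 1.5 × IQR rule for deciding where the whiskers end and which points count as outliers, and the sample values are made up for illustration. Plotly's own quartile computation may differ slightly for small samples.

```python
import numpy as np

# Made-up sample with one unusually large value
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

# Quartiles: the box spans q1..q3, and the line inside it marks the median
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Fences under the assumed 1.5 * IQR convention
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme values that still lie inside the fences
inside = data[(data >= lower_fence) & (data <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Anything outside the fences is drawn as an individual outlier point
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"whiskers: {whisker_low} to {whisker_high}, outliers: {outliers.tolist()}")
```

For this sample, the value 30 lies above the upper fence, so it would be plotted as an outlier rather than included in the whisker.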

Features of box plots in Plotly Express

Key features of box plots created with Plotly Express include:

Easy and concise syntax: Plotly Express offers a high-level API with a simplified syntax, allowing us to create box plots with minimal code. We can quickly generate a basic box plot by specifying the dataset and the variables to plot.

Grouping and faceting: Plotly Express allows us to create grouped box plots, where multiple boxes are displayed side by side for different categories or groups within the data. We can easily specify the grouping variable and visualize the distribution for each group. Additionally, Plotly Express supports faceting, enabling us to create separate box plots for different subsets of the data.

Customizable appearance: We can customize various aspects of the box plot’s appearance using Plotly Express. Options include changing the color scheme, line styles, and marker styles for the boxes and whiskers. Additionally, we can add titles, axis labels, and legends to enhance the overall visual presentation.

Interactive features: Plotly Express creates interactive plots by default. This means we can hover over the plot to see specific data values, zoom in and out, pan the plot, and even save the plot as an interactive HTML file or an image. These interactive features enhance the exploration and analysis of the box plot.

Outlier handling: Plotly Express provides options to handle outliers in box plots. Depending on our preference, we can choose to display outliers as individual points beyond the whiskers or hide them altogether.

Integration with pandas DataFrames: Plotly Express integrates with pandas DataFrames, which makes it easy to create box plots from tabular data. We can pass a DataFrame directly to Plotly Express functions and specify the columns to plot.

Syntax

The box function syntax typically follows this structure:

import plotly.express as px
fig = px.box(data_frame, x='x_variable', y='y_variable')
Syntax of the box function

Parameters

When creating a box plot using Plotly Express, we can customize its appearance and behavior with various parameters. Here are some commonly used parameters:

  • data_frame: The DataFrame that contains the data to be plotted.

  • x: The column name or an array-like object that represents the variable to be plotted on the x-axis.

  • y: The column name or an array-like object that represents the variable to be plotted on the y-axis.

  • color: Specifies a column name or array-like object to group the box plots by different categories. Each category is represented by a different color.

  • title: Sets the title of the plot.

  • labels: A dictionary that maps column names to labels, allowing us to override the default axis labels.

  • notched: Boolean value that indicates whether to draw a notched box plot. The notches provide a rough estimate of the uncertainty around the median.

  • boxmode: Specifies how boxes that share a position are displayed. Options are 'group' (default), which places the boxes side by side, and 'overlay', which draws them on top of one another.

  • points: Specifies which individual data points are shown as markers on the box plot. Options are 'outliers' (default), 'suspectedoutliers', 'all', or False to hide the points.

  • hover_data: A list of column names or array-like objects that specify additional data to be shown when hovering over the plot.

  • template: Sets the template theme for the plot.

  • width and height: Set the width and height of the plot in pixels.

  • facet_col and facet_row: Column names or array-like objects for creating faceted box plots and splitting the data into subplots based on different values of these variables.

Return type

The px.box() function returns a Plotly figure object that can be displayed with fig.show(). The figure object contains all the information required to produce the box plot, including the data, layout, and style.

Implementation

In the following playground, we create a box plot using a sample dataset called tips provided by Plotly Express. The dataset contains information about restaurant tips. The attributes total_bill and day are defined as follows:

  • total_bill: This attribute represents the total bill amount for a given meal, including the cost of food, drinks, taxes, and any additional charges. It is a continuous numeric variable representing monetary values.

  • day: This attribute represents the day of the week when a meal took place at the restaurant. It is a categorical variable with four possible values: Thur (Thursday), Fri (Friday), Sat (Saturday), and Sun (Sunday).

Create a box plot of the tips dataset

Explanation

The code above is explained in detail below:

  • Lines 2–3: We import the required libraries for the code: plotly.express as px for creating the box plot, and pandas as pd for handling data in a DataFrame.

  • Line 6: We load a sample dataset called tips using the px.data.tips() function provided by Plotly Express. The dataset contains information about restaurant tips.

  • Line 9: We print the first five rows of the loaded dataset. The head() function retrieves the top rows of the DataFrame and print() displays the result in the console. It helps to quickly inspect the data and verify its structure.

  • Line 12: We create a box plot using Plotly Express. The px.box() function is called, and the DataFrame df is passed as the data source. The x='day' parameter specifies the column in the DataFrame to be plotted on the x-axis, the y='total_bill' parameter specifies the column to be plotted on the y-axis, and the color='smoker' parameter groups the box plots by the column smoker. The title parameter sets the title of the plot.

  • Line 15: We display the plot using the fig.show() method, which shows the interactive plot.

Conclusion

The box plot functionality provided by Plotly Express offers a convenient way to visualize and analyze data distributions. With just a few lines of code, we can create box plots that display the median, quartiles, and any outliers present in the data. Plotly Express also makes customization easy, including grouping the data by categorical variables, adding titles, and modifying colors, which makes its box plots a useful tool for gaining insight into the distribution and characteristics of a dataset.

Copyright ©2024 Educative, Inc. All rights reserved