Python Bokeh box plot

Bokeh is a Python library used for creating interactive visualizations in a web browser. It provides powerful tools that offer flexibility, interactivity, and scalability for exploring various data insights.

What is a box plot?

A box plot summarizes a dataset visually using a small set of statistical measures. Together, these measures show the range, spread, and central tendency of the data, which gives a quick picture of how the data is distributed.

Statistical measures used in a box plot.

  • Upper extreme: The maximum value in the dataset, which marks the top of the data range.

  • Upper quartile: The third quartile, i.e., the value below which 75% of the data falls.

  • Median: The middle value that divides the dataset into two halves, i.e., 50% of the data is above it and 50% is below it.

  • Lower quartile: The first quartile, i.e., the value below which 25% of the data falls.

  • Lower extreme: The minimum value in the dataset, which marks the bottom of the data range.
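To make these five measures concrete, here is a minimal sketch that computes them for a small list of numbers using only Python's standard library. The sample values and the helper name five_number_summary are made up for illustration. Quartiles can be defined in several slightly different ways; this sketch assumes linear interpolation between data points.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # method="inclusive" treats the data as the whole population and
    # interpolates linearly between neighbouring data points.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


# Hypothetical sample values, not taken from any real dataset
sample = [9, 11, 13, 14, 15, 18, 21, 24, 28]
summary = five_number_summary(sample)
print(summary)  # (9, 13.0, 15.0, 21.0, 28)
```

The box in a box plot spans Q1 to Q3, and the line inside it marks the median.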

Real-life application

Box plots are widely used in industry and research centers to analyze the achieved outputs and results in various domains.

Real life applications.

Required imports

import pandas as pd
from bokeh.io import output_file, save
from bokeh.models import ColumnDataSource, Whisker
from bokeh.plotting import figure, show
from bokeh.sampledata.autompg2 import autompg2
from bokeh.transform import factor_cmap
  • pandas: To manipulate data.

  • bokeh.io: To control the output and display of the plots. We specifically import the output_file and save functions from it.

  • bokeh.models: To create highly customized visualizations in Bokeh. We specifically import the ColumnDataSource and Whisker classes from it.

  • bokeh.plotting: To create and customize plots without working directly with the lower-level Bokeh models. We specifically import the figure and show functions from it.

  • bokeh.sampledata: To access the sample datasets that ship with Bokeh and use them to test your code. autompg2 is one of these datasets; it contains information about various car models, including MPG, engine displacement, cylinders, and fuel consumption.

  • bokeh.transform: To transform the data by mapping it to visual properties such as colors, sizes, and positions. We specifically import the factor_cmap function from it.

Example code

import pandas as pd
from bokeh.io import output_file, save
from bokeh.models import ColumnDataSource, Whisker
from bokeh.plotting import figure, show
from bokeh.sampledata.autompg2 import autompg2
from bokeh.transform import factor_cmap

dataFrame = autompg2[["class", "cty"]].rename(columns={"class": "kind"})

kinds = dataFrame.kind.unique()

#compute quartiles
quartilesDF = dataFrame.groupby("kind").cty.quantile([0.25, 0.5, 0.75])
quartilesDF = quartilesDF.unstack().reset_index()
quartilesDF.columns = ["kind", "q1", "q2", "q3"]
dataFrame = pd.merge(dataFrame, quartilesDF, on="kind", how="left")

#compute IQR outlier bounds
iqr = dataFrame.q3 - dataFrame.q1
dataFrame["upper"] = dataFrame.q3 + 1.5*iqr
dataFrame["lower"] = dataFrame.q1 - 1.5*iqr

source = ColumnDataSource(dataFrame)

#create plot
myPlot = figure(x_range=kinds, tools="", toolbar_location=None,
           title="City driving MPG distribution by vehicle class",
           background_fill_color="#bbbfbf", y_axis_label="Fuel efficiency")

#outlier range
whisker = Whisker(base="kind", upper="upper", lower="lower", source=source)
whisker.upper_head.size = whisker.lower_head.size = 20
myPlot.add_layout(whisker)

#colour palette
cmap = factor_cmap("kind", "TolRainbow7", kinds)

#quartile boxes
myPlot.vbar("kind", 0.7, "q2", "q3", source=source, color=cmap, line_color="black")
myPlot.vbar("kind", 0.7, "q1", "q2", source=source, color=cmap, line_color="black")

# outliers
outliers = dataFrame[~dataFrame.cty.between(dataFrame.lower, dataFrame.upper)]
myPlot.scatter("kind", "cty", source=outliers, size=6, color="black", alpha=0.3)

output_file("output.html")
show(myPlot)
Creating box plot for city driving MPG distribution by vehicle class.

Code explanation

  • Lines 1–6: Import all the necessary libraries and modules.

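  • Line 8: Select the class and cty columns from the autompg2 dataset to create a new dataFrame, and use rename() to rename the class column to kind. The rename is not strictly necessary, but class is a reserved word in Python, so the column could not be accessed as dataFrame.class, whereas dataFrame.kind works.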

  • Line 10: Extract all the unique values from the kind column and assign the values to the kinds variable.

  • Line 13: Use groupby() to group the rows by the kind column and calculate the 25th, 50th, and 75th percentiles of the cty column for each group. The resulting pandas Series is assigned to quartilesDF.

  • Lines 14–15: Create separate columns for each quartile using unstack() and assign names to each column.

  • Line 16: Merge the data frames dataFrame and quartilesDF on the kind column using a left join.

  • Lines 19–21: Calculate the interquartile range, i.e., the difference between the 75th and 25th percentiles, and assign it to the iqr variable. Then save the upper and lower bounds in new dataFrame columns.

Note: We multiply iqr by 1.5 because this is a widely accepted convention for setting the bounds beyond which data points are treated as outliers.

  • Line 23: Create a ColumnDataSource object and assign the dataFrame to it so the data can be provided to the plot.

  • Lines 26–28: Create myPlot using the figure() function and pass all the specifications as parameters. Set x_range to kinds so that each vehicle class becomes a category on the x-axis, hide the toolbar, and specify the title, y-axis label, and background color for the plot.

  • Line 31: Create a whisker object using Whisker() and pass the base, upper, and lower as parameters.

  • Line 32: Set the size of upper_head and lower_head, which controls how large the caps at the two ends of each whisker are drawn.

  • Line 33: Add the whisker plot to the myPlot figure using the add_layout() method.

  • Line 36: Use the factor_cmap() function to map each unique value in the kind column to a color from the TolRainbow7 palette, and assign the mapping to cmap.

  • Lines 39–40: Create the quartile boxes on myPlot using the vbar() function and pass the column name, bar width, quartiles, source, and color mapping as parameters. The first call draws the upper half of each box, from the median to the third quartile, and the second call draws the lower half, from the first quartile to the median.

  • Line 43: Identify the outlying rows from the dataFrame where the cty column values are not between the upper and lower bound and assign them to outliers.

  • Line 44: Create the scattered points for the outliers using scatter() and pass the column, source, size, color, and transparency as parameters.

  • Lines 46–47: Use output_file() to set output.html as the file the plot is written to, and call show() to save the plot and open it in the browser.
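The data preparation steps above (lines 13–21 and line 43 of the example) can be tried in isolation on a tiny data frame. The following is a minimal sketch that assumes a made-up toy dataset with two groups, a and b, in place of autompg2; the variable names toy, quartiles, merged, and outliers are hypothetical.

```python
import pandas as pd

# Hypothetical toy data standing in for autompg2[["class", "cty"]]
toy = pd.DataFrame({
    "kind": ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"],
    "cty": [10, 12, 14, 16, 40, 20, 21, 22, 23, 24],
})

# One row per group with the three quartiles as columns
quartiles = toy.groupby("kind").cty.quantile([0.25, 0.5, 0.75])
quartiles = quartiles.unstack().reset_index()
quartiles.columns = ["kind", "q1", "q2", "q3"]

# Attach each group's quartiles to every row of that group
merged = pd.merge(toy, quartiles, on="kind", how="left")

# Bounds at 1.5 times the interquartile range beyond the box
iqr = merged.q3 - merged.q1
merged["upper"] = merged.q3 + 1.5 * iqr
merged["lower"] = merged.q1 - 1.5 * iqr

# Rows whose value falls outside the bounds of their own group
outliers = merged[~merged.cty.between(merged.lower, merged.upper)]
print(quartiles)
print(outliers[["kind", "cty"]])
```

In this toy data, group a has quartiles 12, 14, and 16, so its bounds are 6 and 22, and the value 40 is the only row reported as an outlier.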

Code output

The box plot is written to output.html, with boxes colored from the TolRainbow7 palette, a #bbbfbf background, and the whiskers and labels specified in the code.

Output for the box plot for city driving MPG distribution by vehicle class.

Common Query

Question

Can we modify the visual appearance of the plot?
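As one possible illustration, the sketch below changes a few visual properties of a Bokeh figure after it has been created. It is not taken from the example above: the category names, colors, and font sizes are arbitrary choices made for illustration, and the same property assignments could be applied to myPlot instead.

```python
from bokeh.plotting import figure

# Hypothetical categories, standing in for the vehicle classes
kinds = ["compact", "suv"]
plot = figure(x_range=kinds, title="Styled box plot canvas",
              toolbar_location=None)

# Title and background styling
plot.title.text_font_size = "16px"
plot.background_fill_color = "#eaefef"

# Hide the vertical grid lines, which add little for categorical axes
plot.xgrid.grid_line_color = None

# Axis label text and font sizes
plot.yaxis.axis_label = "City MPG"
plot.axis.major_label_text_font_size = "14px"
plot.axis.axis_label_text_font_size = "12px"
```

Glyph-level properties such as color, line_color, and alpha can also be changed directly in the vbar() and scatter() calls.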


Copyright ©2025 Educative, Inc. All rights reserved