Bokeh is a Python library used for creating interactive visualizations in a web browser. It provides powerful tools that offer flexibility, interactivity, and scalability for exploring various data insights.
A histogram is a graphical representation of a grouped frequency distribution with continuous classes. Its rectangles are adjacent because the base of each one spans the interval between two class boundaries.
Histograms are widely used in industry and research to examine how the values in a dataset are distributed.
import numpy as np
from bokeh.io import output_file, save
from bokeh.plotting import figure, show
numpy: To generate random data.
bokeh.io: To control the output and display of the plots. We specifically import the output_file and save functions from it.
bokeh.plotting: To create and customize plots without working directly with the lower-level Bokeh models. We specifically import the figure and show functions from it.
import numpy as np
from bokeh.io import output_file, save
from bokeh.plotting import figure, show

#specify range
rng = np.random.default_rng()
sampleData = rng.normal(loc=0, scale=1, size=500)

#create plot
myPlot = figure(width=670, height=400, toolbar_location=None,
                title="Impact of spice level on CRC")

# Histogram
bins = np.linspace(-4, 4, 30)
hist, edges = np.histogram(sampleData, density=True, bins=bins)
myPlot.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
            fill_color="purple", line_color="white",
            legend_label="500 random samples")

#labels
myPlot.y_range.start = 0
myPlot.xaxis.axis_label = "Spice level"
myPlot.yaxis.axis_label = "CRC"

output_file("output.html")
show(myPlot)
Lines 1–3: Import the necessary libraries and modules.
Line 6: Create a random number generator using the random.default_rng() function from numpy.
Line 7: Generate an array of 500 samples using normal(), passing the mean, standard deviation, and size as parameters. Note that, in this case, we are generating samples from a normal distribution.
Lines 10–11: Create myPlot using the figure() function and pass the specifications as parameters: the width, the height, the toolbar location, and the title of the plot.
Line 14: Specify the x-axis range and the bin edges using the linspace() function from numpy and assign the result to the bins variable. The 30 evenly spaced edges between -4 and 4 define 29 bins.
Line 15: Calculate the histogram using the histogram() function from numpy by passing the sample data, density, and bins as parameters. In this case, density is set to True to normalize the histogram so that its total area equals 1.
Lines 16–18: Draw the histogram bars using the quad() function, passing the top, bottom, left, and right coordinates of each bar along with the fill color, line color, and legend label. These styling options can be modified as needed.
Lines 21–23: Set the starting point of the y-axis range and assign the x-axis and y-axis labels to myPlot.
Lines 25–26: Set the output file to output.html, where the plot will be written, and use show() to save the plot and display it in a browser.
A histogram of normally distributed samples is written to output.html and displayed with 29 purple-filled bins (defined by 30 bin edges) over the range (-4, 4) on the x-axis, as specified in the code.
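The relationship between bin edges, bin counts, and the density normalization can be checked with numpy alone. The following is a minimal sketch, not part of the original example: the seed value 42 and the names sample_data, widths, and area are assumptions introduced here for illustration.

```python
import numpy as np

# A fixed seed (an arbitrary choice) makes the random samples reproducible.
rng = np.random.default_rng(seed=42)
sample_data = rng.normal(loc=0, scale=1, size=500)

# linspace returns 30 evenly spaced values, which numpy treats as bin edges.
bins = np.linspace(-4, 4, 30)
hist, edges = np.histogram(sample_data, density=True, bins=bins)

# With density=True, bar height times bar width sums to 1 over all bins.
widths = np.diff(edges)
area = float(np.sum(hist * widths))

print(len(edges), len(hist))  # 30 edges, 29 bins
print(round(area, 6))         # 1.0
```

Because there is always one more edge than there are bins, edges[:-1] and edges[1:] have the same length as hist, which is why they can be passed to quad() as the left and right sides of each bar.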
PDF stands for probability density function, which describes the probability distribution of a continuous random variable. A histogram approximates this distribution from sample data and makes properties such as spread and skewness visible. The example code above shows a simple histogram of random data generated using the normal() function.
Can we modify the histogram and add the probability density function to it?
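One possible way to do this, sketched under a few assumptions, is to evaluate the normal PDF on a fine grid with numpy and draw it over the bars using the figure's line() method. The seed, the variable names, the plot title, the orange line styling, and the file name histogram_with_pdf.html are illustrative choices, not part of the original example. The sketch uses save() rather than show() so that the file is written without opening a browser.

```python
import numpy as np
from bokeh.io import output_file, save
from bokeh.plotting import figure

# Assumed parameters: the same mean and standard deviation used for sampling.
mu, sigma = 0.0, 1.0

# A fixed seed (an arbitrary choice) makes the sketch reproducible.
rng = np.random.default_rng(seed=42)
sample_data = rng.normal(loc=mu, scale=sigma, size=500)

# Normalized histogram: 30 edges give 29 bins, and the bars have unit area.
bins = np.linspace(-4, 4, 30)
hist, edges = np.histogram(sample_data, density=True, bins=bins)

# Normal PDF evaluated on a fine grid over the same x-range.
x = np.linspace(-4, 4, 201)
pdf = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

plot = figure(width=670, height=400, toolbar_location=None,
              title="Histogram with normal PDF")
plot.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
          fill_color="purple", line_color="white",
          legend_label="500 random samples")
# Overlay the PDF as a line on the same axes.
plot.line(x, pdf, line_color="orange", line_width=3,
          legend_label="Normal PDF")

plot.y_range.start = 0
plot.xaxis.axis_label = "x"
plot.yaxis.axis_label = "Density"

# Write the plot to an HTML file; save() returns the path of the written file.
output_file("histogram_with_pdf.html")
saved_path = save(plot)
```

The overlay is only meaningful because the histogram was computed with density=True. Without that normalization, the bars would show raw counts and would not share a scale with the PDF curve.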