Data visualization is a fundamental aspect of data analysis and interpretation, enabling us to uncover patterns, trends, and relationships within complex datasets. Python offers a plethora of powerful libraries for creating visualizations, among which Matplotlib and seaborn stand out as popular choices. In this blog, we'll compare and contrast these two titans of data visualization.
import matplotlib.pyplot as plt
import seaborn as sns
Matplotlib, often referred to as the “grandfather” of Python plotting libraries, has been a cornerstone of the Python data science ecosystem for over two decades. It provides a comprehensive toolkit for creating a wide range of static, interactive, and publication-quality plots. From simple line plots to complex multi-panel visualizations, Matplotlib offers unparalleled flexibility and customization options, making it a favorite among data scientists and researchers worldwide.
import numpy as np

# Basic Matplotlib line plot
x = np.linspace(0, 10, 100)
y = np.cos(x)
plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('cos(x)')
plt.title('Basic Matplotlib Line Plot')
On the other hand, seaborn is a high-level data visualization library built on top of Matplotlib, with a focus on statistical plotting and aesthetics. Introduced in 2012, seaborn quickly gained popularity for its intuitive interface, attractive default styles, and specialized functions for creating complex statistical plots. By abstracting away much of the boilerplate code required for common visualization tasks, seaborn enables users to create informative plots with minimal effort.
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Basic seaborn scatter plot
df = pd.DataFrame({
    'x': np.random.randn(100),
    'y': np.random.randn(100),
    'category': np.random.choice(['A', 'B', 'C'], 100)  # Three categories
})

# Plotting with seaborn
sns.scatterplot(data=df, x='x', y='y', hue='category', palette='Set2', alpha=0.7)
plt.title('Seaborn Scatter Plot with Different Colors for Categories')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend(title='Category')
plt.grid(True)
plt.savefig('output/graph.png')
The above code gives the following output:
In this blog, we’ll explore the key differences between Matplotlib and seaborn, their respective strengths, weaknesses, and use cases. We’ll compare their syntax, default aesthetics, plot types, customization options, and integration with other Python libraries such as pandas. Whether we’re seasoned data scientists or novice Python enthusiasts, this comparison will help us choose the right tool for our data visualization needs.
So, let’s start creating compelling visual narratives with Matplotlib and seaborn. Whether we’re crafting exploratory plots for data analysis or producing polished visualizations for presentations and reports, understanding the nuances of Matplotlib and seaborn will enhance our data visualization skills. Let’s dive in!
Here are some things that Matplotlib can do that seaborn does not offer.
Fine-grained control over customization
Matplotlib allows us to customize every aspect of a plot, from the size and shape of markers and lines to text color, transparency, and spacing. While seaborn provides good defaults and some level of customization, Matplotlib offers more control over these details.
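As a minimal sketch of this kind of control, the snippet below sets the line style, marker shape, colors, transparency, and label spacing explicitly. The specific values (colors, sizes, padding) are arbitrary choices made for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 20)
y = np.cos(x)

fig, ax = plt.subplots(figsize=(6, 4))

# Every visual property of the line and its markers is set explicitly
(line,) = ax.plot(
    x, y,
    color="tab:purple",
    linestyle="--",
    linewidth=1.5,
    marker="D",                  # diamond markers
    markersize=6,
    markerfacecolor="white",
    markeredgecolor="tab:purple",
    alpha=0.8,                   # transparency of the whole artist
)

# Text color, font size, and spacing are adjustable as well
ax.set_title("Customized line plot", color="dimgray", fontsize=14, pad=12)
ax.set_xlabel("x", labelpad=8)
ax.set_ylabel("cos(x)", labelpad=8)
ax.tick_params(axis="both", labelsize=9, colors="dimgray")
fig.subplots_adjust(left=0.15, bottom=0.18)
```

From here, the figure can be displayed with plt.show() or written to disk with fig.savefig().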
Complex multiple axes layouts
Matplotlib supports complex layouts with multiple axes arranged in grids, subplots, or arbitrary arrangements. We can create grids of plots with different sizes and shapes, and control the spacing between them.
import matplotlib.pyplot as plt
import numpy as np

# Create some sample data
x = np.linspace(0, 2*np.pi, 400)
y1 = np.sin(x)
y2 = np.cos(x)
y3 = np.tan(x)

# Create a figure and a grid of subplots with different sizes
fig, axs = plt.subplots(2, 2, figsize=(10, 8),
                        gridspec_kw={'width_ratios': [2, 1], 'height_ratios': [1, 2]})

# Plot the first subplot in the first row
axs[0, 0].plot(x, y1, color='r')
axs[0, 0].set_title('Sine Function')

# Plot the second subplot in the first row
axs[0, 1].plot(x, y2, color='g')
axs[0, 1].set_title('Cosine Function')

# Plot the third subplot in the second row
axs[1, 0].plot(x, y3, color='b')
axs[1, 0].set_title('Tangent Function')

# Remove the empty subplot in the second row and second column
fig.delaxes(axs[1, 1])

# Adjust spacing between subplots
plt.tight_layout(pad=3.0)

# Add a main title to the entire figure
fig.suptitle('Grid of Trigonometric Functions with Different Sizes', fontsize=16)
Low-level drawing primitives
Matplotlib allows us to draw directly onto a plot using low-level drawing primitives such as lines, polygons, and patches. This can be useful for creating custom annotations or highlighting specific regions of a plot.
import matplotlib.pyplot as plt
import numpy as np

# Create some sample data
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Create a plot
plt.figure(figsize=(8, 6))
plt.plot(x, y, label='Sine Curve')

# Draw a line segment
plt.gca().add_line(plt.Line2D([2, 4], [0.5, 0.5], color='red', linewidth=2))

# Draw a rectangle
rectangle = plt.Rectangle((6, -0.5), 2, 1, edgecolor='blue', facecolor='none', linewidth=2)
plt.gca().add_patch(rectangle)

# Draw a circle
circle = plt.Circle((8, 0), 0.5, color='green', fill=False, linewidth=2)
plt.gca().add_patch(circle)

# Add text annotation
plt.text(1, 0.7, 'Line Segment', fontsize=12, color='red')
plt.text(6.5, -1, 'Rectangle', fontsize=12, color='blue')
plt.text(8, 0.7, 'Circle', fontsize=12, color='green')

# Add legend
plt.legend()

# Add title and labels
plt.title('Custom Annotations using Low-Level Drawing Primitives', fontsize=16)
plt.xlabel('X-axis', fontsize=14)
plt.ylabel('Y-axis', fontsize=14)

# Set plot limits
plt.xlim(0, 10)
plt.ylim(-1.5, 1.5)

# Show the plot
plt.grid(True)
Custom 3D plotting
Matplotlib has built-in support for creating 3D plots, including scatter plots, surface plots, and wireframes. While seaborn focuses on 2D statistical visualizations, Matplotlib provides tools for visualizing data in three dimensions.
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np

# Generate random data
np.random.seed(0)
n = 100
x = np.random.standard_normal(n)
y = np.random.standard_normal(n)
z = np.random.standard_normal(n)

# Create a 3D scatter plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

# Scatter plot
ax.scatter(x, y, z, c='b', marker='o')

# Set labels and title
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')
ax.set_title('3D Scatter Plot')
Animation
Matplotlib includes support for creating animated visualizations using the animation module. We can create animations of changing data over time or animate transitions between different views of the data.
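A minimal sketch of the animation module, assuming we want a sine wave that shifts its phase on every frame. The frame count, interval, and the commented-out output filename are illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

x = np.linspace(0, 2 * np.pi, 200)

fig, ax = plt.subplots()
(line,) = ax.plot(x, np.sin(x))
ax.set_ylim(-1.1, 1.1)
ax.set_title("Traveling sine wave")

def update(frame):
    """Shift the wave's phase a little on every frame."""
    line.set_ydata(np.sin(x + 0.1 * frame))
    return (line,)

# Keep a reference to the animation so it is not garbage-collected
anim = FuncAnimation(fig, update, frames=60, interval=50, blit=True)

# Optional: write the animation to a GIF ("wave.gif" is a placeholder name)
# anim.save("wave.gif", writer="pillow")
```

FuncAnimation calls update() once per frame, so all of the per-frame logic lives in that one function, and the rest of the figure is set up as usual.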
Advanced text and annotation features
Matplotlib offers advanced text handling capabilities, including support for LaTeX rendering, text rotation, and text alignment. We can also create custom annotations with arrows, shapes, and text, and position them precisely on the plot.
import matplotlib.pyplot as plt

# Create a figure and axis
fig, ax = plt.subplots()

# Plot some data
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
ax.plot(x, y, 'b-', label='Data')

# Add text with LaTeX rendering
ax.text(2, 5, r'$y = mx + c$', fontsize=12, color='green', verticalalignment='bottom')

# Add rotated text
ax.text(4, 7, 'Rotated Text', fontsize=10, color='red', rotation=45)

# Add annotation with arrow
ax.annotate('Maximum', xy=(5, 11), xytext=(4, 10),
            arrowprops=dict(facecolor='orange', shrink=0.05))

# Add rectangle annotation
rect = plt.Rectangle((1.9, 4.3), 1, 2, edgecolor='purple', facecolor='none')
ax.add_patch(rect)

# Set plot title and legend
ax.set_title('Advanced Text and Annotation Features')
ax.legend()
Low-level image manipulation
Matplotlib provides functions for working with images, including loading image files, displaying images in plots, and manipulating image data directly. This can be useful for tasks such as image processing, computer vision, or visualizing image-based data.
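As a rough illustration, the sketch below builds a synthetic grayscale image as a NumPy array, manipulates the pixel data directly, and displays the results with imshow(). The synthetic image stands in for a real file, and the path in the final comment is a placeholder.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic 100x100 grayscale "image": an off-center smooth bump
yy, xx = np.mgrid[-1:1:100j, -1:1:100j]
image = np.exp(-((xx - 0.3) ** 2 + yy ** 2) * 4)

# Manipulate pixel data directly: flip horizontally and threshold
flipped = image[:, ::-1]
mask = image > 0.5

fig, axs = plt.subplots(1, 3, figsize=(9, 3))
axs[0].imshow(image, cmap="gray")
axs[0].set_title("Original")
axs[1].imshow(flipped, cmap="viridis")
axs[1].set_title("Flipped, recolored")
axs[2].imshow(mask, cmap="gray")
axs[2].set_title("Thresholded")
for ax in axs:
    ax.axis("off")

# For a file on disk, plt.imread("photo.png") returns a similar array
# ("photo.png" is a placeholder path)
```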
Seaborn has strengths and weaknesses of its own relative to Matplotlib. The sections below compare the two libraries area by area.
Ease of use
Because of its more complex customization options and low-level interface, Matplotlib has a longer learning curve despite its tremendous capabilities.
With its high-level interface, clearer syntax, and more intuitive functionalities, seaborn’s user-friendly design makes it simpler for novices to swiftly construct eye-catching plots.
Color palettes
Choosing visually appealing colors for plots is made easier with seaborn’s collection of built-in color palettes that are optimized for various data types and plot styles. Some of them include:
Cubehelix palette: Smooth gradient of colors with steadily changing brightness, from dark to light
Husl palette: Evenly spaced hues designed for perceptual uniformity
xkcd palette: Utilizes named colors from the xkcd color survey
Color Brewer palettes: Collection of color schemes sourced from the Color Brewer tool
To set a palette in seaborn, we can use the set_palette() function as follows:
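# Cubehelix palette
sns.set_palette("cubehelix")

# Husl palette
sns.set_palette("husl")

# xkcd palette: built from named xkcd colors, since "xkcd" itself is not a palette name
sns.set_palette(sns.xkcd_palette(["windows blue", "amber", "faded green"]))

# Color Brewer palette (for example, the qualitative "Set2" scheme)
sns.set_palette("Set2")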
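To inspect a palette before applying it, a small sketch like the following can help. It uses seaborn's color_palette() and xkcd_palette() functions, and the palette sizes and color names are arbitrary choices for illustration.

```python
import seaborn as sns

# Each call returns a list-like palette of RGB tuples with values in [0, 1]
husl = sns.color_palette("husl", 4)        # four evenly spaced hues
cube = sns.color_palette("cubehelix", 4)
brewer = sns.color_palette("Set2", 4)      # a Color Brewer qualitative scheme
named = sns.xkcd_palette(["windows blue", "amber"])

# Hex codes are handy for reuse in other tools
print(husl.as_hex())
```

Any of these palette objects can also be passed directly to set_palette() or to a plotting function's palette argument.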
Integration with pandas data structures
pandas data structures can be used with Matplotlib. However, some plot types might require additional manual data processing. Seaborn integrates smoothly with pandas DataFrames, making it simple to visualize data straight from pandas objects.
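To make the difference concrete, here is a hedged side-by-side sketch that draws the same grouped scatter plot twice. The DataFrame and its column names (x, y, category) are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.normal(size=90),
    "y": rng.normal(size=90),
    "category": np.repeat(["A", "B", "C"], 30),
})

fig, (ax_mpl, ax_sns) = plt.subplots(1, 2, figsize=(10, 4))

# Matplotlib: split the DataFrame by group and plot each piece ourselves
for name, group in df.groupby("category"):
    ax_mpl.scatter(group["x"], group["y"], label=name, alpha=0.7)
ax_mpl.legend(title="category")
ax_mpl.set_title("Matplotlib: manual grouping")

# seaborn: pass the DataFrame and name the columns
sns.scatterplot(data=df, x="x", y="y", hue="category", alpha=0.7, ax=ax_sns)
ax_sns.set_title("seaborn: hue='category'")
```

Both panels show the same data. The Matplotlib version needs an explicit groupby loop, while seaborn handles the grouping, colors, and legend from the hue argument.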
Distribution plots
While Matplotlib can create simple distribution plots such as density plots and histograms, users might still have to compute and plot distributions by writing some code. Data distribution visualization is made easier using seaborn’s dedicated functions for generating distribution plots, including rug plots, kernel density estimates (KDEs), and histograms.
Below, we see an example of a KDE plot. We’re using the built-in penguins dataset for our example.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_theme()  # set default theme
peng_df = sns.load_dataset('penguins')

# fill=True replaces the deprecated shade=True
ax = sns.kdeplot(x='body_mass_g', hue='species', fill=True, multiple='stack', data=peng_df)

# Reposition seaborn's own legend; a bare plt.legend() call can drop the hue entries
sns.move_legend(ax, 'lower right')
plt.savefig('output/graph.png')
The above code generates an image like this:
For a histogram, we can use the histplot() function:
sns.histplot(x='body_mass_g', hue='species', multiple='stack', data=peng_df)
We get an illustration like this:
Pairplots
A pairplot displays pairwise relationships between a dataset’s variables and plots all numeric variables by default. It’s essentially a grid of subplots, where each off-diagonal subplot displays a scatterplot of a pair of variables from the dataset, and the diagonal shows a histogram or kernel density estimate (KDE) of each variable. Creating a pairplot with Matplotlib usually involves manually iterating over the dimensions of the data. Seaborn offers a pairplot() function that builds the whole grid in a single call.
Here’s a sample of code that uses the pairplot() function.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_theme()  # set default theme
peng_df = sns.load_dataset('penguins')

# Create pair plot
sns.pairplot(data=peng_df, dropna=True)
plt.savefig('output/graph.png')
The above code generates an image as seen below. Note that columns of datasets are plotted against each other.
Jointplots
Seaborn provides the jointplot() function, which combines scatter plots, histograms, and kernel density estimates to visualize a joint distribution along with its marginal distributions. In Matplotlib, we have to assemble such a plot ourselves.
Let’s look at an example of the jointplot() function being used. Once again, we employ the penguins dataset.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_theme()  # Set default theme
penguins = sns.load_dataset('penguins')

# Create a joint plot
sns.jointplot(x="flipper_length_mm", y="body_mass_g", data=penguins, kind="scatter")
plt.savefig('output/graph.png')
The above code generates an image like this:
Matrix plots
To construct informative matrix plots with built-in color mapping and grouping, seaborn offers functions such as heatmap() and clustermap(). Heatmaps are a popular tool for showing how variables are correlated.
In this instance, we generate a heatmap to display the correlation matrix of the penguins dataset by passing penguins.corr(numeric_only=True) to the sns.heatmap() function. The numeric_only=True argument skips the text columns (species, island, and sex), which pandas 2.0 and later would otherwise raise an error on.
Note: To ensure that the complete column names are visible in the plot, we customize the font size using the sns.set(font_scale=0.7) function call.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_theme()  # set default theme
penguins = sns.load_dataset('penguins')
sns.set(font_scale=0.7)

# Correlate only the numeric columns; the dataset also contains text columns
sns.heatmap(penguins.corr(numeric_only=True))
plt.savefig('output/graph.png')
The resulting heatmap is displayed below:
Styling
Although Matplotlib has a large number of stylistic options, it might need more manual settings. Seaborn has built-in themes and styles that are simple to apply to plots, providing a rapid way to alter the overall look.
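A brief sketch of the two approaches, assuming seaborn's "whitegrid" style and Matplotlib's bundled "ggplot" style sheet as examples. The line-width override is an arbitrary manual tweak added for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

x = np.linspace(0, 10, 100)

# seaborn: one call switches the theme for all subsequent plots
sns.set_theme(style="whitegrid", context="talk")
fig1, ax1 = plt.subplots()
ax1.plot(x, np.sin(x))
ax1.set_title("seaborn 'whitegrid' theme")

# Matplotlib: apply a built-in style sheet, then adjust individual rcParams by hand
with plt.style.context("ggplot"):
    plt.rcParams["lines.linewidth"] = 2.5  # one manual override among many possible
    fig2, ax2 = plt.subplots()
    ax2.plot(x, np.cos(x))
    ax2.set_title("Matplotlib 'ggplot' style sheet")
```

The context manager keeps the Matplotlib style change local to the second figure, so the seaborn theme set earlier is still active afterwards.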
Several factors go into choosing between the two libraries, but one of the most important is how much we want to customize our graphs. If we need fine-grained control over every element of a figure, Matplotlib is a good choice because it exposes more low-level capabilities. Seaborn, on the other hand, is the better option if we want polished statistical graphs with minimal code. Ultimately, the choice is ours: we should try both libraries and see which one we’re more comfortable using. Happy plotting!
Don’t stop here! We recommend you take a look at the following courses at Educative for further guidance and capabilities of the two libraries discussed in this blog:
Data Visualization and Analysis With Seaborn Library
This course aims to provide an introduction to data visualization and analysis using Python and the Seaborn library. The course begins by introducing various variable types and statistical analysis methods. Then, you get to review the foundations of data cleaning and extraction using the pandas library. In the second half of the course, you will go over different plots in Seaborn for numerical, continuous, and categorical data, as well as distribution and regression plots to gain insightful information and identify patterns in the data. Lastly, you get to learn to create complex visualizations that are also aesthetically pleasing and go into great detail about the Seaborn themes, color palettes, styling, and multiplot grids. By the end of this course, you’ll apply the knowledge you’ve gained with a hands-on project.
Matplotlib for Python: Visually Represent Data with Plots
For data science, Matplotlib is one of the most popular tools for representing data in a visual manner. There are many other tools, but for the Python user, Matplotlib is a must-know. In this course, you will learn how to visually represent data in several different ways. You will learn how to use figures and axes to plot a chart, as well as how to plot from multiple types of objects and modules. You will also discover ways to control the spine of an axes and how to create complex layouts for a figure using GridSpec so you can create visually stunning charts. In the latter half of the course, you will focus on how to draw various types of plots, whether it be a line plot, a stem plot, or a heatmap plot. Overall, this is your no-fuss introduction to creating impactful data charts. By the end, you will have an important new skill to add to your resume. As any data scientist knows, it is necessary that you be able to show insights found from analyzing data.