
Exploring data visualization: Matplotlib vs. seaborn

Kamran Lodhi
Apr 19, 2024

Data visualization is a fundamental aspect of data analysis and interpretation, enabling us to uncover patterns, trends, and relationships within complex datasets. Python offers a plethora of powerful libraries for creating visualizations, among which Matplotlib and seaborn stand out as popular choices. In this blog, we'll compare and contrast these two titans of data visualization.

import matplotlib.pyplot as plt
import seaborn as sns

Matplotlib, often referred to as the “grandfather” of Python plotting libraries, has been a cornerstone of the Python data science ecosystem for over two decades. It provides a comprehensive toolkit for creating a wide range of static, interactive, and publication-quality plots. From simple line plots to complex multi-panel visualizations, Matplotlib offers unparalleled flexibility and customization options, making it a favorite among data scientists and researchers worldwide.

import numpy as np
# Basic Matplotlib Line Plot
x = np.linspace(0, 10, 100)
y = np.cos(x)
plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('cos(x)')
plt.title('Basic Matplotlib Line Plot')

On the other hand, seaborn is a high-level data visualization library built on top of Matplotlib, with a focus on statistical plotting and aesthetics. Introduced in 2012, seaborn quickly gained popularity for its intuitive interface, attractive default styles, and specialized functions for creating complex statistical plots. By abstracting away much of the boilerplate code required for common visualization tasks, seaborn enables users to create informative plots with minimal effort.

import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Basic seaborn scatter plot
df = pd.DataFrame({
    'x': np.random.randn(100),
    'y': np.random.randn(100),
    'category': np.random.choice(['A', 'B', 'C'], 100)  # Three categories
})
# Plotting with seaborn
sns.scatterplot(data=df, x='x', y='y', hue='category', palette='Set2', alpha=0.7)
plt.title('Seaborn Scatter Plot with Different Colors for Categories')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend(title='Category')
plt.grid(True)
plt.savefig('output/graph.png')

The above code gives the following output:

A basic seaborn scatter plot

In this blog, we’ll explore the key differences between Matplotlib and seaborn, along with their respective strengths, weaknesses, and use cases. We’ll compare their syntax, default aesthetics, plot types, customization options, and integration with other Python libraries such as pandas. Whether we’re seasoned data scientists or novice Python enthusiasts, this comparison will help us choose the right tool for our data visualization needs.

Matplotlib vs. seaborn

So, let’s start creating compelling visual narratives with Matplotlib and seaborn. Whether we’re crafting exploratory plots for data analysis or producing polished visualizations for presentations and reports, understanding the nuances of both libraries will enhance our data visualization skills. Let’s dive in!

Unique capabilities of Matplotlib

Here are some things that Matplotlib can do that seaborn does not offer.

Fine-grained control over customization

Matplotlib allows us to customize every aspect of a plot, from the size and shape of markers and the style of lines to the color of text, transparency, and spacing. While seaborn provides good defaults and some level of customization, Matplotlib offers more control over these details.
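
As a minimal sketch of this kind of control, the snippet below sets the line style, marker shape, transparency, text colors, spines, and margins explicitly. The data and all styling values are illustrative choices, not recommendations.

```python
import matplotlib.pyplot as plt
import numpy as np

# Illustrative sample data
x = np.linspace(0, 10, 20)
y = np.sin(x)

fig, ax = plt.subplots(figsize=(6, 4))
# Control the line style, marker shape, marker size, and transparency explicitly
(line,) = ax.plot(x, y, linestyle='--', linewidth=1.5, color='tab:blue',
                  marker='s', markersize=6, markerfacecolor='white', alpha=0.8)
# Style the text elements individually
ax.set_title('Customized line plot', color='darkred', fontsize=14, loc='left')
ax.set_xlabel('x', labelpad=10)
ax.tick_params(axis='y', labelcolor='gray', length=8)
# Hide two spines and adjust the padding around the data
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.margins(x=0.05, y=0.2)
```

Each of these settings has a sensible default, so in practice we only override the ones that matter for a given figure.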

Complex multiple axes layouts

Matplotlib supports complex layouts with multiple axes arranged in grids, subplots, or arbitrary arrangements. We can create grids of plots with different sizes and shapes, and control the spacing between them.

import matplotlib.pyplot as plt
import numpy as np
# Create some sample data
x = np.linspace(0, 2*np.pi, 400)
y1 = np.sin(x)
y2 = np.cos(x)
y3 = np.tan(x)
# Create a figure and a grid of subplots with different sizes
fig, axs = plt.subplots(2, 2, figsize=(10, 8), gridspec_kw={'width_ratios': [2, 1], 'height_ratios': [1, 2]})
# Plot the first subplot in the first row
axs[0, 0].plot(x, y1, color='r')
axs[0, 0].set_title('Sine Function')
# Plot the second subplot in the first row
axs[0, 1].plot(x, y2, color='g')
axs[0, 1].set_title('Cosine Function')
# Plot the third subplot in the second row
axs[1, 0].plot(x, y3, color='b')
axs[1, 0].set_title('Tangent Function')
# Remove the empty subplot in the second row and second column
fig.delaxes(axs[1, 1])
# Add a main title to the entire figure
fig.suptitle('Grid of Trigonometric Functions with Different Sizes', fontsize=16)
# Adjust spacing between subplots (called after suptitle so the title is taken into account)
plt.tight_layout(pad=3.0)

Low-level drawing primitives

Matplotlib allows us to draw directly onto a plot using low-level drawing primitives such as lines, polygons, and patches. This can be useful for creating custom annotations or highlighting specific regions of a plot.

import matplotlib.pyplot as plt
import numpy as np
# Create some sample data
x = np.linspace(0, 10, 100)
y = np.sin(x)
# Create a plot
plt.figure(figsize=(8, 6))
plt.plot(x, y, label='Sine Curve')
# Draw a line segment
plt.gca().add_line(plt.Line2D([2, 4], [0.5, 0.5], color='red', linewidth=2))
# Draw a rectangle
rectangle = plt.Rectangle((6, -0.5), 2, 1, edgecolor='blue', facecolor='none', linewidth=2)
plt.gca().add_patch(rectangle)
# Draw a circle
circle = plt.Circle((8, 0), 0.5, color='green', fill=False, linewidth=2)
plt.gca().add_patch(circle)
# Add text annotation
plt.text(1, 0.7, 'Line Segment', fontsize=12, color='red')
plt.text(6.5, -1, 'Rectangle', fontsize=12, color='blue')
plt.text(8, 0.7, 'Circle', fontsize=12, color='green')
# Add legend
plt.legend()
# Add title and labels
plt.title('Custom Annotations using Low-Level Drawing Primitives', fontsize=16)
plt.xlabel('X-axis', fontsize=14)
plt.ylabel('Y-axis', fontsize=14)
# Set plot limits
plt.xlim(0, 10)
plt.ylim(-1.5, 1.5)
# Add grid lines
plt.grid(True)

Custom 3D plotting

Matplotlib has built-in support for creating 3D plots, including scatter plots, surface plots, and wireframes. While seaborn focuses on 2D statistical visualizations, Matplotlib provides tools for visualizing data in three dimensions.

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
# Generate random data
np.random.seed(0)
n = 100
x = np.random.standard_normal(n)
y = np.random.standard_normal(n)
z = np.random.standard_normal(n)
# Create a 3D scatter plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# Scatter plot
ax.scatter(x, y, z, c='b', marker='o')
# Set labels and title
ax.set_xlabel('X Label')
ax.set_ylabel('Y Label')
ax.set_zlabel('Z Label')
ax.set_title('3D Scatter Plot')

Animation

Matplotlib includes support for creating animated visualizations using the animation module. We can create animations of changing data over time or animate transitions between different views of the data.
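
A minimal sketch of an animation using FuncAnimation from the animation module is shown below. The sine wave data, the frame count, and the output file name in the final comment are assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

x = np.linspace(0, 2 * np.pi, 200)

fig, ax = plt.subplots()
(line,) = ax.plot(x, np.sin(x))
ax.set_ylim(-1.1, 1.1)

def update(frame):
    # Shift the phase of the sine wave a little on every frame
    line.set_ydata(np.sin(x + 2 * np.pi * frame / 60))
    return (line,)

# 60 frames, 50 ms apart; blit=True redraws only the artists returned by update()
anim = FuncAnimation(fig, update, frames=60, interval=50, blit=True)

# To save the animation as a GIF (requires Pillow; the file name is a placeholder):
# anim.save('sine_wave.gif', writer='pillow')
```

The animation object has to stay referenced (here, in anim) for as long as the animation should run; otherwise it may be garbage collected before it is displayed or saved.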

Advanced text and annotation features

Matplotlib offers advanced text handling capabilities, including support for LaTeX rendering, text rotation, and text alignment. We can also create custom annotations with arrows, shapes, and text, and position them precisely on the plot.

import matplotlib.pyplot as plt
# Create a figure and axis
fig, ax = plt.subplots()
# Plot some data
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
ax.plot(x, y, 'b-', label='Data')
# Add text with LaTeX rendering
ax.text(2, 5, r'$y = mx + c$', fontsize=12, color='green', verticalalignment='bottom')
# Add rotated text
ax.text(4, 7, 'Rotated Text', fontsize=10, color='red', rotation=45)
# Add annotation with arrow
ax.annotate('Maximum', xy=(5, 11), xytext=(4, 10),
arrowprops=dict(facecolor='orange', shrink=0.05))
# Add rectangle annotation
rect = plt.Rectangle((1.9, 4.3), 1, 2, edgecolor='purple', facecolor='none')
ax.add_patch(rect)
# Set plot title and legend
ax.set_title('Advanced Text and Annotation Features')
ax.legend()

Low-level image manipulation

Matplotlib provides functions for working with images, including loading image files, displaying images in plots, and manipulating image data directly. This can be useful for tasks such as image processing, computer vision, or visualizing image-based data.
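
The sketch below illustrates the idea with a synthetic image, so it runs without any image file. The gradient image and the two manipulations (inverting and cropping) are assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Build a synthetic 64x64 grayscale image: a horizontal gradient with values in [0, 1]
image = np.tile(np.linspace(0, 1, 64), (64, 1))

# Manipulate the pixel data directly with NumPy
inverted = 1.0 - image          # invert intensities
cropped = image[16:48, 16:48]   # crop the central 32x32 region

fig, axs = plt.subplots(1, 3, figsize=(9, 3))
for ax, img, title in zip(axs, [image, inverted, cropped],
                          ['Original', 'Inverted', 'Cropped']):
    ax.imshow(img, cmap='gray', vmin=0, vmax=1)
    ax.set_title(title)
    ax.axis('off')

# To load an image file instead, plt.imread('photo.png') returns a NumPy array
# ('photo.png' is a placeholder path)
```

Because imshow() works on plain NumPy arrays, any array operation (slicing, arithmetic, masking) can serve as an image manipulation step.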

Unique capabilities of seaborn

Seaborn complements Matplotlib with its own set of strengths. The areas where it stands out are described below.

Ease of use

Because of its low-level interface and extensive customization options, Matplotlib has a steeper learning curve despite its tremendous capabilities.

Seaborn’s high-level interface and concise syntax make it simpler for novices to quickly construct eye-catching plots.

Color palettes

Choosing visually appealing colors for plots is made easier with seaborn’s collection of built-in color palettes that are optimized for various data types and plot styles. Some of them include:

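  • Cubehelix palette: Colors whose brightness changes steadily from dark to light while the hue varies, so the palette stays readable in grayscale

  • Husl palette: Evenly spaced hues with equal lightness and saturation in the HUSL color space, which is designed to be more perceptually uniform

  • xkcd palette: Built from named colors in the xkcd color survey

  • Color Brewer palettes: Color schemes sourced from the Color Brewer tool, such as Set2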

To set a palette in seaborn, we can use the set_palette() function as follows:

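# Cubehelix palette
sns.set_palette("cubehelix")
# Husl palette
sns.set_palette("husl")
# xkcd palette: there is no palette named "xkcd", so we build one from named xkcd colors
sns.set_palette(sns.xkcd_palette(["windows blue", "amber", "faded green"]))
# Color Brewer palette ("Set2" is one of the Color Brewer schemes)
sns.set_palette("Set2")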

Integration with pandas data structures

pandas data structures can be used with Matplotlib. However, some plot types might require additional manual data processing, such as grouping or reshaping the data first. Seaborn integrates smoothly with pandas DataFrames, making it simple to visualize data straight from pandas objects.
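
To make the difference concrete, here is a small sketch that draws the same grouped scatter plot both ways. The DataFrame and its column names (x, y, group) are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data: 60 points split evenly across three groups
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'x': rng.normal(size=60),
    'y': rng.normal(size=60),
    'group': np.repeat(['A', 'B', 'C'], 20),
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Matplotlib: split the DataFrame by group and plot each subset ourselves
for name, subset in df.groupby('group'):
    ax1.scatter(subset['x'], subset['y'], label=name)
ax1.legend(title='group')
ax1.set_title('Matplotlib')

# seaborn: pass the DataFrame and column names; grouping and legend are handled for us
sns.scatterplot(data=df, x='x', y='y', hue='group', ax=ax2)
ax2.set_title('seaborn')
```

Both panels show the same data. The seaborn call replaces the explicit loop and legend setup with a single hue argument.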

Distribution plots

Matplotlib can draw histograms directly with hist(), but a kernel density estimate has to be computed separately before it can be plotted, so users often end up writing extra code to visualize distributions. Seaborn makes this easier with dedicated functions for distribution plots, including rug plots, kernel density estimates (KDEs), and histograms.

Below, we see an example of a KDE plot. We’re using the built-in penguins dataset for our example.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme() # set default theme
peng_df = sns.load_dataset('penguins')
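# fill=True shades the area under each curve (it replaces the deprecated shade=True)
ax = sns.kdeplot(x='body_mass_g', hue='species', fill=True, multiple='stack', data=peng_df)
# Reposition the legend that seaborn created
sns.move_legend(ax, 'lower right')
plt.savefig('output/graph.png')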

The above code generates an image like this:

KDE plot of the penguins dataset

For a histogram, we can use the histplot() function:

sns.histplot(x='body_mass_g', hue='species', multiple='stack', data=peng_df)

We get an illustration like this:

Histogram plot of the penguins dataset

Pairplots

A pairplot displays pairwise relationships between a dataset’s variables and plots all numeric variables by default. It’s essentially a grid of subplots, where each off-diagonal subplot displays a scatterplot of a pair of variables, and each subplot along the diagonal shows the histogram or kernel density estimate (KDE) of a single variable. Creating a pairplot with Matplotlib usually involves manually iterating over the columns of the data. Seaborn offers a pairplot() function that builds the whole grid in a single call.

Here’s a sample of code that uses the pairplot() function.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme() # set default theme
peng_df = sns.load_dataset('penguins')
#create pair plot
sns.pairplot(data = peng_df, dropna = True)
plt.savefig('output/graph.png')

The above code generates an image as seen below. Note that columns of datasets are plotted against each other.

A pairplot of the penguins dataset

Jointplots

In Matplotlib, we have to write a fair amount of code to construct a jointplot. Seaborn provides the jointplot() function, which combines a scatter plot of two variables with histograms or kernel density estimates of each variable’s marginal distribution.

Let’s look at an example of the jointplot() function being used. Once again, we employ the penguins dataset.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme() # Set default theme
penguins = sns.load_dataset('penguins')
# Create a joint plot
sns.jointplot(x="flipper_length_mm", y="body_mass_g", data=penguins, kind="scatter")
plt.savefig('output/graph.png')

The above code generates an image like this:

A jointplot of the penguins dataset

Matrix plots

Seaborn offers functions such as heatmap() and clustermap() for constructing informative matrix plots, with built-in support for color mapping and grouping. Heatmaps are a popular tool for showing how variables are correlated.

In this instance, we generate a heatmap of the correlation matrix of the penguins dataset by passing penguins.corr(numeric_only=True) to the sns.heatmap() function. The numeric_only=True argument restricts the calculation to the numeric columns; without it, recent versions of pandas raise an error on text columns such as species.

Note: To ensure that the complete column names are visible in the plot, we reduce the font size by passing font_scale=0.7 to sns.set_theme().

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(font_scale=0.7) # set default theme with a smaller font
penguins = sns.load_dataset('penguins')
# Correlations are only defined for the numeric columns
sns.heatmap(penguins.corr(numeric_only=True))
plt.savefig('output/graph.png')

The resulting heatmap is displayed below:

Styling

Although Matplotlib has a large number of stylistic options, it might need more manual settings. Seaborn has built-in themes and styles that are simple to apply to plots, providing a rapid way to alter the overall look.

Which library should be used for data visualization?

There are many factors to consider when deciding between the two libraries, but one of the most important is how much we want to customize our graphs. If we need fine-grained control over every element of a figure, or features such as 3D plots and animation, Matplotlib is the better choice because it offers more capabilities, at the cost of more code. Seaborn, on the other hand, is the better option if we want polished statistical graphs with minimal effort. Because seaborn is built on top of Matplotlib, the two can also be combined: we can draw a plot with seaborn and fine-tune it with Matplotlib. Ultimately, the choice is up to us. We need to examine the two libraries and determine which one we’re more comfortable using. Happy plotting!
