Data visualization using the Python Altair library


Altair is a declarative visualization library for Python. It offers a straightforward, user-friendly, and uniform API built on top of powerful visualization grammars such as Vega and Vega-Lite. These grammars reduce the time spent writing plotting code, leaving more time for data exploration. An alternative for creating similar visualizations is the Seaborn library, an imperative visualization library based on Matplotlib.

With a declarative visualization library, instead of specifying step by step how to create a visualization, we declare what we want to visualize and let the library handle the details. With an imperative visualization library, on the other hand, we have to specify the exact steps and instructions needed to create the desired visualization.

Using Altair, we can create many kinds of data visualizations, such as bar charts, grid plots, histograms, and bubble charts. Here, we will discuss the steps of creating charts using Altair.

Installing dependencies

Altair is not a standalone library; it depends on several other libraries. To install everything needed here from PyPI (the Python Package Index), a software repository for Python, use the following command:

pip3 install pandas seaborn altair

We install pandas to manipulate the data that goes into the charts and Seaborn to load example datasets.
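One way to confirm that the installation worked is to query the installed versions. This is a minimal sketch using only the standard library. The helper name installed_versions is hypothetical, and the version numbers printed will depend on the environment.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


versions = installed_versions(["pandas", "seaborn", "altair"])
for name, found in versions.items():
    print(f"{name}: {found if found is not None else 'not installed'}")
```

A package reported as not installed points to a failed or skipped pip3 step, or to a different Python environment than the one pip3 installed into.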

Importing libraries

Data visualization is often done in Jupyter Notebooks. Hence, we will create a Notebook named altair.ipynb and, in the first code cell, import the required libraries.

import altair as alt
import seaborn as sns
import pandas as pd
from IPython.display import display
  • Lines 1–3: Import the altair library as alt, the seaborn library as sns, and the pandas library as pd.

  • Line 4: Import the display function from IPython.display to display charts in the Jupyter Notebook.

Load data

Here, we are going to use the "iris" dataset provided by the Seaborn library. The attributes of the iris dataset are explained as follows:

  • Petal length (in centimeters): The length of the iris flower’s petal, which is the inner and colorful part of the flower.

  • Petal width (in centimeters): The width of the iris flower’s petal.

  • Sepal length (in centimeters): The length of the iris flower’s sepal, which is the outermost and protective structure of the flower.

  • Sepal width (in centimeters): The width of the iris flower’s sepal.

  • Species: The label that identifies the species of the flower. The categories are “setosa,” “versicolor,” and “virginica.”

To load the dataset in the Notebook, use the following command:

df = sns.load_dataset("iris")
print(df.head())
  • Line 1: Load the dataset from Seaborn into a variable named df.

  • Line 2: Print the first rows of the dataset using the head() method. Since no number is passed, head() returns the first five rows of the data by default.
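Because sns.load_dataset() downloads the data the first time it is called, it can help to have an offline stand-in for experimenting. The sketch below builds a small DataFrame with the same column names as the iris dataset. The name sample and the six hand-typed rows are for illustration only and are not a substitute for the full dataset.

```python
import pandas as pd

# Hand-typed sample with the same columns as the iris dataset.
# Illustrative only: six rows, not the full dataset.
sample = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
        "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
        "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
        "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
        "species": [
            "setosa", "setosa",
            "versicolor", "versicolor",
            "virginica", "virginica",
        ],
    }
)

print(sample.head())    # no argument: the first five rows
print(sample.head(2))   # explicit argument: the first two rows
print(sample.dtypes)    # four numeric columns and one text column
```

Passing a number to head() overrides the default of five rows, which is useful when only a quick look at the column layout is needed.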

Create an Altair chart

We’ll create a basic scatter plot using Altair, where we will map sepal_length to the x-axis and sepal_width to the y-axis. We’ll also map species to the color encoding, which represents data values as colors, so that each species gets its own color.

# Create an Altair scatter plot
chart = alt.Chart(df).mark_circle(size=60).encode(
    x='sepal_length:Q',
    y='sepal_width:Q',
    color='species:N',
    tooltip=['sepal_length:Q', 'sepal_width:Q', 'species:N']
).properties(
    title='Iris Sepal Length vs. Width',
    width=500,
    height=300
)

# Display the chart using IPython's display function
display(chart)

Code explanation

  • alt.Chart(df): This initializes a base Altair chart using the df as the data reference.

  • .mark_circle(size=60): This sets the mark type of the chart to circles and sets the circle size to 60.

  • .encode(...): This maps data columns to visual channels as follows:

    • x='sepal_length:Q': This maps the sepal_length column to the x-axis. The :Q suffix marks it as a quantitative variable.

    • y='sepal_width:Q': This maps the sepal_width column to the y-axis and marks it as a quantitative variable as well.

    • color='species:N': This maps the species column to the color encoding. The :N suffix marks it as a nominal (categorical) variable.

    • tooltip=['sepal_length:Q', 'sepal_width:Q', 'species:N']: This shows the data variables that will be displayed in the tooltip when we hover over the circles.

  • .properties(...): This sets several chart properties, such as width, height, and title.

  • display(chart): This renders the chart and displays it in the Notebook.


In the Jupyter Notebook, we can see the different steps of making a scatter plot using the Altair library: we import the libraries, load a dataset, declare the values to visualize, and display the scatter plot. Other Altair charts can be created by following similar steps.

Copyright ©2024 Educative, Inc. All rights reserved