In the realm of data analysis and visualization, the ability to compare and contrast different datasets is crucial for gaining insights and making informed decisions. In this Answer, we will explore how we can visually compare two plots using matplotlib.
matplotlib is a popular Python library for creating static, interactive, and animated plots. It provides powerful tools for visualizing data, and we'll go over a few of them.
Comparing various plots lies at the heart of data analysis. It allows us to discern patterns, trends, and anomalies within datasets, enabling informed decision-making. Whether it's tracking the performance of different products, assessing the impact of variables over time, or understanding the correlation between datasets, the ability to compare visually is crucial for extracting actionable insights.
Let's explore several effective methods within matplotlib that empower data analysts to visually compare datasets, each offering its own advantages depending on the nature of the data and the goals of the analysis.
Subplots: We can create a separate subplot for each dataset and place them side by side using plt.subplots().
Shared x-axis: We can create subplots that share the x-axis but each have their own y-axis, using plt.subplots() with sharex=True.
Dual-axis approach: We can overlay two plots on the same x-axis but with different y-axes using plt.twinx(). This is particularly useful when the datasets have different scales but share a common x-axis.
To better understand each of these visualization techniques, let's look at a few examples.
In this example, we create a simple side-by-side comparison of monthly sales for two products using plt.subplots(). The data is randomly generated, and the resulting plot shows the sales trend for each product over the 12 months of the year.
import matplotlib.pyplot as plt
import numpy as np

months = np.arange(1, 13)
sales_product_a = np.random.randint(50, 200, size=12)
sales_product_b = np.random.randint(50, 200, size=12)

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

axes[0].plot(months, sales_product_a, color='blue', marker='o', label='Product A')
axes[0].set_title('Monthly Sales - Product A')

axes[1].plot(months, sales_product_b, color='green', marker='s', label='Product B')
axes[1].set_title('Monthly Sales - Product B')

for ax in axes:
    ax.set_xlabel('Month')
    ax.set_ylabel('Sales')
    ax.legend()

# Adjust layout for better appearance
plt.tight_layout()

# Save the figure before showing it; saving after show() can produce an empty image
plt.savefig("output/graph.png")
plt.show()
Code explanation
Lines 4–6: Create an array, months, representing the 12 months of a year, and then generate random sales data for two products (A and B) using NumPy's randint function.
Line 8: Create a figure, fig, and a set of subplots, axes, arranged in one row and two columns. Set the figure size to 12x4 inches.
Lines 10–14: Plot the sales data for Product A on the left subplot and Product B on the right subplot. Different markers and colors are used for clarity. Titles are set for each subplot.
Lines 16–19: Loop over both subplots to set the x- and y-axis labels and add a legend that identifies each product.
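Because each subplot above autoscales its own y-axis, two panels can look similar even when their values differ, and the random data changes on every run. The sketch below is one possible variation, not part of the original example: it passes sharey=True so both panels use the same y-range, and it swaps np.random.randint for NumPy's seeded Generator API (np.random.default_rng(...).integers) so the figure is reproducible. The seed 42 is arbitrary, and the Agg backend is an assumption so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Seeded generator: the same seed yields the same "random" sales on every run
rng = np.random.default_rng(42)
months = np.arange(1, 13)
sales_product_a = rng.integers(50, 200, size=12)  # values in [50, 200)
sales_product_b = rng.integers(50, 200, size=12)

# sharey=True gives both subplots one common y-axis range
fig, axes = plt.subplots(1, 2, figsize=(12, 4), sharey=True)
axes[0].plot(months, sales_product_a, color='blue', marker='o', label='Product A')
axes[0].set_title('Monthly Sales - Product A')
axes[1].plot(months, sales_product_b, color='green', marker='s', label='Product B')
axes[1].set_title('Monthly Sales - Product B')

for ax in axes:
    ax.set_xlabel('Month')
    ax.legend()
axes[0].set_ylabel('Sales')  # one y-label is enough when the y-axis is shared

fig.tight_layout()
```

With a shared y-axis, a taller line in one panel really does mean higher sales, at the cost of some vertical resolution when one product's values are much smaller than the other's.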
In this example, we create two vertically stacked subplots comparing monthly temperature and precipitation data. The subplots share the x-axis, so each month lines up across both plots.
import matplotlib.pyplot as plt
import numpy as np

# Sample data
months = np.arange(1, 13)
temperature_data = np.random.uniform(10, 30, size=12)
precipitation_data = np.random.uniform(0, 100, size=12)

# Create subplots with a shared x-axis
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(8, 6))

# Plot temperature data on the first subplot
ax1.plot(months, temperature_data, color='orange', marker='o', label='Temperature')
ax1.set_ylabel('Temperature (°C)')
ax1.set_title('Monthly Temperature and Precipitation')

# Plot precipitation data on the second subplot
ax2.plot(months, precipitation_data, color='blue', marker='s', label='Precipitation')
ax2.set_xlabel('Month')
ax2.set_ylabel('Precipitation (mm)')

# Display legends
ax1.legend()
ax2.legend()

# Adjust layout for better appearance
plt.tight_layout()

# Save the figure before showing it; saving after show() can produce an empty image
plt.savefig("output/graph.png")
plt.show()
Code explanation
This code is similar to the previous example, so let's examine what has changed.
Line 10: Create a figure with two vertically stacked subplots. The sharex=True parameter makes both subplots share the same x-axis.
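Sharing the x-axis links the subplots rather than just aligning them: changing the x-limits on one subplot, whether in code or by zooming in an interactive window, changes the other too. This minimal sketch checks that behavior. The seeded generator, the Agg backend, and the 3 to 9 month window are illustrative assumptions, not part of the original example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

months = np.arange(1, 13)
rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
temperature_data = rng.uniform(10, 30, size=12)
precipitation_data = rng.uniform(0, 100, size=12)

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(8, 6))
ax1.plot(months, temperature_data, color='orange', marker='o')
ax2.plot(months, precipitation_data, color='blue', marker='s')

# Restrict the x-range on the top subplot only...
ax1.set_xlim(3, 9)
# ...and the bottom subplot follows, because the two Axes share one x-axis
print(ax2.get_xlim())
```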
In this example, we plot the first dataset, y1, on the left y-axis and the second dataset, y2, on the right y-axis.
import matplotlib.pyplot as plt
import numpy as np

# Generate sample data
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = 2 * np.cos(x)

# Plot the first dataset
plt.plot(x, y1, color='blue', label='Dataset 1')
plt.xlabel('X-axis')
plt.ylabel('Dataset 1', color='blue')
ax1 = plt.gca()  # keep a handle on the original (left-hand) Axes
# Create a twin Axes sharing the x-axis; it becomes the current Axes
ax2 = plt.twinx()
ax2.plot(x, y2, color='green', label='Dataset 2')
ax2.set_ylabel('Dataset 2', color='green')

# Display legends (plt.legend() would now target ax2, so call ax1.legend() explicitly)
ax1.legend(loc='upper left')
ax2.legend(loc='upper right')

# Add a title, save the figure, then display it
plt.title('Visual Comparison of Two Datasets')
plt.savefig("output/graph.png")
plt.show()
Code explanation
Lines 10–12: Plot the first dataset, y1, in blue, and set the x-axis label and the left y-axis label for the first dataset.
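Line 15: Create a twin Axes, ax2, that shares the x-axis with the original plot using plt.twinx(). This adds a second, independent y-axis on the right, so two datasets with different scales can be drawn against one x-axis. plt.twinx() also makes ax2 the current Axes, so any later plt.* call, such as plt.legend() or plt.title(), applies to ax2.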
Lines 16–17: Plot the second dataset, y2, on the twin Axes in green, and set the label for the right y-axis.
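Each Axes in a twinx() pair manages its own legend, so separate legend() calls produce separate boxes. One common alternative, sketched here as an assumption about the desired look rather than the original author's method, collects the handles from both Axes with get_legend_handles_labels() and passes them to a single legend() call. The sketch uses the object-oriented ax1.twinx() form instead of pyplot's current-Axes state and colors each y-axis's tick labels to match its line. The merged legend is drawn on ax2 because the twin Axes is drawn on top of the original one. The Agg backend is assumed so the sketch runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = 2 * np.cos(x)

fig, ax1 = plt.subplots()
ax1.plot(x, y1, color='blue', label='Dataset 1')
ax1.set_xlabel('X-axis')
ax1.set_ylabel('Dataset 1', color='blue')
ax1.tick_params(axis='y', labelcolor='blue')  # left tick labels match the blue line

# ax1.twinx() returns a new Axes that shares ax1's x-axis
ax2 = ax1.twinx()
ax2.plot(x, y2, color='green', label='Dataset 2')
ax2.set_ylabel('Dataset 2', color='green')
ax2.tick_params(axis='y', labelcolor='green')  # right tick labels match the green line

# Merge the legend entries from both Axes into one legend box on the top Axes
handles1, labels1 = ax1.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
ax2.legend(handles1 + handles2, labels1 + labels2, loc='upper right')

ax1.set_title('Visual Comparison of Two Datasets')
fig.tight_layout()
```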
matplotlib provides a rich set of tools for visually comparing plots, empowering data analysts to unearth insights and make informed decisions. Whether through subplots, dual-axis representation, or shared axes, the library offers the flexibility to tailor visualizations to the unique characteristics of diverse datasets.