A choropleth map shades each region of a map according to the value of a variable, which makes it easy to compare data region by region. As geospatial analysis has matured, many tools for plotting maps have become available. In this blog, we will study choropleth maps and plot them using the GeoPandas Python library.
Geospatial analysis examines, interprets, and manipulates geographic data and information associated with specific locations on the Earth’s surface. It consists of various techniques and methodologies for understanding spatial patterns, relationships, and trends within geographic data.
GeoPandas is an open-source Python library that extends the capabilities of pandas, a widely used data manipulation library, to handle geospatial data more efficiently. It provides a user-friendly and powerful interface for working with geospatial datasets.
GeoPandas helps read, write, visualize, and analyze geographic data in various formats, such as shapefiles, GeoJSON, and other formats supported by the Geospatial Data Abstraction Library (GDAL). It integrates with existing data analysis workflows, which makes it easier to combine geospatial data with non-spatial data and to perform complex spatial operations.
We can install this library with a single command:
pip install geopandas
Let’s learn about choropleth maps, their types, and their advantages.
The word choropleth combines two Greek words: choros, which signifies region, and plethos, which means multitude. A choropleth map uses different colors or shading patterns to show how a specific variable varies across geographic areas, such as countries, states, provinces, counties, or other administrative divisions.
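To make the idea of shading by value concrete, here is a small sketch in plain Python that sorts regions into four shade classes using quartile break points. The density figures and shade names are invented for illustration, and quantile classification is only one of several possible schemes. GeoPandas can apply schemes like this itself through the scheme argument of its plot method, which relies on the optional mapclassify package.

```python
from statistics import quantiles


def quantile_classes(values, k=4):
    """Assign each value a class index from 0 to k-1 using quantile break points."""
    breaks = quantiles(values, n=k)  # k-1 interior cut points
    # A value's class is the number of break points it exceeds.
    classes = [sum(v > b for b in breaks) for v in values]
    return breaks, classes


# Hypothetical population densities (people per square km) for eight regions.
densities = [12, 45, 7, 150, 88, 300, 23, 61]
shades = ["lightest", "light", "dark", "darkest"]

breaks, classes = quantile_classes(densities, k=4)
print("Break points:", breaks)
for density, cls in zip(densities, classes):
    print(density, "->", shades[cls])
```

With quantile breaks, each shade covers roughly the same number of regions, which keeps a few extreme values from washing out the rest of the map.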
Let’s create our first choropleth map in Python:
import geopandas as gpd
import matplotlib.pyplot as plt

world_map = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
fig, ax = plt.subplots(figsize=(10, 6))
variable = "pop_est"
cmap = "viridis"
world_map.plot(column=variable, cmap=cmap, linewidth=0.8, ax=ax, edgecolor='0.8', legend=True)
ax.set_ylabel('Latitude')
ax.set_xlabel('Longitude')
plt.show()
Line 1: We import the geopandas library.
Line 2: We import the pyplot module from the matplotlib library, which is used for creating plots and visualizations.
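Line 4: We retrieve the file path of the built-in Natural Earth dataset naturalearth_lowres, which contains low-resolution geometries and attributes of countries, and read it into the variable world_map. The gpd.read_file method reads the data from the file specified in its argument and returns a GeoDataFrame, a specialized data structure in GeoPandas for handling geospatial data. Note that the gpd.datasets module was deprecated in GeoPandas 0.14 and removed in 1.0, so this call needs an older GeoPandas version; with newer versions, the data can be downloaded from the Natural Earth website and the path to the downloaded file passed to gpd.read_file, although its column names may differ from those used here.
Line 5: We create a figure and a set of axes with plt.subplots, setting the figure size to 10 by 6 inches.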
Line 6: We define the variable to be displayed on the choropleth map. In this case, it is set to "pop_est", the population estimate attribute in the dataset.
Line 7: We define the colormap to be used for the choropleth map. "viridis" is a perceptually uniform sequential colormap.
Line 8: We plot the choropleth map using the GeoDataFrame world_map, passing the following arguments:
column=variable: Specifies the column in the GeoDataFrame that contains the data to be visualized. Here, it is set to the value of variable, which is "pop_est".
cmap=cmap: Sets the colormap for the choropleth map.
linewidth=0.8: Sets the width of the boundary lines between the polygons on the map.
ax=ax: Specifies the axes on which to plot the map. In this case, it uses the ax created earlier in Line 5 with plt.subplots.
edgecolor='0.8': Sets the color of the boundary lines between the polygons. The string '0.8' is a grayscale level, which gives light gray lines.
legend=True: Adds a legend to the map. For a numeric column such as "pop_est", the legend is a color bar.
The remaining lines label the axes and display the figure.
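The same arguments can be tried on a tiny hand-made dataset, which is handy when the bundled dataset is not available. The sketch below assumes a 2×2 grid of made-up squares and values. The legend_kwds argument, which is not used in the example above, passes a label and an orientation through to the color bar, and the figure is saved to a file (toy_choropleth.png, an arbitrary name) instead of being shown.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt

import geopandas as gpd
from shapely.geometry import box

# A 2x2 grid of unit squares with made-up values.
grid = gpd.GeoDataFrame(
    {"value": [10, 40, 70, 100]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(0, 1, 1, 2), box(1, 1, 2, 2)],
)

fig, ax = plt.subplots(figsize=(6, 5))
grid.plot(
    column="value",        # column that drives the colors
    cmap="viridis",        # sequential colormap
    linewidth=0.8,         # width of the polygon outlines
    edgecolor="0.8",       # light gray outlines
    legend=True,           # color bar for a numeric column
    legend_kwds={"label": "Hypothetical value", "orientation": "horizontal"},
    ax=ax,
)
ax.set_title("Toy choropleth")
fig.savefig("toy_choropleth.png", dpi=100)
```

Working on a small grid like this makes it easier to see what each argument changes before applying it to a full world map.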
Here is the result of the aforementioned code:
To create a categorical choropleth map with a legend, we need data that assigns each country to a category, such as a region. For this example, we’ll use the Natural Earth dataset again, but create a new column called Category to represent the category of each country.
Here’s the code to create a categorical choropleth map with legends:
import geopandas as gpd
import matplotlib.pyplot as plt

world_map = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
categories = {
    'United States of America': 'North America',  # name as spelled in the dataset
    'Canada': 'North America',
    'Brazil': 'South America',
    'China': 'Asia',
    'India': 'Asia',
    'Australia': 'Oceania',
    'France': 'Europe',
    'South Africa': 'Africa',
}

world_map['Category'] = world_map['name'].map(categories)
fig, ax = plt.subplots(figsize=(20, 10))
variable = 'Category'
cmap = 'Set1'
world_map.plot(column=variable, cmap=cmap, linewidth=0.8, ax=ax, edgecolor='0.8', legend=True)
ax.set_ylabel('Latitude')
ax.set_xlabel('Longitude')
plt.show()
Lines 5–14: We define a dictionary named categories. It maps country names to their respective categories or regions.
Line 16: We add a new column Category to the world_map GeoDataFrame. It maps the values in the name column (country names) through the categories dictionary and assigns the corresponding category to each country. Countries that are not in the dictionary receive a missing value (NaN).
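Because the dictionary covers only a handful of countries, it helps to see what map does with names it cannot find. The sketch below uses plain pandas and a shortened, made-up list of names: unmatched entries become NaN, and fillna gives them an explicit label. As an alternative, the GeoPandas plot method accepts a missing_kwds argument (for example, missing_kwds={'color': 'lightgrey'}) to draw rows with missing values in a chosen style.

```python
import pandas as pd

# A shortened, illustrative list of country names.
names = pd.Series(["Canada", "Brazil", "Fiji", "France"])

# A lookup table that deliberately leaves one country out.
categories = {
    "Canada": "North America",
    "Brazil": "South America",
    "France": "Europe",
}

# Names missing from the dictionary are mapped to NaN.
mapped = names.map(categories)
print("Unmatched names:", int(mapped.isna().sum()))

# Give unmatched countries an explicit label so they form their own category.
labelled = mapped.fillna("Uncategorized")
print(labelled.tolist())
```

Counting the unmatched names first is a cheap way to catch spelling differences between the dictionary keys and the dataset.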
The output of this code will be the following map:
Remember, these maps are versatile tools that can uncover insights with just a glance and can be employed in a wide range of real-world scenarios.
Let's explore an application scenario for choropleth maps.
In response to the COVID-19 pandemic, governments worldwide launched vaccination campaigns to curb the spread of the virus. To assess the progress of such campaigns, we’ll generate a choropleth map that visualizes hypothetical COVID-19 vaccination rates by country. Let’s have a look at the code:
import geopandas as gpd
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load the built-in world map data
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))

# Create dummy vaccination data for all countries
np.random.seed(42)  # For reproducibility
vaccination_data = {
    'Country': world['name'],
    'Vaccination Rate (%)': np.random.randint(0, 100, len(world))  # Generate random values
}
vaccination_df = pd.DataFrame(vaccination_data)
merged_data = world.merge(vaccination_df, left_on='name', right_on='Country')
fig, ax = plt.subplots(figsize=(12, 8))
variable = 'Vaccination Rate (%)'
cmap = 'YlGnBu'
merged_data.plot(column=variable, cmap=cmap, linewidth=0.8, ax=ax, edgecolor='0.8', legend=True)
ax.set_ylabel('Latitude')
ax.set_xlabel('Longitude')
plt.show()
Line 10: We set the random seed for reproducibility, ensuring that the random numbers generated are the same on every run.
Lines 11–14: We create a dictionary with two keys: Country and Vaccination Rate (%). The Country key takes its values from the name column of the world GeoDataFrame. The Vaccination Rate (%) key is populated with random integers generated by np.random.randint(), one for each row of world. Because the upper bound of randint is exclusive, the values range from 0 to 99.
The dictionary is then converted to a pandas DataFrame and merged with world by matching the name column against the Country column.
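In the example above, the country names match by construction because the dummy table is built from world['name']. With real data, names often differ between sources, so it can be worth checking the merge. The sketch below uses plain pandas with invented names and rates: how='left' keeps every map row, and indicator=True adds a _merge column that flags rows with no match.

```python
import pandas as pd

# Stand-in for the map table (names as a map dataset might spell them).
world = pd.DataFrame({"name": ["France", "Czechia", "United States of America"]})

# Stand-in for an external table with invented rates and different spellings.
rates = pd.DataFrame(
    {
        "Country": ["France", "Czech Republic", "United States"],
        "Vaccination Rate (%)": [75, 64, 68],
    }
)

# Keep every map row and record whether a matching rate was found.
merged = world.merge(
    rates, left_on="name", right_on="Country", how="left", indicator=True
)

# Rows marked "left_only" have no rate and would show up as gaps on the map.
unmatched = merged.loc[merged["_merge"] == "left_only", "name"].tolist()
print("Countries without a rate:", unmatched)
```

A default inner merge would silently drop the unmatched countries, so a left merge with an indicator makes such gaps visible before plotting.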
The output of this code will be the following map:
This blog has introduced geospatial analysis, the GeoPandas library, and choropleth maps. We looked at how choropleth maps are built from numeric and categorical data and explored a scenario in which they apply.