Spatial-Specific Attributes
Learn about the spatial attributes available in a GeoSeries.
What are accessors?
In pandas, we can use domain-specific (or dtype-specific) attributes and methods called accessors. Accessors offer a convenient way to access and manipulate data within the context of their respective data structures. For example, two common accessors in regular pandas are the .dt accessor for manipulating datetime objects and the .str accessor for string data types.
In the example below, we have GPS track data loaded in pandas with latitude and longitude coordinates, as well as a column named time that holds the timestamp of each measurement. Once that column is converted to the datetime dtype, we can use the .dt accessor to extract specific elements of each timestamp, such as the hour, minute, and second, as separate columns for each coordinate measurement:
```python
import pandas as pd

# Open a sample GPS tracking (CSV) file
df = pd.read_csv('go_track_trackspoints.csv')

# Convert the time column (read as strings) to datetime dtype
df['timestamp'] = pd.to_datetime(df['time'])

# Extract the time values using the .dt accessor
df['hour'] = df['timestamp'].dt.hour
df['minute'] = df['timestamp'].dt.minute
df['second'] = df['timestamp'].dt.second

print(df.head().to_html())
```
In line 7, we convert the time column, initially imported as strings, to a datetime-typed column named timestamp. Once the timestamp column is created, we can access the datetime-specific attributes through the .dt accessor (lines 10–12) to extract hours, minutes, and seconds. Keep in mind that pandas accessors are bound to a Series (i.e., a column).
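To see both accessors side by side without the GPS file, here is a minimal sketch. The two timestamp strings are made-up sample values, not rows from the dataset above.

```python
import pandas as pd

# Hypothetical sample values (not taken from the GPS dataset)
raw = pd.Series(["2014-09-13 07:24:32", "2014-09-13 07:24:37"])

# .str operates on string data: keep only the date part of each value
date_part = raw.str.split(" ").str[0]

# .dt becomes available only after converting to the datetime dtype
timestamps = pd.to_datetime(raw)
seconds = timestamps.dt.second

print(date_part.tolist())
print(seconds.tolist())
```

Each accessor is tied to the dtype of the Series it is called on, which is why the conversion with pd.to_datetime has to happen before .dt can be used.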
Spatial-specific attributes from Shapely
GeoPandas extends the features of regular pandas by adding a geometry data type. For this purpose, it relies on the Shapely library, so that each geometry is a Shapely object stored in the GeoDataFrame (or GeoSeries). Similar to pandas accessors, GeoPandas also provides spatial-specific attributes and methods designed to work with spatial data (i.e., Shapely geometries) and perform spatial operations. Most of these attributes and methods mirror the ones Shapely provides for a single geometry, applied to every geometry in the column.
While accessors are meant to operate on Series (as in pandas), GeoPandas also allows accessing spatial attributes directly on the GeoDataFrame. That's because a GeoDataFrame can have only one active geometry (GeoSeries) column at a time. Therefore, accessing a spatial attribute on the GeoDataFrame is equivalent to accessing it on the active geometry column.
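This equivalence can be checked with a small sketch. The two rectangles and the name column are hypothetical, and no CRS is set, so the areas are in arbitrary squared units.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Two hypothetical rectangles in an arbitrary planar coordinate system
gdf = gpd.GeoDataFrame(
    {"name": ["a", "b"]},
    geometry=[
        Polygon([(0, 0), (2, 0), (2, 1), (0, 1)]),  # 2 x 1 rectangle
        Polygon([(0, 0), (3, 0), (3, 3), (0, 3)]),  # 3 x 3 square
    ],
)

area_from_frame = gdf.area            # delegated to the active geometry column
area_from_series = gdf.geometry.area  # accessed on the GeoSeries itself

print(gdf.geometry.name)                         # name of the active geometry column
print(area_from_frame.equals(area_from_series))  # both give the same Series
```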
The most common spatial-specific attributes available are:
area: This attribute returns the area of each geometry in a GeoSeries. The area is calculated using the underlying Shapely library and is expressed in the squared units of the CRS of the geometries.
length: This attribute returns the length of each geometry in a GeoSeries. For a LineString, this is the length of the line, and for a Polygon, it is the perimeter. The length is expressed in the units of the geometries' CRS.
bounds: This attribute returns a DataFrame containing the bounding box of each geometry in a GeoSeries, as the columns minx, miny, maxx, and maxy (the minimum and maximum longitude and latitude when the CRS is geographic). The bounding box is the smallest axis-aligned rectangle that contains the geometry.
centroid: This attribute returns a GeoSeries of Shapely Point objects representing the centroid (geometric center) of each geometry in a GeoSeries.
geom_type: This attribute returns the geometry type of each geometry in a GeoSeries as a string, such as Point, LineString, Polygon, MultiPoint, MultiLineString, or MultiPolygon.
These spatial-specific attributes can be accessed directly from the geometry column (which is a GeoSeries) of a GeoDataFrame. They provide essential geometric properties that can be used in various spatial analysis tasks. Depending on the attribute, the result is a GeoSeries (for geometries, such as centroid), a Series (for values, such as area), or a DataFrame (for bounds).
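The attributes listed above can be tried on a tiny GeoSeries. This is only a sketch: the three geometries are hypothetical and have no CRS, so the values are in arbitrary units.

```python
import geopandas as gpd
from shapely.geometry import LineString, Point, Polygon

# Hypothetical geometries in an arbitrary planar coordinate system
geoms = gpd.GeoSeries([
    Polygon([(0, 0), (4, 0), (4, 2), (0, 2)]),  # 4 x 2 rectangle
    LineString([(0, 0), (3, 4)]),               # straight segment of length 5
    Point(1, 1),
])

print(geoms.area.tolist())       # areas; lines and points have zero area
print(geoms.length.tolist())     # perimeter for the polygon, length for the line
print(geoms.geom_type.tolist())  # geometry type names as strings
print(geoms.bounds)              # DataFrame with minx, miny, maxx, maxy
print(geoms.centroid)            # GeoSeries of Point geometries
```

Note how the return type changes with the attribute: a Series for area, length, and geom_type, a DataFrame for bounds, and a GeoSeries for centroid.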
Example: Calculating the area of countries
As an example, we are going to use the area attribute to plot the top six countries in terms of total area. To do that, we are going to use the 'countries.geojson' dataset preloaded in the code widget. As mentioned earlier, the area will be expressed in the units of the CRS. Therefore, we first need to check the CRS of our dataset with the following code:
```python
import geopandas as gpd

countries = gpd.read_file('countries.geojson')
print(countries.crs)
```
EPSG:4326 refers to the World Geodetic System 1984 (WGS84) geographic coordinate system, whose coordinates are expressed in decimal degrees of latitude and longitude. This CRS is not projected, so an area computed from it would come out in square degrees, which has no consistent meaning on the ground. It is therefore not suitable for area calculation unless the geodesic area is being calculated, but that's an advanced topic and we won't go into it here. The choice of projection matters as well: cylindrical projections commonly used to visualize the whole world, such as World Mercator (EPSG:3395), exaggerate the size of features, especially toward the higher latitudes, as seen in the figure below. This distortion can cause anomalies in the area calculated by Shapely. That's why it is so important to select an appropriate projection when quantifying spatial attributes such as area and distance.
Thankfully, different projections are available for these distinct tasks. For global-level area calculation, the World Cylindrical Equal Area CRS (ESRI:54034) is a projected CRS that uses meters as its units and preserves area measurements across latitudes.
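To build some intuition for why an equal-area projection helps, here is a rough sketch that uses only the standard library. It assumes a spherical Earth with a mean radius of 6,371 km and a standard parallel at the equator, so it only approximates ESRI:54034 (which is defined on the WGS84 ellipsoid). The function names are hypothetical helpers written for this illustration.

```python
import math

R = 6_371_000.0  # assumed mean Earth radius in meters (spherical approximation)

def project(lon_deg, lat_deg):
    """Spherical cylindrical equal-area projection with the standard parallel at the equator."""
    x = R * math.radians(lon_deg)
    y = R * math.sin(math.radians(lat_deg))
    return x, y

def cell_area_km2(lat_south, lat_north, width_deg=1.0):
    """Projected area of a lon/lat cell, which maps to a rectangle in this projection."""
    x0, y0 = project(0.0, lat_south)
    x1, y1 = project(width_deg, lat_north)
    return (x1 - x0) * (y1 - y0) / 1e6  # m² to km²

# Both cells span exactly 1 degree x 1 degree, i.e. the same "area" in square degrees
equator_cell = cell_area_km2(0.0, 1.0)
northern_cell = cell_area_km2(60.0, 61.0)

print(round(equator_cell), round(northern_cell))
print(round(northern_cell / equator_cell, 2))
```

In degrees, the two cells look identical, yet the cell at 60° of latitude covers only about half the ground area of the one at the equator. The projection compresses the y axis by sin(latitude) to compensate for stretching every parallel to the length of the equator, which is what keeps the computed areas proportional to the real ones.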
In this case, before calculating the area, we need to project the GeoDataFrame to the cylindrical equal-area CRS, as shown in the following example. The area will be given in m², so we will convert it to km² by dividing the result by 10⁶. The first result panel displays the top six countries in terms of total area, and the second panel shows the actual areas in km².
```python
import geopandas as gpd

# open the dataset
gdf = gpd.read_file('countries.geojson')

# project to ESRI:54034
gdf = gdf.to_crs('ESRI:54034')

# calculate the area in square kilometers
gdf['area'] = gdf.area / 1e6

# get the 6 biggest countries
biggest = gdf.sort_values(by='area', ascending=False).head(6)

# create a dictionary to place the legend horizontally
legend_kwds = {'loc': 'upper center', 'bbox_to_anchor': (0.7, .45), 'ncol': 2}

# project back to WGS84 (for better visualization)
biggest = biggest.to_crs(4326)
ax = biggest.plot(figsize=(10, 8), column='sovereignt', legend=True, legend_kwds=legend_kwds)
ax.set_ylabel('Latitude')
ax.set_xlabel('Longitude')

# save the figure
ax.figure.savefig('./output/biggest_countries.png', dpi=300)

# print the areas
print(biggest[['name', 'area']].to_html())
```
Lines 1–4: We import geopandas and read the dataset.
Line 7: We change the projection to a cylindrical equal-area CRS (ESRI:54034).
Line 10: We get the area of each polygon and convert it to km² by dividing the result by 10⁶.
Line 13: We sort the values by area and get the six biggest countries.
Line 16: We specify the properties for the map's legend.
Lines 19–22: We project the result back to WGS84 (EPSG:4326) and plot the map.