How to retrieve polygon coordinates from a GeoPandas DataFrame?

Published: February 20, 2024

Tags: Geopandas;


Introduction

GeoPandas is a powerful open-source Python library that allows for the analysis and manipulation of geospatial data. One common task when working with geospatial data is retrieving polygon coordinates from a GeoPandas DataFrame.

This can be done using several different methods, which will be discussed in this guide.

Creating a GeoPandas DataFrame

To replicate the example below, you can download the data from the following link.

import geopandas

gdf = geopandas.read_file('dataframe.geojson', driver='GeoJSON')

gdf.head()

The code above will return:

  city_name  longitude_c1  latitude_c1  longitude_c2  latitude_c2  \
0     Paris        2.2522      48.7566        2.2522      48.9566   
1    London       -0.2276      51.4072       -0.2276      51.6072   
2    Moscow       37.5173      55.6558       37.5173      55.8558   
3  Istanbul       28.8784      40.9082       28.8784      41.1082

   longitude_c3  latitude_c3  longitude_c4  latitude_c4  \
0        2.4522      48.9566        2.4522      48.7566   
1       -0.0276      51.6072       -0.0276      51.4072   
2       37.7173      55.8558       37.7173      55.6558   
3       29.0784      41.1082       29.0784      40.9082

                                            geometry  
0  POLYGON ((2.25220 48.75660, 2.25220 48.95660, ...  
1  POLYGON ((-0.22760 51.40720, -0.22760 51.60720...  
2  POLYGON ((37.51730 55.65580, 37.51730 55.85580...  
3  POLYGON ((28.87840 40.90820, 28.87840 41.10820...

In this example, the data describes four cities: Paris, London, Moscow, and Istanbul. Each row stores the longitude and latitude of four corner points (c1 to c4), and the geometry column holds the shapely Polygon built from those corners.
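If the data file is not at hand, a similar GeoDataFrame can be rebuilt from the corner values printed above. This is only a sketch: it keeps the city_name and geometry columns, drops the longitude_c*/latitude_c* columns, and assumes the coordinates are plain WGS 84 longitude/latitude (EPSG:4326), which the original file may or may not declare.

```python
import geopandas
from shapely.geometry import Polygon

# Corner points c1..c4 for each city, copied from the table above.
rows = [
    ("Paris", [(2.2522, 48.7566), (2.2522, 48.9566), (2.4522, 48.9566), (2.4522, 48.7566)]),
    ("London", [(-0.2276, 51.4072), (-0.2276, 51.6072), (-0.0276, 51.6072), (-0.0276, 51.4072)]),
    ("Moscow", [(37.5173, 55.6558), (37.5173, 55.8558), (37.7173, 55.8558), (37.7173, 55.6558)]),
    ("Istanbul", [(28.8784, 40.9082), (28.8784, 41.1082), (29.0784, 41.1082), (29.0784, 40.9082)]),
]

gdf = geopandas.GeoDataFrame(
    {"city_name": [name for name, _ in rows]},
    geometry=[Polygon(corners) for _, corners in rows],
    crs="EPSG:4326",  # assumption: plain longitude/latitude (WGS 84)
)

print(gdf)
```

Only four corners are passed per polygon; shapely closes each ring itself by repeating the first vertex at the end.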

Extracting Polygon Coordinates

There are several ways to extract polygon coordinates. Their efficiency depends on the size of the dataframe and on whether all polygons share the same number of vertices. Let's explore these approaches.

Iterate over each row in the GeoDataFrame

This approach is suitable for small dataframes as it uses a for loop to iterate over each row:

for idx, row in gdf.iterrows():
    X, Y = row['geometry'].exterior.coords.xy
    print(list(X), list(Y))

This code snippet iterates through each row of a GeoDataFrame named gdf using a for loop. For each row, it reads the 'geometry' column, takes the exterior ring of the polygon, and accesses its coords attribute. The .xy attribute (a property, not a method) then returns the X and Y values as two separate sequences, which list() converts to plain Python lists:

[2.2522, 2.2522, 2.4522, 2.4522, 2.2522] [48.7566, 48.9566, 48.9566, 48.7566, 48.7566]
[-0.2276, -0.2276, -0.0276, -0.0276, -0.2276] [51.4072, 51.6072, 51.6072, 51.4072, 51.4072]
[37.5173, 37.5173, 37.7173, 37.7173, 37.5173] [55.6558, 55.8558, 55.8558, 55.6558, 55.6558]
[28.8784, 28.8784, 29.0784, 29.0784, 28.8784] [40.9082, 41.1082, 41.1082, 40.9082, 40.9082]
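When (x, y) pairs are more convenient than two separate sequences, the same ring can also be read as a list of tuples. The sketch below uses a single hand-built polygon (the Paris square from the table) rather than the full dataframe, so it runs on its own.

```python
from shapely.geometry import Polygon

paris = Polygon([(2.2522, 48.7566), (2.2522, 48.9566),
                 (2.4522, 48.9566), (2.4522, 48.7566)])

# Iterating over .coords yields one (x, y) tuple per vertex.
pairs = list(paris.exterior.coords)

# .xy returns the same vertices split into X values and Y values.
xs, ys = paris.exterior.coords.xy

print(pairs)
print(list(xs), list(ys))
```

Both views describe the same vertices, so zipping list(xs) with list(ys) gives back the pairs.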

Creating a function to extract Polygon coordinates

To reuse the extraction logic, we can wrap it in a function that takes a polygon as input and returns its coordinates. The function can then be applied to every row of the dataframe:

Using apply()

The apply() method in pandas calls a given function on each element of a Series (or along an axis of a DataFrame). Applied to the geometry column, it calls our function once per polygon and collects the results.

Here is the custom function that we will pass to apply():

def get_polygon_coordinates(p):
    X, Y = p.exterior.coords.xy
    return list(X[:-1]), list(Y[:-1])

This function takes a parameter 'p', which represents a Polygon object from our dataset. We use the exterior.coords.xy attribute to extract the x and y coordinates separately, and then return them as lists. The [:-1] slice drops the last coordinate: a polygon ring is closed, so its last vertex repeats the first.
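The duplicated closing vertex is easy to verify on a small example. The sketch below uses a hypothetical unit square, not the article's data.

```python
from shapely.geometry import Polygon

# Four vertices are supplied; shapely closes the ring on its own.
square = Polygon([(0, 0), (0, 1), (1, 1), (1, 0)])
ring = list(square.exterior.coords)

print(len(ring))            # 5
print(ring[0] == ring[-1])  # True
print(ring[:-1])            # the four distinct vertices
```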

For instance, calling the function on the first polygon:

get_polygon_coordinates(gdf['geometry'][0])

we obtain:

([2.2522, 2.2522, 2.4522, 2.4522], [48.7566, 48.9566, 48.9566, 48.7566])

To apply our function to all rows of the geopandas dataframe, we can proceed with the following:

gdf['geometry'].apply(get_polygon_coordinates)

This will return a Series in which each element is a tuple holding the list of x coordinates and the list of y coordinates of one Polygon:

0    ([2.2522, 2.2522, 2.4522, 2.4522], [48.7566, 4...
1    ([-0.2276, -0.2276, -0.0276, -0.0276], [51.407...
2    ([37.5173, 37.5173, 37.7173, 37.7173], [55.655...
3    ([28.8784, 28.8784, 29.0784, 29.0784], [40.908...
Name: geometry, dtype: object

This series contains all the extracted coordinates for each Polygon in our dataset. From here, we can easily manipulate and use these coordinates for further analysis or visualization purposes.
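One possible next step is to split the tuples into two separate columns. The sketch below is pandas-only: coords_series is a hand-written stand-in for the result of gdf['geometry'].apply(get_polygon_coordinates), limited to the first two polygons, and the column names xs and ys are arbitrary choices.

```python
import pandas as pd

# Stand-in for gdf['geometry'].apply(get_polygon_coordinates)
coords_series = pd.Series([
    ([2.2522, 2.2522, 2.4522, 2.4522], [48.7566, 48.9566, 48.9566, 48.7566]),
    ([-0.2276, -0.2276, -0.0276, -0.0276], [51.4072, 51.6072, 51.6072, 51.4072]),
])

# Each tuple becomes one row; its two lists become the two columns.
xy = pd.DataFrame(coords_series.tolist(),
                  columns=["xs", "ys"],
                  index=coords_series.index)

print(xy)
```

Passing index=coords_series.index keeps the rows aligned with the original dataframe if the result is joined back later.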

Using map()

Python's built-in map() function is another option. It may be somewhat faster than apply() on large dataframes, although the difference depends on the data and on the library versions in use. Passing our function and the geometry column to map() yields the exterior coordinates of each Polygon in our dataset:

list(map(get_polygon_coordinates, gdf['geometry']))

which returns a list of tuples with x and y coordinate lists for each Polygon:

[([2.2522, 2.2522, 2.4522, 2.4522], [48.7566, 48.9566, 48.9566, 48.7566]),
 ([-0.2276, -0.2276, -0.0276, -0.0276], [51.4072, 51.6072, 51.6072, 51.4072]),
 ([37.5173, 37.5173, 37.7173, 37.7173], [55.6558, 55.8558, 55.8558, 55.6558]),
 ([28.8784, 28.8784, 29.0784, 29.0784], [40.9082, 41.1082, 41.1082, 40.9082])]
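Whether map() really beats apply() is best checked on your own data. The following is a rough timing sketch, not a benchmark: the 1,000 identical unit squares are hypothetical test data, and the timings will vary between machines and library versions.

```python
import timeit

import geopandas
from shapely.geometry import Polygon


def get_polygon_coordinates(p):
    X, Y = p.exterior.coords.xy
    return list(X[:-1]), list(Y[:-1])


# Hypothetical test data: 1,000 copies of a unit square.
geoms = geopandas.GeoSeries([Polygon([(0, 0), (0, 1), (1, 1), (1, 0)])] * 1000)

# Run each approach 5 times and report the total elapsed time.
t_apply = timeit.timeit(lambda: geoms.apply(get_polygon_coordinates), number=5)
t_map = timeit.timeit(lambda: list(map(get_polygon_coordinates, geoms)), number=5)

print(f"apply: {t_apply:.4f} s, map: {t_map:.4f} s")
```

Both calls produce the same coordinates; only the container differs (a Series for apply(), a list for map()).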

If all our polygons have the same number of vertices, we can store the coordinates in a single NumPy array. Here is the code snippet for achieving this:

import numpy as np

coords = np.asarray( list( map(get_polygon_coordinates,gdf['geometry']) ) )

We can check the shape of the resulting array:

coords.shape

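which results in the following shape, meaning 4 polygons, 2 coordinate lists (x and y) per polygon, and 4 vertices per list: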

(4, 2, 4)
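Because this only works when every polygon has the same number of vertices, it can help to check that condition before building the array. The helper below, to_coord_array, is a hypothetical name introduced for illustration. It takes the list of (xs, ys) tuples produced by get_polygon_coordinates.

```python
import numpy as np


def to_coord_array(coord_tuples):
    """Stack (xs, ys) tuples into an array of shape (n_polygons, 2, n_vertices)."""
    counts = {len(xs) for xs, _ in coord_tuples}
    if len(counts) > 1:
        raise ValueError(f"polygons have differing vertex counts: {sorted(counts)}")
    return np.asarray(coord_tuples, dtype=float)


# First two polygons from the output above.
sample = [
    ([2.2522, 2.2522, 2.4522, 2.4522], [48.7566, 48.9566, 48.9566, 48.7566]),
    ([-0.2276, -0.2276, -0.0276, -0.0276], [51.4072, 51.6072, 51.6072, 51.4072]),
]

print(to_coord_array(sample).shape)  # (2, 2, 4)
```

With mixed vertex counts the helper raises a ValueError with an explicit message instead of leaving the failure to NumPy.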

We can flatten the coordinates of each polygon into a single row of 8 values with:

coords.reshape(4,8)

which results in the following array structure:

array([[ 2.25220e+00,  2.25220e+00,  2.45220e+00,  2.45220e+00,
         4.87566e+01,  4.89566e+01,  4.89566e+01,  4.87566e+01],
       [-2.27600e-01, -2.27600e-01, -2.76000e-02, -2.76000e-02,
         5.14072e+01,  5.16072e+01,  5.16072e+01,  5.14072e+01],
       [ 3.75173e+01,  3.75173e+01,  3.77173e+01,  3.77173e+01,
         5.56558e+01,  5.58558e+01,  5.58558e+01,  5.56558e+01],
       [ 2.88784e+01,  2.88784e+01,  2.90784e+01,  2.90784e+01,
         4.09082e+01,  4.11082e+01,  4.11082e+01,  4.09082e+01]])
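The literal 4 and 8 only fit this dataset. As a sketch of a more general form, passing -1 lets NumPy infer the row length. The example uses the first two polygons only, and also shows an alternative interleaved layout (x1, y1, x2, y2, ...) in case that ordering is preferred.

```python
import numpy as np

# First two polygons, shaped (n_polygons, 2, n_vertices).
coords = np.array([
    [[2.2522, 2.2522, 2.4522, 2.4522], [48.7566, 48.9566, 48.9566, 48.7566]],
    [[-0.2276, -0.2276, -0.0276, -0.0276], [51.4072, 51.6072, 51.6072, 51.4072]],
])

# -1 asks NumPy to infer the second dimension (2 * n_vertices).
flat = coords.reshape(len(coords), -1)
print(flat.shape)  # (2, 8)

# Swap the last two axes first to get x1, y1, x2, y2, ... per row.
interleaved = coords.transpose(0, 2, 1).reshape(len(coords), -1)
print(interleaved[0])
```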

Finally, we can append these values as new columns. Note that pd.concat returns a new dataframe rather than modifying gdf in place:

import pandas as pd

pd.concat([gdf, 
           pd.DataFrame( coords.reshape(4,8) )], 
           axis=1)

The following dataframe will be generated:

  city_name  longitude_c1  latitude_c1  longitude_c2  latitude_c2  \
0     Paris        2.2522      48.7566        2.2522      48.9566   
1    London       -0.2276      51.4072       -0.2276      51.6072   
2    Moscow       37.5173      55.6558       37.5173      55.8558   
3  Istanbul       28.8784      40.9082       28.8784      41.1082

   longitude_c3  latitude_c3  longitude_c4  latitude_c4  \
0        2.4522      48.9566        2.4522      48.7566   
1       -0.0276      51.6072       -0.0276      51.4072   
2       37.7173      55.8558       37.7173      55.6558   
3       29.0784      41.1082       29.0784      40.9082

                                            geometry        0        1  \
0  POLYGON ((2.25220 48.75660, 2.25220 48.95660, ...   2.2522   2.2522   
1  POLYGON ((-0.22760 51.40720, -0.22760 51.60720...  -0.2276  -0.2276   
2  POLYGON ((37.51730 55.65580, 37.51730 55.85580...  37.5173  37.5173   
3  POLYGON ((28.87840 40.90820, 28.87840 41.10820...  28.8784  28.8784

         2        3        4        5        6        7  
0   2.4522   2.4522  48.7566  48.9566  48.9566  48.7566  
1  -0.0276  -0.0276  51.4072  51.6072  51.6072  51.4072  
2  37.7173  37.7173  55.6558  55.8558  55.8558  55.6558  
3  29.0784  29.0784  40.9082  41.1082  41.1082  40.9082

We can also assign names to the new columns:

pd.concat([gdf, 
           pd.DataFrame( coords.reshape(4,8), columns=['x1','x2','x3','x4','y1','y2','y3','y4'] )], 
           axis=1)

The following dataframe will be generated:

  city_name  longitude_c1  latitude_c1  longitude_c2  latitude_c2  \
0     Paris        2.2522      48.7566        2.2522      48.9566   
1    London       -0.2276      51.4072       -0.2276      51.6072   
2    Moscow       37.5173      55.6558       37.5173      55.8558   
3  Istanbul       28.8784      40.9082       28.8784      41.1082

   longitude_c3  latitude_c3  longitude_c4  latitude_c4  \
0        2.4522      48.9566        2.4522      48.7566   
1       -0.0276      51.6072       -0.0276      51.4072   
2       37.7173      55.8558       37.7173      55.6558   
3       29.0784      41.1082       29.0784      40.9082

                                            geometry       x1       x2  \
0  POLYGON ((2.25220 48.75660, 2.25220 48.95660, ...   2.2522   2.2522   
1  POLYGON ((-0.22760 51.40720, -0.22760 51.60720...  -0.2276  -0.2276   
2  POLYGON ((37.51730 55.65580, 37.51730 55.85580...  37.5173  37.5173   
3  POLYGON ((28.87840 40.90820, 28.87840 41.10820...  28.8784  28.8784

        x3       x4       y1       y2       y3       y4  
0   2.4522   2.4522  48.7566  48.9566  48.9566  48.7566  
1  -0.0276  -0.0276  51.4072  51.6072  51.6072  51.4072  
2  37.7173  37.7173  55.6558  55.8558  55.8558  55.6558  
3  29.0784  29.0784  40.9082  41.1082  41.1082  40.9082
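The column names can also be generated from the array shape instead of being typed by hand. The sketch below is pandas-only: cities is a hypothetical stand-in for gdf, given a non-default index to show why passing index= matters, since pd.concat with axis=1 aligns rows by index label.

```python
import numpy as np
import pandas as pd

# First two polygons, shaped (n_polygons, 2, n_vertices).
coords = np.array([
    [[2.2522, 2.2522, 2.4522, 2.4522], [48.7566, 48.9566, 48.9566, 48.7566]],
    [[-0.2276, -0.2276, -0.0276, -0.0276], [51.4072, 51.6072, 51.6072, 51.4072]],
])
n_polygons, _, n_vertices = coords.shape

# x1..xN followed by y1..yN, matching the reshaped column order.
columns = ([f"x{i}" for i in range(1, n_vertices + 1)]
           + [f"y{i}" for i in range(1, n_vertices + 1)])

# Stand-in for gdf, with an index that does not start at 0.
cities = pd.DataFrame({"city_name": ["Paris", "London"]}, index=[10, 11])

coord_df = pd.DataFrame(coords.reshape(n_polygons, -1),
                        columns=columns,
                        index=cities.index)  # reuse the index so rows line up

result = pd.concat([cities, coord_df], axis=1)
print(result)
```

Without index=cities.index the new frame would be labelled 0 and 1, and the concatenation would produce four rows padded with NaN instead of two aligned rows.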
