Introduction
GeoPandas is a powerful open-source Python library that allows for the analysis and manipulation of geospatial data. One common task when working with geospatial data is retrieving polygon coordinates from a GeoPandas DataFrame.
This can be done using several different methods, which will be discussed in this guide.
Creating a GeoPandas DataFrame
To replicate the example below, you can download the data from the following link.
import geopandas
gdf = geopandas.read_file('dataframe.geojson', driver='GeoJSON')
gdf.head()
The code above will return:
city_name longitude_c1 latitude_c1 longitude_c2 latitude_c2 \
0 Paris 2.2522 48.7566 2.2522 48.9566
1 London -0.2276 51.4072 -0.2276 51.6072
2 Moscow 37.5173 55.6558 37.5173 55.8558
3 Istanbul 28.8784 40.9082 28.8784 41.1082
longitude_c3 latitude_c3 longitude_c4 latitude_c4 \
0 2.4522 48.9566 2.4522 48.7566
1 -0.0276 51.6072 -0.0276 51.4072
2 37.7173 55.8558 37.7173 55.6558
3 29.0784 41.1082 29.0784 40.9082
geometry
0 POLYGON ((2.25220 48.75660, 2.25220 48.95660, ...
1 POLYGON ((-0.22760 51.40720, -0.22760 51.60720...
2 POLYGON ((37.51730 55.65580, 37.51730 55.85580...
3 POLYGON ((28.87840 40.90820, 28.87840 41.10820...
In this example, the data describes four cities: Paris, London, Moscow, and Istanbul. Each city has four corner points, given as longitude and latitude values (c1 to c4). The geometry column stores the polygon built from those corners as a shapely Polygon object.
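If the GeoJSON file is not at hand, an equivalent GeoDataFrame can be rebuilt from the corner columns shown above. This is only a sketch: it assumes the corners c1 to c4 are listed in ring order (as the POLYGON output suggests), and it sets no coordinate reference system.

```python
import pandas as pd
import geopandas
from shapely.geometry import Polygon

# Corner coordinates copied from the table above.
df = pd.DataFrame({
    "city_name": ["Paris", "London", "Moscow", "Istanbul"],
    "longitude_c1": [2.2522, -0.2276, 37.5173, 28.8784],
    "latitude_c1": [48.7566, 51.4072, 55.6558, 40.9082],
    "longitude_c2": [2.2522, -0.2276, 37.5173, 28.8784],
    "latitude_c2": [48.9566, 51.6072, 55.8558, 41.1082],
    "longitude_c3": [2.4522, -0.0276, 37.7173, 29.0784],
    "latitude_c3": [48.9566, 51.6072, 55.8558, 41.1082],
    "longitude_c4": [2.4522, -0.0276, 37.7173, 29.0784],
    "latitude_c4": [48.7566, 51.4072, 55.6558, 40.9082],
})

# Build one Polygon per row from its four (longitude, latitude) corners.
geometry = [
    Polygon([(row[f"longitude_c{i}"], row[f"latitude_c{i}"]) for i in range(1, 5)])
    for _, row in df.iterrows()
]

gdf = geopandas.GeoDataFrame(df, geometry=geometry)
print(gdf[["city_name", "geometry"]])
```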
Extracting Polygon Coordinates
There are several ways to extract polygon coordinates. Which one is most efficient depends on the size of your dataframe and on whether all polygons share the same number of vertices. Let's explore these approaches.
Iterate over each row in the GeoDataFrame
This approach is suitable for small dataframes as it uses a for loop to iterate over each row:
for idx, row in gdf.iterrows():
    X, Y = row['geometry'].exterior.coords.xy
    print(list(X), list(Y))
This snippet loops over each row of the GeoDataFrame gdf. For each row, it takes the shapely object in the 'geometry' column, accesses its exterior ring, and then that ring's coords attribute. The .xy property (an attribute, not a method) returns the X and Y values as two separate array-like sequences, which list() converts to plain lists. Note that the first vertex is repeated at the end of each list, because a polygon ring is closed:
[2.2522, 2.2522, 2.4522, 2.4522, 2.2522] [48.7566, 48.9566, 48.9566, 48.7566, 48.7566]
[-0.2276, -0.2276, -0.0276, -0.0276, -0.2276] [51.4072, 51.6072, 51.6072, 51.4072, 51.4072]
[37.5173, 37.5173, 37.7173, 37.7173, 37.5173] [55.6558, 55.8558, 55.8558, 55.6558, 55.6558]
[28.8784, 28.8784, 29.0784, 29.0784, 28.8784] [40.9082, 41.1082, 41.1082, 40.9082, 40.9082]
Creating a function to extract Polygon coordinates
Instead of writing an explicit loop, we can wrap the extraction in a function that takes a single Polygon and returns its coordinates, then call that function on every geometry. Two ways of doing so are described in the following subsections.
Using apply()
The apply() method in pandas calls a given function on each element of a Series (or along an axis of a DataFrame). Applied to the geometry column, it passes each Polygon to our function in turn.
Here is the custom function that we will pass to apply():
def get_polygon_coordinates(p):
    X, Y = p.exterior.coords.xy
    return list(X[:-1]), list(Y[:-1])
This function takes a parameter 'p', a Polygon object from our dataset. The exterior.coords.xy property gives the x and y coordinates separately, and the function returns them as a tuple of two lists. The [:-1] slice drops the last coordinate, which duplicates the first one because the ring is closed.
For instance, by accessing the first polygon
get_polygon_coordinates(gdf['geometry'][0])
we obtain:
([2.2522, 2.2522, 2.4522, 2.4522], [48.7566, 48.9566, 48.9566, 48.7566])
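The repeated closing vertex that the [:-1] slice removes can be checked directly on a standalone polygon. The sketch below rebuilds the Paris square by hand with shapely; the variable names are illustrative.

```python
from shapely.geometry import Polygon

# Four distinct corners; shapely closes the ring automatically.
square = Polygon([
    (2.2522, 48.7566), (2.2522, 48.9566),
    (2.4522, 48.9566), (2.4522, 48.7566),
])

ring = list(square.exterior.coords)
print(len(ring))            # 5: four corners plus the repeated first vertex
print(ring[0] == ring[-1])  # True

# Dropping the last entry leaves only the distinct vertices.
vertices = ring[:-1]
print(vertices)
```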
To apply our function to all rows of the geopandas dataframe, we can proceed with the following:
gdf['geometry'].apply(get_polygon_coordinates)
This returns a pandas Series in which each element is a tuple holding the list of x coordinates and the list of y coordinates of one Polygon:
0 ([2.2522, 2.2522, 2.4522, 2.4522], [48.7566, 4...
1 ([-0.2276, -0.2276, -0.0276, -0.0276], [51.407...
2 ([37.5173, 37.5173, 37.7173, 37.7173], [55.655...
3 ([28.8784, 28.8784, 29.0784, 29.0784], [40.908...
Name: geometry, dtype: object
This series contains all the extracted coordinates for each Polygon in our dataset. From here, we can easily manipulate and use these coordinates for further analysis or visualization purposes.
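As one example of such manipulation, the Series of (x, y) tuples can be split into two separate columns. The sketch below uses a hand-typed stand-in for the first two rows of the result above, and the column names x and y are chosen only for illustration.

```python
import pandas as pd

# Stand-in for the first two rows of gdf['geometry'].apply(get_polygon_coordinates).
coords_series = pd.Series([
    ([2.2522, 2.2522, 2.4522, 2.4522], [48.7566, 48.9566, 48.9566, 48.7566]),
    ([-0.2276, -0.2276, -0.0276, -0.0276], [51.4072, 51.6072, 51.6072, 51.4072]),
], name="geometry")

# Each tuple becomes one row; its two lists land in the 'x' and 'y' columns.
xy = pd.DataFrame(coords_series.tolist(), columns=["x", "y"], index=coords_series.index)
print(xy)
```

Each cell of xy still holds a Python list, so this layout also works when polygons have different numbers of vertices.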
Using map()
Python's built-in map function is another option, and it may be faster than apply() because it avoids some pandas overhead. Mapping our function over the geometry column yields the exterior coordinates of each Polygon in our dataset. The code looks like this:
list( map(get_polygon_coordinates,gdf['geometry']) )
which would return a list of tuples with x and y coordinate lists for each Polygon.
[([2.2522, 2.2522, 2.4522, 2.4522], [48.7566, 48.9566, 48.9566, 48.7566]),
([-0.2276, -0.2276, -0.0276, -0.0276], [51.4072, 51.6072, 51.6072, 51.4072]),
([37.5173, 37.5173, 37.7173, 37.7173], [55.6558, 55.8558, 55.8558, 55.6558]),
([28.8784, 28.8784, 29.0784, 29.0784], [40.9082, 41.1082, 41.1082, 40.9082])]
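Whether map() is actually faster than apply() depends on the data size and the library versions, so it is worth measuring on your own data. The sketch below times both with timeit on synthetic squares (an assumption made for illustration, not the city dataset); no timing figures are quoted because they vary between machines.

```python
import timeit

import geopandas
from shapely.geometry import Polygon


def get_polygon_coordinates(p):
    # Same helper as in the text: exterior x and y lists without the closing vertex.
    X, Y = p.exterior.coords.xy
    return list(X[:-1]), list(Y[:-1])


# Synthetic data: 200 unit squares shifted along the x axis.
geoms = geopandas.GeoSeries(
    [Polygon([(i, 0), (i, 1), (i + 1, 1), (i + 1, 0)]) for i in range(200)]
)

# Run each approach 10 times and report the total elapsed time.
t_apply = timeit.timeit(lambda: geoms.apply(get_polygon_coordinates), number=10)
t_map = timeit.timeit(lambda: list(map(get_polygon_coordinates, geoms)), number=10)

print(f"apply: {t_apply:.4f} s, map: {t_map:.4f} s")
```

Both approaches produce the same coordinates; only the container differs (a pandas Series for apply(), a plain list for map()).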
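If all polygons have the same number of vertices, we can store the coordinates in a single NumPy array. (With differing vertex counts the nested lists are ragged and cannot form a regular numeric array.) Here is the code snippet for achieving this: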
import numpy as np
coords = np.asarray( list( map(get_polygon_coordinates,gdf['geometry']) ) )
The array has three dimensions: polygons, then the x/y axis, then vertices. Its shape can be checked with:
coords.shape
which results in:
(4, 2, 4)
We can flatten each polygon's coordinates into a single row (four x values followed by four y values) using the command
coords.reshape(4,8)
which results in the following array structure:
array([[ 2.25220e+00, 2.25220e+00, 2.45220e+00, 2.45220e+00,
4.87566e+01, 4.89566e+01, 4.89566e+01, 4.87566e+01],
[-2.27600e-01, -2.27600e-01, -2.76000e-02, -2.76000e-02,
5.14072e+01, 5.16072e+01, 5.16072e+01, 5.14072e+01],
[ 3.75173e+01, 3.75173e+01, 3.77173e+01, 3.77173e+01,
5.56558e+01, 5.58558e+01, 5.58558e+01, 5.56558e+01],
[ 2.88784e+01, 2.88784e+01, 2.90784e+01, 2.90784e+01,
4.09082e+01, 4.11082e+01, 4.11082e+01, 4.09082e+01]])
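To make the layout produced by the reshape explicit, here is a small NumPy-only sketch using the first two polygons typed in by hand. It also shows two optional variations that are not part of the original workflow: passing -1 so the row length is inferred rather than hard-coded, and transposing first to obtain an interleaved x1, y1, x2, y2, ... order.

```python
import numpy as np

# Shape (2, 2, 4): 2 polygons, 2 axes (x then y), 4 vertices each.
coords = np.array([
    [[2.2522, 2.2522, 2.4522, 2.4522], [48.7566, 48.9566, 48.9566, 48.7566]],
    [[-0.2276, -0.2276, -0.0276, -0.0276], [51.4072, 51.6072, 51.6072, 51.4072]],
])

# -1 lets NumPy infer the row length (2 * 4 = 8): all x values, then all y values.
flat = coords.reshape(len(coords), -1)
print(flat.shape)  # (2, 8)
print(flat[0])

# Swap the last two axes before flattening to get x1, y1, x2, y2, ... instead.
interleaved = coords.transpose(0, 2, 1).reshape(len(coords), -1)
print(interleaved[0])
```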
These values can then be placed alongside the existing columns. Note that pd.concat returns a new dataframe and leaves gdf itself unchanged:
import pandas as pd
pd.concat([gdf,
pd.DataFrame( coords.reshape(4,8) )],
axis=1)
The following dataframe will be generated:
city_name longitude_c1 latitude_c1 longitude_c2 latitude_c2 \
0 Paris 2.2522 48.7566 2.2522 48.9566
1 London -0.2276 51.4072 -0.2276 51.6072
2 Moscow 37.5173 55.6558 37.5173 55.8558
3 Istanbul 28.8784 40.9082 28.8784 41.1082
longitude_c3 latitude_c3 longitude_c4 latitude_c4 \
0 2.4522 48.9566 2.4522 48.7566
1 -0.0276 51.6072 -0.0276 51.4072
2 37.7173 55.8558 37.7173 55.6558
3 29.0784 41.1082 29.0784 40.9082
geometry 0 1 \
0 POLYGON ((2.25220 48.75660, 2.25220 48.95660, ... 2.2522 2.2522
1 POLYGON ((-0.22760 51.40720, -0.22760 51.60720... -0.2276 -0.2276
2 POLYGON ((37.51730 55.65580, 37.51730 55.85580... 37.5173 37.5173
3 POLYGON ((28.87840 40.90820, 28.87840 41.10820... 28.8784 28.8784
2 3 4 5 6 7
0 2.4522 2.4522 48.7566 48.9566 48.9566 48.7566
1 -0.0276 -0.0276 51.4072 51.6072 51.6072 51.4072
2 37.7173 37.7173 55.6558 55.8558 55.8558 55.6558
3 29.0784 29.0784 40.9082 41.1082 41.1082 40.9082
We can also assign names to the new columns:
pd.concat([gdf,
pd.DataFrame( coords.reshape(4,8), columns=['x1','x2','x3','x4','y1','y2','y3','y4'] )],
axis=1)
The following dataframe will be generated:
city_name longitude_c1 latitude_c1 longitude_c2 latitude_c2 \
0 Paris 2.2522 48.7566 2.2522 48.9566
1 London -0.2276 51.4072 -0.2276 51.6072
2 Moscow 37.5173 55.6558 37.5173 55.8558
3 Istanbul 28.8784 40.9082 28.8784 41.1082
longitude_c3 latitude_c3 longitude_c4 latitude_c4 \
0 2.4522 48.9566 2.4522 48.7566
1 -0.0276 51.6072 -0.0276 51.4072
2 37.7173 55.8558 37.7173 55.6558
3 29.0784 41.1082 29.0784 40.9082
geometry x1 x2 \
0 POLYGON ((2.25220 48.75660, 2.25220 48.95660, ... 2.2522 2.2522
1 POLYGON ((-0.22760 51.40720, -0.22760 51.60720... -0.2276 -0.2276
2 POLYGON ((37.51730 55.65580, 37.51730 55.85580... 37.5173 37.5173
3 POLYGON ((28.87840 40.90820, 28.87840 41.10820... 28.8784 28.8784
x3 x4 y1 y2 y3 y4
0 2.4522 2.4522 48.7566 48.9566 48.9566 48.7566
1 -0.0276 -0.0276 51.4072 51.6072 51.6072 51.4072
2 37.7173 37.7173 55.6558 55.8558 55.8558 55.6558
3 29.0784 29.0784 40.9082 41.1082 41.1082 40.9082
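When the number of vertices is not known in advance, the column names can be generated from the array shape instead of being typed out. The sketch below assumes the same (polygons, 2, vertices) layout as coords above and uses a two-polygon stand-in; names such as coords_df are illustrative.

```python
import numpy as np
import pandas as pd

# Stand-in for coords: shape (2, 2, 4), i.e. (polygons, x/y, vertices).
coords = np.array([
    [[2.2522, 2.2522, 2.4522, 2.4522], [48.7566, 48.9566, 48.9566, 48.7566]],
    [[-0.2276, -0.2276, -0.0276, -0.0276], [51.4072, 51.6072, 51.6072, 51.4072]],
])

n_polygons, _, n_vertices = coords.shape

# x1..xN followed by y1..yN, matching the order produced by reshape.
columns = [f"x{i}" for i in range(1, n_vertices + 1)] + \
          [f"y{i}" for i in range(1, n_vertices + 1)]

coords_df = pd.DataFrame(coords.reshape(n_polygons, -1), columns=columns)
print(coords_df)
```

The resulting coords_df can be passed to pd.concat in place of the hand-written DataFrame.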