How to create a GeoDataFrame with Polygon geometry from a Pandas DataFrame with coordinates?

Published: February 16, 2024

Tags: Python; Geopandas;

Introduction

Creating a GeoDataFrame with Polygon geometry from a plain Pandas DataFrame takes a few steps: preparing your data, importing the necessary libraries, creating the Polygon geometries, and finally constructing the GeoDataFrame. Here's a step-by-step guide:

Create a Pandas DataFrame with coordinates

First, ensure your Pandas DataFrame contains the coordinates needed to construct each Polygon, in a form that can be read as the points (vertices) of the Polygon(s). Typically, this means a longitude and a latitude column for each vertex, or a single column with tuples/lists containing both values. In the example below, each city gets four corners (c1 to c4) forming a square of 0.2 degrees per side around its center.

import pandas as pd

data = {'city_name':['Paris','London','Moscow', 'Istanbul'],
       'longitude_c1':[2.3522-0.1,-0.1276-0.1,37.6173-0.1,28.9784-0.1],
       'latitude_c1':[48.8566-0.1,51.5072-0.1,55.7558-0.1,41.0082-0.1],
       'longitude_c2':[2.3522-0.1,-0.1276-0.1,37.6173-0.1,28.9784-0.1],
       'latitude_c2':[48.8566+0.1,51.5072+0.1,55.7558+0.1,41.0082+0.1],           
       'longitude_c3':[2.3522+0.1,-0.1276+0.1,37.6173+0.1,28.9784+0.1],
       'latitude_c3':[48.8566+0.1,51.5072+0.1,55.7558+0.1,41.0082+0.1],           
       'longitude_c4':[2.3522+0.1,-0.1276+0.1,37.6173+0.1,28.9784+0.1],
       'latitude_c4':[48.8566-0.1,51.5072-0.1,55.7558-0.1,41.0082-0.1]
       }

df = pd.DataFrame(data)

print(df)

This will give us the following output:

  city_name  longitude_c1  latitude_c1  longitude_c2  latitude_c2  \
0     Paris        2.2522      48.7566        2.2522      48.9566   
1    London       -0.2276      51.4072       -0.2276      51.6072   
2    Moscow       37.5173      55.6558       37.5173      55.8558   
3  Istanbul       28.8784      40.9082       28.8784      41.1082

   longitude_c3  latitude_c3  longitude_c4  latitude_c4  
0        2.4522      48.9566        2.4522      48.7566  
1       -0.0276      51.6072       -0.0276      51.4072  
2       37.7173      55.8558       37.7173      55.6558  
3       29.0784      41.1082       29.0784      40.9082

Constructing our Polygon(s)

Now, with this DataFrame in place, we can begin constructing our Polygon(s) using the coordinates provided. We will be using the shapely library for this task, which provides various geometric operations on these coordinates.

There are several ways to create our Polygons, and the best choice depends on the size of the dataframe. Applying a Python function row by row is a common approach, but it can become slow as the dataframe grows. In such cases, using the map() function is recommended.

Approach 1: Creating a Python function

To begin, we define a function that builds a Polygon from the coordinates stored in one row. It can then be run on every row of our dataframe with the apply function, and the result stored in a new column.

Here's an example of how to accomplish this using the shapely.geometry module:

from shapely.geometry import Polygon

def create_polygon(x):

    p1 = [x['longitude_c1'],x['latitude_c1']]
    p2 = [x['longitude_c2'],x['latitude_c2']]
    p3 = [x['longitude_c3'],x['latitude_c3']]
    p4 = [x['longitude_c4'],x['latitude_c4']]

    pixel_polygon = Polygon([p1,p2,p3,p4])

    return pixel_polygon

When called through apply with axis=1, this function receives each row of our dataframe in turn and uses its longitude and latitude columns to build a list of four points, which is then passed to Polygon. Note that shapely expects each point as (x, y), that is (longitude, latitude). By invoking the function,

df.apply(create_polygon, axis=1)

a pandas.core.series.Series object is returned:

0    POLYGON ((2.2522 48.7566, 2.2522 48.9566, 2.45...
1    POLYGON ((-0.2276 51.4072, -0.2276 51.6072, -0...
2    POLYGON ((37.5173 55.6558, 37.5173 55.8558, 37...
3    POLYGON ((28.8784 40.9082, 28.8784 41.1082, 29...
dtype: object

We can now add a new column to our dataframe, which we name "geometry":

df['geometry'] = df.apply(create_polygon, axis=1)
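The order in which the corners are passed to Polygon matters: they should be listed as you would walk around the shape. The following minimal sketch, using a unit square rather than the city data (an assumption made to keep it short), shows how shapely's is_valid attribute can be used to catch a wrong corner order:

```python
from shapely.geometry import Polygon

# Corners listed in ring order (walking around the square)
ring_order = Polygon([(0, 0), (0, 1), (1, 1), (1, 0)])

# Same corners with two of them swapped: the edges cross (a "bowtie")
crossed_order = Polygon([(0, 0), (1, 1), (0, 1), (1, 0)])

print(ring_order.is_valid)     # True
print(crossed_order.is_valid)  # False
print(ring_order.area)         # 1.0
```

In our example, c1 to c4 are already in ring order, so the polygons built by create_polygon are valid.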

We can also visualize our Polygon(s) using the matplotlib library, which provides various plotting functions.

import matplotlib.pyplot as plt

plt.figure() # creates a new figure to plot on
for polygon in df['geometry']:
    plt.plot(*polygon.exterior.xy) # plots the exterior coordinates of each Polygon object as a line
plt.title('How to create a GeoDataFrame with Polygon geometry \n from a Pandas DataFrame with coordinates ?')
plt.savefig('geopandas_polygons_01.png', dpi=100, bbox_inches='tight')
plt.show() # displays the plot

The resulting visualization will show us the boundaries of our Polygons, based on the given longitude and latitude coordinates. We can also add other data to our Polygons, such as labels or colors, to further customize the plot.
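As an illustration of such customization, here is a minimal sketch that fills each polygon with a color and writes the city name at its centroid. To stay self-contained it rebuilds two of the squares directly instead of reusing df, and the styling choices (alpha, outline color) are arbitrary assumptions:

```python
import matplotlib.pyplot as plt
from shapely.geometry import Polygon

# Two of the squares from the example, rebuilt here so the sketch runs on its own
polygons = {
    'Paris': Polygon([(2.2522, 48.7566), (2.2522, 48.9566),
                      (2.4522, 48.9566), (2.4522, 48.7566)]),
    'London': Polygon([(-0.2276, 51.4072), (-0.2276, 51.6072),
                       (-0.0276, 51.6072), (-0.0276, 51.4072)]),
}

fig, ax = plt.subplots()
for name, polygon in polygons.items():
    x, y = polygon.exterior.xy
    ax.fill(x, y, alpha=0.4)                   # filled shape, color taken from the default cycle
    ax.plot(x, y, color='black', linewidth=1)  # outline of the polygon
    center = polygon.centroid
    ax.annotate(name, (center.x, center.y), ha='center', va='center')  # label at the centroid
ax.set_xlabel('longitude')
ax.set_ylabel('latitude')
```

Calling plt.show() or fig.savefig(...) afterwards displays or saves the figure, as in the previous snippet.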

Approach 2: Utilizing the map() function

The previous approach works best for smaller DataFrames. Another solution is to use the Python map function to generate a column of polygon geometry. Let's explore how this can be achieved with our example.

First, let's extract the coordinates from the dataframe and store them in a numpy array. We can achieve this by selecting the columns 'longitude_c1', 'latitude_c1', 'longitude_c2', 'latitude_c2', 'longitude_c3', 'latitude_c3', 'longitude_c4', and 'latitude_c4'. Then, we can convert this selection into a numpy array using the to_numpy() function:

Coords = df[['longitude_c1', 'latitude_c1',
             'longitude_c2', 'latitude_c2',
             'longitude_c3', 'latitude_c3',
             'longitude_c4', 'latitude_c4']].to_numpy()

Displaying Coords (for example in a notebook cell) gives:

array([[ 2.25220e+00,  4.87566e+01,  2.25220e+00,  4.89566e+01,
         2.45220e+00,  4.89566e+01,  2.45220e+00,  4.87566e+01],
       [-2.27600e-01,  5.14072e+01, -2.27600e-01,  5.16072e+01,
        -2.76000e-02,  5.16072e+01, -2.76000e-02,  5.14072e+01],
       [ 3.75173e+01,  5.56558e+01,  3.75173e+01,  5.58558e+01,
         3.77173e+01,  5.58558e+01,  3.77173e+01,  5.56558e+01],
       [ 2.88784e+01,  4.09082e+01,  2.88784e+01,  4.11082e+01,
         2.90784e+01,  4.11082e+01,  2.90784e+01,  4.09082e+01]])

By examining the shape of our matrix,

Coords.shape

we observe that it consists of 4 rows with 8 values each (4 points with 2 coordinates per point), resulting in a shape of

(4, 8)

However, the shapely Polygon function (shapely.Polygon) requires the coordinates of each polygon as a sequence of shape (N,2), that is N points with 2 values each (see shapely.Polygon). To meet this requirement, we can reshape our matrix with the following line of code:

Coords = Coords.reshape(4,4,2)

Consequently, our matrix now has a shape of (4,4,2), signifying 4 rows with 4 points, each defined by 2 coordinates representing longitude and latitude.
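To make the effect of the reshape concrete, here is a small sketch on two rows of the same data. It uses -1 for the first dimension, which asks numpy to infer the number of rows, so the line does not depend on the dataframe length; the variable names flat and points are only illustrative:

```python
import numpy as np

# Two rows of flat coordinates: lon1, lat1, lon2, lat2, lon3, lat3, lon4, lat4
flat = np.array([
    [2.2522, 48.7566, 2.2522, 48.9566, 2.4522, 48.9566, 2.4522, 48.7566],
    [-0.2276, 51.4072, -0.2276, 51.6072, -0.0276, 51.6072, -0.0276, 51.4072],
])

# -1 lets numpy infer the number of rows; each row becomes 4 points of 2 values
points = flat.reshape(-1, 4, 2)

print(points.shape)  # (2, 4, 2)
print(points[0, 0])  # longitude and latitude of the first corner of the first row
```

Because numpy reshapes in row-major order, consecutive values are paired up, which is why the columns must be selected in longitude, latitude order for each corner.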

To create our polygons, we can utilize the built-in Python function map() as follows:

polygons = list(map(Polygon, Coords.tolist()))

This code generates a list of four polygons:

[
<shapely.geometry.polygon.Polygon at 0x7f7e1014f700>,
<shapely.geometry.polygon.Polygon at 0x7f7e08f37820>,
<shapely.geometry.polygon.Polygon at 0x7f7e1014fe50>,
<shapely.geometry.polygon.Polygon at 0x7f7e1014fd90>
]

We can utilize this list to generate a new column in our dataframe, such as "geometry," which will hold our polygons.

df['geometry'] = polygons

To display the dataframe, we can use the print function:

print(df)

The above code will display

  city_name  longitude_c1  latitude_c1  longitude_c2  latitude_c2  \
0     Paris        2.2522      48.7566        2.2522      48.9566   
1    London       -0.2276      51.4072       -0.2276      51.6072   
2    Moscow       37.5173      55.6558       37.5173      55.8558   
3  Istanbul       28.8784      40.9082       28.8784      41.1082

   longitude_c3  latitude_c3  longitude_c4  latitude_c4  \
0        2.4522      48.9566        2.4522      48.7566   
1       -0.0276      51.6072       -0.0276      51.4072   
2       37.7173      55.8558       37.7173      55.6558   
3       29.0784      41.1082       29.0784      40.9082

                                            geometry  
0  POLYGON ((2.2522 48.7566, 2.2522 48.9566, 2.45...  
1  POLYGON ((-0.2276 51.4072, -0.2276 51.6072, -0...  
2  POLYGON ((37.5173 55.6558, 37.5173 55.8558, 37...  
3  POLYGON ((28.8784 40.9082, 28.8784 41.1082, 29...

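This approach can be beneficial for larger data sets: the coordinates are extracted in a single numpy operation and the polygons are built with the built-in map function, which avoids the per-row overhead of DataFrame.apply.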
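One way to check this on your own machine is to time both approaches on a larger dataframe. The sketch below does so on synthetic data (random centers and the size n are assumptions chosen for illustration); actual timings depend on your hardware and library versions, so none are quoted here:

```python
import time
import numpy as np
import pandas as pd
from shapely.geometry import Polygon

cols = ['longitude_c1', 'latitude_c1', 'longitude_c2', 'latitude_c2',
        'longitude_c3', 'latitude_c3', 'longitude_c4', 'latitude_c4']

# Synthetic data: n squares of 0.2 degrees per side around random centers
n = 5000
rng = np.random.default_rng(0)
lon = rng.uniform(-180, 180, n)
lat = rng.uniform(-80, 80, n)
big_df = pd.DataFrame(np.column_stack([
    lon - 0.1, lat - 0.1, lon - 0.1, lat + 0.1,
    lon + 0.1, lat + 0.1, lon + 0.1, lat - 0.1]), columns=cols)

def create_polygon(x):
    # Same logic as Approach 1, written as a loop over the four corners
    return Polygon([[x[f'longitude_c{i}'], x[f'latitude_c{i}']] for i in range(1, 5)])

# Approach 1: apply a Python function row by row
start = time.perf_counter()
polygons_apply = big_df.apply(create_polygon, axis=1)
t_apply = time.perf_counter() - start

# Approach 2: numpy reshape followed by map()
start = time.perf_counter()
polygons_map = list(map(Polygon, big_df[cols].to_numpy().reshape(-1, 4, 2).tolist()))
t_map = time.perf_counter() - start

print(f'apply: {t_apply:.3f} s, map: {t_map:.3f} s')
```

Both approaches produce the same polygons, so the choice between them is only a matter of speed and readability.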

Overall, there are multiple ways to generate polygons from a dataframe containing longitude and latitude coordinates. The chosen method will depend on the specific project requirements and the size of the dataset.
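If the coordinates are instead stored in a single column of tuples/lists, as mentioned at the beginning, each cell can be passed to Polygon directly. Here is a minimal sketch under that assumption; the dataframe df_alt and its column name corners are hypothetical and not part of the example above:

```python
import pandas as pd
from shapely.geometry import Polygon

# Hypothetical layout: one 'corners' column holding (longitude, latitude) tuples
df_alt = pd.DataFrame({
    'city_name': ['Paris', 'London'],
    'corners': [
        [(2.2522, 48.7566), (2.2522, 48.9566), (2.4522, 48.9566), (2.4522, 48.7566)],
        [(-0.2276, 51.4072), (-0.2276, 51.6072), (-0.0276, 51.6072), (-0.0276, 51.4072)],
    ],
})

# Each cell is already a sequence of (x, y) points, so no reshaping is needed
df_alt['geometry'] = [Polygon(corners) for corners in df_alt['corners']]

print(df_alt['geometry'].iloc[0].bounds)  # (2.2522, 48.7566, 2.4522, 48.9566)
```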

Converting our Pandas DataFrame into a GeoDataFrame

To convert our Pandas DataFrame into a GeoDataFrame, we can use the following code snippet:

import geopandas

gdf = geopandas.GeoDataFrame(
    df, 
    geometry=df['geometry'], 
    crs="EPSG:4326"
)

Here, we specify the geometry column as "geometry" and assign it the coordinate reference system (CRS) EPSG:4326 (WGS 84), which is commonly used for latitude and longitude coordinates. The result is a GeoDataFrame with polygon geometries in the "geometry" column.
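A quick way to confirm that the conversion worked is to inspect a few attributes of the result. The sketch below rebuilds a two-row dataframe so that it runs on its own, and passes the column name 'geometry' rather than the column itself, which geopandas also accepts:

```python
import geopandas
import pandas as pd
from shapely.geometry import Polygon

# Small stand-in for the dataframe built in the previous sections
df = pd.DataFrame({
    'city_name': ['Paris', 'London'],
    'geometry': [
        Polygon([(2.2522, 48.7566), (2.2522, 48.9566),
                 (2.4522, 48.9566), (2.4522, 48.7566)]),
        Polygon([(-0.2276, 51.4072), (-0.2276, 51.6072),
                 (-0.0276, 51.6072), (-0.0276, 51.4072)]),
    ],
})

gdf = geopandas.GeoDataFrame(df, geometry='geometry', crs='EPSG:4326')

print(type(gdf).__name__)      # GeoDataFrame
print(gdf.crs.to_epsg())       # 4326
print(gdf.geom_type.tolist())  # ['Polygon', 'Polygon']
print(gdf.total_bounds)        # minx, miny, maxx, maxy over all polygons
```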

References

Links Site
shapely.Polygon shapely.readthedocs.io
map() docs.python.org