How to create a GeoDataFrame from a Pandas DataFrame?


Introduction

Creating a GeoDataFrame from a Pandas DataFrame involves adding a column of geometric data, which typically represents shapes such as points, lines, or polygons.

Here’s a step-by-step guide using Python and the libraries Pandas and GeoPandas:

Creating a GeoDataFrame with a point based on latitude and longitude as its geometry

The most common case is a GeoDataFrame whose geometry column holds points. The geopandas function points_from_xy() builds such a column from two coordinate columns.

Creating a Pandas DataFrame containing columns for longitudes and latitudes

As an example, let's consider a scenario where we have data stored in a dictionary containing latitude and longitude values for different cities:

data = {'city_name':['Paris','London','Moscow', 'Istanbul'],
       'longitude':[2.3522,-0.1276,37.6173,28.9784],
       'latitude':[48.8566,51.5072,55.7558,41.0082]}

To store our data in a Pandas DataFrame, we can use the following code:

import pandas as pd

df = pd.DataFrame(data)

print(df)

The above code will display

  city_name  longitude  latitude
0     Paris     2.3522   48.8566
1    London    -0.1276   51.5072
2    Moscow    37.6173   55.7558
3  Istanbul    28.9784   41.0082

Converting the pandas DataFrame into a GeoDataFrame using points_from_xy() function

What is points_from_xy? It is a geopandas function that converts arrays of x and y coordinates (here longitude and latitude, in that order) into an array of Point geometries. This is a convenient way to build the geometry column of a GeoDataFrame without constructing each point by hand.

To use points_from_xy, we import geopandas and pass the longitude column as x and the latitude column as y. The result is handed to the GeoDataFrame constructor through the geometry argument:

import geopandas

gdf = geopandas.GeoDataFrame(
    df, 
    geometry=geopandas.points_from_xy(df.longitude, df.latitude), 
    crs="EPSG:4326"
)

Printing gdf then shows the following GeoDataFrame, whose geometry column contains points built from longitude and latitude:

  city_name  longitude  latitude                   geometry
0     Paris     2.3522   48.8566   POINT (2.35220 48.85660)
1    London    -0.1276   51.5072  POINT (-0.12760 51.50720)
2    Moscow    37.6173   55.7558  POINT (37.61730 55.75580)
3  Istanbul    28.9784   41.0082  POINT (28.97840 41.00820)

This approach allows us to efficiently work with geospatial data using geopandas in Python.
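
As a quick check of the steps above, the following sketch rebuilds the same GeoDataFrame in one self-contained script and inspects the result. The checks at the end are illustrative additions, not part of the original walkthrough; the point to notice is that points_from_xy takes x (longitude) first and y (latitude) second.

```python
import pandas as pd
import geopandas

# Same city data as in the example above.
df = pd.DataFrame({
    "city_name": ["Paris", "London", "Moscow", "Istanbul"],
    "longitude": [2.3522, -0.1276, 37.6173, 28.9784],
    "latitude": [48.8566, 51.5072, 55.7558, 41.0082],
})

# points_from_xy takes x (longitude) first, then y (latitude).
gdf = geopandas.GeoDataFrame(
    df,
    geometry=geopandas.points_from_xy(df["longitude"], df["latitude"]),
    crs="EPSG:4326",
)

# Quick sanity checks on the result.
print(type(gdf).__name__)        # class of the result
print(gdf.geom_type.unique())    # geometry types present in the column
print(gdf.crs)                   # coordinate reference system
print(gdf.geometry.x.tolist())   # x values, which should match the longitudes
```

If the two arguments were swapped, the points would still be created without any error, but they would land in the wrong place on a map, so comparing gdf.geometry.x with the longitude column is a cheap safeguard.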

Generating a GeoDataFrame from a Pandas DataFrame with complex geometries

So far, we have seen a straightforward example that is frequently used. Now, let's explore how to work with more complex geometries, such as polygons or multipoints.

Utilizing polygons as geometric shapes

Polygons are another type of geometric shape that can be used in a GeoDataFrame. They consist of a series of connected points, with the final point connecting back to the first point to create a closed shape. This allows us to represent more complex boundaries and areas.
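
The closing of the ring can be seen directly in shapely. The sketch below uses a hypothetical unit square rather than the city data: only four corners are passed, and shapely repeats the first corner at the end of the exterior ring by itself.

```python
from shapely.geometry import Polygon

# Four corners of a unit square; the first point is not repeated at the end.
square = Polygon([(0, 0), (0, 1), (1, 1), (1, 0)])

ring = list(square.exterior.coords)
print(len(ring))             # number of coordinates stored in the ring
print(ring[0] == ring[-1])   # whether the ring is closed
print(square.area)           # area in squared coordinate units
```

This is why the polygon examples in this article can list four corners per city without repeating the first one.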

To generate a GeoDataFrame from polygons, we will use the same steps as before, but this time we will provide the coordinates for each point in the polygon. Let's continue using our previous example of cities:

data = {'city_name':['Paris','London','Moscow', 'Istanbul'],
       'longitude_c1':[2.3522-0.1,-0.1276-0.1,37.6173-0.1,28.9784-0.1],
       'latitude_c1':[48.8566-0.1,51.5072-0.1,55.7558-0.1,41.0082-0.1],
       'longitude_c2':[2.3522-0.1,-0.1276-0.1,37.6173-0.1,28.9784-0.1],
       'latitude_c2':[48.8566+0.1,51.5072+0.1,55.7558+0.1,41.0082+0.1],           
       'longitude_c3':[2.3522+0.1,-0.1276+0.1,37.6173+0.1,28.9784+0.1],
       'latitude_c3':[48.8566+0.1,51.5072+0.1,55.7558+0.1,41.0082+0.1],           
       'longitude_c4':[2.3522+0.1,-0.1276+0.1,37.6173+0.1,28.9784+0.1],
       'latitude_c4':[48.8566-0.1,51.5072-0.1,55.7558-0.1,41.0082-0.1]
       }

Here, each city has four corner points (c1 to c4), each offset by 0.1 degrees from the city center. Together they form a square around the city center.

Next, we create a pandas DataFrame:

import pandas as pd

df = pd.DataFrame(data)

print(df)

This will give us the following output:

  city_name  longitude_c1  latitude_c1  longitude_c2  latitude_c2  \
0     Paris        2.2522      48.7566        2.2522      48.9566   
1    London       -0.2276      51.4072       -0.2276      51.6072   
2    Moscow       37.5173      55.6558       37.5173      55.8558   
3  Istanbul       28.8784      40.9082       28.8784      41.1082

   longitude_c3  latitude_c3  longitude_c4  latitude_c4  
0        2.4522      48.9566        2.4522      48.7566  
1       -0.0276      51.6072       -0.0276      51.4072  
2       37.7173      55.8558       37.7173      55.6558  
3       29.0784      41.1082       29.0784      40.9082

To proceed, we can use the shapely library to write a function that takes one row with four corner points and returns a polygon:

from shapely.geometry import Polygon

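def create_polygon(x):
    # x is one row of the DataFrame; each corner is a [longitude, latitude] pair
    p1 = [x['longitude_c1'], x['latitude_c1']]
    p2 = [x['longitude_c2'], x['latitude_c2']]
    p3 = [x['longitude_c3'], x['latitude_c3']]
    p4 = [x['longitude_c4'], x['latitude_c4']]

    # shapely closes the ring automatically, so p1 is not repeated at the end
    return Polygon([p1, p2, p3, p4])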

We then apply this function to each row of our pandas DataFrame (axis=1):

df.apply(create_polygon, axis=1)

This will give us the following output:

0    POLYGON ((2.2522 48.7566, 2.2522 48.9566, 2.45...
1    POLYGON ((-0.2276 51.4072, -0.2276 51.6072, -0...
2    POLYGON ((37.5173 55.6558, 37.5173 55.8558, 37...
3    POLYGON ((28.8784 40.9082, 28.8784 41.1082, 29...
dtype: object
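
When the shape is an axis-aligned square, as it is here, an alternative is shapely's box helper, which builds the rectangle from its minimum and maximum coordinates. The sketch below is a swapped-in technique, not the method used in this article; the names centres, half_side and create_square are hypothetical, and only two cities are used to keep it short.

```python
import pandas as pd
from shapely.geometry import Polygon, box

# City centers only; the corners are derived from them.
centres = pd.DataFrame({
    "city_name": ["Paris", "London"],
    "longitude": [2.3522, -0.1276],
    "latitude": [48.8566, 51.5072],
})
half_side = 0.1  # degrees, the same offset as in the corner columns above


def create_square(row):
    # box(minx, miny, maxx, maxy) returns a rectangular Polygon.
    return box(
        row["longitude"] - half_side,
        row["latitude"] - half_side,
        row["longitude"] + half_side,
        row["latitude"] + half_side,
    )


squares = centres.apply(create_square, axis=1)

# Compare with the four-corner construction for Paris.
paris_corners = Polygon([
    (2.3522 - 0.1, 48.8566 - 0.1),
    (2.3522 - 0.1, 48.8566 + 0.1),
    (2.3522 + 0.1, 48.8566 + 0.1),
    (2.3522 + 0.1, 48.8566 - 0.1),
])
print(squares[0].equals(paris_corners))  # geometric equality of the two shapes
```

The four-corner function remains the more general choice, since it also works for shapes that are not rectangles.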

Next, we apply create_polygon again and assign the result to a 'geometry' column of the DataFrame:

df['geometry'] = df.apply(create_polygon, axis=1)

Printing df now gives the following output:

  city_name  longitude_c1  latitude_c1  longitude_c2  latitude_c2  \
0     Paris        2.2522      48.7566        2.2522      48.9566   
1    London       -0.2276      51.4072       -0.2276      51.6072   
2    Moscow       37.5173      55.6558       37.5173      55.8558   
3  Istanbul       28.8784      40.9082       28.8784      41.1082

   longitude_c3  latitude_c3  longitude_c4  latitude_c4  \
0        2.4522      48.9566        2.4522      48.7566   
1       -0.0276      51.6072       -0.0276      51.4072   
2       37.7173      55.8558       37.7173      55.6558   
3       29.0784      41.1082       29.0784      40.9082

                                            geometry  
0  POLYGON ((2.2522 48.7566, 2.2522 48.9566, 2.45...  
1  POLYGON ((-0.2276 51.4072, -0.2276 51.6072, -0...  
2  POLYGON ((37.5173 55.6558, 37.5173 55.8558, 37...  
3  POLYGON ((28.8784 40.9082, 28.8784 41.1082, 29...

Next, we can use this DataFrame to create a GeoDataFrame:

import geopandas

gdf = geopandas.GeoDataFrame(
    df, 
    geometry=df['geometry'], 
    crs="EPSG:4326"
)

Here, we have used the geometry column of our DataFrame as the geometry of the GeoDataFrame. We have also specified the Coordinate Reference System (CRS) as EPSG:4326, the WGS 84 geographic coordinate system (longitude and latitude in degrees), which is commonly used for geographical data. Printing gdf gives:

  city_name  longitude_c1  latitude_c1  longitude_c2  latitude_c2  \
0     Paris        2.2522      48.7566        2.2522      48.9566   
1    London       -0.2276      51.4072       -0.2276      51.6072   
2    Moscow       37.5173      55.6558       37.5173      55.8558   
3  Istanbul       28.8784      40.9082       28.8784      41.1082

   longitude_c3  latitude_c3  longitude_c4  latitude_c4  \
0        2.4522      48.9566        2.4522      48.7566   
1       -0.0276      51.6072       -0.0276      51.4072   
2       37.7173      55.8558       37.7173      55.6558   
3       29.0784      41.1082       29.0784      40.9082

                                            geometry  
0  POLYGON ((2.25220 48.75660, 2.25220 48.95660, ...  
1  POLYGON ((-0.22760 51.40720, -0.22760 51.60720...  
2  POLYGON ((37.51730 55.65580, 37.51730 55.85580...  
3  POLYGON ((28.87840 40.90820, 28.87840 41.10820...
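
Because EPSG:4326 stores coordinates in degrees, measurements such as area are only meaningful after reprojecting to a projected CRS. The sketch below shows one possible way to do that with to_crs. It makes two assumptions that are not part of the original example: the GeoDataFrame is a compact stand-in for the one built above (the same 0.2-degree squares, created with shapely's box), and EPSG:6933, a global equal-area projection, is an arbitrary choice of target CRS.

```python
import geopandas
from shapely.geometry import box

# Hypothetical stand-in for the polygon GeoDataFrame built above:
# 0.2-degree squares centered on each city.
centres = {
    "Paris": (2.3522, 48.8566),
    "London": (-0.1276, 51.5072),
    "Moscow": (37.6173, 55.7558),
    "Istanbul": (28.9784, 41.0082),
}
gdf = geopandas.GeoDataFrame(
    {"city_name": list(centres)},
    geometry=[box(x - 0.1, y - 0.1, x + 0.1, y + 0.1) for x, y in centres.values()],
    crs="EPSG:4326",
)

# Reproject to an equal-area CRS (assumed choice: EPSG:6933) before measuring.
projected = gdf.to_crs("EPSG:6933")

# Areas come back in square meters; convert to square kilometers.
gdf["area_km2"] = projected.area / 1e6

print(gdf[["city_name", "area_km2"]])
```

Although every square spans the same number of degrees, the areas differ: a degree of longitude covers less ground at higher latitudes, so the square around Moscow is the smallest and the one around Istanbul the largest.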

Utilizing MultiPoint as geometric shapes

A different type of shape is the MultiPoint, which represents multiple points in space. This is particularly useful when your dataset includes closely located data points, like a cluster of cities or buildings.

In order to create a MultiPoint, we can leverage the shapely library and develop a function that can accept multiple points:

from shapely.geometry import MultiPoint

def create_multipoint(x):

    p1 = [x['longitude_c1'],x['latitude_c1']]
    p2 = [x['longitude_c2'],x['latitude_c2']]
    p3 = [x['longitude_c3'],x['latitude_c3']]
    p4 = [x['longitude_c4'],x['latitude_c4']]

    return MultiPoint([p1,p2,p3,p4])

We then apply this function to our dataframe and assign it as the 'geometry' column:

df['geometry'] = df.apply(create_multipoint, axis=1)

Next, we can use this DataFrame to create a GeoDataFrame:

import geopandas

gdf = geopandas.GeoDataFrame(
    df, 
    geometry=df['geometry'], 
    crs="EPSG:4326"
)

The resulting GeoDataFrame contains all the columns from our original DataFrame, with the 'geometry' column now holding our multipoint geometries:

  city_name  longitude_c1  latitude_c1  longitude_c2  latitude_c2  \
0     Paris        2.2522      48.7566        2.2522      48.9566   
1    London       -0.2276      51.4072       -0.2276      51.6072   
2    Moscow       37.5173      55.6558       37.5173      55.8558   
3  Istanbul       28.8784      40.9082       28.8784      41.1082

   longitude_c3  latitude_c3  longitude_c4  latitude_c4  \
0        2.4522      48.9566        2.4522      48.7566   
1       -0.0276      51.6072       -0.0276      51.4072   
2       37.7173      55.8558       37.7173      55.6558   
3       29.0784      41.1082       29.0784      40.9082

                                            geometry  
0  MULTIPOINT (2.25220 48.75660, 2.25220 48.95660...  
1  MULTIPOINT (-0.22760 51.40720, -0.22760 51.607...  
2  MULTIPOINT (37.51730 55.65580, 37.51730 55.855...  
3  MULTIPOINT (28.87840 40.90820, 28.87840 41.108...
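
A MultiPoint keeps its member points together in a single row. If one row per point is needed later, the GeoDataFrame explode method can split them apart. The sketch below is an illustrative addition that uses a two-city stand-in for the GeoDataFrame above, with the corner coordinates written out by hand.

```python
import geopandas
from shapely.geometry import MultiPoint

# Two-city stand-in for the MultiPoint GeoDataFrame built above.
gdf = geopandas.GeoDataFrame(
    {"city_name": ["Paris", "London"]},
    geometry=[
        MultiPoint([(2.2522, 48.7566), (2.2522, 48.9566),
                    (2.4522, 48.9566), (2.4522, 48.7566)]),
        MultiPoint([(-0.2276, 51.4072), (-0.2276, 51.6072),
                    (-0.0276, 51.6072), (-0.0276, 51.4072)]),
    ],
    crs="EPSG:4326",
)

# Each MultiPoint exposes its member points through .geoms.
print(len(gdf.geometry.iloc[0].geoms))

# explode() turns every member point into its own row,
# repeating the other columns (here city_name) for each point.
points = gdf.explode(ignore_index=True)
print(points.geom_type.unique())
print(len(points))
```

The exploded frame keeps the CRS of the original, so it can be used like the point GeoDataFrame from the first part of this article.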

MultiPoint is just one of several geometry types that a GeoDataFrame can hold. For displaying the results on a map, libraries such as Folium and Cartopy provide more advanced mapping capabilities.
