How to filter a geopandas dataframe for points that fall within a specified polygon?


Introduction

Geopandas is a popular open-source library for working with geospatial data in Python. It is built on top of the widely used Pandas library and adds support for geographic data types. One common task when working with geospatial data is filtering a dataframe to keep only the points that fall within a specific polygon, for example to select the records located in a given country or region.

In this article, let's explore how to filter a Geopandas dataframe that contains cities with latitude and longitude. Our goal is to narrow down this dataframe to include only cities located in France. We will be using a polygon of France to filter the cities that fall within its boundaries.

Create a Geopandas dataframe

First, let's create a geopandas dataframe with rows corresponding to the positions of cities:

import pandas as pd
import geopandas as gpd

df = pd.DataFrame(
    {
        "City": ["Paris", "New-York", "London", "Marseille", "Dijon", "Bordeaux"],
        "Latitude": [48.8566, 40.7128, 51.5072, 43.2965, 47.3220, 44.8378],
        "Longitude": [2.3522, -74.0060, -0.1276, 5.3698, 5.0415, -0.5792],
    }
)

gdf = gpd.GeoDataFrame(
    df, geometry=gpd.points_from_xy(df.Longitude, df.Latitude), crs="EPSG:4326"
)


gdf

gives

        City  Latitude  Longitude                    geometry
0      Paris   48.8566     2.3522    POINT (2.35220 48.85660)
1   New-York   40.7128   -74.0060  POINT (-74.00600 40.71280)
2     London   51.5072    -0.1276   POINT (-0.12760 51.50720)
3  Marseille   43.2965     5.3698    POINT (5.36980 43.29650)
4      Dijon   47.3220     5.0415    POINT (5.04150 47.32200)
5   Bordeaux   44.8378    -0.5792   POINT (-0.57920 44.83780)

Please note that it is always important to check the coordinate system used in your geopandas dataframe. To do this, you can use gdf.crs.

gdf.crs

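Here the coordinate system is WGS 84 (EPSG:4326), the one we passed with crs="EPSG:4326" when creating the dataframe. It is a geographic coordinate system: positions are expressed as latitude and longitude in degrees on an ellipsoid, not as x and y on a flat grid: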

<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

As you start working with geospatial data and geopandas, it is important to understand the concept of coordinate systems and their significance. A coordinate system is a reference framework used to represent locations on Earth's surface. It consists of a set of coordinates that specify the exact position of a point on a map or globe.

Understanding Coordinate Systems

There are many different coordinate systems used in geospatial data, each with its own unique properties and uses. The most common ones include geographic coordinate systems (GCS) and projected coordinate systems (PCS). GCS uses latitude and longitude to define locations on Earth's surface, while PCS uses a flat grid system with x and y coordinates.

It is essential to know the coordinate system used in your geopandas dataframe, as it affects how the data is interpreted and displayed. For example, if your data is in a GCS, coordinates and distances are expressed in degrees, and a degree does not correspond to a fixed distance on the ground: one degree of longitude spans about 111 km at the equator but shrinks toward zero at the poles. In contrast, a PCS uses linear units such as meters or feet, which makes distance and area calculations straightforward.
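To make the difference between degrees and meters concrete, here is a minimal sketch that measures the Paris-Marseille distance in both systems. The choice of Lambert-93 (EPSG:2154) as the projected CRS is an assumption made for this illustration; it is a common projection for metropolitan France, but any suitable projected CRS would do.

```python
import geopandas as gpd

# Two of the cities used in this article, in WGS 84 (degrees)
cities = gpd.GeoDataFrame(
    {"City": ["Paris", "Marseille"]},
    geometry=gpd.points_from_xy([2.3522, 5.3698], [48.8566, 43.2965]),
    crs="EPSG:4326",
)

# Distance computed on raw longitude/latitude values: the result is in degrees
distance_deg = cities.geometry.iloc[0].distance(cities.geometry.iloc[1])

# Reproject to Lambert-93 (EPSG:2154, an assumed choice), whose units are meters
cities_l93 = cities.to_crs(epsg=2154)
distance_km = cities_l93.geometry.iloc[0].distance(cities_l93.geometry.iloc[1]) / 1000

print(f"Distance in degrees: {distance_deg:.2f}")
print(f"Distance in kilometers: {distance_km:.0f}")
```

The value in degrees has no direct physical meaning, while the projected value can be read as a ground distance.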

Create a polygon

Next, we will create a polygon representing Metropolitan France using shapely:

from shapely.geometry import Polygon

# (longitude, latitude) corners of a rectangle around metropolitan France
coords = (
    (-4.592349819344779, 42.34338471126569),
    (-4.592349819344779, 51.14850617126183),
    (8.099278598674744, 51.14850617126183),
    (8.099278598674744, 42.34338471126569),
)

Metropolitan_France = Polygon( coords )
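Since this polygon is a plain rectangle, it can also be built from its bounds. The sketch below uses shapely's box() helper as an alternative; the variable names (min_lon, rectangle, and so on) are chosen for this illustration only.

```python
from shapely.geometry import Polygon, box

# Bounds of the rectangle, in (longitude, latitude) order
min_lon, min_lat = -4.592349819344779, 42.34338471126569
max_lon, max_lat = 8.099278598674744, 51.14850617126183

# box(minx, miny, maxx, maxy) builds the rectangle directly from its bounds
rectangle = box(min_lon, min_lat, max_lon, max_lat)

# The same rectangle written corner by corner, for comparison
same_rectangle = Polygon(
    [(min_lon, min_lat), (min_lon, max_lat), (max_lon, max_lat), (max_lon, min_lat)]
)

print(rectangle.equals(same_rectangle))
print(rectangle.bounds)
```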

Filtering for Points within a Polygon

Now, let's say we want to find all the points in France. One solution is to use the within() method, which returns a boolean Series indicating, for each row, whether its geometry lies inside the polygon:

gdf.within( Metropolitan_France )


0     True
1    False
2    False
3     True
4     True
5     True
dtype: bool

We can now utilize this to filter our dataframe and retrieve only the rows that fall within the polygon:

gdf[ gdf.within( Metropolitan_France ) ]

gives

        City  Latitude  Longitude                   geometry
0      Paris   48.8566     2.3522   POINT (2.35220 48.85660)
3  Marseille   43.2965     5.3698   POINT (5.36980 43.29650)
4      Dijon   47.3220     5.0415   POINT (5.04150 47.32200)
5   Bordeaux   44.8378    -0.5792  POINT (-0.57920 44.83780)
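One detail worth knowing: within() is strict about the boundary, so a point lying exactly on the polygon's edge is not considered within it. The sketch below illustrates this with a toy unit square (not the France polygon) and shows intersects() as one possible way to keep boundary points as well.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Toy unit square and three points: inside, exactly on an edge, and outside
square = Polygon([(0, 0), (0, 1), (1, 1), (1, 0)])
points = gpd.GeoSeries([Point(0.5, 0.5), Point(0, 0.5), Point(2, 2)])

# within() excludes points that lie on the boundary
inside_only = points.within(square)

# intersects() also keeps points that touch the boundary
inside_or_on_edge = points.intersects(square)

print(inside_only.tolist())        # [True, False, False]
print(inside_or_on_edge.tolist())  # [True, True, False]
```

For city coordinates this rarely matters, but it can when points are snapped to a grid or to the polygon's own vertices.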

Another example using the Natural Earth 110m cultural dataset

In the example above, we used a simple rectangle to approximate metropolitan France. Such a bounding box also covers parts of neighbouring countries, so a city like Brussels would wrongly pass the filter. For a more accurate result, we can use the country boundaries from the naturalearthdata 110m cultural dataset, which provides an actual polygon for metropolitan France (see also How to retrieve country name for a given latitude and longitude using geopandas and naturalearthdata ? ). Here '110m_cultural' is assumed to be the local path to the downloaded dataset:

df_110m_cultural = gpd.read_file('110m_cultural')

To select the row corresponding to France:

df_110m_cultural[ df_110m_cultural[ 'ADMIN' ] == 'France' ]

Now we can extract the geometry of France:

fr_geometries = df_110m_cultural['geometry'][ df_110m_cultural[ 'ADMIN' ] == 'France' ]

fr_geometries

will print:

43    MULTIPOLYGON (((-51.65780 4.15623, -52.24934 3...
Name: geometry, dtype: geometry

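France is stored as a MultiPolygon because it includes overseas territories (the first coordinates printed above are in French Guiana). Create a list that contains its individual polygons: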

geoms = [g for g in fr_geometries.iloc[0].geoms]

To extract the geometry corresponding to metropolitan France:

geoms[1]

To filter points within geoms[1]:

gdf.within( geoms[1] )

0     True
1    False
2    False
3     True
4     True
5     True
dtype: bool

We can now utilize this to filter our dataframe and retrieve only the rows that fall within the polygon:

gdf[ gdf.within( geoms[1] ) ]

also gives

        City  Latitude  Longitude                   geometry
0      Paris   48.8566     2.3522   POINT (2.35220 48.85660)
3  Marseille   43.2965     5.3698   POINT (5.36980 43.29650)
4      Dijon   47.3220     5.0415   POINT (5.04150 47.32200)
5   Bordeaux   44.8378    -0.5792  POINT (-0.57920 44.83780)
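The hard-coded index in geoms[1] depends on the order of the polygons in the dataset. As a hedged alternative, the sketch below picks the part of a MultiPolygon that contains a known reference point. The helper part_containing is hypothetical (it is not part of geopandas or shapely), and the toy MultiPolygon stands in for the real France geometry so that the example runs without the dataset.

```python
from shapely.geometry import MultiPolygon, Point, Polygon


def part_containing(multipolygon, point):
    """Return the first polygon of `multipolygon` that contains `point`, or None."""
    for part in multipolygon.geoms:
        if part.contains(point):
            return part
    return None


# Toy stand-in for the France MultiPolygon: three non-overlapping rectangles
overseas = Polygon([(-54, 2), (-54, 6), (-51, 6), (-51, 2)])
mainland = Polygon([(-5, 42), (-5, 51), (8, 51), (8, 42)])
island = Polygon([(8.5, 41.4), (8.5, 43), (9.6, 43), (9.6, 41.4)])
toy_france = MultiPolygon([overseas, mainland, island])

# Use Paris as the reference point, in (longitude, latitude) order
paris = Point(2.3522, 48.8566)
selected = part_containing(toy_france, paris)

print(selected.equals(mainland))
```

With the dataset loaded as above, the call would presumably look like part_containing(fr_geometries.iloc[0], Point(2.3522, 48.8566)).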

Importance of CRS

The classification of a point as inside or outside a polygon can depend on the Coordinate Reference System (CRS) used. within() compares raw coordinate values, so the points and the polygon must be expressed in the same CRS. This is discussed in detail in the following article:
How to determine if a point, specified by its latitude and longitude, falls within a given area using the shapely library in Python ?


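Note that set_crs only assigns a CRS to a geopandas dataframe (for example when none is defined) and leaves the coordinates unchanged. To convert the coordinates from one CRS to another, use to_crs.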
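Here is a minimal sketch of attaching a CRS with set_crs, contrasted with to_crs, which reprojects the coordinates. The target CRS EPSG:2154 (Lambert-93) is an assumed choice for illustration.

```python
import geopandas as gpd

# A dataframe created without a crs argument has no CRS attached
pts = gpd.GeoDataFrame(geometry=gpd.points_from_xy([2.3522], [48.8566]))
print(pts.crs)

# set_crs declares what the existing coordinates mean; the values do not change
labelled = pts.set_crs("EPSG:4326")

# to_crs transforms the coordinates into another CRS (here Lambert-93, in meters)
reprojected = labelled.to_crs("EPSG:2154")

print(labelled.geometry.iloc[0])
print(reprojected.geometry.iloc[0])
```

Both methods return a new dataframe by default, so the original pts is left untouched.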
