Introduction
GeoPandas is a popular open-source library for working with geospatial data in Python. It is built on top of the widely used pandas library and adds support for geometric data types. One common task when working with geospatial data is filtering a dataframe to keep only the points that fall within a specific polygon, for example to restrict a set of locations to a single country or region.
In this article, let's explore how to filter a GeoPandas dataframe that contains cities with their latitude and longitude. Our goal is to narrow this dataframe down to the cities located in France, using a polygon of France to select the cities that fall within its boundaries.
Create a Geopandas dataframe
First, let's create a geopandas dataframe with rows corresponding to the positions of cities:
import pandas as pd
import geopandas as gpd

df = pd.DataFrame(
    {
        "City": ["Paris", "New-York", "London", "Marseille", "Dijon", "Bordeaux"],
        "Latitude": [48.8566, 40.7128, 51.5072, 43.2965, 47.3220, 44.8378],
        "Longitude": [2.3522, -74.0060, -0.1276, 5.3698, 5.0415, -0.5792],
    }
)
# points_from_xy expects x (longitude) first, then y (latitude)
gdf = gpd.GeoDataFrame(
    df, geometry=gpd.points_from_xy(df.Longitude, df.Latitude), crs="EPSG:4326"
)
gdf
gives
City Latitude Longitude geometry
0 Paris 48.8566 2.3522 POINT (2.35220 48.85660)
1 New-York 40.7128 -74.0060 POINT (-74.00600 40.71280)
2 London 51.5072 -0.1276 POINT (-0.12760 51.50720)
3 Marseille 43.2965 5.3698 POINT (5.36980 43.29650)
4 Dijon 47.3220 5.0415 POINT (5.04150 47.32200)
5 Bordeaux 44.8378 -0.5792 POINT (-0.57920 44.83780)
Please note that it is always important to check the coordinate system used in your geopandas dataframe. To do this, you can use gdf.crs.
gdf.crs
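Here the coordinate system was set explicitly to WGS 84 (EPSG:4326) when the dataframe was created. It is a geographic coordinate system: positions are given as latitude and longitude in degrees on the WGS 84 ellipsoid, not as x and y on a flat grid: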
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
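The same properties can be checked in code rather than read off the printout. The following is a minimal sketch that builds the CRS directly with pyproj, the library geopandas relies on for gdf.crs; the variable names are illustrative and not part of the original example.

```python
from pyproj import CRS

# Build the same CRS that gdf.crs reports above
wgs84 = CRS.from_epsg(4326)

# A geographic CRS stores angles (degrees), not distances on a flat grid
is_geographic = wgs84.is_geographic
is_projected = wgs84.is_projected
epsg_code = wgs84.to_epsg()

print(is_geographic, is_projected, epsg_code)
```

With a GeoDataFrame at hand, the same attributes should be available on gdf.crs.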
As you start working with geospatial data and geopandas, it is important to understand the concept of coordinate systems and their significance. A coordinate system is a reference framework used to represent locations on Earth's surface. It consists of a set of coordinates that specify the exact position of a point on a map or globe.
Understanding Coordinate Systems
There are many different coordinate systems used in geospatial data, each with its own unique properties and uses. The most common ones include geographic coordinate systems (GCS) and projected coordinate systems (PCS). GCS uses latitude and longitude to define locations on Earth's surface, while PCS uses a flat grid system with x and y coordinates.
It is essential to know the coordinate system used in your geopandas dataframe, as it affects how the data is interpreted and displayed. For example, if you have data in a GCS, distances are measured in degrees, and a degree does not correspond to a fixed distance on the ground: one degree of longitude covers less and less ground as you move away from the equator. In contrast, a PCS uses units such as meters or feet, making it easier to perform spatial analysis.
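To make the point about degrees concrete, here is a small sketch that measures one degree of longitude at the equator and at the latitude of Paris. It assumes a spherical Earth with a mean radius of 6371 km, and haversine_km is a helper written for this illustration, not a geopandas function.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius (spherical approximation)


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude, measured at two different latitudes
one_degree_at_equator = haversine_km(0.0, 0.0, 1.0, 0.0)
one_degree_at_paris = haversine_km(2.0, 48.8566, 3.0, 48.8566)

print(round(one_degree_at_equator, 1), "km at the equator")
print(round(one_degree_at_paris, 1), "km at the latitude of Paris")
```

Under these assumptions the first distance is roughly 111 km and the second roughly 73 km, which is why distances expressed in degrees are hard to compare.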
Create a polygon
Next, we will create a simple rectangular polygon that roughly covers Metropolitan France using shapely:
from shapely.geometry import Polygon

# Corners of the rectangle, given as (longitude, latitude)
coords = (
    (-4.592349819344779, 42.34338471126569),
    (-4.592349819344779, 51.14850617126183),
    (8.099278598674744, 51.14850617126183),
    (8.099278598674744, 42.343384711265697),
)
Metropolitan_France = Polygon(coords)
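Since this polygon is an axis-aligned rectangle, it can also be built from its bounds. The sketch below uses shapely's box helper as an alternative; the names min_lon, min_lat, max_lon, max_lat, bbox and corners are introduced here for illustration only.

```python
from shapely.geometry import Polygon, box

# Bounds of the rectangle in degrees (x = longitude, y = latitude)
min_lon, min_lat = -4.592349819344779, 42.34338471126569
max_lon, max_lat = 8.099278598674744, 51.14850617126183

# box(minx, miny, maxx, maxy) returns a rectangular Polygon
bbox = box(min_lon, min_lat, max_lon, max_lat)

# The same rectangle written out corner by corner
corners = Polygon(
    [
        (min_lon, min_lat),
        (min_lon, max_lat),
        (max_lon, max_lat),
        (max_lon, min_lat),
    ]
)

print(bbox.bounds)
print(bbox.equals(corners))
```

Writing the bounds once avoids repeating each coordinate in several corners, which is where small copy errors tend to creep in.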
Filtering for Points within a Polygon
Now, let's say we want to find all the points in France. To filter points within a specific polygon, one solution is to use the within() method, which returns True or False for each row:
gdf.within( Metropolitan_France )
0 True
1 False
2 False
3 True
4 True
5 True
dtype: bool
We can now utilize this to filter our dataframe and retrieve only the rows that fall within the polygon:
gdf[ gdf.within( Metropolitan_France ) ]
gives
City Latitude Longitude geometry
0 Paris 48.8566 2.3522 POINT (2.35220 48.85660)
3 Marseille 43.2965 5.3698 POINT (5.36980 43.29650)
4 Dijon 47.3220 5.0415 POINT (5.04150 47.32200)
5 Bordeaux 44.8378 -0.5792 POINT (-0.57920 44.83780)
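A rectangle is only a rough stand-in for a country. As a quick check, the sketch below tests two cities that are not in France, Brussels and Geneva, against the same rectangle. Their coordinates are approximate and were added here for illustration; they are not part of the original dataframe.

```python
from shapely.geometry import Point, Polygon

# Same rectangle as above, corners given as (longitude, latitude)
rectangle = Polygon(
    (
        (-4.592349819344779, 42.34338471126569),
        (-4.592349819344779, 51.14850617126183),
        (8.099278598674744, 51.14850617126183),
        (8.099278598674744, 42.343384711265697),
    )
)

# Approximate positions of two cities outside France (illustrative values)
cities_outside_france = {
    "Brussels": Point(4.3517, 50.8503),
    "Geneva": Point(6.1432, 46.2044),
}

# Test each city against the rectangle
inside_rectangle = {
    name: point.within(rectangle) for name, point in cities_outside_france.items()
}
print(inside_rectangle)
```

Both cities fall inside the rectangle, so a bounding box like this one can produce false positives near the borders.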
Another example using Natural Earth 110m cultural dataset
In the example above, we used a simple rectangle to approximate metropolitan France. For a more realistic boundary, we can use the Natural Earth 110m cultural dataset from naturalearthdata, which provides an actual country outline for France (see also: How to retrieve country name for a given latitude and longitude using geopandas and naturalearthdata ?).
df_110m_cultural = gpd.read_file('110m_cultural')
To select the row corresponding to France:
df_110m_cultural[ df_110m_cultural[ 'ADMIN' ] == 'France' ]
Now we can extract the geometries of France:
fr_geometries = df_110m_cultural['geometry'][ df_110m_cultural[ 'ADMIN' ] == 'France' ]
fr_geometries
will print:
43 MULTIPOLYGON (((-51.65780 4.15623, -52.24934 3...
Name: geometry, dtype: geometry
The result is a MultiPolygon made of several parts. Create a list that contains each of them:
geoms = list(fr_geometries.iloc[0].geoms)
To extract the geometry corresponding to metropolitan France:
geoms[1]
To filter points within geoms[1]:
gdf.within( geoms[1] )
0 True
1 False
2 False
3 True
4 True
5 True
dtype: bool
As before, we can use this boolean mask to filter our dataframe and retrieve only the rows that fall within the polygon:
gdf[ gdf.within( geoms[1] ) ]
also gives
City Latitude Longitude geometry
0 Paris 48.8566 2.3522 POINT (2.35220 48.85660)
3 Marseille 43.2965 5.3698 POINT (5.36980 43.29650)
4 Dijon 47.3220 5.0415 POINT (5.04150 47.32200)
5 Bordeaux 44.8378 -0.5792 POINT (-0.57920 44.83780)
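Selecting geoms[1] relies on the order in which the parts happen to be stored in this dataset. A more defensive variant, sketched below on toy shapes rather than the real France geometry, picks the part with the largest area. Note that the area is computed in coordinate units, so in EPSG:4326 it is only a rough way to rank the parts. All names and shapes here are made up for illustration.

```python
from shapely.geometry import MultiPolygon, Point, Polygon

# Toy stand-in for a country made of several parts (not real borders)
small_part = Polygon([(10, 0), (11, 0), (11, 1), (10, 1)])
large_part = Polygon([(0, 0), (5, 0), (5, 5), (0, 5)])
country = MultiPolygon([small_part, large_part])

# Pick the part with the largest area instead of a fixed position
main_part = max(country.geoms, key=lambda part: part.area)

point_in_main = Point(2, 2)
point_in_small = Point(10.5, 0.5)

print(point_in_main.within(main_part))
print(point_in_small.within(main_part))
print(point_in_small.within(country))
```

Testing against the whole MultiPolygon, as in the last line, is another option when every part of the country should count.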
Importance of CRS
Whether a point is classified as inside or outside a polygon can depend on the Coordinate Reference System (CRS) used, and the points and the polygon must be expressed in the same CRS for the test to be meaningful. This is discussed in detail in the following article:
How to determine if a point, specified by its latitude and longitude, falls within a given area using the shapely library in Python ?
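Note that a CRS can be assigned to a geopandas dataframe with set_crs, which only declares how the existing coordinates should be interpreted. To convert the coordinates to another CRS, use to_crs.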
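The difference between declaring a CRS and converting to another one can be seen on a single point. This is a minimal sketch: EPSG:2154 (Lambert-93, a projected CRS for France in meters) is chosen here only as an illustration, and the variable names are not from the original example.

```python
import geopandas as gpd
from shapely.geometry import Point

# A point given as (longitude, latitude), with no CRS attached yet
paris = gpd.GeoSeries([Point(2.3522, 48.8566)])

# set_crs declares what the existing numbers mean; coordinates are unchanged
paris_wgs84 = paris.set_crs("EPSG:4326")

# to_crs reprojects: the coordinates themselves change (here to meters)
paris_lambert = paris_wgs84.to_crs("EPSG:2154")

print(paris_wgs84.iloc[0])
print(paris_lambert.iloc[0])
```

A polygon used with within() would need the same conversion, since shapely geometries carry no CRS and are compared on their raw coordinates.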