How to Filter Geospatial Data to a Specific Country Using GeoPandas (U.S. Territory Example)

Introduction

Global geospatial datasets are commonly used in Earth observation, climate science, and environmental monitoring. However, many projects require focusing on a specific region such as the United States.

This article presents a practical workflow to filter a global dataset and retain only observations located within U.S. territory using GeoPandas. The example is based on a real use case involving VIIRS fire data aggregated using H3 spatial indexing.

GeoPandas DataFrame

Use case

A global dataset is available as a Pandas DataFrame containing H3 indices. The objective is to extract only observations located within U.S. territory for further analysis.

The dataset used in this example is an annual aggregation of VIIRS EFIRE thermal anomaly data.

Load dataset

1
2
3
4
5
6
import pandas as pd
import geopandas as gpd

df_viirs_annual_agg = pd.read_parquet(
    f'/Volumes/HD15TB/Datasets/VIIRS_EFIRE_Thermal_Anomaly_Frequency_Distribution/{platform}/{year}_EFIRE_Thermal_Anomaly_Frequency_Distribution.parquet'
)

Convert H3 indices to latitude / longitude

The dataset contains H3 cells but no geographic coordinates. Conversion to latitude and longitude is required.

1
2
3
4
import h3

df_viirs_annual_agg['lat'] = df_viirs_annual_agg['h3_cell'].apply(lambda h: h3.cell_to_latlng(h)[0])
df_viirs_annual_agg['lon'] = df_viirs_annual_agg['h3_cell'].apply(lambda h: h3.cell_to_latlng(h)[1])

Convert to GeoDataFrame

1
2
3
4
5
gdf = gpd.GeoDataFrame(
    df_viirs_annual_agg,
    geometry=gpd.points_from_xy(df_viirs_annual_agg.lon, df_viirs_annual_agg.lat),
    crs="EPSG:4326"
)

Filtering Global Geospatial Data to U.S. Territory with GeoPandas
Filtering Global Geospatial Data to U.S. Territory with GeoPandas

At this stage, the dataset is geospatially enabled and ready for spatial operations.

Load U.S. Boundary Polygon

Using Natural Earth (built-in dataset)

1
2
3
world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))

usa = world[world['name'] == 'United States of America']

Using higher-resolution Natural Earth data

For more accurate boundaries (recommended for research applications), download higher-resolution data.

Download source:

Example using a local shapefile:

1
2
3
4
5
world = gpd.read_file(
    "/Volumes/HD15TB/Datasets/Natural_Earth_Data/ne_110m_admin_0_countries/ne_110m_admin_0_countries.shp"
)

usa = world[world['ADMIN'] == 'United States of America']

Important notes

Included by default:

  • Continental U.S.
  • Alaska
  • Hawaii

Potential limitations:

  • Overseas territories (Puerto Rico, Guam, etc.) may be missing depending on dataset resolution

For full coverage, use:

1
2
world = gpd.read_file("ne_10m_admin_0_countries.shp")
usa = world[world['ADMIN'] == 'United States of America']

Spatial Filtering (Point-in-Polygon)

The filtering step uses a geometric containment test.

1
gdf_us = gdf[gdf.within(usa.unary_union)]

This operation keeps only points located inside the U.S. polygon.

Performance note

For large datasets (millions of points), consider:

  • Using spatial indexing (rtree or pygeos)
  • Pre-filtering with bounding boxes
  • Converting H3 cells to polygons for higher accuracy near borders

Visual Validation with Folium

A map is useful to verify that the filtering behaves as expected.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
import folium

usa = usa.to_crs(epsg=4326)

m = folium.Map(location=[39.5, -98.5], zoom_start=4, tiles="CartoDB positron")

folium.GeoJson(
    usa,
    name="United States",
    style_function=lambda feature: {
        "fillColor": "blue",
        "color": "blue",
        "weight": 2,
        "fillOpacity": 0.2,
    },
    tooltip=folium.GeoJsonTooltip(fields=["ADMIN"], aliases=["Country:"])
).add_to(m)

folium.LayerControl().add_to(m)

m

Filtering Global Geospatial Data to U.S. Territory with GeoPandas
Filtering Global Geospatial Data to U.S. Territory with GeoPandas

Optional: overlay filtered points

1
2
3
4
5
6
7
8
for _, row in gdf_us.sample(10000).iterrows():
    folium.CircleMarker(
        location=[row["lat"], row["lon"]],
        radius=2,
        color="red",
        fill=True,
        fill_opacity=0.7
    ).add_to(m)

Filtering Global Geospatial Data to U.S. Territory with GeoPandas
Filtering Global Geospatial Data to U.S. Territory with GeoPandas

Faster Alternative (Approximate Filtering)

A simple bounding box can be used for quick filtering.

1
2
3
4
df_us = df_viirs_annual_agg[
    (df_viirs_annual_agg['lat'] >= 24) & (df_viirs_annual_agg['lat'] <= 72) &
    (df_viirs_annual_agg['lon'] >= -170) & (df_viirs_annual_agg['lon'] <= -65)
]

Limitations

  • Includes ocean regions
  • Includes parts of Canada and Mexico
  • Not suitable for scientific analysis

Best Practices

  • Use GeoPandas spatial filtering for accuracy
  • Use high-resolution Natural Earth data for better boundaries
  • Use H3 polygon intersection instead of centroids for precise edge handling
  • Cache results (is_us flag) to avoid recomputation

References

Topic Link
GeoPandas documentation https://geopandas.org
Natural Earth datasets https://www.naturalearthdata.com/downloads/
H3 documentation https://h3geo.org/docs
Folium documentation https://python-visualization.github.io/folium/
Shapely (geometry operations) https://shapely.readthedocs.io
Image

of