Introduction
Creating a GeoDataFrame from a Pandas DataFrame involves adding a column of geometric data, which typically represents shapes such as points, lines, or polygons.
Here’s a step-by-step guide using Python and the libraries Pandas and GeoPandas:
Creating a GeoDataFrame with a point based on latitude and longitude as its geometry
When creating a GeoDataFrame, the most common scenario is a geometry column made of points. This is easy to do with the geopandas function points_from_xy():
Creating a Pandas DataFrame containing columns for longitudes and latitudes
As an example, let's consider a scenario where we have data stored in a dictionary containing latitude and longitude values for different cities:
data = {'city_name':['Paris','London','Moscow', 'Istanbul'],
'longitude':[2.3522,-0.1276,37.6173,28.9784],
'latitude':[48.8566,51.5072,55.7558,41.0082]}
To store our data in a Pandas DataFrame, we can use the following code:
import pandas as pd
df = pd.DataFrame(data)
print(df)
The above code will display:
city_name longitude latitude
0 Paris 2.3522 48.8566
1 London -0.1276 51.5072
2 Moscow 37.6173 55.7558
3 Istanbul 28.9784 41.0082
Converting the pandas DataFrame into a GeoDataFrame using the points_from_xy() function
What is points_from_xy? points_from_xy is a function in the geopandas library that converts arrays of x (longitude) and y (latitude) coordinates into an array of Point geometries, which can be passed directly as the geometry of a GeoDataFrame. This is a convenient way to create a geometry column without manually constructing each individual point.
To use points_from_xy, we import geopandas and pass the longitude column first and the latitude column second, because x corresponds to longitude and y to latitude. Let's examine an example:
import geopandas
gdf = geopandas.GeoDataFrame(
df,
geometry=geopandas.points_from_xy(df.longitude, df.latitude),
crs="EPSG:4326"
)
Printing gdf shows the following GeoDataFrame, whose geometry column holds points built from longitude and latitude:
city_name longitude latitude geometry
0 Paris 2.3522 48.8566 POINT (2.35220 48.85660)
1 London -0.1276 51.5072 POINT (-0.12760 51.50720)
2 Moscow 37.6173 55.7558 POINT (37.61730 55.75580)
3 Istanbul 28.9784 41.0082 POINT (28.97840 41.00820)
This approach allows us to efficiently work with geospatial data using geopandas in Python.
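As a quick sanity check, the sketch below rebuilds the GeoDataFrame from the example and inspects it. It assumes pandas and geopandas are installed. The variable name gdf_mercator and the choice of EPSG:3857 (Web Mercator) as a target CRS are illustrative only and are not part of the original example.

```python
import pandas as pd
import geopandas

data = {'city_name': ['Paris', 'London', 'Moscow', 'Istanbul'],
        'longitude': [2.3522, -0.1276, 37.6173, 28.9784],
        'latitude': [48.8566, 51.5072, 55.7558, 41.0082]}
df = pd.DataFrame(data)

gdf = geopandas.GeoDataFrame(
    df,
    geometry=geopandas.points_from_xy(df.longitude, df.latitude),
    crs="EPSG:4326"
)

# Every geometry in the column should be a Point
print(gdf.geom_type.unique())

# x holds the longitude and y the latitude of each point
print(gdf.geometry.x.tolist())
print(gdf.geometry.y.tolist())

# The CRS is stored on the GeoDataFrame itself
print(gdf.crs.to_epsg())

# Reproject to another CRS (illustrative choice of target)
gdf_mercator = gdf.to_crs("EPSG:3857")
print(gdf_mercator.crs.to_epsg())
```

If the x values come back as latitudes, the arguments to points_from_xy were most likely passed in the wrong order.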
Generating a GeoDataFrame from a Pandas DataFrame with complex geometries
So far, we have seen a straightforward example that is frequently used. Now, let's explore how to work with more complex geometries, such as polygons or multipoints.
Utilizing polygons as geometric shapes
Polygons are another type of geometric shape that can be used in a GeoDataFrame. They consist of a series of connected points, with the final point connecting back to the first point to create a closed shape. This allows us to represent more complex boundaries and areas.
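The closing behaviour described above can be seen directly with shapely. This is a minimal sketch, assuming shapely is installed; the variable names square and coords are illustrative, and the corner values are those of the square around Paris used later in this article.

```python
from shapely.geometry import Polygon

# Four corners given as (longitude, latitude); the first point is not repeated
square = Polygon([
    (2.2522, 48.7566),
    (2.2522, 48.9566),
    (2.4522, 48.9566),
    (2.4522, 48.7566),
])

coords = list(square.exterior.coords)

# shapely closes the ring itself, so the exterior has one extra coordinate
print(len(coords))               # 5
print(coords[0] == coords[-1])   # True

# A simple square is a valid polygon
print(square.is_valid)           # True
```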
To generate a GeoDataFrame from polygons, we will use the same steps as before, but this time we will provide the coordinates for each point in the polygon. Let's continue using our previous example of cities:
data = {'city_name':['Paris','London','Moscow', 'Istanbul'],
'longitude_c1':[2.3522-0.1,-0.1276-0.1,37.6173-0.1,28.9784-0.1],
'latitude_c1':[48.8566-0.1,51.5072-0.1,55.7558-0.1,41.0082-0.1],
'longitude_c2':[2.3522-0.1,-0.1276-0.1,37.6173-0.1,28.9784-0.1],
'latitude_c2':[48.8566+0.1,51.5072+0.1,55.7558+0.1,41.0082+0.1],
'longitude_c3':[2.3522+0.1,-0.1276+0.1,37.6173+0.1,28.9784+0.1],
'latitude_c3':[48.8566+0.1,51.5072+0.1,55.7558+0.1,41.0082+0.1],
'longitude_c4':[2.3522+0.1,-0.1276+0.1,37.6173+0.1,28.9784+0.1],
'latitude_c4':[48.8566-0.1,51.5072-0.1,55.7558-0.1,41.0082-0.1]
}
Here, we have added four corner points for each city. Together they form a square of 0.2 degrees per side, centered on the city.
Next, we create a pandas DataFrame:
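The corner columns can also be generated from the city centers instead of being typed by hand. The following is only a sketch: the helper square_corner_columns and its half_side parameter are hypothetical names introduced here, and the corner order (c1 to c4) is assumed to match the dictionary above.

```python
def square_corner_columns(longitudes, latitudes, half_side=0.1):
    """Return corner columns (c1..c4) for a square around each center."""
    # Sign of the offset applied to (longitude, latitude) for each corner
    offsets = {"c1": (-1, -1), "c2": (-1, 1), "c3": (1, 1), "c4": (1, -1)}
    columns = {}
    for name, (sign_lon, sign_lat) in offsets.items():
        columns[f"longitude_{name}"] = [lon + sign_lon * half_side for lon in longitudes]
        columns[f"latitude_{name}"] = [lat + sign_lat * half_side for lat in latitudes]
    return columns


longitudes = [2.3522, -0.1276, 37.6173, 28.9784]
latitudes = [48.8566, 51.5072, 55.7558, 41.0082]

data = {'city_name': ['Paris', 'London', 'Moscow', 'Istanbul']}
data.update(square_corner_columns(longitudes, latitudes))

# Same keys as the hand-written dictionary
print(list(data.keys()))
```

The resulting dictionary can be passed to pd.DataFrame in the same way as the hand-written one.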
import pandas as pd
df = pd.DataFrame(data)
print(df)
This will give us the following output:
city_name longitude_c1 latitude_c1 longitude_c2 latitude_c2 \
0 Paris 2.2522 48.7566 2.2522 48.9566
1 London -0.2276 51.4072 -0.2276 51.6072
2 Moscow 37.5173 55.6558 37.5173 55.8558
3 Istanbul 28.8784 40.9082 28.8784 41.1082
longitude_c3 latitude_c3 longitude_c4 latitude_c4
0 2.4522 48.9566 2.4522 48.7566
1 -0.0276 51.6072 -0.0276 51.4072
2 37.7173 55.8558 37.7173 55.6558
3 29.0784 41.1082 29.0784 40.9082
To proceed, we can utilize the shapely library to develop a function that accepts four points and generates a polygon shape:
from shapely.geometry import Polygon
def create_polygon(x):
    # x is one DataFrame row; each corner is a [longitude, latitude] pair
    p1 = [x['longitude_c1'], x['latitude_c1']]
    p2 = [x['longitude_c2'], x['latitude_c2']]
    p3 = [x['longitude_c3'], x['latitude_c3']]
    p4 = [x['longitude_c4'], x['latitude_c4']]
    # shapely closes the ring automatically (p4 connects back to p1)
    return Polygon([p1, p2, p3, p4])
We then apply this function to each row of our pandas DataFrame (axis=1):
df.apply(create_polygon, axis=1)
This will give us the following output:
0 POLYGON ((2.2522 48.7566, 2.2522 48.9566, 2.45...
1 POLYGON ((-0.2276 51.4072, -0.2276 51.6072, -0...
2 POLYGON ((37.5173 55.6558, 37.5173 55.8558, 37...
3 POLYGON ((28.8784 40.9082, 28.8784 41.1082, 29...
dtype: object
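Note that the result of apply is an ordinary pandas Series with dtype object, not yet a geopandas structure. The sketch below, which assumes pandas, geopandas and shapely are installed, shows one way to turn it into a GeoSeries. To stay short it uses only the Paris and London rows, and the names polygons and geo_polygons are illustrative.

```python
import pandas as pd
import geopandas
from shapely.geometry import Polygon

# Reduced version of the corner table (Paris and London only)
df = pd.DataFrame({
    'city_name': ['Paris', 'London'],
    'longitude_c1': [2.2522, -0.2276], 'latitude_c1': [48.7566, 51.4072],
    'longitude_c2': [2.2522, -0.2276], 'latitude_c2': [48.9566, 51.6072],
    'longitude_c3': [2.4522, -0.0276], 'latitude_c3': [48.9566, 51.6072],
    'longitude_c4': [2.4522, -0.0276], 'latitude_c4': [48.7566, 51.4072],
})

def create_polygon(x):
    # Collect the four (longitude, latitude) corners of one row
    corners = [(x[f'longitude_c{i}'], x[f'latitude_c{i}']) for i in range(1, 5)]
    return Polygon(corners)

# Plain pandas Series holding shapely objects
polygons = df.apply(create_polygon, axis=1)
print(polygons.dtype)  # object

# Wrapping it in a GeoSeries gives access to geometry methods and a CRS
geo_polygons = geopandas.GeoSeries(polygons, crs="EPSG:4326")
print(geo_polygons.dtype)
print(geo_polygons.is_valid.all())
```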
We then apply the same function to our DataFrame and assign the result as the 'geometry' column:
df['geometry'] = df.apply(create_polygon, axis=1)
Printing df now gives the following output:
city_name longitude_c1 latitude_c1 longitude_c2 latitude_c2 \
0 Paris 2.2522 48.7566 2.2522 48.9566
1 London -0.2276 51.4072 -0.2276 51.6072
2 Moscow 37.5173 55.6558 37.5173 55.8558
3 Istanbul 28.8784 40.9082 28.8784 41.1082
longitude_c3 latitude_c3 longitude_c4 latitude_c4 \
0 2.4522 48.9566 2.4522 48.7566
1 -0.0276 51.6072 -0.0276 51.4072
2 37.7173 55.8558 37.7173 55.6558
3 29.0784 41.1082 29.0784 40.9082
geometry
0 POLYGON ((2.2522 48.7566, 2.2522 48.9566, 2.45...
1 POLYGON ((-0.2276 51.4072, -0.2276 51.6072, -0...
2 POLYGON ((37.5173 55.6558, 37.5173 55.8558, 37...
3 POLYGON ((28.8784 40.9082, 28.8784 41.1082, 29...
Next, we can use our existing DataFrame to create a GeoDataFrame:
import geopandas
gdf = geopandas.GeoDataFrame(
df,
geometry=df['geometry'],
crs="EPSG:4326"
)
Here, we have used the geometry column of our DataFrame to create a GeoDataFrame. We have also specified the Coordinate Reference System (CRS) as EPSG:4326, which corresponds to the WGS84 geographic coordinate system (longitude and latitude in degrees) and is commonly used for geographical data. Printing gdf gives:
city_name longitude_c1 latitude_c1 longitude_c2 latitude_c2 \
0 Paris 2.2522 48.7566 2.2522 48.9566
1 London -0.2276 51.4072 -0.2276 51.6072
2 Moscow 37.5173 55.6558 37.5173 55.8558
3 Istanbul 28.8784 40.9082 28.8784 41.1082
longitude_c3 latitude_c3 longitude_c4 latitude_c4 \
0 2.4522 48.9566 2.4522 48.7566
1 -0.0276 51.6072 -0.0276 51.4072
2 37.7173 55.8558 37.7173 55.6558
3 29.0784 41.1082 29.0784 40.9082
geometry
0 POLYGON ((2.25220 48.75660, 2.25220 48.95660, ...
1 POLYGON ((-0.22760 51.40720, -0.22760 51.60720...
2 POLYGON ((37.51730 55.65580, 37.51730 55.85580...
3 POLYGON ((28.87840 40.90820, 28.87840 41.10820...
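Once the polygons are in a GeoDataFrame, geometry operations become available. The following sketch assumes pandas, geopandas and shapely are installed. To stay self-contained it builds the squares from the city centers with a hypothetical helper, square_around, rather than from the corner columns used above, and it passes the geometry column by name. It then checks the overall extent and confirms that each square contains its city center.

```python
import pandas as pd
import geopandas
from shapely.geometry import Polygon

centers = pd.DataFrame({
    'city_name': ['Paris', 'London', 'Moscow', 'Istanbul'],
    'longitude': [2.3522, -0.1276, 37.6173, 28.9784],
    'latitude': [48.8566, 51.5072, 55.7558, 41.0082],
})

def square_around(row, half_side=0.1):
    # Hypothetical helper: square of side 2 * half_side around the center
    lon, lat = row['longitude'], row['latitude']
    return Polygon([
        (lon - half_side, lat - half_side),
        (lon - half_side, lat + half_side),
        (lon + half_side, lat + half_side),
        (lon + half_side, lat - half_side),
    ])

centers['geometry'] = centers.apply(square_around, axis=1)

# The geometry argument also accepts the name of an existing column
gdf = geopandas.GeoDataFrame(centers, geometry='geometry', crs="EPSG:4326")

# Overall extent as (minx, miny, maxx, maxy)
print(gdf.total_bounds)

# Row-by-row check that each square contains its own city center
center_points = geopandas.GeoSeries(
    geopandas.points_from_xy(gdf.longitude, gdf.latitude), crs=gdf.crs
)
print(gdf.contains(center_points).all())
```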
Utilizing MultiPoint as geometric shapes
A different type of shape is the MultiPoint, which represents multiple points in space. This is particularly useful when your dataset includes closely located data points, like a cluster of cities or buildings.
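To get a feel for what a MultiPoint is, here is a minimal shapely sketch, assuming shapely is installed. The variable names corners and hull are illustrative, and the coordinates are the four corners around Paris from the polygon example.

```python
from shapely.geometry import MultiPoint

# One geometry object that holds four separate points
corners = MultiPoint([
    (2.2522, 48.7566),
    (2.2522, 48.9566),
    (2.4522, 48.9566),
    (2.4522, 48.7566),
])

print(corners.geom_type)     # MultiPoint
print(len(corners.geoms))    # 4

# Unlike a Polygon, a MultiPoint has no area
print(corners.area)          # 0.0

# The convex hull of the four corners is the enclosing polygon
hull = corners.convex_hull
print(hull.geom_type)        # Polygon

# The centroid is the mean of the member points
print(corners.centroid)
```

The points are stored as a single geometry, so a row in a GeoDataFrame can carry a whole cluster of locations in one cell.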
In order to create a MultiPoint, we can leverage the shapely library and develop a function that can accept multiple points:
from shapely.geometry import MultiPoint
def create_multipoint(x):
    # x is one DataFrame row; each point is a [longitude, latitude] pair
    p1 = [x['longitude_c1'], x['latitude_c1']]
    p2 = [x['longitude_c2'], x['latitude_c2']]
    p3 = [x['longitude_c3'], x['latitude_c3']]
    p4 = [x['longitude_c4'], x['latitude_c4']]
    return MultiPoint([p1, p2, p3, p4])
We then apply this function to our dataframe and assign it as the 'geometry' column:
df['geometry'] = df.apply(create_multipoint, axis=1)
Next, we can use our existing DataFrame to create a GeoDataFrame:
import geopandas
gdf = geopandas.GeoDataFrame(
df,
geometry=df['geometry'],
crs="EPSG:4326"
)
The resulting GeoDataFrame contains all the columns from our original DataFrame, and its 'geometry' column now holds the MultiPoint geometries:
city_name longitude_c1 latitude_c1 longitude_c2 latitude_c2 \
0 Paris 2.2522 48.7566 2.2522 48.9566
1 London -0.2276 51.4072 -0.2276 51.6072
2 Moscow 37.5173 55.6558 37.5173 55.8558
3 Istanbul 28.8784 40.9082 28.8784 41.1082
longitude_c3 latitude_c3 longitude_c4 latitude_c4 \
0 2.4522 48.9566 2.4522 48.7566
1 -0.0276 51.6072 -0.0276 51.4072
2 37.7173 55.8558 37.7173 55.6558
3 29.0784 41.1082 29.0784 40.9082
geometry
0 MULTIPOINT (2.25220 48.75660, 2.25220 48.95660...
1 MULTIPOINT (-0.22760 51.40720, -0.22760 51.607...
2 MULTIPOINT (37.51730 55.65580, 37.51730 55.855...
3 MULTIPOINT (28.87840 40.90820, 28.87840 41.108...
MultiPoint is just one of the geometry types that a GeoDataFrame can hold. For more advanced mapping of geographical data, libraries such as Folium and Cartopy can be used alongside geopandas.