Example of how to combine two pandas dataframes and sum values of the shared columns:
Create two dataframes
The first step is to define the two dataframes to be combined:
import pandas as pddata1 = {'longitude':[0,30,60,90,120,180],'latitude':[0,30,30,60,60,90],'ice cloud counts':[0,0,0,0,0,0],'liquid cloud counts':[0,0,0,0,0,0]}df1 = pd.DataFrame(data1)print(df1)
the above code will generate the following dataframe:
longitude latitude ice cloud counts liquid cloud counts0 0 0 0 01 30 30 0 02 60 30 0 03 90 60 0 04 120 60 0 05 180 90 0 0
Let's create another dataframe:
data2 = {'longitude':[0,60,120,180],'latitude':[0,30,60,90],'ice cloud counts':[70,20,10,100],'liquid cloud counts':[40,100,50,200]}df2 = pd.DataFrame(data2)print(df2)
returns:
longitude latitude ice cloud counts liquid cloud counts0 0 0 70 401 60 30 20 1002 120 60 10 503 180 90 100 200
Sum the values of the shared columns
The task is to merge rows of the dataframes df1 and df2 in function of the longitude and latitude, and sum up the values of the matching columns. One solution is to use a sequence of pandas functions including concat(), group_by(), sum(), and reset_index():
df_agg = pd.concat([df1, df2]).groupby(['longitude', 'latitude']).sum().reset_index()print( df_agg )
gives
longitude latitude ice cloud counts liquid cloud counts0 0 0 70 401 30 30 0 02 60 30 20 1003 90 60 0 04 120 60 10 505 180 90 100 200
Step-by-step
If you're not familiar with the pandas functions, the formulae above can be difficult to comprehend initially. To clarify, we'll now demonstrate how we arrived at the final output in a step-by-step manner:
Using concat()
pd.concat([df1, df2])
returns
longitude latitude ice cloud counts liquid cloud counts0 0 0 0 01 30 30 0 02 60 30 0 03 90 60 0 04 120 60 0 05 180 90 0 00 0 0 70 401 60 30 20 1002 120 60 10 503 180 90 100 200
Using groupby() with sum()
pd.concat([df1, df2]).groupby(['longitude', 'latitude']).sum()
returns
ice cloud counts liquid cloud countslongitude latitude0 0 70 4030 30 0 060 30 20 10090 60 0 0120 60 10 50180 90 100 200
Using reset_index()
df_agg = pd.concat([df1, df2]).groupby(['longitude', 'latitude']).sum().reset_index()
returns
longitude latitude ice cloud counts liquid cloud counts0 0 0 70 401 30 30 0 02 60 30 20 1003 90 60 0 04 120 60 10 505 180 90 100 200
References
| Links | Site |
|---|---|
| concat | pandas.pydata.org |
| sum | pandas.pydata.org |
| groupby | pandas.pydata.org |
| reset_index | pandas.pydata.org |
