Example of how to combine two pandas dataframes and sum values of the shared columns:
Create two dataframes
The first step is to define the two dataframes to be combined:
import pandas as pd
data1 = {'longitude':[0,30,60,90,120,180],
'latitude':[0,30,30,60,60,90],
'ice cloud counts':[0,0,0,0,0,0],
'liquid cloud counts':[0,0,0,0,0,0]}
df1 = pd.DataFrame(data1)
print(df1)
the above code will generate the following dataframe:
longitude latitude ice cloud counts liquid cloud counts
0 0 0 0 0
1 30 30 0 0
2 60 30 0 0
3 90 60 0 0
4 120 60 0 0
5 180 90 0 0
Let's create another dataframe:
data2 = {'longitude':[0,60,120,180],
'latitude':[0,30,60,90],
'ice cloud counts':[70,20,10,100],
'liquid cloud counts':[40,100,50,200]}
df2 = pd.DataFrame(data2)
print(df2)
returns:
longitude latitude ice cloud counts liquid cloud counts
0 0 0 70 40
1 60 30 20 100
2 120 60 10 50
3 180 90 100 200
Sum the values of the shared columns
The task is to merge rows of the dataframes df1 and df2 in function of the longitude and latitude, and sum up the values of the matching columns. One solution is to use a sequence of pandas functions including concat(), group_by(), sum(), and reset_index():
df_agg = pd.concat([df1, df2]).groupby(['longitude', 'latitude']).sum().reset_index()
print( df_agg )
gives
longitude latitude ice cloud counts liquid cloud counts
0 0 0 70 40
1 30 30 0 0
2 60 30 20 100
3 90 60 0 0
4 120 60 10 50
5 180 90 100 200
Step-by-step
If you're not familiar with the pandas functions, the formulae above can be difficult to comprehend initially. To clarify, we'll now demonstrate how we arrived at the final output in a step-by-step manner:
Using concat()
pd.concat([df1, df2])
returns
longitude latitude ice cloud counts liquid cloud counts
0 0 0 0 0
1 30 30 0 0
2 60 30 0 0
3 90 60 0 0
4 120 60 0 0
5 180 90 0 0
0 0 0 70 40
1 60 30 20 100
2 120 60 10 50
3 180 90 100 200
Using groupby() with sum()
pd.concat([df1, df2]).groupby(['longitude', 'latitude']).sum()
returns
ice cloud counts liquid cloud counts
longitude latitude
0 0 70 40
30 30 0 0
60 30 20 100
90 60 0 0
120 60 10 50
180 90 100 200
Using reset_index()
df_agg = pd.concat([df1, df2]).groupby(['longitude', 'latitude']).sum().reset_index()
returns
longitude latitude ice cloud counts liquid cloud counts
0 0 0 70 40
1 30 30 0 0
2 60 30 20 100
3 90 60 0 0
4 120 60 10 50
5 180 90 100 200
References
Links | Site |
---|---|
concat | pandas.pydata.org |
sum | pandas.pydata.org |
groupby | pandas.pydata.org |
reset_index | pandas.pydata.org |