How to merge two pandas dataframes and sum values of the shared columns ?

Published: June 16, 2023

Updated: June 16, 2023

Tags: Python; Pandas; Dataframe;

DMCA.com Protection Status

Example of how to combine two pandas dataframes and sum values of the shared columns:

Create two dataframes

The first step is to define the two dataframes to be combined:

import pandas as pd

data1 = {'longitude':[0,30,60,90,120,180],
         'latitude':[0,30,30,60,60,90],
         'ice cloud counts':[0,0,0,0,0,0],
         'liquid cloud counts':[0,0,0,0,0,0]}

df1 = pd.DataFrame(data1)

print(df1)

the above code will generate the following dataframe:

   longitude  latitude  ice cloud counts  liquid cloud counts
0          0         0                 0                    0
1         30        30                 0                    0
2         60        30                 0                    0
3         90        60                 0                    0
4        120        60                 0                    0
5        180        90                 0                    0

Let's create another dataframe:

data2 = {'longitude':[0,60,120,180],
         'latitude':[0,30,60,90],
         'ice cloud counts':[70,20,10,100],
         'liquid cloud counts':[40,100,50,200]}

df2 = pd.DataFrame(data2)

print(df2)

returns:

   longitude  latitude  ice cloud counts  liquid cloud counts
0          0         0                70                   40
1         60        30                20                  100
2        120        60                10                   50
3        180        90               100                  200

Sum the values of the shared columns

The task is to merge rows of the dataframes df1 and df2 in function of the longitude and latitude, and sum up the values of the matching columns. One solution is to use a sequence of pandas functions including concat(), group_by(), sum(), and reset_index():

df_agg = pd.concat([df1, df2]).groupby(['longitude', 'latitude']).sum().reset_index()

print( df_agg )

gives

   longitude  latitude  ice cloud counts  liquid cloud counts
0          0         0                70                   40
1         30        30                 0                    0
2         60        30                20                  100
3         90        60                 0                    0
4        120        60                10                   50
5        180        90               100                  200

Step-by-step

If you're not familiar with the pandas functions, the formulae above can be difficult to comprehend initially. To clarify, we'll now demonstrate how we arrived at the final output in a step-by-step manner:

Using concat()

pd.concat([df1, df2])

returns

   longitude  latitude  ice cloud counts  liquid cloud counts
0          0         0                 0                    0
1         30        30                 0                    0
2         60        30                 0                    0
3         90        60                 0                    0
4        120        60                 0                    0
5        180        90                 0                    0
0          0         0                70                   40
1         60        30                20                  100
2        120        60                10                   50
3        180        90               100                  200

Using groupby() with sum()

pd.concat([df1, df2]).groupby(['longitude', 'latitude']).sum()

returns

                    ice cloud counts  liquid cloud counts
longitude latitude                                       
0         0                       70                   40
30        30                       0                    0
60        30                      20                  100
90        60                       0                    0
120       60                      10                   50
180       90                     100                  200

Using reset_index()

df_agg = pd.concat([df1, df2]).groupby(['longitude', 'latitude']).sum().reset_index()

returns

   longitude  latitude  ice cloud counts  liquid cloud counts
0          0         0                70                   40
1         30        30                 0                    0
2         60        30                20                  100
3         90        60                 0                    0
4        120        60                10                   50
5        180        90               100                  200

References

Links Site
concat pandas.pydata.org
sum pandas.pydata.org
groupby pandas.pydata.org
reset_index pandas.pydata.org