How to select the rows of a dataframe using the indices of another dataframe with pandas in Python?


Examples of how to select the rows of a dataframe using the indices of another dataframe with pandas in Python

A first example

Select rows of a dataframe df1 using the indices of a dataframe df2:

df1.loc[df2.index]
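
As a minimal sketch of the line above, the following uses small made-up dataframes (the names `df3`, `selected` and `shared`, and all values, are hypothetical and only for illustration). It also shows one possible way to handle the case where the second dataframe contains index labels that the first one lacks, since `.loc` with a missing label raises a `KeyError`:

```python
import pandas as pd

# Hypothetical toy data, for illustration only
df1 = pd.DataFrame({'name': ['a', 'b', 'c', 'd'], 'value': [10, 20, 30, 40]})
df2 = pd.DataFrame({'score': [0.5, 0.9]}, index=[3, 1])

# Rows of df1 whose labels appear in df2.index, returned in df2's order
selected = df1.loc[df2.index]
print(selected)

# df3 holds a label (7) that df1 lacks, so df1.loc[df3.index] would raise
# a KeyError; a boolean mask keeps only the shared labels, in df1's order
df3 = pd.DataFrame({'score': [0.1, 0.2]}, index=[2, 7])
shared = df1[df1.index.isin(df3.index)]
print(shared)
```

Note that `df1.loc[df2.index]` follows the order of `df2.index`, while the `isin` mask keeps the order of `df1`.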

A second example

Let's create a dataframe with pandas:

import pandas as pd
import numpy as np
import random

Surface = [random.choice(['Ocean','Snow','Desert', 'Forest']) for i in range(20)]
c1 = np.random.uniform(0,1, size=20)
c2 = np.random.uniform(0,1, size=20)
c3 = np.random.uniform(0,1, size=20)

data = {'Surface':Surface,
        'c1':c1, 
        'c2':c2, 
        'c3':c3}

df = pd.DataFrame(data)

print(df)

returns for example:

   Surface        c1        c2        c3
0     Snow  0.803453  0.293418  0.711185
1   Desert  0.152995  0.683300  0.158748
2   Desert  0.473076  0.854214  0.800504
3   Forest  0.733894  0.703959  0.824994
4     Snow  0.666677  0.266554  0.241821
5   Forest  0.329957  0.639363  0.815788
6   Forest  0.458453  0.327581  0.878686
7    Ocean  0.934025  0.270376  0.592077
8   Desert  0.978084  0.786852  0.889306
9   Forest  0.735452  0.700018  0.653053
10    Snow  0.662458  0.565464  0.779821
11  Desert  0.342549  0.527397  0.912509
12  Desert  0.718146  0.150439  0.639360
13   Ocean  0.629078  0.640988  0.850612
14  Forest  0.948290  0.348320  0.579737
15  Desert  0.888459  0.339470  0.715396
16   Ocean  0.685656  0.599464  0.277781
17  Desert  0.771131  0.344754  0.776899
18   Ocean  0.262968  0.628254  0.757189
19  Desert  0.161602  0.367706  0.941437

Create another dataframe by randomly selecting rows of the first dataframe:

df_sample = df[['c1','c2','c3']].sample(n=10, random_state = 42)

df_sample then contains, for example:

          c1        c2        c3
0   0.803453  0.293418  0.711185
17  0.771131  0.344754  0.776899
15  0.888459  0.339470  0.715396
1   0.152995  0.683300  0.158748
8   0.978084  0.786852  0.889306
5   0.329957  0.639363  0.815788
11  0.342549  0.527397  0.912509
3   0.733894  0.703959  0.824994
18  0.262968  0.628254  0.757189
16  0.685656  0.599464  0.277781

Let's now retrieve the full rows of df, including the column "Surface", for the rows in the dataframe df_sample:

df.loc[df_sample.index]

gives:

   Surface        c1        c2        c3
0     Snow  0.803453  0.293418  0.711185
17  Desert  0.771131  0.344754  0.776899
15  Desert  0.888459  0.339470  0.715396
1   Desert  0.152995  0.683300  0.158748
8   Desert  0.978084  0.786852  0.889306
5   Forest  0.329957  0.639363  0.815788
11  Desert  0.342549  0.527397  0.912509
3   Forest  0.733894  0.703959  0.824994
18   Ocean  0.262968  0.628254  0.757189
16   Ocean  0.685656  0.599464  0.277781
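
The selection above returns every column of df. If only the "Surface" column is wanted, or if it should be attached to df_sample, one possible approach is sketched below. It rebuilds a dataframe with the same layout using a seeded NumPy generator (an assumption made here for reproducibility, so the values differ from the tables above); the names `surface_only` and `df_sample_labeled` are hypothetical:

```python
import numpy as np
import pandas as pd

# Rebuild a dataframe shaped like the one above (seeded for reproducibility;
# the values will differ from the printed tables)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'Surface': rng.choice(['Ocean', 'Snow', 'Desert', 'Forest'], size=20),
    'c1': rng.uniform(0, 1, size=20),
    'c2': rng.uniform(0, 1, size=20),
    'c3': rng.uniform(0, 1, size=20),
})
df_sample = df[['c1', 'c2', 'c3']].sample(n=10, random_state=42)

# Only the "Surface" column for the sampled rows (returns a Series)
surface_only = df.loc[df_sample.index, 'Surface']

# Attach "Surface" to df_sample; join aligns the two objects on their index
df_sample_labeled = df_sample.join(df['Surface'])
print(df_sample_labeled)
```

Passing a column label as the second argument of `.loc` avoids copying the columns that df_sample already holds, and `join` keeps the row order of df_sample.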
