Examples of how to replace dataframe columns by another dataframe columns with pandas
Dataframe 1
Let's create a first dataframe with pandas
import pandas as pd
import random
import numpy as np
data = np.arange(40)
data = data.reshape((10,4))
categorical_data = ['M', 'M', 'F', 'F', 'F']
gender_list = [random.choice(categorical_data) for i in range( data.shape[0] )]
label_list = [random.choice([0,1]) for i in range( data.shape[0] )]
df = pd.DataFrame(data,columns=['A','B','C','D'])
df['Gender'] = gender_list
df['Label'] = label_list
gives
A B C D Gender Label
0 0 1 2 3 M 0
1 4 5 6 7 F 1
2 8 9 10 11 F 1
3 12 13 14 15 F 0
4 16 17 18 19 F 1
5 20 21 22 23 F 1
6 24 25 26 27 M 1
7 28 29 30 31 F 1
8 32 33 34 35 F 0
9 36 37 38 39 M 0
Dataframe 2
Let's now create another dataframe:
data = np.random.randint(-100,0,size=(10,4))
label_list = [random.choice([0,1]) for i in range( data.shape[0] )]
df2 = pd.DataFrame(data,columns=['A','B','C','D'])
gives for example
A B C D
0 -41 -53 -38 -14
1 -28 -87 -70 -49
2 -49 -70 -18 -27
3 -26 -90 -74 -24
4 -48 -28 -14 -40
5 -9 -53 -84 -11
6 -62 -26 -35 -90
7 -5 -16 -45 -73
8 -97 -66 -61 -30
9 -14 -51 -79 -60
Replacing dataframe 1 columns by dataframe 2 columns
To replace dataframe 1 columns by dataframe 2 columns, a solution is to do:
columns=['A','B','C','D']
df[columns] = df2[columns]
gives then
A B C D Gender Label
0 -41 -53 -38 -14 M 0
1 -28 -87 -70 -49 F 1
2 -49 -70 -18 -27 F 1
3 -26 -90 -74 -24 F 0
4 -48 -28 -14 -40 F 1
5 -9 -53 -84 -11 F 1
6 -62 -26 -35 -90 M 1
7 -5 -16 -45 -73 F 1
8 -97 -66 -61 -30 F 0
9 -14 -51 -79 -60 M 0
Dataframe 2 with randomly shuffled indexes
Another example: let's shuffle dataframe indexes:
df2 = df2.sample(frac=1)
gives
A B C D
1 -28 -87 -70 -49
5 -9 -53 -84 -11
8 -97 -66 -61 -30
7 -5 -16 -45 -73
3 -26 -90 -74 -24
9 -14 -51 -79 -60
4 -48 -28 -14 -40
6 -62 -26 -35 -90
0 -41 -53 -38 -14
2 -49 -70 -18 -27
and
df[columns] = df2[columns]
still gives
A B C D Gender Label
0 -41 -53 -38 -14 M 0
1 -28 -87 -70 -49 F 1
2 -49 -70 -18 -27 F 1
3 -26 -90 -74 -24 F 0
4 -48 -28 -14 -40 F 1
5 -9 -53 -84 -11 F 1
6 -62 -26 -35 -90 M 1
7 -5 -16 -45 -73 F 1
8 -97 -66 -61 -30 F 0
9 -14 -51 -79 -60 M 0
Dataframe 2 with less rows than dataframe 1
Now if dataframe 2 has less rows than dataframe 1
df2 = df2.sample(frac=0.2)
gives for example
A B C D
3 -26 -90 -74 -24
0 -41 -53 -38 -14
and
print(df2.index)
gives
Int64Index([3, 0], dtype='int64')
A solution is to do:
df.loc[df2.index,columns] = df2[columns]
gives
A B C D Gender Label
0 -41 -53 -38 -14 M 0
1 4 5 6 7 F 1
2 8 9 10 11 F 1
3 -26 -90 -74 -24 F 0
4 16 17 18 19 F 1
5 20 21 22 23 F 1
6 24 25 26 27 M 1
7 28 29 30 31 F 1
8 32 33 34 35 F 0
9 36 37 38 39 M 0
Dataframe 2 with different column name
Another example with dataframe 2 with different column name:
data = np.random.randint(-100,0,size=(10,4))
label_list = [random.choice([0,1]) for i in range( data.shape[0] )]
df2 = pd.DataFrame(data,columns=['E','F','G','H'])
df[columns] = df2[['E','F','G','H']]
still gives
A B C D Gender Label
0 -65 -24 -93 -51 M 0
1 -21 -20 -100 -81 F 1
2 -44 -70 -95 -48 F 1
3 -46 -91 -82 -32 F 0
4 -42 -66 -6 -54 F 1
5 -71 -18 -3 -100 F 1
6 -10 -17 -18 -52 M 1
7 -11 -46 -56 -98 F 1
8 -65 -40 -11 -56 F 0
9 -16 -55 -5 -48 M 0