Examples of how to replace dataframe columns with columns from another dataframe in pandas
Dataframe 1
Let's first create a dataframe with pandas:
```python
import pandas as pd
import random
import numpy as np

data = np.arange(40)
data = data.reshape((10,4))

categorical_data = ['M', 'M', 'F', 'F', 'F']

gender_list = [random.choice(categorical_data) for i in range( data.shape[0] )]
label_list = [random.choice([0,1]) for i in range( data.shape[0] )]

df = pd.DataFrame(data,columns=['A','B','C','D'])

df['Gender'] = gender_list
df['Label'] = label_list
```
gives, for example (Gender and Label are drawn at random):
```
    A   B   C   D Gender  Label
0   0   1   2   3      M      0
1   4   5   6   7      F      1
2   8   9  10  11      F      1
3  12  13  14  15      F      0
4  16  17  18  19      F      1
5  20  21  22  23      F      1
6  24  25  26  27      M      1
7  28  29  30  31      F      1
8  32  33  34  35      F      0
9  36  37  38  39      M      0
```
Dataframe 2
Let's now create another dataframe:
```python
# 10 rows x 4 columns of random integers between -100 and -1
data = np.random.randint(-100,0,size=(10,4))

df2 = pd.DataFrame(data,columns=['A','B','C','D'])
```
gives, for example:
```
    A   B   C   D
0 -41 -53 -38 -14
1 -28 -87 -70 -49
2 -49 -70 -18 -27
3 -26 -90 -74 -24
4 -48 -28 -14 -40
5  -9 -53 -84 -11
6 -62 -26 -35 -90
7  -5 -16 -45 -73
8 -97 -66 -61 -30
9 -14 -51 -79 -60
```
Replacing dataframe 1 columns with dataframe 2 columns
To replace the columns of dataframe 1 with those of dataframe 2, one solution is:
```python
columns=['A','B','C','D']
df[columns] = df2[columns]
```
which then gives:
```
    A   B   C   D Gender  Label
0 -41 -53 -38 -14      M      0
1 -28 -87 -70 -49      F      1
2 -49 -70 -18 -27      F      1
3 -26 -90 -74 -24      F      0
4 -48 -28 -14 -40      F      1
5  -9 -53 -84 -11      F      1
6 -62 -26 -35 -90      M      1
7  -5 -16 -45 -73      F      1
8 -97 -66 -61 -30      F      0
9 -14 -51 -79 -60      M      0
```
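A minimal self-contained check of this step, using small deterministic dataframes instead of random ones (the values below are illustrative assumptions, not the ones from the example): only the listed columns are overwritten, while Gender and Label keep their original values.

```python
import numpy as np
import pandas as pd

# Small deterministic stand-ins for dataframe 1 and dataframe 2 (illustrative values)
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=["A", "B", "C", "D"])
df["Gender"] = ["M", "F", "F"]
df["Label"] = [0, 1, 1]
df2 = pd.DataFrame(-np.arange(1, 13).reshape(3, 4), columns=["A", "B", "C", "D"])

columns = ["A", "B", "C", "D"]
df[columns] = df2[columns]  # overwrite A..D with the values of df2, rows matched by index

print(df)
```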
Dataframe 2 with randomly shuffled indexes
Another example: let's shuffle the rows (and therefore the index) of dataframe 2:
```python
df2 = df2.sample(frac=1)
```
gives
```
    A   B   C   D
1 -28 -87 -70 -49
5  -9 -53 -84 -11
8 -97 -66 -61 -30
7  -5 -16 -45 -73
3 -26 -90 -74 -24
9 -14 -51 -79 -60
4 -48 -28 -14 -40
6 -62 -26 -35 -90
0 -41 -53 -38 -14
2 -49 -70 -18 -27
```
and
```python
df[columns] = df2[columns]
```
still gives the same result, because pandas matches rows by index label, not by position:
```
    A   B   C   D Gender  Label
0 -41 -53 -38 -14      M      0
1 -28 -87 -70 -49      F      1
2 -49 -70 -18 -27      F      1
3 -26 -90 -74 -24      F      0
4 -48 -28 -14 -40      F      1
5  -9 -53 -84 -11      F      1
6 -62 -26 -35 -90      M      1
7  -5 -16 -45 -73      F      1
8 -97 -66 -61 -30      F      0
9 -14 -51 -79 -60      M      0
```
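A small sketch contrasting this label-based assignment with a purely positional one, which may be what you want when the row labels carry no meaning. Converting the right-hand side with `.to_numpy()` drops its index, so the values are written in the shuffled row order. The data and `random_state=0` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

columns = ["A", "B", "C", "D"]
base = pd.DataFrame(np.arange(12).reshape(3, 4), columns=columns)
df2 = pd.DataFrame(-np.arange(1, 13).reshape(3, 4), columns=columns)
shuffled = df2.sample(frac=1, random_state=0)  # same rows, permuted index

# Label-based: rows are matched by index, so the shuffle has no effect
aligned = base.copy()
aligned[columns] = shuffled[columns]

# Positional: .to_numpy() strips the index, so rows are written in shuffled order
positional = base.copy()
positional[columns] = shuffled[columns].to_numpy()

print(aligned)
print(positional)
```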
Dataframe 2 with fewer rows than dataframe 1
Now suppose dataframe 2 has fewer rows than dataframe 1:
```python
df2 = df2.sample(frac=0.2)
```
gives, for example:
```
    A   B   C   D
3 -26 -90 -74 -24
0 -41 -53 -38 -14
```
and
```python
print(df2.index)
```
gives (with pandas < 2.0; pandas 2.x prints `Index([3, 0], dtype='int64')` instead):
```
Int64Index([3, 0], dtype='int64')
```
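Here `df[columns] = df2[columns]` would not work as intended: the rows of df whose labels are missing from df2 would be filled with NaN. A solution is to restrict the assignment to the rows of df2 with `.loc`: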
```python
df.loc[df2.index,columns] = df2[columns]
```
gives (starting again from the original dataframe 1):
```
    A   B   C   D Gender  Label
0 -41 -53 -38 -14      M      0
1   4   5   6   7      F      1
2   8   9  10  11      F      1
3 -26 -90 -74 -24      F      0
4  16  17  18  19      F      1
5  20  21  22  23      F      1
6  24  25  26  27      M      1
7  28  29  30  31      F      1
8  32  33  34  35      F      0
9  36  37  38  39      M      0
```
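To see why the `.loc` restriction matters, here is a small sketch comparing both assignments when dataframe 2 only holds some of the rows (the data and the kept labels `[2, 0]` are illustrative assumptions). The plain assignment fills the missing row with NaN, which also turns those columns into floats, while `.loc` leaves it untouched.

```python
import numpy as np
import pandas as pd

columns = ["A", "B", "C", "D"]
base = pd.DataFrame(np.arange(12).reshape(3, 4), columns=columns)
# Keep only rows 2 and 0 of a full-size second dataframe
df2 = pd.DataFrame(-np.arange(1, 13).reshape(3, 4), columns=columns).loc[[2, 0]]

# Plain assignment reindexes df2 to base's index: row 1 becomes NaN
plain = base.copy()
plain[columns] = df2[columns]

# .loc restricted to df2's labels: only rows 2 and 0 are overwritten
partial = base.copy()
partial.loc[df2.index, columns] = df2[columns]

print(plain)
print(partial)
```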
Dataframe 2 with different column names
Another example, where dataframe 2 has different column names:
```python
data = np.random.randint(-100,0,size=(10,4))

df2 = pd.DataFrame(data,columns=['E','F','G','H'])

# Columns are paired by position: E -> A, F -> B, G -> C, H -> D
df[columns] = df2[['E','F','G','H']]
```
still works and gives, for example (the values are new random numbers):
```
    A   B    C    D Gender  Label
0 -65 -24  -93  -51      M      0
1 -21 -20 -100  -81      F      1
2 -44 -70  -95  -48      F      1
3 -46 -91  -82  -32      F      0
4 -42 -66   -6  -54      F      1
5 -71 -18   -3 -100      F      1
6 -10 -17  -18  -52      M      1
7 -11 -46  -56  -98      F      1
8 -65 -40  -11  -56      F      0
9 -16 -55   -5  -48      M      0
```
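`df[columns] = df2[[...]]` pairs the columns by position, which is why the different names do not matter here. A sketch of two more explicit alternatives, on illustrative data: renaming the columns of dataframe 2 first (rows are still matched by index), or using `.to_numpy()` (purely positional for both rows and columns).

```python
import numpy as np
import pandas as pd

columns = ["A", "B", "C", "D"]
other = ["E", "F", "G", "H"]
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=columns)
df2 = pd.DataFrame(-np.arange(1, 13).reshape(3, 4), columns=other)

# Option 1: rename E..H to A..D, then assign (rows still matched by index)
renamed = df.copy()
renamed[columns] = df2[other].rename(columns=dict(zip(other, columns)))

# Option 2: drop all labels with .to_numpy() (rows and columns matched by position)
positional = df.copy()
positional[columns] = df2[other].to_numpy()

print(renamed)
```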
