How to add multiple columns to a dataframe with pandas?


Examples of how to add multiple columns to a dataframe with pandas:

Create a dataframe with pandas

Let's create a dataframe with pandas:

import pandas as pd
import numpy as np

data = np.random.randint(10, size=(5,3))

columns = ['Score A','Score B','Score C']

df = pd.DataFrame(data=data, columns=columns)

print(df)

returns, for example (the values are random and change on every run):

   Score A  Score B  Score C
0        1        5        5
1        3        9        2
2        5        9        3
3        8        6        2
4        4        7        6
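Because the values come from np.random.randint, each run prints different numbers. If you want the same dataframe every time (for example, to compare your output with this page), a minimal sketch uses NumPy's seeded Generator instead. The seed value 42 is an arbitrary choice, and the numbers it produces will not match the table above:

```python
import numpy as np
import pandas as pd

# Seeded generator: the same seed always yields the same integers,
# so the printed dataframe is identical on every run.
rng = np.random.default_rng(seed=42)  # 42 is an arbitrary example seed

# integers(10, ...) draws from [0, 10), like np.random.randint(10, ...)
data = rng.integers(10, size=(5, 3))
columns = ['Score A', 'Score B', 'Score C']

df = pd.DataFrame(data=data, columns=columns)
print(df)
```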

Add a new column

Reminder: to add a single column to a dataframe, a straightforward solution is to assign an array with one value per row to a new column name:

data = np.random.randint(10, size=5)  # one value per row

df['Score D'] = data

print(df)

returns

   Score A  Score B  Score C  Score D
0        1        5        5        1
1        3        9        2        8
2        5        9        3        0
3        8        6        2        4
4        4        7        6        2
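The assignment above always appends the new column at the end. If you need it at a specific position, pandas also provides DataFrame.insert(loc, column, value). A minimal sketch, where the position 1 and the column name 'Score D' are just illustrative choices:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(10, size=(5, 3)),
                  columns=['Score A', 'Score B', 'Score C'])

# One value per row for the new column.
values = np.random.randint(10, size=5)

# insert() modifies df in place and places the new column at position 1,
# i.e. right after 'Score A', instead of at the end.
df.insert(1, 'Score D', values)
print(df)
```

Unlike df['Score D'] = ..., insert() raises a ValueError if a column with that name already exists (unless allow_duplicates=True is passed).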

Add multiple columns

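To add multiple columns at the same time, one solution is to put them in a second dataframe and join the two side by side with pandas.concat and axis=1. Note that concat matches rows by index label, not by position, so the new dataframe should share the index of df: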

data = np.random.randint(10, size=(5,2))

columns = ['Score E','Score F']

df_add = pd.DataFrame(data=data, columns=columns, index=df.index)  # same row labels as df

df = pd.concat([df,df_add], axis=1)

print(df)

returns

   Score A  Score B  Score C  Score D  Score E  Score F
0        1        5        5        1        4        3
1        3        9        2        8        7        3
2        5        9        3        0        7        0
3        8        6        2        4        5        7
4        4        7        6        2        6        4
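To see why the shared index matters, here is a small sketch where the new columns are deliberately built with different row labels (10 to 14, chosen only for illustration). concat with axis=1 performs an outer join on the index, so unmatched labels produce extra rows filled with NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(10, size=(5, 2)), columns=['Score A', 'Score B'])
data = np.random.randint(10, size=(5, 2))

# New columns with a DIFFERENT index: labels 10..14 instead of 0..4.
df_add_bad = pd.DataFrame(data, columns=['Score E', 'Score F'], index=range(10, 15))

# No labels overlap, so the result has 10 rows padded with NaN.
misaligned = pd.concat([df, df_add_bad], axis=1)
print(misaligned.shape)  # (10, 4)

# Reusing df's index makes the rows line up one to one.
df_add = pd.DataFrame(data, columns=['Score E', 'Score F'], index=df.index)
aligned = pd.concat([df, df_add], axis=1)
print(aligned.shape)  # (5, 4)
```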

Remove duplicate columns

Note: if you apply concat again:

df = pd.concat([df,df_add], axis=1)

the same columns are added a second time, so the dataframe now contains two 'Score E' and two 'Score F' columns:

   Score A  Score B  Score C  Score D  Score E  Score F  Score E  Score F
0        1        5        5        1        4        3        4        3
1        3        9        2        8        7        3        7        3
2        5        9        3        0        7        0        7        0
3        8        6        2        4        5        7        5        7
4        4        7        6        2        6        4        6        4

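To remove columns with duplicate names, keep only the first occurrence of each. df.columns.duplicated() returns True for every column name already seen earlier, and ~ inverts that mask so .loc keeps the first copy of each column: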

df = df.loc[:,~df.columns.duplicated()]

returns

   Score A  Score B  Score C  Score D  Score E  Score F
0        1        5        5        1        4        3
1        3        9        2        8        7        3
2        5        9        3        0        7        0
3        8        6        2        4        5        7
4        4        7        6        2        6        4
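Alternatively, you can avoid creating duplicates in the first place by only concatenating the columns of df_add whose names are not already in df. A minimal sketch, with hypothetical example frames; note that for names present in both, the values already in df are kept and those in df_add are ignored:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(10, size=(5, 3)),
                  columns=['Score A', 'Score E', 'Score F'])
df_add = pd.DataFrame(np.random.randint(10, size=(5, 3)),
                      columns=['Score E', 'Score F', 'Score G'])

# Keep only the columns of df_add whose names do not already exist in df.
new_cols = df_add.loc[:, ~df_add.columns.isin(df.columns)]

df = pd.concat([df, new_cols], axis=1)
print(df.columns.tolist())  # ['Score A', 'Score E', 'Score F', 'Score G']
```

Running the filtered concat a second time adds nothing, because every column of df_add is then already present in df.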
