How to apply a function to one or more columns of a pandas DataFrame?

Published: April 11, 2020

Updated: February 20, 2023

Tags: Pandas; python;


To apply a function to a column of a pandas DataFrame, you can use the apply() method, which takes the function to apply as its first argument. The examples below cover a single column and several columns.

Synthetic data

To start, let's generate a DataFrame from synthetic data:

import pandas as pd
import numpy as np

data = np.arange(1, 13)
data = data.reshape(3, 4)

df = pd.DataFrame(data=data, columns=['a', 'b', 'c', 'd'])

The code displayed above will generate:

   a   b   c   d
0  1   2   3   4
1  5   6   7   8
2  9  10  11  12

Modifying data contained in a DataFrame's columns

Basic operations

Let's explore modifying elements of column b as an example:

>>> df['b']
0     2
1     6
2    10

To add 10 to every element of column b, no function is needed; plain arithmetic on the column is enough:

>>> df['b'] = df['b'] + 10
>>> df
   a   b   c   d
0  1  12   3   4
1  5  16   7   8
2  9  20  11  12

Multiplying every element of column b by two is just as simple:

>>> df['b'] = df['b']*2
>>> df
   a   b   c   d
0  1  24   3   4
1  5  32   7   8
2  9  40  11  12
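For simple arithmetic like this, the vectorized expression and an element-wise apply() give the same values. The sketch below rebuilds the example DataFrame so it runs on its own and checks this; the vectorized form is usually preferred because apply() calls the Python function once per element.

```python
import numpy as np
import pandas as pd

# Rebuild the example DataFrame from the "Synthetic data" section
df = pd.DataFrame(np.arange(1, 13).reshape(3, 4), columns=['a', 'b', 'c', 'd'])

# Vectorized: the whole column is processed in one operation
vectorized = (df['b'] + 10) * 2

# Element-wise: the lambda is called once for each value of column b
elementwise = df['b'].apply(lambda v: (v + 10) * 2)

print((vectorized == elementwise).all())  # True
print(vectorized.tolist())                # [24, 32, 40]
```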

Using apply()

To apply a more complex function, such as a square root, one solution is to use the pandas apply() method:

>>> df['b'].apply(np.sqrt)
0    4.898979
1    5.656854
2    6.324555
Name: b, dtype: float64
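Two details are worth knowing here: apply() returns a new Series and leaves df unchanged, so the result has to be assigned back to keep it; and because np.sqrt is a NumPy ufunc, it can also be called on the column directly. A minimal sketch, starting from column b with the values 24, 32 and 40 shown above:

```python
import numpy as np
import pandas as pd

# DataFrame as it stands after the two basic operations above
df = pd.DataFrame({'a': [1, 5, 9], 'b': [24, 32, 40], 'c': [3, 7, 11], 'd': [4, 8, 12]})

roots = df['b'].apply(np.sqrt)
print(df['b'].tolist())  # [24, 32, 40]: apply() did not modify df

# np.sqrt already works element-wise on a whole Series
direct = np.sqrt(df['b'])
print(np.allclose(roots, direct))  # True

# Assign the result back to store it in the DataFrame
df['b'] = roots
print(df['b'].round(6).tolist())  # [4.898979, 5.656854, 6.324555]
```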

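apply() can also be called on the whole DataFrame. In the following example, the function is called once per column, takes the square root of column b only, and returns the other columns unchanged: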

df.apply(lambda x: np.sqrt(x) if x.name == 'b' else x)

   a         b   c   d
0  1  4.898979   3   4
1  5  5.656854   7   8
2  9  6.324555  11  12
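The lambda above relies on DataFrame.apply() (with the default axis=0) passing each column to the function as a Series whose .name is the column label. When only one column needs to change, an alternative that avoids the name test is to compute that column alone and swap it in with DataFrame.assign(), which returns a new DataFrame. A minimal sketch comparing the two approaches:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 5, 9], 'b': [24, 32, 40], 'c': [3, 7, 11], 'd': [4, 8, 12]})

# Each column arrives as a Series; x.name holds its label ('a', 'b', ...)
via_apply = df.apply(lambda x: np.sqrt(x) if x.name == 'b' else x)

# Only column b is recomputed; the other columns are carried over unchanged
via_assign = df.assign(b=np.sqrt(df['b']))

print(np.allclose(via_apply.to_numpy(dtype=float), via_assign.to_numpy(dtype=float)))  # True
```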

Using apply() with a custom function

You can also define your own function and pass it to apply():

def myfunc(x):
    return x**2 + 2*x + 3

df['b'].apply(myfunc)

The code displayed above will then generate:

0     627
1    1091
2    1683
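Because myfunc only uses arithmetic, calling it on the whole column would give the same result. apply() becomes necessary when the function contains plain Python logic that expects one value at a time, such as an if statement: evaluating `if x > 30` on a whole Series raises a ValueError because the truth value of a Series is ambiguous. In the sketch below, size_label and its threshold of 30 are hypothetical, chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 5, 9], 'b': [24, 32, 40], 'c': [3, 7, 11], 'd': [4, 8, 12]})

def myfunc(x):
    return x**2 + 2*x + 3

# Arithmetic-only function: works on the whole Series, same values as apply()
print((myfunc(df['b']) == df['b'].apply(myfunc)).all())  # True

# Hypothetical function with a branch: it needs a single scalar
def size_label(x):
    if x > 30:
        return 'large'
    return 'small'

try:
    size_label(df['b'])  # the if statement fails on a whole Series
except ValueError as err:
    print('ValueError:', err)

print(df['b'].apply(size_label).tolist())  # ['small', 'large', 'large']
```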

Apply several functions to an individual column

In addition to applying a single function to a column, you can pass a list of functions to apply() as its first argument; each function then produces its own column in the result. For example, to apply both the built-in abs() and np.sqrt() to the ‘b’ column, you can use the following code (the examples in this section use the original DataFrame from the Synthetic data section, where column b holds 2, 6 and 10):

df['b'].apply([abs, np.sqrt])

returns

   abs      sqrt
0    2  1.414214
1    6  2.449490
2   10  3.162278

The apply() method can also be used on several columns at once: select them with a list of column names, for example df[['a','b']], and call apply() on the resulting DataFrame. For example, to apply both abs() and np.sqrt() to the ‘a’ and ‘b’ columns, you can use the following code:

df[['a','b']].apply([abs, np.sqrt])

returns

    a             b          
  abs      sqrt abs      sqrt
0   1  1.000000   2  1.414214
1   5  2.236068   6  2.449490
2   9  3.000000  10  3.162278
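With a single function instead of a list, the result keeps the original column labels, so it can be assigned straight back to the selected columns. A minimal sketch using the original DataFrame from the Synthetic data section:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 13).reshape(3, 4), columns=['a', 'b', 'c', 'd'])

cols = ['a', 'b']
# np.sqrt is called once per selected column; the labels a and b are kept
df_roots = df[cols].apply(np.sqrt)
print(df_roots.columns.tolist())  # ['a', 'b']

# Overwrite the two selected columns; c and d are untouched
df[cols] = df_roots
print(df.round(6))
```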

Using apply() with a custom function on multiple columns

To start, let's generate a DataFrame using synthetic data:

import pandas as pd
import random

# Seed Python's random module, which generates the values below
random.seed(42)

d = {'longitude': [random.uniform(0, 1) * 360 - 180 for i in range(20)],
     'latitude': [random.uniform(0, 1) * 180 - 90 for i in range(20)]}

df = pd.DataFrame(data=d)

The code above will generate, for example:

         longitude   latitude
    0   167.067264 -39.476855
    1    28.979819   1.683163
    2  -124.513305 -60.536511
    3   114.754113  55.249186
    4    12.346310  45.642464
    5    30.275235 -37.494582
    6   135.649712  53.335008
    7   175.366444 -30.276944
    8   171.082997 -63.158958
    9    63.046690  63.781048
    10 -113.276502 -65.871205
    11 -138.539111 -36.767633
    12 -169.591685 -56.994853
    13    3.381588 -15.315441
    14  138.534118  58.519542
    15 -161.543564  56.212077
    16 -142.353835  46.069997
    17   72.519566  30.986494
    18   83.485540 -38.041282
    19   68.728968 -18.316078

To apply a function across multiple columns, write a custom function that receives one row (as a pandas Series), reads the columns it needs by name, and returns several values, here a longitude index and a latitude index on a 1-degree grid:

def spatial_aggregation(x):
    resolution = 1.0
    latitude_idx = int( (x['latitude']+90.0) / resolution )
    longitude_idx = int( (x['longitude']+180) / resolution )
    return longitude_idx, latitude_idx
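spatial_aggregation reads two values from the same row, so it has to be called row by row. To see what DataFrame.apply() passes to a function for each value of its axis argument, a quick check is to return the label (.name) of each Series it receives. The small DataFrame here is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'longitude': [167.07, -124.51], 'latitude': [-39.48, -60.54]})

# axis=0 (default): one call per column; each Series is labelled by its column name
print(df.apply(lambda s: s.name, axis=0).tolist())  # ['longitude', 'latitude']

# axis=1: one call per row; each Series is labelled by its index value
print(df.apply(lambda s: s.name, axis=1).tolist())  # [0, 1]

# Inside a row, values are accessed by column name, as spatial_aggregation does
print(df.apply(lambda row: row['longitude'] + 180, axis=1).round(2).tolist())  # [347.07, 55.49]
```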

With apply(), such a function can be applied row by row to create several new columns at once.

DataFrame.apply() takes the function as its first argument. Its optional axis argument controls what the function receives: with the default axis=0 the function is called once per column, while with axis=1 it is called once per row, which is what spatial_aggregation needs. Adding result_type='expand' turns the tuple returned for each row into separate columns, which can then be assigned to new columns of the DataFrame. Example:

df[['longitude_agg', 'latitude_agg']] = df.apply(spatial_aggregation, axis=1, result_type='expand')

The code above will generate, for example:

         longitude   latitude  longitude_agg  latitude_agg
    0   167.067264 -39.476855            347            50
    1    28.979819   1.683163            208            91
    2  -124.513305 -60.536511             55            29
    3   114.754113  55.249186            294           145
    4    12.346310  45.642464            192           135
    5    30.275235 -37.494582            210            52
    6   135.649712  53.335008            315           143
    7   175.366444 -30.276944            355            59
    8   171.082997 -63.158958            351            26
    9    63.046690  63.781048            243           153
    10 -113.276502 -65.871205             66            24
    11 -138.539111 -36.767633             41            53
    12 -169.591685 -56.994853             10            33
    13    3.381588 -15.315441            183            74
    14  138.534118  58.519542            318           148
    15 -161.543564  56.212077             18           146
    16 -142.353835  46.069997             37           136
    17   72.519566  30.986494            252           120
    18   83.485540 -38.041282            263            51
    19   68.728968 -18.316078            248            71
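Row-wise apply() calls the Python function once per row, which can be slow on large DataFrames. Because spatial_aggregation only does arithmetic, the same indices can also be computed on whole columns. The sketch below is an alternative, not the method used above, and it rests on one assumption: int() truncates toward zero while // floors, so the two agree here only because latitude + 90 and longitude + 180 are never negative in this data.

```python
import random
import pandas as pd

# Same kind of synthetic data as above (seeded so the check is repeatable)
random.seed(42)
df = pd.DataFrame({
    'longitude': [random.uniform(0, 1) * 360 - 180 for _ in range(20)],
    'latitude': [random.uniform(0, 1) * 180 - 90 for _ in range(20)],
})

def spatial_aggregation(x):
    resolution = 1.0
    latitude_idx = int((x['latitude'] + 90.0) / resolution)
    longitude_idx = int((x['longitude'] + 180) / resolution)
    return longitude_idx, latitude_idx

# Row-wise version: one Python call per row
df[['longitude_agg', 'latitude_agg']] = df.apply(spatial_aggregation, axis=1, result_type='expand')

# Vectorized version: one operation per column
resolution = 1.0
lon_vec = ((df['longitude'] + 180) // resolution).astype(int)
lat_vec = ((df['latitude'] + 90.0) // resolution).astype(int)

print((lon_vec == df['longitude_agg']).all(), (lat_vec == df['latitude_agg']).all())  # True True
```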

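When assigning the output of apply() to several columns as above, the function must return exactly as many values per row as there are target columns (two here); otherwise pandas raises a ValueError. The result must also line up with the DataFrame's rows, which a row-wise apply() guarantees by returning one result per row.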
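One common way to trigger this ValueError is to return a different number of values per row than the number of columns being assigned. A small sketch with a hypothetical function, three_values, that returns three values while only two target columns are given:

```python
import pandas as pd

df = pd.DataFrame({'longitude': [167.07, -124.51], 'latitude': [-39.48, -60.54]})

# Hypothetical function: returns three values per row
def three_values(row):
    return row['longitude'], row['latitude'], 0

error_message = None
try:
    # Three returned values, but only two target columns
    df[['lon_copy', 'lat_copy']] = df.apply(three_values, axis=1, result_type='expand')
except ValueError as err:
    error_message = str(err)

print(error_message)
```

Returning exactly two values, or listing three target columns, makes the assignment succeed.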

References