How to apply a function across multiple columns in a pandas DataFrame and create several new ones?


When working with a pandas DataFrame, you often need to apply a single function to several columns at once and store its results in several new columns. The DataFrame.apply() method handles this: called with axis=1, it passes each row to the function, and the values the function returns can be expanded into new columns. Note that row-wise apply() calls the function once per row in Python, so it is convenient rather than fast on large DataFrames. The example below walks through the steps.

Synthetic data

To start, let's generate a DataFrame using synthetic data:

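    import random

    import pandas as pd

    # Seed the generator that is actually used below (the standard-library
    # random module), so the values are reproducible from run to run.
    random.seed(42)

    # 20 random points: longitude in [-180, 180], latitude in [-90, 90].
    d = {'longitude': [random.uniform(0, 1) * 360 - 180 for _ in range(20)],
         'latitude': [random.uniform(0, 1) * 180 - 90 for _ in range(20)]}

    df = pd.DataFrame(data=d)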

The code above produces a DataFrame like the following (the exact values depend on the random generator and seed):

         longitude   latitude
    0   167.067264 -39.476855
    1    28.979819   1.683163
    2  -124.513305 -60.536511
    3   114.754113  55.249186
    4    12.346310  45.642464
    5    30.275235 -37.494582
    6   135.649712  53.335008
    7   175.366444 -30.276944
    8   171.082997 -63.158958
    9    63.046690  63.781048
    10 -113.276502 -65.871205
    11 -138.539111 -36.767633
    12 -169.591685 -56.994853
    13    3.381588 -15.315441
    14  138.534118  58.519542
    15 -161.543564  56.212077
    16 -142.353835  46.069997
    17   72.519566  30.986494
    18   83.485540 -38.041282
    19   68.728968 -18.316078

Create a function with multiple outputs

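To create several columns at once, write a function that takes a row (a pandas Series indexed by column name), reads the columns it needs, and returns several values as a tuple. The function below converts each coordinate pair into integer grid-cell indices at a resolution of 1 degree: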

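    def spatial_aggregation(x):
        # x is one row of the DataFrame, indexed by column name.
        resolution = 1.0  # grid cell size, in degrees
        # Shift latitude from [-90, 90] to [0, 180] before binning.
        latitude_idx = int((x['latitude'] + 90.0) / resolution)
        # Shift longitude from [-180, 180] to [0, 360] before binning.
        longitude_idx = int((x['longitude'] + 180.0) / resolution)
        return longitude_idx, latitude_idx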
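
Before applying the function to the whole DataFrame, it can help to call it on a single row. The sketch below redefines the function so that it runs on its own, and passes it a hand-built Series. The `row` variable is introduced only for this illustration, and its values are copied from the first row of the sample output above.

```python
import pandas as pd

def spatial_aggregation(x):
    resolution = 1.0
    latitude_idx = int((x['latitude'] + 90.0) / resolution)
    longitude_idx = int((x['longitude'] + 180.0) / resolution)
    return longitude_idx, latitude_idx

# A single DataFrame row behaves like a Series indexed by column name.
row = pd.Series({'longitude': 167.067264, 'latitude': -39.476855})

lon_idx, lat_idx = spatial_aggregation(row)
print(lon_idx, lat_idx)  # 347 50
```

The function returns the longitude index first and the latitude index second, so the new column names should be listed in that order when the result is assigned.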

Apply function to a DataFrame

DataFrame.apply() takes the function as its first argument. Its optional axis argument defaults to 0, which passes each column to the function; setting axis=1 passes each row instead, which is what spatial_aggregation expects. With result_type='expand', the tuple returned for each row is spread across separate columns, so the result can be assigned to several new columns in one statement. Example:

    df[['longitude_agg', 'latitude_agg']] = df.apply(spatial_aggregation, axis=1, result_type='expand')

The code above produces, for example:

         longitude   latitude longitude_agg   latitude_agg
    0   167.067264 -39.476855           347             50
    1    28.979819   1.683163           208             91
    2  -124.513305 -60.536511            55             29
    3   114.754113  55.249186           294            145
    4    12.346310  45.642464           192            135
    5    30.275235 -37.494582           210             52
    6   135.649712  53.335008           315            143
    7   175.366444 -30.276944           355             59
    8   171.082997 -63.158958           351             26
    9    63.046690  63.781048           243            153
    10 -113.276502 -65.871205            66             24
    11 -138.539111 -36.767633            41             53
    12 -169.591685 -56.994853            10             33
    13    3.381588 -15.315441           183             74
    14  138.534118  58.519542           318            148
    15 -161.543564  56.212077            18            146
    16 -142.353835  46.069997            37            136
    17   72.519566  30.986494           252            120
    18   83.485540 -38.041282           263             51
    19   68.728968 -18.316078           248             71
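
To see what result_type='expand' contributes, the sketch below compares the call with and without it. It uses a small two-row DataFrame built by hand (values taken from the sample data above) so that it runs on its own. The names `as_tuples` and `expanded` are introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'longitude': [167.067264, -124.513305],
                   'latitude': [-39.476855, -60.536511]})

def spatial_aggregation(x):
    resolution = 1.0
    latitude_idx = int((x['latitude'] + 90.0) / resolution)
    longitude_idx = int((x['longitude'] + 180.0) / resolution)
    return longitude_idx, latitude_idx

# Without result_type, each row's tuple is kept whole: a Series of tuples.
as_tuples = df.apply(spatial_aggregation, axis=1)

# With result_type='expand', each tuple element becomes its own column.
expanded = df.apply(spatial_aggregation, axis=1, result_type='expand')

print(as_tuples.iloc[0])   # (347, 50)
print(expanded.shape)      # (2, 2)
```

The expanded result is a DataFrame whose columns are labelled 0 and 1, which is why it can be assigned directly to a list of two new column names.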

When assigning the result of apply() to new columns, make sure the number of values the function returns matches the number of new column names on the left-hand side of the assignment. If the counts differ, pandas raises a ValueError.
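
One way to run into this ValueError is to list more target columns than the function returns values. The following is a minimal sketch of that case, assuming a recent pandas version. The column names 'a', 'b', 'c' and the `error_message` variable are hypothetical and used only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'longitude': [167.067264, -124.513305],
                   'latitude': [-39.476855, -60.536511]})

def spatial_aggregation(x):
    resolution = 1.0
    latitude_idx = int((x['latitude'] + 90.0) / resolution)
    longitude_idx = int((x['longitude'] + 180.0) / resolution)
    return longitude_idx, latitude_idx

error_message = None
try:
    # Three target columns, but the function returns only two values per row.
    df[['a', 'b', 'c']] = df.apply(spatial_aggregation, axis=1, result_type='expand')
except ValueError as err:
    error_message = str(err)

print(error_message)
```

Catching the exception here only serves to show the message. In real code, the simpler fix is to make the list of column names match the function's return values.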
