To apply a function to a column of a pandas DataFrame, use the apply() method, which takes the function to apply as its first argument.
Examples
Synthetic data
To start, let's generate a DataFrame using synthetic data:
import pandas as pd
import numpy as np
data = np.arange(1,13)
data = data.reshape(3,4)
df = pd.DataFrame(data=data,columns=['a','b','c','d'])
The code displayed above will generate:
a b c d
0 1 2 3 4
1 5 6 7 8
2 9 10 11 12
Modifying data contained in a DataFrame's columns
Basic operations
Let's explore modifying elements of column b as an example:
>>> df['b']
0 2
1 6
2 10
To add 10 to every element of column b, use ordinary arithmetic on the column:
>>> df['b'] = df['b'] + 10
>>> df
a b c d
0 1 12 3 4
1 5 16 7 8
2 9 20 11 12
Multiplying every element of column b by two works the same way:
>>> df['b'] = df['b']*2
>>> df
a b c d
0 1 24 3 4
1 5 32 7 8
2 9 40 11 12
Using apply()
To apply a more complicated function, such as a square root, one solution is the pandas method apply():
>>> df['b'].apply(np.sqrt)
0 4.898979
1 5.656854
2 6.324555
Name: b, dtype: float64
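For NumPy universal functions such as np.sqrt, calling the function directly on the column gives the same values as going through apply(). The sketch below is a minimal illustration, assuming column b holds 24, 32 and 40 as it does at this point in the example; the variable names via_apply and direct are only for illustration.

```python
import numpy as np
import pandas as pd

# Column b as it stands after the two modifications above.
df = pd.DataFrame({'b': [24, 32, 40]})

via_apply = df['b'].apply(np.sqrt)  # goes through Series.apply
direct = np.sqrt(df['b'])           # NumPy ufunc called on the whole Series

# Compare the two results.
print(via_apply.equals(direct))
```

Calling the ufunc directly is usually the simpler choice; apply() becomes useful when the function is not already vectorized.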
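DataFrame.apply() can also target a single column: the function receives each column as a Series, so it can check the column name and leave the other columns unchanged: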
df.apply(lambda x: np.sqrt(x) if x.name == 'b' else x)
a b c d
0 1 4.898979 3 4
1 5 5.656854 7 8
2 9 6.324555 11 12
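To see what the lambda receives, the sketch below returns the name of each column that DataFrame.apply() passes in, then builds the same transformed frame with assign() as one possible alternative. It assumes a reduced two-column version of the example data; names and result are illustrative variable names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 5, 9], 'b': [24, 32, 40]})

# With the default axis=0, the function receives one column at a time
# as a Series; its .name attribute is the column label.
names = df.apply(lambda col: col.name)
print(names)

# Alternative without a conditional lambda: recompute only column b
# and return a new DataFrame, leaving df itself unchanged.
result = df.assign(b=np.sqrt(df['b']))
print(result)
```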
Using apply() with a custom function
You can also define your own function and pass it to apply():
def myfunc(x):
    return x**2 + 2*x + 3
df['b'].apply(myfunc)
The code displayed above will then generate:
0 627
1 1091
2 1683
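If the custom function needs extra parameters, Series.apply() can forward them through its args argument. The sketch below assumes a hypothetical helper named quadratic that generalizes myfunc; with coefficients 1, 2 and 3 it reproduces the values shown above.

```python
import pandas as pd

def quadratic(x, a, b, c):
    # Hypothetical helper: evaluates a*x**2 + b*x + c for one element.
    return a * x**2 + b * x + c

# Column b as it stands at this point in the example.
s = pd.Series([24, 32, 40], name='b')

# Extra positional arguments are forwarded through args=(...).
out = s.apply(quadratic, args=(1, 2, 3))
print(out.tolist())  # [627, 1091, 1683]
```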
Apply several functions to an individual column
In addition to applying a single function to a column, you can apply several functions at once by passing a list of functions as the first argument to apply(). For example, to apply both abs() and np.sqrt() to column b:
df['b'].apply([abs, np.sqrt])
On the original DataFrame (before column b was modified above), this returns:
abs sqrt
0 2 1.414214
1 6 2.449490
2 10 3.162278
The apply() method can also be used on several columns at once. Select the columns with a list of labels, then call apply() on the result. For example, to apply both abs() and np.sqrt() to columns a and b:
df[['a','b']].apply([abs, np.sqrt])
On the original DataFrame (before column b was modified above), this returns:
a b
abs sqrt abs sqrt
0 1 1.000000 2 1.414214
1 5 2.236068 6 2.449490
2 9 3.000000 10 3.162278
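The same column selection works with a single function. The following sketch rebuilds the original synthetic DataFrame and applies np.sqrt to columns a and b only; roots is an illustrative variable name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 13).reshape(3, 4), columns=['a', 'b', 'c', 'd'])

# Select the columns first, then call apply() on the resulting sub-DataFrame.
roots = df[['a', 'b']].apply(np.sqrt)
print(roots)
```

The result contains only the selected columns; columns c and d of df are not touched.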
Using apply() with a custom function on multiple columns
To start, let's generate a DataFrame using synthetic data:
import pandas as pd
import numpy as np
import random
random.seed(42)  # seed the random module; np.random.seed() would not affect random.uniform()
d = {'longitude': [random.uniform(0, 1) * 360 - 180 for _ in range(20)],
     'latitude': [random.uniform(0, 1) * 180 - 90 for _ in range(20)]}
df = pd.DataFrame(data=d)
The code above generates a DataFrame such as the following (the values shown are from one example run and may differ from yours):
longitude latitude
0 167.067264 -39.476855
1 28.979819 1.683163
2 -124.513305 -60.536511
3 114.754113 55.249186
4 12.346310 45.642464
5 30.275235 -37.494582
6 135.649712 53.335008
7 175.366444 -30.276944
8 171.082997 -63.158958
9 63.046690 63.781048
10 -113.276502 -65.871205
11 -138.539111 -36.767633
12 -169.591685 -56.994853
13 3.381588 -15.315441
14 138.534118 58.519542
15 -161.543564 56.212077
16 -142.353835 46.069997
17 72.519566 30.986494
18 83.485540 -38.041282
19 68.728968 -18.316078
To apply a function across multiple columns, write a custom function that accepts a row (a Series holding that row's column values) and returns several values as a tuple:
def spatial_aggregation(x):
    resolution = 1.0  # grid cell size in degrees
    # Shift each coordinate to a non-negative range, then truncate to a cell index.
    latitude_idx = int((x['latitude'] + 90.0) / resolution)
    longitude_idx = int((x['longitude'] + 180.0) / resolution)
    return longitude_idx, latitude_idx
DataFrame.apply() can apply such a function to every row and create several new columns from the returned values.
It takes the function as its first argument and returns a pandas object holding the results. The optional axis argument (default 0) controls the direction: axis=0 passes each column to the function, while axis=1 passes each row. With result_type='expand', the returned tuple is expanded into separate columns. Example:
df[['longitude_agg', 'latitude_agg']] = df.apply(spatial_aggregation, axis=1, result_type='expand')
The code above generates, for example:
longitude latitude longitude_agg latitude_agg
0 167.067264 -39.476855 347 50
1 28.979819 1.683163 208 91
2 -124.513305 -60.536511 55 29
3 114.754113 55.249186 294 145
4 12.346310 45.642464 192 135
5 30.275235 -37.494582 210 52
6 135.649712 53.335008 315 143
7 175.366444 -30.276944 355 59
8 171.082997 -63.158958 351 26
9 63.046690 63.781048 243 153
10 -113.276502 -65.871205 66 24
11 -138.539111 -36.767633 41 53
12 -169.591685 -56.994853 10 33
13 3.381588 -15.315441 183 74
14 138.534118 58.519542 318 148
15 -161.543564 56.212077 18 146
16 -142.353835 46.069997 37 136
17 72.519566 30.986494 252 120
18 83.485540 -38.041282 263 51
19 68.728968 -18.316078 248 71
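Because the index computation is plain arithmetic, it can also be written directly on whole columns without a row-wise apply(). The sketch below is one possible equivalent, using three rows copied from the table above; the column names longitude_idx and latitude_idx are hypothetical and chosen only for this illustration.

```python
import pandas as pd

# Three rows copied from the table above.
df = pd.DataFrame({
    'longitude': [167.067264, 28.979819, -124.513305],
    'latitude': [-39.476855, 1.683163, -60.536511],
})

resolution = 1.0
# astype(int) truncates toward zero, like int() in spatial_aggregation;
# the shifted values are non-negative here, so this is also a floor.
df['longitude_idx'] = ((df['longitude'] + 180.0) / resolution).astype(int)
df['latitude_idx'] = ((df['latitude'] + 90.0) / resolution).astype(int)
print(df)
```

For these rows the longitude indices are 347, 208 and 55, and the latitude indices are 50, 91 and 29, which match the values in the table.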
When assigning the result of apply() back to the DataFrame, make sure it has one value per row. Assigning a plain list or array of a different length raises a ValueError; with result_type='expand', the number of returned values must also match the number of target columns.
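The length mismatch can be reproduced with a small sketch. It assumes a one-column DataFrame and deliberately truncates the result of apply() before assigning it; too_short and error_raised are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'b': [2, 6, 10]})

# Deliberately drop one value so the result is shorter than the DataFrame.
too_short = df['b'].apply(np.sqrt).to_numpy()[:2]

try:
    df['b_sqrt'] = too_short
    error_raised = False
except ValueError as exc:
    error_raised = True
    print(f"ValueError: {exc}")
```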