Examples of how to edit a pandas dataframe column values where a condition is verified in python:
1 -- Create a simple dataframe with pandas
Lets' start by creating a simple dataframe with 5 columns and 20 rows:
>>> import pandas as pd
>>> import numpy as np
>>> data = np.arange(1,101)
>>> data = data.reshape(20,5)
>>> df = pd.DataFrame(data=data,columns=['a','b','c','d','e'])
returns
>>> df
a b c d e
0 1 2 3 4 5
1 6 7 8 9 10
2 11 12 13 14 15
3 16 17 18 19 20
4 21 22 23 24 25
5 26 27 28 29 30
6 31 32 33 34 35
7 36 37 38 39 40
8 41 42 43 44 45
9 46 47 48 49 50
10 51 52 53 54 55
11 56 57 58 59 60
12 61 62 63 64 65
13 66 67 68 69 70
14 71 72 73 74 75
15 76 77 78 79 80
16 81 82 83 84 85
17 86 87 88 89 90
18 91 92 93 94 95
19 96 97 98 99 100
2 -- Select a column
To select a given column, a solution is to do:
>>> df['b']
0 2
1 7
2 12
3 17
4 22
5 27
6 32
7 37
8 42
9 47
10 52
11 57
12 62
13 67
14 72
15 77
16 82
17 87
18 92
19 97
Name: b, dtype: int64
or
>>> df.loc[:,'b']
0 2
1 7
2 12
3 17
4 22
5 27
6 32
7 37
8 42
9 47
10 52
11 57
12 62
13 67
14 72
15 77
16 82
17 87
18 92
19 97
Name: b, dtype: int64
Note: if the column does not have a name, there is the function iloc():
>>> df.iloc[:,1]
0 2
1 7
2 12
3 17
4 22
5 27
6 32
7 37
8 42
9 47
10 52
11 57
12 62
13 67
14 72
15 77
16 82
17 87
18 92
19 97
To edit (for example here by multiplying by 10) all elements of the column b, a solution. is to do:
>>> df['b'] = df['b'] * 10
>>> df
a b c d e
0 1 20 3 4 5
1 6 70 8 9 10
2 11 120 13 14 15
3 16 170 18 19 20
4 21 220 23 24 25
5 26 270 28 29 30
6 31 320 33 34 35
7 36 370 38 39 40
8 41 420 43 44 45
9 46 470 48 49 50
10 51 520 53 54 55
11 56 570 58 59 60
12 61 620 63 64 65
13 66 670 68 69 70
14 71 720 73 74 75
15 76 770 78 79 80
16 81 820 83 84 85
17 86 870 88 89 90
18 91 920 93 94 95
19 96 970 98 99 100
3 -- Select only elements of the column where a condition is verified
For example to edit only the values that are greater than 500:
>>> df['b'][ df['b'] > 500 ] = df['b'][ df['b'] > 500 ] / 10
>>> df
a b c d e
0 1 20 3 4 5
1 6 70 8 9 10
2 11 120 13 14 15
3 16 170 18 19 20
4 21 220 23 24 25
5 26 270 28 29 30
6 31 320 33 34 35
7 36 370 38 39 40
8 41 420 43 44 45
9 46 470 48 49 50
10 51 52 53 54 55
11 56 57 58 59 60
12 61 62 63 64 65
13 66 67 68 69 70
14 71 72 73 74 75
15 76 77 78 79 80
16 81 82 83 84 85
17 86 87 88 89 90
18 91 92 93 94 95
19 96 97 98 99 100
Note: if the name of the column is not specify all columns will be modified, illustration:
>>> df[ df['b'] > 500 ] = df[ df['b'] > 500 ] / 10
>>> df
a b c d e
0 1.0 20.0 3.0 4.0 5.0
1 6.0 70.0 8.0 9.0 10.0
2 11.0 120.0 13.0 14.0 15.0
3 16.0 170.0 18.0 19.0 20.0
4 21.0 220.0 23.0 24.0 25.0
5 26.0 270.0 28.0 29.0 30.0
6 31.0 320.0 33.0 34.0 35.0
7 36.0 370.0 38.0 39.0 40.0
8 41.0 420.0 43.0 44.0 45.0
9 46.0 470.0 48.0 49.0 50.0
10 5.1 52.0 5.3 5.4 5.5
11 5.6 57.0 5.8 5.9 6.0
12 6.1 62.0 6.3 6.4 6.5
13 6.6 67.0 6.8 6.9 7.0
14 7.1 72.0 7.3 7.4 7.5
15 7.6 77.0 7.8 7.9 8.0
16 8.1 82.0 8.3 8.4 8.5
17 8.6 87.0 8.8 8.9 9.0
18 9.1 92.0 9.3 9.4 9.5
19 9.6 97.0 9.8 9.9 10.0
4 -- Select only elements of the column where multiple conditions are verified
Example lets select the elements greater than 12 and lower than 40 in the column b:
>>> df['b'][ (df['b'] > 12) & (df['b'] < 40) ]
3 17
4 22
5 27
6 32
7 37
Edit those elements by adding 5
>>> df['b'][ (df['b'] > 12) & (df['b'] < 40) ] = df['b'][ (df['b'] > 12) & (df['b'] < 40) ] + 5
>>> df
a b c d e
0 1 2 3 4 5
1 6 7 8 9 10
2 11 12 13 14 15
3 16 22 18 19 20
4 21 27 23 24 25
5 26 32 28 29 30
6 31 37 33 34 35
7 36 42 38 39 40
8 41 42 43 44 45
9 46 47 48 49 50
10 51 52 53 54 55
11 56 57 58 59 60
12 61 62 63 64 65
13 66 67 68 69 70
14 71 72 73 74 75
15 76 77 78 79 80
16 81 82 83 84 85
17 86 87 88 89 90
18 91 92 93 94 95
19 96 97 98 99 100