Examples of how to edit a pandas dataframe column values where a condition is verified in python:
1 -- Create a simple dataframe with pandas
Lets' start by creating a simple dataframe with 5 columns and 20 rows:
>>> import pandas as pd>>> import numpy as np>>> data = np.arange(1,101)>>> data = data.reshape(20,5)>>> df = pd.DataFrame(data=data,columns=['a','b','c','d','e'])
returns
>>> dfa b c d e0 1 2 3 4 51 6 7 8 9 102 11 12 13 14 153 16 17 18 19 204 21 22 23 24 255 26 27 28 29 306 31 32 33 34 357 36 37 38 39 408 41 42 43 44 459 46 47 48 49 5010 51 52 53 54 5511 56 57 58 59 6012 61 62 63 64 6513 66 67 68 69 7014 71 72 73 74 7515 76 77 78 79 8016 81 82 83 84 8517 86 87 88 89 9018 91 92 93 94 9519 96 97 98 99 100
2 -- Select a column
To select a given column, a solution is to do:
>>> df['b']0 21 72 123 174 225 276 327 378 429 4710 5211 5712 6213 6714 7215 7716 8217 8718 9219 97Name: b, dtype: int64
or
>>> df.loc[:,'b']0 21 72 123 174 225 276 327 378 429 4710 5211 5712 6213 6714 7215 7716 8217 8718 9219 97Name: b, dtype: int64
Note: if the column does not have a name, there is the function iloc():
>>> df.iloc[:,1]0 21 72 123 174 225 276 327 378 429 4710 5211 5712 6213 6714 7215 7716 8217 8718 9219 97
To edit (for example here by multiplying by 10) all elements of the column b, a solution. is to do:
>>> df['b'] = df['b'] * 10>>> dfa b c d e0 1 20 3 4 51 6 70 8 9 102 11 120 13 14 153 16 170 18 19 204 21 220 23 24 255 26 270 28 29 306 31 320 33 34 357 36 370 38 39 408 41 420 43 44 459 46 470 48 49 5010 51 520 53 54 5511 56 570 58 59 6012 61 620 63 64 6513 66 670 68 69 7014 71 720 73 74 7515 76 770 78 79 8016 81 820 83 84 8517 86 870 88 89 9018 91 920 93 94 9519 96 970 98 99 100
3 -- Select only elements of the column where a condition is verified
For example to edit only the values that are greater than 500:
>>> df['b'][ df['b'] > 500 ] = df['b'][ df['b'] > 500 ] / 10>>> dfa b c d e0 1 20 3 4 51 6 70 8 9 102 11 120 13 14 153 16 170 18 19 204 21 220 23 24 255 26 270 28 29 306 31 320 33 34 357 36 370 38 39 408 41 420 43 44 459 46 470 48 49 5010 51 52 53 54 5511 56 57 58 59 6012 61 62 63 64 6513 66 67 68 69 7014 71 72 73 74 7515 76 77 78 79 8016 81 82 83 84 8517 86 87 88 89 9018 91 92 93 94 9519 96 97 98 99 100
Note: if the name of the column is not specify all columns will be modified, illustration:
>>> df[ df['b'] > 500 ] = df[ df['b'] > 500 ] / 10>>> dfa b c d e0 1.0 20.0 3.0 4.0 5.01 6.0 70.0 8.0 9.0 10.02 11.0 120.0 13.0 14.0 15.03 16.0 170.0 18.0 19.0 20.04 21.0 220.0 23.0 24.0 25.05 26.0 270.0 28.0 29.0 30.06 31.0 320.0 33.0 34.0 35.07 36.0 370.0 38.0 39.0 40.08 41.0 420.0 43.0 44.0 45.09 46.0 470.0 48.0 49.0 50.010 5.1 52.0 5.3 5.4 5.511 5.6 57.0 5.8 5.9 6.012 6.1 62.0 6.3 6.4 6.513 6.6 67.0 6.8 6.9 7.014 7.1 72.0 7.3 7.4 7.515 7.6 77.0 7.8 7.9 8.016 8.1 82.0 8.3 8.4 8.517 8.6 87.0 8.8 8.9 9.018 9.1 92.0 9.3 9.4 9.519 9.6 97.0 9.8 9.9 10.0
4 -- Select only elements of the column where multiple conditions are verified
Example lets select the elements greater than 12 and lower than 40 in the column b:
>>> df['b'][ (df['b'] > 12) & (df['b'] < 40) ]3 174 225 276 327 37
Edit those elements by adding 5
>>> df['b'][ (df['b'] > 12) & (df['b'] < 40) ] = df['b'][ (df['b'] > 12) & (df['b'] < 40) ] + 5>>> dfa b c d e0 1 2 3 4 51 6 7 8 9 102 11 12 13 14 153 16 22 18 19 204 21 27 23 24 255 26 32 28 29 306 31 37 33 34 357 36 42 38 39 408 41 42 43 44 459 46 47 48 49 5010 51 52 53 54 5511 56 57 58 59 6012 61 62 63 64 6513 66 67 68 69 7014 71 72 73 74 7515 76 77 78 79 8016 81 82 83 84 8517 86 87 88 89 9018 91 92 93 94 9519 96 97 98 99 100
