An example of how to get the max value of two or more columns in a pandas dataframe ?
Create a simple dataframe with pandas
Let's create a simple dataframe
import pandas as pd
import numpy as np
import random
a = [random.choice(['a','b','c']) for i in range(20)]
c1 = np.random.uniform(0,1, size=20)
c2 = np.random.uniform(0,1, size=20)
c3 = np.random.uniform(0,1, size=20)
data = {'a':a,
'c1':c1,
'c2':c2,
'c3':c3}
df = pd.DataFrame(data)
print(df)
returns for example
a c1 c2 c3
0 b 0.480835 0.108703 0.062893
1 b 0.469707 0.441162 0.584776
2 a 0.942992 0.254551 0.750689
3 c 0.634768 0.048194 0.810934
4 b 0.185650 0.551316 0.126457
5 c 0.661850 0.178760 0.321801
6 c 0.739558 0.878052 0.196450
7 a 0.344361 0.267702 0.401820
8 a 0.430181 0.300013 0.674959
9 b 0.412139 0.295223 0.799033
10 a 0.101919 0.810593 0.665577
11 c 0.036506 0.705999 0.183860
12 a 0.832732 0.288746 0.994192
13 b 0.474459 0.574548 0.539348
14 b 0.322733 0.015939 0.142288
15 c 0.108360 0.447548 0.474686
16 c 0.111654 0.048010 0.651970
17 c 0.360656 0.155344 0.854249
18 a 0.404405 0.034052 0.618728
19 b 0.234892 0.052290 0.889191
Find the max value for each row
To get the max value between the columns ['c1','c2','c3'] a solution is to use pandas.DataFrame.max:
df[['c1','c2','c3']].max(axis=1)
returns
0 0.480835
1 0.584776
2 0.942992
3 0.810934
4 0.551316
5 0.661850
6 0.878052
7 0.401820
8 0.674959
9 0.799033
10 0.810593
11 0.705999
12 0.994192
13 0.574548
14 0.322733
15 0.474686
16 0.651970
17 0.854249
18 0.618728
19 0.889191
dtype: float64
To append a new column to the dataframe with the max value:
df['max value'] = df[['c1','c2','c3']].max(axis=1)
returns
a c1 c2 c3 max value
0 b 0.480835 0.108703 0.062893 0.480835
1 b 0.469707 0.441162 0.584776 0.584776
2 a 0.942992 0.254551 0.750689 0.942992
3 c 0.634768 0.048194 0.810934 0.810934
4 b 0.185650 0.551316 0.126457 0.551316
5 c 0.661850 0.178760 0.321801 0.661850
6 c 0.739558 0.878052 0.196450 0.878052
7 a 0.344361 0.267702 0.401820 0.401820
8 a 0.430181 0.300013 0.674959 0.674959
9 b 0.412139 0.295223 0.799033 0.799033
10 a 0.101919 0.810593 0.665577 0.810593
11 c 0.036506 0.705999 0.183860 0.705999
12 a 0.832732 0.288746 0.994192 0.994192
13 b 0.474459 0.574548 0.539348 0.574548
14 b 0.322733 0.015939 0.142288 0.322733
15 c 0.108360 0.447548 0.474686 0.474686
16 c 0.111654 0.048010 0.651970 0.651970
17 c 0.360656 0.155344 0.854249 0.854249
18 a 0.404405 0.034052 0.618728 0.618728
19 b 0.234892 0.052290 0.889191 0.889191
Find the name of the column that contains the max value
To get the name of the column that contains the max value, a solution is to use pandas.DataFrame.idxmax
df[['c1','c2','c3']].idxmax(axis=1)
returns
0 c1
1 c3
2 c1
3 c3
4 c2
5 c1
6 c2
7 c3
8 c3
9 c3
10 c2
11 c2
12 c3
13 c2
14 c1
15 c3
16 c3
17 c3
18 c3
19 c3
dtype: object
Create a new column to store the results:
df['col name'] = df[['c1','c2','c3']].idxmax(axis=1)
returns
a c1 c2 c3 max value col name
0 b 0.480835 0.108703 0.062893 0.480835 c1
1 b 0.469707 0.441162 0.584776 0.584776 c3
2 a 0.942992 0.254551 0.750689 0.942992 c1
3 c 0.634768 0.048194 0.810934 0.810934 c3
4 b 0.185650 0.551316 0.126457 0.551316 c2
5 c 0.661850 0.178760 0.321801 0.661850 c1
6 c 0.739558 0.878052 0.196450 0.878052 c2
7 a 0.344361 0.267702 0.401820 0.401820 c3
8 a 0.430181 0.300013 0.674959 0.674959 c3
9 b 0.412139 0.295223 0.799033 0.799033 c3
10 a 0.101919 0.810593 0.665577 0.810593 c2
11 c 0.036506 0.705999 0.183860 0.705999 c2
12 a 0.832732 0.288746 0.994192 0.994192 c3
13 b 0.474459 0.574548 0.539348 0.574548 c2
14 b 0.322733 0.015939 0.142288 0.322733 c1
15 c 0.108360 0.447548 0.474686 0.474686 c3
16 c 0.111654 0.048010 0.651970 0.651970 c3
17 c 0.360656 0.155344 0.854249 0.854249 c3
18 a 0.404405 0.034052 0.618728 0.618728 c3
19 b 0.234892 0.052290 0.889191 0.889191 c3