An example of how to get the max value of two or more columns in a pandas dataframe ?
Create a simple dataframe with pandas
Let's create a simple dataframe
import pandas as pdimport numpy as npimport randoma = [random.choice(['a','b','c']) for i in range(20)]c1 = np.random.uniform(0,1, size=20)c2 = np.random.uniform(0,1, size=20)c3 = np.random.uniform(0,1, size=20)data = {'a':a,'c1':c1,'c2':c2,'c3':c3}df = pd.DataFrame(data)print(df)
returns for example
a c1 c2 c30 b 0.480835 0.108703 0.0628931 b 0.469707 0.441162 0.5847762 a 0.942992 0.254551 0.7506893 c 0.634768 0.048194 0.8109344 b 0.185650 0.551316 0.1264575 c 0.661850 0.178760 0.3218016 c 0.739558 0.878052 0.1964507 a 0.344361 0.267702 0.4018208 a 0.430181 0.300013 0.6749599 b 0.412139 0.295223 0.79903310 a 0.101919 0.810593 0.66557711 c 0.036506 0.705999 0.18386012 a 0.832732 0.288746 0.99419213 b 0.474459 0.574548 0.53934814 b 0.322733 0.015939 0.14228815 c 0.108360 0.447548 0.47468616 c 0.111654 0.048010 0.65197017 c 0.360656 0.155344 0.85424918 a 0.404405 0.034052 0.61872819 b 0.234892 0.052290 0.889191
Find the max value for each row
To get the max value between the columns ['c1','c2','c3'] a solution is to use pandas.DataFrame.max:
df[['c1','c2','c3']].max(axis=1)
returns
0 0.4808351 0.5847762 0.9429923 0.8109344 0.5513165 0.6618506 0.8780527 0.4018208 0.6749599 0.79903310 0.81059311 0.70599912 0.99419213 0.57454814 0.32273315 0.47468616 0.65197017 0.85424918 0.61872819 0.889191dtype: float64
To append a new column to the dataframe with the max value:
df['max value'] = df[['c1','c2','c3']].max(axis=1)
returns
a c1 c2 c3 max value0 b 0.480835 0.108703 0.062893 0.4808351 b 0.469707 0.441162 0.584776 0.5847762 a 0.942992 0.254551 0.750689 0.9429923 c 0.634768 0.048194 0.810934 0.8109344 b 0.185650 0.551316 0.126457 0.5513165 c 0.661850 0.178760 0.321801 0.6618506 c 0.739558 0.878052 0.196450 0.8780527 a 0.344361 0.267702 0.401820 0.4018208 a 0.430181 0.300013 0.674959 0.6749599 b 0.412139 0.295223 0.799033 0.79903310 a 0.101919 0.810593 0.665577 0.81059311 c 0.036506 0.705999 0.183860 0.70599912 a 0.832732 0.288746 0.994192 0.99419213 b 0.474459 0.574548 0.539348 0.57454814 b 0.322733 0.015939 0.142288 0.32273315 c 0.108360 0.447548 0.474686 0.47468616 c 0.111654 0.048010 0.651970 0.65197017 c 0.360656 0.155344 0.854249 0.85424918 a 0.404405 0.034052 0.618728 0.61872819 b 0.234892 0.052290 0.889191 0.889191
Find the name of the column that contains the max value
To get the name of the column that contains the max value, a solution is to use pandas.DataFrame.idxmax
df[['c1','c2','c3']].idxmax(axis=1)
returns
0 c11 c32 c13 c34 c25 c16 c27 c38 c39 c310 c211 c212 c313 c214 c115 c316 c317 c318 c319 c3dtype: object
Create a new column to store the results:
df['col name'] = df[['c1','c2','c3']].idxmax(axis=1)
returns
a c1 c2 c3 max value col name0 b 0.480835 0.108703 0.062893 0.480835 c11 b 0.469707 0.441162 0.584776 0.584776 c32 a 0.942992 0.254551 0.750689 0.942992 c13 c 0.634768 0.048194 0.810934 0.810934 c34 b 0.185650 0.551316 0.126457 0.551316 c25 c 0.661850 0.178760 0.321801 0.661850 c16 c 0.739558 0.878052 0.196450 0.878052 c27 a 0.344361 0.267702 0.401820 0.401820 c38 a 0.430181 0.300013 0.674959 0.674959 c39 b 0.412139 0.295223 0.799033 0.799033 c310 a 0.101919 0.810593 0.665577 0.810593 c211 c 0.036506 0.705999 0.183860 0.705999 c212 a 0.832732 0.288746 0.994192 0.994192 c313 b 0.474459 0.574548 0.539348 0.574548 c214 b 0.322733 0.015939 0.142288 0.322733 c115 c 0.108360 0.447548 0.474686 0.474686 c316 c 0.111654 0.048010 0.651970 0.651970 c317 c 0.360656 0.155344 0.854249 0.854249 c318 a 0.404405 0.034052 0.618728 0.618728 c319 b 0.234892 0.052290 0.889191 0.889191 c3
