How to find the most frequent value in a pandas DataFrame column?


Pandas offers several ways to find the most frequent value in a DataFrame column. The simplest is the mode() method, which returns a Series containing the most frequent value (or every tied value, when several values share the highest count).

Let's first generate a pandas DataFrame:

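import pandas as pd
import numpy as np

np.random.seed(42)  # fix the seed so the example is reproducible

# This first draw is overwritten below. It is kept because it advances the
# random state; removing it would change the values generated afterwards.
data = np.random.randint(0, 5, 10)

data = {
    'Age': np.random.randint(0, 5, 10) + 20,  # ages between 20 and 24
    'Gender': [np.random.choice(['Male', 'Female', 'Unknown']) for _ in range(10)],
}

df = pd.DataFrame(data, columns=['Age', 'Gender'])

print(df)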

Output

   Age   Gender
0   23   Female
1   22   Female
2   24     Male
3   21     Male
4   23     Male
5   21  Unknown
6   23  Unknown
7   24  Unknown
8   20   Female
9   23  Unknown

Using pandas mode()

To find the most frequent value in the "Age" column:

df['Age'].mode()

Output

0    23
Name: Age, dtype: int64

To get the value itself rather than a Series:

df['Age'].mode()[0]

As another example, consider the 'Gender' column, which contains categorical values:

df['Gender'].mode()

Output

0    Unknown
Name: Gender, dtype: object
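Because mode() returns a Series, it can hold more than one value when there is a tie. The following is a minimal sketch of that behaviour; the small `ages` Series is a hypothetical example and is not part of the DataFrame above.

```python
import pandas as pd

# Hypothetical column in which 21 and 23 both appear twice.
ages = pd.Series([21, 23, 21, 23, 20], name="Age")

modes = ages.mode()          # Series holding every tied value, sorted ascending
print(modes.tolist())        # [21, 23]

first_mode = modes.iloc[0]   # picks the smallest of the tied values

# mode() on an empty Series returns an empty Series, so guard before indexing.
empty_modes = pd.Series([], dtype="int64").mode()
safe_mode = empty_modes.iloc[0] if not empty_modes.empty else None
```

Taking element 0 of the result silently picks one of the tied values, so it may be worth checking len(modes) first when ties matter.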

Using pandas value_counts()

Another useful approach is the value_counts() method. It returns every distinct value in a column together with the number of times it appears, sorted by default from the most frequent to the least frequent. The most frequent value is therefore the first entry of the result.

df['Age'].value_counts()

returns

    23    4
    24    2
    21    2
    22    1
    20    1
    Name: Age, dtype: int64

Since the counts are sorted in descending order, the most frequent value is the first label of the index:

df['Age'].value_counts().index[0]

gives

23

To get the number of occurrences of 23:

df['Age'].value_counts().values[0]

gives

4
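As an alternative to positional indexing, the same two results can be read with idxmax() and max(), which do not rely on the sort order of value_counts(). This is a sketch that rebuilds the 'Age' column from the values printed above; the variable names are illustrative.

```python
import pandas as pd

# 'Age' values copied from the DataFrame printed earlier.
ages = pd.Series([23, 22, 24, 21, 23, 21, 23, 24, 20, 23], name="Age")

counts = ages.value_counts()

most_frequent = counts.idxmax()   # index label with the highest count
occurrences = counts.max()        # the highest count itself

# normalize=True returns relative frequencies instead of raw counts.
share = ages.value_counts(normalize=True).max()

print(most_frequent, occurrences, share)
```

When several values tie for the highest count, idxmax() returns only the first of them.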

Using pandas groupby()

Finally, groupby() can count how many rows fall into each group, which is useful when you want the most frequent category alongside other per-group statistics. Calling agg('count') on the grouped DataFrame counts the non-null entries of every remaining column for each group, so the 'Age' column in the result below holds the number of rows per gender, not an age. The size() method gives the row count per group directly.

Example:

df.groupby(by='Gender').agg('count')

returns

             Age
    Gender      
    Female     3
    Male       3
    Unknown    4

Sort the groups by that count (stored in the 'Age' column) in descending order:

dfg = df.groupby(by='Gender').agg('count')

dfg.sort_values(by='Age',ascending=False)

Extract the first row:

dfg.sort_values(by='Age',ascending=False).reset_index().iloc[0]

returns

    Gender    Unknown
    Age             4
    Name: 0, dtype: object
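The same answer can be reached with size(), which counts rows per group without borrowing another column's name. The sketch below rebuilds the DataFrame from the values printed earlier so that it runs on its own; the last step, finding the most frequent 'Age' within each 'Gender', is an illustrative extension and not part of the steps above.

```python
import pandas as pd

# Values copied from the DataFrame printed earlier.
df = pd.DataFrame({
    "Age": [23, 22, 24, 21, 23, 21, 23, 24, 20, 23],
    "Gender": ["Female", "Female", "Male", "Male", "Male",
               "Unknown", "Unknown", "Unknown", "Female", "Unknown"],
})

group_sizes = df.groupby("Gender").size()   # number of rows per gender

top_gender = group_sizes.idxmax()   # most frequent gender
top_count = group_sizes.max()       # how many times it appears

# Most frequent Age inside each Gender group; on a tie, mode() is sorted
# ascending, so iloc[0] keeps the smallest tied value.
age_mode_by_gender = df.groupby("Gender")["Age"].agg(lambda s: s.mode().iloc[0])

print(top_gender, top_count)
print(age_mode_by_gender)
```

Note that count ignores missing values while size() counts every row, so the two can differ on columns that contain NaN.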
