Pandas offers several ways to find the most frequent value in a DataFrame column. The simplest is the mode() method, which returns the most frequent value or, when several values tie, all of them.
Let's first generate a pandas DataFrame:
import pandas as pd
import numpy as np
np.random.seed(42)
# This first draw is discarded, but it advances the random state, so removing it changes the values below.
data = np.random.randint(0,5,10)
data = { 'Age':np.random.randint(0,5,10)+20,
'Gender':[np.random.choice(['Male', 'Female', 'Unknown']) for i in range(10)]
}
df = pd.DataFrame(data,columns=['Age','Gender'])
print(df)
Output
Age Gender
0 23 Female
1 22 Female
2 24 Male
3 21 Male
4 23 Male
5 21 Unknown
6 23 Unknown
7 24 Unknown
8 20 Female
9 23 Unknown
Using pandas method mode()
To find the most frequent value in the "Age" column:
df['Age'].mode()
Output
0 23
Name: Age, dtype: int64
mode() returns a Series, so to get the value itself, take its first element:
df['Age'].mode()[0]
As another example, consider the 'Gender' column, which holds categorical values:
df['Gender'].mode()
Output
0 Unknown
Name: Gender, dtype: object
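Since mode() returns a Series, it can hold more than one value when several values are equally frequent. The sketch below illustrates this with a small made-up Series (the values are hypothetical and not taken from the DataFrame above):

```python
import pandas as pd

# Hypothetical data: 21 and 23 both appear twice, so they tie.
ages = pd.Series([21, 23, 21, 23, 20], name="Age")

modes = ages.mode()           # Series holding every tied value, sorted ascending
print(modes.tolist())         # [21, 23]

first_mode = modes.iloc[0]    # first entry, i.e. the smallest of the tied values
print(first_mode)             # 21
```

Taking only the first element silently drops the other tied values, so it may be worth checking len(modes) when ties matter.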
Using pandas value_counts()
Another useful approach is the value_counts() method. It returns every distinct value in a column together with the number of times it appears, sorted from most to least frequent by default. You can then read the most frequent value, and its count, from the top of the result.
df['Age'].value_counts()
returns
23 4
24 2
21 2
22 1
20 1
Name: Age, dtype: int64
Because the result is sorted by count in descending order, the most frequent value is the first index label:
df['Age'].value_counts().index[0]
gives
23
To get the number of occurrences of 23:
df['Age'].value_counts().values[0]
gives
4
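As an alternative to positional access, the same information can be read with idxmax() and max(), which do not rely on the sort order. This is a minimal sketch that rebuilds the 'Age' values shown above as a standalone Series; the variable names are illustrative only:

```python
import pandas as pd

# Age values copied from the DataFrame printed earlier.
ages = pd.Series([23, 22, 24, 21, 23, 21, 23, 24, 20, 23], name="Age")

counts = ages.value_counts()       # one row per distinct age, highest count first
most_frequent = counts.idxmax()    # index label with the highest count
n_occurrences = counts.max()       # the highest count itself
print(most_frequent, n_occurrences)   # 23 4

# normalize=True returns relative frequencies instead of raw counts.
shares = ages.value_counts(normalize=True)
print(shares.loc[23])              # 0.4
```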
Using pandas groupby()
Finally, you can group the rows by a column with groupby(), count the rows in each group, and then pick the largest group. This approach is useful when you want the counts as a DataFrame that can be sorted or combined with other aggregations. Note that agg('count') counts the non-null entries of each remaining column within a group, whereas size() counts the rows of each group.
Example:
df.groupby(by='Gender').agg('count')
returns
Age
Gender
Female 3
Male 3
Unknown 4
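Sort the groups by count in descending order (after the aggregation, the 'Age' column holds the number of rows per gender):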
dfg = df.groupby(by='Gender').agg('count')
dfg.sort_values(by='Age',ascending=False)
Extract the first row, which corresponds to the most frequent gender:
dfg.sort_values(by='Age',ascending=False).reset_index().iloc[0]
returns
Gender Unknown
Age 4
Name: 0, dtype: object
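A more compact variant of the same idea uses size(), which returns one count per group as a Series, followed by idxmax(). The sketch below hard-codes the DataFrame printed earlier so that it runs on its own. It also shows DataFrame.mode(), which is one possible way to get the most frequent value of every column at once:

```python
import pandas as pd

# Same rows as the DataFrame printed earlier, written out explicitly.
df = pd.DataFrame({
    "Age": [23, 22, 24, 21, 23, 21, 23, 24, 20, 23],
    "Gender": ["Female", "Female", "Male", "Male", "Male",
               "Unknown", "Unknown", "Unknown", "Female", "Unknown"],
})

sizes = df.groupby("Gender").size()   # Series: number of rows per gender
top_gender = sizes.idxmax()           # group label with the most rows
top_count = sizes.max()               # number of rows in that group
print(top_gender, top_count)          # Unknown 4

# DataFrame.mode() computes the mode of each column separately.
modes = df.mode()
print(modes.loc[0, "Age"], modes.loc[0, "Gender"])   # 23 Unknown
```

If two groups had the same size, idxmax() would return only the first of them, so a tie would need to be handled separately.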
References
Links | Site |
---|---|
mode() | pandas.pydata.org |
value_counts() | pandas.pydata.org |
groupby() | pandas.pydata.org |