Pandas offers several ways to find the most frequent value in a DataFrame column. The simplest is the mode() method, which returns the most frequent value (or values, in case of a tie).
Let's first generate a pandas DataFrame:
import pandas as pd
import numpy as np

np.random.seed(42)
# This first draw is never used, but it advances the random state;
# removing it would change the random values drawn below.
data = np.random.randint(0, 5, 10)
data = {
    'Age': np.random.randint(0, 5, 10) + 20,
    'Gender': [np.random.choice(['Male', 'Female', 'Unknown']) for i in range(10)],
}
df = pd.DataFrame(data, columns=['Age', 'Gender'])
print(df)
Output
   Age   Gender
0   23   Female
1   22   Female
2   24     Male
3   21     Male
4   23     Male
5   21  Unknown
6   23  Unknown
7   24  Unknown
8   20   Female
9   23  Unknown
Using pandas mode()
To find the most frequent value in the "Age" column:
df['Age'].mode()
Output
0    23
Name: Age, dtype: int64
To get the value itself rather than a Series:
df['Age'].mode()[0]
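Because several values can tie for the highest count, mode() always returns a Series (sorted in ascending order) rather than a scalar, and indexing with [0] keeps only the first of the tied values. The following short sketch uses made-up data to show this:

```python
import pandas as pd

# Hypothetical data in which 1 and 2 both appear twice
s = pd.Series([1, 1, 2, 2, 3])

modes = s.mode()        # Series holding every tied value, sorted
print(modes.tolist())   # [1, 2]
print(s.mode()[0])      # 1: [0] keeps only the first tied value
```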
As another example, consider the 'Gender' column, which contains categorical values:
df['Gender'].mode()
Output
0 Unknown
Name: Gender, dtype: object
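mode() also works on a whole DataFrame, where it computes the mode of each column. If one column has more tied values than another, the shorter columns are padded with NaN, so taking the first row with iloc[0] is a common way to get one value per column. Here is a minimal sketch on a small hypothetical frame:

```python
import pandas as pd

# Small hypothetical frame with no ties in either column
df_small = pd.DataFrame({
    'Age': [23, 22, 23, 21],
    'Gender': ['Male', 'Female', 'Female', 'Unknown'],
})

# One column of modes per original column; first row = first mode of each
first_modes = df_small.mode().iloc[0]
print(first_modes['Age'], first_modes['Gender'])  # 23 Female
```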
Using pandas value_counts()
Another useful approach is the value_counts() method. It returns each distinct value in the column together with the number of times it appears, sorted by count in descending order, so the most frequent value comes first.
df['Age'].value_counts()
returns
23    4
24    2
21    2
22    1
20    1
Name: Age, dtype: int64
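Passing normalize=True to value_counts() returns relative frequencies instead of raw counts. The sketch below reuses the Age values from the DataFrame printed above. They are typed in by hand so that the example runs on its own:

```python
import pandas as pd

# Age values copied from the printed DataFrame above
ages = pd.Series([23, 22, 24, 21, 23, 21, 23, 24, 20, 23], name='Age')

# Share of rows taken by each value instead of raw counts
shares = ages.value_counts(normalize=True)
print(shares.index[0], shares.iloc[0])  # 23 0.4, i.e. 4 of the 10 rows
```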
To access the most frequent value, take the first label of the index:
df['Age'].value_counts().index[0]
gives
23
To get the number of occurrences of 23:
df['Age'].value_counts().values[0]
gives
4
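Because value_counts() returns a Series indexed by the distinct values, idxmax() and max() give the same two answers without relying on the sort order. Here is a sketch with the same hand-copied Age values:

```python
import pandas as pd

ages = pd.Series([23, 22, 24, 21, 23, 21, 23, 24, 20, 23], name='Age')
counts = ages.value_counts()

most_frequent = counts.idxmax()    # label of the largest count
occurrences = counts.max()         # the largest count itself
print(most_frequent, occurrences)  # 23 4
```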
Using pandas groupby()
Finally, you can use groupby() to group rows that share the same value and then count the rows in each group, either with size() or, as in the example below, with agg('count'). The group with the largest count corresponds to the most frequent value. This approach is useful when you also want to aggregate other columns per group.
Example:
df.groupby(by='Gender').agg('count')
returns
         Age
Gender
Female     3
Male       3
Unknown    4
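size() works slightly differently from agg('count'). agg('count') counts the non-missing entries of each remaining column, while size() counts the rows in each group and returns a Series, so idxmax() gives the most frequent group label directly. The following sketch uses the Gender values from the DataFrame printed above:

```python
import pandas as pd

# Gender values copied from the printed DataFrame above
df_g = pd.DataFrame({'Gender': ['Female', 'Female', 'Male', 'Male', 'Male',
                                'Unknown', 'Unknown', 'Unknown', 'Female', 'Unknown']})

group_sizes = df_g.groupby('Gender').size()     # rows per Gender value
print(group_sizes.idxmax(), group_sizes.max())  # Unknown 4
```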
Sort the groups by the count stored in the Age column:
dfg = df.groupby(by='Gender').agg('count')
dfg.sort_values(by='Age', ascending=False)
Extract the first row:
dfg.sort_values(by='Age',ascending=False).reset_index().iloc[0]
returns
Gender    Unknown
Age             4
Name: 0, dtype: object
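As a quick consistency check, you can run the three approaches side by side on the same Gender values. When a single value is the most frequent, they should agree. Here is a sketch, again with the values typed in from the printed DataFrame:

```python
import pandas as pd

gender = pd.Series(['Female', 'Female', 'Male', 'Male', 'Male',
                    'Unknown', 'Unknown', 'Unknown', 'Female', 'Unknown'],
                   name='Gender')

via_mode = gender.mode()[0]
via_value_counts = gender.value_counts().index[0]
via_groupby = gender.groupby(gender).size().idxmax()  # group the Series by its own values

print(via_mode, via_value_counts, via_groupby)  # Unknown Unknown Unknown
```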
References
| Links | Site |
|---|---|
| mode() | pandas.pydata.org |
| value_counts() | pandas.pydata.org |
| groupby() | pandas.pydata.org |
