Introduction
One common task when working with arrays is finding the most frequent value, also known as the mode.
The mode is the value that appears most frequently in a dataset. Like the mean and the median, it is a measure of central tendency, but unlike them it is also defined for categorical data. This makes it useful whenever you want the most common value or category in a dataset.
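To make the difference between the three measures concrete, here is a minimal sketch using Python's standard statistics module. The toy list sample is made up for illustration, and statistics.mode is used here only as a quick reference; it is not one of the NumPy-oriented approaches covered in the rest of this article.

```python
import statistics

# Hypothetical toy data: 2 occurs twice, every other value once
sample = [1, 2, 2, 3, 10]

mean_value = statistics.mean(sample)      # arithmetic average, pulled up by the outlier 10
median_value = statistics.median(sample)  # middle value of the sorted data
mode_value = statistics.mode(sample)      # most frequent value

print(mean_value, median_value, mode_value)
```

The outlier 10 shifts the mean but leaves the median and the mode at 2.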
Python offers several ways to find the mode. This article covers scipy.stats.mode(), numpy.bincount(), numpy.unique() and collections.Counter.
Creating a numpy array
First, let's generate a 2D array filled with random integers between 0 and 3:
import numpy as np
data = np.random.randint(0,4,(4,5))
print(data)
Example output (the values are random, so yours will differ):
[[3 2 3 3 3]
 [3 0 2 3 3]
 [1 0 2 0 2]
 [3 3 3 2 2]]
Our goal is to find the most frequent value (the mode) of this array.
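Because the array above is random, its values change on every run. One way to reproduce the exact numbers shown in this article is to hard-code the example array instead. This is only a convenience sketch for following along, not part of the original workflow:

```python
import numpy as np

# Hard-coded copy of the example output printed above, so that the
# results in the following sections can be reproduced exactly.
data = np.array([[3, 2, 3, 3, 3],
                 [3, 0, 2, 3, 3],
                 [1, 0, 2, 0, 2],
                 [3, 3, 3, 2, 2]])

print(data.shape)  # (4, 5)
```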
Using scipy stats mode function
One way to find the most frequent value is the mode() function from the scipy.stats module, which works for both one-dimensional and multidimensional arrays. Passing axis=None computes the mode over the whole array:
from scipy import stats
stats.mode(data, axis=None, keepdims=True)
Output (the exact shape of the returned arrays depends on the SciPy version)
ModeResult(mode=array([3]), count=array([10]))
The function returns both the mode and the number of times it occurs.
To access the mode directly, flatten the result first, so that the code does not depend on the shape SciPy returns:
s = stats.mode(data, axis=None, keepdims=True)
mode = np.ravel(s.mode)[0]
print(mode)
Output
3
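Since mode() also works on multidimensional arrays, it can return one mode per column or per row instead of a single value for the whole array. The sketch below assumes SciPy 1.9 or later (where the keepdims argument exists) and reuses the example array from this article as hard-coded data. When several values tie within a column, SciPy returns only one of them.

```python
import numpy as np
from scipy import stats

# Hard-coded copy of the example array used in this article
data = np.array([[3, 2, 3, 3, 3],
                 [3, 0, 2, 3, 3],
                 [1, 0, 2, 0, 2],
                 [3, 3, 3, 2, 2]])

# axis=0 reduces over the rows, giving one mode per column;
# keepdims=False drops the reduced axis (assumes SciPy >= 1.9).
result = stats.mode(data, axis=0, keepdims=False)
col_modes = np.asarray(result.mode)
col_counts = np.asarray(result.count)

print(col_modes)   # one modal value per column
print(col_counts)  # how often each modal value occurs in its column
```

Using axis=1 instead would give one mode per row.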
Using numpy bincount() function
An alternative is NumPy's bincount() function. Note that it only accepts one-dimensional arrays of non-negative integers, so the array has to be flattened first:
counts = np.bincount(data.ravel())
print(np.argmax(counts))
Output
3
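One detail worth knowing: np.argmax returns the index of the first maximum, so when several values are tied for the highest count, only the smallest of them is reported. The following sketch, using a made-up array named sample, shows how all tied values could be recovered from the same counts:

```python
import numpy as np

# Hypothetical data in which 0 and 2 both occur twice
sample = np.array([2, 0, 2, 1, 0, 3])

counts = np.bincount(sample)    # counts[i] is the number of occurrences of i
first_mode = np.argmax(counts)  # first maximum only, i.e. the smallest tied value
all_modes = np.flatnonzero(counts == counts.max())  # every value reaching the top count

print(first_mode)  # 0
print(all_modes)   # [0 2]
```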
Using numpy unique() function
Another approach uses NumPy's unique() function with return_counts=True, which returns the sorted unique values together with their counts:
values, counts = np.unique(data, return_counts=True)
print(values,counts)
Output
[0 1 2 3] [ 3 1 6 10]
Extracting the most frequent value:
values, counts = np.unique(data, return_counts=True)
print(values[np.argmax(counts)])
Output
3
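Unlike bincount(), np.unique() is not restricted to non-negative integers, so the same idea carries over to floats, negative numbers and strings. Here is a minimal sketch; the helper name most_frequent and the sample arrays are hypothetical and chosen only for illustration:

```python
import numpy as np

def most_frequent(arr):
    """Return (value, count) for the most frequent entry of arr.

    Hypothetical helper: on ties, the smallest value in sort order wins,
    because np.unique sorts its output and np.argmax picks the first maximum.
    """
    values, counts = np.unique(arr, return_counts=True)
    best = np.argmax(counts)
    return values[best], counts[best]

float_mode, float_count = most_frequent(np.array([-1.5, 2.0, -1.5, 7.25]))
word_mode, word_count = most_frequent(np.array(["cat", "dog", "dog", "bird"]))

print(float_mode, float_count)  # -1.5 2
print(word_mode, word_count)    # dog 2
```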
Using Counter from collections module
Another alternative is to convert the data to a list and use the Counter class from the collections module:
from collections import Counter
data_to_list = data.ravel().tolist()  # tolist() converts the values to plain Python ints
Counter(data_to_list).most_common(1)
Output
[(3, 10)]
The code returns a list containing one (value, count) tuple: the mode, here 3, and the number of times it occurs, here 10.
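Note that most_common(1) reports a single entry even when several values share the highest count; among equal counts, Counter keeps the order in which the values were first encountered. The sketch below uses a made-up list named sample to show one way of collecting all tied values:

```python
from collections import Counter

# Hypothetical data in which 2 and 0 both occur twice
sample = [2, 0, 2, 1, 0, 3]

freq = Counter(sample)
top_count = max(freq.values())

# Every value whose count equals the highest count
all_modes = sorted(value for value, count in freq.items() if count == top_count)

print(freq.most_common(1))  # [(2, 2)]  (2 was encountered before 0)
print(all_modes)            # [0, 2]
```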
Visualization
A histogram of the flattened array shows the mode as the tallest bar:
import matplotlib.pyplot as plt
# One bin per integer value, centered on that integer
bins = np.arange(data.min(), data.max() + 2) - 0.5
plt.hist(data.ravel(), bins=bins)
plt.title('How to find the most frequent value in a numpy array?')
plt.savefig("histogram_matplotlib_mode.png")
plt.show()
Conclusion
In summary, there are several ways to find the mode of a NumPy array: scipy.stats.mode(), np.bincount() combined with np.argmax(), np.unique() with return_counts=True, and collections.Counter. All of them work on one-dimensional arrays and, after flattening, on multidimensional ones; bincount() additionally requires non-negative integers. Knowing these options helps you pick the one that matches your data type and dependencies.