How to find the most frequent value (mode) in a NumPy array?


Introduction

A common task when working with arrays is finding the most frequent value, also known as the mode.

The mode is the value that appears most frequently in a dataset. Like the mean and the median, it is a measure of central tendency, but unlike them it is also defined for categorical data. This makes it useful when you want to find the most common value or category in a dataset.
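To make the difference between the three measures concrete, here is a minimal sketch using Python's standard statistics module. The sample list is hypothetical and was chosen only so that the mean differs from the median and the mode.

```python
import statistics

# Hypothetical sample, chosen so that the measures do not all coincide.
sample = [1, 2, 2, 2, 3, 10, 15]

print(statistics.mean(sample))    # arithmetic average of the values
print(statistics.median(sample))  # middle value of the sorted data
print(statistics.mode(sample))    # most frequent value
```

The two large values pull the mean up to 5, while the median and the mode both stay at 2.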

Python offers several ways to determine the mode. The sections below cover SciPy, NumPy, and the standard library.

Creating a numpy array

First, let's generate a 4x5 array filled with random integers between 0 and 3 (the upper bound 4 passed to randint() is exclusive):

import numpy as np

data = np.random.randint(0,4,(4,5))

print(data)

Example output

[[3 2 3 3 3]
 [3 0 2 3 3]
 [1 0 2 0 2]
 [3 3 3 2 2]]

Our goal is to find the most frequent value or mode.
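Because np.random.randint() draws new values on every run, your array will not match the one printed above. If you want to reproduce the numbers used in the rest of this article, one option is to enter the same values by hand, as in this small sketch:

```python
import numpy as np

# Same values as the example output above, typed in explicitly so that
# the results in the following sections can be reproduced.
data = np.array([[3, 2, 3, 3, 3],
                 [3, 0, 2, 3, 3],
                 [1, 0, 2, 0, 2],
                 [3, 3, 3, 2, 2]])

print(data.shape)         # (4, 5)
print((data == 3).sum())  # 10, the number of times the value 3 occurs
```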

Using scipy stats mode function

One way to find the most frequent value is the mode() function from the scipy.stats module, which works for both one-dimensional and multidimensional arrays. With axis=None, the mode is computed over the whole array:

from scipy import stats

stats.mode(data, axis=None, keepdims=True)

Output

ModeResult(mode=array([3]), count=array([10]))

As shown, the function returns both the mode and its count.

To access the mode directly, index the result:

s = stats.mode(data, axis=None, keepdims=True)

mode = s[0][0]

print(mode)

Output

3
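The shape of the arrays returned with keepdims=True may depend on the SciPy version, so indexing the result is not always the most robust option. As a hedged alternative, the sketch below passes keepdims=False and reads the named fields of the result instead. It assumes SciPy 1.9 or later (where the keepdims keyword exists) and reuses the example values shown earlier.

```python
import numpy as np
from scipy import stats

# Example values from the article, entered by hand for reproducibility.
data = np.array([[3, 2, 3, 3, 3],
                 [3, 0, 2, 3, 3],
                 [1, 0, 2, 0, 2],
                 [3, 3, 3, 2, 2]])

# With axis=None the whole array is reduced; keepdims=False drops the
# reduced axes, so mode and count hold a single value each.
result = stats.mode(data, axis=None, keepdims=False)

print(int(result.mode), int(result.count))  # 3 10
```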

Using numpy bincount() function

An alternative is NumPy's bincount() function, which counts how often each non-negative integer occurs. It only accepts one-dimensional arrays of non-negative integers, so the array is flattened with ravel() first:

counts = np.bincount(data.ravel())

print(np.argmax(counts))

Output

3
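np.bincount() has no axis argument. If you need one mode per column rather than a single mode for the whole array, one possible sketch is to apply it column by column with np.apply_along_axis(). The array below reuses the example values shown earlier.

```python
import numpy as np

# Example values from the article, entered by hand for reproducibility.
data = np.array([[3, 2, 3, 3, 3],
                 [3, 0, 2, 3, 3],
                 [1, 0, 2, 0, 2],
                 [3, 3, 3, 2, 2]])

# axis=0 hands each column to the function as a 1-D array.
# argmax returns the first maximum, so ties resolve to the smallest value.
column_modes = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, data)

print(column_modes)  # [3 0 2 3 2]
```

Passing 1 instead of 0 as the axis would give one mode per row.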

Using numpy unique() function

Another approach uses NumPy's unique() function with return_counts=True, which returns the sorted distinct values together with their counts:

values, counts = np.unique(data, return_counts=True)

print(values,counts)

Output

[0 1 2 3] [ 3  1  6 10]

Extracting the most frequent value:

mode = values[np.argmax(counts)]

print(mode)

Output

3
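When two or more values share the highest count, picking a single index with np.argmax() reports only the first of them. The following sketch, built on a hypothetical array with a deliberate tie, shows one way to keep every tied value.

```python
import numpy as np

# Hypothetical array in which 1 and 4 both occur three times.
tied = np.array([1, 4, 1, 4, 2, 1, 4])

values, counts = np.unique(tied, return_counts=True)

# argmax stops at the first maximum, so the tie is hidden.
first_mode = values[np.argmax(counts)]
print(first_mode)  # 1

# A boolean mask keeps every value whose count equals the maximum.
all_modes = values[counts == counts.max()]
print(all_modes)   # [1 4]
```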

Using Counter from collections module

Another approach converts the data to a list and uses the Counter class from the collections module. Calling tolist() turns the NumPy integers into plain Python integers:

from collections import Counter

data_to_list = data.ravel().tolist()

Counter(data_to_list).most_common(1)

Output

[(3, 10)]

The code above returns the mode, here 3, together with the number of times it occurs, here 10.
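Counter is not limited to integers, which makes it convenient for arrays of labels, where np.bincount() cannot be used. A minimal sketch with a hypothetical array of strings:

```python
from collections import Counter

import numpy as np

# Hypothetical array of category labels.
labels = np.array(["cat", "dog", "cat", "bird", "dog", "cat"])

# most_common(1) returns a list holding one (value, count) pair.
value, count = Counter(labels.tolist()).most_common(1)[0]

print(value, count)  # cat 3
```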

Visualization

import matplotlib.pyplot as plt

# One bin per integer value, centered on each value in the array
bins = np.arange(data.min(), data.max() + 2) - 0.5
plt.hist(data.ravel(), bins=bins)

plt.title('How to find the most frequent value in a NumPy array?')

plt.savefig("histogram_matplotlib_mode.png")

plt.show()


Conclusion

In summary, there are several ways to find the mode of a NumPy array: scipy.stats.mode(), np.bincount(), np.unique() and collections.Counter. Whether your array is one-dimensional or multidimensional, each of them finds the most frequent value in a few lines. Knowing how to use these methods is useful for data analysis and for making informed decisions based on your data.
