Introduction
When working with large datasets, it is common to compare arrays and extract certain elements based on specific conditions. In such cases, only keeping the maximum value element-wise can help us focus on the most significant data points and improve our analysis.
In this article, we will explore various methods for selectively retaining the maximum values from two or more arrays. By the end, I will provide a real-life case study to illustrate the concepts discussed.
Using numpy maximum() function
One way to only keep the maximum value element-wise is to use the np.maximum() function. This function takes two arrays as input and returns a new array with the maximum value at each element-wise index.
To demonstrate the usage of np.maximum(), let's begin by generating two random arrays. This will serve as an illustration of its functionality.
import numpy as np
np.random.seed(42) # to always generate the same random numbers
ds1 = np.random.randint(0,10,(4,5)) # dataset 1
ds2 = np.random.randint(0,10,(4,5)) # dataset 2
print(ds1)
print(ds2)
The code provided above will generate:
array([[6, 3, 7, 4, 6],
[9, 2, 6, 7, 4],
[3, 7, 7, 2, 5],
[4, 1, 7, 5, 1]])
and
array([[4, 0, 9, 5, 8],
[0, 9, 2, 6, 3],
[8, 2, 4, 2, 6],
[4, 8, 6, 1, 3]])
To retain only the highest value between ds1 and ds2, you can utilize the maximum() function from the numpy library.
np.maximum(ds1,ds2)
Output
array([[6, 3, 9, 5, 8],
[9, 9, 6, 7, 4],
[8, 7, 7, 2, 6],
[4, 8, 7, 5, 3]])
Note: The arrays being compared must either have the same shape or be broadcastable to a common shape.
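As a minimal sketch of what "compatible shapes" means in practice, the example below compares a 2D array with a 1D row and with a scalar. The array names and values are hypothetical and chosen only for illustration:

```python
import numpy as np

a = np.array([[1, 5, 3],
              [7, 2, 9]])       # shape (2, 3)
row = np.array([4, 4, 4])       # shape (3,), broadcast across both rows of a
floor = 4                       # a scalar is broadcast to every element of a

max_with_row = np.maximum(a, row)
max_with_scalar = np.maximum(a, floor)

print(max_with_row)
# [[4 5 4]
#  [7 4 9]]

# Shapes that cannot be broadcast together raise a ValueError
try:
    np.maximum(a, np.array([1, 2]))   # shapes (2, 3) and (2,)
except ValueError as err:
    print("incompatible shapes:", err)
```

Comparing against a scalar in this way is a common idiom for clipping an array from below.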
Using the maximum() function across multiple arrays
Let's now create a new array:
ds3 = np.random.randint(0,15,(4,5)) # dataset 3
print(ds3)
Output
array([[ 8, 11, 13, 1, 9],
[ 8, 9, 4, 1, 3],
[11, 14, 11, 6, 11],
[12, 7, 14, 2, 13]])
To apply maximum() to more than two arrays at once, a simple solution is the reduce() method of the np.maximum ufunc. Given a list of arrays, it treats the list as one stacked array and applies maximum() repeatedly along the first axis, so the result is a single array holding the largest value found at each position:
np.maximum.reduce([ds1,ds2,ds3])
Output
array([[ 8, 11, 13, 5, 9],
[ 9, 9, 6, 7, 4],
[11, 14, 11, 6, 11],
[12, 8, 14, 5, 13]])
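To make the idea of a reduction more concrete, here is a small sketch comparing np.maximum.reduce() with Python's functools.reduce(), which applies np.maximum pairwise. The arrays a, b and c are hypothetical examples:

```python
from functools import reduce

import numpy as np

a = np.array([1, 8, 3])
b = np.array([4, 2, 6])
c = np.array([5, 0, 9])

# Pairwise application: np.maximum(np.maximum(a, b), c)
pairwise = reduce(np.maximum, [a, b, c])

# ufunc reduction along the first axis of the stacked inputs
ufunc_reduce = np.maximum.reduce([a, b, c])

print(pairwise)      # [5 8 9]
print(ufunc_reduce)  # [5 8 9]
```

Both forms give the same result; the ufunc version avoids the Python-level loop, which may matter when the list of arrays is long.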
Dealing with NaNs
To begin, let's insert NaN values at random positions in ds1. NaN is a floating-point value and cannot be stored in an integer array, so we first convert the arrays to float64:
ds1 = ds1.astype('float64')
ds2 = ds2.astype('float64')
print(ds1)
Output
array([[6., 3., 7., 4., 6.],
[9., 2., 6., 7., 4.],
[3., 7., 7., 2., 5.],
[4., 1., 7., 5., 1.]])
and then insert NaNs:
n = 4
index = np.random.choice(ds1.size, n, replace=False)
ds1.ravel()[index] = np.nan
print(ds1)
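One caveat: ravel() returns a view only when the array's memory layout allows it. For a non-contiguous array (a transposed one, for example) it returns a copy, and the assignment would be silently lost. A hedged alternative is to assign through the .flat attribute, sketched here with a hypothetical array and an arbitrarily chosen seed:

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator; the seed is an arbitrary choice
arr = np.arange(20, dtype="float64").reshape(4, 5)

n = 4
index = rng.choice(arr.size, n, replace=False)  # n distinct flat positions
arr.flat[index] = np.nan                        # .flat writes into arr itself

print(np.isnan(arr).sum())  # 4
```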
Keep in mind that when utilizing the maximum() function, such as:
np.maximum(ds1,ds2)
the result contains NaN wherever either input is NaN, because maximum() propagates NaNs:
array([[ 6., 3., nan, 5., 8.],
[nan, 9., 6., 7., 4.],
[nan, 7., 7., 2., 6.],
[ 4., 8., 7., 5., nan]])
To keep the valid value whenever only one of the two elements is NaN, one possible solution is to use the fmax() function instead:
np.fmax(ds1,ds2)
Output
array([[6., 3., 9., 5., 8.],
[0., 9., 6., 7., 4.],
[8., 7., 7., 2., 6.],
[4., 8., 7., 5., 3.]])
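The difference between the two functions can be seen on a small hypothetical example. Note that fmax() still returns NaN when both elements are NaN, since there is no valid value to fall back on. Because fmax is a ufunc as well, it also offers a reduce() method for more than two arrays:

```python
import numpy as np

x = np.array([1.0, np.nan, np.nan, 4.0])
y = np.array([2.0, 3.0, np.nan, np.nan])
z = np.array([0.0, 5.0, np.nan, 1.0])

propagating = np.maximum(x, y)     # NaN wherever either input is NaN
ignoring = np.fmax(x, y)           # NaN only where both inputs are NaN
ignoring_many = np.fmax.reduce([x, y, z])

print(propagating)    # [ 2. nan nan nan]
print(ignoring)       # [ 2.  3. nan  4.]
print(ignoring_many)  # [ 2.  5. nan  4.]
```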
Using boolean masks and comparison operators
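Another method is to build boolean masks with comparison operators such as greater than (>) or greater than or equal to (>=), and use them to select, position by position, which array each value is taken from. The output shown in this section assumes the original ds1 and ds2, without NaNs: any comparison involving NaN evaluates to False, so positions where either array holds NaN would remain NaN in the result.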
First, create an array of NaN values that matches the dimensions of ds1:
R = np.full( ds1.shape, np.nan)
Output
array([[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan],
[nan, nan, nan, nan, nan]])
Now fill R using boolean masks built with comparison operators:
R[ds1 >= ds2] = ds1[ds1 >= ds2]
R[ds2 >= ds1] = ds2[ds2 >= ds1]
print(R)
we also get
array([[6., 3., 9., 5., 8.],
[9., 9., 6., 7., 4.],
[8., 7., 7., 2., 6.],
[4., 8., 7., 5., 3.]])
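The same selection can also be written in one step with np.where(), which is a different technique from the mask assignment above. The sketch below uses small hypothetical arrays and includes one NaN to show how the two variants treat it:

```python
import numpy as np

a = np.array([[6.0, 3.0],
              [np.nan, 1.0]])
b = np.array([[4.0, 9.0],
              [0.0, 8.0]])

# Mask assignment: positions where every comparison is False stay NaN
R = np.full(a.shape, np.nan)
R[a >= b] = a[a >= b]
R[b >= a] = b[b >= a]

# np.where: take a where the condition holds, otherwise b
W = np.where(a >= b, a, b)

print(R)  # the NaN position in a is still NaN in R
print(W)  # the NaN position in a falls back to b (0.0)
```

np.where() is not symmetric with respect to NaN: a NaN in a falls back to b, but a NaN in b would be copied into the result. np.fmax() remains the safer choice when NaNs may appear in either array.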
It is worth noting that the result can be converted back to integers with astype(), which returns a new array and leaves R unchanged:
R.astype('int')
Output
array([[6, 3, 9, 5, 8],
[9, 9, 6, 7, 4],
[8, 7, 7, 2, 6],
[4, 8, 7, 5, 3]])
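Converting to an integer type is only meaningful when the array contains no NaN, since NaN has no integer representation. A minimal sketch of a defensive conversion follows; the sentinel value -1 is an assumption made for illustration and should be replaced by whatever marks missing data in your own dataset:

```python
import numpy as np

R = np.array([[6.0, 9.0],
              [np.nan, 8.0]])   # hypothetical result that still holds a NaN

if np.isnan(R).any():
    # Replace NaN with a sentinel (assumed: -1) before casting
    R_int = np.nan_to_num(R, nan=-1).astype(int)
else:
    R_int = R.astype(int)

print(R_int)
# [[ 6  9]
#  [-1  8]]
```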
Using numpy max() function
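An alternative approach is to pass both arrays as a list to the numpy max() function with axis=0. max() is designed to reduce a single array along an axis, but it converts the list into one stacked array and takes the maximum across the inputs. With the original integer arrays: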
np.max( [ds1,ds2] ,axis=0)
also returns
array([[6, 3, 9, 5, 8],
[9, 9, 6, 7, 4],
[8, 7, 7, 2, 6],
[4, 8, 7, 5, 3]])
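To make the role of axis=0 explicit, the sketch below stacks two small hypothetical arrays with np.stack() and compares the result of reducing along the first axis with the result of omitting the axis:

```python
import numpy as np

a = np.array([[1, 8],
              [3, 4]])
b = np.array([[5, 2],
              [7, 0]])

stacked = np.stack([a, b])          # shape (2, 2, 2): one "layer" per input array
elementwise = stacked.max(axis=0)   # maximum across the inputs, position by position
overall = stacked.max()             # no axis: a single maximum over all values

print(elementwise)
# [[5 8]
#  [7 4]]
print(overall)  # 8
```

Forgetting axis=0 returns a single scalar rather than an array, so it is worth checking the shape of the result.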
References
Links | Site |
---|---|
numpy.maximum | numpy.org |
numpy.fmax | numpy.org |
numpy.max | numpy.org |
numpy.ufunc.reduce | numpy.org |
How to find the most frequent value or mode in a numpy array ? | en.moonbooks.org |
How to change array (or matrix) type with numpy in python ? | en.moonbooks.org |
How to randomly insert NaN in a matrix with numpy in python ? | en.moonbooks.org |