How to keep only the maximum value of two NumPy arrays?

Published: December 08, 2023

Updated: December 08, 2023

Tags: Python; Numpy;


Introduction

When working with large datasets, it is common to compare arrays and extract certain elements based on specific conditions. In such cases, only keeping the maximum value element-wise can help us focus on the most significant data points and improve our analysis.

In this article, we will explore several methods for keeping only the maximum values from two or more arrays. At the end, I will provide a real-life case study to illustrate the concepts discussed.

Using numpy maximum() function

One way to keep only the maximum value element-wise is to use the np.maximum() function. This function takes two arrays as input and returns a new array holding, at each position, the larger of the two corresponding elements.

To demonstrate the usage of np.maximum(), let's begin by generating two random arrays. This will serve as an illustration of its functionality.

import numpy as np

np.random.seed(42) # to always generate the same random numbers

ds1 = np.random.randint(0,10,(4,5)) # dataset 1

ds2 = np.random.randint(0,10,(4,5)) # dataset 2

print(ds1)

print(ds2)

The code provided above will generate:

array([[6, 3, 7, 4, 6],
       [9, 2, 6, 7, 4],
       [3, 7, 7, 2, 5],
       [4, 1, 7, 5, 1]])

and

array([[4, 0, 9, 5, 8],
       [0, 9, 2, 6, 3],
       [8, 2, 4, 2, 6],
       [4, 8, 6, 1, 3]])

To retain only the highest value between ds1 and ds2, you can utilize the maximum() function from the numpy library.

np.maximum(ds1,ds2)

Output

array([[6, 3, 9, 5, 8],
       [9, 9, 6, 7, 4],
       [8, 7, 7, 2, 6],
       [4, 8, 7, 5, 3]])

Note: The arrays being compared must have the same shape, or shapes that numpy can broadcast to a common shape.
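As a minimal sketch of what shape compatibility means in practice, the example below compares a small 2D array with a 1D row and with a scalar. The arrays and the names a, row, clipped and floor are made up for illustration and are not part of the datasets above.

```python
import numpy as np

a = np.array([[1, 8, 3],
              [7, 2, 9]])        # shape (2, 3)
row = np.array([4, 4, 4])        # shape (3,), broadcast across both rows of a

clipped = np.maximum(a, row)     # each row of a compared element-wise with row
floor = np.maximum(a, 5)         # a scalar is broadcast to every element

print(clipped)
print(floor)
```

Shapes that cannot be broadcast together, such as (2, 3) and (2,), raise a ValueError instead.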

Using the maximum() function across multiple arrays

Let's now create a new array:

ds3 = np.random.randint(0,15,(4,5)) # dataset 3

print(ds3)

Output

array([[ 8, 11, 13,  1,  9],
       [ 8,  9,  4,  1,  3],
       [11, 14, 11,  6, 11],
       [12,  7, 14,  2, 13]])

To apply maximum() to more than two arrays at once, a simple solution is the reduce() method of the np.maximum ufunc. It applies maximum() cumulatively along the first axis of its input, which here means across the list of arrays, and collapses them into a single array holding the element-wise maximum:

np.maximum.reduce([ds1,ds2,ds3])

Output

array([[ 8, 11, 13,  5,  9],
       [ 9,  9,  6,  7,  4],
       [11, 14, 11,  6, 11],
       [12,  8, 14,  5, 13]])
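To make the behaviour of reduce() concrete, here is a small sketch with three made-up 1D arrays (a, b and c are illustrative names). It suggests that the reduction gives the same result as chaining np.maximum() calls by hand.

```python
import numpy as np

a = np.array([1, 9, 3])
b = np.array([4, 2, 8])
c = np.array([7, 5, 6])

reduced = np.maximum.reduce([a, b, c])      # reduction across the list of arrays
chained = np.maximum(np.maximum(a, b), c)   # the same computation written out by hand

print(reduced)
print(np.array_equal(reduced, chained))
```

The reduce() form is usually more convenient when the number of arrays is not known in advance.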

Dealing with NaNs

To begin, let's insert NaN values at random positions in ds1. Since NaN is a floating-point value, we first convert the integer arrays into float arrays:

ds1 = ds1.astype('float64')
ds2 = ds2.astype('float64')

print(ds1)

Output

array([[6., 3., 7., 4., 6.],
       [9., 2., 6., 7., 4.],
       [3., 7., 7., 2., 5.],
       [4., 1., 7., 5., 1.]])

and then insert NaNs:

n = 4

index = np.random.choice(ds1.size, n, replace=False)

ds1.ravel()[index] = np.nan

print(ds1)

Keep in mind that when utilizing the maximum() function, such as:

np.maximum(ds1,ds2)

the result contains NaN wherever either input is NaN, because maximum() propagates NaN values:

array([[ 6.,  3., nan,  5.,  8.],
       [nan,  9.,  6.,  7.,  4.],
       [nan,  7.,  7.,  2.,  6.],
       [ 4.,  8.,  7.,  5., nan]])

To retain valid values instead of NaNs, one possible solution is the fmax() function, which ignores a NaN when the other element is a valid number. The result is NaN only where both inputs are NaN:

np.fmax(ds1,ds2)

Output

array([[6., 3., 9., 5., 8.],
       [0., 9., 6., 7., 4.],
       [8., 7., 7., 2., 6.],
       [4., 8., 7., 5., 3.]])
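The difference between the two functions can be seen side by side in the following sketch. The arrays x and y are made up for illustration and cover the three cases: NaN in the first input only, in the second only, and in both.

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0, np.nan])
y = np.array([2.0, 5.0, np.nan, np.nan])

propagated = np.maximum(x, y)  # NaN wherever either input is NaN
ignored = np.fmax(x, y)        # NaN only where both inputs are NaN

print(propagated)
print(ignored)
```

Only the last position stays NaN with fmax(), since neither input has a valid number there.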

Using broadcasting and logical operators

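Another method is to use boolean masks built with comparison operators such as greater than (>) or greater than or equal to (>=). Each mask marks the positions where one array holds the larger value, so we can copy the maximum element-wise. This example assumes ds1 and ds2 contain no NaN values: any comparison with NaN is False, so those positions would stay NaN in the result.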

First, create an array of NaN values with the same shape as ds1:

R = np.full(ds1.shape, np.nan)

Output

array([[nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan]])

Now fill R using boolean masks and comparison operators:

R[ds1 >= ds2] = ds1[ds1 >= ds2]  # positions where ds1 is larger or equal
R[ds2 >= ds1] = ds2[ds2 >= ds1]  # positions where ds2 is larger or equal

print(R)

we also get

array([[6., 3., 9., 5., 8.],
       [9., 9., 6., 7., 4.],
       [8., 7., 7., 2., 6.],
       [4., 8., 7., 5., 3.]])

Because R was created as a float array, we can convert it back to integers if needed:

R.astype('int')

Output

array([[6, 3, 9, 5, 8],
       [9, 9, 6, 7, 4],
       [8, 7, 7, 2, 6],
       [4, 8, 7, 5, 3]])
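The same masking idea can be written more compactly with np.where(), which is swapped in here as an alternative to the two assignments above and is not part of the original walkthrough. The arrays a and b below are small made-up examples.

```python
import numpy as np

a = np.array([[6, 3, 7],
              [9, 2, 6]])
b = np.array([[4, 0, 9],
              [0, 9, 2]])

# Take the value from a where the mask is True, and from b elsewhere
picked = np.where(a >= b, a, b)

print(picked)
print(np.array_equal(picked, np.maximum(a, b)))
```

With integer inputs this version keeps the integer data type, so no conversion is needed afterwards.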

Using numpy max() function

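An alternative approach is to use the numpy max() function, even though it was not specifically designed for this purpose. Passing the arrays as a list makes max() treat them as a single stacked array, and axis=0 takes the maximum across the arrays:

np.max([ds1, ds2], axis=0)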

also returns

array([[6, 3, 9, 5, 8],
       [9, 9, 6, 7, 4],
       [8, 7, 7, 2, 6],
       [4, 8, 7, 5, 3]])
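To show what happens behind the list argument, the sketch below stacks two small made-up arrays explicitly with np.stack() and then reduces along the new first axis. The names a, b, stacked and result are illustrative only.

```python
import numpy as np

a = np.array([[6, 3],
              [9, 2]])
b = np.array([[4, 0],
              [0, 9]])

stacked = np.stack([a, b])      # shape (2, 2, 2): one layer per input array
result = stacked.max(axis=0)    # collapse the layer axis, keeping the larger value

print(stacked.shape)
print(result)
```

Note that this approach builds a temporary stacked copy of the inputs, which np.maximum() avoids when only two arrays are compared.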

References