# How to keep only the maximum value of two NumPy arrays?

Published: December 08, 2023

Updated: December 08, 2023

Tags: Python; Numpy;

## Introduction

When working with large datasets, it is common to compare arrays and extract certain elements based on specific conditions. In such cases, only keeping the maximum value element-wise can help us focus on the most significant data points and improve our analysis.

In this article, we will explore several methods for retaining the element-wise maximum of two (or more) arrays. By the end, I will provide a real-life case study to illustrate the concepts discussed.

## Using numpy maximum() function

One way to keep only the element-wise maximum is the `np.maximum()` function. It takes two arrays as input and returns a new array that holds, at each position, the larger of the two corresponding elements.

To demonstrate `np.maximum()`, let's begin by generating two random integer arrays.

```python
import numpy as np

np.random.seed(42)  # to always generate the same random numbers

ds1 = np.random.randint(0, 10, (4, 5))  # dataset 1
ds2 = np.random.randint(0, 10, (4, 5))  # dataset 2

print(ds1)
print(ds2)
```

The two arrays look like this (shown in the `array(...)` form that a notebook displays):

```
array([[6, 3, 7, 4, 6],
       [9, 2, 6, 7, 4],
       [3, 7, 7, 2, 5],
       [4, 1, 7, 5, 1]])
```

and

```
array([[4, 0, 9, 5, 8],
       [0, 9, 2, 6, 3],
       [8, 2, 4, 2, 6],
       [4, 8, 6, 1, 3]])
```

To keep the larger value at each position of `ds1` and `ds2`, call `np.maximum()`:

```python
np.maximum(ds1, ds2)
```

Output

```
array([[6, 3, 9, 5, 8],
       [9, 9, 6, 7, 4],
       [8, 7, 7, 2, 6],
       [4, 8, 7, 5, 3]])
```

Note: The two arrays must have the same shape, or shapes that NumPy can broadcast to a common shape.
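To make the note on compatible shapes concrete, here is a minimal sketch. The small arrays `a` and `row` are made up for illustration and are not part of the datasets above.

```python
import numpy as np

a = np.array([[1, 8, 3],
              [7, 2, 9]])

# Array vs. scalar: the scalar is broadcast to every element,
# so 5 acts as a lower bound on the result.
floored = np.maximum(a, 5)
print(floored)
# [[5 8 5]
#  [7 5 9]]

# Shape (2, 3) vs. shape (3,): the row is broadcast across both rows of `a`.
row = np.array([4, 4, 4])
by_row = np.maximum(a, row)
print(by_row)
# [[4 8 4]
#  [7 4 9]]

# Shapes (2, 3) and (2,) cannot be broadcast together, so NumPy raises.
try:
    np.maximum(a, np.array([1, 2]))
except ValueError as err:
    print("shape mismatch:", err)
```

Comparing against a scalar this way is a common idiom for clipping values from below.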

### Using the maximum() function across multiple arrays

Let's now create a third array:

```python
ds3 = np.random.randint(0, 15, (4, 5))  # dataset 3

print(ds3)
```

Output

```
array([[ 8, 11, 13,  1,  9],
       [ 8,  9,  4,  1,  3],
       [11, 14, 11,  6, 11],
       [12,  7, 14,  2, 13]])
```

To apply `maximum()` to more than two arrays at once, use its `reduce()` method. Like other binary NumPy ufuncs, `np.maximum` provides `reduce()`, which applies the function repeatedly along an axis (the first one by default) until that axis is collapsed. Given a list of equally shaped arrays, `np.maximum.reduce()` therefore returns their element-wise maximum:

```python
np.maximum.reduce([ds1, ds2, ds3])
```

Output

```
array([[ 8, 11, 13,  5,  9],
       [ 9,  9,  6,  7,  4],
       [11, 14, 11,  6, 11],
       [12,  8, 14,  5, 13]])
```
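As a rough check of what `reduce()` does here, the sketch below compares it with chaining `np.maximum()` by hand and with stacking the arrays first. The small arrays `a`, `b` and `c` are illustrative stand-ins, not the datasets above.

```python
import numpy as np

a = np.array([[1, 5], [7, 2]])
b = np.array([[4, 3], [6, 8]])
c = np.array([[0, 9], [2, 2]])

reduced = np.maximum.reduce([a, b, c])         # ufunc reduction over the list
chained = np.maximum(np.maximum(a, b), c)      # the same thing written by hand
stacked = np.stack([a, b, c]).max(axis=0)      # stack to shape (3, 2, 2), then reduce

print(reduced)
# [[4 9]
#  [7 8]]
print(np.array_equal(reduced, chained))  # True
print(np.array_equal(reduced, stacked))  # True
```

All three forms give the same array; `reduce()` is mainly convenient when the number of arrays is not known in advance.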

### Dealing with NaNs

To begin, let's insert NaN values at random positions in `ds1`. NaN is a floating-point value and cannot be stored in an integer array, so we first convert the arrays to float:

```python
ds1 = ds1.astype('float64')
ds2 = ds2.astype('float64')

print(ds1)
```

Output

```
array([[6., 3., 7., 4., 6.],
       [9., 2., 6., 7., 4.],
       [3., 7., 7., 2., 5.],
       [4., 1., 7., 5., 1.]])
```

and then insert NaNs:

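```python
n = 4  # number of NaNs to insert

# pick n distinct flat positions in ds1
index = np.random.choice(ds1.size, n, replace=False)

# ravel() returns a view of this contiguous array, so ds1 is modified in place
ds1.ravel()[index] = np.nan

print(ds1)
```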

Keep in mind that `np.maximum()` propagates NaNs. A call such as:

```python
np.maximum(ds1, ds2)
```

returns NaN wherever either input is NaN:

```
array([[ 6.,  3., nan,  5.,  8.],
       [nan,  9.,  6.,  7.,  4.],
       [nan,  7.,  7.,  2.,  6.],
       [ 4.,  8.,  7.,  5., nan]])
```

To ignore NaNs and keep the valid value instead, use `np.fmax()`. It returns the non-NaN element when only one of the two is NaN, and NaN only when both are:

```python
np.fmax(ds1, ds2)
```

Output

```
array([[6., 3., 9., 5., 8.],
       [0., 9., 6., 7., 4.],
       [8., 7., 7., 2., 6.],
       [4., 8., 7., 5., 3.]])
```
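The difference between the two functions is easiest to see on a tiny example that covers every NaN combination. The arrays `a`, `b` and `c` below are made up for illustration; the last line assumes you want the same NaN-ignoring behavior across more than two arrays.

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0, np.nan])
b = np.array([2.0, 5.0, np.nan, np.nan])
c = np.array([0.0, 7.0, 1.0, np.nan])

propagated = np.maximum(a, b)       # values: 2, nan, nan, nan
ignored = np.fmax(a, b)             # values: 2, 5, 3, nan (both inputs NaN at the end)
ignored_all = np.fmax.reduce([a, b, c])  # values: 2, 7, 3, nan

print(propagated.tolist())
print(ignored.tolist())
print(ignored_all.tolist())
```

Note that the last position stays NaN in every case, because no array holds a valid value there.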

## Using boolean masks and comparison operators

Another method is to build boolean masks with comparison operators such as greater than (>) or greater than or equal to (>=), and to use those masks to pick the larger value at each position. The outputs in this section correspond to the original `ds1` and `ds2`, without the NaNs inserted above.

Create an array of NaN values with the same shape as `ds1`:

```python
R = np.full(ds1.shape, np.nan)
```

Output

```
array([[nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan]])
```

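Now fill `R` through the boolean masks:

```python
R[ds1 >= ds2] = ds1[ds1 >= ds2]  # positions where ds1 holds the larger (or equal) value
R[ds2 >= ds1] = ds2[ds2 >= ds1]  # positions where ds2 holds the larger (or equal) value

print(R)
```

we get the same values as with `np.maximum()`, stored as floats: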

```
array([[6., 3., 9., 5., 8.],
       [9., 9., 6., 7., 4.],
       [8., 7., 7., 2., 6.],
       [4., 8., 7., 5., 3.]])
```
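The same mask-based selection can be written in one step with `np.where()`, which avoids the intermediate NaN-filled array. This is a minimal sketch on two small illustrative arrays, `a` and `b`, not a replacement for the approach above.

```python
import numpy as np

a = np.array([[6, 3, 7],
              [9, 2, 6]])
b = np.array([[4, 0, 9],
              [0, 9, 2]])

# Take the element from `a` where the mask is True, otherwise from `b`.
picked = np.where(a >= b, a, b)

print(picked)
# [[6 3 9]
#  [9 9 6]]
print(picked.dtype == a.dtype)  # True: the integer dtype is preserved
```

Because no float array is involved, the result keeps the integer dtype of the inputs and needs no conversion afterwards.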

Because `R` was initialized with NaN, it is a float array. Once it no longer contains any NaN, it can be converted back to integers with `astype()`, which returns a new array:

```python
R.astype('int')
```

Output

```
array([[6, 3, 9, 5, 8],
       [9, 9, 6, 7, 4],
       [8, 7, 7, 2, 6],
       [4, 8, 7, 5, 3]])
```
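Casting is only meaningful when the array holds no NaN, since NaN has no integer representation. One possible safeguard is sketched below; `to_int_if_clean` is a hypothetical helper name introduced here for illustration, and the arrays `clean` and `dirty` are made up.

```python
import numpy as np

def to_int_if_clean(arr):
    """Hypothetical helper: cast to int only when `arr` holds no NaN."""
    if np.isnan(arr).any():
        return arr  # leave the float array untouched
    return arr.astype(int)

clean = np.array([[6.0, 3.0], [9.0, 9.0]])
dirty = np.array([[6.0, np.nan], [9.0, 9.0]])

print(to_int_if_clean(clean).dtype)  # an integer dtype
print(to_int_if_clean(dirty).dtype)  # float64
```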

## Using numpy max() function

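An alternative is NumPy's `max()` function. It performs a reduction along an axis rather than an element-wise comparison of two arrays, but passing the arrays as a list and reducing along `axis=0` gives the same result:

```python
np.max([ds1, ds2], axis=0)
```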

also returns

```
array([[6, 3, 9, 5, 8],
       [9, 9, 6, 7, 4],
       [8, 7, 7, 2, 6],
       [4, 8, 7, 5, 3]])
```
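To see why `axis=0` matters, the sketch below shows the intermediate array that the list of inputs turns into. The arrays `a` and `b` are small illustrative examples.

```python
import numpy as np

a = np.array([[6, 3, 7],
              [9, 2, 6]])
b = np.array([[4, 0, 9],
              [0, 9, 2]])

# The list [a, b] is converted to a single array of shape (2, 2, 3):
# axis 0 indexes the input arrays, axes 1 and 2 index their elements.
stacked = np.array([a, b])
print(stacked.shape)  # (2, 2, 3)

# Reducing along axis 0 compares the inputs position by position.
elementwise = np.max([a, b], axis=0)
print(elementwise)
# [[6 3 9]
#  [9 9 6]]

# Without axis=0, np.max reduces over everything and returns one scalar.
overall = np.max([a, b])
print(overall)  # 9
```

Forgetting `axis=0` is an easy mistake to make, since the call still succeeds but returns a single number.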