How to replace NaN values with zero using NumPy

Published: May 16, 2023

Tags: Python; Numpy;

NumPy provides a convenient way to replace NaN (Not a Number) values with zero. This matters when working with numerical data, because NaN values propagate through calculations and can lead to inaccurate or undefined results.

Using the NumPy nan_to_num() function

Let's first create an array with NaN values.

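import numpy as np

np.random.seed(42)  # fix the seed so the example is reproducible

# 4x6 array of uniform random values in [-10, 80)
A = np.random.uniform(-10, 80, size=(4, 6))

# Pick 6 distinct flat positions and set them to NaN
# (ravel() returns a view here because A is contiguous)
n = 6
index = np.random.choice(A.size, n, replace=False)
A.ravel()[index] = np.nan

print(A)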

returns

[[        nan 75.56428758 55.87945476 43.87926358  4.04167764         nan]
 [-4.7724749          nan 44.10035106 53.726532   -8.14739551 77.29188669]
 [64.91983767         nan  6.36424705         nan 17.38180187 37.22807885]
 [28.87505168 16.21062262         nan  2.55444746 16.29301837 22.9725659 ]]

Note that NaN propagates through reductions, so both the minimum and the maximum of A are nan:

print( A.min() )
print( A.max() )

returns

nan
nan
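Before replacing anything, it can help to check how many NaN values an array holds and to see the propagation on a smaller case. The following is a minimal sketch on a hypothetical three-element array named demo (not part of the example above). It uses np.isnan to count NaN entries and np.nanmin, NumPy's NaN-aware counterpart of min().

```python
import numpy as np

# Small hypothetical array, used only for illustration
demo = np.array([1.0, np.nan, 3.0])

nan_count = int(np.isnan(demo).sum())  # number of NaN entries
plain_min = demo.min()                 # NaN propagates through min()
safe_min = np.nanmin(demo)             # NaN-aware reduction skips NaN

print(nan_count, plain_min, safe_min)
```

If only a reduction such as the minimum or maximum is needed, np.nanmin and np.nanmax may be enough on their own, with no replacement step.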

To replace NaN values with zero, use the np.nan_to_num() function. It takes an array and, by default, returns a new array in which every NaN has been replaced by 0.0. It also replaces positive and negative infinity with very large finite numbers. The optional nan keyword argument (available since NumPy 1.17) sets a replacement value other than zero, which can be useful in certain situations. To use the function, pass the array containing the NaN values as the argument. For example:

new_A = np.nan_to_num(A)

new_A now contains:

array([[ 0.        , 75.56428758, 55.87945476, 43.87926358,  4.04167764,
         0.        ],
       [-4.7724749 ,  0.        , 44.10035106, 53.726532  , -8.14739551,
        77.29188669],
       [64.91983767,  0.        ,  6.36424705,  0.        , 17.38180187,
        37.22807885],
       [28.87505168, 16.21062262,  0.        ,  2.55444746, 16.29301837,
        22.9725659 ]])

As you can see, all NaN values have been replaced with zeros, while the original array A is left unchanged.

The minimum and maximum are now finite:

print( new_A.min() )
print( new_A.max() )

gives

-8.14739551337778
77.29188669457949
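Because nan_to_num() also touches infinities, it is worth seeing what happens to them. The sketch below uses a hypothetical array demo that contains NaN and both infinities. The posinf and neginf keyword arguments are part of np.nan_to_num (NumPy 1.17 and later); the values 1e6 and -1e6 are arbitrary choices for illustration.

```python
import numpy as np

# Hypothetical input containing NaN and both infinities
demo = np.array([np.nan, np.inf, -np.inf, 2.5])

# Defaults: NaN -> 0.0, +inf -> largest finite float64,
# -inf -> most negative finite float64
default = np.nan_to_num(demo)

# Explicit replacements for each special value (arbitrary illustrative values)
custom = np.nan_to_num(demo, nan=0.0, posinf=1e6, neginf=-1e6)

print(default)
print(custom)
```

If infinities carry meaning in your data, passing explicit posinf and neginf values makes the replacement visible instead of relying on the defaults.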

Replacing nan with a given value

For example, to replace NaN with -999:

new_A = np.nan_to_num(A, nan=-999)

new_A now contains:

array([[-999.        ,   75.56428758,   55.87945476,   43.87926358,
           4.04167764, -999.        ],
       [  -4.7724749 , -999.        ,   44.10035106,   53.726532  ,
          -8.14739551,   77.29188669],
       [  64.91983767, -999.        ,    6.36424705, -999.        ,
          17.38180187,   37.22807885],
       [  28.87505168,   16.21062262, -999.        ,    2.55444746,
          16.29301837,   22.9725659 ]])
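When a copy is not wanted, the replacement can also be done in place. The following sketch shows two possible ways on hypothetical arrays B and C: boolean-mask assignment with np.isnan, and the copy=False argument of np.nan_to_num. Per the NumPy documentation, copy=False only works in place when the input does not need to be converted, for example when it is already a floating-point array.

```python
import numpy as np

# Hypothetical arrays, used only for illustration
B = np.array([1.0, np.nan, 3.0, np.nan])
C = np.array([np.nan, 2.0])

# Option 1: boolean-mask assignment overwrites the NaN entries of B itself
B[np.isnan(B)] = -999

# Option 2: nan_to_num with copy=False writes into its float input
np.nan_to_num(C, copy=False, nan=-999)

print(B)
print(C)
```

Both options modify the original data, so keep a copy first if the NaN positions are still needed later.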

Using a mask array

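Using a masked array can sometimes be a better approach than replacing NaN values, because the invalid entries are excluded from computations instead of being replaced by an arbitrary number.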

import numpy.ma as ma

ma.masked_invalid(A)

returns

masked_array(
  data=[[--, 75.56428757689245, 55.87945476302646, 43.8792635777333,
         4.041677639819287, --],
        [-4.772474904862048, --, 44.10035105688879, 53.726532001644095,
         -8.14739551337778, 77.29188669457949],
        [64.91983767203796, --, 6.364247048639054, --,
         17.381801866358394, 37.228078846901404],
        [28.875051677790417, 16.210622617823773, --, 2.5544474586837644,
         16.293018368169633, 22.97256589643225]],
  mask=[[ True, False, False, False, False,  True],
        [False,  True, False, False, False, False],
        [False,  True, False,  True, False, False],
        [False, False,  True, False, False, False]],
  fill_value=1e+20)

The masked entries are ignored when computing the minimum and maximum:

print( ma.masked_invalid(A).min() )
print( ma.masked_invalid(A).max() )

gives

-8.14739551337778
77.29188669457949
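Other reductions on a masked array skip the masked entries in the same way, and the mask can be turned back into a plain array when needed. Here is a minimal sketch on a hypothetical 2x2 array demo, using the count(), mean() and filled() methods of numpy.ma masked arrays.

```python
import numpy as np
import numpy.ma as ma

# Hypothetical 2x2 array with one NaN, used only for illustration
demo = np.array([[1.0, np.nan], [3.0, 5.0]])
masked = ma.masked_invalid(demo)

valid_count = masked.count()   # number of unmasked entries
mean_valid = masked.mean()     # mean over unmasked entries only
filled = masked.filled(0.0)    # plain ndarray, masked entries set to 0.0

print(valid_count, mean_valid)
print(filled)
```

Here the mean is taken over the three valid values only, whereas replacing NaN with zero first would pull the mean toward zero.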

References

nan_to_num (numpy.org)
numpy.ma.masked_invalid (numpy.org)