NumPy provides a convenient way to replace NaN (Not a Number) values with zero. This matters when working with numerical data, because NaN values propagate through calculations and can produce misleading results.
Using the NumPy nan_to_num() function
Let's first create an array with NaN values.
import numpy as np
np.random.seed(42)
A = np.random.uniform(-10,80, size=(4,6))
n = 6
index = np.random.choice(A.size, n, replace=False)
A.ravel()[index] = np.nan
print(A)
returns
[[ nan 75.56428758 55.87945476 43.87926358 4.04167764 nan]
[-4.7724749 nan 44.10035106 53.726532 -8.14739551 77.29188669]
[64.91983767 nan 6.36424705 nan 17.38180187 37.22807885]
[28.87505168 16.21062262 nan 2.55444746 16.29301837 22.9725659 ]]
Note that the minimum and maximum of an array containing NaN are themselves NaN:
print( A.min() )
print( A.max() )
returns
nan
nan
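The result above follows from NaN propagating through reductions such as min() and max(). The following is a minimal sketch, using a small hand-made array B (illustrative values, not the array A above), of how one might count the NaN entries with np.isnan and skip them with np.nanmin and np.nanmax:

```python
import numpy as np

# Small hand-made array (illustrative values, not the article's A)
B = np.array([[1.0, np.nan, 3.0],
              [np.nan, -2.0, 5.0]])

nan_count = int(np.isnan(B).sum())  # number of NaN entries
plain_min = B.min()                 # NaN propagates through min()
safe_min = np.nanmin(B)             # minimum of the non-NaN entries
safe_max = np.nanmax(B)             # maximum of the non-NaN entries

print(nan_count, plain_min, safe_min, safe_max)
```

These functions leave the array untouched, so they are an option when only a few reductions are needed and the NaN entries should stay in place.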
To replace NaN values with zero, use the np.nan_to_num() function. By default it returns a new array in which every NaN is replaced by 0.0; it also replaces positive and negative infinity with very large finite numbers. An optional nan parameter lets you choose a different replacement value, which is useful when zero is itself a meaningful value in the data. To use the function, pass the array containing the NaN values as the argument. For example:
new_A = np.nan_to_num(A)
new_A now contains:
array([[ 0. , 75.56428758, 55.87945476, 43.87926358, 4.04167764,
0. ],
[-4.7724749 , 0. , 44.10035106, 53.726532 , -8.14739551,
77.29188669],
[64.91983767, 0. , 6.36424705, 0. , 17.38180187,
37.22807885],
[28.87505168, 16.21062262, 0. , 2.55444746, 16.29301837,
22.9725659 ]])
As you can see, all NaN values have been replaced with zeros.
The minimum and maximum are now finite numbers:
print( new_A.min() )
print( new_A.max() )
gives
-8.14739551337778
77.29188669457949
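Two details of np.nan_to_num() are easy to miss: by default it works on a copy, and it also replaces infinities. The sketch below illustrates both on a small hand-made array B (illustrative values, not the array A above); copy=False asks NumPy to modify the input in place.

```python
import numpy as np

# Illustrative array with NaN and both infinities
B = np.array([np.nan, 1.5, np.inf, -np.inf])

C = np.nan_to_num(B)                      # default copy=True: B is untouched
original_still_nan = bool(np.isnan(B[0]))

# Infinities become the largest / most negative finite float64 values
posinf_replaced = C[2] == np.finfo(np.float64).max
neginf_replaced = C[3] == np.finfo(np.float64).min

np.nan_to_num(B, copy=False)              # in-place replacement in B itself
print(original_still_nan, C[0], posinf_replaced, neginf_replaced, B[0])
```

Working on a copy is the safer default when the original data is still needed; the in-place form avoids allocating a second array for large inputs.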
Replacing nan with a given value
For example, let's replace NaN with -999 using the nan keyword argument (available in NumPy 1.17 and later):
new_A = np.nan_to_num(A,nan=-999)
new_A now contains:
array([[-999. , 75.56428758, 55.87945476, 43.87926358,
4.04167764, -999. ],
[ -4.7724749 , -999. , 44.10035106, 53.726532 ,
-8.14739551, 77.29188669],
[ 64.91983767, -999. , 6.36424705, -999. ,
17.38180187, 37.22807885],
[ 28.87505168, 16.21062262, -999. , 2.55444746,
16.29301837, 22.9725659 ]])
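Alongside nan, np.nan_to_num() accepts posinf and neginf keywords for the infinities. The sketch below shows them together on a hand-made array B, and also a boolean-mask assignment that changes only the NaN entries; the replacement values (-999, 1e6, -1e6) are arbitrary choices for illustration.

```python
import numpy as np

# Illustrative array with NaN and both infinities
B = np.array([np.nan, 2.0, np.inf, -np.inf])

# Choose a replacement for each kind of non-finite value
D = np.nan_to_num(B, nan=-999.0, posinf=1e6, neginf=-1e6)
print(D)

# Alternative: boolean-mask assignment touches only the NaN entries,
# leaving the infinities as they are
E = B.copy()
E[np.isnan(E)] = -999.0
print(E)
```

The mask-based form is useful when infinities carry meaning and should be preserved.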
Using a mask array
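A masked array is sometimes a better approach than replacing NaN values, because the invalid entries are excluded from calculations instead of being replaced by a value that could skew the results.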
import numpy.ma as ma
ma.masked_invalid(A)
returns
masked_array(
data=[[--, 75.56428757689245, 55.87945476302646, 43.8792635777333,
4.041677639819287, --],
[-4.772474904862048, --, 44.10035105688879, 53.726532001644095,
-8.14739551337778, 77.29188669457949],
[64.91983767203796, --, 6.364247048639054, --,
17.381801866358394, 37.228078846901404],
[28.875051677790417, 16.210622617823773, --, 2.5544474586837644,
16.293018368169633, 22.97256589643225]],
mask=[[ True, False, False, False, False, True],
[False, True, False, False, False, False],
[False, True, False, True, False, False],
[False, False, True, False, False, False]],
fill_value=1e+20)
The masked entries are ignored when computing the minimum and maximum:
print( ma.masked_invalid(A).min() )
print( ma.masked_invalid(A).max() )
gives
-8.14739551337778
77.29188669457949
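Other reductions on a masked array skip the masked entries in the same way. A minimal sketch on a small hand-made array B (illustrative values, not the array A above), which also shows how filled() converts back to a regular array once a replacement value has been chosen:

```python
import numpy as np
import numpy.ma as ma

# Small hand-made array (illustrative values, not the article's A)
B = np.array([[1.0, np.nan, 3.0],
              [np.nan, -2.0, 5.0]])

M = ma.masked_invalid(B)

valid_count = M.count()   # number of unmasked (valid) entries
mean_valid = M.mean()     # mean over the valid entries only
filled = M.filled(0.0)    # plain ndarray with masked entries set to 0.0

print(valid_count, mean_valid)
print(filled)
```

Here the mean is computed over the four valid entries only, whereas replacing NaN with zero first would divide the same sum by six.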
References
Links | Site |
---|---|
nan_to_num | numpy.org |
numpy.ma.masked_invalid | numpy.org |