Let's consider a 2D NumPy array with 6 rows and 4 columns. Our objective is to eliminate rows that contain the value -999:
Create a 2D NumPy array
import numpy as np
data = np.array([[44., 99., 2., 93.],
[51., 72., 75., 28.],
[89., -999., 17., 17.],
[73., 11., 81., 3.],
[83., 59., 41., -999.],
[34., 94., 51., 84.]])
Using any()
If the objective is to eliminate any rows that contain the value -999, regardless of the column in which it appears, a solution is to use any():
(data == -999.).any(axis=1)
returns
array([False, False, True, False, True, False])
since the rows with index 2 and 4 contain a value of -999 (keeping in mind that Python uses 0-based indexing).
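To check which rows the mask flags, the Boolean array can be turned into row positions. The following is a minimal sketch using np.flatnonzero; the names has_sentinel and bad_rows are only illustrative:

```python
import numpy as np

data = np.array([[44., 99., 2., 93.],
                 [51., 72., 75., 28.],
                 [89., -999., 17., 17.],
                 [73., 11., 81., 3.],
                 [83., 59., 41., -999.],
                 [34., 94., 51., 84.]])

# Boolean mask: True for every row holding at least one -999
has_sentinel = (data == -999.).any(axis=1)

# Convert the mask into the positions of the flagged rows
bad_rows = np.flatnonzero(has_sentinel)
print(bad_rows)  # [2 4]
```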
To keep only the rows without -999, invert the Boolean mask with the ~ operator, which acts as an element-wise logical NOT on Boolean arrays:
~(data == -999.).any(axis=1)
which gives
array([ True, True, False, True, False, True])
and define a new array:
new_data = data[ ~(data == -999.).any(axis=1) , : ]
new_data
which returns:
array([[44., 99., 2., 93.],
[51., 72., 75., 28.],
[73., 11., 81., 3.],
[34., 94., 51., 84.]])
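The steps above can be wrapped in a small reusable function. This is only a sketch: drop_rows_with_value is a hypothetical helper name, not a NumPy function, and it assumes a 2D input array.

```python
import numpy as np

def drop_rows_with_value(arr, value):
    """Return a copy of the 2D array arr without the rows that contain value."""
    # True for rows where no element equals value
    keep = ~(arr == value).any(axis=1)
    # Boolean indexing builds a new array; arr itself is not modified
    return arr[keep]

data = np.array([[44., 99., 2., 93.],
                 [51., 72., 75., 28.],
                 [89., -999., 17., 17.],
                 [73., 11., 81., 3.],
                 [83., 59., 41., -999.],
                 [34., 94., 51., 84.]])

new_data = drop_rows_with_value(data, -999.)
print(new_data.shape)  # (4, 4)
```

Because Boolean indexing produces a copy, the original data array still has its 6 rows after the call.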
Filter out rows based on a specific condition in a particular column
For example, to remove the rows that have -999. in the column at index 1 (the second column):
data[ data[:,1] != -999. ]
returns
array([[ 44., 99., 2., 93.],
[ 51., 72., 75., 28.],
[ 73., 11., 81., 3.],
[ 83., 59., 41., -999.],
[ 34., 94., 51., 84.]])
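An alternative to Boolean masking, shown here as a sketch, is to compute the positions of the unwanted rows and pass them to np.delete. The names rows_to_drop and filtered are illustrative:

```python
import numpy as np

data = np.array([[44., 99., 2., 93.],
                 [51., 72., 75., 28.],
                 [89., -999., 17., 17.],
                 [73., 11., 81., 3.],
                 [83., 59., 41., -999.],
                 [34., 94., 51., 84.]])

# Positions of the rows whose second column (index 1) holds -999
rows_to_drop = np.flatnonzero(data[:, 1] == -999.)

# np.delete returns a new array without those rows; data is left unchanged
filtered = np.delete(data, rows_to_drop, axis=0)

# Same result as the Boolean-mask version data[data[:, 1] != -999.]
print(np.array_equal(filtered, data[data[:, 1] != -999.]))  # True
```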
Filter out rows based on a specific condition in multiple columns
Using a condition with the operator &
data[ (data[:, 1] != -999.) & (data[:, 3] != -999.) ]
returns
array([[44., 99., 2., 93.],
[51., 72., 75., 28.],
[73., 11., 81., 3.],
[34., 94., 51., 84.]])
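Note that each comparison in the expression above is wrapped in parentheses because & binds more tightly than != in Python. As a sketch of an equivalent form that avoids the precedence issue, np.logical_and can combine the two conditions (the name keep is illustrative):

```python
import numpy as np

data = np.array([[44., 99., 2., 93.],
                 [51., 72., 75., 28.],
                 [89., -999., 17., 17.],
                 [73., 11., 81., 3.],
                 [83., 59., 41., -999.],
                 [34., 94., 51., 84.]])

# Element-wise AND of the two per-column conditions
keep = np.logical_and(data[:, 1] != -999., data[:, 3] != -999.)

filtered = data[keep]
print(filtered.shape)  # (4, 4)
```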
Using any()
Notice that:
(data[:, [1,3]] == -999.).any(axis=1)
returns
array([False, False, True, False, True, False])
Then
data[ ~(data[:, [1,3]] == -999.).any(axis=1), : ]
returns
array([[44., 99., 2., 93.],
[51., 72., 75., 28.],
[73., 11., 81., 3.],
[34., 94., 51., 84.]])
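The same mask can also be written without the ~ operator: a row has no -999 in the selected columns exactly when all of its selected values differ from -999. The sketch below compares the two forms; cols, keep_any, and keep_all are illustrative names:

```python
import numpy as np

data = np.array([[44., 99., 2., 93.],
                 [51., 72., 75., 28.],
                 [89., -999., 17., 17.],
                 [73., 11., 81., 3.],
                 [83., 59., 41., -999.],
                 [34., 94., 51., 84.]])

cols = [1, 3]  # columns to check (second and fourth)

# "Not any value equals -999" ...
keep_any = ~(data[:, cols] == -999.).any(axis=1)
# ... is the same as "all values differ from -999"
keep_all = (data[:, cols] != -999.).all(axis=1)

print(np.array_equal(keep_any, keep_all))  # True
```

Changing the cols list is enough to check a different set of columns.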
References
Links | Site |
---|---|
numpy.any | numpy.org |
numpy.invert | numpy.org |