Examples of how to remove rows from a NumPy array in different cases:
Create a 2d numpy array
Let's first create a basic 2D array with NumPy:
import numpy as np
data = np.array([[44., 99., 2., 93.],
                 [51., 72., 75., 28.],
                 [89., -999., 17., 17.],
                 [73., 11., 81., 3.],
                 [83., 59., 41., -999.],
                 [34., 94., 51., 84.]])
Remove rows using delete()
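The easiest way to remove rows from a NumPy array is the numpy.delete function. It takes the array, the position(s) of the elements to remove (an int, a slice, or a list/array of ints), and an axis. Pass axis=0 to remove rows; if axis is omitted, the array is flattened before the deletion. numpy.delete does not modify its input: it returns a new array.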
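The axis argument changes what an index refers to. This minimal sketch compares deleting index 2 along axis 0 (a whole row) with deleting it without an axis (a single element of the flattened array); the variable names are illustrative only.

```python
import numpy as np

data = np.array([[44., 99., 2., 93.],
                 [51., 72., 75., 28.],
                 [89., -999., 17., 17.],
                 [73., 11., 81., 3.],
                 [83., 59., 41., -999.],
                 [34., 94., 51., 84.]])

# axis=0: index 2 is the third row, so the result has 5 rows of 4 columns
rows_removed = np.delete(data, 2, axis=0)
print(rows_removed.shape)  # (5, 4)

# No axis: the 6x4 array is flattened to 24 values and the third value (2.) is dropped
flat_removed = np.delete(data, 2)
print(flat_removed.shape)  # (23,)
```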
Remove a specific row based on its index
To remove a specific row with a given index, a solution is to use delete() with axis=0:
np.delete(data, 2, axis=0)
returns
array([[ 44., 99., 2., 93.],
[ 51., 72., 75., 28.],
[ 73., 11., 81., 3.],
[ 83., 59., 41., -999.],
[ 34., 94., 51., 84.]])
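Negative indices count from the end, as in ordinary Python indexing. A short sketch that drops the last row (the name without_last is just for illustration):

```python
import numpy as np

data = np.array([[44., 99., 2., 93.],
                 [51., 72., 75., 28.],
                 [89., -999., 17., 17.],
                 [73., 11., 81., 3.],
                 [83., 59., 41., -999.],
                 [34., 94., 51., 84.]])

# -1 refers to the last row, so the row [34., 94., 51., 84.] is removed
without_last = np.delete(data, -1, axis=0)
print(without_last[-1].tolist())  # [83.0, 59.0, 41.0, -999.0]
```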
Remove multiple rows
To remove multiple rows, pass a list of row indices:
np.delete(data, [2, 4], axis=0)
returns
array([[44., 99., 2., 93.],
[51., 72., 75., 28.],
[73., 11., 81., 3.],
[34., 94., 51., 84.]])
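Because np.delete returns a new array, the original data is left unchanged, and a slice object can stand in for a list of indices. A minimal sketch, using np.s_ to build the slice:

```python
import numpy as np

data = np.array([[44., 99., 2., 93.],
                 [51., 72., 75., 28.],
                 [89., -999., 17., 17.],
                 [73., 11., 81., 3.],
                 [83., 59., 41., -999.],
                 [34., 94., 51., 84.]])

# np.delete does not work in place: data keeps its 6 rows
trimmed = np.delete(data, [2, 4], axis=0)
print(data.shape, trimmed.shape)  # (6, 4) (4, 4)

# np.s_[1:3] is slice(1, 3): rows 1 and 2 are removed
sliced = np.delete(data, np.s_[1:3], axis=0)
print(sliced.shape)  # (4, 4)
```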
Remove rows based on a condition
Remove rows using any()
To remove every row that contains the value -999, whatever column it appears in, first build a boolean mask with any(axis=1), which is True for each row where at least one element matches:
(data == -999.).any(axis=1)
returns
array([False, False, True, False, True, False])
since the rows with index 2 and 4 contain a value of -999 (keeping in mind that Python uses 0-based indexing).
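The positions of the True values can be recovered with np.flatnonzero, which links this mask back to np.delete. A short sketch (the names mask, row_indices and cleaned are illustrative):

```python
import numpy as np

data = np.array([[44., 99., 2., 93.],
                 [51., 72., 75., 28.],
                 [89., -999., 17., 17.],
                 [73., 11., 81., 3.],
                 [83., 59., 41., -999.],
                 [34., 94., 51., 84.]])

mask = (data == -999.).any(axis=1)

# Indices of the rows where the mask is True
row_indices = np.flatnonzero(mask)
print(row_indices)  # [2 4]

# Deleting those indices gives the same result as keeping the rows where mask is False
cleaned = np.delete(data, row_indices, axis=0)
print(np.array_equal(cleaned, data[~mask]))  # True
```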
To keep only the rows without -999, invert the mask with the ~ operator (an element-wise NOT, equivalent to numpy.invert on boolean arrays):
~(data == -999.).any(axis=1)
which gives
array([ True, True, False, True, False, True])
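On a boolean array, ~, np.invert and np.logical_not all give the same result. On integer arrays, however, ~ is a bitwise NOT, so it only acts as a logical NOT when the mask really has a boolean dtype. A quick check:

```python
import numpy as np

data = np.array([[44., 99., 2., 93.],
                 [51., 72., 75., 28.],
                 [89., -999., 17., 17.],
                 [73., 11., 81., 3.],
                 [83., 59., 41., -999.],
                 [34., 94., 51., 84.]])

mask = (data == -999.).any(axis=1)  # boolean dtype

# Three equivalent ways to invert a boolean mask
keep_tilde = ~mask
keep_invert = np.invert(mask)
keep_not = np.logical_not(mask)
print(np.array_equal(keep_tilde, keep_invert) and np.array_equal(keep_tilde, keep_not))  # True

# On integers, ~ is bitwise: ~0 == -1 and ~1 == -2, not a logical NOT
print((~np.array([0, 1])).tolist())  # [-1, -2]
```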
and use it to select the rows of a new array:
new_data = data[~(data == -999.).any(axis=1), :]
new_data
which gives:
array([[44., 99., 2., 93.],
[51., 72., 75., 28.],
[73., 11., 81., 3.],
[34., 94., 51., 84.]])
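If the same cleanup is needed for several sentinel values, the expression can be wrapped in a small function. drop_rows_containing is a hypothetical helper name, not part of NumPy; the trailing ", :" used above is optional, since indexing with a 1D boolean mask already selects whole rows.

```python
import numpy as np

def drop_rows_containing(arr, value):
    """Hypothetical helper: return a copy of arr without any row containing value."""
    # True for rows where at least one element equals value
    has_value = (arr == value).any(axis=1)
    # Keep the rows where the mask is False
    return arr[~has_value]

data = np.array([[44., 99., 2., 93.],
                 [51., 72., 75., 28.],
                 [89., -999., 17., 17.],
                 [73., 11., 81., 3.],
                 [83., 59., 41., -999.],
                 [34., 94., 51., 84.]])

print(drop_rows_containing(data, -999.).shape)  # (4, 4)
print(drop_rows_containing(data, 17.).shape)    # (5, 4)
```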
Filter out rows based on a specific condition in a particular column
For example, to remove the rows that have a -999. value in the column of index 1 (i.e. the second column):
data[data[:, 1] != -999.]
returns
array([[ 44., 99., 2., 93.],
[ 51., 72., 75., 28.],
[ 73., 11., 81., 3.],
[ 83., 59., 41., -999.],
[ 34., 94., 51., 84.]])
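The same result can be reached with np.delete by first converting the condition into row indices with np.where. This sketch simply checks that both approaches agree:

```python
import numpy as np

data = np.array([[44., 99., 2., 93.],
                 [51., 72., 75., 28.],
                 [89., -999., 17., 17.],
                 [73., 11., 81., 3.],
                 [83., 59., 41., -999.],
                 [34., 94., 51., 84.]])

# np.where on a 1D condition returns a tuple; [0] holds the matching row indices
bad_rows = np.where(data[:, 1] == -999.)[0]
via_delete = np.delete(data, bad_rows, axis=0)

# Boolean-mask version, as in the text
via_mask = data[data[:, 1] != -999.]

print(np.array_equal(via_delete, via_mask))  # True
```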
Filter out rows based on a specific condition in multiple columns
Combine the conditions with the & operator. Each comparison must be wrapped in parentheses, because & has higher precedence than !=:
data[ (data[:, 1] != -999.) & (data[:, 3] != -999.) ]
returns
array([[44., 99., 2., 93.],
[51., 72., 75., 28.],
[73., 11., 81., 3.],
[34., 94., 51., 84.]])
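np.logical_and is a function form of the same element-wise "and" that avoids the precedence pitfall, since each condition is passed as a separate argument. A minimal comparison:

```python
import numpy as np

data = np.array([[44., 99., 2., 93.],
                 [51., 72., 75., 28.],
                 [89., -999., 17., 17.],
                 [73., 11., 81., 3.],
                 [83., 59., 41., -999.],
                 [34., 94., 51., 84.]])

# Function form: no extra parentheses needed around each comparison
cond = np.logical_and(data[:, 1] != -999., data[:, 3] != -999.)
via_function = data[cond]

# Operator form from the text
via_operator = data[(data[:, 1] != -999.) & (data[:, 3] != -999.)]

print(np.array_equal(via_function, via_operator))  # True
```

For an "or" between conditions, | and np.logical_or play the same roles.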
Using any()
Alternatively, select only the columns of interest and apply any() to them:
(data[:, [1,3]] == -999.).any(axis=1)
returns
array([False, False, True, False, True, False])
Then inverting this mask with ~ keeps the rows that have no -999 in those columns:
data[~(data[:, [1, 3]] == -999.).any(axis=1), :]
returns
array([[44., 99., 2., 93.],
[51., 72., 75., 28.],
[73., 11., 81., 3.],
[34., 94., 51., 84.]])
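By De Morgan's law, "not any value equals -999" is the same as "all values differ from -999", so all() with != gives an equivalent mask. A short sketch, with the list of columns kept in a variable (cols is an illustrative name) so it is easy to change:

```python
import numpy as np

data = np.array([[44., 99., 2., 93.],
                 [51., 72., 75., 28.],
                 [89., -999., 17., 17.],
                 [73., 11., 81., 3.],
                 [83., 59., 41., -999.],
                 [34., 94., 51., 84.]])

cols = [1, 3]  # columns to check

# Mask from the text: invert "any value equals -999"
keep_any = ~(data[:, cols] == -999.).any(axis=1)

# Equivalent mask: "all values differ from -999"
keep_all = (data[:, cols] != -999.).all(axis=1)

print(np.array_equal(keep_any, keep_all))  # True
print(data[keep_all].shape)                # (4, 4)
```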
References
Links | Site |
---|---|
numpy.delete | numpy.org |
numpy.any | numpy.org |
numpy.invert | numpy.org |