How to remove rows from a NumPy array in Python?

Published: September 18, 2023

Tags: Python; Numpy;


Examples of how to remove rows from a numpy array for different cases:

Create a 2d numpy array

Let's first create a basic 2D array with NumPy:

import numpy as np

data = np.array([[44., 99.,  2., 93.],
                 [51., 72., 75., 28.],
                 [89.,  -999., 17., 17.],
                 [73., 11., 81.,  3.],
                 [83., 59., 41.,  -999.],
                 [34., 94., 51., 84.]])

Remove rows using delete()

The easiest way to remove rows from a NumPy array is the numpy.delete function. It takes an array, the index or indices to remove, and an optional axis (axis=0 for rows), and returns a new array; the original is left unchanged. If axis is omitted, the array is flattened before the deletion.

Remove a specific row based on its index

To remove a specific row with a given index, a solution is to use delete():

np.delete(data, 2, axis=0)

returns

array([[  44.,   99.,    2.,   93.],
       [  51.,   72.,   75.,   28.],
       [  73.,   11.,   81.,    3.],
       [  83.,   59.,   41., -999.],
       [  34.,   94.,   51.,   84.]])
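As a quick check, the sketch below confirms that np.delete returns a new array and leaves the original untouched, and that a negative index counts from the end. The variable names without_row_2 and without_last are illustrative only.

```python
import numpy as np

data = np.array([[44., 99.,  2., 93.],
                 [51., 72., 75., 28.],
                 [89., -999., 17., 17.],
                 [73., 11., 81.,  3.],
                 [83., 59., 41., -999.],
                 [34., 94., 51., 84.]])

# np.delete returns a new array; `data` itself is not modified
without_row_2 = np.delete(data, 2, axis=0)
print(data.shape)           # (6, 4)
print(without_row_2.shape)  # (5, 4)

# A negative index counts from the end: -1 removes the last row
without_last = np.delete(data, -1, axis=0)
print(without_last.shape)   # (5, 4)
```

To keep the result, assign it to a name (possibly the same one, as in data = np.delete(data, 2, axis=0)).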

Remove multiple rows

To remove multiple rows:

np.delete(data, (2,4), axis=0)

returns

array([[44., 99.,  2., 93.],
       [51., 72., 75., 28.],
       [73., 11., 81.,  3.],
       [34., 94., 51., 84.]])
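The indices do not have to be a tuple. As a minimal sketch, the example below passes a list and a slice to np.delete, and also builds an equivalent boolean mask by hand; rows_to_drop and keep are hypothetical names chosen for illustration.

```python
import numpy as np

data = np.array([[44., 99.,  2., 93.],
                 [51., 72., 75., 28.],
                 [89., -999., 17., 17.],
                 [73., 11., 81.,  3.],
                 [83., 59., 41., -999.],
                 [34., 94., 51., 84.]])

rows_to_drop = [2, 4]

# A list of indices works the same way as a tuple
from_list = np.delete(data, rows_to_drop, axis=0)

# A slice removes a contiguous range: here rows 1 and 2
from_slice = np.delete(data, slice(1, 3), axis=0)

# Equivalent boolean mask: True for rows to keep, False for rows to drop
keep = np.ones(data.shape[0], dtype=bool)
keep[rows_to_drop] = False
from_mask = data[keep]

print(np.array_equal(from_list, from_mask))  # True
print(from_slice.shape)                      # (4, 4)
```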

Remove rows based on a column condition

Remove rows using any()

If the objective is to eliminate any rows that contain the value -999, regardless of the column in which it appears, a solution is to use any():

(data == -999.).any(axis=1)

returns here

array([False, False,  True, False,  True, False])

since the rows with index 2 and 4 contain a value of -999 (keeping in mind that Python uses 0-based indexing).

Now, to keep only the rows with no -999, a solution is to negate the boolean mask with the ~ operator, which NumPy applies element-wise (numpy.invert):

~(data == -999.).any(axis=1)

which gives

array([ True,  True, False,  True, False,  True])

and define a new array:

new_data = data[ ~(data == -999.).any(axis=1) , :  ]

new_data

which returns:

array([[44., 99.,  2., 93.],
       [51., 72., 75., 28.],
       [73., 11., 81.,  3.],
       [34., 94., 51., 84.]])
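The boolean mask can also be combined with delete(). The following sketch, which assumes the same data array as above, converts the mask into row indices with np.flatnonzero and passes them to np.delete; bad_rows is an illustrative name.

```python
import numpy as np

data = np.array([[44., 99.,  2., 93.],
                 [51., 72., 75., 28.],
                 [89., -999., 17., 17.],
                 [73., 11., 81.,  3.],
                 [83., 59., 41., -999.],
                 [34., 94., 51., 84.]])

# True for every row that contains -999 in any column
mask = (data == -999.).any(axis=1)

# Indices of the rows where the mask is True
bad_rows = np.flatnonzero(mask)
print(bad_rows)  # [2 4]

# Deleting those indices gives the same result as data[~mask]
new_data = np.delete(data, bad_rows, axis=0)
print(np.array_equal(new_data, data[~mask]))  # True
```

Both approaches give the same array; indexing with the mask is shorter, while the index form is handy when the row numbers themselves are needed, for example for logging.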

Filter out rows based on a specific condition in a particular column

For example, to remove the rows that have a -999. value in the column of index 1 (the second column):

data[ data[:,1] != -999. ]

returns

array([[  44.,   99.,    2.,   93.],
       [  51.,   72.,   75.,   28.],
       [  73.,   11.,   81.,    3.],
       [  83.,   59.,   41., -999.],
       [  34.,   94.,   51.,   84.]])
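Storing the mask in a variable makes it easy to inspect what is being filtered. The sketch below does that and also counts the removed rows; the second filter (values above 50) is a made-up condition, included only to show that the same pattern works with any comparison.

```python
import numpy as np

data = np.array([[44., 99.,  2., 93.],
                 [51., 72., 75., 28.],
                 [89., -999., 17., 17.],
                 [73., 11., 81.,  3.],
                 [83., 59., 41., -999.],
                 [34., 94., 51., 84.]])

# True for rows to keep: column 1 is not the -999 sentinel
keep = data[:, 1] != -999.
print(keep.tolist())  # [True, True, False, True, True, True]

filtered = data[keep]

# Number of rows dropped by the filter
n_removed = int((~keep).sum())
print(n_removed)  # 1

# Same pattern with another (illustrative) condition on column 1
above_50 = data[data[:, 1] > 50.]
print(above_50.shape)  # (4, 4)
```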

Filter out rows based on a specific condition in multiple columns

Using a condition with the operator &

data[ (data[:, 1] != -999.) & (data[:, 3] != -999.) ]

returns

array([[44., 99.,  2., 93.],
       [51., 72., 75., 28.],
       [73., 11., 81.,  3.],
       [34., 94., 51., 84.]])
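The parentheses around each comparison are required, because & binds more tightly than != in Python. As a sketch of an alternative that avoids the precedence issue, np.logical_and can be used instead of the operator; the names with_operator and with_function are illustrative.

```python
import numpy as np

data = np.array([[44., 99.,  2., 93.],
                 [51., 72., 75., 28.],
                 [89., -999., 17., 17.],
                 [73., 11., 81.,  3.],
                 [83., 59., 41., -999.],
                 [34., 94., 51., 84.]])

# Operator form: each comparison must be wrapped in parentheses
with_operator = data[(data[:, 1] != -999.) & (data[:, 3] != -999.)]

# Function form: the two conditions are passed as separate arguments
with_function = data[np.logical_and(data[:, 1] != -999.,
                                    data[:, 3] != -999.)]

print(np.array_equal(with_operator, with_function))  # True
```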

Using any()

Notice that:

(data[:, [1,3]] == -999.).any(axis=1)

returns

array([False, False,  True, False,  True, False])

Then

data[ ~(data[:, [1,3]] == -999.).any(axis=1), :  ]

returns

array([[44., 99.,  2., 93.],
       [51., 72., 75., 28.],
       [73., 11., 81.,  3.],
       [34., 94., 51., 84.]])
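An equivalent way to write the same filter is with all(): a row has no -999 in the selected columns exactly when every selected value differs from -999. The sketch below checks that both masks agree; cols, keep_any and keep_all are illustrative names.

```python
import numpy as np

data = np.array([[44., 99.,  2., 93.],
                 [51., 72., 75., 28.],
                 [89., -999., 17., 17.],
                 [73., 11., 81.,  3.],
                 [83., 59., 41., -999.],
                 [34., 94., 51., 84.]])

cols = [1, 3]  # columns to check

# "not any value equals -999" ...
keep_any = ~(data[:, cols] == -999.).any(axis=1)

# ... is the same as "all values differ from -999"
keep_all = (data[:, cols] != -999.).all(axis=1)

print(np.array_equal(keep_any, keep_all))  # True
print(data[keep_all].shape)                # (4, 4)
```

This form scales to any number of columns: only the cols list changes.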

References

numpy.delete (numpy.org)
numpy.any (numpy.org)
numpy.invert (numpy.org)