Examples of how to remove one or more rows from a pandas DataFrame in Python
Remove one row
Let's create a simple DataFrame with pandas:
>>> import numpy as np
>>> import pandas as pd
>>> data = np.random.randint(100, size=(10,10))
>>> df = pd.DataFrame(data=data)
>>> df
    0   1   2   3   4   5   6   7   8   9
0  13  44   3  55  13  98  44  87  20  52
1  72  72  96  18  68  84  81  41   8  75
2  53  71  97  27  21  44  61   9  59  87
3  40  94  49   7  56  93  17  62  71  14
4   4  20  56  45  89  78   9  27  58  77
5  71  90  73  79  31  49  14  73  58   1
6  28  77  73  66  90  71  26  51  18  87
7  96  15  18  23  74  82  54  62  24   1
8  32  34  50  57  90  68  20  56  26  78
9  93   0  90  16  24  88  16  90  82  45
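To remove a single row, for example the row labelled 7, one solution is to use drop(). Pass axis as a keyword: recent pandas versions no longer accept it as a positional argument.
>>> df.drop(7, axis=0, inplace=True)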
With inplace=True, drop() modifies df directly and returns None. The DataFrame no longer contains row 7:
>>> df
    0   1   2   3   4   5   6   7   8   9
0  13  44   3  55  13  98  44  87  20  52
1  72  72  96  18  68  84  81  41   8  75
2  53  71  97  27  21  44  61   9  59  87
3  40  94  49   7  56  93  17  62  71  14
4   4  20  56  45  89  78   9  27  58  77
5  71  90  73  79  31  49  14  73  58   1
6  28  77  73  66  90  71  26  51  18  87
8  32  34  50  57  90  68  20  56  26  78
9  93   0  90  16  24  88  16  90  82  45
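As a variation on the example above, drop() can also return a new DataFrame instead of modifying the original. The sketch below is illustrative only: the small frame built with np.arange and the name without_row_2 are assumptions chosen so the result is easy to check, not part of the original example.

```python
import numpy as np
import pandas as pd

# Small deterministic frame (hypothetical data, 4 rows x 3 columns).
df = pd.DataFrame(np.arange(12).reshape(4, 3))

# Without inplace=True, drop() returns a new DataFrame and leaves df unchanged.
without_row_2 = df.drop(index=2)

print(without_row_2.index.tolist())  # [0, 1, 3]
print(df.shape)                      # (4, 3)
```

Keeping the original frame untouched makes it easier to chain operations or to compare the data before and after the removal.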
Remove a list of rows
To remove several rows at once, pass a list of index labels to drop():
>>> df = pd.DataFrame(data=data)
>>> df.drop([1,5,7,9], axis=0, inplace=True)
>>> df
    0   1   2   3   4   5   6   7   8   9
0  13  44   3  55  13  98  44  87  20  52
2  53  71  97  27  21  44  61   9  59  87
3  40  94  49   7  56  93  17  62  71  14
4   4  20  56  45  89  78   9  27  58  77
6  28  77  73  66  90  71  26  51  18  87
8  32  34  50  57  90  68  20  56  26  78
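When a list of labels may contain values that are not in the index, drop() raises a KeyError by default. A minimal sketch of how the errors="ignore" option handles this, assuming a hypothetical 6-row frame and a made-up missing label 42:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with index labels 0..5.
df = pd.DataFrame(np.arange(12).reshape(6, 2), columns=["a", "b"])

# Label 42 does not exist, so a plain drop() raises KeyError.
try:
    df.drop(index=[1, 5, 42])
    raised = False
except KeyError:
    raised = True

# errors="ignore" drops the labels that exist and silently skips the rest.
kept = df.drop(index=[1, 5, 42], errors="ignore")

print(raised)               # True
print(kept.index.tolist())  # [0, 2, 3, 4]
```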
Remove multiple consecutive rows
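To remove a block of consecutive rows, slice the index by position and pass the result to drop(). Here df.index[3:7] selects the labels at positions 3 to 6:
>>> df = pd.DataFrame(data=data)
>>> df.drop(df.index[3:7], axis=0, inplace=True)
>>> df
    0   1   2   3   4   5   6   7   8   9
0  13  44   3  55  13  98  44  87  20  52
1  72  72  96  18  68  84  81  41   8  75
2  53  71  97  27  21  44  61   9  59  87
7  96  15  18  23  74  82  54  62  24   1
8  32  34  50  57  90  68  20  56  26  78
9  93   0  90  16  24  88  16  90  82  45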
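Slicing df.index works by position, not by label, which matters when the index is not the default 0..n-1 range. The sketch below illustrates this with a hypothetical frame that uses string labels; the column name value and the labels a to e are assumptions made for the example.

```python
import pandas as pd

# Hypothetical frame whose index labels are letters rather than integers.
df = pd.DataFrame({"value": [10, 20, 30, 40, 50]}, index=list("abcde"))

# df.index[1:3] picks the labels at positions 1 and 2, i.e. 'b' and 'c'.
labels = df.index[1:3]
trimmed = df.drop(index=labels)

print(list(labels))               # ['b', 'c']
print(trimmed.index.tolist())     # ['a', 'd', 'e']
print(trimmed["value"].tolist())  # [10, 40, 50]
```

The same pattern removes rows 3 to 6 by position from any DataFrame, whatever its labels are.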
Remove rows with missing data
Let's create a dataset with missing data:
>>> data = np.random.randn(10,7)
>>> data.ravel()[np.random.choice(data.size, 5, replace=False)] = np.nan
>>> data
array([[-0.21556193,  0.50798317, -1.40910182, -2.13125538,  1.1835753 ,
         0.45158695,  0.73910367],
       [-0.87888441,  1.05993664, -0.77287598, -0.69139053, -0.29032073,
        -0.64202622, -0.28829388],
       [-1.60249368, -1.50622796,  1.46894158,         nan, -2.7252065 ,
         1.36411611, -0.57278577],
       [ 0.79703402, -1.5212633 ,  0.62016751,         nan,  1.09850942,
        -0.2358472 , -0.00723673],
       [ 0.8763736 , -1.07815499,  1.07747808, -0.20271076, -0.16235893,
                nan, -1.52423974],
       [ 0.27451099, -1.26743679, -0.05715345, -1.10172544,  0.02002978,
        -0.82632633,  0.54550534],
       [ 1.39432992,  0.9903974 , -1.56818002,  1.29163684, -0.393829  ,
         1.73997774,  0.86798373],
       [-0.07952965, -0.09397074,         nan,  1.53816504, -1.05609124,
         1.08434771, -0.2870059 ],
       [-0.41546041,  0.11339261,  0.14015969, -1.46552628,  0.7903862 ,
                nan,  0.08339854],
       [-1.01347812, -1.41749653, -1.0594971 , -0.84758429, -1.11227765,
         0.46318414,  0.94640032]])
Create a dataframe
>>> df = pd.DataFrame(data=data)
Get the index labels of the rows that contain missing data
>>> index_with_nan = df.index[df.isnull().any(axis=1)]
>>> index_with_nan
Int64Index([2, 3, 4, 7, 8], dtype='int64')
Remove the rows with missing data
>>> df.drop(index_with_nan, axis=0, inplace=True)
>>> df
          0         1         2         3         4         5         6
0 -0.215562  0.507983 -1.409102 -2.131255  1.183575  0.451587  0.739104
1 -0.878884  1.059937 -0.772876 -0.691391 -0.290321 -0.642026 -0.288294
5  0.274511 -1.267437 -0.057153 -1.101725  0.020030 -0.826326  0.545505
6  1.394330  0.990397 -1.568180  1.291637 -0.393829  1.739978  0.867984
9 -1.013478 -1.417497 -1.059497 -0.847584 -1.112278  0.463184  0.946400
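pandas also provides dropna(), which removes rows containing missing values in a single call. The sketch below compares it with the index-based approach described above, using a small hypothetical frame (the columns a and b and their values are assumptions made for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with NaN in rows 1 and 2.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [5.0, 6.0, np.nan, 8.0],
})

# Index-based approach: find the labels of rows with any NaN, then drop them.
index_with_nan = df.index[df.isnull().any(axis=1)]
dropped = df.drop(index=index_with_nan)

# dropna() with its defaults removes every row that has at least one NaN.
cleaned = df.dropna()

print(cleaned.index.tolist())   # [0, 3]
print(dropped.equals(cleaned))  # True
```

The index-based version remains useful when the list of affected rows is needed for something else, such as logging or inspecting the rows before removing them.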
References
| Links | Site |
|---|---|
| drop() | pandas doc |
| Python Pandas : How to drop rows in DataFrame by index labels | thispointer.com |
| How to count nan values in a pandas DataFrame? | stackoverflow |
| isnull | pandas doc |
| any | pandas doc |
| Create sample numpy array with randomly placed NaNs | stackoverflow |
