How to count the number of NaN in a pandas dataframe column ?


Examples of how to count the number of NaN (Not a Number) in a pandas dataframe column:

Create a dataframe

Let's create a simple dataframe

import pandas as pd
import numpy as np

height = np.random.randint(130,200, size=20)
weight = np.random.randint(140,300, size=20)
age = np.random.uniform(10,80, size=20)

n = 6  
index = np.random.choice(age.shape[0], n, replace=False)  
age[index] = np.nan

data = {'height':height, 
        'weight':weight, 
        'age':age}

df = pd.DataFrame(data)

print(df)

returns for example

          age  height  weight
0   38.465208     146     258
1         NaN     170     238
2   45.885901     153     209
3   60.914051     150     223
4   17.178981     133     206
5         NaN     160     174
6   13.015937     187     287
7   32.084851     169     287
8   47.084864     147     223
9   41.501424     132     236
10        NaN     191     275
11  69.703666     147     253
12  14.395377     174     293
13  75.123441     199     259
14  48.716606     166     194
15        NaN     165     145
16  36.518000     156     223
17  28.828981     170     158
18        NaN     194     228
19        NaN     164     285

Get the number of NaN in the column called 'age'

To get the number of NaN in the column called 'age' a solution is

df['age'].isna().sum()

gives

6

Another approach using value_counts() with the option dropna=False:

df['age'].value_counts(dropna=False)

returns

NaN           6
 47.084864    1
 17.178981    1
 45.885901    1
 28.828981    1
 13.015937    1
 69.703666    1
 60.914051    1
 14.395377    1
 38.465208    1
 41.501424    1
 32.084851    1
 75.123441    1
 48.716606    1
 36.518000    1
Name: age, dtype: int64

Get the row indexes with a NaN in the column 'age'

To get the indexes with a NaN in the column 'age':

df.index[ df['age'].isna() ]

returns

Int64Index([1, 5, 10, 15, 18, 19], dtype='int64')

Drop a column with a given percentage of NaN

Example of application: remove a column if there is more than 20% of NaN in it:

for c in df.columns:
    if 100.0 * df[c].isna().sum() / df.shape[0] > 20:
        df.drop(c,1,inplace=True)

df

returns

        height  weight
0      175     151
1      155     169
2      175     279
3      180     181
4      166     188
5      164     255
6      162     152
7      162     179
8      185     157
9      193     262
10     149     191
11     187     208
12     164     245
13     139     209
14     173     169
15     172     265
16     141     160
17     183     212
18     195     199
19     183     180

References