Examples of how to count the number of NaN (Not a Number) in a pandas dataframe column:
Create a dataframe
Let's create a simple dataframe
import pandas as pd
import numpy as np
height = np.random.randint(130,200, size=20)
weight = np.random.randint(140,300, size=20)
age = np.random.uniform(10,80, size=20)
n = 6
index = np.random.choice(age.shape[0], n, replace=False)
age[index] = np.nan
data = {'height':height,
'weight':weight,
'age':age}
df = pd.DataFrame(data)
print(df)
returns for example
age height weight
0 38.465208 146 258
1 NaN 170 238
2 45.885901 153 209
3 60.914051 150 223
4 17.178981 133 206
5 NaN 160 174
6 13.015937 187 287
7 32.084851 169 287
8 47.084864 147 223
9 41.501424 132 236
10 NaN 191 275
11 69.703666 147 253
12 14.395377 174 293
13 75.123441 199 259
14 48.716606 166 194
15 NaN 165 145
16 36.518000 156 223
17 28.828981 170 158
18 NaN 194 228
19 NaN 164 285
Get the number of NaN in the column called 'age'
To get the number of NaN in the column called 'age' a solution is
df['age'].isna().sum()
gives
6
Another approach using value_counts() with the option dropna=False:
df['age'].value_counts(dropna=False)
returns
NaN 6
47.084864 1
17.178981 1
45.885901 1
28.828981 1
13.015937 1
69.703666 1
60.914051 1
14.395377 1
38.465208 1
41.501424 1
32.084851 1
75.123441 1
48.716606 1
36.518000 1
Name: age, dtype: int64
Get the row indexes with a NaN in the column 'age'
To get the indexes with a NaN in the column 'age':
df.index[ df['age'].isna() ]
returns
Int64Index([1, 5, 10, 15, 18, 19], dtype='int64')
Drop a column with a given percentage of NaN
Example of application: remove a column if there is more than 20% of NaN in it:
for c in df.columns:
if 100.0 * df[c].isna().sum() / df.shape[0] > 20:
df.drop(c,1,inplace=True)
df
returns
height weight
0 175 151
1 155 169
2 175 279
3 180 181
4 166 188
5 164 255
6 162 152
7 162 179
8 185 157
9 193 262
10 149 191
11 187 208
12 164 245
13 139 209
14 173 169
15 172 265
16 141 160
17 183 212
18 195 199
19 183 180