Examples of how to count the number of NaN (Not a Number) values in a column of a pandas DataFrame:
Create a dataframe
Let's create a simple dataframe with 20 rows, in which 6 randomly chosen values of the 'age' column are replaced by NaN:
import pandas as pd
import numpy as np

height = np.random.randint(130, 200, size=20)
weight = np.random.randint(140, 300, size=20)
age = np.random.uniform(10, 80, size=20)

# Replace n randomly chosen entries of age with NaN
n = 6
index = np.random.choice(age.shape[0], n, replace=False)
age[index] = np.nan

data = {'height': height, 'weight': weight, 'age': age}
df = pd.DataFrame(data)
print(df)
prints, for example (the values are random, so they change on every run):
          age  height  weight
0   38.465208     146     258
1         NaN     170     238
2   45.885901     153     209
3   60.914051     150     223
4   17.178981     133     206
5         NaN     160     174
6   13.015937     187     287
7   32.084851     169     287
8   47.084864     147     223
9   41.501424     132     236
10        NaN     191     275
11  69.703666     147     253
12  14.395377     174     293
13  75.123441     199     259
14  48.716606     166     194
15        NaN     165     145
16  36.518000     156     223
17  28.828981     170     158
18        NaN     194     228
19        NaN     164     285
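Because the data above is random, the printed table differs from run to run. The following is a minimal sketch of a reproducible variant, assuming NumPy's seeded Generator API; the seed value 0 and the names rng, n_rows, n_missing and missing_positions are arbitrary choices made for this illustration and are not part of the original example.

```python
import numpy as np
import pandas as pd

# Seeded generator: the seed (0) is an arbitrary choice for this sketch
rng = np.random.default_rng(0)

n_rows = 20
n_missing = 6

# integers() excludes the upper bound, like np.random.randint
height = rng.integers(130, 200, size=n_rows)
weight = rng.integers(140, 300, size=n_rows)
age = rng.uniform(10, 80, size=n_rows)

# Pick n_missing distinct positions and overwrite them with NaN
missing_positions = rng.choice(n_rows, size=n_missing, replace=False)
age[missing_positions] = np.nan

df = pd.DataFrame({'height': height, 'weight': weight, 'age': age})
print(df)
```

With a fixed seed the same rows hold NaN on every run, which makes the counts in the following sections easier to compare.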
Get the number of NaN in the column called 'age'
To get the number of NaN in the column called 'age', one solution is to build a boolean mask with isna() and sum it:
df['age'].isna().sum()
gives
6
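The idiom works because isna() returns a boolean mask and sum() counts each True as 1. As a small sketch on a hand-written frame (the name small and its values are made up for illustration, so that the counts can be checked by eye), the same mask also gives the share of NaN and a per-column count:

```python
import numpy as np
import pandas as pd

# Hypothetical 5-row frame with 2 NaN in 'age'
small = pd.DataFrame({
    'age': [38.5, np.nan, 45.9, np.nan, 17.2],
    'height': [146, 170, 153, 150, 133],
})

mask = small['age'].isna()       # boolean Series, True where the value is NaN
nan_count = int(mask.sum())      # number of True values
nan_share = float(mask.mean())   # fraction of True values

# isna() on the whole frame gives one count per column
per_column = small.isna().sum()

print(nan_count)   # 2
print(nan_share)   # 0.4
print(per_column)
```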
Another approach using value_counts() with the option dropna=False:
df['age'].value_counts(dropna=False)
returns
NaN          6
47.084864    1
17.178981    1
45.885901    1
28.828981    1
13.015937    1
69.703666    1
60.914051    1
14.395377    1
38.465208    1
41.501424    1
32.084851    1
75.123441    1
48.716606    1
36.518000    1
Name: age, dtype: int64
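To pull only the NaN count out of the value_counts() result, one possible approach is to filter the result on its index. This is a sketch on a hand-written Series (the name ages and its values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data: 3 NaN, one value repeated twice, one value once
ages = pd.Series([38.5, np.nan, 45.9, np.nan, 38.5, np.nan], name='age')

# With dropna=False, NaN is counted as its own entry
counts = ages.value_counts(dropna=False)

# Keep only the entry whose index label is NaN
nan_count = int(counts[counts.index.isna()].sum())

# By default (dropna=True) NaN is left out of the counts
non_nan_total = int(ages.value_counts().sum())

print(counts)
print(nan_count)       # 3
print(non_nan_total)   # 3
```

For the count alone, isna().sum() is the more direct route; value_counts(dropna=False) is mainly useful when the distribution of the other values matters as well.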
Get the row indexes with a NaN in the column 'age'
To get the indexes with a NaN in the column 'age':
df.index[ df['age'].isna() ]
returns (pandas 2.0 and later display Index instead of Int64Index)
Int64Index([1, 5, 10, 15, 18, 19], dtype='int64')
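The same boolean mask can be used to obtain the labels as a plain Python list, or to select the affected rows themselves. A minimal sketch on a hand-written frame (the name small and its values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical 5-row frame with NaN in rows 1 and 3
small = pd.DataFrame({
    'age': [38.5, np.nan, 45.9, np.nan, 17.2],
    'height': [146, 170, 153, 150, 133],
})

mask = small['age'].isna()

# Index labels of the rows where 'age' is NaN, as a list
nan_labels = small.index[mask].tolist()

# The rows themselves, selected with the same mask
nan_rows = small[mask]

# Integer positions (these equal the labels only for a default 0..n-1 index)
nan_positions = np.flatnonzero(mask.to_numpy()).tolist()

print(nan_labels)      # [1, 3]
print(nan_rows)
print(nan_positions)   # [1, 3]
```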
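Drop columns with more than a given percentage of NaN
Example of application: remove every column in which more than 20% of the values are NaN:
for c in df.columns:
    # Percentage of NaN in column c
    if 100.0 * df[c].isna().sum() / df.shape[0] > 20:
        # axis must be passed by keyword in pandas 2.0 and later
        df.drop(c, axis=1, inplace=True)
df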
returns, for example (the random data was regenerated, so the values differ from the first table):
    height  weight
0      175     151
1      155     169
2      175     279
3      180     181
4      166     188
5      164     255
6      162     152
7      162     179
8      185     157
9      193     262
10     149     191
11     187     208
12     164     245
13     139     209
14     173     169
15     172     265
16     141     160
17     183     212
18     195     199
19     183     180
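The same filter can also be written without a loop, which avoids modifying the dataframe while iterating over its columns. This is a sketch of one possible alternative, not the method used above; the frame small, its values and the name threshold are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical 5-row frame with a different NaN share in each column
small = pd.DataFrame({
    'age': [38.5, np.nan, 45.9, np.nan, 17.2],       # 2 of 5 values are NaN (40%)
    'height': [146, 170, 153, 150, 133],             # no NaN (0%)
    'weight': [258.0, 238.0, np.nan, 223.0, 206.0],  # 1 of 5 values is NaN (20%)
})

threshold = 0.20  # drop a column only if MORE than 20% of its values are NaN

# Fraction of NaN per column: the mean of a boolean mask
nan_fraction = small.isna().mean()

# Keep the columns at or below the threshold; small itself is not modified
kept = small.loc[:, nan_fraction <= threshold]

print(nan_fraction)
print(list(kept.columns))   # ['height', 'weight']
```

As in the loop version, a column with exactly 20% NaN is kept, because the test there is a strict "greater than".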
