How to count the occurrence of an element in a pandas data frame column ?

Published: November 08, 2019

DMCA.com Protection Status

Examples of how to count the occurrence of an element in a pandas data frame column:

Using value_counts()

Lets take for example the file 'default of credit card clients Data Set" that can be downloaded here

>>> import pandas as pd
>>> df = pd.read_excel('default of credit card clients.xls', header=1)

To get the count of default payment a solution is to use value_counts():

>>> df['default payment next month'].value_counts()
0    23364
1     6636
Name: default payment next month, dtype: int64

Example with the column sex:

>>> df['SEX'].value_counts()
2    18112
1    11888
Name: SEX, dtype: int64

Example with the column age:

>>> df['AGE'].value_counts()
29    1605
27    1477
28    1409
30    1395
26    1256
31    1217
25    1186
34    1162
32    1158
33    1146
24    1127
35    1113
36    1108
37    1041
39     954
38     944
23     931
40     870
41     824
42     794
44     700
43     670
45     617
46     570
22     560
47     501
48     466
49     452
50     411
51     340
53     325
52     304
54     247
55     209
56     178
58     122
57     122
59      83
60      67
21      67
61      56
62      44
63      31
64      31
66      25
65      24
67      16
69      15
70      10
68       5
73       4
71       3
72       3
75       3
74       1
79       1
Name: AGE, dtype: int64

It is also possible to define bins:

>>> df['AGE'].value_counts(bins=10)
26.800    8261
32.600    6514
20.942    5127
38.400    4812
44.200    3017
50.000    1425
55.800     628
61.600     171
67.400      40
73.200       5

and to personalize the bin range:

Name: AGE, dtype: int64
>>> df['AGE'].value_counts(bins=[0,10,20,25,30,35,40,60,80])
40    8002
25    7142
30    5796
35    4917
20    3871
60     272
10       0
0        0
Name: AGE, dtype: int64

Column with missing data

If we add some missing data:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.read_excel('default of credit card clients.xls', header=1)
>>> df.iloc[[2,7,4,99,10,130],:] = np.nan

and then apply value_counts(), the missing data are automatically discards:

>>> df['SEX'].value_counts()
2.0    18108
1.0    11886
Name: SEX, dtype: int64

To get missing data count just add dropna=False:

>>> df['SEX'].value_counts(dropna=False)
 2.0    18108
 1.0    11886
NaN         6
Name: SEX, dtype: int64

References