Examples of how to count the occurrence of an element in a pandas data frame column:
Table of contents
Using value_counts()
Lets take for example the file 'default of credit card clients Data Set" that can be downloaded here
>>> import pandas as pd
>>> df = pd.read_excel('default of credit card clients.xls', header=1)
To get the count of default payment a solution is to use value_counts():
>>> df['default payment next month'].value_counts()
0 23364
1 6636
Name: default payment next month, dtype: int64
Example with the column sex:
>>> df['SEX'].value_counts()
2 18112
1 11888
Name: SEX, dtype: int64
Example with the column age:
>>> df['AGE'].value_counts()
29 1605
27 1477
28 1409
30 1395
26 1256
31 1217
25 1186
34 1162
32 1158
33 1146
24 1127
35 1113
36 1108
37 1041
39 954
38 944
23 931
40 870
41 824
42 794
44 700
43 670
45 617
46 570
22 560
47 501
48 466
49 452
50 411
51 340
53 325
52 304
54 247
55 209
56 178
58 122
57 122
59 83
60 67
21 67
61 56
62 44
63 31
64 31
66 25
65 24
67 16
69 15
70 10
68 5
73 4
71 3
72 3
75 3
74 1
79 1
Name: AGE, dtype: int64
It is also possible to define bins:
>>> df['AGE'].value_counts(bins=10)
26.800 8261
32.600 6514
20.942 5127
38.400 4812
44.200 3017
50.000 1425
55.800 628
61.600 171
67.400 40
73.200 5
and to personalize the bin range:
Name: AGE, dtype: int64
>>> df['AGE'].value_counts(bins=[0,10,20,25,30,35,40,60,80])
40 8002
25 7142
30 5796
35 4917
20 3871
60 272
10 0
0 0
Name: AGE, dtype: int64
Column with missing data
If we add some missing data:
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.read_excel('default of credit card clients.xls', header=1)
>>> df.iloc[[2,7,4,99,10,130],:] = np.nan
and then apply value_counts(), the missing data are automatically discards:
>>> df['SEX'].value_counts()
2.0 18108
1.0 11886
Name: SEX, dtype: int64
To get missing data count just add dropna=False:
>>> df['SEX'].value_counts(dropna=False)
2.0 18108
1.0 11886
NaN 6
Name: SEX, dtype: int64
References
Links | Site |
---|---|
value_counts() | pandas doc |
count the frequency that a value occurs in a dataframe column | stackoverflow |
pandas.DataFrame.count | pandas doc |