How to calculate the fraction in percent of a categorical variable using pandas groupby?


Example of how to calculate the fraction (in percent) of each category of a categorical variable, using groupby on a pandas DataFrame.

Create a fake dataframe

Let's first create a DataFrame with a column called 'class' that holds a categorical variable, and a column called 'zone' to group by:

import pandas as pd

data = {'zone':['A','A','A','A','B','B'],
        'class':[1,1,1,2,1,3]}

df = pd.DataFrame(data)

output

  zone  class
0    A      1
1    A      1
2    A      1
3    A      2
4    B      1
5    B      3

The goal is to calculate, within each zone, the fraction in percent of each value of 'class'.
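Within a zone, the fraction of a class is the number of rows with that class divided by the number of rows in the zone. The sketch below shows those denominators by counting rows per zone with groupby(...).size(). Zone A has 4 rows and zone B has 2, so class 1, which appears 3 times in A, should come out as 3/4 = 75%. The variable name zone_sizes is only illustrative.

```python
import pandas as pd

# Same sample data as above
data = {'zone': ['A', 'A', 'A', 'A', 'B', 'B'],
        'class': [1, 1, 1, 2, 1, 3]}
df = pd.DataFrame(data)

# Number of rows in each zone: the denominator of every fraction in that zone
zone_sizes = df.groupby('zone').size()
print(zone_sizes)  # zone A has 4 rows, zone B has 2 rows
```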

Calculate the fraction

First, group by 'zone' and apply value_counts() to the 'class' column. The normalize=True option returns proportions instead of raw counts:

df.groupby(['zone'])['class'].value_counts(normalize=True)

output

zone  class
A     1        0.75
      2        0.25
B     1        0.50
      3        0.50
Name: class, dtype: float64
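To make explicit what normalize=True computes, here is a sketch that builds the same proportions by hand. It counts each (zone, class) pair, then divides each count by its zone's total, aligning the two on the 'zone' index level through the level argument of Series.div. The intermediate names (counts, totals, fractions) are illustrative only.

```python
import pandas as pd

data = {'zone': ['A', 'A', 'A', 'A', 'B', 'B'],
        'class': [1, 1, 1, 2, 1, 3]}
df = pd.DataFrame(data)

# Raw number of rows for each (zone, class) pair
counts = df.groupby(['zone', 'class']).size()

# Total number of rows in each zone
totals = df.groupby('zone').size()

# Divide each pair count by its zone total, matching on the 'zone' level
fractions = counts.div(totals, level='zone')
print(fractions)
```

The result holds the same numbers as value_counts(normalize=True). Note that value_counts sorts by proportion within each zone, so the row order may differ.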

We can then use unstack() to turn the class values into columns, and fillna(0) to put 0 where a class does not appear in a zone:

df.groupby(["zone"])["class"].value_counts(normalize=True).unstack("class").fillna(0)

output

class     1     2    3
zone                  
A      0.75  0.25  0.0
B      0.50  0.00  0.5
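Instead of calling fillna(0), you can pass fill_value to unstack(). This fills the missing (zone, class) combinations during the reshape itself. A minimal sketch on the same data (the name table is illustrative):

```python
import pandas as pd

data = {'zone': ['A', 'A', 'A', 'A', 'B', 'B'],
        'class': [1, 1, 1, 2, 1, 3]}
df = pd.DataFrame(data)

# Proportions per zone, reshaped to one column per class;
# fill_value=0 is used for (zone, class) pairs that never occur
table = (
    df.groupby('zone')['class']
      .value_counts(normalize=True)
      .unstack('class', fill_value=0)
)
print(table)
```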

To get the fraction in percent, just multiply by 100 using mul(100):

df.groupby(["zone"])["class"].value_counts(normalize=True).mul(100).unstack("class").fillna(0)

output

class     1     2     3
zone                   
A      75.0  25.0   0.0
B      50.0   0.0  50.0
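The same percentage table can also be built with pd.crosstab. This is a different pandas function from the groupby approach above, shown here as an alternative. With normalize='index', each row (zone) is divided by its own total, and absent combinations appear as 0. The sketch also checks that each zone's percentages add up to 100. The name pct is illustrative.

```python
import pandas as pd

data = {'zone': ['A', 'A', 'A', 'A', 'B', 'B'],
        'class': [1, 1, 1, 2, 1, 3]}
df = pd.DataFrame(data)

# Cross-tabulate zone (rows) against class (columns);
# normalize='index' turns each row into fractions that sum to 1
pct = pd.crosstab(df['zone'], df['class'], normalize='index').mul(100)
print(pct)

# Sanity check: the percentages within each zone sum to 100
print(pct.sum(axis=1))
```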
