How to calculate the fraction in percent of a categorical variable using pandas groupby?

Published: February 25, 2023

An example of how to calculate the fraction, in percent, of each value of a categorical variable using groupby on a pandas DataFrame.

Create a sample DataFrame

Let's first create a DataFrame with a 'zone' column and a 'class' column that holds a categorical variable:

```
import pandas as pd

data = {'zone': ['A', 'A', 'A', 'A', 'B', 'B'],
        'class': [1, 1, 1, 2, 1, 3]}

df = pd.DataFrame(data)
```

output

```
  zone  class
0    A      1
1    A      1
2    A      1
3    A      2
4    B      1
5    B      3
```

The goal is to calculate, within each zone, the fraction in percent of each value of 'class'.

Calculate the fraction

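First, we can group by 'zone' and call value_counts with normalize=True, which returns, within each zone, the fraction of rows that belong to each class: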

```
df.groupby(['zone'])['class'].value_counts(normalize=True)
```

output

```
zone  class
A     1        0.75
      2        0.25
B     1        0.50
      3        0.50
Name: class, dtype: float64
```
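To see what normalize=True computes, here is a minimal sketch that rebuilds the same fractions by hand: count the rows for each (zone, class) pair, then divide by the number of rows in the zone. The variable names counts, zone_sizes and fractions are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'zone': ['A', 'A', 'A', 'A', 'B', 'B'],
                   'class': [1, 1, 1, 2, 1, 3]})

# Number of rows for each (zone, class) pair
counts = df.groupby(['zone', 'class']).size()

# Total number of rows in each zone
zone_sizes = df.groupby('zone').size()

# Divide each pair count by its zone total, matching on the 'zone' index level
fractions = counts.div(zone_sizes, level='zone')

print(fractions)
```

For zone A this gives 3/4 = 0.75 for class 1 and 1/4 = 0.25 for class 2, which matches the value_counts result above.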

We can then use unstack() to turn the classes into columns, and fillna(0) to put 0 where a class does not appear in a zone:

```
df.groupby(["zone"])["class"].value_counts(normalize=True).unstack("class").fillna(0)
```

output

```
class     1     2    3
zone                  
A      0.75  0.25  0.0
B      0.50  0.00  0.5
```
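As a possible alternative (not the approach used in this article), pd.crosstab with normalize='index' should produce the same table in one call, with missing zone/class combinations already set to 0. A short sketch, where the variable name table is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'zone': ['A', 'A', 'A', 'A', 'B', 'B'],
                   'class': [1, 1, 1, 2, 1, 3]})

# Cross-tabulate zone against class; normalize='index' divides each row
# by its row total, so every zone's fractions sum to 1
table = pd.crosstab(df['zone'], df['class'], normalize='index')

print(table)
```

The groupby version is handy when you want to keep the long (zone, class) Series as an intermediate step, while crosstab goes straight to the wide table.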

To get the fraction in percent, multiply by 100 using mul(100):

```
df.groupby(["zone"])["class"].value_counts(normalize=True).mul(100).unstack("class").fillna(0)
```

output

```
class     1     2     3
zone                   
A      75.0  25.0   0.0
B      50.0   0.0  50.0
```
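If you need this for several column pairs, the chain can be wrapped in a small helper. The sketch below uses a hypothetical function name, percent_by_group, and also checks that the percentages in each row add up to 100.

```python
import pandas as pd


def percent_by_group(df, group_col, cat_col):
    """Return the percentage of each cat_col value within each group_col value.

    percent_by_group is a hypothetical helper name, not a pandas function.
    """
    return (
        df.groupby(group_col)[cat_col]
        .value_counts(normalize=True)  # fraction of each value per group
        .mul(100)                      # convert fractions to percent
        .unstack(cat_col)              # one column per value of cat_col
        .fillna(0)                     # 0 where a value is absent from a group
    )


df = pd.DataFrame({'zone': ['A', 'A', 'A', 'A', 'B', 'B'],
                   'class': [1, 1, 1, 2, 1, 3]})

table = percent_by_group(df, 'zone', 'class')

# Sanity check: each row should add up to 100 (up to floating-point rounding)
row_totals = table.sum(axis=1)

print(table)
print(row_totals)
```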