Example of how to calculate the fraction in percent of a categorical variable using groupby on a pandas dataframe
Create a fake dataframe
Let's first create a DataFrame with a 'zone' column and a column called 'class' that holds a categorical variable:
import pandas as pd
data = {'zone':['A','A','A','A','B','B'],'class':[1,1,1,2,1,3]}
df = pd.DataFrame(data)
output
  zone  class
0    A      1
1    A      1
2    A      1
3    A      2
4    B      1
5    B      3
The goal is to calculate, for each zone, the fraction (in percent) of rows that belong to each class.
Calculate the fraction
First we can apply a groupby with value_counts:
df.groupby(['zone'])['class'].value_counts(normalize=True)
output
zone  class
A     1        0.75
      2        0.25
B     1        0.50
      3        0.50
Name: class, dtype: float64
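As a sanity check, the normalized counts can be reproduced by hand: count the rows for each (zone, class) pair and divide by the number of rows in the corresponding zone. The following is a minimal sketch, assuming the same DataFrame as above; the variable names `counts`, `zone_totals`, and `fractions` are illustrative and not part of the original example.

```python
import pandas as pd

data = {'zone': ['A', 'A', 'A', 'A', 'B', 'B'], 'class': [1, 1, 1, 2, 1, 3]}
df = pd.DataFrame(data)

# Number of rows for each (zone, class) pair
counts = df.groupby(['zone', 'class']).size()

# Total number of rows in each zone, broadcast back onto every (zone, class) pair
zone_totals = counts.groupby(level='zone').transform('sum')

# Fraction of each class within its zone
fractions = counts / zone_totals
print(fractions)
```

The values should match what value_counts(normalize=True) returns for each group, which shows that the normalization is done per zone rather than over the whole DataFrame.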
We can then use unstack() to turn the 'class' index level into columns, and fillna(0) to put 0 wherever a class does not appear in a zone:
df.groupby(["zone"])["class"].value_counts(normalize=True).unstack("class").fillna(0)
output
class     1     2    3
zone
A      0.75  0.25  0.0
B      0.50  0.00  0.5
To get the fraction in percent, just multiply by 100 using mul(100):
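A possible alternative, not used in the original example, is pd.crosstab with normalize='index', which builds the zone-by-class table in one call and fills absent combinations with 0. A minimal sketch, assuming the same DataFrame; the variable name `table` is illustrative.

```python
import pandas as pd

data = {'zone': ['A', 'A', 'A', 'A', 'B', 'B'], 'class': [1, 1, 1, 2, 1, 3]}
df = pd.DataFrame(data)

# Cross-tabulate zone against class; normalize='index' divides each row by its row total
table = pd.crosstab(df['zone'], df['class'], normalize='index')
print(table)
```

Each row of the resulting table sums to 1, so it holds the same per-zone fractions as the groupby approach.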
df.groupby(["zone"])["class"].value_counts(normalize=True).mul(100).unstack("class").fillna(0)
output
class     1     2     3
zone
A      75.0  25.0   0.0
B      50.0   0.0  50.0
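If the same calculation is needed for several columns, the chain can be wrapped in a small helper. This is only a sketch: the function name `percent_by_group` and its parameters `group_col`, `value_col`, and `decimals` are hypothetical, and the rounding step is an addition that the original example does not include.

```python
import pandas as pd

def percent_by_group(df, group_col, value_col, decimals=1):
    """Return a table of percentages of each value_col value within each group_col group."""
    return (
        df.groupby(group_col)[value_col]
        .value_counts(normalize=True)  # fraction of each value within its group
        .mul(100)                      # convert fractions to percent
        .unstack(value_col)            # one column per distinct value
        .fillna(0)                     # 0 where a value never occurs in a group
        .round(decimals)
    )

data = {'zone': ['A', 'A', 'A', 'A', 'B', 'B'], 'class': [1, 1, 1, 2, 1, 3]}
df = pd.DataFrame(data)

result = percent_by_group(df, 'zone', 'class')
print(result)
```

A quick way to check the result is that every row should sum to 100 (up to rounding), for example with result.sum(axis=1).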
References
| Links | Site |
|---|---|
| groupby | pandas.pydata.org |
| value_counts() | pandas.pydata.org |
| unstack() | pandas.pydata.org |
