Example of how to calculate the percentage of each category of a categorical variable using groupby on a pandas DataFrame
Create a fake dataframe
Let's first create a DataFrame with a 'zone' column and a 'class' column that holds a categorical variable:
import pandas as pd
data = {'zone': ['A', 'A', 'A', 'A', 'B', 'B'],
        'class': [1, 1, 1, 2, 1, 3]}
df = pd.DataFrame(data)
output
  zone  class
0    A      1
1    A      1
2    A      1
3    A      2
4    B      1
5    B      3
The goal is to calculate, within each zone, the percentage of rows that belong to each class.
Calculate the fraction
First we can apply a groupby with value_counts:
df.groupby(['zone'])['class'].value_counts(normalize=True)
output
zone  class
A     1        0.75
      2        0.25
B     1        0.50
      3        0.50
Name: class, dtype: float64
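To make clear what normalize=True does here, the sketch below recomputes the same fractions by hand: count the rows for each (zone, class) pair, then divide by the number of rows in the zone. The variable names (counts, zone_totals, manual, sums) are illustrative and not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'zone': ['A', 'A', 'A', 'A', 'B', 'B'],
                   'class': [1, 1, 1, 2, 1, 3]})

# Number of rows for each (zone, class) pair.
counts = df.groupby(['zone', 'class']).size()

# Number of rows in each zone.
zone_totals = df.groupby('zone').size()

# Divide each pair count by the total of its zone (matched on the 'zone' level).
manual = counts.div(zone_totals, level='zone')

# The same fractions obtained directly with value_counts(normalize=True).
fractions = df.groupby(['zone'])['class'].value_counts(normalize=True)

# Within each zone the fractions add up to 1.
sums = fractions.groupby(level='zone').sum()
```

Both manual and fractions should hold the same values. The value_counts version is shorter, while the manual version shows that the normalization is done per zone and not over the whole DataFrame.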
We can then use unstack() to turn the classes into columns, and fillna(0) to put 0 wherever a class does not appear in a zone:
df.groupby(["zone"])["class"].value_counts(normalize=True).unstack("class").fillna(0)
output
class     1     2    3
zone
A      0.75  0.25  0.0
B      0.50  0.00  0.5
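As a possible alternative to the groupby chain, the same zone-by-class table can be sketched with pd.crosstab and normalize='index', or by passing fill_value=0 to unstack() instead of calling fillna(0) afterwards. This is a variation on the approach above, not a replacement for it. The names wide and table are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'zone': ['A', 'A', 'A', 'A', 'B', 'B'],
                   'class': [1, 1, 1, 2, 1, 3]})

# Variation 1: let unstack() fill the missing (zone, class) pairs with 0.
wide = (
    df.groupby(['zone'])['class']
    .value_counts(normalize=True)
    .unstack('class', fill_value=0)
)

# Variation 2: crosstab counts each (zone, class) pair, and normalize='index'
# divides every row by its row total, so each zone sums to 1.
table = pd.crosstab(df['zone'], df['class'], normalize='index')
```

Both wide and table should contain the same fractions as the unstack() and fillna(0) version, with zones as rows and classes as columns.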
To get the fraction in percent, multiply by 100 using mul(100):
df.groupby(["zone"])["class"].value_counts(normalize=True).mul(100).unstack("class").fillna(0)
output
class     1     2     3
zone
A      75.0  25.0   0.0
B      50.0   0.0  50.0
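If the same calculation is needed for several columns, the steps above could be wrapped in a small helper. The function below is only a sketch: percent_by_group and its parameters (frame, group_col, value_col, decimals) are hypothetical names introduced for illustration, and the rounding step is an optional addition that is not in the original example.

```python
import pandas as pd


def percent_by_group(frame, group_col, value_col, decimals=1):
    """Hypothetical helper: percentage of each value_col category within each group."""
    return (
        frame.groupby(group_col)[value_col]
        .value_counts(normalize=True)  # fraction of each category per group
        .mul(100)                      # fraction -> percent
        .unstack(value_col)            # categories become columns
        .fillna(0)                     # categories absent from a group get 0
        .round(decimals)               # optional: limit the number of decimals
    )


df = pd.DataFrame({'zone': ['A', 'A', 'A', 'A', 'B', 'B'],
                   'class': [1, 1, 1, 2, 1, 3]})

percentages = percent_by_group(df, 'zone', 'class')

# Sanity check: each row covers every class of a zone, so it should total 100
# (rounding can shift the total slightly for other data).
row_totals = percentages.sum(axis=1)
```

For this DataFrame, percentages should match the table above, and row_totals should be 100 for both zones.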
References
| Links | Site |
|---|---|
| groupby() | pandas.pydata.org |
| value_counts() | pandas.pydata.org |
| unstack() | pandas.pydata.org |