Examples of how to convert quantitative data to categorical data with pandas using cut:
Create synthetic data
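Let's first create some fake numeric data (ten random integers between 0 and 100):
import random
# randint includes both endpoints, so values range from 0 to 100
l = [random.randint(0, 100) for _ in range(10)]
returns, for example: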
[66, 44, 62, 99, 82, 13, 7, 58, 60, 38]
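Because the list is random, each run gives different values. If a repeatable example is wanted, one option is to seed the generator first. The sketch below assumes a seed of 0, which is an arbitrary choice and not part of the original example:

```python
import random

# The seed value 0 is an arbitrary, hypothetical choice; any fixed integer works.
random.seed(0)
first = [random.randint(0, 100) for _ in range(10)]

# Re-seeding with the same value replays the same sequence.
random.seed(0)
second = [random.randint(0, 100) for _ in range(10)]

print(first == second)
```

The two lists are identical because both are drawn from the same seeded sequence.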
Save data in a pandas dataframe
import pandas as pd
import numpy as np
data = np.array(l)
df = pd.DataFrame(data,columns=['x'])
print(df)
returns
    x
0  66
1  44
2  62
3  99
4  82
5  13
6   7
7  58
8  60
9  38
Aggregate
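To convert numeric data to categorical data, one solution with pandas is to use pandas.cut (documented on pandas.pydata.org), which assigns each value to a bin defined by a list of edges:
# include_lowest=True keeps a value of exactly 0 in the first bin instead of returning NaN
pd.cut(df['x'], [0,25,50,75,100], labels=['A', 'B', 'C', 'D'], include_lowest=True)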
returns
0    C
1    B
2    C
3    D
4    D
5    A
6    A
7    C
8    C
9    B
Name: x, dtype: category
Categories (4, object): ['A' < 'B' < 'C' < 'D']
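A note on bin edges: by default pd.cut builds intervals that are open on the left and closed on the right, so with the edges above the bins are (0, 25], (25, 50], (50, 75] and (75, 100]. The sketch below uses a small hypothetical series (not the data above) whose values sit on the edges, to show how the `right` and `include_lowest` parameters change the result:

```python
import pandas as pd

# Hypothetical values chosen to sit on or next to the bin edges.
edges = pd.Series([0, 25, 26, 100])
bins = [0, 25, 50, 75, 100]
labels = ['A', 'B', 'C', 'D']

# Default: intervals are (0, 25], (25, 50], ... so 0 falls in no bin (NaN).
default_cut = pd.cut(edges, bins, labels=labels)

# include_lowest=True closes the first interval on the left, so 0 lands in 'A'.
inclusive_cut = pd.cut(edges, bins, labels=labels, include_lowest=True)

# right=False flips the intervals to [0, 25), [25, 50), ... so 100 falls in no bin.
left_closed_cut = pd.cut(edges, bins, labels=labels, right=False)

print(default_cut.tolist())
print(inclusive_cut.tolist())
print(left_closed_cut.tolist())
```

Since random.randint(0, 100) can return 0, passing include_lowest=True is a simple way to avoid an unexpected NaN in the first bin.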
Create a new column holding the categories:
df['Cx'] = pd.cut(df['x'], [0,25,50,75,100], labels=['A', 'B', 'C', 'D'], include_lowest=True)
print(df)
returns
    x Cx
0  66  C
1  44  B
2  62  C
3  99  D
4  82  D
5  13  A
6   7  A
7  58  C
8  60  C
9  38  B
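Once the categories are stored in a column, they can be used to summarize the data. The following is a minimal sketch that rebuilds the dataframe from the sample values shown above, then counts the rows and averages x within each category. The variable names `counts` and `means` are illustrative only:

```python
import pandas as pd

# Same values as the sample output above, so the result is reproducible.
df = pd.DataFrame({'x': [66, 44, 62, 99, 82, 13, 7, 58, 60, 38]})
df['Cx'] = pd.cut(df['x'], [0, 25, 50, 75, 100], labels=['A', 'B', 'C', 'D'])

# Number of rows per category, ordered A, B, C, D.
counts = df['Cx'].value_counts().sort_index()

# Mean of x within each category; observed=False also keeps categories with no rows.
means = df.groupby('Cx', observed=False)['x'].mean()

print(counts)
print(means)
```

With these values, category C holds four rows and the other categories hold two each.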
References
Links | Site |
---|---|
Group by of a float column using pandas | stackoverflow |
pandas.cut | pandas.pydata.org |