Normalizing each row of a pandas DataFrame into percentages is a useful step when analyzing data. Once every row is expressed as shares of its own total, values can be compared across rows even when the row totals differ, and the relative weight of each value within its row is easier to see.
Create synthetic data
import pandas as pd
import numpy as np

np.random.seed(42)
data = np.random.random_sample((6, 2)) * 10
df = pd.DataFrame(data, columns=['A', 'B'])
Output
          A         B
0  3.745401  9.507143
1  7.319939  5.986585
2  1.560186  1.559945
3  0.580836  8.661761
4  6.011150  7.080726
5  0.205845  9.699099
So the goal here is to normalize each row of the DataFrame into percentages.
Step 1: Sum each row
To normalize a row, each value has to be divided by the sum of all the values in that row. For non-negative data this gives a number between 0 and 1 that represents each value's share of the row total. So we first compute the sum of each row; passing axis=1 tells sum to add across the columns:
df.sum(axis=1)
Output
0    13.252544
1    13.306524
2     3.120132
3     9.242598
4    13.091876
5     9.904943
dtype: float64
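As a small illustration of what the axis argument does here, the sketch below rebuilds the same synthetic DataFrame and compares row sums with column sums. The variable names `row_sums` and `col_sums` are only illustrative and do not appear in the original steps.

```python
import numpy as np
import pandas as pd

# Rebuild the same synthetic data as above.
np.random.seed(42)
df = pd.DataFrame(np.random.random_sample((6, 2)) * 10, columns=["A", "B"])

# axis=1 adds across the columns, giving one total per row (index 0..5).
row_sums = df.sum(axis=1)

# axis=0 (the default) adds down the rows, giving one total per column ('A', 'B').
col_sums = df.sum(axis=0)

print(row_sums.shape)         # (6,)
print(list(col_sums.index))   # ['A', 'B']
```

For row-wise percentages it is the per-row totals from axis=1 that are needed.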
Step 2: Divide each row by its sum
df[['A','B']].div(df.sum(axis=1), axis=0)
Output
          A         B
0  0.282618  0.717382
1  0.550102  0.449898
2  0.500039  0.499961
3  0.062843  0.937157
4  0.459151  0.540849
5  0.020782  0.979218
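The axis=0 argument to div matters more than it may look. The sketch below, which assumes the same synthetic data, shows what happens when it is left out and compares the result with plain NumPy broadcasting. The names `fractions`, `misaligned`, and `fractions_np` are illustrative only.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.random_sample((6, 2)) * 10, columns=["A", "B"])
row_sums = df.sum(axis=1)

# axis=0 matches row_sums to the DataFrame's row index,
# so row i is divided by row_sums[i].
fractions = df.div(row_sums, axis=0)

# Without axis=0, the Series index (0..5) is matched against the
# column labels ('A', 'B'). Nothing lines up, so every cell is NaN.
misaligned = df / row_sums
print(misaligned.isna().all().all())

# Equivalent NumPy broadcasting: keepdims=True keeps the sums as a
# (6, 1) column so each row is divided by its own total.
values = df.to_numpy()
fractions_np = values / values.sum(axis=1, keepdims=True)
print(np.allclose(fractions.to_numpy(), fractions_np))
```

Both checks should print True. A quick way to validate the result is that every row of `fractions` adds up to 1.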
Step 3: Multiply by 100
We then multiply by 100 to get a percentage value.
df[['A','B']].div(df.sum(axis=1), axis=0) * 100
This returns a new DataFrame in which each value is expressed as a percentage of its row total in the original DataFrame, so every row adds up to 100. Values can now be compared across rows regardless of how large each row's total was.
           A          B
0  28.261752  71.738248
1  55.010153  44.989847
2  50.003865  49.996135
3   6.284339  93.715661
4  45.915117  54.084883
5   2.078204  97.921796
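The three steps can be collected into one small helper. The sketch below is only an illustration: the function name `rows_to_percent` is hypothetical, and leaving rows whose total is zero as NaN is an assumption, since the synthetic data above never contains such a row.

```python
import numpy as np
import pandas as pd

def rows_to_percent(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a new DataFrame with each row expressed as a percentage of its row total."""
    row_sums = frame.sum(axis=1)
    # Assumption: a row total of zero has no meaningful percentages,
    # so replace it with NaN instead of dividing by zero.
    safe_sums = row_sums.replace(0, np.nan)
    return frame.div(safe_sums, axis=0) * 100

np.random.seed(42)
df = pd.DataFrame(np.random.random_sample((6, 2)) * 10, columns=["A", "B"])
percent = rows_to_percent(df)

# Every row of the result adds up to 100; df itself is not modified.
print(np.allclose(percent.sum(axis=1), 100))

# A row of zeros comes back as NaN under the assumption above.
with_zero_row = pd.DataFrame({"A": [0.0, 1.0], "B": [0.0, 3.0]})
print(rows_to_percent(with_zero_row))
```

Whether NaN, 0, or an error is the right answer for an all-zero row depends on the analysis, so that line is the one to adapt.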
Round the normalized DataFrame
(df[['A','B']].div(df.sum(axis=1), axis=0) * 100).round(2)
Output
       A      B
0  28.26  71.74
1  55.01  44.99
2  50.00  50.00
3   6.28  93.72
4  45.92  54.08
5   2.08  97.92
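One caveat worth keeping in mind: after rounding, a row is no longer guaranteed to add up to exactly 100. The rows above happen to, but the sketch below uses a hypothetical three-column frame (not part of the original data) where rounding drops a hundredth.

```python
import pandas as pd

# Hypothetical example: three equal values in one row.
thirds = pd.DataFrame({"A": [1.0], "B": [1.0], "C": [1.0]})

pct = (thirds.div(thirds.sum(axis=1), axis=0) * 100).round(2)
print(pct)               # each cell is 33.33
print(pct.sum(axis=1))   # about 99.99, not 100
```

If exact totals matter, it may be safer to keep the unrounded values for calculations and round only for display.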
References
| Links | Site |
|---|---|
| pandas.DataFrame.sum | pandas.pydata.org |
| pandas.DataFrame.div | pandas.pydata.org |
