How to normalize each row of a Pandas DataFrame into percentages?

Published: April 12, 2023

Tags: Python; Pandas; Dataframe;

Normalizing each row of a Pandas DataFrame into percentages is a useful step when analyzing data. Once every value is expressed as a share of its own row total, rows with very different magnitudes can be compared directly, and the relative weight of each value within its row is easy to read.

Create synthetic data

import pandas as pd
import numpy as np

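np.random.seed(42)  # fix the seed so the numbers are reproducible

# 6 rows x 2 columns of uniform random values in [0, 10)
data = np.random.random_sample((6, 2)) * 10

df = pd.DataFrame(data, columns=['A', 'B'])

print(df)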

Output

              A         B
    0  3.745401  9.507143
    1  7.319939  5.986585
    2  1.560186  1.559945
    3  0.580836  8.661761
    4  6.011150  7.080726
    5  0.205845  9.699099

So the goal here is to normalize each row of the DataFrame into percentages.

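Step 1: Sum each row

To normalize a row, we divide each value by the sum of all the values in that row. For non-negative data such as ours, this gives a number between 0 and 1 representing that value's share of the row total. The first step is therefore to compute the sum of each row; axis=1 tells sum() to add across the columns, returning one total per row.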

df.sum(axis=1)

Output

0    13.252544
1    13.306524
2     3.120132
3     9.242598
4    13.091876
5     9.904943
dtype: float64
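The direction of the sum is easy to mix up, so here is a minimal sketch that rebuilds the same synthetic DataFrame and compares the two axes. The variable names row_totals and col_totals are illustrative and not part of the original example.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.random_sample((6, 2)) * 10, columns=["A", "B"])

# axis=1 adds across the columns: one total per row
row_totals = df.sum(axis=1)

# axis=0 adds down the rows: one total per column
col_totals = df.sum(axis=0)

print(row_totals.shape)  # (6,)
print(col_totals.shape)  # (2,)

# With only two columns, each row total is simply A + B
print(np.allclose(row_totals, df["A"] + df["B"]))
```

For row-wise percentages we need the first form, since each row must be divided by its own total.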

Step 2: Divide each row by its sum

df[['A','B']].div(df.sum(axis=1), axis=0)

Output

          A         B
0  0.282618  0.717382
1  0.550102  0.449898
2  0.500039  0.499961
3  0.062843  0.937157
4  0.459151  0.540849
5  0.020782  0.979218
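The axis=0 argument matters here: it tells div() to match the totals against the row index, whereas the default would try to match them against the column labels. As a quick sanity check, the sketch below (using the same synthetic data; fractions and fractions_np are names chosen for illustration) confirms that every row of the result adds up to 1 and shows what should be an equivalent plain NumPy formulation.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.random_sample((6, 2)) * 10, columns=["A", "B"])

row_totals = df.sum(axis=1)

# axis=0 aligns row_totals with the row index of df
fractions = df.div(row_totals, axis=0)

# Each row of fractions should add up to 1 (within floating-point error)
print(np.allclose(fractions.sum(axis=1), 1.0))  # True

# Same computation with NumPy broadcasting:
# keepdims=True keeps the totals as a (6, 1) column so they divide row by row
values = df.to_numpy()
fractions_np = values / values.sum(axis=1, keepdims=True)

print(np.allclose(fractions.to_numpy(), fractions_np))  # True
```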

Step 3: Multiply by 100

We then multiply by 100 to get a percentage value.

df[['A','B']].div(df.sum(axis=1), axis=0) * 100

This returns a new DataFrame in which each value is expressed as a percentage of its row total, so every row sums to 100. The original df is left unchanged.

Output

           A          B
0  28.261752  71.738248
1  55.010153  44.989847
2  50.003865  49.996135
3   6.284339  93.715661
4  45.915117  54.084883
5   2.078204  97.921796
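If this is needed in several places, the three steps can be wrapped in a small function. The sketch below is one possible way to do it, not part of the original article: rows_to_percent is a hypothetical helper name, and replacing zero totals with NaN is an assumption about how an all-zero row should be handled (such a row would otherwise involve a division by zero).

```python
import numpy as np
import pandas as pd


def rows_to_percent(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: express each row as percentages of its row total."""
    totals = frame.sum(axis=1)
    # Assumption: a row whose total is 0 has no meaningful percentages,
    # so its total is replaced by NaN and the whole row becomes NaN
    totals = totals.replace(0, np.nan)
    return frame.div(totals, axis=0) * 100


example = pd.DataFrame({"A": [1.0, 0.0, 3.0], "B": [3.0, 0.0, 1.0]})
result = rows_to_percent(example)
print(result)
```

In this small example the first row becomes 25 and 75, the all-zero second row becomes NaN, and the third row becomes 75 and 25.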

Round the normalized DataFrame

(df[['A','B']].div(df.sum(axis=1), axis=0) * 100).round(2)

Output

       A      B
0  28.26  71.74
1  55.01  44.99
2  50.00  50.00
3   6.28  93.72
4  45.92  54.08
5   2.08  97.92
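One thing to keep in mind is that rounded percentages do not always add up to exactly 100. The minimal sketch below uses a made-up one-row DataFrame with three equal values (not the data from this article) to show the effect.

```python
import pandas as pd

# Made-up example: three equal values, so each share is one third
thirds = pd.DataFrame({"A": [1.0], "B": [1.0], "C": [1.0]})

percent = thirds.div(thirds.sum(axis=1), axis=0) * 100
rounded = percent.round(2)

# Each value is rounded down to 33.33, so the row total falls just short of 100
print(rounded)
print(rounded.sum(axis=1))
```

For display purposes this is usually harmless, but it may be safer to keep the unrounded values for any further calculation and round only at the end.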

References

pandas.DataFrame.sum (pandas.pydata.org)
pandas.DataFrame.div (pandas.pydata.org)