Normalizing each row of a pandas DataFrame into percentages is a useful step when analyzing data. Once every row is expressed on the same 0 to 100 scale, we can compare values across rows and see how much each value contributes to its row total.

## Create synthetic data

`import pandas as pd`

`import numpy as np`

`np.random.seed(42)`

`data = np.random.random_sample((6, 2)) * 10`

`df = pd.DataFrame(data,columns=['A','B'])`

Output

`A B`

`0 3.745401 9.507143`

`1 7.319939 5.986585`

`2 1.560186 1.559945`

`3 0.580836 8.661761`

`4 6.011150 7.080726`

`5 0.205845 9.699099`

The goal is to convert each row so that its values are percentages of that row's total.

## Step 1: Sum each row

To express each value as a share of its row, we first need the total of each row. Passing `axis=1` to `sum` adds the values across the columns and returns one total per row. Dividing each value by its row total (Step 2) then gives a number between 0 and 1, as long as the values are non-negative.

`df.sum(axis=1)`

Output

`0 13.252544`

`1 13.306524`

`2 3.120132`

`3 9.242598`

`4 13.091876`

`5 9.904943`

`dtype: float64`
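To make the role of `axis` concrete, here is a minimal sketch on a tiny hand-made DataFrame. The name `small` and its values are illustrative assumptions, chosen so the totals can be checked by hand; they are not part of the dataset above.

```python
import pandas as pd

# Hypothetical toy data, small enough to verify the sums mentally.
small = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 6.0]})

# axis=1 adds across the columns: one total per row (4.0 and 8.0).
row_totals = small.sum(axis=1)

# axis=0 adds down the rows: one total per column (A=3.0, B=9.0).
col_totals = small.sum(axis=0)

print(row_totals)
print(col_totals)
```

For row-wise percentages it is the per-row totals (`axis=1`) that we need.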

## Step 2: Divide each row by its sum

`df[['A','B']].div(df.sum(axis=1), axis=0)`

Output

`A B`

`0 0.282618 0.717382`

`1 0.550102 0.449898`

`2 0.500039 0.499961`

`3 0.062843 0.937157`

`4 0.459151 0.540849`

`5 0.020782 0.979218`
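The `axis=0` argument matters here: it tells `div` to match the Series of row totals against the DataFrame's row labels. The sketch below, again on a hypothetical toy DataFrame named `small`, contrasts this with the default alignment, which tries to match the Series index against the column names and finds nothing in common.

```python
import pandas as pd

# Hypothetical toy data (not the dataset from this article).
small = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 6.0]})
row_totals = small.sum(axis=1)

# axis=0 aligns row_totals with the row labels, so each row
# is divided by its own total.
shares = small.div(row_totals, axis=0)

# Without axis=0, pandas aligns the Series index (0, 1) with the
# column labels (A, B). No labels match, so every cell is NaN.
misaligned = small.div(row_totals)

print(shares)
print(misaligned.isna().all().all())
```

A quick sanity check after this step is that `shares.sum(axis=1)` is 1 for every row.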

## Step 3: Multiply by 100

We then multiply by 100 to get a percentage value.

`df[['A','B']].div(df.sum(axis=1), axis=0) * 100`

The result is a new DataFrame in which each value is expressed as a percentage of its row total, so every row adds up to 100. This makes it easy to compare the relative weight of `A` and `B` from one row to the next.

Output

`A B`

`0 28.261752 71.738248`

`1 55.010153 44.989847`

`2 50.003865 49.996135`

`3 6.284339 93.715661`

`4 45.915117 54.084883`

`5 2.078204 97.921796`
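The three steps can be gathered into a small helper. This is only a sketch: the function name `to_row_percentages` is a hypothetical choice, and it assumes every column is numeric and no row sums to zero.

```python
import numpy as np
import pandas as pd


def to_row_percentages(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a new DataFrame where each row is scaled to sum to 100."""
    row_totals = frame.sum(axis=1)              # Step 1: total per row
    return frame.div(row_totals, axis=0) * 100  # Steps 2 and 3


# Rebuild the synthetic data used in this article.
np.random.seed(42)
df = pd.DataFrame(np.random.random_sample((6, 2)) * 10, columns=["A", "B"])

pct = to_row_percentages(df)

# Each row should add up to 100, within floating-point tolerance.
print(np.allclose(pct.sum(axis=1), 100))
```

If a row's total is zero, the division produces `NaN` or `inf` for that row, so such rows would need to be filtered out or handled separately before calling the helper.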

## Round the normalized DataFrame

`(df[['A','B']].div(df.sum(axis=1), axis=0) * 100).round(2)`

Output

`A B`

`0 28.26 71.74`

`1 55.01 44.99`

`2 50.00 50.00`

`3 6.28 93.72`

`4 45.92 54.08`

`5 2.08 97.92`
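One thing to keep in mind is that rounded percentages do not always add back up to exactly 100. The sketch below uses a hypothetical one-row DataFrame named `thirds`, with three equal values, to show the effect.

```python
import pandas as pd

# Hypothetical data: three equal values, so each share is one third.
thirds = pd.DataFrame({"A": [1.0], "B": [1.0], "C": [1.0]})

pct = (thirds.div(thirds.sum(axis=1), axis=0) * 100).round(2)

print(pct)              # each column shows 33.33
print(pct.sum(axis=1))  # about 99.99 rather than 100
```

For this reason it is usually safer to keep the unrounded values for further calculations and round only for display.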

## References

Links | Site |
---|---|
pandas.DataFrame.sum | pandas.pydata.org |
pandas.DataFrame.div | pandas.pydata.org |