How to normalize each row of an array into percentages with NumPy?

Published: March 17, 2023

Updated: March 19, 2023

Tags: Python; Numpy; Normalization;


Normalizing each row of an array into percentages with NumPy, also known as row normalization, is done by dividing each element of the array by the sum of all elements in its row and then multiplying by 100.

Let's consider the following example:

import numpy as np

np.random.seed(42)

data = np.random.random_sample((6, 2)) * 10

returns

    array([[3.74540119, 9.50714306],
           [7.31993942, 5.98658484],
           [1.5601864 , 1.5599452 ],
           [0.58083612, 8.66176146],
           [6.01115012, 7.08072578],
           [0.20584494, 9.69909852]])

Our aim is to normalize each row so that its values sum to 1, and then express them as percentages.


Using NumPy broadcasting

row_sums = data.sum(axis=1)
data_new = data / row_sums[:, np.newaxis]

returns

array([[0.28261752, 0.71738248],
       [0.55010153, 0.44989847],
       [0.50003865, 0.49996135],
       [0.06284339, 0.93715661],
       [0.45915117, 0.54084883],
       [0.02078204, 0.97921796]])
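The same broadcasting step can also be written with `keepdims=True`, which keeps the summed axis as a length-1 dimension so that no `np.newaxis` indexing is needed. The snippet below is a minimal sketch on a small made-up array (not the random data above), chosen so the result is easy to check by hand:

```python
import numpy as np

# Small illustrative array (hypothetical values, easy to verify by hand)
data = np.array([[1.0, 3.0],
                 [2.0, 2.0],
                 [4.0, 1.0]])

# keepdims=True keeps the summed axis, so row_sums has shape (3, 1)
# and broadcasts against data of shape (3, 2)
row_sums = data.sum(axis=1, keepdims=True)
fractions = data / row_sums

# Sanity check: every row of the result should sum to 1
row_totals = fractions.sum(axis=1)

print(fractions)
print(row_totals)
```

Checking that each row sums to 1 is a quick way to confirm that the division was applied along the intended axis.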

To get percentages, just multiply the result by 100:

row_sums = data.sum(axis=1)
data_new = data / row_sums[:, np.newaxis] * 100

returns

    array([[28.26175199, 71.73824801],
           [55.01015348, 44.98984652],
           [50.00386524, 49.99613476],
           [ 6.28433854, 93.71566146],
           [45.91511687, 54.08488313],
           [ 2.07820412, 97.92179588]])
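One case the division above does not cover is a row whose sum is zero, which would produce `nan` values and a runtime warning. A possible way to handle it, sketched here on a small made-up array, is to use `np.divide` with its `out` and `where` arguments. Leaving such rows at 0 is an assumption of this sketch; another convention may suit your data better.

```python
import numpy as np

# Hypothetical data containing a row that sums to zero
data = np.array([[1.0, 3.0],
                 [0.0, 0.0],
                 [2.0, 2.0]])

row_sums = data.sum(axis=1, keepdims=True)

# Divide only where the row sum is non-zero;
# entries of rows with a zero sum keep the initial value 0 from `out`
percentages = np.divide(
    data * 100,
    row_sums,
    out=np.zeros_like(data),
    where=row_sums != 0,
)

print(percentages)
```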


Using sklearn with normalize

Another solution

from sklearn.preprocessing import normalize

data_new = normalize(data, axis=1, norm='l1') * 100

returns

    array([[28.26175199, 71.73824801],
           [55.01015348, 44.98984652],
           [50.00386524, 49.99613476],
           [ 6.28433854, 93.71566146],
           [45.91511687, 54.08488313],
           [ 2.07820412, 97.92179588]])
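Note that the `l1` norm divides each row by the sum of the absolute values of its elements. For non-negative data such as the example above, this matches the plain row-sum division, but the two differ as soon as a row contains negative values. The following NumPy-only sketch (it does not call sklearn, and the array is made up for illustration) shows the difference:

```python
import numpy as np

# Hypothetical data: the second row contains a negative value
data = np.array([[ 1.0, 3.0],
                 [-1.0, 3.0]])

# Plain row-sum division, as in the broadcasting approach
plain = data / data.sum(axis=1, keepdims=True) * 100

# L1-style division: divide by the sum of absolute values in each row
l1 = data / np.abs(data).sum(axis=1, keepdims=True) * 100

print(plain)
print(l1)
```

The first row is identical in both results, while the second row gives `[-50, 150]` with the plain row sum and `[-25, 75]` with the L1-style division.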

Using pandas

Another solution

import pandas as pd

df = pd.DataFrame(data)

Calculate sum for each row

df.sum(axis=1)

gives

    0    13.252544
    1    13.306524
    2     3.120132
    3     9.242598
    4    13.091876
    5     9.904943
    dtype: float64

Create dataframe with rows normalized to 100

df[[0,1]].div(df.sum(axis=1), axis=0) * 100

returns the following dataframe

               0          1
    0  28.261752  71.738248
    1  55.010153  44.989847
    2  50.003865  49.996135
    3   6.284339  93.715661
    4  45.915117  54.084883
    5   2.078204  97.921796

Convert dataframe to numpy

df[[0,1]].div(df.sum(axis=1), axis=0).to_numpy() * 100

output

    array([[28.26175199, 71.73824801],
           [55.01015348, 44.98984652],
           [50.00386524, 49.99613476],
           [ 6.28433854, 93.71566146],
           [45.91511687, 54.08488313],
           [ 2.07820412, 97.92179588]])
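When every column of the dataframe is numeric, the column selection `df[[0,1]]` is optional, and `div` can be applied to the whole dataframe. Below is a minimal sketch with made-up values; the row and column labels (`A`, `B`, `C1`, `C2`) are only illustrative:

```python
import numpy as np
import pandas as pd

# Hypothetical labelled data, small enough to check by hand
df = pd.DataFrame([[1.0, 3.0],
                   [2.0, 2.0]],
                  index=["A", "B"],
                  columns=["C1", "C2"])

# Divide every row by its own sum (axis=0 aligns the sums on the row index)
percent_df = df.div(df.sum(axis=1), axis=0) * 100

# Convert to a plain NumPy array if the labels are no longer needed
percent_array = percent_df.to_numpy()

print(percent_df)
```

The labels are preserved in `percent_df`, which can be convenient when the result is displayed or plotted later.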

Additional notes

Create a heatmap using matplotlib and seaborn

    import matplotlib.pyplot as plt
    import seaborn as sns; sns.set()

    fig = plt.figure(num=None, figsize=(12, 8), dpi=80, facecolor='w', edgecolor='k')

    plt.clf()

    ax = fig.add_subplot(111)

    ax.set_aspect(0.5)

    annot_m = np.empty(data_new.shape,dtype='<U16')
    for i in range(data_new.shape[0]):
        for j in range(data_new.shape[1]):
            annot_m[i,j] = 'Score: {:.2f}'.format(data_new[i,j])

    res = sns.heatmap(data_new, annot=annot_m, fmt="", cmap="YlGnBu", vmin=0.0, vmax=100.0, cbar=False)

    plt.title('How to normalize rows of an array with numpy ?',fontsize=12)

    plt.xticks([i+0.5 for i in range(data_new.shape[1])], ['C1', 'C2'])
    plt.xticks(rotation=0)

    plt.yticks([i+0.5 for i in range(data_new.shape[0])], ['A', 'B', 'C','D', 'E', 'F'])
    plt.yticks(rotation=0)

    plt.savefig("how_to_normalize_rows_of_an_array_with_numpy_02.png", bbox_inches='tight', dpi=200)

    plt.show()

Round array values

np.around(data_new,2)

output

    array([[28.26, 71.74],
           [55.01, 44.99],
           [50.  , 50.  ],
           [ 6.28, 93.72],
           [45.92, 54.08],
           [ 2.08, 97.92]])
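Keep in mind that rounded percentages do not always add up to exactly 100. The short sketch below uses a made-up row of three equal values to illustrate this:

```python
import numpy as np

# Hypothetical row of three equal values
data = np.array([[1.0, 1.0, 1.0]])

percentages = data / data.sum(axis=1, keepdims=True) * 100
rounded = np.around(percentages, 2)

# Each value rounds to 33.33, so the rounded row sums to about 99.99
rounded_total = rounded.sum(axis=1)

print(rounded)
print(rounded_total)
```

If the totals must be exactly 100 after rounding, it is usually simpler to keep the unrounded array for calculations and round only for display.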

References

Links Site
numpy.sum numpy.org
broadcasting numpy.org
sklearn.preprocessing.normalize scikit-learn.org
pandas.DataFrame.divide pandas.pydata.org
around() numpy.org