Normalizing each row of an array into percentages with NumPy, also known as row normalization, can be done by dividing each element by the sum of all elements in its row.
Let's consider the following example:
    import numpy as np

    np.random.seed(42)
    data = np.random.random_sample((6, 2)) * 10
returns
    array([[3.74540119, 9.50714306],
           [7.31993942, 5.98658484],
           [1.5601864 , 1.5599452 ],
           [0.58083612, 8.66176146],
           [6.01115012, 7.08072578],
           [0.20584494, 9.69909852]])
Our aim is to normalize each row so that its values sum to 1, and then express them as percentages.

Using NumPy broadcasting
    row_sums = data.sum(axis=1)
    data_new = data / row_sums[:, np.newaxis]
returns
    array([[0.28261752, 0.71738248],
           [0.55010153, 0.44989847],
           [0.50003865, 0.49996135],
           [0.06284339, 0.93715661],
           [0.45915117, 0.54084883],
           [0.02078204, 0.97921796]])
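The `[:, np.newaxis]` step is needed because `row_sums` has shape `(6,)` while `data` has shape `(6, 2)`. NumPy aligns shapes starting from the last axis, so dividing by `row_sums` directly would raise a broadcasting error. The sketch below prints the shapes involved and shows an equivalent spelling with `keepdims=True`. The names `fractions_a` and `fractions_b` are illustrative only.

```python
import numpy as np

np.random.seed(42)
data = np.random.random_sample((6, 2)) * 10

row_sums = data.sum(axis=1)  # shape (6,)

# Option 1: add the column axis by hand, giving shape (6, 1)
fractions_a = data / row_sums[:, np.newaxis]

# Option 2: ask sum() to keep the reduced axis, which also gives shape (6, 1)
fractions_b = data / data.sum(axis=1, keepdims=True)

print(row_sums.shape, row_sums[:, np.newaxis].shape)
print(np.allclose(fractions_a, fractions_b))
print(fractions_a.sum(axis=1))  # each row sums to 1, up to floating-point error
```

Both options compute the same thing; `keepdims=True` simply avoids the separate reshaping step.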
To get percentages, multiply the result by 100:
    row_sums = data.sum(axis=1)
    data_new = data / row_sums[:, np.newaxis] * 100
returns
    array([[28.26175199, 71.73824801],
           [55.01015348, 44.98984652],
           [50.00386524, 49.99613476],
           [ 6.28433854, 93.71566146],
           [45.91511687, 54.08488313],
           [ 2.07820412, 97.92179588]])
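The division above assumes that no row sums to zero. If a row can be all zeros, the plain division produces `nan` values and a runtime warning. A minimal sketch of one way to guard against that, using hypothetical data and leaving such rows at 0, could look like this:

```python
import numpy as np

# Hypothetical data whose second row sums to zero
data = np.array([[1.0, 3.0],
                 [0.0, 0.0],
                 [2.0, 2.0]])

row_sums = data.sum(axis=1, keepdims=True)  # shape (3, 1)

# Divide only where the row sum is non-zero; the other rows keep the 0.0 from `out`
percent = np.divide(data, row_sums,
                    out=np.zeros_like(data),
                    where=row_sums != 0) * 100

print(percent)
```

Leaving all-zero rows at 0 is a design choice made for this sketch; depending on the application, marking them as `nan` may be more appropriate.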

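Using scikit-learn's normalize
Another solution is to use normalize with norm='l1', which divides each row by the sum of the absolute values of its elements. Since all values here are non-negative, this equals the row sum:
    from sklearn.preprocessing import normalize

    data_new = normalize(data, axis=1, norm='l1') * 100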
returns
    array([[28.26175199, 71.73824801],
           [55.01015348, 44.98984652],
           [50.00386524, 49.99613476],
           [ 6.28433854, 93.71566146],
           [45.91511687, 54.08488313],
           [ 2.07820412, 97.92179588]])
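The l1 norm of a row is the sum of the absolute values of its elements, so this approach matches the plain row-sum division only when every value is non-negative. The following NumPy-only sketch illustrates the difference on a hypothetical row with a negative entry. It reproduces the arithmetic of the l1 norm by hand and is not scikit-learn's implementation.

```python
import numpy as np

# Hypothetical row containing a negative value
row = np.array([[4.0, -1.0]])

# Plain row-sum division: divides by 4 + (-1) = 3
by_sum = row / row.sum(axis=1, keepdims=True)

# l1-style division: divides by |4| + |-1| = 5
by_l1 = row / np.abs(row).sum(axis=1, keepdims=True)

print(by_sum)  # roughly 1.333 and -0.333
print(by_l1)   # 0.8 and -0.2
```

For data that can contain negative values, it is worth deciding which of the two behaviours is actually wanted before picking a method.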
Using pandas
A third solution is to load the array into a pandas DataFrame:
    import pandas as pd

    df = pd.DataFrame(data)
Calculate sum for each row
df.sum(axis=1)
gives
    0    13.252544
    1    13.306524
    2     3.120132
    3     9.242598
    4    13.091876
    5     9.904943
    dtype: float64
Create dataframe with rows normalized to 100
df[[0,1]].div(df.sum(axis=1), axis=0) * 100
returns the following dataframe
               0          1
    0  28.261752  71.738248
    1  55.010153  44.989847
    2  50.003865  49.996135
    3   6.284339  93.715661
    4  45.915117  54.084883
    5   2.078204  97.921796
Convert dataframe to numpy
df[[0,1]].div(df.sum(axis=1), axis=0).to_numpy() * 100
output
    array([[28.26175199, 71.73824801],
           [55.01015348, 44.98984652],
           [50.00386524, 49.99613476],
           [ 6.28433854, 93.71566146],
           [45.91511687, 54.08488313],
           [ 2.07820412, 97.92179588]])
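One reason to stay in pandas is that row and column labels are preserved through the normalization. The sketch below repeats the division on a labelled DataFrame. The labels are hypothetical (they reuse the tick labels of the heatmap example in this article), and `df_percent` is an illustrative name.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
data = np.random.random_sample((6, 2)) * 10

# Hypothetical row and column labels
df = pd.DataFrame(data, index=list("ABCDEF"), columns=["C1", "C2"])

# axis=0 matches the row sums to the row index, so each row is divided by its own sum
df_percent = df.div(df.sum(axis=1), axis=0) * 100

print(df_percent.round(2))

# Cross-check against the NumPy broadcasting result
print(np.allclose(df_percent.to_numpy(),
                  data / data.sum(axis=1, keepdims=True) * 100))
```

The `axis=0` argument is what makes `div` line the sums up with rows rather than with columns.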
Additional notes
Create a heatmap using matplotlib and seaborn
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.set()

    fig = plt.figure(num=None, figsize=(12, 8), dpi=80, facecolor='w', edgecolor='k')
    plt.clf()
    ax = fig.add_subplot(111)
    ax.set_aspect(0.5)

    # Build one annotation label per cell
    annot_m = np.empty(data_new.shape, dtype='<U16')
    for i in range(data_new.shape[0]):
        for j in range(data_new.shape[1]):
            annot_m[i, j] = 'Score: {:.2f}'.format(data_new[i, j])

    res = sns.heatmap(data_new, annot=annot_m, fmt="", cmap="YlGnBu", vmin=0.0, vmax=100.0, cbar=False)

    plt.title('How to normalize rows of an array with numpy ?', fontsize=12)

    # Center the tick labels on the cells
    plt.xticks([i + 0.5 for i in range(data_new.shape[1])], ['C1', 'C2'])
    plt.xticks(rotation=0)
    plt.yticks([i + 0.5 for i in range(data_new.shape[0])], ['A', 'B', 'C', 'D', 'E', 'F'])
    plt.yticks(rotation=0)

    plt.savefig("how_to_normalize_rows_of_an_array_with_numpy_02.png", bbox_inches='tight', dpi=200)
    plt.show()
Round array values
np.around(data_new,2)
output
    array([[28.26, 71.74],
           [55.01, 44.99],
           [50.  , 50.  ],
           [ 6.28, 93.72],
           [45.92, 54.08],
           [ 2.08, 97.92]])
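Note that rounded percentages are not guaranteed to add up to exactly 100 in every row. The example above happens to do so, but the following sketch, which uses a hypothetical row of three equal values, shows a case where the rounded row falls short:

```python
import numpy as np

# Hypothetical row of three equal values
thirds = np.array([[1.0, 1.0, 1.0]])
percent = thirds / thirds.sum(axis=1, keepdims=True) * 100

rounded = np.around(percent, 2)

print(rounded)              # each entry becomes 33.33
print(rounded.sum(axis=1))  # about 99.99 rather than 100
```

If the displayed values must sum to 100, it is usually safer to round only for presentation and keep the unrounded array for further computation.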
References
| Links | Site |
|---|---|
| numpy.sum | numpy.org |
| broadcasting | numpy.org |
| sklearn.preprocessing.normalize | scikit-learn.org |
| pandas.DataFrame.divide | pandas.pydata.org |
| around() | numpy.org |
