Normalizing each row of a NumPy array into percentages, also known as row normalization, can be done by dividing each element by the sum of all elements in its row.
Let's consider the following example:
import numpy as np
np.random.seed(42)
data = np.random.random_sample((6, 2)) * 10
returns
array([[3.74540119, 9.50714306],
[7.31993942, 5.98658484],
[1.5601864 , 1.5599452 ],
[0.58083612, 8.66176146],
[6.01115012, 7.08072578],
[0.20584494, 9.69909852]])
Our aim is to normalize each row so that it sums to 1, and then express the values as percentages.
Using NumPy broadcasting
row_sums = data.sum(axis=1)
data_new = data / row_sums[:, np.newaxis]
returns
array([[0.28261752, 0.71738248],
[0.55010153, 0.44989847],
[0.50003865, 0.49996135],
[0.06284339, 0.93715661],
[0.45915117, 0.54084883],
[0.02078204, 0.97921796]])
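The reshaping step is what makes the division work, so it may help to make the shapes explicit. The sketch below rebuilds the same `data` array and shows that `keepdims=True` is an equivalent way to keep the row sums as a column; the names `row_sums_col` and `alt` are illustrative only.

```python
import numpy as np

np.random.seed(42)
data = np.random.random_sample((6, 2)) * 10

row_sums = data.sum(axis=1)              # shape (6,): one sum per row
row_sums_col = row_sums[:, np.newaxis]   # shape (6, 1): broadcasts across the columns

data_new = data / row_sums_col

# keepdims=True keeps the summed axis, giving a (6, 1) array directly
alt = data / data.sum(axis=1, keepdims=True)

print(row_sums.shape, row_sums_col.shape)
print(np.allclose(data_new, alt))
```

Without the extra axis, NumPy would try to broadcast a `(6,)` array against `(6, 2)`, which fails because the trailing dimensions (6 and 2) do not match.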
To get percentages, just multiply the result by 100:
row_sums = data.sum(axis=1)
data_new = data / row_sums[:, np.newaxis] * 100
returns
array([[28.26175199, 71.73824801],
[55.01015348, 44.98984652],
[50.00386524, 49.99613476],
[ 6.28433854, 93.71566146],
[45.91511687, 54.08488313],
[ 2.07820412, 97.92179588]])
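One case the division above does not cover is a row whose sum is zero, which would produce `nan` or `inf` values. A minimal sketch of one way to handle it, assuming all-zero rows should simply stay at zero (the helper name `rows_to_percent` is hypothetical, not part of NumPy):

```python
import numpy as np

def rows_to_percent(a):
    """Hypothetical helper: scale each row to sum to 100; zero-sum rows stay zero."""
    a = np.asarray(a, dtype=float)
    row_sums = a.sum(axis=1, keepdims=True)
    out = np.zeros_like(a)
    # Divide only where the row sum is non-zero; other entries keep the 0 from `out`
    np.divide(a, row_sums, out=out, where=row_sums != 0)
    return out * 100

example = np.array([[1.0, 3.0],
                    [0.0, 0.0]])
print(rows_to_percent(example))
```

Whether zero is the right placeholder depends on the application; `np.nan` may be a better choice when such rows should be flagged rather than hidden.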
Using sklearn with normalize
Another solution is to use scikit-learn's normalize function with norm='l1':
from sklearn.preprocessing import normalize
data_new = normalize(data, axis=1, norm='l1') * 100
returns
array([[28.26175199, 71.73824801],
[55.01015348, 44.98984652],
[50.00386524, 49.99613476],
[ 6.28433854, 93.71566146],
[45.91511687, 54.08488313],
[ 2.07820412, 97.92179588]])
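The L1 norm of a row is the sum of its absolute values, so it matches the plain row sum only when every entry is non-negative, as in the example data. The NumPy-only sketch below illustrates the difference on a small made-up array with a negative entry; it mimics the L1 scaling by hand rather than calling scikit-learn.

```python
import numpy as np

mixed = np.array([[1.0, -3.0],
                  [2.0,  2.0]])

# Dividing by the plain row sum (the broadcasting approach)
by_sum = mixed / mixed.sum(axis=1, keepdims=True)

# Dividing by the sum of absolute values (what an L1 normalization does)
by_l1 = mixed / np.abs(mixed).sum(axis=1, keepdims=True)

print(by_sum)
print(by_l1)
```

The second row is identical in both results, while the first row differs because its entries have mixed signs.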
Using pandas
Another solution is to load the array into a pandas DataFrame:
import pandas as pd
df = pd.DataFrame(data)
Calculate sum for each row
df.sum(axis=1)
gives
0 13.252544
1 13.306524
2 3.120132
3 9.242598
4 13.091876
5 9.904943
dtype: float64
Create dataframe with rows normalized to 100
df[[0,1]].div(df.sum(axis=1), axis=0) * 100
returns the following dataframe
           0          1
0  28.261752  71.738248
1  55.010153  44.989847
2  50.003865  49.996135
3   6.284339  93.715661
4  45.915117  54.084883
5   2.078204  97.921796
Convert dataframe to numpy
df[[0,1]].div(df.sum(axis=1), axis=0).to_numpy() * 100
output
array([[28.26175199, 71.73824801],
[55.01015348, 44.98984652],
[50.00386524, 49.99613476],
[ 6.28433854, 93.71566146],
[45.91511687, 54.08488313],
[ 2.07820412, 97.92179588]])
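As a quick sanity check, the pandas result can be compared with the NumPy broadcasting version. The sketch below assumes the same seeded `data` array and divides the whole DataFrame at once instead of selecting columns `[0, 1]`; the names `df_pct` and `expected` are illustrative.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
data = np.random.random_sample((6, 2)) * 10
df = pd.DataFrame(data)

# Divide every column by the row sums, aligning on the row index (axis=0)
df_pct = df.div(df.sum(axis=1), axis=0) * 100

# Each row should add up to 100, up to floating-point error
print(np.allclose(df_pct.sum(axis=1), 100))

# The values should match the NumPy broadcasting result
expected = data / data.sum(axis=1, keepdims=True) * 100
print(np.allclose(df_pct.to_numpy(), expected))
```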
Additional notes
Create a heatmap using matplotlib and seaborn
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
fig = plt.figure(num=None, figsize=(12, 8), dpi=80, facecolor='w', edgecolor='k')
plt.clf()
ax = fig.add_subplot(111)
ax.set_aspect(0.5)
# Build a string annotation for every cell of the heatmap
annot_m = np.empty(data_new.shape, dtype='<U16')
for i in range(data_new.shape[0]):
    for j in range(data_new.shape[1]):
        annot_m[i, j] = 'Score: {:.2f}'.format(data_new[i, j])
res = sns.heatmap(data_new, annot=annot_m, fmt="", cmap="YlGnBu", vmin=0.0, vmax=100.0, cbar=False)
plt.title('How to normalize rows of an array with numpy ?',fontsize=12)
plt.xticks([i+0.5 for i in range(data_new.shape[1])], ['C1', 'C2'])
plt.xticks(rotation=0)
plt.yticks([i+0.5 for i in range(data_new.shape[0])], ['A', 'B', 'C','D', 'E', 'F'])
plt.yticks(rotation=0)
plt.savefig("how_to_normalize_rows_of_an_array_with_numpy_02.png", bbox_inches='tight', dpi=200)
plt.show()
Round array values
np.around(data_new,2)
output
array([[28.26, 71.74],
[55.01, 44.99],
[50. , 50. ],
[ 6.28, 93.72],
[45.92, 54.08],
[ 2.08, 97.92]])
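Note that rounded percentages are not guaranteed to add up to exactly 100. With two columns this rarely shows, but a small made-up example with three equal values illustrates it (the array `thirds` is an assumption for illustration, not part of the data above):

```python
import numpy as np

thirds = np.array([[1.0, 1.0, 1.0]])

pct = thirds / thirds.sum(axis=1, keepdims=True) * 100
rounded = np.around(pct, 2)

print(rounded)              # each value is rounded to 33.33
print(rounded.sum(axis=1))  # about 99.99 rather than 100
```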
References
Links | Site |
---|---|
numpy.sum | numpy.org |
broadcasting | numpy.org |
sklearn.preprocessing.normalize | scikit-learn.org |
pandas.DataFrame.divide | pandas.pydata.org |
around() | numpy.org |