How to randomly select elements of an array with NumPy in Python?

Published: November 26, 2019


Examples of how to randomly select elements of an array with NumPy in Python:

Randomly select elements of a 1D array using choice()

Let's create a simple 1D array with 10 elements:

>>> import numpy as np
>>> data = np.arange(10)
>>> data
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

\begin{equation}
A = \left( \begin{array}{cccccccccc}
0 & 1& 2& 3& 4& 5& 6& 7& 8& 9
\end{array}\right)
\end{equation}

To randomly select n elements, one solution is to use np.random.choice(). For example, to randomly select 4 elements from the array data:

>>> np.random.choice(data,4)
array([9, 6, 2, 9])

which returns, for example:

\begin{equation}
A = \left( \begin{array}{cccc}
9 & 6 & 2 & 9
\end{array}\right)
\end{equation}

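Another example, with n = 5. By default, choice() samples with replacement, so the same element can appear more than once in a draw: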

>>> for i in range(10):
...     np.random.choice(data,5)
... 
array([3, 4, 0, 8, 4])
array([3, 0, 0, 3, 6])
array([5, 1, 2, 0, 9])
array([5, 8, 6, 0, 1])
array([4, 0, 9, 4, 2])
array([9, 6, 3, 9, 9])
array([9, 5, 1, 2, 7])
array([9, 7, 6, 4, 5])
array([6, 8, 5, 5, 9])
array([8, 9, 5, 5, 6])
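
The draws above change on every run. As a minimal sketch of how to make them repeatable, one option is to seed NumPy's legacy global generator with np.random.seed() before sampling; the seed value 0 used here is arbitrary and only for illustration:

```python
import numpy as np

data = np.arange(10)

# Seed the global generator, then draw 5 elements (with replacement).
np.random.seed(0)  # arbitrary seed, chosen for illustration
first = np.random.choice(data, 5)

# Re-seeding with the same value replays the same sequence of draws.
np.random.seed(0)
second = np.random.choice(data, 5)

print(first)
print(np.array_equal(first, second))  # True
```

Without the call to np.random.seed(), the two samples would generally differ.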

Random sampling without replacement

To do random sampling without replacement, just add the option replace=False:

>>> for i in range(10):
...     np.random.choice(data,5,replace=False)
... 
array([9, 7, 4, 0, 6])
array([0, 9, 2, 4, 6])
array([2, 6, 5, 0, 9])
array([0, 3, 5, 7, 9])
array([0, 5, 9, 6, 7])
array([5, 0, 9, 6, 3])
array([7, 2, 6, 9, 1])
array([7, 6, 5, 8, 4])
array([6, 8, 5, 7, 4])
array([0, 1, 2, 3, 5])

One can see that an element cannot be selected more than once within a draw.
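
As a short sketch of what sampling without replacement implies, the following checks that the drawn values are all distinct, and shows that asking for more elements than the array contains raises a ValueError. The variable names sample and error_message are just illustrative:

```python
import numpy as np

data = np.arange(10)

# Without replacement, every drawn value is distinct.
sample = np.random.choice(data, 5, replace=False)
print(len(np.unique(sample)) == len(sample))  # True

# Drawing 11 distinct elements from 10 is impossible, so NumPy raises.
error_message = None
try:
    np.random.choice(data, 11, replace=False)
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)
```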

Weighted random sampling

To do weighted random sampling, define for each element its probability of being selected:

>>> p = [0.05, 0.05, 0.1, 0.125, 0.175, 0.175, 0.125, 0.1, 0.05, 0.05]

Note: the probabilities must sum to 1:

>>> sum(p)
1.0
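
If the weights at hand do not already sum to 1, they can be normalized first. A minimal sketch, assuming some hypothetical raw weights (here chosen so that the result matches the probabilities above):

```python
import numpy as np

data = np.arange(10)

# Hypothetical raw weights that do not sum to 1.
weights = np.array([1, 1, 2, 2.5, 3.5, 3.5, 2.5, 2, 1, 1])

# Divide by the total so the values form a valid probability vector.
p = weights / weights.sum()
print(p)

# The normalized vector can be passed directly to choice().
sample = np.random.choice(data, 5, replace=False, p=p)
print(sample)
```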

Here, for example, the elements 0, 1, 8 and 9 have a lower probability of being selected:

>>> for idx, prob in enumerate(p):
...     print(prob, data[idx])
... 
0.05 0
0.05 1
0.1 2
0.125 3
0.175 4
0.175 5
0.125 6
0.1 7
0.05 8
0.05 9

Let's check:

>>> for i in range(10):
...     np.random.choice(data,5,replace=False,p=p)
... 
array([7, 5, 0, 2, 3])
array([9, 2, 3, 5, 7])
array([2, 5, 3, 7, 4])
array([7, 2, 9, 4, 5])
array([1, 4, 6, 3, 2])
array([4, 5, 3, 7, 1])
array([2, 7, 4, 6, 3])
array([6, 5, 0, 1, 8])
array([4, 0, 5, 9, 6])
array([8, 9, 3, 4, 6])
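
Ten draws of five elements are too few to see the weights clearly. As a rough sketch of a more convincing check, one can draw many samples with replacement (so that each draw follows the weights exactly) and compare the observed frequencies with the probabilities. The number of draws and the seed are arbitrary choices:

```python
import numpy as np

data = np.arange(10)
p = [0.05, 0.05, 0.1, 0.125, 0.175, 0.175, 0.125, 0.1, 0.05, 0.05]

np.random.seed(0)  # arbitrary seed, for repeatability only
n_draws = 100_000

# Sample with replacement so every single draw follows p.
draws = np.random.choice(data, n_draws, p=p)

# Count how often each value 0..9 was drawn and convert to frequencies.
freq = np.bincount(draws, minlength=len(data)) / n_draws

for value, target, observed in zip(data, p, freq):
    print(value, target, round(observed, 3))
```

The observed frequencies should be close to the requested probabilities, with small deviations due to sampling noise.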

Random sampling for a 2D array

Let's consider the following 2D array:

>>> data = np.arange(80).reshape((8, 10))
>>> data
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79]])

\begin{equation}
data = \left( \begin{array}{cccccccccc}
0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\
10 & 11 & 12 & 13 & 14 & 15 & 16 & 17 & 18 & 19 \\
20 & 21 & 22 & 23 & 24 & 25 & 26 & 27 & 28 & 29 \\
30 & 31 & 32 & 33 & 34 & 35 & 36 & 37 & 38 & 39 \\
40 & 41 & 42 & 43 & 44 & 45 & 46 & 47 & 48 & 49 \\
50 & 51 & 52 & 53 & 54 & 55 & 56 & 57 & 58 & 59 \\
60 & 61 & 62 & 63 & 64 & 65 & 66 & 67 & 68 & 69 \\
70 & 71 & 72 & 73 & 74 & 75 & 76 & 77 & 78 & 79
\end{array}\right)
\end{equation}

The function np.random.choice() only accepts a 1D array as input. One solution is to use ravel() to flatten the 2D array into a 1D array first, for example:

>>> np.random.choice(data.ravel(), 10, replace=False)
array([64, 35, 53, 14, 48, 29, 74, 21, 62, 41])
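
Flattening loses track of where each value sat in the 2D array. As a sketch of two alternatives, one can sample indices instead of values: passing an integer n to choice() samples from np.arange(n), which allows picking whole rows or recovering (row, column) positions with np.unravel_index. The variable names below are illustrative:

```python
import numpy as np

data = np.arange(80).reshape((8, 10))

# Pick 3 distinct row indices, then use them to select whole rows.
row_idx = np.random.choice(data.shape[0], 3, replace=False)
rows = data[row_idx]
print(rows.shape)  # (3, 10)

# Pick 10 distinct flat indices and convert them to (row, column) pairs.
flat_idx = np.random.choice(data.size, 10, replace=False)
r, c = np.unravel_index(flat_idx, data.shape)
values = data[r, c]

for i, j, v in zip(r, c, values):
    print((i, j), v)
```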
