A 2D binary matrix in Python can be clustered into groups of connected pixels using scipy.ndimage.

Examples
Create synthetic data
Before we begin, we need to import the required libraries and generate artificial data:
    import scipy.ndimage as ndimage
    import matplotlib.pyplot as plt
    import numpy as np
    from scipy.ndimage import label, generate_binary_structure

    data = np.array(
        [[0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1],
         [1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1],
         [1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1],
         [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
         [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
         [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
         [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
         [1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0],
         [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1],
         [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
         [1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1],
         [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

    plt.imshow(data, interpolation='nearest')
    plt.title('How to cluster data \n from a 2D binary matrix in python ?')
    plt.savefig('clustering_data_01.png', facecolor='white')
    plt.show()

Clustering data using ndimage
Next, we will use the ndimage module from scipy to cluster our data.
    current_output, num_ids = ndimage.label(data)

    plt.imshow(current_output, interpolation='nearest')
    plt.title('How to cluster data \n from a 2D binary matrix in python ?')
    plt.savefig('clustering_data_02.png', facecolor='white')
    plt.show()
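
The two values returned by ndimage.label() can be inspected directly, without plotting. The snippet below is a minimal sketch on a small toy array (the array `toy` and the names `labels`, `num_clusters` and `sizes` are illustrative and not part of the example above). It counts the clusters and the number of pixels in each one:

```python
import numpy as np
from scipy import ndimage

# Small illustrative array (an assumption for this sketch, not the data above)
toy = np.array([[1, 1, 0, 0],
                [0, 0, 0, 1],
                [0, 1, 0, 1]])

# labels has the same shape as toy; background pixels stay 0,
# and each connected group of 1s gets its own positive integer
labels, num_clusters = ndimage.label(toy)

# Count pixels per label and drop the background count (label 0)
sizes = np.bincount(labels.ravel())[1:]

print(num_clusters)
print(sizes)
```

The same two lines should work unchanged on `current_output` from the example above, since it is an integer label array of the same kind.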

By default, ndimage.label() uses the following structure to define pixel connectivity. Only the four orthogonal neighbours of a pixel count as connected:

    s = generate_binary_structure(2, 1)
    print(s)
Output
    [[False  True False]
     [ True  True  True]
     [False  True False]]
It is also possible to define another structure, for example one in which diagonal neighbours are connected as well:

    s = generate_binary_structure(2, 2)
    print(s)
Output
    [[ True  True  True]
     [ True  True  True]
     [ True  True  True]]

This structure is then passed to ndimage.label():

    current_output, num_ids = ndimage.label(data, structure=s)

    plt.imshow(current_output, interpolation='nearest')
    plt.title('How to cluster data \n from a 2D binary matrix in python ?')
    plt.savefig('clustering_data_03.png', facecolor='white')
    plt.show()
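
The effect of the structure is easiest to see on pixels that touch only at their corners. The following sketch uses a small diagonal array (`toy`, `four`, `eight`, `n_four` and `n_eight` are illustrative names, not part of the example above) and labels it with both structures:

```python
import numpy as np
from scipy import ndimage

# Three pixels that touch only diagonally (illustrative toy data)
toy = np.array([[1, 0, 0],
                [0, 1, 0],
                [0, 0, 1]])

four = ndimage.generate_binary_structure(2, 1)   # orthogonal neighbours only
eight = ndimage.generate_binary_structure(2, 2)  # orthogonal and diagonal neighbours

_, n_four = ndimage.label(toy, structure=four)
_, n_eight = ndimage.label(toy, structure=eight)

print(n_four)   # each pixel is its own cluster
print(n_eight)  # the diagonal forms a single cluster
```

With the default structure each pixel is a separate cluster, while the full 3x3 structure joins them into one.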

Clustering data using ndimage and binary_dilation()
Groups separated by small gaps can be merged by dilating the binary array before labeling, so that nearby pixels end up in the same, larger cluster.

    data_dilated = ndimage.binary_dilation(data)

    plt.imshow(data_dilated, interpolation='nearest')
    plt.title('How to cluster data \n from a 2D binary matrix in python ?')
    plt.savefig('clustering_data_04.png', facecolor='white')
    plt.show()
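
To see what a single dilation step does, it can help to dilate one isolated pixel. This is a small sketch with an illustrative array (`single` and `dilated` are names chosen here); it assumes the default structure of binary_dilation(), which is the same cross-shaped structure shown earlier:

```python
import numpy as np
from scipy import ndimage

# One pixel in the middle of an empty 3x3 array (illustrative toy data)
single = np.zeros((3, 3), dtype=bool)
single[1, 1] = True

# With the default structure, the pixel grows into its four orthogonal neighbours
dilated = ndimage.binary_dilation(single)

print(dilated.astype(int))
```

The result has the same shape as generate_binary_structure(2, 1): every pixel grows by one step up, down, left and right, which is why gaps of one or two pixels between groups get closed.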

We can then use ndimage.label() on the dilated array, just as before:

    current_output, num_ids = ndimage.label(data_dilated)

    plt.imshow(current_output, interpolation='nearest')
    plt.title('How to cluster data \n from a 2D binary matrix in python ?')
    plt.savefig('clustering_data_05.png', facecolor='white')
    plt.show()

Finally, we mask the labels with the original data, so that only pixels that were set in the original matrix keep a label:

    current_output[data == 0] = 0

    plt.imshow(current_output, interpolation='nearest')
    plt.title('How to cluster data \n from a 2D binary matrix in python ?')
    plt.savefig('clustering_data_06.png', facecolor='white')
    plt.show()
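
The dilate, label and mask steps can be wrapped in one helper. The function below is a minimal sketch; its name `label_with_dilation` and the toy array are hypothetical. The `iterations` argument is passed to binary_dilation() to control how far apart two groups may be and still merge:

```python
import numpy as np
from scipy import ndimage

def label_with_dilation(binary, iterations=1):
    """Label a binary array after dilating it, then restore the original footprint.

    Hypothetical helper for illustration; not part of scipy.
    """
    binary = np.asarray(binary, dtype=bool)
    # Grow every group so that groups separated by small gaps touch
    dilated = ndimage.binary_dilation(binary, iterations=iterations)
    # Label the dilated array with the default connectivity
    labels, num_clusters = ndimage.label(dilated)
    # Keep labels only where the original array was set
    labels[~binary] = 0
    return labels, num_clusters

# Illustrative toy data: two pixels one gap apart, and a third pixel further away
toy = np.array([[1, 0, 1, 0, 0, 0, 1]])

_, n_plain = ndimage.label(toy)
labels, n_dilated = label_with_dilation(toy)

print(n_plain)    # clusters without dilation
print(n_dilated)  # clusters after one dilation step
print(labels)
```

In this toy case the first two pixels share a label after dilation, while the third stays separate because the gap between them is too wide for a single iteration to close.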

