How to cluster data from a 2D binary matrix in Python?

Published: May 01, 2023

Tags: Python; Machine Learning; Clustering;


When working with a 2D binary matrix in Python, you can cluster the data, that is, group neighboring cells with value 1 into connected regions, using the scipy.ndimage module.

Create synthetic data

Before we begin, we need to import the required libraries and generate artificial data:

import scipy.ndimage as ndimage
import matplotlib.pyplot as plt
import numpy as np

from scipy.ndimage import label, generate_binary_structure


data = np.array( [[0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1],
                  [1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1],
                  [1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1],
                  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
                  [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
                  [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                  [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                  [1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0],
                  [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1],
                  [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
                  [1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1],
                  [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

plt.imshow(data, interpolation='nearest')
plt.title('How to cluster data \n from a 2D binary matrix in python ?')
plt.savefig('clustering_data_01.png',facecolor='white')
plt.show()
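To try the method on inputs other than the hand-written matrix, a random binary matrix works as well. The sketch below is an assumption and not part of the original example: the name random_data, the seed (42) and the fill probability (0.3) are arbitrary illustrative choices.

```python
import numpy as np

# Hypothetical alternative to the hand-written matrix: a random 12x12
# binary matrix in which each cell is 1 with probability 0.3 (arbitrary).
rng = np.random.default_rng(seed=42)
random_data = (rng.random((12, 12)) < 0.3).astype(int)

print(random_data.shape)  # (12, 12)
print(np.unique(random_data))  # the distinct values present (0 and/or 1)
```

Fixing the seed keeps the matrix, and therefore the clusters, reproducible between runs.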


Clustering data using ndimage

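Next, we use ndimage.label() to cluster the data. It gives each group of connected non-zero cells its own integer ID and returns the labeled array together with the number of clusters found.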

# label() returns an array of cluster IDs (0 = background) and the number of clusters
current_output, num_ids = ndimage.label(data)

plt.imshow(current_output, interpolation='nearest')
plt.title('How to cluster data \n from a 2D binary matrix in python ?')
plt.savefig('clustering_data_02.png',facecolor='white')
plt.show()
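Besides plotting the labels, it is often useful to know how many clusters were found and how large each one is. A minimal sketch, using np.bincount to count the cells carrying each label (the names labels and sizes are illustrative):

```python
import numpy as np
from scipy import ndimage

# Same binary matrix as in the example above
data = np.array([[0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1],
                 [1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1],
                 [1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
                 [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                 [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                 [1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0],
                 [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
                 [1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1],
                 [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

# label() returns the labeled array and the number of clusters
labels, num_ids = ndimage.label(data)

# bincount counts the cells carrying each label; index 0 is the
# background, so it is dropped
sizes = np.bincount(labels.ravel())[1:]

print(num_ids)  # 8
print(sorted(sizes.tolist()))  # [1, 1, 2, 4, 9, 9, 13, 13]
```

Since every 1 belongs to exactly one cluster, the sizes add up to the total number of ones in the matrix.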


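By default, ndimage.label() uses the following structuring element to define pixel connectivity: only cells that share an edge (up, down, left, right) are considered connected.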

s = generate_binary_structure(2, 1)  # 2D, connectivity 1: edge neighbors only

print(s)

Output

[[False  True False]
 [ True  True  True]
 [False  True False]]
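One way to check that this cross-shaped structure is indeed the default is to label a small matrix twice, once without a structure argument and once with generate_binary_structure(2, 1), and compare the results. A small sketch; the matrix grid is a made-up example:

```python
import numpy as np
from scipy import ndimage
from scipy.ndimage import generate_binary_structure

# Made-up test matrix: a cell touching the others only at a corner,
# plus a vertical pair
grid = np.array([[1, 0, 0],
                 [0, 1, 0],
                 [0, 1, 0]])

default_labels, default_n = ndimage.label(grid)
explicit_labels, explicit_n = ndimage.label(
    grid, structure=generate_binary_structure(2, 1))

# Both calls use the same cross-shaped (4-connectivity) structure
print(default_n, explicit_n)  # 2 2
print(np.array_equal(default_labels, explicit_labels))  # True
```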

It is possible to use another structure, however. With a connectivity of 2, cells that touch only at a corner are connected as well:

s = generate_binary_structure(2, 2)  # 2D, connectivity 2: edge and corner neighbors

print(s)

Output

[[ True  True  True]
 [ True  True  True]
 [ True  True  True]]


# Label again, now treating diagonal neighbors as connected
current_output, num_ids = ndimage.label(data, structure=s)

plt.imshow(current_output, interpolation='nearest')
plt.title('How to cluster data \n from a 2D binary matrix in python ?')
plt.savefig('clustering_data_03.png',facecolor='white')
plt.show()
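The difference between the two structures is easiest to see on cells that touch only at their corners. In the matrix above, the lone 1 at row 0, column 6 touches the block to its lower left only diagonally, so switching to 8-connectivity merges the two and the number of clusters drops from 8 to 7. The sketch below isolates that effect on a diagonal line of ones (the name diagonal is illustrative):

```python
import numpy as np
from scipy import ndimage
from scipy.ndimage import generate_binary_structure

# A diagonal line of ones: no two cells share an edge, only corners
diagonal = np.eye(4, dtype=int)

four = generate_binary_structure(2, 1)   # edge neighbors only
eight = generate_binary_structure(2, 2)  # edge and corner neighbors

_, n_four = ndimage.label(diagonal, structure=four)
_, n_eight = ndimage.label(diagonal, structure=eight)

print(n_four)   # 4: every cell is its own cluster
print(n_eight)  # 1: the shared corners link all cells into one cluster
```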


Clustering data using ndimage and binary_dilation()

Another way to cluster a binary array is to dilate it first: binary_dilation() enlarges each group of ones, so that groups lying close to each other merge into larger regions before labeling.

# Grow every group of ones by one cell (default cross-shaped structure)
data_dilated = ndimage.binary_dilation(data)

plt.imshow(data_dilated, interpolation='nearest')
plt.title('How to cluster data \n from a 2D binary matrix in python ?')
plt.savefig('clustering_data_04.png',facecolor='white')
plt.show()
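To see why dilation merges nearby groups, here is a small sketch on a single row with three groups of ones, separated by gaps of one and three zeros (the array sparse is a made-up example). The iterations parameter of binary_dilation() repeats the dilation, which closes wider gaps:

```python
import numpy as np
from scipy import ndimage

# One row with three groups of ones separated by gaps of 1 and 3 zeros
sparse = np.array([[1, 0, 1, 0, 0, 0, 1]])

_, n_before = ndimage.label(sparse)

# One dilation step grows every group by one cell (default cross structure)
dilated = ndimage.binary_dilation(sparse)
_, n_after = ndimage.label(dilated)

# A second dilation step also closes the 3-zero gap
dilated_twice = ndimage.binary_dilation(sparse, iterations=2)
_, n_after_twice = ndimage.label(dilated_twice)

print(dilated.astype(int))  # [[1 1 1 1 0 1 1]]
print(n_before, n_after, n_after_twice)  # 3 2 1
```

The number of iterations therefore acts as a rough "merge distance": the larger it is, the farther apart two groups can be and still end up in the same cluster.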


We can then apply ndimage.label() to the dilated array, just as before:

current_output, num_ids = ndimage.label(data_dilated)

plt.imshow(current_output, interpolation='nearest')
plt.title('How to cluster data \n from a 2D binary matrix in python ?')
plt.savefig('clustering_data_05.png',facecolor='white')
plt.show()


Finally, we mask the result so that only the cells that were 1 in the original data keep their cluster label:

current_output[data == 0] = 0  # remove labels from cells that were 0 in the original data

plt.imshow(current_output, interpolation='nearest')
plt.title('How to cluster data \n from a 2D binary matrix in python ?')
plt.savefig('clustering_data_06.png',facecolor='white')
plt.show()
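The three steps (dilate, label, mask) can be wrapped in one function. The helper cluster_binary below is hypothetical and not part of scipy; it is a minimal sketch that assumes the default structures for both dilation and labeling:

```python
import numpy as np
from scipy import ndimage


def cluster_binary(data, iterations=1):
    """Hypothetical helper: label clusters of a binary matrix after dilation.

    Groups that touch after `iterations` dilation steps share a label;
    cells that are 0 in `data` get label 0.
    """
    data = np.asarray(data)
    # Grow each group so that nearby groups touch
    dilated = ndimage.binary_dilation(data, iterations=iterations)
    # Label the connected regions of the dilated array
    labels, num_ids = ndimage.label(dilated)
    # Mask: keep labels only on the original non-zero cells
    labels[data == 0] = 0
    return labels, num_ids


labels, num_ids = cluster_binary([[1, 0, 1, 0, 0, 0, 1]])
print(labels)   # [[1 0 1 0 0 0 2]]
print(num_ids)  # 2
```

Keep iterations at 1 or more here: scipy treats values below 1 as "repeat the dilation until the result stops changing", which would eventually fill the whole array.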


References

ndimage: docs.scipy.org
generate_binary_structure: docs.scipy.org
binary_dilation: docs.scipy.org
binary_erosion: docs.scipy.org
plot_dbscan: scikit-learn.org