When working with a binary matrix in Python, the clusters in a 2D array (groups of connected nonzero pixels) can be identified with scipy.ndimage.
Examples
Create synthetic data
Before we begin, we need to import the required libraries and generate artificial data:
import scipy.ndimage as ndimage
import matplotlib.pyplot as plt
import numpy as np
from scipy.ndimage import label, generate_binary_structure
data = np.array([[0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1],
                 [1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1],
                 [1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
                 [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                 [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                 [1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0],
                 [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
                 [1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1],
                 [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
plt.imshow(data, interpolation='nearest')
plt.title('How to cluster data \n from a 2D binary matrix in python ?')
plt.savefig('clustering_data_01.png',facecolor='white')
plt.show()
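You can also try the same steps on other inputs. NumPy's Generator API can produce a random binary matrix. This is only a sketch: the seed, the 12x12 shape and the 30% fill probability are arbitrary choices and are not part of the original example.

```python
import numpy as np

# Arbitrary seed, shape and fill probability (illustrative assumptions)
rng = np.random.default_rng(seed=0)

# Each cell is 1 with probability 0.3, otherwise 0
random_data = (rng.random((12, 12)) < 0.3).astype(int)

print(random_data)
```

Replacing data with random_data in the calls below should work without other changes.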
Clustering data using ndimage
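Next, we use ndimage.label() to cluster the data. It assigns a distinct integer to each group of connected nonzero pixels. It returns the labeled array along with the number of groups found (num_ids below):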
current_output, num_ids = ndimage.label(data)
plt.imshow(current_output, interpolation='nearest')
plt.title('How to cluster data \n from a 2D binary matrix in python ?')
plt.savefig('clustering_data_02.png',facecolor='white')
plt.show()
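The labeled array can be used for further measurements. As a sketch, the block below measures the size of each cluster with np.bincount and its bounding box with ndimage.find_objects. The loop and print format are only illustrative.

```python
import numpy as np
from scipy import ndimage

# Same 12x12 binary matrix as above
data = np.array([[0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1],
                 [1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1],
                 [1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
                 [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                 [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                 [1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0],
                 [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
                 [1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1],
                 [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

labels, num_ids = ndimage.label(data)

# Pixel count per label; index 0 is the background, so drop it
sizes = np.bincount(labels.ravel())[1:]

# One (row_slice, col_slice) bounding box per label, in order 1..num_ids
boxes = ndimage.find_objects(labels)

for label_id, (size, (rows, cols)) in enumerate(zip(sizes, boxes), start=1):
    print(f"cluster {label_id}: {size} pixels, "
          f"rows {rows.start}-{rows.stop - 1}, cols {cols.start}-{cols.stop - 1}")
```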
By default, ndimage.label() uses the following structuring element. It corresponds to 4-connectivity: two pixels belong to the same cluster only if they share an edge.
s = generate_binary_structure(2,1)
print(s)
Output
[[False True False]
[ True True True]
[False True False]]
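generate_binary_structure(rank, connectivity) builds this element for any number of dimensions. The sketch below uses a small toy matrix that is not from the original example. It checks two things: that the cross above is what generate_binary_structure(2, 1) returns, and that passing it explicitly to label() gives the same result as the default.

```python
import numpy as np
from scipy import ndimage
from scipy.ndimage import generate_binary_structure

# Toy input (illustrative): two groups that touch only at a diagonal
small = np.array([[1, 1, 0],
                  [0, 0, 1],
                  [0, 0, 1]])

# Hand-written 4-connectivity cross
cross = np.array([[0, 1, 0],
                  [1, 1, 1],
                  [0, 1, 0]], dtype=bool)

# The cross matches the default structuring element for a 2D array
same_structure = np.array_equal(cross, generate_binary_structure(2, 1))

labels_default, n_default = ndimage.label(small)
labels_cross, n_cross = ndimage.label(small, structure=cross)

print(same_structure, np.array_equal(labels_default, labels_cross), n_default)  # True True 2
```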
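It is also possible to define another structure. For example, generate_binary_structure(2, 2) produces 8-connectivity, where pixels that touch diagonally are also grouped together: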
s = generate_binary_structure(2,2)
print(s)
Output
[[ True True True]
[ True True True]
[ True True True]]
current_output, num_ids = ndimage.label(data, structure=s)
plt.imshow(current_output, interpolation='nearest')
plt.title('How to cluster data \n from a 2D binary matrix in python ?')
plt.savefig('clustering_data_03.png',facecolor='white')
plt.show()
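To see what the structure changes, the sketch below labels the same matrix with connectivity 1 and 2 and counts the clusters. With this data the counts should differ by one. The single pixel at row 0, column 6 touches the top-left cluster only diagonally (at row 1, column 5), so it merges with that cluster only when diagonal neighbors are allowed.

```python
import numpy as np
from scipy import ndimage
from scipy.ndimage import generate_binary_structure

# Same 12x12 binary matrix as above
data = np.array([[0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1],
                 [1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1],
                 [1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
                 [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                 [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                 [1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0],
                 [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
                 [1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1],
                 [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

counts = {}
for connectivity in (1, 2):
    # connectivity=1: edge neighbors only; connectivity=2: edge and diagonal neighbors
    structure = generate_binary_structure(2, connectivity)
    _, counts[connectivity] = ndimage.label(data, structure=structure)

print(counts)  # {1: 8, 2: 7}
```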
Clustering data using ndimage and binary_dilation()
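Another approach is to dilate the binary array before labeling. With the default structure, dilation grows every region of 1s outward by one pixel along rows and columns. As a result, regions separated by small gaps merge into larger groups before the clustering step: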
data_dilated = ndimage.binary_dilation(data)
plt.imshow(data_dilated, interpolation='nearest')
plt.title('How to cluster data \n from a 2D binary matrix in python ?')
plt.savefig('clustering_data_04.png',facecolor='white')
plt.show()
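Dilation only ever adds pixels. The iterations parameter of binary_dilation repeats the operation, so regions grow further with each pass. The toy row below is an illustration, not part of the original data. It shows wider gaps being bridged as the number of iterations increases.

```python
import numpy as np
from scipy import ndimage

# Toy input (illustrative): gaps of one and three background pixels
row = np.array([[1, 0, 1, 0, 0, 0, 1]])

_, n_raw = ndimage.label(row)
_, n_dil1 = ndimage.label(ndimage.binary_dilation(row))                # one dilation pass
_, n_dil2 = ndimage.label(ndimage.binary_dilation(row, iterations=2))  # two passes

print(n_raw, n_dil1, n_dil2)  # 3 2 1
```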
We can then apply ndimage.label() to the dilated array, just as before:
current_output, num_ids = ndimage.label(data_dilated)
plt.imshow(current_output, interpolation='nearest')
plt.title('How to cluster data \n from a 2D binary matrix in python ?')
plt.savefig('clustering_data_05.png',facecolor='white')
plt.show()
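Finally, we mask the result with the original data. Pixels added by dilation are set back to 0, and only the original nonzero pixels keep their cluster labels: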
current_output[ data == 0 ] = 0
plt.imshow(current_output, interpolation='nearest')
plt.title('How to cluster data \n from a 2D binary matrix in python ?')
plt.savefig('clustering_data_06.png',facecolor='white')
plt.show()
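The dilate, label and mask steps can be wrapped in a single function. The sketch below does that. label_with_dilation is a hypothetical helper name. Using the same structure for both dilation and labeling is a design choice made here, not something the original text prescribes. Increasing iterations lets clusters separated by wider gaps share a label.

```python
import numpy as np
from scipy import ndimage


def label_with_dilation(binary, iterations=1, structure=None):
    """Hypothetical helper: label clusters of a dilated copy of `binary`,
    then keep labels only on the originally nonzero pixels."""
    binary = np.asarray(binary, dtype=bool)
    grown = ndimage.binary_dilation(binary, structure=structure, iterations=iterations)
    labels, _ = ndimage.label(grown, structure=structure)
    labels[~binary] = 0  # pixels added by dilation go back to background
    # Number of distinct clusters that still own at least one original pixel
    n_clusters = len(np.unique(labels[binary]))
    return labels, n_clusters


# Same 12x12 binary matrix as above
data = np.array([[0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1],
                 [1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1],
                 [1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
                 [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                 [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                 [1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0],
                 [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
                 [1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1],
                 [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

labels, n = label_with_dilation(data)
print(n)  # 4
```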
References
| Link | Site |
|---|---|
| ndimage | docs.scipy.org |
| generate_binary_structure | docs.scipy.org |
| binary_dilation | docs.scipy.org |
| binary_erosion | docs.scipy.org |
| plot_dbscan | scikit-learn.org |