# How to cluster data from a 2D binary matrix in Python?

Published: May 01, 2023

To cluster the cells of a 2D binary matrix in Python, you can use `scipy.ndimage`, which groups connected nonzero cells into labeled regions. The examples below show how.

## Create synthetic data

Before we begin, we need to import the required libraries and generate artificial data:

```
import scipy.ndimage as ndimage
import matplotlib.pyplot as plt
import numpy as np

from scipy.ndimage import label, generate_binary_structure

data = np.array( [[0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1],
                  [1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1],
                  [1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1],
                  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
                  [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
                  [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                  [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                  [1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0],
                  [1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1],
                  [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
                  [1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1],
                  [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

plt.imshow(data, interpolation='nearest')
plt.title('How to cluster data \n from a 2D binary matrix in python ?')
plt.savefig('clustering_data_01.png',facecolor='white')
plt.show()
```
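If you would rather not type a matrix by hand, a random binary matrix can serve as test data. The following is a minimal sketch; the name `random_data`, the seed, and the 0.7 threshold are arbitrary choices made for illustration and are not part of the original example.

```python
import numpy as np

# Seeded generator so the matrix is the same on every run (seed is arbitrary)
rng = np.random.default_rng(seed=0)

# rng.random() draws floats in [0, 1); keeping values above 0.7
# yields a matrix in which roughly 30% of the cells are 1
random_data = (rng.random((12, 12)) > 0.7).astype(int)

print(random_data)
```

Raising the threshold gives a sparser matrix with smaller, more isolated groups of ones.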

## Clustering data using ndimage

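Next, we use `ndimage.label()` from SciPy to cluster the data. It returns an array in which each group of connected nonzero cells has its own integer label (0 is the background), together with the number of groups found.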

```
current_output, num_ids = ndimage.label(data)

plt.imshow(current_output, interpolation='nearest')
plt.title('How to cluster data \n from a 2D binary matrix in python ?')
plt.savefig('clustering_data_02.png',facecolor='white')
plt.show()
```
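Beyond plotting, the two values returned by `ndimage.label()` can be inspected directly. The sketch below uses a small hypothetical matrix (`small`, not the article's data) to count the clusters, measure their sizes with `np.bincount`, and get their bounding boxes with `ndimage.find_objects`.

```python
import numpy as np
from scipy import ndimage

# Small hypothetical matrix used only for illustration
small = np.array([[1, 1, 0, 0],
                  [0, 0, 0, 1],
                  [0, 1, 0, 1],
                  [1, 0, 0, 0]])

# labels: same shape as the input, 0 for background, 1..n_clusters for clusters
labels, n_clusters = ndimage.label(small)
print("number of clusters:", n_clusters)

# bincount counts how often each label occurs; index 0 is the background, so drop it
sizes = np.bincount(labels.ravel())[1:]
print("cluster sizes:", sizes)

# find_objects returns one tuple of slices (a bounding box) per cluster
for cluster_id, region in enumerate(ndimage.find_objects(labels), start=1):
    print(cluster_id, region)
```

With the default connectivity, this matrix contains four clusters: two of size 2 and two of size 1, because the cells at positions (2, 1) and (3, 0) only touch diagonally.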

By default, `ndimage.label()` uses the following structure to define pixel connectivity. Only horizontal and vertical neighbors are considered connected:

```
s = generate_binary_structure(2,1)

print(s)
```

Output:

```
[[False  True False]
 [ True  True  True]
 [False  True False]]
```

It is also possible to define another structure, for example one that treats diagonal neighbors as connected as well:

```
s = generate_binary_structure(2,2)

print(s)
```

Output:

```
[[ True  True  True]
 [ True  True  True]
 [ True  True  True]]
```

Pass this structure to `ndimage.label()` through the `structure` argument:

```
current_output, num_ids = ndimage.label(data, structure=s)

plt.imshow(current_output, interpolation='nearest')
plt.title('How to cluster data \n from a 2D binary matrix in python ?')
plt.savefig('clustering_data_03.png',facecolor='white')
plt.show()
```
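To see what the choice of structure changes, it can help to label the same array with both structures and compare the cluster counts. This is a minimal sketch on a hypothetical 3x3 matrix (`small`) whose nonzero cells only touch diagonally.

```python
import numpy as np
from scipy import ndimage

# Hypothetical matrix: three cells on the diagonal, touching only at corners
small = np.array([[1, 0, 0],
                  [0, 1, 0],
                  [0, 0, 1]])

# Connectivity 1: horizontal and vertical neighbors only
four = ndimage.generate_binary_structure(2, 1)
# Connectivity 2: diagonal neighbors are connected too
eight = ndimage.generate_binary_structure(2, 2)

_, n_four = ndimage.label(small, structure=four)
_, n_eight = ndimage.label(small, structure=eight)

print("clusters without diagonals:", n_four)
print("clusters with diagonals:", n_eight)
```

Without diagonal connectivity each cell is its own cluster (3 clusters); with it, the three cells form a single cluster.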

## Clustering data using ndimage and binary_dilation()

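Cells that are close but not touching end up in separate clusters. To merge them, you can first dilate the data with `ndimage.binary_dilation()`, which grows each group of nonzero cells outward, and then cluster the dilated array.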

```
data_dilated = ndimage.binary_dilation(data)

plt.imshow(data_dilated, interpolation='nearest')
plt.title('How to cluster data \n from a 2D binary matrix in python ?')
plt.savefig('clustering_data_04.png',facecolor='white')
plt.show()
```
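How far apart two groups can be and still merge depends on how much the data is dilated. `binary_dilation()` accepts an `iterations` argument that repeats the dilation; the sketch below, on a hypothetical one-row matrix (`row`), shows how the number of clusters drops as `iterations` increases.

```python
import numpy as np
from scipy import ndimage

# Hypothetical one-row matrix: gaps of 2 and 4 zeros between the nonzero cells
row = np.array([[1, 0, 0, 1, 0, 0, 0, 0, 1]])

# Without dilation every nonzero cell is its own cluster
_, n_original = ndimage.label(row)
print("no dilation:", n_original)

counts = {}
for n_iter in (1, 2):
    # Each iteration grows every group by one cell in each direction
    dilated = ndimage.binary_dilation(row, iterations=n_iter)
    _, n_found = ndimage.label(dilated)
    counts[n_iter] = n_found
    print("iterations =", n_iter, "->", n_found, "clusters")
```

One iteration closes the gap of 2 zeros and a second iteration closes the gap of 4, so the count goes from 3 to 2 to 1. Larger values of `iterations` therefore merge groups that are farther apart.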

We can then apply `ndimage.label()` to the dilated data, just as before:

```
current_output, num_ids = ndimage.label(data_dilated)

plt.imshow(current_output, interpolation='nearest')
plt.title('How to cluster data \n from a 2D binary matrix in python ?')
plt.savefig('clustering_data_05.png',facecolor='white')
plt.show()
```

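Finally, reset to 0 the cells that were empty in the original data, so that only the original nonzero cells keep their cluster label:

```
current_output[ data == 0 ] = 0

plt.imshow(current_output, interpolation='nearest')
plt.title('How to cluster data \n from a 2D binary matrix in python ?')
plt.savefig('clustering_data_06.png',facecolor='white')
plt.show()
```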
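The three steps of this section (dilate, label, mask) can be wrapped in a single function. The following is a sketch only; the function name `cluster_with_dilation` and its `iterations` parameter are hypothetical names introduced here, and the one-row test matrix is not the article's data.

```python
import numpy as np
from scipy import ndimage


def cluster_with_dilation(binary, iterations=1):
    """Label nonzero cells, merging groups that touch after dilation.

    Hypothetical helper: dilate, label the dilated array, then keep
    labels only on cells that were nonzero in the input.
    """
    binary = np.asarray(binary).astype(bool)
    dilated = ndimage.binary_dilation(binary, iterations=iterations)
    labels, n_clusters = ndimage.label(dilated)
    # Remove the cells that were only added by the dilation
    labels[~binary] = 0
    return labels, n_clusters


# Hypothetical test matrix: the first two cells are close, the third is far
row = np.array([[1, 0, 0, 1, 0, 0, 0, 0, 1]])
labels, n_clusters = cluster_with_dilation(row)
print(labels)
print(n_clusters)
```

Here the first two nonzero cells share a label even though they are not adjacent, while the third keeps its own label. Because every dilated group grows from at least one original cell, the cluster count is still valid after masking.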
