How to use a Gaussian mixture model (GMM) with sklearn in python ?

Published: May 09, 2020

Tags: Python; sklearn; Machine Learning;

DMCA.com Protection Status

Examples of how to use a Gaussian mixture model (GMM) with sklearn in python:

from sklearn import mixture

import numpy as np
import matplotlib.pyplot as plt

1 -- Example with one Gaussian

Let's generate random numbers from a normal distribution with a mean $\mu_0 = 5$ and standard deviation $\sigma_0 = 2$

mu_0 = 5.0
srd_0 = 2.0

data = np.random.randn(100000)
data = data * srd_0 + mu_0

data = data.reshape(-1, 1)

Plot the data using matplotlib:

hx, hy, _ = plt.hist(data, bins=50, density=1,color="lightblue")

plt.ylim(0.0,max(hx)+0.05)
plt.title('Gaussian mixture example 01')
plt.grid()

plt.xlim(mu_0-4*srd_0,mu_0+4*srd_0)

plt.savefig("example_gmm_01.png", bbox_inches='tight')
plt.show()

How to use a Gaussian mixture model (GMM) with sklearn in python ?
How to use a Gaussian mixture model (GMM) with sklearn in python ?

gmm = mixture.GaussianMixture(n_components=1, covariance_type='full').fit(data)

print(gmm.means_)
print(np.sqrt(gmm.covariances_))

[[5.00715457]]
[[[1.99746652]]]

Comparisons with numpy:

print(np.mean(data))
print(np.std(data))

4.998997166872173
2.0008903305868855

2 -- Example of a mixture of two gaussians

mu_1 = 2.0
srd_1 = 4.0

mu_2 = 10.0
srd_2 = 1.0

new_data = np.random.randn(50000)
data_1 = new_data * srd_1 + mu_1

new_data = np.random.randn(50000)
data_2 = new_data * srd_2 + mu_2

new_data = np.concatenate((data_1, data_2), axis=0)

new_data = new_data.reshape(-1, 1)

hx, hy, _ = plt.hist(new_data, bins=100, density=1,color="lightblue")

plt.title('Gaussian mixture example 02')
plt.grid()

plt.savefig("example_gmm_02.png", bbox_inches='tight')
plt.show()

How to use a Gaussian mixture model (GMM) with sklearn in python ?
How to use a Gaussian mixture model (GMM) with sklearn in python ?

gmm = mixture.GaussianMixture(n_components=2, max_iter=1000, covariance_type='full').fit(new_data)

print('means')
print(gmm.means_)
#print(gmm.covariances_)
print('std')
print(np.sqrt(gmm.covariances_))

means
[[1.82377272]
 [9.9837662 ]]
std
[[[3.89836502]]

 [[1.02825841]]]

3 -- References

Image

of