Kernel Density Estimation (KDE) is a non-parametric method of estimating probability density functions from sample data. The basic idea is to construct a smooth function that can be used to approximate the underlying probability distribution, based on a given set of observations in two dimensions.
Create synthetic data
Using Python, it is fairly straightforward to calculate and plot a 2D KDE. The first step is to import the necessary modules, including numpy, scipy and matplotlib. Next, get your data ready for the calculation - it should be in the form of an array or list of two-dimensional points.
import numpy as npmu = 5.0sigma = 4.0x = np.random.randn(10000) * sigma + mumu = 5.0sigma = 10.0y = np.random.randn(10000) * sigma + mu
Calculate a 2D kernel density estimation
Once you have your data ready, you can use the scipy.stats.gaussian_kde() function to calculate the KDE. This function returns an object that contains all the information you need to plot the result.
from scipy import stats, mgrid, c_data = np.stack((x,y), axis=1)data.shapexmin = data.min()xmax = data.max()ymin = data.min()ymax = data.max()kernel = stats.gaussian_kde(data.T)
Plot result using matplotlib
To plot, you can use either matplotlib or seaborn, depending on how complex of a graph you want. With matplotlib, it is fairly simple to plot a basic 2D KDE, using the contourf() or imshow() functions. Seaborn offers more options for customization and allows you to create more complex plots with the kdeplot() function:
Using imshow()
import matplotlib.pyplot as pltimport matplotlib.cm as cmfig = plt.figure(figsize=(10,4))ax = fig.add_subplot(111)X, Y = mgrid[xmin:xmax:100j, ymin:ymax:100j]positions = c_[X.ravel(), Y.ravel()]Z = np.reshape(kernel(positions.T).T, X.T.shape)cax = ax.imshow(Z.T, interpolation='nearest',cmap=cm.jet, origin='lower', extent=[xmin, xmax, ymin, ymax])plt.xlabel('x',fontsize=14)plt.ylabel('y',fontsize=14)plt.xlim(xmin,xmax)plt.ylim(ymin,ymax)ticks_at = [0, Z.max()/6.0, 2.0*Z.max()/6.0, 3.0*Z.max()/6.0, 4.0*Z.max()/6.0, 5.0*Z.max()/6.0, Z.max()]cbar = fig.colorbar(cax,ticks=ticks_at,format='%1.2g')cbar.set_label('Kernel density function')plt.savefig("kernel2D.png", dpi=100, bbox_inches='tight')plt.show()

Using contourf()
fig = plt.figure(figsize=(10,4))ax = fig.gca()ax.set_xlim(xmin, xmax)ax.set_ylim(ymin, ymax)cfset = ax.contourf(X, Y, Z, cmap='Blues')cset = ax.contour(X, Y, Z, colors='k')ax.clabel(cset, inline=1, fontsize=10)ax.set_xlabel('X')ax.set_ylabel('Y')plt.savefig("kernel2D_contourf.png", dpi=100, bbox_inches='tight')plt.show()

References
| Links | Site |
|---|---|
| scipy.stats.gaussian_kde | docs.scipy.org |
| How to generate random numbers from a normal (Gaussian) distribution in python ? | www.moonbooks.org |
| numpy.random.randn | numpy.org |
| numpy.stack | numpy.org |
