How to calculate and plot a 2D kernel density estimation with python ?

Published: March 03, 2023

Updated: March 03, 2023

Tags: Python; Matplotlib;

DMCA.com Protection Status

Kernel Density Estimation (KDE) is a non-parametric method of estimating probability density functions from sample data. The basic idea is to construct a smooth function that can be used to approximate the underlying probability distribution, based on a given set of observations in two dimensions.

Create synthetic data

Using Python, it is fairly straightforward to calculate and plot a 2D KDE. The first step is to import the necessary modules, including numpy, scipy and matplotlib. Next, get your data ready for the calculation - it should be in the form of an array or list of two-dimensional points.

import numpy as np

mu = 5.0
sigma = 4.0

x = np.random.randn(10000) * sigma + mu

mu = 5.0
sigma = 10.0

y = np.random.randn(10000) * sigma + mu

Calculate a 2D kernel density estimation

Once you have your data ready, you can use the scipy.stats.gaussian_kde() function to calculate the KDE. This function returns an object that contains all the information you need to plot the result.

from scipy import stats, mgrid, c_

data = np.stack((x,y), axis=1)

data.shape

xmin = data.min()
xmax = data.max()
ymin = data.min()
ymax = data.max()

kernel = stats.gaussian_kde(data.T)

Plot result using matplotlib

To plot, you can use either matplotlib or seaborn, depending on how complex of a graph you want. With matplotlib, it is fairly simple to plot a basic 2D KDE, using the contourf() or imshow() functions. Seaborn offers more options for customization and allows you to create more complex plots with the kdeplot() function:

Using imshow()

import matplotlib.pyplot as plt
import matplotlib.cm as cm

fig = plt.figure(figsize=(10,4))

ax = fig.add_subplot(111)

X, Y = mgrid[xmin:xmax:100j, ymin:ymax:100j]

positions = c_[X.ravel(), Y.ravel()]

Z = np.reshape(kernel(positions.T).T, X.T.shape)

cax = ax.imshow(Z.T, interpolation='nearest',
            cmap=cm.jet, origin='lower', extent=[xmin, xmax, ymin, ymax])

plt.xlabel('x',fontsize=14)
plt.ylabel('y',fontsize=14)

plt.xlim(xmin,xmax)
plt.ylim(ymin,ymax)

ticks_at = [0, Z.max()/6.0, 2.0*Z.max()/6.0, 3.0*Z.max()/6.0, 4.0*Z.max()/6.0, 5.0*Z.max()/6.0, Z.max()]
cbar = fig.colorbar(cax,ticks=ticks_at,format='%1.2g')
cbar.set_label('Kernel density function')

plt.savefig("kernel2D.png", dpi=100, bbox_inches='tight')
plt.show()

How to calculate and plot a 2D kernel density estimation with python ?
How to calculate and plot a 2D kernel density estimation with python ?

Using contourf()

fig = plt.figure(figsize=(10,4))

ax = fig.gca()

ax.set_xlim(xmin, xmax)
ax.set_ylim(ymin, ymax)

cfset = ax.contourf(X, Y, Z, cmap='Blues')

cset = ax.contour(X, Y, Z, colors='k')

ax.clabel(cset, inline=1, fontsize=10)
ax.set_xlabel('X')
ax.set_ylabel('Y')

plt.savefig("kernel2D_contourf.png", dpi=100, bbox_inches='tight')

plt.show()

How to calculate and plot a 2D kernel density estimation with python ?
How to calculate and plot a 2D kernel density estimation with python ?

References

Image

of