Machine learning: target data scaling for a non linear regression using Gaussian processes with python and scikit-learn

Published: July 14, 2020

Tags: Python; Scikit-learn; Machine Learning; Gaussian processes;


An example of the impact of target data scaling on a nonlinear regression using Gaussian processes with Python and scikit-learn.

Nonlinear regression with Gaussian processes

Let's first import the required Python modules:

from sklearn import preprocessing
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C
from pylab import figure

import matplotlib.pyplot as plt
import numpy as np

Training data:

x = np.arange(0.0,12.0,1.0)
x = x[:, np.newaxis]

y = np.array([0.9995577,  0.999717,   0.9997348,  0.99975765, 0.99978703, 0.99980724, 0.9998182,  0.99982077, 0.99981844, 0.99981105, 0.99980015, 0.9997869 ])
y = y[:, np.newaxis]

Create a Gaussian process model:

kernel = C(1.0, (0.1, 10.0)) * RBF([0.1], (1e-2, 1e2))

gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=20)

mymodel = gp.fit(x,y)
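
One way to check how the fit went is to look at the optimized kernel and the log marginal likelihood after fitting. The sketch below uses the same data and kernel as above; the names x_train and y_train, the lower number of optimizer restarts, and the fixed random_state are choices made here for illustration only.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C

# Same training data as in the example (names are illustrative)
x_train = np.arange(0.0, 12.0, 1.0)[:, np.newaxis]
y_train = np.array([0.9995577, 0.999717, 0.9997348, 0.99975765,
                    0.99978703, 0.99980724, 0.9998182, 0.99982077,
                    0.99981844, 0.99981105, 0.99980015, 0.9997869])

kernel = C(1.0, (0.1, 10.0)) * RBF([0.1], (1e-2, 1e2))

# random_state is fixed only to make the optimizer restarts reproducible
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=5,
                              random_state=0)
gp.fit(x_train, y_train)

# kernel_ holds the kernel with optimized hyperparameters
fitted_kernel = gp.kernel_
# log marginal likelihood of the training data under the fitted kernel
lml = gp.log_marginal_likelihood_value_

print(fitted_kernel)
print(lml)
```

If a fitted hyperparameter ends up on one of its bounds (scikit-learn typically emits a warning in that case), the kernel is probably a poor match for the scale of the data.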

Make predictions and plot:

fig = figure(num=None, figsize=(12, 10), dpi=80, facecolor='w', edgecolor='k')

plt.scatter(x,y)

x = np.arange(0,11.1,0.1)
x = x[:, np.newaxis]

y = mymodel.predict(x)

plt.plot(x,y,color='red')

plt.ylim(np.min(y)-0.00001,np.max(y)+0.00001)

plt.grid()

plt.show()

This produces the following figure:

[Figure: Gaussian process regression without target data scaling]

With normalization of the target data

In the example above, the target values are all close to 1 and differ from each other by less than 3e-4. One solution is to standardize the target data before fitting, which improves the model:

scaler = preprocessing.StandardScaler().fit(y)
y = scaler.transform(y)

and to transform back to the original y scale:

y = scaler.inverse_transform(y)
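
The round trip can be checked on its own, without any Gaussian process involved. This is a minimal sketch using the target values from this example; the names y_scaled and y_back are introduced here only for illustration.

```python
import numpy as np
from sklearn import preprocessing

y = np.array([0.9995577, 0.999717, 0.9997348, 0.99975765,
              0.99978703, 0.99980724, 0.9998182, 0.99982077,
              0.99981844, 0.99981105, 0.99980015, 0.9997869])
y = y[:, np.newaxis]  # StandardScaler expects a 2-D array

scaler = preprocessing.StandardScaler().fit(y)

# After scaling, the targets have zero mean and unit standard deviation
y_scaled = scaler.transform(y)
print(y_scaled.mean(), y_scaled.std())

# inverse_transform recovers the original values
y_back = scaler.inverse_transform(y_scaled)
print(np.allclose(y_back, y, rtol=0, atol=1e-12))
```

The scaler object has to be kept, because the same fitted instance is needed later to map predictions back to the original scale.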

Full code:

x = np.arange(0.0,12.0,1.0)
x = x[:, np.newaxis]

y = np.array([0.9995577,  0.999717,   0.9997348,  0.99975765, 0.99978703, 0.99980724, 0.9998182,  0.99982077, 0.99981844, 0.99981105, 0.99980015, 0.9997869 ])
y = y[:, np.newaxis]

scaler = preprocessing.StandardScaler().fit(y)
y = scaler.transform(y)

kernel = C(1.0, (0.1, 10.0)) * RBF([0.1], (1e-2, 1e2))

gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=20)

mymodel = gp.fit(x,y)

fig = figure(num=None, figsize=(12, 10), dpi=80, facecolor='w', edgecolor='k')

y = scaler.inverse_transform(y)

plt.scatter(x,y)

x = np.arange(0,11.1,0.1)
x = x[:, np.newaxis]

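y = mymodel.predict(x)

# predict may return a 1-D array in recent scikit-learn versions,
# while the scaler expects a 2-D array, so reshape before inverting
y = scaler.inverse_transform(np.reshape(y, (-1, 1)))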

plt.plot(x,y,color='red')

plt.ylim(np.min(y)-0.00001,np.max(y)+0.00001)

plt.grid()

plt.title(r'Gaussian Processes (with target data scaling)')
plt.xlabel('x')
plt.ylabel('y')

plt.savefig("gp_with_scaling_target.png", bbox_inches='tight')

plt.show()

This produces the following figure:

[Figure: Gaussian process regression with target data scaling]
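
As a possible alternative to scaling the targets by hand, GaussianProcessRegressor has a normalize_y option. The sketch below is not the approach used in this article, only a variant under the same data and kernel; the names x_train, y_train, x_new and y_pred and the fixed random_state are illustrative. Note that the exact behaviour of normalize_y has changed between scikit-learn releases (older versions only removed the mean), so the result may differ from the manual StandardScaler approach depending on the installed version.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C

x_train = np.arange(0.0, 12.0, 1.0)[:, np.newaxis]
y_train = np.array([0.9995577, 0.999717, 0.9997348, 0.99975765,
                    0.99978703, 0.99980724, 0.9998182, 0.99982077,
                    0.99981844, 0.99981105, 0.99980015, 0.9997869])

kernel = C(1.0, (0.1, 10.0)) * RBF([0.1], (1e-2, 1e2))

# normalize_y=True lets the regressor normalize the targets internally
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=5,
                              normalize_y=True, random_state=0)
gp.fit(x_train, y_train)

x_new = np.arange(0, 11.1, 0.1)[:, np.newaxis]

# Predictions are returned in the original units of y_train,
# so no inverse_transform step is needed
y_pred = gp.predict(x_new)
```

With this option the scaler object and the explicit inverse_transform call are no longer needed, at the cost of less control over how the targets are scaled.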
