Example of the impact of target data scaling on nonlinear regression using Gaussian processes with Python and scikit-learn
Nonlinear regression with Gaussian processes
Let's first import the required Python modules:
from sklearn import preprocessing
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C
from pylab import figure
import matplotlib.pyplot as plt
import numpy as np
Training data
x = np.arange(0.0,12.0,1.0)
x = x[:, np.newaxis]
y = np.array([0.9995577, 0.999717, 0.9997348, 0.99975765, 0.99978703, 0.99980724, 0.9998182, 0.99982077, 0.99981844, 0.99981105, 0.99980015, 0.9997869 ])
y = y[:, np.newaxis]
Create a Gaussian process model:
kernel = C(1.0, (0.1, 10.0)) * RBF([0.1], (1e-2, 1e2))
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=20)
mymodel = gp.fit(x,y)
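To see what the optimizer actually learned, it can help to inspect the fitted model. The following is a minimal sketch, assuming the same data and kernel as above; `random_state=0` is an addition here to make the restarts reproducible, and the variable names `fitted_kernel` and `lml` are illustrative only. With unscaled targets, scikit-learn may emit convergence warnings when a hyperparameter ends up at one of its bounds.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C

# Same training data as in the example above
x = np.arange(0.0, 12.0, 1.0)[:, np.newaxis]
y = np.array([0.9995577, 0.999717, 0.9997348, 0.99975765, 0.99978703,
              0.99980724, 0.9998182, 0.99982077, 0.99981844, 0.99981105,
              0.99980015, 0.9997869])[:, np.newaxis]

kernel = C(1.0, (0.1, 10.0)) * RBF([0.1], (1e-2, 1e2))
# random_state is an assumption added for reproducibility
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=20,
                              random_state=0)
gp.fit(x, y)

# kernel_ holds the kernel with optimized hyperparameters;
# gp.kernel still holds the initial, unfitted kernel.
fitted_kernel = gp.kernel_
lml = gp.log_marginal_likelihood_value_

print("initial kernel:", gp.kernel)
print("fitted kernel: ", fitted_kernel)
print("log marginal likelihood:", lml)
```

Comparing the fitted kernel against the bounds passed to `C` and `RBF` is a quick way to check whether the optimizer was constrained by them.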
Make predictions and plot:
fig = figure(num=None, figsize=(12, 10), dpi=80, facecolor='w', edgecolor='k')
plt.scatter(x,y)
x = np.arange(0,11.1,0.1)
x = x[:, np.newaxis]
y = mymodel.predict(x)
plt.plot(x,y,color='red')
plt.ylim(np.min(y)-0.00001,np.max(y)+0.00001)
plt.grid()
plt.show()
returns:
With normalization of the target data
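In the above example, the target values are all close to 1 and differ from each other only in the fourth decimal place, which makes them hard for the model to fit. A solution is to standardize the target data (zero mean, unit variance) to improve the model, using: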
scaler = preprocessing.StandardScaler().fit(y)
y = scaler.transform(y)
and to transform y back to its original scale:
y = scaler.inverse_transform(y)
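For reference, the arithmetic behind these two calls can be sketched with NumPy alone. This is an illustration of the standardization formula, not scikit-learn's implementation; the names `mean`, `std`, `y_scaled` and `y_restored` are introduced here for the example. `StandardScaler` uses the population standard deviation (`ddof=0`), which is also NumPy's default.

```python
import numpy as np

y = np.array([0.9995577, 0.999717, 0.9997348, 0.99975765, 0.99978703,
              0.99980724, 0.9998182, 0.99982077, 0.99981844, 0.99981105,
              0.99980015, 0.9997869])[:, np.newaxis]

mean = y.mean(axis=0)
std = y.std(axis=0)  # population standard deviation (ddof=0)

# Forward transform: zero mean, unit variance
y_scaled = (y - mean) / std

# Inverse transform: back to the original scale
y_restored = y_scaled * std + mean

print("range before scaling:", y.max() - y.min())
print("range after scaling: ", y_scaled.max() - y_scaled.min())
```

The spread of the targets goes from a few 1e-4 to order 1, which is the scale the kernel bounds in this example are set up for.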
Full code:
x = np.arange(0.0,12.0,1.0)
x = x[:, np.newaxis]
y = np.array([0.9995577, 0.999717, 0.9997348, 0.99975765, 0.99978703, 0.99980724, 0.9998182, 0.99982077, 0.99981844, 0.99981105, 0.99980015, 0.9997869 ])
y = y[:, np.newaxis]
scaler = preprocessing.StandardScaler().fit(y)
y = scaler.transform(y)
kernel = C(1.0, (0.1, 10.0)) * RBF([0.1], (1e-2, 1e2))
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=20)
mymodel = gp.fit(x,y)
fig = figure(num=None, figsize=(12, 10), dpi=80, facecolor='w', edgecolor='k')
y = scaler.inverse_transform(y)
plt.scatter(x,y)
x = np.arange(0,11.1,0.1)
x = x[:, np.newaxis]
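y = mymodel.predict(x)
# predict may return a 1-D array depending on the scikit-learn version;
# reshape to 2-D because the scaler expects a column
y = scaler.inverse_transform(y.reshape(-1, 1))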
plt.plot(x,y,color='red')
plt.ylim(np.min(y)-0.00001,np.max(y)+0.00001)
plt.grid()
plt.title(r'Gaussian Processes (with target data scaling)')
plt.xlabel('x')
plt.ylabel('y')
plt.savefig("gp_with_scaling_target.png", bbox_inches='tight')
plt.show()
returns: (figure saved as gp_with_scaling_target.png)
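As an alternative to scaling the targets by hand, `GaussianProcessRegressor` accepts a `normalize_y` parameter. The sketch below assumes a recent scikit-learn release, where `normalize_y=True` standardizes the targets internally and returns predictions in the original scale; it is not the approach used in the example above. For simplicity it uses a 1-D `y`, adds `random_state=0` for reproducibility, and also requests the predictive standard deviation with `return_std=True`. The names `x_new`, `y_mean` and `y_std` are illustrative.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C

x = np.arange(0.0, 12.0, 1.0)[:, np.newaxis]
# 1-D targets here (an assumption made to keep output shapes simple)
y = np.array([0.9995577, 0.999717, 0.9997348, 0.99975765, 0.99978703,
              0.99980724, 0.9998182, 0.99982077, 0.99981844, 0.99981105,
              0.99980015, 0.9997869])

kernel = C(1.0, (0.1, 10.0)) * RBF([0.1], (1e-2, 1e2))
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=20,
                              normalize_y=True, random_state=0)
gp.fit(x, y)

x_new = np.arange(0, 11.1, 0.1)[:, np.newaxis]
# Predictions come back in the original scale of y,
# so no inverse_transform step is needed here.
y_mean, y_std = gp.predict(x_new, return_std=True)

print(y_mean[:5])
print(y_std[:5])
```

Plotting `y_mean` together with a band of `y_mean ± 2 * y_std` (for instance with `plt.fill_between`) is one way to visualize the model's uncertainty between training points.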