Example of the impact of target data scaling on a nonlinear regression using Gaussian processes with Python and scikit-learn
Nonlinear regression with Gaussian processes
Let's first import the required Python modules:
from sklearn import preprocessing
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.gaussian_process.kernels import DotProduct, ConstantKernel as C
from pylab import figure
import matplotlib.pyplot as plt
import numpy as np
Learning Data
x = np.arange(0.0, 12.0, 1.0)
x = x[:, np.newaxis]
y = np.array([0.9995577, 0.999717, 0.9997348, 0.99975765, 0.99978703, 0.99980724,
              0.9998182, 0.99982077, 0.99981844, 0.99981105, 0.99980015, 0.9997869])
y = y[:, np.newaxis]
Create a Gaussian process model:
kernel = C(1.0, (0.1, 10.0)) * RBF([0.1], (1e-2, 1e2))
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=20)
mymodel = gp.fit(x, y)
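After fitting, it can help to look at the hyperparameters the optimizer settled on. The following is a minimal sketch, assuming the same data and kernel as above. The `random_state=0` argument and the 1-D shape of `y` are choices made here for reproducibility and simplicity, not part of the original code. `kernel_` and `log_marginal_likelihood_value_` are attributes that scikit-learn sets on a fitted `GaussianProcessRegressor`.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C

# Same training data as above (y kept 1-D here for simplicity)
x = np.arange(0.0, 12.0, 1.0)[:, np.newaxis]
y = np.array([0.9995577, 0.999717, 0.9997348, 0.99975765, 0.99978703, 0.99980724,
              0.9998182, 0.99982077, 0.99981844, 0.99981105, 0.99980015, 0.9997869])

kernel = C(1.0, (0.1, 10.0)) * RBF([0.1], (1e-2, 1e2))
# random_state is an assumption added here so that the restarts are reproducible
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=20, random_state=0)
gp.fit(x, y)

print("initial kernel:", gp.kernel)    # kernel as specified before fitting
print("fitted kernel: ", gp.kernel_)   # kernel with optimized hyperparameters
print("log marginal likelihood:", gp.log_marginal_likelihood_value_)
print("standard deviation of the targets:", y.std())
```

Comparing the printed standard deviation of the targets with the bounds given to the constant kernel, `(0.1, 10.0)`, shows how far apart the scale of the data and the scale allowed by the kernel are.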
Make predictions and plot:
fig = figure(num=None, figsize=(12, 10), dpi=80, facecolor='w', edgecolor='k')
plt.scatter(x, y)
x = np.arange(0, 11.1, 0.1)
x = x[:, np.newaxis]
y = mymodel.predict(x)
plt.plot(x, y, color='red')
plt.ylim(np.min(y) - 0.00001, np.max(y) + 0.00001)
plt.grid()
plt.show()
returns:

With normalization of the target data
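In the above example, the target values lie in a very narrow range (roughly 0.99956 to 0.99982), so they differ from each other only from the fourth decimal place on. A solution that improves the model is to standardize the target data before fitting, using: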
scaler = preprocessing.StandardScaler().fit(y)
y = scaler.transform(y)
and to transform back to the original y scale:
y = scaler.inverse_transform(y)
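To make the two steps above concrete, here is a small sketch of the round trip on the same target values. It assumes the default `StandardScaler` settings, which subtract the mean and divide by the population standard deviation. The names `y_scaled`, `manual` and `y_back` are introduced here only for illustration.

```python
import numpy as np
from sklearn import preprocessing

y = np.array([0.9995577, 0.999717, 0.9997348, 0.99975765, 0.99978703, 0.99980724,
              0.9998182, 0.99982077, 0.99981844, 0.99981105, 0.99980015, 0.9997869])
y = y[:, np.newaxis]  # StandardScaler expects a 2-D array (n_samples, n_features)

scaler = preprocessing.StandardScaler().fit(y)
y_scaled = scaler.transform(y)

# With default settings this is the same as (y - mean) / std
manual = (y - y.mean()) / y.std()

# inverse_transform undoes the scaling: y_scaled * std + mean
y_back = scaler.inverse_transform(y_scaled)

print("scaled mean:", y_scaled.mean())
print("scaled std: ", y_scaled.std())
print("max round-trip error:", np.max(np.abs(y_back - y)))
```

After scaling, the targets have zero mean and unit standard deviation, which is much closer to the scale that the kernel bounds used in this example allow.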
Full code:
x = np.arange(0.0, 12.0, 1.0)
x = x[:, np.newaxis]
y = np.array([0.9995577, 0.999717, 0.9997348, 0.99975765, 0.99978703, 0.99980724,
              0.9998182, 0.99982077, 0.99981844, 0.99981105, 0.99980015, 0.9997869])
y = y[:, np.newaxis]

# Standardize the targets (zero mean, unit variance) before fitting
scaler = preprocessing.StandardScaler().fit(y)
y = scaler.transform(y)

kernel = C(1.0, (0.1, 10.0)) * RBF([0.1], (1e-2, 1e2))
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=20)
mymodel = gp.fit(x, y)

fig = figure(num=None, figsize=(12, 10), dpi=80, facecolor='w', edgecolor='k')
# Plot the training points in the original y scale
y = scaler.inverse_transform(y)
plt.scatter(x, y)

x = np.arange(0, 11.1, 0.1)
x = x[:, np.newaxis]
y = mymodel.predict(x)
# Map the predictions back to the original scale; the reshape keeps the
# array 2-D, which inverse_transform requires
y = scaler.inverse_transform(y.reshape(-1, 1))
plt.plot(x, y, color='red')
plt.ylim(np.min(y) - 0.00001, np.max(y) + 0.00001)
plt.grid()
plt.title(r'Gaussian Processes (with target data scaling)')
plt.xlabel('x')
plt.ylabel('y')
plt.savefig("gp_with_scaling_target.png", bbox_inches='tight')
plt.show()
returns


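As a possible alternative to scaling the targets by hand, `GaussianProcessRegressor` accepts a `normalize_y` parameter. The sketch below assumes a recent scikit-learn version, in which `normalize_y=True` standardizes the targets to zero mean and unit variance before fitting and returns predictions in the original scale. The `random_state=0` argument and the names `x_new`, `y_pred` and `y_std` are additions made here for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C

x = np.arange(0.0, 12.0, 1.0)[:, np.newaxis]
y = np.array([0.9995577, 0.999717, 0.9997348, 0.99975765, 0.99978703, 0.99980724,
              0.9998182, 0.99982077, 0.99981844, 0.99981105, 0.99980015, 0.9997869])

kernel = C(1.0, (0.1, 10.0)) * RBF([0.1], (1e-2, 1e2))
# normalize_y=True lets the regressor scale the targets internally;
# random_state is an assumption added here for reproducibility
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=20,
                              normalize_y=True, random_state=0)
gp.fit(x, y)

x_new = np.arange(0, 11.1, 0.1)[:, np.newaxis]
# Predictions (and their standard deviations) come back in the original y scale
y_pred, y_std = gp.predict(x_new, return_std=True)

print("prediction range:", y_pred.min(), y_pred.max())
```

With this option no explicit `inverse_transform` call is needed, at the cost of the scaling being less visible in the code.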