# Machine learning: target data scaling for a nonlinear regression using Gaussian processes with Python and scikit-learn

Published: July 14, 2020

An example of the impact of target data scaling on a nonlinear regression using Gaussian processes with Python and scikit-learn.

### Nonlinear regression with Gaussian processes

Let's first import the required Python modules:

```python
from sklearn import preprocessing
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.gaussian_process.kernels import DotProduct, ConstantKernel as C
from pylab import figure

import matplotlib.pyplot as plt
import numpy as np
```

Training data:

```python
x = np.arange(0.0, 12.0, 1.0)
x = x[:, np.newaxis]

y = np.array([0.9995577, 0.999717, 0.9997348, 0.99975765, 0.99978703, 0.99980724,
              0.9998182, 0.99982077, 0.99981844, 0.99981105, 0.99980015, 0.9997869])
y = y[:, np.newaxis]
```

Create a Gaussian process model:

```python
kernel = C(1.0, (0.1, 10.0)) * RBF([0.1], (1e-2, 1e2))

gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=20)

mymodel = gp.fit(x, y)
```
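To see what the optimizer settled on, the fitted model can be inspected after training. The sketch below is a minimal, self-contained illustration that repeats the data and kernel from above; `random_state=0` is an assumption added here only to make the restarts reproducible. It prints the optimized kernel (`kernel_`) and the log marginal likelihood (`log_marginal_likelihood_value_`), two fitted attributes of `GaussianProcessRegressor`.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C

# Same training data as in the article
x = np.arange(0.0, 12.0, 1.0)[:, np.newaxis]
y = np.array([0.9995577, 0.999717, 0.9997348, 0.99975765, 0.99978703, 0.99980724,
              0.9998182, 0.99982077, 0.99981844, 0.99981105, 0.99980015, 0.9997869])
y = y[:, np.newaxis]

kernel = C(1.0, (0.1, 10.0)) * RBF([0.1], (1e-2, 1e2))

# random_state is an assumption added for reproducibility; it is not in the original code
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=20, random_state=0)
gp.fit(x, y)

# Optimized kernel hyperparameters and the corresponding log marginal likelihood
print(gp.kernel_)
print(gp.log_marginal_likelihood_value_)
```

Running the same inspection on the scaled targets makes it easy to compare which hyperparameters the two fits end up with.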

Make predictions and plot:

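```python
fig = figure(num=None, figsize=(12, 10), dpi=80, facecolor='w', edgecolor='k')

# Training points
plt.scatter(x, y)

# Dense grid for the prediction curve (separate names keep the training data intact)
x_new = np.arange(0, 11.1, 0.1)
x_new = x_new[:, np.newaxis]

y_pred = mymodel.predict(x_new)

plt.plot(x_new, y_pred, color='red')

plt.ylim(np.min(y_pred) - 0.00001, np.max(y_pred) + 0.00001)

plt.grid()

plt.show()
```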

returns:

### With normalization of the target data

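In the example above, the target values are all close to 1 and differ from each other by less than 3e-4, which is tiny compared with the kernel's amplitude bounds (0.1 to 10). A solution is to standardize the targets (zero mean, unit variance) before fitting, so that the model works on values of order 1: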

```python
scaler = preprocessing.StandardScaler().fit(y)
y = scaler.transform(y)
```

and to transform back to the original y scale:

```python
y = scaler.inverse_transform(y)
```
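As a quick sanity check of the two steps above, the following sketch standardizes the article's target values and then maps them back. The names `y_scaled` and `y_back` are introduced here for illustration only. After `transform` the values should have mean close to 0 and standard deviation close to 1, and `inverse_transform` should recover the original values up to floating-point error.

```python
import numpy as np
from sklearn import preprocessing

# Target values from the article, as a column vector (StandardScaler expects 2D input)
y = np.array([0.9995577, 0.999717, 0.9997348, 0.99975765, 0.99978703, 0.99980724,
              0.9998182, 0.99982077, 0.99981844, 0.99981105, 0.99980015, 0.9997869])
y = y[:, np.newaxis]

scaler = preprocessing.StandardScaler().fit(y)

y_scaled = scaler.transform(y)              # (y - mean) / std
y_back = scaler.inverse_transform(y_scaled)  # y_scaled * std + mean

print(y_scaled.mean(), y_scaled.std())  # expected to be close to 0 and 1
print(np.allclose(y_back, y))           # round trip recovers the original values
```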

Full code:

```python
x = np.arange(0.0, 12.0, 1.0)
x = x[:, np.newaxis]

y = np.array([0.9995577, 0.999717, 0.9997348, 0.99975765, 0.99978703, 0.99980724,
              0.9998182, 0.99982077, 0.99981844, 0.99981105, 0.99980015, 0.9997869])
y = y[:, np.newaxis]

# Standardize the targets before fitting
scaler = preprocessing.StandardScaler().fit(y)
y = scaler.transform(y)

kernel = C(1.0, (0.1, 10.0)) * RBF([0.1], (1e-2, 1e2))

gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=20)

mymodel = gp.fit(x, y)

fig = figure(num=None, figsize=(12, 10), dpi=80, facecolor='w', edgecolor='k')

# Back to the original scale for plotting the training points
y = scaler.inverse_transform(y)

plt.scatter(x, y)

x_new = np.arange(0, 11.1, 0.1)
x_new = x_new[:, np.newaxis]

y_pred = mymodel.predict(x_new)

# predict may return a 1D array depending on the scikit-learn version,
# so reshape to a column before undoing the scaling
y_pred = scaler.inverse_transform(np.asarray(y_pred).reshape(-1, 1))

plt.plot(x_new, y_pred, color='red')

plt.ylim(np.min(y_pred) - 0.00001, np.max(y_pred) + 0.00001)

plt.grid()

plt.title(r'Gaussian Processes (with target data scaling)')
plt.xlabel('x')
plt.ylabel('y')

plt.savefig("gp_with_scaling_target.png", bbox_inches='tight')

plt.show()
```

returns (figure: Gaussian process fit with target data scaling, saved as `gp_with_scaling_target.png`)
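As an alternative to scaling by hand, `GaussianProcessRegressor` has a `normalize_y` parameter. The sketch below is not the approach used in this article; it assumes a recent scikit-learn version in which `normalize_y=True` standardizes the targets internally and returns predictions in the original units. `random_state=0` is again an assumption added only for reproducibility.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C

x = np.arange(0.0, 12.0, 1.0)[:, np.newaxis]
y = np.array([0.9995577, 0.999717, 0.9997348, 0.99975765, 0.99978703, 0.99980724,
              0.9998182, 0.99982077, 0.99981844, 0.99981105, 0.99980015, 0.9997869])

kernel = C(1.0, (0.1, 10.0)) * RBF([0.1], (1e-2, 1e2))

# normalize_y=True lets the regressor handle the target scaling itself
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=20,
                              normalize_y=True, random_state=0)
gp.fit(x, y)

x_new = np.arange(0, 11.1, 0.1)[:, np.newaxis]

# Predictions come back in the original units; no inverse_transform is needed
y_pred = gp.predict(x_new)
print(y_pred[:5])
```

Keeping an explicit `StandardScaler`, as in the full code above, makes the scaling step visible and reusable; `normalize_y` is simply shorter when only the predictions matter.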