Examples of how to calculate Pearson’s correlation coefficient between two datasets in Python:

### Create a dataset

Let's first create some data:

`import numpy as np`

`def f(a,b,c,X):`

`    eps = c * np.random.randn(X.shape[0])  # Gaussian noise scaled by c`

`    return a * X + b + eps`

`a = 1 # slope`

`b = 0 # intercept`

`c = 1.0 # noise`

`X = np.random.randint(100, size=250)`

`Y = f(a,b,c,X)`

and use matplotlib to visualize it:

`import matplotlib.pyplot as plt`

`plt.scatter(X,Y)`

`plt.xlim(-10,110)`

`plt.title("How to calculate the Pearson’s Correlation coefficient \n between two datasets in python ?")`

`plt.xlabel('X')`

`plt.ylabel('Y')`

`plt.savefig("Pearson_Correlation_coefficient_01.png", bbox_inches='tight')`

`plt.show()`

### Calculate the Pearson’s Correlation coefficient using scipy

To calculate Pearson’s correlation coefficient between variables X and Y, one solution is to use scipy.stats.pearsonr, which returns the coefficient together with a p-value:

`from scipy.stats import pearsonr`

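`corr, _ = pearsonr(X, Y)  # the second return value is the p-value`

`print(corr)`

gives, for example (the exact value depends on the random draw and on the noise level `c`; with `c = 1.0` the result is very close to 1, while a value like the one below corresponds to a noisier sample, roughly `c = 10`):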

`0.9434925682236153`

which can be rounded:

`round(corr,2)`

which then gives

`0.94`
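To make explicit what `pearsonr` computes, the coefficient can be reproduced from its definition: the covariance of X and Y divided by the product of their standard deviations. The following is a minimal sketch and not a description of how SciPy implements it. The helper name `pearson_manual`, the seeded generator `rng`, and the noise level of 10 are assumptions introduced here for illustration; the data is regenerated so that the block runs on its own.

```python
import numpy as np
from scipy.stats import pearsonr

def pearson_manual(x, y):
    """Pearson correlation computed from its definition (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

# Seeded generator so that repeated runs give the same numbers (assumption)
rng = np.random.default_rng(0)
X = rng.integers(0, 100, size=250)
Y = 1.0 * X + 0.0 + 10.0 * rng.standard_normal(X.shape[0])  # slope 1, intercept 0, noise 10

corr, p_value = pearsonr(X, Y)
manual = pearson_manual(X, Y)
print(corr, manual)  # the two values should agree up to floating-point error
```

Seeding the generator is optional, but without it every run produces a slightly different coefficient.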

### Examples of Pearson’s Correlation coefficient calculations

Let's now reproduce the example from Wikipedia:

`import matplotlib.pyplot as plt`

`import numpy as np`

`from scipy.stats import pearsonr`

`def f(a,b,c,X):`

`    eps = c * np.random.randn(X.shape[0])  # Gaussian noise scaled by c`

`    return a * X + b + eps`

`A = [1.0,1.0,1.0,0.0,-1.0,-1.0,-1.0]`

`B = [0.0,0.0,0.0,0.0,0.0,0.0,0.0]`

`C = [1.0, 10, 20, 20, 20 ,10, 1.0]`

`n = 1`

`for a,b,c in zip(A,B,C):`

`    print(a,b,c)`

`    X = np.random.randint(100, size=250)`

`    Y = f(a,b,c,X)`

`    corr, _ = pearsonr(X, Y)`

`    plt.scatter(X,Y)`

`    plt.xlim(-10,110)`

`    plt.title("How to calculate the Pearson’s Correlation coefficient \n between two datasets in python ? \n corrcoef = {} \n a = {} b = {} c = {}".format(round(corr,2), a, b, c))`

`    plt.xlabel('X')`

`    plt.ylabel('Y')`

`    plt.savefig("Pearson_Correlation_coefficient_{}.png".format(n), bbox_inches='tight')`

`    plt.show()`

`    n += 1`

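gives one scatter plot per (a, b, c) combination, each titled with its correlation coefficient and saved as Pearson_Correlation_coefficient_1.png to Pearson_Correlation_coefficient_7.png.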
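The effect of the slope `a` and the noise level `c` on the coefficient can also be checked without plotting. The sketch below reuses the same `A`, `B`, `C` lists and prints one line per combination. It is a variant written for illustration, not the original loop: the seeded generator `rng`, the extra `rng` argument of `f`, and the `results` list are assumptions added here so that the output is reproducible.

```python
import numpy as np
from scipy.stats import pearsonr

def f(a, b, c, X, rng):
    """Linear relation a * X + b with Gaussian noise of standard deviation c."""
    eps = c * rng.standard_normal(X.shape[0])
    return a * X + b + eps

# Seeded generator (assumption) so that the table is reproducible
rng = np.random.default_rng(42)

A = [1.0, 1.0, 1.0, 0.0, -1.0, -1.0, -1.0]  # slopes
B = [0.0] * 7                                # intercepts
C = [1.0, 10, 20, 20, 20, 10, 1.0]           # noise levels

results = []
for a, b, c in zip(A, B, C):
    X = rng.integers(0, 100, size=250)
    Y = f(a, b, c, X, rng)
    corr, _ = pearsonr(X, Y)
    results.append((a, b, c, corr))
    print("a = {:5.1f}  b = {:4.1f}  c = {:5.1f}  corrcoef = {:6.2f}".format(a, b, c, corr))
```

For a positive slope the coefficient moves from close to 1 toward smaller values as the noise grows, for a zero slope it stays near 0, and for a negative slope the sign flips.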

### Calculate the Pearson’s Correlation coefficient using numpy

Another solution is to use numpy.corrcoef, which returns the 2×2 correlation matrix; the coefficient between X and Y is the off-diagonal element:

`import numpy as np`

`print(np.corrcoef(X,Y))`

gives

`[[1.         0.94349257]`

` [0.94349257 1.        ]]`
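Since `np.corrcoef` returns a matrix rather than a single number, the coefficient itself has to be read from position `[0, 1]` (or, equivalently, `[1, 0]`). The following sketch extracts it and compares it with the value from `scipy.stats.pearsonr`; the seeded generator `rng` and the variable names `corr_numpy` and `corr_scipy` are assumptions made for this example, and the data is regenerated so that the block is self-contained.

```python
import numpy as np
from scipy.stats import pearsonr

# Seeded generator (assumption) so that the example is reproducible
rng = np.random.default_rng(0)
X = rng.integers(0, 100, size=250)
Y = X + rng.standard_normal(X.shape[0])  # slope 1, intercept 0, noise 1

matrix = np.corrcoef(X, Y)      # 2x2 symmetric matrix with ones on the diagonal
corr_numpy = matrix[0, 1]       # correlation between X and Y
corr_scipy, _ = pearsonr(X, Y)  # same quantity computed by SciPy

print(matrix)
print(corr_numpy, corr_scipy)   # should agree up to floating-point error
```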