Examples of how to calculate Pearson’s correlation coefficient between two datasets in Python:
Create a dataset
Let's first create some data:
import numpy as np

def f(a, b, c, X):
    eps = c * np.random.randn(X.shape[0])
    return a * X + b + eps

a = 1 # slope
b = 0 # intercept
c = 1.0 # noise

X = np.random.randint(100, size=250)
Y = f(a, b, c, X)
and use matplotlib to visualize it:
import matplotlib.pyplot as plt

plt.scatter(X, Y)
plt.xlim(-10, 110)
plt.title("How to calculate the Pearson’s Correlation coefficient \n between two datasets in python ?")
plt.xlabel('X')
plt.ylabel('Y')
plt.savefig("Pearson_Correlation_coefficient_01.png", bbox_inches='tight')
plt.show()

Calculate the Pearson’s Correlation coefficient using scipy
To calculate Pearson’s correlation coefficient between the variables X and Y, one solution is to use scipy.stats.pearsonr, which returns the coefficient together with a p-value (discarded here):
from scipy.stats import pearsonr

corr, _ = pearsonr(X, Y)
print(corr)
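gives, for example (the exact value depends on the random sample and on the noise level c; c = 1.0 typically gives a value very close to 1, while a value near 0.94 corresponds to c around 10):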
0.9434925682236153
which can be rounded to two decimals:
round(corr, 2)
which then gives
0.94
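The value returned by pearsonr can be checked against the textbook definition, r = cov(X, Y) / (std(X) * std(Y)). The following is only a minimal sketch of that definition, not a description of how scipy computes it internally. The helper name pearson_manual, the seeded generator and the X_demo / Y_demo sample are assumptions introduced here for illustration.

```python
import numpy as np
from scipy.stats import pearsonr

def pearson_manual(x, y):
    """Pearson r from the definition: covariance over the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean
    dy = y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

# Hypothetical sample: linear relation with Gaussian noise, seeded for reproducibility
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 100, size=250)
Y_demo = X_demo + 10.0 * rng.standard_normal(X_demo.shape[0])

r_manual = pearson_manual(X_demo, Y_demo)
r_scipy, _ = pearsonr(X_demo, Y_demo)
print(r_manual, r_scipy)
```

Both values should agree up to floating-point rounding.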
Examples of Pearson’s Correlation coefficients calculation
Let's now reproduce the example from Wikipedia:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import pearsonr

def f(a, b, c, X):
    eps = c * np.random.randn(X.shape[0])
    return a * X + b + eps

A = [1.0, 1.0, 1.0, 0.0, -1.0, -1.0, -1.0]
B = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
C = [1.0, 10, 20, 20, 20, 10, 1.0]

n = 1
for a, b, c in zip(A, B, C):
    print(a, b, c)
    X = np.random.randint(100, size=250)
    Y = f(a, b, c, X)
    corr, _ = pearsonr(X, Y)
    plt.scatter(X, Y)
    plt.xlim(-10, 110)
    plt.title("""How to calculate the Pearson’s Correlation coefficient \nbetween two datasets in python ? \n corrcoef = {} \n a = {} b = {} c = {}""".format(str(round(corr, 2)), a, b, c))
    plt.xlabel('X')
    plt.ylabel('Y')
    plt.savefig("Pearson_Correlation_coefficient_{}.png".format(n), bbox_inches='tight')
    plt.show()
    n += 1
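gives one scatter plot per (a, b, c) combination, saved as Pearson_Correlation_coefficient_1.png to Pearson_Correlation_coefficient_7.png, with the correlation coefficient shown in each title.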
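To see the same trend without the plots, the coefficients can be collected in a list. This is a sketch under the same linear-plus-noise model. The seeded generator rng and the names X_demo, Y_demo and results are illustrative choices, not part of the original example. The intercept is left out because b = 0 throughout and a constant offset does not change the correlation.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(42)  # seeded so the sketch is reproducible

A = [1.0, 1.0, 1.0, 0.0, -1.0, -1.0, -1.0]  # slopes
C = [1.0, 10, 20, 20, 20, 10, 1.0]          # noise levels

results = []
for a, c in zip(A, C):
    X_demo = rng.integers(0, 100, size=250)
    Y_demo = a * X_demo + c * rng.standard_normal(X_demo.shape[0])
    corr, _ = pearsonr(X_demo, Y_demo)
    results.append((a, c, corr))
    print(f"a = {a:+.1f}  c = {c:>4}  corr = {corr:+.2f}")
```

The sign of the coefficient should follow the sign of the slope a, its magnitude should shrink as the noise level c grows, and with a = 0 it should stay close to 0.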
Calculate the Pearson’s Correlation coefficient using numpy
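Another solution is to use numpy.corrcoef, which returns the correlation matrix; the off-diagonal entries are the correlation coefficient between X and Y: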
import numpy as np

print(np.corrcoef(X, Y))
gives
[[1.         0.94349257]
 [0.94349257 1.        ]]
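Since np.corrcoef returns a matrix, the coefficient itself has to be read from an off-diagonal entry. A short sketch that extracts it and compares it with scipy follows. The seeded sample X_demo / Y_demo is made up for illustration.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical sample: linear relation with Gaussian noise
rng = np.random.default_rng(1)
X_demo = rng.integers(0, 100, size=250)
Y_demo = X_demo + 10.0 * rng.standard_normal(250)

matrix = np.corrcoef(X_demo, Y_demo)  # 2x2 symmetric matrix with ones on the diagonal
r_numpy = matrix[0, 1]                # correlation between X_demo and Y_demo
r_scipy, _ = pearsonr(X_demo, Y_demo)

print(r_numpy, r_scipy, np.isclose(r_numpy, r_scipy))
```

Indexing with [0, 1] or [1, 0] is equivalent because the matrix is symmetric.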
