Examples of how to calculate the mean of a dataframe column with pandas in Python:
Create a dataframe
Let's consider the following dataframe:
import pandas as pd
data = {'Name':['Ben','Anna','Zoe','Tom','John','Steve'],'Age':[20,27,43,30,12,21]}
df = pd.DataFrame(data)
returns
    Name  Age
0    Ben   20
1   Anna   27
2    Zoe   43
3    Tom   30
4   John   12
5  Steve   21
Calculate the mean
To calculate the mean of the 'Age' column defined above, a solution is to use mean(). Example:
df['Age'].mean()
returns
25.5
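As a cross-check, the same value can be recovered by dividing the column sum by the number of values. The sketch below rebuilds the dataframe from the example above and also shows DataFrame.mean(numeric_only=True), which computes the mean of every numeric column at once. The variable names age_mean, manual_mean and column_means are only illustrative.

```python
import pandas as pd

data = {'Name': ['Ben', 'Anna', 'Zoe', 'Tom', 'John', 'Steve'],
        'Age': [20, 27, 43, 30, 12, 21]}
df = pd.DataFrame(data)

# Mean of a single column (a Series), as in the example above
age_mean = df['Age'].mean()

# Manual cross-check: sum of the values divided by the number of non-missing values
manual_mean = df['Age'].sum() / df['Age'].count()

# Mean of all numeric columns at once; the result is a Series indexed by column name
column_means = df.mean(numeric_only=True)

print(age_mean, manual_mean, column_means['Age'])
```

Selecting the column first (df['Age']) returns a single number, while calling mean() on the whole dataframe returns one value per numeric column.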
Another example with a NaN value in the column
import pandas as pd
import numpy as np
data = {'Name':['Ben','Anna','Zoe','Tom','John','Steve','Bob'],'Age':[20,27,43,30,12,21, np.nan]}
df = pd.DataFrame(data)
returns
    Name   Age
0    Ben  20.0
1   Anna  27.0
2    Zoe  43.0
3    Tom  30.0
4   John  12.0
5  Steve  21.0
6    Bob   NaN
Calculate the mean
df['Age'].mean()
returns
25.5
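The result is unchanged because mean() skips missing values by default (skipna=True), so the NaN row is left out of both the sum and the count. A minimal sketch of the difference, using only the 'Age' column from the example above (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [20, 27, 43, 30, 12, 21, np.nan]})

# Default behaviour: skipna=True, the NaN is ignored
mean_default = df['Age'].mean()

# With skipna=False a single NaN makes the result NaN
mean_strict = df['Age'].mean(skipna=False)

# count() gives the number of non-missing values actually used
n_used = df['Age'].count()

print(mean_default, mean_strict, n_used)
```

Passing skipna=False can be useful when a missing value should be flagged rather than silently ignored.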
Example with normally distributed data
Generate normally distributed data (mean=27.0; std=2.0)
import numpy as np
import pandas as pd
mu = 27.0
sigma = 2.0
data = np.random.randn(100000) * sigma + mu
df = pd.DataFrame(data, columns=['age'])
returns
             age
0      31.238531
1      28.685002
2      27.811728
3      25.102273
4      23.525331
...          ...
99995  25.406317
99996  25.248491
99997  25.555941
99998  27.037278
99999  27.461417
Calculate the mean
df['age'].mean()
returns
26.998999150576736
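Since the data are random and no seed is set, the exact value changes from run to run. The sketch below is one way to make the example reproducible, assuming NumPy's Generator API (np.random.default_rng) in place of np.random.randn. The seed value 42 is arbitrary. It also computes the standard error of the mean, sigma / sqrt(n), which indicates how far the sample mean is expected to fall from mu.

```python
import numpy as np
import pandas as pd

mu = 27.0
sigma = 2.0
n = 100000

# Seeded generator so that the same numbers are produced on every run
# (the seed value 42 is arbitrary)
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.normal(loc=mu, scale=sigma, size=n), columns=['age'])

sample_mean = df['age'].mean()

# Standard error of the mean: sigma / sqrt(n), about 0.0063 here
standard_error = sigma / np.sqrt(n)

print(sample_mean, standard_error)
```

With 100000 samples the standard error is small, which is why the sample mean lands very close to 27.0.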
It can be useful to visualize the distribution of the data:
import matplotlib.pyplot as plt
df['age'].hist()
plt.title("How to calculate a column mean with pandas ?")
plt.savefig("pandas_column_mean.png", bbox_inches='tight')

Note: if the data are censored, see "How to estimate the mean with a truncated dataset using python for data generated from a normal distribution ?"
