Examples of how to calculate the mean of a dataframe column with pandas in Python:
Create a dataframe
Let's consider the following dataframe:
import pandas as pd
data = {'Name':['Ben','Anna','Zoe','Tom','John','Steve'],
'Age':[20,27,43,30,12,21]}
df = pd.DataFrame(data)
print(df)
returns
    Name  Age
0    Ben   20
1   Anna   27
2    Zoe   43
3    Tom   30
4   John   12
5  Steve   21
Calculate the mean
To calculate the mean of the 'Age' column defined above, a solution is to use mean(). Example:
df['Age'].mean()
returns
25.5
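As a quick sanity check, the value returned by mean() can be reproduced by hand as the sum of the column divided by the number of non-missing entries. The following is a minimal sketch reusing the dataframe above; the variable names column_mean and manual_mean are illustrative only.

```python
import pandas as pd

data = {'Name': ['Ben', 'Anna', 'Zoe', 'Tom', 'John', 'Steve'],
        'Age': [20, 27, 43, 30, 12, 21]}
df = pd.DataFrame(data)

# Mean computed by pandas
column_mean = df['Age'].mean()

# Same quantity by hand: sum of the values divided by
# the number of non-missing values in the column
manual_mean = df['Age'].sum() / df['Age'].count()

print(column_mean, manual_mean)
```

Both values should agree, since the column contains no missing entries here.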
Another example with a NaN value in the column
import pandas as pd
import numpy as np
data = {'Name':['Ben','Anna','Zoe','Tom','John','Steve','Bob'],
'Age':[20,27,43,30,12,21, np.nan]}
df = pd.DataFrame(data)
print(df)
returns
    Name   Age
0    Ben  20.0
1   Anna  27.0
2    Zoe  43.0
3    Tom  30.0
4   John  12.0
5  Steve  21.0
6    Bob   NaN
df['Age'].mean()
returns
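25.5
The result is the same as in the first example because mean() skips NaN values by default (skipna=True).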
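To see the effect of the NaN handling explicitly, the sketch below compares the default call with a call that passes skipna=False, in which case the missing value propagates into the result. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

data = {'Name': ['Ben', 'Anna', 'Zoe', 'Tom', 'John', 'Steve', 'Bob'],
        'Age': [20, 27, 43, 30, 12, 21, np.nan]}
df = pd.DataFrame(data)

# Default behaviour: NaN entries are left out of the calculation
mean_skipping_nan = df['Age'].mean()

# With skipna=False a single NaN makes the whole result NaN
mean_with_nan = df['Age'].mean(skipna=False)

print(mean_skipping_nan)
print(mean_with_nan)
```

Which behaviour is appropriate depends on the use case: skipping NaN is convenient, while skipna=False can help detect missing data that was not expected.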
Example with normally distributed data
Generate normally distributed data (mean=27; std=2.0):
import numpy as np
import pandas as pd
mu = 27.0
sigma = 2.0
data = np.random.randn(100000) * sigma + mu
df = pd.DataFrame(data, columns=['age'])
print(df)
returns
             age
0      31.238531
1      28.685002
2      27.811728
3      25.102273
4      23.525331
...          ...
99995  25.406317
99996  25.248491
99997  25.555941
99998  27.037278
99999  27.461417
Calculate the mean
df['age'].mean()
returns
26.998999150576736
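Because the data are drawn at random without a fixed seed, the exact value changes from run to run, but it should stay close to mu. A minimal sketch of how this could be checked, assuming NumPy's Generator API (np.random.default_rng) with an arbitrary seed of 42 for reproducibility, and using the standard error of the mean as a rough measure of the expected deviation:

```python
import numpy as np
import pandas as pd

mu = 27.0
sigma = 2.0
n = 100000

# Seeded generator so that the example is reproducible (seed chosen arbitrarily)
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.normal(loc=mu, scale=sigma, size=n), columns=['age'])

sample_mean = df['age'].mean()

# Standard error of the mean: sample standard deviation / sqrt(n)
# (Series.std() uses ddof=1 by default)
standard_error = df['age'].std() / np.sqrt(n)

print(sample_mean, standard_error)
```

With sigma = 2.0 and n = 100000, the standard error is about 2.0 / sqrt(100000) ≈ 0.006, so a sample mean within a few hundredths of 27 is what one would expect.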
It can be useful to visualize the distribution of the data:
import matplotlib.pyplot as plt
# Histogram of the 'age' column
df['age'].hist()
plt.title("How to calculate a column mean with pandas ?")
plt.savefig("pandas_column_mean.png", bbox_inches='tight')
Note: if the data are censored, see "How to estimate the mean with a truncated dataset using Python for data generated from a normal distribution?"