How to calculate the mean of a dataframe column with pandas in Python?


Examples of how to calculate the mean of a dataframe column with pandas in Python:

Create a dataframe

Let's consider the following dataframe:

import pandas as pd

data = {'Name':['Ben','Anna','Zoe','Tom','John','Steve'], 
        'Age':[20,27,43,30,12,21]}

df = pd.DataFrame(data)

returns

    Name  Age
0    Ben   20
1   Anna   27
2    Zoe   43
3    Tom   30
4   John   12
5  Steve   21

Calculate the mean

To calculate the mean of the 'Age' column defined above, one solution is to use mean(). For example:

df['Age'].mean()

returns

25.5
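As a quick sanity check, the same value can be reproduced by hand as the sum of the column divided by the number of entries. The sketch below rebuilds the dataframe from above; the variable names manual_mean and pandas_mean are introduced here only for illustration.

```python
import pandas as pd

# Same data as in the example above
data = {'Name': ['Ben', 'Anna', 'Zoe', 'Tom', 'John', 'Steve'],
        'Age': [20, 27, 43, 30, 12, 21]}
df = pd.DataFrame(data)

# Mean computed by hand: sum of the values divided by the number of non-missing values
manual_mean = df['Age'].sum() / df['Age'].count()

# Mean computed by pandas
pandas_mean = df['Age'].mean()

print(manual_mean, pandas_mean)  # both print 25.5
```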

Another example with a NaN value in the column

import pandas as pd
import numpy as np

data = {'Name':['Ben','Anna','Zoe','Tom','John','Steve','Bob'], 
        'Age':[20,27,43,30,12,21, np.nan]}

df = pd.DataFrame(data)

    Name   Age
0    Ben  20.0
1   Anna  27.0
2    Zoe  43.0
3    Tom  30.0
4   John  12.0
5  Steve  21.0
6    Bob   NaN

df['Age'].mean()

returns the same value as before, since mean() skips NaN values by default

25.5
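The handling of missing values is controlled by the skipna parameter of mean(), which defaults to True. The following minimal sketch compares both settings on the dataframe above; the variable names are chosen here only for illustration.

```python
import numpy as np
import pandas as pd

data = {'Name': ['Ben', 'Anna', 'Zoe', 'Tom', 'John', 'Steve', 'Bob'],
        'Age': [20, 27, 43, 30, 12, 21, np.nan]}
df = pd.DataFrame(data)

# Default behaviour (skipna=True): the NaN entry is ignored
mean_skipping_nan = df['Age'].mean()

# With skipna=False, a single NaN makes the result NaN
mean_with_nan = df['Age'].mean(skipna=False)

print(mean_skipping_nan)  # 25.5
print(mean_with_nan)      # nan
```

Setting skipna=False can be a simple way to detect that a column contains missing values before trusting its mean.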

Example with normally distributed data

Generate normally distributed data (mean=27; std=2.0):

import numpy as np
import pandas as pd

mu = 27.0
sigma = 2.0

data = np.random.randn(100000) * sigma + mu

df = pd.DataFrame(data, columns=['age'])

             age
0      31.238531
1      28.685002
2      27.811728
3      25.102273
4      23.525331
...          ...
99995  25.406317
99996  25.248491
99997  25.555941
99998  27.037278
99999  27.461417

Calculate the mean

df['age'].mean()

returns

26.998999150576736
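The sample mean is close to, but not exactly equal to, mu = 27 because the data are random. A rough way to judge how close it should be is the standard error of the mean, estimated as the sample standard deviation divided by the square root of the sample size. The sketch below assumes a seeded generator (np.random.default_rng(0), used here only for reproducibility and not part of the original example), so the exact numbers differ from those shown above.

```python
import numpy as np
import pandas as pd

mu = 27.0
sigma = 2.0
n = 100000

# Seeded generator (an assumption, for reproducibility only)
rng = np.random.default_rng(0)
data = rng.normal(loc=mu, scale=sigma, size=n)

df = pd.DataFrame(data, columns=['age'])

sample_mean = df['age'].mean()

# Estimated standard error of the mean: sample std / sqrt(n)
standard_error = df['age'].std() / np.sqrt(len(df))

print(sample_mean)
print(standard_error)  # expected to be near sigma / sqrt(n), about 0.0063
```

With 100000 samples the standard error is small, which is why the computed mean typically agrees with mu to about two decimal places.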

It can be useful to visualize the distribution of the data:

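import matplotlib.pyplot as plt

# Plot a histogram of the 'age' column
df['age'].hist()

plt.title("How to calculate a column mean with pandas ?")

# Save the figure to a PNG file
plt.savefig("pandas_column_mean.png", bbox_inches='tight')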


Note: if the data are censored, see "How to estimate the mean with a truncated dataset using python for data generated from a normal distribution ?"
