Examples of how to create a data frame using the python module pandas
Create a data frame using an array
Import pandas and numpy:
>>> import pandas as pd
>>> import numpy as np
Let's consider the following 3×4 matrix:
\begin{equation}
data = \left( \begin{array}{cccc}
1 & 2 & 3 & 4 \\
5 & 6 & 7 & 8 \\
9 & 10 & 11 & 12
\end{array}\right)
\end{equation}
>>> data = np.arange(1,13)
>>> data = data.reshape(3,4)
>>> data
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])
Give names to the columns:
>>> columns = ['Home','Car','Sport','Food']
Give names to the rows (the index):
>>> index = ['Alice','Bob','Emma']
Create a data frame using pandas:
>>> df = pd.DataFrame(data=data,index=index,columns=columns)
>>> df
       Home  Car  Sport  Food
Alice     1    2      3     4
Bob       5    6      7     8
Emma      9   10     11    12
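Once the index and columns have names, cells can be read by label. Here is a minimal sketch, assuming pandas and NumPy are installed. It rebuilds the same frame and uses the standard `.loc[row, column]` accessor.

```python
import numpy as np
import pandas as pd

# Rebuild the 3x4 example frame with named rows and columns
data = np.arange(1, 13).reshape(3, 4)
df = pd.DataFrame(data=data,
                  index=['Alice', 'Bob', 'Emma'],
                  columns=['Home', 'Car', 'Sport', 'Food'])

# .loc selects by row label first, then by column label
print(df.loc['Bob', 'Sport'])   # 7

# A whole row comes back as a Series indexed by the column names
print(df.loc['Emma'].tolist())  # [9, 10, 11, 12]
```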
An example without the index (pandas then numbers the rows 0, 1, 2, ...):
>>> df = pd.DataFrame(data=data,columns=columns)
>>> df
   Home  Car  Sport  Food
0     1    2      3     4
1     5    6      7     8
2     9   10     11    12
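If no index is given, pandas labels the rows with consecutive integers. Those labels can be replaced later. The sketch below assumes the same 3×4 data. Assigning a list to `df.index` is standard pandas.

```python
import numpy as np
import pandas as pd

data = np.arange(1, 13).reshape(3, 4)
df = pd.DataFrame(data=data, columns=['Home', 'Car', 'Sport', 'Food'])

# No index was passed, so the rows are labelled 0, 1, 2
default_labels = list(df.index)
print(default_labels)  # [0, 1, 2]

# Row labels can still be attached afterwards by assigning to df.index
df.index = ['Alice', 'Bob', 'Emma']
print(df.loc['Alice', 'Home'])  # 1
```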
Create a data frame using a dictionary
Create a dataframe from a python dictionary (method 1)
To create a dataframe from a python dictionary:
d = {'Name': ['Ben', 'John', 'Emma', 'Zoe'],'Age': [40, 56, 34, 12]}
a solution is to pass it directly to the DataFrame constructor:
import pandas as pd
df = pd.DataFrame(d)
print(df)
gives
   Name  Age
0   Ben   40
1  John   56
2  Emma   34
3   Zoe   12
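Each dictionary key becomes a column and each list supplies that column's values. As a result, all the lists must have the same length. A small sketch of that rule follows. The `bad` dictionary is a hypothetical counter-example, and the sketch only checks that pandas raises a `ValueError`.

```python
import pandas as pd

# Keys become column labels; list positions become rows
d = {'Name': ['Ben', 'John', 'Emma', 'Zoe'], 'Age': [40, 56, 34, 12]}
df = pd.DataFrame(d)
print(df.columns.tolist())  # ['Name', 'Age']
print(len(df))              # 4

# Hypothetical dictionary with lists of different lengths: pandas rejects it
bad = {'Name': ['Ben', 'John'], 'Age': [40]}
error_message = None
try:
    pd.DataFrame(bad)
except ValueError as err:
    error_message = str(err)
print(error_message)  # pandas reports that the arrays must have the same length
```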
Create a dataframe from a python dictionary (method 2)
Another solution is to use pandas.DataFrame.from_dict
df = pd.DataFrame.from_dict(d)
print(df)
gives
   Name  Age
0   Ben   40
1  John   56
2  Emma   34
3   Zoe   12
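With its default `orient='columns'`, `from_dict()` should build the same frame as the plain constructor. Here is a quick check, assuming the same dictionary `d`. `DataFrame.equals` compares both the labels and the values.

```python
import pandas as pd

d = {'Name': ['Ben', 'John', 'Emma', 'Zoe'], 'Age': [40, 56, 34, 12]}

# Default orient='columns': keys are columns, as with pd.DataFrame(d)
df_ctor = pd.DataFrame(d)
df_from_dict = pd.DataFrame.from_dict(d)

print(df_ctor.equals(df_from_dict))  # True
```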
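The advantage is that from_dict() accepts extra parameters such as orient. With orient='index', the dictionary keys become row labels instead of column labels: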
df = pd.DataFrame.from_dict(d, orient='index')
then gives
        0     1     2    3
Name  Ben  John  Emma  Zoe
Age    40    56    34   12
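One way to read `orient='index'` is as the transpose of the default layout. The sketch below, assuming the same dictionary `d`, compares the values of both frames cell by cell. It goes through `.values.tolist()` so that dtype details do not affect the comparison.

```python
import pandas as pd

d = {'Name': ['Ben', 'John', 'Emma', 'Zoe'], 'Age': [40, 56, 34, 12]}

# orient='index': each dictionary key becomes a row label
df_rows = pd.DataFrame.from_dict(d, orient='index')
print(df_rows.index.tolist())        # ['Name', 'Age']
print(df_rows.columns.tolist())      # [0, 1, 2, 3]
print(df_rows.loc['Name'].tolist())  # ['Ben', 'John', 'Emma', 'Zoe']

# Same cells as transposing the default (orient='columns') frame with .T
df_t = pd.DataFrame(d).T
print(df_rows.values.tolist() == df_t.values.tolist())  # True
```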
Add column names with the columns parameter (pandas only accepts it together with orient='index'):
df = pd.DataFrame.from_dict(d, orient='index', columns=['User 1', 'User 2', 'User 3', 'User 4'])
gives
      User 1  User 2  User 3  User 4
Name     Ben    John    Emma     Zoe
Age       40      56      34      12
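The columns parameter of `from_dict()` only works with `orient='index'`. With the default `orient='columns'`, pandas raises a `ValueError`. Here is a minimal sketch of both cases, assuming the same `d`. The `users` list is just the labels from the example above.

```python
import pandas as pd

d = {'Name': ['Ben', 'John', 'Emma', 'Zoe'], 'Age': [40, 56, 34, 12]}
users = ['User 1', 'User 2', 'User 3', 'User 4']

# columns= labels the columns when orient='index'
df = pd.DataFrame.from_dict(d, orient='index', columns=users)
print(df.loc['Age', 'User 2'])  # 56

# With the default orient='columns', the columns parameter is rejected
error_message = None
try:
    pd.DataFrame.from_dict(d, columns=users)
except ValueError as err:
    error_message = str(err)
print(error_message)
```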
Add a new row into a dataframe
Another example: adding a new row to an existing dataframe. Start with the following dataframe:
import pandas as pd
d = {'Name': ['Ben', 'John', 'Emma', 'Zoe'],'Age': [40, 56, 34, 12]}
df = pd.DataFrame.from_dict(d)
print(df)
gives
   Name  Age
0   Ben   40
1  John   56
2  Emma   34
3   Zoe   12
Create a new dataframe, from a python dictionary, that holds the row to add:
new_d = {'Name': ['Paula'],'Age': [67]}
df_new_row = pd.DataFrame.from_dict(new_d)
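To add the new row, a solution is to use concat() with ignore_index=True. The rows of the result are then renumbered from 0, instead of the new row keeping its own index 0. (See How to merge (concatenate) two or more dataframe columns into one column with pandas ? and How to add a new row at the end of a pandas DataFrame in pandas ?)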
df = pd.concat([df,df_new_row], ignore_index=True)
gives
    Name  Age
0    Ben   40
1   John   56
2   Emma   34
3    Zoe   12
4  Paula   67
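When rows are added repeatedly, the two steps above (build a one-row dataframe, then concat) can be wrapped in a small function. `append_row` is a hypothetical helper name, not a pandas function. It wraps each value in a list so that `pd.DataFrame` gets one-element columns rather than scalars.

```python
import pandas as pd

def append_row(df, row):
    """Hypothetical helper: return a new frame with `row` (a dict) added at the end."""
    # Wrap each scalar in a list so pandas builds a one-row DataFrame
    new_row = pd.DataFrame({key: [value] for key, value in row.items()})
    # ignore_index=True renumbers the result 0..n-1
    return pd.concat([df, new_row], ignore_index=True)

df = pd.DataFrame({'Name': ['Ben', 'John', 'Emma', 'Zoe'], 'Age': [40, 56, 34, 12]})
df = append_row(df, {'Name': 'Paula', 'Age': 67})
print(df)
```

pd.concat() returns a new object, so the helper follows the same pattern and leaves the input frame unchanged.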
Fix error "If using all scalar values, you must pass an index"
new_d = {'Name': 'Paula','Age': 67}
df_new_row = pd.DataFrame.from_dict(new_d)
returns
ValueError: If using all scalar values, you must pass an index
This is because the values of the dictionary new_d are scalars, not lists, so pandas cannot tell how many rows to create.
To fix it, wrap each value in a list:
new_d = {'Name': ['Paula'],'Age': [67]}
df_new_row = pd.DataFrame.from_dict(new_d)
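Two other common ways to turn a dictionary of scalars into a one-row frame are sketched below. Both use the plain `pd.DataFrame` constructor, not `from_dict()`. The first passes an explicit index, as the error message suggests. The second wraps the whole dictionary in a list, so pandas reads it as one record.

```python
import pandas as pd

new_d = {'Name': 'Paula', 'Age': 67}

# Option 1: keep the scalars and pass an explicit index
row_a = pd.DataFrame(new_d, index=[0])

# Option 2: a list containing one dict is read as one row
row_b = pd.DataFrame([new_d])

print(row_a.to_dict('records') == row_b.to_dict('records'))  # True
```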
Create a data frame using a list of tuples
Example using a list of tuples (each tuple becomes one row):
>>> data = [(1,2,3,4),(5,6,7,8)]
>>> index = ['Alice','Bob']
>>> columns = ['Home','Car','Sport','Food']
>>> df = pd.DataFrame(data=data,index=index,columns=columns)
>>> df
       Home  Car  Sport  Food
Alice     1    2      3     4
Bob       5    6      7     8
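A list of tuples often comes from zipping parallel lists together. Here is a minimal sketch that reuses the names and ages from the dictionary example. `zip` is the Python built-in, and `columns=` names the tuple positions.

```python
import pandas as pd

# Parallel lists; zip pairs them element by element into tuples
names = ['Ben', 'John', 'Emma', 'Zoe']
ages = [40, 56, 34, 12]
records = list(zip(names, ages))
print(records)  # [('Ben', 40), ('John', 56), ('Emma', 34), ('Zoe', 12)]

# Each tuple becomes one row
df = pd.DataFrame(records, columns=['Name', 'Age'])
print(df.shape)  # (4, 2)
```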
Create a data frame from an ASCII file
To extract data from a text file and create a data frame, a solution is to use read_csv():
>>> import pandas as pd
>>> df = pd.read_csv('myfile.csv', sep=",", header=None)
>>> df.head()
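The sketch below lets you try read_csv() without a file on disk. It wraps hypothetical sample text in `io.StringIO`, which read_csv() accepts like a file. It also shows `names=`, which labels the columns, and `sep=r"\s+"`, which handles whitespace-separated ASCII files.

```python
import io
import pandas as pd

# Hypothetical stand-in for the contents of myfile.csv
csv_text = "1,2,3\n4,5,6\n7,8,9\n"

# header=None: the first line is data, columns are labelled 0, 1, 2
df = pd.read_csv(io.StringIO(csv_text), sep=",", header=None)
print(df.columns.tolist())  # [0, 1, 2]

# names= gives the columns labels while reading
df_named = pd.read_csv(io.StringIO(csv_text), sep=",", header=None,
                       names=['a', 'b', 'c'])

# Hypothetical whitespace-separated ASCII data: use a regex separator
df_ws = pd.read_csv(io.StringIO("1 2 3\n4 5 6\n"), sep=r"\s+", header=None)
print(df_ws.shape)  # (2, 3)
```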
