Examples of how to create a data frame using the Python module pandas
Create a data frame using an array
Import pandas and numpy:
>>> import pandas as pd
>>> import numpy as np
Let's consider the following matrix
\begin{equation}
data = \left( \begin{array}{cccc}
1 & 2 & 3 & 4 \\
5 & 6 & 7 & 8 \\
9 & 10 & 11 & 12
\end{array}\right)
\end{equation}
>>> data = np.arange(1,13)
>>> data = data.reshape(3,4)
>>> data
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])
Give names to the columns:
>>> columns = ['Home','Car','Sport','Food']
Give names to the rows:
>>> index = ['Alice','Bob','Emma']
Create a data frame using pandas:
>>> df = pd.DataFrame(data=data,index=index,columns=columns)
>>> df
       Home  Car  Sport  Food
Alice     1    2      3     4
Bob       5    6      7     8
Emma      9   10     11    12
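As a self-contained sketch of the steps above (the variable names are simply the ones used in this example), the script below builds the same data frame and checks its shape; df.loc looks up a single cell by row label and column label.

```python
import numpy as np
import pandas as pd

# 3 x 4 matrix holding the values 1..12, filled row by row
data = np.arange(1, 13).reshape(3, 4)

columns = ['Home', 'Car', 'Sport', 'Food']  # one label per column
index = ['Alice', 'Bob', 'Emma']            # one label per row

# The array shape must match (len(index), len(columns)),
# otherwise pandas raises a ValueError
df = pd.DataFrame(data=data, index=index, columns=columns)

print(df.shape)                # (3, 4)
print(df.loc['Bob', 'Sport'])  # 7
```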
An example without the index (pandas then numbers the rows 0, 1, 2, ...):
>>> df = pd.DataFrame(data=data,columns=columns)
>>> df
   Home  Car  Sport  Food
0     1    2      3     4
1     5    6      7     8
2     9   10     11    12
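When no index is passed, pandas labels the rows with a default integer index starting at 0 (a RangeIndex). A minimal check, reusing the same array and column names:

```python
import numpy as np
import pandas as pd

data = np.arange(1, 13).reshape(3, 4)
columns = ['Home', 'Car', 'Sport', 'Food']

df = pd.DataFrame(data=data, columns=columns)

# Rows are labelled 0, 1, 2 because no index was given
print(list(df.index))           # [0, 1, 2]
print(type(df.index).__name__)  # RangeIndex
```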
Create a data frame using a dictionary
Create a dataframe from a Python dictionary (method 1)
To create a dataframe from a Python dictionary:
d = {
'Name': ['Ben', 'John', 'Emma', 'Zoe'],
'Age': [40, 56, 34, 12]
}
a solution is to do:
import pandas as pd
df = pd.DataFrame(d)
print(df)
gives
   Name  Age
0   Ben   40
1  John   56
2  Emma   34
3   Zoe   12
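In pd.DataFrame(d) each dictionary key becomes a column label and each list becomes that column's values, so all the lists must have the same length. A short sketch checking this; the mismatched dictionary at the end is a made-up case that only illustrates the failure:

```python
import pandas as pd

d = {
    'Name': ['Ben', 'John', 'Emma', 'Zoe'],
    'Age': [40, 56, 34, 12]
}

df = pd.DataFrame(d)

# Keys -> column labels (in insertion order), lists -> column values
print(list(df.columns))  # ['Name', 'Age']
print(df['Age'].sum())   # 142

# Lists of different lengths cannot be aligned into rows
try:
    pd.DataFrame({'Name': ['Ben', 'John'], 'Age': [40]})
except ValueError as err:
    print('ValueError:', err)
```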
Create a dataframe from a Python dictionary (method 2)
Another solution is to use pandas.DataFrame.from_dict():
df = pd.DataFrame.from_dict(d)
print(df)
gives
   Name  Age
0   Ben   40
1  John   56
2  Emma   34
3   Zoe   12
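For a dictionary of lists, pd.DataFrame.from_dict(d) with its default orient='columns' builds the same frame as pd.DataFrame(d). A quick check with DataFrame.equals():

```python
import pandas as pd

d = {
    'Name': ['Ben', 'John', 'Emma', 'Zoe'],
    'Age': [40, 56, 34, 12]
}

df1 = pd.DataFrame(d)
df2 = pd.DataFrame.from_dict(d)  # orient='columns' is the default

print(df1.equals(df2))  # True
```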
The advantage is that from_dict() accepts extra parameters such as orient. With orient='index', the dictionary keys become row labels:
df = pd.DataFrame.from_dict(d, orient='index')
gives
        0     1     2    3
Name  Ben  John  Emma  Zoe
Age    40    56    34   12
Add column names with the columns parameter:
df = pd.DataFrame.from_dict(d, orient='index', columns=['User 1', 'User 2', 'User 3', 'User 4'])
gives
     User 1 User 2 User 3 User 4
Name    Ben   John   Emma    Zoe
Age      40     56     34     12
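With orient='index' the dictionary keys become row labels instead of column labels, so the result holds the same values as building the frame column-wise and transposing it with .T. A sketch comparing the two approaches (the cell lookups are only illustrative checks):

```python
import pandas as pd

d = {
    'Name': ['Ben', 'John', 'Emma', 'Zoe'],
    'Age': [40, 56, 34, 12]
}

users = ['User 1', 'User 2', 'User 3', 'User 4']

# Keys -> row labels; `columns` names the positions within each list
by_row = pd.DataFrame.from_dict(d, orient='index', columns=users)

# Same values obtained by building column-wise and transposing
by_col = pd.DataFrame(d).T
by_col.columns = users

print(by_row.loc['Name', 'User 2'])  # John
print(by_row.loc['Age', 'User 4'])   # 12
print(by_row.index.tolist())         # ['Name', 'Age']
```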
Add a new row into a dataframe
Another example: adding a new row to an existing dataframe:
import pandas as pd
d = {
'Name': ['Ben', 'John', 'Emma', 'Zoe'],
'Age': [40, 56, 34, 12]
}
df = pd.DataFrame.from_dict(d)
print(df)
gives
   Name  Age
0   Ben   40
1  John   56
2  Emma   34
3   Zoe   12
Create a new dataframe holding the row to add, from a Python dictionary:
new_d = {
'Name': ['Paula'],
'Age': [67]
}
df_new_row = pd.DataFrame.from_dict(new_d)
To add the new row, a solution is to use pandas.concat() (see How to merge (concatenate) two or more dataframe columns into one column with pandas? and How to add a new row at the end of a pandas DataFrame in pandas?), with ignore_index=True so that the rows are renumbered 0, 1, 2, ...:
df = pd.concat([df,df_new_row], ignore_index=True)
gives
    Name  Age
0    Ben   40
1   John   56
2   Emma   34
3    Zoe   12
4  Paula   67
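If new rows arrive as plain dictionaries of scalars, a small helper can build the one-row frame and concatenate it in one step. add_row below is a hypothetical helper written for this sketch, not a pandas function; it relies on pd.DataFrame([row]), which builds a one-row frame from a list containing a single dict:

```python
import pandas as pd

def add_row(df, row):
    """Hypothetical helper: append one row (a dict of scalars) to df."""
    # A list holding a single dict gives a one-row DataFrame,
    # so the scalar values do not need to be wrapped in lists
    new_row = pd.DataFrame([row])
    # ignore_index=True renumbers the rows 0..n-1
    return pd.concat([df, new_row], ignore_index=True)

d = {
    'Name': ['Ben', 'John', 'Emma', 'Zoe'],
    'Age': [40, 56, 34, 12]
}
df = pd.DataFrame.from_dict(d)

df = add_row(df, {'Name': 'Paula', 'Age': 67})
print(df.tail(1))
```

When many rows need to be added, collecting them in a list of dicts and calling pd.DataFrame once is usually cheaper than calling concat() inside a loop, since each concat() copies the data.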
Fix error "If using all scalar values, you must pass an index"
new_d = {
'Name': 'Paula',
'Age': 67
}
df_new_row = pd.DataFrame.from_dict(new_d)
raises
ValueError: If using all scalar values, you must pass an index
This is because the values of the dictionary new_d are scalars rather than lists, so pandas cannot tell how many rows to create.
To fix it, wrap each value in a list with []:
new_d = {
'Name': ['Paula'],
'Age': [67]
}
df_new_row = pd.DataFrame.from_dict(new_d)
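Since the error message asks for an index, another option is to keep the scalar values and pass an explicit index to the pd.DataFrame constructor; pandas then repeats each scalar for every index label. The label 0 below is an arbitrary choice for this sketch:

```python
import pandas as pd

new_d = {
    'Name': 'Paula',
    'Age': 67
}

# The scalars are broadcast to the single row labelled 0
df_new_row = pd.DataFrame(new_d, index=[0])
print(df_new_row)
```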
Create a data frame using a list of tuples
Example using a list of tuples:
>>> data = [(1,2,3,4),(5,6,7,8)]
>>> index = ['Alice','Bob']
>>> columns = ['Home','Car','Sport','Food']
>>> df = pd.DataFrame(data=data,index=index,columns=columns)
>>> df
       Home  Car  Sport  Food
Alice     1    2      3     4
Bob       5    6      7     8
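Each tuple becomes one row, so every tuple must contain as many items as there are column names. A self-contained sketch of the example above, which also reads one row back with df.loc (a single row comes back as a Series indexed by the column names):

```python
import pandas as pd

data = [(1, 2, 3, 4), (5, 6, 7, 8)]  # one tuple per row
index = ['Alice', 'Bob']
columns = ['Home', 'Car', 'Sport', 'Food']

df = pd.DataFrame(data=data, index=index, columns=columns)

# The row labelled 'Bob' as a Series, converted to a plain list
bob = df.loc['Bob']
print(bob.tolist())  # [5, 6, 7, 8]
```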
Create a data frame from an ASCII file
To read data from a file into a data frame, a solution is to use read_csv():
>>> import pandas as pd
>>> df = pd.read_csv('myfile.csv', sep=",", header=None)
>>> df.head()
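To try read_csv() without a file on disk, the sketch below feeds it an in-memory string through io.StringIO; the CSV content is made up to stand in for 'myfile.csv'. With header=None the first line is treated as data and the columns are numbered 0, 1, 2, ...; the names parameter supplies column labels instead:

```python
import io
import pandas as pd

# Made-up CSV content standing in for 'myfile.csv'
csv_text = "1,2,3,4\n5,6,7,8\n9,10,11,12\n"

# header=None: the first line is data, columns get integer labels
df = pd.read_csv(io.StringIO(csv_text), sep=",", header=None)
print(df.columns.tolist())  # [0, 1, 2, 3]

# names=...: same data, with explicit column labels
df_named = pd.read_csv(io.StringIO(csv_text), sep=",", header=None,
                       names=['Home', 'Car', 'Sport', 'Food'])
print(df_named.head())
```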