How to create a data frame in Python using pandas?

Published: October 15, 2019

Updated: November 29, 2022

Tags: Python; Pandas; Dataframe;

Examples of how to create a data frame using the Python module pandas, starting from an array, a dictionary, a list of tuples, or a file.

Create a data frame using an array

Import pandas and numpy:

>>> import pandas as pd
>>> import numpy as np

Let's consider the following matrix:

\begin{equation}
data = \left( \begin{array}{cccc}
1 & 2 & 3 & 4 \\
5 & 6 & 7 & 8 \\
9 & 10 & 11 & 12
\end{array}\right)
\end{equation}

>>> data = np.arange(1,13)
>>> data = data.reshape(3,4)
>>> data
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

Name the columns:

>>> columns = ['Home','Car','Sport','Food']

Name the rows (the index):

>>> index = ['Alice','Bob','Emma']

Create a data frame using pandas:

>>> df = pd.DataFrame(data=data,index=index,columns=columns)
>>> df
       Home  Car  Sport  Food
Alice     1    2      3     4
Bob       5    6      7     8
Emma      9   10     11    12

Without the index argument, pandas falls back to a default integer index (0, 1, 2, ...):

>>> df = pd.DataFrame(data=data,columns=columns)
>>> df
   Home  Car  Sport  Food
0     1    2      3     4
1     5    6      7     8
2     9   10     11    12
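The row and column labels must match the shape of the array. As a minimal sketch (the shortened list bad_index is introduced here for illustration only), passing two row names for a three-row array should raise a ValueError, while matching labels allow a cell to be looked up by name:

```python
import numpy as np
import pandas as pd

data = np.arange(1, 13).reshape(3, 4)
columns = ['Home', 'Car', 'Sport', 'Food']
index = ['Alice', 'Bob', 'Emma']

df = pd.DataFrame(data=data, index=index, columns=columns)

# Labels can then be used to look up a single cell.
bob_sport = df.loc['Bob', 'Sport']

# Hypothetical mistake: only two row names for a three-row array.
bad_index = ['Alice', 'Bob']
try:
    pd.DataFrame(data=data, index=bad_index, columns=columns)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True

print(bob_sport, mismatch_detected)
```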

Create a data frame using a dictionary

Create a dataframe from a python dictionary (method 1)

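Consider the following Python dictionary, where each key is a column name and each value is the list of entries for that column:

d = {
    'Name': ['Ben', 'John', 'Emma', 'Zoe'],
    'Age': [40, 56, 34, 12]
}

To create a dataframe from it, pass the dictionary directly to the DataFrame constructor: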

import pandas as pd

df = pd.DataFrame(d)

gives

   Name  Age
0   Ben   40
1  John   56
2  Emma   34
3   Zoe   12

Create a dataframe from a python dictionary (method 2)

Another solution is to use pandas.DataFrame.from_dict:

df = pd.DataFrame.from_dict(d)

print(df)

gives

   Name  Age
0   Ben   40
1  John   56
2  Emma   34
3   Zoe   12

The advantage of from_dict is its orient parameter. With orient='index', the dictionary keys become row labels instead of column names:

df = pd.DataFrame.from_dict(d, orient='index')

gives

        0     1     2    3
Name  Ben  John  Emma  Zoe
Age    40    56    34   12

With orient='index', column names can be supplied through the columns parameter:

df = pd.DataFrame.from_dict(d, orient='index', columns=['User 1', 'User 2', 'User 3', 'User 4'])

gives

     User 1 User 2 User 3 User 4
Name    Ben   John   Emma    Zoe
Age      40     56     34     12
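To make the effect of orient easier to see, the sketch below builds the dataframe both ways and compares the shapes. The variable names df_columns, df_index and df_transposed are chosen here for illustration; the transpose is shown only as a point of comparison, not as the recommended approach.

```python
import pandas as pd

d = {
    'Name': ['Ben', 'John', 'Emma', 'Zoe'],
    'Age': [40, 56, 34, 12]
}

# Default orient='columns': dictionary keys become column names.
df_columns = pd.DataFrame.from_dict(d)

# orient='index': dictionary keys become row labels.
df_index = pd.DataFrame.from_dict(d, orient='index')

# Transposing the column-oriented dataframe gives the same layout.
df_transposed = df_columns.T

print(df_columns.shape)
print(df_index.shape)
print(list(df_index.index))
```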

Add a new row into a dataframe

As another example, a new row can be added to an existing dataframe. Start from the following dataframe:

import pandas as pd

d = {
    'Name': ['Ben', 'John', 'Emma', 'Zoe'],
    'Age': [40, 56, 34, 12]
}

df = pd.DataFrame.from_dict(d)

   Name  Age
0   Ben   40
1  John   56
2  Emma   34
3   Zoe   12

Create a one-row dataframe from a Python dictionary that holds the new row:

new_d = {
    'Name': ['Paula'],
    'Age': [67]
}

df_new_row = pd.DataFrame.from_dict(new_d)

To add the new row, a solution is to use concat() (see also "How to merge (concatenate) two or more dataframe columns into one column with pandas?" and "How to add a new row at the end of a pandas DataFrame in pandas?"). The option ignore_index=True renumbers the rows of the result from 0:

df = pd.concat([df,df_new_row], ignore_index=True)

gives

    Name  Age
0    Ben   40
1   John   56
2   Emma   34
3    Zoe   12
4  Paula   67
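For a single row, an alternative sketch is to assign to df.loc with a new label. This assumes the dataframe still has its default index 0, 1, ..., n-1, so that len(df) is the next unused label; with any other index, len(df) could overwrite an existing row.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ben', 'John', 'Emma', 'Zoe'],
    'Age': [40, 56, 34, 12]
})

# With the default 0..n-1 index, len(df) is the next free row label.
# The list must follow the column order: Name, then Age.
df.loc[len(df)] = ['Paula', 67]

print(df)
```

concat() remains the better fit when several rows, or a whole dataframe, have to be appended at once.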

Fix error "If using all scalar values, you must pass an index"

new_d = {
    'Name': 'Paula',
    'Age': 67
}

df_new_row = pd.DataFrame.from_dict(new_d)

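raises

 ValueError: If using all scalar values, you must pass an index

This happens because the values of the dictionary new_d are scalars rather than lists, so pandas cannot infer how many rows the dataframe should have.

To fix this, wrap each value in a list: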

new_d = {
    'Name': ['Paula'],
    'Age': [67]
}

df_new_row = pd.DataFrame.from_dict(new_d)
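As the error message suggests, another way out is to keep the scalar values and pass an index explicitly. The sketch below first reproduces the error, then builds the one-row dataframe with the DataFrame constructor; the label 0 and the flag name raised are arbitrary choices made for this example.

```python
import pandas as pd

new_d = {'Name': 'Paula', 'Age': 67}

# Reproduce the error: scalar values alone give pandas no row count.
try:
    pd.DataFrame.from_dict(new_d)
    raised = False
except ValueError:
    raised = True

# Alternative fix: keep the scalars and pass an explicit index
# to the DataFrame constructor (one label, so one row).
df_new_row = pd.DataFrame(new_d, index=[0])

print(raised)
print(df_new_row)
```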

Create a data frame using a list of tuples

Example using a list of tuples, where each tuple becomes one row:

>>> data = [(1,2,3,4),(5,6,7,8)]
>>> index = ['Alice','Bob']
>>> columns = ['Home','Car','Sport','Food']
>>> df = pd.DataFrame(data=data,index=index,columns=columns)
>>> df
       Home  Car  Sport  Food
Alice     1    2      3     4
Bob       5    6      7     8
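When the data starts out as separate column-wise lists, zip() can turn them into the list of row tuples used above. This is a small sketch; the lists names and ages reuse the values of the dictionary example and are introduced here for illustration only.

```python
import pandas as pd

# Illustrative column-wise lists (same values as the dictionary example).
names = ['Ben', 'John', 'Emma', 'Zoe']
ages = [40, 56, 34, 12]

# zip() pairs the i-th elements, producing one tuple per row.
rows = list(zip(names, ages))

df = pd.DataFrame(data=rows, columns=['Name', 'Age'])

print(rows)
print(df)
```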

Create a data frame from an ASCII file

To read data from a text file into a data frame, a solution is to use read_csv(). Here sep="," sets the field separator and header=None tells pandas that the file has no header row:

>>> import pandas as pd
>>> df = pd.read_csv('myfile.csv', sep=",", header=None)
>>> df.head()
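Since the contents of 'myfile.csv' are not shown, the sketch below uses io.StringIO with made-up sample text as a stand-in for the file, so that it runs without any file on disk. It also shows the names parameter of read_csv(), which assigns column labels when the file has no header row; the labels reuse those of the array example.

```python
import io
import pandas as pd

# Stand-in for the contents of 'myfile.csv' (made-up sample values).
csv_text = "1,2,3,4\n5,6,7,8\n9,10,11,12\n"

# header=None: the first line is data, so columns are numbered 0, 1, 2, ...
df = pd.read_csv(io.StringIO(csv_text), sep=",", header=None)

# names= assigns column labels when the file has no header row.
df_named = pd.read_csv(
    io.StringIO(csv_text),
    sep=",",
    header=None,
    names=['Home', 'Car', 'Sport', 'Food'],
)

print(df.head())
print(df_named.head())
```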