How to create a dataframe with multiple indexes with pandas ?

Published: September 17, 2021

Tags: Python; Pandas; DataFrame;

DMCA.com Protection Status

Examples of how to create a dataframe with multiple indexes with pandas:

Create a dataframe with multiple indexes from a tuple

To create a dataframe with multiple indexes from a tuple, a solution is to use MultiIndex / advanced indexing:

import pandas as pd

tuples = [('Land', 'Liquid'),
          ('Land', 'Ice'),
          ('Ocean', 'Liquid'),
          ('Ocean', 'Ice')]

index = pd.MultiIndex.from_tuples(tuples, names=["Id1", "Id2"])

df = pd.DataFrame({'Count A': [12., 70., 30., 20.], 'Count B': [12., 70., 30., 20.]}, index=index)

print(df)

returns here

              Count A  Count B
Id1   Id2                     
Land  Liquid     12.0     12.0
      Ice        70.0     70.0
Ocean Liquid     30.0     30.0
      Ice        20.0     20.0

Create a dataframe with multiple indexes using MultiIndex.from_product()

Another more efficient way to create a dataframe with multiple indexes is to use MultiIndex.from_product)

import pandas as pd

iterables = [["Land", "Ocean"], ["Liquid", "Ice"]]

index = pd.MultiIndex.from_product(iterables, names=["Id1", "Id2"])

df = pd.DataFrame({'Count A': [12., 70., 30., 20.], 'Count B': [12., 70., 30., 20.]}, index=index)

returns the same dataframe as before

              Count A  Count B
Id1   Id2                     
Land  Liquid     12.0     12.0
      Ice        70.0     70.0
Ocean Liquid     30.0     30.0
      Ice        20.0     20.0

Get dataframe MultiIndex names

To retrieve dataframe index names a solution is to do:

df.index

returns

MultiIndex([( 'Land', 'Liquid'),
            ( 'Land',    'Ice'),
            ('Ocean', 'Liquid'),
            ('Ocean',    'Ice')],
           names=['Id1', 'Id2'])

Get dataframe row value(s) for given indexes

To get the dataframe row associated with index lables "Ocean" and "Liquid":

df.loc['Ocean','Liquid']

returns

Count A    30.0
Count B    30.0
Name: (Ocean, Liquid), dtype: float64

To get only for column named "Count A":

df['Count A'].loc['Ocean','Liquid']

returns then

 30.0

To get all rows with the label "Ocean"

df.loc['Ocean',:]

returns

        Count A  Count B
Id2                     
Liquid     99.0     30.0
Ice        20.0     20.0

Modify a cell value in the dataframe

To change the value of a cell in the dataframe, a solution is to do for example

df['Count A'].loc['Ocean','Liquid'] = 99.0

print(df)

returns

              Count A  Count B
Id1   Id2                     
Land  Liquid     12.0     12.0
      Ice        70.0     70.0
Ocean Liquid     99.0     30.0
      Ice        20.0     20.0

References