How to get column index from column name with pandas ?

Published: February 23, 2021

Tags: Python; Pandas; DataFrame;

DMCA.com Protection Status

Examples of how to get column index from column name with pandas:

Create a dataframe with pandas

Let's first create a simple dataframe with pandas:

import pandas as pd
import numpy as np

data = np.random.randint(10, size=(5,3))

columns = ['Score A','Score B','Score C']

df = pd.DataFrame(data=data,columns=columns)

print(df)

returns

   Score A  Score B  Score C
0        3        5        6
1        8        1        5
2        3        5        3
3        3        4        6
4        4        6        7

Get column index from column name (Example 1 column name are unique)

To get the column index associated with a column name, a solution is to use get_loc(), example:

df.columns.get_loc("Score A")

gives

0

Column called "Score B"

df.columns.get_loc("Score B")

gives

1

Column called "Score C"

df.columns.get_loc("Score C")

gives

2

It is possible to create a dictionary with column name as key and associated index as value:

idx_dic = {}

for col in df.columns:
    idx_dic[col] = df.columns.get_loc(col)

print(idx_dic)

gives

{'Score A': 0, 'Score B': 1, 'Score C': 2}

Example 2 column name are not unique but are adjacent to each other

Another example but here there are several adjacent columns with the same name called "Score C":

columns = ['Score A','Score B','Score C']
df = pd.DataFrame(data=data,columns=columns)

data2 = np.random.randint(10, size=(5,1))

columns = ['Score C']

df_add = pd.DataFrame(data=data2,columns=columns)

df = pd.concat([df,df_add], axis=1)
df = pd.concat([df,df_add], axis=1)

print(df)

returns

   Score A  Score B  Score C  Score C  Score C
0        3        5        6        3        3
1        8        1        5        1        1
2        3        5        3        3        3
3        3        4        6        2        2
4        4        6        7        7        7

then

df.columns.get_loc("Score C")

return a slice:

slice(2, 5, None)

Note To get the type

type(df.columns.get_loc("Score C"))

and to check the type with python:

isinstance(df.columns.get_loc("Score C"), slice)

returns here

True

Then to get a list of index associated with column "Score C", a solution is to do:

col_idx_slice = df.columns.get_loc("Score C")

col_idx_list = [i+col_idx_slice.start for i in range(col_idx_slice.stop-col_idx_slice.start) ]

col_idx_list

[2, 3, 4]

Example 3 column name are not unique and are not adjacent to each other

Another example, here there are several columns with the same name called "Score C" but not adjacent:

columns = ['Score A','Score B','Score C']
df = pd.DataFrame(data=data,columns=columns)

data2 = np.random.randint(10, size=(5,2))

columns = ['Score C','Score D']

df_add = pd.DataFrame(data=data2,columns=columns)

df = pd.concat([df,df_add], axis=1)
df = pd.concat([df,df_add], axis=1)

print(df)

returns

   Score A  Score B  Score C  Score C  Score D  Score C  Score D
0        3        5        6        3        9        3        9
1        8        1        5        0        4        0        4
2        3        5        3        3        0        3        0
3        3        4        6        6        9        6        9
4        4        6        7        7        4        7        4

then

df.columns.get_loc("Score C")

returns an array:

array([False, False,  True,  True, False,  True, False])

Note: check the type

type(df.columns.get_loc("Score C"))

numpy.ndarray

or

isinstance(df.columns.get_loc("Score C"), np.ndarray)

returns here:

True

To convert this boolean array to index just do:

np.where( df.columns.get_loc("Score C") )[0]

which returns

array([2, 3, 5])

and then to a list:

np.where( df.columns.get_loc("Score C") )[0].tolist()

returns

[2, 3, 5]

Create a function to get indexes from column name

Combine all 3 examples above to create a function that returns column index from column name:

Example of dataframe

   Score A  Score B  Score C  Score D  Score D  Score C
0        3        5        6        6        7        5
1        8        1        5        7        3        9
2        3        5        3        0        1        7
3        3        4        6        8        2        2
4        4        6        7        5        0        1

Solution

def get_column_index_list(column):

    if type( df.columns.get_loc(column) ) == int:
        col_index_list = []
        col_index_list.append( df.columns.get_loc(column) )
    if isinstance(df.columns.get_loc(column), slice):
        col_index_list = [i+col_idx_slice.start for i in range(col_idx_slice.stop-col_idx_slice.start) ]
    if isinstance(df.columns.get_loc(column), np.ndarray): 
        col_index_list = np.where( df.columns.get_loc(column) )[0].tolist()

    return col_index_list

print( get_column_index_list('Score A') )
print( get_column_index_list('Score C') )
print( get_column_index_list('Score D') )

returns

[0]
[2, 5]
[3, 4]

References