Examples of how to get column index from column name with pandas:
Table of contents
- Create a dataframe with pandas
- Get column index from column name (Example 1 column name are unique)
- Example 2 column name are not unique but are adjacent to each other
- Example 3 column name are not unique and are not adjacent to each other
- Create a function to get indexes from column name
- References
Create a dataframe with pandas
Let's first create a simple dataframe with pandas:
import pandas as pd
import numpy as np
data = np.random.randint(10, size=(5,3))
columns = ['Score A','Score B','Score C']
df = pd.DataFrame(data=data,columns=columns)
print(df)
returns
Score A Score B Score C
0 3 5 6
1 8 1 5
2 3 5 3
3 3 4 6
4 4 6 7
Get column index from column name (Example 1 column name are unique)
To get the column index associated with a column name, a solution is to use get_loc(), example:
df.columns.get_loc("Score A")
gives
0
Column called "Score B"
df.columns.get_loc("Score B")
gives
1
Column called "Score C"
df.columns.get_loc("Score C")
gives
2
It is possible to create a dictionary with column name as key and associated index as value:
idx_dic = {}
for col in df.columns:
idx_dic[col] = df.columns.get_loc(col)
print(idx_dic)
gives
{'Score A': 0, 'Score B': 1, 'Score C': 2}
Example 2 column name are not unique but are adjacent to each other
Another example but here there are several adjacent columns with the same name called "Score C":
columns = ['Score A','Score B','Score C']
df = pd.DataFrame(data=data,columns=columns)
data2 = np.random.randint(10, size=(5,1))
columns = ['Score C']
df_add = pd.DataFrame(data=data2,columns=columns)
df = pd.concat([df,df_add], axis=1)
df = pd.concat([df,df_add], axis=1)
print(df)
returns
Score A Score B Score C Score C Score C
0 3 5 6 3 3
1 8 1 5 1 1
2 3 5 3 3 3
3 3 4 6 2 2
4 4 6 7 7 7
then
df.columns.get_loc("Score C")
return a slice:
slice(2, 5, None)
Note To get the type
type(df.columns.get_loc("Score C"))
and to check the type with python:
isinstance(df.columns.get_loc("Score C"), slice)
returns here
True
Then to get a list of index associated with column "Score C", a solution is to do:
col_idx_slice = df.columns.get_loc("Score C")
col_idx_list = [i+col_idx_slice.start for i in range(col_idx_slice.stop-col_idx_slice.start) ]
col_idx_list
[2, 3, 4]
Example 3 column name are not unique and are not adjacent to each other
Another example, here there are several columns with the same name called "Score C" but not adjacent:
columns = ['Score A','Score B','Score C']
df = pd.DataFrame(data=data,columns=columns)
data2 = np.random.randint(10, size=(5,2))
columns = ['Score C','Score D']
df_add = pd.DataFrame(data=data2,columns=columns)
df = pd.concat([df,df_add], axis=1)
df = pd.concat([df,df_add], axis=1)
print(df)
returns
Score A Score B Score C Score C Score D Score C Score D
0 3 5 6 3 9 3 9
1 8 1 5 0 4 0 4
2 3 5 3 3 0 3 0
3 3 4 6 6 9 6 9
4 4 6 7 7 4 7 4
then
df.columns.get_loc("Score C")
returns an array:
array([False, False, True, True, False, True, False])
Note: check the type
type(df.columns.get_loc("Score C"))
numpy.ndarray
or
isinstance(df.columns.get_loc("Score C"), np.ndarray)
returns here:
True
To convert this boolean array to index just do:
np.where( df.columns.get_loc("Score C") )[0]
which returns
array([2, 3, 5])
and then to a list:
np.where( df.columns.get_loc("Score C") )[0].tolist()
returns
[2, 3, 5]
Create a function to get indexes from column name
Combine all 3 examples above to create a function that returns column index from column name:
Example of dataframe
Score A Score B Score C Score D Score D Score C
0 3 5 6 6 7 5
1 8 1 5 7 3 9
2 3 5 3 0 1 7
3 3 4 6 8 2 2
4 4 6 7 5 0 1
Solution
def get_column_index_list(column):
if type( df.columns.get_loc(column) ) == int:
col_index_list = []
col_index_list.append( df.columns.get_loc(column) )
if isinstance(df.columns.get_loc(column), slice):
col_index_list = [i+col_idx_slice.start for i in range(col_idx_slice.stop-col_idx_slice.start) ]
if isinstance(df.columns.get_loc(column), np.ndarray):
col_index_list = np.where( df.columns.get_loc(column) )[0].tolist()
return col_index_list
print( get_column_index_list('Score A') )
print( get_column_index_list('Score C') )
print( get_column_index_list('Score D') )
returns
[0]
[2, 5]
[3, 4]