Examples of how to get a column index from a column name with pandas:
Table of contents
- Create a dataframe with pandas
- Get column index from column name (Example 1: column names are unique)
- Example 2: column names are not unique but are adjacent to each other
- Example 3: column names are not unique and are not adjacent to each other
- Create a function to get indexes from column name
- References
Create a dataframe with pandas
Let's first create a simple dataframe with pandas:
import pandas as pd
import numpy as np

data = np.random.randint(10, size=(5,3))
columns = ['Score A','Score B','Score C']
df = pd.DataFrame(data=data,columns=columns)
print(df)
returns
   Score A  Score B  Score C
0        3        5        6
1        8        1        5
2        3        5        3
3        3        4        6
4        4        6        7
Get column index from column name (Example 1: column names are unique)
To get the column index associated with a column name, one solution is to use get_loc(). For example:
df.columns.get_loc("Score A")
gives
0
For the column called "Score B":
df.columns.get_loc("Score B")
gives
1
For the column called "Score C":
df.columns.get_loc("Score C")
gives
2
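As a small illustration of what the returned position can be used for, the sketch below selects a column by position with iloc and shows that get_loc() raises a KeyError when the name does not exist. The single-row data and the name "Score Z" are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# Illustrative dataframe (values chosen arbitrarily)
df = pd.DataFrame([[3, 5, 6]], columns=['Score A', 'Score B', 'Score C'])

# Position of the column, then positional selection with iloc
idx = df.columns.get_loc('Score B')
values = df.iloc[:, idx].tolist()
print(idx, values)

# A name that is not in df.columns raises KeyError
try:
    df.columns.get_loc('Score Z')  # hypothetical missing column
    missing = False
except KeyError:
    missing = True
print(missing)
```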
It is possible to create a dictionary with column name as key and associated index as value:
idx_dic = {}
for col in df.columns:
    idx_dic[col] = df.columns.get_loc(col)
print(idx_dic)
gives
{'Score A': 0, 'Score B': 1, 'Score C': 2}
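The same mapping can also be built in a single pass with enumerate. This is a minimal sketch that assumes the column names are unique (with duplicated names, a later position would overwrite an earlier one); the data values are illustrative.

```python
import pandas as pd

# Illustrative dataframe with unique column names (values are arbitrary)
df = pd.DataFrame([[3, 5, 6], [8, 1, 5]],
                  columns=['Score A', 'Score B', 'Score C'])

# Map each column name to its position without calling get_loc() per column
idx_dic = {col: i for i, col in enumerate(df.columns)}
print(idx_dic)
```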
Example 2: column names are not unique but are adjacent to each other
In this example, several adjacent columns share the same name, "Score C":
columns = ['Score A','Score B','Score C']
df = pd.DataFrame(data=data,columns=columns)

data2 = np.random.randint(10, size=(5,1))
columns = ['Score C']
df_add = pd.DataFrame(data=data2,columns=columns)

df = pd.concat([df,df_add], axis=1)
df = pd.concat([df,df_add], axis=1)
print(df)
returns
   Score A  Score B  Score C  Score C  Score C
0        3        5        6        3        3
1        8        1        5        1        1
2        3        5        3        3        3
3        3        4        6        2        2
4        4        6        7        7        7
then
df.columns.get_loc("Score C")
returns a slice:
slice(2, 5, None)
Note: to get the type, use:
type(df.columns.get_loc("Score C"))
and to check the type with Python:
isinstance(df.columns.get_loc("Score C"), slice)
returns here
True
Then, to get the list of indexes associated with the column "Score C", one solution is:
col_idx_slice = df.columns.get_loc("Score C")
col_idx_list = [i+col_idx_slice.start for i in range(col_idx_slice.stop-col_idx_slice.start) ]
col_idx_list
gives
[2, 3, 4]
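A shorter way to expand the slice is to pass its bounds to range(). The sketch below assumes the same column layout as this example, with illustrative values. Note that the pandas documentation describes get_loc() as returning an int for a unique index, a slice for a monotonic index, and a boolean mask otherwise, so the slice result here also relies on the column names being in sorted order.

```python
import pandas as pd

# Illustrative dataframe: duplicated "Score C" columns are adjacent and sorted
columns = ['Score A', 'Score B', 'Score C', 'Score C', 'Score C']
df = pd.DataFrame([[3, 5, 6, 3, 3]], columns=columns)

loc = df.columns.get_loc('Score C')  # a slice for this layout

# slice.indices() resolves start, stop and step against the number of columns
col_idx_list = list(range(*loc.indices(len(df.columns))))
print(col_idx_list)
```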
Example 3: column names are not unique and are not adjacent to each other
In this example, several columns share the same name, "Score C", but they are not adjacent:
columns = ['Score A','Score B','Score C']
df = pd.DataFrame(data=data,columns=columns)

data2 = np.random.randint(10, size=(5,2))
columns = ['Score C','Score D']
df_add = pd.DataFrame(data=data2,columns=columns)

df = pd.concat([df,df_add], axis=1)
df = pd.concat([df,df_add], axis=1)
print(df)
returns
   Score A  Score B  Score C  Score C  Score D  Score C  Score D
0        3        5        6        3        9        3        9
1        8        1        5        0        4        0        4
2        3        5        3        3        0        3        0
3        3        4        6        6        9        6        9
4        4        6        7        7        4        7        4
then
df.columns.get_loc("Score C")
returns an array:
array([False, False, True, True, False, True, False])
Note: to check the type:
type(df.columns.get_loc("Score C"))
gives
numpy.ndarray
or
isinstance(df.columns.get_loc("Score C"), np.ndarray)
returns here:
True
To convert this boolean array into column indexes, use np.where():
np.where( df.columns.get_loc("Score C") )[0]
which returns
array([2, 3, 5])
and then to a list:
np.where( df.columns.get_loc("Score C") )[0].tolist()
returns
[2, 3, 5]
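As an alternative sketch, np.flatnonzero() gives the same positions as np.where(...)[0], and comparing df.columns with the name directly avoids depending on what get_loc() returns. The dataframe below reuses this example's column layout with illustrative values.

```python
import numpy as np
import pandas as pd

# Illustrative dataframe: duplicated "Score C" columns are not adjacent
columns = ['Score A', 'Score B', 'Score C', 'Score C',
           'Score D', 'Score C', 'Score D']
df = pd.DataFrame([[3, 5, 6, 3, 9, 3, 9]], columns=columns)

# Boolean mask from get_loc(), converted to positions
mask = df.columns.get_loc('Score C')
positions = np.flatnonzero(mask).tolist()

# Same result from an element-wise comparison on the column labels
direct = np.flatnonzero(df.columns == 'Score C').tolist()

print(positions, direct)
```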
Create a function to get indexes from column name
The three cases above can be combined into a function that returns the list of column indexes for a given column name:
Example dataframe:
   Score A  Score B  Score C  Score D  Score D  Score C
0        3        5        6        6        7        5
1        8        1        5        7        3        9
2        3        5        3        0        1        7
3        3        4        6        8        2        2
4        4        6        7        5        0        1
Solution
def get_column_index_list(column):
    # get_loc() returns an int, a slice or a boolean array (uses the global df)
    loc = df.columns.get_loc(column)
    if isinstance(loc, slice):
        # duplicated names returned as a slice
        col_index_list = [i+loc.start for i in range(loc.stop-loc.start)]
    elif isinstance(loc, np.ndarray):
        # duplicated names returned as a boolean array
        col_index_list = np.where(loc)[0].tolist()
    else:
        # unique name: a single integer position
        col_index_list = [loc]
    return col_index_list

print( get_column_index_list('Score A') )
print( get_column_index_list('Score C') )
print( get_column_index_list('Score D') )
returns
[0]
[2, 5]
[3, 4]
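For reuse across several dataframes, the function can take the dataframe as an argument instead of reading a global df. The following is a minimal sketch under that assumption; the name get_column_positions and the single-row data are hypothetical and only serve the illustration.

```python
import numpy as np
import pandas as pd

def get_column_positions(df, column):
    """Return the list of integer positions of `column` in df.columns."""
    loc = df.columns.get_loc(column)
    if isinstance(loc, slice):
        # Duplicated names returned as a slice
        return list(range(*loc.indices(len(df.columns))))
    if isinstance(loc, np.ndarray):
        # Duplicated names returned as a boolean mask
        return np.flatnonzero(loc).tolist()
    # Single match: a plain integer position
    return [int(loc)]

# Illustrative dataframe with the same column layout as the example above
columns = ['Score A', 'Score B', 'Score C', 'Score D', 'Score D', 'Score C']
df = pd.DataFrame([[3, 5, 6, 6, 7, 5]], columns=columns)

print(get_column_positions(df, 'Score A'))
print(get_column_positions(df, 'Score C'))
print(get_column_positions(df, 'Score D'))
```

A column name that is not present still raises KeyError, since the function does not catch the exception raised by get_loc().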
