One commonly needed task when working with pandas DataFrames is resetting the index. This can be done easily with the reset_index() method.
Create a dataframe
To start, let's generate a DataFrame using synthetic data:
import pandas as pd
import numpy as np
data = np.random.randint(100, size=(20, 2))  # 20 rows x 2 columns of random integers in [0, 100)
df = pd.DataFrame(data=data, columns=['A', 'B'])
Example of output:
A B
0 56 43
1 52 38
2 78 33
3 57 79
4 80 13
5 14 20
6 79 27
7 11 49
8 68 44
9 7 67
10 61 39
11 46 4
12 94 78
13 38 2
14 29 29
15 34 14
16 18 66
17 11 63
18 30 85
19 3 21
Note that the DataFrame index (the row labels) appears in the first column of the output above.
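Because np.random.randint draws new values on every run, your numbers will differ from those shown. A minimal sketch, assuming reproducible output is wanted, builds the same kind of frame from a seeded generator (np.random.default_rng and the seed 42 are our additions, not part of the original) and prints the default index pandas assigns when none is given:

```python
import numpy as np
import pandas as pd

# Seeded generator (an assumption, for reproducibility): same numbers on every run
rng = np.random.default_rng(42)
data = rng.integers(100, size=(20, 2))  # 20 rows x 2 columns, values in [0, 100)
df = pd.DataFrame(data=data, columns=['A', 'B'])

# No index was passed, so pandas uses a RangeIndex 0, 1, ..., 19
print(df.index)
```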
Now let's draw a random sample of 5 rows, for example:
df = df.sample(5)
which gives something like:
A B
15 34 14
17 11 63
8 68 44
11 46 4
0 56 43
As expected, the sampled rows keep their original index labels, so the index is no longer sequential.
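To check that sample() keeps the original row labels rather than renumbering them, a small sketch (deterministic data and random_state=0 are our additions so the draw is repeatable) compares the sampled rows with the same labels looked up in the full frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(40).reshape(20, 2), columns=['A', 'B'])

# random_state makes the draw repeatable; it is optional
sample = df.sample(5, random_state=0)

# Each sampled row still carries its original label, so looking those
# labels up in the full frame returns exactly the same rows
print(sample.index.tolist())
print(sample.equals(df.loc[sample.index]))
```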
Reset dataframe index (case 1)
To reset the DataFrame index, one solution is to use pandas.DataFrame.reset_index:
df = df.reset_index()
Output
index A B
0 15 34 14
1 17 11 63
2 8 68 44
3 11 46 4
4 0 56 43
Note that reset_index() also creates a new column called index that stores the previous index.
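A small sketch with hand-picked labels (not the random ones above) makes this explicit. It also shows that when the index has a name (here the illustrative name old), reset_index() uses that name for the new column instead of index:

```python
import pandas as pd

# A frame whose index is out of order, like the sample above
df = pd.DataFrame({'A': [34, 11, 68], 'B': [14, 63, 44]}, index=[15, 17, 8])

out = df.reset_index()  # old labels move into a column called 'index'
print(out)

# If the index is named, the name is reused for the new column
named = df.rename_axis('old').reset_index()
print(named)
```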
Reset dataframe index (case 2)
If you do not want this extra column, start again from a fresh sample:
data = np.random.randint(100, size=(20,2))
df = pd.DataFrame(data=data,columns=['A','B'])
df = df.sample(5)
Output
A B
17 88 80
14 10 25
19 23 26
8 32 45
4 40 51
To reset the index to the default integer range and discard the old index instead of storing it as a column, pass drop=True to reset_index():
df = df.reset_index(drop=True)
Output:
A B
0 88 80
1 10 25
2 23 26
3 32 45
4 40 51
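The sketch below shows two ways to get a clean 0 to n-1 index without the extra column. The second relies on the ignore_index parameter of DataFrame.sample, which to our knowledge is only available in pandas 1.3 and later; the fixed data and random_state=1 are illustrative assumptions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(40).reshape(20, 2), columns=['A', 'B'])

# Option 1: sample, then discard the old labels with drop=True
s1 = df.sample(5, random_state=1).reset_index(drop=True)

# Option 2 (pandas >= 1.3): ask sample() to renumber the rows directly
s2 = df.sample(5, random_state=1, ignore_index=True)

print(s1)
print(s1.equals(s2))
```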
Reset dataframe index (case 3)
To start the index at 1 instead of 0, one solution is to add 1 to it:
df.index = df.index + 1
print(df)
Output
A B
1 88 80
2 10 25
3 23 26
4 32 45
5 40 51
Or shift it by any other offset; here the index, which already starts at 1, is shifted by 200:
df.index = df.index + 200
print(df)
Output
A B
201 88 80
202 10 25
203 23 26
204 32 45
205 40 51
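Adding an offset only produces consecutive labels when the index is already 0, 1, 2, ...; on an unordered index such as the raw sample it would just shift each label. A sketch of an alternative (not used in the original) assigns the numbering explicitly, so it works whatever the current index contains; the start value 1 is just an example:

```python
import pandas as pd

# Unordered labels, as left behind by sample()
df = pd.DataFrame({'A': [88, 10, 23, 32, 40], 'B': [80, 25, 26, 45, 51]},
                  index=[17, 14, 19, 8, 4])

# Replace the labels with 1, 2, ..., len(df); no reset_index() needed first
df.index = pd.RangeIndex(start=1, stop=len(df) + 1)
print(df)
```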
Apply a function to dataframe index
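To apply a function to the DataFrame index, one solution is to move the index into a column, transform it, and set it back:
df = df.reset_index()                       # move the index into a column named 'index'
df['index'] = df['index'].apply(np.sqrt)    # apply the function to that column
df.index = df['index']                      # use the transformed column as the new index
df.drop(['index'], axis=1, inplace=True)    # remove the now-redundant column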
Output
A B
index
14.177447 88 80
14.212670 10 25
14.247807 23 26
14.282857 32 45
14.317821 40 51
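The detour through a column is not strictly necessary. Index objects have a map() method, so a shorter sketch (an alternative, not the original approach) applies np.sqrt to the labels directly. The index keeps whatever name it had (here none) instead of being named index as in the version above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [88, 10, 23, 32, 40], 'B': [80, 25, 26, 45, 51]},
                  index=[201, 202, 203, 204, 205])

# Index.map calls the function on every label and returns a new Index
df.index = df.index.map(np.sqrt)
print(df)
```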
Use an existing dataframe column as index
Another solution is to use an existing DataFrame column as the index with pandas.DataFrame.set_index:
data = np.random.randint(100, size=(20,2))
df = pd.DataFrame(data=data,columns=['A','B'])
df = df.set_index('A')
which returns, for example:
B
A
30 35
73 48
27 64
39 99
26 43
98 99
5 86
36 75
41 86
32 59
15 84
29 12
99 58
58 7
87 86
12 90
18 8
46 82
2 28
60 17
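Here set_index() and reset_index() undo each other: by default set_index() removes column A from the columns (drop=True) and reset_index() moves it back. A minimal sketch with small fixed data (not the random data above) also shows drop=False, which keeps A as a column as well:

```python
import pandas as pd

df = pd.DataFrame({'A': [30, 73, 27], 'B': [35, 48, 64]})

indexed = df.set_index('A')              # column A becomes the index, named 'A'
restored = indexed.reset_index()         # the 'A' index goes back to being a column
kept = df.set_index('A', drop=False)     # A is used as index but also kept as a column

print(indexed)
print(restored.equals(df))
```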
References
Links | Site |
---|---|
pandas.DataFrame.reset_index | pandas.pydata.org |
pandas.DataFrame.set_index | pandas.pydata.org |
pandas.DataFrame.drop | pandas.pydata.org |