Pandas DataFrames are widely used for data analysis and manipulation. Sorting the values within a column of a pandas dataframe is one of the most common tasks when working with data. It can be done with the sort_values() method, as the examples below show.
Sorting the values within a column
The syntax for this method is as follows:
df['column name'].sort_values(ascending=True)
where 'column name' is the column of the dataframe to be sorted. The ascending parameter specifies whether the values are sorted in ascending (smallest to largest) or descending (largest to smallest) order; pass ascending=False for descending order. It is set to True by default. To sort the rows of the whole dataframe by that column, call df.sort_values('column name') instead.
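As a minimal sketch of the syntax above, the following sorts a single column in both directions. The one-column dataframe is made up for illustration only.

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({'Age': [25, 30, 29, 26, 22]})

# Sorting a single column returns a new Series; df itself is unchanged.
ages_ascending = df['Age'].sort_values()                  # ascending=True is the default
ages_descending = df['Age'].sort_values(ascending=False)  # largest to smallest

print(ages_ascending.tolist())   # [22, 25, 26, 29, 30]
print(ages_descending.tolist())  # [30, 29, 26, 25, 22]
```

Note that each value keeps its original index label after sorting, which is why the outputs later in this article show the index out of order.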
Sort the values within a column in ascending order
For example, suppose you have a dataframe df that contains the following values:
import pandas as pd
data = {'User_id':[0,1,2,3,4],
'Age':[25,30,29,26,22],
'Name':['Ben','Anna','Zoe','Tom','John']}
df = pd.DataFrame(data)
Output
User_id Age Name
0 0 25 Ben
1 1 30 Anna
2 2 29 Zoe
3 3 26 Tom
4 4 22 John
To sort the values in the 'Age' column in ascending order, run the following command:
df.sort_values('Age', ascending = True)
This gives the following output:
User_id Age Name
4 4 22 John
0 0 25 Ben
3 3 26 Tom
2 2 29 Zoe
1 1 30 Anna
Note that
df.sort_values('Age')
returns the same output.
Sort the values within a column in descending order
Similarly, to sort in descending order, set the parameter ascending = False:
df.sort_values('Age', ascending = False)
This gives the following output:
User_id Age Name
1 1 30 Anna
2 2 29 Zoe
3 3 26 Tom
0 0 25 Ben
4 4 22 John
Additional features
Save sorted output
By default, sort_values() returns a sorted copy and leaves the original dataframe unchanged. To sort the dataframe itself so that the new order is kept, add the option inplace=True:
df.sort_values(by=['Age'], inplace=True)
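One detail worth checking when using inplace=True is that the call returns None, so its result should not be assigned back to df. The sketch below rebuilds the example dataframe from this article and shows both the in-place form and the alternative of assigning the sorted copy to a name (df_desc is a name chosen here for illustration).

```python
import pandas as pd

df = pd.DataFrame({'User_id': [0, 1, 2, 3, 4],
                   'Age': [25, 30, 29, 26, 22],
                   'Name': ['Ben', 'Anna', 'Zoe', 'Tom', 'John']})

# With inplace=True the method reorders df itself and returns None.
result = df.sort_values(by=['Age'], inplace=True)
print(result)               # None
print(df['Age'].tolist())   # [22, 25, 26, 29, 30]

# Alternative: keep the sorted copy under a name instead of sorting in place.
df_desc = df.sort_values(by=['Age'], ascending=False)
print(df_desc['Age'].tolist())  # [30, 29, 26, 25, 22]
```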
Show sorted output for a given column
To show only the sorted values of the 'Age' column:
df['Age'].sort_values()
Output
4 22
0 25
3 26
2 29
1 30
Name: Age, dtype: int64
Sort the values within a column of strings
Columns of strings are sorted alphabetically. For example, to sort the dataframe by the 'Name' column:
print(df.sort_values(by=['Name']))
This prints:
User_id Age Name
1 1 30 Anna
0 0 25 Ben
4 4 22 John
3 3 26 Tom
2 2 29 Zoe
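The names in this example all start with a capital letter, so alphabetical order is unambiguous. With mixed capitalisation, strings are compared character by character and uppercase letters sort before lowercase ones. The sketch below uses made-up names and the key parameter of sort_values() (assumed available, which requires pandas 1.1 or later) to sort without regard to case.

```python
import pandas as pd

# Illustrative data with mixed capitalisation.
people = pd.DataFrame({'Name': ['ben', 'Anna', 'Zoe', 'tom', 'John']})

# Default ordering: uppercase letters come before lowercase ones.
default_order = people.sort_values(by='Name')
print(default_order['Name'].tolist())
# ['Anna', 'John', 'Zoe', 'ben', 'tom']

# key receives the column as a Series and must return a Series of the same length.
case_insensitive = people.sort_values(by='Name', key=lambda s: s.str.lower())
print(case_insensitive['Name'].tolist())
# ['Anna', 'ben', 'John', 'tom', 'Zoe']
```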
Sort by multiple columns in the dataframe
You can also sort by multiple columns in the dataframe. To do so, pass a list of column names to the sort_values() method. The syntax for this is:
df.sort_values(['column1', 'column2'], ascending = True)
where column1 and column2 are the columns of the dataframe to sort by. This sorts the rows by column1 in ascending order; rows with the same column1 value are then ordered by column2 in ascending order.
For example, if several rows of the dataframe share the same 'Age', you can order the tied rows by 'Name' by sorting on both columns together:
import pandas as pd
data = {'User_id':[0,1,2,3,4],
'Age':[20,20,20,26,22],
'Name':['Ben','Anna','Zoe','Tom','John']}
df = pd.DataFrame(data)
Output
User_id Age Name
0 0 20 Ben
1 1 20 Anna
2 2 20 Zoe
3 3 26 Tom
4 4 22 John
Then run:
df.sort_values(by=['Age','Name'])
This gives the following output:
User_id Age Name
1 1 20 Anna
0 0 20 Ben
2 2 20 Zoe
4 4 22 John
3 3 26 Tom
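The ascending parameter also accepts a list with one boolean per column, so each column can have its own direction. As a short sketch using the same data, the following sorts by 'Age' from largest to smallest while ordering tied ages by 'Name' alphabetically (the variable name mixed is chosen here for illustration).

```python
import pandas as pd

df = pd.DataFrame({'User_id': [0, 1, 2, 3, 4],
                   'Age': [20, 20, 20, 26, 22],
                   'Name': ['Ben', 'Anna', 'Zoe', 'Tom', 'John']})

# One boolean per column: 'Age' descending, 'Name' ascending within equal ages.
mixed = df.sort_values(by=['Age', 'Name'], ascending=[False, True])

print(mixed['Age'].tolist())   # [26, 22, 20, 20, 20]
print(mixed['Name'].tolist())  # ['Tom', 'John', 'Anna', 'Ben', 'Zoe']
```

The list passed to ascending must have the same length as the list of column names.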