How to select columns of a pandas DataFrame based on their data types?

Published: June 20, 2023

Tags: Python; Pandas; Dataframe

If you're working with a pandas DataFrame, you may need to select columns based on their data types. The select_dtypes() method does exactly that.

Create a Pandas DataFrame

The following DataFrame contains both categorical and quantitative data stored in separate columns.

import pandas as pd

data = {'height':[5.9,5.4,6.3,5.8],
        'weight':[100.1,98.2,120.5,91.4],
        'gender':['F','M','M','F'],
        'eye color':['blue','blue','black','green']}

df = pd.DataFrame(data)

print( df )


   height  weight gender eye color
0     5.9   100.1      F      blue
1     5.4    98.2      M      blue
2     6.3   120.5      M     black
3     5.8    91.4      F     green

The aim is to distinguish between columns that are categorical and those that are quantitative.

Find Dataframe column data types

First, find the data type of each column using the dtypes attribute.

The dtypes attribute returns the data type of each column as a Series, which lets you quickly see which columns contain numerical values, strings, or other types of data.

print( df.dtypes )

height       float64
weight       float64
gender        object
eye color     object
dtype: object
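Because dtypes is an ordinary Series indexed by column name, you can also filter it directly. The sketch below rebuilds the example DataFrame (with the weight and gender columns shown in the printed output above), prints each column next to its type, and uses a boolean mask on the dtypes Series to list the float64 columns. It is one possible approach, not the only one.

```python
import pandas as pd

# Same example data as in the article
df = pd.DataFrame({'height': [5.9, 5.4, 6.3, 5.8],
                   'weight': [100.1, 98.2, 120.5, 91.4],
                   'gender': ['F', 'M', 'M', 'F'],
                   'eye color': ['blue', 'blue', 'black', 'green']})

# df.dtypes is a Series: index = column names, values = dtypes
for column, dtype in df.dtypes.items():
    print(f"{column}: {dtype}")

# Compare every dtype with 'float64' and keep the matching column names
float_columns = df.columns[df.dtypes == 'float64'].tolist()
print(float_columns)  # ['height', 'weight']
```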

Note that you can also call the info() method for an overview of the DataFrame, including each column's data type and number of non-null values. If you need to change the data type of a column, use the astype() method: pass it the target data type and it converts every value in that column to that type. This is useful, for example, for turning numbers into strings or vice versa.
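A minimal sketch of both methods on the example DataFrame follows. Converting weight to int and height to str is only an illustration chosen here, not part of the original tutorial; note that astype() returns a new Series and leaves df unchanged, and that a float-to-int conversion truncates the decimals.

```python
import pandas as pd

df = pd.DataFrame({'height': [5.9, 5.4, 6.3, 5.8],
                   'weight': [100.1, 98.2, 120.5, 91.4],
                   'gender': ['F', 'M', 'M', 'F'],
                   'eye color': ['blue', 'blue', 'black', 'green']})

# Prints column names, non-null counts, dtypes and memory usage
df.info()

# float -> int: the fractional part is dropped (truncated toward zero)
weight_int = df['weight'].astype(int)
print(weight_int.tolist())  # [100, 98, 120, 91]

# number -> string
height_str = df['height'].astype(str)
print(height_str.tolist())
```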

Select columns using select_dtypes()

The syntax for this function is quite simple:

DataFrame.select_dtypes(include=None, exclude=None)

The include parameter keeps the columns whose data types match the ones you specify. You can pass a single data type, such as 'float64', or a list of several, for example ['float64', 'int64']; the string 'number' matches every numeric type. The exclude parameter does the opposite: it drops the columns of the given data types and keeps the rest. At least one of the two parameters must be supplied.
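The sketch below exercises include with one type and with a list, exclude, and the error raised when neither is given. The integer age column is hypothetical, added only so that both float64 and int64 columns are present; the rest of the data comes from the example above.

```python
import pandas as pd

# 'age' is a hypothetical integer column added for this illustration
df = pd.DataFrame({'height': [5.9, 5.4, 6.3, 5.8],
                   'weight': [100.1, 98.2, 120.5, 91.4],
                   'age': [34, 28, 45, 39],
                   'gender': ['F', 'M', 'M', 'F']})

# include: a single data type
floats = df.select_dtypes(include='float64')

# include: several data types at once, passed as a list
floats_and_ints = df.select_dtypes(include=['float64', 'int64'])

# exclude: keep every column except the float64 ones
not_floats = df.select_dtypes(exclude='float64')

print(list(floats.columns))           # ['height', 'weight']
print(list(floats_and_ints.columns))  # ['height', 'weight', 'age']
print(list(not_floats.columns))       # ['age', 'gender']

# Calling select_dtypes() with neither parameter raises ValueError
try:
    df.select_dtypes()
except ValueError as err:
    print("ValueError:", err)
```

The selected columns keep their original order in the DataFrame.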

With this method, you can select or exclude columns of a DataFrame based on their data types. The examples below apply it to the DataFrame created earlier.

Select categorical data

Start with the columns that hold categorical data, which in this DataFrame are the ones with the object data type:

print( df.select_dtypes(include='object') )

  gender eye color
0      F      blue
1      M      blue
2      M     black
3      F     green
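Two variations, sketched here as assumptions rather than as the tutorial's own method: exclude='number' returns every non-numeric column, which in this DataFrame gives the same two columns; and if you store the categorical columns with pandas' dedicated 'category' dtype via astype(), you can select them with include='category'.

```python
import pandas as pd

df = pd.DataFrame({'height': [5.9, 5.4, 6.3, 5.8],
                   'weight': [100.1, 98.2, 120.5, 91.4],
                   'gender': ['F', 'M', 'M', 'F'],
                   'eye color': ['blue', 'blue', 'black', 'green']})

# Everything that is not numeric
non_numeric = df.select_dtypes(exclude='number')
print(list(non_numeric.columns))  # ['gender', 'eye color']

# Optional: convert the categorical columns to the 'category' dtype
df_cat = df.astype({'gender': 'category', 'eye color': 'category'})
categories = df_cat.select_dtypes(include='category')
print(list(categories.columns))  # ['gender', 'eye color']
```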

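Select quantitative data

To keep only the quantitative columns, include all numeric data types:

print( df.select_dtypes(include='number') )
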
   height  weight
0     5.9   100.1
1     5.4    98.2
2     6.3   120.5
3     5.8    91.4
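Since select_dtypes() returns a regular DataFrame, you can chain numeric operations onto it. The sketch below computes the column means of the quantitative data and extracts the numeric column names; the choice of mean() is just an example. In pandas 2.0 and later, calling mean() on the full DataFrame raises a TypeError because of the string columns (unless numeric_only=True is passed), so selecting the numeric columns first avoids that.

```python
import pandas as pd

df = pd.DataFrame({'height': [5.9, 5.4, 6.3, 5.8],
                   'weight': [100.1, 98.2, 120.5, 91.4],
                   'gender': ['F', 'M', 'M', 'F'],
                   'eye color': ['blue', 'blue', 'black', 'green']})

numeric = df.select_dtypes(include='number')

# Aggregate only the quantitative columns
numeric_means = numeric.mean()
print(numeric_means)

# Column names alone, e.g. to reuse in later processing
numeric_cols = numeric.columns.tolist()
print(numeric_cols)  # ['height', 'weight']
```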
