How to split quantitative and categorical data columns in a pandas DataFrame?

Published: June 20, 2023

Tags: Python; Pandas; Dataframe;

If you have a pandas DataFrame that mixes quantitative and categorical data, you can split its columns by type using the select_dtypes() method. Examples:

Create a Pandas DataFrame

The following DataFrame contains both categorical and quantitative data, stored in separate columns.

import pandas as pd

data = {'height':[5.9,5.4,6.3,5.8],
        'weight':[100.1,98.2,120.5,91.4],
        'gender':['F','M','M','F'],
        'eye color':['blue','blue','black','green']}

df = pd.DataFrame(data)

print( df )

Output

   height  weight gender eye color
0     5.9   100.1      F      blue
1     5.4    98.2      M      blue
2     6.3   120.5      M     black
3     5.8    91.4      F     green

The aim is to distinguish between columns that are categorical and those that are quantitative.

Find Dataframe column data types

The first step is to inspect the column data types using the dtypes attribute.

The dtypes attribute returns the data type of each column as a Series, and can be used to quickly identify which columns contain numerical values, strings, or other types of data.

df.dtypes

returns

height       float64
weight       float64
gender        object
eye color     object
dtype: object

Note that you can also use the info() method to see an overview of the DataFrame, including the data type of every column. If you need to change the data type of a column, you can use the astype() method: given a new data type, it returns a copy of the column with all of its values converted to that type, so assign the result back if you want to keep it. This is especially useful for converting numerical values to strings or vice versa.
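As a minimal sketch of the astype() conversion described above, the snippet below uses a reduced version of the example DataFrame. The variable names height_as_text and gender_as_category are chosen for illustration only, and the 'category' dtype is an extra pandas type that the rest of this article does not rely on.

```python
import pandas as pd

# Reduced version of the example DataFrame (two columns only)
df = pd.DataFrame({'height': [5.9, 5.4, 6.3, 5.8],
                   'gender': ['F', 'M', 'M', 'F']})

# astype() returns a new, converted Series; df itself is left unchanged
height_as_text = df['height'].astype(str)
gender_as_category = df['gender'].astype('category')

# Print an overview of the original DataFrame and the converted dtypes
df.info()
print(height_as_text.dtype)
print(gender_as_category.dtype)
```

To make a conversion permanent, assign the result back to the column, for example df['gender'] = df['gender'].astype('category').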

Select columns using select_dtypes()

The syntax of this method is simple:

dataframe.select_dtypes(include=None, exclude=None)

The include parameter selects columns by their data types. You can pass a single data type, such as float, or a list of several, for example float and int. The exclude parameter does the opposite: it drops the columns that have the given data types. At least one of the two parameters must be supplied.
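The following sketch illustrates both parameters. It assumes a small DataFrame with a hypothetical integer column named age, which is not part of the article's example data. The 'number' selector is the pandas shorthand for all numeric data types.

```python
import pandas as pd

df = pd.DataFrame({'height': [5.9, 5.4],
                   'age': [31, 27],        # hypothetical integer column
                   'gender': ['F', 'M']})

# include accepts a list of data types: keep float and integer columns
float_and_int = df.select_dtypes(include=['float', 'int'])

# exclude drops the listed data types: here, every numeric column
non_numeric = df.select_dtypes(exclude='number')

print(float_and_int.columns.tolist())
print(non_numeric.columns.tolist())
```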

Using this function you can easily select or exclude certain columns from your Dataframe based on their data types. Examples:

First, select the columns with categorical data, which this DataFrame stores with the object data type.

df.select_dtypes(include='object')

returns

  gender eye color
0      F      blue
1      M      blue
2      M     black
3      F     green

Next, select the columns with quantitative data, which are stored with the float64 data type.

df.select_dtypes(include='float64')

returns

   height  weight
0     5.9   100.1
1     5.4    98.2
2     6.3   120.5
3     5.8    91.4
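Putting the two selections together, one possible way to split the DataFrame in a single pass is sketched below. It uses the 'number' selector instead of 'float64' so that integer columns would also count as quantitative; that choice, and the variable names, are assumptions made for this illustration.

```python
import pandas as pd

data = {'height': [5.9, 5.4, 6.3, 5.8],
        'weight': [100.1, 98.2, 120.5, 91.4],
        'gender': ['F', 'M', 'M', 'F'],
        'eye color': ['blue', 'blue', 'black', 'green']}

df = pd.DataFrame(data)

# Numeric columns are treated as quantitative, everything else as categorical
quantitative = df.select_dtypes(include='number')
categorical = df.select_dtypes(exclude='number')

# Column names of each group, e.g. for later plotting or encoding steps
quantitative_cols = quantitative.columns.tolist()
categorical_cols = categorical.columns.tolist()

print(quantitative_cols)
print(categorical_cols)
```

Keep in mind that a numeric column can still hold categorical information (for example, integer codes), so the data type alone is only a first approximation.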
