If you have a pandas DataFrame with both quantitative and categorical data, you can split its columns by type using the select_dtypes() method. Examples:
Create a Pandas DataFrame
The following Dataframe contains both categorical and quantitative data stored in separate columns.
import pandas as pd
data = {
    'height': [5.9, 5.4, 6.3, 5.8],
    'weight': [100.1, 98.2, 120.5, 91.4],
    'gender': ['F', 'M', 'M', 'F'],
    'eye color': ['blue', 'blue', 'black', 'green'],
}
df = pd.DataFrame(data)
print(df)
Output
   height  weight gender eye color
0     5.9   100.1      F      blue
1     5.4    98.2      M      blue
2     6.3   120.5      M     black
3     5.8    91.4      F     green
The aim is to distinguish between columns that are categorical and those that are quantitative.
Find Dataframe column data types
As a first step, inspect the data type of each column using the dtypes attribute.
The dtypes attribute returns the data type of each column as a Series, and can be used to quickly identify which columns contain numerical values, strings, or other types of data.
df.dtypes
returns
height       float64
weight       float64
gender        object
eye color     object
dtype: object
Note that you can also use the info() method to see an overview of the DataFrame, including each column's data type. To change the data type of a column, use the astype() method: given a new data type, it returns a copy with the values converted to that type, so assign the result back to the column to keep the change. This is useful, for example, for converting numerical values to strings or vice versa.
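As a minimal sketch of the two methods just mentioned, the snippet below calls info() and astype() on a reduced version of the example DataFrame. The variable name height_as_text is only illustrative.

```python
import pandas as pd

# Reduced version of the example DataFrame used in this article
df = pd.DataFrame({
    'height': [5.9, 5.4, 6.3, 5.8],
    'gender': ['F', 'M', 'M', 'F'],
})

# info() prints a summary (columns, non-null counts, dtypes) and returns None
df.info()

# astype() returns a converted copy; assign it back to change the column
df['gender'] = df['gender'].astype('category')

# Converting numbers to strings leaves the original column untouched
height_as_text = df['height'].astype(str)

print(df.dtypes)
```

Because astype() does not modify the column in place, df['height'] is still float64 after the last conversion.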
Select columns using select_dtypes()
The syntax for this method is simple:
dataframe.select_dtypes(include=None, exclude=None)
The include parameter selects columns by their data types. You can pass a single data type, such as float, or a list of several, for example [float, int]. The exclude parameter does the opposite of include: it drops columns of the given data types. At least one of the two parameters must be supplied.
Using this function you can easily select or exclude certain columns from your Dataframe based on their data types. Examples:
Let's select the columns with categorical data, which are stored in this DataFrame with the object data type.
df.select_dtypes(include='object')
returns
  gender eye color
0      F      blue
1      M      blue
2      M     black
3      F     green
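Categorical columns are not always stored as object: a column converted with astype('category') or built with pd.Categorical has the category data type, and include='object' will skip it. The sketch below, which assumes one column of each kind (the object dtype is set explicitly for illustration), passes a list to include to catch both.

```python
import pandas as pd

df = pd.DataFrame({
    'height': [5.9, 5.4, 6.3, 5.8],
    # dtype set explicitly so the column is object regardless of pandas version
    'gender': pd.Series(['F', 'M', 'M', 'F'], dtype=object),
    # stored with the dedicated category dtype
    'eye color': pd.Categorical(['blue', 'blue', 'black', 'green']),
})

# Only the object column is selected
only_object = df.select_dtypes(include='object')

# A list of dtypes selects columns matching any of them
object_or_category = df.select_dtypes(include=['object', 'category'])

print(list(only_object.columns))
print(list(object_or_category.columns))
```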
Select the quantitative columns, which are stored here as float64:
df.select_dtypes(include='float64')
returns
   height  weight
0     5.9   100.1
1     5.4    98.2
2     6.3   120.5
3     5.8    91.4
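Selecting on 'float64' misses integer columns. A hedged alternative is the generic 'number' selector, which matches all numeric data types; combined with exclude it splits the column names into two lists. The 'age' column below is hypothetical and added only to show the difference.

```python
import pandas as pd

df = pd.DataFrame({
    'height': [5.9, 5.4, 6.3, 5.8],
    'age': [31, 45, 27, 52],          # hypothetical integer column
    'gender': ['F', 'M', 'M', 'F'],
})

# 'number' matches every numeric dtype (integers and floats)
numeric_cols = df.select_dtypes(include='number').columns.tolist()

# Everything that is not numeric (this would also include bool or datetime columns)
other_cols = df.select_dtypes(exclude='number').columns.tolist()

# 'float64' alone leaves out the integer column
float_only = df.select_dtypes(include='float64').columns.tolist()

print(numeric_cols)
print(other_cols)
print(float_only)
```

The two lists can then be used to index the DataFrame, for example df[numeric_cols] for the quantitative part.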