To determine the data type of each column in a pandas DataFrame, use the dtypes attribute. Examples:
Get the data type of each column with dtypes
Let's consider the following dataframe:
import pandas as pd

data = {'c1': [1, 2, 3, 4, 5, 6],
        'c2': [1., 2., 3., 4., 5., 6.],
        'c3': ['a', 'b', 'b', 'd', 'e', 'f']}

df = pd.DataFrame(data)

print(df)
Output
   c1   c2 c3
0   1  1.0  a
1   2  2.0  b
2   3  3.0  b
3   4  4.0  d
4   5  5.0  e
5   6  6.0  f
To retrieve the data type of each column, enter
df.dtypes
which will return a Series containing the data type for each column of the original DataFrame:
c1      int64
c2    float64
c3     object
dtype: object
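Because dtypes is an ordinary Series indexed by column name, its entries can be looked up and compared in code. The sketch below rebuilds the same DataFrame and shows a few ways to query a column's type; the variable names (c1_dtype, is_c2_float, numeric_cols) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'c1': [1, 2, 3, 4, 5, 6],
    'c2': [1., 2., 3., 4., 5., 6.],
    'c3': ['a', 'b', 'b', 'd', 'e', 'f'],
})

# dtypes is a Series indexed by column name, so label lookup works
c1_dtype = df.dtypes['c1']

# A single column (a Series) exposes the same information through .dtype
c2_dtype = df['c2'].dtype

# A dtype can be compared against its name to branch on the type
is_c2_float = bool(c2_dtype == 'float64')

# select_dtypes keeps only the columns of a given kind, here numeric ones
numeric_cols = df.select_dtypes(include='number').columns.tolist()

print(c1_dtype, c2_dtype, is_c2_float, numeric_cols)
```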
Using info()
It is also possible to get the data type of each column using the info() method on a DataFrame. This will provide information about all columns in the DataFrame, including the data type.
For example:
df.info()
This prints something like this:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   c1      6 non-null      int64
 1   c2      6 non-null      float64
 2   c3      6 non-null      object
dtypes: float64(1), int64(1), object(1)
memory usage: 272.0+ bytes
This example shows that the 'c3' column holds objects, 'c2' contains float64, and 'c1' has int64 data types.
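Note that info() prints its summary and returns None, so the result cannot be assigned directly. If the text is needed as a string (for logging, for instance), one option is to pass a buffer through the buf parameter. This is a minimal sketch; the names buffer and summary are illustrative.

```python
import io

import pandas as pd

df = pd.DataFrame({
    'c1': [1, 2, 3, 4, 5, 6],
    'c2': [1., 2., 3., 4., 5., 6.],
    'c3': ['a', 'b', 'b', 'd', 'e', 'f'],
})

# info() writes to the given buffer instead of stdout and returns None
buffer = io.StringIO()
result = df.info(buf=buffer)

# The captured report is now available as a regular string
summary = buffer.getvalue()
print(summary)
```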
Example of use
Merging two DataFrames
The following example shows why checking the data types of a DataFrame can be helpful. There are two DataFrames, df1
        FRP  MASK   Longitude   Latitude
0       0.0     0 -121.214928  41.868652
1       0.0     0 -121.214813  41.868549
2       0.0     0 -121.214699  41.868443
3       0.0     0 -121.214584  41.868332
4       0.0     0 -121.214470  41.868225
...     ...   ...         ...        ...
435827  0.0     0 -121.271240  41.782211
435828  0.0     0 -121.271126  41.782104
435829  0.0     0 -121.271004  41.781994
435830  0.0     0 -121.270874  41.781868
435831  0.0     0 -121.270760  41.781761
and df2
           Longitude   Latitude    A
0        -120.371639  42.494111 -999
1        -120.371405  42.493905 -999
2        -120.371191  42.493716 -999
3        -120.371054  42.493590 -999
4        -120.370844  42.493405 -999
...              ...        ...  ...
12422595 -121.414409  41.654469 -999
12422596 -121.414205  41.654282 -999
12422597 -121.414020  41.654113 -999
12422598 -121.413863  41.653969 -999
12422599 -121.413656  41.653779 -999
that need to be merged on latitude and longitude. However, the merge
pd.merge(df1,df2, on=['Longitude','Latitude'], how='inner')
resulted in an empty dataframe.
The reason is that the latitude and longitude columns look similar in both DataFrames, but they are stored with different data types:
df1.dtypes
returns
FRP          float32
MASK           int32
Longitude    float32
Latitude     float32
dtype: object
while
df2.dtypes
returns
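Longitude    float64
Latitude     float64
A              int64
dtype: object
Latitude and longitude are stored as float32 in df1 and as float64 in df2. A float32 value converted to float64 generally does not equal the same coordinate stored directly as float64, so the merge keys never match exactly.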
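The effect can be reproduced on a small scale. The sketch below uses made-up values (0.1, 0.2, 0.3) and two hypothetical DataFrames, left and right, rather than the original data. It assumes pandas compares mixed float keys in a common float64 type, which is where the rounding difference shows up.

```python
import numpy as np
import pandas as pd

# Same nominal coordinates, stored with different precision
left = pd.DataFrame({
    'Longitude': np.array([0.1, 0.2, 0.3], dtype='float32'),
    'FRP': [0.0, 0.0, 0.0],
})
right = pd.DataFrame({
    'Longitude': np.array([0.1, 0.2, 0.3], dtype='float64'),
    'A': [-999, -999, -999],
})

# The float32 rounding of 0.1 differs from the float64 rounding of 0.1
same_value = bool(np.float64(np.float32(0.1)) == np.float64(0.1))

# The inner merge finds no exactly equal keys
merged = pd.merge(left, right, on='Longitude', how='inner')

print(same_value)
print(len(merged))
```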
One solution is to convert the df2 columns from float64 to float32 using astype():
df2['Longitude'] = df2['Longitude'].astype('float32')
df2['Latitude'] = df2['Latitude'].astype('float32')
Now using merge:
pd.merge(df1, df2, on=['Longitude', 'Latitude'], how='inner')
will return
        FRP  MASK   Longitude   Latitude    A
0       0.0     0 -121.214928  41.868652 -999
1       0.0     0 -121.214813  41.868549 -999
2       0.0     0 -121.214699  41.868443 -999
3       0.0     0 -121.214584  41.868332 -999
4       0.0     0 -121.214470  41.868225 -999
...     ...   ...         ...        ...  ...
435827  0.0     0 -121.271240  41.782211 -999
435828  0.0     0 -121.271126  41.782104 -999
435829  0.0     0 -121.271004  41.781994 -999
435830  0.0     0 -121.270874  41.781868 -999
435831  0.0     0 -121.270760  41.781761 -999
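As a variation, astype() also accepts a dictionary mapping column names to types, which converts several columns in one call and returns a new DataFrame instead of modifying the original. The sketch below uses small made-up DataFrames (left, right, right32) rather than the original data. Keep in mind that this approach still relies on the float32 values being exactly equal after rounding.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({
    'Longitude': np.array([0.1, 0.2, 0.3], dtype='float32'),
    'Latitude': np.array([1.1, 1.2, 1.3], dtype='float32'),
    'FRP': [0.0, 0.0, 0.0],
})
right = pd.DataFrame({
    'Longitude': [0.1, 0.2, 0.3],   # float64 by default
    'Latitude': [1.1, 1.2, 1.3],    # float64 by default
    'A': [-999, -999, -999],
})

# Convert both key columns in one call; right itself is left unchanged
right32 = right.astype({'Longitude': 'float32', 'Latitude': 'float32'})

# Confirm the key dtypes agree before merging
keys = ['Longitude', 'Latitude']
dtypes_match = bool((left[keys].dtypes == right32[keys].dtypes).all())

merged = pd.merge(left, right32, on=keys, how='inner')
print(dtypes_match, len(merged))
```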
References
| Links | Site |
|---|---|
| dtypes | pandas.pydata.org |
| astype | pandas.pydata.org |
| info() | pandas.pydata.org |
| Data types | numpy.org |
