To determine the data type of each column in a Pandas DataFrame, use the dtypes attribute. Examples:
Get the data type of each column with dtypes
Let's consider the following dataframe:
import pandas as pd
data = { 'c1':[1,2,3,4,5,6],
'c2':[1.,2.,3.,4.,5.,6.],
'c3':['a','b','b','d','e','f']
}
df = pd.DataFrame(data)
print(df)
Output
c1 c2 c3
0 1 1.0 a
1 2 2.0 b
2 3 3.0 b
3 4 4.0 d
4 5 5.0 e
5 6 6.0 f
To retrieve the data type of each column, enter
df.dtypes
which will return a Series containing the data type for each column of the original DataFrame:
c1 int64
c2 float64
c3 object
dtype: object
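Because dtypes is an ordinary Series indexed by column name, it can be queried like any other Series. The short sketch below rebuilds the same DataFrame and shows three common follow-ups: reading the dtype of a single column, listing the numeric columns with select_dtypes(), and turning the dtypes into a plain dictionary. The variable names (c1_dtype, numeric_columns, dtype_names) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'c1': [1, 2, 3, 4, 5, 6],
    'c2': [1., 2., 3., 4., 5., 6.],
    'c3': ['a', 'b', 'b', 'd', 'e', 'f'],
})

# A single column is a Series, which exposes .dtype (singular)
c1_dtype = df['c1'].dtype

# Keep only the numeric columns and list their names
numeric_columns = df.select_dtypes(include='number').columns.tolist()

# Convert the dtypes Series into a {column name: dtype name} dictionary
dtype_names = df.dtypes.astype(str).to_dict()

print(c1_dtype)
print(numeric_columns)
print(dtype_names)
```

The exact dtype reported for the string column c3 may depend on the installed pandas version.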
Using info()
It is also possible to get the data type of each column using the info() method on a DataFrame. This will provide information about all columns in the DataFrame, including the data type.
For example:
df.info()
This prints a summary like the following (info() writes to standard output and returns None):
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 c1 6 non-null int64
1 c2 6 non-null float64
2 c3 6 non-null object
dtypes: float64(1), int64(1), object(1)
memory usage: 272.0+ bytes
This example shows that the 'c3' column holds objects, 'c2' contains float64, and 'c1' has int64 data types.
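Since info() prints its summary instead of returning it, the text cannot be assigned to a variable directly. A minimal sketch of one way around this, assuming the summary is needed as a string (for logging, for instance), is to pass an in-memory buffer through the buf parameter. The names buffer and summary are illustrative.

```python
import io
import pandas as pd

df = pd.DataFrame({
    'c1': [1, 2, 3, 4, 5, 6],
    'c2': [1., 2., 3., 4., 5., 6.],
    'c3': ['a', 'b', 'b', 'd', 'e', 'f'],
})

# info() writes into the buffer instead of standard output
buffer = io.StringIO()
result = df.info(buf=buffer)

# The return value is None; the text lives in the buffer
summary = buffer.getvalue()
print(result)
print(summary)
```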
Example of use
Merging two DataFrames
The following example shows why checking the data types of a Pandas DataFrame can be helpful. There are two DataFrames, df1
FRP MASK Longitude Latitude
0 0.0 0 -121.214928 41.868652
1 0.0 0 -121.214813 41.868549
2 0.0 0 -121.214699 41.868443
3 0.0 0 -121.214584 41.868332
4 0.0 0 -121.214470 41.868225
... ... ... ... ...
435827 0.0 0 -121.271240 41.782211
435828 0.0 0 -121.271126 41.782104
435829 0.0 0 -121.271004 41.781994
435830 0.0 0 -121.270874 41.781868
435831 0.0 0 -121.270760 41.781761
and df2
Longitude Latitude A
0 -120.371639 42.494111 -999
1 -120.371405 42.493905 -999
2 -120.371191 42.493716 -999
3 -120.371054 42.493590 -999
4 -120.370844 42.493405 -999
... ... ... ...
12422595 -121.414409 41.654469 -999
12422596 -121.414205 41.654282 -999
12422597 -121.414020 41.654113 -999
12422598 -121.413863 41.653969 -999
12422599 -121.413656 41.653779 -999
that need to be merged on latitude and longitude. However, the merge
pd.merge(df1,df2, on=['Longitude','Latitude'], how='inner')
resulted in an empty dataframe.
The reason is that the latitude and longitude columns look similar in both DataFrames, but they are stored with different data types, so the stored values do not compare as exactly equal:
df1.dtypes
returns
FRP float32
MASK int32
Longitude float32
Latitude float32
dtype: object
while
df2.dtypes
returns
Longitude float64
Latitude float64
A int64
dtype: object
It can be observed that the data type for latitude and longitude is float32 in df1 and float64 in df2.
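Rather than reading the two dtypes listings by eye, the key columns can be compared side by side. The sketch below uses two tiny stand-in DataFrames (two rows each, with values copied from the listings above) in place of the real df1 and df2; the comparison table and its match column are illustrative names, not part of pandas.

```python
import numpy as np
import pandas as pd

# Small stand-ins for df1 (float32 keys) and df2 (float64 keys)
df1 = pd.DataFrame({
    'Longitude': np.array([-121.214928, -121.214813], dtype='float32'),
    'Latitude': np.array([41.868652, 41.868549], dtype='float32'),
})
df2 = pd.DataFrame({
    'Longitude': np.array([-121.214928, -121.214813], dtype='float64'),
    'Latitude': np.array([41.868652, 41.868549], dtype='float64'),
    'A': [-999, -999],
})

keys = ['Longitude', 'Latitude']

# One row per key column, one column per DataFrame
comparison = pd.DataFrame({
    'df1': df1[keys].dtypes,
    'df2': df2[keys].dtypes,
})
# True where both DataFrames store the key with the same dtype
comparison['match'] = comparison['df1'] == comparison['df2']
print(comparison)
```

Any row where match is False marks a key column that should be converted before merging.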
One solution is to convert the df2 columns from float64 to float32 using astype():
df2['Longitude'] = df2['Longitude'].astype('float32')
df2['Latitude'] = df2['Latitude'].astype('float32')
Now the merge
pd.merge(df1,df2, on=['Longitude','Latitude'], how='inner')
will return
FRP MASK Longitude Latitude A
0 0.0 0 -121.214928 41.868652 -999
1 0.0 0 -121.214813 41.868549 -999
2 0.0 0 -121.214699 41.868443 -999
3 0.0 0 -121.214584 41.868332 -999
4 0.0 0 -121.214470 41.868225 -999
... ... ... ... ... ...
435827 0.0 0 -121.271240 41.782211 -999
435828 0.0 0 -121.271126 41.782104 -999
435829 0.0 0 -121.271004 41.781994 -999
435830 0.0 0 -121.270874 41.781868 -999
435831 0.0 0 -121.270760 41.781761 -999
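The whole scenario can be reproduced at a small scale. The sketch below is a minimal illustration using two-row stand-ins for df1 and df2 rather than the original data: it merges once with mismatched key dtypes, then again after casting. It passes a dictionary to astype() to convert both columns in one call, and stores the result in a new variable (df2_cast, an illustrative name) so that df2 itself is left unchanged.

```python
import numpy as np
import pandas as pd

# df1 stores coordinates as float32, df2 as float64 (the default)
df1 = pd.DataFrame({
    'FRP': [0.0, 0.0],
    'Longitude': np.array([-121.214928, -121.214813], dtype='float32'),
    'Latitude': np.array([41.868652, 41.868549], dtype='float32'),
})
df2 = pd.DataFrame({
    'Longitude': [-121.214928, -121.214813],
    'Latitude': [41.868652, 41.868549],
    'A': [-999, -999],
})

keys = ['Longitude', 'Latitude']

# Mismatched dtypes: the float32 values are not exactly equal to the float64 ones
before = pd.merge(df1, df2, on=keys, how='inner')

# Cast both key columns at once; astype() returns a new DataFrame
df2_cast = df2.astype({'Longitude': 'float32', 'Latitude': 'float32'})
after = pd.merge(df1, df2_cast, on=keys, how='inner')

print(len(before), len(after))
```

Casting to float32 discards precision, so coordinates that differ only beyond float32 resolution will compare as equal after the cast. Whether that is acceptable depends on the data.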