How to compare two dates in a pandas DataFrame ?

Published: April 20, 2023

Tags: Python; Pandas; Dataframe;

DMCA.com Protection Status

There are several ways to compare dates in a pandas DataFrame, and the approach you choose will depend on the problem you are trying to solve. Example

Initial steps before comparing two dates (Case Study)

Suppose you're employed at NOAA and have received a CSV file that contains data about observations made by an Earth observation satellite (You can download the CSV file used here):

Open csv file

The initial step is to save the CSV information into a DataFrame:

import pandas as pd

filename = 'NOAA_Williams_Flats.csv'

df = pd.read_csv(filename)

Output

How to compare two dates in a pandas DataFrame ?
How to compare two dates in a pandas DataFrame ?

We are interested in the column labeled "observation date/time"

Check columns types

It is recommended to verify the column types before comparing two dates in a pandas dataframe.

df.dtypes

which will give:

fire incident                object
satellite scene id            int64
latitude                    float64
longitude                   float64
image start date/time        object
observation date/time        object
satellite                    object
instrument                   object
scan domain                  object
FRP [MW]                    float64
MWIR brightness temp [K]    float64
LWIR brightness temp [K]    float64
pixel area [m^2]            float64
solar zenith angle [deg]    float64
dtype: object

The column "Observation Date/Time" is a data type called "object," which means it contains both numeric and non-numeric values.

Please note that in our example, it is common for the date to be stored in a format other than a datetime object. This applies specifically to our example.

Checking the number of rows for each date

It might helpful to explore the option of analyzing the number of rows for each date:

df['observation date/time'].value_counts()

gives

2019-08-08T05:41:25Z    48
2019-08-08T04:56:25Z    48
2019-08-08T05:01:24Z    48
2019-08-08T05:06:25Z    48
2019-08-08T04:41:24Z    48
                        ..
2019-08-02T16:01:25Z     1
2019-08-02T15:56:24Z     1
2019-08-02T15:51:25Z     1
2019-08-02T15:46:24Z     1
2019-08-09T11:26:24Z     1
Name: observation date/time, Length: 1999, dtype: int64

Use the contains() method to choose the rows corresponding to a specific date.

If we only want to select rows in the dataframe that match the specific date of "2008-08-08," there is no need to convert it into a datetime object. We can simply use the "contains()" method:

df[ df['observation date/time'].str.contains('2019-08-08')  ]

gives

How to compare two dates in a pandas DataFrame ?
How to compare two dates in a pandas DataFrame ?

df = df[ df['observation date/time'].str.contains('2019-08-08')  ]

Converting dataframe column to a datetime type.

In order to compare two dates in Python, it is recommended to generate datetime objects. Therefore, a new datetime column should be created using the 'observation date/time':

To do that let's define a function:

import datetime
import re

def create_datetime_column(x):

    obs_time = x['observation date/time']

    m = re.search('(.*)-(.*)-(.*)T(.*):(.*):(.*)', obs_time)

    year = int( m.group(1) )
    month = int( m.group(2) )
    day = int( m.group(3) )

    hour = int( m.group(4) )
    minute = int( m.group(5) )

    return datetime.datetime(year, month, day, hour,minute)

and apply it to our dataframe:

df.apply(create_datetime_column, axis=1)

gives

22252   2019-08-08 00:01:00
22253   2019-08-08 00:01:00
22254   2019-08-08 00:01:00
22255   2019-08-08 00:01:00
22256   2019-08-08 00:01:00
                ...        
29071   2019-08-08 23:56:00
29072   2019-08-08 23:56:00
29073   2019-08-08 23:56:00
29074   2019-08-08 23:56:00
29075   2019-08-08 23:56:00
Name: obs_time, Length: 6824, dtype: datetime64[ns]

Create a new column:

df['obs_time'] = df.apply(create_datetime_column, axis=1)

Comparing two dates

Suppose we intend to compare our data with another instrument at a specific time of observation.:

year = 2019
month = 8
day = 8

hour = 20
minute = 6

obs_datetime = datetime.datetime(year, month, day, hour,minute)

Using == operator

One way to determine if there are rows with identical observation times is by utilizing the "== operator."

df['obs_time'] == obs_datetime

returns a column of boolean values in a dataframe:

22252    False
22253    False
22254    False
22255    False
22256    False
         ...  
29071    False
29072    False
29073    False
29074    False
29075    False
Name: obs_time, Length: 6824, dtype: bool

To only select the rows with identical observation times:

df[ df['obs_time'] == obs_datetime ][['latitude','longitude','observation date/time','obs_time','FRP [MW]']]

How to compare two dates in a pandas DataFrame ?
How to compare two dates in a pandas DataFrame ?

Using != operator

df['obs_time'] != obs_datetime

Select rows with different observation times

df[ df['obs_time'] != obs_datetime ]

Using > operator

df['obs_time'] > obs_datetime

choose the rows that correspond to observation made after.

df[ df['obs_time'] != obs_datetime ]

Using < operator

df['obs_time'] < obs_datetime

select rows that correspond to the previous observation

df[ df['obs_time'] != obs_datetime ]

Time difference

 df['obs_time'] - obs_datetime

gives

22252   -1 days +03:55:00
22253   -1 days +03:55:00
22254   -1 days +03:55:00
22255   -1 days +03:55:00
22256   -1 days +03:55:00
               ...       
29071     0 days 03:50:00
29072     0 days 03:50:00
29073     0 days 03:50:00
29074     0 days 03:50:00
29075     0 days 03:50:00
Name: obs_time, Length: 6824, dtype: timedelta64[ns]

Time difference in minutes

delta.dt.total_seconds() / 3600.0

gives

22252   -20.083333
22253   -20.083333
22254   -20.083333
22255   -20.083333
22256   -20.083333
           ...    
29071     3.833333
29072     3.833333
29073     3.833333
29074     3.833333
29075     3.833333
Name: obs_time, Length: 6824, dtype: float64

References

Links Site
contains() pandas.pydata.org
datetime docs.python.org
How to create a datetime column from year, month and day columns in pandas ? moonbooks.org
Image

of