There are several ways to compare dates in a pandas DataFrame, and the approach you choose will depend on the problem you are trying to solve. Example
Initial steps before comparing two dates (Case Study)
Suppose you're employed at NOAA and have received a CSV file that contains data about observations made by an Earth observation satellite (You can download the CSV file used here):
Open csv file
The initial step is to save the CSV information into a DataFrame:
import pandas as pdfilename = 'NOAA_Williams_Flats.csv'df = pd.read_csv(filename)
Output

We are interested in the column labeled "observation date/time"
Check columns types
It is recommended to verify the column types before comparing two dates in a pandas dataframe.
df.dtypes
which will give:
fire incident objectsatellite scene id int64latitude float64longitude float64image start date/time objectobservation date/time objectsatellite objectinstrument objectscan domain objectFRP [MW] float64MWIR brightness temp [K] float64LWIR brightness temp [K] float64pixel area [m^2] float64solar zenith angle [deg] float64dtype: object
The column "Observation Date/Time" is a data type called "object," which means it contains both numeric and non-numeric values.
Please note that in our example, it is common for the date to be stored in a format other than a datetime object. This applies specifically to our example.
Checking the number of rows for each date
It might helpful to explore the option of analyzing the number of rows for each date:
df['observation date/time'].value_counts()
gives
2019-08-08T05:41:25Z 482019-08-08T04:56:25Z 482019-08-08T05:01:24Z 482019-08-08T05:06:25Z 482019-08-08T04:41:24Z 48..2019-08-02T16:01:25Z 12019-08-02T15:56:24Z 12019-08-02T15:51:25Z 12019-08-02T15:46:24Z 12019-08-09T11:26:24Z 1Name: observation date/time, Length: 1999, dtype: int64
Use the contains() method to choose the rows corresponding to a specific date.
If we only want to select rows in the dataframe that match the specific date of "2008-08-08," there is no need to convert it into a datetime object. We can simply use the "contains()" method:
df[ df['observation date/time'].str.contains('2019-08-08') ]
gives

df = df[ df['observation date/time'].str.contains('2019-08-08') ]
Converting dataframe column to a datetime type.
In order to compare two dates in Python, it is recommended to generate datetime objects. Therefore, a new datetime column should be created using the 'observation date/time':
To do that let's define a function:
import datetimeimport redef create_datetime_column(x):obs_time = x['observation date/time']m = re.search('(.*)-(.*)-(.*)T(.*):(.*):(.*)', obs_time)year = int( m.group(1) )month = int( m.group(2) )day = int( m.group(3) )hour = int( m.group(4) )minute = int( m.group(5) )return datetime.datetime(year, month, day, hour,minute)
and apply it to our dataframe:
df.apply(create_datetime_column, axis=1)
gives
22252 2019-08-08 00:01:0022253 2019-08-08 00:01:0022254 2019-08-08 00:01:0022255 2019-08-08 00:01:0022256 2019-08-08 00:01:00...29071 2019-08-08 23:56:0029072 2019-08-08 23:56:0029073 2019-08-08 23:56:0029074 2019-08-08 23:56:0029075 2019-08-08 23:56:00Name: obs_time, Length: 6824, dtype: datetime64[ns]
Create a new column:
df['obs_time'] = df.apply(create_datetime_column, axis=1)
Comparing two dates
Suppose we intend to compare our data with another instrument at a specific time of observation.:
year = 2019month = 8day = 8hour = 20minute = 6obs_datetime = datetime.datetime(year, month, day, hour,minute)
Using == operator
One way to determine if there are rows with identical observation times is by utilizing the "== operator."
df['obs_time'] == obs_datetime
returns a column of boolean values in a dataframe:
22252 False22253 False22254 False22255 False22256 False...29071 False29072 False29073 False29074 False29075 FalseName: obs_time, Length: 6824, dtype: bool
To only select the rows with identical observation times:
df[ df['obs_time'] == obs_datetime ][['latitude','longitude','observation date/time','obs_time','FRP [MW]']]

Using != operator
df['obs_time'] != obs_datetime
Select rows with different observation times
df[ df['obs_time'] != obs_datetime ]
Using > operator
df['obs_time'] > obs_datetime
choose the rows that correspond to observation made after.
df[ df['obs_time'] != obs_datetime ]
Using < operator
df['obs_time'] < obs_datetime
select rows that correspond to the previous observation
df[ df['obs_time'] != obs_datetime ]
Time difference
df['obs_time'] - obs_datetime
gives
22252 -1 days +03:55:0022253 -1 days +03:55:0022254 -1 days +03:55:0022255 -1 days +03:55:0022256 -1 days +03:55:00...29071 0 days 03:50:0029072 0 days 03:50:0029073 0 days 03:50:0029074 0 days 03:50:0029075 0 days 03:50:00Name: obs_time, Length: 6824, dtype: timedelta64[ns]
Time difference in minutes
delta.dt.total_seconds() / 3600.0
gives
22252 -20.08333322253 -20.08333322254 -20.08333322255 -20.08333322256 -20.083333...29071 3.83333329072 3.83333329073 3.83333329074 3.83333329075 3.833333Name: obs_time, Length: 6824, dtype: float64
References
| Links | Site |
|---|---|
| contains() | pandas.pydata.org |
| datetime | docs.python.org |
| How to create a datetime column from year, month and day columns in pandas ? | moonbooks.org |
