How to retrieve the minimum and maximum dates (i.e., start and end dates) from a datetime column in a pandas dataframe?


Introduction

In data analysis, a common first step is to extract summary statistics from a dataset. For a datetime column in a pandas dataframe, this includes finding the minimum and maximum values, i.e., the start and end dates.

Creating a Dataframe with Random Dates

To practice extracting the maximum and minimum values from a datetime column, we first need a dataframe with random dates (see the previous article "How to generate a random date within a specified range using Python ?").

First, let's begin by defining a start and end date in Python using the datetime module:

from random import randrange
from datetime import timedelta

import datetime

year = 2019
month = 8
day = 1

star_date = datetime.datetime(year,month,day)

year = 2019
month = 10
day = 21

end_date = datetime.datetime(year,month,day)

Next, we can proceed to create a function that generates random dates within the specified range:

def random_date(start, end):
    # Total number of seconds between the two bounds
    delta = end - start
    delta_in_seconds = (delta.days * 24 * 60 * 60) + delta.seconds
    # Pick a random offset in [0, delta_in_seconds) and add it to the start
    random_second = randrange(delta_in_seconds)
    return start + timedelta(seconds=random_second)

Finally, we'll use this function to create a synthetic dataframe:

import pandas as pd
import random


data = {'date':[random_date(star_date, end_date) for i in range(20)], 
        'value':[random.randint(0,100) for i in range(20)]}

df = pd.DataFrame(data)

The code above generates, for example, the following dataframe (the values are random, so they differ on each run):

                  date  value
0  2019-09-20 01:25:24     23
1  2019-10-10 10:47:01     28
2  2019-10-06 05:49:40     32
3  2019-09-22 05:14:28     20
4  2019-10-05 16:11:36      2
5  2019-09-08 02:39:19     41
6  2019-09-02 11:52:19     27
7  2019-10-17 08:15:55     12
8  2019-09-26 05:44:48     29
9  2019-09-05 22:47:18     38
10 2019-08-18 13:30:46     57
11 2019-08-08 18:34:48     47
12 2019-10-02 18:37:24     79
13 2019-08-22 05:41:41     57
14 2019-10-14 12:50:51     92
15 2019-09-06 20:24:33     44
16 2019-08-21 16:30:04     48
17 2019-08-28 16:13:33     77
18 2019-10-04 00:00:51     42
19 2019-10-12 20:21:25     73
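Because the dates are drawn at random, each run produces a different dataframe. If reproducible results are preferred, one option is to draw everything from a seeded generator. The following is a minimal sketch of that idea; make_random_df is a hypothetical helper that is not part of the original code, and the seed value is arbitrary:

```python
import datetime
import random

import pandas as pd


def make_random_df(seed, n=20):
    """Hypothetical helper: build the synthetic dataframe reproducibly."""
    rng = random.Random(seed)  # private generator, so the seed only affects this call
    start = datetime.datetime(2019, 8, 1)
    end = datetime.datetime(2019, 10, 21)
    total_seconds = int((end - start).total_seconds())
    # Random offsets in [0, total_seconds) added to the start date
    dates = [start + datetime.timedelta(seconds=rng.randrange(total_seconds))
             for _ in range(n)]
    values = [rng.randint(0, 100) for _ in range(n)]
    return pd.DataFrame({'date': dates, 'value': values})


df_a = make_random_df(seed=42)
df_b = make_random_df(seed=42)

# The same seed gives the same dataframe
print(df_a.equals(df_b))
```

Using a dedicated random.Random instance rather than random.seed() keeps the global random state untouched.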

Before manipulating a dataframe, it is a good idea to check the column types using the dtypes attribute, to confirm that the date column is actually stored as a datetime type rather than as strings:

df.dtypes

which returns here:

date     datetime64[ns]
value             int64
dtype: object
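If the dates come from a CSV file or were built from strings, dtypes would typically report object instead of a datetime type. A minimal sketch, assuming a hypothetical dataframe df_str whose dates are stored as strings, of converting the column with pd.to_datetime before looking for the minimum and maximum:

```python
import pandas as pd

# Hypothetical dataframe whose dates are stored as strings (dtype: object)
df_str = pd.DataFrame({'date': ['2019-09-20 01:25:24',
                                '2019-10-10 10:47:01',
                                '2019-08-08 18:34:48'],
                       'value': [23, 28, 47]})

# Convert the strings to a datetime type so that min() and max()
# compare dates rather than text
df_str['date'] = pd.to_datetime(df_str['date'])

print(df_str.dtypes)
print(df_str['date'].min())
print(df_str['date'].max())
```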

Extracting Maximum and Minimum Values from a Datetime Column

In pandas, datetime columns are stored with a dedicated dtype (datetime64[ns] in the example above), which makes their values comparable chronologically. To extract the maximum and minimum values from such a column, we can use the max() and min() methods respectively.

For example, to determine the start date, we can call min() on the column. This returns a pandas Timestamp object,

start_date = df['date'].min()

print(start_date)

Output

2019-08-08 18:34:48

from which we can subsequently extract specific values such as the year, month, day, hour, or minute, individually.

print(start_date.year)
print(start_date.month)
print(start_date.day)

print(start_date.hour)
print(start_date.minute)

Outputs

2019
8
8
18
34
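Beyond individual fields, a Timestamp can also be reduced to a calendar date or formatted as a string. A minimal sketch, in which the start date is re-created by hand so that the snippet runs on its own:

```python
import pandas as pd

# Same value as the start date found above, re-created for a standalone example
start_date = pd.Timestamp('2019-08-08 18:34:48')

# Calendar date only, returned as a standard library datetime.date
print(start_date.date())

# Custom string formatting with strftime
print(start_date.strftime('%Y-%m-%d'))
```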

Similarly, we can retrieve the end date with the max() method:

df['date'].max()

Output

Timestamp('2019-10-17 08:15:55')
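Both bounds can also be obtained in a single call, and subtracting them gives the length of the covered period. A minimal sketch, assuming a small hand-written series of dates (the names dates, first_date and last_date are only illustrative):

```python
import pandas as pd

# Small illustrative series of dates
dates = pd.Series(pd.to_datetime(['2019-09-20 01:25:24',
                                  '2019-08-08 18:34:48',
                                  '2019-10-17 08:15:55']), name='date')

# Both bounds in one call; the result is a Series indexed by 'min' and 'max'
bounds = dates.agg(['min', 'max'])
first_date, last_date = bounds['min'], bounds['max']

# Subtracting two Timestamps gives a Timedelta
span = last_date - first_date
print(first_date, last_date)
print(span.days)
```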

Sorting Rows by DateTime

Another approach is to arrange the rows in chronological order with the sort_values() method. For example, to sort our dataframe by the 'date' column in ascending order (the default):

sorted_df = df.sort_values('date')
print(sorted_df)

The output will be:

                  date  value
11 2019-08-08 18:34:48     47
10 2019-08-18 13:30:46     57
16 2019-08-21 16:30:04     48
13 2019-08-22 05:41:41     57
17 2019-08-28 16:13:33     77
6  2019-09-02 11:52:19     27
9  2019-09-05 22:47:18     38
15 2019-09-06 20:24:33     44
5  2019-09-08 02:39:19     41
0  2019-09-20 01:25:24     23
3  2019-09-22 05:14:28     20
8  2019-09-26 05:44:48     29
12 2019-10-02 18:37:24     79
18 2019-10-04 00:00:51     42
4  2019-10-05 16:11:36      2
2  2019-10-06 05:49:40     32
1  2019-10-10 10:47:01     28
19 2019-10-12 20:21:25     73
14 2019-10-14 12:50:51     92
7  2019-10-17 08:15:55     12

Once the dataframe is sorted by date, the earliest and latest dates are in the first and last rows, which we can access using iloc:

sorted_df.iloc[0]['date']

Output

Timestamp('2019-08-08 18:34:48')

And

sorted_df.iloc[-1]['date']

Output

Timestamp('2019-10-17 08:15:55')
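If the goal is to get the full row holding the earliest or latest date, sorting the whole dataframe is not strictly necessary. A minimal sketch using idxmin() and idxmax() on a small hypothetical dataframe df_small:

```python
import pandas as pd

# Small illustrative dataframe
df_small = pd.DataFrame({'date': pd.to_datetime(['2019-09-20 01:25:24',
                                                 '2019-08-08 18:34:48',
                                                 '2019-10-17 08:15:55']),
                         'value': [23, 47, 12]})

# idxmin() / idxmax() return the index label of the earliest / latest date
earliest_row = df_small.loc[df_small['date'].idxmin()]
latest_row = df_small.loc[df_small['date'].idxmax()]

print(earliest_row['date'], earliest_row['value'])
print(latest_row['date'], latest_row['value'])
```

This avoids creating a sorted copy when only the two extreme rows are needed.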

References

Links Site
to_datetime pandas.pydata.org
sort_values pandas.pydata.org
How to generate a random date within a specified range using Python ? moonbooks.org
pandas.DataFrame.min pandas.pydata.org
pandas.DataFrame.max pandas.pydata.org