Introduction
A time series is a sequence of data points recorded at successive time intervals. Analyzing time series data is crucial for uncovering patterns, trends, and anomalies that can inform decision-making across various fields, such as climate science, economics, and engineering.
In this tutorial, we’ll use Python to create, process, and visualize a time series of atmospheric CO₂ levels using the Mauna Loa dataset. This guide is designed for beginners and intermediate users interested in applying Python for time series analysis.
Dataset Overview
The Mauna Loa CO₂ dataset is a globally recognized resource for studying atmospheric carbon dioxide concentrations. Managed by NOAA, it has provided continuous monthly CO₂ measurements since 1958, offering a long record of climate trends. The dataset includes fields such as year, month, average CO₂ concentration, and deseasonalized CO₂ levels.
You can download the dataset from NOAA’s website (gml.noaa.gov; see References).
Import Libraries and Load the Data
First, we import the necessary libraries and load the dataset into a pandas DataFrame.

    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    import datetime

    import warnings
    warnings.filterwarnings('ignore')

    # Load CO₂ data
    co2_data = pd.read_csv(
        "./inputs/co2_mm_mlo.csv",  # Replace with your file path
        skiprows=40,
        na_values=["-99.99"])

    print(co2_data)

Output:

         year  month  decimal date  average  deseasonalized  ndays  sdev   unc
    0    1958      3     1958.2027   315.71          314.44     -1 -9.99 -0.99
    1    1958      4     1958.2877   317.45          315.16     -1 -9.99 -0.99
    2    1958      5     1958.3699   317.51          314.69     -1 -9.99 -0.99
    3    1958      6     1958.4548   317.27          315.15     -1 -9.99 -0.99
    4    1958      7     1958.5370   315.87          315.20     -1 -9.99 -0.99
    ..    ...    ...           ...      ...             ...    ...   ...   ...
    796  2024      7     2024.5417   425.55          425.11     24  0.69  0.27
    797  2024      8     2024.6250   422.99          424.83     22  1.08  0.44
    798  2024      9     2024.7083   422.03          425.44     18  0.41  0.18
    799  2024     10     2024.7917   422.38          425.63     22  0.35  0.14
    800  2024     11     2024.8750   423.85          425.84     24  0.33  0.13

    [801 rows x 8 columns]

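The output above shows placeholder values other than -99.99: ndays is -1, and sdev and unc are -9.99 and -0.99 in the earliest rows. As a minimal sketch, assuming those values mark unavailable entries in this file, na_values can also take a dictionary that maps each column to its own placeholder. The inline sample is a hypothetical three-row excerpt in the same layout, not the real file, and the names sample_csv and sentinels are illustrative.

```python
import io
import pandas as pd

# Hypothetical three-row excerpt mimicking the layout of co2_mm_mlo.csv
sample_csv = io.StringIO(
    "year,month,decimal date,average,deseasonalized,ndays,sdev,unc\n"
    "1958,3,1958.2027,315.71,314.44,-1,-9.99,-0.99\n"
    "1958,4,1958.2877,317.45,315.16,-1,-9.99,-0.99\n"
    "2024,11,2024.8750,423.85,425.84,24,0.33,0.13\n"
)

# Map each column to the placeholder assumed to mean "missing" in it
sentinels = {
    "average": ["-99.99"],
    "ndays": ["-1"],
    "sdev": ["-9.99"],
    "unc": ["-0.99"],
}

sample = pd.read_csv(sample_csv, na_values=sentinels)

# Count the missing entries per column
missing_counts = sample.isna().sum()
print(missing_counts)
```

If you load the data this way, a plain dropna() would discard every early row because of the missing ndays, sdev, and unc values; dropna(subset=["average"]) keeps them.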
Process and Format the Data
To analyze the dataset effectively, we clean it by removing missing values and adding a datetime column for easier indexing and visualization.

    co2_data = co2_data.dropna()
    co2_data['date'] = pd.to_datetime(co2_data[['year', 'month']].assign(day=1))

    print(co2_data.dtypes)

Output:

    year                       int64
    month                      int64
    decimal date             float64
    average                  float64
    deseasonalized           float64
    ndays                      int64
    sdev                     float64
    unc                      float64
    date              datetime64[ns]
    dtype: object

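To see what the date conversion does in isolation, here is a small sketch on a made-up frame. pd.to_datetime() accepts a DataFrame with year, month, and day columns and assembles one timestamp per row; assign(day=1) supplies the day that the dataset lacks. The frame toy and its values are illustrative only.

```python
import pandas as pd

# Hypothetical miniature of the year/month columns
toy = pd.DataFrame({"year": [1958, 1958, 2024], "month": [3, 4, 11]})

# Add a constant day column, then let pandas assemble the timestamps
toy["date"] = pd.to_datetime(toy[["year", "month"]].assign(day=1))

# Optionally use the dates as the index for label-based lookups
toy_indexed = toy.set_index("date")
print(toy_indexed)
```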
Visualize the Time Series
Example of how to extract date column values

    co2_data['date']

Output:

    0     1958-03-01
    1     1958-04-01
    2     1958-05-01
    3     1958-06-01
    4     1958-07-01
             ...
    796   2024-07-01
    797   2024-08-01
    798   2024-09-01
    799   2024-10-01
    800   2024-11-01
    Name: date, Length: 801, dtype: datetime64[ns]

Example of how to extract CO₂ concentration column values

    co2_data['average']

Output:

    0      315.71
    1      317.45
    2      317.51
    3      317.27
    4      315.87
            ...
    796    425.55
    797    422.99
    798    422.03
    799    422.38
    800    423.85
    Name: average, Length: 801, dtype: float64

Initial Visualization
The first step in time series analysis is visualization. Here, we plot the CO₂ levels over time.

    x_values = co2_data['date']
    y_values = co2_data['average']

    plt.plot(x_values,y_values)

    plt.title("Mauna Loa CO2", fontsize=11)

    plt.savefig('./outputs/time_series_01.png', dpi=100, bbox_inches='tight')
    plt.show()

Output:
Zooming into a Specific Period
We can focus on specific time periods by filtering the dataset. Here, we examine CO₂ levels between January 1980 and October 1984; the comparisons are strict, so both boundary months are excluded.

    start_date = datetime.datetime(1980, 1, 1)
    end_date = datetime.datetime(1984, 10, 1)

    # Build the date mask once and reuse it for both columns
    in_period = (co2_data['date'] > start_date) & (co2_data['date'] < end_date)

    x_sub_values = co2_data['date'][in_period]
    y_sub_values = co2_data['average'][in_period]

We then plot the filtered subset:

    plt.plot(x_sub_values,y_sub_values)

    plt.title("Mauna Loa CO2", fontsize=11)

    plt.savefig('./outputs/time_series_02.png', dpi=100, bbox_inches='tight')
    plt.show()

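As an alternative sketch, assuming the dates are moved into the index with set_index() (see References): a DatetimeIndex supports slicing with partial date strings, so no explicit mask is needed. Unlike the strict comparisons above, this slice includes both endpoints. The series below is synthetic, not the Mauna Loa data, and the names frame and by_date are illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series standing in for the CO₂ averages
dates = pd.date_range("1978-01-01", "1986-12-01", freq="MS")
frame = pd.DataFrame({"date": dates, "average": np.linspace(335.0, 347.0, len(dates))})

# Index by date, then slice by year; both endpoints are included
by_date = frame.set_index("date")
subset = by_date.loc["1980":"1984", "average"]

print(subset.index.min(), subset.index.max(), len(subset))
```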
Customizing the Time Series Plot
Customizing plots enhances their interpretability. Below are some common customizations.
Increasing Figure Size
A larger figure makes the plot easier to read, especially in presentations or reports, because trends and fluctuations in the series become clearly visible. The figsize argument of plt.subplots() sets the width and height in inches:

    fig, ax = plt.subplots(figsize=(16,4))

    plt.plot(x_values,y_values)

    plt.title("Mauna Loa CO2", fontsize=11)

    plt.savefig('./outputs/time_series_03.png', dpi=100, bbox_inches='tight')
    plt.show()

Figure Size: Set to (16, 4) for a wide, panoramic view of the time series, making it easier to observe long-term trends.
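As a quick check of what this size means in pixels: Matplotlib interprets figsize in inches, so the pixel dimensions are the size multiplied by the DPI. The sketch below passes dpi=100 explicitly to match the savefig() calls in this tutorial. Note that bbox_inches='tight' crops or pads the saved image, so the file on disk will usually differ somewhat from these numbers.

```python
import matplotlib.pyplot as plt

# 16 x 4 inches at 100 dots per inch
fig, ax = plt.subplots(figsize=(16, 4), dpi=100)

# Size in inches times dots per inch gives the canvas size in pixels
width_px, height_px = fig.get_size_inches() * fig.dpi
print(width_px, height_px)

plt.close(fig)
```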
Adding Axis Titles
Axis titles are essential for clearly conveying the meaning of the data plotted on each axis. This helps viewers understand the context of the visualization without additional explanation.

    fig, ax = plt.subplots(figsize=(16,4))

    plt.plot(x_values,y_values)

    plt.title("Mauna Loa CO2", fontsize=12)

    # Label each axis with the quantity it shows
    plt.xlabel('Date', fontsize=11)
    plt.ylabel('CO2 (ppm)', fontsize=11)

    plt.savefig('./outputs/time_series_04.png', dpi=100, bbox_inches='tight')
    plt.show()

The labelpad parameter is a useful feature in Matplotlib that adds extra space between the axis labels and the axis itself. This improves readability and prevents the labels from appearing too close to the plot area, especially in complex or dense visualizations.

    fig, ax = plt.subplots(figsize=(16,4))

    plt.plot(x_values,y_values)

    plt.title("Mauna Loa CO2", fontsize=12)

    # labelpad is the spacing, in points, between the axis and its label
    plt.xlabel('Date', fontsize=11, labelpad=20)
    plt.ylabel('CO2 (ppm)', fontsize=11, labelpad=20)

    plt.savefig('./outputs/time_series_05.png', dpi=100, bbox_inches='tight')
    plt.show()

Adding Custom Tick Marks
Customizing tick marks is a powerful way to make plots more readable and informative, especially when working with time series data. In this example, we define specific positions and labels for the x-axis tick marks, selecting them at regular intervals to avoid clutter.

    fig, ax = plt.subplots(figsize=(16,4))

    # Keep only the January dates of years divisible by 10 (one tick per decade)
    x_tick_positions = [d for d in co2_data['date'] if d.year % 10 == 0 and d.month == 1]
    x_tick_labels = x_tick_positions

    plt.plot(x_values,y_values)

    plt.title("Mauna Loa CO2", fontsize=12)

    plt.xlabel('Date', fontsize=11, labelpad=20)
    plt.ylabel('CO2 (ppm)', fontsize=11, labelpad=20)

    plt.xticks(x_tick_positions, x_tick_labels, rotation=90, fontsize=11)

    plt.savefig('./outputs/time_series_07.png', dpi=100, bbox_inches='tight')
    plt.show()

Formatted Tick Labels (see python datetime):

    # One tick per decade, labeled as YYYY-MM-DD
    x_tick_positions = [d for d in co2_data['date'] if d.year % 10 == 0 and d.month == 1]
    x_tick_labels = [d.strftime("%Y-%m-%d") for d in x_tick_positions]

    fig, ax = plt.subplots(figsize=(16,4))

    plt.plot(x_values,y_values)

    plt.title("Mauna Loa CO2", fontsize=12)

    plt.xlabel('Date', fontsize=11, labelpad=20)
    plt.ylabel('CO2 (ppm)', fontsize=11, labelpad=20)

    plt.xticks(x_tick_positions, x_tick_labels, rotation=90, fontsize=11)

    plt.savefig('./outputs/time_series_08.png', dpi=100, bbox_inches='tight')
    plt.show()

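The same decade ticks can also be produced without building the lists by hand. The following sketch uses the locator and formatter classes from matplotlib.dates on a synthetic monthly series; the data and variable names are placeholders rather than the tutorial's DataFrame.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic monthly series covering roughly the same span as the CO₂ record
dates = pd.date_range("1958-03-01", "2024-11-01", freq="MS")
values = np.linspace(315.0, 425.0, len(dates))

fig, ax = plt.subplots(figsize=(16, 4))
ax.plot(dates, values)

# Place a major tick every 10 years and label it with the four-digit year
ax.xaxis.set_major_locator(mdates.YearLocator(base=10))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
ax.tick_params(axis="x", labelrotation=90, labelsize=11)

fig.canvas.draw()  # render once so the tick positions are computed
```

The locator recomputes the ticks whenever the x-limits change, which is convenient when zooming into a sub-period; the explicit lists above give full control over exactly which dates are labeled.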
Customizing Matplotlib Axis Colors
Customizing axis colors in Matplotlib can draw attention to particular parts of a plot. You can recolor the axis spines, tick marks, and labels; the example below colors the x-axis tick labels red.

    fig, ax = plt.subplots(figsize=(16,4))

    plt.plot(x_values,y_values)

    plt.title("Mauna Loa CO2", fontsize=12)

    plt.xlabel('Date', fontsize=11, labelpad=20)
    plt.ylabel('CO2 (ppm)', fontsize=11, labelpad=20)

    # color applies to the tick label text
    plt.xticks(x_tick_positions, x_tick_labels, rotation=90, fontsize=11, color='red')

    plt.savefig('./outputs/time_series_09.png', dpi=100, bbox_inches='tight')
    plt.show()

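The tick-label color is only one of the elements an axis is made of. As a rough sketch using Matplotlib's object-oriented interface, the spine, tick marks, and axis title of the x-axis can be recolored as well. The figure here is a bare placeholder with dummy data.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(16, 4))
ax.plot([0, 1, 2], [0, 1, 4])  # dummy values, only there to have something to draw
ax.set_xlabel("Date", fontsize=11, labelpad=20)

# Color the bottom spine, the x tick marks and tick labels, and the x-axis title
ax.spines["bottom"].set_color("red")
ax.tick_params(axis="x", colors="red")
ax.xaxis.label.set_color("red")
```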
Increasing Padding Between Tick Labels and Axis

    fig, ax = plt.subplots(figsize=(16,4))

    plt.plot(x_values,y_values)

    plt.title("Mauna Loa CO2", fontsize=12)

    plt.xlabel('Date', fontsize=11, labelpad=20)
    plt.ylabel('CO2 (ppm)', fontsize=11, labelpad=20)

    plt.xticks(x_tick_positions, x_tick_labels, rotation=90, fontsize=11, color='red')

    # pad is the distance, in points, between the tick marks and their labels
    ax.tick_params(axis='x', which='major', pad=15)

    plt.savefig('./outputs/time_series_10.png', dpi=100, bbox_inches='tight')
    plt.show()

Using Seaborn to Enhance the Plot
Seaborn is a powerful Python visualization library built on top of Matplotlib. It provides high-level interfaces for creating attractive and informative statistical graphics with better aesthetics and functionality compared to traditional Matplotlib plots.
By using Seaborn's built-in styles, color palettes, and improved plotting techniques, you can create visually appealing plots with minimal customization. Seaborn's integration with Pandas also makes it easy to work with data frames directly.

    import seaborn as sns

    sns.set()  # Apply seaborn's default theme to all subsequent plots

    fig, ax = plt.subplots(figsize=(16,4))

    plt.plot(x_values,y_values)

    plt.title("Mauna Loa CO2", fontsize=12)

    plt.xlabel('Date', fontsize=11, labelpad=20)
    plt.ylabel('CO2 (ppm)', fontsize=11, labelpad=20)

    plt.savefig('./outputs/time_series_11.png', dpi=100, bbox_inches='tight')
    plt.show()

Advanced Visualization: Dual Axes Plot
Visualizing Data with Two Y-Axes
In this advanced plot, we combine two distinct datasets, atmospheric CO₂ concentrations and global temperature anomalies, on the same figure using dual y-axes. This technique lets us show two data series with different units or scales on a single figure. We use Matplotlib’s twinx() method, which creates a second y-axis sharing the same x-axis, enabling a more intuitive comparison of related data. The temperature anomalies are read from a separate file (GLB.Ts+dSST.csv in the code below).

    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np

    # Load CO₂ data
    co2_data = pd.read_csv(
        "./inputs/co2_mm_mlo.csv",  # Replace with your file path
        skiprows=40,
        na_values=["-99.99"])

    co2_data = co2_data.dropna()
    co2_data['date'] = pd.to_datetime(co2_data[['year', 'month']].assign(day=1))

    # Load temperature anomaly data
    temp_data = pd.read_csv(
        "./inputs/GLB.Ts+dSST.csv",  # Replace with your file path
        skiprows=1
    )

    temp_data = temp_data.rename(columns={"Year": "year", "J-D": "anomaly"})
    # .copy() so that adding columns below modifies an independent DataFrame
    temp_data = temp_data[['year', 'anomaly']].copy()
    temp_data['date'] = pd.to_datetime(temp_data['year'].astype(str) + '-07-01')  # Mid-year for annual data

    # Restrict both series to their common time span
    start_date = max(co2_data['date'].min(), temp_data['date'].min())
    end_date = min(co2_data['date'].max(), temp_data['date'].max())

    co2_filtered = co2_data[(co2_data['date'] >= start_date) & (co2_data['date'] <= end_date)]
    temp_filtered = temp_data[(temp_data['date'] >= start_date) & (temp_data['date'] <= end_date)].copy()

    # Ensure 'anomaly' column is numeric
    temp_filtered['anomaly'] = pd.to_numeric(temp_filtered['anomaly'], errors='coerce')

    # Drop rows with NaN values in 'anomaly' after conversion
    temp_filtered = temp_filtered.dropna(subset=['anomaly'])

    # Separate data into positive and negative anomalies
    positive_anomalies = temp_filtered[temp_filtered['anomaly'] > 0]
    negative_anomalies = temp_filtered[temp_filtered['anomaly'] <= 0]

    # Plotting
    fig, ax1 = plt.subplots(figsize=(10, 6))

    # Plot CO₂ against the fractional year so the twelve monthly values of a
    # year are spread along the x-axis instead of stacked at one integer year
    ax1.plot(co2_filtered['decimal date'], co2_filtered['average'], 'g-', label='CO2 (ppm)')
    ax1.set_xlabel('Year')
    ax1.set_ylabel('CO2 Concentration (ppm)', color='g')
    ax1.tick_params(axis='y', colors='g')
    ax1.legend(loc='upper left')

    # Add a second axis for temperature anomalies
    ax2 = ax1.twinx()

    # Plot bars with different colors
    ax2.bar(positive_anomalies['year'], positive_anomalies['anomaly'], color='r', alpha=0.7, label='Positive Anomaly')
    ax2.bar(negative_anomalies['year'], negative_anomalies['anomaly'], color='b', alpha=0.7, label='Negative Anomaly')

    # Set secondary y-axis label and formatting
    ax2.set_ylabel('Temperature Anomaly (°C)', color='k')
    ax2.tick_params(axis='y', colors='k')
    ax2.legend(loc='upper right')

    # Add title and grid
    plt.title('Atmospheric CO2 and Global Temperature Anomalies')
    plt.grid()
    plt.tight_layout()

    plt.savefig('./outputs/time_series_12.png', dpi=100, bbox_inches='tight')
    plt.show()

The twinx() function allows the creation of two y-axes: one for CO2 concentrations (ppm) and another for temperature anomalies (°C). This is particularly useful when comparing two data sets with different units and scales.
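Each axes object keeps its own legend entries, which is why the example above calls legend() twice. A small sketch with synthetic numbers shows the twinx() pattern on its own and merges both sets of entries into one legend; all values and names are illustrative, not measurements.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic annual values; not real measurements
years = np.arange(2000, 2010)
co2_ppm = np.linspace(370.0, 388.0, len(years))
anomaly_c = np.linspace(-0.2, 0.6, len(years))

fig, ax1 = plt.subplots(figsize=(10, 6))
ax1.plot(years, co2_ppm, "g-", label="CO2 (ppm)")
ax1.set_ylabel("CO2 (ppm)", color="g")

# Second y-axis that shares the x-axis with ax1
ax2 = ax1.twinx()
ax2.bar(years, anomaly_c, color="r", alpha=0.7, label="Anomaly (°C)")
ax2.set_ylabel("Anomaly (°C)")

# Collect the legend entries from both axes and draw a single legend
handles1, labels1 = ax1.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
ax2.legend(handles1 + handles2, labels1 + labels2, loc="upper left")
```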
Visualizing Data with Two X-Axes
A dual x-axis plot is useful when comparing two datasets with different time intervals. For example, you might have observations from different sensors that record data at different frequencies. This visualization technique allows you to plot both datasets on the same graph with separate x-axes, providing a clear comparison.

    fig, ax = plt.subplots(figsize=(16,4))

    #-------------------------------------------------------------#
    # Primary (bottom) x-axis: one tick per decade

    x_tick_positions = [d for d in co2_data['date'] if d.year % 10 == 0 and d.month == 1]
    x_tick_labels = [d.strftime("%Y-%m-%d") for d in x_tick_positions]

    plt.plot(x_values,y_values)

    plt.xlabel('Obs 1', fontsize=11, labelpad=20)
    plt.ylabel('Values', fontsize=11, labelpad=20)

    plt.xticks(x_tick_positions, x_tick_labels, rotation=90, fontsize=11)

    #-------------------------------------------------------------#
    # Secondary (top) x-axis: one tick every five years

    x_tick_positions = [d for d in co2_data['date'] if d.year % 5 == 0 and d.month == 1]
    x_tick_labels = [d.strftime("%Y-%m-%d") for d in x_tick_positions]

    # Both axes use the same coordinates, so the conversion is the identity
    def identity_func(x):
        return x

    ax_secondary = ax.secondary_xaxis('top', functions=(identity_func, identity_func))

    ax_secondary.tick_params(axis='x', colors='black', labelsize=11)
    ax_secondary.set_xlabel('Obs 2', color='black', fontsize=11, labelpad=20)

    ax_secondary.set_xticks(x_tick_positions)
    ax_secondary.set_xticklabels(x_tick_labels, rotation=90, fontsize=11)

    #-------------------------------------------------------------#

    plt.savefig('./outputs/time_series_13.png', dpi=100, bbox_inches='tight')
    plt.show()

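The identity functions above make the top axis a copy of the bottom one with different ticks. The functions argument of secondary_xaxis() takes a forward and an inverse conversion, so the second axis can also show the same positions in another unit. The sketch below, with made-up data and a hypothetical reference year of 1958, labels the top axis in years elapsed since that year.

```python
import matplotlib.pyplot as plt
import numpy as np

BASE_YEAR = 1958  # hypothetical reference year for the top axis

def year_to_elapsed(year):
    """Forward conversion: calendar year -> years since BASE_YEAR."""
    return year - BASE_YEAR

def elapsed_to_year(elapsed):
    """Inverse conversion: years since BASE_YEAR -> calendar year."""
    return elapsed + BASE_YEAR

# Synthetic series; not real measurements
years = np.arange(1958, 2025)
values = np.linspace(315.0, 425.0, len(years))

fig, ax = plt.subplots(figsize=(16, 4))
ax.plot(years, values)
ax.set_xlabel("Year")

# Top axis shows the same positions converted by the forward function
ax_top = ax.secondary_xaxis("top", functions=(year_to_elapsed, elapsed_to_year))
ax_top.set_xlabel("Years since 1958")

fig.canvas.draw()  # render once so the secondary axis syncs its limits
```

The two functions must be exact inverses of each other; otherwise the tick positions on the two axes will not line up.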
References
Links | Site |
---|---|
pandas.DataFrame.set_index | pandas.pydata.org |
pandas.DataFrame.index | pandas.pydata.org |
python datetime | www.w3schools.com |
CO2 NOAA Dataset | gml.noaa.gov |