How to Read a MODIS HDF File Using Python and pyhdf ?

Published: June 02, 2016

Updated: September 21, 2024

DMCA.com Protection Status

Introduction

MODIS is a key instrument on NASA's Terra and Aqua satellites that measures large-scale global dynamics, including cloud cover, radiation, and other atmospheric data. These data are stored in HDF4 format, which is slightly different from the more common HDF5 format.

This tutorial will guide you through the process of using the pyhdf library in Python to read a MODIS (Moderate Resolution Imaging Spectroradiometer) HDF (Hierarchical Data Format) file. Specifically, we will be working with a MODIS Level 2 file, which contains scientific datasets (SDS) related to cloud properties.

Requirements

Ensure you have the following packages installed in your Python environment:

  • pyhdf: To read HDF4 files (Install via pip install pyhdf or conda install -c anaconda pyhdf)
  • numpy: For array manipulation (Install via pip install numpy)

Download the MODIS HDF file we’ll be working with from this link.

Additionally, you can download a Jupyter notebook version of this guide from here.

Reading a MODIS HDF4 File

Once the file has been downloaded, start by importing the necessary modules and opening the HDF file:

1
2
3
4
5
from pyhdf.SD import SD, SDC
import numpy as np

file_name = './inputs/MYD06_L2.A2007219.2010.006.2014053202546.hdf'
file = SD(file_name, SDC.READ)

The SD class is used to open the file for reading. The SDC.READ flag opens it in read-only mode.

To check the general information about the file, such as how many Scientific Datasets (SDS) it contains, use:

1
file.info()

You should see that the file contains 127 SDS. Each SDS represents a different scientific measurement, such as cloud top temperature or cloud mask.

Printing Scientific Datasets (SDS)

To list all the SDS names, we can loop through the file’s datasets() method:

1
2
3
4
datasets_dic = file.datasets()

for idx, sds in enumerate(datasets_dic.keys()):
    print(idx, sds)

The output will display all the SDS in the file, such as:

1
2
3
4
5
6
0 Cloud_Top_Height_Nadir_Night
1 Single_Scatter_Albedo_Ice
2 Radiance_Variance
3 Brightness_Temperature
...
126 Cloud_Effective_Radius_Uncertainty_1621

Accessing Specific SDS Data

Once we have identified the SDS we want to work with, we can extract its data. For example, to retrieve data from the Cloud Top Temperature at 1km SDS, use:

1
2
3
sds_obj = file.select('cloud_top_temperature_1km')  # Select SDS
data = sds_obj.get()  # Get data from SDS
print(data)

This will output the raw data as an array:

1
2
3
4
[[11758 11565 10858 ... 12100 12299 11902]
 [11758 11565 10667 ... 12100 12299 12100]
 [11758 11565 10858 ... 11902 12299 12100]
 ...]

Interpreting SDS Attributes

Each SDS contains metadata, such as units and scaling factors, that help interpret the raw data. To access the attributes for the cloud_top_temperature_1km SDS, use:

1
2
import pprint
pprint.pprint(sds_obj.attributes())

The output will look something like this:

1
2
3
4
5
6
{'_FillValue': -999,
 'add_offset': -15000.0,
 'long_name': 'Cloud Top Temperature at 1-km resolution from LEOCAT, '
 'scale_factor': 0.009999999776482582,
 'units': 'K',
 'valid_range': [0, 20000]}

The add_offset and scale_factor attributes are critical for converting the raw data to meaningful values. To apply these to the data:

1
2
3
4
5
6
add_offset = sds_obj.attributes()['add_offset']
scale_factor = sds_obj.attributes()['scale_factor']

# Reformat the data
data = (data - add_offset) * scale_factor
print(data)

This will convert the raw integer values to actual temperatures in Kelvin (K).

Extracting Complex Data (e.g., Cloud Mask)

Some SDS store multiple pieces of information in a single data array. For instance, the Cloud_Mask_1km SDS stores several flags in its bit-packed format.

1
2
3
sds_obj = file.select('Cloud_Mask_1km')
data = sds_obj.get()
print(data.shape)  # (2030, 1354, 2)

To extract specific flags, such as the Cloud Mask Status Flag, Cloud Mask Cloudiness Flag, and Day/Night Flag, use a bit-stripping function:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
def bits_stripping(bit_start, bit_count, value):
    bitmask = pow(2, bit_start + bit_count) - 1
    return np.right_shift(np.bitwise_and(value, bitmask), bit_start)

i, j = 576, 236  # Random pixel

status_flag = bits_stripping(0, 1, data[i, j, 0]) 
day_flag = bits_stripping(3, 1, data[i, j, 0]) 
cloud_mask_flag = bits_stripping(1, 2, data[i, j, 0])

print('Cloud Mask determined:', status_flag)
print('Day/Night Flag:', day_flag)
print('Cloud Mask:', cloud_mask_flag)

This will decode the information packed into the byte.

Complete Python Code (Python 3)

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
#!/usr/bin/env python

from pyhdf.SD import SD, SDC
import numpy as np
import pprint

file_name = 'MYD06_L2.A2007219.2010.006.2014053202546.hdf'
file = SD(file_name, SDC.READ)

# Print basic info
print(file.info())

# List SDS names
datasets_dic = file.datasets()
for idx, sds in enumerate(datasets_dic.keys()):
    print(idx, sds)

# Access and process Cloud Top Temperature SDS
sds_obj = file.select('cloud_top_temperature_1km')
data = sds_obj.get()

# Extract and apply scale_factor and add_offset
attrs = sds_obj.attributes()
add_offset = attrs['add_offset']
scale_factor = attrs['scale_factor']
data = (data - add_offset) * scale_factor
print(data)

# Cloud Mask example
sds_obj = file.select('Cloud_Mask_1km')
data = sds_obj.get()
print(data.shape)

# Extract bit-packed information
def bits_stripping(bit_start, bit_count, value):
    bitmask = pow(2, bit_start + bit_count) - 1
    return np.right_shift(np.bitwise_and(value, bitmask), bit_start)

i, j = 576, 236
status_flag = bits_stripping(0, 1, data[i, j, 0])
day_flag = bits_stripping(3, 1, data[i, j, 0])
cloud_mask_flag = bits_stripping(1, 2, data[i, j, 0])

print('Cloud Mask determined:', status_flag)
print('Day/Night Flag:', day_flag)
print('Cloud Mask:', cloud_mask_flag)

Python 2.7 Compatibility

If you are using Python 2.7, the code is nearly identical. However, you will need to adjust for Python 2 print statements and iterating over dictionaries:

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
#!/usr/bin/env python

# Importing the required modules from pyhdf to read HDF4 files
from pyhdf.SD import SD, SDC

# Specify the name of the HDF file to read (ensure the file path is correct)
file_name = 'MYD06_L2.A2007219.2010.006.2014053202546.hdf'

# Open the HDF file for reading (SDC.READ opens the file in read-only mode)
file = SD(file_name, SDC.READ)

# Print out the general information of the file including number of datasets, dimensions, etc.
print file.info()

# ----------------------------------------------------------------------------------------#
# Print names of all Scientific Data Sets (SDS) within the file

# Get the dictionary of datasets in the file (each dataset has a name and index)
datasets_dic = file.datasets()

# Loop through the dictionary and print the index and name of each SDS
for idx, sds in enumerate(datasets_dic.keys()):
    print idx, sds

# ----------------------------------------------------------------------------------------#
# Retrieve and print data for a specific SDS ('cloud_top_temperature_1km')

# Select the SDS for 'cloud_top_temperature_1km' by name (assuming it exists in the file)
sds_obj = file.select('cloud_top_temperature_1km')

# Get the actual data stored in the selected SDS (this is typically a NumPy array)
data = sds_obj.get()

# Print the raw data from the selected SDS
print data

# ----------------------------------------------------------------------------------------#
# Retrieve and print attributes for the selected SDS

# Importing pprint to format output of attributes
import pprint

# Pretty print the attributes of the 'cloud_top_temperature_1km' SDS
pprint.pprint(sds_obj.attributes())

# Extract specific attributes like 'add_offset' and 'scale_factor' which are required for data scaling
for key, value in sds_obj.attributes().iteritems():
    print key, value
    if key == 'add_offset':
        add_offset = value    # Offset used to shift data values
    if key == 'scale_factor':
        scale_factor = value  # Scale factor used to rescale the data

# Print the extracted 'add_offset' and 'scale_factor' for verification
print 'add_offset:', add_offset, type(add_offset)
print 'scale_factor:', scale_factor, type(scale_factor)

# Apply the scale factor and offset to the raw data to convert it to its correct units
data = (data - add_offset) * scale_factor

# Print the scaled data (this is the final cloud top temperature in Kelvin)
print data

# ----------------------------------------------------------------------------------------#
# Example with the 'Cloud_Mask_1km' SDS to extract specific information

# Select the SDS for 'Cloud_Mask_1km' by name
sds_obj = file.select('Cloud_Mask_1km')

# Get the data from the 'Cloud_Mask_1km' SDS
data = sds_obj.get()

# Pretty print the attributes of 'Cloud_Mask_1km' to understand the structure of the data
pprint.pprint(sds_obj.attributes())

# Print the shape of the data array (this tells you the dimensions of the dataset)
print data.shape

# ----------------------------------------------------------------------------------------#
# Extracting specific bits of information from the 'Cloud_Mask_1km' data

# Import NumPy to perform bitwise operations
import numpy as np

# Function to strip specific bits from the data to retrieve flags
def bits_stripping(bit_start, bit_count, value):
    bitmask = pow(2, bit_start + bit_count) - 1  # Create a bitmask for the desired bits
    return np.right_shift(np.bitwise_and(value, bitmask), bit_start)  # Strip the bits and shift

# Define the indices for a random pixel in the data
i = 576  # Row index
j = 236  # Column index

# Extract 'Cloud Mask Status Flag' (stored in bit 0)
status_flag = bits_stripping(0, 1, data[i, j, 0])

# Extract 'Day/Night Flag' (stored in bit 3)
day_flag = bits_stripping(3, 1, data[i, j, 0])

# Extract 'Cloud Mask Cloudiness Flag' (stored in bits 1-2)
cloud_mask_flag = bits_stripping(1, 2, data[i, j, 0])

# Print the extracted flags with descriptions
print 'Cloud Mask determined: ', status_flag
print 'Cloud Mask day or night: ', day_flag
print 'Cloud Mask: ', cloud_mask_flag

References

Link WebSite
MODIS C6 cloud optical products user guide NASA
pyhdf sourceforge