Introduction
MODIS is a key instrument on NASA's Terra and Aqua satellites that measures large-scale global dynamics, including cloud cover, radiation, and other atmospheric data. These data are stored in HDF4 format, which is slightly different from the more common HDF5 format.
This tutorial will guide you through the process of using the pyhdf library in Python to read a MODIS (Moderate Resolution Imaging Spectroradiometer) HDF (Hierarchical Data Format) file. Specifically, we will be working with a MODIS Level 2 file, which contains scientific datasets (SDS) related to cloud properties.
Requirements
Ensure you have the following packages installed in your Python environment:
- pyhdf: To read HDF4 files (Install via pip install pyhdf or conda install -c anaconda pyhdf)
- numpy: For array manipulation (Install via pip install numpy)
Download the MODIS HDF file we’ll be working with from this link.
Additionally, you can download a Jupyter notebook version of this guide from here.
Reading a MODIS HDF4 File
Once the file has been downloaded, start by importing the necessary modules and opening the HDF file:
1 2 3 4 5 | from pyhdf.SD import SD, SDC import numpy as np file_name = './inputs/MYD06_L2.A2007219.2010.006.2014053202546.hdf' file = SD(file_name, SDC.READ) |
The SD
class is used to open the file for reading. The SDC.READ
flag opens it in read-only mode.
To check the general information about the file, such as how many Scientific Datasets (SDS) it contains, use:
1 | file.info() |
You should see that the file contains 127 SDS. Each SDS represents a different scientific measurement, such as cloud top temperature or cloud mask.
Printing Scientific Datasets (SDS)
To list all the SDS names, we can loop through the file’s datasets()
method:
1 2 3 4 | datasets_dic = file.datasets() for idx, sds in enumerate(datasets_dic.keys()): print(idx, sds) |
The output will display all the SDS in the file, such as:
1 2 3 4 5 6 | 0 Cloud_Top_Height_Nadir_Night 1 Single_Scatter_Albedo_Ice 2 Radiance_Variance 3 Brightness_Temperature ... 126 Cloud_Effective_Radius_Uncertainty_1621 |
Accessing Specific SDS Data
Once we have identified the SDS we want to work with, we can extract its data. For example, to retrieve data from the Cloud Top Temperature at 1km SDS, use:
1 2 3 | sds_obj = file.select('cloud_top_temperature_1km') # Select SDS data = sds_obj.get() # Get data from SDS print(data) |
This will output the raw data as an array:
1 2 3 4 | [[11758 11565 10858 ... 12100 12299 11902] [11758 11565 10667 ... 12100 12299 12100] [11758 11565 10858 ... 11902 12299 12100] ...] |
Interpreting SDS Attributes
Each SDS contains metadata, such as units and scaling factors, that help interpret the raw data. To access the attributes for the cloud_top_temperature_1km SDS, use:
1 2 | import pprint pprint.pprint(sds_obj.attributes()) |
The output will look something like this:
1 2 3 4 5 6 | {'_FillValue': -999, 'add_offset': -15000.0, 'long_name': 'Cloud Top Temperature at 1-km resolution from LEOCAT, ' 'scale_factor': 0.009999999776482582, 'units': 'K', 'valid_range': [0, 20000]} |
The add_offset
and scale_factor
attributes are critical for converting the raw data to meaningful values. To apply these to the data:
1 2 3 4 5 6 | add_offset = sds_obj.attributes()['add_offset'] scale_factor = sds_obj.attributes()['scale_factor'] # Reformat the data data = (data - add_offset) * scale_factor print(data) |
This will convert the raw integer values to actual temperatures in Kelvin (K).
Extracting Complex Data (e.g., Cloud Mask)
Some SDS store multiple pieces of information in a single data array. For instance, the Cloud_Mask_1km SDS stores several flags in its bit-packed format.
1 2 3 | sds_obj = file.select('Cloud_Mask_1km') data = sds_obj.get() print(data.shape) # (2030, 1354, 2) |
To extract specific flags, such as the Cloud Mask Status Flag, Cloud Mask Cloudiness Flag, and Day/Night Flag, use a bit-stripping function:
1 2 3 4 5 6 7 8 9 10 11 12 13 | def bits_stripping(bit_start, bit_count, value): bitmask = pow(2, bit_start + bit_count) - 1 return np.right_shift(np.bitwise_and(value, bitmask), bit_start) i, j = 576, 236 # Random pixel status_flag = bits_stripping(0, 1, data[i, j, 0]) day_flag = bits_stripping(3, 1, data[i, j, 0]) cloud_mask_flag = bits_stripping(1, 2, data[i, j, 0]) print('Cloud Mask determined:', status_flag) print('Day/Night Flag:', day_flag) print('Cloud Mask:', cloud_mask_flag) |
This will decode the information packed into the byte.
Complete Python Code (Python 3)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 | #!/usr/bin/env python from pyhdf.SD import SD, SDC import numpy as np import pprint file_name = 'MYD06_L2.A2007219.2010.006.2014053202546.hdf' file = SD(file_name, SDC.READ) # Print basic info print(file.info()) # List SDS names datasets_dic = file.datasets() for idx, sds in enumerate(datasets_dic.keys()): print(idx, sds) # Access and process Cloud Top Temperature SDS sds_obj = file.select('cloud_top_temperature_1km') data = sds_obj.get() # Extract and apply scale_factor and add_offset attrs = sds_obj.attributes() add_offset = attrs['add_offset'] scale_factor = attrs['scale_factor'] data = (data - add_offset) * scale_factor print(data) # Cloud Mask example sds_obj = file.select('Cloud_Mask_1km') data = sds_obj.get() print(data.shape) # Extract bit-packed information def bits_stripping(bit_start, bit_count, value): bitmask = pow(2, bit_start + bit_count) - 1 return np.right_shift(np.bitwise_and(value, bitmask), bit_start) i, j = 576, 236 status_flag = bits_stripping(0, 1, data[i, j, 0]) day_flag = bits_stripping(3, 1, data[i, j, 0]) cloud_mask_flag = bits_stripping(1, 2, data[i, j, 0]) print('Cloud Mask determined:', status_flag) print('Day/Night Flag:', day_flag) print('Cloud Mask:', cloud_mask_flag) |
Python 2.7 Compatibility
If you are using Python 2.7, the code is nearly identical. However, you will need to adjust for Python 2 print statements and iterating over dictionaries:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 | #!/usr/bin/env python # Importing the required modules from pyhdf to read HDF4 files from pyhdf.SD import SD, SDC # Specify the name of the HDF file to read (ensure the file path is correct) file_name = 'MYD06_L2.A2007219.2010.006.2014053202546.hdf' # Open the HDF file for reading (SDC.READ opens the file in read-only mode) file = SD(file_name, SDC.READ) # Print out the general information of the file including number of datasets, dimensions, etc. print file.info() # ----------------------------------------------------------------------------------------# # Print names of all Scientific Data Sets (SDS) within the file # Get the dictionary of datasets in the file (each dataset has a name and index) datasets_dic = file.datasets() # Loop through the dictionary and print the index and name of each SDS for idx, sds in enumerate(datasets_dic.keys()): print idx, sds # ----------------------------------------------------------------------------------------# # Retrieve and print data for a specific SDS ('cloud_top_temperature_1km') # Select the SDS for 'cloud_top_temperature_1km' by name (assuming it exists in the file) sds_obj = file.select('cloud_top_temperature_1km') # Get the actual data stored in the selected SDS (this is typically a NumPy array) data = sds_obj.get() # Print the raw data from the selected SDS print data # ----------------------------------------------------------------------------------------# # Retrieve and print attributes for the selected SDS # Importing pprint to format output of attributes import pprint # Pretty print the attributes of the 'cloud_top_temperature_1km' SDS pprint.pprint(sds_obj.attributes()) # Extract specific attributes like 'add_offset' and 'scale_factor' which are required for data scaling for key, value in sds_obj.attributes().iteritems(): print key, value if key == 'add_offset': add_offset = value # Offset used to shift data values if key == 'scale_factor': scale_factor = value # Scale factor used to rescale the data # Print the extracted 'add_offset' and 'scale_factor' for verification print 'add_offset:', add_offset, type(add_offset) print 'scale_factor:', scale_factor, type(scale_factor) # Apply the scale factor and offset to the raw data to convert it to its correct units data = (data - add_offset) * scale_factor # Print the scaled data (this is the final cloud top temperature in Kelvin) print data # ----------------------------------------------------------------------------------------# # Example with the 'Cloud_Mask_1km' SDS to extract specific information # Select the SDS for 'Cloud_Mask_1km' by name sds_obj = file.select('Cloud_Mask_1km') # Get the data from the 'Cloud_Mask_1km' SDS data = sds_obj.get() # Pretty print the attributes of 'Cloud_Mask_1km' to understand the structure of the data pprint.pprint(sds_obj.attributes()) # Print the shape of the data array (this tells you the dimensions of the dataset) print data.shape # ----------------------------------------------------------------------------------------# # Extracting specific bits of information from the 'Cloud_Mask_1km' data # Import NumPy to perform bitwise operations import numpy as np # Function to strip specific bits from the data to retrieve flags def bits_stripping(bit_start, bit_count, value): bitmask = pow(2, bit_start + bit_count) - 1 # Create a bitmask for the desired bits return np.right_shift(np.bitwise_and(value, bitmask), bit_start) # Strip the bits and shift # Define the indices for a random pixel in the data i = 576 # Row index j = 236 # Column index # Extract 'Cloud Mask Status Flag' (stored in bit 0) status_flag = bits_stripping(0, 1, data[i, j, 0]) # Extract 'Day/Night Flag' (stored in bit 3) day_flag = bits_stripping(3, 1, data[i, j, 0]) # Extract 'Cloud Mask Cloudiness Flag' (stored in bits 1-2) cloud_mask_flag = bits_stripping(1, 2, data[i, j, 0]) # Print the extracted flags with descriptions print 'Cloud Mask determined: ', status_flag print 'Cloud Mask day or night: ', day_flag print 'Cloud Mask: ', cloud_mask_flag |
References
Link | WebSite |
---|---|
MODIS C6 cloud optical products user guide | NASA |
pyhdf | sourceforge |