How to read a MODIS HDF4 file using python and pyhdf ?


Here is a basic guide on how to use the pyhdf library with python to read a MODIS HDF file. You can also use this guide) to automatically download a MODIS granule onto your computer's directory.

Please note that the tutorial provided is intended for use with Python 3. The hdf file used for this tutorial can be downloaded here: https://drive.google.com/file/d/1KvClX1rShqujFuK1vjOBHQrmKiQslb7S/view?usp=sharing

I also create a jupyter notebook that you can be downloaded here

Once your file has been downloaded, you can use the python library pyhdf to read the MODIS HDF4 file. First, install pyhdf via pip or conda with the command "pip install pyhdf" or "conda install -c anaconda pyhdf". Next, import what we need from the library to read the file. We will need SD and SDS for this task.

from pyhdf.SD import SD, SDC

import numpy as np

file_name = './inputs/MYD06_L2.A2007219.2010.006.2014053202546.hdf'

To read the MODIS data from the HDF4 file, we use the code "hdf = SD(file_name)". This opens a handle to our HDF4 file and allows us to access its data.

file = SD(file_name, SDC.READ)

to determine that the file contains 127 Scientific Datasets (SDS). To obtain the names of all SDS, refer to the code given in the output.

Then, use the command "print( file.info() )"

file.info()

to determine that the file contains 127 Scientific Datasets (SDS).

Printing all SDS names ?

datasets_dic = file.datasets()

for idx,sds in enumerate(datasets_dic.keys()):
    print( idx,sds )

Output

0 Cloud_Top_Height_Nadir_Night
1 Single_Scatter_Albedo_Ice
2 Radiance_Variance
3 Brightness_Temperature
4 Sensor_Azimuth
. .
. .
. .
123 Cloud_Top_Height
124 Cloud_Effective_Emissivity_Nadir
125 Cloud_Effective_Emissivity
126 Cloud_Effective_Radius_Uncertainty_1621

Get data for a given SDS

Getting data stored in a particular SDS, like the cloud_top_temperature_1km, would be much easier since we have the exact name of all SDS:

sds_obj = file.select('cloud_top_temperature_1km') # select sds

data = sds_obj.get() # get sds data

print( data )

Output

[[11758 11565 10858 ... 12100 12299 11902]
 [11758 11565 10667 ... 12100 12299 12100]
 [11758 11565 10858 ... 11902 12299 12100]
 ...
 [11820 11820 11820 ... 12475 12633 12633]
 [11643 11820 11820 ... 12633 12633 12781]
 [11496 11820 11820 ... 12633 12633 12781]]

Obtaining SDS attributes

To obtain the cloud top temperature at 1km, you need to reformat the data using the information stored in the sds attributes. To access the attributes of the cloud_top_temperature_1km SDS, please use the following code:

 import pprint

 pprint.pprint( sds_obj.attributes() )

Output

 {'Cell_Across_Swath_Sampling': [1, 1354, 1],
  'Cell_Along_Swath_Sampling': [1, 2030, 1],
  'Geolocation_Pointer': 'External MODIS geolocation product',
  'Parameter_Type': 'Output',
  '_FillValue': -999,
  'add_offset': -15000.0,
  'long_name': 'Cloud Top Temperature at 1-km resolution from LEOCAT, '
               'Temperature from Ancillary Data at Retrieved Cloud Top Pressure '
               'Level',
  'scale_factor': 0.009999999776482582,
  'units': 'K',
  'valid_range': [0, 20000]}

The cloud_top_temperature_1km SDS has two crucial attributes, 'add_offset' and 'scale_factor', which can be extracted.

  for key, value in sds_obj.attributes().items():
      print( key, value )
      if key == 'add_offset':
          add_offset = value  
      if key == 'scale_factor':
          scale_factor = value

Output

  valid_range [0, 20000]
  _FillValue -999
  long_name Cloud Top Temperature at 1-km resolution from LEOCAT, Temperature from Ancillary Data at Retrieved Cloud Top Pressure Level
  units K
  scale_factor 0.009999999776482582
  add_offset -15000.0
  Parameter_Type Output
  Cell_Along_Swath_Sampling [1, 2030, 1]
  Cell_Across_Swath_Sampling [1, 1354, 1]
  Geolocation_Pointer External MODIS geolocation product

to then transform the data and get the cloud top temperature at 1km spatial resolution at nadir

  data = (data - add_offset) * scale_factor

  data

Output

array([[267.57999402, 265.64999406, 258.57999422, ..., 270.99999394,
        272.9899939 , 269.01999399],
       [267.57999402, 265.64999406, 256.66999426, ..., 270.99999394,
        272.9899939 , 270.99999394],
       [267.57999402, 265.64999406, 258.57999422, ..., 269.01999399,
        272.9899939 , 270.99999394],
       ...,
       [268.19999401, 268.19999401, 268.19999401, ..., 274.74999386,
        276.32999382, 276.32999382],
       [266.42999404, 268.19999401, 268.19999401, ..., 276.32999382,
        276.32999382, 277.80999379],
       [264.95999408, 268.19999401, 268.19999401, ..., 276.32999382,
        276.32999382, 277.80999379]])

Extracting information stored in a byte such as the cloud mask

The "Cloud_Mask_1km" SDS is an example of a complex case in which multiple pieces of information are stored together. More information can be found on page 115 of the MODIS C6 cloud optical products User Guide.

sds_obj = file.select('Cloud_Mask_1km') # select sds

data = sds_obj.get()

data.shape

Output

(2030, 1354, 2)

We need to extract the 'Cloud Mask Status Flag', 'Cloud Mask Cloudiness Flag', and 'Day/Night Flag' information from the Cloud_Mask_1km data.

def bits_stripping(bit_start,bit_count,value):
    bitmask=pow(2,bit_start+bit_count)-1
    return np.right_shift(np.bitwise_and(value,bitmask),bit_start)

i = 576 # random pixel
j = 236

status_flag = bits_stripping(0,1,data[i,j,0]) 
day_flag = bits_stripping(3,1,data[i,j,0]) 
cloud_mask_flag = bits_stripping(1,2,data[i,j,0])

print( 'Cloud Mask determined: ',  status_flag )
print( 'Cloud Mask day or night: ',  day_flag )
print( 'Cloud Mask: ',  cloud_mask_flag )

Output

Cloud Mask determined:  1
Cloud Mask day or night:  1
Cloud Mask:  0

The message use a function called bits_stripping to extract information from a byte. The inputs required are the position of the first bit and the number of bits used. It is important to note that Python uses 0-based indexing for bits, which means that the indexes range from 0 to 7 for one byte. An example is given for how to extract the "day_flag": the first bit is at position 3 and only 1 bit is used, so the code would be "bits_stripping(3,1,data[i,j,0]".

Code written in python 3:

#!/usr/bin/env python

from pyhdf.SD import SD, SDC

file_name = 'MYD06_L2.A2007219.2010.006.2014053202546.hdf'

file = SD(file_name, SDC.READ)

print( file.info() )

#----------------------------------------------------------------------------------------#
# print SDS names

datasets_dic = file.datasets()

for idx,sds in enumerate(datasets_dic.keys()):
    print( idx,sds )

#----------------------------------------------------------------------------------------#
# get data

sds_obj = file.select('cloud_top_temperature_1km') # select sds

data = sds_obj.get() # get sds data

print( data )

#----------------------------------------------------------------------------------------#
# get attributes

import pprint

pprint.pprint( sds_obj.attributes() )

for key, value in sds_obj.attributes().items():
    print( key, value )
    if key == 'add_offset':
        add_offset = value  
    if key == 'scale_factor':
        scale_factor = value

print( 'add_offset', add_offset, type(add_offset) )
print( 'scale_factor', scale_factor, type(scale_factor) )

data = (data - add_offset) * scale_factor

print( data )

#----------------------------------------------------------------------------------------#
# Exemple with Cloud_Mask_1km

sds_obj = file.select('Cloud_Mask_1km') # select sds

data = sds_obj.get()

pprint.pprint( sds_obj.attributes() )

print( data.shape )

#----------------------------------------------------------------------------------------#
# Extract info from a byte

import numpy as np

def bits_stripping(bit_start,bit_count,value):
    bitmask=pow(2,bit_start+bit_count)-1
    return np.right_shift(np.bitwise_and(value,bitmask),bit_start)

i = 576 # random pixel
j = 236

status_flag = bits_stripping(0,1,data[i,j,0]) 
day_flag = bits_stripping(3,1,data[i,j,0]) 
cloud_mask_flag = bits_stripping(1,2,data[i,j,0])

print( 'Cloud Mask determined: ',  status_flag )
print( 'Cloud Mask day or night: ',  day_flag )
print( 'Cloud Mask: ',  cloud_mask_flag )

Code written in python 2.7:

#!/usr/bin/env python

from pyhdf.SD import SD, SDC

file_name = 'MYD06_L2.A2007219.2010.006.2014053202546.hdf'

file = SD(file_name, SDC.READ)

print file.info()

#----------------------------------------------------------------------------------------#
# print SDS names

datasets_dic = file.datasets()

for idx,sds in enumerate(datasets_dic.keys()):
    print idx,sds

#----------------------------------------------------------------------------------------#
# get data

sds_obj = file.select('cloud_top_temperature_1km') # select sds

data = sds_obj.get() # get sds data

print data

#----------------------------------------------------------------------------------------#
# get attributes

import pprint

pprint.pprint( sds_obj.attributes() )

for key, value in sds_obj.attributes().iteritems():
    print key, value
    if key == 'add_offset':
        add_offset = value  
    if key == 'scale_factor':
        scale_factor = value

print 'add_offset', add_offset, type(add_offset)
print 'scale_factor', scale_factor, type(scale_factor)

data = (data - add_offset) * scale_factor

print data

#----------------------------------------------------------------------------------------#
# Exemple with Cloud_Mask_1km

sds_obj = file.select('Cloud_Mask_1km') # select sds

data = sds_obj.get()

pprint.pprint( sds_obj.attributes() )

print data.shape

#----------------------------------------------------------------------------------------#
# Extract info from a byte

import numpy as np

def bits_stripping(bit_start,bit_count,value):
    bitmask=pow(2,bit_start+bit_count)-1
    return np.right_shift(np.bitwise_and(value,bitmask),bit_start)

i = 576 # random pixel
j = 236

status_flag = bits_stripping(0,1,data[i,j,0]) 
day_flag = bits_stripping(3,1,data[i,j,0]) 
cloud_mask_flag = bits_stripping(1,2,data[i,j,0])

print 'Cloud Mask determined: ',  status_flag
print 'Cloud Mask day or night: ',  day_flag
print 'Cloud Mask: ',  cloud_mask_flag

References

Link WebSite
MODIS C6 cloud optical products user guide NASA
pyhdf sourceforge