Reading HDF5 files in Python is a common task for data scientists. This article shows how to open an HDF5 file with the h5py library, explore its groups and datasets, and read a dataset into memory.
Open an HDF5 file
You can download the dummy file used in this short tutorial from the provided link.
To get started, you need to install the h5py library. This library allows you to access HDF5-formatted files from Python. Once you have installed h5py, you can open an HDF5 file with the following code:
import h5py
file_name = "dummy_file.h5"
f = h5py.File(file_name, "r")  # "r" opens the file read-only
This opens the HDF5 file and returns a File object, which is stored in the variable f.
Access HDF5 groups and datasets
A File object behaves much like a Python dictionary: its keys are the names of the groups and datasets stored at the top level of the file. You can loop over the keys to see what the file contains and whether each item is a group or a dataset.
for key in f.keys():
    print(key)           # name of the item
    print(type(f[key]))  # Group or Dataset
Output
All_Data
<class 'h5py._hl.group.Group'>
Data_Products
<class 'h5py._hl.group.Group'>
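Printing the type works for a quick look, but in a script it is usually easier to test the type with isinstance. The sketch below is only an illustration: it builds a small stand-in file in a temporary directory (the file name and the "top_level_values" dataset are hypothetical, not part of the tutorial's dummy file) and then classifies each top-level item.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a tiny stand-in file. Its layout is an assumption for illustration
# and does not match the tutorial's dummy file.
path = os.path.join(tempfile.mkdtemp(), "example.h5")
with h5py.File(path, "w") as out:
    out.create_group("All_Data")
    out.create_dataset("top_level_values", data=np.arange(3))

# Classify every top-level item as a group or a dataset.
kinds = {}
with h5py.File(path, "r") as f:
    for key in f.keys():
        item = f[key]
        if isinstance(item, h5py.Group):
            kinds[key] = "group"
        elif isinstance(item, h5py.Dataset):
            kinds[key] = "dataset"

print(kinds)
```

h5py.Group and h5py.Dataset are the public names of the classes shown in the output above, so the check does not depend on the private h5py._hl module path.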
Access a group
To access a group within the file object, use the following code:
group = f['group_name']
For example, select the group named "All_Data":
group = f['All_Data']
This returns a Group object.
A Group object can itself contain further groups or datasets, which you can list in the same way:
for key in group.keys():
    print(key)
    print(type(group[key]))
Output
VIIRS-DualGain-Cal-IP_All
<class 'h5py._hl.group.Group'>
The 'All_Data' group has a subgroup called 'VIIRS-DualGain-Cal-IP_All':
sub_group = group['VIIRS-DualGain-Cal-IP_All']
for key in sub_group.keys():
    print(key)
    print(type(sub_group[key]))
Output
btRefl
<class 'h5py._hl.dataset.Dataset'>
radiance_1
<class 'h5py._hl.dataset.Dataset'>
radiance_2
<class 'h5py._hl.dataset.Dataset'>
radiance_3
<class 'h5py._hl.dataset.Dataset'>
radiance_4
<class 'h5py._hl.dataset.Dataset'>
radiance_5
<class 'h5py._hl.dataset.Dataset'>
radiance_6
<class 'h5py._hl.dataset.Dataset'>
This sub-group contains 7 datasets.
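Stepping through the hierarchy one level at a time becomes tedious for deeply nested files. As an alternative, h5py groups provide a visititems method that calls a function for every object below the group. The sketch below assumes a small stand-in file that imitates part of the layout above (the temporary file, the dataset shapes and the helper name record are made up for illustration).

```python
import os
import tempfile

import h5py
import numpy as np

# Stand-in file imitating part of the hierarchy shown above.
# The shapes and contents are invented for this example.
path = os.path.join(tempfile.mkdtemp(), "example.h5")
with h5py.File(path, "w") as out:
    sub = out.create_group("All_Data/VIIRS-DualGain-Cal-IP_All")
    sub.create_dataset("btRefl", data=np.zeros((2, 3)))
    sub.create_dataset("radiance_1", data=np.zeros((2, 3)))

found = []


def record(name, obj):
    """Collect the path and shape of every dataset that is visited."""
    if isinstance(obj, h5py.Dataset):
        found.append((name, obj.shape))
    # Returning None tells visititems to keep going.


with h5py.File(path, "r") as f:
    f.visititems(record)

for name, shape in found:
    print(name, shape)
```

The names passed to the callback are paths relative to the object on which visititems was called, so each one can be used directly as a key, for example f[name].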
Read a dataset
To read data from a dataset within the HDF5 file, use the following code:
data = sub_group['radiance_4'][()]
Indexing the sub-group by name returns a Dataset object, and the trailing [()] reads its entire contents into memory as a NumPy array, which is stored in the variable 'data'.
print(data.shape)
Output
(768, 5024)
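Reading with [()] loads the whole dataset, which may be more than you need for large arrays. A Dataset object also accepts NumPy-style slices and exposes its shape and dtype without reading any data. The following sketch uses a small made-up dataset in a temporary file (the values are not those of the tutorial's radiance_4) to show the idea.

```python
import os
import tempfile

import h5py
import numpy as np

# Small made-up dataset standing in for a large one.
path = os.path.join(tempfile.mkdtemp(), "example.h5")
with h5py.File(path, "w") as out:
    values = np.arange(12, dtype="float32").reshape(3, 4)
    out.create_dataset("radiance_4", data=values)

with h5py.File(path, "r") as f:
    dset = f["radiance_4"]

    # Metadata is available without reading the data itself.
    shape, dtype = dset.shape, dset.dtype

    first_row = dset[0, :]    # reads only the first row
    block = dset[1:3, 0:2]    # reads only a 2 x 2 block
    everything = dset[()]     # reads the full array

print(shape, dtype)
print(first_row)
print(block)
```

Each slice returns an ordinary NumPy array, so it remains usable after the file has been closed, whereas the Dataset object itself does not.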
Close the file
Finally, when you are finished reading from the HDF5 file, close it to release the file handle:
f.close()
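An explicit close() is easy to forget, especially if an error occurs earlier in the script. One common alternative is to open the file in a with statement, which closes it automatically when the block ends. The sketch below uses a temporary stand-in file with a hypothetical dataset called "values", and relies on the fact that an h5py File object evaluates to False once it has been closed.

```python
import os
import tempfile

import h5py
import numpy as np

# Temporary stand-in file with a hypothetical dataset named "values".
path = os.path.join(tempfile.mkdtemp(), "example.h5")
with h5py.File(path, "w") as out:
    out.create_dataset("values", data=np.arange(5))

with h5py.File(path, "r") as f:
    data = f["values"][()]  # copy the data into memory while the file is open
    open_inside = bool(f)   # a File object is truthy while it is open

# Leaving the with block closed the file; no f.close() call is needed.
closed_after = not f

print(data, open_inside, closed_after)
```

Because the data was read into a NumPy array inside the block, it can still be used after the file is closed.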