How to open and read an HDF5 file with Python?

Published: May 01, 2023

Tags: Python; HDF;


Reading and writing HDF5 files in Python is an important task for many data scientists. In this article, we will look at how to open and read HDF5 files with Python.

Open an HDF5 file

You can download the dummy file used in this short tutorial from the provided link.

To get started, you need to install the h5py library (for example, with pip install h5py). This library allows you to access HDF5-formatted files from Python. Once you have installed h5py, you can open an HDF5 file with the following code:

import h5py

file_name = "dummy_file.h5"

f = h5py.File(file_name, "r")  # "r" opens the file read-only

This opens the HDF5 file and returns a File object, which is stored in the variable f.

Access HDF5 groups and datasets

You can access different parts of this file object to read data from the HDF5 file. For example, you can access groups and datasets within the File object.

for key in f.keys():
    print(key)  # name of the top-level member
    print(type(f[key]))  # Group or Dataset

Output

All_Data
<class 'h5py._hl.group.Group'>
Data_Products
<class 'h5py._hl.group.Group'>
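
Instead of reading the printed class names, the type of each member can also be tested directly with isinstance. The following is a minimal sketch, assuming a small stand-in file built on the fly (its name and the "values" dataset are hypothetical and are not part of the tutorial's dummy file):

```python
import os
import tempfile

import h5py
import numpy as np

# Build a tiny stand-in file (hypothetical layout, NOT the tutorial's dummy file).
path = os.path.join(tempfile.mkdtemp(), "standin.h5")
with h5py.File(path, "w") as out:
    out.create_group("All_Data")
    out.create_dataset("values", data=np.arange(6))

# Classify every top-level member by its h5py type.
kinds = {}
with h5py.File(path, "r") as f:
    for key in f.keys():
        obj = f[key]
        if isinstance(obj, h5py.Group):
            kinds[key] = "group"
        elif isinstance(obj, h5py.Dataset):
            kinds[key] = "dataset"
        else:
            kinds[key] = "other"  # e.g. a named datatype

print(kinds)
```

Testing against h5py.Group and h5py.Dataset avoids depending on the internal module path (h5py._hl...) that appears in the printed type.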

Access a group

To access a group within the file object, use the following code:

group = f['group_name']

Select the group labeled "All_Data" for example:

group = f['All_Data']

This code creates a Group object.

You can obtain additional group objects or datasets from within a group object:

for key in group.keys():
    print(key)
    print(type(group[key]))

Output

VIIRS-DualGain-Cal-IP_All
<class 'h5py._hl.group.Group'>

The 'All_Data' group has a subgroup called 'VIIRS-DualGain-Cal-IP_All':

sub_group = group['VIIRS-DualGain-Cal-IP_All']

for key in sub_group.keys():
    print(key)
    print(type(sub_group[key]))

Output

btRefl
<class 'h5py._hl.dataset.Dataset'>
radiance_1
<class 'h5py._hl.dataset.Dataset'>
radiance_2
<class 'h5py._hl.dataset.Dataset'>
radiance_3
<class 'h5py._hl.dataset.Dataset'>
radiance_4
<class 'h5py._hl.dataset.Dataset'>
radiance_5
<class 'h5py._hl.dataset.Dataset'>
radiance_6
<class 'h5py._hl.dataset.Dataset'>

There are 7 datasets present in this specific sub-group.
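
The level-by-level exploration above can also be done in one pass with the visititems method, which calls a function for every group and dataset below the starting point. This is only a sketch under stated assumptions: the stand-in file below imitates part of the hierarchy shown in this tutorial with two small made-up datasets, and the helper name collect is hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

# Stand-in file that imitates part of the tutorial's hierarchy (made-up data).
path = os.path.join(tempfile.mkdtemp(), "standin.h5")
with h5py.File(path, "w") as out:
    all_data = out.create_group("All_Data")
    sub = all_data.create_group("VIIRS-DualGain-Cal-IP_All")
    sub.create_dataset("btRefl", data=np.zeros((2, 3)))
    sub.create_dataset("radiance_1", data=np.ones((2, 3)))
    out.create_group("Data_Products")

dataset_paths = []

def collect(name, obj):
    # visititems passes the path relative to the starting group and the object.
    if isinstance(obj, h5py.Dataset):
        dataset_paths.append(name)

with h5py.File(path, "r") as f:
    f.visititems(collect)
    # A slash-separated path reaches a nested dataset in a single lookup.
    radiance = f["All_Data/VIIRS-DualGain-Cal-IP_All/radiance_1"][()]

print(sorted(dataset_paths))
```

The slash-separated path is equivalent to chaining the lookups one group at a time, as done earlier with group and sub_group.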

Read a dataset

To read data from a dataset within the HDF5 file, use the following code:

data = sub_group['radiance_4'][()]

Here sub_group['radiance_4'] returns a Dataset object, and indexing it with [()] reads its entire contents into memory as a NumPy array, which is stored in the variable 'data'.

data.shape

Output

(768, 5024)
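
For large datasets it may be preferable not to load everything at once. A Dataset object exposes its shape and dtype without reading the data, and slicing it reads only the requested part. The sketch below assumes a small stand-in dataset of shape (8, 10) with made-up values, reusing the name radiance_4 only for illustration:

```python
import os
import tempfile

import h5py
import numpy as np

# Small stand-in dataset (made-up values, NOT the tutorial's 768 x 5024 array).
path = os.path.join(tempfile.mkdtemp(), "standin.h5")
with h5py.File(path, "w") as out:
    values = np.arange(80, dtype="float32").reshape(8, 10)
    out.create_dataset("radiance_4", data=values)

with h5py.File(path, "r") as f:
    dset = f["radiance_4"]  # Dataset handle; the values are not read yet
    shape = dset.shape      # metadata is available without loading the data
    dtype = dset.dtype
    block = dset[0:2, 0:3]  # reads only the first 2 rows and 3 columns

print(shape, dtype)
print(block)
```

The slice comes back as a regular NumPy array, so it stays usable after the file has been closed.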

Close the file

Finally, when you are finished reading data from the HDF5 file, close it to release the file handle:

f.close()
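
As an alternative to calling close() explicitly, the file can be opened in a with block, which closes it automatically when the block ends, even if an error occurs inside it. A minimal sketch, again assuming a small stand-in file with made-up data:

```python
import os
import tempfile

import h5py
import numpy as np

# Stand-in file with made-up data (NOT the tutorial's dummy file).
path = os.path.join(tempfile.mkdtemp(), "standin.h5")
with h5py.File(path, "w") as out:
    out.create_dataset("radiance_4", data=np.ones((2, 3)))

with h5py.File(path, "r") as f:
    data = f["radiance_4"][()]  # copy the values into memory
    open_inside = bool(f)       # a File object is truthy while it is open

open_after = bool(f)            # falsy once the with block has closed it

print(open_inside, open_after, data.shape)
```

Because data is an in-memory NumPy array, it remains valid after the file is closed; a Dataset object, by contrast, can no longer be read once its file is closed.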

References

Links Site
h5py h5py.org
hdfgroup hdfgroup.org