How to read a GRIB weather data file using Python ?


Among the various formats used to store weather data, GRIB stands out for its efficiency in representing meteorological and oceanographic data. GRIB (short for "General Regularly Distributed Information in Binary Form") is a file format specifically designed for storing and sharing weather data. It's a widely used standard established by the World Meteorological Organization (WMO) for archiving and exchanging gridded weather information.

Understanding how to read GRIB files is valuable in remote sensing, as these data can serve as important ancillary information.

Key features of GRIB files:

  • Binary format: GRIB files use a binary format, which packs data efficiently to save storage space.
  • Self-contained messages: Each GRIB file is a collection of independent messages, each containing weather data for a specific variable (like temperature or pressure) at a particular time and location.
  • File extensions: You'll typically find GRIB files with extensions like .grib, .grb, or .gb.

GRIB 1 vs. GRIB 2

There are two main versions of the GRIB format: GRIB 1 and GRIB 2. While they both serve the same purpose, they differ in how they structure the data:

  • GRIB 1: This is the older version. It has limitations in precision, such as storing latitude and longitude in milli-degrees. Additionally, longitude values are restricted to a specific range.
  • GRIB 2: This is the more recent and advanced version. It offers greater precision for variables like location (using micro-degrees) and allows for longitude values across the entire 0 to 360-degree range. It also uses a different encoding method and relies on templates for data descriptions.

Setting up a Conda environment for reading GRIB files

To read a GRIB weather data file, a solution is to install the Python cfgrib library. It is a library that can be used to decode and encode GRIB weather data files.

To do that let's create a new Conda environment for reading GRIB files called worklab_grib:

1
conda create -n worklab_grib python numpy pandas -c conda-forge

This command creates a new Conda environment named "worklab_grib". Inside this environment, it installs Python, NumPy, and Pandas packages. The -c conda-forge flag indicates that Conda should search for these packages in the conda-forge channel, a community-driven collection of packages.

1
conda activate worklab_grib

will activate the newly created Conda environment named "worklab_grib".

1
conda install xarray -c conda-forge

This command installs the Xarray package into the previously created "worklab_grib" environment. Again, the -c conda-forge flag specifies the conda-forge channel as the source for the Xarray package.

1
conda install -c conda-forge cfgrib

This command installs the cfgrib package into the "worklab_grib" environment. Similar to the previous commands, it specifies the conda-forge channel for package retrieval.

Reading a GRIB weather data file

To demonstrate how to access weather data stored in a GRIB file, let's explore data from the NOAA's Global Forecast System (GFS). Suppose our aim is to extract the surface temperature information. Typically, reading a GRIB file can be as simple as:

1
2
3
import xarray as xr

ds = xr.open_dataset('gfs_4_20231106_0000_000.grb2', engine='cfgrib')

However, in this instance, an error arises:

1
2
3
4
cfgrib.dataset.DatasetBuildError: multiple values for unique key, try re-open the file with one of:
    filter_by_keys={'typeOfLevel': 'meanSea'}
    filter_by_keys={'typeOfLevel': 'hybrid'}
    ...

This error signifies that there are multiple keys with the same identifier within the file. To remedy this, we can specify a filter criterion to narrow down our selection. Since our focus is on surface temperature, we can employ the filter_by_keys parameter:

1
2
3
import xarray as xr

ds = xr.open_dataset('gfs_4_20231106_0000_000.grb2', engine='cfgrib', filter_by_keys={'typeOfLevel': 'surface'})

Voila! Now it works like a charm. By accessing the dataset:

1
ds

We receive a comprehensive overview:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
KeysView(<xarray.Dataset> Size: 30MB
Dimensions:     (latitude: 361, longitude: 720)
Coordinates:
    time        datetime64[ns] 8B ...
    step        timedelta64[ns] 8B ...
    surface     float64 8B ...
  * latitude    (latitude) float64 3kB 90.0 89.5 89.0 88.5 ... -89.0 -89.5 -90.0
  * longitude   (longitude) float64 6kB 0.0 0.5 1.0 1.5 ... 358.5 359.0 359.5
    valid_time  datetime64[ns] 8B ...
Data variables: (12/29)
    vis         (latitude, longitude) float32 1MB ...
    gust        (latitude, longitude) float32 1MB ...
    hindex      (latitude, longitude) float32 1MB ...
    sp          (latitude, longitude) float32 1MB ...
    orog        (latitude, longitude) float32 1MB ...
    t           (latitude, longitude) float32 1MB ...
    ...          ...
    lftx        (latitude, longitude) float32 1MB ...
    cape        (latitude, longitude) float32 1MB ...
    cin         (latitude, longitude) float32 1MB ...
    4lftx       (latitude, longitude) float32 1MB ...
    lsm         (latitude, longitude) float32 1MB ...
    siconc      (latitude, longitude) float32 1MB ...
Attributes:
    GRIB_edition:            2
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             US National Weather Service - NCEP
    history:                 2024-05-21T08:40 GRIB to CDM+CF via cfgrib-0.9.1...)

To explore the keys within our dataset and understand its structure further, we can use ds.keys() to obtain a comprehensive list of keys:

1
ds.keys()

This will return:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
KeysView(<xarray.Dataset>
Size: 30MB
Dimensions:     (latitude: 361, longitude: 720)
Coordinates:
    time         datetime64[ns] 8B ...
    step         timedelta64[ns] 8B ...
    surface      float64 8B ...
  * latitude     (latitude) float64 3kB 90.0 89.5 89.0 88.5 ... -89.5 -90.0
  * longitude    (longitude) float64 6kB 0.0 0.5 1.0 1.5 ... 358.5 359.0 359.5
    valid_time   datetime64[ns] 8B ...
Data variables: (12/29)
    vis          (latitude, longitude) float32 1MB ...
    gust         (latitude, longitude) float32 1MB ...
    hindex       (latitude, longitude) float32 1MB ...
    sp           (latitude, longitude) float32 1MB ...
    orog         (latitude, longitude) float32 1MB ...
    t            (latitude, longitude) float32 1MB ...
    ...           ...
    lftx         (latitude, longitude) float32 1MB ...
    cape         (latitude, longitude) float32 1MB ...
    cin          (latitude, longitude) float32 1MB ...
    4lftx        (latitude, longitude) float32 1MB ...
    lsm          (latitude, longitude) float32 1MB ...
    siconc       (latitude, longitude) float32 1MB ...
Attributes:
    GRIB_edition:            2
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             US National Weather Service - NCEP
    history:                 2024-05-21T08:40 GRIB to CDM+CF via cfgrib-0.9.1...)

Similarly, to explore the coordinates:

1
ds.coords

We receive:

1
2
3
4
5
6
7
Coordinates:
    time         datetime64[ns] 8B ...
    step         timedelta64[ns] 8B ...
    surface      float64 8B ...
  * latitude     (latitude) float64 3kB 90.0 89.5 89.0 88.5 ... -89.5 -90.0
  * longitude    (longitude) float64 6kB 0.0 0.5 1.0 1.5 ... 358.5 359.0 359.5
    valid_time   datetime64[ns] 8B ...

Extract data from the dataset

To extract the surface temperature from the dataset, you can simply access the 't' variable using:

1
ds['t']

This returns a DataArray object with attributes such as unit, size, and coordinates. For instance:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
<xarray.DataArray 't' (latitude: 361, longitude: 720)>
Size: 1MB
[259920 values with dtype=float32]
Coordinates:
    time        datetime64[ns] 8B ...
    step        timedelta64[ns] 8B ...
    surface     float64 8B ...
  * latitude    (latitude) float64 3kB 90.0 89.5 89.0 88.5 ... -89.5 -90.0
  * longitude   (longitude) float64 6kB 0.0 0.5 1.0 1.5 ... 358.0 358.5 359.0 359.5
    valid_time  datetime64[ns] 8B ...
Attributes: (12/29)
    GRIB_paramId:             130
    GRIB_dataType:            fc
    GRIB_numberOfPoints:      259920
    GRIB_typeOfLevel:         surface
    GRIB_stepUnits:           1
    GRIB_stepType:            instant
    ...
    GRIB_name:                Temperature
    GRIB_shortName:           t
    GRIB_units:               K
    long_name:                Temperature
    units:                    K
    standard_name:            air_temperature

To obtain only the temperature values, you can use:

1
ds['t'].values

This will return a NumPy array of temperatures, for example:

1
2
3
4
array([[250.89314, 250.89314, 250.89314, ..., 250.89314, 250.89314, 250.89314],
       [251.59314, 251.59314, 251.59314, ..., 251.59314, 251.59314, 251.59314],
       ...
       [228.29314, 228.29314, 228.29314, ..., 228.29314, 228.29314, 228.29314]], dtype=float32)

You can also check the shape of the temperature array using:

1
ds['t'].values.shape

Which in this case returns:

1
(361, 720)

Indicating 361 values for latitude and 720 for longitude.

To extract the latitude and longitude values separately, you can use:

1
ds['latitude'].values

And:

1
ds['longitude'].values

These commands will return arrays of latitude and longitude values respectively, allowing you to further analyze or visualize the data.

References

Links Site
cfgrib pypi.org
How to get started with GRIB2 weather data and Python spire.com
What are GRIB files and how can I read them confluence.ecmwf.int
xarray docs.xarray.dev
Reading and writing files docs.xarray.dev
NOAA Global Forecast System (GFS) registry.opendata.aws
Global Forecast System (GFS) www.ncei.noaa.gov