How to slice a multidimensional matrix on column values with NumPy?


A case study of how to slice a multidimensional matrix based on column values with numpy:

Case study

Let's discuss a realistic example where we have a multidimensional matrix (called hereafter data) of shape:

(3, 1500, 2500)

where the first axis selects the variable:

data[0,:,:] = longitude
data[1,:,:] = latitude
data[2,:,:] = a physical value (the fire radiative power, FRP, for instance)

The goal is to extract a smaller matrix covering a given range of latitudes and longitudes.

Data

Data minimum longitude, data[0,:,:].min():

-184.37672

Data maximum longitude, data[0,:,:].max():

-89.623276

Data minimum latitude, data[1,:,:].min():

14.57134

Data maximum latitude, data[1,:,:].max():

53.500195

Our objective is to locate and extract the values that lie within +/- 0.25 degrees of the following point:

williams_flats_long = -118.49064628838096
williams_flats_lat = 47.9849149578827

Solution

Slicing a matrix with NumPy is easy when the indexes are known. In the example above they are not, so the first step is to find the indexes that satisfy these conditions:

longitude > williams_flats_long - 0.25 & longitude < williams_flats_long + 0.25

latitude > williams_flats_lat - 0.25 & latitude < williams_flats_lat + 0.25
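The two conditions above can also be written directly as a NumPy boolean mask. The following is only a minimal sketch: the small 4 x 5 grid is a made-up stand-in for the real data array (its values are assumptions chosen for illustration), while the target coordinates are the ones given above. Note that each comparison needs its own parentheses because & binds more tightly than > and <.

```python
import numpy as np

# Hypothetical stand-in for the real array: a regular 4 x 5 grid
# (the real data has shape (3, 1500, 2500)).
lon_1d = np.linspace(-119.0, -118.0, 5)   # 5 samples (columns)
lat_1d = np.linspace(48.5, 47.5, 4)       # 4 lines (rows), decreasing
lon, lat = np.meshgrid(lon_1d, lat_1d)    # each has shape (4, 5)
data = np.stack([lon, lat, np.zeros_like(lon)])  # shape (3, 4, 5)

williams_flats_long = -118.49064628838096
williams_flats_lat = 47.9849149578827

# True where a cell satisfies both the longitude and the latitude condition
mask = (
    (data[0] > williams_flats_long - 0.25)
    & (data[0] < williams_flats_long + 0.25)
    & (data[1] > williams_flats_lat - 0.25)
    & (data[1] < williams_flats_lat + 0.25)
)

print(mask.shape, mask.sum())
```

The mask has the same shape as one layer of data, so it can be used either to count matching cells or to look up their row and column positions.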

In the past, I would typically load the values into a pandas dataframe first, since a dataframe is convenient for handling mixed types of data:

import numpy as np
import pandas as pd

df = pd.DataFrame()

# data has shape (3, number of lines, number of samples)
x_axis_size = data.shape[2]
y_axis_size = data.shape[1]

# xv holds the column (sample) index and yv the row (line) index of every cell
xv, yv = np.meshgrid(np.arange(x_axis_size), np.arange(y_axis_size))

df['sample'] = xv.flatten()
df['line'] = yv.flatten()

df['longitude'] = data[0,:,:].ravel()
df['latitude'] = data[1,:,:].ravel()

df['Power'] = data[2,:,:].ravel()

The above code will generate the following DataFrame:

             sample  line   longitude   latitude  Power
    0             0     0 -184.376724  53.500195    NaN
    1             1     0 -184.310455  53.493004    NaN
    2             2     0 -184.244812  53.486046    NaN
    3             3     0 -184.178406  53.478687    NaN
    4             4     0 -184.112915  53.471695    NaN
    ...         ...   ...         ...        ...    ...
    3749995    2495  1499 -112.517677  14.803576    NaN
    3749996    2496  1499 -112.495865  14.804007    NaN
    3749997    2497  1499 -112.474197  14.804358    NaN
    3749998    2498  1499 -112.452446  14.804746    NaN
    3749999    2499  1499 -112.430595  14.805193    NaN

Here we've added two new columns, sample and line, which reference the indexes of the matrix data.
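To see why sample and line line up with the flattened values, here is a small sketch on a made-up 2 x 3 array (the array and the names n_lines, n_samples and values are assumptions for illustration only). With its default indexing, np.meshgrid returns the column index in xv and the row index in yv, in the same row-major order that ravel() and flatten() use.

```python
import numpy as np

n_lines, n_samples = 2, 3  # hypothetical tiny grid

# Same construction as for the dataframe columns
xv, yv = np.meshgrid(np.arange(n_samples), np.arange(n_lines))
sample = xv.flatten()   # column index of each flattened cell
line = yv.flatten()     # row index of each flattened cell

# Made-up values standing in for one layer of data
values = np.arange(n_lines * n_samples).reshape(n_lines, n_samples)
flat = values.ravel()

# Each flattened value can be found again at values[line, sample]
print(np.array_equal(values[line, sample], flat))
```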

Now, we can apply both previous conditions to our dataframe and filter it appropriately:

df = df[ (df['longitude'] > williams_flats_long-0.25) & (df['longitude']<williams_flats_long+0.25) ]
df = df[ (df['latitude'] > williams_flats_lat-0.25) & (df['latitude']<williams_flats_lat+0.25) ]

Output

            sample  line   longitude   latitude  Power
    211870    1870    84 -118.726280  48.206997    NaN
    211871    1871    84 -118.694878  48.208351    NaN
    211872    1872    84 -118.663307  48.209919    NaN
    211873    1873    84 -118.631660  48.211582    NaN
    211874    1874    84 -118.600403  48.212662    NaN
    ...        ...   ...         ...        ...    ...
    246887    1887    98 -118.392639  47.753323    NaN
    246888    1888    98 -118.361153  47.755108    NaN
    246889    1889    98 -118.329903  47.756531    NaN
    246890    1890    98 -118.298660  47.757881    NaN
    246891    1891    98 -118.267349  47.759369    NaN

and find the indexes:

idx_min = df['sample'].min()
idx_max = df['sample'].max()

print(idx_min,idx_max)

outputs

1870 1891

and

jdx_min = df['line'].min()
jdx_max = df['line'].max()

print(jdx_min,jdx_max)

outputs

84 98
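As a cross-check, the row labels of the filtered dataframe are flat positions in the (1500, 2500) grid, so the same indexes can be recovered with np.unravel_index. This is a sketch using only the first and last row labels shown in the output above; the variable names are assumptions.

```python
import numpy as np

grid_shape = (1500, 2500)  # (lines, samples), from the shape of data

# First and last row labels of the filtered dataframe shown above
flat_labels = np.array([211870, 246891])

# Convert flat positions back to (line, sample) pairs
line, sample = np.unravel_index(flat_labels, grid_shape)

print(line, sample)
```

The result agrees with the line and sample columns of those two rows (84 and 1870, 98 and 1891).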

With these indexes, slicing the matrix down to the region of interest is straightforward. Python slices exclude the stop index, so 1 is added to each maximum to keep the last line and the last sample:

selected_data = data[:, jdx_min:jdx_max + 1, idx_min:idx_max + 1]
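The whole procedure can also be done without a dataframe, using np.where on a boolean mask. This is a sketch under assumptions, not the method used above: the 4 x 5 grid and the power values are made up, and target_lon, target_lat and half_width are hypothetical names for the point and the 0.25 degree tolerance.

```python
import numpy as np

# Hypothetical stand-in for the real array (real shape: (3, 1500, 2500))
lon_1d = np.linspace(-119.0, -118.0, 5)
lat_1d = np.linspace(48.5, 47.5, 4)
lon, lat = np.meshgrid(lon_1d, lat_1d)
power = np.arange(20, dtype=float).reshape(4, 5)  # made-up values
data = np.stack([lon, lat, power])                # shape (3, 4, 5)

target_lon = -118.49064628838096
target_lat = 47.9849149578827
half_width = 0.25

# Cells within +/- half_width degrees of the target in both directions
mask = (np.abs(data[0] - target_lon) < half_width) & (
    np.abs(data[1] - target_lat) < half_width
)

# Row (line) and column (sample) positions of the matching cells
lines, samples = np.where(mask)
if lines.size == 0:
    raise ValueError("no grid cell falls inside the requested window")

jdx_min, jdx_max = lines.min(), lines.max()
idx_min, idx_max = samples.min(), samples.max()

# Slice stops are exclusive, hence the + 1
selected_data = data[:, jdx_min:jdx_max + 1, idx_min:idx_max + 1]
print(selected_data.shape)
```

On a curved (non-regular) grid the rectangular slice can include a few cells outside the requested window, since it is the bounding box of the matching cells; the same mask, sliced identically, can be used to flag them if needed.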

References

Links Site
numpy.argwhere numpy.org
numpy.where numpy.org