# How to slice a multidimensional matrix on column values with NumPy?

Published: March 13, 2023

Updated: March 13, 2023

Tags: Python; Numpy;

A case study of how to slice a multidimensional matrix based on column values with numpy:

## Case study

Let's discuss a realistic example where we have a multidimensional matrix (hereafter called data) of shape:

```
(3, 1500, 2500)
```

where the first axis selects the variable:

```
data[0,:,:] = longitude
data[1,:,:] = latitude
data[2,:,:] = a physical value (the fire radiative power FRP for instance)
```
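To make this layout concrete, here is a minimal sketch that builds a small synthetic array with the same structure. The grid size (4 x 5), the coordinate ranges, the random values, and the name `toy_data` are invented for illustration; they are not the data used in this case study.

```python
import numpy as np

# Hypothetical small grid: 4 lines (rows) x 5 samples (columns)
n_lines, n_samples = 4, 5

# Made-up coordinates; real satellite grids are usually not this regular
lon_1d = np.linspace(-120.0, -118.0, n_samples)
lat_1d = np.linspace(49.0, 47.0, n_lines)
lon, lat = np.meshgrid(lon_1d, lat_1d)  # both have shape (4, 5)

# Made-up physical value
rng = np.random.default_rng(0)
frp = rng.random((n_lines, n_samples))

# Stack the three 2D fields along a new first axis
toy_data = np.stack([lon, lat, frp])

print(toy_data.shape)      # (3, 4, 5)
print(toy_data[0].min())   # minimum longitude: -120.0
print(toy_data[1].max())   # maximum latitude: 49.0
```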

The goal is to extract a smaller matrix for a given range of latitudes and longitudes.

## Data

Minimum longitude, `data[0,:,:].min()`:

```
-184.37672
```

Maximum longitude, `data[0,:,:].max()`:

```
-89.623276
```

Minimum latitude, `data[1,:,:].min()`:

```
14.57134
```

Maximum latitude, `data[1,:,:].max()`:

```
53.500195
```

Our objective is to extract the values located within +/- 0.25 degrees of the following point:

```
williams_flats_long = -118.49064628838096
williams_flats_lat = 47.9849149578827
```

## Solution

Slicing a matrix with NumPy is easy when the indexes are known. In this example, however, we don't have them, so the first step is to find the indexes that satisfy these conditions:

```
(longitude > williams_flats_long - 0.25) & (longitude < williams_flats_long + 0.25)

(latitude > williams_flats_lat - 0.25) & (latitude < williams_flats_lat + 0.25)
```
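These conditions translate directly into NumPy boolean masks. The sketch below uses a tiny made-up grid (`toy_lon`, `toy_lat`, `target_lon`, `target_lat` and `half_width` are hypothetical names and values) to show the pattern. Each comparison needs its own parentheses because `&` binds more tightly than `>` and `<`.

```python
import numpy as np

# Made-up 2 x 3 coordinate grids, for illustration only
toy_lon = np.array([[-119.0, -118.5, -118.0],
                    [-119.0, -118.5, -118.0]])
toy_lat = np.array([[48.0, 48.0, 48.0],
                    [47.5, 47.5, 47.5]])

target_lon, target_lat, half_width = -118.5, 48.0, 0.25

# One boolean array per coordinate, then combine them element-wise
lon_ok = (toy_lon > target_lon - half_width) & (toy_lon < target_lon + half_width)
lat_ok = (toy_lat > target_lat - half_width) & (toy_lat < target_lat + half_width)
mask = lon_ok & lat_ok

print(mask)                    # True only where both conditions hold
print(np.count_nonzero(mask))  # number of matching pixels
```

Here only the middle pixel of the first row satisfies both conditions.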

In the past, I'd typically populate a dataframe with my values first, since this approach handles mixed types of data well:

```
import numpy as np
import pandas as pd

df = pd.DataFrame()

x_axis_size = data.shape[2]  # number of samples (columns)
y_axis_size = data.shape[1]  # number of lines (rows)

# Column (sample) and row (line) index of every pixel
xv, yv = np.meshgrid(np.arange(0, x_axis_size), np.arange(0, y_axis_size))

df['sample'] = xv.flatten()
df['line'] = yv.flatten()

df['longitude'] = data[0,:,:].ravel()
df['latitude'] = data[1,:,:].ravel()

df['Power'] = data[2,:,:].ravel()
```

The above code will generate the following DataFrame:

```
         sample  line   longitude   latitude  Power
0             0     0 -184.376724  53.500195    NaN
1             1     0 -184.310455  53.493004    NaN
2             2     0 -184.244812  53.486046    NaN
3             3     0 -184.178406  53.478687    NaN
4             4     0 -184.112915  53.471695    NaN
...         ...   ...         ...        ...    ...
3749995    2495  1499 -112.517677  14.803576    NaN
3749996    2496  1499 -112.495865  14.804007    NaN
3749997    2497  1499 -112.474197  14.804358    NaN
3749998    2498  1499 -112.452446  14.804746    NaN
3749999    2499  1499 -112.430595  14.805193    NaN
```

Here we've added two new columns, sample and line, which reference the indexes of the matrix data.
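The DataFrame's default integer index is the position of each pixel in the flattened (row-major) array, so it is tied to line and sample through the number of samples per line. As a quick check, assuming the (1500, 2500) grid described earlier, `np.unravel_index` recovers the pair from a flat position and `np.ravel_multi_index` goes the other way:

```python
import numpy as np

grid_shape = (1500, 2500)  # (lines, samples), from the shape given earlier

# Flat position -> (line, sample)
line, sample = np.unravel_index(3749995, grid_shape)
print(line, sample)  # 1499 2495

# (line, sample) -> flat position, i.e. line * 2500 + sample
flat = np.ravel_multi_index((1499, 2495), grid_shape)
print(flat)  # 3749995
```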

Now, we can apply both previous conditions to our dataframe and filter it appropriately:

```
df = df[ (df['longitude'] > williams_flats_long-0.25) & (df['longitude'] < williams_flats_long+0.25) ]
df = df[ (df['latitude'] > williams_flats_lat-0.25) & (df['latitude'] < williams_flats_lat+0.25) ]
```

Output:

```
        sample  line   longitude   latitude  Power
211870    1870    84 -118.726280  48.206997    NaN
211871    1871    84 -118.694878  48.208351    NaN
211872    1872    84 -118.663307  48.209919    NaN
211873    1873    84 -118.631660  48.211582    NaN
211874    1874    84 -118.600403  48.212662    NaN
...        ...   ...         ...        ...    ...
246887    1887    98 -118.392639  47.753323    NaN
246888    1888    98 -118.361153  47.755108    NaN
246889    1889    98 -118.329903  47.756531    NaN
246890    1890    98 -118.298660  47.757881    NaN
246891    1891    98 -118.267349  47.759369    NaN
```

We can then find the sample (column) index range:

```
idx_min = df['sample'].min()
idx_max = df['sample'].max()

print(idx_min, idx_max)
```

outputs

```
1870 1891
```

and the line (row) index range, read from the same filtered dataframe:

```
jdx_min = df['line'].min()
jdx_max = df['line'].max()

print(jdx_min, jdx_max)
```

outputs

```
84 98
```

Slicing the matrix to select our region of interest is now straightforward. Since the end of a Python slice is exclusive, we add 1 to the maximum indexes so that the last line and the last sample are included:

```
selected_data = data[:, jdx_min:jdx_max+1, idx_min:idx_max+1]
```
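The same bounding box can also be found without pandas. The following is a minimal sketch rather than the method used above: `bounding_box_slices` is a hypothetical helper name, and the demo array is synthetic. It builds the boolean mask, uses `np.nonzero` to get the matching line and sample indexes, and returns slices whose end is the maximum index plus 1.

```python
import numpy as np

def bounding_box_slices(lon, lat, lon0, lat0, half_width=0.25):
    """Return (line_slice, sample_slice) enclosing all pixels near (lon0, lat0)."""
    mask = (
        (lon > lon0 - half_width) & (lon < lon0 + half_width)
        & (lat > lat0 - half_width) & (lat < lat0 + half_width)
    )
    # Row (line) and column (sample) indexes of every matching pixel
    lines, samples = np.nonzero(mask)
    if lines.size == 0:
        raise ValueError("no pixel falls inside the requested window")
    # Slice ends are exclusive, hence the + 1
    return (slice(lines.min(), lines.max() + 1),
            slice(samples.min(), samples.max() + 1))

# Synthetic demo grid with 0.1 degree spacing (not the case-study data)
lon, lat = np.meshgrid(np.linspace(-120.0, -117.0, 31),
                       np.linspace(49.0, 47.0, 21))
demo = np.stack([lon, lat, np.zeros_like(lon)])

line_slice, sample_slice = bounding_box_slices(demo[0], demo[1], -118.5, 48.0)
selected = demo[:, line_slice, sample_slice]
print(selected.shape)
```

On a curvilinear grid, the rectangular slice can contain a few pixels that lie slightly outside the requested window; keep the mask if those need to be excluded.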