A case study of how to slice a multidimensional matrix based on column values with numpy:
Case study
Let's discuss a realistic example where we have a multidimensional matrix (hereafter called data) of shape:
(3, 1500, 2500)
where the first axis holds three layers, each a 1500 x 2500 grid:
data[0] = longitude
data[1] = latitude
data[2] = a physical value (the fire radiative power, FRP, for instance)
The goal is to extract a smaller matrix for a given range of latitudes and longitudes.
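Since the original data file is not included here, the layout described above can be reproduced with a small stand-in array. This is only a sketch: the grid size, the coordinate ranges, and the names lon_1d, lat_1d and frp are assumptions made for illustration, and a regular grid is used even though the real coordinates are not evenly spaced.

```python
import numpy as np

# Hypothetical stand-in for the real array (which has 1500 lines, 2500 samples)
n_lines, n_samples = 6, 8

lon_1d = np.linspace(-120.0, -117.0, n_samples)
lat_1d = np.linspace(49.0, 47.0, n_lines)   # latitude decreases with the line index
lon, lat = np.meshgrid(lon_1d, lat_1d)      # both have shape (n_lines, n_samples)

# Physical value layer, filled with NaN like most cells in the real data
frp = np.full((n_lines, n_samples), np.nan)

# Stack the three layers along a new first axis
data = np.stack([lon, lat, frp])

print(data.shape)
print(data[0].min(), data[0].max())  # longitude range
print(data[1].min(), data[1].max())  # latitude range
```

With this layout, data[0], data[1] and data[2] are the longitude, latitude and physical value grids, which is how they are accessed in the rest of the article.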
Data
Data minimum longitude data[0,:,:].min()
-184.37672
Data maximum longitude data[0,:,:].max()
-89.623276
Data minimum latitude data[1,:,:].min()
14.57134
Data maximum latitude data[1,:,:].max()
53.500195
Our objective is to locate and extract the values that lie within +/- 0.25 degrees of the following point:
williams_flats_long = -118.49064628838096
williams_flats_lat = 47.9849149578827
Solution
Slicing a matrix with NumPy is easy when the indexes are known. In this example, however, the indexes are not known in advance, so the first step is to find the ones that satisfy these conditions:
(longitude > williams_flats_long - 0.25) & (longitude < williams_flats_long + 0.25)
(latitude > williams_flats_lat - 0.25) & (latitude < williams_flats_lat + 0.25)
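As a minimal sketch, the two conditions can be written directly as a NumPy boolean mask. The small lon and lat grids below are hypothetical stand-ins for data[0] and data[1], and half_width is a name introduced here for the 0.25 degree margin. Each comparison needs its own parentheses because & binds more tightly than > and <.

```python
import numpy as np

williams_flats_long = -118.49064628838096
williams_flats_lat = 47.9849149578827
half_width = 0.25  # degrees

# Hypothetical coordinate layers standing in for data[0] and data[1]
lon, lat = np.meshgrid(np.linspace(-120.0, -117.0, 13),
                       np.linspace(49.0, 47.0, 9))

# True where a cell satisfies both the longitude and the latitude condition
mask = (
    (lon > williams_flats_long - half_width)
    & (lon < williams_flats_long + half_width)
    & (lat > williams_flats_lat - half_width)
    & (lat < williams_flats_lat + half_width)
)

print(mask.shape)  # same shape as the coordinate grids
print(mask.sum())  # number of cells inside the window
```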
In the past, I'd typically populate a dataframe with my values first since this method was effective for handling mixed types of data:
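import numpy as np
import pandas as pd
df = pd.DataFrame()
x_axis_size = data.shape[2]  # number of samples (columns)
y_axis_size = data.shape[1]  # number of lines (rows)
# xv holds the column index of every cell, yv the row index
xv, yv = np.meshgrid(np.arange(0,x_axis_size), np.arange(0,y_axis_size))
df['sample'] = xv.flatten()
df['line'] = yv.flatten()
# Flatten each layer in the same row-major order as the index grids
df['longitude'] = data[0,:,:].ravel()
df['latitude'] = data[1,:,:].ravel()
df['Power'] = data[2,:,:].ravel()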
The above code will generate the following DataFrame:
sample line longitude latitude Power
0 0 0 -184.376724 53.500195 NaN
1 1 0 -184.310455 53.493004 NaN
2 2 0 -184.244812 53.486046 NaN
3 3 0 -184.178406 53.478687 NaN
4 4 0 -184.112915 53.471695 NaN
... ... ... ... ... ...
3749995 2495 1499 -112.517677 14.803576 NaN
3749996 2496 1499 -112.495865 14.804007 NaN
3749997 2497 1499 -112.474197 14.804358 NaN
3749998 2498 1499 -112.452446 14.804746 NaN
3749999 2499 1499 -112.430595 14.805193 NaN
Here we've added two new columns, sample and line, which reference the indexes of the matrix data.
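The relationship between the DataFrame row label and the (line, sample) pair follows from the row-major flattening: row label = line * number of samples + sample. A short sketch with np.unravel_index, using row 3749995 from the table above (n_lines and n_samples are names chosen here for data.shape[1] and data.shape[2]):

```python
import numpy as np

n_lines, n_samples = 1500, 2500  # data.shape[1], data.shape[2]

# Row label taken from the DataFrame printed above
flat_index = 3749995

# Convert the flat position back to (line, sample) in row-major order
line, sample = np.unravel_index(flat_index, (n_lines, n_samples))
print(line, sample)  # 1499 2495

# The inverse mapping is plain arithmetic
print(line * n_samples + sample)  # 3749995
```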
Now, we can apply both previous conditions to our dataframe and filter it appropriately:
df = df[ (df['longitude'] > williams_flats_long-0.25) & (df['longitude']<williams_flats_long+0.25) ]
df = df[ (df['latitude'] > williams_flats_lat-0.25) & (df['latitude']<williams_flats_lat+0.25) ]
Output
sample line longitude latitude Power
211870 1870 84 -118.726280 48.206997 NaN
211871 1871 84 -118.694878 48.208351 NaN
211872 1872 84 -118.663307 48.209919 NaN
211873 1873 84 -118.631660 48.211582 NaN
211874 1874 84 -118.600403 48.212662 NaN
... ... ... ... ... ...
246887 1887 98 -118.392639 47.753323 NaN
246888 1888 98 -118.361153 47.755108 NaN
246889 1889 98 -118.329903 47.756531 NaN
246890 1890 98 -118.298660 47.757881 NaN
246891 1891 98 -118.267349 47.759369 NaN
and find the indexes:
idx_min = df['sample'].min()
idx_max = df['sample'].max()
print(idx_min,idx_max)
outputs
1870 1891
and
jdx_min = df['line'].min()
jdx_max = df['line'].max()
print(jdx_min,jdx_max)
outputs
84 98
With the index ranges known, selecting the region of interest is a single slice. Because the end of a slice is exclusive, 1 is added to each upper bound so that the last matching line and sample are included:
selected_data = data[:,jdx_min:jdx_max+1,idx_min:idx_max+1]
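The same bounding box can also be found without pandas, using numpy.argwhere on a boolean mask. The sketch below is an alternative to the DataFrame approach, not the method used above; it runs on a small hypothetical array, and target_lon, target_lat and half_width are names introduced for illustration. The end of a slice is exclusive, so 1 is added to each upper bound.

```python
import numpy as np

# Hypothetical stand-in for the real (3, 1500, 2500) array
lon, lat = np.meshgrid(np.linspace(-120.0, -117.0, 13),
                       np.linspace(49.0, 47.0, 9))
power = np.arange(lon.size, dtype=float).reshape(lon.shape)
data = np.stack([lon, lat, power])

target_lon, target_lat = -118.49064628838096, 47.9849149578827
half_width = 0.25  # degrees

# True where a cell lies within half_width of the target on both axes
mask = (
    (np.abs(data[0] - target_lon) < half_width)
    & (np.abs(data[1] - target_lat) < half_width)
)

# One (line, sample) row per matching cell
hits = np.argwhere(mask)
if hits.size == 0:
    raise ValueError("no grid cell falls inside the requested window")

jdx_min, idx_min = hits.min(axis=0)
jdx_max, idx_max = hits.max(axis=0)

# Slice ends are exclusive, hence the + 1
selected_data = data[:, jdx_min:jdx_max + 1, idx_min:idx_max + 1]
print(selected_data.shape)
```

On the full-size array this avoids building a 3.75 million row DataFrame, at the cost of losing the convenient tabular view of the matching cells.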
References
Links | Site |
---|---|
numpy.argwhere | numpy.org |
numpy.where | numpy.org |