Most efficient way of looking up values in an array.
8 views (last 30 days)
Show older comments
Johann Vormadal
on 17 Mar 2023
Commented: Johann Vormadal
on 20 Mar 2023
I have an array (a kind of lookup array that I have generated) with 8 columns and around 400 rows. In every row the first four values represent the boundaries (y1, x2, y2 and x1) of rectangular regions in a plane. The last four values are integers I want to access if a query point is within the region described by the first four values. The rectangular regions fully cover the plane upto certain limits and do not overlap, but they are not regular/structured.
If I have, say, 10million points (represented by and x vector and a y vector) and I want to find for each of those the four integers (in column 5:8) that correspond to their region, What is the fastest way of looking this up? changing the way my lookup array represents the data is not off-limits, if it helps making the process faster.
I have tried the following, which is slow, seems very inefficient and breaks when the points are exactly on a boundary between regions:
QueryPoints = [X Y]; % My 10000000 x 2 vector for query points.
n = length(X); % Determine number of query points
SearchIndex = all( [Y -X -Y X] >= permute([IndexList(:,1) -IndexList(:,2:3) IndexList(:,4)], [3 2 1]),2);
myIntegers = zeros(n,4);
for i = 1:n
myIntegers(i,:) = IndexList(SearchIndex(i,1,:),5:8); % my 10000000 x 4 vector with the set of integers corresponding with my coordinates
end
Thanks.
4 Comments
Bruno Luong
on 17 Mar 2023
Edited: Bruno Luong
on 17 Mar 2023
IMO using autoexpension might be not a good idea, since the intermediarete result (before all) requires
(10e6*4*8*452)/1e9
144 Gb, so your computer might swap to HD and it works relly slow in thid condition.
It might be just loop on your rectangles (452 iterrations) then check which 10e6 point falling inside. The amount of memory requires is 0.33 Gb and it does not need to swap.
You can even split the 4 tests in succesive tests so that the later test work on a smaller set.
Accepted Answer
John D'Errico
on 17 Mar 2023
Edited: John D'Errico
on 17 Mar 2023
It hurts my head to see this ordering: (y1, x2, y2 and x1). Sigh. Why? Whatever floats your boat, I guess.
I would not be using a loop. Not even close. Not even remotely so. Nor would I be testing every rect, as your code does. Instead, I would be looking for a way to solve this using existing efficient code designed to solve that problem.
IF the rectangular regions fully tile a planar region, but do not overlap, I would first break each rect down into TWO triangles. The result will be a triangulation of that region. So I might now create a triangulation from those triangles. Don't use delaunay. (I don't know that your problem is convex anyway.) You already know the rects. And that means you already can compute the triangulation directly.
help triangulation
methods triangulation
As you can see, triangulation has a pointLocation method. I might be using that.
help triangulation/pointLocation
Before I did too much, I might look to see if other tools, for example, tsearch might be faster.
For example:
n = 21;
[X,Y] = ndgrid(linspace(0,1,n)); % 400 rects
XYgrid = [X(:),Y(:)];
[ri,rj] = ndgrid(1:n-1);
rind = sub2ind([n,n],ri(:),rj(:));
tri = [[rind,rind+1,rind+n];[rind+n+1,rind+1,rind+n]];
triang = triangulation(tri,X(:),Y(:));
rectid = [rind;rind]; % maps each triangle into the original rect it came from
% now, generate 1e7 points in that region
xytest = rand(1e7,2);
tic
triloc = pointLocation(triang,xytest);
rectloc = rectid(triloc);
toc
7 seconds in the online tool, (my computer is always faster when running my own copy of MATLAB. That was only 1.5 seconds in my own copy of MATLAB.) seems reasonable to locate 1e7 points. In fact of course, I could do better knowing the actual layout of this grid. But you tell us that your rects are not so trivially arranged as I have in this example. And pointLocation should not care.
4 Comments
John D'Errico
on 19 Mar 2023
@Bruno Luong - sorry, but you are completely wrong in this. I showed how to use a triangulation, and the function pointLocation. Yes, I did suggest that tsearch MIGHT be an optional alternative. But I explaicity stated to use pointLocation as a working option. In fact, pointLocation does explicitly NOT need a convex triangulation.
Seriously Bruno, TRY IT if you don't know the answer.
PS = polyshape([0 0;2 0;2 1;1 1;1 2;0 2])
plot(PS)
PStri = triangulation(PS)
PStri is a triangulation object, but not a convex one.
trimesh(PStri.ConnectivityList,PStri.Points(:,1),PStri.Points(:,2))
PStri can be used in by pointLocation.
pointLocation(PStri,2,2)
pointLocation(PStri,.5,.5)
Convexity is not a requirement for pointLocation.
Even so, IF the domain is always a convex one, then tsearch MIGHT be faster. I have not compared the older code to see how it competes.
Bruno Luong
on 19 Mar 2023
Edited: Bruno Luong
on 19 Mar 2023
Delaunay doesn't not have to be convex If I remember one of the definntion is that that circumscrbed circle on any triangle does not contain any vertices other than the three of the triangle.
It can also derived as the geometrical "dual" of voronoi.
I believe various MATLAB tsearch uses some special property of Delaunay to make the search efficient and it breaks downs if it is not.
I remember in the pass trying to do interpolation and provide my own triangulation (for example spliiting some arbitray quadrilateral mesh coming from somewhere else) and calling closestpoint and I fail to do (or may be the process takes too much time, I can't recall the detail why I drop it).
May be I should revisit again.
More Answers (1)
the cyclist
on 17 Mar 2023
Edited: the cyclist
on 17 Mar 2023
One simple step that could speed up your code a lot would be to preallocate memory for your output array. Put
myIntegers = zeros(n,4);
before your for loop.
See Also
Categories
Find more on Delaunay Triangulation in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!