Fast subarray access when using GPU matrices

I need to optimize my GPU code, and the slowest line is the one that adds many subarrays into one large matrix:
for ii = 1:Npos
large_array(ROI{ii,:}) = large_array(ROI{ii,:}) + smaller_array(:,:,ii);
end
Npos is about 500, large_array is about 2000x2000, each page of smaller_array is about 256x256, and the ROIs are contiguous subregions of large_array.
Do you have any idea how to write this faster and remove the for-loop?
The main issue is the huge overhead of calling subsref so many times.
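One loop-free alternative, not taken from this thread, is sketched below under stated assumptions: compute the linear index of every patch element once, then let accumarray sum all contributions, so overlapping ROIs accumulate exactly as they do in the loop. It assumes ROI is an Npos-by-2 cell array whose first column holds each patch's row indices and whose second column holds its column indices (increasing and contiguous). The corners top and left and the toy sizes are hypothetical. The result is checked against the original loop. Whether accumarray accepts gpuArray inputs in your release is worth confirming in the documentation.

```matlab
% Toy sizes (the real case is ~2000x2000 with ~500 patches of 256x256).
N = 8; m = 3; Npos = 4;
large_array   = zeros(N, N);
smaller_array = reshape(1:m*m*Npos, m, m, Npos);

% Hypothetical ROI layout: ROI{ii,1} holds the row indices and ROI{ii,2}
% the column indices of patch ii; patches 1-2 and 2-3 overlap.
top  = [1 2 4 6];
left = [1 3 2 6];
ROI  = cell(Npos, 2);
for ii = 1:Npos
    ROI{ii,1} = top(ii)  + (0:m-1);
    ROI{ii,2} = left(ii) + (0:m-1);
end

% Reference result: the loop from the question.
ref = large_array;
for ii = 1:Npos
    ref(ROI{ii,:}) = ref(ROI{ii,:}) + smaller_array(:,:,ii);
end

% Loop-free version. Recover each patch's top-left corner from ROI.
r0 = cellfun(@(r) r(1), ROI(:,1));                % Npos-by-1
c0 = cellfun(@(c) c(1), ROI(:,2));                % Npos-by-1
[I, J] = ndgrid(0:m-1, 0:m-1);                    % offsets inside one patch
rows = bsxfun(@plus, I, reshape(r0, 1, 1, Npos)); % m-by-m-by-Npos row indices
cols = bsxfun(@plus, J, reshape(c0, 1, 1, Npos)); % m-by-m-by-Npos column indices
lin  = sub2ind([N N], rows(:), cols(:));          % depends only on the ROI geometry
% accumarray sums every value that maps to the same linear index.
vec  = large_array + reshape(accumarray(lin, smaller_array(:), [N*N 1]), N, N);

disp(isequal(vec, ref))
```

Since lin depends only on the ROI geometry, it can be computed once and reused, leaving one accumarray call and one addition per update.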

Answers (1)

I think the best way to proceed is to build a single indexing expression into smaller_array, so that the whole update becomes one operation:
large_array = large_array + smaller_array(idx);
The trick is computing idx, which depends on how the "pages" of smaller_array are laid out. If the pages tile large_array exactly (no overlaps, no gaps) and are ordered column-major over the tile grid (page 1 at the top left, page 2 below it, and so on), here is how you could build idx for the case where large_array is 4-by-4 and smaller_array is 2-by-2-by-4:
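idx_0 = reshape(1:4, 2, 2); % [1, 3; 2, 4]
idx_1 = repmat(idx_0, 2, 2); % 2-by-2 grid of [1,3;2,4]: offset of each element within its tile
idx_2 = 2 * 2 * kron(idx_0, ones(2,2)); % page number of each tile times 4 elements per page (idx_0 doubles as the page grid because the tile grid is also 2-by-2)
idx = idx_1 + (idx_2 - (2*2)); % page k starts at linear index (k-1)*4 + 1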
which gives
idx =
1 3 9 11
2 4 10 12
5 7 13 15
6 8 14 16
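The same construction generalizes to m-by-n tiles on a p-by-q grid, provided the tiles cover large_array exactly and the pages are in column-major tile order. Below is a minimal sketch using a hypothetical helper, make_idx; for m = n = p = q = 2 it reproduces the matrix above. It also compares against a reshape/permute formulation, a swapped-in alternative rather than part of the answer, which builds the same tiled matrix without an index array.

```matlab
% Within-tile linear offsets repeated over the grid, plus (page - 1) * m*n,
% where page is the column-major number of the tile each element belongs to.
make_idx = @(m, n, p, q) repmat(reshape(1:m*n, m, n), p, q) + ...
    m*n * (kron(reshape(1:p*q, p, q), ones(m, n)) - 1);

idx = make_idx(2, 2, 2, 2);
disp(idx)

% Non-square check: 3-by-2 tiles on a 2-by-3 grid.
m = 3; n = 2; p = 2; q = 3;
S = reshape(1:m*n*p*q, m, n, p*q);   % pages in column-major tile order
A = S(make_idx(m, n, p, q));         % tiled via a single index
B = reshape(permute(reshape(S, m, n, p, q), [1 3 2 4]), m*p, n*q);
disp(isequal(A, B))
```

Because idx depends only on the geometry, it can be built once and reused, so each update is just large_array = large_array + smaller_array(idx). Neither form handles overlapping ROIs.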

Asked: 6 Apr 2015
Edited: 7 Apr 2015