Why is arrayfun for GPU slower than normal operations

Question

Theron FARRELL on 28 May 2019

Commented: Theron FARRELL on 3 Jun 2019

Accepted Answer: Joss Knight

Hi there,

Here is a piece of test code in which the arrayfun version runs more slowly than the plain operations. Any thoughts? Many thanks.

function Test_GPU1()
    EP = gpuArray(eps*ones(10000, 1, 'single'));
    ONE = gpuArray(ones(10000, 1, 'single'));
    ZERO = gpuArray(zeros(10000, 1, 'single'));
    Cur_FF_Output = gpuArray(0.5*ones(10000, 1, 'single'));
    Cur_Desired_Output = gpuArray(0.5*ones(10000, 1, 'single'));
    for iter = 1:1000
        % In output layer, Cur_Delta = Del(C)/Del(z) =  Del(C)/Del(a) * Del(a)/Del(z)
        % [~, Cur_Delta0] = Cost_Function_GPU(Cur_FF_Output, Cur_Desired_Output, Hyper_Para);

        % Variant 1: direct element-wise operations on the gpuArrays
        temp00 = Cur_FF_Output + eps;
        temp11 = log(temp00);
        temp22 = log(1-Cur_FF_Output+eps);
        temp33 = Cur_Desired_Output.*temp11;
        temp44 = 1-Cur_FF_Output.*temp22;
        Cur_Delta = Cur_FF_Output-Cur_Desired_Output;
        Cost = 0-sum(temp33+temp44);

        % Variant 2: the same steps, with one arrayfun call per operation
        temp00 = arrayfun(@plus, Cur_FF_Output, EP);
        temp11 = arrayfun(@log, temp00);
        temp22 = arrayfun(@log, arrayfun(@minus, ONE, arrayfun(@plus, Cur_FF_Output, EP)));
        temp33 = arrayfun(@times, Cur_Desired_Output, temp11);
        temp44 = arrayfun(@minus, ONE, arrayfun(@times, Cur_FF_Output, temp22));
        Cur_Delta = arrayfun(@minus, Cur_FF_Output, Cur_Desired_Output);
        Cost = arrayfun(@minus, ZERO, sum(temp33+temp44));
    end
end

Jan on 28 May 2019

Of course arrayfun has a certain overhead. It is expected to run slower than calling the operators directly.

Answer 1

Joss Knight on 28 May 2019

You are misunderstanding the use of arrayfun for gpuArray. Combine all those operations into a single function.

temp00 = arrayfun(@plus, Cur_FF_Output, EP);
temp11 = arrayfun(@log, temp00);
temp22 = arrayfun(@log, arrayfun(@minus, ONE, arrayfun(@plus, Cur_FF_Output, EP)));
temp33 = arrayfun(@times, Cur_Desired_Output, temp11);
temp44 = arrayfun(@minus, ONE, arrayfun(@times, Cur_FF_Output, temp22));
Cur_Delta = arrayfun(@minus, Cur_FF_Output, Cur_Desired_Output);
Cost = arrayfun(@minus, ZERO, sum(temp33+temp44));

becomes

function Cur_Delta = stuff(Cur_FF_Output, Cur_Desired_Output, EP)
    temp00 = Cur_FF_Output + EP;
    temp11 = log(temp00);
    temp22 = log(1 - (Cur_FF_Output + EP));
    temp33 = Cur_Desired_Output .* temp11;
    temp44 = 1 - (Cur_FF_Output .* temp22);
    Cur_Delta = Cur_FF_Output - Cur_Desired_Output;
end
Cur_Delta = arrayfun(@stuff, Cur_FF_Output, Cur_Desired_Output, EP);

Obviously, this can be simplified a great deal. I've made a start by removing the unnecessary ONE and ZERO variables.

After this, ask yourself whether you really need arrayfun, or whether you should just call this function directly. MATLAB uses some clever optimisations that, for most sequences of element-wise operations, make using arrayfun unnecessary.
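
As a rough sketch of that last point, and assuming Parallel Computing Toolbox and a supported GPU, the same element-wise function can be called directly on the gpuArray inputs or through arrayfun, and the two routes compared with gputimeit. The name stuffSketch, its three outputs, and the use of eps('single') as a scalar input are assumptions made for this illustration, not part of the answer above. Which route is faster is not claimed here and will depend on the machine and release.

```matlab
% Sketch only: requires Parallel Computing Toolbox and a supported GPU.
% stuffSketch is a hypothetical three-output variant of the function above.
out     = gpuArray(0.5 * ones(10000, 1, 'single'));
desired = gpuArray(0.5 * ones(10000, 1, 'single'));
ep      = eps('single');   % scalar input, expanded against the arrays

% Direct call: every operator in the function already accepts whole gpuArrays.
[d1, a1, b1] = stuffSketch(out, desired, ep);

% arrayfun call: the function body is applied element by element on the GPU.
[d2, a2, b2] = arrayfun(@stuffSketch, out, desired, ep);

% The two routes should agree to within rounding.
maxDiff = max(abs(gather(a1 - a2)));
fprintf('largest difference in the temp33 output: %g\n', maxDiff);

% gputimeit synchronises with the device before reporting a time.
tDirect   = gputimeit(@() stuffSketch(out, desired, ep), 3);
tArrayfun = gputimeit(@() arrayfun(@stuffSketch, out, desired, ep), 3);
fprintf('direct: %.3g s, arrayfun: %.3g s\n', tDirect, tArrayfun);

function [delta, t33, t44] = stuffSketch(out, desired, ep)
    % Purely element-wise, so it works both as a direct call and inside arrayfun.
    t11   = log(out + ep);
    t22   = log(1 - (out + ep));
    t33   = desired .* t11;
    t44   = 1 - (out .* t22);
    delta = out - desired;
end
```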

Theron FARRELL on 30 May 2019

Edited: Theron FARRELL on 30 May 2019

Great! I compared two loops as follows

Cur_FF_Output = gpuArray(0.5*ones(10000, 1, 'single'));
Cur_Desired_Output = gpuArray(0.5*ones(10000, 1, 'single'));

% Loop 1: direct element-wise operations
tic
for iter = 1:1000
    % In output layer, Cur_Delta = Del(C)/Del(z) =  Del(C)/Del(a) * Del(a)/Del(z)
    % [~, Cur_Delta0] = Cost_Function_GPU(Cur_FF_Output, Cur_Desired_Output, Hyper_Para);
    temp00 = Cur_FF_Output + eps;
    temp11 = log(temp00);
    temp22 = log(1-Cur_FF_Output+eps);
    temp33 = Cur_Desired_Output.*temp11;
    temp44 = 1-Cur_FF_Output.*temp22;
    Cur_Delta = Cur_FF_Output-Cur_Desired_Output;
    Cost = -sum(temp33+temp44);
end
toc

% Loop 2: one combined arrayfun call per iteration
% (note: this loop runs 100 iterations, the one above runs 1000)
tic
for iter = 1:100
    [Cur_Delta, temp33, temp44] = arrayfun(@stuff, Cur_FF_Output, Cur_Desired_Output);
    Cost = -sum(temp33+temp44);
end
toc

And the result is

Elapsed time is 1.595991 seconds.

Elapsed time is 0.476634 seconds.

The second one DID run substantially faster. Wonderful!!! Thanks a lot for your generous succour!

So are there any practical guidelines for using or not using arrayfun() on the GPU? I am afraid they are not clear in the official docs.

Also, if arrayfun() is not so helpful with @plus, @minus, and other basic element-wise operations but only with user-built functions, is it still meaningful to support the former?
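
One caveat about timings like the ones above: GPU calls can return before the device has finished its work, so tic/toc around a loop may not capture everything, and the two loops need the same iteration count to be comparable. A hedged sketch of a fairer measurement, assuming Parallel Computing Toolbox and a supported GPU, synchronises with wait(gpuDevice) before reading the clock. The helpers timeLoop and weightedLog and the single operation being timed are invented for this illustration.

```matlab
% Sketch only: requires Parallel Computing Toolbox and a supported GPU.
out     = gpuArray(0.5 * ones(10000, 1, 'single'));
desired = gpuArray(0.5 * ones(10000, 1, 'single'));
ep      = eps('single');
nIter   = 1000;   % identical iteration count for both variants

tPlain = timeLoop(@() desired .* log(out + ep), nIter);
tFused = timeLoop(@() arrayfun(@weightedLog, out, desired, ep), nIter);
fprintf('plain: %.4f s, arrayfun: %.4f s\n', tPlain, tFused);

function t = timeLoop(fcn, nIter)
    dev = gpuDevice;
    fcn();        % warm-up call so one-time setup cost is not timed
    wait(dev);    % make sure the warm-up has finished
    tic
    for k = 1:nIter
        r = fcn(); %#ok<NASGU>
    end
    wait(dev);    % block until all queued GPU work is done
    t = toc;
end

function r = weightedLog(o, d, ep)
    % Scalar, element-wise body suitable for arrayfun on the GPU.
    r = d .* log(o + ep);
end
```

The built-in gputimeit performs this kind of synchronisation and repetition automatically and is usually the simpler choice for timing a single function handle.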

Jan on 31 May 2019

No, arrayfun is not outdated. It is efficient if it is used for the purpose it was written for. It is definitely not designed to perform basic operations on elementary arrays. The practical guideline for using arrayfun is to avoid it when it is not needed. All that arrayfun does is forward the data of the input arrays element by element to the specified function. Functions that accept arrays directly (so-called "vectorized" functions) are much more efficient. Example:

a = rand(1000, 1000);
b = rand(1000, 1000);
c = a + b;
d = arrayfun(@plus, a, b);

In the first case the function plus gets the arrays and can process all the data in one step. In the second case arrayfun extracts the data element by element and calls plus 1 million times to obtain the result. This is a severe overhead.
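
For ordinary CPU arrays, the size of that overhead can be measured with timeit. The sketch below only checks that both routes give the same result and prints the two timings; the variable names tVec and tFun are made up here, and no particular ratio is claimed.

```matlab
a = rand(1000, 1000);
b = rand(1000, 1000);

c = a + b;                    % one vectorized call to plus
d = arrayfun(@plus, a, b);    % plus is invoked once per element

assert(isequal(c, d));        % both routes give identical values

% timeit calls each handle several times and reports a typical run time.
tVec = timeit(@() a + b);
tFun = timeit(@() arrayfun(@plus, a, b));
fprintf('vectorized: %.4g s, arrayfun: %.4g s\n', tVec, tFun);
```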

If a function does not accept arrays as input, I'd still prefer a loop. See the example given in the documentation of arrayfun:

S(1).f1 = rand(1,5);
S(2).f1 = rand(1,10);
S(3).f1 = rand(1,15);
A = arrayfun(@(x) mean(x.f1),S);

While this looks cute and compact, I prefer the old-fashioned and C-stylish:

A = zeros(size(S));
for k = 1:numel(S)
    A(k) = mean(S(k).f1);
end

This might offer more chances for typos than the compact arrayfun call. But as in the cases of cellfun and structfun, the loop approach is usually faster.
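
A small self-contained check that the two styles are interchangeable for this struct example is sketched below. The output names viaArrayfun and viaLoop are chosen only for this illustration, and the result array is preallocated from the size of S.

```matlab
S(1).f1 = rand(1, 5);
S(2).f1 = rand(1, 10);
S(3).f1 = rand(1, 15);

% Compact form: arrayfun passes each struct element to the anonymous function.
viaArrayfun = arrayfun(@(x) mean(x.f1), S);

% Explicit loop: preallocate from the input S, then fill element by element.
viaLoop = zeros(size(S));
for k = 1:numel(S)
    viaLoop(k) = mean(S(k).f1);
end

assert(isequal(viaArrayfun, viaLoop));   % same values either way
```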

I avoid arrayfun completely in my code.

Theron FARRELL on 1 Jun 2019

I see. 'If you know what you're doing', 'As fast as', and 'no fun' are the key phrases, as I sense. *_^

To be candid, MATLAB has been the de facto most powerful, miraculous, and user-friendly scientific and technical SIMULATION and PROTOTYPING tool since 1984 (I would not use the word "computing" here; let's set aside MATLAB Coder, Embedded Coder, etc., originally designed for the auto industry, for the moment), and she enjoys AUTOMATIC optimisation without heavy involvement from users. One of the most typical examples is element-wise (vectorised) operations. That being said, my point is that a user should not have to spend too much effort seeking THE most optimised code instead of concentrating on algorithm design and prototyping, and I do not think most users will. Consequently, it would be better if the Help page gave some notes on the pros and cons of functions such as arrayfun(), together with PRACTICAL examples, for instance a more formal statement of your words above. Especially with the advent of DNNs, MATLAB would then be better prepared to compete with the loads of well-optimised, open-source code out there: TensorFlow, Theano, Caffe, etc.

Joss Knight on 1 Jun 2019

I don't know how recently you viewed the latest documentation on GPU support. I think it's pretty comprehensive, and doesn't encourage you to use arrayfun unnecessarily. There's a great page that talks you through the various options for optimising your code which mentions arrayfun only as an advanced manoeuvre that might, but won't always, improve your performance. Generally we think of arrayfun as the way to 'write custom kernels in the MATLAB language', which many advanced users may look for. We rarely find it useful to document exactly what optimizations MATLAB is using. Usually it just confuses people, who start looking for performance improvements in the wrong place, or start blaming unexpected behaviour on the optimisations. Best to just do our best and leave ourselves the wiggle room to change the way things work from one release to the next.

Theron FARRELL on 3 Jun 2019

Understood! Thanks again for your great help and detailed explanation. I have been patient with MATLAB since 1997 :-)

Answer 2

Jan on 28 May 2019

Edited: Jan on 28 May 2019

Of course arrayfun has a certain overhead. It is expected to run slower than calling the operators directly with arrays as inputs. In addition, in

Cur_FF_Output + eps

the second operand is a scalar, while in

arrayfun(@plus, Cur_FF_Output, EP)

MATLAB has to process a vector. Addressing the elements of an array requires looping over memory, whereas accessing a scalar is much cheaper.

What is the purpose of:

arrayfun(@minus, ZERO, sum(temp33+temp44))

? This is faster:

-sum(temp33+temp44)

Theron FARRELL on 30 May 2019

No, it is not a bug. I probably made a mistake somewhere in my code. My bad, sorry.

Jan on 31 May 2019

Even arrayfun(@minus, 0, sum(temp33+temp44)) is too complicated compared to

-sum(temp33+temp44)
