Improving code performance by compiling
Hello,
I'm coding a program where runtime is relevant, so I'm looking for ways to optimize performance. From what I've read here
the Compiler / Compiler SDK (I don't really know the difference between the two) can create standalone apps that support most features - including graphics - but does not speed up the code, since it is not actually compiled. The Coder, on the other hand, can improve runtime, but does not support graphics (which I need). So in my case the only way to use compilation to speed up the program would be to put code into functions wherever possible and compile those into MEX files. Is that right?
4 Comments
Mohammad Sami on 18 Mar 2021 (edited 18 Mar 2021)
As you noted, compiling the app into a standalone executable will not improve the performance beyond what you have already achieved in MATLAB. The only thing you can do is vectorize your code and follow the optimization techniques you can find on this forum or in the help page here.
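As a minimal sketch of what "vectorize your code" means in practice (the variables and numbers here are illustrative, not from the original question): replace an element-by-element loop with a single array operation.

```matlab
n = 1e6;
x = rand(n, 1);

% Loop version: one interpreted iteration per element.
y1 = zeros(n, 1);           % preallocate to avoid growing the array
for k = 1:n
    y1(k) = 2 * x(k) + 1;
end

% Vectorized version: one array operation, same result.
y2 = 2 * x + 1;

isequal(y1, y2)             % the two versions agree
```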
Accepted Answer
Jan on 18 Mar 2021
Optimization of code is done in different steps:
- Write the code as clean and clear as possible. Do not start with premature optimization, because this is too prone to bugs.
- Prove that the code is working correctly using unit or integration tests.
- Then you have a starting point to compare the results of the improved versions against.
- Use the profiler to find the bottlenecks. It is not worth optimizing a piece of code which needs only 1% of the processing time.
- In many situations, investing some brain power can accelerate the code massively: process matrices columnwise instead of rowwise, move repeated code out of loops, avoid creating variables dynamically by EVAL, or LOAD without storing the output in a variable. Maths can also be useful, e.g. by reducing the number of expensive EXP or POWER calls.
- Rewriting the bottlenecks as C-MEX functions can be very efficient, but if e.g. the main work is spent in linear algebra routines, MATLAB already uses highly optimized libraries.
- If rewriting the code as a C-MEX function is too expensive, try the Coder. It converts the code automatically, but with some overhead compared to a manually written C-MEX function.
- If graphics are the bottleneck, there is no solution. MATLAB 2009a was much faster for a lot of tasks, and the 20-year-old MATLAB 6.5 beat them all, because it did not use Java for the rendering. But of course the ancient MATLAB versions have many drawbacks also - if you want a box around a diagram, you have to rotate the diagram by 0.0001 degrees if OpenGL is used as renderer...
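As a hedged illustration of the "move repeated code out of loops" point from the list above (the variables are made up for this sketch): hoist any loop-invariant computation so it runs once instead of n times.

```matlab
n = 1e5;
x = rand(n, 1);
c = zeros(n, 1);

% Wasteful: sum(x) does not change, but is recomputed on every iteration.
for k = 1:n
    c(k) = x(k) / sum(x);
end

% Hoisted: compute the invariant once before the loop.
s = sum(x);
for k = 1:n
    c(k) = x(k) / s;
end

% Or drop the loop entirely:
c = x / sum(x);
```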
After the optimization is ready, compare the results with the initial version. Compiled functions and even C-MEX functions need not be compatible with future versions, so take care to keep the original, non-optimized M-files.
Compiling can accelerate your code by a factor of 2, with some luck. Exploiting the underlying maths and improving the MATLAB code can give you a factor of 100. You can find examples in the forum with 200 and 1000 times faster code in pure MATLAB. So maybe it is worth sharing the code of the bottlenecks of your program.
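One hypothetical example of "exploiting the underlying maths" as mentioned above (the numbers are illustrative): since exp(k*t) = exp(t)^k, a running product can replace n separate calls to the expensive EXP function.

```matlab
t = 0.001;
n = 1000;

% Direct: n calls to exp.
v1 = exp((1:n) * t);

% Via maths: one exp call, then a cumulative product.
e  = exp(t);
v2 = cumprod(e * ones(1, n));   % e, e^2, ..., e^n

% The two agree up to floating-point rounding:
max(abs(v1 - v2) ./ v1)
```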
3 Comments
Jan on 20 Mar 2021 (edited 20 Mar 2021)
I do not have the Coder. As far as I understand, it does not create standalone executables, and graphics are not supported either. The Coder creates C++ code from MATLAB code. This should be useful for accelerating some m-functions.
If an array is only a vector, the orientation does not matter: neighboring elements are stored side by side in the RAM either way. For matrices, the elements of the same column are stored contiguously, but the element in the same row of the next column is stored with an offset in the RAM. The CPU moves memory in blocks of 64 bytes (the "cache line size") to the internal cache, where it can be processed much faster than in the RAM. Therefore the processing of matrices is faster if they are accessed columnwise:
n = 1e4;
x = rand(n, n);
tic
s = 0;
for i1 = 1:n              % Row index outside
    for i2 = 1:n          % Column index inside: stride-n access
        s = s + x(i1, i2);
    end
end
toc
tic
s = 0;
for i2 = 1:n              % Other way around: column index outside
    for i1 = 1:n          % Row index inside
        s = s + x(i1, i2);   % Neighboring elements in the RAM
    end
end
toc
% Elapsed time is 1.923739 seconds.   (row-wise)
% Elapsed time is 0.160935 seconds.   (column-wise)
tic
s = 0;
for i1 = 1:n
    s = s + x(i1, :);     % Slow copy of row vectors
end
s = sum(s);
toc
tic
s = 0;
for i1 = 1:n
    s = s + x(:, i1);     % Fast copy of column vectors
end
s = sum(s);
toc
% Elapsed time is 1.270662 seconds.
% Elapsed time is 0.098323 seconds.   (13 times faster!)
The actual computations are very cheap. Therefore the speed is limited by the memory access, and the processing of neighboring elements in the RAM is much faster.
The code produced by the Coder would suffer from the same problem, because the speed of the memory access does not depend on the programming language, but on the hardware of the CPU and the caching mechanism.
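Following the snippets above: when a built-in function covers the whole operation, no manual loop order needs to be chosen at all, since the built-in walks the contiguous data directly. This is a sketch, not a claim about specific timings.

```matlab
n = 1e4;
x = rand(n, n);

tic
s = sum(x(:));        % sum over all elements, in storage order
toc

% In newer releases the same thing can be written as:
% s = sum(x, 'all');
```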
More Answers (0)