Asked by nadia
on 23 Feb 2018

Hi, I have a MATLAB program and I want to know how much memory my program uses. Is there any way to compute this? For example, with tic/toc I can measure the execution time. Is there a similar solution for memory usage? I want to estimate the computational complexity of the whole algorithm. I have Windows 10 and I use MATLAB R2016a. Thanks in advance.

Answer by Jan
on 23 Feb 2018

It is not even uniquely defined what "used memory" is. If you read a file, its contents are cached in RAM by the operating system - does that count as memory used by your code or by the operating system? If you shrink an array, the OS decides whether the freed part of the memory is reused directly or not. Before it can be reused, its contents are overwritten with zeros, and the OS decides when this is done. So this:

x = ones(1, 1e6);

x(5e5:end) = [];

y = ones(1, 5e5);

might use 8*1e6 or 8*1.5e6 bytes at its peak, depending on external factors that cannot be controlled by Matlab. With iteratively growing arrays you can get similar problems:
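For what it is worth, you can at least inspect the bytes Matlab books for a specific variable with whos - a sketch only, since it reports what Matlab assigns to the variable, not what the OS has actually reserved for the process:

```matlab
x = ones(1, 1e6);
w = whos('x');
fprintf('x: %d bytes\n', w.bytes)   % 1e6 doubles -> 8e6 bytes

x(5e5:end) = [];
w = whos('x');
fprintf('x: %d bytes\n', w.bytes)   % roughly half, after shrinking
```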

x = [];

for k = 1:1e6

x(k) = rand;

end

This does not allocate 1e6 doubles once, but a cumulative total of sum(1:1e6) doubles, which is about 4 TB! Of course some memory is freed during processing, but perhaps with a certain delay. In the end the code "occupies" only 8 MB, but what is its "memory consumption"?
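The repeated re-allocation is avoided by the usual preallocation idiom, which requests the final 8 MB block exactly once:

```matlab
x = zeros(1, 1e6);   % one allocation of 8 MB up front
for k = 1:1e6
    x(k) = rand;
end
```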

I think the question is very hard to answer, because you have to define exactly what you want to measure. The results can differ between runs, just as tic/toc timings do.

nadia
on 24 Feb 2018

Jan
on 30 Apr 2018

I already told you that "memory consumption" is not uniquely defined. Time consumption is fragile as well: e.g. the speed-step setting and the cooling method matter. If the processor reaches a certain temperature, it reduces its frequency. If the CPU cache is exhausted, it has to wait for the much slower RAM, or even worse, for virtual RAM stored on the hard disk. As a consequence, the time consumption depends on a variety of parameters that are not part of the algorithm but of, e.g., the CPU architecture, the temperature at the start of the code, the cooling power, and the room temperature. The relation between processing time and data size might follow a specific polynomial for some inputs, but there can (and will) be inputs that produce totally different behavior.

One example is code that uses Matlab's sum function. This is multi-threaded if the input has more than 88999 elements. Starting a 2nd thread on the CPU costs time and energy, and the additional work might reduce the processor speed to limit heat production. Therefore the computing time for sums over 88999 versus 89000 elements can differ by a factor of 2 in either direction, depending on the CPU.

If you are aware of these effects and treat them with great care, you can use timeit or some simple tic/toc commands for a rough estimation of the O complexity. Measure the timings for some input sizes between 1'000 and 100'000 elements and fit e.g. a polynomial to the resulting curve. But do not assume that the timing can be extrapolated reliably to 1e6 elements.
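A sketch of that procedure, with sort standing in as a placeholder for the algorithm under test - the fitted exponent is only a rough estimate, for all the reasons above:

```matlab
f = @(x) sort(x);                  % placeholder for your algorithm
n = round(logspace(3, 5, 10));     % input sizes from 1'000 to 100'000
t = zeros(size(n));
for k = 1:numel(n)
    data = rand(1, n(k));
    t(k) = timeit(@() f(data));    % timeit repeats the call for a robust estimate
end
p = polyfit(log(n), log(t), 1);    % slope on the log-log scale ~ the exponent
fprintf('estimated O(n^%.2f)\n', p(1))
```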


Answer by Walter Roberson
on 19 Apr 2019

Edited by Walter Roberson
on 19 Apr 2019

You can turn on memory profiling (https://www.mathworks.com/matlabcentral/answers/214247-allocated-freed-self-memory-and-peak-memory-in-profiler) to get an analysis of how much memory is being used as your calls proceed.

Note that when you turn on profiling, aspects of the JIT (Just-In-Time compiler) might be turned off. Code can take considerably longer to execute when profiling is on, and you can end up with a severely distorted understanding of relative execution times.
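A minimal sketch of the memory profiling described in that link - myProgram is a placeholder, and the per-function memory columns appear in the graphical report as described in the linked thread:

```matlab
profile('-memory', 'on')      % enable memory statistics in the profiler
myProgram();                  % placeholder for your code
profile('off')
stats = profile('info');
% one entry per called function; memory columns show allocated/freed/peak
disp({stats.FunctionTable.FunctionName})
profview                      % open the graphical report
```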

It is essentially impossible to deduce computational complexity by measuring execution time on modern computers.

A couple of months ago I posted this with regard to the computational complexity of exp():

The question can be talked about from a theory standpoint, or from a practical standpoint.

From a theory standpoint, exp(), being a 1 to 1 function, could theoretically be implemented as a single lookup table, and therefore could be represented as taking constant time.

Some processors implement exp() in hardware, though it does not appear that any of the mainstream general-purpose architectures do. NVIDIA offers the device intrinsic __device__ float __expf() for base-e exponentiation.

Otherwise you end up with a software implementation, which is what MATLAB uses. I do not know what algorithm MATLAB uses, but I see a polynomial approach is recommended: http://jrfonseca.blogspot.com/2008/09/fast-sse2-pow-tables-or-polynomials.html

If you examine the theory of what computational complexity measures, it basically measures the growth in the number of times that you need to apply a unary or binary operation to values, measuring how the number of times rises as the size of the input increases. Computational complexity does not care how long it takes to calculate any one operation -- that is all folded into constants of proportionality that do not matter when it comes time to talk about rate of growth. An algorithm that needs to touch every item of input 1073 times will be faster, on large enough inputs, than an algorithm that somehow computes the same thing with 3*n operations per n input (so 3*n*n = 3*n^2 operations). So as far as computational complexity is concerned, how long it takes to compute exp() does not matter: the time is a bounded constant per single input for any given processor. And yet, obviously in practice it makes a notable difference whether you do exp(sum(x)) or you do prod(exp(x)). Computational complexity doesn't care, but real processors do.
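The exp(sum(x)) versus prod(exp(x)) point is easy to check yourself - both are mathematically equal, but the second calls exp() once per element (a sketch; the ratio depends on your machine):

```matlab
x = rand(1, 1e7);
t1 = timeit(@() exp(sum(x)));   % O(n) additions, then a single exp()
t2 = timeit(@() prod(exp(x)));  % n exp() calls, then O(n) multiplications
fprintf('exp(sum): %.4g s, prod(exp): %.4g s\n', t1, t2)
```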

Theoretical analysis says that you can calculate a matrix multiplication in time proportional to about n^2.3 when the array is n by n: people have proven it in theory and said explicitly how it could be done. But in practice, that computation takes a lot of memory and needs a lot of initialization, so it is slower in practice than the common n^3 algorithm until you get array sizes somewhere beyond 10^10. What is faster in practice is the "blocked" matrix multiplication algorithms from the BLAS-type libraries, which know about processor cache sizes, exploit locality of reference, and are suitable for execution on multiple hardware threads simultaneously. Thus if you create an algorithm that involves a matrix multiplication, then the computational complexity you need to use for theoretical analysis is the n^2.3-ish figure, because it could be done in that proportion -- but the complexity you would measure for that step would be n^3, and you would get the results very much sooner than if you used the n^2.3-ish algorithm...


Answer by Ahmad Al-Mahasneh
on 19 Apr 2019

Edited by Ahmad Al-Mahasneh
on 19 Apr 2019

Basically, a workaround is to add two memory commands, one before and one after your program, and to check the difference in available memory. It might not be accurate, but you can run it 10 times and record the mean and variance:

[user,sys] = memory; % place at the beginning of your program

[user2,sys2] = memory; % place at the end of your program

memory_used_in_bytes = user.MemAvailableAllArrays - user2.MemAvailableAllArrays;
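Following the suggestion to average over several runs, a sketch - yourProgram is a placeholder, and note that the memory function is only available on Windows:

```matlab
nRuns = 10;
used = zeros(1, nRuns);
for k = 1:nRuns
    before = memory;                 % Windows only
    yourProgram();                   % placeholder for your code
    after = memory;
    used(k) = before.MemAvailableAllArrays - after.MemAvailableAllArrays;
end
fprintf('mean %g bytes, std %g bytes\n', mean(used), std(used))
```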

