No performance improvement with preallocating for object arrays

Question

Tom Wenk on 13 Apr 2018

2
Link

Direct link to this question

https://ch.mathworks.com/matlabcentral/answers/394671-no-performance-improvement-with-preallocating-for-object-arrays

Edited: Tom Wenk on 25 Apr 2018

Hi everyone,

I programmed a database where I want to save handle-class-objects in an array ( Matlab R2016b). Now for testing I programmed a loop and Matlab suggested me to preallocate the array. I started reading more about Preallocation and tried to test whether the time saved is worth it or not (because it is difficult in my task to say how many objects I get).

As far as I understood, I can preallocate an object-array by using:

objectArray( numberOfObjects ) = classObject

whereas classObject needs to have a default constructor. The array is then filled with default objects.

Now for testing I wrote the following code:

% Signal = class object / needs to be implemented first
numberOfObjects = 100000;
iterations = 100;
timeNeeded = zeros( iterations, 1 );
% WITH preallocation
for i = 1 : iterations
  clear objectArray;
  objectArray( numberOfObjects ) = Signal; % Preallocating the array
  tic;
      for index = 1 : numberOfObjects
          % Assigning elements to the array
          objectArray( index ) = Signal( 'Test', index, numberOfObjects, iterations );
      end
  timeNeeded( i ) = toc;
end
meanValue = mean( timeNeeded );
clear timeNeeded objectArray;
timeNeeded = zeros(50,1);
% WITHOUT preallocation
for i = 1 : iterations
  clear objectArray;
  tic;
      for index = 1 : numberOfObjects
          % Directly assigning the elements to the array without preallocation
          objectArray( index ) = Signal( 'Test', index, numberOfObjects, iterations );
      end
  timeNeeded( i ) = toc;
end
meanValue = [ meanValue; mean( timeNeeded ) ]

In this test I saved 100000 class objects for 100 times, took the time and calculated the mean. For me, the time needed with preallocating was 1.2640s and without preallocating 1.2613s.

Did I make a mistake or has the 'organization' of Matlab become so fast that preallocation is not needed anymore? In the the following article from 2011, the author says that with Matlab R2011a they improved some algorithms, so automatic array growth gets significantly faster. Maybe there have been some more improvements?

MathWorks Blogs: Automatic array growth gets a lot faster in R2011a

Thanks for the help

Tom

EDIT: The objects I'm talking about are handle class objects

EDIT: I also tried filling the array with the following line (in the for-loop)

objectArray = [ objectArray, Signal ]; % Assigning elements to the array

In this variant there was a huge improvement with preallocating against without it! Without preallocating I even had the time increasing non linear with the number of objects I assigned.

10 Comments
Show 8 older commentsHide 8 older comments

Stephen23 on 13 Apr 2018

Edited: Stephen23 on 13 Apr 2018

"but in this case I don't really know how many objects I will have to save, because it is object-oriented and you can add objects dynamically during run-time!"

In that case there might not be much point in preallocating. You could tell MATLAB to not give that warning: either globally in the code analyzer settings (not recommended) or by right clicking the line where the warning is shown and clicking "suppress ... on this line". You could add a small comment for future readers too, so they know why it is like that.

Code guidelines are just guidelines...

I don't have much experience with objects though, so hopefully one of the experts in that area has some more useful comments on this.

Tom Wenk on 13 Apr 2018

Maybe you're right and there is no point in preallocating the array.

But now that I tested the benefit and didn't see one, I still want to know why that is the case! :D

Sign in to comment.

Sign in to answer this question.

Answer 1

James Tursa on 13 Apr 2018

1
Link

Direct link to this answer

https://ch.mathworks.com/matlabcentral/answers/394671-no-performance-improvement-with-preallocating-for-object-arrays#answer_314995

Edited: James Tursa on 13 Apr 2018

Open in MATLAB Online

Pre-allocation of variables means different things depending on the variable class.

(1) Numeric, char, and logical classes:

This is pretty straightforward since the data model is easily understood. You have data pointer(s) that point to blocks of memory that hold the raw data. If you are going to fill that data in a loop, pre-allocation makes a lot of sense so the data areas don't get copied & recopied over and over again at each iteration. This copying and recopying can easily dominate run times.

(2) Cell and struct class and the old @directly style class objects:

This begins to get a little harder to understand, especially for beginners. You still have data pointer(s) that point to blocks of memory that hold the raw data, but the raw data of cell and struct variables is other variable pointers. That is, the raw data areas of cell and struct arrays contain 32-bit or 64-bit addresses of other variables. (The field names for struct variables is a side complication that doesn't affect the points of this discussion ... they are actually held in a "global" common field name pool that all structs use). Internally, the old @directory class objects are actually structs.

For cell and struct variables, pre-allocating of variables can make sense but only if you do it correctly and here is where it gets tricky for the beginner because they don't understand the underlying data model. Consider this code:

C = {}; % start with an empty cell
n = 1000; % number of cells that I want
for k=1:n
    C{k} = zeros(1,100); % pre-allocate because I know I will fill this in downstream
end
% isshared call goes here
% now fill in the data
for k=1:n
    C{k} = myfunction(100); % some function that returns a 1x100 double vector
end

The above code is bad. The first loop grows the size of the C array in a loop which is bad for the reasons already mentioned above. In this case the data that is getting copied & recopied are the 32-bit or 64-bit variable pointers, not the zeros(1,100) variables. So it is not quite as bad as it might originally seem, but it is still bad. But also there are 1000 different zeros(1,100) variables created and stored in the cell array, which are then wiped out with the very next loop. So just a bunch of wasted effort in the first loop. If you use my isshared( ) function from the FEX you can see that after the first loop (but prior to the second loop) there is no sharing:

>> [N,M] = isshared
N =
   Empty matrix: 0-by-1
M = 
   Empty cell array: 0-by-1

So now consider this modification:

n = 1000; % number of cells that I want
C = cell(1,n); % at this point C contains 1000 NULL pointers
C(1:n) = {zeros(1,100)}; % now C has 1000 addresses that are all the same
% isshared call goes here
% now fill in the data
for k=1:n
    C{k} = myfunction(100); % some function that returns a 1x100 double vector
end

This is a better way of doing the initial C pre-allocation, since there is only one physical zeros(1,100) variable in memory. That is, there are 1000 shared reference copies of that zeros(1,100) variable contained in C. All 1000 addresses in the data area of C are exactly the same. But again it is wasted effort because they are all wiped out with the second loop. To see the sharing going on after the initial pre-allocation (but prior to the last loop):

>> [N,M] = isshared
N =
        1000
M = 
    {1x1000 cell}
>> M{1}(1:3)
ans = 
    'C{1}'    'C{2}'    'C{3}'

So, all 1000 elements of C are in fact physically the same variable in memory. A good way to pre-allocate, but not if you are simply going to turn around and replace the elements downstream.

(3) Classdef objects:

Now it is a lot harder to understand, because the data model of classdef objects is not published. There is no official documentation telling you how or where the properties are stored, or what happens when you create arrays of them. Things can be quite different if the object is derived from handle or not. Or the objects may contain connections to other resources. So what happens when you create one of these objects, or arrays of these objects, can depend intimately on the details of the class definition. So a general discussion of what happens when you create arrays of them is probably out of scope until specific details of the class are known.

IF your class object is of the "straightforward" variety (e.g., has properties of regular MATLAB variables and functions that operate on them), then my guess is that the behavior of creating arrays of them acts like creating arrays of cell or struct variables. That is, you get deep or shared copies of stuff depending on how you do the pre-allocation. And there may be NULL or shared reference copies held in the data areas for some elements. Here, isshared( ) can shed some light on things. Consider this class definition:

classdef myclass3
    properties
        a
        b
    end
end

Then consider this code:

>> clear all
>> m = myclass3
m = 
  myclass3
    Properties:
      a: []
      b: []
    Methods
  >> m.a = 1:3;
  >> m.b = 4:6;
  >> A(3) = m;    % Method A
  >> B(1:3) = m;  % Method B
  >> [N,M] = isshared
  N =
       6
       6
  M = 
      {1x6 cell}
      {1x6 cell}
  >> M{1}
  ans = 
      'A(3).a'    'B(1).a'    'B(2).a'    'B(3).a'    'm.a'    []
  >> M{2}
  ans = 
      'A(3).b'    'B(1).b'    'B(2).b'    'B(3).b'    'm.b'    []

So things worked as we might have expected. When we created the A array, only the 3rd element got a copy of our m classdef object. The other two array element are there, but when we look at them they are just default empty elements:

>> A(1)
ans = 
  myclass3
    Properties:
      a: []
      b: []
    Methods

What is not clear to me is if the underlying variable addresses for A(1) and A(2) are NULL in the background (like cell and struct arrays), or if there is a physical variable (with empty properties) in memory. Since the data model is not published, I can't look there even in a mex routine to confirm anything. So how the A(1) and A(2) elements are created when A gets created is still unclear to me.

For the B variable, fortunately it appears to behave in the "shared" sense that we were hoping. That is, B(1) and B(2) and B(3) are all shared copies of each other (and of m). There is really only one physical copy of the 1:3 and 4:6 data in memory, just like we were hoping.

Another example with a slightly different class definition:

classdef myclass4
    properties
        a = 1:10;
        b = 2:20;
    end
end

Here the default object has non-empty properties. Now let's see what happens:

>> clear all
>> m = myclass4;
>> m.a = 1:3;
>> m.b = 4:6;
>> A(3) = m;
>> [N,M] = isshared
N =
     2
     2
     2
     2
M = 
    {1x2 cell}
    {1x2 cell}
    {1x2 cell}
    {1x2 cell}
>> M{1}
ans = 
    'A(1).a'    'A(2).a'
>> M{2}
ans = 
    'A(1).b'    'A(2).b'
>> M{3}
ans = 
    'A(3).a'    'm.a'
>> M{4}
ans = 
    'A(3).b'    'm.b'

So here it becomes clear that when building the object array A, MATLAB has apparently called the constructor only once to fill in the A(1) and A(2) elements, since we see that A(1) and A(2) are shared copies of each other.

Now consider this modification:

classdef myclass5
    properties
        a = 1:1000;
        b = rand(1,1000); % a random element
    end
end

And some array creation code:

>> clear all
>> m = myclass5;
>> A(3) = m;
>> [N,M] = isshared
N =
     4
     4
M = 
    {1x4 cell}
    {1x4 cell}
>> M{1}
ans = 
    'A(1).a'    'A(2).a'    'A(3).a'    'm.a'
>> M{2}
ans = 
    'A(1).b'    'A(2).b'    'A(3).b'    'm.b'

With this example, it appears that MATLAB has only called the default constructor once when m was created. It also appears that this default object is saved in memory somewhere and re-used for the A(1) and A(2) elements when the object array A is created. The fact that even the rand parts "b" of A(1) and A(2) and A(3) are shared with each other seems to further confirm that the default constructor was only called once overall. As long as the classdef remains in memory, this background default object appears to get re-used for all default object creations. It is only when the classdef itself is cleared from memory (which apparently causes that background default object to get cleared) that forces MATLAB to re-construct a new default object when it first gets called. E.g.,

>> clear all
>> m = myclass5;
>> m.b(1:3)
ans =
    0.3796    0.3191    0.9861
>> clear m
>> m = myclass5;
>> m.b(1:3)
ans =
    0.3796    0.3191    0.9861 % <-- same as first time. rand( ) did not get called again
>> clear all % <-- force the classdef itself to get cleared from memory
>> m = myclass5;
>> m.b(1:3)
ans =
    0.4271    0.9554    0.7242 % <-- now rand( ) gets called again

BACK TO THE ORIGINAL QUESTION:

Q: Can the original poster save any time by doing a pre-allocation of some sort for this object array?

A: Maybe or maybe not. It all depends on the specifics of what the class does, and what he is doing with the object elements downstream in his code. Based on the small examples I show, doing a "last element assignment" method of pre-allocating the object array ala Method A appears to be a good approach. The Method B approach really only makes sense if you are not replacing these elements downstream in your code.

I'm going to hazard a guess that growing a classdef object array in a loop probably behaves the same way as cell and struct variables. That is, you are probably only copying and recopying 32-bit or 64-bit pointers around in memory. If the classdef object array is not too big in size then you probably won't notice much of an overall timing difference between the pre-allocating and not pre-allocating cases.

FYI, isshared( ) can be found here:

https://www.mathworks.com/matlabcentral/fileexchange/65861-isshared-determines-data-sharing-status-among-workspace-variables

isshared( ) does not yet work with R2018a+ versions because the data memory model has changed and I have yet to write new mex code to support that.

2 Comments
Show NoneHide None

Tom Wenk on 14 Apr 2018

Edited: Tom Wenk on 16 Apr 2018

Open in MATLAB Online

"You still have data pointer(s) that point to blocks of memory that hold the raw data, but the raw data of cell and struct variables is other variable pointers."

So basically you just have an additional level before pointing at the raw data like below?

Numeric-Array: Pointer => raw data

Cell-Array: Pointer => Pointer => raw data

"But again it is wasted effort because they are all wiped out with the second loop."

As far as I understood preallocating there is always unnecessary data you fill the array with (which should allocate the same memory as the data you want to assign later) which is overwritten again later. The reason for this is that MATLAB can allocate continuous memory for the whole array. If you don't do so, every time you exceed the array dimensions (grow the array dynamically) and there is no more space in the memory to continuously save the data/pointer, MATLAB needs to look for more continuous space in memory and copy the whole array + new element to the new space. By not needing to do so, this is where you save the time through preallocation.

Maybe I'm mistaken here, if so please correct me.

Another example with a slightly different class definition:

classdef myclass4
  properties
      a = 1:10;
      b = 2:20;
  end
end
>> clear all
>> m = myclass4;
>> m.a = 1:3;
>> m.b = 4:6;

Why do you change the properties after initializing them in the class definition at the beginning?

James Tursa on 16 Apr 2018

Edited: James Tursa on 16 Apr 2018

Open in MATLAB Online

"Cell-Array: Pointer => Pointer => raw data"

Yes. That is essentially what is going on.

"As far as I understood preallocating there is always unnecessary data you fill the array with (which should allocate the same memory as the data you want to assign later) which is overwritten again later."

Let's separate the memory allocation part from the data filling part, and be clear about what we mean by data filling. When you create a variable with usual MATLAB functions, the raw data part does get initialized to something. If you use the zeros( ) function for numeric variables, the memory is allocated and the raw data is filled with 0's. E.g.,

x = zeros(1,10); % data memory is allocated and filled with 0's

If you use the cell( ) function to create a variable, memory is allocated for the raw data (in this case pointers) and the raw data is filled with NULL values. E.g.,

x = cell(1,10); % data memory is allocated and filled with NULL's

But for cell and struct variables, that is the only thing that is done. That "2nd layer" of variables is not filled in with anything until you the programmer put something there. Creating a variable to replace a NULL makes no sense if you are simply going to assign something to that element downstream in your code. That is where the waste is in the examples above. When you access a NULL pointer of a cell or struct variable at the m-code level, MATLAB creates a temporary empty double variable on the fly.

For classdef objects, the situation appears to be different. When I look at the reference count of the properties by hacking into them in a mex routine, it appears that reference copies of the default object have been created for each uninitialized element of the object array. I.e., it appears to me that the uninitialized object array elements are not NULL and created on the fly when accessed (like cell and struct array elements), but are actually physically present in the array element spots (as reference copies of the default object). This is educated guesswork on my part based on mex hacking since this info is not officially published, but it sure looks like that is what is going on. (Otherwise I can not explain the reference count numbers I am seeing). So you can ignore my guess in my original post. Based on some hacking just done a few minutes ago, I see evidence that the uninitialized object array elements actually physically contain reference copies of the default object (and do not contain NULL values like cell and struct arrays). At least that is what appears to happen when the default object has properties that are given a default value. Whether this behavior applies to objects whos properties are not given default values I have no way of checking, even with a mex hack.

For the myclass4 example above, I was just trying different things out to see what the resulting MATLAB behavior would be. Nothing else implied.

Sign in to comment.

Answer 2

John D'Errico on 13 Apr 2018

0
Link

Direct link to this answer

https://ch.mathworks.com/matlabcentral/answers/394671-no-performance-improvement-with-preallocating-for-object-arrays#answer_314932

Edited: John D'Errico on 13 Apr 2018

Open in MATLAB Online

First. There are many ways of pre-allocating a matrix!!!!!!

Ones pre-allocates with ones, not zeros, for example:

x = ones(100);

Spalloc allocates as sparse matrix.

Zeros allocates as zeros.

You can just use assignment, thus if was previously undefined, then this preallocates z with zeros:

z(100,100) = 0;

Use repmat.

z = repmat(0,100,100);

repmat is actually a good way to pre-allocate struct arrays.

A.x = 1;
A = repmat(A,100,100);

Assignment in general can be a pre-allocation. Thus, this pre-allocates z as a vector

z = 1:100000;

Or, you can pre-allocate as inf, or NaN.

z = inf(1000,100);
z = NaN(100);

I often use NaN pre-allocation myself, at least when debugging code. That way I can easily check to see what was left unfilled when I am testing code.

You can pre-allocate a cell array.

C = cell(100,100);

Others too, I am sure. If I spent a few minutes I could think of many.

Next, WHY pre-allaocate? When will it help?

pre-allocation only helps when you would otherwise grow that array dynamically. This is bad:

z = [];
for i = 1:100000
  z = [z,i];
end

In fact, that is obscenely bad. This is better:

z = zeros(1,100000);
for i = 1:100000
  z(i) = i;
end

But still not as good as this simple line, where z is pre-allocated in place exactly as I want it anyway.

z = 1:100000;

You pre-allocate because growing an array dynamically, as I did above is BAD. That forces MATLAB to reallocate memory each time the array is grown. Then it copies the ENTIRE array contents over, plus inserting the one new extra element. This is a quadratic growth thing, where the time spent grows quadratically with size.

So unless you are growing the array, pre-allocation is a waste of time. You do not need in general to allocate memory for arrays, unless they are growing.

Finally, IF you pre-allocate an array, then clear it inside a loop, the array goes away. It is as if you have never done any pre-allocation. In fact, you spent more time than was needed, because the pre-allocation was a waste of time.

4 Comments
Show 2 older commentsHide 2 older comments

Tom Wenk on 13 Apr 2018

Edited: Tom Wenk on 13 Apr 2018

Open in MATLAB Online

Thanks for the detailed answer!

"First. There are many ways of pre-allocating a matrix!!!!!! ..."

I think you mentioned nearly every possible preallocation but the one I'm using, which is the preallocation of an object array. I can't preallocate my array with any of these 'functionalities' if the elements I'm adding are class objects. This is the code I found to preallocate an array of objects:

objectArray( numberOfObjects ) = classObject

Maybe this is my mistake? But I don't know any other way to preallocate an array of objects.

"Next, WHY pre-allaocate? When will it help? ..."

Good explanation, besides that Matlab has a code analyzer that tells you when to use it and when not. I think in my posted code the array is dynamically growing, so preallocation should bring an improvement. And that is my problem: I don't see the improvement! This is were I need help: Do I have an error in my test programm? Or did I get the whole idea wrong?

"Finally, IF you pre-allocate an array, then clear it inside a loop, the array goes away. It is as if you have never done any pre-allocation. In fact, you spent more time than was needed, because the pre-allocation was a waste of time."

True again, but if this point is related to my code I don't see the reason for it. In the second variant "WITHOUT Preallocation" that exactly is the point of the clear inside the for-loop, because I want to compare the time needed with and without preallocation. In the first for-loop ("WITH Preallocation") I clear the array, but then preallocate it again, that shouldn't be an issue right?

John D'Errico on 14 Apr 2018

Open in MATLAB Online

Another way to preallocate class objects: use zeros. For example, this will work, if you have overloaded enough of the necessary functions.

X = zeros(1,N,'hpf');

You usually can preallocate class objects in other ways. Repmat would usually suffice. I assume that will require that you have implemented subsref for the class.

I did some quick tests here. I'll use my own HPF toolbox as an example.

N = 10000;
timeit(@() zeros(1,N,'hpf'))
ans =
     0.051575

So the preallocation is fast.

tic
X = zeros(1,N,'hpf');
for i = 1:N
  X(i) = i;
end
toc
Elapsed time is 3.409074 seconds.

I tried the same thing, but without a pre-allocation. It took roughly the same time.

tic
X = hpf(0);
for i = 1:N
  X(i) = i;
end
toc
Elapsed time is 3.540552 seconds.

In fact, most of that time was in the conversion step, because it needs to convert the number i into an HPF form in each step.

timeit(@() hpf(1))
ans =
    0.0002163

So well more than half of the time required was just in the form of just a conversion into HPF. 10000*0.0002163=2.163 seconds.

Instead, suppose I avoid the need to convert to HPF in each step?

clear X
tic
X = zeros(1,N,'hpf');
A = hpf(1);
for i = 1:N
  X(i) = A;
end
toc
Elapsed time is 0.230158 seconds.

Now, lets redo that, but without preallocation. Here I am not creating A as an HPF. So this next test is pure array growth, without preallocation.

N = 10000;
clear X
  tic
  X = hpf(0);
  A = hpf(1);
    for i = 1:N
      X(i) = A;
    end
    toc
  Elapsed time is 0.126227 seconds.

Interestingly, it seems to have been more efficient than the preallocated version.

N = 100000;
clear X
  tic
  X = hpf(0);
  A = hpf(1);
    for i = 1:N
      X(i) = A;
    end
    toc
  Elapsed time is 1.015532 seconds.
N = 1000000;
clear X
  tic
  X = hpf(0);
  A = hpf(1);
    for i = 1:N
      X(i) = A;
    end
    toc
  Elapsed time is 9.926349 seconds.

And when I increase the array size, I don't see quadratic growth, but essentially linear growth.

My guess is here we are seeing the effects of how MATLAB avoids copying a variable if it is not changed in the process.

So each of those elements of X are just references to A.

Tom Wenk on 16 Apr 2018

Edited: Tom Wenk on 16 Apr 2018

"So each of those elements of X are just references to A."

Then there should be a different result when you do it again with changing input right? Could you try that again and show the result? Because other than shown in my example the Signals I add should differ (I edited the post by now).

zeros(1,N,'hpf');

What does this overloaded function do? Without knowing I can't see the similarities or differences to my example.

John D'Errico on 16 Apr 2018

Open in MATLAB Online

This is pretty much equivalent to using repmat.

timeit(@() zeros(1,1000,'hpf'))
ans =
    0.0052384
timeit(@() repmat(hpf(0),1,1000))
ans =
    0.0052362

Both are efficient ways to create the result.

Sign in to comment.

Answer 3

Philip Borghesani on 13 Apr 2018

0
Link

Direct link to this answer

https://ch.mathworks.com/matlabcentral/answers/394671-no-performance-improvement-with-preallocating-for-object-arrays#answer_314981

Edited: Philip Borghesani on 13 Apr 2018

Your pre-allocation code is doing the work twice. Try profiling your code. ObjectArray( numberOfObjects ) = classObject will call the default class constructor once for each member of objectArray, then you are looping through the vector and re-initializing each object. The cost of calling the class constructor twice for each object can easily swamp the benefit of pre-allocating.

Growing an object array (or cell array or structure, especially for handle classes) is frequently not that bad. Only the pointers to the members need to be moved/copied with each growth so if the objects are large/complex the time to create new elements may swamp any time to copy the pointers. Of course if the array is huge O(n^2) will bite you every time.

3 Comments
Show 1 older commentHide 1 older comment

Philip Borghesani on 13 Apr 2018

Edited: Philip Borghesani on 13 Apr 2018

Open in MATLAB Online

With the posted example code yes. If the objects needed to be constructed with other then the default constructor then the loop would be needed to construct and initialize the individual objects. I think the specifics of the use case will dictate if pre-allocation is worth the effort.

I will guess that in many situations where object arrays are being created the best code (untested) is something like:

      objectCellArray{ numberOfObjects } = Signal;
      for index = 1 :numberOfObjects-1
          objectCellArray{ index } = Signal;
      end
      ObjectArray=[objectCellArray{:}];
      clear objectCellArray % optional

Tom Wenk on 14 Apr 2018

Edited: Tom Wenk on 16 Apr 2018

Hi Philip,

the second initialisation with the loop is on purpose, cause it is not meant as an initialisation but as an assignment. I want to preallocate the array and then assign elements to it in a for loop.

In the second variant I'm directly assigning the elements to the array in the loop without preallocating.

I hope I got your point right?

Sign in to comment.

Answer 4

Jan on 13 Apr 2018

0
Link

Direct link to this answer

https://ch.mathworks.com/matlabcentral/answers/394671-no-performance-improvement-with-preallocating-for-object-arrays#answer_314984

Open in MATLAB Online

When I run the code for

Signal = 7

The timings are (R2016b):

0.0017   % With pre-allocation
0.0134   % Without

The effect of pre-allocation is small in this example, if you use a simple numeric scalar as data. For a class object, there is not much additional overhead, because the much more expensive constructor is called exactly the same number of times in both cases. Only the storing in the array differs, but the concerning costs are comparable to appending/inserting scalar doubles in an array.

I think, the effects of pre-allocation are simply too small in your example to be impressing. Of course, pre-allocation helps to improve the speed of the program, but if the memory management is not the bottleneck of the code, the effects are tiny.

7 Comments
Show 5 older commentsHide 5 older comments

Jan on 16 Apr 2018

Edited: Jan on 16 Apr 2018

0.003sec is tiny and you cannot expect tic/toc to work in this precision exactly. Note that any other activity on the computer influences the speed also and that the processor might need some time to get its full frequency.

Your small experimental code demonstrates, that the pre-allocation is not the bottleneck of the code. But it is recommended to pre-allocate in every case, because it is not a big loss for small arrays, but gets important for huge arrays. Your medium size data do not profit remarkably or measurable from a pre-allocation, but it is a good programming pattern and worth to keep it as a habit.

It is not useful to care about -0.003 or +0.01 seconds when the total time is 1.264 and it is a test code only. Such "premature optimization" does not help to improve the code. I suggest to use a proper pre-allocation in general, write clean and clear code, and when the code is running successfully to use the profiler to find the bottlenecks for tuning the relevant part only.

I so spend time for such interesting experiments also. But consider, that the JIT acceleration is not documented and might perform the pre-allocation implicitly. Then a sophisticated analysis of what might happen where and why is not useful.

Jan on 16 Apr 2018

Edited: Jan on 16 Apr 2018

Open in MATLAB Online

"The decision was clear there was a huge improvement through preallocating." -- I've lost the overview. Which decision? Which code was improved by a pre-allocation? Which one is "the first variant" and which difference to what do you mean?

In "objectArray = [ objectArray, Signal]" a pre-allocation of objectArray would not be meaningful, because you append the new elements afterwards only. Afterwards objectArray will not have numberOfObject elements, but twice as much.

As far as I can see, you have "numberOfObjects", such that you do know the number of objects in advance.

I still do not understand, what the problem is. I repeat my suggestion and leave the discussion, because I do not have other useful information to tell:

Pre-allocate in general, because it saves time for large arrays.
If you pre-allocate an object array, the effects will be tiny, when the overhead for creating the vector of object pointers is negligible compared to the costs of the object constructor. But even then implement a proper pre-allocation simply to care about good programming practice. It is comparable to (or not to) pre-allocate a {1x1000} cell array, when the cell elements are huge arrays, which are expensive to create.
Do not rely on tic/toc having a precision below 0.1 sec.
Maybe the JIT acceleration improves code like "objectArray(index) = Signal;" by a magic injection of an implicit pre-allocation. It is useless to speculate about this, because it is not documented and depends on the Matlab version.
If you cannot know in advance, how many elements an array will have, either hope, that the JIT will care about this magically, or allocate in blocks at least:

r   = [];
len = 0;
cur = 0;
while rand > 0.00001
  cur = cur + 1;
  if cur > len  % If output is exhausted, grow by 1000 elements
    len    = len + 1000;
    r(len) = 0;
  end
  r(cur) = rand;
end
r = r(1:cur);

This is a cheap trick, which reduces the effects of an omitted pre-allocation by a factor of 1000. A smarter idea is not to use a fixed size, but to duplicate the array size or to use Fibonacci numbers.

Tom Wenk on 16 Apr 2018

Open in MATLAB Online

Sorry for the confusion, I got confused myself!!! :S

"The decision was clear there was a huge improvement through preallocating."

I made a stupid mistake here! At the very beginning (not stated in my question-post) I had this line:

objectArray = [ objectArray, Signal ];

Then I looked into preallocating and preallocated the array. Obviously I could not use the same line again because like you said it just appends the new elements. Therefore I used

objectArray( index ) = Signal;

with the preallocated array and compared the times. This was the huge difference which I thought was coming from the preallocating. After that I compared

objectArray( index ) = Signal;

with and without preallocation (my question-post). Here there was no improvement and I questioned preallocation for handle-class-object-arrays, also because I couldn't find any information about this special case.

So my new question is, where is the difference between

objectArray( index ) = Signal;

when index exceeds the array dimension and

objectArray = [ objectArray, Signal ];

?

Because in my imagination the procedure behind the code was the same in both lines. But this seems not to be the case.

"As far as I can see, you have "numberOfObjects", such that you do know the number of objects in advance."

The number I used in the test is totally made up! I asked my supervisor if he could give me any number but he said no, he wants to have the program as flexible, fast and robust as possible..

Sign in to comment.

No performance improvement with preallocating for object arrays

10 Comments
Show 8 older commentsHide 8 older comments

Answers (4)

2 Comments
Show NoneHide None

4 Comments
Show 2 older commentsHide 2 older comments

3 Comments
Show 1 older commentHide 1 older comment

7 Comments
Show 5 older commentsHide 5 older comments

See Also

Categories

Tags

Products

Community Treasure Hunt

No performance improvement with preallocating for object arrays

10 Comments Show 8 older commentsHide 8 older comments

Answers (4)

2 Comments Show NoneHide None

4 Comments Show 2 older commentsHide 2 older comments

3 Comments Show 1 older commentHide 1 older comment

7 Comments Show 5 older commentsHide 5 older comments

See Also

Categories

Tags

Products

Community Treasure Hunt

10 Comments
Show 8 older commentsHide 8 older comments

2 Comments
Show NoneHide None

4 Comments
Show 2 older commentsHide 2 older comments

3 Comments
Show 1 older commentHide 1 older comment

7 Comments
Show 5 older commentsHide 5 older comments