What is the difference between assigning with and without a range?

I have variables a and b both holding columns with same length N.
Is there a difference between assigning
a=b
and
a(1:N)=b
?
Maybe there is a difference in performance?
a is preallocated with zeros(N, 1, 'double').
  2 Comments
Ernst Reißner on 23 Jul 2021
It seems to me that if the number of elements is known, and so numel is not needed, it is even faster with indexing.
I had the idea that it is faster with indexing because no additional memory is required
and garbage collection is circumvented.
But seemingly not.
Chunru on 23 Jul 2021
I think a MATLAB array object has a numel property (or something similar), so the overhead of getting numel is minimal and one should not worry about it.


Answers (2)

John D'Errico on 23 Jul 2021
Yes. There is a difference, and a fundamental one. In the first case, the assignment a=b COMPLETELY replaces a. The variable is overwritten, if it already exists. If not, a new variable is created with that name. What a was before is completely irrelevant. Even the class of a is replaced. For example:
a = uint8(1:3);
b = rand(1,4);
whos a b
  Name      Size            Bytes  Class     Attributes

  a         1x3                 3  uint8
  b         1x4                32  double

As you can see, the two variables are not even the same class. But now when we use a = b, a has been replaced. It has a new size. And a is now double precision.
a = b
a = 1×4
0.0366 0.9518 0.5077 0.6811
whos a b
  Name      Size            Bytes  Class     Attributes

  a         1x4                32  double
  b         1x4                32  double

Now, let's try the second example, where we use indexing.
a = uint8(1:3)
a = 1×3
1 2 3
b = rand(1,4)
b = 1×4
0.3677 0.6780 0.7891 0.8548
Now use indexing:
a(1:4) = b
a = 1×4
0 1 1 1
whos a b
  Name      Size            Bytes  Class     Attributes

  a         1x4                 4  uint8
  b         1x4                32  double

Here, elements of a are selectively replaced with elements of b, and a grows to four elements because the index extends past its end. A class conversion happens first: a is still uint8, so the elements of b were rounded when they were converted to the class of a.
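A minimal sketch of that conversion, assuming MATLAB's standard double-to-uint8 rule (round to the nearest integer, saturate at 0 and 255). The values in b are made up for illustration and are not from the example above:

```matlab
a = zeros(1, 3, 'uint8');
b = [0.4 1.6 300];

a(1:3) = b;      % indexed assignment: values are converted, a stays uint8
c = b;           % plain assignment: c simply takes over the class of b

disp(class(a))   % uint8
disp(a)          % 0 2 255 (rounded, then saturated at 255)
disp(class(c))   % double
```

With the plain assignment nothing is converted, which is why the class of the left-hand side only matters in the indexed case.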
Is one operation faster than the other? The index operation must certainly be slower. But this is not a slow thing. So I'll put the operations into a function, then use timeit.
a = ones(1,3e7,'uint8');
b = rand(1,3e7);
timeit(@() speedtest1(a,b))
ans = 1.1380e-05
timeit(@() speedtest2(a,b))
ans = 0.2106
So, where there was a class conversion, speedtest1 is way faster. In the next test, there will be no class conversion.
a = randn(1,3e7);
b = rand(1,3e7);
timeit(@() speedtest1(a,b))
ans = 4.5789e-06
timeit(@() speedtest2(a,b))
ans = 0.2247
So here the replacement was way faster. In both cases, when you do an insert of selected elements, MATLAB spends a lot of time, first, generating the index vector. Then it needs to overwrite those selected elements, making sure any class conversion is done if needed.
function a = speedtest1(a,b)
a = b;
end
function a = speedtest2(a,b)
a(1:numel(b)) = b;
end
So, is there a difference? Yes. There must be one.
  10 Comments
Walter Roberson on 25 Jul 2021
It is known that
x = 1 : 100;
for k = x
causes 1 : 100 to be executed and the result placed into an array, and then the for loop to iterate over elements of the stored array.
It is known that
for k = 1 : 100
does not cause 1 : 100 to be executed immediately, with instead the initial and final value and increments being stored in hidden locations, and the increment being added as needed. This can be demonstrated by performance timings, and in particular it can be demonstrated by using for with so many iterations requested that the memory required to store the loop values would exceed available memory.
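A rough timing sketch of the two loop forms described above. The helper names loopStored and loopRange are hypothetical, and the measured times depend on the release and the machine, so no expected numbers are given:

```matlab
n = 1e7;
tStored = timeit(@() loopStored(n));
tRange  = timeit(@() loopRange(n));
fprintf('stored vector: %.4f s, colon range: %.4f s\n', tStored, tRange);

function s = loopStored(n)
    x = 1:n;        % the full vector is created and stored first
    s = 0;
    for k = x       % iterate over the elements of the stored array
        s = s + k;
    end
end

function s = loopRange(n)
    s = 0;
    for k = 1:n     % range written directly in the for statement
        s = s + k;
    end
end
```

Both helpers compute the same sum, so any timing difference should come from how the loop values are produced rather than from the loop body.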
So now... what about indexing? If you have
A(1:100) = 5
then does it do the equivalent of
temp13103 = 1:100;
A(temp13103) = 5;
clear temp13103
or does it process the range internally without generating the vector? You could possibly tease that out with timing tests.
The model, with subsref() and subsasgn(), is that the vector is actually generated. User-provided subsref() and subsasgn() do not need to process the general A:B:C colon operator, and instead receive an already-instantiated vector or else the literal ':' (which subsref() and subsasgn() supposedly only get passed when the entire dimension is specified as a colon by itself.)
... But the model for how user object classes work, is not necessarily the same as how MATLAB handles the Execution Engine.
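To make the functional model concrete, here is a small sketch that calls the built-in subsasgn() directly on ordinary arrays, using substruct() to build the index argument. It only illustrates what the function form receives (a ready-made index vector or the literal ':'); it says nothing about what the execution engine does internally:

```matlab
A = zeros(1, 5);
S = substruct('()', {1:3});    % the index vector 1:3 is built before the call
A = subsasgn(A, S, 5);         % same result as A(1:3) = 5
disp(A)                        % 5 5 5 0 0

B = zeros(1, 5);
B = subsasgn(B, substruct('()', {':'}), 7);   % literal ':', same result as B(:) = 7
disp(B)                        % 7 7 7 7 7
```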
A small number of releases ago, MATLAB started keeping hidden copies of "small enough" vectors, so if you wrote
A = 1:50;
B = 1:50;
then A and B might end up with the same data pointer and the normal reference count might not act the same as before. James (I think it was) showed that if you were to do an in-place operation on A then that could affect B even though they are supposedly not linked.
Exactly how the code was written affected whether sharing could take place. Spacing and comments were important.
I seem to recall 500 bytes being mentioned as the upper bound on when this internal sharing happened.
That leads me to wonder whether now some of that private sharing is going on for vectors used for index operations.
A(1:50) = 5;
B(1:50) = 7;
Does this involve 1:50 being generated as an actual vector at run-time, twice (once for each of the lines) ? Or does the parser now generate 1:50 internally and "private share" it with the 1:50 of the second line -- a point that could be important for timing purposes ? If so does the same thing happen for larger index vectors?
To be sure, if I had coded
A(1:100000) = X;
B(1:100000) = Y;
I would tend to think it was a Good Thing if MATLAB did not generate an actual index vector twice, either because MATLAB internally recodes it in terms of start and stop position or because it shares the vector. But it becomes important that we know how this does (or does not) work when we try to do timing tests of indexing: we might be timing the wrong thing.
Chunru on 25 Jul 2021
If I were the one implementing indexing such as a(2:2:1000) internally, I would prefer not to generate the index vector 2:2:1000. Instead, I would use something similar to a Python iterator to generate the indices when necessary (without taking up a large amount of memory). My bet is that MATLAB does not generate the index vector explicitly here.
a = randn(1,1e6);
timeit(@() indexing1(a))
ans = 0.0012
timeit(@() indexing2(a))
ans = 0.0031
This shows that a(1:5e5) = 1 is faster than indexing with a stored vector, which suggests that the index is likely generated implicitly.
For the code like
A(1:100000) = X;
B(1:100000) = Y;
I guess there is no good reason to share 1:100000 if it was never generated explicitly.
subsref() and subsasgn() are designed for quite a different purpose, and we don't know whether MATLAB can manage index generation for them in an implicit way.
function indexing1(a)
a(1:5e5) = 1;
end
function indexing2(a)
ind = 1:5e5;
a(ind) = 1;
end



Chunru on 23 Jul 2021
There should not be a major performance difference between "a=b" and "a(1:N)=b". Assignment with a range allows partial assignment of an array, for example:
a = zeros(100, 1);
b = randn(20, 1);
a(1:20) = b;
  2 Comments
John D'Errico on 23 Jul 2021
Edited: John D'Errico on 23 Jul 2021
Really? Not much of a major difference? See my example cases, where there is a factor of 10000 to 1 difference. And even if you force the copy to be resolved, there is STILL a significant difference.
Chunru on 23 Jul 2021
@John D'Errico See my example below. I have explained the performance difference there. Timing a = b with timeit may only measure the creation of a pointer, so it may not truly reflect the difference. The factor of 10000 to 1 is not a fair comparison, for the reasons I have explained below. I agree that type conversion will take more time (I did not infer that from the question and assumed the same type here).
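One way to probe the point under dispute is to force the copy to actually happen before timing. The sketch below assumes MATLAB's lazy-copy (copy-on-write) behavior, where a = b only shares the data until one of the variables is modified. The helper names are hypothetical, and timings vary by release and machine, so none are quoted:

```matlab
a = zeros(1, 3e7);
b = rand(1, 3e7);

tLazy   = timeit(@() assignOnly(a, b));      % may only share the data
tForced = timeit(@() assignAndTouch(a, b));  % the write forces a real copy
tIndex  = timeit(@() assignIndexed(a, b));   % element-wise overwrite of a
fprintf('a=b: %.2e s, a=b plus write: %.2e s, a(1:N)=b: %.2e s\n', ...
    tLazy, tForced, tIndex);

function a = assignOnly(a, b)
    a = b;
end

function a = assignAndTouch(a, b)
    a = b;
    a(1) = 0;   % modifying a means it can no longer share data with b
end

function a = assignIndexed(a, b)
    a(1:numel(b)) = b;
end
```

Comparing tForced with tIndex is arguably closer to a like-for-like comparison than comparing tLazy with tIndex, since both then include the cost of writing N elements.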

