MATLAB Answers

how to vectorise or speed up the specific code

Christos Dimopoulos
Christos Dimopoulos on 18 Feb 2021
Commented: Bjorn Gustavsson on 26 Feb 2021
I tried to make the variables single to decrease the calculation time.
It's a gravitational interaction calculation between particles.
How can I vectorise this code to avoid so many loops?
Is that even feasible, given that there could be more than 1 million particles, which would seem to require building 1 million x 1 million matrices?
Most time spent on these lines
d = [xj yj zj] - [xi yi zi]; % 3D distance as a vector
dr2 = d(1)^2+d(2)^2+d(3)^2; % squared 3D distance
AccelDouble = (d/sqrt(dr2))*((G*mj*(sign(mi)))/dr2); % acceleration increment
ar = ar + single(AccelDouble); % New acceleration
n = size(heavy,1);
AccelDouble = zeros(3,'double');
for i = 1:n
    mi = heavy(i,1);                 % mass of first particle
    if mi ~= 0                       % Avoid calculating 0 mass objects
        xi = heavy(i,2);             % x position of first particle
        yi = heavy(i,3);             % y position of first particle
        zi = heavy(i,4);             % z position of first particle
        ar = [0 0 0];                % 3D
        for j = 1:n                  % second count for reacting particles
            if i ~= j                % to avoid counting the same particle
                mj = heavy(j,1);     % mass of second particle (interacting)
                if mj ~= 0 && mi ~= 0    % Avoid calculating 0 mass objects
                    xj = heavy(j,2);     % x position of second particle
                    yj = heavy(j,3);
                    zj = heavy(j,4);
                    d = [xj yj zj] - [xi yi zi];    % 3D distance as a vector
                    dr2 = d(1)^2 + d(2)^2 + d(3)^2; % squared 3D distance
                    AccelDouble = (d/sqrt(dr2))*((G*mj*(sign(mi)))/dr2); % 3D acceleration increments
                    ar = ar + single(AccelDouble);  % New 3D acceleration
                    for k = 1:3
                        heavy1(i,4+k) = heavy1(i,4+k) + ar(k)*timeStep;         % 3D new speed calculation
                        heavy1(i,k+1) = heavy1(i,k+1) + timeStep*heavy1(i,k+4); % 3D new position calculation
                    end
                end
            end
        end
    end
end
Walter Roberson
Walter Roberson on 26 Feb 2021
Is that even feasible, given that there could be more than 1 million particles, which would seem to require building 1 million x 1 million matrices?
An array that was 1e6 x 1e6 would take 1e12 times the space for an individual element. For single precision that would be 4e12 bytes, which is slightly more than 3.6 terabytes, using 2^40 bytes as the definition of terabytes here.
... for each array.
Does your system have access to multiple terabytes of memory?
But this sounds like something a GPU should be able to do well.
The largest commercially available GPU is the NVIDIA Titan V, with 12 GB of memory. Considering the need for multiple matrices, you probably would not be able to handle much more than 25000 x 25000 single precision on such a system.
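The arithmetic above can be checked directly in MATLAB itself — a quick sketch, using the same 2^40-byte definition of a terabyte:

```matlab
% Memory needed for one n-by-n single-precision matrix.
n = 1e6;
bytes = n^2 * 4;                      % 4 bytes per single element
fprintf('%.2f TB\n', bytes / 2^40);   % roughly 3.64 TB per array
```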


Answers (2)

Bjorn Gustavsson
Bjorn Gustavsson on 19 Feb 2021
Newton's third law says you can cut the work in half: the force from particle i on particle j is equal in magnitude but opposite in direction to the force from particle j on particle i. Also, if proper pre-allocation is tricky, just run the loops from n down to 1.
Finally, if this is some kind of explicit Euler scheme, I suggest you take a look at Størmer-Verlet schemes instead (or perhaps the 12th-10th-order-runge-kutta-nystrom-integrator, which might also be an option); those ought to be better suited for equations of motion.
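The symmetry can be exploited pairwise, without building an NxN matrix — a minimal sketch, assuming the same heavy layout as in the question (mass in column 1, position in columns 2:4) with zero-mass rows already removed:

```matlab
% Sketch: accumulate accelerations pairwise using Newton's third law.
n   = size(heavy, 1);
G   = 6.67430e-11;
acc = zeros(n, 3);                           % acceleration accumulator
for i = 1:n-1
    for j = i+1:n                            % each pair visited only once
        d   = heavy(j,2:4) - heavy(i,2:4);   % displacement i -> j
        dr2 = sum(d.^2);
        w   = G / (dr2 * sqrt(dr2));         % common factor G/|d|^3
        acc(i,:) = acc(i,:) + w * heavy(j,1) * d;  % pull of j on i
        acc(j,:) = acc(j,:) - w * heavy(i,1) * d;  % equal and opposite
    end
end
```

Each pair is computed once instead of twice, so the inner work is halved while the memory stays O(n).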
  1 Comment
Christos Dimopoulos
Christos Dimopoulos on 26 Feb 2021
Thank you for your answer.
You are correct that the same force magnitude (with opposite direction) applies to both masses, but I am not sure how to express that in the code: it seems I would have to create a huge NxN matrix in each iteration, which limits me to about 10^5 particles. If we found a way to use the symmetry without any additional calculations, we could cut the calculation time in half!
As for the integration scheme, if I am not mistaken the ones you suggested are ways to increase precision between two steps, so that one can take fewer, longer steps. We are not there yet. Also, we will try some more physics changes before we proceed to complicating the code.


Jan on 19 Feb 2021
Edited: Jan on 19 Feb 2021
ar = [0 0 0];
This creates a double array.
ar = ar + single(AccelDouble);
Here AccelDouble is converted to single first; the addition to the double ar then forces another conversion (in MATLAB, mixing single and double yields single). These repeated conversions are a waste of time.
Either convert all values to single from the beginning, or stay with doubles throughout.
The condition mi ~= 0 is tested twice. It would be more efficient to remove all rows with m == 0 before the loops.
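A minimal sketch of that pre-filtering, assuming the mass is stored in the first column of heavy:

```matlab
% Drop zero-mass rows once, instead of testing mi/mj in every iteration.
heavy = heavy(heavy(:,1) ~= 0, :);
```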
"sign(mi)"? Do you mean "single(mi)"?
AccelDouble = zeros(3,'double');
This produces a [3 x 3] array. Because it is overwritten in each iteration, this is not a pre-allocation but a waste of time. It does no harm that it is later overwritten by a [1 x 3] array, but it is confusing.
I assume your code can be accelerated. What are the actual sizes of the input?
The integration scheme based on delta(v) = delta(t)*a is very basic. There are higher-order integration methods which allow larger step sizes and therefore reduce the run time.
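As one example of such a scheme, a velocity-Verlet (Störmer-Verlet) step could look like this — a sketch only, assuming a hypothetical helper accel(X) that returns the n-by-3 accelerations for the positions X:

```matlab
% One velocity-Verlet step: second-order accurate and symplectic,
% so energy drift is much smaller than with explicit Euler.
% accel(X) is a hypothetical helper returning n-by-3 accelerations.
a0 = accel(X);
X  = X + timeStep * V + 0.5 * timeStep^2 * a0;  % position update
a1 = accel(X);                                  % accelerations at new positions
V  = V + 0.5 * timeStep * (a0 + a1);            % velocity update
```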
[EDITED] Vectorizing the inner loop and using SINGLE for all data reduces the runtime from 1.64 sec to 0.06 sec for 2e3 objects:
function test
% Mass, position, speed
heavy = rand(2e3, 7) .* [1e9, 1e6, 1e6, 1e6, 1e3, 1e3, 1e3];
heavy  = Core3(heavy);       % double precision
heavyS = single(heavy);
heavyS = Core3(heavyS);      % single precision
end
function heavy = Core3(heavy)
timeStep = 0.1;
G = 6.67430e-11;
M = heavy(:, 1);
X = heavy(:, 2:4);
V = heavy(:, 5:7);
n = numel(M);
MG = M * G;
for i = 1:n
    mi = M(i);
    D  = X - X(i, :);          % displacements to all other particles
    r2 = sum(D .* D, 2);       % squared distances
    F  = (D ./ sqrt(r2)) .* (MG * mi ./ r2);
    F(i, :) = 0;               % no self-interaction (removes the 0/0 row)
    Ft = sum(F, 1);
    V(i, :) = V(i, :) + Ft * timeStep;
    X(i, :) = X(i, :) + timeStep * V(i, :);
end
heavy(:, 2:4) = X;
heavy(:, 5:7) = V;
end
% Elapsed time is 1.880953 seconds. Original code
% Elapsed time is 0.140433 seconds. Vectorized, double
% Elapsed time is 0.066267 seconds. Vectorized, single
I've implemented this with the assumption that your "sign(mi)" was meant to be "single(mi)".
But note this: in the current implementation the problem is O(n^2). Doubling the input size increases the runtime by a factor of 4. I've tested the code with 2e3 rows; for 1e6 rows this extrapolates to more than 4 hours per time step.
How many time steps do you want to calculate? I assume there is no way to perform this efficiently in Matlab with this simple approach. Even if you convert this to a C-mex method and increase the speed by a factor of 2 (a pure guess only!), 2 hours per time step is still far away from allowing a serious number of time steps.
Bjorn has mentioned that the force matrix is symmetric, but it is not trivial to exploit this for 1e6 elements without exhausting the memory. Using a sophisticated integrator is essential to reduce the number of time steps needed for a given accuracy.
Professional simulations use e.g. tree methods such as the Barnes-Hut algorithm. These simplify the computation by combining a neighbourhood of elements into their center of mass: if we simulate the trajectories of the stars in a galaxy, the force from a distant spatial segment is, to first approximation, the force of its center of mass carrying the sum of the masses. This can reduce the computing time by a factor of 100 without losing too much accuracy.
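The center-of-mass idea can be sketched in a few lines — a hypothetical example, assuming a distant cluster stored as rows of far = [m x y z], a test-particle position p, and the gravitational constant G:

```matlab
% Sketch: replace a distant cluster by its center of mass,
% so one interaction stands in for many.
Mtot = sum(far(:,1));                          % total mass of the group
com  = sum(far(:,1) .* far(:,2:4), 1) / Mtot;  % mass-weighted mean position
d    = com - p;                                % displacement to the group
dr2  = sum(d.^2);
a    = (G * Mtot / (dr2 * sqrt(dr2))) * d;     % acceleration from the whole group
```

Whether a cluster is "distant enough" for this approximation is decided by an opening criterion (cluster size divided by distance below some threshold) in a real tree code.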
Bjorn Gustavsson
Bjorn Gustavsson on 26 Feb 2021
Yes, then what you have essentially boils down to the same equations as in PIC (particle-in-cell) plasma simulations. Have a look at what people do in that field.

