Speed up for loop in this code for calculating mutual information (maybe using GPU computing)

4 views (last 30 days)
So I want to use the code given HERE where we use the Kraskov estimation procedure to estimate the mutual information between two time series (for more information see Ref: Kraskov, Alexander, Harald Stögbauer, and Peter Grassberger. "Estimating mutual information." Physical review E 69.6 (2004): 066138).
Whilst the code seems to work fine for my uses, because of the length of time series and the amount of different time series (I am calculating mutual information between pairs of many different time series) I have, the code runs too slowly. I ran the code profiler in MATLAB and seems to be the case that the following section code in the function provided in the link is causing the slowdown:
% compute distance between each sample and its k-th nearest neighbour
dz = zeros(nObs, nObs);
dx = zeros(nObs, nObs);
dy = zeros(nObs, nObs);
for i = 1:nObs
for j = 1:nObs
dx(i,j) = sqrt(sum((X(i, :) - X(j, :)).^2));
dy(i,j) = sqrt(sum((Y(i, :) - Y(j, :)).^2));
dz(i,j) = max([dx(i, j), dy(i, j)]);
end
end
Is there any way to speed this up? I was thinking maybe a GPU based total/partial solution might be feasible/offer a sufficient speed up. Any alternative suggestions would be very helpful (maybe parallelsing and using a parfor loop instead, although the speed up would be less and would make my future projects more complicated). I am currently using MATLAB 2016b.
  3 Comments
Ansh
Ansh on 14 Jan 2018
Edited: Ansh on 14 Jan 2018
Hi Jan, nObs=size(X,1), where X is a 4364 by 1 vector.
Not sure if this matters, but potentially Y could be 4364 by 2 to (maybe) 4364 by 25/30 (the higher dimension of the columns is given as ideally I want the algorithm to work fast enough for the high dimension of Y, but 2 columns in Y is definitely a lower bound). Does this clarify things with regards to your answer given below?

Sign in to comment.

Accepted Answer

Jan
Jan on 14 Jan 2018
Edited: Jan on 15 Jan 2018
For a 1000x1000 matrix, this is 6 times faster already:
% Version 1:
n = size(X, 1);
X = X.';
Y = Y.';
dx = zeros(n, n);
dy = zeros(n, n);
for j = 1:n
Xj = X(:, j);
Yj = Y(:, j);
for i = j+1:n
dx(i,j) = sqrt(sum(bsxfun(@minus, X(:, i), Xj) .^ 2));
dy(i,j) = sqrt(sum(bsxfun(@minus, Y(:, i), Yj) .^ 2));
dx(j,i) = dx(i,j);
dy(j,i) = dy(i,j);
end
end
dz = max(dx, dy);
The original function took 29.5 sec (R2016b, Core2Duo, Win7/64), and the cleaned version 5.2 sec.
Here the data are processed columnwise, which is much faster because neighboring elements are accessed much faster in the memory. Then the comparison my max() is done outside the loop. And finally the resulting matrix is symmetric and you can omit the computation of X(:,i) and X(:,j) if you have the results for X(:,j) and X(:,i) already.
I tried to vectorized the inner loop:
% Version 2:
n = size(X, 1);
X = X.';
Y = Y.';
dx = zeros(n, n);
dy = zeros(n, n);
for j = 1:n
dx(j+1:n,j) = sqrt(sum(bsxfun(@minus, X(:, j+1:n), X(:, j)) .^ 2, 1));
dy(j+1:n,j) = sqrt(sum(bsxfun(@minus, Y(:, j+1:n), Y(:, j)) .^ 2, 1));
dx(j,j+1:n) = dx(j+1:n,j);
dy(j,j+1:n) = dy(j+1:n,j);
end
dz = max(dx, dy);
But this takes 21 sec for 1000x1000 arrays. But for smaller 100x100 inputs it is faster: 1.2 sec instead of 2.2 sec (100 iterations).
Now you have an efficient function to start a parallelization or computation on the GPU. Maybe this is useful (I cannot test it):
% Version 3:
parfor v = 1:2
if v == 1
for j = 1:n
dx(j+1:n, j) = sqrt(sum((X(:, j+1:n) - X(:, j)) .^ 2, 1));
dx(j, j+1:n) = dx(j+1:n, j);
end
else
for j = 1:n
dy(j+1:n, j) = sqrt(sum((Y(:, j+1:n) - Y(:, j)) .^ 2, 1));
dy(j, j+1:n) = dy(j+1:n, j);
end
end
end
But parfor for the inner loop will use more cores.
  7 Comments
Jan
Jan on 15 Jan 2018
@Ansh: I have edited the posted codes and replaced the auto-expansion by an explicit bsxfun. This let the code run 10% faster again.
Is the original problem solved? Then it would be kind to accept the answer. I've spent more than an hour to improve the code and it has been an interesting problem and I'd be glad to here a "thanks".
The code from your comment is a new problem. Please open a new thread and provide some input data. Explain there: Is Eps pre-allocated? What is the typical value of k? Sorting the complete array only to find the k.th smallest element is a waste of time.
Ansh
Ansh on 16 Jan 2018
I will accept the answer and start a new question. Definitely incredibly thankful for all your help so far. Apologies if I didn't seem grateful before.

Sign in to comment.

More Answers (0)

Categories

Find more on Loops and Conditional Statements in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!