Code run using Parfor function and error correction

I am using the code below. It takes a long time to run, especially when it has to search the data. I also get the following error when I run it:
Error using signatureplottry (line 30)
Error: The variable startDate is perhaps intended as a reduction variable, but is actually an
uninitialized temporary.
See Parallel for Loops in MATLAB, "Temporary Variables Intended as Reduction Variables".
I am attaching only the first 50 elements of the dataset; the original file has about 17 million rows. I also analyzed the code with the Profiler and observed that most of the time is consumed by the search operations:
test12 = ALLDataVector.data(ALLDataVector.test >= temp1(2,:) & ALLDataVector.test < temp2(2,:)); %Consumes 206 seconds
tempreturn = ALLDataVector.data(ALLDataVector.test <= temp1(2,:)); %Consumes 104 seconds
% matlabpool open 2
startingTime = tic
%Code to convert time into strings and create a matrix
t1 = datetime(2014,12,1); %Initialize start time for an array of calendar times
t2 = datetime(2014,12,2);
t = t1:t2;
startDate = datetime("01-Dec-2014 00:00:00");
%CREATE LOOP FOR DIFFERENT INTERVALS
interval = 3;
n=1;
%2:2:18; %Number of minutes for the sampling frequency
%Total number of days between 1Dec 2014 to 7th Jan 2019 is equal to 1498 days
slotu = (n*(24*(60/interval)))-1;
returnsForDay = zeros(size(final_data)-1);
previousReturn = 0;
rv =[]; %Matrix to store values of realised variance per day
counter = 0; %Initialize the value of counter to 0
while startDate <= datetime("1-Dec-2014 23:59:59") %Define a condition where the last date is defined
parfor i = 0:slotu %The loop is defined from 0 until the slot for one day only
startDate = startDate + minutes(00:interval:interval); %The value of startDate is incremented everytime by the interval time, (Say 5 min or 7 min as per the sampling frequency)
endDate = startDate + minutes(00:01:01); %The value of endDate is incremented everytime by just one min after being incremented by the sampling frequency for the exclusion of that time
temp1 = datestr(startDate,'dd-mmm-yyyy HH:MM:SS'); %Converts start datetime array into input array
temp2 = datestr(endDate,'dd-mmm-yyyy HH:MM:SS'); %Converts end datetime array into input array
test12 = ALLDataVector.data(ALLDataVector.test >= temp1(2,:) & ALLDataVector.test < temp2(2,:));
%ALLDataVector.data - Lookup for a price in the ALLDataVector table
if(isempty(test12)) %Checks if test12 is empty or not
tempreturn = ALLDataVector.data(ALLDataVector.test <= temp1(2,:));
if(isempty(tempreturn)) %Checks if tempreturn1 is empty or not
previousValue = 300;
else
previousValue = tempreturn(end);
end
currentValue = previousValue;
else
currentValue = test12(1,:);
end
newcol = length(returnsForDay);
returnsForDay(newcol + 1) = currentValue - previousReturn;
previousReturn = currentValue;
end %for loop is closed
[RV]=realzvariation(returnsForDay); %Function call
rv(counter + 1) = RV; %all new elements move from RV and gets stored in rv
counter = counter + 1; %counter is increased by 1 (like i = i+1)
end %While loop is closed
fprintf('Done with %d iterations in %f seconds\n', slotu, toc(startingTime));
% matlabpool close

Accepted Answer

Walter Roberson on 23 Dec 2019
parfor i = 0:slotu %The loop is defined from 0 until the slot for one day only
startDate = startDate + minutes(00:interval:interval); %The value of startDate is incremented everytime by the interval time, (Say 5 min or 7 min as per the sampling frequency)
endDate = startDate + minutes(00:01:01); %The value of endDate is incremented everytime by just one min after being incremented by the sampling frequency for the exclusion of that time
Think about that more. Before the parfor begins, startDate has a particular value; call it sD0 for the moment. Then you ask to start a number of different workers. Each worker will have its startDate initialized to the current startDate, sD0. The work within the parfor loop is independent of the loop control variable, i, so each worker would then proceed to do exactly the same work, including trying to extend returnsForDay.
For positive values of interval, 00:interval:interval is always going to be the vector [0 interval], so minutes(00:interval:interval) would be the vector [minutes(0), minutes(interval)]. You add that to startDate inside the parfor body, so if you managed to get through the loop body, after one iteration startDate would have changed from a scalar to a vector. datestr() applied to the vector of datetimes gives a character array with two rows, and you consistently access the second row, so it is not at all obvious why you bother changing to a vector of two values.
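To make the scalar-to-vector change concrete, here is a minimal sketch that reproduces just those two statements outside any loop. It assumes interval = 3 as in the question; the variable name offsets is introduced only for illustration.

```matlab
interval = 3;                               % sampling interval in minutes, as in the question
offsets = 0:interval:interval;              % colon expression from the loop body: [0 interval]

startDate = datetime(2014,12,1,0,0,0);      % scalar datetime before the loop
startDate = startDate + minutes(offsets);   % scalar + 1x2 duration gives a 1x2 datetime

% datestr on a two-element datetime returns a char array with one row per element,
% which is why the question's code ends up indexing temp1(2,:)
temp1 = datestr(startDate, 'dd-mmm-yyyy HH:MM:SS');

disp(size(startDate))
disp(size(temp1, 1))
```

Adding a single scalar duration, for example startDate + minutes(interval), would keep startDate a scalar.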
You access the second row of the character vector returned from datestr(), and you compare against it with the test
ALLDataVector.test >= temp1(2,:)
and what are you expecting the result of that to be? temp1(2,:) is a character vector, not a datetime object, so you would be comparing characters. But you did not use one of the ISO date formats, in which later date/times always compare greater than earlier date/times: you used three-letter month names, so you are testing whether particular dates are alphabetically earlier than others. This is not a robust or sensible test! You would have to repair it by hand if you happened to use a different month, or if the data crossed a month boundary. Why would you ever do that??
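A small sketch of why text comparison goes wrong with month names, assuming a MATLAB release that supports string arrays and relational operators on them. The two timestamps are made-up illustration values, not taken from the dataset.

```matlab
a = "01-Apr-2015 00:00:00";     % later in time
b = "01-Dec-2014 00:00:00";     % earlier in time

% Text comparison is alphabetical: 'A' (Apr) sorts before 'D' (Dec)
textSaysAEarlier = a < b;       % true, which is wrong chronologically

% Converting to datetime compares actual points in time
fmt = 'dd-MMM-yyyy HH:mm:ss';
da = datetime(a, 'InputFormat', fmt, 'Locale', 'en_US');
db = datetime(b, 'InputFormat', fmt, 'Locale', 'en_US');
dateSaysAEarlier = da < db;     % false, which is correct

% An ISO-style layout (year, month, day as numbers) does sort correctly as text
isoSaysAEarlier = "2015-04-01 00:00:00" < "2014-12-01 00:00:00";   % false
```

If ALLDataVector.test is stored as text, converting it to datetime once, before the loop, avoids both the ordering problem and repeated conversions.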
But if you got through it all, then your local startDate would be a vector.
Now your different pool members finish executing at close to, but not exactly, the same time. You have changed startDate from the original scalar into a vector. When the next parfor iteration is to be dispatched, which value of startDate should be passed to the worker? The original scalar one that I earlier referred to as sD0? The version modified by all of the workers in exactly the same way to become the vector?
You have other problems as well...
You need to rewrite your control logic.
Subtract your initial start date from your end date, and convert that to minutes(). Divide by the interval and floor() the result to get the total number of iterations you need. Get rid of the while loop, and parfor over that number of iterations.
Inside the worker, set the local start date as the global start date plus (the parfor counter minus 1) times the interval. Compute with that. Store into the column of returnsForDay indexed by the parfor index variable.
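The arithmetic described above can be sketched as follows, using the one-day window and 3-minute interval from the question. The names numPeriods and slotStart are illustrative assumptions, not part of the original code.

```matlab
firstDate = datetime(2014,12,1,0,0,0);
lastDate  = datetime(2014,12,1,23,59,59);
interval  = minutes(3);

% duration / duration yields a plain number; floor keeps only whole intervals
numPeriods = floor((lastDate - firstDate) / interval);

% Start of slot i is firstDate + (i-1)*interval, computed here for all slots at once
slotStart = firstDate + ((1:numPeriods) - 1) * interval;

disp(numPeriods)
disp(slotStart([1 end]))
```

Because each slot start depends only on the index i, no iteration needs a value produced by another iteration, which is the property parfor requires.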
  2 Comments
Harsh Rob on 23 Dec 2019
I tried implementing the changes as recommended.
"Subtract your initial start date from your end date in order, and convert that to minutes(). Divide by the interval and floor() in order to get the total number of iterations you need. Get rid of the while loop, and parfor that number of iterations.
Inside the worker, set the local start date as the global start date plus (the parfor counter minus 1) times the interval. Compute with that. Store into the column of returnsForDay indexed by the parfor index variable."
However, I am unable to understand the exact logic here after the segregation of work to the workers. Could you please help me by providing an example source code for my mentioned code.
Thanks
Walter Roberson on 24 Dec 2019
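firstDate = datetime("01-Dec-2014 00:00:00");
lastDate = datetime("1-Dec-2014 23:59:59");
interval = minutes(3);
% Number of whole sampling intervals between the first and last date
numperiod = floor( (lastDate - firstDate) / interval );
returnsForDay = zeros(1, numperiod);
for i = 1 : numperiod
    % Each iteration derives its own window from the loop index
    startDate = firstDate + (i-1) * interval;
    endDate = startDate + minutes(1);
    % First sample at or after the window start
    % (this assumes ALLDataVector.test holds datetime values in ascending order)
    firstloc = find(ALLDataVector.test >= startDate, 1, 'first');
    if isempty(firstloc) || firstloc == 1 || ALLDataVector.test(firstloc) >= endDate
        % No sample inside the window, or no earlier sample to difference against
        this_return = 0;
    else
        this_return = ALLDataVector.data(firstloc) - ALLDataVector.data(firstloc-1);
    end
    returnsForDay(i) = this_return;
end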
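As a rough illustration of how the same loop might look under parfor, here is a sketch run on synthetic data. The times and prices vectors stand in for ALLDataVector.test and ALLDataVector.data and are assumptions made up for this example; it also assumes the timestamps are sorted datetime values. Without Parallel Computing Toolbox, parfor simply runs as an ordinary loop.

```matlab
% Synthetic stand-in data: one tick every 30 seconds, price rising by 1 per tick
firstDate = datetime(2014,12,1,0,0,0);
times  = firstDate + minutes(0:0.5:60)';
prices = 300 + (0:numel(times)-1)';

interval   = minutes(3);
numPeriods = 10;                        % small fixed count for the illustration
returnsForDay = zeros(1, numPeriods);   % sliced output: iteration i writes only element i

parfor i = 1:numPeriods
    % Everything assigned here is a temporary, recomputed from i in each iteration
    startDate = firstDate + (i-1) * interval;
    endDate   = startDate + minutes(1);
    firstloc  = find(times >= startDate, 1, 'first');
    if isempty(firstloc) || firstloc == 1 || times(firstloc) >= endDate
        thisReturn = 0;
    else
        thisReturn = prices(firstloc) - prices(firstloc-1);
    end
    returnsForDay(i) = thisReturn;
end

disp(returnsForDay)
```

Copying the two table columns into plain vectors before the loop keeps the broadcast data simple. Whether parfor actually beats the serial loop depends on how expensive each search is relative to the cost of sending the 17 million rows to the workers.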
