MATLAB Answers

0

Fast data structure for tabu list?

Asked by Paul Berglund on 15 Sep 2017
Latest activity Answered by Jan
on 16 Sep 2017
I am looking for a fast way to implement a tabu list where each element in the list is a vector of integers.
I am writing some optimization code where evaluating the quality of a given candidate solution is expensive so I want to record that I have evaluated a given candidate before so as to avoid doing it a second time.
I am currently doing something very simple:
ismember(candidateSolution,tabuList,'rows')
where the tabuList is a matrix where each row corresponds to a previously evaluated candidateSolution. This works well enough for small problem instances but starts to bog down when the number of rows in tabuList gets into the tens of thousands.
Presumably a more sophisticated data structure would improve performance, but I'm not sure what to try. Maybe a sparse matrix?

  2 Comments

How many elements per row? How many total candidate solutions are we talking about? What are the ranges of the integer values? Are you willing to consider mex functions?
This will vary, but as an example the one I'm having trouble with right now has 9 elements in a row. The integer values will always be under 2000, and in the current example they are all under 100.
The number of candidate solutions is large, but the number I need to keep track of is in the hundreds of thousands. Big but not huge.
I think I'll only try mex if I've exhausted other reasonable options.

Sign in to comment.

4 Answers

Answer by Walter Roberson
on 16 Sep 2017

Use containers.Map . Test with iskey()

  2 Comments

I didn't know about this. thanks, it could be useful. Is it old? The documentation does not say when it was introduced.
I was going to suggest a table. Could containers be an alternative to a table? And iskey() an alternative to ismember()?
R2008b according to the release notes.

Sign in to comment.


Answer by John D'Errico
on 15 Sep 2017

Sparse matrices are NOT the answer here. In fact, that would be an actively terrible reason to use a sparse matrix.
You might consider the memoize function, which allows MATLAB to recognize that it has seen a set of inputs to a specific function.

  2 Comments

Alas my MATLAB version is old and does not have memoize.
When an old release is an issue, it is always a good idea to tell us. Saves us wasting time answering, and you wasting time to tell us.

Sign in to comment.


Jan
Answer by Jan
on 16 Sep 2017
Edited by Jan
on 16 Sep 2017

You can use a hash, e.g. obtained by FEX: DataHash or FEX: GetMD5. The latter is faster, but the hashing will not be the bottleneck of the code.
tabuList = {};
hash8 = GetMD5(candidateSolution, 'Binary', 'uint8');
hash = char(typecast(hash8, 'uint16'));
if any(strcmp(hash, tabuList))
% Existing already
else
% New data, process...
tabuList{end+1} = hash;
end
This can be improved by pre-allocating the tabuList or perhaps by a binary search. But usually strcmp is fast, because it stops the comparison at the first not matching character.
[EDITED] The comparison is slightly faster with using all 16 bits of the CHAR type. Now this takes 37 seconds on my R2016b/64 system for 40'000 candidates with about 10'000 repeated keys. The main time is spent in any(strcmp), such that the drawback of the omitted pre-allocation is less important that the advantage of having a shorter tabuList. With a dedicated C-Mex function this can be accelerated:
tic;
tabuList = cell(1, 4e4); % Pre-allocate
iList = 0;
for k = 1:4e4
c = randi([1, 4], 1, 8);
% hash = GetMD5(c, 'Binary', 'base64');
hash8 = GetMD5(c, 'Binary', 'uint8');
hash = char(typecast(hash8, 'uint16'));
if anyStrcmp8(hash, tabuList))
% Existing already
else
% New data, process...
% Add hash to the list:
iList = iList + 1;
tabuList{iList} = hash;
end
end
disp(iList)
toc
And the Mex function anyStrcmp8.c:
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
uint16_T *S, *CS;
size_t iC, nC, nS;
const mxArray *C, *aC;
// Get inputs:
S = (uint16_T *) mxGetData(prhs[0]);
nS = mxGetNumberOfElements(prhs[0]);
C = prhs[1];
nC = mxGetNumberOfElements(C);
// Loop over cell:
for (iC = 0; iC < nC; iC++) {
aC = mxGetCell(C, iC);
if (aC == NULL) { // Undeclared element reached:
plhs[0] = mxCreateLogicalScalar(false);
return;
}
CS = (uint16_T *) mxGetData(aC);
if (memcmp(S, CS, 8 * sizeof(uint16_T)) == 0) {
// Matching element found:
plhs[0] = mxCreateLogicalScalar(true);
return;
}
}
// No element found
plhs[0] = mxCreateLogicalScalar(false);
return;
}
This needs 13.6 sec for 40'000 keys with 25% repetitions.

  0 Comments

Sign in to comment.


Jan
Answer by Jan
on 16 Sep 2017

It is much faster to store the hash keys in an UINT64 matrix than in a cell string:
tic;
nKey = 4e4;
List = zeros(2, nKey, 'uint64');
iList = 0;
for k = 1:nKey
c = randi([1, 4], 1, 8);
hash = typecast(GetMD5(c, 'Binary', 'uint8'), 'uint64').';
if ~SearchHashKey(hash, List, iList)
% New data, process...
% Append hash to the list:
iList = iList + 1;
List(:, iList) = hash;
end
end
disp(iList);
toc
And the C-Mex function SearchHashKey.c:
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
uint64_T *V, *W;
size_t iW, nW;
if (nrhs != 3) {
mexErrMsgTxt("SearchHashKey.mex: 3 inputs required.");
}
if (!mxIsUint64(prhs[0]) || !mxIsUint64(prhs[1])) {
mexErrMsgTxt("SearchHashKey.mex: Inputs must be UINT64.");
}
V = (uint64_T *) mxGetData(prhs[0]);
W = (uint64_T *) mxGetData(prhs[1]);
nW = (size_t) mxGetScalar(prhs[2]);
if (mxGetNumberOfElements(prhs[0]) != 2 ||
mxGetNumberOfElements(prhs[1]) < 2 * nW) {
mexErrMsgTxt("SearchHashKey.mex: Inputs have bad sizes.");
}
for (iW = 0; iW < nW; iW++) {
if (V[0] == W[0] && V[1] == W[1]) {
plhs[0] = mxCreateLogicalScalar(true);
return;
}
W += 2;
}
plhs[0] = mxCreateLogicalScalar(false);
return;
}
This takes 1.3 seconds for 40'000 elements only.

  0 Comments

Sign in to comment.