Practical k-nearest neighbors implementation with big data set

My data looks like this:
K = 200; % Could go up to 1000, or more...
X = cell(1,K);
Y = cell(1,K);
num_of_neighbors = 50; % This is a constant
for j = 1:K
    % We can assume that the number of columns is never bigger than 100
    X{j} = rand(3231961,44); % Yup, it's big: about 3 million rows
    Y{j} = rand(323196,44);
end
So for each j = 1:K, I want to find, for every point in Y{j} (each point is a row), its 50 nearest neighbors in X{j}. A simple implementation would be:
IDX = cell(1,K);
D = cell(1,K);
for j = 1:K
    [IDX{j}, D{j}] = knnsearch(X{j}, Y{j}, 'K', num_of_neighbors);
end
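For reference, here is a chunked variant I'm considering, hoping to keep peak memory down (with 44 columns, knnsearch falls back to exhaustive search, so the query set size matters). The chunk size is just a guess; I'd have to tune it for my 16 GB of RAM:

```matlab
% Sketch: run knnsearch on slices of Y{j} so the intermediate distance
% computations for one chunk stay comfortably in memory.
chunk = 50000;  % arbitrary guess; tune for available RAM
IDX = cell(1,K);
D = cell(1,K);
for j = 1:K
    n = size(Y{j},1);
    IDX{j} = zeros(n, num_of_neighbors);
    D{j}   = zeros(n, num_of_neighbors);
    for s = 1:chunk:n
        e = min(s + chunk - 1, n);
        [IDX{j}(s:e,:), D{j}(s:e,:)] = ...
            knnsearch(X{j}, Y{j}(s:e,:), 'K', num_of_neighbors);
    end
end
```

I don't know yet whether this is actually faster than one big call, but at least it bounds the working set per step.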
but it is very slow. I'm on Windows 10 with 16 GB of RAM, and here is my GPU information:
>> gpuDevice
ans =
CUDADevice with properties:
Name: 'Quadro M1000M'
Index: 1
ComputeCapability: '5.0'
SupportsDouble: 1
DriverVersion: 8
ToolkitVersion: 7.5000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 2.1475e+09
AvailableMemory: 1.6909e+09
MultiprocessorCount: 4
ClockRateKHz: 1071500
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
So I tried using parfor:
IDX = cell(1,K);
D = cell(1,K);
parfor j = 1:K
    [IDX{j}, D{j}] = knnsearch(X{j}, Y{j}, 'K', num_of_neighbors);
end
But each worker should only need X{j} and Y{j}, and in fact (because X and Y were treated as broadcast variables) every worker received all of X and Y. That's a lot of data! Of course I tested this with smaller data first. I also tried packing X into a 3-D array X_new with X{j} = X_new(:,:,j), so that each worker would receive only its own slice. Curiously, that showed no improvement at all, and accumulating all of the X{j} into one array is not practical when each X{j} is already this large. So I really don't know how to parallelize the code.
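One variant I still want to try (a sketch; I haven't verified the slicing behaviour on my release): the documentation says a cell array indexed by the loop variable should be treated as a sliced variable, so writing the outputs the same way might avoid both the broadcast and the overwriting of results:

```matlab
% Sketch: X{j} and Y{j} indexed by the loop variable should be sliced
% (sent piecewise to workers), and IDX{j}, D{j} are sliced outputs.
IDX = cell(1,K);
D = cell(1,K);
parfor j = 1:K
    [IDX{j}, D{j}] = knnsearch(X{j}, Y{j}, 'K', num_of_neighbors);
end
```

If the whole of X still gets shipped to every worker, maybe something else in my real code is forcing the broadcast.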
I also tried converting my data to single precision, but I'm on Windows 10 with only one GPU, and when I ran knnsearch with GPU inputs I got an error (CUDA_ERROR_UNKNOWN or similar). When I searched the internet for clues, I found that the cause is this property of the GPU:
KernelExecutionTimeout: 1
So basically Windows forces GPU kernels to time out after a while, so the card stays available for graphics display. I only need the GPU for data processing, so I decided to turn off its use for display. Some googling suggested switching the GPU to Tesla Compute Cluster (TCC) mode, but Windows 10 forces this GPU to drive the display, so to dedicate it to computation I would have to plug in a second GPU and use one of them for compute only. Please help me, thank you very much!
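One fallback I'm considering, in case knnsearch itself won't take GPU inputs on my setup: pdist2 with the 'Smallest' option reportedly accepts gpuArray inputs in the Statistics and Machine Learning Toolbox (I haven't verified this on my release). Single precision halves the memory, and chunking Y keeps the on-device distance matrix small; the chunk size below is a rough guess for my card's ~1.7 GB of free memory:

```matlab
% Sketch (untested): GPU k-NN via pdist2(...,'Smallest',k) on single data.
% X{j} in single (~570 MB) should fit on the device, but each chunk's
% full distance matrix must also fit, hence the small chunk size.
chunk = 50;  % arbitrary; tune to the GPU's available memory
IDX = cell(1,K);
D = cell(1,K);
for j = 1:K
    Xg = gpuArray(single(X{j}));
    n = size(Y{j},1);
    IDX{j} = zeros(n, num_of_neighbors);
    D{j}   = zeros(n, num_of_neighbors, 'single');
    for s = 1:chunk:n
        e = min(s + chunk - 1, n);
        Yg = gpuArray(single(Y{j}(s:e,:)));
        % pdist2 returns num_of_neighbors-by-(e-s+1) outputs, so transpose
        [d, i] = pdist2(Xg, Yg, 'euclidean', 'Smallest', num_of_neighbors);
        D{j}(s:e,:)   = gather(d)';
        IDX{j}(s:e,:) = gather(i)';
    end
end
```

Even if this runs, I suspect the kernel timeout would still bite unless the TDR settings are changed.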
  1 Comment
Joss Knight on 30 Jan 2017
You can turn off TDR using the TDR registry keys. Give that a go, see if it helps. But really, the problem is that this is a laptop. Even with a superb graphics chip dedicated to compute, you are limited by your laptop's power and cooling capabilities.


Answers (0)
