Practical k-nearest neighbors implementation with big data set

My data looks like this:
K = 200; % Could go up to 1000, or more...
X = cell(1,K);
Y = cell(1,K);
num_of_neighbors = 50; % This is a constant
for j = 1:K
    % We can assume that the number of columns is never bigger than 100
    X{j} = rand(3231961,44); % Yup, it's big: about 3 million rows
    Y{j} = rand(323196,44);
end
So for each j = 1:K, I want to find, for every point in Y{j} (each point is a row), its 50 nearest neighbors in X{j}. A simple implementation would be:
IDX = cell(1,K);
D = cell(1,K);
for j = 1:K
    [IDX{j}, D{j}] = knnsearch(X{j}, Y{j}, 'K', num_of_neighbors);
end
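For reference, here is a chunked variant I'm considering, hoping to keep peak memory down (with 44 columns, knnsearch falls back to exhaustive search, so the query set size matters). The chunk size is just a guess; I'd have to tune it for my 16 GB of RAM:

```matlab
% Sketch: run knnsearch on slices of Y{j} so the intermediate distance
% computations for one chunk stay comfortably in memory.
chunk = 50000;  % arbitrary guess; tune for available RAM
IDX = cell(1,K);
D = cell(1,K);
for j = 1:K
    n = size(Y{j},1);
    IDX{j} = zeros(n, num_of_neighbors);
    D{j}   = zeros(n, num_of_neighbors);
    for s = 1:chunk:n
        e = min(s + chunk - 1, n);
        [IDX{j}(s:e,:), D{j}(s:e,:)] = ...
            knnsearch(X{j}, Y{j}(s:e,:), 'K', num_of_neighbors);
    end
end
```

I don't know yet whether this is actually faster than one big call, but at least it bounds the working set per step.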
but it is very slow. I'm on Windows 10 with 16 GB of RAM, and here is my GPU information:
>> gpuDevice
ans =
CUDADevice with properties:
Name: 'Quadro M1000M'
Index: 1
ComputeCapability: '5.0'
SupportsDouble: 1
DriverVersion: 8
ToolkitVersion: 7.5000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 2.1475e+09
AvailableMemory: 1.6909e+09
MultiprocessorCount: 4
ClockRateKHz: 1071500
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
So I tried using parfor:
IDX = cell(1,K);
D = cell(1,K);
parfor j = 1:K
    [IDX{j}, D{j}] = knnsearch(X{j}, Y{j}, 'K', num_of_neighbors);
end
But each worker should only need X{j} and Y{j}, and in fact (because X and Y were treated as broadcast variables) every worker received all of X and Y. That's a lot of data! Of course I tested this with smaller data first. I also tried packing X into a 3-D array X_new with X{j} = X_new(:,:,j), so that each worker would receive only its own slice. Curiously, that showed no improvement at all, and accumulating all of the X{j} into one array is not practical when each X{j} is already this large. So I really don't know how to parallelize the code.
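One variant I still want to try (a sketch; I haven't verified the slicing behaviour on my release): the documentation says a cell array indexed by the loop variable should be treated as a sliced variable, so writing the outputs the same way might avoid both the broadcast and the overwriting of results:

```matlab
% Sketch: X{j} and Y{j} indexed by the loop variable should be sliced
% (sent piecewise to workers), and IDX{j}, D{j} are sliced outputs.
IDX = cell(1,K);
D = cell(1,K);
parfor j = 1:K
    [IDX{j}, D{j}] = knnsearch(X{j}, Y{j}, 'K', num_of_neighbors);
end
```

If the whole of X still gets shipped to every worker, maybe something else in my real code is forcing the broadcast.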
I also tried converting my data to single precision, but I'm on Windows 10 with only one GPU, and when I ran knnsearch with GPU inputs I got an error (CUDA_ERROR_UNKNOWN or similar). When I searched the internet for clues, I found that the cause is this property of the GPU:
KernelExecutionTimeout: 1
So basically Windows forces GPU kernels to time out after a while, so the card stays available for graphics display. I only need the GPU for data processing, so I decided to turn off its use for display. Some googling suggested switching the GPU to Tesla Compute Cluster (TCC) mode, but Windows 10 forces this GPU to drive the display, so to dedicate it to computation I would have to plug in a second GPU and use one of them for compute only. Please help me, thank you very much!
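One fallback I'm considering, in case knnsearch itself won't take GPU inputs on my setup: pdist2 with the 'Smallest' option reportedly accepts gpuArray inputs in the Statistics and Machine Learning Toolbox (I haven't verified this on my release). Single precision halves the memory, and chunking Y keeps the on-device distance matrix small; the chunk size below is a rough guess for my card's ~1.7 GB of free memory:

```matlab
% Sketch (untested): GPU k-NN via pdist2(...,'Smallest',k) on single data.
% X{j} in single (~570 MB) should fit on the device, but each chunk's
% full distance matrix must also fit, hence the small chunk size.
chunk = 50;  % arbitrary; tune to the GPU's available memory
IDX = cell(1,K);
D = cell(1,K);
for j = 1:K
    Xg = gpuArray(single(X{j}));
    n = size(Y{j},1);
    IDX{j} = zeros(n, num_of_neighbors);
    D{j}   = zeros(n, num_of_neighbors, 'single');
    for s = 1:chunk:n
        e = min(s + chunk - 1, n);
        Yg = gpuArray(single(Y{j}(s:e,:)));
        % pdist2 returns num_of_neighbors-by-(e-s+1) outputs, so transpose
        [d, i] = pdist2(Xg, Yg, 'euclidean', 'Smallest', num_of_neighbors);
        D{j}(s:e,:)   = gather(d)';
        IDX{j}(s:e,:) = gather(i)';
    end
end
```

Even if this runs, I suspect the kernel timeout would still bite unless the TDR settings are changed.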
  1 Comment
Joss Knight on 30 Jan 2017
You can turn off TDR using the TDR registry keys. Give that a go, see if it helps. But really, the problem is that this is a laptop. Even with a superb graphics chip dedicated to compute, you are limited by your laptop's power and cooling capabilities.


Answers (0)
