CUDA: number of tasks exceeds number of threads times blocks

I have a problem when my number of tasks exceeds the total number of available threads. Let's imagine I want to add two vectors of length 100 000.
MATLAB code:
N=100*1000
a=double(-[1:N]);
b=double(2*[1:N]);
a_gpu=gpuArray(a);%Create array on GPU
b_gpu=gpuArray(b);%Create array on GPU
c_gpu=gpuArray(zeros(1,N));%Create array on GPU
k = parallel.gpu.CUDAKernel('add.ptx', 'add.cu');
k.ThreadBlockSize = 100;
k.GridSize=[100,1];
o = feval(k, a_gpu,b_gpu,c_gpu);
I know that I could increase the ThreadBlockSize and GridSize, but that is not what I want to know. Imagine my vector were much longer.
My CUDA code looks like this:
__global__ void add( double *a, double *b, double *c) {
int tid = threadIdx.x + blockIdx.x * blockDim.x;
a[tid] = a[tid] + b[tid];
tid += blockDim.x * gridDim.x;
}
In the last line I try to force the program to really go to the end of my vector by using the same threads a second, third, ... time. That's what I read in the book "CUDA by Example".
But for some reason it does not work from MATLAB. If I use the same approach with only C and CUDA, it works.
What is wrong with my code? What is the usual way to handle the case where the number of tasks is larger than the maximum ThreadBlockSize times the GridSize? I could use the other grid dimension too, but the same problem would still come up eventually.
Thanks a lot
Robert
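
For reference, the reuse-the-threads idea described above is usually written as a grid-stride loop. In the kernel as posted, the increment `tid += blockDim.x * gridDim.x` is not inside any loop, so each thread handles exactly one element and the 100 x 100 launch only covers the first 10 000 entries; the result is also written to `a` rather than `c`. What follows is a minimal sketch, not a verified fix for the MATLAB case: the extra length parameter `n` and the host helper `add_on_gpu` are assumptions added for illustration and are not part of the original post.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Grid-stride loop: each thread handles elements tid, tid + stride,
// tid + 2*stride, ... until it runs past the end of the vectors.
// The length n is an added parameter (assumption) so threads can stop in time.
__global__ void add(const double *a, const double *b, double *c, int n) {
    int stride = blockDim.x * gridDim.x;  // total number of threads in the grid
    for (int tid = threadIdx.x + blockIdx.x * blockDim.x; tid < n; tid += stride) {
        c[tid] = a[tid] + b[tid];
    }
}

// Hypothetical host helper (not from the original post): copies the inputs to
// the device, launches 100 blocks of 100 threads as in the question, and
// copies the sum back into c. Returns true if every CUDA call succeeded.
bool add_on_gpu(const std::vector<double> &a, const std::vector<double> &b,
                std::vector<double> &c) {
    assert(a.size() == b.size());
    int n = static_cast<int>(a.size());
    size_t bytes = n * sizeof(double);
    c.resize(n);

    double *da = nullptr, *db = nullptr, *dc = nullptr;
    bool ok = cudaMalloc(&da, bytes) == cudaSuccess
           && cudaMalloc(&db, bytes) == cudaSuccess
           && cudaMalloc(&dc, bytes) == cudaSuccess
           && cudaMemcpy(da, a.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(db, b.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        // 10 000 threads in total; the loop in the kernel covers the remaining elements.
        add<<<100, 100>>>(da, db, dc, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(c.data(), dc, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(da);  // cudaFree on a null pointer is a no-op
    cudaFree(db);
    cudaFree(dc);
    return ok;
}
```

On the MATLAB side, a kernel with this signature would presumably be called with the length as an extra scalar argument, for example `c_gpu = feval(k, a_gpu, b_gpu, c_gpu, N);`, since `parallel.gpu.CUDAKernel` treats `const` pointers as inputs only and non-`const` pointers as outputs.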

Answers (0)
