CUDA: number of tasks exceeds number of threads times blocks
I have a problem when my number of tasks exceeds the total number of available threads. Let's imagine I want to add two vectors of length 100 000.
MATLAB code:
N=100*1000
a=double(-[1:N]);
b=double(2*[1:N]);
a_gpu=gpuArray(a);%Create array on GPU
b_gpu=gpuArray(b);%Create array on GPU
c_gpu=gpuArray(zeros(1,N));%Create array on GPU
k = parallel.gpu.CUDAKernel('add.ptx', 'add.cu');
k.ThreadBlockSize = 100;  % 100 threads per block
k.GridSize=[100,1];       % 100 blocks, i.e. 10 000 threads for 100 000 elements
o = feval(k, a_gpu,b_gpu,c_gpu);
I know that I could increase ThreadBlockSize and GridSize, but that is not what I want right now. Imagine my vector were much longer.
My CUDA code looks like this:
__global__ void add( double *a, double *b, double *c) {
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    a[tid] = a[tid] + b[tid];
    tid += blockDim.x * gridDim.x;
}
In the last line I try to make the program go all the way to the end of my vector by using the same threads a second, third, ... time. That's what I read in the book "CUDA by Example".
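The technique from *CUDA by Example* is a grid-stride loop, and it only works if the `tid += blockDim.x * gridDim.x` line sits inside a loop. As a lone last statement it just changes a local variable that is never read again, so each of the 10 000 threads still handles exactly one element and only the first 10 000 elements get added. Below is a minimal sketch of the loop, assuming the kernel receives the vector length as an extra `int n` argument and writes the sum into `c` (the original writes into `a`). `run_add` is a hypothetical host helper for checking the kernel outside MATLAB.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Grid-stride version of the add kernel. The element count n is an extra,
// assumed parameter: the original kernel has no way to know where the
// vector ends. a and b are const, so they are treated as inputs only.
__global__ void add(const double *a, const double *b, double *c, int n)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;  // this thread's first element
    int stride = blockDim.x * gridDim.x;              // threads in the whole 1-D grid
    while (tid < n) {            // the loop lets each thread revisit later elements
        c[tid] = a[tid] + b[tid];
        tid += stride;           // jump to the next element this thread owns
    }
}

// Hypothetical test helper (not part of the question): fills a and b like the
// MATLAB code does, launches blocks x threads, and returns the number of wrong
// results in c, or -1 if a CUDA call fails.
int run_add(int n, int blocks, int threads)
{
    assert(n > 0 && blocks > 0 && threads > 0);
    double *a = nullptr, *b = nullptr, *c = nullptr;
    size_t bytes = (size_t)n * sizeof(double);
    if (cudaMallocManaged(&a, bytes) != cudaSuccess ||
        cudaMallocManaged(&b, bytes) != cudaSuccess ||
        cudaMallocManaged(&c, bytes) != cudaSuccess) {
        cudaFree(a); cudaFree(b); cudaFree(c);
        return -1;
    }
    for (int i = 0; i < n; ++i) {
        a[i] = -(double)(i + 1);  // a = -[1:N]
        b[i] = 2.0 * (i + 1);     // b = 2*[1:N]
        c[i] = 0.0;
    }
    add<<<blocks, threads>>>(a, b, c, n);
    int bad = -1;
    if (cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess) {
        bad = 0;
        for (int i = 0; i < n; ++i)
            if (c[i] != (double)(i + 1)) ++bad;  // expected a + b = i + 1
    }
    cudaFree(a); cudaFree(b); cudaFree(c);
    return bad;
}
```

With the question's launch (100 blocks of 100 threads) and N = 100 000, each thread handles 10 elements. From MATLAB, the extra argument would presumably be passed as one more `feval` input, e.g. `c_gpu = feval(k, a_gpu, b_gpu, c_gpu, int32(N));`; with `a` and `b` declared `const`, only `c` should come back as an output.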
But for some reason this approach does not work when I call the kernel from MATLAB; when I use it from plain C and CUDA, it works.
What is wrong with my code? What is the usual way to handle the case where the number of tasks is larger than the maximum thread block size times the grid size? I could use the other grid dimension too, but how do I avoid this problem in general?
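Using the second grid dimension only raises the thread count; a loop is still what makes the kernel independent of the launch size. Below is a minimal sketch, assuming the same extra `int n` argument and 1-D thread blocks arranged in a 2-D grid (what `k.GridSize = [10, 10]` would request from MATLAB): the block index is flattened into one number and the stride counts every thread in the grid. `add2d` and `run_add2d` are hypothetical names.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Grid-stride loop for 1-D blocks laid out in a 2-D grid (assumed n parameter).
__global__ void add2d(const double *a, const double *b, double *c, int n)
{
    int block = blockIdx.x + blockIdx.y * gridDim.x;  // linear block number
    int tid = threadIdx.x + block * blockDim.x;       // first element for this thread
    int stride = blockDim.x * gridDim.x * gridDim.y;  // all threads in the grid
    for (; tid < n; tid += stride)
        c[tid] = a[tid] + b[tid];
}

// Hypothetical test helper: launches a gx-by-gy grid of 1-D blocks with
// `threads` threads each and returns the number of wrong results (-1 on error).
int run_add2d(int n, int gx, int gy, int threads)
{
    assert(n > 0 && gx > 0 && gy > 0 && threads > 0);
    double *a = nullptr, *b = nullptr, *c = nullptr;
    size_t bytes = (size_t)n * sizeof(double);
    if (cudaMallocManaged(&a, bytes) != cudaSuccess ||
        cudaMallocManaged(&b, bytes) != cudaSuccess ||
        cudaMallocManaged(&c, bytes) != cudaSuccess) {
        cudaFree(a); cudaFree(b); cudaFree(c);
        return -1;
    }
    for (int i = 0; i < n; ++i) {
        a[i] = -(double)(i + 1);
        b[i] = 2.0 * (i + 1);
        c[i] = 0.0;
    }
    dim3 grid(gx, gy);                     // 2-D grid of blocks
    add2d<<<grid, threads>>>(a, b, c, n);
    int bad = -1;
    if (cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess) {
        bad = 0;
        for (int i = 0; i < n; ++i)
            if (c[i] != (double)(i + 1)) ++bad;  // expected a + b = i + 1
    }
    cudaFree(a); cudaFree(b); cudaFree(c);
    return bad;
}
```

The `int` indices in this sketch limit it to fewer than 2^31 elements; for longer vectors, 64-bit indices (e.g. `size_t`) would be needed.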
Thanks a lot
Robert