I am trying to accelerate a specific function by assigning each row of a matrix to one GPU core, so that the core processes that row and returns a new matrix. Say my input matrix is n by m: I want the computation to be distributed over n cores, with each of the n cores returning a matrix of size k by m. The computation applied to each row is quite complicated, but it only requires functions that are supported on the GPU.
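To make the setup concrete, here is a minimal CPU sketch of the computation I want to move to the GPU. The sizes and `rowKernel` are hypothetical placeholders, not my actual code; the real per-row function is more involved, but like this stand-in it needs the whole row at once.

```matlab
% CPU reference for the computation I want to run on the GPU.
% rowKernel is a hypothetical stand-in for the real per-row function.
n = 4; m = 5; k = 3;          % example sizes (assumed for illustration)
A = rand(n, m);               % n-by-m input matrix

% Placeholder per-row function: each output element depends on earlier
% elements of the same row (via cumsum), so the elements of a row
% cannot be processed independently of each other.
rowKernel = @(row, k) (1:k)' * cumsum(row);   % 1-by-m row -> k-by-m matrix

out = zeros(k, m, n);         % one k-by-m page per input row
for i = 1:n
    out(:, :, i) = rowKernel(A(i, :), k);   % rows are independent of each other
end
```

The loop iterations are independent of each other, which is the part I would like to spread across GPU cores.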
As I understand it, arrayfun can only be used for single-element operations, not for arrays. The individual elements in one row of the input matrix, however, cannot be computed independently of each other. I think pagefun and bsxfun also won't work, because they do not support user-written functions. Is there any way to do this in MATLAB without having to implement the entire code in CUDA?