Does copying from GPU to host generally take longer than copying from host to GPU?

I am using the Parallel Computing Toolbox, and it seems like copying data back to the host using gather() generally takes much longer than copying data to the GPU using gpuArray(). For example, if I try:
A = rand(500,500, 50);
Then the gather() takes about 0.055 seconds while the gpuArray() takes only 0.018 seconds. Is this behavior expected? Am I using the wrong method to time this?

Accepted Answer

Jill Reese on 16 May 2013
I think you have arrived at the same conclusion as this blog post on GPU performance. It provides a lot of detail on how to properly benchmark GPU operations in MATLAB.
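For anyone who wants to check their timing method against the numbers in the question, here is a minimal sketch of one way to time both directions. It assumes a supported GPU and the Parallel Computing Toolbox; the variable names (dev, G, B, tUpload, tDownload) are only illustrative and are not taken from the blog post. The idea is to call wait() on the device before stopping the timer, so that tic/toc does not return while GPU work is still pending.

```matlab
% Sketch only: time host-to-GPU and GPU-to-host copies separately.
A = rand(500, 500, 50);      % host array from the question (100 MB of doubles)
dev = gpuDevice;             % handle to the currently selected GPU

% Host -> GPU
wait(dev);                   % make sure no earlier GPU work is still pending
tic;
G = gpuArray(A);
wait(dev);                   % block until the device has finished
tUpload = toc;

% GPU -> host
wait(dev);
tic;
B = gather(G);               % gather returns once the data is on the host
tDownload = toc;

fprintf('gpuArray: %.4f s, gather: %.4f s\n', tUpload, tDownload);

% In releases that provide gputimeit, it handles the synchronization
% and repeats the measurement for you:
% tUpload   = gputimeit(@() gpuArray(A));
% tDownload = gputimeit(@() gather(G));
```

A single tic/toc measurement can be noisy, and the first GPU call in a session often includes one-time startup cost, so it may be worth running the block a few times and ignoring the first run.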
Adam on 16 May 2013
Thanks for the tip! That blog is very helpful. Does anyone have any idea why the GPU-to-host copy should be so much slower?

