Correctly timing kernel functions created with GPU Coder on Jetson

Question

Aaron Meldrum on 12 Feb 2020


Answered: Aaron Meldrum on 14 Feb 2020

Hi,

I'm getting started testing a Jetson Nano. I have been able to deploy code, run it, save variables to a file, and gather those back on the host computer (a Windows PC), but I'm fairly certain I'm not correctly timing the execution. The basic structure of the main function is below; I've omitted the code of the kernels as it didn't seem necessary. I suspect the timings of the individual kernels are wrong, possibly because the C code does not wait for the kernel call to finish before executing the 'toc' line.

Three questions:

1) If I were writing directly in CUDA C, I could put cudaDeviceSynchronize(); in. Would this solve the issue? If so, is there a MATLAB command I can use to get GPU Coder to place that line of code where I tell it?

2) Is there a way to have GPU Coder generate the code first, so that I can edit it and then have it compile my edited code? I've been following the example here: https://www.mathworks.com/help/supportpkg/nvidia/examples/getting-started-with-the-gpu-coder-support-package-for-nvidia-gpus.html , and I don't see a way to edit the code that GPU Coder creates before it gets compiled on the Jetson. I'm sure the option is there, but I don't know how to use it.

3) Is there a better method for timing kernels that the community recommends? Although I've done a bit of CUDA coding, I'm very far from an expert, and am aware that I might be going about this totally wrong.

times = zeros(5,4);            % one row per run, one column per kernel function
outputArray1 = zeros(200,200);
outputArray2 = zeros(200,200);
outputArray3 = zeros(200,200);
outputArray4 = zeros(200,200);
for i = 1:5
    % Time each kernel function separately with its own tic/toc pair
    tic
    outputArray1 = SimpleFunction1;
    times(i,1) = toc;
    tic
    outputArray2 = SimpleFunction2;
    times(i,2) = toc;
    tic
    outputArray3 = SimpleFunction3;
    times(i,3) = toc;
    tic
    outputArray4 = SimpleFunction4;
    times(i,4) = toc;
end
% Write the timings to a binary file to be read back on the host
fId = fopen('times.bin','w');
fwrite(fId,times,'single');
% ...more file output for the other arrays
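
To make question 1 concrete, here is a minimal sketch of what a synchronized version of the timing loop might look like for one of the functions. It rests on an assumption that has not been verified on the Jetson: that coder.ceval can be used to emit a call to the CUDA runtime function cudaDeviceSynchronize in the generated code. The names timeKernelSynced and syncDevice are hypothetical helpers introduced only for illustration; SimpleFunction1 is the function from the loop above.

```matlab
function [times, out] = timeKernelSynced() %#codegen
% Hypothetical variant of the timing loop for a single kernel function.
% Assumption: coder.ceval can place a cudaDeviceSynchronize call in the
% generated CUDA code at the point where syncDevice is called.
numRuns = 5;
times = zeros(numRuns, 1);
out = zeros(200, 200);
for i = 1:numRuns
    syncDevice();            % let any earlier GPU work finish before starting the timer
    tic
    out = SimpleFunction1();
    syncDevice();            % wait for the kernel to finish before reading the timer
    times(i) = toc;
end
end

function syncDevice()
% Emit the CUDA runtime call only in generated code; do nothing in MATLAB.
if ~coder.target('MATLAB')
    coder.ceval('cudaDeviceSynchronize');
end
end
```

If this works as assumed, the interval between tic and toc would cover the whole kernel execution, though it may also include any host/device memory transfers that the generated code performs between those two points.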