mex/mexcuda for a target GPU

Matt J on 25 Apr 2018
I have obtained third-party code consisting of a MEX gateway file, main.cpp, and several .cu source files. The build command provided with the code has the form:
mex -largeArrayDims main.cpp cuda_routine1.cu cuda_routine2.cu
The problem is that the resulting MEX file must run on a different target machine, with a different graphics card from the machine I will compile on. When compiling directly with nvcc, it appears from this blog that you can specify a target architecture using the -arch flag.
Is it possible to do something similar when working through mex() or mexcuda()? Would a solution be to pass nvcc flags through the call to mex/mexcuda, and if so, how should the mex call above be modified to do so?
Matt J on 26 Apr 2018
Similarly, will it matter that my compile computer is running Windows Server 2012 whereas my target computer is running Windows 7? I've never had trouble porting mex files between different versions of Windows.
Matt J on 27 Apr 2018
I just came across this Stack Overflow post, but I am still wondering whether there is a better alternative.


Accepted Answer

Joss Knight on 29 Apr 2018
The supported (if not exactly documented) way of doing this is to define the variable NVCCFLAGS in your call to mexcuda.
mexcuda('-v', 'mexGPUExample.cu', 'NVCCFLAGS=-gencode=arch=compute_30,code=sm_30')
However, there is no need for this. The default is to compile your mex function for all architectures. It will run on any GPU.
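As a minimal sketch of how the NVCCFLAGS approach might be applied to the files named in the question: this assumes mexcuda accepts the .cpp gateway alongside the .cu sources, and that the target card has compute capability 3.0. The value '3.0' is a placeholder; on the target machine it can be read from the ComputeCapability property of gpuDevice, which requires Parallel Computing Toolbox.

```matlab
% Compute capability of the TARGET card (placeholder value, an assumption).
% On the target machine it can be read with:
%   d = gpuDevice;  d.ComputeCapability
targetCC = '3.0';

% Turn '3.0' into '30' and build the NVCCFLAGS override for that architecture.
sm = strrep(targetCC, '.', '');
nvccFlags = sprintf('NVCCFLAGS=-gencode=arch=compute_%s,code=sm_%s', sm, sm);

% Source file names and -largeArrayDims are carried over from the question.
mexcuda('-v', '-largeArrayDims', ...
        'main.cpp', 'cuda_routine1.cu', 'cuda_routine2.cu', nvccFlags);
```

With '-v', the build log shows the nvcc command line, which is a simple way to check that the override reached the compiler.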
Joss Knight on 29 Apr 2018
Edited: Joss Knight on 29 Apr 2018
All that it affects is the size of the resulting binary. Every kernel will have a version for each architecture.
The arch flags give you essentially three options:
  1. Compile for every architecture (which, at the moment, means Kepler, Maxwell, Pascal and Volta), and make your executable bigger
  2. Compile for only your specific architecture, which means your executable isn't portable
  3. Compile for the lowest common denominator only, which means all later architectures will not be able to benefit from optimisations, possibly making them run slower than they might otherwise.
When you choose option 3, your kernels have to be JIT-compiled from PTX byte-code for any later architecture they run on. This step is a one-time cost, but it takes time. For MATLAB and for MEX files compiled with the default options, this affects all architectures released after that version of MATLAB.
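A middle ground between these options could look like the sketch below: native code for a chosen set of architectures, plus PTX for the newest of them so that later GPUs can still JIT-compile the kernels. The architecture list (compute capabilities 3.0 and 6.0) is an assumption for illustration, and the -gencode options follow nvcc's own syntax rather than anything specific to mexcuda.

```matlab
% Assumed target set (illustrative only):
%   sm_30      native code for Kepler
%   sm_60      native code for Pascal
%   compute_60 PTX, kept so that newer architectures can JIT-compile
flags = ['NVCCFLAGS=' ...
         '-gencode=arch=compute_30,code=sm_30 ' ...
         '-gencode=arch=compute_60,code=sm_60 ' ...
         '-gencode=arch=compute_60,code=compute_60'];

% File name taken from the example in the answer above.
mexcuda('-v', 'mexGPUExample.cu', flags);
```

Each extra -gencode entry adds another copy of every kernel to the binary, which is the size trade-off described in option 1.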
Matt J on 30 Apr 2018
Okay, thanks again.
