
coder.gpu.constantMemory

Pragma that maps a variable to the constant memory on the GPU

Description


coder.gpu.constantMemory(v) maps the variable v to the constant memory space on the GPU device. Place this pragma within a parallelizable loop. If GPU Coder™ generates a kernel for the loop, it loads v into a device constant memory variable and replaces every access to v within the kernel with an access to that constant memory variable. Within the kernel, v must be read-only; otherwise, GPU Coder ignores this pragma. Use this pragma when every thread accesses every element of the parameter array or matrix.

This function is a code generation function. It has no effect in MATLAB®.

Examples


This example shows how to map an input to the constant memory space on the GPU by using the coder.gpu.constantMemory pragma.

Write an entry-point function myFun that accepts two inputs: a, of size 256-by-256, and a constant k, of size 1-by-3. The function has nested for-loops that add the constants to each element of a. To create a kernel, place the coder.gpu.kernel() pragma outside the nested for-loops. The coder.gpu.constantMemory(k) pragma places the read-only input k into the constant memory of the GPU.

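function b = myFun(a,k)
  % Preallocate the output without initializing its values
  b = coder.nullcopy(zeros(size(a)));
  % Map the following loop nest to a GPU kernel
  coder.gpu.kernel();
  for j = 1:256
    for i = 1:256
      % k is only read inside the kernel, so it can be placed in constant memory
      coder.gpu.constantMemory(k);
      b(i,j) = a(i,j) + k(1) + k(2) + k(3);
    end
  end
end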

Create a configuration object for MEX code generation.

cfg = coder.gpuConfig('mex');

Define a cell array input that declares the size and data type of the inputs a and k to the function myFun.

input = {ones(256),ones(1,3)}

Generate a MEX function myFun_mex by using the -config, -args, and -report options to specify the configuration, provide the input arguments, and generate a code generation report.

codegen -config cfg -args input -report myFun

In the report, on the C code tab, click myFun.cu.

In the generated code, the read-only variable k is declared as const_k with the __constant__ qualifier:

/* Variable Definitions */
__constant__ real_T const_k[3];

The cudaMemcpyToSymbol call copies the value of k from the host to the device constant memory const_k.

  cudaMemcpyToSymbol(const_k, k, 24U, 0U, cudaMemcpyHostToDevice);
  cudaMemcpy(gpu_a, a, 524288U, cudaMemcpyHostToDevice);
  myFun_kernel1<<<dim3(128U, 1U, 1U), dim3(512U, 1U, 1U)>>>(gpu_a, gpu_b);
  cudaMemcpy(b, gpu_b, 524288U, cudaMemcpyDeviceToHost);  
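The byte counts and launch dimensions in this snippet follow directly from the input sizes. The C sketch below checks that arithmetic under the assumption that real_T is an 8-byte double; the helper names bytes_for and total_threads are hypothetical and are not part of the generated code. The three elements of k take 3 × 8 = 24 bytes, the 256-by-256 arrays a and b each take 65,536 × 8 = 524,288 bytes, and 128 blocks of 512 threads give 65,536 threads, one per element of b.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: real_T is a 64-bit double, consistent with the 24-byte copy
   of the three-element k above. */
typedef double real_T;
_Static_assert(sizeof(real_T) == 8, "real_T is assumed to be 8 bytes");

/* Hypothetical helper: bytes needed to hold `count` real_T elements. */
static size_t bytes_for(size_t count)
{
    return count * sizeof(real_T);
}

/* Hypothetical helper: total threads in a 1-D launch of `blocks` blocks. */
static size_t total_threads(size_t blocks, size_t threads_per_block)
{
    return blocks * threads_per_block;
}
```

For example, bytes_for(3) returns 24, bytes_for(256 * 256) returns 524288, and total_threads(128, 512) returns 65536.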

The kernel body reads the elements of const_k and adds them to each element of a.

static __global__ __launch_bounds__(512, 1) void myFun_kernel1(const real_T *a,
 real_T *b)
{
  int32_T i;
  int32_T j;
  int32_T threadIdX;
  threadIdX = (int32_T)(blockDim.x * blockIdx.x + threadIdx.x);
  i = threadIdX / 256;
  j = threadIdX - i * 256;
  if ((!(j >= 256)) && (!(i >= 256))) {
    b[i + (j << 8)] = ((a[i + (j << 8)] + const_k[0]) + const_k[1]) + const_k[2];
  }
}
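To see how this kernel covers the whole 256-by-256 output, its body can be emulated on the host. The following C sketch is an illustration only, not GPU Coder output: it assumes real_T is double, replaces the __constant__ array with an ordinary static array named const_k, and runs all 128 × 512 thread indices sequentially through the same index arithmetic as myFun_kernel1. The helper names kernel_thread and run_kernel are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef double real_T;  /* assumption: real_T is double */

enum { N = 256, BLOCKS = 128, THREADS_PER_BLOCK = 512 };

/* Host-side stand-in for the __constant__ array const_k. */
static real_T const_k[3];

/* Hypothetical helper: emulates one thread of myFun_kernel1. */
static void kernel_thread(int threadIdX, const real_T *a, real_T *b)
{
    /* Same index arithmetic as the generated kernel */
    int i = threadIdX / N;
    int j = threadIdX - i * N;
    if ((!(j >= N)) && (!(i >= N))) {
        /* i + (j << 8) is the zero-based column-major index of (i, j) for N == 256 */
        assert(i + (j << 8) < N * N);
        b[i + (j << 8)] = ((a[i + (j << 8)] + const_k[0]) + const_k[1]) + const_k[2];
    }
}

/* Hypothetical helper: runs every thread of the launch sequentially. */
static void run_kernel(const real_T *a, real_T *b, const real_T *k)
{
    /* Stand-in for the cudaMemcpyToSymbol copy of k into const_k */
    for (int n = 0; n < 3; ++n) {
        const_k[n] = k[n];
    }
    for (int block = 0; block < BLOCKS; ++block) {
        for (int t = 0; t < THREADS_PER_BLOCK; ++t) {
            kernel_thread(block * THREADS_PER_BLOCK + t, a, b);
        }
    }
}
```

If a holds known values and k is {1, 2, 3}, then after run_kernel every element of b should equal the matching element of a plus 6. Because the 65,536 threads each write at most one element, this also shows that each thread writes a distinct element of b.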

Input Arguments


v: The variable to map to the constant memory space on the GPU device.

Version History

Introduced in R2017b