gpucoder.atomicAdd

Atomically add a specified value to a variable in global or shared memory

Since R2021b

Syntax

[A,oldA] = gpucoder.atomicAdd(A,B)

Description

[A,oldA] = gpucoder.atomicAdd(A,B) adds B to the value of A in global or shared memory and writes the result back into A. The operation is atomic in a sense that the entire read-modify-write operation is guaranteed to be performed without interference from other threads. The order of the input and output arguments must match the syntax provided.

example

Examples

collapse all

Simple CUDA Atomic Addition

Perform a simple atomic addition operation by using the gpucoder.atomicAdd function and generate CUDA^® code that calls corresponding CUDA atomicAdd() APIs.

In one file, write an entry-point function myAtomicAdd that accepts matrix inputs a and b.

function [a,oldA] = myAtomicAdd(a,b)

oldA = coder.nullcopy(a);

for i = 1:size(a,1)
    for j = 1:size(a,2)
        for k = 1:size(a,3)
            [a(i,j,k),oldA(i,j,k)] = gpucoder.atomicAdd(a(i,j,k),b(i,j,k));
        end
    end
end

To create a type for a matrix of doubles for use in code generation, use the coder.newtype function.

A = coder.newtype('single', [30 30 20], [1 0 1]);
B = coder.newtype('single', [30 30 20], [1 0 1]);
inputArgs = {A,B};

To generate a CUDA library, use the codegen function.

cfg = coder.gpuConfig('lib');
cfg.GenerateReport = true;

codegen -config cfg -args inputArgs myAtomicAdd -d myAtomicAdd

The generated CUDA code contains the myAtomicAdd_kernel1 kernel with calls to the atomicAdd() CUDA APIs.

//
// File: myAtomicAdd.cu
//
...
static __global__ __launch_bounds__(1024, 1) void myAtomicAdd_kernel1(
    const float b_data[], const int b_size[3], int a_size[3],
    const int oldA_size[3], const int b_a_size, const int i, float oldA_data[],
    float a_data[])
{
  unsigned long loopEnd;
  unsigned long threadId;
...

  for (unsigned long idx{threadId}; idx <= loopEnd; idx += threadStride) {
    unsigned long tmpIndex;
    int b_i;
    int j;
    int k;
    k = static_cast<int>(idx % (static_cast<unsigned long>(b_a_size) + 1UL));
    tmpIndex = (idx - static_cast<unsigned long>(k)) /
               (static_cast<unsigned long>(b_a_size) + 1UL);
    j = static_cast<int>(tmpIndex % 30UL);
    tmpIndex = (tmpIndex - static_cast<unsigned long>(j)) / 30UL;
    b_i = static_cast<int>(tmpIndex);
    oldA_data[(b_i + oldA_size[0] * j) + oldA_size[0] * 30 * k] =
        atomicAdd(&a_data[(b_i + a_size[0] * j) + a_size[0] * 30 * k],
                  b_data[(b_i + b_size[0] * j) + b_size[0] * 30 * k]);
  }
}
...

void myAtomicAdd(float a_data[], int a_size[3], const float b_data[],
                 const int b_size[3], float oldA_data[], int oldA_size[3])
{
  dim3 block;
  dim3 grid;
...

    cudaMemcpy(gpu_a_data, a_data, a_size[0] * (30 * a_size[2]) * sizeof(float),
               cudaMemcpyHostToDevice);
    myAtomicAdd_kernel1<<<grid, block>>>(gpu_b_data, *gpu_b_size, *gpu_a_size,
                                         *gpu_oldA_size, b_a_size, i,
                                         gpu_oldA_data, gpu_a_data);
    oldA_data_dirtyOnGpu = true;
    cudaMemcpy(a_data, gpu_a_data, a_size[0] * (30 * a_size[2]) * sizeof(float),
               cudaMemcpyDeviceToHost);
  }
...

}

Input Arguments

collapse all

`A`, `B` — Operands
scalars | vectors | matrices | multidimensional arrays

Operands, specified as scalars, vectors, matrices, or multidimensional arrays. Inputs A and B must satisfy the following requirements:

Have the same data type.
Have the same size or have sizes that are compatible. For example, A is an M-by-N matrix and B is a scalar or 1-by-N row vector.
Requires CUDA device with a minimum compute capability of 6.0 when the data type is double.

Data Types: double | single | int32 | uint32 | uint64

Limitations

Function handle input to the gpucoder.stencilKernel pragma cannot contain calls to atomic functions. For example,
```
out1 = gpucoder.stencilKernel(@myAtomicAdd,A,[3 3],'same',B);
```

Version History

Introduced in R2021b

gpucoder.atomicAdd

Syntax

Description

Examples

Simple CUDA Atomic Addition

Input Arguments

`A`, `B` — Operands
scalars | vectors | matrices | multidimensional arrays

Limitations

Version History

See Also

Functions

Topics

gpucoder.atomicAdd

Syntax

Description

Examples

Simple CUDA Atomic Addition

Input Arguments

A, B — Operands scalars | vectors | matrices | multidimensional arrays

Limitations

Version History

See Also

Functions

Topics

`A`, `B` — Operands
scalars | vectors | matrices | multidimensional arrays