gpucoder.atomicMax

Atomically find the maximum between a specified value and a variable in global or shared memory

Since R2021b

Syntax

[A,oldA] = gpucoder.atomicMax(A,B)

Description

[A,oldA] = gpucoder.atomicMax(A,B) compares B to the value of A in global or shared memory and writes max(A,B) back into A. The operation is atomic in the sense that the entire read-modify-write sequence is guaranteed to complete without interference from other threads. The order of the input and output arguments must match the syntax provided.
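To illustrate what an atomic read-modify-write maximum means, here is a minimal host-side C++ sketch. It is not how GPU Coder or CUDA implements the operation; the helper name atomic_max_sketch is hypothetical, and the sketch uses a std::atomic compare-and-swap loop in place of the hardware instruction. Like the CUDA atomicMax() API, it returns the value that was stored before the update, which is assumed here to correspond to the oldA output.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper, for illustration only (not a GPU Coder or CUDA API).
// Stores max(target, candidate) into target as one indivisible step and
// returns the value that target held before the update.
std::int32_t atomic_max_sketch(std::atomic<std::int32_t>& target,
                               std::int32_t candidate)
{
    std::int32_t observed = target.load();
    // Retry until either the stored value is already >= candidate, or the
    // compare-and-swap succeeds. On failure, compare_exchange_weak refreshes
    // `observed` with the value another thread wrote in the meantime.
    while (observed < candidate &&
           !target.compare_exchange_weak(observed, candidate)) {
    }
    return observed;
}

// Small single-threaded check of the semantics.
void demo()
{
    std::atomic<std::int32_t> a{3};
    std::int32_t oldA = atomic_max_sketch(a, 7);  // a becomes 7
    assert(oldA == 3);
    assert(a.load() == 7);
}
```

Because the comparison and the store happen inside the compare-and-swap retry loop, no update from another thread can be lost between reading the old value and writing the new one.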

Examples

Find the Maximum Using CUDA atomicMax

Perform a simple atomic maximum operation by using the gpucoder.atomicMax function and generate CUDA® code that calls the corresponding CUDA atomicMax() APIs.

In one file, write an entry-point function myAtomicMax that accepts inputs a and b.

function a = myAtomicMax(a,b)

coder.gpu.kernelfun;
for i =1:numel(a)
    [a(i),~] = gpucoder.atomicMax(a(i), b);
end

end

To create the input types for code generation, use the coder.newtype function. Here, A is a variable-size int32 row vector with up to 30 elements, and B is an int32 scalar.

A = coder.newtype('int32', [1 30], [0 1]);
B = coder.newtype('int32', [1 1], [0 0]);
inputArgs = {A,B};

To generate a CUDA library, use the codegen function.

cfg = coder.gpuConfig('lib');
cfg.GenerateReport = true;

codegen -config cfg -args inputArgs myAtomicMax -d myAtomicMax

The generated CUDA code contains the myAtomicMax_kernel1 kernel with calls to the atomicMax() CUDA APIs.

//
// File: myAtomicMax.cu
//
...

static __global__ __launch_bounds__(1024, 1) void myAtomicMax_kernel1(
    const int32_T b, const int32_T i, int32_T a_data[])
{
  uint64_T loopEnd;
  uint64_T threadId;
...

  for (uint64_T idx{threadId}; idx <= loopEnd; idx += threadStride) {
    int32_T b_i;
    b_i = static_cast<int32_T>(idx);
    atomicMax(&a_data[b_i], b);
  }
}
...

void myAtomicMax(int32_T a_data[], int32_T a_size[2], int32_T b)
{
  dim3 block;
  dim3 grid;
...

    cudaMemcpy(gpu_a_data, a_data, a_size[1] * sizeof(int32_T),
               cudaMemcpyHostToDevice);
    myAtomicMax_kernel1<<<grid, block>>>(b, i, gpu_a_data);
    cudaMemcpy(a_data, gpu_a_data, a_size[1] * sizeof(int32_T),
               cudaMemcpyDeviceToHost);
...

}

Input Arguments

`A`, `B` — Operands
scalars | vectors | matrices | multidimensional arrays

Operands, specified as scalars, vectors, matrices, or multidimensional arrays. Inputs A and B must satisfy the following requirements:

Have the same data type.
Have the same size or have sizes that are compatible. For example, A is an M-by-N matrix and B is a scalar or 1-by-N row vector.

Data Types: int32 | uint32 | uint64

Version History

Introduced in R2021b
