Parallel Processing Unit for Optimized Code Generation

The parallel processing unit (PPU) is a specialized processing unit designed to speed up complex computations in an application model. Use the PPU core to implement models that process large amounts of data or require fast execution times. This core is based on a single instruction, multiple data (SIMD) vector DSP architecture and uses a specialized memory called vector closely coupled memory (VCCM) to speed up computations. Application models that use the PPU core of Infineon® AURIX™ microcontrollers store the participating data in VCCM.

You can use Digital Port Read, Digital Port Write, Encoder, PWM, TMADC, DSADC, FCC, Interprocess Data Read, Interprocess Data Write, and Interprocess Data Channel blocks to design multicore models using the PPU core of AURIX TC4x microcontrollers.

The PPU core speeds up computations by running optimized functions generated using the code replacement library (CRL) technique and reduction operations. For more information on the CRL technique, see What Is Code Replacement? The PPU referenced model uses a function-call subsystem that uses CRL functionality to replace parts of the generated code with hardware-specific code.
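
To make the idea of code replacement concrete, the following C sketch contrasts a generic element-wise loop, of the kind a code generator might emit by default, with a call to a hardware-specific routine that a replacement could substitute. The names add_generic, ppu_vec_add_f32, and step_with_replacement are hypothetical and are not part of any MetaWare or MathWorks API. The stand-in routine is written in portable C so that the sketch runs on any host.

```c
#include <assert.h>
#include <stddef.h>

/* Generic code: a plain element-wise loop with no target-specific calls. */
void add_generic(const float *a, const float *b, float *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

/* HYPOTHETICAL stand-in for a hardware-specific library routine.
 * On a real target this would be a vendor function or intrinsic;
 * here it is portable C so that the sketch stays runnable. */
void ppu_vec_add_f32(const float *a, const float *b, float *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

/* After replacement: the step function calls the library routine
 * instead of containing the loop inline. The result is unchanged. */
void step_with_replacement(const float *a, const float *b, float *out, size_t n)
{
    ppu_vec_add_f32(a, b, out, n);
}
```

The point of the sketch is that a replacement changes how an operation is implemented, not what it computes, so both versions produce the same output for the same inputs.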

Code Replacement Library for PPU

The CRL technique optimizes PPU run time. By default, the generated code does not include code replacements. Explore the available libraries to identify those that best meet the needs of your application.

To view the available CRLs, open the Code Replacement Viewer from the MATLAB® command window using the crviewer command. In the left pane, select the name of a library. The viewer displays information about the library in the right pane.

CRL viewer

In the left pane, expand the library, explore the list of tables in the library, and select a table from the list. The middle pane displays the function and operator entries in the selected table, along with abbreviated information for each entry. In the middle pane, when you select a function or operator, the viewer displays information about that entry in the right pane.

CRL table entries

Check the supported data types of the available CRL entries in the middle and right panes of the Code Replacement Viewer. Before using a CRL, ensure that the data types of the variables or data elements in your application model match the data types that the CRL entry requires.

Select CRL and Generate Optimized Code

The PPU core supports these code replacement libraries for optimized code generation.

  • MetaWare TC4x PPU DSPLib: Generates function-based optimized code to perform DSP operations.

  • MetaWare TC4x PPU SIMD 256-bit: Generates compiler-specific, intrinsics-based optimized code to perform 256-bit SIMD operations on Infineon AURIX TC4Dx microcontrollers.

  • MetaWare TC4x PPU SIMD: Generates compiler-specific, intrinsics-based optimized code to perform 512-bit SIMD operations on Infineon AURIX TC49x microcontrollers.

  • MetaWare TC4x PPU VecLib: Generates function-based optimized code to perform linear algebra operations using basic linear algebra subprograms (BLAS) and linear algebra package (LAPACK). For more information, see BLAS and LAPACK.

  • MetaWare TC4x PPU for Deep Learning: Generates function-based optimized code to perform tensor multiplications in fully connected, long short-term memory (LSTM), bidirectional LSTM (BiLSTM), and gated recurrent unit (GRU) layers of deep neural networks. For more information on these layers, see List of Deep Learning Layers (Deep Learning Toolbox).

Follow these steps to use CRL for any PPU referenced application model:

  1. Open the Simulink® model that uses a PPU core.

  2. Press Ctrl+E or click Modeling > Model Settings to open the Configuration Parameters window.

  3. Select Hardware Implementation and set the Processing Unit parameter to PPU.

  4. Navigate to Code Generation > Interface > Code replacement libraries and click Select to select a CRL from the libraries available for the PPU core of Infineon TC4x microcontrollers. This figure displays the CRLs available for the model in the Code Verification and Validation with PIL Using PPU example.

    CRL for PPU

  5. Use the Add CRL and Remove CRL buttons to add and remove a CRL.

    Available CRLs

  6. Complete the PIL simulation steps from the Code Verification and Validation with PIL Using PPU example. This image shows the resulting code generation report of the model with code replacements.

    CRL

  7. For multicore models based on PPU and TriCore, such as Getting Started with PPU Acceleration for Infineon AURIX TC4x Microcontrollers, the SoC Builder tool guides you through the validate, build, and run procedure. Click Configure, Build, Deploy & Start on the Hardware tab of the top model to launch the SoC Builder tool. After you complete the build procedure, check the code generation report for the hardware-specific code replacements.

For application models running on the PPU core, the CRL assigns VCCM to the participating data. However, this assignment can conflict with a memory section or storage class assigned to the same data elements by using the Embedded Coder Dictionary and Code Mappings Editor – C. In these cases, Simulink resolves the conflict as follows:

  • If you assign a memory section to a data element that the CRL also designates as a candidate for VCCM, the VCCM assignment takes priority. The generated code contains the hardware-specific code replacements for the functions involving that data element.

  • If you assign a storage class to a data element that the CRL also designates as a candidate for VCCM, the storage class takes priority. The generated code does not contain the hardware-specific code replacements for the functions involving that data element.
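
The two rules above can be summarized as a small decision function. The following C sketch is only an illustrative model of the documented behavior, not a Simulink or Embedded Coder API. The type UserAssignment and the function replacement_applies are hypothetical names, and the sketch assumes that a data element carries at most one user assignment and that no replacement applies to data the CRL does not designate for VCCM.

```c
#include <assert.h>
#include <stdbool.h>

/* HYPOTHETICAL model of what the user assigned to a data element. */
typedef enum {
    ASSIGN_NONE,            /* no user assignment */
    ASSIGN_MEMORY_SECTION,  /* user assigned a memory section */
    ASSIGN_STORAGE_CLASS    /* user assigned a storage class */
} UserAssignment;

/* Returns true when the generated code would contain the
 * hardware-specific replacement for functions using this data. */
bool replacement_applies(bool vccm_candidate, UserAssignment user)
{
    assert(user >= ASSIGN_NONE && user <= ASSIGN_STORAGE_CLASS);

    if (!vccm_candidate) {
        return false;  /* assumption: the CRL does not touch this data */
    }
    if (user == ASSIGN_STORAGE_CLASS) {
        return false;  /* storage class takes priority over VCCM */
    }
    return true;       /* VCCM takes priority over a memory section */
}
```

For example, replacement_applies(true, ASSIGN_MEMORY_SECTION) returns true, while replacement_applies(true, ASSIGN_STORAGE_CLASS) returns false.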

Generate Optimized Code Using Reduction Operations

A reduction operation reduces a set of elements, such as an array, to a single value using an associative binary operator. For example, calculating the sum of the elements in an array is a reduction operation that uses the addition operator. For application models that use the PPU core, you can optimize reduction operations by generating parallel code that uses SIMD instructions for the operation.
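
As a rough illustration of why associativity matters, the following C sketch computes the same sum two ways: with a single running accumulator, and with several independent partial sums that are combined at the end. The second form is the shape that maps onto SIMD hardware, because each pass of the inner loop updates all lanes independently. The function names and the lane count of 8 are assumptions chosen for the example. This is portable C, not the code that the code generator emits for the PPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 8  /* illustrative lane count, not the PPU vector width */

/* Sequential reduction: one accumulator, one element per step. */
int32_t sum_sequential(const int32_t *x, size_t n)
{
    assert(x != NULL || n == 0);
    int32_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += x[i];
    }
    return acc;
}

/* Lane-wise reduction: LANES independent partial sums.
 * Each pass of the inner loop stands in for one SIMD add. */
int32_t sum_lanes(const int32_t *x, size_t n)
{
    assert(x != NULL || n == 0);
    int32_t partial[LANES] = {0};
    size_t i = 0;

    /* Process full groups of LANES elements. */
    for (; i + LANES <= n; i += LANES) {
        for (size_t lane = 0; lane < LANES; ++lane) {
            partial[lane] += x[i + lane];
        }
    }

    /* Combine the partial sums, then add any leftover elements. */
    int32_t acc = 0;
    for (size_t lane = 0; lane < LANES; ++lane) {
        acc += partial[lane];
    }
    for (; i < n; ++i) {
        acc += x[i];
    }
    return acc;
}
```

For integer data that does not overflow, both functions return the same value because addition can be regrouped freely. For floating-point data, regrouping can change the rounding, so a parallel reduction may differ from the sequential result in the last few bits.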

Follow these steps to use reduction operations for any PPU referenced application model:

  1. Open the Simulink model that uses a PPU core.

  2. Press Ctrl+E or click Modeling > Model Settings to open the Configuration Parameters window.

  3. Select Hardware Implementation and set the Processing Unit parameter to PPU.

  4. Navigate to Code Generation > Optimization, set the Leverage target hardware instruction set extensions parameter to PPU, and enable the Optimize reductions parameter. Click OK.

    Enable Optimize reductions for Infineon AURIX TC4x MCUs

Known Limitations

  • The PPU core does not support External mode of simulation.

  • The PPU core does not support C++ code generation.

  • You must set the Hardware board parameter to Infineon AURIX TC4x - TriBoards in the Configuration Parameters window of the Simulink model. If you set Device vendor to Infineon, Device type to PPU, and select the CRL manually while keeping the Hardware board parameter set to None, code generation fails.

  • The PPU core does not support model references, and Simulink displays an error message if you set the Processing Unit parameter to PPU in both the top-level model and the referenced model.

    Error Message for PPU model reference
