Improving Simulation Performance in Simulink
By Weiwu Li, Reid Spence, and Guy Rouleau, MathWorks
- First Steps to Improving Simulation Performance
- Selecting the Correct Simulation Mode
- Enabling Fast Restart for Iterative Simulations
- Running Multiple Simulations in Parallel
- Using Model Referencing and Simulink Cache
- Analyzing Your Models for Simulation Bottlenecks
- Modifying and Simplifying Your Model
- Additional Useful Techniques
- Applying Techniques and Measuring Results
Whatever the level of complexity of the model, every Simulink® user wants to improve simulation performance. This article presents many practical tips and techniques to help you get the best performance out of your simulation workflows.
First Steps to Improving Simulation Performance
Start by looking at your simulation workflow to understand how often the model goes through the edit, initialization, and simulation stages. The following four common workflows have very different characteristics, and the best choice of simulation mode and performance options depends on which one you are using.
The first workflow is edit-update-repeat (Figure 1). It is typical when you are making large modifications to a model: you press Ctrl+D to update the diagram and confirm that the model still compiles, but you are not yet ready to validate it by simulation.
The second workflow is edit-sim-repeat, which requires editing the model structure and initializing the model on every iteration (Figure 2).
The third workflow is tune-sim-repeat (Figure 3). It focuses on iterative tuning of some parameters, with no need for editing the model structure.
The fourth workflow is the multisim workflow (Figure 4). It usually involves running many simulations of a verified model, where offloading the execution is helpful.
Another factor that affects the choice of performance solution is which stage of the simulation dominates total computation time. You can use the timing information in the SimulationMetadata object returned in the Simulink.SimulationOutput object to see what proportion of the total time is taken by initialization or execution. In the timing example below, execution accounts for about 87.8 of the 88.9 seconds of total elapsed time, so this model's performance is dominated by execution and we should focus on solutions that speed up model execution rather than initialization.
You can look at the simulation metadata for the model used in your most common workflow to better understand its simulation properties. The SimulationMetadata object contains information about a simulation run, including model information, timing information, and execution information. You can retrieve the timing information by accessing the property out.SimulationMetadata.TimingInfo. An example of the timing information looks like this:
    WallClockTimestampStart: '2021-12-16 10:57:15'
    WallClockTimestampStop: '2021-12-16 10:58:44'
    InitializationElapsedWallTime: 0.6918
    ExecutionElapsedWallTime: 87.8463
    TerminationElapsedWallTime: 0.4087
    TotalElapsedWallTime: 88.9468
    ProfilerData: 'Profiler is not enabled'
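The following is a minimal sketch of how the share of each phase could be computed from the timing information. It assumes the `vdp` example model that ships with Simulink as a stand-in for your own model; the field names are the ones shown in the listing above.

```matlab
% Assumption: 'vdp' is a stand-in; replace it with the name of your own model.
mdl = 'vdp';
load_system(mdl);

% Simulating through a SimulationInput object returns a
% Simulink.SimulationOutput object that carries the metadata.
in  = Simulink.SimulationInput(mdl);
out = sim(in);

% Timing fields as shown in the listing above.
t     = out.SimulationMetadata.TimingInfo;
total = t.TotalElapsedWallTime;

% Percentage of the total wall-clock time spent in each phase.
fprintf('Initialization: %5.1f%%\n', 100 * t.InitializationElapsedWallTime / total);
fprintf('Execution:      %5.1f%%\n', 100 * t.ExecutionElapsedWallTime      / total);
fprintf('Termination:    %5.1f%%\n', 100 * t.TerminationElapsedWallTime    / total);
```

For the values in the listing above, the same arithmetic gives 87.8463 / 88.9468, or roughly 98.8% of the time spent in execution.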
Depending on your simulation workflow and the model characteristics (for example, initialization time versus execution time), you can use Table 1 to select the techniques to try first.
|Model Reference -
|Model Reference -
Incremental Loading and
|Modify Your Models||x||x||x|
Selecting the Correct Simulation Mode
Simulink provides three simulation modes: normal, accelerator, and rapid accelerator (Figure 5). As their names imply, accelerator is generally faster than normal, and rapid accelerator is faster still. Each increase in speed typically means sacrificing flexibility, interactivity, and/or diagnostics. In many cases, if you can work without one of these capabilities, at least temporarily, simulation speed will improve.
In normal mode, Simulink interprets your model during each simulation run. If you change your model frequently, this is generally the preferred mode because, unlike accelerator and rapid accelerator modes, it requires no separate compilation step.
In accelerator mode, Simulink compiles a model into an execution engine in memory, eliminating the block-to-block overhead of an interpreted simulation in normal mode. Accelerator mode supports the debugger and profiler but only a limited set of runtime diagnostics. Accelerator is often used when the simulation time is much longer than the compilation time.
In rapid accelerator mode, Simulink compiles a standalone executable for the model, which can run in a separate process. You can use rapid accelerator mode only when the full model is capable of generating code. This mode restricts interaction with the model during simulations. For example, rapid accelerator mode does not support debugging. As with accelerator mode, it is best to use rapid accelerator mode when your simulations take much longer than the one-time compilation.
You may wonder which mode you should choose for your workflow. Figure 6 shows the performance of a hypothetical model simulation in normal, accelerator, and rapid accelerator modes.
Typically, normal mode is recommended when you are in a “development” workflow where you modify the model often and run Update Diagram or short simulations between modifications. Conversely, use rapid accelerator mode when you want to run multiple simulations without making structural changes to the model, such as adding or removing blocks.
If you know the model will not be changed between runs, you can instruct Simulink to skip the rapid accelerator initialization by setting RapidAcceleratorUpToDateCheck to “off.” Keep in mind that changes to nontunable parameters will be ignored. If your model hits a limitation that prevents the use of rapid accelerator mode, use accelerator mode instead.
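As a hedged illustration, the sketch below builds the rapid accelerator target once and then simulates with the up-to-date check turned off. It assumes the `vdp` example model as a stand-in for your own model and a supported C compiler on the machine; it is one possible setup, not the only one.

```matlab
% Assumption: 'vdp' is a stand-in; replace it with the name of your own model.
mdl = 'vdp';
load_system(mdl);

% Build the rapid accelerator executable once, up front.
Simulink.BlockDiagram.buildRapidAcceleratorTarget(mdl);

% Run in rapid accelerator mode and skip the up-to-date check.
% Changes to nontunable parameters are ignored while the check is off.
in = Simulink.SimulationInput(mdl);
in = setModelParameter(in, ...
    'SimulationMode', 'rapid-accelerator', ...
    'RapidAcceleratorUpToDateCheck', 'off');
out = sim(in);
```

Because the settings live on the SimulationInput object, the model file itself is left unchanged.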
When Accelerator Modes Benefit Some Models More than Others
Sometimes, you might notice that some models benefit from accelerator or rapid accelerator mode more than others. If your model does not see much performance improvement, one of the following situations may apply:
- Your model’s algorithm is primarily contained in a few complex blocks, such as the Fast Fourier Transform block or lookup tables. A small model may run slower in an accelerator mode because native blocks are already highly optimized. In contrast, a model with many basic blocks is more likely to benefit from acceleration.
- Your model contains mostly compiled code, such as code from S-functions, Stateflow® blocks, and MATLAB® functions. Compiling code that is already compiled does not make it run any faster.
- Your simulation runs have time-consuming initialization or termination phases. Because the accelerator modes speed up only the simulation phase of each run, they offer little improvement when initialization or termination dominates. For details, see the Accelerating the Initialization Phase section of this article.
- Your model contains blocks that cannot be compiled, such as Interpreted MATLAB Function blocks or MATLAB System objects in interpreted mode.
Enabling Fast Restart for Iterative Simulations
In a typical Simulink simulation workflow, when you press the Run button, Simulink first updates and compiles the model, then simulates it. This process repeats for every subsequent run. However, if your iterative simulations only change model inputs or tunable parameters, you do not need to edit the model structure. For these workflows, repeating the initialization phase after the first run is usually unnecessary, and its cost quickly adds up when running hundreds or even thousands of simulations.
Simulink provides a nice feature called Fast Restart. As the name suggests, it lets you run iterative simulations by compiling the model only once (Figure 7).
With Fast Restart on, the model does not terminate automatically after each run. Instead, the model is automatically initialized again by using the saved initialization information for the next set of simulations without recompilation.
While Fast Restart is on, the model is locked, which prevents any structural changes until the simulations have finished. However, you can still change tunable parameters or input signals and see their effect on the simulations.
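A tune-sim-repeat loop with Fast Restart might be sketched as follows. The sketch assumes the `vdp` example model and its Gain block named `Mu` as stand-ins for your own model and tunable parameter, and it uses the `UseFastRestart` option of `sim` on an array of SimulationInput objects.

```matlab
% Assumption: 'vdp' and its 'Mu' Gain block are stand-ins for your own
% model and tunable block parameter.
mdl = 'vdp';
load_system(mdl);

muValues = [0.5 1 2 4];

% One SimulationInput object per run; only a tunable parameter changes.
in(1:numel(muValues)) = Simulink.SimulationInput(mdl);
for k = 1:numel(muValues)
    in(k) = setBlockParameter(in(k), [mdl '/Mu'], 'Gain', num2str(muValues(k)));
end

% The model is compiled once and reused for every run.
out = sim(in, 'UseFastRestart', 'on');
```

Only tunable parameters can vary between runs in this setup, which matches the restriction described above.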
You can turn Fast Restart on from the toolstrip as shown above and in programmatic simulations using the
Using the Simulation Operating Point
Engineers typically simulate a Simulink model iteratively for different inputs, boundary conditions, and operating conditions. In many situations, these simulations share a common startup phase in which the model transitions from its initial state to some other state. An electric motor, for example, may be brought up to speed before various control sequences are tested.
Using simulation operating points, you can save a simulation snapshot at the end of the startup phase and then restore it for use as the initial state for future simulations. This technique does not improve simulation speed per se, but it can reduce total simulation time for consecutive runs because the startup phase needs to be simulated only once (see Figure 8 below).
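The snapshot-and-restore idea can be sketched as follows, assuming the `vdp` example model as a stand-in and a startup phase that ends at t = 10. The variable name `xFinal` is an arbitrary choice for illustration.

```matlab
% Assumption: 'vdp' is a stand-in; the startup phase is taken to end at t = 10.
mdl = 'vdp';
load_system(mdl);

% Simulate the shared startup phase once and save the operating point.
startup = Simulink.SimulationInput(mdl);
startup = setModelParameter(startup, ...
    'StopTime', '10', ...
    'SaveFinalState', 'on', ...
    'SaveOperatingPoint', 'on', ...
    'FinalStateName', 'xFinal');
outStartup = sim(startup);
op = outStartup.xFinal;   % operating point at the end of the startup phase

% Later runs resume from the snapshot instead of repeating the startup.
% The stop time must lie beyond the time at which the snapshot was taken.
resume = Simulink.SimulationInput(mdl);
resume = setInitialState(resume, op);
resume = setModelParameter(resume, 'StopTime', '20');
outResume = sim(resume);
```

Each later run then simulates only the interval from t = 10 to t = 20, so the saving grows with the number of runs that share the startup phase.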
Running Multiple Simulations in Parallel
You can reduce the total amount of time it takes to run multiple independent simulations by distributing simulation tasks among multiple processing cores with Simulink and Parallel Computing Toolbox™. You can further reduce overall simulation time by using MATLAB Parallel Server™, which lets you scale simulations to clusters and clouds.
Common use cases for running simulations in parallel include Monte Carlo analysis, design optimization, and test case sweeps. For example, you might set up a Monte Carlo simulation in which you vary the value of a parameter across a predetermined range. You can then perform simulations for each parameter value independently and in parallel on multiple cores.
You can parallelize many of the tasks involved in design optimization, including estimating model parameters from test data, tuning controller gains to achieve a desired response, optimizing design parameters, performing sensitivity analysis, and performing robustness analysis. The total simulation time generally decreases as the number of processors in use increases.
Typically, you have two choices for setting up parallel multisimulation runs: the Multiple Simulations panel or a script that calls the parsim function. You can set up and run the simulations directly with the Multiple Simulations panel in the Simulink Editor (see Figure 9). In the Multiple Simulations panel, you can specify values for block parameters and workspace variables for the simulations. By selecting the “use parallel” option, you can enable parallel execution of your simulations automatically.
You can also use the parsim function to start your parallel simulations from a MATLAB script. Compared with the Multiple Simulations panel, a parsim script provides more flexibility and advanced options.
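A small parameter sweep with parsim might look like the sketch below. It assumes the `vdp` example model and its `Mu` Gain block as stand-ins; with Parallel Computing Toolbox the runs are distributed to workers, and without it parsim runs them serially.

```matlab
% Assumption: 'vdp' and its 'Mu' Gain block are stand-ins for your own
% model and swept parameter.
mdl = 'vdp';
muValues = 0.5:0.5:4;

% One SimulationInput object per parameter value.
in(1:numel(muValues)) = Simulink.SimulationInput(mdl);
for k = 1:numel(muValues)
    in(k) = setBlockParameter(in(k), [mdl '/Mu'], 'Gain', num2str(muValues(k)));
end

% Run all simulations; parsim handles the distribution to workers.
out = parsim(in, 'ShowProgress', 'on');
```

The result is an array of Simulink.SimulationOutput objects in the same order as the inputs, so `out(k)` corresponds to `muValues(k)`.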
Learn more about configuring and running simulations with Multiple Simulations panel.
Using Model Referencing and Simulink Cache
Model referencing lets you include one model in another using a Model block. Each instance of this is called a model reference. Like subsystems, model referencing allows you to organize a large model hierarchy (Figure 10).
Using model references has many performance benefits:
- Incremental loading. Referenced models are loaded only when needed.
- Accelerated simulation. Model referencing allows you to store parts of a simulation in different model files. If a large part of the simulation does not change, you can place it inside a referenced model in accelerator mode. This part is compiled only once, on the first run, and then initializes and simulates faster on subsequent runs.
- Incremental rebuilding. If the model reference Rebuild option is set to “If any changes in known dependencies detected,” the simulation targets that implement the referenced models are not regenerated every time you run a simulation, but only when the referenced model or any of its dependencies or interfaces change. This option is faster than the “If any changes detected” option because it skips the step of computing the model checksum. If you know for sure that your model structure does not change during the simulations, you can set the Rebuild option to “Never” to accelerate the process even further.
- Parallel building. For models that contain large model reference hierarchies, you can reduce code generation and compilation time by building the referenced models in parallel. With Parallel Computing Toolbox or MATLAB Parallel Server, you can distribute the code generation and compilation across multiple MATLAB workers in your configuration.
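The rebuild and parallel-build options above can also be set from the command line. The sketch below uses the parameter names and values as we understand them (`UpdateModelReferenceTargets` and `EnableParallelModelReferenceBuilds`); treat them as assumptions to check against the documentation for your release. The `vdp` model is only a stand-in, since a real use would target the top model of a model reference hierarchy.

```matlab
% Assumption: 'vdp' is a stand-in for the top model of your hierarchy.
mdl = 'vdp';
load_system(mdl);

% Rebuild option, by dialog box label:
%   'IfOutOfDateOrStructuralChange' -> If any changes detected
%   'IfOutOfDate'                   -> If any changes in known dependencies detected
%   'AssumeUpToDate'                -> Never
set_param(mdl, 'UpdateModelReferenceTargets', 'IfOutOfDate');

% Build referenced models in parallel (takes effect when parallel
% workers are available).
set_param(mdl, 'EnableParallelModelReferenceBuilds', 'on');

% Discard the changes to the stand-in model.
close_system(mdl, 0);
```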
Model references are typically used in a team environment, where you may run simulations based on components someone else built. Simulink can package these built artifacts into a single Simulink cache file with the .SLXC extension for each model in the hierarchy (Figure 11).
With Simulink cache files, Simulink builds only the out-of-date files as long as the Rebuild configuration parameter is set to “If any changes detected (default)” or “If any changes in known dependencies detected.” You and your team members can share these SLXC files and the corresponding Simulink model files with each other.
When you run the simulation on your machine, Simulink extracts the necessary derived files from the SLXC file for each model. As a result, Simulink does not need to perform unnecessary rebuilds and completes the simulation more quickly.
In addition, Simulink cache files can hold several types of build artifacts, not only model reference simulation targets (for example, rapid accelerator targets). Sharing these Simulink cache files significantly reduces the rebuild cost in a team-based workflow.
Learn more about model referencing.
Learn more about Simulink cache files.
Analyzing Your Models for Simulation Bottlenecks
There are built-in capabilities in Simulink that can help you systematically understand your models, identify simulation performance bottlenecks, and improve simulation speed.
Performance Advisor can be started from the Debug tab along with Simulink Profiler and Solver Profiler (Figure 12).
Performance Advisor analyzes the model and runs through different checks for conditions and configuration settings that might cause inefficient simulation performance. Performance Advisor provides recommendations for better model configuration settings as well as mechanisms for fixing issues automatically or manually.
Once the recommended changes are applied to your model, Performance Advisor can verify how much they improved performance (Figure 13).
Learn more about Performance Advisor.
There are two types of profilers you can use in Simulink to investigate slow simulations. If you are not sure which one to use, refer to Figure 14 for a quick start. Simulink Profiler determines which blocks take the most simulation time, and Solver Profiler analyzes why a variable-step solver takes certain steps.
Simulink Profiler lets you quantify how much time each phase of your simulation takes and how much time each block takes to simulate. Simulink Profiler is often used with normal mode during the design phase so that you can examine the time cost of each block. Simulink Profiler can generate a great deal of data. To minimize the amount of data that you need to review, focus on the methods that consume the most time and those that are called most frequently.
Simulink Profiler identifies the source of a simulation slowdown, so that you can assess where the execution time is going. You can then replace the most computationally expensive blocks with lightweight ones to accelerate the simulation (Figure 15).
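Profiling can also be switched on programmatically. The sketch below is an assumption-laden illustration: it uses the `Profile` model parameter and reads the `ProfilerData` field that appears in the timing information shown earlier in this article. The `vdp` model is a stand-in, and the exact form of the returned data may differ between releases.

```matlab
% Assumption: 'vdp' is a stand-in; replace it with the name of your own model.
mdl = 'vdp';
load_system(mdl);

% Turn the profiler on for this run only, without editing the model file.
in = Simulink.SimulationInput(mdl);
in = setModelParameter(in, 'Profile', 'on');
out = sim(in);

% With profiling enabled, this field is expected to hold the profiling
% results instead of the text 'Profiler is not enabled'.
profilerData = out.SimulationMetadata.TimingInfo.ProfilerData;
disp(class(profilerData));
```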
Learn more about Simulink Profiler.
When simulation of a model with a variable-step solver slows down, takes too many small steps, or stops responding, the Solver Profiler can help you understand solver behavior and identify the factors affecting the simulation.
Multiple factors can affect solver behavior and limit the simulation speed. The Solver Profiler logs and reports all the major events that occur when simulating a model:
- Zero-crossing events
- Solver exception events
- Solver reset events
- Jacobian computation events
If your model includes Simscape™ blocks, you can also look into various physical quantities of these blocks using Simscape Results Explorer.
The Solver Profiler presents graphical and statistical information about the simulation, solver settings, events, and errors. You can use this data to identify locations in the model that caused simulation bottlenecks and take action, such as updating the system’s stiffness for better simulation performance.
Solver Profiler is usually used with a variable-step solver to help explain why your simulation is taking too many steps or why a step is so small. On the other hand, if your question is why each simulation step takes too long to finish, or if you are using a fixed-step solver, then Simulink Profiler, discussed in the previous section, is the right tool (Figure 16).
Modifying and Simplifying Your Model
Most of the techniques described so far require few, if any, changes to the model itself. You can achieve additional performance improvements by applying techniques that involve modifications to the model.
Accelerating the Initialization Phase
When you update or open a model, Simulink runs the mask initialization code. If you have complicated mask initialization commands that contain many calls to set_param, consider consolidating consecutive calls to set_param into a single call with multiple argument pairs. This can reduce the overhead associated with these calls.
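As a small illustration of consolidating calls, the sketch below sets two parameters of one block first with two calls and then with a single call. It assumes the `Mu` Gain block of the `vdp` example model as a stand-in for a block inside your mask.

```matlab
% Assumption: the 'Mu' Gain block of 'vdp' stands in for a block that
% mask initialization code would configure.
mdl = 'vdp';
load_system(mdl);
blk = [mdl '/Mu'];

% Before: one set_param call per parameter.
set_param(blk, 'Gain', '2');
set_param(blk, 'SampleTime', '-1');

% After: a single call with multiple name-value pairs.
set_param(blk, 'Gain', '2', 'SampleTime', '-1');

% Both forms leave the block in the same state.
gainValue = get_param(blk, 'Gain');

% Discard the changes to the stand-in model.
close_system(mdl, 0);
```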
Learn more about mask code execution.
If you use MATLAB scripts to load and initialize data, you can often improve performance by loading MAT-files instead. The drawback is that the data in a MAT-file is not in a human-readable form and can therefore be more difficult to work with than a script. However, loading typically initializes data much more quickly than the equivalent script.
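The sketch below shows one way to move from a setup script to a MAT-file. The variable names and values are hypothetical placeholders for whatever your own initialization script computes.

```matlab
% One-time step: collect the variables that a setup script would create.
% All names and values here are hypothetical examples.
params.Kp = 2.5;
params.Ki = 0.8;
params.lookupBreakpoints = linspace(0, 100, 1001);
params.lookupValues = sqrt(params.lookupBreakpoints);

% Save each field of the struct as a separate variable in the MAT-file.
matFile = fullfile(tempdir, 'model_params.mat');
save(matFile, '-struct', 'params');

% In later sessions: load the stored values instead of recomputing them.
loaded = load(matFile);
fprintf('Loaded %d variables from %s\n', numel(fieldnames(loaded)), matFile);
```

Keeping the original script under version control alongside the MAT-file is one way to retain a human-readable record of how the data was produced.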
Learn more about loading data from a MAT-file into your workspace.
Reducing Model Complexity
Simplifying your model without sacrificing fidelity is an effective way to improve simulation performance. Here are three ways to reduce model complexity.
Replace a subsystem with a lower-fidelity alternative. In many cases, you can simplify your model by replacing a complex subsystem model with one of the following:
- A linear or nonlinear dynamic model created from measured input-output data using System Identification Toolbox™
- A high-fidelity, nonlinear statistical model created using Model-Based Calibration Toolbox™
- A linear model created using Simulink Control Design™
- A lookup table
You can maintain both representations of the subsystem in a library and use variant subsystems to manage them.
Learn more about using configurable subsystem blocks.
Learn more about reduced order modeling.
Reduce the number of blocks. When you reduce the number of blocks in your model, fewer blocks will need to be updated during simulations, leading to faster simulation runs. Vectorization is one way to reduce your block count. For example, if you have several parallel signals that undergo a similar set of computations, try combining them into a vector and performing a single computation. Another way is to enable the Block Reduction optimization in the Optimization > General section of the configuration parameters.
Use frame-based processing. In frame-based processing, samples are processed in batches instead of one at a time. If your model includes an analog-to-digital converter, for example, you can collect the output samples in a buffer and process the buffer with a single operation, such as a fast Fourier transform. Processing data in chunks in this way reduces the number of times that blocks in your model must be invoked. In general, scheduling overhead decreases as frame size increases. However, larger frames consume more memory, and memory limitations can adversely affect the performance of complex models. Experiment with different frame sizes to find one that maximizes the performance benefit of frame-based processing without causing memory issues.
In general, the more interactive the model, the longer it will take to simulate. The tips in this section illustrate ways to improve performance by giving up some interactivity.
Disable debugging diagnostics. Some enabled diagnostic features noticeably slow simulations. You can disable them in the Diagnostics pane of the Configuration Parameters dialog box.
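The diagnostics named later in this article (solver data inconsistency, division by singular matrix, Inf or NaN block output, simulation range checking, and array bounds exceeded) can also be disabled programmatically. The sketch below uses the corresponding model parameter names as we understand them and the `vdp` example model as a stand-in; verify the names against the documentation for your release.

```matlab
% Assumption: 'vdp' is a stand-in; replace it with the name of your own model.
mdl = 'vdp';
load_system(mdl);

% Apply the settings through a SimulationInput object so that the
% model file itself is not modified.
in = Simulink.SimulationInput(mdl);
in = setModelParameter(in, ...
    'ConsistencyChecking',       'none', ...  % solver data inconsistency
    'CheckMatrixSingularityMsg', 'none', ...  % division by singular matrix
    'SignalInfNanChecking',      'none', ...  % Inf or NaN block output
    'SignalRangeChecking',       'none', ...  % simulation range checking
    'ArrayBoundsChecking',       'none');     % array bounds exceeded
out = sim(in);
```

Because these diagnostics exist to catch modeling errors, it is safer to turn them off only for models that have already been verified.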
Disable simulation animations. By default, Stateflow charts highlight the current active states and animate the state transitions that take place as the model runs. This feature is useful for debugging, but it slows the simulation. To accelerate simulations, either close all Stateflow charts or disable the animation. Similarly, if you’re using Simulink 3D Animation™, SimMechanics™ visualization, FlightGear, or another 3D animation package, consider disabling the animation or reducing scene fidelity to improve performance.
Learn more about speeding up simulation in Stateflow.
Adjust viewer-specific parameters and manage viewers through enabled subsystems.
If your model contains a scope viewer that displays a large number of data points and you can’t eliminate the scope, try adjusting the viewer parameters to trade off fidelity for rendering speed. Be aware, however, that by using decimation to reduce the number of plotted data points, you risk missing short transients and other phenomena that would be obvious with more data points. You can place viewers in enabled subsystems to control more precisely which visualizations are enabled and when.
Learn more about how scope signal viewer parameter settings can affect performance.
Using MATLAB Functions Instead of Interpreted MATLAB Function Blocks
To call a MATLAB function within your Simulink model, use a MATLAB Function block instead of an Interpreted MATLAB Function block or a MATLAB S-function. The MATLAB Function block is the faster alternative, and it supports the generation of embeddable C code. While the MATLAB Function block does not support all MATLAB functions, the subset of the MATLAB language that it does support is extensive.
To quickly find all the Interpreted MATLAB Function blocks in your model, use the Performance Advisor.
Learn more about working with the MATLAB Function block.
Reduce the amount of logged data. When logging large amounts of data (e.g., in models that include To Workspace, To File, or Scope blocks), use decimation or limit the logged output to the last part of the simulation. Avoid logging redundant data (e.g., log the time only once) and use compact data types (e.g., log integer values instead of doubles when feasible). Logging override can also be used to control which signals are logged without rebuilding the simulation target for accelerator mode.
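Decimation and a limit on the number of logged points can be sketched with model parameters as follows. The values 10 and 1000 are arbitrary, and the `vdp` example model is a stand-in for your own model.

```matlab
% Assumption: 'vdp' is a stand-in; the numeric values are arbitrary.
mdl = 'vdp';
load_system(mdl);

in = Simulink.SimulationInput(mdl);
in = setModelParameter(in, ...
    'Decimation',      '10', ...   % keep every 10th logged sample
    'LimitDataPoints', 'on', ...   % keep only the last part of the run
    'MaxDataPoints',   '1000');    % at most the last 1000 points
out = sim(in);
```

These settings affect the time, state, and output data logged through the Data Import/Export settings. As noted for scope viewers, decimation can hide short transients.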
Learn more about exporting simulation data.
Additional Useful Techniques
The following techniques apply only to specific kinds of models. If your model matches one of the patterns described below, these techniques may increase its simulation speed.
Optimizing Hardware Acceleration
Single instruction multiple data (SIMD) is a data-level parallel processing technique that performs the same operation on multiple data points simultaneously. Many modern CPUs have SIMD instructions that, for example, perform several additions or multiplications at once. For computationally intensive operations on supported blocks, SIMD intrinsics could improve the simulation performance.
Simulink supports SIMD for hardware acceleration in all simulation modes which use code generation technologies (e.g., accelerator or rapid accelerator mode).
Simulink provides a configuration parameter in the Simulation Target pane to control the type of SIMD used. For hardware acceleration options, the default choice (“leverage generic hardware”) is available on all x86 CPUs and requires no rebuild. The other choice (“leverage native hardware”) is CPU dependent. If your CPU supports newer SIMD instruction sets such as SSE2/AVX2/AVX512, this choice could provide even faster simulation speed (Figure 17).
When you use a simulation target, many factors determine whether SIMD is available and whether it can improve your simulation speed. You can use Performance Advisor, described earlier, to run the “check hardware Acceleration Settings” check under Simulation Target and find out whether SIMD helps with your specific model.
Learn more about hardware acceleration.
Leveraging Multicore Simulation
In Simulink there are several different techniques to accelerate a single simulation by dispatching some parts of the model for multicore execution. These techniques don’t apply to all models but might be useful for your specific use cases.
Multicore simulation of dataflow domains. If you model and simulate a computationally intensive signal processing or multirate signal processing system in Simulink, dataflow domains can improve performance. The dataflow domain is an execution domain that simulates using the synchronous dataflow model of computation. Dataflow execution is data-driven and can simulate using multiple cores.
You specify dataflow as the execution domain for a subsystem by setting the Domain parameter to Dataflow using Property Inspector. Dataflow domains automatically partition your model and simulate the system using multiple threads for better simulation performance (Figure 18).
You can use the Run Analysis button on the Multicore tab to analyze the dataflow domain for simulation performance. The analysis can profile the model, calculate the execution time of each block, find existing parallelism in the model, and partition it into multiple threads. It can also suggest a latency value that further increases simulation throughput by pipelining the execution of the blocks. The analysis in Figure 19 below suggests that the latency should be changed from 0 to 2 for best performance.
Learn more about dataflow domain.
Multicore simulation with cosimulation components. Your simulation model might include cosimulation components. A cosimulation component could be an S-function block, which is implemented as a cosimulation gateway between Simulink and third-party tools or custom code. It can also be an FMU in cosimulation mode imported to Simulink or a Model block in accelerator mode. If these cosimulation components are thread-safe, these components can be eligible to run on multiple threads. Being thread-safe means that the block can work with multiple threads accessing shared data, resources, and objects without any conflicts.
Not all models have cosimulation components that can run on multiple threads. In addition, multithreaded cosimulation is supported only in normal simulation mode. By default, Simulink configures all eligible models and blocks to be ready for multithreaded execution. If performance can be improved, Simulink automatically runs all models on multiple threads.
Learn more about multicore cosimulation.
Using multicore simulation with a For Each subsystem. The For Each subsystem is a subsystem in Simulink that repeats algorithm execution during a simulation step for each element of a subarray in an input signal or mask parameter array. If your model contains For Each subsystems that are computationally intensive, you could potentially speed up the simulation by executing the For Each subsystem iteration on multiple threads.
Multicore simulation of a For Each subsystem is supported only in rapid accelerator mode. By default, multithreaded simulation support for the For Each subsystem is enabled. Simulink automatically profiles the simulation on the fly, and parallel execution is enabled only when it detects performance benefits. You can use the model parameter 'MultithreadedSim' to manually opt in or out of multithreaded simulation.
Learn more about using a For Each subsystem.
Applying Techniques and Measuring Results
To illustrate the relative effectiveness of these techniques on a realistic project, we measured the simulation time performance improvement provided by applying some of the changes suggested above to a model of an automatic transmission system (Figure 20).
To improve performance, we first used Performance Advisor to run the checks and made the following suggested changes:
- Disabled expensive diagnostics that check for solver data inconsistency, division by singular matrix, Inf or NaN block output, simulation range checking, and array bounds exceeded
- Replaced Interpreted MATLAB Function blocks with MATLAB Function blocks (this change had the greatest single effect)
- Enabled the Block Reduction optimization
- Closed and commented out scopes
These changes reduced simulation time from 57 seconds to 3.3 seconds on average. Using the optimized model, we can now apply rapid acceleration and parallel simulation to compare the performance of those techniques (Table 2).
|Original Model, Normal Mode, Serial Execution, Fast Restart off||5731 seconds|
|Improved Model, Normal Mode, Serial Execution, Fast Restart off||328 seconds|
|Improved Model, Normal Mode, Serial Execution, Fast Restart on||307 seconds|
|Improved Model, Rapid Accelerator Mode, Serial Execution, with
|Improved Model, Normal Mode, Parallel Simulation, Fast Restart on (4 local workers)||198 seconds|
|Improved Model, Rapid Accelerator Mode, Parallel Simulation, with
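To put the complete rows of Table 2 on a common scale, the short sketch below computes the speedup of each configuration relative to the original model. It uses only the four rows whose times are given above; the shortened labels are ours.

```matlab
% Total times in seconds, copied from the complete rows of Table 2.
% The labels are abbreviated versions of the table row descriptions.
labels = ["Original, normal, serial, Fast Restart off"
          "Improved, normal, serial, Fast Restart off"
          "Improved, normal, serial, Fast Restart on"
          "Improved, normal, parallel (4 workers), Fast Restart on"];
seconds = [5731; 328; 307; 198];

% Speedup relative to the original model in normal mode.
speedup = seconds(1) ./ seconds;

for k = 1:numel(seconds)
    fprintf('%-58s %6d s  %5.1fx\n', labels(k), seconds(k), speedup(k));
end
```

For these numbers, the improved model is roughly 17.5 times faster than the original, Fast Restart raises that to about 18.7 times, and four local workers raise it to about 28.9 times.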
- Five Practical Tips to Speed Up Your Simulink Simulations (5 videos) - Video Series
- Optimize Performance - Documentation
- Simulink Performance Improvements - Overview