Main Content

Distributed Pipelining

What Is Distributed Pipelining?

Distributed pipelining, or register retiming, is a speed optimization that moves existing delays in a design to reduce the critical path while preserving functional behavior. This optimization moves the delays within a subsystem while preserving the hierarchy.

The HDL Coder™ software uses an adaptation of the Leiserson-Saxe retiming algorithm.

For example, in this model, there is a delay of 2 at the output.

The diagram shows the generated model after distributed pipelining redistributes the delay to reduce the critical path.

Benefits and Costs of Distributed Pipelining

Distributed pipelining can reduce your design’s critical path, enabling you to use a higher clock rate and increase throughput.

However, distributed pipelining requires your design to contain a number of delays. If you need to insert additional delays in your design to enable distributed pipelining, this increases the area and the initial latency of your design.

How Distributed Pipelining Works

HDL Coder applies distributed pipelining with other optimization options to move the existing delays in your design to reduce the critical path. The existing delays in your design can either be design delays or pipeline delays. Design delays are delays that you manually insert in your design by using Delay blocks, or other blocks that have state, including Queue, HDL FIFO, or Buffer blocks. Pipeline delays are delays that are generated by optimization settings, such as input or output pipelining options, or block implementation settings, such as the ShiftAdd implementation for a Divide block and the CORDIC approximation for a Trigonometric Function block. See InputPipeline and OutputPipeline.

When you generate code, before distributed pipelining runs, the input and output pipelines you specify are inserted at the input and outputs ports in your design. HDL Coder uses these pipelines to determine latency requirements and an outline for ideal pipelining for your design.

HDL Coder begins distributed pipelining by calculating the propagation delay for the components in your design by assigning each component an equal weight, except for wire components, such as Selector blocks and Bit Concat blocks, that are assigned zero-propagation delay. Distributed pipelining then redistributes the design delays and pipeline delays based on the weight assigned to each component in your design. Distributed pipelining takes into consideration all of the pipeline stages and distributes them as evenly as possible in order to obtain the shortest critical path. The inserted pipelines might not appear in the place you originally requested them because distributed pipelining can move these to improve the critical path.

If pipelines are required at specific point after a block, you can set the HDL Block Property ConstrainedOutputPipeline on the block at the specified point to the number of pipelines that you want to keep as output pipelines after that block. Constrained output pipelining does not insert extra pipelines, but instead provides information to distributed pipelining to first prioritize pipelines moved to or left at that specific point in your design. This option is used only to prevent distributed pipelining from moving specified pipelines or to require distributed pipelining to move pipelines to that specified point. For more information, see Constrained Output Pipelining.


If you want to disable distributed pipelining from running on your subsystem and you have set constrained output pipelining on blocks in your design, disable distributed pipelining for your subsystem and reset the ConstrainedOutputPipeline property on the blocks to zero.

If you do not want the design delays you have in your design to be moved by distributed pipelining, disable Allow design delay distribution. See Allow design delay distribution.

Requirements for Distributed Pipelining

Distributed pipelining requires your design to contain delays or registers that can be redistributed. You can use input pipelining or output pipelining to insert more registers. You can insert these delays manually or insert them by setting the HDL block properties InputPipeline or OutputPipeline for the subsystem or block that you plan to set distributed pipelining. For more information, see InputPipeline and OutputPipeline.

If your design does not meet your timing requirements at first, try adding more delays or registers to improve your results.

Specify Distributed Pipelining

You can set distributed pipelining on a model or the top-level DUT subsystem. For finer control, you can set distributed pipelining on subsystems, Stateflow® charts, and MATLAB Function blocks in the top-level DUT subsystem. See Use Distributed Pipelining Optimization in Models with MATLAB Function Blocks.

To enable distributed pipelining for the model from the UI:

  1. In the Apps tab, select HDL Coder. The HDL Code tab appears.

  2. Click Settings. In the HDL Code Generation > Optimization pane, in the Pipelining tab, select Distributed pipelining and click OK.

To enable distributed pipelining for a model, on the command line, enter:

hdlset_param('modelname', 'DistributedPipelining', 'on')

If you have a top-level device under test (DUT) that contains a subsystem hierarchy, such as lower-level subsystems, and you want distributed pipelining to run throughout the DUT and the lower-level subsystems, enable the model configuration parameter Distributed pipelining and leave the HDL block property DistributedPipelining as inherit for the DUT and all lower-level subsystems.

To prevent distributed pipelining from running in a specific lower-level subsystem in the DUT, set the HDL block property DistributedPipelining to off for that subsystem.


If you insert pipeline registers, output data can be in an invalid state initially. To avoid test bench errors resulting from initial invalid samples, disable output checking for those samples. For more information, see Ignore output data checking (number of samples).

Distributed Pipelining Report

To see the distributed pipelining information in the report, before you generate code for each subsystem or model reference, enable the optimization report. In the HDL Code tab, select Report Options, and then select Generate optimization report.

When you generate the optimization report, in the Distributed Pipelining section, you see the effect of the distributed pipelining optimization for the subsystems that have distributed pipelining enabled in the generated model and in the Distributed Pipelining Summary. If distributed pipelining is unsuccessful, the report shows diagnostic messages and blocks that caused distributed pipelining to fail.

When distributed pipelining encounters barriers, the HDL Coder generates a highlighting script named highlightDistributedPipeliningBarriers.m. This script highlights all the barriers present in the model.

If you set the HDL block property ConstrainedOutputPipeline for a block in your design, the Constrained Output Pipeline Summary displays the requested constrained output pipelines and if those requests are met during code generation.

If distributed pipelining is successful, the Detailed Report in the Distributed Pipelining section displays comparative listings of registers before and after you apply the distributed pipelining transform.

Limitations of Distributed Pipelining

The distributed pipelining optimization has these limitations:

  • Your pipelining results might not be optimal in hardware because the operator latencies in your target hardware might differ from the estimated operator latencies used by the distributed pipelining algorithm.

  • HDL Coder generates pipeline registers at the outputs instead of distributing the registers to reduce critical path in these situations:

    • Stateflow chart containing a state, local variable, or a matrix with statically unresolvable index.

  • HDL Coder distributes pipeline registers around these blocks instead of within them:

    • Model

    • Sum (Cascade implementation)

    • Divide

    • Reciprocal

    • Sqrt

    • Sine HDL Optimized and Cosine HDL Optimized

    • Product (Cascade implementation)

    • MinMax

    • Upsample

    • Downsample

    • Rate Transition

    • Zero-Order Hold

    • Reciprocal Sqrt (RecipSqrtNewton implementation)

    • Trigonometric Function (CORDIC Approximation)

    • Single Port RAM

    • Dual Port RAM

    • Simple Dual Port RAM

    • Blocks with floating-point IP (i.e. native floating-point)

  • If you enable distributed pipelining for a subsystem that contains these blocks, HDL Coder generates a warning message during code generation in the HDL Code Generation Check Report. This message identifies blocks that act as barriers for distributed pipelining in the subsystem. HDL Coder distributes pipeline registers around nested subsystems.

    • M-PSK Demodulator Baseband

    • M-PSK Modulator Baseband

    • QPSK Demodulator Baseband

    • QPSK Modulator Baseband

    • BPSK Demodulator Baseband

    • BPSK Modulator Baseband

    • PN Sequence Generator

    • Repeat

    • HDL Counter

    • LMS Filter

    • Sine Wave

    • Viterbi Decoder

    • Triggered Subsystem

    • Counter Limited

    • Counter Free-Running

Selected Bibliography

Leiserson, C.E, and James B. Saxe. “Retiming Synchronous Circuitry.” Algorithmica. Vol. 6, Number 1, 1991, pp. 5-35.

See Also


Model Settings

Related Examples

More About