How to optimize multiplications with HDL Coder

Hello community,
I'm using Simulink to generate VHDL code. The system runs at a 2.5 MHz sample rate and is supposed to run on a target architecture clocked at 100 MHz.
The model uses quite a lot of constant multiplications, too many to fit on an FPGA, i.e., the available DSP blocks are not sufficient.
Since the FPGA clock is much faster than the Simulink sample rate, I want to use the 40 clock cycles available per sample (100 MHz / 2.5 MHz) to optimize the system, either by implementing the multiplications with the shift-and-add algorithm or by multiplexing (time-sharing) the hardware multipliers.
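To make the shift-and-add idea concrete, here is a minimal plain-MATLAB sketch (a behavioral model, not code that HDL Coder generates) of multiplying an integer sample by a constant using only shifts and additions: one shifted copy of the input is added for each set bit of the constant. It assumes non-negative integer operands small enough to stay exact in double precision; x and c are hypothetical example values.

```matlab
% Shift-and-add multiplication by a constant (behavioral sketch).
% Assumes x and c are non-negative integers that stay exact in double.
x = 1234;          % input sample (hypothetical value)
c = 45;            % constant coefficient, 45 = 0b101101
acc = 0;           % accumulator for the partial sums
shift = 0;         % bit position of c currently being examined
cc = c;
while cc > 0
    if bitand(cc, 1)                       % bit set: add the shifted input
        acc = acc + bitshift(x, shift);
    end
    cc = bitshift(cc, -1);                 % move on to the next bit of c
    shift = shift + 1;
end
fprintf('%d * %d = %d\n', x, c, acc);      % prints 1234 * 45 = 55530
```

A fully parallel hardware version needs one adder fewer than the number of set bits in c (45 has four set bits, so three adders). A serial version could instead reuse a single adder over several of the 40 fast clock cycles per sample, at the cost of extra control logic.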
1) Is it possible to automatically implement the shift-and-add approach with HDL Coder? If yes, how?
2) Is it possible to automatically implement the multiplexing approach with HDL Coder? If yes, how?
3) Is it possible to automatically approximate a constant with the nearest fixed-point representation and then implement the multiplication as a shift operation only? If yes, how?
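Question 3 amounts to replacing a constant by a power of two, since only powers of two turn a multiplication into a single shift. The plain-MATLAB sketch below (not an HDL Coder feature) rounds the constant's exponent in the log2 domain, i.e., it picks the power of two with the smallest ratio error, and reports the error this introduces. The coefficient and sample values are hypothetical.

```matlab
% Approximate a positive constant by a power of two so that the
% multiplication becomes a single shift (behavioral sketch).
c = 0.3;                          % example coefficient (hypothetical)
k = round(log2(c));               % nearest exponent in the log2 domain
c_approx = 2^k;                   % power-of-two approximation, here 0.25
rel_err = abs(c_approx - c) / c;  % relative error introduced
fprintf('c = %g -> 2^%d = %g (relative error %.1f%%)\n', ...
        c, k, c_approx, 100 * rel_err);
% For a non-negative integer sample the product is now a pure shift
% (a negative k means a right shift, which truncates toward zero).
x = 1000;
y = bitshift(x, k);               % floor(x * 2^k) = 250
```

Whether an error of this size (about 16.7% for 0.3) is acceptable depends on the application; sums of two or three powers of two, as produced by CSD, trade a few adders for much better accuracy.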

1 Comment

Can you share your design? What are the sizes of multipliers you have in mind?
ConstMultiplierOptimization (CSD/FCSD) would be a good choice if you have Gain blocks (multiplication by constants) and you do not want to use hard multipliers on the FPGA.

Sign in to comment.

Answers (1)

ConstMultiplierOptimization
The ConstMultiplierOptimization implementation parameter lets you specify use of canonical signed digit (CSD) or factored CSD optimizations for processing coefficient multiplier operations in the generated code.
The following list shows the ConstMultiplierOptimization parameter values:
  • 'none' (default): By default, HDL Coder does not perform CSD or FCSD optimizations. Code generated for the Gain block retains multiplier operations.
  • 'CSD': When you specify this option, the generated code decreases the area used by the model while maintaining or increasing clock speed, using canonical signed digit (CSD) techniques. CSD replaces multiplier operations with add and subtract operations. CSD minimizes the number of addition operations required for constant multiplication by representing binary numbers with a minimum count of nonzero digits.
  • 'FCSD': This option uses factored CSD (FCSD) techniques, which replace multiplier operations with shift and add/subtract operations on certain factors of the operands. These factors are generally prime but can also be a number close to a power of 2, which favors area reduction. This option lets you achieve a greater area reduction than CSD, at the cost of decreasing clock speed.
  • 'auto': When you specify this option, HDL Coder chooses between the CSD and FCSD optimizations. The coder chooses the optimization that yields the most area-efficient implementation, based on the number of adders required. When you specify 'auto', the coder does not use multipliers, unless conditions are such that CSD or FCSD optimizations are not possible (for example, if the design uses floating-point arithmetic).
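To illustrate what "a minimum count of nonzero digits" means for the CSD option, here is a sketch of the textbook CSD (non-adjacent form) recoding in plain MATLAB. It is the standard algorithm, not HDL Coder's internal implementation, and the constant is a hypothetical example. Each step emits +1 or -1 for an odd remainder (chosen so the next digit is 0) and 0 for an even one.

```matlab
% Canonical signed digit (CSD / non-adjacent form) recoding of a
% non-negative integer constant (textbook algorithm, behavioral sketch).
c = 7;                 % example constant: binary 111 has three nonzero bits
n = c;
d = [];                % CSD digits in {-1, 0, +1}, least significant first
while n ~= 0
    if mod(n, 2) == 1
        di = 2 - mod(n, 4);   % +1 if n mod 4 == 1, -1 if n mod 4 == 3
        n = n - di;
    else
        di = 0;
    end
    d(end+1) = di; %#ok<AGROW>
    n = n / 2;
end
value = sum(d .* 2.^(0:numel(d)-1));   % reconstruct c to check the recoding
fprintf('%d -> CSD digits (MSB first): %s\n', c, mat2str(fliplr(d)));
fprintf('nonzero digits: %d (plain binary: %d)\n', nnz(d), sum(dec2bin(c) == '1'));
```

For c = 7 the digits are [1 0 0 -1] (MSB first), so 7*x becomes (x << 3) - x: one subtractor instead of the two adders that the plain binary form 111 would need.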
The ConstMultiplierOptimization parameter is available for the following blocks:
  • Gain
  • Stateflow® chart
  • Truth Table
  • MATLAB Function
  • MATLAB System
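The parameter can also be set from the MATLAB command line instead of the HDL Block Properties dialog. The sketch below assumes a loaded model; the block path 'mymodel/DUT/Gain1' is hypothetical and must be replaced with a Gain block in your own design, and the value spelling follows the table above.

```matlab
% Select CSD for a single Gain block (hypothetical block path).
blk = 'mymodel/DUT/Gain1';
hdlset_param(blk, 'ConstMultiplierOptimization', 'CSD');
% Read the setting back to confirm it was applied.
hdlget_param(blk, 'ConstMultiplierOptimization')
```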

9 Comments

Hello Kiran Kintali
Thank you for your answer. I had read about ConstMultiplierOptimization before, but I couldn't find this setting in the options. I didn't realize that this option is in the block's HDL properties rather than in the global optimization settings, so thanks for that hint.
My model uses a 64-bit word size, i.e., 64-bit multiplications are required (which produce 128-bit intermediate results).
If you want to split large multipliers into smaller chunks, you can consider using this global optimization option:
hdlset_param('model_name', 'MultiplierPartitioningThreshold', 18);
MultiplierPartitioningThreshold (multiplier partitioning bit width threshold)
The threshold N must be an integer greater than or equal to 2. It is the maximum bit width for a multiplier: if a multiplier has a bit width greater than or equal to MultiplierPartitioningThreshold, HDL Coder™ splits the multiplier into smaller multipliers.
To improve your hardware mapping results, set MultiplierPartitioningThreshold to the bit width of the DSP or multiplier hardware on your target device.
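As a rough illustration of why partitioning works, the plain-MATLAB sketch below splits one 24 x 24 unsigned multiplication into four 12 x 12 partial products and recombines them with shifts and adds (schoolbook splitting). This is an assumption-level model only: HDL Coder's actual partitioning scheme, including how it handles signed operands, may differ. The operand values are hypothetical and chosen so that everything stays exact in double precision.

```matlab
% Split an unsigned multiplication into k-bit partial products
% (schoolbook partitioning; HDL Coder's own scheme may differ).
k = 12;                          % chunk width, e.g. the hardware multiplier width
a = 11535570;                    % 24-bit unsigned operand (example value)
b = 9876543;                     % 24-bit unsigned operand (example value)
base = 2^k;
a_lo = mod(a, base); a_hi = floor(a / base);   % split a into two k-bit chunks
b_lo = mod(b, base); b_hi = floor(b / base);   % split b into two k-bit chunks
% Four k x k multiplications replace one 2k x 2k multiplication:
p = a_hi*b_hi*base^2 + (a_hi*b_lo + a_lo*b_hi)*base + a_lo*b_lo;
fprintf('exact: %d, partitioned: %d\n', a*b, p);
```

In this simple unsigned scheme, a 64-bit operand split into 18-bit chunks needs ceil(64/18) = 4 chunks, so a 64 x 64 product would use 4 x 4 = 16 partial products.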
In addition, if you would like to implement a multiplier with logic elements (a shift-add architecture) instead of the multiplier resources on the FPGA, you currently need to build a custom multiplier.
The HDL Coder team is working on automating this as a multiplier-block-specific option in an upcoming release.
See the attached example for additional details. If you have trouble opening the model, run
slprivate('showprefs')
and uncheck "Do not load models created with a newer version of Simulink", or use the command below:
>> set_param(0,'ErrorIfLoadNewModel', 'off')
Hello Kiran Kintali
Once again, thank you for your answers.
I tested the settings on a small subsystem, and so far the results look good. The CSD optimization gives the best result for me. I have some additional questions regarding the optimization options.
1) My target architecture will most likely have 18x18 multipliers. Therefore, I set MultiplierPartitioningThreshold to 18. This works fine on my subsystem, i.e., the resource report shows me that 18x18 multipliers have been synthesized. However, if I use the same option on my whole project, the report shows me that 64x64 multipliers were used. Are there situations in which the slicing fails?
2) What about resource sharing? Many of my constant multiplications are equal, so it should be possible to use the same hardware by multiplexing. The options shown here do not exist in my Simulink settings / Workflow Advisor (I'm using release R2020a), i.e., there is no "Resource sharing factor" option. There is a "Resource sharing" option in the global settings, but there I can only specify a "minimum bitwidth", and I cannot find in the documentation what exactly it does.
3) My model has over 30 Gain blocks. Is it possible to set the CSD optimization for all blocks at once rather than right-clicking each block and selecting the desired setting?
#1 Please share your project. If splitting fails, you should have gotten a message; we can track this as a bug and provide a resolution. By "Minimum bitwidth", you are referring to the threshold for splitting here.
#2 The link you provided is for the MATLAB-to-HDL workflow. For Simulink, the resource sharing options are at the subsystem level: you need to mark the sharable subsystems as atomic and make sure they have identical contents (types, sizes, complexity, rates, etc.). For MATLAB to HDL, the settings are at the global level, in the optimization pane or on the MATLAB config object (see "Resource sharing factor").
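A minimal sketch of the subsystem-level setup described in #2, assuming a loaded model; the subsystem path is hypothetical and the sharing factor of 4 is only an example. "Mark as atomic" corresponds to the "Treat as atomic unit" check box in the subsystem's Block Parameters, and SharingFactor is set in the HDL properties of the subsystem that contains the resources to be shared.

```matlab
% Hypothetical subsystem path; replace with a subsystem in your own DUT.
subsys = 'mymodel/DUT/MultStage';
% "Mark as atomic": same as ticking "Treat as atomic unit" in the
% subsystem's Block Parameters dialog.
set_param(subsys, 'TreatAsAtomicUnit', 'on');
% Ask HDL Coder to map up to 4 functionally equivalent resources inside
% this subsystem (e.g. identical atomic subsystems) onto one shared resource.
hdlset_param(subsys, 'SharingFactor', 4);
```

Sharing typically requires the shared resource to run faster than the original sample rate, so it is worth checking the generated clock-rate and resource reports against the 100 MHz target.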
#3 There is no global setting. We will take this as input for future improvements to the HDL Coder optimization settings. For now, you can use a script that calls find_system on the DUT to find all Gain blocks and runs hdlset_param on each of them. See hdlsaveparams for an example of the syntax of block settings serialized to text in MATLAB.
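A sketch of the script suggested in #3, assuming the model is loaded; the DUT path 'mymodel/DUT' is hypothetical. find_system returns a cell array of block paths, and hdlsaveparams then prints the non-default HDL settings so you can confirm what was applied.

```matlab
% Apply CSD to every Gain block under the DUT (hypothetical path).
dut = 'mymodel/DUT';
gains = find_system(dut, 'LookUnderMasks', 'all', 'FollowLinks', 'on', ...
                    'BlockType', 'Gain');
for k = 1:numel(gains)
    hdlset_param(gains{k}, 'ConstMultiplierOptimization', 'CSD');
end
% Print the non-default HDL settings to confirm what was applied.
hdlsaveparams(dut);
```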
I attached a small test-case model that performs a multiplication with a very large and a very small number. The multiplication is implemented three times: once without optimization, once with CSD, and once with FCSD.
Multiplier partitioning is set to 18 in the global settings. You mentioned that resource sharing must be set at the subsystem level. However, there is no resource sharing option in the subsystem's HDL block properties. Also, can you explain what "mark as atomic" means?
Interestingly, in this test-case model HDL Coder ignores all optimization settings, i.e., not even the CSD optimization is performed (the resource summary shows no adders). You can also see this in the VHDL code, i.e., the multiplications are not implemented as optimized. Unfortunately, I cannot upload the .vhd files here.
Once again, thank you very much for your support.
The model is missing initialization variables, without which it does not initialize and compile.
It looks like I need simin, fs_sim, and other initialization variables to update the model (Ctrl+D).
Without being able to compile it, I cannot generate HDL code from the model.
Pressing Ctrl+D doesn't change this behavior; the code generator still ignores the optimization. I would appreciate further help.
I found the problem: it is the word length. If I reduce the word size to 64 bits, then HDL Coder implements CSD and FCSD. If I reduce it further to 32 bits, then it also implements the multiplier slicing.

Sign in to comment.

Products

Release: R2020a
Asked: 21 Jul 2020
Edited: 24 Jul 2020

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!