Latency in floating point

Nabil Mederbel on 7 Feb 2020
Edited: Walter Roberson on 19 Jan 2021
Hello,
We have implemented a simple multiply/add design in Simulink using the single-precision data type. After generating the VHDL code and running implementation in Vivado, we obtained the RTL circuit shown below.
My question is: why is floating-point multiplication faster than floating-point addition?
Nabil Mederbel on 7 Feb 2020
I am using the ZYNQ UltraScale from Xilinx, but I think it is related to the IEEE standard for floating-point representation, or not?
Walter Roberson on 7 Feb 2020
I would need to do a bunch of research on what that architecture is, but I notice that they say:
"Enhanced DSP slices incorporating 27x18-bit multipliers and dual adders that enable a massive jump in fixed- and IEEE Std 754 floating-point arithmetic performance and efficiency"
That suggests to me that they put more resources into multiplication than they do into addition.
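To make the 27x18-bit multiplier point concrete: a single-precision significand is 24 bits wide (23 stored bits plus the hidden leading 1), so a 24x24-bit significand product does not fit in one 27x18 multiplier, but it can be built from two of them plus a shift and an add. The Python sketch that follows models that decomposition. The port widths come from the quoted marketing text; the particular split and the helper names `dsp_mul` and `mul24x24` are hypothetical and are not taken from Xilinx's floating-point IP.

```python
# Hypothetical illustration: a 24x24-bit unsigned product (the width of IEEE 754
# single-precision significands, hidden bit included) built only from multiplies
# whose operands fit a 27x18-bit signed multiplier. NOT Xilinx's documented design.

PORT_A_BITS = 27          # wide signed port: up to 26 unsigned magnitude bits
PORT_B_BITS = 18          # narrow signed port: up to 17 unsigned magnitude bits
SPLIT = PORT_B_BITS - 1   # 17: keeps the low chunk of b non-negative in 18 signed bits


def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable as a two's-complement integer of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def dsp_mul(a: int, b: int) -> int:
    """Model of one 27x18 multiplier: check operand widths, then multiply."""
    assert fits_signed(a, PORT_A_BITS) and fits_signed(b, PORT_B_BITS)
    return a * b


def mul24x24(a: int, b: int) -> int:
    """Multiply two 24-bit unsigned significands using two 27x18 multiplies."""
    assert 0 <= a < (1 << 24) and 0 <= b < (1 << 24)
    b_lo = b & ((1 << SPLIT) - 1)  # low 17 bits of b
    b_hi = b >> SPLIT              # remaining 7 high bits of b
    # Two partial products, then shift the high one into place and add.
    return dsp_mul(a, b_lo) + (dsp_mul(a, b_hi) << SPLIT)


if __name__ == "__main__":
    x = (1 << 24) - 1              # largest 24-bit significand, 0xFFFFFF
    print(mul24x24(x, x) == x * x)  # True
```

The point is only that the significand multiply maps onto a small, fixed number of hard multipliers; how many pipeline registers a vendor places around them is a separate design decision.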
IEEE 754 has a bunch of decoding overhead that makes it less efficient than it could be, but that overhead would be much the same for addition and multiplication.
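As a rough illustration of the decoding step that both operations share, the Python sketch that follows unpacks a single-precision value into its sign, biased exponent, and fraction fields, then restores the hidden leading 1 for normal numbers. It is a software model only, not how HDL Coder or the Vivado netlist implements it; the function names are hypothetical, and Inf/NaN handling is omitted.

```python
import struct


def decode_float32(x: float) -> tuple[int, int, int]:
    """Split an IEEE 754 single-precision value into (sign, exponent, fraction) fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # raw 32-bit pattern
    sign = bits >> 31               # 1 bit
    exponent = (bits >> 23) & 0xFF  # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF      # 23 stored fraction bits
    return sign, exponent, fraction


def significand(exponent: int, fraction: int) -> int:
    """24-bit significand: the hidden leading 1 exists only for normal numbers."""
    if exponent == 0:               # zero or subnormal: no hidden bit
        return fraction
    # Normal number (exponent 255, i.e. Inf/NaN, is not special-cased here).
    return (1 << 23) | fraction


if __name__ == "__main__":
    s, e, f = decode_float32(-6.5)            # -6.5 = -1.625 * 2**2
    print(s, e - 127, hex(significand(e, f)))  # 1 2 0xd00000
```

Every adder and multiplier input goes through this kind of field extraction (and the reverse packing on output), which is why this overhead by itself does not explain a latency gap between the two operators.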
It is not uncommon for vendors to analyze which instructions their devices tend to be used for, heavily optimize the most commonly used ones, and dedicate fewer resources to instructions that make up less of what customers want to do.


Answers (1)

Nabil Mederbel on 7 Feb 2020
Thank you, Roberson, for your feedback. So, as I understand it, the latency values of floating-point operators assigned in HDL Coder are independent of the hardware?
I ask because here, too, the MathWorks team assigned more latency to the add/subtract operation than to multiplication.
Walter Roberson on 7 Feb 2020
Edited: Walter Roberson on 19 Jan 2021
No, HDL Coder permits you to choose floating-point latency.
If you look at various CPUs you will see that the latency of floating-point operations varies quite a bit.
ARM, for example, has a 4-cycle latency for FADD and a 5-cycle latency for FMUL.
The values used by default in HDL Coder probably represent some particular implementation that happens to have put more design work into multiplication than into addition. It would also be important to check the interval specifications for the instructions (how often a new operation can be started): you can plausibly start another addition every clock cycle, but a multiplication only every few clock cycles.
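The difference between latency and interval matters once many operations are streamed through one unit. Assuming a fully pipelined unit that accepts a new operation every `interval` cycles, the last of n results is ready after (n - 1) * interval + latency cycles. The Python sketch that follows applies that formula; `pipeline_cycles` is a hypothetical helper, and the latency and interval values in the usage are made-up examples, not HDL Coder defaults or measured Vivado results.

```python
def pipeline_cycles(n_ops: int, latency: int, interval: int) -> int:
    """Clock cycles until the last of n_ops results is available from one pipelined unit.

    latency:  cycles from issuing an operation until its result is ready
    interval: cycles between successive issues (initiation interval)
    """
    if n_ops <= 0:
        return 0
    # The last operation is issued at cycle (n_ops - 1) * interval
    # and its result appears `latency` cycles after that.
    return (n_ops - 1) * interval + latency


if __name__ == "__main__":
    # Hypothetical numbers only, not HDL Coder or Xilinx figures.
    print(pipeline_cycles(100, latency=11, interval=1))  # longer latency, II = 1: 110
    print(pipeline_cycles(100, latency=8, interval=3))   # shorter latency, II = 3: 305
```

With these hypothetical numbers the unit with the longer latency finishes a long stream far sooner, so a latency figure alone does not say which operator is "faster" for a given design.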


Release

R2019b
