## Numeric Considerations with Native Floating-Point

Native floating-point technology can generate HDL code from your floating-point design. Floating-point designs have better precision, higher dynamic range, and a shorter development cycle than fixed-point designs. If your design has complex math and trigonometric operations, use native floating-point technology.

HDL Coder™ generates code that complies with the IEEE 754 standard for floating-point arithmetic. HDL Coder native floating-point supports:

- Round to nearest rounding mode
- Denormal numbers
- Exceptions such as NaN (Not a Number), Inf, and Zero
- Customization of ULP (Units in the Last Place) and relative accuracy

### Round to Nearest Rounding Mode

HDL Coder native floating-point uses the round to nearest even rounding mode. This mode resolves all ties by rounding to the nearest even digit.

This rounding method requires at least three trailing bits after the 23 bits of the mantissa. The MSB is called the Guard bit, the middle bit the Round bit, and the LSB the Sticky bit. The table shows the rounding action that HDL Coder performs based on the values of these three trailing bits. An `x` denotes a *don't care* value and can be either a 0 or a 1.

| Rounding bits | Rounding action |
| --- | --- |
| `0xx` | No action performed. |
| `100` | A tie. If the mantissa bit that precedes the Guard bit is 1, round up; otherwise, no action is performed. |
| `101` | Round up. |
| `11x` | Round up. |
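The table above can be sketched as a small Python function. This is a minimal illustration of round-to-nearest-even using guard, round, and sticky bits; the function name and bit widths are illustrative, not HDL Coder's internal implementation.

```python
def round_nearest_even(mantissa: int, guard: int, rnd: int, sticky: int) -> int:
    """Round an integer mantissa given its three trailing (G, R, S) bits."""
    if guard == 0:
        return mantissa                  # 0xx: no action performed
    if rnd == 0 and sticky == 0:         # 100: a tie
        # Round up only if the bit preceding the Guard bit (the LSB of
        # the mantissa) is 1, so the result ends in an even digit.
        return mantissa + (mantissa & 1)
    return mantissa + 1                  # 101 or 11x: round up

# A tie leaves an even mantissa unchanged but rounds an odd one up.
print(round_nearest_even(0b1010, 1, 0, 0))  # 10 (already even)
print(round_nearest_even(0b1011, 1, 0, 0))  # 12 (rounds to even)
print(round_nearest_even(0b1010, 1, 0, 1))  # 11 (101: round up)
```

Resolving ties toward the even digit avoids the systematic upward drift that always rounding ties up would introduce over many operations.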

### Denormal Numbers

Denormal numbers are numbers that have an exponent field equal to zero and a nonzero mantissa field. Unlike normal numbers, the implicit leading bit of the mantissa is zero rather than one.

$$value = (-1)^{sign} \times \left(0 + \sum_{i=1}^{23} b_{23-i}\,2^{-i}\right) \times 2^{-126}$$

Denormal numbers have magnitudes less than the smallest floating-point number that can be represented without leading zeros in the mantissa. The presence of denormal numbers indicates loss of significant digits that can accumulate over subsequent operations and eventually result in unexpected values.

The logic to handle denormal numbers involves counting the number of leading zeros and performing a left shift operation to obtain the normalized representation. Addition of this logic increases the area footprint on the target device and can affect the timing of your design.

When using native floating-point support, you can specify whether you want HDL Coder to handle denormal numbers in your design.
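The denormal formula above can be checked directly against a bit pattern. This is a sketch in Python; the helper name `decode_denormal` is illustrative, and the cross-check uses Python's own single-precision unpacking rather than anything HDL Coder generates.

```python
import struct

def decode_denormal(bits: int) -> float:
    """Decode a single-precision denormal bit pattern using the formula above."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    assert exponent == 0 and mantissa != 0, "not a denormal encoding"
    # value = (-1)^sign * (0 + sum of b_{23-i} * 2^-i) * 2^-126
    fraction = sum(((mantissa >> (23 - i)) & 1) * 2.0**-i for i in range(1, 24))
    return (-1) ** sign * fraction * 2.0**-126

# Smallest positive denormal: exponent field 0, mantissa field 1.
smallest = decode_denormal(0x00000001)
# Cross-check against native unpacking of the same bit pattern.
print(smallest == struct.unpack('<f', struct.pack('<I', 0x00000001))[0])  # True
```

The smallest denormal works out to 2^-149, well below the smallest normal single-precision number, 2^-126.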

### Exception Handling

If you perform operations such as division by zero or computing the logarithm of a negative number, HDL Coder detects and reports exceptions. The table summarizes the mapping from the encoding of a floating-point number to its value for the various kinds of exceptions. An `x` denotes a *don't care* value and can be either a 0 or a 1.

| Sign | Exponent | Significand | Value | Description |
| --- | --- | --- | --- | --- |
| x | 0xFF | 0x00000000 | $(-1)^{sign}\,\infty$ | Infinity |
| x | 0xFF | A nonzero value | NaN | Not a Number |
| x | 0x00 | 0x00000000 | $(-1)^{sign}\,0$ | Zero |
| x | 0x00 | A nonzero value | $(-1)^{sign} \times \left(0 + \sum_{i=1}^{23} b_{23-i}\,2^{-i}\right) \times 2^{-126}$ | Denormal |
| x | 0x00 < E < 0xFF | x | $(-1)^{sign} \times \left(1 + \sum_{i=1}^{23} b_{23-i}\,2^{-i}\right) \times 2^{E-127}$ | Normal |
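The classification in the table can be reproduced by inspecting the exponent and significand fields of a single-precision bit pattern. This is an illustrative Python sketch; `classify_single` is a hypothetical helper, not an HDL Coder API.

```python
import struct

def classify_single(x: float) -> str:
    """Classify a value by its single-precision exponent and significand fields."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    exponent = (bits >> 23) & 0xFF
    significand = bits & 0x7FFFFF
    if exponent == 0xFF:
        return 'Infinity' if significand == 0 else 'NaN'
    if exponent == 0x00:
        return 'Zero' if significand == 0 else 'Denormal'
    return 'Normal'

print(classify_single(float('inf')))   # Infinity
print(classify_single(float('nan')))   # NaN
print(classify_single(0.0))            # Zero
print(classify_single(1e-45))          # Denormal
print(classify_single(1.0))            # Normal
```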

### Relative Accuracy and ULP Considerations

The representation of infinitely many real numbers with a finite number of bits requires an approximation. This approximation can result in rounding errors in floating-point computation. To measure these rounding errors, the floating-point standard uses relative error and ULP (Units in the Last Place) error.

#### ULP

If the exponent range is not upper-bounded, the unit in the last place (ULP) of a floating-point number x is the distance between the two closest floating-point numbers a and b that straddle x. The IEEE 754 standard requires that the result of an elementary arithmetic operation such as addition, multiplication, or division is correctly rounded. A correctly rounded result means that the rounded result is within 0.5 ULP of the exact result.

A difference of one ULP corresponds to adding 1 to the decimal value of the integer representation of the number. The table shows an approximation of pi to nine decimal digits and how a change of one ULP alters the approximate value.

| Floating-point number | Value in decimal | IEEE-754 representation for Single Types | ULP |
| --- | --- | --- | --- |
| 3.141592741 | 1078530011 | 0 \| 10000000 \| 10010010000111111011011 | 0 |
| 3.141592979 | 1078530012 | 0 \| 10000000 \| 10010010000111111011100 | 1 |

The gap between two consecutively representable floating-point numbers varies according to magnitude.

| Floating-point number | Value in decimal | IEEE-754 representation for Single Types | ULP |
| --- | --- | --- | --- |
| 1234567 | 1234613304 | 0 \| 10010011 \| 00101101011010000111000 | 0 |
| 1234567.125 | 1234613305 | 0 \| 10010011 \| 00101101011010000111001 | 1 |
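The one-ULP relationship in the tables can be verified by comparing the integer representations of two consecutive single-precision numbers. This is a Python sketch using the standard `struct` module; `single_bits` is an illustrative helper.

```python
import struct

def single_bits(x: float) -> int:
    """Integer (decimal) representation of x as an IEEE-754 single."""
    return struct.unpack('<I', struct.pack('<f', x))[0]

a = single_bits(3.141592741)   # the single nearest to pi
b = single_bits(3.141592979)   # one ULP above it
print(a, b, b - a)             # 1078530011 1078530012 1
```

Because the integer representations of consecutive singles differ by exactly one, subtracting them gives the ULP distance directly, whatever the magnitude of the numbers.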

#### Relative Error

Relative error measures the difference between a floating-point number and the real number it approximates, relative to the magnitude of the number. The relative accuracy is the distance from 1.0 to the next larger representable number. This table shows how the real value of a number changes with the relative accuracy.

| Floating-point number | Value in decimal | IEEE-754 representation for Single Types | ULP | Relative error |
| --- | --- | --- | --- | --- |
| 8388608 | 1258291200 | 0 \| 10010110 \| 00000000000000000000000 | 0 | 1 |
| 8388607 | 1258291198 | 0 \| 10010101 \| 11111111111111111111110 | 1 | 2.3841858e-07 |
| 1 | 1065353216 | 0 \| 01111111 \| 00000000000000000000000 | 0 | 1.1920929e-07 |
| 2 | 1073741824 | 0 \| 10000000 \| 00000000000000000000000 | 1 | 2.3841858e-07 |

The magnitude of the relative error depends on the real value of the floating-point number.
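The single-precision relative accuracy at 1.0 can be computed by stepping to the next representable single. This is a Python sketch (the helper `next_single` is illustrative); the result, 2^-23, matches the 1.1920929e-07 entry in the table.

```python
import struct

def next_single(x: float) -> float:
    """Return the next larger IEEE-754 single after x (for positive x)."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return struct.unpack('<f', struct.pack('<I', bits + 1))[0]

# Distance from 1.0 to the next larger representable single.
eps_single = next_single(1.0) - 1.0
print(eps_single)              # 1.1920928955078125e-07
print(eps_single == 2.0**-23)  # True
```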

In MATLAB®, the `eps` function measures the relative accuracy of a floating-point number. For more information, see `eps`.