
Whenever you add two fixed-point numbers, you may need a carry bit to correctly represent the result. For this reason, when adding two B-bit numbers (with the same scaling), the resulting value has an extra bit compared to the two operands used.

a = fi(0.234375,0,4,6); c = a+a

c = 0.4688 DataTypeMode: Fixed-point: binary point scaling Signedness: Unsigned WordLength: 5 FractionLength: 6

a.bin

ans = 1111

c.bin

ans = 11110

If you add or subtract two numbers with different precision, the radix point first needs to be aligned to perform the operation. The result is that there is a difference of more than one bit between the result of the operation and the operands.

a = fi(pi,1,16,13); b = fi(0.1,1,12,14); c = a + b

c = 3.2416 DataTypeMode: Fixed-point: binary point scaling Signedness: Signed WordLength: 18 FractionLength: 14
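As a worked check of this result, the word length can be derived by hand. This is a sketch that assumes both operands are signed and counts the sign bit separately from the integer bits; the symbols $F_c$, $I_c$, and $W_c$ are introduced here only for illustration.

```latex
\begin{aligned}
F_c &= \max(13,\ 14) = 14 \\
I_c &= \max(16 - 13 - 1,\ 12 - 14 - 1) + 1 = \max(2,\ -3) + 1 = 3 \\
W_c &= \underbrace{1}_{\text{sign}} + I_c + F_c = 1 + 3 + 14 = 18
\end{aligned}
```

Aligning the radix points keeps the longer fraction (14 bits) and the larger integer part (2 bits), and the carry adds one more integer bit, which accounts for the 18-bit result.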

In general, a full precision product requires a word length equal to the sum of the word lengths of the operands. In the following example, note that the word length of the product `c` is equal to the word length of `a` plus the word length of `b`. The fraction length of `c` is also equal to the fraction length of `a` plus the fraction length of `b`.

a = fi(pi,1,20), b = fi(exp(1),1,16)

a = 3.1416 DataTypeMode: Fixed-point: binary point scaling Signedness: Signed WordLength: 20 FractionLength: 17 b = 2.7183 DataTypeMode: Fixed-point: binary point scaling Signedness: Signed WordLength: 16 FractionLength: 13

c = a*b

c = 8.5397 DataTypeMode: Fixed-point: binary point scaling Signedness: Signed WordLength: 36 FractionLength: 30
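One way to confirm that a full precision product loses no information is to compare it with the product of the stored operand values. The following is a minimal sketch, assuming the default `fimath` is in effect; the flag variable names are illustrative.

```matlab
% Full-precision product: no rounding or overflow occurs.
a = fi(pi,1,20);        % s20,17 (best-precision scaling)
b = fi(exp(1),1,16);    % s16,13
c = a*b;                % default fimath: ProductMode is FullPrecision

% Word lengths and fraction lengths both add.
wordLengthsAdd     = (c.WordLength     == a.WordLength     + b.WordLength)
fractionLengthsAdd = (c.FractionLength == a.FractionLength + b.FractionLength)

% The 36-bit product fits in the 53-bit significand of a double, so it can
% be compared exactly with the product of the already quantized operands.
isExact = (double(c) == double(a)*double(b))
```

The comparison uses `double(a)` and `double(b)`, not `pi` and `exp(1)`, because the operands are quantized when they are created; only the multiplication itself is exact.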

Note that in C, the result of an operation between an integer data type and a double data type promotes to a double. However, in MATLAB®, the result of an operation between a built-in integer data type and a double data type is an integer. In this respect, the `fi` object behaves like the built-in integer data types in MATLAB.
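The MATLAB behavior described above can be seen with a built-in integer. This short sketch uses only base MATLAB; the variable names are illustrative.

```matlab
% A built-in integer combined with a double yields an integer result.
x = int8(5);
y = x * 0.5;             % computed as 2.5, then rounded to the nearest int8

resultClass = class(y)   % 'int8'
resultValue = y          % 3
```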

When doing addition between `fi` and `double`, the double is cast to a `fi` with the same numerictype as the `fi` input. The result of the operation is a `fi`. When doing multiplication between `fi` and `double`, the double is cast to a `fi` with the same word length and signedness as the `fi`, and best precision fraction length. The result of the operation is a `fi`.

a = fi(pi)

a = 3.1416 DataTypeMode: Fixed-point: binary point scaling Signedness: Signed WordLength: 16 FractionLength: 13

b = 0.5 * a

b = 1.5708 DataTypeMode: Fixed-point: binary point scaling Signedness: Signed WordLength: 32 FractionLength: 28
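The implicit casts can be written out explicitly to see what happens to the `double` operand. This is a sketch under the rules stated above, assuming the default `fimath`; the variable names are illustrative, and the two flags should both come out true if the explicit casts mirror the implicit ones.

```matlab
a = fi(pi);                           % s16,13 by default

% Addition: the double is cast to the numerictype of a.
s1 = a + 0.1;
s2 = a + fi(0.1, numerictype(a));     % explicit form of the same cast

% Multiplication: the double takes the word length and signedness of a,
% with a best-precision fraction length.
p1 = 0.5 * a;
p2 = fi(0.5, 1, 16) * a;              % fi(0.5,1,16) chooses the fraction length

sumsMatch     = isequal(s1, s2) && isequal(numerictype(s1), numerictype(s2))
productsMatch = isequal(p1, p2) && isequal(numerictype(p1), numerictype(p2))
```

Note that `0.1` is not exactly representable with 13 fraction bits, so the value added to `a` is the quantized one, not the original double.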

When doing arithmetic between a `fi` and one of the built-in integer data types, `[u]int[8, 16, 32]`, the word length and signedness of the integer are preserved. The result of the operation is a `fi`.

a = fi(pi); b = int8(2) * a

b = 6.2832 DataTypeMode: Fixed-point: binary point scaling Signedness: Signed WordLength: 24 FractionLength: 13

When doing arithmetic between a `fi` and a logical data type, the logical is treated as an unsigned `fi` object with a value of 0 or 1, and word length 1. The result of the operation is a `fi` object.

a = fi(pi); b = logical(1); c = a*b

c = 3.1416 DataTypeMode: Fixed-point: binary point scaling Signedness: Signed WordLength: 17 FractionLength: 13
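The integer and logical rules can be checked by replacing the built-in operand with the `fi` object it is treated as. This is a minimal sketch assuming the default `fimath`; the variable names are illustrative.

```matlab
a = fi(pi);                   % s16,13

% int8 operand: treated as a signed 8-bit fi with fraction length 0.
b1 = int8(2) * a;
b2 = fi(2, 1, 8, 0) * a;

% logical operand: treated as an unsigned 1-bit fi.
c1 = a * true;
c2 = a * fi(1, 0, 1, 0);

intTypesMatch     = isequal(numerictype(b1), numerictype(b2))
logicalTypesMatch = isequal(numerictype(c1), numerictype(c2))
```

In both cases the product follows the full precision rule: 8 + 16 = 24 bits for the `int8` operand and 1 + 16 = 17 bits for the logical operand.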

`fimath` properties define the rules for performing arithmetic operations on `fi` objects, including math, rounding, and overflow properties. A `fi` object can have a local `fimath` object, or it can use the default `fimath` properties. You can attach a `fimath` object to a `fi` object by using `setfimath`. Alternatively, you can specify `fimath` properties in the `fi` constructor at creation. When a `fi` object has a local `fimath`, rather than using the default properties, the display of the `fi` object shows the `fimath` properties. In this example, `a` has the `ProductMode` property specified in the constructor.

a = fi(5,1,16,4,'ProductMode','KeepMSB')

a = 5 DataTypeMode: Fixed-point: binary point scaling Signedness: Signed WordLength: 16 FractionLength: 4 RoundingMethod: Nearest OverflowAction: Saturate ProductMode: KeepMSB ProductWordLength: 32 SumMode: FullPrecision

The `ProductMode` property of `a` is set to `KeepMSB` while the remaining `fimath` properties use the default values.

**Note**

For more information on the `fimath` object, its properties, and their default values, see fimath Object Properties.
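The other route mentioned above, attaching a `fimath` with `setfimath` after the `fi` object exists, might look like the following sketch. It also uses `isfimathlocal` and `removefimath` to inspect and detach the local `fimath`; the variable names are illustrative.

```matlab
F = fimath('ProductMode', 'KeepMSB');   % other properties keep their defaults

a = fi(5, 1, 16, 4);          % created without a local fimath
hadLocal = isfimathlocal(a)

a = setfimath(a, F);          % attach F; value and numerictype are unchanged
hasLocal = isfimathlocal(a)

b = removefimath(a);          % detach it again so b uses the default fimath
stillLocal = isfimathlocal(b)
```

Attaching the `fimath` only where it is needed, and removing it afterward, keeps the setting from propagating into later operations on the result.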

The following table shows the bit growth of `fi` objects, `A` and `B`, when their `SumMode` and `ProductMode` properties use the default `fimath` value, `FullPrecision`.

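Property | A | B | Sum = A+B | Prod = A*B |
---|---|---|---|---|
Format | `fi(vA,s1,w1,f1)` | `fi(vB,s2,w2,f2)` | n/a | n/a |
Sign | `s1` | `s2` | `S_sum` = (`s1` OR `s2`) | `S_prod` = (`s1` OR `s2`) |
Integer bits | `I1 = w1 - f1 - s1` | `I2 = w2 - f2 - s2` | `I_sum = max(w1 - f1, w2 - f2) + 1 - S_sum` | `I_prod = (w1 + w2) - (f1 + f2) - S_prod` |
Fraction bits | `f1` | `f2` | `F_sum = max(f1, f2)` | `F_prod = f1 + f2` |
Total bits | `w1` | `w2` | `S_sum + I_sum + F_sum` | `w1 + w2` |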
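The full precision rules can be checked numerically by predicting the result type from the operand properties. This sketch assumes two signed operands and the default `fimath`; the operand values and the helper variable names (`S`, `Fsum`, `Isum`, and so on) are chosen here only for illustration.

```matlab
a = fi(1.5,   1,  8, 4);
b = fi(-0.25, 1, 10, 7);      % both operands signed, so the sign bit S is 1
S = 1;

% Sum: align the binary points, then allow one carry bit.
Fsum = max(a.FractionLength, b.FractionLength);
Isum = max(a.WordLength - a.FractionLength, ...
           b.WordLength - b.FractionLength) + 1 - S;
Wsum = S + Isum + Fsum;

% Product: word lengths and fraction lengths both add.
Wprod = a.WordLength + b.WordLength;
Fprod = a.FractionLength + b.FractionLength;

c = a + b;
d = a * b;
sumMatches  = (c.WordLength == Wsum)  && (c.FractionLength == Fsum)
prodMatches = (d.WordLength == Wprod) && (d.FractionLength == Fprod)
```

For these operands the prediction is a 12-bit sum with 7 fraction bits and an 18-bit product with 11 fraction bits.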

This example shows how bit growth can occur in a `for`-loop.

T.acc = fi([],1,32,0);
T.x = fi([],1,16,0);
x = cast(1:3,'like',T.x);
acc = zeros(1,1,'like',T.acc);
for n = 1:length(x)
    acc = acc + x(n)
end

acc = 1 s33,0 acc = 3 s34,0 acc = 6 s35,0

The word length of `acc` increases with each iteration of the loop. This increase causes two problems: One is that code generation does not allow changing data types in a loop. The other is that, if the loop is long enough, you run out of memory in MATLAB. See Controlling Bit Growth for some strategies to avoid this problem.

By specifying the `fimath` properties of a `fi` object, you can control the bit growth as operations are performed on the object.

F = fimath('SumMode', 'SpecifyPrecision', 'SumWordLength', 8,...
    'SumFractionLength', 0);
a = fi(8,1,8,0, F);
b = fi(3, 1, 8, 0);
c = a+b

c = 11 DataTypeMode: Fixed-point: binary point scaling Signedness: Signed WordLength: 8 FractionLength: 0 RoundingMethod: Nearest OverflowAction: Saturate ProductMode: FullPrecision SumMode: SpecifyPrecision SumWordLength: 8 SumFractionLength: 0 CastBeforeSum: true

The `fi` object `a` has a local `fimath` object `F`. `F` specifies the word length and fraction length of the sum. Under the default `fimath` settings, the output, `c`, normally has word length 9 and fraction length 0. However, because `a` has a local `fimath` object, the resulting `fi` object has word length 8 and fraction length 0.

You can also use `fimath` properties to control bit growth in a `for`-loop.

F = fimath('SumMode', 'SpecifyPrecision','SumWordLength',32,...
    'SumFractionLength',0);
T.acc = fi([],1,32,0,F);
T.x = fi([],1,16,0);
x = cast(1:3,'like',T.x);
acc = zeros(1,1,'like',T.acc);
for n = 1:length(x)
    acc = acc + x(n)
end

acc = 1 s32,0 acc = 3 s32,0 acc = 6 s32,0

Unlike when `T.acc` was using the default `fimath` properties, the bit growth of `acc` is now restricted. Thus, the word length of `acc` stays at 32.

Another way to control bit growth is by using subscripted assignment. `a(I) = b` assigns the values of `b` into the elements of `a` specified by the subscript vector, `I`, while retaining the `numerictype` of `a`.

T.acc = fi([],1,32,0);
T.x = fi([],1,16,0);
x = cast(1:3,'like',T.x);
acc = zeros(1,1,'like',T.acc);
% Assign into acc without changing its type
for n = 1:length(x)
    acc(:) = acc + x(n)
end

`acc(:) = acc + x(n)` dictates that the values at subscript vector, `(:)`, change. However, the `numerictype` of output `acc` remains the same. Because `acc` is a scalar, you also receive the same output if you use `(1)` as the subscript vector.

for n = 1:numel(x)
    acc(1) = acc + x(n)
end

acc = 1 s32,0 acc = 3 s32,0 acc = 6 s32,0

The `numerictype` of `acc` remains the same at each iteration of the `for`-loop.

Subscripted assignment can also help you control bit growth in a function. In the function, `cumulative_sum`, the `numerictype` of `y` does not change, but the values in the elements specified by *n* do.

function y = cumulative_sum(x)
% CUMULATIVE_SUM Cumulative sum of elements of a vector.
%
%   For vectors, Y = cumulative_sum(X) is a vector containing the
%   cumulative sum of the elements of X. The type of Y is the type of X.
y = zeros(size(x),'like',x);
y(1) = x(1);
for n = 2:length(x)
    y(n) = y(n-1) + x(n);
end
end

y = cumulative_sum(fi([1:10],1,8,0))

y = 1 3 6 10 15 21 28 36 45 55 DataTypeMode: Fixed-point: binary point scaling Signedness: Signed WordLength: 8 FractionLength: 0

**Note**

For more information on subscripted assignment, see the `subsasgn` function.

Another way you can control bit growth is by using the `accumpos` and `accumneg` functions to perform addition and subtraction operations. Similar to using subscripted assignment, `accumpos` and `accumneg` preserve the data type of one of their input `fi` objects while allowing you to specify a rounding method and overflow action in the input values.

For more information on how to implement `accumpos` and `accumneg`, see Avoid Multiword Operations in Generated Code.
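As a rough illustration, an accumulator loop written with `accumpos` and `accumneg` could look like the sketch below. The rounding method and overflow action are passed explicitly as the third and fourth arguments; the data and variable names are illustrative.

```matlab
x   = fi(1:3, 1, 16, 0);
acc = fi(0,   1, 32, 0);

for n = 1:numel(x)
    % The result takes the data type of the first input, acc.
    acc = accumpos(acc, x(n), 'Floor', 'Saturate');
end

% Subtract the first element again, still in the data type of acc.
acc = accumneg(acc, x(1), 'Floor', 'Saturate');

finalValue      = double(acc)
finalWordLength = acc.WordLength
```

The accumulator holds 1 + 2 + 3 - 1 = 5 at the end, and its word length stays at 32 throughout because each call returns the type of its first argument.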

When performing fixed-point arithmetic, consider the possibility and consequences of overflow. The `fimath` object specifies the overflow and rounding modes used when performing arithmetic operations.

Overflows can occur when the result of an operation exceeds the maximum or minimum representable value. The `fimath` object has an `OverflowAction` property which offers two ways of dealing with overflows: saturation and wrap. If you set `OverflowAction` to `saturate`, overflows are saturated to the maximum or minimum value in the range. If you set `OverflowAction` to `wrap`, any overflows wrap using modulo arithmetic, if unsigned, or two’s complement wrap, if signed.
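The difference between the two overflow actions is easiest to see on a sum that does not fit. This sketch forces sums into a signed 8-bit type so that 100 + 100 overflows the range [-128, 127]; the variable names and the choice of operands are illustrative.

```matlab
% Keep sums in 8 bits so that 100 + 100 overflows.
sumArgs = {'SumMode', 'SpecifyPrecision', ...
           'SumWordLength', 8, 'SumFractionLength', 0};
Fsat  = fimath('OverflowAction', 'Saturate', sumArgs{:});
Fwrap = fimath('OverflowAction', 'Wrap',     sumArgs{:});

a = fi(100, 1, 8, 0);

% Both operands carry the same local fimath in each case.
saturated = setfimath(a, Fsat)  + setfimath(a, Fsat);    % clips to 127
wrapped   = setfimath(a, Fwrap) + setfimath(a, Fwrap);   % 200 - 256 = -56

saturatedValue = double(saturated)
wrappedValue   = double(wrapped)
```

Saturation keeps the result at the nearest representable limit, while wrapping discards the bits above the word length, which changes the sign of the result in this case.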

For more information on how to detect overflow see Underflow and Overflow Logging Using fipref.

There are several factors to consider when choosing a rounding method, including cost, bias, and whether or not there is a possibility of overflow. Fixed-Point Designer™ software offers several different rounding functions to meet the requirements of your design.

Rounding Method | Description | Cost | Bias | Possibility of Overflow |
---|---|---|---|---|
`ceil` | Rounds to the closest representable number in the direction of positive infinity. | Low | Large positive | Yes |
`convergent` | Rounds to the closest representable number. In the case of a tie, `convergent` rounds to the nearest even number. This approach is the least-biased rounding method provided by the toolbox. | High | Unbiased | Yes |
`floor` | Rounds to the closest representable number in the direction of negative infinity, equivalent to two’s complement truncation. | Low | Large negative | No |
`nearest` | Rounds to the closest representable number. In the case of a tie, `nearest` rounds to the closest representable number in the direction of positive infinity. This rounding method is the default for `fi` object creation and `fi` arithmetic. | Moderate | Small positive | Yes |
`round` | Rounds to the closest representable number. In the case of a tie, the `round` method rounds positive numbers to the closest representable number in the direction of positive infinity, and negative numbers to the closest representable number in the direction of negative infinity. | High | Small negative for negative samples; unbiased for samples with evenly distributed positive and negative values; small positive for positive samples | Yes |
`fix` | Rounds to the closest representable number in the direction of zero. | Low | Large positive for negative samples; unbiased for samples with evenly distributed positive and negative values; large negative for positive samples | No |
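The tie-breaking behavior in the table can be compared side by side by applying each rounding function to values that sit exactly halfway between integers. This is a minimal sketch; the input vector and variable names are illustrative.

```matlab
x = fi([-2.5 -1.5 1.5 2.5], 1, 16, 8);   % every element is an exact tie

rCeil       = double(ceil(x))         % toward positive infinity
rConvergent = double(convergent(x))   % ties go to the nearest even number
rFloor      = double(floor(x))        % toward negative infinity
rNearest    = double(nearest(x))      % ties go toward positive infinity
rRound      = double(round(x))        % ties go away from zero
rFix        = double(fix(x))          % toward zero
```

On this input, `nearest` and `ceil` agree only because every element is a tie; for non-tie values `nearest` picks the closer neighbor, while `ceil` always moves up.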