FAQ: Why is a fixed-point type's Fraction Length or Integer Length sometimes negative?

Question

MathWorks Fixed Point Team on 7 Nov 2022


Direct link to this question

https://ch.mathworks.com/matlabcentral/answers/1845588-faq-why-is-a-fixed-point-type-s-fraction-length-or-integer-length-sometimes-negative

Edited: MathWorks Fixed Point Team on 7 Nov 2022

Accepted Answer: MathWorks Fixed Point Team

Sometimes fixed-point variables have fraction lengths that are negative.

a = 5.6632765314184e+15
a = 5.6633e+15
aFi = fi(a,1,4)
aFi = 
   5.6295e+15

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 4
        FractionLength: -50
curFractionLength = aFi.FractionLength
curFractionLength = -50

Sometimes fixed-point variables have negative integer lengths.

b = 0.00037;
bFi = fi(b,0,3)
bFi = 
   3.6621e-04

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Unsigned
            WordLength: 3
        FractionLength: 14
curIntegerLength = bFi.WordLength - bFi.FractionLength
curIntegerLength = -11

How can that be correct? Is that a bug?


Answer 1

MathWorks Fixed Point Team on 7 Nov 2022


Direct link to this answer

https://ch.mathworks.com/matlabcentral/answers/1845588-faq-why-is-a-fixed-point-type-s-fraction-length-or-integer-length-sometimes-negative#answer_1093498

Edited: MathWorks Fixed Point Team on 7 Nov 2022


Maximizing precision for limited bits

The ability to have a negative fraction length or a negative integer length is a very good thing. For a given number of bits, it lets a fixed-point type give maximum precision for very big numbers and for very small numbers, respectively.

To see this benefit, look at the amount of error in the examples provided in the question.

a = 5.6632765314184e+15;
aFi = fi(a,1,4);
b = 0.00037;
bFi = fi(b,0,3);
error_a = double(aFi) - a;
relative_abs_error_a = abs(error_a) / abs(a)
relative_abs_error_a = 0.0060
error_b = double(bFi) - b;
relative_abs_error_b = abs(error_b) / abs(b)
relative_abs_error_b = 0.0102

Relative errors of about 0.6% and about 1.0% are "pretty darn good" for word lengths of just 4 bits and 3 bits, respectively.
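To make the benefit concrete, here is a minimal sketch that compares the best-precision type against the same 4-bit signed word with the fraction length pinned at zero. The variable names aBest and aForced are illustrative only, and the sketch assumes the fi constructor's default behavior of saturating on overflow.

```matlab
% Same value, same 4-bit signed word, two different scalings.
a = 5.6632765314184e+15;

aBest   = fi(a, 1, 4);      % best-precision scaling picks FractionLength = -50
aForced = fi(a, 1, 4, 0);   % binary point pinned right next to the bits

relErrBest   = abs(double(aBest)   - a) / abs(a);
relErrForced = abs(double(aForced) - a) / abs(a);

fprintf('best precision: FL = %d, value = %g, relative error = %.4f\n', ...
    aBest.FractionLength, double(aBest), relErrBest);
% With FL = 0 a signed 4-bit word only covers -8..7, so (assuming the
% default saturate-on-overflow behavior) the value clips to the maximum.
fprintf('forced FL = 0 : value = %g, relative error = %.4f\n', ...
    double(aForced), relErrForced);
```

With the binary point forced to stay adjacent to the bits, nearly all of the value is lost, whereas the negative fraction length keeps the error well under 1%.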

Fraction Length and Integer Length are Intuitive when "scaling is small".

When thinking of fixed-point numbers using binary-point notation, fraction length and integer length are intuitive.

        Type         Real World   Notation: Binary Point
                        Value       
 numerictype(0,3,0)       5     = 101.   
 numerictype(0,3,1)      2.5    =  10.1  
 numerictype(0,3,2)     1.25    =   1.01 
 numerictype(0,3,3)     0.625   =    .101

FALSE Belief that binary-point must be adjacent to the bits.

If you only look at examples of binary-point displays, it is natural to form a FALSE belief that the binary point must be adjacent to the bits that make up the variable's word length. But that is not true.

Binary-point notation is too limiting. There is an extremely useful and well-known generalization that removes the needless constraint that the binary point be adjacent to the bits.

Scientific Notation Removes Needless Limits

Decimal-point notation is intuitive, but limiting. As you know, power of 10 scientific notation allows you to represent very big and very small numbers with good accuracy using far fewer digits than decimal-point notation would require.

Likewise, binary-point notation can be generalized with power-of-2 scientific-style notation. Same concept, same benefits: more accuracy with fewer symbols for very big and very small numbers.

A flavor of binary scientific notation that uses integer-valued mantissas is shown here. Notice the integer mantissas are multiplied by two raised to an exponent.

        Type         Real World   Notation: Integer Mantissa in Binary
                         Value           and Pow2 Exponent
 numerictype(0,3,-2)      20     = 101        * 2^2 
 numerictype(0,3,-1)      10     =  101       * 2^1 
  numerictype(0,3,0)       5     =   101      * 2^0 
  numerictype(0,3,1)      2.5    =    101     * 2^-1
  numerictype(0,3,2)     1.25    =     101    * 2^-2
  numerictype(0,3,3)     0.625   =      101   * 2^-3
  numerictype(0,3,4)    0.3125   =       101  * 2^-4
  numerictype(0,3,5)    0.15625  =        101 * 2^-5
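The table can be reproduced with a short sketch. It relies on the relationship real-world value = stored integer * 2^(-FractionLength), which the table illustrates; the variable names storedInt, fractionLengths, and realWorldValues are illustrative, not part of any API.

```matlab
% All eight types hold the same three bits, 101 (stored integer 5).
storedInt = 5;
fractionLengths = -2:5;

% Real-world value = stored integer * 2^(-FractionLength)
realWorldValues = storedInt .* 2.^(-fractionLengths);

for k = 1:numel(fractionLengths)
    % Build the matching unsigned 3-bit fi and read back its stored bits.
    x = fi(realWorldValues(k), 0, 3, fractionLengths(k));
    fprintf('numerictype(0,3,%d): %s * 2^%d = %g\n', ...
        fractionLengths(k), bin(x), -fractionLengths(k), double(x));
end
```

Every row uses the same bit pattern; only the power-of-2 exponent changes.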

The attributes of these eight types are

vecWordLengths =
     3     3     3     3     3     3     3     3
vecFractionLengths =
    -2    -1     0     1     2     3     4     5
vecIntegerLengths =
     5     4     3     2     1     0    -1    -2

Notice that the first two fraction lengths are negative and the last two integer lengths are negative.

But none of that is a problem. What really lives in memory of the microcontroller or FPGA or ASIC are the bits that make up the word length. The word length is the only thing that needs to be positive. Let's have some fun and call that the "Law of Bit Conservation".

If you embrace the binary scientific notation way of thinking about fixed-point types, then thinking in terms of the type's fixed exponent becomes a more natural description.

vecFixedExponents =
     2     1     0    -1    -2    -3    -4    -5
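One way these attribute vectors might be computed is sketched below. It assumes the eight types are built with numerictype(0,3,F) for F = -2..5, as in the table, and it derives the integer length as WordLength - FractionLength, the same formula used in the question.

```matlab
% Fraction lengths of the eight 3-bit unsigned types in the table.
vecFractionLengths = -2:5;
numTypes = numel(vecFractionLengths);

vecWordLengths    = zeros(1, numTypes);
vecFixedExponents = zeros(1, numTypes);
for k = 1:numTypes
    nt = numerictype(0, 3, vecFractionLengths(k));
    vecWordLengths(k)    = nt.WordLength;
    vecFixedExponents(k) = nt.FixedExponent;   % equals -FractionLength
end

% Integer length is derived rather than stored: WordLength - FractionLength.
vecIntegerLengths = vecWordLengths - vecFractionLengths;

disp(vecWordLengths)
disp(vecIntegerLengths)
disp(vecFixedExponents)
```

Only the word length is constrained to be positive; the other quantities simply shift in step with the exponent.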

Conceptual padding bits

If you were really keen to see the binary-point display even when the integer length or fraction length is negative, what could you do? Just as when converting a number in decimal scientific notation to a traditional decimal-point display, you could jam in some padding slots on the left end or the right end as needed. For example, for 3e16, you could jam in 16 zeros after the 3 to get a decimal-point display. The same concept applies when taking binary scientific notation to binary-point display.

For example, if the fraction length is negative, then you need to jam in -1 * FractionLength bits on the least significant end. These pad bits are always zero.
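As a rough sketch of this padding, the following builds a binary-point display string for the aFi example from the question. The names storedBits, numPadBits, and paddedBits are illustrative, and the padded string is purely conceptual: only the 4 stored bits exist in memory.

```matlab
a = 5.6632765314184e+15;
aFi = fi(a, 1, 4);                      % FractionLength = -50

storedBits = bin(aFi);                  % the 4 bits that really exist
numPadBits = -aFi.FractionLength;       % number of conceptual zero pad bits

% Append the pad zeros on the least significant end, then the binary point.
paddedBits = [storedBits, repmat('0', 1, numPadBits), '.'];

% The padded display denotes the same value: stored integer * 2^numPadBits.
paddedValue = double(storedInteger(aFi)) * 2^numPadBits;

fprintf('stored bits: %s\n', storedBits);
fprintf('padded     : %s\n', paddedBits);
fprintf('values match: %d\n', paddedValue == double(aFi));
```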

Likewise, if the integer length is negative, then you need to jam in -1 * IntegerLength bits on the most significant end. If the value is unsigned, then these pad bits are always zero. If the value is two's complement and non-negative, then these pad bits are all zero. If the value is two's complement and negative, then these pad bits are all one. A simplified way to describe all these cases is to say that the pad bits are just a sign extension. For unsigned types, the conceptual sign bit is always implicitly zero.
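The sign-extension rule can be sketched the same way. The first example is bFi from the question; the second, a signed 3-bit type holding -3 * 2^-14, is a hypothetical value chosen here only to show the all-ones padding. The names examples, padChar, and paddedDisplays are illustrative.

```matlab
% Two types with IntegerLength = 3 - 14 = -11: one unsigned, one signed and negative.
examples = {fi(0.00037, 0, 3), fi(-3 * 2^-14, 1, 3, 14)};
paddedDisplays = cell(size(examples));

for k = 1:numel(examples)
    x = examples{k};
    integerLength = x.WordLength - x.FractionLength;
    numPadBits = -integerLength;        % conceptual pad bits on the MSB end
    storedBits = bin(x);

    % Pad bits are a sign extension: ones only for negative two's complement
    % values, zeros otherwise (unsigned types have an implicit zero sign bit).
    if x.Signed && storedBits(1) == '1'
        padChar = '1';
    else
        padChar = '0';
    end

    paddedDisplays{k} = ['.', repmat(padChar, 1, numPadBits), storedBits];
    fprintf('%g: stored %s, displayed %s\n', ...
        double(x), storedBits, paddedDisplays{k});
end
```

In both cases the display has 14 bits after the binary point, matching the fraction length, while only 3 bits are actually stored.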

The discussion above has shown how to reconcile the interconnected concepts of word length, fraction length, and integer length when one of the latter two is negative. The math is all consistent. The "Law of Bit Conservation" is satisfied in all the cases.

In summary,

Negative fraction lengths are great for maximized accuracy of very large numbers.

Negative integer lengths are great for maximized accuracy of very small numbers.

Binary-point thinking is more intuitive, but for a limited range of scalings.

Scientific notation is more general and gives the power of maximum accuracy for a limited number of bits.
