how to mask specific bits in a signed fixed point number?

I have been trying to emulate a simple multiplier with fixed-point inputs and output. I would like to mask the last 4 bits of the inputs and test the output. I tried using the bitand() function, but it only accepts integer values. What can I do in the case of fixed-point decimal values?
For example:
a = -2.345 and b = 0.2755 (with 16-bit fixed-point quantization)
c = a * b; (output also quantized to 16 bits)
I want to mask the last 4 bits of a and b and observe the output c. What function should I use?

Accepted Answer

Andy Bartlett on 11 Apr 2022
Edited: Andy Bartlett on 11 Apr 2022
Full-precision multiply
A full-precision multiplication of the 16 bit inputs can be done like so
format compact
format long
fp = fipref;
fp.NumericTypeDisplay = 'short';
% Set a and b using signed 16 bits
% assuming a and b are constants
% set best-precision scaling based on the values
isSigned = 1;
nBits = 16;
a = fi(-2.3451220703125,isSigned,nBits)
b = fi( 0.2763544921875,isSigned,nBits)
% Full precision multiply
yFullPrecProduct = a .* b
which outputs
a =
b =
yFullPrecProduct =
Notice that the output of 16 bits times 16 bits is 32 bits.
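In terms of stored integers, the same growth in word length can be sketched in C. This is only an illustration of the arithmetic, not a description of how fi objects work internally; the function names mul_full_precision and product_fraction_length are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: full-precision product of two signed 16-bit
 * stored integers. Every int16 x int16 product fits in 32 bits. */
int32_t mul_full_precision(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* Hypothetical helper: with binary-point scaling, the fraction length
 * of the full-precision product is the sum of the input fraction lengths. */
int product_fraction_length(int fl_a, int fl_b)
{
    return fl_a + fl_b;
}
```

For example, a stored integer of 8192 with 13 fraction bits (1.0) times a stored integer of 4096 with 12 fraction bits (1.0) gives 33554432 with 25 fraction bits, which is again 1.0.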
Reduced precision output from multiply
Reducing the size of a fixed-point multiplication's output raises a critical question: which bits to keep in the reduced-precision output and which bits to discard.
Depending on the answer, there will also be questions about how to handle overflow, rounding, or both. If the full-precision output is signed, you may also need to decide whether the reduced-precision output should remain signed or change to unsigned.
Here is an example of keeping the most significant 16 bits
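% Type of the full-precision product computed above
ntc = numerictype(yFullPrecProduct);
nBitsY1 = 16;
nPrecisionBitsToDrop = 16;
fmSatFloor = fimath('RoundingMethod', 'Floor', ...
    'OverflowAction', 'Saturate');
% numerictype takes signedness, word length, fraction length
nty1 = numerictype( ...
    ntc.SignednessBool, ...
    nBitsY1, ...
    ntc.FractionLength - nPrecisionBitsToDrop);
y1 = fi(yFullPrecProduct,nty1,fmSatFloor);
y1 = removefimath(y1)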
which outputs
y1 =
Here is an example keeping the least significant 16 bits
% Type of the full-precision product computed above
ntc = numerictype(yFullPrecProduct);
nBitsY2 = 16;
nPrecisionBitsToDrop2 = 0;
fmSatFloor = fimath('RoundingMethod', 'Floor', ...
    'OverflowAction', 'Saturate');
fmWrapFloor = fimath('RoundingMethod', 'Floor', ...
    'OverflowAction', 'Wrap');
% numerictype takes signedness, word length, fraction length
nty2 = numerictype( ...
    ntc.SignednessBool, ...
    nBitsY2, ...
    ntc.FractionLength - nPrecisionBitsToDrop2);
y2sat = fi(yFullPrecProduct,nty2,fmSatFloor);
y2sat = removefimath(y2sat)
y2wrap = fi(yFullPrecProduct,nty2,fmWrapFloor);
y2wrap = removefimath(y2wrap)
which outputs
y2sat =
y2wrap =
Notice that two different outputs were computed.
One handles overflow by saturating. In this case, it saturated to the most negative representable value of the final output type.
The other handles overflow by wrapping, which means throwing away the dropped most significant bits and always keeping the less significant bits verbatim.
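Because the fraction length is unchanged in this case, the conversion amounts to narrowing the 32-bit stored integer to 16 bits. A rough C sketch of the two overflow behaviors follows; the function names narrow_saturate and narrow_wrap are hypothetical, and the code only mirrors the idea rather than reproducing what fi does internally.

```c
#include <stdint.h>

/* Saturate: clamp to the representable range of the 16-bit type. */
int16_t narrow_saturate(int32_t p)
{
    if (p > INT16_MAX) return INT16_MAX;
    if (p < INT16_MIN) return INT16_MIN;
    return (int16_t)p;
}

/* Wrap: keep the low 16 bits verbatim and reinterpret them as a
 * two's-complement signed value. */
int16_t narrow_wrap(int32_t p)
{
    uint16_t low = (uint16_t)((uint32_t)p & 0xFFFFu);
    return (low >= 0x8000u) ? (int16_t)((int32_t)low - 65536)
                            : (int16_t)low;
}
```

With an input of -100000, the saturating version returns -32768, while the wrapping version returns 31072. The wrapped result even has the opposite sign, which is why the choice of overflow action matters.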
Masking bits
Masking bits to force certain bits to be zero and/or certain bits to be ones can be done in C, MATLAB, and Simulink using bit-wise AND and bit-wise OR. Functions or Simulink blocks for bit set and bit clear can also be used.
In MATLAB, the functions bitand and bitor are available. When using these with fixed-point fi objects, both arguments must have identical types, so that requires a little bit of care.
This function provides an example of using bitand to force the n least significant bits of the input to be zero.
function y = bitClearLSB(u,nBits)
%bitClearLSB clear the n least significant bits of input
% Usage:
% y = bitClearLSB(u,nBits)
% Inputs
% u is any fixed-point or integer variable
% nBits a non-negative integer value (defaults to 1)
% Copyright 2022 The MathWorks, Inc.
if nargin < 2
    nBits = 1;
end
assert( ...
    numel(nBits)==1 || isequal(size(nBits),size(u)),...
    'nBits must be scalar or same size as u.')
assert( ...
    all((nBits >= 0) & (nBits == floor(nBits)) & isfinite(nBits)),...
    'nBits must be a non-negative integer value.')
% Built-in integers will be handled using equivalent fi object
u1 = castIntToFi(u);
assert(isfi(u1) && isfixed(u1), 'u must be integer or fixed-point.')
ntu1 = numerictype(u1);
% Create raw bit mask with all ones in bit positions to keep as is
% and all zeros in bit positions to clear
% Example for word length of 4 bits
% nBits rawBitMask
% 0 1111
% 1 1110
% 2 1100
% 3 1000
% 4 0000
wl = ntu1.WordLength;
ntRawBits = numerictype(0,wl,0);
rawBitMask = repmat( upperbound(ntRawBits), size(nBits) );
rawBitMask(:) = bitsll(rawBitMask,nBits);
% bitand for fi requires both types to be identical
% including fimath properties
% so reinterpret bitMask
% then set fimath
bitMask = reinterpretcast(rawBitMask,ntu1);
bitMask = setfimath(bitMask,fimath(u1));
y1 = bitand(u1,bitMask);
% if built-in integer cast back to that type
y = cast(y1,'like',u);
Here is an example of applying that to a variable.
format compact
format long
fp = fipref;
fp.NumericTypeDisplay = 'short';
% Set b using signed 16 bits
% assuming b is a constant
% set best-precision scaling based on the values
isSigned = 1;
nBits = 16;
b = fi( 0.2763544921875,isSigned,nBits)
% Clear 4 LSBs of b
nBitsClear = uint8(4);
b1 = bitClearLSB(b,nBitsClear)
which outputs
b =
b1 =
RoundingMethod: Nearest
OverflowAction: Saturate
ProductMode: FullPrecision
SumMode: FullPrecision
The generated C code for the bit clearing operation will be simple like the following
void myFunc(int16_T a, unsigned char nBitsClear, int16_T *y1)
{
  int16_T tmp_bit_mask;
  /* all ones shifted left, so the nBitsClear lowest bits are zero */
  tmp_bit_mask = 65535 << nBitsClear;
  *y1 = a & tmp_bit_mask;
}
Hopefully, this example gives you enough of an idea to craft whatever bit-masking operation you are seeking.
Combining that with the multiplication examples above should allow you to figure out a solution to your overall problem.
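As a rough illustration of combining the two steps on stored integers, here is a minimal C sketch. The names u16_to_s16, clear_lsbs16, and masked_product are hypothetical, and the sketch assumes the number of cleared bits is between 0 and 16. How to reduce the 32-bit product back to 16 bits is still the design choice discussed in the multiplication sections.

```c
#include <stdint.h>

/* Reinterpret a 16-bit pattern as a two's-complement signed value
 * without relying on implementation-defined conversions. */
int16_t u16_to_s16(uint16_t bits)
{
    return (bits >= 0x8000u) ? (int16_t)((int32_t)bits - 65536)
                             : (int16_t)bits;
}

/* Clear the n least significant bits of a 16-bit stored integer.
 * Assumes 0 <= n <= 16. */
int16_t clear_lsbs16(int16_t x, unsigned n)
{
    uint16_t mask = (uint16_t)((uint32_t)0xFFFFu << n);
    return u16_to_s16((uint16_t)((uint16_t)x & mask));
}

/* Mask both inputs, then form the full-precision 32-bit product. */
int32_t masked_product(int16_t a, int16_t b, unsigned n)
{
    return (int32_t)clear_lsbs16(a, n) * (int32_t)clear_lsbs16(b, n);
}
```

Note that clearing low bits of a two's-complement value rounds toward negative infinity: a stored integer of -19211 with 4 bits cleared becomes -19216, not -19200.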
Consider casting
Since your high level goal involved multiplication, bit masking might not be the simplest way to achieve your goal. If your goal is to get rid of a certain number of most significant bits or least significant bits, you might want to consider using casting.
Consider the example given above of keeping the most significant 16 bits of a variable (which happened to be a multiplication product). That dropped the least significant 16 bits of the input. Mathematically, that is equivalent to keeping the output at 32 bits but using masking such that the least significant 16 bits are all zeros.
Downcasting to 16 bits can be easier to think about and model than doing the bit masking. A big benefit is that subsequent operations can be more efficient. For example, bit masking then doing a 32 bit by 32 bit multiplication producing a 64 bit ideal product is less efficient than downcasting to 16 bits, then doing a 16 bit by 16 bit multiplication that produces a 32 bit ideal product.
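The equivalence between downcasting and masking can be checked on stored integers with a small C sketch. The names mask_low16, downcast_top16, and same_value are hypothetical, and floor rounding is assumed for the downcast, as in the earlier example.

```c
#include <stdint.h>

/* Masking: zero the low 16 bits of a 32-bit stored integer. */
int32_t mask_low16(int32_t p)
{
    uint32_t bits = (uint32_t)p & 0xFFFF0000u;
    /* Reinterpret as two's complement without implementation-defined casts */
    return (bits >= 0x80000000u) ? (int32_t)((int64_t)bits - 4294967296LL)
                                 : (int32_t)bits;
}

/* Downcasting: keep the top 16 bits using floor rounding. */
int16_t downcast_top16(int32_t p)
{
    int64_t q = (int64_t)p / 65536;           /* truncates toward zero */
    if (p < 0 && ((int64_t)p % 65536) != 0) {
        q -= 1;                               /* adjust truncation to floor */
    }
    return (int16_t)q;
}

/* Returns 1 when both approaches represent the same real-world value.
 * The 16-bit result has 16 fewer fraction bits, so scale it by 2^16. */
int same_value(int32_t p)
{
    return (int64_t)downcast_top16(p) * 65536 == (int64_t)mask_low16(p);
}
```

The 16-bit result carries the same information in half the storage, which is what makes the subsequent 16 bit by 16 bit multiplication cheaper.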
