How to store sequence of bits as a bit stream and use the least possible memory ?

15 views (last 30 days)
I am implementing a huffman encoding procedure which for example encodes the decimal number -5 as '100010'. Until now I have implemented some code which correctly produces the encoded sequence. The encoded sequence is stored as a string. However, I want the bit sequence to be stored as a bit stream and use only the neccesary amount of space for storage. For example, I want the encoded sequence for -5, which is '100010', to take 6 bits of memory. Is that possible ?
  1 Comment
Guillaume
Guillaume on 10 Jan 2020
Note that in all modern processors and certainly all that run matlab (x86/x64) the minimum unit of memory is 8 bits and everything is in multiple of 8.

Sign in to comment.

Answers (3)

Matt J
Matt J on 10 Jan 2020
Edited: Matt J on 10 Jan 2020
You could chain all your strings together as a logical column vector and use bwpack, see
Example:
>> A='10001010001010001011010100011000100010100010';
>> B=bwpack((A-'0').');
>> whos A B
Name Size Bytes Class Attributes
A 1x44 88 char
B 2x1 8 uint32

Stephane Dauvillier
Stephane Dauvillier on 10 Jan 2020
You can use the functions:
  1. dec2bin to tranform decimal number into binary,
  2. string to transform the output of the dec2bin from char to string
  3. regexprep to replace the first 0 by empty
  4. strjoin to join everything
For instance:
strjoin(regexprep(string(dec2bin([66,8,19,7])),"^0+","","once"),"")

Guillaume
Guillaume on 10 Jan 2020
A bit similar to Matt's suggestion, but may be slightly more memory efficient since bwpack uses multiples of 32-bit, is to:
  • split your stream into block of 8-bits
  • pad the last block to 8 bits if shorter than 8.
  • convert each block to uint8, the smallest unit of storage on x86/x64
eg.
bitstream = char(randi(double('01'), 1, 19)) %random stream of 19 0-1 characters
encoded = bitstream - '0'; %convert to binary vector. Uses 4 times as much memory as bitstream
encoded = [encoded, zeros(1, mod(-numel(bitstream), 8))]; %pad with zeros at the to multiple of 8
encoded = uint8(sum(reshape(encoded, 8, []) .* 2.^(7:-1:0)')) %convert each sequence of 8 bits into 8-bit integer. 1st bit is MSB
You can easily convert back to the (padded) bitstream:
padded_origstream = reshape(dec2bin(encoded, 8)', 1, [])

Categories

Find more on Large Files and Big Data in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!