Shuffle method on custom datastore written for a single binary file
I am writing a custom datastore and am seeking some assistance. My datasets consist of stacks of 2D images (frames) stored sequentially in a single binary file. While it's very straightforward to read in the binary stream using fread, each full dataset can easily be on the order of 50+ GB, making it infeasible to load everything at once on the hardware I have available. This was my original motivation for exploring the use of a datastore.
In addition to the need for managing out-of-memory data, I also would like to partition the data into chunks where each chunk contains a random collection of frames from this binary file.  If possible, I would like to use the shuffle method for the datastore superclass to accomplish this, as this seems to be the "proper" approach (although I'm very open to alternatives).
The problem I am currently having is that the default datastore shuffle method appears only to randomize the order of files in a datastore directory. Since I only have one (very large) binary file, it doesn't seem to "shuffle" anything at all: running readall on the shuffled datastore returns exactly the same data as running it on the original datastore. What I need instead is for it to "shuffle" the frames within the binary file. Presumably, if I were to save each frame as an individual image file on disk, then I could get this to work using imageDatastore or fileDatastore. However, I would then have to go through all my files and save them to disk again as individual files, which seems rather silly.
I have written code to load a chunk of the data manually by jumping around the file using fseek.  However, then I lose access to the datastore object as well as its built-in functionality.  So I thought I would throw this question out there to see if anyone could offer some help.  
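For reference, the manual fseek approach described above might look roughly like the sketch below. The function name readRandomFrames, its arguments, and the assumption that frames are stored as uint8 in column-major order are all hypothetical, since the question does not specify the file layout.

```matlab
function frames = readRandomFrames(fileName, frameHeight, frameWidth, numFramesInFile, chunkSize)
% Read chunkSize randomly chosen frames from a raw binary file.
% Assumption: each frame is frameHeight-by-frameWidth uint8, column-major.
    bytesPerFrame = frameHeight * frameWidth;            % 1 byte per uint8 pixel
    picks = sort(randperm(numFramesInFile, chunkSize));  % random subset, visited in file order
    frames = zeros(frameHeight, frameWidth, chunkSize, 'uint8');

    fid = fopen(fileName, 'r');
    assert(fid ~= -1, 'Could not open %s.', fileName);
    closeFile = onCleanup(@() fclose(fid));              % close the file even on error

    for k = 1:chunkSize
        % Jump to the first byte of the selected frame, then read one frame.
        fseek(fid, (picks(k) - 1) * bytesPerFrame, 'bof');
        frames(:, :, k) = fread(fid, [frameHeight, frameWidth], '*uint8');
    end
end
```

Sorting the picked indices keeps the file accesses moving forward through the file; the chunk is still a random subset of frames.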
Answers (1)
Sanjana on 6 Oct 2024
Hi,
You can implement a custom datastore in MATLAB to shuffle frames within a single large binary file while maintaining the benefits of a datastore.
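Custom Datastore class: Create a custom datastore class that extends the matlab.io.Datastore class. This class can be implemented to read and shuffle frames within a binary file. Note that matlab.io.Datastore requires the subclass to implement the read, hasdata, reset, and progress methods.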
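Implementing Custom Read and Shuffle methods:
- Read Method: Implement the read method to read frames from randomly shuffled positions in the binary file. Use fseek to move the file position pointer according to the shuffled frame order.
- Shuffle Method: Implement a custom shuffle method that generates a random permutation of frame indices. This method updates the order in which frames are accessed during reading, without altering the binary file.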
Example Custom Datastore class definition:
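classdef CustomFrameDatastore < matlab.io.Datastore
    % Reads fixed-size frames from a single binary file in a shuffled order.
    properties
        FileName      % path to the binary file
        FrameSize     % number of uint8 elements (bytes) per frame
        TotalFrames   % number of frames in the file
        CurrentIndex  % position in FrameOrder of the next frame to read
        FrameOrder    % permutation of 1:TotalFrames giving the read order
    end
    methods
        function ds = CustomFrameDatastore(fileName, frameSize, totalFrames)
            ds.FileName = fileName;
            ds.FrameSize = frameSize;
            ds.TotalFrames = totalFrames;
            ds.CurrentIndex = 1;
            ds.FrameOrder = randperm(totalFrames);
        end
        function [data, info] = read(ds)
            if ~hasdata(ds)
                error('No more data to read.');
            end
            fid = fopen(ds.FileName, 'r');
            if fid == -1
                error('Could not open %s.', ds.FileName);
            end
            closeFile = onCleanup(@() fclose(fid)); % close the file even on error
            frameIndex = ds.FrameOrder(ds.CurrentIndex);
            % Jump to the first byte of the selected frame.
            fseek(fid, (frameIndex-1)*ds.FrameSize, 'bof');
            % '*uint8' keeps the data as uint8 instead of converting to double.
            data = fread(fid, ds.FrameSize, '*uint8');
            info.FrameIndex = frameIndex;
            ds.CurrentIndex = ds.CurrentIndex + 1;
        end
        function reset(ds)
            ds.CurrentIndex = 1;
        end
        function tf = hasdata(ds)
            tf = ds.CurrentIndex <= ds.TotalFrames;
        end
        function shuffle(ds)
            % The datastore is a handle object, so this reorders it in place.
            % Only the read order changes; the binary file is not modified.
            ds.FrameOrder = randperm(ds.TotalFrames);
            ds.CurrentIndex = 1;
        end
    end
    methods (Hidden = true)
        function frac = progress(ds)
            % Fraction of frames read so far (required by matlab.io.Datastore).
            frac = (ds.CurrentIndex - 1) / ds.TotalFrames;
        end
    end
end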
Here is the example code to use the above custom datastore:
% Initialize datastore
frameSize = 1024 * 1024; % Example frame size in bytes (e.g. one 1024-by-1024 uint8 frame)
totalFrames = 50000; % Example total number of frames
ds = CustomFrameDatastore('largefile.bin', frameSize, totalFrames);
% Shuffle and read frames
ds.shuffle();
while hasdata(ds)
    frameData = ds.read();
    % Process frameData
end
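To get the chunks of random frames asked about in the question, the shuffled frames can be collected in groups and reshaped back into 2D images. The helper below is a minimal sketch: readChunk, frameHeight, frameWidth, and chunkSize are hypothetical names, and it assumes each read returns one frame of frameHeight*frameWidth uint8 values stored in column-major order.

```matlab
function chunk = readChunk(ds, frameHeight, frameWidth, chunkSize)
% Read up to chunkSize frames from ds and stack them as a 3-D uint8 array.
% Assumption: each call to read(ds) returns one frame as a vector of
% frameHeight*frameWidth values in column-major order.
    chunk = zeros(frameHeight, frameWidth, chunkSize, 'uint8');
    n = 0;
    while n < chunkSize && hasdata(ds)
        n = n + 1;
        chunk(:, :, n) = reshape(read(ds), frameHeight, frameWidth);
    end
    chunk = chunk(:, :, 1:n);   % the final chunk may hold fewer frames
end
```

A possible way to call it, with the total frame count derived from the file size rather than hard-coded (the frame dimensions and chunk size here are example values only):

```matlab
frameHeight = 1024;
frameWidth  = 1024;
frameSize   = frameHeight * frameWidth;            % bytes per uint8 frame
fileInfo    = dir('largefile.bin');
totalFrames = floor(fileInfo.bytes / frameSize);   % whole frames in the file

ds = CustomFrameDatastore('largefile.bin', frameSize, totalFrames);
while hasdata(ds)
    chunk = readChunk(ds, frameHeight, frameWidth, 64);
    % Process chunk
end
```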
I hope this helps!
