How to process a large binary file with set skipping patterns
We need to process very large binary files (around 500 GB) for plotting. However, reading the file and skipping known patterns takes a very long time. Currently it takes about 10 minutes to process 800 MB, including multiple 265 MB files.
The code currently in use is below; I would appreciate any suggestions for improving its performance.
BTW, after testing on smaller files, we eventually plan to read in large blocks (at least 1 GB) by increasing the NsperBatch/Kavg parameters.
Filepath = 'xxxxx';
NsperBatch = 2048;
header_size = 123;
Kavg=1200; % Average every K set of values, K=1200 in this case
pattern_to_skip = int16([1024 0 2240 -24500]);
%
%%Header Information
load Header.mat
%
%%Read RF.bin file under the specified path
filename = 'rf.bin';
%
%%Run the data file to identify block to skip and swap types
pattern_to_skip = typecast( swapbytes(pattern_to_skip), 'uint8'); % Convert data types as defined storing sequence - big endian
PL = length(pattern_to_skip); % Length for the skip pattern
fid = fopen(filename,'r');
bytes = reshape( fread(fid, inf, '*uint8'), 1, []); % Original data row vector
fclose(fid);
%
orig_num_bytes = length(bytes);
skiplocs = strfind(bytes, pattern_to_skip); % Find the skip pattern and delete bytes in data file
for idx = fliplr(skiplocs)
bytes(idx:idx+PL-1) = []; % Delete bytes from the original data row vector
end
%
postskip_num_bytes = length(bytes);
fprintf('%d groups of [1024 0 2240 -24500] were skipped\n', (orig_num_bytes - postskip_num_bytes) / PL ); % Number of pattern skipped
%
fileID=fopen('post_rf.bin','w'); % Save the skip-pattern file
fwrite(fileID, bytes);
fclose(fileID);
%
%%Determine RF data size based on defined batches (NsperBatch=2048 in this case)
filename = 'post_rf.bin';
fid2 = fopen(filename,'r');
magSpectrumMAT_Avg=[];
ix = 0;
s = dir(filename);
while ~feof(fid2)
bytes_in = ftell(fid2);
fprintf('Processed %d bytes out of %d; %.2f%%\n', bytes_in, s.bytes, bytes_in / s.bytes * 100);
magSpectrum=0;
for k=1:Kavg
data=fread(fid2,NsperBatch*2,'int16');
dataIQ = complex(data(1:2:end,:), data(2:2:end,:));
num_dataIQ = length(dataIQ);
dataIQ_GnOff = (Gain * dataIQ) + (Offset+Offset*1i);
dataIQ_GnOff = dataIQ_GnOff * db2mag(Reference_L);
%
target_num_dataIQ = NsperBatch;
if num_dataIQ ~= target_num_dataIQ
fprintf('Note: complex data is not a multiple of %d samples long, padding\n', NsperBatch);
dataIQ_GnOff(target_num_dataIQ) = 0;
end
%
mag_dataIQ_GnOff = abs(fftshift( fft( reshape(dataIQ_GnOff, NsperBatch, []) ) ) / sqrt(NsperBatch)).^2;
magSpectrum = magSpectrum + mag_dataIQ_GnOff;
end
%
ix = ix + 1;
magSpectrum_avg = magSpectrum/Kavg;
magSpectrumMAT_Avg = [magSpectrumMAT_Avg magSpectrum_avg];
end
%
magSpectrumMAT_dB=pow2db(magSpectrumMAT_Avg); % Convert magnitude to dB after taking Average
%
save('rf_magSpectrumMAT_dB.mat','magSpectrumMAT_dB')
%
fclose all;
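The script above loads the whole file with fread(fid, inf, ...), which will not scale to a 500 GB input. One possible alternative is to filter the file in fixed-size chunks while holding back a short tail so that a pattern straddling a chunk boundary is still found. The following is only a sketch under stated assumptions: the function name strip_pattern_chunked and its arguments are hypothetical (not from the post), the pattern is assumed not to overlap itself, and implicit expansion (R2016b or later) is assumed.

```matlab
function nSkipped = strip_pattern_chunked(inFile, outFile, pattern, chunkBytes)
% Copy inFile to outFile, dropping every occurrence of the uint8 row
% vector PATTERN, while reading at most chunkBytes bytes per fread call.
% Hypothetical helper; assumes PATTERN cannot overlap itself.
    PL  = numel(pattern);
    fin = fopen(inFile, 'r');
    assert(fin > 0, 'Could not open input file');
    closeIn = onCleanup(@() fclose(fin));
    fout = fopen(outFile, 'w');
    assert(fout > 0, 'Could not open output file');
    closeOut = onCleanup(@() fclose(fout));

    carry    = zeros(1, 0, 'uint8');   % raw tail held back from the previous chunk
    nSkipped = 0;
    while true
        chunk = fread(fin, [1 chunkBytes], '*uint8');
        if isempty(chunk), break; end
        buf  = [carry chunk];
        locs = strfind(buf, pattern);        % start index of each match in buf
        keep = true(1, numel(buf));
        keep(locs(:) + (0:PL-1)) = false;    % mark every matched byte for removal

        % Hold back up to PL-1 trailing bytes that follow the last match,
        % in case a pattern straddles the chunk boundary.
        tailStart = max(numel(buf) - PL + 2, 1);
        if ~isempty(locs)
            tailStart = max(tailStart, locs(end) + PL);
        end
        head = buf(1:tailStart-1);
        fwrite(fout, head(keep(1:tailStart-1)), 'uint8');
        carry    = buf(tailStart:end);
        nSkipped = nSkipped + numel(locs);
    end
    fwrite(fout, carry, 'uint8');            % flush whatever is still held back
end
```

With the variables from the script above, a call might look like nSkipped = strip_pattern_chunked('rf.bin', 'post_rf.bin', pattern_to_skip, 64*2^20); where pattern_to_skip is the uint8 vector produced by the typecast/swapbytes line. The chunk size is a tuning parameter, and peak memory stays near one chunk regardless of file size.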
2 Comments
Walter Roberson
on 1 Mar 2018
Your code
for idx = fliplr(skiplocs)
bytes(idx:idx+PL-1) = []; % Delete bytes from the original data row vector
end
is less efficient than it needs to be. Instead use
bytes_to_delete = cell2mat( arrayfun(@(IDX) IDX:IDX+PL-1, fliplr(skiplocs), 'uniform', 0) );
bytes(bytes_to_delete) = [];
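A logical keep-mask is another way to do the same deletion in a single pass, without building the index list through arrayfun and cell2mat. This is a minimal sketch on toy data rather than a tested replacement for the code above; the sample byte values are made up, and the indexing line assumes implicit expansion (R2016b or later).

```matlab
% Pattern built the same way as in the question (big-endian int16 values).
pattern = typecast(swapbytes(int16([1024 0 2240 -24500])), 'uint8');
PL      = numel(pattern);

% Toy data standing in for the file contents (hypothetical values).
bytes = [uint8([1 2]) pattern uint8([3 4 5]) pattern uint8(6)];

skiplocs = strfind(bytes, pattern);       % start index of each match
keep = true(size(bytes));
keep(skiplocs(:) + (0:PL-1)) = false;     % one row of indices per match
bytes = bytes(keep);                      % drop all matched bytes at once

fprintf('%d groups were skipped\n', numel(skiplocs));
```

Because the mask is built from the match positions found in the original data, the number of skipped groups is simply numel(skiplocs), so the before/after length comparison is no longer needed.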
Answers (0)