
Appending to a very large file

Stefan Oline on 5 Jan 2021
Commented: Stefan Oline on 8 Mar 2021
I'm having trouble writing very large files to disk. I'm appending 64 smaller files (each ~1 GB) into a single giant matrix. I expect the file to be ~64 GB, and I'm running into an "Out of memory" problem during processing. I'm wondering if there's a more efficient way to do this without needing to load all of the smaller files into memory before writing one monster file to disk. Is there a way for me to load each one at a time and append that to the file, then clear memory and load the next?
Current code looks like this:
close all
% Loop over all 64 channels and import each one
for i = 1:64
    fprintf('i = %d\n', i);
    [Samples, Header] = Nlx2MatCSC(['CSC' num2str(i,'%02.f') '.ncs'],...
        [0 0 0 0 1], 1, 1, [] );
    %temp_1 = Samples';
    % Flatten the imported samples into a single row for this channel
    temp_2 = reshape(Samples,[],1)';
    if exist('signal_mat', 'var')
        signal_mat = vertcat(signal_mat,temp_2);
    else
        signal_mat = temp_2;
    end
    clear Samples Header temp_2
end
clear i
% Demedian the data (median across channels, one value per sample)
fprintf('Demedian data\n');
signal_med = median(signal_mat);
signal_mat_demed = signal_mat - signal_med;
%% Write to file for KS2
fprintf('Write data\n');
fid = fopen('myNewFile.dat', 'w');
fwrite(fid,signal_mat, 'int16');
fclose(fid);
fid = fopen('myNewFile_demed.dat', 'w');
fwrite(fid,signal_mat_demed, 'int16');
fclose(fid);

Accepted Answer

Jan on 7 Jan 2021
This line makes the problem worse:
signal_mat = vertcat(signal_mat,temp_2);
For example, in the last iteration you concatenate a 63 GB array with a 1 GB array, and the result is copied into a new 64 GB array. The old and the new array exist at the same time, so this requires 63+64 GB of RAM.
Pre-allocation would avoid this problem. In your case it could work with 64 + X GB RAM, where X might be 8 or 20. But even then this is a huge signal. How much RAM do you have?
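A minimal sketch of what pre-allocation could look like. The sizes are shrunk so it runs anywhere, and load_channel is a hypothetical stand-in for the Nlx2MatCSC call, not part of the original code. It assumes the number of samples per channel is known before the loop starts, and it stores the data as int16 (2 bytes per sample instead of 8 for double), which is an assumption about what the data allows.

```matlab
% Sketch: allocate the full matrix once, then fill one row per channel.
n_channels = 4;          % 64 in the question
n_samples  = 512 * 4;    % assumed known up front, e.g. from the first channel

% Hypothetical stand-in for Nlx2MatCSC: returns a 512-by-N block of samples.
load_channel = @(ch) ch * ones(512, n_samples / 512);

% One allocation of the final size; no growing, no repeated copying.
signal_mat = zeros(n_channels, n_samples, 'int16');

for ch = 1:n_channels
    Samples = load_channel(ch);
    % Indexed assignment writes into the existing array and converts
    % the double samples to the int16 class of signal_mat.
    signal_mat(ch, :) = reshape(Samples, 1, []);
    clear Samples
end
```

With this pattern the peak memory is roughly the final matrix plus one channel, instead of two nearly full-size copies.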
Stefan Oline on 7 Jan 2021
Preallocating the matrix was a huge help, thanks for pointing that out. I've got 64 GB of RAM. For the very large files, it sounds like I'll still have to use datastore.
Stefan Oline on 8 Mar 2021
Hello, if I could ask a follow up question, I'm having trouble writing the .dat file since it's so large (~64GB). I attempted to follow the method here:
I'm having trouble doing two things.
  1. Denoising the 64 channels by finding the median across all 64 channels and subtracting that from each signal.
  2. Writing the output (a 64 x 407297536 matrix) to a .dat which will end up being ~64GB.
Is there an easy way to demedian the signals, and then to write them to disk as a giant .dat?
Thanks very much.
%% User inputs
channels = 1:64;
demed_flag = 1;
store_flag = 1;
% Choose a directory to store the files
outDir = 'H:\Falkner_lab\Ephys\2020.07.16_Mouse2357\2020.08.28\tall_eg';
writeDir = 'H:\Falkner_lab\Ephys\2020.07.16_Mouse2357\2020.08.28\tall_eg\write';
%% Setup
n_channels = length(channels);
% Check how many samples are in a single channel
[Sample_check, Header] = Nlx2MatCSC(['CSC' num2str(channels(1),'%02.f') '.ncs'],...
[0 0 0 0 1], 1, 1, [] );
temp_a = reshape(Sample_check,[],1)';
n_samples = length(temp_a);
clear Header Sample_check temp_a
%% Import .ncs data files from the channels list to individual .mat files
fprintf('*Importing data*\n')
for i=1:n_channels
    disp(['Importing channel ' num2str(i) ' of ' num2str(n_channels) ' (' ...
        num2str(i/n_channels*100,2) '%)'])
    %fprintf('i = %.0f\n', i )
    [Samples, Header] = Nlx2MatCSC(['CSC' num2str(channels(i),'%02.f') '.ncs'],...
        [0 0 0 0 1], 1, 1, [] );
    data = reshape(Samples,[],1)';
    % Choose a file name - ensure these progress in order
    fname = fullfile(outDir, sprintf('data_%05d.mat', channels(i)));
    % Save the data and increment counters
    save(fname, 'data', '-v7.3');
    clear Samples Header data fname
end
clear i
fprintf('*Importing data complete*\n');
%% Create a datastore from the files
% Read the data back in as a tall array. First create a datastore ...
fprintf('*Creating a datastore*\n');
ds = fileDatastore(fullfile(outDir, '*.mat'), ...
'ReadFcn', @(fname) getfield(load(fname), 'data'), ...
'UniformRead', true);
fprintf('*Creating a datastore complete*\n');
% ... and then a tall array
fprintf('*Storing the datastore in a tall array*\n');
tdata = tall(ds);
fprintf('*Storing the datastore in a tall array complete*\n');
%% Demedian the signals
% ???
%% Write to file for KS2
if store_flag == 1
    fprintf('*Writing data*\n');
    fid = fopen('myNewFile_zeros.dat', 'w');
    fwrite(fid,tdata, 'int16');
    fclose(fid);
    fprintf('*Writing data complete*\n');
end
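
One possible way to handle both the demedian step and the write without holding the full matrix in memory is to work in time chunks. This is only a sketch, not a tested solution for the real recording. It swaps the tall array for matfile objects, which can read part of a variable from a -v7.3 file, and it assumes per-channel files named as in the import loop above, each holding a row vector called data. The synthetic data, the temporary folder, the output name demed.dat, and chunk_len are all made up for illustration.

```matlab
% Sketch: demedian and write chunk by chunk; only one chunk is in memory.
n_channels = 4;          % 64 in the question
n_samples  = 10000;      % 407297536 in the question
chunk_len  = 3000;       % samples per chunk; hypothetical, tune to available RAM

% Build small synthetic per-channel files so the sketch runs on its own.
outDir = tempname;
mkdir(outDir);
for ch = 1:n_channels
    data = int16(ch * 10 + (1:n_samples));   % 1-by-n_samples row vector
    save(fullfile(outDir, sprintf('data_%05d.mat', ch)), 'data', '-v7.3');
end
clear data

% One matfile handle per channel; nothing is loaded yet.
m = cell(1, n_channels);
for ch = 1:n_channels
    m{ch} = matfile(fullfile(outDir, sprintf('data_%05d.mat', ch)));
end

outFile = fullfile(outDir, 'demed.dat');
fid = fopen(outFile, 'w');
assert(fid ~= -1, 'Could not open output file');
for first = 1:chunk_len:n_samples
    last  = min(first + chunk_len - 1, n_samples);
    chunk = zeros(n_channels, last - first + 1);   % double, for the arithmetic
    for ch = 1:n_channels
        mf = m{ch};
        chunk(ch, :) = mf.data(1, first:last);     % partial read from disk
    end
    % Subtract the median across channels from every sample (column).
    chunk = chunk - median(chunk, 1);
    % fwrite is column-major, so each sample's channels are written together
    % and successive chunks simply continue the file.
    fwrite(fid, chunk, 'int16');
end
fclose(fid);
```

Because each chunk is written in column-major order, the resulting file should have the same byte layout as writing the full 64 x n_samples matrix with a single fwrite call. Note that fwrite converts the demedianed values to int16 on output.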
