Creating multiple equally sized matrices from a single numerical cell

I have a very large text file composed of, in essence, one row of numbers. Once I have reorganized the file into a matrix of, for example, 500 x 10, I wish to create a new matrix every 10 rows and save each one under its own name. A major problem I've experienced with my text file is that it's too big for MATLAB: an out-of-memory error appears. This is why I need to separate each matrix into its own set of data. I have already turned a row of 1049600 numbers into a matrix of 1025 x 1024, but now the file holds 50 of these sets (1049600 x 50 values) and I need to create 50 matrices of 1025 x 1024.
fid = fopen('test0001.asc');
Cell = textscan( fid, '%d', 'delimiter', ';');
fclose(fid);
Data = cell2mat(Cell);
N = 1024;
Finish = reshape(Data, N, [])';
The above is the code I had for the smaller files.
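The reshape-then-transpose step matters because MATLAB's reshape fills the result column by column. A small sketch with 12 made-up values and N = 4 (stand-ins for the real 1049600 values and N = 1024) shows the difference between the question's approach and a plain reshape:

```matlab
% Small stand-in for the real data: 12 values instead of 1049600
Data = int32(1:12).';   % int32 column vector, like cell2mat(textscan(fid,'%d',...))
N = 4;                  % values per row (1024 in the real file)

% reshape fills column by column, so reshape to N rows first and then
% transpose: each consecutive run of N values becomes one row
Finish = reshape(Data, N, []).';

% Without the transpose, consecutive values run down the columns instead
Wrong = reshape(Data, [], N);

disp(Finish)   % rows: 1 2 3 4 / 5 6 7 8 / 9 10 11 12
disp(Wrong)    % rows: 1 4 7 10 / 2 5 8 11 / 3 6 9 12
```

For real numeric data the question's `'` and the `.'` used here give the same result; `.'` is simply the non-conjugating transpose.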
I considered organizing the data into 51250 rows of 1024 and then writing a while ~feof loop, but this seems like it would require too much code and would thus be too slow. My thought was to have, say:
F1 = Data(1:1025, :);
F2 = Data(1026:2050, :);
.....
Any thoughts at all would be much appreciated.

 Accepted Answer

Firstly, the idea of generating lots of variables is popular with beginners, but it really should be avoided:
Also note that the MATLAB documentation is really good. It is readable and has articles on lots of topics, such as this one, which gives a good, robust method for reading a large file into MATLAB:
The core idea of that code is to call textscan in a loop, use textscan's N input (the repeat count) to specify how much data to read on each call, and save each block of data into a cell array. N simply defines how many times the format is applied when reading the file.
You should be able to work it out from the examples in the documentation.
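A minimal sketch of that idea (my own, not copied from the documentation), assuming a small throwaway file created with tempname that holds three blocks of 2 x 4 values instead of 50 blocks of 1025 x 1024. Each block goes into one cell of a cell array rather than into separately named variables such as F1, F2, ...:

```matlab
% Create a small fake file: 24 values separated by ';' (3 blocks of 2 x 4)
fname = [tempname '.txt'];   % hypothetical throwaway file
fid = fopen(fname, 'wt');
fprintf(fid, '%d;', 1:24);
fclose(fid);

R = 2;          % rows per block (1025 in the question)
C = 4;          % columns per block (1024 in the question)
blocks = {};    % one cell per block, instead of F1, F2, ...

fid = fopen(fname, 'rt');
while ~feof(fid)
    % The third input (R*C) is the repeat count N: apply '%d' at most R*C
    % times; the next call resumes where this one stopped.
    % 'EndOfLine',';' treats each ';' as a line ending.
    Z = textscan(fid, '%d', R*C, 'EndOfLine', ';');
    if ~isempty(Z{1})
        % Reshape and transpose so each row holds C consecutive values
        blocks{end+1} = reshape(Z{1}, C, R).'; %#ok<AGROW>
    end
end
fclose(fid);
delete(fname);

disp(numel(blocks))   % 3
disp(blocks{2})       % rows: 9 10 11 12 / 13 14 15 16
```

Keeping the blocks in one cell array means block k is simply blocks{k}, which can be looped over or indexed without creating 50 variable names.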
As an alternative, you might like to read about tall arrays, a special data type designed for working with data files that are too large to be read into memory:
EDIT 2017-02-10: added code from comment:
%%Create Fake Datafile %%
% fid = fopen('temp2.txt','wt');
% for k = 1:50,
% fprintf(fid,'%d;',randi([0,255],1,1025*1024));
% end
% fclose(fid);
%%Read DataFile %%
R = 1025;
C = 1024;
opt = {'EndOfLine',';', 'CollectOutput',true};
fid = fopen('temp2.txt','rt');
k = 0;
while ~feof(fid)
    Z = textscan(fid,'%d', R*C, opt{:});
    if ~isempty(Z{1})
        k = k+1;
        S = sprintf('temp2_%02d.txt',k);
        dlmwrite(S,reshape(Z{1},[],R).',';') % transpose so each row holds C consecutive values
    end
end
fclose(fid);
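To check the loop above on a small scale before running it on the real file, something like the following sketch could be used. It assumes R = 3 and C = 4 instead of 1025 and 1024, writes a hypothetical test file under tempname, runs the same loop, and reads the second output file back with dlmread:

```matlab
% Small-scale test of the block-reading loop (assumed sizes: 3 x 4 blocks)
R = 3;
C = 4;
src = [tempname '.txt'];          % hypothetical throwaway test file
fid = fopen(src, 'wt');
fprintf(fid, '%d;', 1:2*R*C);     % two fake blocks holding 1..24
fclose(fid);

opt = {'EndOfLine',';', 'CollectOutput',true};
outFiles = {};
fid = fopen(src, 'rt');
k = 0;
while ~feof(fid)
    Z = textscan(fid, '%d', R*C, opt{:});
    if ~isempty(Z{1})
        k = k + 1;
        % One output file per block, named after the test file
        outFiles{k} = sprintf('%s_%02d.txt', src(1:end-4), k); %#ok<AGROW>
        dlmwrite(outFiles{k}, reshape(Z{1}, [], R).', ';')
    end
end
fclose(fid);

% Read the second block back: expect 13..24 as 3 rows of 4 values
M = dlmread(outFiles{2}, ';');
disp(M)

% Remove the test files
delete(src);
for i = 1:numel(outFiles)
    delete(outFiles{i});
end
```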

12 Comments

Thanks Stephen. Those links did help a lot, and I'm fairly sure the while loop method will work. There's just one problem: I need to use textscan before the while loop to reformat the data from the original file:
fid = fopen( 'TEST_A.asc');
Cell = textscan( fid, '%d', 'delimiter', ';');
B = cell2mat(Cell);
A = 1024;
Data = reshape( B, [], A)';
k = 0;
N = 50;
while feof(Data)
k = k + 1;
C = textscan( Data, '%d', N, 'delimiter', ';');
end
Do I need to save the Data as a file by using an output function before I do the while loop? Is there something other than textscan that I can use?
@Aaron Smith: I am totally confused. Your question states that a "major problem ... with my text file is that it's too big for Matlab". Now your second line reads the whole file into MATLAB:
Cell = textscan( fid, '%d', 'delimiter', ';')
Isn't this what we were trying to avoid? I thought the problem was that you could not do this without an error? If the file contains 1049600 values then this should not cause a memory error. How many values does the actual file contain? Note that in general you should keep data together as much as possible, and not split it up.
There are also some major bugs in your code, e.g. the second textscan (which is given numeric data rather than a file identifier), the use of feof on numeric data, and the missing fclose.
The task is this:
  1. in a loop...
  2. ...read a block of data from the file
  3. ...reshape data vector into a matrix
  4. ...save this matrix into new file (or do something else?)
In which case something like this should work:
Initially the large file did cause memory errors, but that stopped occurring. The file contains 52480000 values, which I first need to reshape into 51250 rows of 1024 values. Then (and this is where I tried using the loop) I need to reshape the 51250 rows of 1024 into 50 matrices of 1025 x 1024. That's why I was wondering what I can use inside the while loop, other than textscan, that will allow me to separate the data into blocks.
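A quick check of these numbers (my own arithmetic, not from the thread). The byte counts assume 4 bytes per int32 value, which is what textscan returns for '%d', and 8 bytes per double; they ignore any temporary copies that textscan or cell2mat may make:

```matlab
nValues = 52480000;          % value count reported for the real file
R = 1025;                    % rows per image
C = 1024;                    % columns per image

nImages     = nValues / (R*C)   % 50 images
nRows       = nValues / C       % 51250 rows of 1024 values
bytesInt32  = nValues * 4       % 209920000 bytes (about 200 MiB) as int32
bytesDouble = nValues * 8       % 419840000 bytes (about 400 MiB) as double
```

So the whole file is a few hundred megabytes in memory, which many machines can hold in one piece.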
@Aaron Smith: mat2cell will split a numeric matrix into blocks.
But I am still confused: what is the point of splitting the data up inside the while loop? If you simply read the right amount of data on each loop iteration then no splitting is required.
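For reference, a minimal mat2cell sketch on a small stand-in matrix: 3 images of 2 x 4 stacked vertically, instead of 50 images of 1025 x 1024 stacked into 51250 x 1024. The sizes are assumptions chosen to keep the output readable:

```matlab
R = 2;                            % rows per image (1025 in the question)
nImages = 3;                      % number of images (50 in the question)
Data = reshape(1:24, 4, []).';    % 6 x 4, rows filled in file order

% Split the rows into nImages blocks of R rows each; keep all columns together
blocks = mat2cell(Data, R * ones(1, nImages), size(Data, 2));

disp(size(blocks))   % 3 1
disp(blocks{2})      % rows 3-4 of Data: 9 10 11 12 / 13 14 15 16
```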
Essentially, the 50 matrices represent 50 separate images of 1025 x 1024 pixels. I need the program to recognize the distinction, so that after every 1025 rows a new block is created. I need to split the data up so that each block can be looked at individually.
Try something like this:
%%Create Fake Datafile %%
% fid = fopen('temp2.txt','wt');
% for k = 1:50,
% fprintf(fid,'%d;',randi([0,255],1,1025*1024));
% end
% fclose(fid);
%%Read DataFile %%
R = 1025;
C = 1024;
opt = {'EndOfLine',';'};
fid = fopen('temp2.txt','rt');
k = 0;
while ~feof(fid)
    Z = textscan(fid,'%d', R*C, opt{:});
    if ~isempty(Z{1})
        k = k+1;
        S = sprintf('temp2_%02d.txt',k);
        dlmwrite(S,reshape(Z{1},[],R).',';') % transpose so each row holds C consecutive values
    end
end
fclose(fid);
It reads the huge file (50*1025*1024 numbers separated by ;) in blocks of 1025*1024 numbers, then reshapes each block into a matrix and saves each matrix in a new file.
Thanks Stephen. Just one question: the randi in the fprintf statement. I need to keep my data in the order it is in the file (after every 1024th number a new row is formed, and the position of each value is important as it denotes a certain pixel), but randi is used to generate new numbers. Is there something else I can use here?
Thanks for your help, and patience. I am very much a novice at this point, and your help is much appreciated.
@Aaron Smith: my code has two sections. The first section is entitled "Create Fake Datafile": does the word "Fake" give you any hint as to what that section does? Consider that I do not have your data file, so I simply created some fake data using random numbers and saved it in a file, so that I would have something to test my code on. That is what the first section does: it creates a huge file with random numbers in it.
Also note that the first section is commented out, so that it does not run. I do not comment out code just for fun: it is to stop it from running. You do not need to use this code, I simply put it there for your interest (to know how I generated my huge fake data file).
I presume that you have your own huge data file. Use that.
The second section, entitled "Read DataFile" is the code for you to use.
I didn't notice that you had commented out your first section. Missed the % completely. Thanks :)
@Aaron Smith: I hope that it works for you. Please remember to accept the answer that helped you most. This is a simple way for you to show your appreciation to us (we are all volunteers).
I have the code working fairly well. There's just one thing I'm not too sure about: what does the opt = {'EndOfLine', ';'}; line in your code do? What is its purpose? Thanks again, Stephen.
@Aaron Smith: take a look at these two lines:
opt = {'EndOfLine',';'};
...
Z = textscan(fid,'%d', R*C, opt{:});
The first defines the cell array opt; the second passes the elements of opt to textscan as separate inputs, because opt{:} expands into a comma-separated list. So it is simply a convenient way to write the inputs without writing them all in one line like this:
Z = textscan(fid,'%d', R*C, 'EndOfLine',';');
For just two arguments it does not make much difference, but sometimes there can be quite a few arguments, and I find the cell array keeps things tidy. It is just a personal choice to do it like that, there is no deeper meaning. You can write the inputs on one line, if you wish to.
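A small sketch of both points, parsing a short char vector instead of a file (the string is an assumption standing in for the real data). 'EndOfLine',';' makes textscan treat each ';' as a line ending, so every number is read as its own line, and the third input, 4, is the repeat count that limits how many values are read:

```matlab
opt = {'EndOfLine',';'};
str = '1;2;3;4;5;6;';        % small stand-in for the file contents

% These two calls are identical: opt{:} expands into the separate
% inputs 'EndOfLine', ';'
A = textscan(str, '%d', 4, opt{:});
B = textscan(str, '%d', 4, 'EndOfLine', ';');

disp(isequal(A, B))   % 1 (true)
disp(A{1})            % the first 4 values: 1 2 3 4 (as an int32 column)
```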


More Answers (1)

Since R2014b, MATLAB has had tools for reading, in chunks, files that are too big to fit in memory. Why not use these? See datastore and, in your particular case, tabulartextdatastore.
Since R2016b, this has become even easier with the introduction of tall arrays.

Asked: 7 Feb 2017

Commented: 15 Feb 2017
