Deep Learning Datastores causing errors with size/length


Please help me use datastores to train a neural network.
I have two cell arrays, both 259x1. One holds sequences and the other holds the sequence responses. They are saved as A.mat and B.mat respectively. I want to put these arrays into a datastore and use them to train a network. When I pass the cell arrays directly to trainNetwork, without using a datastore, training works. A has been padded so that each cell is exactly 10x10000, and each cell of B is 1x10000.
I have tried the following things:
1)
AData = datastore('A.mat','type','file','ReadFcn',@load); %sequences
BData = datastore('B.mat','type','file','ReadFcn',@load); %responses
CData = combine(AData, BData); %combination
... %layers, options, hyperparameters, etc.
[test.net, test.info] = trainNetwork(CData, layers, options); %train network
Error:
Error using trainNetwork (line 184)
Invalid training data. Predictors must be a N-by-1 cell array of sequences, where N is the number of sequences. All sequences
must have the same feature dimension and at least one time step.

Error in DL_T3_ds (line 96)
[test.net, test.info] = trainNetwork(CData, layers, options);

>> preview(CData) %for clarity

ans =

1×2 cell array

{1×1 struct} {1×1 struct}

2) C3_data.mat is a file that contains only A and B arrays.

sequenceData = datastore('C3_data.mat','type','file','ReadFcn',@load); %C3_data is A and B combined to one file
... %layers, options, and hyperparameters
[test.net, test.info] = trainNetwork(sequenceData, layers, options); %train network
Error:
Error using trainNetwork (line 184)
Invalid training data. For a network with 1 inputs and 1 output, the datastore read function must return a cell array with 2
columns, but it returns an cell array with 1 columns.

Error in DL_T3_ds (line 99)
[test.net, test.info] = trainNetwork(sequenceData, layers, options);

3) Using a function to avoid load creating struct

function varargout = loadStructFromFile(fileName)
varargout = struct2cell(load(fileName));
end

AData = datastore('A.mat','type','file','ReadFcn',@loadStructFromFile);
BData = datastore('B.mat','type','file','ReadFcn',@loadStructFromFile);
CData = combine(AData, BData);

[test.net, test.info] = trainNetwork(CData, layers, options);

Error using trainNetwork (line 184)
Unexpected input size: The input layer expects sequences with the same sequence length and channel dimension 10.

Error in DL_T3_ds (line 99)
[test.net, test.info] = trainNetwork(CData, layers, options);


>> preview(CData) %for clarity

ans =

1×2 cell array

{259×1 cell} {259×1 cell}

I would appreciate any help.

Accepted Answer

Ben on 12 Jul 2022
I think the issue here is just wrangling the datastores so that each read returns a BatchSize-by-2 cell array, where each cell in the first column contains only the numeric input data and each cell in the second column contains the response data (numeric or categorical).
This gets confusing because the combine method for datastores wraps data in an additional cell.
Here's an example to show how to make this work with dummy data:
% Setup fake data - I'll use a sequence length of 100 rather than 10,000.
x = randn(10,100);
save('x1','x');
y = randn(1,100);
save('y1','y');
x = randn(10,100);
save('x2','x');
y = randn(1,100);
save('y2','y');
% Create datastores to read in data.
% You want to just get the data out of the struct that is loaded. I found it easiest to write a simple function:
getVarFromStruct = @(strct,varName) strct.(varName);
xds = fileDatastore("x*","ReadFcn",@(fname) getVarFromStruct(load(fname),"x"),"FileExtensions",".mat");
yds = fileDatastore("y*","ReadFcn",@(fname) getVarFromStruct(load(fname),"y"),"FileExtensions",".mat");
% Combine
cds = combine(xds,yds);
% Note that cds.read now returns a 1x2 cell array, and each cell contains numeric data (not another cell!).
% Dummy network training
layers = [sequenceInputLayer(10);lstmLayer(1);regressionLayer];
opts = trainingOptions("adam","MaxEpochs",1);
net = trainNetwork(cds,layers,opts);
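
The example above stores one observation per file, whereas the data in the question sits in two 259x1 cell arrays. One possible way to bridge the two, shown here only as a sketch, is to write each cell out to its own file. The dummy arrays A and B below stand in for the variables loaded from A.mat and B.mat (their names inside those files are an assumption), and the output folder and the zero-padded file names are hypothetical choices, not something from the original post.

```matlab
% Stand-ins for the real data: in practice A and B would come from
% load('A.mat') and load('B.mat'). Sizes are reduced to keep the sketch fast.
numObs = 4;
A = cell(numObs,1);
B = cell(numObs,1);
for k = 1:numObs
    A{k} = randn(10,100);   % predictors: 10 channels x 100 time steps
    B{k} = randn(1,100);    % responses:   1 channel  x 100 time steps
end

% Hypothetical output folder; any writable location would do.
outDir = tempname;
mkdir(outDir);

% Write one observation per file, using the variable names x and y that
% the read functions in the example above expect. The x and y files share
% the same zero-padded index so that they list in the same order.
for k = 1:numObs
    x = A{k};
    y = B{k};
    save(fullfile(outDir, sprintf('x%04d.mat', k)), 'x');
    save(fullfile(outDir, sprintf('y%04d.mat', k)), 'y');
end
```

The two fileDatastore calls from the example could then be pointed at fullfile(outDir,"x*") and fullfile(outDir,"y*") instead of "x*" and "y*".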

More Answers (1)

Matthew Miller on 14 Jul 2022
You are exactly correct.
There was another layer of cells between the data and the datastore, and that was causing the error. The trainNetwork function was reading this second layer and throwing the error. I believe that's what you meant by 'additional cell.'
I saved each array as a single-variable file and used the fileDatastore implementation you demonstrated. It worked perfectly. Thank you very much for your help.
MM
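
For cell arrays that already fit in memory, as the 259x1 arrays in the question do, an alternative that skips the intermediate .mat files might be arrayDatastore. The sketch below assumes a release that ships arrayDatastore (R2020b or later, to the best of my knowledge) and uses dummy data in place of A and B. With "OutputType" set to "same", each read should return a 1x1 slice of the cell array, so combine yields a 1x2 cell whose entries are numeric, with no extra layer of cells.

```matlab
% Dummy stand-ins for the in-memory cell arrays A (sequences) and
% B (responses) from the question; sizes are reduced for speed.
numObs = 4;
A = cell(numObs,1);
B = cell(numObs,1);
for k = 1:numObs
    A{k} = randn(10,100);   % predictors: 10 channels x 100 time steps
    B{k} = randn(1,100);    % responses:   1 channel  x 100 time steps
end

% "OutputType","same" keeps each read as a 1x1 cell slice of the input,
% rather than wrapping the element in a further cell.
xds = arrayDatastore(A, "OutputType", "same");
yds = arrayDatastore(B, "OutputType", "same");

% Each read of the combined datastore pairs one sequence with its response.
cds = combine(xds, yds);

% Inspect one observation, then rewind so training starts from the top.
sample = read(cds);
reset(cds);
disp(size(sample));      % shape of the cell array returned by one read
disp(size(sample{1}));   % shape of the predictor sequence
disp(size(sample{2}));   % shape of the response sequence
```

If the shapes look right, cds can be passed to trainNetwork in the same way as in the accepted answer.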

Release

R2021b
