How to split a dataset into training/validation images, assuming I have multiple subfolders?

M J on 18 Jul 2020
Answered: Anmol Dhiman on 22 Jul 2020
Hi everyone,
Assume I have three different classes (three animals) for a network I would like to train: Dogs, Cats and Cows.
For each class, I have, say, 10 images. Each of these images has its own subfolder containing multiple patches cropped out of that image, so the path looks like this:
all.classes / dog.images / dog.image.1 /
I would like to randomly split the entire dataset into training/validation images, but at the image level rather than the patch level. For instance: all patches of dog.image.1, dog.image.2 and dog.image.3 would be used for validation, while the rest (patches of dog.image.4 to dog.image.10) would be used for training. In other words, I do not want to pool the patches of all 10 images together and randomly draw 70% for training and 30% for validation.
I usually do the following:
imds = imageDatastore('all.classes', ...
    'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');
[imdsTrain, imdsValidation] = splitEachLabel(imds, 0.7, 'randomized');
If possible, how can I modify this code in order to divide my dataset at the image level instead?
Thank you very much!
Edit: The classes each have a different number of images. n = 10 was used for simplicity only.

Accepted Answer

Anmol Dhiman on 22 Jul 2020
Hi M J,
In my opinion, there is no direct way to do this. You can separate the training and validation sets manually or programmatically (link) and then apply imageDatastore to each set individually.
Anmol Dhiman
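
The programmatic route could look roughly like the sketch below. It is not the answerer's code, only an illustration under stated assumptions: the layout is all.classes/<class folder>/<image folder>/<patch files> as described in the question, and the names rootDir, trainFrac, imgDirs, classLabels and isTrain are made up for this example. Because 'LabelSource', 'foldernames' takes the label from the folder that directly contains each file (which here would be the image folder, not the class), the sketch derives the class from the grandparent folder instead. It then shuffles the image folders within each class, sends roughly 70% of them to training, and builds one imageDatastore per split.

```matlab
% Sketch: split at the image-folder level, then build one datastore per split.
% Assumed layout: all.classes/<class folder>/<image folder>/<patch files>
rootDir   = 'all.classes';   % root folder from the question
trainFrac = 0.7;             % fraction of image folders per class used for training

imds  = imageDatastore(rootDir, 'IncludeSubfolders', true);
files = imds.Files;          % N-by-1 cell array of full patch paths

% Parent folder of each patch = source image; grandparent folder = class.
imgDirs = cellfun(@fileparts, files,   'UniformOutput', false);
[~, clsName, clsExt] = cellfun(@fileparts, ...
    cellfun(@fileparts, imgDirs, 'UniformOutput', false), ...
    'UniformOutput', false);
% Folder names contain dots (e.g. dog.images), so fileparts splits them
% into name + "extension"; join the two parts back together.
classLabels = categorical(strcat(clsName, clsExt));

isTrain = false(numel(files), 1);
classes = categories(classLabels);
for c = 1:numel(classes)
    inClass = classLabels == classes{c};
    folders = unique(imgDirs(inClass));            % image folders of this class
    folders = folders(randperm(numel(folders)));   % shuffle the folders
    nTrain  = round(trainFrac * numel(folders));   % works for any folder count
    isTrain = isTrain | ismember(imgDirs, folders(1:nTrain));
end

% All patches of a given image folder land in exactly one of the two splits.
imdsTrain      = imageDatastore(files(isTrain),  'Labels', classLabels(isTrain));
imdsValidation = imageDatastore(files(~isTrain), 'Labels', classLabels(~isTrain));
```

Since the split is computed per class from the number of image folders actually found, it does not depend on every class having the same number of images. Calling rng with a fixed seed before the loop should make the split repeatable. A class with very few image folders may end up with no validation folders after rounding, so it is worth checking the label counts of both datastores with countEachLabel.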
