DataStore setup when using trainFasterRCNNObjectDetector for multiclass bbox detection problem
I have read through the documentation and wish to try to do a simple example where I detect and classify a few simple objects in images. There are three classes for my problem and there can be multiple instances of each class within a single image.
The example MathWorks supplies for trainFasterRCNNObjectDetector covers a single class, and it trains on a table that looks like this.
           imageFilename                  vehicle
    ____________________________    ________________
    {'vehicles/image_00001.jpg'}    {[126 78 20 16]}
    {'vehicles/image_00002.jpg'}    {[100 72 35 26]}
The trainFasterRCNNObjectDetector documentation does say "You can train a Faster R-CNN detector to detect multiple object classes."
The documentation says that the trainingData input variable can be a datastore or a table. It also says "When the output contains three columns, the second column must contain the bounding boxes, and the third column must contain the labels. In this case, the first column can contain any type of data. For example, the first column can contain images or point cloud data."
I have a large amount of data so I am going to use a data store. I constructed a data store that uses a table that looks like this.
                   File_Location                         bboxs             categories
    ____________________________________________    _______________    __________________
    {'F:\COCO\val2017\val2017\000000397133.jpg'}    { 2×4 double}      { 2×1 categorical}
    {'F:\COCO\val2017\val2017\000000252219.jpg'}    { 3×4 double}      { 3×1 categorical}
    {'F:\COCO\val2017\val2017\000000087038.jpg'}    {14×4 double}      {14×1 categorical}
    {'F:\COCO\val2017\val2017\000000480985.jpg'}    { 8×4 double}      { 8×1 categorical}
So the table has 3 columns. The first column is the file location, the second column is a collection of bounding boxes, and the third column contains the labels associated with those bounding boxes. For example, in row one, the image 000000397133.jpg has two bounding boxes given by the 2×4 double entry, and they are labeled by the {2×1 categorical} entry. I believe this is a valid table based upon the documentation, which says
"The first column must be images.
The second column must be M-by-4 matrices of bounding boxes of the form [x, y, width, height], where [x,y] represent the top-left coordinates of the bounding box.
The third column must be a cell array that contains M-by-1 categorical vectors containing object class names. All categorical data returned by the datastore must contain the same categories."
As I understand it, the value M can change from row to row because each image contains a different number of object instances. However, the last sentence in the documentation is confusing to me, the one that says "All categorical data returned by the datastore must contain the same categories." What does that mean?
Assuming my table is OK, I create an augmentedImageDatastore with the table using,
dataStore = augmentedImageDatastore(inputSize,dataTable,ColorPreprocessing="gray2rgb")
Here is where things go wrong. If I set
dataStore.MiniBatchSize = 1;
and do a read of the dataStore
cpy = copy(dataStore);
reset(cpy);
sampleData = read(cpy);
I get
          input              response
    _________________    _____________
    {224×224×3 uint8}    {1×1×2 cell}
What has happened is that the image comes back as the input, and the bounding boxes and labels are packed together into a {1×1×2 cell} as the response. Why is it not returning three columns? trainFasterRCNNObjectDetector is very unhappy with this output from the datastore read. It expects three values to be returned, because it looks for the labels in the third column. Deep in the bowels of checkGroundTruthDatastore.m there is a check
% Check whether data has enough columns for labels
ncols = size(sampleData,2);
if ~any(ncols == 3)
error( message('vision:ObjectDetector:readOutputNotCellTable'));
end
hasLabels = true;
This throws an error when I try to use my dataStore in a call to trainFasterRCNNObjectDetector:
Error using vision.internal.inputValidation.checkGroundTruthDatastore
The read method of the training input datastore must return an M-by-3
cell or table.
So I am confused about why my datastore has squashed the labels and the bounding boxes together.
Another odd thing, likely related, is that if I increase dataStore.MiniBatchSize to something larger than 1, I get an error when reading from the dataStore. For example,
dataStore.MiniBatchSize = 8;
cpy = copy(dataStore);
reset(cpy);
sampleData = read(cpy);
Throws an error:
Error using table
All table variables must have the same number of rows.
Error in augmentedImageDatastore>datastoreDataToTable (line 778)
data = table(input,response);
Error in augmentedImageDatastore/read (line 317)
[data,info] = datastoreDataToTable(input,info);
This is a more fundamental error and is clearly related to how I have constructed my table.
So my question is: what am I misunderstanding or doing wrong? Thank you for the responses!
Michael
1 Comment
Michael Vrhel
on 27 May 2022
Answers (1)
Vidip
on 23 Jan 2024
I understand that you are having trouble structuring the datastore that ‘trainFasterRCNNObjectDetector’ requires for multiclass bounding box detection.
The error message you're receiving indicates that the ‘augmentedImageDatastore’ is not returning data in the format expected by trainFasterRCNNObjectDetector. Specifically, it expects each read to return an M-by-3 cell array or table, where each row holds an image, its associated bounding boxes, and the labels for those boxes.
The sentence "All categorical data returned by the datastore must contain the same categories" means that the set of possible class labels (categories) should be consistent across all data. This does not mean that each image must have instances of all classes; rather, it means that the datastore should be aware of all potential classes that could be present in any image.
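As a small illustration of what "the same categories" means, the sketch below builds two label vectors that start out with different category sets and then rebuilds both against one shared class list. The class names are hypothetical placeholders, not taken from your data, and this is only one possible way to make the categories consistent.

```matlab
% Hypothetical class list; substitute the three classes in your problem.
classNames = ["person" "car" "dog"];

% Labels for two images, created independently. Each categorical only
% knows about the classes that happen to appear in that image.
a = categorical(["person"; "car"]);
b = categorical(["dog"; "dog"; "person"]);
sameBefore = isequal(categories(a), categories(b));   % category sets differ

% Rebuild both vectors against the shared class list so every label
% vector carries the full set of categories, even the unused ones.
a = categorical(string(a), classNames);
b = categorical(string(b), classNames);
sameAfter = isequal(categories(a), categories(b));    % category sets match
```

The values in each vector are unchanged; only the set of categories each vector declares is now identical. The same conversion could be applied to every cell of the label column with cellfun.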
To resolve this issue, you need a datastore that returns data in the format object detection training expects. MATLAB provides ‘boxLabelDatastore’ to handle the bounding box and label data, which you can combine with an ‘imageDatastore’ using the ‘combine’ function to create a suitable datastore for ‘trainFasterRCNNObjectDetector’.
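A minimal sketch of that approach is below, assuming Computer Vision Toolbox and Image Processing Toolbox are available (R2019b or later for bboxresize). The two tiny images, the box values, and the class names are hypothetical stand-ins so the snippet can run by itself; with your data you would use your existing table instead. The resizing step is my own assumption about how to replace what augmentedImageDatastore was doing, including the gray-to-RGB conversion.

```matlab
% Hypothetical stand-in data: two small images written to a temp folder.
folder = tempname;
mkdir(folder);
files = {fullfile(folder, 'img1.png'); fullfile(folder, 'img2.png')};
imwrite(zeros(100, 200, 'uint8'), files{1});      % grayscale image
imwrite(zeros(120, 160, 3, 'uint8'), files{2});   % RGB image

classNames = ["person" "car" "dog"];              % assumed class list
bboxs  = {[10 20 30 40; 50 10 20 20]; [5 5 50 60]};
labels = {categorical(["person"; "car"], classNames); ...
          categorical("dog", classNames)};
dataTable = table(files, bboxs, labels, ...
    'VariableNames', {'File_Location', 'bboxs', 'categories'});

% Images and box labels go into separate datastores that are then combined,
% so each read returns {image, M-by-4 boxes, M-by-1 labels}.
imds = imageDatastore(dataTable.File_Location);
blds = boxLabelDatastore(dataTable(:, {'bboxs', 'categories'}));
trainingData = combine(imds, blds);

% The combined datastore does not resize anything, so resize each image
% and scale its boxes by the same factors ([rows cols] order).
inputSize = [224 224 3];
toRGB = @(I) repmat(I, 1, 1, 1 + 2*(size(I, 3) == 1));  % gray -> 3 channels
resizeSample = @(d) {imresize(toRGB(d{1}), inputSize(1:2)), ...
                     bboxresize(d{2}, inputSize(1:2) ./ size(d{1}, [1 2])), ...
                     d{3}};
trainingData = transform(trainingData, resizeSample);

sample = read(trainingData);   % 1-by-3 cell for the first image
```

Here the number of boxes can differ from image to image because each read returns one image with its own M-by-4 matrix and M-by-1 label vector. The resulting trainingData would then be passed to trainFasterRCNNObjectDetector in place of the augmentedImageDatastore.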
For further information, refer to the documentation link below: