DataStore setup when using trainFasterRCNNObjectDetector for multiclass bbox detection problem

Question


I have read through the documentation and wish to try to do a simple example where I detect and classify a few simple objects in images. There are three classes for my problem and there can be multiple instances of each class within a single image.

The example MathWorks supplies for trainFasterRCNNObjectDetector uses a single class and trains on a table that looks like this:

imageFilename                   vehicle
____________________________    ________________

{'vehicles/image_00001.jpg'}    {[126 78 20 16]}
{'vehicles/image_00002.jpg'}    {[100 72 35 26]}

The trainFasterRCNNObjectDetector documentation does say "You can train a Faster R-CNN detector to detect multiple object classes."

The documentation says that the trainingData input can be a datastore or a table. It also says: "When the output contains three columns, the second column must contain the bounding boxes, and the third column must contain the labels. In this case, the first column can contain any type of data. For example, the first column can contain images or point cloud data."

I have a large amount of data, so I am going to use a datastore. I constructed a datastore from a table that looks like this:

File_Location                                    bboxs            categories
_____________________________________________    _____________    __________________

{'F:\COCO\val2017\val2017\000000397133.jpg'}    { 2×4 double}    { 2×1 categorical}
{'F:\COCO\val2017\val2017\000000252219.jpg'}    { 3×4 double}    { 3×1 categorical}
{'F:\COCO\val2017\val2017\000000087038.jpg'}    {14×4 double}    {14×1 categorical}
{'F:\COCO\val2017\val2017\000000480985.jpg'}    { 8×4 double}    { 8×1 categorical}

So the table has three columns: the first is the file location, the second holds the bounding boxes, and the third holds the labels of those bounding boxes. For example, in row one, the image 000000397133.jpg has two bounding boxes given by the 2×4 double entry, and they are labeled by the 2×1 categorical entry. I believe this is a valid table based on the documentation, which says:

"The first column must be images.

The second column must be M-by-4 matrices of bounding boxes of the form [x, y, width, height], where [x,y] represent the top-left coordinates of the bounding box.

The third column must be a cell array that contains M-by-1 categorical vectors containing object class names. All categorical data returned by the datastore must contain the same categories."

As I understand it, M can change from row to row, because each image contains a different number of object instances. However, the last sentence of that documentation confuses me: "All categorical data returned by the datastore must contain the same categories." What does that mean?

Assuming my table is OK, I create an augmentedImageDatastore from the table using

dataStore = augmentedImageDatastore(inputSize,dataTable,ColorPreprocessing="gray2rgb")

Here is where things go wrong. If I set

dataStore.MiniBatchSize = 1;

and then read from the datastore

cpy = copy(dataStore);
reset(cpy);
sampleData = read(cpy);

I get

input                response
_________________    ____________

{224×224×3 uint8}    {1×1×2 cell}

The image is returned as the input, and the bounding boxes and labels are packed together into a single {1×1×2 cell} response. Why is it not returning three columns? trainFasterRCNNObjectDetector rejects this output because it expects three columns and looks for the labels in the third one. Deep inside checkGroundTruthDatastore.m there is this check:

% Check whether data has enough columns for labels
ncols = size(sampleData,2);
if ~any(ncols == 3)
    error( message('vision:ObjectDetector:readOutputNotCellTable'));
end
hasLabels = true;

which throws an error when I pass my dataStore to trainFasterRCNNObjectDetector:

Error using vision.internal.inputValidation.checkGroundTruthDatastore
The read method of the training input datastore must return an M-by-3 cell or table.

So I am confused about why my datastore has packed the labels and the bounding boxes together.

Another odd thing, likely related, is that if I increase dataStore.MiniBatchSize above 1, reading from the datastore fails. For example,

dataStore.MiniBatchSize = 8;
cpy = copy(dataStore);
reset(cpy);
sampleData = read(cpy);

throws an error:

Error using table
All table variables must have the same number of rows.

Error in augmentedImageDatastore>datastoreDataToTable (line 778)
data = table(input,response);

Error in augmentedImageDatastore/read (line 317)
[data,info] = datastoreDataToTable(input,info);

This is a more fundamental error and clearly related to how I have constructed my table.

So my question is, what is it that I am misunderstanding and doing wrong? Thank you for the responses!

Michael

1 Comment

Michael Vrhel on 27 May 2022


% Sample code showing the issue with the data store read
% Create a simple table for data store as described for multi-class object
% classification documentation for trainFasterRCNNObjectDetector.  See
% the input description for the input argument 
% trainingData — Labeled ground truth datastore | table
% Per the documentation the data is
% "The first column must be images.
% The second column must be M-by-4 matrices of bounding boxes of the form 
% [x, y, width, height], where [x,y] represent the top-left coordinates 
% of the bounding box.
% The third column must be a cell array that contains M-by-1 categorical
% vectors containing object class names. All categorical data returned by 
% the datastore must contain the same categories."
% So let me construct such an object
clearvars
inputSize = [224,224,3];
% Use 6 images in the database
numImages = 6;
% Let's say I have 3 classes (person, dog, and cat), and each class
% can occur multiple times in each image. Just use one image that comes
% with MATLAB for the example.
peppers = char(fullfile(matlabroot,"toolbox","matlab","imagesci","peppers.png"));
% The bounding boxes and labels
images = cell(numImages, 1);
bboxes = cell(numImages, 1);
labels = cell(numImages, 1);
% First image has two objects
images{1} = peppers;
bboxes{1} = ...
  [388.6600  109.4100         0   62.1600
   69.9200  277.6200  262.8100   36.7700];
labels{1} = {"person"; "dog"};
% Second image has three objects
images{2} = peppers;
bboxes{2} = ...
  [326.2800  197.2500  121.9400  171.2700
  174.5600    9.7900  226.4500  123.6600
   71.2400  167.0600  510.4400  215.7600];
labels{2} = {"person"; "cat"; "dog"}; 
% Third image has 14 objects
images{3} = peppers;
bboxes{3} = ...
  [226.0400   28.2200  239.7200   17.1200
  229.3100   51.1200  225.3800   34.9700
   11.5900   98.4000   10.6400  204.1400
   30.4100  234.2800   33.0600  229.0200
  257.8500   19.5200  167.0200    7.3300
  224.4800   46.4600  234.0000   34.9600
   44.1300  326.8600   15.7800  195.3200
   97.0000  223.4600   37.4600  228.0600
   68.1800   13.1100  209.6800   10.6500
  238.1900   38.6700  231.0800   37.1800
   16.1800  345.4100    9.1500    1.0000
   42.8800  173.4100   34.5300  190.0000
   79.1600   72.9400  408.2900  638.0000
  232.2600  185.4100  231.2500  101.0000];
labels{3} ={"person";"person";"cat";"person";"person";"person";"dog";...
    "person";"person";"person";"person";"cat";"person";"person"};
% Fourth image has eight objects
images{4} = peppers;
bboxes{4} = ...
   [47.1900  320.1600  266.3700  290.0300
  296.1200  275.0500  293.1300  299.7900
   28.3000   27.0600   23.9700   15.2400
   33.1700  104.5300   88.9600   19.8700
   32.7500   10.0500  369.5000  302.2000
  298.9400  302.9600  278.5200  298.2200
   16.5200   13.7000    5.5000   12.7300
   29.2200   25.6900   45.6500   18.7300];
labels{4} ={"person";"person";"cat";"person";"person";"person";"dog";...
    "person"};
% Fifth image has 13 objects
images{5} = peppers;
bboxes{5} = ...
  [322.5700  270.5900   18.8100   17.7800
  290.8100  107.4700   29.0600  556.2800
   65.0900  129.5400  120.8600  309.3600
  127.6200  259.2700  271.1200   32.0900
  273.6400  281.0000   16.2800   46.1200
  292.1100   26.0200   25.4000  494.9300
   50.9500   35.1000  257.1400  276.5400
  129.6500  281.0600  281.3200   92.0800
    1.9200  276.4700   12.8000  126.0700
  266.8800   15.5100   42.0200  300.0000
  114.4500   41.3200  269.0000  280.0000
  155.7900  104.7300  274.4500   25.0000
  424.1200  267.5500    8.8900   54.0000];
labels{5} ={"person";"cat";"person";"person";"person";"dog";...
    "person";"person";"person";"person";"cat";"person";"person"};
% Sixth image has 1 object
images{6} = peppers;
bboxes{6} = ...
  [210.2700  143.2900  219.8200  276.1500];
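labels{6} = {"cat"};
% Convert each label list to an M-by-1 categorical vector that shares the
% same category set, as the documentation requires
classNames = ["person" "dog" "cat"];
labels = cellfun(@(c) categorical(string(c), classNames), labels, ...
    UniformOutput=false);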
% Create the table
dataValTable = table(Size=[numImages 3], ...
    VariableTypes=["string" "cell" "categorical"], ...
    VariableNames=["data" "boxes" "labels"]);
dataValTable.data = images;
dataValTable.boxes = bboxes;
dataValTable.labels = labels;
% Create the data store from the table with a minibatch size of 2
valDataStore = augmentedImageDatastore(inputSize,dataValTable);
valDataStore.MiniBatchSize = 2;
% Perform a test read
cpy = copy(valDataStore);
reset(cpy);
sampleData = read(cpy);
% This read fails.  In datastoreDataToTable we see 
%
%function [data,info] = datastoreDataToTable(input,info)
%
%response = info.Response;
%info = rmfield(info,'Response');
%if isempty(response)
%    data = table(input);
%else
%    response = convert4DArrayToCell(response);
%    data = table(input,response);
%end
% What happens is that, in data = table(input, response), input is a
% 2x1 cell array holding the 2 images of the minibatch, while response
% is a 1x1x2x2 cell array. A table cannot be created from these because
% the numbers of rows differ. It is as if the response data were
% transposed (permuted) from what it should be, even though I created
% it in the manner described in the documentation. Any help would be
% appreciated.


Answer 1

Vidip on 23 Jan 2024


I understand that you are having trouble setting up the datastore that ‘trainFasterRCNNObjectDetector’ needs for multiclass bounding-box detection.

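The error message you're receiving indicates that ‘augmentedImageDatastore’ is not returning data in the format expected by trainFasterRCNNObjectDetector, which expects each read to return an M-by-3 cell array or table whose columns hold the image, its bounding boxes, and its labels. ‘augmentedImageDatastore’ is intended for classification and regression data: as your read output shows, it returns only two columns, input and response, and packs every table column after the first into that single response.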
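A quick way to see the difference is to test a single read against the same column rule that checkGroundTruthDatastore.m applies. This is only an illustrative sketch: hasDetectionColumns is a hypothetical helper, and the two values are hand-made stand-ins for one detection-style read and one augmentedImageDatastore-style read.

```matlab
% Hypothetical helper mirroring the "M-by-3 cell or table" rule.
hasDetectionColumns = @(sample) ...
    (iscell(sample) || istable(sample)) && size(sample, 2) == 3;

% Hand-made stand-ins for two read results (illustrative values only).
% Detection layout: {image, M-by-4 boxes, M-by-1 categorical labels}.
good = {zeros(4,4,3,"uint8"), [1 1 2 2], categorical("cat")};
% augmentedImageDatastore-style layout: input plus one packed response.
bad = table({zeros(4,4,3,"uint8")}, {{[1 1 2 2], categorical("cat")}}, ...
    VariableNames=["input" "response"]);

disp(hasDetectionColumns(good))   % 1
disp(hasDetectionColumns(bad))    % 0
```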

The sentence "All categorical data returned by the datastore must contain the same categories" means that the set of possible class labels (categories) should be consistent across all data. This does not mean that each image must have instances of all classes; rather, it means that the datastore should be aware of all potential classes that could be present in any image.
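For example, creating every label vector with an explicit value set gives all of them the same categories, even when an image contains only some of the classes. This is a sketch; classNames is an assumed list matching the three classes in the question.

```matlab
% Sketch: give every per-image label vector the same category set.
% classNames is an assumed class list (the three classes from the question).
classNames = ["person" "dog" "cat"];

% Image 1 contains only a person and a dog ...
labels1 = categorical(["person"; "dog"], classNames);
% ... and image 6 contains only a cat.
labels6 = categorical("cat", classNames);

% Both vectors still carry all three categories (person, dog, cat).
disp(categories(labels1))
disp(categories(labels6))

% Without a value set, categories are inferred from the data alone,
% so each image would end up with its own category list.
inferred = categorical("cat");
disp(categories(inferred))   % only cat
```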

To resolve this issue, use a datastore designed for object detection data. MATLAB provides ‘boxLabelDatastore’ to hold the bounding box and label data, which you can combine with an ‘imageDatastore’ using the ‘combine’ function to create a suitable datastore for ‘trainFasterRCNNObjectDetector’.
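A minimal sketch of that setup follows. It assumes two images that ship with MATLAB (ngc6543a.jpg and peppers.png) as stand-ins for real training images; the boxes and class names are made up for illustration, and in practice the boxes and labels columns would come from your existing table. The image files must be listed in the same order as the rows of the box/label table.

```matlab
% Sketch: detection training data from imageDatastore + boxLabelDatastore.
% Images, boxes and class names below are placeholders for illustration.
classNames = ["person" "dog" "cat"];

% Two images that ship with MATLAB stand in for real training images.
files = {which('ngc6543a.jpg'); which('peppers.png')};

% One M-by-4 [x y width height] matrix per image; M may differ per image.
boxes = {[100 100 200 150]; ...
         [70 278 263 37; 326 197 122 171]};
% One M-by-1 categorical vector per image, all sharing classNames.
labels = {categorical("cat", classNames); ...
          categorical(["person"; "dog"], classNames)};

imds = imageDatastore(files);                   % images only
blds = boxLabelDatastore(table(boxes, labels)); % boxes + labels
trainingData = combine(imds, blds);             % reads {image, boxes, labels}

sample = read(trainingData);
disp(size(sample))        % 1 3
disp(size(sample{2}))     % 1 4 (the single box of the first image)
disp(class(sample{3}))    % categorical
```

The combined datastore can then be passed as the first argument of trainFasterRCNNObjectDetector(trainingData, network, options).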

For further information, refer to the documentation link below:

https://in.mathworks.com/help/vision/ref/boxlabeldatastore.html
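Because this route no longer goes through augmentedImageDatastore, its resizing and ColorPreprocessing="gray2rgb" conversion have to be reproduced separately, and the bounding boxes must be scaled together with the image. One way to sketch this is transform with imresize and bboxresize, assuming a 224×224×3 network input; the box and label values are placeholders, the anonymous function names are hypothetical, and every image is assumed to be either grayscale or RGB.

```matlab
% Sketch: resize images and boxes together, and expand grayscale to RGB,
% on a combined detection datastore. inputSize is an assumed network size.
inputSize = [224 224 3];
classNames = ["person" "dog" "cat"];

imds = imageDatastore({which('peppers.png')});
blds = boxLabelDatastore(table({[70 278 263 37; 326 197 122 171]}, ...
    {categorical(["person"; "dog"], classNames)}));

% Hypothetical helpers written as anonymous functions.
% Replicate a 1-channel image to 3 channels (no-op for RGB input).
toRGB = @(I) repmat(I, [1 1 3/size(I,3)]);
% [rowScale colScale] from the original image size to inputSize.
scaleOf = @(I) inputSize(1:2) ./ size(I, [1 2]);
% One observation is {image, boxes, labels}; labels pass through unchanged.
resizeSample = @(data) {imresize(toRGB(data{1}), inputSize(1:2)), ...
    bboxresize(data{2}, scaleOf(data{1})), data{3}};

trainingData = transform(combine(imds, blds), resizeSample);

sample = read(trainingData);
disp(size(sample{1}))     % 224 224 3
disp(size(sample{2}))     % 2 4
```

With this approach, mini-batching should be controlled by the MiniBatchSize option passed through trainingOptions rather than by a MiniBatchSize property on the datastore.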
