I am attempting to build a multi-layer convolutional neural network, with pooling, dropout, and activation layers between the conv layers. However, I am a bit confused about the sizes of the weights and of the activations produced by each conv layer.
For simplicity, let's assume each conv layer consists of M filters of size m x m. I define each conv layer using convolution2dLayer([m,m],M,'Padding','Same').
The first layer takes in a single image and outputs M images (a 4D array with last dimension M). The first layer also has weights of dimension m x m x 1 x M. This is all as I would expect.
The subsequent layers are where I am getting confused. I expected the 2nd conv layer to take in M images and apply M filters of size m x m (weight dimensions m x m x 1 x M), producing M^2 output images, since all M filters would be applied to each of the M inputs. Instead, the weights have dimensions m x m x M x M, and there are only M output images (according to the "activations" function).
The later conv layers behave the same as the 2nd: the weights are of size m x m x M x M, and each layer produces only M output images.
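To make the shapes I am describing concrete, here is a minimal numpy sketch (illustrative only, not MATLAB's actual implementation) of a 'same'-padded convolution in which each filter spans all M input channels. With weights shaped m x m x M x M, the products are summed over the input channels, so the output again has only M channels rather than M^2 — which matches what "activations" reports:

```python
import numpy as np

# Arbitrary sizes for illustration (not from my actual network)
m, M = 3, 8          # filter size and number of filters
H, W = 28, 28        # spatial size of the input

x = np.random.rand(H, W, M)     # output of the 1st conv layer: M channels
w = np.random.rand(m, m, M, M)  # 2nd-layer weights, shaped as MATLAB reports

# Naive 'same'-padded cross-correlation: each of the M filters covers ALL M
# input channels, and the products are summed over those channels.
p = m // 2
xp = np.pad(x, ((p, p), (p, p), (0, 0)))
y = np.zeros((H, W, M))
for k in range(M):              # loop over output filters
    for i in range(H):
        for j in range(W):
            y[i, j, k] = np.sum(xp[i:i+m, j:j+m, :] * w[:, :, :, k])

print(y.shape)                  # (28, 28, 8): M output channels, not M^2
```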
Am I missing something?