In a custom deep learning training loop, can I use my own custom function for computing the gradients?

Will on 13 Jan 2022
Edited: Sahar Idrees on 17 Mar 2022
I would like to train a CNN using a custom training loop. However, I am wondering whether I can use my own gradient computation function instead of the automatic differentiation provided by dlfeval(), modelGradients(), and dlgradient(), which are used with a dlnetwork() in custom training loops (as opposed to an lgraph).
For example, in place of dlgradient(), can I use my own custom gradient function?
To expand: I currently have a custom MATLAB fully convolutional network (for image-to-image regression), which I train with trainNetwork(), as is typical. However, I would like to control certain convolution layer weights during a few iterations of training, i.e., set and hold them at certain values for a few iterations. My understanding is that I would have to use a custom training loop for this. (Is this true?) In my current custom CNN, I also have a custom output layer that computes my loss and gradients via custom forward and backward loss functions, e.g.,
loss = forwardLoss(layer, Y, T)
dLdY = backwardLoss(layer, Y, T)
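For reference, a minimal skeleton of such a custom output layer (with a placeholder MAE loss — not my actual implementation) follows MATLAB's nnet.layer.RegressionLayer template:

```matlab
classdef myRegressionLayer < nnet.layer.RegressionLayer
    % Minimal sketch of a custom output layer with explicit
    % forward and backward loss functions (placeholder MAE loss).
    methods
        function loss = forwardLoss(layer, Y, T)
            % Mean absolute error over all elements
            loss = sum(abs(Y - T), 'all') / numel(Y);
        end
        function dLdY = backwardLoss(layer, Y, T)
            % Hand-written derivative of the loss w.r.t. the predictions
            dLdY = sign(Y - T) / numel(Y);
        end
    end
end
```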
I would like to maintain the functionality of my custom forwardLoss() and backwardLoss() functions since I have certain analytical and diagnostic capabilities in them. So again, the questions are...
  1. In a custom training loop, can I use my own custom gradient function? If so, are there unique conventions I must follow for the custom training loop?
  2. Also, if my goal is primarily just to dynamically set the weights of certain convolutional layers to certain values for a few iterations of training, then do I even need a custom loop, using dlnetwork() for this, or can I do this with a network trained using trainNetwork()? I have checked the various documentation but have not found an answer yet.
Any assistance would be appreciated.

Answers (2)

Katja Mogalle on 21 Jan 2022
Re "However, I would like to control certain convolution layer weights during a few iterations of the training, as in "set and hold them to certain values" for a few iterations. My understanding is that I would have to use a custom training loop for this. (Is this true?)"
Yes, it is true. This is the perfect scenario for using custom training loops.
I might be misunderstanding your approach, but I suspect you don't want a custom gradient computation; rather, you want a custom loss function. The documentation on custom training loops (dlnetwork, dlfeval, dlgradient) covers the relevant building blocks.
Re "my goal is primarily just to dynamically set the weights of certain convolutional layers to certain values for a few iterations of training"
Indeed, these would need to be done with a custom training loop. You can access and edit the learnable parameters of a specific layer by using the Learnables property of dlnetwork.
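For example (the layer name 'conv_3' and the weight values here are placeholders, not from your network):

```matlab
% Pin the weights of a specific convolution layer in a dlnetwork.
% The Learnables property is a table with Layer, Parameter, and
% Value columns; Value holds dlarrays in a cell array.
idx = net.Learnables.Layer == "conv_3" & ...
      net.Learnables.Parameter == "Weights";
net.Learnables.Value(idx) = {dlarray(myFixedWeights)};
```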
I hope this helps. Please provide some more details if I misunderstood your request.
Will on 24 Jan 2022
Katja,
Thank you. I had checked those pages, and they do indeed describe custom training loops, dlnetworks, modelGradients, and so forth fairly well. I had also tried building a custom training loop using those exact links. Ultimately, however, I found that dlgradient() requires a loss computed with operations that support the "trace" requirements of automatic differentiation, as used by the dlfeval() function that calls modelGradients(). In my case, I have both a custom loss function and custom-written code for computing the gradients of the final layer of my network. Based on the documentation and my attempts, a custom loss function that does not use the "trace" functionality required for automatic differentiation does not work with dlfeval(); hence, it does not work in a custom training loop.
The MathWorks links below mention the "trace" requirement. Perhaps the requirement could also be clarified on the main custom training loops and modelGradients() pages.
https://www.mathworks.com/help/deeplearning/ref/dlfeval.html
https://www.mathworks.com/help/deeplearning/ug/deep-learning-with-automatic-differentiation-in-matlab.html
The next link, below, lists the loss functions that can be used in a custom training loop. It does indeed state that "you can use a custom loss function". However, from what I can ascertain from the above links and my attempts, the custom loss function has to meet the above-mentioned trace requirements. Hence, there appear to be limitations to the degree to which a custom loss function can be used. (If not, please let me know.)
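For reference, a traced custom loss that does work with dlfeval/dlgradient looks like this sketch (MSE as a placeholder, not my actual loss):

```matlab
function [gradients, loss] = modelGradients(net, dlX, dlT)
    % Forward pass through the dlnetwork (traced for autodiff)
    dlY = forward(net, dlX);
    % Loss written only with dlarray-supported operations,
    % so the trace survives for automatic differentiation
    loss = mean((dlY - dlT).^2, 'all');
    gradients = dlgradient(loss, net.Learnables);
end
% Called inside the training loop as:
% [grad, loss] = dlfeval(@modelGradients, net, dlX, dlT);
```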
Finally, I did have a call with another MathWorks technical rep, who provided an alternate solution that looks like it should work for what I am trying to do, particularly with use of the trainNetwork() function rather than a custom training loop. The proposed solution involves using freezeWeights() to first conduct one round of training with the desired weights frozen, then taking the resulting trained network and using a similar (I believe custom) unfreezeWeights() function to unfreeze the desired weights, and then starting a second round of training using the trained network (with the desired weights unfrozen so that they can be updated).
I believe the technical rep. plans to also provide the specifics of the solution to this post. I will also report back on the outcome of using the approach.



Reece Teramoto on 27 Jan 2022
Hi Will,
Great speaking with you the other day. As we discussed, it would be good to post the solution here for others to use. I did speak with @Katja Mogalle and she had some additional info to add to this solution.
Here is a summary of our understanding of your workflow: In your neural network, you have some layers that you'd like to set to initial values and have their learnables remain frozen for around half the total epochs, while the other layers learn. Then, you'd like to unfreeze the weights for the remaining epochs of training.
Here is how you can approach this using trainNetwork without needing a custom training loop:
  1. Set the initial weights of the desired layers.
  2. Freeze the weights of the desired layers.
  3. Call trainNetwork for half the total epochs (or whatever the desired amount is).
  4. Unfreeze the weights.
  5. Retrain for the remainder of the epochs by calling trainNetwork again, using the unfrozen layers of the previously trained network.
Specifically, here are some references on what you can use for each step:
Set the initial weights of the desired layers.
Freeze the weights of the desired layers.
  • Freeze the weights using the "freezeWeights" helper function. This function ships with MATLAB but is not on the default path. It simply sets the "WeightLearnRateFactor" property of the desired layers to 0.
edit(fullfile(matlabroot,'examples','nnet','main','freezeWeights.m'))
  • Here is an example of using the function to freeze the first 5 layers of a network:
layers(1:5) = freezeWeights(layers(1:5));
Call trainNetwork for half the total epochs (or whatever the desired amount is).
  • Specify the epochs in the trainingOptions.
net = trainNetwork(data, layers, opts)
Unfreeze the weights.
  • I've attached a function that does this. It just sets the "WeightLearnRateFactor" of the desired layers to 1. Note that the Layers property of a trained network is read-only, so copy the layers into a variable before modifying them:
layers = net.Layers;
layers(1:5) = unfreezeWeights(layers(1:5));
Retrain for the remainder of the epochs by calling trainNetwork again, using the unfrozen layers of the previously trained network.
net = trainNetwork(data, layers, opts)
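The unfreezeWeights helper isn't shown in this thread; assuming it simply mirrors freezeWeights, it would look roughly like this sketch:

```matlab
function layers = unfreezeWeights(layers)
% Sketch of an unfreeze helper (assumed to mirror freezeWeights):
% re-enable weight updates for any layer that has learnable weights.
for ii = 1:numel(layers)
    if isprop(layers(ii), 'WeightLearnRateFactor')
        layers(ii).WeightLearnRateFactor = 1;
    end
end
end
```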
Now, a small disclaimer about this proposed method that @Katja Mogalle mentioned when I spoke to her about this:
  • When calling trainNetwork for the second time, it’s not 100% the same as if we’d been able to train continuously and unfreeze some weights after a few epochs (e.g., via a custom training loop). The optimization algorithm (e.g., SGDM) has some parameters that would be reset on the second call to trainNetwork. Also be careful if you use a learning-rate drop schedule. This might not be a problem at all, but we wanted to mention it in case you expected that training of the other weights (the ones that weren’t frozen) would continue precisely where it left off after the first call to trainNetwork.
Sahar Idrees on 17 Mar 2022
Hi,
Thanks so much, Katja and Reece, for all this information. I have a slightly different problem. I need a custom regression layer with custom forward and backward loss functions, and to calculate those loss functions I need to access some other variables/parameters from my data as well (ones that are not part of the training data). I am able to make them properties of my custom layer so that they can be used by the forward and backward loss functions. I understand that my options are:
  1. either use trainNetwork with my custom layer, or
  2. create a dlnetwork and custom training loop and define my loss function and then let dlgradient automatically calculate differentiation.
I cannot use the first option because it would require me to keep track of the indices of the mini-batches, and I don't know how to do that in trainNetwork. I can, however, track the indices by designing a custom training loop, taking option 2 above. But in taking that option I would need to design my own dlgradient function, because again I would need to access those additional parameters that are not part of the training data when calculating the backward loss function (gradient).
So my questions are:
  1. Can I redefine my own dlgradient function?
  2. If I take the option of designing my own custom regression layer with custom forward and backward functions, is there a way of keeping track of mini-batch indices?
I'll really appreciate your guidance.
Cheers,
Sahar

