low weighted cross entropy values

I am building a network for semantic segmentation with a weighted cross-entropy loss. It seems possible to add weights for my 8 classes (inverse-frequency, normalized weights for each class) with the crossentropy() function. My issue is that the loss values calculated during training are lower than I would expect: they fall between 0 and 1, whereas I would have expected them to be between 2 and 3.
My class weights vector is
norm_weights = [0.0011 0.4426 0.0023 0.0037 0.0212 0.0022 0.0065 1.0000];
And this is how I implement my loss function:
lossFcn = @(Y,T) crossentropy(Y,T,norm_weights,WeightsFormat="UC",...
    NormalizationFactor="all-elements",ClassificationMode="multilabel")*n_class_labels;
[netTrained2, info] = trainnet(augmented_ds,net2,lossFcn,options);
If anyone would have a clue about the issue, that would be helpful!

3 Comments

values are between 0 and 1 but I would have expected them to be between 2-3.
Why?
I am reproducing a network from a research paper. My network architecture and training options are the same, and my data comes from the same database. In their loss graphs, the initial training loss values are between 2 and 3, so I assumed the same should hold for my network. When I use the crossentropy function without weights, such as:
[netTrained1, info] = trainnet(augmented_ds,net1,'crossentropy',options);
I do get higher loss values than when I personalize my cross-entropy loss function so that it has weights.
I reproduced the methodology from this research article as closely as I could, including how they format their network input. I am questioning whether there is a problem with my loss function, because the loss values I obtain are actually very small. I said they were between 0 and 1, but I should have specified that they currently gravitate around 0.026502. I know the goal is for the loss to tend towards zero, but my network isn't trained (I reproduced a SegNet architecture) and my training accuracy is around 20%, so the loss values seem very low to me.
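For context, a hedged sanity check (my addition, not from the thread): with 8 classes, a network that predicts roughly uniform probabilities everywhere yields an unweighted per-pixel cross entropy of about -log(1/8), which lands squarely in the 2-3 range the paper reports for an untrained network:

```matlab
% Sanity check: an untrained 8-class classifier that outputs near-uniform
% probabilities gives an unweighted cross entropy of about -log(1/8) per pixel.
n_class = 8;
initial_loss = -log(1/n_class)   % ~2.0794, consistent with the paper's 2-3 range
```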


 Accepted Answer

Matt J
Matt J on 19 Aug 2025
There are a few possible reasons for the discrepancy that I can think of:
(1) Your norm_weights do not add up to 1.
(2) You have selected NormalizationFactor="all-elements" in crossentropy(). According to the doc, though, trainnet does not normalize over all elements; it divides by the number of non-channel elements of the network output, i.e. it ignores the channel dimension.
(3) Other hidden normalization factors may be buried in the black box that is trainnet(). I don't know if it is possible or worthwhile to try to dig them out.
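To see how large the normalization effect in (2) can be, here is a minimal hedged sketch with toy sizes (my addition; assumes Deep Learning Toolbox, and `Y`/`T` are stand-ins for the real network output and targets):

```matlab
% Toy comparison of NormalizationFactor choices (hypothetical sizes).
Y = softmax(dlarray(randn(8,100,'single'),"CB"));     % 8 classes, 100 "pixels"
T = onehotencode(categorical(randi(8,1,100),1:8),1);  % one-hot targets, 8x100
T = dlarray(single(T),"CB");
w = [0.0011 0.4426 0.0023 0.0037 0.0212 0.0022 0.0065 1.0000];

% "all-elements" divides by numel(T) = 800; "batch-size" divides by 100.
lossAll   = crossentropy(Y,T,w,WeightsFormat="UC",NormalizationFactor="all-elements");
lossBatch = crossentropy(Y,T,w,WeightsFormat="UC",NormalizationFactor="batch-size");
% lossAll should come out about 8x smaller than lossBatch (the channel count),
% so the choice of normalization alone changes the scale of the reported loss.
```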

7 Comments

Ève
Ève on 19 Aug 2025
Edited: Ève on 19 Aug 2025
For (1), I didn't think it was necessary to make sure they add up to 1. From my understanding, in various examples (e.g. https://www.mathworks.com/help/vision/ug/semantic-segmentation-using-deep-learning.html), they don't.
For (2), I get what you are saying, but I think that applies only if we specify 'crossentropy' directly as the loss in trainnet. The way I did it, calling crossentropy() myself, there seem to be multiple normalization options. I tried training without specifying NormalizationFactor and my loss values are now in the hundreds, which seems odd once again. I'm trying to familiarize myself with the equations (algorithms) provided by the doc.
I also removed ClassificationMode="multilabel" because, even though I thought semantic segmentation was multilabel classification, this input argument isn't specified in any of the semantic segmentation MATLAB examples I see.
Matt J
Matt J on 20 Aug 2025
Edited: Matt J on 20 Aug 2025
For (1), I didn't think it was necessary to make sure they add up to 1.
I don't say that it is necessary, but it does affect the scale of the loss function.
For (2), I get what you are saying, but I think that's is if we only specify 'crossentropy' as the function in trainnet.
What do you mean "only"? What other usage of trainnet() are we comparing with?
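The weight-scale effect from point (1) can be checked directly; a hedged sketch (my addition, using the weights posted in the question):

```matlab
% The posted class weights do not sum to 1, and most are tiny.
w = [0.0011 0.4426 0.0023 0.0037 0.0212 0.0022 0.0065 1.0000];
sum(w)    % ~1.4796, not 1
mean(w)   % ~0.185: most classes contribute almost nothing to the weighted loss
w_unit = w / sum(w);   % rescaled so the weights sum to 1, if that scale is wanted
```

With an average weight of roughly 0.185, a weighted cross entropy will naturally sit several times below an unweighted one on the same data, independent of any normalization choice.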
Ève
Ève on 20 Aug 2025
Edited: Ève on 20 Aug 2025
That's a good point (for (1))! I'll check that. Thanks for clarifying, the weights are indeed a multiplying factor in the loss.
For (2), what I meant was that from my understanding, if you only specify your loss function this way,
[netTrained1, info] = trainnet(augmented_ds,net1,'crossentropy',options);
the normalization will be "normalized by dividing by the number of non-channel elements of the network output", as the doc says. But if you instead implement it using additional options of the crossentropy function, such as
lossFcn = @(Y,T) crossentropy(Y,T,norm_weights,WeightsFormat="UC",...
    NormalizationFactor="all-elements");
[netTrained2, info] = trainnet(augmented_ds,net2,lossFcn,options);
you have to choose between four NormalizationFactor options ("batch-size", "all-elements", "mask-included", "none"), which are different ways of normalizing the loss. Am I getting this right?
you have to choose between four NormalizationFactor options ("batch-size", "all-elements", "mask-included", "none"), which are different ways of normalizing the loss. Am I getting this right?
Yes, but that was my entire point in (2). You cannot expect agreement between trainnet's built-in loss and your personalized loss function, because their normalization strategies do not coincide.
Matt J
Matt J on 20 Aug 2025
Edited: Matt J on 20 Aug 2025
If trainnet is successfully training your network with lossFcn='crossentropy', but with larger (by a factor of approximately K) loss values, then with your personalized loss function, you could try increasing your learning rates and decreasing your regularization weights by K. Or, just scale your custom lossFcn by K.
The point is, if the loss computation in each case differs only by a global scale factor, it shouldn't impact training much, as long as the learning rates and regularization weights are kept in the same general ratio to the loss values.
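As a hedged sketch of that rescaling idea (my addition; K here is a hypothetical ratio you would measure from your own runs, not a value from the thread):

```matlab
% Hypothetical: suppose the built-in 'crossentropy' loss runs ~K times larger
% than the custom weighted loss. Scale the custom loss back up by K.
K = 100;   % placeholder ratio, to be measured from actual training logs
lossFcnScaled = @(Y,T) K * crossentropy(Y,T,norm_weights, ...
    WeightsFormat="UC",NormalizationFactor="all-elements");
[netTrained2, info] = trainnet(augmented_ds,net2,lossFcnScaled,options);
% Equivalently, leave the loss unscaled and multiply the initial learning
% rate by K (and divide any L2 regularization factor by K) in the options.
```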
I understand, I'll try your suggestions. Thanks a lot for the feedback, I really appreciate it.
I'll accept your answer: as you suggested, my loss values are low simply because they reflect the scale of my weights, most of which are very small. I may revise the way I calculate them. I'll also add, for anyone reading this, that I was wrong about the ClassificationMode in my lossFcn; for my type of classification problem it should be set to "single-label" (the default). I left the rest of the function the same.


More Answers (0)


Asked on 19 Aug 2025 · Last commented on 21 Aug 2025
