SVM fitcsvm() thresholds vs ROC curve thresholds

Jaya on 23 Oct 2024 at 13:52
Answered: the cyclist on 24 Oct 2024 at 14:03
I am using an SVM for binary classification. Mdl=fitcsvm() in MATLAB returns a trained model Mdl containing information such as Alpha, Bias, KernelParameters, etc. I run my test data through this model over a range of threshold values to plot a ROC curve. I have two questions about this process.
  1. The Mdl never says which threshold was used, i.e. score > threshold is the positive class and score < threshold is the negative class. Is there a default threshold it uses? Is it 0?
  2. In the ROC curve I'm plotting, say I settle on a threshold that gives me a satisfactory (TP, FP). How do I use it? The saved model Mdl surely does not use it. If the default is 0 but I prefer 0.3, how do I make Mdl use that value in a real application? Or should I explicitly put this 0.3 in my code? How? If that is impossible, what is the use of plotting a ROC curve in this context?

Answers (1)

the cyclist on 24 Oct 2024 at 14:03
fitcsvm() fits an SVM to the training data. That function itself does not make the predictions.
The resulting model object from
Mdl=fitcsvm()
has a predict method that is used to make the actual predictions:
[class,score] = predict(Mdl,X)
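For an SVM, the score output is the signed distance from the observation to the decision boundary, not a probability. It has one column per class (negative class first, positive class second), and a positive value in a class's column means the observation is predicted to be in that class. MATLAB assigns the class output to the class with the larger score, which for two classes amounts to a default threshold of 0 on the positive-class score.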
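As a rough illustration of how the score can be thresholded by hand, here is a minimal sketch. It assumes the Statistics and Machine Learning Toolbox, uses synthetic data, and picks the ROC operating point by maximizing TPR minus FPR; the data, the variable names, and that selection rule are my assumptions, not something from the question.

```matlab
% Sketch: choose a threshold from the ROC curve and apply it to SVM scores.
% Synthetic data; all variable names are illustrative.
rng(0);                                        % reproducible random numbers
n = 200;
Xtrain = [randn(n,2) - 1; randn(n,2) + 1];     % two overlapping Gaussian clouds
Ytrain = [-ones(n,1); ones(n,1)];              % -1 = negative, +1 = positive
Xtest  = [randn(n,2) - 1; randn(n,2) + 1];
Ytest  = [-ones(n,1); ones(n,1)];

% List the positive class second so it lands in column 2 of the scores.
Mdl = fitcsvm(Xtrain, Ytrain, 'ClassNames', [-1 1]);

[label, score] = predict(Mdl, Xtest);          % score columns: [class -1, class +1]
posScore = score(:,2);                         % positive-class score

% The default labels match a threshold of 0 on the positive-class score.
defaultLabel = ones(size(posScore));
defaultLabel(posScore < 0) = -1;

% ROC curve: T holds the score threshold behind each (FPR, TPR) point.
[fpr, tpr, T] = perfcurve(Ytest, posScore, 1);

% Assumed selection rule: maximize TPR - FPR. Any other rule works the same way.
[~, idx] = max(tpr - fpr);
rocThr = T(idx);

% Apply the chosen threshold explicitly; perfcurve counts score >= T as positive.
customLabel = -ones(size(posScore));
customLabel(posScore >= rocThr) = 1;
```

In a deployed application the same two lines at the end would be applied to the scores of new data, with rocThr stored alongside Mdl.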
If you are unsatisfied with that, then I believe you will need to calculate the posterior probability yourself, from the score. There is a section on classification scores and posterior probability in that documentation I linked. It admittedly looks a bit involved, but I don't know of a simpler way.
I see that there is a fitSVMPosterior function that looks like it might be useful for what you are doing. I also suggest searching for "ScoreTransform", which I see popping up around different functions. I used some of this a long time ago, but memory fades a bit.
I know that in scikit-learn, some models have an input parameter where you can ask for the output to be the posterior probability instead of the score, but I don't know for sure if there is an equivalent in MATLAB.
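For what it is worth, a sketch of the posterior-probability route might look like the following. It uses fitPosterior, the object function of the trained model that, as far as I know, does the same job as fitSVMPosterior; the synthetic data and variable names are assumptions. Note that a threshold of 0.3 on a posterior probability is not the same operating point as 0.3 on the raw score.

```matlab
% Sketch: convert SVM scores to posterior probabilities, then threshold them.
% Requires Statistics and Machine Learning Toolbox; data are synthetic.
rng(0);                                      % reproducible random numbers
n = 200;
X = [randn(n,2) - 1; randn(n,2) + 1];        % two overlapping Gaussian clouds
Y = [-ones(n,1); ones(n,1)];                 % -1 = negative, +1 = positive

Mdl = fitcsvm(X, Y, 'ClassNames', [-1 1]);

% Fit the score-to-posterior mapping (cross-validated by default).
% The returned model stores the mapping in its ScoreTransform property.
ProbMdl = fitPosterior(Mdl);

% predict now returns posterior probabilities instead of raw scores.
[~, posterior] = predict(ProbMdl, X);        % columns: [P(class -1), P(class +1)]
pPos = posterior(:,2);

% Explicit probability threshold (0.3 is only an example value).
thr = 0.3;
customLabel = -ones(size(pPos));
customLabel(pPos >= thr) = 1;
```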
