Using fitcsvm for binary linear classification of unbalanced data

rogueMedStudent7 on 14 Jun 2017
Answered: Ankita Nargundkar on 21 Jun 2017
Here is a simple example of the issue I'm running into:
tt=[1 8;2 7;3 6;4 5;5 4;6 3]; % these 6 points all lie on a line with slope -1
labels=[1 1 1 1 -1 1];
c=[0 1;2 0];
mod=fitcsvm(tt,labels,'KernelFunction','linear','Cost',c);
This spits out mod.Beta=[0 0] with a mod.Bias of 1, so the output of predict() for any point x is 1. That is, it ignores the minority class, which is a common problem with unbalanced classes. However, the cost matrix is supposed to fix that. I impose a cost that is twice as high for misclassifying the minority class, so it should draw a dividing line with a slope of 1 (and a Beta with slope -1) that correctly classifies the minority point and misclassifies only a single majority point (rather than misclassifying the single minority point, as it currently does).

I've tried switching the cost matrix to [0 2;1 0] to no avail. I also notice that the returned model has mod.Cost=[0 1;1 0]. It's as if it's completely ignoring my cost matrix input. What is going on here? Any help is greatly appreciated.

Answers (1)

Ankita Nargundkar on 21 Jun 2017
>> c=[0 2.2;1 0];
>> mod=fitcsvm(tt,labels,'KernelFunction','linear','Cost',c);
>> mod.predict(tt)
ans =
Is this what you expect? Note that one point is misclassified and the minority point is classified correctly.
The documentation says:
"For two-class learning, if you specify a cost matrix, then the software updates the prior probabilities by incorporating the penalties described in the cost matrix. Consequently, the cost matrix resets to the default."
That explains the strange behavior of mod.Cost being reset. Indeed you will notice that mod.Prior does change with different values of c.
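To make the prior update concrete, here is a minimal sketch of the arithmetic, using only base MATLAB. It rests on an assumption that is not spelled out in the quoted sentence: each class prior is scaled by the total cost of misclassifying that class (the corresponding row sum of the cost matrix) and the result is renormalized. The variable names classNames, prior, classCost and adjPrior are made up for illustration.

```matlab
% Training labels from the question: five points in class 1, one in class -1.
labels = [1 1 1 1 -1 1];

% Numeric classes are ordered ascending, so row/column 1 of the cost
% matrix refers to class -1 and row/column 2 refers to class 1.
classNames = unique(labels);                            % [-1 1]

% Empirical class priors, i.e. the class frequencies in the training data.
prior = arrayfun(@(k) mean(labels == k), classNames);   % [1/6 5/6]

% Cost(i,j) is the cost of predicting class j when the true class is i.
c = [0 2.2; 1 0];                                       % cost matrix from the answer

% ASSUMPTION: scale each prior by the row sum of c (the total cost of
% misclassifying that class), then renormalize so the priors sum to 1.
classCost = sum(c, 2)';                                 % [2.2 1]
adjPrior  = prior .* classCost;
adjPrior  = adjPrior / sum(adjPrior);

disp(adjPrior)                                          % approximately 0.3056 0.6944
```

If the assumption holds, these values should be comparable to mod.Prior after fitting with the same c. The row ordering also means that c=[0 1;2 0] from the question puts the doubled cost on misclassifying class 1, the majority class, rather than on the minority class.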
