How to find probability of classification in boosted tree (AdaboostM2)

Question

Sal on 30 Dec 2015


https://ch.mathworks.com/matlabcentral/answers/262160-how-to-find-probability-of-classification-in-boosted-tree-adaboostm2

Answered: Ruben Fernandez on 31 Jan 2019

Hello, I am using a boosted tree ensemble for multi-class classification (fitensemble with AdaBoostM2; the script was generated by the classification app). I am getting nearly 92% training accuracy with these settings, while bagged trees give me nearly 82%. However, I need the probability of each class in addition to the predicted class. When I use [class,score] = predict(...), the scores I get are NOT probabilities (which is in line with the documentation) but rather the averaged count among the selected trees, so they are not confined to [0,1]. To get probabilities, I currently sum each row and divide each element in that row by the sum. I understand this is not the correct way to get probabilities, but I am out of ideas here.

I need probabilities because the competition requires me to submit probabilities only. Is there any way to get class probabilities from the boosting method?

2 Comments

Brendan Hamm on 30 Dec 2015

AdaBoostM1 and AdaBoostM2 do not provide probabilities for each classification. They are not probabilistic methods; rather, they use heuristics to guide the learners. This is documented in the Ensemble Methods section of the doc (finding it might involve some scrolling/searching) and in the Definitions section of the predict method (although it does not mention AdaBoostM2 by name).

Sal on 30 Dec 2015

Hi Brendan,

Thanks. I know AdaBoostM2 does not provide probabilities. But I am still interested in some way to transform those scores into probabilities, as I am getting almost 92% training accuracy with this method. The next best method from either MATLAB or R (using a variety of algorithms and tuning effort) reaches slightly less than 82%.


Answer 1

Ruben Fernandez on 31 Jan 2019


https://ch.mathworks.com/matlabcentral/answers/262160-how-to-find-probability-of-classification-in-boosted-tree-adaboostm2#answer_358919

Is there any solution to this problem?

Is it statistically correct to transform the scores from a boosted tree into probabilities by simply summing each row and dividing each element in that row by the sum?

Thanks


Answer 2

Ilya on 30 Dec 2015


https://ch.mathworks.com/matlabcentral/answers/262160-how-to-find-probability-of-classification-in-boosted-tree-adaboostm2#answer_204708

For AdaBoostM1, you can convert scores into probabilities by assigning the string 'doublelogit' to the ScoreTransform property of the ensemble object. For AdaBoostM2, there is no simple transformation.
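
To illustrate what the 'doublelogit' transform does to a raw score, here is a minimal Python sketch, assuming the transform is 1/(1 + exp(-2x)) as listed in the MATLAB documentation for ScoreTransform. The function name double_logit is hypothetical and is not part of any MATLAB API.

```python
import math


def double_logit(score: float) -> float:
    """Map a raw score to (0, 1) via 1 / (1 + exp(-2 * score)).

    Assumption: this mirrors the formula documented for 'doublelogit';
    the function name itself is made up for this sketch.
    """
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-2.0 * score))
    # For negative scores use the algebraically equivalent form
    # exp(2x) / (1 + exp(2x)) so that exp() cannot overflow.
    z = math.exp(2.0 * score)
    return z / (1.0 + z)


# A score of zero sits exactly on the decision boundary.
print(double_logit(0.0))
```

The transform is strictly increasing, so it changes the scale of the scores without changing which class wins.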

What you are doing may be good enough for the competition. If classification accuracy is used to determine the winner, you just need a monotone transformation to [0,1]. If they really want probabilities, isotonic regression has been explored in the literature for mapping scores onto probabilities. I can give you some pointers if you'd like, but the approach would be somewhat involved. First, you would need to compute scores for a dataset with known labels that was not used for training (using either an independent test set or cross-validation). Then you would have to fit isotonic regression on that dataset to find the probability for each score. Finally, you would need to fit a non-parametric curve of your choice mapping those scores onto those probabilities.
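
The isotonic regression step above can be sketched in plain Python using the pool-adjacent-violators algorithm. This is only an illustration under stated assumptions: the held-out scores for one class and the 0/1 labels (1 if the observation truly belongs to that class) are assumed to have been exported from MATLAB beforehand, the function names fit_isotonic and predict_isotonic are hypothetical, and a simple step function stands in for the non-parametric curve.

```python
from bisect import bisect_right


def fit_isotonic(scores, labels):
    """Fit a non-decreasing step function from score to P(label = 1).

    scores: held-out scores for one class (not used for training).
    labels: 1 if the observation belongs to that class, else 0.
    Returns (starts, values): block start scores and fitted probabilities.
    """
    # Pool tied scores first so equal scores share one fitted value.
    grouped = {}
    for s, y in zip(scores, labels):
        total, count = grouped.get(s, (0.0, 0))
        grouped[s] = (total + y, count + 1)

    blocks = []  # each block is [start_score, label_sum, count]
    for s in sorted(grouped):
        total, count = grouped[s]
        blocks.append([s, total, count])
        # Merge backwards while the previous block's mean exceeds the
        # current one (compared by cross-multiplication of sums/counts).
        while (len(blocks) > 1
               and blocks[-2][1] * blocks[-1][2] > blocks[-1][1] * blocks[-2][2]):
            _, t_last, c_last = blocks.pop()
            blocks[-1][1] += t_last
            blocks[-1][2] += c_last

    starts = [b[0] for b in blocks]
    values = [b[1] / b[2] for b in blocks]
    return starts, values


def predict_isotonic(starts, values, score):
    """Look up the fitted probability for a new score (step function)."""
    idx = bisect_right(starts, score) - 1
    return values[max(idx, 0)]  # scores below the first block use its value


# Hypothetical held-out scores and labels for a single class.
starts, values = fit_isotonic([0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
                              [0, 1, 0, 0, 1, 1])
print(predict_isotonic(starts, values, 0.35))
```

For a multi-class problem, one possible design (an assumption, not something stated in this thread) is to fit one such map per class in a one-vs-rest fashion and then rescale each observation's calibrated values so that they sum to 1.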

2 Comments

Sal on 30 Dec 2015

Hi Ilya, thanks. Can you clarify what you mean by a "monotone transformation to [0,1]"? I have 3 classes. Do you mean I should transform the output of AdaBoostM2 to [0 1 0; 1 0 0; 0 0 1; ...] for the submission? In that case, because the competition uses a multiclass loss, my submission will be heavily penalized for wrong predictions. Is it possible to get the total vote count from the trees in the ensemble, as well as the number of trees (are these the same as learners?)? Then I would divide the total vote count for a class (for a particular observation in the test set) by the total number of trees. This might give a better estimate.

I should mention that the class distribution in my dataset is slightly imbalanced (64.82%, 25.35%, and 9.84%). I have tried RUSBoost, but it does not perform better than AdaBoostM2. Besides, I don't yet know whether RUSBoost supports probabilistic output.

I have tried random forest (without AdaBoostM2) in R as well. But so far, MATLAB's fitensemble with AdaBoostM2 has given me the best solution. Looks like I am kind of out of luck here :(

Ilya on 31 Dec 2015

You said that, to get the probability, you currently sum each row and divide each element in that row by the sum. That is a monotone transformation to [0,1].

One weak learner is one tree. Look at the Trained property.
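
The row normalization discussed in this thread can be sketched in a few lines of Python. This is a minimal illustration, assuming the score matrix has been exported from MATLAB as a list of rows (one row per observation, one column per class) and that all scores are non-negative with a positive row sum, which the thread does not confirm. The function name normalize_rows is hypothetical.

```python
def normalize_rows(score_rows):
    """Divide every score by its row sum so that each row sums to 1.

    Assumption: scores are non-negative and each row sum is positive;
    otherwise the result would not lie in [0, 1].
    """
    normalized = []
    for row in score_rows:
        if any(s < 0 for s in row):
            raise ValueError("row normalization assumes non-negative scores")
        total = sum(row)
        if total <= 0:
            raise ValueError("row sum must be positive")
        normalized.append([s / total for s in row])
    return normalized


# Hypothetical scores for two observations and three classes.
rows = normalize_rows([[2.0, 1.0, 1.0], [0.5, 3.0, 1.5]])
print(rows)
```

Within each row the ordering of the classes is unchanged, so the predicted class stays the same. The values sum to 1 but are not calibrated probabilities.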

