In a multiclass classification problem using Random Forest/TreeBagger, how would I determine the most important features for each particular class?

Is there a quick and easy way to do this or will it require modification of the code?

Answers (1)

Ilya on 10 Jul 2014
You would need to specify more precisely what you mean by "features important for each class". Features are important (or not) for separating classes from each other.
For example, you can recast your question as "what features are important for separating this class from all other classes?" Then you can solve this binary problem. That is, you label observations of this class as "positive" and observations of all other classes as "negative". Then you run TreeBagger on the two resulting classes and get estimates of feature importance for that one-versus-rest problem. Repeat this for each class you are interested in.
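The relabeling step can be sketched in a few lines. This is only an illustration in plain Python, not MATLAB code: the function name `one_vs_rest_labels` and the class names in the example are made up, and the sketch stops at producing the binary label vectors. Each vector would then be passed, together with the unchanged predictors, to the ensemble learner (TreeBagger in this thread) to get importance estimates for that class.

```python
def one_vs_rest_labels(labels, positive_class):
    """Relabel a multiclass label vector as 'positive' vs 'negative'."""
    return ["positive" if y == positive_class else "negative" for y in labels]


# Hypothetical class labels, one per observation.
labels = ["setosa", "versicolor", "virginica", "setosa", "virginica"]

# One binary problem per class; the predictors stay the same for all of them.
binary_problems = {c: one_vs_rest_labels(labels, c) for c in sorted(set(labels))}

for positive_class, binary_labels in binary_problems.items():
    print(positive_class, binary_labels)
```

With K classes this means training K separate ensembles, one per binary problem.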
  3 Comments
Ilya on 10 Jul 2014
I don't know what "Gini for each class" is. The Gini index is a measure of class separation defined for several (at least two) classes. You might have a clever idea how to modify that definition, but it's fair to say this is not mainstream practice.
In MATLAB, you have access to all trees through the Trees property of the TreeBagger object. Each tree exposes class probabilities in each node and the variable chosen for splitting this node. This should be enough for you to compute the gain in some criterion due to each decision split, provided you choose a criterion that can be expressed in terms of class probabilities before and after the split (that is, in the parent and two child nodes). You can then see how much each variable contributed to that gain.
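As a rough sketch of that bookkeeping, here is one possible criterion worked through in plain Python. Everything about it is an assumption made for illustration: the per-class term p_k(1 - p_k) is just one way to split the Gini index into class contributions (the terms sum to the usual Gini impurity), it is not mainstream practice, and the node layout (`probs`, `size`, `var`, `left`, `right`) is hypothetical. It also assumes the number of observations in each node is available, in addition to the class probabilities and split variables mentioned above.

```python
from collections import defaultdict


def per_class_gini_terms(probs):
    """Per-class terms p_k * (1 - p_k); their sum is the Gini impurity."""
    return [p * (1.0 - p) for p in probs]


def per_class_split_gain(tree):
    """Sum, per split variable and per class, the weighted decrease in p_k*(1-p_k).

    `tree` is a list of node dicts (hypothetical layout); node 0 is the root.
    Leaves have var=None. Each gain is weighted by the fraction of all
    observations that reach the parent node.
    """
    n_total = tree[0]["size"]
    n_classes = len(tree[0]["probs"])
    gains = defaultdict(lambda: [0.0] * n_classes)
    for node in tree:
        if node["var"] is None:  # leaf: no split, nothing to credit
            continue
        left, right = tree[node["left"]], tree[node["right"]]
        parent_terms = per_class_gini_terms(node["probs"])
        left_terms = per_class_gini_terms(left["probs"])
        right_terms = per_class_gini_terms(right["probs"])
        w_left = left["size"] / node["size"]
        w_right = right["size"] / node["size"]
        node_weight = node["size"] / n_total
        for k in range(n_classes):
            gain = parent_terms[k] - w_left * left_terms[k] - w_right * right_terms[k]
            gains[node["var"]][k] += node_weight * gain
    return dict(gains)


# Tiny hand-made tree with 3 classes and 10 observations (made-up numbers).
tree = [
    {"probs": [0.5, 0.3, 0.2], "size": 10, "var": "x1", "left": 1, "right": 2},
    {"probs": [1.0, 0.0, 0.0], "size": 5, "var": None, "left": None, "right": None},
    {"probs": [0.0, 0.6, 0.4], "size": 5, "var": "x2", "left": 3, "right": 4},
    {"probs": [0.0, 1.0, 0.0], "size": 3, "var": None, "left": None, "right": None},
    {"probs": [0.0, 0.0, 1.0], "size": 2, "var": None, "left": None, "right": None},
]

gains = per_class_split_gain(tree)
for var, per_class in sorted(gains.items()):
    print(var, [round(g, 4) for g in per_class])
```

For an ensemble you would accumulate these gains over all trees and divide by the number of trees. Summing a variable's per-class gains recovers its ordinary Gini gain, so the per-class numbers only show how that total is distributed across classes under this particular choice of criterion.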
