How to access training data in regression trees in TreeBagger
I need to access training data (x) in each regression tree within an ensemble of trees created by TreeBagger.
I am using TreeBagger.Trees, which returns a cell array with all the trees in the ensemble. The problem is that the trees are CompactRegressionTree objects, which do not store the data used to train the regression tree.
Is there a way to make TreeBagger build the ensemble from RegressionTree objects instead of CompactRegressionTree objects, or some other way to access the training data at the leaf nodes of a CompactRegressionTree?
Ilya on 13 Aug 2015
The OOBIndices property stores, for each tree, logical indices of the observations that are out of bag; the observations used to grow that tree are the complement. This property does not tell you, though, whether an observation was sampled multiple times for the same tree.
If you need that information, your best option is to add another property to the TreeBagger class that holds the numeric indices of the observations used for each tree. Take a look at line 1945 or so, which should look like this:
idxtrain = weightedSample(s,w,fboot,sampleWithReplacement);
You just need to store the idxtrain array for each tree. I would add another output to the loopBody function and modify the call to loopBody accordingly.
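To see why the numeric idxtrain array carries more information than a logical mask, here is a minimal stand-alone sketch. It does not call TreeBagger's internal weightedSample; randi is used as a stand-in for an unweighted bootstrap draw, and the names N, inBag and counts are made up for illustration.

```matlab
% Stand-in for one tree's bootstrap draw (NOT TreeBagger's weightedSample).
rng(0);                          % reproducible draw
N = 10;                          % number of training observations (assumed)
idxtrain = randi(N, N, 1);       % N draws with replacement, numeric indices

% Logical mask: the kind of information ~OOBIndices(:,t) gives for a tree.
inBag = false(N, 1);
inBag(idxtrain) = true;

% Multiplicity: how many times each observation was drawn for this tree.
counts = accumarray(idxtrain, 1, [N 1]);

% The mask only records whether counts > 0; the repeats are lost.
disp([inBag counts]);
```

Storing idxtrain (or counts) per tree is what would let you recover how often each observation was used.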
I wouldn't recommend replacing compact trees with full trees. This is harder and would blow up memory consumption.
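If a logical in-bag mask is enough, the compact trees can still be used to group the training rows by leaf without modifying TreeBagger. The sketch below is one possible approach, not part of the original answer. It assumes OOBIndices was stored by passing 'OOBPrediction','on' (older releases used the name 'OOBPred'), and that predict on a CompactRegressionTree returns node numbers as its second output. The data and variable names are made up for illustration.

```matlab
% Synthetic data, for illustration only.
rng(1);
X = randn(200, 3);
y = X(:,1).^2 + 0.1*randn(200, 1);

% OOBIndices is only stored when out-of-bag information is requested.
B = TreeBagger(25, X, y, 'Method', 'regression', 'OOBPrediction', 'on');

t = 1;                            % tree of interest
inBag = ~B.OOBIndices(:, t);      % true for rows used to grow tree t
Xt = X(inBag, :);                 % each in-bag row appears once here,
yt = y(inBag);                    % even if it was drawn several times

% Pass the in-bag rows down the compact tree to get their leaf numbers.
[~, node] = predict(B.Trees{t}, Xt);

% Training rows that ended up in one particular leaf.
leaves = unique(node);
rowsInLeaf = Xt(node == leaves(1), :);
```

Because repeated draws are collapsed in the mask, per-leaf counts obtained this way can be smaller than the counts the tree actually saw during training.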