What is 'categorical predictor' in decision tree for regression

Hi, I'd like to use MATLAB's own example for this question. Please refer to https://uk.mathworks.com/help/stats/fitrtree.html for the original example.
>> load carsmall
>> whos
  Name              Size      Bytes  Class     Attributes
  Acceleration      100x1       800  double
  Cylinders         100x1       800  double
  Displacement      100x1       800  double
  Horsepower        100x1       800  double
  MPG               100x1       800  double
  Mfg               100x13     2600  char
  Model             100x33     6600  char
  Model_Year        100x1       800  double
  Origin            100x7      1400  char
  Weight            100x1       800  double
>> tree = fitrtree([Weight, Cylinders],MPG,...
'categoricalpredictors',2,'MinParentSize',20,...
'PredictorNames',{'W','C'})
tree =
RegressionTree
PredictorNames: {'W' 'C'}
ResponseName: 'Y'
CategoricalPredictors: 2
ResponseTransform: 'none'
NumObservations: 94
Properties, Methods
What exactly is CategoricalPredictors in this case, and why is it 2?

Accepted Answer

Adam Danz on 10 Jan 2019
Edited: Adam Danz on 10 Jan 2019
MATLAB's fitrtree() function returns a RegressionTree object. You can read more about this object and its properties in the RegressionTree documentation.
As you'll read there, the "CategoricalPredictors" property contains index values corresponding to the columns of the predictor data that are categorical (if none of the predictors are categorical, it is empty, []).
So, why is CategoricalPredictors equal to 2?
Now read about the function you're using, fitrtree().
One of the name-value pairs is 'CategoricalPredictors', which is specified in your call to fitrtree() as 2. The value is a column index, not a count: it marks the second column of the predictor matrix [Weight, Cylinders], i.e. Cylinders, as categorical.
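A minimal sketch for checking which predictor the stored index points at, assuming the carsmall data set from the question is available. The variable names X, idx and catName are illustrative only and are not part of the original example.

```matlab
load carsmall

% Predictor matrix: column 1 = Weight, column 2 = Cylinders
X = [Weight, Cylinders];

tree = fitrtree(X, MPG, ...
    'CategoricalPredictors', 2, ...
    'MinParentSize', 20, ...
    'PredictorNames', {'W','C'});

% The property holds column indices into X, not a count of predictors
idx = tree.CategoricalPredictors;

% Look up the name of the predictor at that index
% (idx is a scalar here because only one column was flagged)
catName = tree.PredictorNames{idx};

disp(idx)
disp(catName)
```

Looking the index up in PredictorNames is just a convenient way to confirm which column was flagged as categorical.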
  3 Comments
Salad Box on 15 Jan 2019
Actually, after reading it a few times: 'the "CategoricalPredictors" contains index values corresponding to the columns of the categorical predictor data'. Here 'index value' (or 'entry') refers to a column of the predictor data. An index value of 1 is the first column, in this case 'Weight'; an index value of 2 is the second column, in this case 'Cylinders'.
So
tree = fitrtree([Weight, Cylinders],MPG,...
'categoricalpredictors',2,'MinParentSize',20,...
'PredictorNames',{'W','C'})
indicates that 'Cylinders' (the 2nd column in the predictor data) is a categorical predictor. In fact, the cars have only 4, 6, or 8 cylinders, so the values are not continuous.
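A short sketch of how one might verify this, again assuming the carsmall data set. It lists the distinct cylinder counts and fits the same tree with and without the categorical flag so the split rules can be compared. The names levels, treeCat and treeNum are hypothetical and not from the original example.

```matlab
load carsmall

% Distinct cylinder counts present in the data
levels = unique(Cylinders);
disp(levels')

X = [Weight, Cylinders];

% Cylinders (column 2) treated as categorical, as in the question
treeCat = fitrtree(X, MPG, ...
    'CategoricalPredictors', 2, ...
    'MinParentSize', 20, ...
    'PredictorNames', {'W','C'});

% Same data, but both columns treated as numeric
treeNum = fitrtree(X, MPG, ...
    'MinParentSize', 20, ...
    'PredictorNames', {'W','C'});

% Print the split rules of each tree as text for comparison
view(treeCat, 'Mode', 'text')
view(treeNum, 'Mode', 'text')
```

In the text view, splits on a categorical predictor are expressed as set membership of its levels, whereas splits on a numeric predictor are expressed as thresholds.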
