Decision tree non-numerical data statistics toolbox

Tania on 15 Jun 2014
Commented: Tania on 29 Jul 2014
Hi,
The Statistics Toolbox documentation says: "Classification trees give responses that are nominal, such as 'true' or 'false'. Regression trees give numeric responses." I am trying to build a decision tree with non-numeric inputs and a numeric output. I thought a classification tree would be more appropriate than a regression tree, since the regression tree seems to work only with numeric data. Is it possible to use non-numeric data to predict numeric data? If so, how could I do this with the Statistics Toolbox? Would ClassificationTree.fit be the right choice?
Thank you :)

Ilya on 16 Jun 2014
The type of tree you need is determined by the type of output. If your output is numeric ("numeric" here means that you can make greater-than and less-than comparisons and compute a meaningful distance between values), a regression tree is the right choice.
For either type of tree, you need to convert your inputs to a numeric matrix. Then you can indicate what variables are non-numeric (categorical) using the 'CategoricalPredictors' parameter; if all your variables are categorical, set it to 'all'.
You can convert your non-numeric data to numeric in many ways. One way would be to use the categorical class in MATLAB on each variable in your data, for example:
>> colors = categorical({'g' 'r' 'b'; 'b' 'r' 'g'});
>> numeric_colors = double(colors);
Then use the new numeric variables as columns in the matrix you pass to the fit function.
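To make the steps above concrete, here is a minimal sketch that fits a regression tree with one categorical input and one numeric input. The data and the variable names (color, width, price) are made up for illustration. It assumes fitrtree is available (R2014a or later, as far as I know); in older releases RegressionTree.fit should accept the same arguments. 'CategoricalPredictors' is given as a column index here, which also covers the case where only some of the predictors are categorical.

```matlab
% Hypothetical toy data: one categorical input, one numeric input, numeric output.
colorNames = {'g'; 'r'; 'b'};
color = categorical(colorNames(mod(0:29, 3)' + 1));   % 30-by-1 categorical
width = linspace(1, 4, 30)';                          % 30-by-1 numeric
price = 10 * double(color) + width;                   % made-up numeric response

% double() replaces each category with its integer code.
X = [double(color), width];

% Only column 1 is categorical, so pass its index instead of 'all'.
tree = fitrtree(X, price, 'CategoricalPredictors', 1);
% Older releases (assumption): RegressionTree.fit(X, price, 'CategoricalPredictors', 1)

% Encode a new observation with the same category set before predicting.
newColor = categorical({'r'}, categories(color));
yhat = predict(tree, [double(newColor), 2.6]);
```

Reusing categories(color) when encoding new data keeps the integer codes consistent with the ones the tree was trained on.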
Tania on 29 Jul 2014
"For either type of tree, you need to convert your inputs to a numeric matrix. Then you can indicate what variables are non-numeric (categorical) using the 'CategoricalPredictors' parameter; if all your variables are categorical, set it to 'all'."
What can I do if only one of my variables is categorical?
Thank you!