ClassificationTreeCoderConfigurer

Coder configurer of binary decision tree model for multiclass classification

Description

A ClassificationTreeCoderConfigurer object is a coder configurer of a binary decision tree model for multiclass classification (ClassificationTree or CompactClassificationTree).

A coder configurer offers convenient features to configure code generation options, generate C/C++ code, and update model parameters in the generated code.

  • Configure code generation options and specify the coder attributes of the tree model parameters by using object properties.

  • Generate C/C++ code for the predict and update functions of the classification tree model by using generateCode. Generating C/C++ code requires MATLAB® Coder™.

  • Update model parameters in the generated C/C++ code without having to regenerate the code. This feature reduces the effort required to regenerate, redeploy, and reverify C/C++ code when you retrain the tree model with new data or settings. Before updating model parameters, use validatedUpdateInputs to validate and extract the model parameters to update.

This flow chart shows the code generation workflow using a coder configurer.

For the code generation usage notes and limitations of a classification tree model, see the Code Generation sections of CompactClassificationTree, predict, and update.

Creation

After training a classification tree model by using fitctree, create a coder configurer for the model by using learnerCoderConfigurer. Use the properties of a coder configurer to specify the coder attributes of the predict and update arguments. Then, use generateCode to generate C/C++ code based on the specified coder attributes.
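As a minimal sketch of this creation workflow, the following lines train a tree on the fisheriris data and create a configurer with two settings supplied as name-value arguments. The file name myTreeModel and the choice of two outputs are illustrative assumptions, not required settings. The generateCode call assumes that MATLAB Coder and a supported compiler are available.

```matlab
% Train a classification tree on the Fisher iris data.
load fisheriris
Mdl = fitctree(meas,species);

% Create a coder configurer. The predictor data meas sets the default
% coder attributes of X. 'myTreeModel' is a hypothetical file name.
configurer = learnerCoderConfigurer(Mdl,meas, ...
    'NumOutputs',2,'OutputFileName','myTreeModel');

% Generate the entry-point files and the MEX function
% (requires MATLAB Coder and a configured compiler).
generateCode(configurer)
```

You can also set the same options after creation by using dot notation on the configurer properties.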

Properties


predict Arguments

The properties listed in this section specify the coder attributes of the predict function arguments in the generated code.

X

Coder attributes of the predictor data to pass to the generated C/C++ code for the predict function of the classification tree model, specified as a LearnerCoderInput object.

When you create a coder configurer by using the learnerCoderConfigurer function, the input argument X determines the default values of the LearnerCoderInput coder attributes:

  • SizeVector — The default value is the array size of the input X.

  • VariableDimensions — This value is [0 0] (default) or [1 0].

    • [0 0] indicates that the array size is fixed as specified in SizeVector.

    • [1 0] indicates that the array has variable-size rows and fixed-size columns. In this case, the first value of SizeVector is the upper bound for the number of rows, and the second value of SizeVector is the number of columns.

  • DataType — This value is single or double. The default data type depends on the data type of the input X.

  • Tunability — This value must be true, meaning that predict in the generated C/C++ code always includes predictor data as an input.

You can modify the coder attributes by using dot notation. For example, to generate C/C++ code that accepts predictor data with 100 observations of three predictor variables, specify these coder attributes of X for the coder configurer configurer:

configurer.X.SizeVector = [100 3];
configurer.X.DataType = 'double';
configurer.X.VariableDimensions = [0 0];

[0 0] indicates that the first and second dimensions of X (number of observations and number of predictor variables, respectively) have fixed sizes.

To allow the generated C/C++ code to accept predictor data with up to 100 observations, specify these coder attributes of X:

configurer.X.SizeVector = [100 3];
configurer.X.DataType = 'double';
configurer.X.VariableDimensions = [1 0];

[1 0] indicates that the first dimension of X (number of observations) has a variable size and the second dimension of X (number of predictor variables) has a fixed size. The specified number of observations, 100 in this example, becomes the maximum allowed number of observations in the generated C/C++ code. To allow any number of observations, specify the bound as Inf.

NumOutputs

Number of output arguments to return from the generated C/C++ code for the predict function of the classification tree model, specified as 1, 2, 3, or 4.

The output arguments of predict are label (predicted class labels), score (posterior probabilities), node (node numbers for predicted classes), and cnum (class numbers of predicted labels), in that order. predict in the generated C/C++ code returns the first n outputs of the predict function, where n is the NumOutputs value.

After creating the coder configurer configurer, you can specify the number of outputs by using dot notation.

configurer.NumOutputs = 2;

The NumOutputs property is equivalent to the '-nargout' compiler option of codegen. This option specifies the number of output arguments in the entry-point function of code generation. The object function generateCode generates two entry-point functions—predict.m and update.m for the predict and update functions of a classification tree model, respectively—and generates C/C++ code for the two entry-point functions. The specified value for the NumOutputs property corresponds to the number of output arguments in the entry-point function predict.m.

Data Types: double
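To illustrate how NumOutputs carries through to the generated entry point, the following sketch requests two outputs and then calls the MEX function with two output arguments. It assumes that MATLAB Coder is installed and that the default output file name, ClassificationTreeModel, is unchanged.

```matlab
load fisheriris
Mdl = fitctree(meas,species);
configurer = learnerCoderConfigurer(Mdl,meas);

% Request the first two outputs of predict: label and score.
configurer.NumOutputs = 2;
generateCode(configurer)   % requires MATLAB Coder

% Call the predict entry point of the generated MEX function.
% The MEX function name follows the OutputFileName property.
[label_mex,score_mex] = ClassificationTreeModel('predict',meas);
```

With this setting, the generated code does not return the node and cnum outputs.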

update Arguments

The properties listed in this section specify the coder attributes of the update function arguments in the generated code. The update function takes a trained model and new model parameters as input arguments, and returns an updated version of the model that contains the new parameters. To enable updating the parameters in the generated code, you need to specify the coder attributes of the parameters before generating code. Use a LearnerCoderInput object to specify the coder attributes of each parameter. The default attribute values are based on the model parameters in the input argument Mdl of learnerCoderConfigurer.

Children

Coder attributes of the child nodes for each node in the tree (Children of a classification tree model), specified as a LearnerCoderInput object.

The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer:

  • SizeVector — The default value is [nd 2], where nd is the number of nodes in Mdl.

  • VariableDimensions — This value is [0 0] (default) or [1 0].

    • [0 0] indicates that the array size is fixed as specified in SizeVector.

    • [1 0] indicates that the array has variable-size rows and fixed-size columns. In this case, the first value of SizeVector is the upper bound for the number of rows, and the second value of SizeVector is the number of columns.

  • DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl.

  • Tunability — This value must be true.

If you modify the first dimension of SizeVector to be newnd, then the software modifies the first dimension of the SizeVector attribute to be newnd for the properties ClassProbability, CutPoint, and CutPredictorIndex. Similarly, if you modify the first dimension of VariableDimensions to be 1, then the software modifies the first dimension of the VariableDimensions attribute to be 1 for these properties.
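The dependency described above can be observed with a short sketch. It changes only the Children property and then reads back the attributes of the dependent properties. The data set and the choice of Inf as the new bound are assumptions made for illustration.

```matlab
load fisheriris
Mdl = fitctree(meas,species);
configurer = learnerCoderConfigurer(Mdl,meas);

% Allow the number of nodes to vary by changing only Children.
configurer.Children.SizeVector = [Inf 2];

% Read back the dependent properties, which share the first dimension.
cpSize = configurer.ClassProbability.SizeVector
cutSize = configurer.CutPoint.SizeVector
idxVarDims = configurer.CutPredictorIndex.VariableDimensions
```

If the Verbose property is true, the software also displays a notification message for each dependent attribute that it modifies.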

ClassProbability

Coder attributes of the class probabilities for each node in the tree (ClassProbability of a classification tree model), specified as a LearnerCoderInput object.

The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer:

  • SizeVector — The default value is [nd c], where nd is the number of nodes in Mdl and c is the number of classes.

  • VariableDimensions — This value is [0 0] (default) or [1 0].

    • [0 0] indicates that the array size is fixed as specified in SizeVector.

    • [1 0] indicates that the array has variable-size rows and fixed-size columns. In this case, the first value of SizeVector is the upper bound for the number of rows, and the second value of SizeVector is the number of columns.

  • DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl.

  • Tunability — This value must be true.

If you modify the first dimension of SizeVector to be newnd, then the software modifies the first dimension of the SizeVector attribute to be newnd for the properties Children, CutPoint, and CutPredictorIndex. Similarly, if you modify the first dimension of VariableDimensions to be 1, then the software modifies the first dimension of the VariableDimensions attribute to be 1 for these properties.

Cost

Coder attributes of the misclassification cost (Cost of a classification tree model), specified as a LearnerCoderInput object.

The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer:

  • SizeVector — This value must be [c c], where c is the number of classes.

  • VariableDimensions — This value must be [0 0], indicating that the array size is fixed as specified in SizeVector.

  • DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl.

  • Tunability — The default value is true.

CutPoint

Coder attributes of the cut point for each node in the tree (CutPoint of a classification tree model), specified as a LearnerCoderInput object.

The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer:

  • SizeVector — The default value is [nd 1], where nd is the number of nodes in Mdl.

  • VariableDimensions — This value is [0 0] (default) or [1 0].

    • [0 0] indicates that the array size is fixed as specified in SizeVector.

    • [1 0] indicates that the array has variable-size rows and fixed-size columns. In this case, the first value of SizeVector is the upper bound for the number of rows, and the second value of SizeVector is the number of columns.

  • DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl.

  • Tunability — This value must be true.

If you modify the first dimension of SizeVector to be newnd, then the software modifies the first dimension of the SizeVector attribute to be newnd for the properties Children, ClassProbability, and CutPredictorIndex. Similarly, if you modify the first dimension of VariableDimensions to be 1, then the software modifies the first dimension of the VariableDimensions attribute to be 1 for these properties.

CutPredictorIndex

Coder attributes of the cut predictor index for each node in the tree (CutPredictorIndex of a classification tree model), specified as a LearnerCoderInput object.

The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer:

  • SizeVector — The default value is [nd 1], where nd is the number of nodes in Mdl.

  • VariableDimensions — This value is [0 0] (default) or [1 0].

    • [0 0] indicates that the array size is fixed as specified in SizeVector.

    • [1 0] indicates that the array has variable-size rows and fixed-size columns. In this case, the first value of SizeVector is the upper bound for the number of rows, and the second value of SizeVector is the number of columns.

  • DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl.

  • Tunability — This value must be true.

If you modify the first dimension of SizeVector to be newnd, then the software modifies the first dimension of the SizeVector attribute to be newnd for the properties Children, ClassProbability, and CutPoint. Similarly, if you modify the first dimension of VariableDimensions to be 1, then the software modifies the first dimension of the VariableDimensions attribute to be 1 for these properties.

Prior

Coder attributes of the prior probabilities (Prior of a classification tree model), specified as a LearnerCoderInput object.

The default attribute values of the LearnerCoderInput object are based on the input argument Mdl of learnerCoderConfigurer:

  • SizeVector — This value must be [1 c], where c is the number of classes.

  • VariableDimensions — This value must be [0 0], indicating that the array size is fixed as specified in SizeVector.

  • DataType — This value is 'single' or 'double'. The default data type is consistent with the data type of the training data you use to train Mdl.

  • Tunability — The default value is true.

Other Configurer Options

OutputFileName

File name of the generated C/C++ code, specified as a character vector.

The object function generateCode of ClassificationTreeCoderConfigurer generates C/C++ code using this file name.

The file name must not contain spaces because they can lead to code generation failures in certain operating system configurations. Also, the name must be a valid MATLAB function name.

After creating the coder configurer configurer, you can specify the file name by using dot notation.

configurer.OutputFileName = 'myModel';

Data Types: char
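Before assigning OutputFileName, you can screen a candidate name with a quick check. The sketch below uses isvarname as a stand-in for the function-name rule. Whether this matches the exact validation that the software performs is an assumption, and the candidate names are hypothetical.

```matlab
% Candidate file names (hypothetical values for illustration).
candidates = {'myModel','my model','2ndModel'};

% isvarname returns true for names that start with a letter and contain
% only letters, digits, and underscores (and are not MATLAB keywords).
isValid = cellfun(@isvarname,candidates)
```

Here only 'myModel' passes: the second name contains a space, and the third starts with a digit.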

Verbose

Verbosity level, specified as true (logical 1) or false (logical 0). The verbosity level controls the display of notification messages at the command line.

Value               Description
true (logical 1)    The software displays notification messages when your changes to the coder attributes of a parameter result in changes for other dependent parameters.
false (logical 0)   The software does not display notification messages.

To enable updating machine learning model parameters in the generated code, you need to configure the coder attributes of the parameters before generating code. The coder attributes of parameters are dependent on each other, so the software stores the dependencies as configuration constraints. If you modify the coder attributes of a parameter by using a coder configurer, and the modification requires subsequent changes to other dependent parameters to satisfy configuration constraints, then the software changes the coder attributes of the dependent parameters. The verbosity level determines whether or not the software displays notification messages for these subsequent changes.

After creating the coder configurer configurer, you can modify the verbosity level by using dot notation.

configurer.Verbose = false;

Data Types: logical

Options for Code Generation Customization

To customize the code generation workflow, use the generateFiles function and the following three properties with codegen, instead of using the generateCode function.

After generating the two entry-point function files (predict.m and update.m) by using the generateFiles function, you can modify these files according to your code generation workflow. For example, you can modify the predict.m file to include data preprocessing, or you can add these entry-point functions to another code generation project. Then, you can generate C/C++ code by using the codegen function and the codegen arguments appropriate for the modified entry-point functions or code generation project. Use the three properties described in this section as a starting point to set the codegen arguments.
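As an illustration of such a modification, a predict.m file produced by generateFiles might be edited as follows. The body follows the autogenerated template shown in the Examples section. The preprocessing step (replacing negative measurements with zero) is purely hypothetical and stands in for whatever preprocessing your workflow needs.

```matlab
function varargout = predict(X,varargin) %#codegen
% Entry-point function based on the autogenerated predict.m template.

% Hypothetical preprocessing step added by hand: clip negative
% measurements to zero before prediction.
X = max(X,0);

% Unchanged call into the autogenerated initialize function.
[varargout{1:nargout}] = initialize('predict',X,varargin{:});
end
```

After editing the file, pass codegen arguments that match the modified entry point, using the CodeGenerationArguments property as a starting point.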

CodeGenerationArguments

This property is read-only.

codegen arguments, specified as a cell array.

This property enables you to customize the code generation workflow. Use the generateCode function if you do not need to customize your workflow.

Instead of using generateCode with the coder configurer configurer, you can generate C/C++ code as follows:

generateFiles(configurer)
cgArgs = configurer.CodeGenerationArguments;
codegen(cgArgs{:})

If you customize the code generation workflow, modify cgArgs accordingly before calling codegen.

If you modify other properties of configurer, the software updates the CodeGenerationArguments property accordingly.

Data Types: cell

PredictInputs

This property is read-only.

Input argument of the entry-point function predict.m for code generation, specified as a cell array of a coder.PrimitiveType object. The coder.PrimitiveType object includes the coder attributes of the predictor data stored in the X property.

If you modify the coder attributes of the predictor data, then the software updates the coder.PrimitiveType object accordingly.

The coder.PrimitiveType object in PredictInputs is equivalent to configurer.CodeGenerationArguments{6} for the coder configurer configurer.

Data Types: cell

UpdateInputs

This property is read-only.

List of the tunable input arguments of the entry-point function update.m for code generation, specified as a cell array of a structure including coder.PrimitiveType objects. Each coder.PrimitiveType object includes the coder attributes of a tunable machine learning model parameter.

If you modify the coder attributes of a model parameter by using the coder configurer properties (update Arguments properties), then the software updates the corresponding coder.PrimitiveType object accordingly. If you specify the Tunability attribute of a machine learning model parameter as false, then the software removes the corresponding coder.PrimitiveType object from the UpdateInputs list.

The structure in UpdateInputs is equivalent to configurer.CodeGenerationArguments{3} for the coder configurer configurer.

Data Types: cell
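The three read-only properties in this section can be inspected directly. The following sketch reads them from a configurer and then marks Cost as not tunable to see how UpdateInputs responds. The data set is an illustrative assumption, and the exact contents of each cell array can vary by release.

```matlab
load fisheriris
Mdl = fitctree(meas,species);
configurer = learnerCoderConfigurer(Mdl,meas);

% Read the properties that supply the codegen arguments.
cgArgs = configurer.CodeGenerationArguments;   % cell array of codegen arguments
predictIn = configurer.PredictInputs;          % cell array for predict.m inputs
updateIn = configurer.UpdateInputs;            % cell array for update.m inputs

% Exclude Cost from the tunable parameters, then read UpdateInputs again
% to compare it with the earlier value.
configurer.Cost.Tunability = false;
updateInNoCost = configurer.UpdateInputs;
```

Comparing updateIn with updateInNoCost shows which coder.PrimitiveType object the software removed.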

Object Functions

generateCode             Generate C/C++ code using coder configurer
generateFiles            Generate MATLAB files for code generation using coder configurer
validatedUpdateInputs    Validate and extract machine learning model parameters to update

Examples


Train a machine learning model, and then generate code for the predict and update functions of the model by using a coder configurer.

Load the fisheriris data set, which contains flower data, and train a decision tree model.

load fisheriris
X = meas;
Y = species;
Mdl = fitctree(X,Y);

Mdl is a ClassificationTree object.

Create a coder configurer for the ClassificationTree model by using learnerCoderConfigurer. Specify the predictor data X. The learnerCoderConfigurer function uses the input X to configure the coder attributes of the predict function input.

configurer = learnerCoderConfigurer(Mdl,X)
configurer = 
  ClassificationTreeCoderConfigurer with properties:

   Update Inputs:
             Children: [1×1 LearnerCoderInput]
     ClassProbability: [1×1 LearnerCoderInput]
             CutPoint: [1×1 LearnerCoderInput]
    CutPredictorIndex: [1×1 LearnerCoderInput]
                Prior: [1×1 LearnerCoderInput]
                 Cost: [1×1 LearnerCoderInput]

   Predict Inputs:
                    X: [1×1 LearnerCoderInput]

   Code Generation Parameters:
           NumOutputs: 1
       OutputFileName: 'ClassificationTreeModel'


  Properties, Methods

configurer is a ClassificationTreeCoderConfigurer object, which is a coder configurer of a ClassificationTree object.

To generate C/C++ code, you must have access to a C/C++ compiler that is configured properly. MATLAB Coder locates and uses a supported, installed compiler. You can use mex -setup to view and change the default compiler. For more details, see Change Default Compiler (MATLAB).

Generate code for the predict and update functions of the classification tree model (Mdl) with default settings.

generateCode(configurer)
generateCode creates these files in output folder:
'initialize.m', 'predict.m', 'update.m', 'ClassificationTreeModel.mat'

The generateCode function completes these actions:

  • Generate the MATLAB files required to generate code, including the two entry-point functions predict.m and update.m for the predict and update functions of Mdl, respectively.

  • Create a MEX function named ClassificationTreeModel for the two entry-point functions.

  • Create the code for the MEX function in the codegen\mex\ClassificationTreeModel folder.

  • Copy the MEX function to the current folder.

Display the contents of the predict.m, update.m, and initialize.m files by using the type function.

type predict.m
function varargout = predict(X,varargin) %#codegen
% Autogenerated by MATLAB, 15-May-2019 14:39:43
[varargout{1:nargout}] = initialize('predict',X,varargin{:});
end

type update.m
function update(varargin) %#codegen
% Autogenerated by MATLAB, 15-May-2019 14:39:43
initialize('update',varargin{:});
end

type initialize.m
function [varargout] = initialize(command,varargin) %#codegen
% Autogenerated by MATLAB, 15-May-2019 14:39:43
coder.inline('always')
persistent model
if isempty(model)
    model = loadLearnerForCoder('ClassificationTreeModel.mat');
end
switch(command)
    case 'update'
        % Update struct fields: Children
        %                       ClassProbability
        %                       CutPoint
        %                       CutPredictorIndex
        %                       Prior
        %                       Cost
        model = update(model,varargin{:});
    case 'predict'
        % Predict Inputs: X
        X = varargin{1};
        if nargin == 2
            [varargout{1:nargout}] = predict(model,X);
        else
            PVPairs = cell(1,nargin-2);
            for i = 1:nargin-2
                PVPairs{1,i} = varargin{i+1};
            end
            [varargout{1:nargout}] = predict(model,X,PVPairs{:});
        end
end
end

Train a decision tree for multiclass classification using a partial data set and create a coder configurer for the model. Use the properties of the coder configurer to specify coder attributes of the model parameters. Use the object function of the coder configurer to generate C code that predicts labels for new predictor data. Then retrain the model using the entire data set, and update parameters in the generated code without regenerating the code.

Train Model

Load the fisheriris data set, which contains flower data. This data set has four predictors: the sepal length, sepal width, petal length, and petal width of the flowers. The response variable contains the flower species names: setosa, versicolor, and virginica. Train a classification tree model using half of the observations.

load fisheriris
X = meas;
Y = species;

rng('default') % For reproducibility
n = length(Y);
c = cvpartition(Y,'HoldOut',0.5);
idxTrain = training(c,1);
XTrain = X(idxTrain,:);
YTrain = Y(idxTrain);

Mdl = fitctree(XTrain,YTrain);

Mdl is a ClassificationTree object.

Create Coder Configurer

Create a coder configurer for the ClassificationTree model by using learnerCoderConfigurer. Specify the predictor data. The learnerCoderConfigurer function uses the input XTrain to configure the coder attributes of the predict function input. Also, set the number of outputs to 4 so that the generated code returns predicted labels, scores, node numbers, and class numbers.

configurer = learnerCoderConfigurer(Mdl,XTrain,'NumOutputs',4);

configurer is a ClassificationTreeCoderConfigurer object, which is a coder configurer of a ClassificationTree object.

Specify Coder Attributes of Parameters

Specify the coder attributes of the classification tree model parameters so that you can update the parameters in the generated code after retraining the model.

First, specify the coder attributes of the X property of configurer so that the generated code accepts any number of observations. Modify the SizeVector and VariableDimensions attributes. The SizeVector attribute specifies the upper bound of the predictor data size, and the VariableDimensions attribute specifies whether each dimension of the predictor data has a variable size or fixed size.

configurer.X.SizeVector = [Inf 4];
configurer.X.VariableDimensions
ans = 1×2 logical array

   1   0

The size of the first dimension is the number of observations. Setting the first value of the SizeVector attribute to Inf causes the software to change the first value of the VariableDimensions attribute to 1. In other words, the upper bound of the size is Inf and the size is variable, meaning that the predictor data can have any number of observations. This specification is convenient if you do not know the number of observations when generating code.

The size of the second dimension is the number of predictor variables. This value must be fixed for a machine learning model. Because the predictor data contains 4 predictors, the second value of the SizeVector attribute must be 4 and the second value of the VariableDimensions attribute must be 0.

If you retrain the tree model using new data or different settings, the number of nodes in the tree can vary. Therefore, specify the first dimension of the SizeVector attribute of one of these properties so that you can update the number of nodes in the generated code: Children, ClassProbability, CutPoint, or CutPredictorIndex. The software then modifies the other properties automatically.

For example, set the first value of the SizeVector attribute of the CutPoint property to Inf. The software modifies the SizeVector and VariableDimensions attributes of Children, ClassProbability, and CutPredictorIndex to match the new upper bound on the number of nodes in the tree. Additionally, the first value of the VariableDimensions attribute of CutPoint changes to 1.

configurer.CutPoint.SizeVector = [Inf 1];
SizeVector attribute for Children has been modified to satisfy configuration constraints.
SizeVector attribute for CutPredictorIndex has been modified to satisfy configuration constraints.
VariableDimensions attribute for Children has been modified to satisfy configuration constraints.
VariableDimensions attribute for CutPredictorIndex has been modified to satisfy configuration constraints.
SizeVector attribute for ClassProbability has been modified to satisfy configuration constraints.
VariableDimensions attribute for ClassProbability has been modified to satisfy configuration constraints.
configurer.CutPoint.VariableDimensions
ans = 1×2 logical array

   1   0

Generate Code

To generate C/C++ code, you must have access to a C/C++ compiler that is configured properly. MATLAB Coder locates and uses a supported, installed compiler. You can use mex -setup to view and change the default compiler. For more details, see Change Default Compiler (MATLAB).

Generate code for the predict and update functions of the classification tree model (Mdl).

generateCode(configurer)
generateCode creates these files in output folder:
'initialize.m', 'predict.m', 'update.m', 'ClassificationTreeModel.mat'

The generateCode function completes these actions:

  • Generate the MATLAB files required to generate code, including the two entry-point functions predict.m and update.m for the predict and update functions of Mdl, respectively.

  • Create a MEX function named ClassificationTreeModel for the two entry-point functions.

  • Create the code for the MEX function in the codegen\mex\ClassificationTreeModel folder.

  • Copy the MEX function to the current folder.

Verify Generated Code

Pass some predictor data to verify whether the predict function of Mdl and the predict function in the MEX function return the same output arguments. To call an entry-point function in a MEX function that has more than one entry point, specify the function name as the first input argument.

[label,score,node,cnum] = predict(Mdl,XTrain);
[label_mex,score_mex,node_mex,cnum_mex] = ClassificationTreeModel('predict',XTrain);

Compare label and label_mex by using isequal. Similarly, compare node to node_mex and cnum to cnum_mex.

isequal(label,label_mex)
ans = logical
   1

isequal(node,node_mex)
ans = logical
   1

isequal(cnum,cnum_mex)
ans = logical
   1

isequal returns logical 1 (true) if all the input arguments are equal. The comparison confirms that the predict function of Mdl and the predict function in the MEX function return the same labels, node numbers, and class numbers.

Compare score and score_mex.

max(abs(score-score_mex),[],'all')
ans = 0

In general, score_mex might include round-off differences compared to score. In this case, the comparison confirms that score and score_mex are equal.
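When round-off differences are present, a tolerance-based comparison is more robust than an exact one. The following sketch uses stand-in matrices in place of the score and score_mex values from the example above. The perturbation and the tolerance are hypothetical values chosen for illustration.

```matlab
% Stand-in values for illustration. In the example above, score comes from
% predict(Mdl,XTrain) and score_mex comes from the MEX function.
score = [1 0 0; 0 0.9 0.1; 0 0.02 0.98];
score_mex = score + 1e-12;   % hypothetical round-off difference

tol = 1e-6;                  % hypothetical tolerance
maxDiff = max(abs(score(:) - score_mex(:)));
isClose = maxDiff <= tol
```

Choose the tolerance according to the data type of the scores; single-precision data typically needs a looser tolerance than double-precision data.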

Retrain Model and Update Parameters in Generated Code

Retrain the model using the entire data set.

retrainedMdl = fitctree(X,Y);

Extract parameters to update by using validatedUpdateInputs. This function detects the modified model parameters in retrainedMdl and validates whether the modified parameter values satisfy the coder attributes of the parameters.

params = validatedUpdateInputs(configurer,retrainedMdl);
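Before applying the update, it can help to see which parameters the returned structure contains. The following self-contained sketch repeats the setup of this example in condensed form and lists each field of params with its size. The loop and its variable names are illustrative additions.

```matlab
load fisheriris
rng('default') % For reproducibility
c = cvpartition(species,'HoldOut',0.5);
idxTrain = training(c);
Mdl = fitctree(meas(idxTrain,:),species(idxTrain));

configurer = learnerCoderConfigurer(Mdl,meas(idxTrain,:));
configurer.CutPoint.SizeVector = [Inf 1];   % let the number of nodes vary

retrainedMdl = fitctree(meas,species);
params = validatedUpdateInputs(configurer,retrainedMdl);

% List the parameters that will be passed to update, with their sizes.
names = fieldnames(params);
for k = 1:numel(names)
    fprintf('%s: %s\n',names{k},mat2str(size(params.(names{k}))));
end
```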

Update parameters in the generated code.

ClassificationTreeModel('update',params)

Verify Generated Code

Compare the output arguments from the predict function of retrainedMdl and the predict function in the updated MEX function.

[label,score,node,cnum] = predict(retrainedMdl,X);
[label_mex,score_mex,node_mex,cnum_mex] = ClassificationTreeModel('predict',X);

isequal(label,label_mex)
ans = logical
   1

isequal(node,node_mex)
ans = logical
   1

isequal(cnum,cnum_mex)
ans = logical
   1

max(abs(score-score_mex),[],'all')
ans = 0

The comparison confirms that the labels, node numbers, class numbers, and scores are equal.

More About


Introduced in R2019b