To determine a good lasso-penalty strength for a linear classification model that uses a logistic regression learner, compare cross-validated AUC values.
Load the NLP data set. Preprocess the data as in Estimate k-fold Cross-Validation Posterior Class Probabilities.
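Following that example, the preprocessing amounts to loading the data, creating a logical response that flags the documentation pages of interest, and transposing the predictor matrix so that observations are in columns, which speeds up optimization. This is a sketch; the variable name Ystats and the 'stats' label follow the referenced example.

load nlpdata            % predictor matrix X and label vector Y
Ystats = Y == 'stats';  % logical response: pages belonging to the 'stats' product
X = X';                 % orient observations in columns for faster optimization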
There are 31572 observations in the data set.
Create a set of 11 logarithmically-spaced regularization strengths from 10^-6 through 10^-0.5.
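For example, logspace is one way to generate the grid shown in the Lambda property below:

Lambda = logspace(-6,-0.5,11);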
Cross-validate binary, linear classification models that use each of the regularization strengths and 5-fold cross-validation. Optimize the objective function using SpaRSA. Lower the tolerance on the gradient of the objective function to 1e-8.
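The model display that follows can be produced by a fitclinear call along these lines. This sketch assumes the preprocessed X (observations in columns), Ystats, and the Lambda grid above; the rng seed is only for reproducibility.

rng(10)  % for reproducibility
CVMdl = fitclinear(X,Ystats,'ObservationsIn','columns', ...
    'KFold',5,'Learner','logistic','Solver','sparsa', ...
    'Regularization','lasso','Lambda',Lambda,'GradientTolerance',1e-8)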
CVMdl = 
  ClassificationPartitionedLinear
    CrossValidatedModel: 'Linear'
           ResponseName: 'Y'
        NumObservations: 31572
                  KFold: 5
              Partition: [1×1 cvpartition]
             ClassNames: [0 1]
         ScoreTransform: 'none'

  Properties, Methods
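The ClassificationLinear object displayed next can be obtained by extracting one of the models trained during cross-validation, for example the model trained on the first fold (this sketch assumes the CVMdl object above):

Mdl1 = CVMdl.Trained{1}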
Mdl1 = 
  ClassificationLinear
      ResponseName: 'Y'
        ClassNames: [0 1]
    ScoreTransform: 'logit'
              Beta: [34023×11 double]
              Bias: [-13.2559 -13.2559 -13.2559 -13.2559 -9.1017 -7.1128 -5.4113 -4.4974 -3.6007 -3.1606 -2.9794]
            Lambda: [1.0000e-06 3.5481e-06 1.2589e-05 4.4668e-05 1.5849e-04 5.6234e-04 0.0020 0.0071 0.0251 0.0891 0.3162]
           Learner: 'logistic'

  Properties, Methods
Mdl1 is a ClassificationLinear model object. Because Lambda is a sequence of regularization strengths, you can think of Mdl1 as 11 models, one for each regularization strength in Lambda.
Predict the cross-validated labels and posterior class probabilities.
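For example, assuming the cross-validated model CVMdl created earlier, kfoldPredict returns both outputs:

[label,posterior] = kfoldPredict(CVMdl);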
label is a 31572-by-11 matrix of predicted labels. Each column corresponds to the predicted labels of the model trained using the corresponding regularization strength. posterior is a 31572-by-2-by-11 matrix of posterior class probabilities. Columns correspond to classes and pages correspond to regularization strengths. For example, posterior(3,1,5) is the posterior probability that observation 3 belongs to the first class (label 0), as estimated by the model that uses Lambda(5) as the regularization strength; here, that probability is 1.0000.
For each model, compute the AUC. Designate the second class as the positive class.
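A sketch of this computation, assuming the posterior array from kfoldPredict, the Ystats response, and Mdl1 from above (perfcurve returns the AUC as its fourth output):

AUC = zeros(1,numel(Lambda));
for j = 1:numel(Lambda)
    % AUC of the jth cross-validated model; the positive class is the
    % second class in Mdl1.ClassNames
    [~,~,~,AUC(j)] = perfcurve(Ystats,posterior(:,2,j),Mdl1.ClassNames(2));
end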
Higher values of Lambda lead to predictor variable sparsity, which is a good quality of a classifier. For each regularization strength, train a linear classification model using the entire data set and the same options as when you cross-validated the models. Determine the number of nonzero coefficients per model.
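For example, reusing the fitclinear options from the cross-validation step but omitting 'KFold' (a sketch under the same assumptions as before):

Mdl = fitclinear(X,Ystats,'ObservationsIn','columns', ...
    'Learner','logistic','Solver','sparsa', ...
    'Regularization','lasso','Lambda',Lambda,'GradientTolerance',1e-8);
numNZCoeff = sum(Mdl.Beta~=0);   % 1-by-11 count of nonzero coefficients per model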
In the same figure, plot the cross-validated AUC values and the frequency of nonzero coefficients for each regularization strength. Plot all variables on the log scale.
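One way to produce such a figure (a sketch; yyaxis requires MATLAB R2016a or later, and plotyy is an alternative on older releases):

figure
yyaxis left
plot(log10(Lambda),log10(AUC),'o-')
ylabel('log_{10} AUC')
yyaxis right
plot(log10(Lambda),log10(numNZCoeff),'o-')
ylabel('log_{10} number of nonzero coefficients')
xlabel('log_{10} \lambda')
title('Cross-Validated AUC and Model Sparsity')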
Choose the index of the regularization strength that balances predictor variable sparsity and high AUC, that is, a relatively large value of Lambda for which the AUC remains high.
Select the model from Mdl with the chosen regularization strength.
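For example, using the selectModels method of ClassificationLinear (idxFinal here is a hypothetical choice; use the index you identified from the plot):

idxFinal = 9;                         % hypothetical index chosen from the plot
MdlFinal = selectModels(Mdl,idxFinal)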
MdlFinal is a ClassificationLinear model containing one regularization strength. To estimate labels for new observations, pass MdlFinal and the new data to predict.
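For example, assuming a new predictor matrix XNew with observations oriented in columns to match the training data (XNew is hypothetical):

[labels,scores] = predict(MdlFinal,XNew,'ObservationsIn','columns');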