Train support vector machine (SVM) classifier for one-class and binary classification
fitcsvm
trains or cross-validates a support vector
machine (SVM) model for one-class and two-class (binary) classification on a
low-dimensional or moderate-dimensional predictor data set. fitcsvm
supports mapping the predictor data using kernel functions, and supports sequential
minimal optimization (SMO), iterative single data algorithm (ISDA), or
L1 soft-margin minimization via quadratic programming for
objective-function minimization.
To train a linear SVM model for binary classification on a high-dimensional data set,
that is, a data set that includes many predictor variables, use fitclinear
instead.
For multiclass learning with combined binary SVM models, use error-correcting output
codes (ECOC). For more details, see fitcecoc
.
To train an SVM regression model, see fitrsvm
for low-dimensional and moderate-dimensional predictor data
sets, or fitrlinear
for high-dimensional data
sets.
Mdl = fitcsvm(Tbl,ResponseVarName) returns a support vector machine (SVM)
classifier Mdl trained using the sample data contained in the table Tbl.
ResponseVarName is the name of the variable in Tbl that contains the class
labels for one-class or two-class classification.
Mdl = fitcsvm(___,Name,Value) specifies options using one or more name-value
pair arguments in addition to the input arguments in previous syntaxes. For
example, you can specify the type of cross-validation, the cost for
misclassification, and the type of score transformation function.
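For example, the following sketch exercises both syntaxes on the fisheriris sample data set that ships with Statistics and Machine Learning Toolbox (the variable names tbl, Mdl, and Mdl2 are illustrative):

load fisheriris
inds = ~strcmp(species,'setosa');            % keep two classes for binary SVM
tbl = array2table(meas(inds,:), ...
    'VariableNames',{'SL','SW','PL','PW'});
tbl.Species = species(inds);

Mdl = fitcsvm(tbl,'Species');                % table plus response variable name

Mdl2 = fitcsvm(tbl,'Species', ...            % same call with name-value pairs
    'Standardize',true,'KernelFunction','gaussian');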
fitcsvm
trains SVM classifiers for one-class or two-class
learning applications. To train SVM classifiers using data with more than two
classes, use fitcecoc
.
fitcsvm
supports low-dimensional and moderate-dimensional
data sets. For high-dimensional data sets, use fitclinear
instead.
Unless your data set is large, always try to standardize the predictors (see
Standardize
). Standardization makes predictors
insensitive to the scales on which they are measured.
It is a good practice to cross-validate using the KFold
name-value pair argument. The cross-validation results determine how well the
SVM classifier generalizes.
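A minimal cross-validation sketch, reusing the tbl table from the earlier example (any two-class table works):

CVMdl = fitcsvm(tbl,'Species','Standardize',true,'KFold',10);
genError = kfoldLoss(CVMdl)                  % estimated out-of-sample error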
For one-class learning:
The default setting for the name-value pair argument
Alpha
can lead to long training times. To
speed up training, set Alpha
to a vector mostly
composed of 0
s.
Set the name-value pair argument Nu to a
value closer to 0 to yield fewer support vectors
and, therefore, a smoother but cruder decision boundary. (A combined sketch of
these one-class performance settings appears after these tips.)
Sparsity in support vectors is a desirable property of an SVM classifier. To
decrease the number of support vectors, set BoxConstraint
to
a large value. This action increases the training time.
For optimal training time, set CacheSize
as high as the
memory limit your computer allows.
If you expect many fewer support vectors than observations in the training
set, then you can significantly speed up convergence by shrinking the active set
using the name-value pair argument 'ShrinkagePeriod'
. It is a
good practice to specify 'ShrinkagePeriod',1000
.
Duplicate observations that are far from the decision boundary do not affect
convergence. However, just a few duplicate observations that occur near the
decision boundary can slow down convergence considerably. To speed up
convergence, specify 'RemoveDuplicates',true
if:
Your data set contains many duplicate observations.
You suspect that a few duplicate observations fall near the decision boundary.
To maintain the original data set during training,
fitcsvm
must temporarily store separate data sets:
the original and one without the duplicate observations. Therefore, if you
specify true
for data sets containing few duplicates, then
fitcsvm
consumes close to double the memory of the
original data.
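The following sketch combines several of these performance settings for one-class learning on made-up data; the specific values of Nu, CacheSize, and ShrinkagePeriod are illustrative, not universal recommendations:

rng(0)                                       % for reproducibility
X = randn(2000,4);
X = [X; X(1:200,:)];                         % inject duplicate observations
OCMdl = fitcsvm(X,ones(size(X,1),1), ...     % one-class: a single class label
    'KernelFunction','gaussian', ...
    'Nu',0.05, ...                           % fewer support vectors
    'CacheSize',1000, ...                    % kernel cache size in MB
    'ShrinkagePeriod',1000, ...              % periodically shrink the active set
    'RemoveDuplicates',true);                % collapse duplicates before training
numSV = size(OCMdl.SupportVectors,1)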
After training a model, you can generate C/C++ code that predicts labels for new data. Generating C/C++ code requires MATLAB Coder™. For details, see Introduction to Code Generation.
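A minimal sketch of that workflow, assuming Mdl is a trained model such as the one above (the function name predictSVM is hypothetical):

saveLearnerForCoder(Mdl,'SVMModel')          % serialize the trained model
% In a separate function file compiled with MATLAB Coder:
%   function labels = predictSVM(X)  %#codegen
%   Mdl = loadLearnerForCoder('SVMModel');
%   labels = predict(Mdl,X);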
For the mathematical formulation of the SVM binary classification algorithm, see Support Vector Machines for Binary Classification and Understanding Support Vector Machines.
NaN
, <undefined>
, empty character vector
(''
), empty string (""
), and
<missing>
values indicate missing values.
fitcsvm
removes entire rows of data corresponding to a missing
response. When computing total weights (see the next bullets),
fitcsvm
ignores any weight corresponding to an observation with
at least one missing predictor. This action can lead to unbalanced prior probabilities
in balanced-class problems. Consequently, observation box constraints might not equal
BoxConstraint
.
fitcsvm
removes observations that
have zero weight or prior probability.
For two-class learning, if you specify the cost matrix C (see Cost),
then the software updates the class prior probabilities p (see Prior)
to pc by incorporating the penalties described in C.
Specifically, fitcsvm completes these steps:
Compute $p_c^* = p^\top C$.
Normalize $p_c^*$ so that the updated prior probabilities sum to 1:
$$p_c = \frac{1}{\sum_{j=1}^{K} p_{c,j}^*}\, p_c^*,$$
where K is the number of classes.
Reset the cost matrix to the default $C = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$.
Remove observations from the training data corresponding to classes with zero prior probability.
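A small numeric sketch of these steps for K = 2, with made-up prior and cost values:

p = [0.5 0.5];                % original class priors
C = [0 2; 1 0];               % user-specified misclassification costs
pcStar = p*C;                 % incorporate penalties: [0.5 1.0]
pc = pcStar/sum(pcStar)       % normalized updated priors: [0.3333 0.6667]
C = [0 1; 1 0];               % cost matrix reset to the default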
For two-class learning, fitcsvm normalizes all observation weights (see
Weights) to sum to 1. The function then renormalizes the
normalized weights to sum up to the updated prior probability of the class to which the
observation belongs. That is, the total weight for observation j in
class k is
$$w_j^* = \frac{w_j}{\sum_{\forall j \in \text{Class } k} w_j}\, p_{c,k},$$
where wj is the normalized weight for observation j and pc,k is the updated prior probability of class k (see previous bullet). (A numeric sketch combining this bullet and the next appears after the next bullet.)
For two-class learning, fitcsvm assigns a box constraint to each
observation in the training data. The formula for the box constraint of observation
j is
$$C_j = n\, C_0\, w_j^*,$$
where n is the training sample size,
C0 is the initial box constraint (see the
'BoxConstraint'
name-value pair argument), and $w_j^*$ is the total weight of observation j (see previous
bullet).
If you set 'Standardize',true and the 'Cost',
'Prior', or 'Weights' name-value pair
argument, then fitcsvm standardizes the predictors using their
corresponding weighted means and weighted standard deviations. That is,
fitcsvm standardizes predictor j
(xj) using
$$x_j^* = \frac{x_j - \mu_j^*}{\sigma_j^*},$$
where xjk is observation k (row) of predictor j (column), and
$$\mu_j^* = \frac{1}{\sum_k w_k^*} \sum_k w_k^* x_{jk}, \qquad
\left(\sigma_j^*\right)^2 = \frac{v_1}{v_1^2 - v_2} \sum_k w_k^* \left(x_{jk} - \mu_j^*\right)^2,$$
with $v_1 = \sum_k w_k^*$ and $v_2 = \sum_k \left(w_k^*\right)^2$.
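A sketch of the weighted standardization for one predictor, with made-up weights and values:

x = [1; 2; 3; 4];                    % one predictor column (made up)
w = [0.1; 0.2; 0.3; 0.4];            % total observation weights w*_k
mu = sum(w.*x)/sum(w);               % weighted mean
v1 = sum(w);  v2 = sum(w.^2);
sigma = sqrt(v1/(v1^2 - v2)*sum(w.*(x - mu).^2));  % weighted standard deviation
xStd = (x - mu)/sigma                % standardized predictor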
Assume that p
is the proportion of outliers that you expect in the training
data, and that you set 'OutlierFraction',p
.
For one-class learning, the software trains the bias term such that
100p
% of the observations in the training data have
negative scores.
For two-class learning, the software implements robust learning. That is,
the software attempts to remove
100p% of the observations when the optimization
algorithm converges. The removed observations correspond to gradients that
are large in magnitude.
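A sketch of one-class learning with an expected outlier fraction, using the fisheriris measurements as stand-in data:

load fisheriris
X = meas;                            % numeric predictors only
p = 0.05;                            % expected proportion of outliers
OCMdl = fitcsvm(X,ones(size(X,1),1), ...
    'KernelScale','auto','OutlierFraction',p);
[~,score] = resubPredict(OCMdl);
fracNeg = mean(score < 0)            % roughly 100p% of scores are negative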
If your predictor data contains categorical variables, then the software generally uses full dummy encoding for these variables. The software creates one dummy variable for each level of each categorical variable.
The PredictorNames
property stores
one element for each of the original predictor variable names. For
example, assume that there are three predictors, one of which is a
categorical variable with three levels. Then PredictorNames
is
a 1-by-3 cell array of character vectors containing the original names
of the predictor variables.
The ExpandedPredictorNames
property
stores one element for each of the predictor variables, including
the dummy variables. For example, assume that there are three predictors,
one of which is a categorical variable with three levels. Then ExpandedPredictorNames
is
a 1-by-5 cell array of character vectors containing the names of the
predictor variables and the new dummy variables.
Similarly, the Beta
property stores
one beta coefficient for each predictor, including the dummy variables.
The SupportVectors
property stores
the predictor values for the support vectors, including the dummy
variables. For example, assume that there are m support
vectors and three predictors, one of which is a categorical variable
with three levels. Then SupportVectors
is an m-by-5 matrix.
The X
property stores the training data as originally input
and does not include the dummy variables. When the input is a table,
X
contains only the columns used as predictors.
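The following sketch illustrates the expansion with made-up data: three predictors, one of which is a three-level (unordered) categorical variable:

rng(1)                               % for reproducibility
x1 = randn(12,1);  x2 = randn(12,1);
c = categorical(repmat({'low';'med';'high'},4,1));
y = categorical(repmat({'A';'B'},6,1));
Mdl = fitcsvm(table(x1,x2,c,y),'y');
Mdl.PredictorNames                   % {'x1','x2','c'}: 3 original names
Mdl.ExpandedPredictorNames           % 5 names: x1, x2, and one dummy per level of c
size(Mdl.SupportVectors)             % m-by-5, where m is the number of support vectors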
For predictors specified in a table, if any of the variables contain ordered (ordinal) categories, the software uses ordinal encoding for these variables.
For a variable with k ordered levels, the software creates k – 1 dummy variables. The jth dummy variable is –1 for levels up to j, and +1 for levels j + 1 through k.
The names of the dummy variables stored in the ExpandedPredictorNames
property
indicate the first level with the value +1.
The software stores k – 1 additional
predictor names for the dummy variables, including the names of levels
2, 3, ..., k.
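For example, an ordered categorical variable with the levels low < med < high (k = 3) expands into k − 1 = 2 dummy variables, following the rule above:

o = categorical({'low';'med';'high'}, ...
    {'low','med','high'},'Ordinal',true);
% Level    Dummy "med"    Dummy "high"
%  low         -1              -1
%  med         +1              -1
%  high        +1              +1
% Each dummy variable is named for the first level at which it equals +1.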
All solvers implement L1 soft-margin minimization.
For one-class learning, the software estimates the Lagrange multipliers, α1,...,αn, such that
$$\sum_{j=1}^{n} \alpha_j = n\nu.$$
ClassificationPartitionedModel
| ClassificationSVM
| CompactClassificationSVM
| fitSVMPosterior
| fitcecoc
| fitclinear
| predict
| quadprog
| rng