Introduction to Feature Selection
This topic provides an introduction to feature selection algorithms and describes the feature selection functions available in Statistics and Machine Learning Toolbox™.
Feature Selection Algorithms
Feature selection reduces the dimensionality of data by selecting only a subset of measured features (predictor variables) to create a model. Feature selection algorithms search for a subset of predictors that optimally models measured responses, subject to constraints such as required or excluded features and the size of the subset. The main benefits of feature selection are to improve prediction performance, provide faster and more cost-effective predictors, and provide a better understanding of the data generation process [1]. Using too many features can degrade prediction performance even when all features are relevant and contain information about the response variable.
You can categorize feature selection algorithms into three types:
Filter Type Feature Selection — The filter type feature selection algorithm measures feature importance based on the characteristics of the features, such as feature variance and feature relevance to the response. You select important features as part of a data preprocessing step and then train a model using the selected features. Therefore, filter type feature selection is independent of the training algorithm.
Wrapper Type Feature Selection — The wrapper type feature selection algorithm starts training using a subset of features and then adds or removes a feature using a selection criterion. The selection criterion directly measures the change in model performance that results from adding or removing a feature. The algorithm repeats training and improving a model until its stopping criteria are satisfied.
Embedded Type Feature Selection — The embedded type feature selection algorithm learns feature importance as part of the model learning process. Once you train a model, you obtain the importance of the features in the trained model. This type of algorithm selects features that work well with a particular learning process.
In addition, you can categorize feature selection algorithms according to whether or not an algorithm ranks features sequentially. The minimum redundancy maximum relevance (MRMR) algorithm and stepwise regression are two examples of sequential feature selection algorithms. For details, see Sequential Feature Selection.
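As a brief sketch of sequential filter-type ranking, the following example ranks the predictors in Fisher's iris data using MRMR (this assumes Statistics and Machine Learning Toolbox is installed; the data set choice is illustrative):

```matlab
% Rank the four iris measurements by MRMR score (higher score = more important).
load fisheriris                      % meas: 150-by-4 predictors, species: class labels
[idx, scores] = fscmrmr(meas, species);
disp(idx)                            % predictor column indices, ranked most to least important
disp(scores(idx))                    % the corresponding MRMR scores, in ranked order
```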
You can compare the importance of predictor variables visually by creating partial dependence plots (PDP) and individual conditional expectation (ICE) plots. For details, see plotPartialDependence.
For classification problems, after selecting features, you can train two models (for example, a full model and a model trained with a subset of predictors) and compare the accuracies of the models by using the compareHoldout, testcholdout, or testckfold functions.
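A minimal sketch of this comparison workflow follows; the 70/30 holdout split, the tree classifier, and the choice of keeping the top two ranked features are illustrative assumptions, not toolbox recommendations:

```matlab
% Train a full tree and a tree on the top-two ranked features, then test
% whether their holdout accuracies differ significantly.
load fisheriris
rng(1)                                          % reproducible partition
cv = cvpartition(species, 'HoldOut', 0.3);
XTrain = meas(training(cv), :);  yTrain = species(training(cv));
XTest  = meas(test(cv), :);      yTest  = species(test(cv));
idx = fscmrmr(XTrain, yTrain);                  % filter-type feature ranking
fullMdl    = fitctree(XTrain, yTrain);
reducedMdl = fitctree(XTrain(:, idx(1:2)), yTrain);
% testcholdout compares the two sets of predicted labels on the same test set
h = testcholdout(predict(fullMdl, XTest), ...
                 predict(reducedMdl, XTest(:, idx(1:2))), yTest)
```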
Feature selection is preferable to feature transformation when the original features and their units are important and the modeling goal is to identify an influential subset. When categorical features are present and numerical transformations are inappropriate, feature selection becomes the primary means of dimensionality reduction.
Feature Selection Functions
Statistics and Machine Learning Toolbox offers several functions for feature selection. Choose the appropriate feature selection function based on your problem and the data types of the features.
Filter Type Feature Selection
| Function | Supported Problem | Supported Data Type | Description |
| --- | --- | --- | --- |
| fscchi2 | Classification | Categorical and continuous features | Examine whether each predictor variable is independent of the response variable by using individual chi-square tests, and then rank features using the p-values of the chi-square test statistics. For examples, see the function reference page. |
| fscmrmr | Classification | Categorical and continuous features | Rank features sequentially using the minimum redundancy maximum relevance (MRMR) algorithm. For examples, see the function reference page. |
| fscnca* | Classification | Continuous features | Determine the feature weights by using a diagonal adaptation of neighborhood component analysis (NCA). This algorithm works best for estimating feature importance for distance-based supervised models that use pairwise distances between observations to predict the response. For details, see the function reference page. |
| fsrftest | Regression | Categorical and continuous features | Examine the importance of each predictor individually using an F-test, and then rank features using the p-values of the F-test statistics. Each F-test tests the hypothesis that the response values grouped by predictor variable values are drawn from populations with the same mean, against the alternative hypothesis that the population means are not all the same. For examples, see the function reference page. |
| fsrmrmr | Regression | Categorical and continuous features | Rank features sequentially using the minimum redundancy maximum relevance (MRMR) algorithm. For examples, see the function reference page. |
| fsrnca* | Regression | Continuous features | Determine the feature weights by using a diagonal adaptation of neighborhood component analysis (NCA). This algorithm works best for estimating feature importance for distance-based supervised models that use pairwise distances between observations to predict the response. For details, see the function reference page. |
| fsulaplacian | Unsupervised learning | Continuous features | Rank features using the Laplacian score. For examples, see the function reference page. |
| relieff | Classification and regression | Either all categorical or all continuous features | Rank features using the ReliefF algorithm for classification and the RReliefF algorithm for regression. This algorithm works best for estimating feature importance for distance-based supervised models that use pairwise distances between observations to predict the response. For examples, see the function reference page. |
| sequentialfs | Classification and regression | Either all categorical or all continuous features | Select features sequentially using a custom criterion. Define a function that measures the characteristics of the data to select features, and pass the function handle to sequentialfs. For examples, see the function reference page. |
*You can also consider fscnca and fsrnca as embedded type feature selection functions because they return a trained model object, and you can use the object functions predict and loss. However, you typically use these object functions to tune the regularization parameter of the algorithm. After selecting features using the fscnca or fsrnca function as part of a data preprocessing step, you can apply another classification or regression algorithm for your problem.
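A sketch of that preprocessing pattern follows; the Lambda value and the 0.1 weight threshold are illustrative choices, not recommendations:

```matlab
% Select features with fscnca, then train an unrelated classifier (an SVM)
% on the selected subset.
load ionosphere                               % X: 351-by-34 predictors, Y: class labels
mdl = fscnca(X, Y, 'Lambda', 1/numel(Y));     % NCA with a small regularization value (assumed)
selected = find(mdl.FeatureWeights > 0.1);    % keep features with non-negligible weight
svmMdl = fitcsvm(X(:, selected), Y);          % train on the reduced feature set
```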
Wrapper Type Feature Selection
| Function | Supported Problem | Supported Data Type | Description |
| --- | --- | --- | --- |
| sequentialfs | Classification and regression | Either all categorical or all continuous features | Select features sequentially using a custom criterion. Define a function that implements a supervised learning algorithm or a function that measures the performance of a learning algorithm, and pass the function handle to sequentialfs. For examples, see the function reference page. |
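A minimal sketch of wrapper-type selection with sequentialfs follows; the discriminant analysis learner inside the criterion function is an assumption for illustration:

```matlab
% The criterion function receives training and test subsets for each candidate
% feature set and returns the value to minimize (here, misclassification count).
load fisheriris
rng(1)                                           % reproducible cross-validation folds
critfun = @(XTrain, yTrain, XTest, yTest) ...
    sum(~strcmp(predict(fitcdiscr(XTrain, yTrain), XTest), yTest));
inmodel = sequentialfs(critfun, meas, species)   % logical vector of selected features
```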
Embedded Type Feature Selection
| Function | Supported Problem | Supported Data Type | Description |
| --- | --- | --- | --- |
| DeltaPredictor property of a ClassificationDiscriminant model object | Linear discriminant analysis classification | Continuous features | Create a linear discriminant analysis classifier by using fitcdiscr. The DeltaPredictor property of the trained model contains coefficient magnitudes that you can use as measures of predictor importance. For examples, see the class reference page. |
| fitcecoc with templateLinear | Linear classification for multiclass learning with high-dimensional data | Continuous features | Train a linear classification model by using fitcecoc with linear binary learners defined by templateLinear, specifying lasso regularization. For an example, see Find Good Lasso Penalty Using Cross-Validation. This example determines a good lasso-penalty strength by evaluating models with different strength values using the cross-validated classification error. |
| fitclinear | Linear classification for binary learning with high-dimensional data | Continuous features | Train a linear classification model by using fitclinear, specifying lasso regularization. For an example, see Find Good Lasso Penalty Using Cross-Validated AUC. This example determines a good lasso-penalty strength by evaluating models with different strength values using the AUC values, computing the cross-validated posterior class probabilities with kfoldPredict. |
| fitrgp | Regression | Categorical and continuous features | Train a Gaussian process regression (GPR) model by using fitrgp. With an automatic relevance determination (ARD) kernel, the learned length scales indicate the relevance of each predictor. For examples, see the function reference page. |
| fitrlinear | Linear regression with high-dimensional data | Continuous features | Train a linear regression model by using fitrlinear, specifying lasso regularization. For examples, see the function reference page. |
| lasso | Linear regression | Continuous features | Train a linear regression model with lasso regularization by using lasso. For examples, see the function reference page. |
| lassoglm | Generalized linear regression | Continuous features | Train a generalized linear regression model with lasso regularization by using lassoglm. For details, see the function reference page. |
| oobPermutedPredictorImportance** of ClassificationBaggedEnsemble | Classification with an ensemble of bagged decision trees (for example, random forest) | Categorical and continuous features | Train a bagged classification ensemble with tree learners by using fitcensemble, and then estimate out-of-bag predictor importance by permutation using oobPermutedPredictorImportance. For examples, see the function reference page. |
| oobPermutedPredictorImportance** of RegressionBaggedEnsemble | Regression with an ensemble of bagged decision trees (for example, random forest) | Categorical and continuous features | Train a bagged regression ensemble with tree learners by using fitrensemble, and then estimate out-of-bag predictor importance by permutation using oobPermutedPredictorImportance. For examples, see the function reference page. |
| predictorImportance** of ClassificationEnsemble | Classification with an ensemble of decision trees | Categorical and continuous features | Train a classification ensemble with tree learners by using fitcensemble, and then estimate predictor importance using predictorImportance. For examples, see the function reference page. |
| predictorImportance** of ClassificationTree | Classification with a decision tree | Categorical and continuous features | Train a classification tree by using fitctree, and then estimate predictor importance using predictorImportance. For examples, see the function reference page. |
| predictorImportance** of RegressionEnsemble | Regression with an ensemble of decision trees | Categorical and continuous features | Train a regression ensemble with tree learners by using fitrensemble, and then estimate predictor importance using predictorImportance. For examples, see the function reference page. |
| predictorImportance** of RegressionTree | Regression with a decision tree | Categorical and continuous features | Train a regression tree by using fitrtree, and then estimate predictor importance using predictorImportance. For examples, see the function reference page. |
| stepwiseglm*** | Generalized linear regression | Categorical and continuous features | Fit a generalized linear regression model using stepwise regression by using stepwiseglm. For details, see the function reference page. |
| stepwiselm*** | Linear regression | Categorical and continuous features | Fit a linear regression model using stepwise regression by using stepwiselm. For details, see the function reference page. |
**For a tree-based algorithm, specify 'PredictorSelection' as 'interaction-curvature' to use the interaction test for selecting the best split predictor. The interaction test is useful in identifying important variables in the presence of many irrelevant variables. Also, if the training data includes many predictors, then specify 'NumVariablesToSample' as 'all' for training. Otherwise, the software might not select some predictors, underestimating their importance. For details, see fitctree, fitrtree, and templateTree.
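The recommendations above can be combined in one sketch for a bagged classification ensemble (the data set is illustrative):

```matlab
% Use the interaction test and sample all predictors at each split so that
% out-of-bag permutation importance is not biased against any predictor.
load fisheriris
t = templateTree('PredictorSelection', 'interaction-curvature', ...
                 'NumVariablesToSample', 'all');
ens = fitcensemble(meas, species, 'Method', 'Bag', 'Learners', t);
imp = oobPermutedPredictorImportance(ens)     % one importance estimate per predictor
```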
***stepwiseglm and stepwiselm are not wrapper type functions because you cannot use them as a wrapper for another training function. However, these two functions use the wrapper type algorithm to find important features.
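For instance, a sketch of stepwise selection on the carsmall data (the constant starting model and linear upper bound are illustrative choices):

```matlab
% Starting from a constant model, stepwiselm adds or removes linear terms
% using its default criterion; the final formula names the selected predictors.
load carsmall
tbl = table(Acceleration, Displacement, Horsepower, Weight, MPG);  % MPG (last variable) is the response
mdl = stepwiselm(tbl, 'constant', 'Upper', 'linear');
mdl.Formula                                   % terms retained by the stepwise procedure
```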
References
[1] Guyon, Isabelle, and A. Elisseeff. "An introduction to variable and feature selection." Journal of Machine Learning Research. Vol. 3, 2003, pp. 1157–1182.
See Also
rankfeatures
(Bioinformatics Toolbox)