SVM with Dummy Variables

Melissa McCoy on 29 Jul 2015
Commented: Melissa McCoy on 13 Aug 2015
Context: I have a cell array with ~1500 rows (data entries) and 19 columns (features), all of which are categorical (nominal). I looped through the columns and used double(dummyvar(nominal(featureVector))) to convert each feature into dummy variables (vectors of 1s and 0s), and everything looks right.
Problem: When I try to pass this as the input data X to fitcsvm(), I get an error because it expects X to be a floating-point matrix:
Error using ClassificationSVM.prepareData (line 602)
You can pass only floating-point data for X to SVM.
If I convert the cell array into a matrix, the dummy variable vectors become separate columns and lose their identity as dummy variables. fitcsvm() treats each column as a predictor in its own right, so it now thinks there are (number of features)*(number of categories in each feature) predictors. I don't see how I can use dummy variables with an SVM in MATLAB, which is mind-boggling, since I imagine this is a basic problem many people run into.
Thanks so much for your help!

Accepted Answer

Ilya on 29 Jul 2015
Just convert your cell array into a matrix. Yes, dummy variables will lose their identity, in the sense that the different levels of a categorical predictor will be treated as different predictors. This is common practice, though.
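A minimal sketch of that conversion, under some assumptions: the toy data, the labels y, and the variable names (features, dummies, groupIdx) are hypothetical and not taken from the question. It encodes each categorical column with the same double(dummyvar(nominal(...))) call used in the question, concatenates the blocks into one double matrix, and records which original feature each column came from in case that bookkeeping is needed later.

```matlab
% Hypothetical toy data: 6 observations, 2 categorical features.
features = {'red',   'small';
            'green', 'large';
            'blue',  'small';
            'red',   'large';
            'green', 'small';
            'blue',  'large'};
y = [1; -1; 1; 1; -1; -1];           % hypothetical binary class labels

numFeatures = size(features, 2);
dummies  = cell(1, numFeatures);     % one block of 0/1 columns per feature
groupIdx = [];                       % groupIdx(j) = source feature of column j

for k = 1:numFeatures
    % Same encoding as in the question, applied one column at a time.
    dummies{k} = double(dummyvar(nominal(features(:, k))));
    groupIdx   = [groupIdx, repmat(k, 1, size(dummies{k}, 2))]; %#ok<AGROW>
end

% Concatenate the per-feature blocks side by side into one numeric matrix.
% Here X is 6-by-5: three color columns followed by two size columns.
X = [dummies{:}];

% Each dummy column is now treated as its own predictor.
model = fitcsvm(X, y);
```

With this layout, X(:, groupIdx == k) recovers all the columns that belong to feature k.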
  1 Comment
Melissa McCoy on 13 Aug 2015
Many thanks for your answer earlier!
Quick follow-up question: how, then, does sequential feature selection work? I've tried to implement it with sequentialfs(), but it doesn't realize that, for example, the first 3 columns actually refer to one feature, and it just takes the first column. I've posted my question here if helpful:
Many thanks again!
