Feature selection for SVM classifier

Jos Huigen on 25 Jun 2019
I am trying to have MATLAB perform feature selection so that I can run an SVM classifier on my data and check the best achievable performance for each number of features. In my script, I first test how well each feature separates the two groups ("healthy" and "sick") with a two-sample t-test. The t-test already suggests which features should be best, since the feature with the lowest p-value should discriminate best between the groups, but I want the selection to be done by sequentialfs. The problem is that sequentialfs selects different features (genes) than I would have chosen from the p-values: my first choice would be feature A, but the feature selection selects B. Could anyone check whether something is wrong with either the t-test or the feature selection? I have attached the dataset matrix to this message. Any help is greatly appreciated!
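load samples1
% Binary class label from column 12: 0 = healthy (< 3), 1 = sick (>= 3)
ID = samples1(:,12);
ID(ID<3) = 0;
ID(ID>=3) = 1;
samples1(:,13) = ID;   % store the label in column 13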
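%% Determining significance of feature differences between sick and healthy group
sick = find(samples1(1:60,12)>=3);
healthy = find(samples1(1:60,12)<3);
sick2 = samples1(sick,:);
healthy2 = samples1(healthy,:);
% ttest2 tests every column of samples1, so p(k) belongs to column k of samples1;
% the features used for classification are columns 2:7, i.e. p(2:7).
[h,p,ci,stats] = ttest2(healthy2,sick2);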
%% Train/Test Division
x_train = samples1(1:60,2:7);
y_train = samples1(1:60,13);
x_test = samples1(61:end,2:7);
y_test = samples1(61:end,13);
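%% CV partition
c = cvpartition(y_train,'LeaveOut');
%% feature selection
opts = statset('display','iter');
% Criterion: number of misclassified observations in the held-out fold
classf = @(x_train, y_train, x_test, y_test)...
    sum(predict(fitcsvm(x_train, y_train,'KernelFunction','RBF','KernelScale','auto'), x_test)~=y_test);
% With 'nfeatures',6 all six columns end up selected in fs;
% history.In records the order in which they were added.
[fs, history] = sequentialfs(classf, x_train, y_train, 'cv', c, 'options', opts,'nfeatures',6);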
%% Best hyperparameter
X_train_w_best_feature = x_train(:,fs);
Mdl = fitcsvm(X_train_w_best_feature,y_train,'KernelFunction','rbf','OptimizeHyperparameters','auto',...
    'HyperparameterOptimizationOptions',struct('AcquisitionFunctionName',...
    'expected-improvement-plus','ShowPlots',true)); % Bayesian optimization
%% Final test with test set
X_test_w_best_feature = x_test(:,fs);
test_accuracy_for_iter = sum(predict(Mdl,X_test_w_best_feature) == y_test)/length(y_test)*100
%% Extract error rate
label = predict(Mdl, X_test_w_best_feature)
L = loss(Mdl,X_test_w_best_feature,y_test)
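
To compare the two rankings side by side, the p-values and the sequentialfs order can be put on the same column indices. The following is only a minimal sketch on synthetic data, not the attached dataset: the matrix X, the labels y, the 5-fold partition, and the variable names ttestOrder and sfsOrder are assumptions made for illustration. Note that a univariate t-test ranks each feature on its own, while sequentialfs ranks features by the cross-validated misclassification count of the RBF SVM, so the two orders need not agree even when both are computed correctly.

```matlab
% Synthetic stand-in for the real data (hypothetical): 60 samples, 6 features.
rng(1);                                   % reproducible random numbers
y = [zeros(30,1); ones(30,1)];            % 0 = healthy, 1 = sick
X = randn(60, 6);
X(y == 1, 3) = X(y == 1, 3) + 2;          % make feature 3 clearly discriminative

% Filter ranking: two-sample t-test per column, on the feature columns only,
% so that index k in p matches column k of X.
[~, p] = ttest2(X(y == 0, :), X(y == 1, :));
[~, ttestOrder] = sort(p, 'ascend');      % smallest p-value first

% Wrapper ranking: order in which sequentialfs adds the features.
% The criterion returns the number of misclassified held-out samples.
critfun = @(xtr, ytr, xte, yte) ...
    sum(predict(fitcsvm(xtr, ytr, 'KernelFunction', 'rbf', ...
                        'KernelScale', 'auto'), xte) ~= yte);
c = cvpartition(y, 'KFold', 5);           % assumption: 5-fold instead of leave-one-out
[~, history] = sequentialfs(critfun, X, y, 'cv', c, 'nfeatures', size(X, 2));

% history.In has one logical row per step; the column that switches on in
% each row is the feature added at that step.
added = diff([zeros(1, size(X, 2)); double(history.In)], 1, 1);
[~, sfsOrder] = max(added, [], 2);

% Column 1: feature order by p-value; column 2: order chosen by sequentialfs.
disp([ttestOrder(:), sfsOrder(:)]);
```

In the script above, the equivalent comparison would use p(2:7) against history.In, since x_train holds columns 2:7 of samples1.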

Answers (0)

Release

R2019a