Feature selection for SVM classifier
I am trying to have MATLAB perform feature selection so that I can run an SVM classifier on my data and check the best achievable performance for each number of features used in the classification. In my script, I test how well each feature separates the two groups ("healthy" and "sick") with two-sample t-tests. The t-tests already suggest which features should be best, since the feature with the lowest p-value should discriminate best between the groups, but I want the selection to be done by sequentialfs. The problem is that sequentialfs selects different genes than I would have chosen from the p-values (my first-choice feature would be A, but sequentialfs selects B). Could anyone check whether there is something wrong with either the t-tests or the feature selection? I have attached the dataset matrix to this message. Any help is greatly appreciated!
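load samples1
% Binarize column 12 into a class label: 0 = healthy (<3), 1 = sick (>=3)
ID=samples1(:,12);
ID(ID<3)=0;
ID(ID>=3)=1;
samples1(:,13)=ID;   % store the binary label in column 13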
%% Determining significance of feature differentiation between sick and healthy group
sick=find(samples1(1:60,12)>=3);
healthy=find(samples1(1:60,12)<3);
sick2 = samples1(sick,:);
healthy2 = samples1(healthy,:);
% ttest2 tests each column separately, so p(2:7) are the p-values of the six features used below
[h,p,ci,stats] = ttest2(healthy2,sick2);
%% Train/Test Division
x_train=samples1(1:60,2:7);
y_train=samples1(1:60,13);
x_test=samples1(61:end,2:7);
y_test=samples1(61:end,13);
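%% CV partition
c=cvpartition(y_train,'LeaveOut');
%% Feature selection
opts = statset('display','iter');
% Criterion: number of misclassified held-out samples for an RBF SVM
classf = @(xtr, ytr, xte, yte)...
    sum(predict(fitcsvm(xtr, ytr,'KernelFunction','rbf','KernelScale','auto'), xte)~=yte);
% With 'nfeatures' equal to the number of columns, every feature ends up selected;
% history.In and history.Crit record the order of inclusion and the criterion value at each step
[fs, history] = sequentialfs(classf, x_train, y_train, 'cv', c, 'options', opts,'nfeatures',6);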
%% Best hyperparameter
X_train_w_best_feature = x_train(:,fs);
Mdl = fitcsvm(X_train_w_best_feature,y_train,'KernelFunction','rbf','OptimizeHyperparameters','auto',...
    'HyperparameterOptimizationOptions',struct('AcquisitionFunctionName',...
    'expected-improvement-plus','ShowPlots',true)); % Bayesian optimization
%% Final test with test set
X_test_w_best_feature = x_test(:,fs);
label = predict(Mdl, X_test_w_best_feature);
test_accuracy_for_iter = sum(label == y_test)/length(y_test)*100
%% Extract error rate
L=loss(Mdl,X_test_w_best_feature,y_test)
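To make the comparison I am after concrete, here is a minimal sketch that puts the p-value ranking next to the order in which sequentialfs adds the features. It runs on synthetic data rather than samples1: the random matrix X, the shift added to feature 3, the 5-fold partition (used instead of leave-one-out only to keep it fast) and the variable names pRank, added and sfsOrder are all assumptions for illustration, not part of my script.

```matlab
% Sketch only: synthetic data stands in for samples1 (assumption).
rng(1);                                  % reproducible random numbers
X = randn(60, 6);                        % 60 training samples, 6 features
y = [zeros(30,1); ones(30,1)];           % 0 = healthy, 1 = sick
X(y==1, 3) = X(y==1, 3) + 2;             % make feature 3 clearly discriminative

% Filter ranking: one two-sample t-test per column, smallest p-value first.
[~, p] = ttest2(X(y==0,:), X(y==1,:));
[~, pRank] = sort(p, 'ascend');

% Wrapper ranking: order in which sequentialfs adds the features.
c = cvpartition(y, 'KFold', 5);
classf = @(xtr, ytr, xte, yte) ...
    sum(predict(fitcsvm(xtr, ytr, 'KernelFunction', 'rbf', ...
    'KernelScale', 'auto'), xte) ~= yte);
[~, history] = sequentialfs(classf, X, y, 'cv', c, 'nfeatures', size(X,2));

% history.In(k,:) marks the features selected after step k; the feature
% added at step k is the column that switched from 0 to 1 in that row.
added = diff(double([false(1, size(X,2)); history.In]), 1, 1);
[~, sfsOrder] = max(added, [], 2);

% Side-by-side view of the two orderings.
disp(table((1:size(X,2))', pRank(:), sfsOrder(:), ...
    'VariableNames', {'Rank', 'ByPValue', 'BySequentialfs'}));
```

As far as I understand, the two columns do not have to agree: the t-test scores each feature on its own by the difference in group means, while sequentialfs scores a feature by the cross-validated misclassification count of an RBF SVM together with the features already selected.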