Is there a way to identify which dataset a value belongs to for overlapping datasets?
I have three datasets. Plotting them shows that `data a` has lower values than `data b` and `data c`. I used a box plot to compare them, and it shows that they differ but also overlap. The code below demonstrates this:
clc; clear all; close all
load("dataset.mat")
figure
hold on
xlabel('index of points')
ylabel('data value')
plot(a,'.',DisplayName='data1')
plot(b,'.',DisplayName='data2')
plot(c,'.',DisplayName='data3')
figure;
boxplot([a b c],'Notch','on','Labels',{'data1','data2','data3'})
grid on
Now, given these datasets, I have a set of values, say [4 7 40 8 4], and I want to predict which dataset each value most likely belongs to. Is there a way to do that? With only a very basic knowledge of statistics, I cannot come up with a solution. I found one solution that used a kernel density estimate (KDE) for the comparison, but the data in that example were distinctly separable. In my case the datasets overlap much more. Is there a way to make a prediction in this case? Please forgive my very basic knowledge and suggest a solution; I would appreciate it.
Thanks in advance.
figure
hold on
[fn,xfn,bwn] = kde(a);
plot(xfn,fn)
[fn,xfn,bwn] = kde(b);
plot(xfn,fn)
[fn,xfn,bwn] = kde(c);
plot(xfn,fn)
2 Comments
Jeff Miller
on 25 Mar 2024
You might look into logistic regression and discriminant function analysis. These are both techniques for predicting category membership.
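As a rough illustration of the discriminant-analysis suggestion, the sketch below fits a quadratic discriminant model to a single predictor column and returns a posterior probability for each class, which is useful when the classes overlap. It assumes the Statistics and Machine Learning Toolbox is available. The vectors `a`, `b` and `c` here are synthetic stand-ins with made-up means and spreads, not the real data, and should be replaced by the contents of dataset.mat. Multinomial logistic regression would be the other option mentioned and is not shown.

```matlab
% Sketch only: synthetic stand-ins for a, b, c (hypothetical values, not the
% real data). Replace these lines with load("dataset.mat").
rng(0)                                   % reproducible random numbers
a = 5  + 3*randn(200,1);
b = 15 + 6*randn(200,1);
c = 25 + 8*randn(200,1);

data  = [a; b; c];                       % single predictor column
label = [ones(size(a)); 2*ones(size(b)); 3*ones(size(c))];

% Quadratic discriminant analysis: each class gets its own mean and variance
Mdl = fitcdiscr(data, label, "DiscrimType", "quadratic");

x = [4 7 40 8 4]';                       % values to classify
[predictedClass, posterior] = predict(Mdl, x);

% One row per query value: value, predicted class, P(class 1..3 | value)
disp([x predictedClass posterior])
```

Looking at the posterior columns rather than only the predicted label shows how confident each assignment is; values that fall in the overlap region will have probabilities spread across two or three classes.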
Accepted Answer
Chunru
on 25 Mar 2024
websave("dataset.mat", "https://www.mathworks.com/matlabcentral/answers/uploaded_files/1650591/dataset.mat")
load("dataset.mat")
figure
hold on
xlabel('index of points')
ylabel('data value')
plot(a,'.',DisplayName='data1')
plot(b,'.',DisplayName='data2')
plot(c,'.',DisplayName='data3')
whos
figure;
boxplot([a b c],'Notch','on','Labels',{'data1','data2','data3'})
grid on
x = [4 7 40 8 4]';
% K Nearest neighbour (KNN) classification
data = [a; b; c];
label = [ones(size(a)); 2*ones(size(b)); 3*ones(size(c))];   % class labels 1, 2, 3
Mdl = fitcknn(data, label, "NumNeighbors", 80); % larger number of neighbours
predictedClass = predict(Mdl, x) % predicted class
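Because the datasets overlap, it may help to check how sensitive the result is to the number of neighbours and to look at class scores instead of only the hard label. The sketch below is one possible way to do that: it compares a few candidate values of `NumNeighbors` by 10-fold cross-validation, refits with the best one, and requests the second output of `predict`. The candidate list `kList` and the synthetic `a`, `b`, `c` vectors are assumptions made for illustration; with the real data, load dataset.mat instead.

```matlab
% Sketch only: synthetic stand-ins for a, b, c (hypothetical values, not the
% real data). Replace these lines with load("dataset.mat").
rng(0)                                   % reproducible random numbers
a = 5  + 3*randn(200,1);
b = 15 + 6*randn(200,1);
c = 25 + 8*randn(200,1);

data  = [a; b; c];
label = [ones(size(a)); 2*ones(size(b)); 3*ones(size(c))];

% Compare a few neighbourhood sizes by 10-fold cross-validated error
kList = [5 20 80 150];                   % hypothetical candidate values
cvErr = zeros(size(kList));
for i = 1:numel(kList)
    Mdl = fitcknn(data, label, "NumNeighbors", kList(i));
    cvErr(i) = kfoldLoss(crossval(Mdl, "KFold", 10));
end
[~, best] = min(cvErr);

% Refit with the best k and return class scores along with the labels
Mdl = fitcknn(data, label, "NumNeighbors", kList(best));
x = [4 7 40 8 4]';
[predictedClass, score] = predict(Mdl, x);

% Each row of score holds the estimated probability of classes 1, 2 and 3
disp(table(x, predictedClass, score))
```

The cross-validated error in `cvErr` also gives a realistic idea of how often a single value can be assigned correctly at all; with heavily overlapping data that error can stay high whatever classifier is used.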