How to form the training set?

Hello all, I am new to machine learning and want to use MATLAB for it. I am trying to form a training set in MATLAB on the basis of the following expression:
S = {(x_m, y_m)}, m = 1, ..., M,
where S denotes the training set, M = 10, x_m is the training feature with two components (epsilon minus and epsilon plus), and y_m denotes the training label, which is either -1 or +1.
My query is what should be the dimension of my training set. I think it should be .
Any help in this regard will be highly appreciated.

1 Comment

Any help in this regard would be highly appreciated...


 Accepted Answer

If I understand all of your notation correctly, I think your training set needs to be an Mx3 matrix.
If your notation means that each observation of x has two components (epsilon minus and epsilon plus), then for each observation of the training set you need two values to represent x and one to represent y. So
M = [0.2 0.3 -1;
-0.3 0.4 1;
...
0.6 0.5 -1];
would be the representation in which
  • 1st column is x (epsilon minus)
  • 2nd column is x (epsilon plus)
  • 3rd column is y
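A minimal sketch of assembling such a matrix, assuming the features and labels are already known. The variable names (epsMinus, epsPlus, y, S) and all numeric values are placeholders for illustration, not data from the question:

```matlab
% Placeholder data for M = 10 observations (values are illustrative only).
M = 10;
rng(0);                                       % fixed seed for repeatability
epsMinus = rand(M, 1);                        % feature 1: x (epsilon minus)
epsPlus  = rand(M, 1);                        % feature 2: x (epsilon plus)
y = [-1; 1; 1; -1; 1; -1; -1; 1; 1; -1];      % known labels, one per observation

% Concatenate column-wise: one row per observation, label in column 3.
S = [epsMinus, epsPlus, y];
disp(size(S))                                 % number of rows and columns

% The same array can be split back into features and labels when needed.
X = S(:, 1:2);
Y = S(:, 3);
```

Keeping the label in the last column means a single array can be saved and passed around, while X and Y remain one indexing step away for functions that expect them separately.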

16 Comments

chaaru datta on 14 May 2022
Edited: chaaru datta on 14 May 2022
Thank you so much for your answer, sir.
But I have a question about how to assign a label to each observation. The doubt arises because the first column of the training set relates to epsilon minus and the second column relates to epsilon plus, so how should I decide whether the label of that observation is minus or plus?
Is this a supervised learning task? If so, then you should know all the input features (x) and the label y.
You have to know the features and the labels, in order to train the model.
If you don't know the values of the features and the label, you might have an unsupervised learning task.
Maybe you could explain more about your problem, and post your data?
Yes sir, it's a supervised learning task and I know all the input features (x).
Also, I know that the label is either -1 or 1.
But my doubt is this: if we consider the first row, what should the label in the 3rd column be? Plus 1 or minus 1?
I'm still not sure I understand your question. Do you want separate arrays for input and label?
X = [0.2 0.3;
-0.3 0.4;
...
0.6 0.5];
Y = [-1;
1;
...
-1];
No sir, I don't want separate arrays for input and label.
Basically, I want the same array as the earlier one, i.e. M×3.
But my question is how one decides whether the label in the first row, third column is minus 1 or plus 1.
I'm confused.
You wrote "Also, I know that the label is either -1 or 1."
So, use the information you know. If you know the value is -1, put -1. If you know the value is +1, use 1.
Ok sir...Thanks a lot once again...will implement it in MATLAB now...
Hello Sir, I have implemented this training set (Mx3) for SVM. However, I am getting an accuracy of around 50%, whereas I was expecting it to be around 98%.
Can you upload your data and code? (You can use the paperclip icon in the INSERT section of the toolbar.)
Without seeing your data/code, it's impossible to know whether you have implemented something incorrectly, or if you just are expecting too much accuracy.
Hi sir,
I have shared my code and training set.
I'm confused again, because the code you uploaded ...
  • doesn't load the data
  • seems to just generate random data (maybe for testing the code?)
  • doesn't fit a statistical model
When you say you got low accuracy, I don't see where you have calculated that.
Also, I did fit a logistic regression model to the data in that file (and also looked at some scatter plots and correlation coefficients), and it doesn't look like En_minus or En_plus have much explanatory power at all for Target:
data = readtable("https://www.mathworks.com/matlabcentral/answers/uploaded_files/999375/Dataset_PIDpaper_7_pls15dB_prac18.xlsx");
data.Target = (data.Target+1)/2;
modelspec = 'Target ~ En_plus + En_minus';
mdl = fitglm(data,modelspec,'Distribution','binomial')
mdl =
Generalized linear regression model:
    logit(Target) ~ 1 + En_minus + En_plus
    Distribution = Binomial

Estimated Coefficients:
                    Estimate          SE          tStat       pValue
                   ___________    __________    ________    ________
    (Intercept)     -0.0073256     0.0094742    -0.77322     0.43939
    En_minus       -0.00045798    0.00088569    -0.51709     0.60509
    En_plus          0.0018022    0.00089727      2.0086    0.044583

100000 observations, 99997 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 4.04, p-value = 0.133
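The correlation check mentioned in this comment can be sketched in base MATLAB. The data below is synthetic, and the names feat1, feat2 and target are placeholders rather than the columns of the uploaded file. It only illustrates what the diagnostic looks like when labels are drawn independently of the features:

```matlab
% Synthetic stand-in data: two features and labels drawn independently.
rng(1);                                % fixed seed for repeatability
M = 1e5;
feat1  = randn(M, 1);                  % placeholder for En_minus
feat2  = randn(M, 1);                  % placeholder for En_plus
target = 2 * randi([0, 1], M, 1) - 1;  % labels in {-1, +1}, unrelated to features

% corrcoef returns a 2-by-2 matrix; the off-diagonal entry is the correlation.
r1 = corrcoef(feat1, target);
r2 = corrcoef(feat2, target);
fprintf('corr(feat1, target) = %.4f\n', r1(1, 2));
fprintf('corr(feat2, target) = %.4f\n', r2(1, 2));
```

Correlations this close to zero suggest, in the same way as the non-significant overall model test above, that the features carry little linear information about the label.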
Sir, I would like to answer your queries one by one.
1) I am using SVM to classify wireless signals.
2) Data is not loaded: because I used MATLAB to generate the data (training set) of dimension Mx3, where M = 10^5.
3) Seems to just generate random data: the generated data has random values because it relates to wireless channels, which are random in nature.
4) I don't see where you have calculated the accuracy: using this training set, I calculated the accuracy in Python.
I am also sharing the research paper which I am trying to implement.
I have to admit I can't spend the time to fully understand your code or that paper. But, here is my impression.
In your code, it looks like Train_label_final is not just random, but random with no relationship to Train_set_features. In other words, this is the case where the signal-to-noise ratio is tiny (SNR in dB very negative). In the paper, notice that when SNR(dB) = -15, they also get an accuracy of about 50%. I think you are seeing exactly the same thing.
But I don't see anywhere in your code where you coded an example in which SNR is large, so you have never simulated a case where the accuracy would be high.
Sir, in the code the large SNR of +15 dB is set on line 21, and its effect is included in line 44 and line 57.
I see that the signal is used in the calculation of the features, but it doesn't affect the label, right?
The label you generated is completely random, not affected by the features. Here is the code to generate the labels, with all other code removed:
M_train = 1*10^5; % for training iteration, given in paper as 10^5
M_train_detail = int32(randi([0, 1], [1, M_train])); % generating random tag symbols
Train_label_final = [];
for kk = 1:(M_train)
    if M_train_detail(kk) == 0
        lab = -1;
    else
        lab = 1;
    end
    Train_label = [lab];
    Train_label_final = [Train_label_final; Train_label];
end
This is random, with no reference to signal or the features. Therefore, it is no surprise that you cannot predict these labels from the features.
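As a side note, the loop above reduces to a one-line mapping from {0, 1} to {-1, +1}. The sketch below checks that equivalence; the names loopLabels and vecLabels are introduced here for illustration only. Because the labels are balanced coin flips, a classifier that gets no information from the features would be expected to land near 50% accuracy:

```matlab
rng(2);                                                 % fixed seed for repeatability
M_train = 1e5;
M_train_detail = int32(randi([0, 1], [1, M_train]));    % random tag symbols, as above

% Loop version from the posted code (preallocated here for speed).
loopLabels = zeros(M_train, 1);
for kk = 1:M_train
    if M_train_detail(kk) == 0
        loopLabels(kk) = -1;
    else
        loopLabels(kk) = 1;
    end
end

% Vectorized equivalent: map 0 -> -1 and 1 -> +1.
vecLabels = 2 * double(M_train_detail(:)) - 1;

fprintf('identical: %d\n', isequal(loopLabels, vecLabels));
fprintf('fraction of +1 labels: %.3f\n', mean(vecLabels == 1));
```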
Yes sir, you are right. I am generating the labels, but they are not affected by the features.
Also, I would like to briefly describe the system model given in the paper.
1) The system model contains a radio frequency source, a tag and a reader.
2) The tag reflects (backscatters) two types of signal, namely -1 and +1.
3) When the reflected signal from the tag is -1, the epsilon minus feature is obtained at the reader; otherwise epsilon plus is obtained at the reader.
4) Thus my training set consists of epsilon minus, epsilon plus and the label for each reflected signal from the tag.


More Answers (1)

the cyclist on 17 May 2022
I spent a little bit more time with the paper.
It seems to me that in the paper, the labels y are supposed to be used when generating s (Eq. 5 & 6) and then epsilon (Eq. 7 & 8).
But you don't use your labels as part of the calculation of the features.

7 Comments

Yes sir, you are right, but I have also generated the features according to the labels.
For example, in the code, lines 44 to 54 are for label -1 and lines 57 to 67 are for label +1.
But the labels used to generate the feature are not what you use in the variable Train_label (which is the 3rd column of Train_set). Shouldn't they be the same labels? Instead, Train_label is just random noise.
Can you also post the Python code with the model, so I can see how you are using the output of the MATLAB program?
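The point about reusing the same labels can be sketched as follows. This is a toy signal model chosen for illustration, not the paper's Eq. 5 to 8: the features x1 and x2 are hypothetical stand-ins for the two energy features, and the amplitude formula assumes unit-variance noise. The labels are generated once, drive the features, and are then stored unchanged in the third column:

```matlab
rng(3);                                  % fixed seed for repeatability
M = 1e5;
y = 2 * randi([0, 1], M, 1) - 1;         % labels in {-1, +1}, generated once

snrDb = [15, -15];                       % two illustrative SNR values in dB
acc = zeros(size(snrDb));
for k = 1:numel(snrDb)
    a = 10^(snrDb(k) / 20);              % signal amplitude for unit-variance noise
    % Toy features (NOT the paper's equations): the mean shifts with the label.
    x1 = -a * y + randn(M, 1);
    x2 =  a * y + randn(M, 1);
    Train_set = [x1, x2, y];             % column 3 holds the SAME labels used above

    % Simple fixed rule as a sanity check: predict +1 when x2 > x1.
    yhat = sign(Train_set(:, 2) - Train_set(:, 1));
    yhat(yhat == 0) = 1;                 % break exact ties (essentially never occur)
    acc(k) = mean(yhat == Train_set(:, 3));
    fprintf('SNR = %+d dB: accuracy = %.3f\n', snrDb(k), acc(k));
end
```

In this toy setup the accuracy is high at the large SNR and drops toward chance at the small one, which is the qualitative behaviour expected only when the stored labels are the ones that produced the features.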
Sir, I am sharing the Python code.
Sir, in this paper we have two features based on the energy of the signals. They are en_min and en_pls, as computed in the MATLAB code on lines 54 and 67. So how should I assign the label to these features?
Hello sir, I would like to clarify a few of my doubts one by one.
1) As per our earlier discussion, the training set has to be M x 3. So if we assume M = 10, the training set will be 10 x 3, in which the first column holds en_min, the second column holds en_pls and the last column holds the label. Is this correct, sir?
2) If the mth bit from tag to reader is -1, then only en_min will be obtained. What should the value of en_pls be in that case?
3) If the mth bit from tag to reader is 1, then only en_pls will be obtained. What should the value of en_min be in that case?
4) If we consider that both en_min and en_pls are available for the mth bit, then what should we write in the corresponding label, i.e. in the third column?
Sir, I would also like to share a few observations.
1) I tried the following: if the mth bit is -1, then en_min has a value, en_pls is zero and the label is -1. If the mth bit is 1, then en_pls has a value, en_min is zero and the label is 1. However, I get 100% accuracy, which is definitely not correct.
2) I also formed the training set by comparing en_min and en_pls, assigning the label 1 if en_pls is greater than en_min and -1 otherwise. But here also I got 99.92% accuracy even when the SNR is -15 dB, which is again not correct.
I'm not sure I can spend enough time reviewing the paper, and your code, to be able to answer these for you.
But what is very clear to me is that in your current code, the labels you are using are completely unrelated to the energy level features, so they will be unpredictable.
It seems possible to me that in the training set, the labels are supposed to be almost perfectly predictable, but the testing set (with different labels) will not be as predictable. That is normally what happens in machine learning problems.
I can try to take another look, but probably not for a few days.
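One way to compare training and testing accuracy is to hold out part of the data and score the model on rows it never saw. The sketch below uses base MATLAB only and a nearest-class-mean classifier as a simple stand-in for the SVM. The data is a toy example with an assumed amplitude a, not the paper's model, and all names are introduced here for illustration:

```matlab
rng(4);                                          % fixed seed for repeatability
M = 2e4;
y = 2 * randi([0, 1], M, 1) - 1;                 % labels in {-1, +1}
a = 1;                                           % toy signal amplitude (assumed)
X = [-a * y + randn(M, 1), a * y + randn(M, 1)]; % toy features tied to the labels

% Random 80/20 split into training and test rows.
idx = randperm(M).';
nTrain = round(0.8 * M);
trainIdx = idx(1:nTrain);
testIdx  = idx(nTrain+1:end);

% "Training": estimate one mean vector per class from the training rows only.
muNeg = mean(X(trainIdx(y(trainIdx) == -1), :), 1);
muPos = mean(X(trainIdx(y(trainIdx) ==  1), :), 1);

% Prediction: assign each row to the class whose mean is closer.
dNeg = sum((X - muNeg).^2, 2);
dPos = sum((X - muPos).^2, 2);
yhat = ones(M, 1);
yhat(dNeg < dPos) = -1;

accTrain = mean(yhat(trainIdx) == y(trainIdx));
accTest  = mean(yhat(testIdx)  == y(testIdx));
fprintf('train accuracy = %.3f, test accuracy = %.3f\n', accTrain, accTest);
```

A large gap between the two numbers would point to overfitting or to information leaking from the labels into the features. With a model this simple, the two accuracies should stay close to each other.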
It's ok sir. Thank you so much for your wholehearted support. I will keep trying to implement this paper. Sir, please do let me know once you are free so that I can discuss this further with you.
Also, it would be helpful if you could suggest some links for solving such machine learning problems.
Hello Sir, can you please share your insights on forming the training set as done in this paper?

