How to remove the value using Histogram
I have the following data, in which my original value is 15, which has a count of 7360.
I want to remove the remaining values whose counts are less than 33% of the original value's count, or which are multiples of the original value.
For example, in this case I have 30, 45, 60, 75 and 90, and I want to remove these values, and the value of 1 as well.
How can I do that in MATLAB?
Answers (1)
Star Strider
on 24 Jan 2023
I have only a vague idea of what you want to do, especially since the .mat file does not appear to contain the same data as depicted in the posted plot image.
Try this —
LD = load(websave('histogram','https://www.mathworks.com/matlabcentral/answers/uploaded_files/1272595/histogram.mat'))
v = LD.ans;
Ev = linspace(0, 100, 101)
figure
hh = histogram(v, Ev);
Vals = hh.Values;
Edgs = hh.BinEdges;
Retain = (Vals > max(Vals)/3);
Out = Vals(Retain)
OutBinsLowerEdge = Edgs(Retain)
If you want to remove the associated data in the original file corresponding to those values, that would be relatively straightforward using logical indexing. Another approach would be to use histcounts, return the 'Bins' output, and index into that.
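The histcounts route mentioned above could be sketched roughly as follows. The vector `v` and the edges here are made up for illustration (they are not the contents of the posted .mat file), and the one-third cutoff is an assumption about what is wanted:

```matlab
% Synthetic data (assumption): a dominant cluster at 15 plus sparse "noise"
v = [15*ones(90,1); 30*ones(5,1); 45*ones(3,1); 1; 1];
edges = 0:1:100;                        % unit-width bins, same as linspace(0,100,101)

% N = counts per bin, binIdx = bin number of every element of v (0 if outside edges)
[N, ~, binIdx] = histcounts(v, edges);

retainBins = N > max(N)/3;              % logical mask over the bins

% Translate the per-bin mask into a per-element mask
inRange = binIdx > 0;                   % guard against out-of-range elements
keep = false(size(v));
keep(inRange) = retainBins(binIdx(inRange));

vKept = v(keep);                        % elements that fall in the retained bins
```

Here `keep` has the same size as `v`, so it can also be used to delete the unwanted elements in place with `v(~keep) = [];`.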
.
21 Comments
Med Future
on 24 Jan 2023
The dataset is the same as in the original plot. How can I get the array after processing it?
What if I have multiples of the value?
Med Future
on 24 Jan 2023
Does Out give the output array in the same form as the original .mat file?
Star Strider
on 24 Jan 2023
I still have no clear idea of what you want to do. It would help somewhat if you posted the histogram call that produced the data you posted. I have no idea what that is.
If you want to eliminate only those values in bins whose counts are less than one-third of the highest bin count, one option would be:
LD = load(websave('histogram','https://www.mathworks.com/matlabcentral/answers/uploaded_files/1272595/histogram.mat'))
LD = struct with fields:
ans: [8839×1 double]
Data = LD.ans;
Ev = linspace(0, 100, 1001); % Change This To Produce The Values You Want
figure
hh = histogram(Data, Ev);
[N,Edges,Bin] = histcounts(Data, Ev);
Retain = N > max(N)/3;
FindBins = find(Retain)
FindBins = 1×2
150 151
RetainDataLv = (Bin >= FindBins(1)) & (Bin <= FindBins(2)); % Values In 'Bin' Corresponding To 'Retain' Test
RetainData = Data(RetainDataLv) % Return Desired Subset OF 'Data'
RetainData = 7360×1
15.0200
15.0100
14.9800
15.0100
15.0000
15.0150
14.9950
15.0050
15.0050
15.0050
See if this does what you want.
.
Star Strider
on 25 Jan 2023
The ‘Ev’ vector is the vector of the edges.
I selected it because I still have no idea what you want to do. It seems to produce something similar to the original histogram plot you posted. You never defined how you coded that, so I am doing my best to fill in those gaps.
Image Analyst
on 25 Jan 2023
@Star Strider you're not the only one. I have no idea what he wants to do. Perhaps removing data points based on bin heights, then re-histogramming, or possibly making the one bin not so high. It certainly needs a better explanation, because everyone is confused and doesn't know what @Med Future wants.
Med Future
on 25 Jan 2023
Let me explain it again for you. I have the original value, which has the highest number of counts (7360), as you can see in the histogram.
The remaining are the noise.
I want to Delete the remaining Values which have counts less than 33 of maximum counts of value
h=histogram(NewDataset,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 100]);
Med Future
on 25 Jan 2023
@Star Strider @Image Analyst I hope you now understand my problem. If not, please let me know and I will explain again.
Image Analyst
on 25 Jan 2023
Yes, it's clearer now but I'm turning in for the night. If Star doesn't answer you, I'll answer tomorrow.
Med Future
on 25 Jan 2023
@Star Strider @Image Analyst I have shared the data above, and this is the 2nd dataset. Your code does not work on this dataset either.
Star Strider
on 25 Jan 2023
I still do not understand ‘I want to Delete the remaining Values which have counts less than 33 of maximum counts of value’ so I am guessing that ‘33’ actually means 33 counts, although that also has an ambiguous reference. So I’m interpreting that as removing any bin whose count is 33 or more below the count in the maximum bin.
LD = load(websave('seconddata','https://www.mathworks.com/matlabcentral/answers/uploaded_files/1273595/secondata.mat'));
NewDataset = LD.NewData
NewDataset = 1×16075
100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000
h=histogram(NewDataset,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 100]);
[N,Edges,Bin] = histcounts(NewDataset,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 100]);
Retain = N > max(N)-33;
FindBins = find(Retain)
FindBins = 99
RetainDataLv = (Bin == FindBins(1)); % Values In 'Bin' Corresponding To 'Retain' Test
RetainData = NewDataset(RetainDataLv) % Return Desired Subset OF 'Data'
RetainData = 1×3474
100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000
Using histcounts makes this easier.
.
Med Future
on 25 Jan 2023
@Star Strider @Image Analyst Sorry for making this ambiguous. I want to delete the values whose counts are less than 33% of the count of the maximum value.
For example, in my data the maximum count is 4641. Then 33% of 4641 is (== 1392). I want to remove the values whose counts are less than 1392.
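As a quick check of the cutoff arithmetic: for a maximum count of 4641, a literal 33% works out to about 1532 and one-third to 1547 (1392 would correspond to roughly 30%). A small sketch, where the bin counts in `N` are invented purely to show how the cutoff would be applied:

```matlab
maxCount = 4641;                      % maximum bin count quoted above
thr33    = 0.33 * maxCount;           % literal 33 percent, about 1531.53
thrThird = maxCount / 3;              % exactly one-third, 1547

% Example bin counts (made up) to show which bins each cutoff keeps
N = [4641 1600 1540 1400 12];
keepBins33    = N >= thr33;           % bins that survive the 33 percent cutoff
keepBinsThird = N >= thrThird;        % bins that survive the one-third cutoff
```

The two cutoffs differ only for bins whose counts lie between the two thresholds (the 1540 entry in this made-up example).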
Star Strider
on 25 Jan 2023
O.K. That requires one small change —
LD = load(websave('seconddata','https://www.mathworks.com/matlabcentral/answers/uploaded_files/1273595/secondata.mat'));
NewDataset = LD.NewData
NewDataset = 1×16075
100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000
h=histogram(NewDataset,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 100]);
[N,Edges,Bin] = histcounts(NewDataset,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 100]);
Retain = N > max(N)/3; % Retain Values In Bins Greater Than One-Third Of The Maximum Bin Count Value
FindBins = find(Retain)
FindBins = 99
RetainDataLv = (Bin == FindBins(1)); % Values In 'Bin' Corresponding To 'Retain' Test
RetainData = NewDataset(RetainDataLv) % Return Desired Subset OF 'Data'
RetainData = 1×3474
100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000
This returns the original values in bins whose counts are greater than one-third of the maximum bin count value.
.
Med Future
on 26 Jan 2023
BinLimits is now 1 to 10000. When I run the following code, FindBins shows that there are 5 bins: 99, 100, 150, 200, 250.
[N,Edges,Bin] = histcounts(NewDataset,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 10000]);
Retain = N > max(N)/3; % Retain Values In Bins Greater Than One-Third Of The Maximum Bin Count Value
FindBins = find(Retain)
But with the following code I get only one bin, 99:
RetainDataLv = (Bin == FindBins(1)); % Values In 'Bin' Corresponding To 'Retain' Test
RetainData = NewDataset(RetainDataLv)
It retains the values of only one bin, not of all the bins.
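One possible way to keep the data from every qualifying bin, rather than only the first, is `ismember`, which tests each element's bin number against the whole `FindBins` list. A minimal sketch on made-up data (the values and edges below are assumptions, not the posted dataset):

```matlab
% Made-up data with several heavily populated values plus a few stragglers
Data  = [100*ones(1,40), 150*ones(1,30), 200*ones(1,25), 7, 8, 9];
Edges = 1:1:300;                               % unit-width bins (assumed)

[N, ~, Bin] = histcounts(Data, Edges);
Retain   = N > max(N)/3;                       % bins above one-third of the peak
FindBins = find(Retain);                       % here: the bins for 100, 150 and 200

% ismember compares against all entries of FindBins and ignores orientation
RetainDataLv = ismember(Bin, FindBins);
RetainData   = Data(RetainDataLv);
```

Because `ismember` returns a mask with the same shape as `Bin`, this form does not depend on whether the data are stored as a row or a column vector.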
Star Strider
on 26 Jan 2023
Edited: Star Strider
on 26 Jan 2023
That must be different data.
Using slightly changed code on both available data —
LD = load(websave('seconddata','https://www.mathworks.com/matlabcentral/answers/uploaded_files/1273595/secondata.mat'));
NewDataset = LD.NewData(:)
NewDataset = 16075×1
100.0000
100.0000
100.0000
100.0000
100.0000
100.0000
100.0000
100.0000
100.0000
100.0000
h=histogram(NewDataset,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 100]);
[N,Edges,Bin] = histcounts(NewDataset,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 100]);
Retain = N > max(N)/3; % Retain Values In Bins Greater Than One-Third Of The Maximum Bin Count Value
FindBins = find(Retain)
FindBins = 99
RetainDataLv = (Bin == FindBins); % Values In 'Bin' Corresponding To 'Retain' Test
SzRD = size(RetainDataLv);
RetainData = NewDataset(any(RetainDataLv,min(SzRD))) % Return Desired Subset OF 'Data'
RetainData = 100.0000
LD = load(websave('histogram','https://www.mathworks.com/matlabcentral/answers/uploaded_files/1272595/histogram.mat'))
LD = struct with fields:
ans: [8839×1 double]
Data = LD.ans(:);
h=histogram(Data,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 100]);
[N,Edges,Bin] = histcounts(Data,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 100]);
Retain = N > max(N)/3; % Retain Values In Bins Greater Than One-Third Of The Maximum Bin Count Value
FindBins = find(Retain)
FindBins = 1×2
14 15
RetainDataLv = (Bin == FindBins); % Values In 'Bin' Corresponding To 'Retain' Test
SzRD = size(RetainDataLv)
SzRD = 1×2
8839 2
RetainData = Data(any(RetainDataLv,min(SzRD))) % Return Desired Subset OF 'Data'
RetainData = 7464×1
15.0200
15.0100
14.9800
15.0100
15.0000
15.0150
14.9950
15.0050
15.0050
15.0050
See if that does what you want.
EDIT — (26 Jan 2023 at 13:20)
Added ‘(:)’ in the assignment defining the data after the load call to force the data vectors to be column vectors.
.
Med Future
on 26 Jan 2023
After changing BinLimits from [1 100] to [1 10000], FindBins has the values [99 100 150 200 250].
When I run the code on the 'seconddata' dataset, it gives this error:
Arrays have incompatible sizes for this operation.
[N,Edges,Bin] = histcounts(NewData,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 10000]);
Retain = N > max(N)/3; % Retain Values In Bins Greater Than One-Third Of The Maximum Bin Count Value
FindBins = find(Retain)
RetainDataLv = (Bin == FindBins); % Values In 'Bin' Corresponding To 'Retain' Test
Star Strider
on 26 Jan 2023
I do not understand the reason you are getting the error.
My latest code (adjusted to work with row or column matrices of ‘RetainDataLv’) runs without error when I ran it in my previous Comment with both .mat files.
Having consistent row or column data files would help.
Med Future
on 26 Jan 2023
You have to change BinLimits from [1 100] to [1 10000] and check again. In your previous code there is only one bin, which is 99; that is why there was no error in that code.
Star Strider
on 26 Jan 2023
Changed —
LD = load(websave('seconddata','https://www.mathworks.com/matlabcentral/answers/uploaded_files/1273595/secondata.mat'));
NewDataset = LD.NewData(:)
NewDataset = 16075×1
100.0000
100.0000
100.0000
100.0000
100.0000
100.0000
100.0000
100.0000
100.0000
100.0000
h=histogram(NewDataset,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 10000]);
[N,Edges,Bin] = histcounts(NewDataset,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 10000]);
Retain = N > max(N)/3; % Retain Values In Bins Greater Than One-Third Of The Maximum Bin Count Value
FindBins = find(Retain)
FindBins = 1×5
99 100 150 200 250
RetainDataLv = (Bin == FindBins); % Values In 'Bin' Corresponding To 'Retain' Test
SzRD = size(RetainDataLv);
[~,idx] = min(SzRD);
RetainData = NewDataset(any(RetainDataLv,idx)) % Return Desired Subset OF 'Data'
RetainData = 10289×1
100.0000
100.0000
100.0000
100.0000
100.0000
100.0000
100.0000
100.0000
100.0000
100.0000
LD = load(websave('histogram','https://www.mathworks.com/matlabcentral/answers/uploaded_files/1272595/histogram.mat'));
Data = LD.ans(:);
h=histogram(Data,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 10000]);
[N,Edges,Bin] = histcounts(Data,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 10000]);
Retain = N > max(N)/3; % Retain Values In Bins Greater Than One-Third Of The Maximum Bin Count Value
FindBins = find(Retain)
FindBins = 1×2
14 15
RetainDataLv = (Bin == FindBins); % Values In 'Bin' Corresponding To 'Retain' Test
SzRD = size(RetainDataLv)
SzRD = 1×2
8839 2
[~,idx] = min(SzRD);
RetainData = Data(any(RetainDataLv,idx)) % Return Desired Subset OF 'Data'
RetainData = 7464×1
15.0200
15.0100
14.9800
15.0100
15.0000
15.0150
14.9950
15.0050
15.0050
15.0050
This appears to work.
.
Med Future
on 27 Jan 2023
Can you please solve this error?
Arrays have incompatible sizes for this operation.
Data=NewData;
h=histogram(Data,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 10000]);
[N,Edges,Bin] = histcounts(Data,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 10000]);
Retain = N > max(N)/3; % Retain Values In Bins Greater Than One-Third Of The Maximum Bin Count Value
FindBins = find(Retain)
RetainDataLv = (Bin == FindBins); % Values In 'Bin' Corresponding To 'Retain' Test
SzRD = size(RetainDataLv)
[~,idx] = min(SzRD);
RetainData = Data(any(RetainDataLv,idx))
Star Strider
on 27 Jan 2023
You need to force ‘Data’ to be a column vector to work with my code, using the ‘(:)’ operator:
Data=NewData(:);
I decided to do this to make my code compatible with all the data sets, since some are row vectors and some are column vectors.
Try this —
LD = load(websave('secondata','https://www.mathworks.com/matlabcentral/answers/uploaded_files/1275815/secondata.mat'));
Data = LD.NewData(:) % Force Column Vector
Data = 16075×1
100.0000
100.0000
100.0000
100.0000
100.0000
100.0000
100.0000
100.0000
100.0000
100.0000
h=histogram(Data,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 10000]);
[N,Edges,Bin] = histcounts(Data,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 10000]);
Retain = N > max(N)/3; % Retain Values In Bins Greater Than One-Third Of The Maximum Bin Count Value
FindBins = find(Retain)
FindBins = 1×5
99 100 150 200 250
RetainDataLv = (Bin == FindBins); % Values In 'Bin' Corresponding To 'Retain' Test
SzRD = size(RetainDataLv);
[~,idx] = min(SzRD);
RetainData = Data(any(RetainDataLv,idx)) % Return Desired Subset OF 'Data'
RetainData = 10289×1
100.0000
100.0000
100.0000
100.0000
100.0000
100.0000
100.0000
100.0000
100.0000
100.0000
That should work with all the data vectors, regardless of whether their initial orientation is as row or column vectors.
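To illustrate why the orientation matters: `Bin == FindBins` relies on implicit expansion (R2016b or later), so a column `Bin` compared with a row `FindBins` expands to a matrix, while two row vectors of different lengths are incompatible. A small sketch with made-up index vectors:

```matlab
Bin      = [99 100 3 150 99];      % made-up bin numbers, row vector (1x5)
FindBins = [99 100 150];           % bins to keep, row vector (1x3)

% Bin == FindBins would error here: 1x5 and 1x3 have incompatible sizes.
% Reshaping Bin to a column makes the comparison expand to a 5x3 matrix.
Lv = (Bin(:) == FindBins(:).');    % Lv(i,j) is true when Bin(i) equals FindBins(j)

% With Bin forced to a column, the reduction always runs along dimension 2
RetainDataLv = any(Lv, 2);         % 5x1 logical: [1;1;0;1;1]
```

With both vectors reshaped this way, the dimension passed to `any` can simply be fixed at 2 instead of being derived from the size of the comparison result.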
.