How to find error in experimental data using MATLAB code?
Hi everyone, I downloaded raw experimental data and analysed it with a standard analysis software package (GUISDAP), from which I extracted the electron density, Ne (a 1×1000 vector). Then I used the following MATLAB code to take data from some specific columns:
A = Ne(1, 90:90:1000); % every 90th sample; this gives A = 1×11
The code gives me every 90th data point. In this case, how do I estimate or minimise the error/noise in the data? Someone suggested finding the standard deviation, but I do not really know what noise means in this case or how to deal with it. Any guidance will be appreciated. Thanks
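To see exactly which samples that indexing keeps, the index vector can be inspected directly. This is a minimal sketch; `Ne` here is a hypothetical random stand-in of the same size, not the GUISDAP output:

```matlab
Ne  = rand(1, 1000);   % hypothetical stand-in for the 1x1000 electron-density vector
idx = 90:90:1000;      % 90, 180, ..., 990 (1000 is not a multiple of 90)
A   = Ne(1, idx);      % keep every 90th sample
fprintf('%d samples kept, at indices %s\n', numel(A), mat2str(idx));
```

Because the step is 90 and 1000 is not a multiple of 90, the last index is 990, so A has 11 elements.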
Accepted Answer
Adam Danz
on 7 Aug 2018
Edited: Adam Danz
on 8 Aug 2018
Someone suggested finding the standard deviation. Actually, I do not know what noise means in this case or how to deal with it.
Are you asking what 'noise' means in regard to data or are you asking how to interpret the noise in your data?
Noise is just variability of a measurement. If I step on a digital scale 5 times my weight might be [79.9, 80.1, 80.0, 79.8, 80.1] kg. The variation might be explained in how I was centered on the scale each time, my posture, or the lousy mechanics within the scale. Of course I have a True (capital T) weight at any point in time but the 'noisy' scale is the only way to estimate that True weight and along with the estimates comes noise (variability).
For example, let's say my True weight (which isn't knowable) is 80.0001425458541254789 kg. The mean of my 5 measurements is 79.98 and the standard deviation is 0.13038.
x = [79.9, 80.1, 80.0, 79.8, 80.1];
m = mean(x)
s = std(x) % <- That's how to calculate standard deviation
If your data are normally distributed, ~68% of your data will be within ±1 standard deviation of the mean. If the scale continues to behave the same and I quickly weigh myself 1000 more times, ~68% of the measurements will be between 79.85 and 80.11 kg.
upper = m + 1*s
lower = m - 1*s
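A quick way to check which of the five readings actually fall inside that ±1 std band is a logical mask. This is just a sketch on the same five scale readings; with so few points the fraction will not match 68% exactly (68% is what you would expect over many readings):

```matlab
x = [79.9, 80.1, 80.0, 79.8, 80.1];   % the five scale readings from above
m = mean(x);
s = std(x);                           % sample standard deviation (n-1 normalization)
inside = x >= m - s & x <= m + s;     % logical mask: readings within +/- 1 std
fprintf('%d of %d readings lie within 1 std of the mean\n', sum(inside), numel(x));
```

Here 4 of the 5 readings are inside the band; only 79.8 falls below m - s.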
If your data are not approximately normally distributed, you should not be using standard deviations; instead, you could use a nonparametric method such as bootstrapped confidence intervals. It might be difficult to determine the distribution of your data, since you only seem to have 11 data points in 'A'.
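For reference, a percentile bootstrap confidence interval for the mean can be sketched in base MATLAB without any toolbox (the Statistics and Machine Learning Toolbox also offers `bootci`). The number of resamples and the 95% level below are illustrative choices, and the data are the scale readings, not your 'A':

```matlab
x = [79.9, 80.1, 80.0, 79.8, 80.1];   % example data (the scale readings)
nBoot = 10000;                        % number of bootstrap resamples (assumed)
rng(0);                               % fixed seed so the result is repeatable
n = numel(x);
bootMeans = zeros(nBoot, 1);
for k = 1:nBoot
    idx = randi(n, n, 1);             % resample n points with replacement
    bootMeans(k) = mean(x(idx));      % statistic of interest for this resample
end
bootMeans = sort(bootMeans);
ciLow  = bootMeans(ceil(0.025 * nBoot));    % 2.5th percentile
ciHigh = bootMeans(floor(0.975 * nBoot));   % 97.5th percentile
fprintf('95%% bootstrap CI for the mean: [%.3f, %.3f]\n', ciLow, ciHigh);
```

With only 5 (or 11) points the interval is rough, but it makes no assumption of normality.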
I'm not certain that this explanation is what you were looking for so please feel free to follow up or tell me how far off base my interpretation of your noisy question was ;)
Safi ullah
on 8 Aug 2018
@Adam Danz, thank you for your reply. Actually, I am asking how to interpret the noise in my data. Suppose I have the following code:
A = Ne(1, 90:90:1800); % this gives A = 1×20
I know the definition of noise, but in my case I do not know how to show it or how to remove it from my data. My data are normally distributed.
Safi ullah
on 8 Aug 2018
@Adam Danz, what does it mean when people sometimes write something like "here only those data points are considered which are two times of standard deviation"? Thanks
Adam Danz
on 8 Aug 2018
Edited: Adam Danz
on 8 Aug 2018
Given a vector of noisy data, you can't really 'remove' the noise unless you know what the signal should be in the first place. You can, however, clean up the data to produce a less noisy estimate of the signal. The more data you have, the better. You seem to have 20 data points which might be enough depending on what method you use. I'll list some ideas below.
Without seeing your data I can't recommend a particular method. You could embed a screenshot of plot(A) which might help.
If all 20 points in 'A' are 20 repetitions of the same measurement (i.e., I step on the scale 20 times to estimate my weight), just take the mean (if normally distributed) or the median (otherwise).
If your 20 points in 'A' are a time series, you have several options (there are even more than I have listed):
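One practical reason to prefer the median when the data are not normal is that it resists a single bad reading. A small sketch with hypothetical repeated readings, the last one a glitch:

```matlab
x = [80.0, 79.9, 80.1, 80.0, 95.0];     % hypothetical repeats; the last reading is a glitch
fprintf('mean   = %.2f\n', mean(x));    % pulled up to 83.00 by the one bad reading
fprintf('median = %.2f\n', median(x));  % stays at 80.00
```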
- Do a moving average
- or a moving average filter
- Smooth your data (examples)
- or apply a filter
- detect outliers and remove them - maybe even prior to the steps above
- Here's an example of outlier detection & smoothing combo
- Additional methods of outlier detection and smoothing
- If you know what your data should look like, you could do curve fitting.
- Don't remove the noise. Instead, plot the error using error bars and let the reader interpret it (this is the best solution in some cases)
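As an illustration of the first option, a centered moving average with `movmean` (MATLAB R2016a or later) on a synthetic noisy sine. The time axis, noise level, and 5-point window are all assumptions for the demo; the "true" signal is only known here because we made it up:

```matlab
t = (0:0.1:10)';                         % hypothetical time axis (s)
rng(1);                                  % fixed seed for repeatability
trueSignal = sin(t);                     % normally unknown; known here only for the demo
A = trueSignal + 0.3*randn(size(t));     % noisy measurement
Asmooth = movmean(A, 5);                 % centered 5-point moving average
rmseRaw    = sqrt(mean((A - trueSignal).^2));
rmseSmooth = sqrt(mean((Asmooth - trueSignal).^2));
fprintf('RMS error vs. truth: raw %.3f, smoothed %.3f\n', rmseRaw, rmseSmooth);
```

A wider window averages away more noise but also flattens real features, so the window length is a judgement call that should be reported.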
These methods and the parameters you choose will provide different results which reinforces my statement that you really can't just remove noise - you can only reduce it to produce an estimate of the signal.
what does it mean when people write "here only those data points are considered which are two times of standard deviation"
First some background info. The standard deviation std() is a measure of dispersion. "Within one standard deviation" means all data that are between mean-std and mean+std. So if the mean is 20 and the std is 5, all data between 15 and 25 are within 1 std. If the data are normally distributed, 68.2% of the data will be within 1 std of the mean.
"Within 2 standard deviations" refers to all data between mean-2*std and mean+2*std. For a mean of 20 and std of 5, that's all data between 10 and 30 which will account for 95.4% of the data.
Finally, 3 standard deviations account for 99.7% of the data. For intuition on these percentages, see the first figure in this Wikipedia article.
To drive this message home, below is a figure showing 1000 random numbers drawn from a normal distribution with mean 20 and std 5. The histogram shows the distribution along the x-axis, and the vertical reference lines mark the first 3 standard deviations on either side of the mean.
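These percentages come straight from the normal distribution: the probability of landing within k standard deviations of the mean is erf(k/√2). A one-line check:

```matlab
k = 1:3;                              % number of standard deviations
coverage = erf(k / sqrt(2));          % P(|X - mu| <= k*sigma) for a normal X
fprintf('within %d std: %.2f%%\n', [k; 100*coverage]);
```

This prints roughly 68.27%, 95.45%, and 99.73%.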
The green lines show where the 2nd std falls. Data points along the x axis outside of those green lines are ignored if we're only considering data within 2 std of the mean.
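The numbers behind that figure can be reproduced without plotting. This sketch draws 1000 values from N(20, 5²) and keeps only those within 2 std of the sample mean; the seed is arbitrary, so the exact count will vary slightly with it:

```matlab
rng(0);                                  % fixed seed for repeatability
x = 20 + 5*randn(1000, 1);               % 1000 draws from a normal with mean 20, std 5
m = mean(x);
s = std(x);
keep  = abs(x - m) <= 2*s;               % true for points within 2 std of the mean
xKept = x(keep);
fprintf('kept %d of %d points (%.1f%%)\n', numel(xKept), numel(x), 100*mean(keep));
```

Expect roughly 95% of the points to be kept, in line with the 2-std figure above.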
Safi ullah
on 8 Aug 2018
@Dear Adam Danz, thanks for the detailed guidance. I learned a lot from your answer but have still not reached my goal. This time I will explain my question clearly, step by step. [1] I have different datasets from the same experiment. For each dataset I used the same code:
A = Ne(1, 90:90:1800); % this gives A = 1×20
[2] Finally, I combined all the "A" vectors to get A = 1×350. The plot(A) is given below, where A = 1×350.
I am confused about the following. (a) Should we find the standard deviation for each dataset, or only for the final combined A = 1×350? (b) What does it simply mean to minimise noise in the data? In my case, after finding the mean, variance, and standard deviation, which one shows the noise? (c) It would be helpful if you could explain the following two sentences, which I have taken from papers in my research field: "The requirement for points to be included is that the value of Z is twice or more than that of the background noise." And: "In Figure 5, the Z is plotted only if the backscatter signal is larger than two times the standard deviation of the background signal." Thanks
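To make question (a) concrete, here is a sketch that extracts every 90th sample from several datasets, concatenates them, and computes both a per-dataset and a pooled standard deviation. The `NeSets` cell array is a hypothetical stand-in with three random 1×1800 vectors; which of the two standard deviations is meaningful depends on what the data represent:

```matlab
% Hypothetical stand-ins for three datasets from the same experiment
NeSets = {rand(1, 1800), rand(1, 1800), rand(1, 1800)};
idx = 90:90:1800;                                 % 20 samples per dataset
parts = cellfun(@(Ne) Ne(1, idx), NeSets, 'UniformOutput', false);
A = [parts{:}];                                   % concatenate into one row vector
perSetStd = cellfun(@std, parts);                 % one std per dataset
pooledStd = std(A);                               % one std for the combined vector
fprintf('combined length %d, per-set std %s, pooled std %.3f\n', ...
        numel(A), mat2str(perSetStd, 3), pooledStd);
```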
Adam Danz
on 8 Aug 2018
Your question A)
I'm a bit confused since (a) has 20 elements and (A) has 350 elements but 350 is not divisible by 20.
Anyway, your question is impossible to answer without knowing what (a) or (A) represents. Is (A) a continuous signal that was split up into multiple datasets? Are all of the (a)s repeated measures of the same thing? I need more info about what (a) and/or (A) are. Also, what are the units of the x and y axes? For example, is x time (seconds)? Is y a spike rate (spikes / second)?
Your question B)
That again depends on what (a)/(A) are.
Your question C)
“The requirement for points to be included is that the value of Z is twice or more than that of the background noise.”
Without more info I can only guess from the wording that the data were z-scored, the background noise was estimated using the standard deviation, and any data points that exceeded 2 z-scores were ignored.
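For reference, z-scoring just rescales each point to its distance from the mean in units of standard deviation. A manual sketch on hypothetical data (the `zscore` function in the Statistics and Machine Learning Toolbox computes the same thing):

```matlab
A = [1 2 3 4 5 6 7 8 9 50];          % hypothetical data with one large value
z = (A - mean(A)) / std(A);          % z-score: distance from the mean in std units
isOut = abs(z) > 2;                  % flag points more than 2 std from the mean
fprintf('flagged index: %s\n', mat2str(find(isOut)));
```

Only the value 50 (index 10) has |z| > 2 here.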
"In Figure 5, the Z is plotted only if the backscatter signal is larger than two times the standard deviation of the background signal"
I explained this in my previous comment and provided a plot to demonstrate the concept. Please read through it again. I don't know what "Z" or the "backscatter signal" is, but the data were only plotted if they exceeded 2 standard deviations from the mean.
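One possible reading of the second sentence is a threshold set by a separate noise-only record: estimate the background standard deviation from data known to contain no signal, then keep only measurements larger than twice it. The background record and measurement values below are hypothetical:

```matlab
rng(2);                                   % fixed seed for repeatability
background = 0.1*randn(1, 200);           % hypothetical noise-only record
sigmaBg = std(background);                % noise level of the background
Z = [0.05 0.5 0.02 1.2 0.15 0.8];         % hypothetical measurements
show = Z > 2*sigmaBg;                     % keep only values above 2x background std
fprintf('threshold = %.3f, kept indices: %s\n', 2*sigmaBg, mat2str(find(show)));
```

Unlike the 2-std-from-the-mean rule, this threshold does not depend on the measurements themselves, only on the background record.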
Safi ullah
on 8 Aug 2018
@Dear Adam Danz, here I will explain the points that confused you. In my previous comment I used (a) only as a label: since I asked three questions, for simplicity I named them (a), (b), and (c), so (a) has no relation to A. The unit of the x-axis is seconds and the unit of the y-axis is m⁻¹. I also restate the sentence in another form: "The requirement for points to be included is that the value of Reflectivity (m⁻¹) is twice or more than that of the background noise." Finally, what do you suggest for my figure? Thanks
Adam Danz
on 8 Aug 2018
In my comment, (a) refers to the 20-element vector that you pull from each dataset. (A) represents that 350-element vector that combines all of the (a)s.
I still don't know what the data represents and without that context, it's difficult to give recommendations.
"The requirement for points to be included is that the value of Reflectivity (m⁻¹) is twice or more than that of the background noise."
This gives me a hint. I don't know how they measured the background noise. Perhaps with a signal-to-noise ratio (SNR)? Given your long vector (A), you could compute the SNR to estimate the noise and then include data points in (A) that are > 2x the noise.
Safi ullah
on 8 Aug 2018
@Adam Danz, consider just this: A = 1×350, where A contains experimental reflectivity data in units of 1/m. The sentence "The requirement for points to be included is that the value of Reflectivity (m⁻¹) is twice or more than that of the background noise." does not use SNR, but I understand the point from your comment. Now please guide me on just two points. First, if someone considers only the data within one standard deviation and ignores the data outside it, do the ignored data count as noise? Second, can I use the standard deviation or not? Thanks
Adam Danz
on 8 Aug 2018
Q1) No. Given a vector of noisy data and no other data, you cannot know what is noise and what is signal. In fact, all of it is noise unless you have more information about the data. You can only estimate what the signal is amid the noise. In order to estimate that signal, you may want to get rid of outliers. But then the question is, what is an outlier? As an experimenter or analyst, you must define what an outlier is in a case-by-case basis. For example, "all data outside of 2 std of the mean is an outlier". The rest of the data can be considered to estimate the signal.
In some cases you may have more information which will help you estimate the signal from the noise. Let's say that your data (A) is from some device with a known minimum output of about 0.3 and you measured the output with a noisy sensor. Looking at your plot, you could eliminate all data under 0.3 and you can confidently call that "noise". The rest of the data, however, is not purely "signal". It's "signal + noise".
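That case is a simple comparison against the known floor. The readings and the 0.3 floor below are hypothetical, mirroring the example in the text:

```matlab
A = [0.1 0.8 0.25 1.4 0.6 0.05 0.9];   % hypothetical readings
minOutput = 0.3;                        % assumed known minimum output of the device
isNoise = A < minOutput;                % below the device floor: cannot be signal
Akept = A(~isNoise);                    % what remains is still "signal + noise"
```

Note that `Akept` is not noise-free; the rule only removes values that are physically impossible for the device.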
Q2) Can you use std() for what purpose? std() can be used to set thresholds that identify outliers, but I'm not sure that's what you want to do. If you attach a MAT-file with the (A) variable and clarify what you want to do, I'd be glad to continue helping you. I hope it's clear, though, why you can't just separate signal and noise unless you have more information about what the signal should be.
Safi ullah
on 8 Aug 2018
@Adam Danz, thanks for your continued guidance. I have attached the MAT-file. Here A is reflectivity in units of 1/m. Now I want to reduce the noise in A. After our discussion I understand that A is full of noise. So if I use the standard deviation to define a threshold for A, e.g., threshold = within 2 std of the mean, then all data outside 2 std of the mean will be outliers/noise. When I remove those data points, A will become shorter than 1×350, and I can say that I minimised the noise in the data. Am I right?
Adam Danz
on 8 Aug 2018
Edited: Adam Danz
on 8 Aug 2018
Hi Safi, poking around with your data was a good idea and I have some more feedback for you.
First, you say two different things in your most recent comment. You say you want to 'reduce the noise' from A (ie, smoothing) and then later you say you want to remove outliers. So I'm still not quite sure which one you want to do, or both.
Judging from your last sentence, this is what I understand: you want to detect outliers in A and remove them, and you consider the outliers to be the spikes in your data.
Your data (A) is plotted in the first subplot below and the distribution of (A) is plotted in the 2nd subplot. 'A' is definitely not from a normal distribution even though you said it was normal in your first comment under my answer.
Nevertheless, in the 1st and 2nd subplots I show you the mean and the 1st and 2nd standard deviations from the mean (see legend). You could choose one of these as a threshold for this dataset since the mean is so far from the spikes.
Using that function, I identified outliers greater than 2 std from the mean of (A) and labeled them with blue dots in the first subplot. Then I re-plotted (A) without the outliers which is shown in the 3rd subplot. Note that the mean of (A) has changed since the outliers are no longer pulling it.
I attached the code used to produce these plots.
To answer your questions explicitly, if you remove the outliers as I have done, you have removed what you perceive to be some of the noise which gives you a better estimate of the signal.
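The procedure described above (flag points more than 2 std from the mean, then re-plot without them) can be sketched as follows. This is not the attached code; the data are a hypothetical series with one spike, and the outliers are replaced with NaN rather than deleted so the time axis stays aligned (plot() simply leaves a gap at NaN):

```matlab
A = [1 1.2 0.9 1.1 1 0.95 1.05 1 9 1];   % hypothetical series with one spike
m = mean(A);
s = std(A);
isOut = abs(A - m) > 2*s;                % outliers: more than 2 std from the mean
Aclean = A;
Aclean(isOut) = NaN;                     % keep positions; plot(Aclean) shows a gap
fprintf('mean before: %.3f, after: %.3f\n', m, mean(A(~isOut)));
```

Because the spikes themselves inflate the mean and std, several large spikes can hide each other under this rule; that is one reason median-based rules are often preferred for spiky data.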
Image Analyst
on 8 Aug 2018
Edited: Image Analyst
on 8 Aug 2018
Safi, for that spiky plot you gave, please post a picture of what you might expect the desired output to look like. I don't know if you want the "noise" removed, clipped, smoothed over or what. A picture would help.
Safi ullah
on 8 Aug 2018
@Dear Adam Danz, I understand a lot from each of your comments, but since this is my first time working on this topic, I missed the key points. In reply to your most recent comment: first, I do not need to plot the figure from A; I need to use A for another purpose. When I read papers in my field that use data from the same experiment, they always show how they minimised the noise in their data. Because I do not know the difference between outliers, spikes, and noise, I got confused. I also do not know the criteria for a dataset to be normal. It would be good if you could explain the following points, because I think I am close to my goal. First: is my concept of minimising the noise (in my last comment) right or not? Second: what are the criteria for a normal data distribution?
Safi ullah
on 8 Aug 2018
@Image Analyst, thanks for your reply. Actually, I do not need to plot any figure from the data shown in A. I just need to know how to remove the noise (or, if it cannot be removed completely, how to minimise it). I am confused about how the standard deviation lets us say that we have minimised the noise. Thanks
Safi ullah
on 8 Aug 2018
Edited: Safi ullah
on 9 Aug 2018
@Dear Adam Danz, if I define a threshold from the standard deviation, can we say that all data that do not satisfy the threshold are noise/outliers? Can I then say that I minimised the noise in my data by removing all the data points that do not satisfy the threshold? Now I understand my data are not normally distributed. But if it is not a big technical mistake, I would like to consider only the data within 2 standard deviations and refer to the data outside 2 standard deviations as noise. Is that acceptable?
Adam Danz
on 9 Aug 2018
Hi Safi, your goal is very unclear to us and I think that's partially because it's unclear to you. Methods of reducing noise and detecting outliers differ depending on what type of data you're working with and you've been unclear about what your data is, what you perceive to be noise, and your goal (detect outliers, smooth the data, etc).
I can't give any recommendations to you before you can explicitly and precisely describe your data and your goal. Providing a picture, as @ImageAnalyst recommended is a good idea.
I recommend reading some basic chapters on "signal and noise" and "outliers". There's lots of free literature and videos out there if you search for it.
I just want to reiterate a final point I've made several times and I hope this helps with your understanding of the basic concept of noisy data. The plot below shows a noisy sine curve in blue. The red line is the actual sine curve but we cannot access that data because our sensors were noisy -- we only have access to the blue noisy signal. The 4 different subplots show different levels of noise. You can see that the signal is more identifiable in the last subplot because it is less noisy. There are several ways we can estimate the signal (the red line) but we can't simply "remove" the noise because the blue line is all we've got to work with.
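The four panels can be mimicked numerically: add noise of decreasing strength to a known sine and measure how far the "sensor" output sits from the truth. The noise levels and seed are illustrative assumptions, and the RMS deviation is only computable because the true signal is faked here:

```matlab
t = linspace(0, 4*pi, 1000)';            % hypothetical time axis
trueSignal = sin(t);                     % the red line: unknowable in a real experiment
noiseStd = [1, 0.5, 0.25, 0.1];          % four hypothetical noise levels
rmse = zeros(size(noiseStd));
rng(4);                                  % fixed seed for repeatability
for k = 1:numel(noiseStd)
    y = trueSignal + noiseStd(k)*randn(size(t));   % what the sensor reports (blue)
    rmse(k) = sqrt(mean((y - trueSignal).^2));     % spread around the true signal
    fprintf('noise std %.2f -> RMS deviation from truth %.3f\n', noiseStd(k), rmse(k));
end
```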
Now consider this example where our sensor was quite precise and our signal was clean except for a burst of noise. In this example, it is reasonable to attempt to "remove" the noise from the signal. We could interpolate the window of noise or we could use curve-fitting or other techniques.
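Interpolating across the burst can be sketched with `interp1`: mark the corrupted window, then linearly interpolate it from the clean samples on either side. The burst location and noise are hypothetical; in practice you would have to identify the window yourself (`fillmissing` is an alternative once the burst samples are set to NaN):

```matlab
t = (0:0.1:10)';                          % hypothetical time axis
y = sin(t);                               % clean signal from a precise sensor
burst = 40:45;                            % hypothetical location of the noise burst
rng(3);                                   % fixed seed for repeatability
yNoisy = y;
yNoisy(burst) = y(burst) + 2*randn(numel(burst), 1);   % corrupt only the burst
good = true(size(t));
good(burst) = false;                      % samples outside the burst are trusted
yFixed = yNoisy;
yFixed(burst) = interp1(t(good), yNoisy(good), t(burst), 'linear');
fprintf('max error after interpolation: %.3f\n', max(abs(yFixed - y)));
```

Linear interpolation only works well when the gap is short compared with how fast the signal changes.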
Lastly, consider this last example where our sensor was a little noisy but there were several samples that were far from the rest of the distribution. These are outliers (with red 'x') and are fairly easy to detect and remove although the method will vary depending on the distribution of your data and how you decide to define an outlier.
I hope this helps conceptually, and I recommend spending some time with the literature or videos so these concepts become clearer to you. Even half a day of learning will make a big difference.
Safi ullah
on 9 Aug 2018
Edited: Safi ullah
on 9 Aug 2018
@Dear Adam Danz, thank you so much for your detailed and kind guidance; I have learned a lot from your comments. I will follow your recommendation to study the literature and videos. Lastly, I would like your final comment on one point: I used vector A in a paper, and one of the reviewers asked me to minimise the noise in vector A. Once again, thank you so much.
Safi ullah
on 9 Aug 2018
Reviewer comment. “Fig.3 is full of noise and needed to minimise the noise because within uncertainties many data points should not be included in the fig.3. (Note fig.3 is my plot(A))”.
Adam Danz
on 9 Aug 2018
Edited: Adam Danz
on 9 Aug 2018
Ok, we're getting closer, Safi! Since I can't interpret your data and therefore I cannot know what the signal should look like, please circle the parts in your figure that you consider to be noise. Circle the data points that should not be included in the figure. You can do this by hand in microsoft paint, for example.
Safi ullah
on 9 Aug 2018
@Dear Adam, my paper has several figures like the one shown here as plot(A). The reviewer did not ask me to produce any specific shape, and I do not need one; the reviewer just asked me to minimise the noise in this figure, without saying how much minimisation would be enough. From your comments I understand that I should plot only those data points of A that are within 2 standard deviations of the mean and disregard the rest. Another researcher also suggested that the reviewer's comment means I should define a threshold using the standard deviation.
Adam Danz
on 9 Aug 2018
Edited: Adam Danz
on 9 Aug 2018
The reviewer said "many data points should not be included in the fig.". S/he's asking you to identify outliers and remove them.
doc isoutlier % requires MATLAB R2017a or later
This will help you identify outliers and remove them. You'll need to document in your paper how you eliminated the outliers.
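By default, isoutlier flags values more than three scaled median absolute deviations (MAD) from the median. The sketch below spells that rule out by hand on hypothetical data, with the built-in calls shown as comments; treat the equivalence as something to confirm against the documentation for your release:

```matlab
A = [1 1.2 0.9 1.1 1 0.95 1.05 1 9 1];        % hypothetical series with one spike
c = -1 / (sqrt(2) * erfcinv(3/2));             % ~1.4826: makes the MAD comparable to std for normal data
scaledMAD = c * median(abs(A - median(A)));
isOut = abs(A - median(A)) > 3 * scaledMAD;    % median/MAD rule (isoutlier's default)
% With R2017a or later the built-in call should give the same mask:
%   isOut = isoutlier(A);
% and a "more than 2 std from the mean" rule would be:
%   isOut = isoutlier(A, 'mean', 'ThresholdFactor', 2);
Aclean = A(~isOut);
fprintf('flagged index: %s\n', mat2str(find(isOut)));
```

Whichever method and threshold you choose, state it explicitly in the paper so the reviewer can judge it.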
Safi ullah
on 9 Aug 2018
@ Dear Adam, thanks for your valuable comments. I have learned a lot from your comments. I will do this, thanks
Adam Danz
on 9 Aug 2018
My pleasure. Be sure to read the documentation on isoutlier() and choose parameter values that make sense for your data and that you can explain in your paper.