How to fix this boxplot? My data has many values that are zero, and I want to exclude them from my analysis.

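% Flatten each 2-D slice of wint2_r into one column of w_r
nSlices = size(wint2_r,3);
w_r = zeros(size(wint2_r,1)*size(wint2_r,2), nSlices);  % preallocate
for k = 1:nSlices
    wn_r = wint2_r(:,:,k);   % k-th slice is already 2-D, so squeeze is not needed
    w_r(:,k) = wn_r(:);
end
figure(2);
clf;
subplot(2,1,1)
boxplot(w_r)                 % all values, zeros included
subplot(2,1,2)
w_r(w_r == 0) = NaN;         % boxplot ignores NaN, so zeros drop out of each box
boxplot(w_r)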
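For reference, the loop that builds w_r stacks each 2-D slice of wint2_r as one column. Because MATLAB stores arrays in column-major order, a single reshape call produces the same matrix. A minimal sketch, using random data as a hypothetical stand-in for wint2_r:

```matlab
% Hypothetical stand-in for wint2_r: a 4x3 grid over 5 time steps
wint2_r = rand(4, 3, 5);

% Loop version (with preallocation)
w_loop = zeros(4*3, 5);
for k = 1:size(wint2_r, 3)
    wn_r = wint2_r(:,:,k);
    w_loop(:,k) = wn_r(:);
end

% One-call equivalent: slice k becomes column k
w_r = reshape(wint2_r, [], size(wint2_r, 3));

disp(isequal(w_r, w_loop))   % displays 1 (true)
```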
dpb on 19 May 2016
There aren't any rewards for seeing how little information you can post, Sofie. Give us some context here and put the fully explained question in the body of the question; don't try to ask the question in the title.
The short answer would be to use the second option above, except that instead of setting the values to NaN, you simply remove them entirely:
w_r(w_r == 0) = [];
That, of course, changes the dataset drastically and may produce something that looks nice on a plot but has no meaning; only you can determine whether it makes any sense to do. At least one question to answer in that regard: if the dataset has so many zeros, why is that? Are they really zero, or no response, or something else?
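One practical caveat with deletion: w_r is a matrix with one column per box, and deleting elements from a matrix with logical indexing collapses it into a row vector, so boxplot would then draw a single box for all the data. Setting the zeros to NaN keeps the matrix shape, and boxplot ignores NaN values. A small illustration with a hypothetical 3x2 matrix:

```matlab
% Hypothetical 3x2 matrix: each column would be one box in boxplot
A = [0 4; 2 0; 3 5];

B = A;  B(B == 0) = [];    % deletion: matrix collapses to a row vector
C = A;  C(C == 0) = NaN;   % NaN: shape and per-column grouping are kept

disp(size(B))              % 1 4
disp(size(C))              % 3 2
disp(median(C, 'omitnan')) % per-column medians ignoring NaN: 2.5 4.5
```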
Sophia on 20 May 2016
Sorry for not explaining it well enough to be self-explanatory. There are a couple of problems:
1. The data are all positive, which the first subplot shows correctly, but the second subplot shows negative values in the second box for no apparent reason.
2. I want a boxplot like the one in the second subplot so that it shows how the median has changed over time, but something looks odd: it doesn't look like the boxplot examples I have seen before, and something seems wrong with the whiskers.

Answers (1)

dpb on 21 May 2016
Edited: dpb on 21 May 2016
OK, on reflection I believe my previous comment is actually an Answer, so I deleted it and moved it here with some refinements.
Trying to force a plot to have a given appearance by arbitrarily removing values from the dataset is, in my opinion, simply misguided. Even if it were to look the way you thought it should, what would it mean, since the data that made the plot are no longer the actual data?
Since the box edges are the 25th and 75th percentiles, if a repeated lowest value (zero or any other) makes up 75% or more of the total, both percentiles fall at that same value. The correct "answer" for the boxplot is then as shown: the median and the outliers. The two percentile points are subsumed into a single location and so can't be shown separately.
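To see why the box can vanish, here is a sketch with hypothetical data (not the poster's) in which 80% of the values are zero. It computes the box edges with prctile and applies the default boxplot whisker rule, under which points more than 1.5 times the interquartile range beyond the box are drawn as outliers:

```matlab
% Hypothetical data: 80 zeros followed by 20 positive values
x = [zeros(80,1); linspace(0.5, 5, 20)'];

q = prctile(x, [25 50 75]);   % lower box edge, median, upper box edge
iqrWidth = q(3) - q(1);       % interquartile range
fprintf('25th = %g, median = %g, 75th = %g, IQR = %g\n', q, iqrWidth);

% With IQR = 0 the upper whisker limit equals the upper box edge,
% so every value above it is drawn as an outlier.
nOutliers = sum(x > q(3) + 1.5*iqrWidth);
fprintf('%d of %d points drawn as outliers\n', nOutliers, numel(x));
```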
You're misreading the second plot; there are no (*) negative values. The bottom whisker appears to be at exactly zero, and the median is somewhere in the neighborhood of 2. The y-axis has been scaled to leave a little visual space below the origin so the bottom whiskers don't lie on the bounding box, as they would if the lower axis limit were zero.
(*) There may be a negative value in the second box as an outlier; perhaps there is roundoff in the original dataset and there are one (or a few) very small negative values? What does min(x) return for that case?
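One way to answer the min(x) question for each box: take the column minima of the matrix passed to boxplot (min skips NaN by default) and count any negative entries. The matrix below is a hypothetical stand-in for w_r, built with one tiny negative value to mimic roundoff:

```matlab
% Hypothetical stand-in for w_r: two columns, one with a tiny negative value
w_r = [0 2.1; 1.5 0; 3.2 -1e-9; 0 1.8];
w_r(w_r == 0) = NaN;           % same preprocessing as in the question

colMin   = min(w_r, [], 1);    % per-column minimum; NaN is ignored
negCount = sum(w_r < 0, 1);    % NaN < 0 is false, so NaNs are not counted

fprintf('column %d: min = %g, negatives = %d\n', ...
        [1:size(w_r,2); colMin; negCount]);
```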
In summary, it looks to me like the boxplot function is working precisely as advertised; the real question is what the data themselves mean. We have no information on that here. As noted before, only you can judge, since the meaning depends on just what the values represent and how or why there are so many zero responses. I still suspect that removing them simply to get a plot that looks "pretty" or as expected is not the right answer.
Sophia on 24 May 2016
dpb: Removing the zeros is the correct way to use this data. I am working with sea ice motion data; the pixels with zero values are not places where the sea ice motion is zero but areas where there is no ice, say land masses or areas on the periphery, so removing them won't alter my actual data set. Including them would lead to the wrong interpretation.
dpb on 24 May 2016
Edited: dpb on 24 May 2016
Well, we can't know that going in; you've got to tell us enough context so we don't go down these blind alleys.
But, in my mind, that raises the question of what distinguishes '0' meaning "no ice" from '0.0' meaning "no ice motion". There has to be a limit of resolution for the measurement, so presumably at least some locations are stable?
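If the dataset comes with a separate land or ice mask (an assumption; nothing in the thread says it does), masking by that flag instead of by value == 0 would keep genuinely stationary ice in the statistics. A sketch with hypothetical variables speed and iceMask:

```matlab
% Hypothetical inputs (not from the thread): a motion field and an ice mask
speed   = [0 0.3; 0 0; 1.2 0.7];      % 0 can mean "no ice" OR "ice not moving"
iceMask = logical([0 1; 1 1; 1 1]);   % hypothetical: true where ice is present

% Exclude only non-ice pixels; zero-motion ice pixels stay in the data
speedIce = speed;
speedIce(~iceMask) = NaN;

nKeptZeros = sum(speedIce(:) == 0);   % stationary-ice pixels retained
fprintf('%d zero-motion ice pixels kept\n', nKeptZeros);
```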
So, what does min return for all the data in case 2? With this information on what the data represent, isn't it possible that the motion could be retrograde and, therefore, negative? Seems plausible to me, but other than the name I don't know anything about which ice, or motion relative to what, so it is still conjecture.
Given the caveat that you should simply select the nonzero data, doesn't the Answer address the question? Isn't the apparent zero at the bottom of the whisker then simply a manifestation of nonzero values small enough to be indistinguishable from zero at the resolution of the graph?
