Problem in determining standard error and plotting 3 error bars

Hello everyone,
I am trying to make a figure where Y-axis is temperature and X-axis represents particle concentration. The I-type vertical bars indicate standard error. I tried this:
Data = table2array(WORK01S1); % convert it into array
Temp = double(Data(:,1)); % extract the column of Temperature from Data
Particle = double(Data(:,2)); % extract the column of Particle from Data
Particle = Particle*1000000000; % To change kg/um3 to ug/um3
CAPE = double(Data(:,3)); % extract the column of CAPE from Data
[N,edges,bins] = histcounts(Particle,10) % I divided the Particle concentration in to 10 bins (just to check I selected 10 bins)
% I used below line to calculate standard error but I am not sure whether
% its correct or not
std_err= std(Data,[],2)/sqrt(size(Data,2))
I am not able to calculate standard error properly and aslo I want 3 error bars as shown in the figure (values of CAPE too). What changes and additions should I make in my code? I can't figure out this. I'm grateful for any help. Data is attached.

7 Comments

Why are you dividing std() by the size of the data? std() is already defined to divide by sqrt(N-1)
@Walter Roberson Thank you for your reply.
I am trying to calculate standard error. And I got the above formula while I was searching for calculating Standard error in MATLAB.
"Standard error of what?"
The standard deveiation (std function in MATLAB) is an estimate of the population variablility while the estimate of the variability of the mean of the population is std()/sqrt(n)
The latter is the estimate of how good the estimate of the statistic (the mean, here) is relative to the population mean; the former is about the variablilty of the distribution individual values.
Both are correct; they just represent different things -- we don't know and can't know which one you want...although it would be typical to put the population estimate on a plot of the data, but if the statistic shown is properly identified as to what it is, either could be chosen.
@dpb Thank you for explanation.
I want Standard error of mean (as it can be seen in the figure).
So what's the problem, then?
@dpb Problem is I can't figure out how to make above figure.
Based on my data should I firstly make Particle bins? And then for each bin I calculate standard error of mean? But it will produce the SEM of these particle bins. As you can see it has another variable CAPE. If I divide the particle data into bins, how I am going to get the SEM for CAPE values and plot it?
@dpb does the below method work:
Make bins of particle concentration
For each bin find out the number of values of CAPE that fall into it
Calculate the SEM of these CAPE values
Plot it

Sign in to comment.

 Accepted Answer

tData=readtable('WORK.xlsx');
tData.ParticleConc=1E9*tData.ParticleConc;
tData.CAPE_Bin=discretize(tData.CAPE,[0,500,900,inf]);
tStats=grpstats(tData,'CAPE_Bin',{'mean','std',fnStdMn});
tStats.Properties.VariableNames=strrep(tStats.Properties.VariableNames,'Fun3','stdMn')
figure
hAx=axes; hold on
hL=splitapply(@plotrows,tData.ParticleConc,tData.Temp,tData.CAPE_Bin);
hAx.XScale='log';
grid on
provides the beginnings for the figure, but your data aren't separated by the temperature gradient nearly as neatly as the example figure -- and your particle concentrations cover several orders of magnitude wider range..
The data for the error bars is in the tStats structure as the stdMnTemp value, errorbar will let you add it to the plot although it's already messy enough owing to the overlay of the observations, adding the error bars will basically turn it into a solid blob over the bulk of the figure.
>> tStats
tStats =
3×11 table
CAPE_Bin GroupCount mean_Temp std_Temp stdMn_Temp mean_ParticleConc std_ParticleConc stdMn_ParticleConc mean_CAPE std_CAPE stdMn_CAPE
________ __________ _________ ________ __________ _________________ ________________ __________________ _________ ________ __________
1 1 75 245.93 3.554 0.047386 64.121 90.012 1.2002 156.36 131.14 1.7486
2 2 15 248.27 3.7696 0.2513 11.246 12.918 0.86118 653.97 109.81 7.3209
3 3 59 249.2 3.7406 0.063401 5.6333 10.606 0.17975 1781.2 640.89 10.862
>>
The basics to produce the figure; looks like will need to do a lot of clean up and possibly other selection of binning intervals or some other presentation to be useful.
Oh -- the legend is also simple enough, just write the desired strings.
I wrote a little helper function plotrows to be able to call from splitapply with the grouping variable; you could always just write a straightahead loop selecting the bin variable subset in turn. It looked like
>> type plotrows
function hL=plotrows(x,y,varargin)
xy=sortrows([x y]);
hL=plot(xy(:,1),xy(:,2),varargin{:});
end
The thing was to sort the data to have a smooth line; I tried getting an indexing variable by group in the table, but it didn't work as seemed as should have and I ran out of time to mess with it any further...

5 Comments

@dpb There is problem in using plotrows. Kindly tell me what does this mean?
hL=splitapply(@plotrows,tData.ParticleConc,tData.Temp,tData.CAPE_Bin);
Error using splitapply (line 132)
Applying the function 'plotrows' to the 1st group of data generated the following error:
Undefined function 'plotrows' for input arguments of type 'double'.
It means you didn't take the function I showed you and put it in an m-file of that name on your working path. You've got to have it to run the function...
@dpb what does fnstdMn means?
Above lines of code works for me if I change fnstdMn into sem. Using it as it is gives me error:
Unrecognized function or variable 'fnstdMn'.
Error in CAPE (line 19)
tStats=grpstats(tData,'CAPE_Bin',{'mean','std',fnstdMn});
Oh, I forgot I had defined an anonymous function to compute the std deviation of the mean -- it was/is
fnStdMn=@(x)std(x)./sqrt(size(x,1));
Using errorbar
errorbar(tData.ParticleConc,tData.Temp, tStats.stdMn_Temp)
Error using errorbar>checkSingleInput (line 272)
YNegativeDelta must be empty or the same size as YData.
Error in errorbar (line 135)
yneg = checkSingleInput(neg, sz, 'YNegativeDelta');
Standard error of the mean or standard deviation of the mean?
fnStdMn=@(x)std(x)./sqrt(size(x,1));

Sign in to comment.

More Answers (1)

Zhou, independently from how to make the figure, this code
Data = table2array(WORK01S1); % convert it into array
Temp = double(Data(:,1)); % extract the column of Temperature from Data
Particle = double(Data(:,2)); % extract the column of Particle from Data
Particle = Particle*1000000000; % To change kg/um3 to ug/um3
CAPE = double(Data(:,3)); % extract the column of CAPE from Data
seems much too complicated. I suggest perhaps
Temp = WORK01S1.Temp;
Particle = WORK01S1.ParticleConc*1000000000;
CAPE = WORK01S1.CAPE;
But more to answer your question:
Use the table. It will make your life easier.
>> WORK01S1 = readtable("WORK.xlsx");
>> WORK01S1.binnedCAPE = discretize(WORK01S1.CAPE,[0 500 900 Inf],"categorical");
>> WORK01S1.binnedParticleConc = discretize(WORK01S1.ParticleConc,10,"categorical");
>> meanTemp = varfun(@mean,WORK01S1,"GroupingVariables",["binnedCAPE" "binnedParticleConc" ],"InputVariables","Temp");
>> sem = @(x) std(x)/sqrt(length(x));
>> semTemp = varfun(sem,WORK01S1,"GroupingVariables",["binnedCAPE" "binnedParticleConc" ],"InputVariables","Temp")
>> meanTemp.SE = semTemp.Fun_Temp
meanTemp =
13×5 table
binnedCAPE binnedParticleConc GroupCount mean_Temp SE
__________ ____________________ __________ _________ _______
[0, 500) [0, 3.8e-08) 45 -26.889 0.52438
[0, 500) [3.8e-08, 7.6e-08) 13 -28.538 0.93106
[0, 500) [7.6e-08, 1.14e-07) 5 -29 0.89443
[0, 500) [1.14e-07, 1.52e-07) 2 -22.5 0.5
[0, 500) [1.52e-07, 1.9e-07) 2 -27.5 1.5
[0, 500) [1.9e-07, 2.28e-07) 2 -24.5 2.5
[0, 500) [2.66e-07, 3.04e-07) 2 -23 1
[0, 500) [3.04e-07, 3.42e-07) 3 -27.667 3.6667
[0, 500) [3.42e-07, 3.8e-07] 1 -26 0
[500, 900) [0, 3.8e-08) 14 -24.429 0.99291
[500, 900) [3.8e-08, 7.6e-08) 1 -29 0
[900, Inf] [0, 3.8e-08) 56 -23.661 0.48031
[900, Inf] [3.8e-08, 7.6e-08) 3 -26.333 3.6667
Now you can make the plot. One way is to loop over CAPE bins, where at each iteration, you get the data for one CAPE bin:
for cape_i = categories(meanTemp.binnedCAPE)'
i = meanTemp.binnedCAPE == cape_i;
Temp = meanTemp.mean_Temp(i);
Conc = meanTemp.binnedParticleConc(i);
% make plot
end
Of course, your data doesn't really support your figure, but that's for you to reconcile.

4 Comments

"Now you can make the plot. One way is to loop over CAPE bins..."
Peter, why aren't there more plotting routines with options for grouping -- or methods tied in with rowfun, varfun, splitapply? gscatter is handy, but is lone in its arena -- although one could go back and set the 'linestyle' there; I almost did that.
I got frustrated that with grouping variables there's no way to sort the groups effectively so the plot will show up in ordered fashion. I built an index variable that was the order vector in the group thinking I could pass it to an anonymous function using plot() but it didn't work as expected and I ran out of time to try to figure out what was actually going on. One thing in these exercises is they're not easy to debug... :)
Point taken, thanks. This really should be easier, I will see what can be done about that.
Thanks...I know there's more to do than time/resources to do it, but... :)
"... with grouping variables there's no way to sort the groups effectively ..."
This one ties in with the previous complaint/suggestions I've made that one can't return the sorting index from sort except as the second, optional output and there's no way to use anything except the first return in the anonymous function effectively for these purposes. I've built a customized sorti that does return the index first; I think there should be a flag parameter in sort for the purpose.
I still don't follow why my use of
@(x,y,ix) plot(x(ix),y(ix))
in splitapply wasn't in sorted order after building that sort index by group and passing it in the anonymous function, but don't have the time just now to pursue it further. I must've made a bonehead blunder somewhere, but it wasn't apparent where that was.
In the case where duplicate values are to be assigned the same index, then findgroups() returns sort order.

Sign in to comment.

Asked:

on 13 Jun 2022

Commented:

on 14 Jun 2022

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!