ttest and confidence interval

petros bomb on 18 Nov 2021
Edited: Adam Danz on 22 Nov 2021
ttest returns the confidence interval for a given confidence level (1-a).
Shouldn't that interval be exactly the same as [-tc*s/sqrt(n)+μ, tc*s/sqrt(n)+μ],
where tc is the critical t value for confidence level 1-a, s is the sample standard deviation (s=sqrt(var(Table))), μ is the theoretical mean and n is the number of elements?
Thank you for your time.
There is the code also:
n=30;
tau=10;
ValuesTable=exprnd(tau,n,1);%30 values from exponential with mean=tau=10
%mean and standard deviation of sample
m=mean(ValuesTable);
s=0;
for i=1:n
s=s+(ValuesTable(i)-m)^2;
end
s=sqrt(s/(n-1));%standard deviation
%critical t value for a=0.05 and n-1=29 degrees of freedom
%tc(degrees of freedom=29 and 1-a/2=1-0.05/2=0.975)
tc=2.045;
%interval from equation
c=[-tc*s/sqrt(n)+tau,+tc*s/sqrt(n)+tau];
%interval from ttest
[~,~,d,~]=ttest(ValuesTable,tau,'Alpha',0.05);
fprintf("Interval1: [%f,%f] \nInterval2: [%f,%f] \n",c(1),c(2),d(1),d(2));
Interval1: [6.401967,13.598033]
Interval2: [7.313208,14.510082]

Answers (1)

Adam Danz on 18 Nov 2021
Edited: Adam Danz on 18 Nov 2021
I assume you're intentionally avoiding the std() function. However, your calculation of standard deviation is incorrect.
s=sqrt(s)/(n-1);
should be
s=sqrt(s/(n-1));
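As a quick check of the two expressions above, the sketch below compares both against std() on a small made-up vector. The data and the variable names (ss, sCorrect, sWrong) are hypothetical and chosen only for illustration.

```matlab
% Hypothetical data, chosen only to illustrate the two formulas
x = [2 4 4 4 5 5 7 9];
n = numel(x);

ss = sum((x - mean(x)).^2);      % sum of squared deviations from the mean

sCorrect = sqrt(ss / (n - 1));   % divide by n-1 first, then take the root
sWrong   = sqrt(ss) / (n - 1);   % root first, then divide: not a standard deviation

fprintf('sCorrect = %.4f, sWrong = %.4f, std(x) = %.4f\n', sCorrect, sWrong, std(x));
```

Only sCorrect agrees with std(x), which normalizes by n-1 by default.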
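Secondly, avoid hard-coding the critical value. 2.045 is tinv(0.975, 29) rounded to three decimals, so it holds only for alpha = 0.05 and 29 degrees of freedom. For a two-sided test, MATLAB's ttest computes the half-width of the interval from the standard error, ser, and centers the interval on the sample mean rather than on the hypothesized mean:
crit = tinv(1 - alpha/2, df) .* ser;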
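To see where the two intervals in the question diverge, here is a minimal sketch that rebuilds a two-sided interval by hand and compares it with the one ttest returns. It assumes the Statistics and Machine Learning Toolbox (exprnd, tinv, ttest). The seed and the names ciManual and ciTtest are illustrative assumptions, and the two-sided quantile 1 - alpha/2 is used because the question asks for a two-sided interval.

```matlab
rng(999);                          % assumed seed, for reproducibility only
n     = 30;
tau   = 10;
alpha = 0.05;
x     = exprnd(tau, n, 1);         % stand-in for ValuesTable

m    = mean(x);                    % sample mean: the center of the interval
ser  = std(x) / sqrt(n);           % standard error of the mean
crit = tinv(1 - alpha/2, n - 1) * ser;   % two-sided half-width

ciManual = [m - crit, m + crit];

[~, ~, ciTtest] = ttest(x, tau, 'Alpha', alpha);
ciTtest = ciTtest(:).';            % ttest returns a column for column input

fprintf('manual: [%f, %f]\nttest:  [%f, %f]\n', ciManual, ciTtest);
```

Note that tau enters only the hypothesis test, not the interval. The two intervals printed in the question have nearly the same width (about 7.196) and differ mainly in their center, which is consistent with Interval1 being centered on tau instead of on the sample mean.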
Lastly and most importantly, the confidence interval computed with this method for your data is meaningless or, even worse, misleading. This method of computing a CI assumes a normal distribution and your data are clearly nowhere close to being normally distributed.
I recommend using bootstrap confidence intervals which do not carry a distribution assumption.
This demo estimates the median (since the mean is heavily influenced by the tails in non-normal distributions). It computes the median of 1000 bootstrapped samples and returns the middle 95% of the resulting distribution of medians.
% ValuesTable computed using rng(999) for reproducibility
ci = bootci(1000, {@median, ValuesTable}, 'type', 'per', 'alpha', .05);
figure()
histogram(ValuesTable,20)
h(1) = xline(ci(1), 'r-', 'LineWidth', 2, 'DisplayName','LowerCI');
h(2) = xline(ci(2), 'm-', 'LineWidth', 2, 'DisplayName', 'UpperCI');
h(3) = xline(median(ValuesTable), 'k--', 'DisplayName','median');
legend(h)
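For readers who want to see what a percentile bootstrap does, here is a minimal hand-rolled sketch. It is not bootci's actual implementation. It assumes the Statistics and Machine Learning Toolbox (exprnd, prctile), and the seed, sample and variable names (nBoot, idx, bootMedians, ciManual) are illustrative assumptions.

```matlab
rng(999);                              % assumed seed, for reproducibility only
x     = exprnd(10, 30, 1);             % stand-in sample, 30 exponential values
nBoot = 1000;                          % number of bootstrap resamples

% Each column of idx holds indices for one resample drawn with replacement
idx = randi(numel(x), numel(x), nBoot);

% Median of every resample (x(idx) is numel(x)-by-nBoot)
bootMedians = median(x(idx), 1);

% Percentile method: keep the middle 95% of the bootstrap medians
ciManual = prctile(bootMedians, [2.5 97.5]);

fprintf('Percentile bootstrap CI for the median: [%f, %f]\n', ciManual);
```

The result should land close to what bootci reports with 'type','per', although the two will not match exactly because the resamples differ.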
Finally, let's compare the 95% CIs computed by ttest and by bootstrapping on your data. The black dashed line is the mean of the population. The red lines are the 95% CI computed by bootstrapping. The dashed black lines are the 95% CI computed by ttest. The ttest CIs are similar to the bootstrap CIs but shifted leftward. Since the ttest CIs are computed using std, and since std is affected by outliers, which appear here as a rightward tail, the ttest CIs are not as reliable as the bootstrap results. In fact, the bootstrap results using the percentile method will always be either as reliable (in the case of normally distributed data) or more reliable (in all other cases) than methods that require normal distributions.
  4 Comments
Adam Danz on 19 Nov 2021
Well done figuring that out.
