How pwelch computes confidence intervals

26 views (last 30 days)
Stephen
Stephen on 30 Apr 2020
Commented: William Rose on 26 Nov 2023
Can Mathworks please point to exactly which formulae are being used to compute confidence intervals in pwelch? Perhaps state the formulae being used in the code? This would include how exacty the number of degrees of freedom are being computed.
  3 Comments
William Rose
William Rose on 15 Feb 2021
I read one of the two books which is cited in document pwelch, which was recommended by Kiran. The book I read is Statistical digital signal processing and modeling by M.H. Hayes, 1996. The relevant portion is seciton 8.2.5, pp.415-419 (which are pages 435-439 in the electronic version). It does not give formulas for the confidence interval for the Welch periodogram. Therefore it does not answer the original question: how is the confidence interval determined.
Stephen
Stephen on 15 Feb 2021
There are several good books on this - the original Jenkins and Watt book is excellent (although out of print and not evailble in pdf form) as is a book aimed at oceanographers by Thomson and Emery (Data Analysis Methods in Physical Oceanography). The reason I originally asked this question was because I suspect that the confidence intervals given by pwelch are not quite calculated correctly. One can check this by doing spectra for a random process that has a known spectrum - Jenkins and Watt give an excellent example of such a process in their book. I wrote a script - below - that generates a time series using the auto-regressive process J&W use for their examples. With apologies for the poor coding technique (I first learned programming in Fortran 66), you will see that if one inverts the CIs that Matlab computes and swap the results between the upper and lower CIs, one gets something that looks pretty good whereas if one sticks with the CIs pwelch gives, the lower and upper bounds are both high, meaning that the lower bound is not low enough and the upper bound is too high. So, this is what makes me think there is something wrong in pwelch.
%
% Examine spectral confidence intervals using a random process with a known spectrum
% The time series used is generated via an AR1 process detailed in Jenkins and Watts on p 269
% S. Monismith
% Stanford University
% 2/15/21
%
clear
M=16;
seriesize=2^M; % length of time series
t=0:seriesize-1; % times
X=ones(seriesize,1); %
Z=randn(seriesize,1);
factor=0.5;
for j=3:seriesize
X(j)=X(j-1)-factor*X(j-2)+Z(j);
end
X(1)=X(seriesize)-factor*X(seriesize-1)+Z(1);
X(2)=X(1)-factor*X(seriesize)+Z(2);
idothese=seriesize;
%
% Key adjustment - how many segments
%
nsegs=2;
%
% Now set up and do pwelch estimation
%
windowlength=idothese/nsegs;
nfft=windowlength;
[PXX,f,PXXc] = pwelch(X,windowlength,0.5*windowlength,nfft,1,'ConfidenceLevel',0.95);
%
% Now compute bounds - note that they are multiplicative factors on known spectrum
% First, we will use what pwelch gives
%
factor1=median(PXXc(:,1)./PXX);
factor2=median(PXXc(:,2)./PXX);
subplot(1,2,1)
semilogy(f,PXX,'ko','markeredgecolor',0.7*[1,1,1],...
'markerfacecolor',0.7*[1,1,1],'markersize',4)
hold on
Pxxtheory=var(X)*(0.834)./(2.25-3*cos(2*pi*f)+cos(4*pi*f));
hold on
semilogy(f,Pxxtheory,'r','linewidth',2)
semilogy(f,Pxxtheory*factor1,'b--','linewidth',2)
semilogy(f,Pxxtheory*factor2,'r--','linewidth',2)
hold off
ylim([1e-1 1e2])
grid
set(gca,'fontsize',20)
ylabel('\Phi_{XX}','fontsize',20)
xlabel('f','fontsize',20)
title('CIs as given by pwelch','fontsize',20)
%
% Now re-do the confidence intervals with a suggested correction
%
factor1=median(PXX./PXXc(:,1));
factor2=median(PXX./PXXc(:,2));
%
subplot(1,2,2)
semilogy(f,PXX,'ko','markeredgecolor',0.7*[1,1,1],...
'markerfacecolor',0.7*[1,1,1],'markersize',4)
hold on
Pxxtheory=var(X)*(0.834)./(2.25-3*cos(2*pi*f)+cos(4*pi*f));
hold on
semilogy(f,Pxxtheory,'r','linewidth',2)
semilogy(f,Pxxtheory*factor1,'b--','linewidth',2)
semilogy(f,Pxxtheory*factor2,'r--','linewidth',2)
hold off
ylim([1e-1 1e2])
grid
set(gca,'fontsize',20)
ylabel('\Phi_{XX}','fontsize',20)
xlabel('f','fontsize',20)
title('CIs inverted and swapped','fontsize',20)

Sign in to comment.

Answers (2)

Kiran Felix Robert
Kiran Felix Robert on 5 Feb 2021
Hi Stephen,
Refer the pwelchfunction documentation for the details.
  2 Comments
William Rose
William Rose on 14 Feb 2021
Kiran, thak you for this answer but plese see my request above for more info. Thanks.
William Rose
William Rose on 15 Feb 2021
Kiran,
I read the source code for pwelch.m (C:\Program Files\MATLAB\R2018b\toolbox\signal\signal\pwelch.m). It does not reveal how the confidence interval is computed. pwelch.m calls welch.m (C:\Program Files\MATLAB\R2018b\toolbox\signal\signal\+spectrum\welch.m), which I inspected. It calls another version of pwelch() which I oculd not find, probably because it is in a compiled library. Therefore I would appreciate an actual answer to Stephen's question. Thank you.

Sign in to comment.


William Rose
William Rose on 26 Feb 2021
The 2-sided confidence interval (C.I.) with probability p on the power spectral denisty (p.s.d.), which is returned by pwelch(), is given by
Pxxhat(f)*k/chi2((1+p)/2,k) < Pxx(f) < Pxxhat(f)*k/chi2((1-p)/2,k)
where Pxxhat(f) is the experimental estimate of the p.s.d. at frequency f, and Pxx(f) is the true, but unknown, value of the p.s.d., and k is the degrees of freedom. This is analagous to a confidence interval for a variance. The degrees of freedom is given by
k = 2*K
where upper case K is the number of segments or windows used when esitmating the p.s.d.
If the window length and offset are chosen so that the windows fit the signal without any leftover bits, then
K=(N-L)/D+1
where N=sinal length, L=window length, D=offset. (Don't confuse overlap and offset. Overlap=L-D. pwelch() takes overlap as an argument, but Welch (1967) uses offset=D in his formulas.
Example: Signal length=1024, window=Hamming, with length 256, half-overlapped. Then N=1024, L=256, and D=offset=128. Therefore K=7, by the formula above, and we can verify that 7 half overlapped windows cover the signal exactly. Then k=2*K = 14.
Prob{Pxxhat*14/chi2(.975,14) < Pxx < Pxxhat*14/chi2(.025,14)} = 0.95
=> Prob{Pxxhat*0.536 < Pxx < Pxxhat*2.487} = 0.95
I have checked the above formulas for various combinations of N, L, D, and probability p. The formuals correctly reproduce the confidence intervals reported by pwelch().
The formulas above are not correct from a statistical point of view, when the overlap is more than zero, but they are the formulas that pwelch() uses. The error is that k, the degrees of freedom, should be somewhat less than 2*K when the windows overlap. The exact formula is given by Welch, IEEE Trans Audio Electroacoustics AU-15:70-73, 1967, https://ieeexplore.ieee.org/document/1161901. It is complicated so I will not reproduce it here. It has a varible number of terms, depending on window shape and amount of overlap. The C.I. from pwelch() does not take window shape or overlap into account. pwelch()'s confidence intervals are correct when the windows do not overlap. The error in the confidence interval reported by pwelch() is relatively small when using Hamming or Hann or similar windows at half-overlap. In general, the C.I. reported by pwelch() is narrower than it should be. See table below:
The errors in the confidence intervals reported by pwelch() become more severe if the overlap is greater. By more severe I mean that the C.I. from pwelch() is significantly more narrow that than the true C.I.
  5 Comments
William Rose
William Rose on 25 Nov 2023
No problem.
The estimate of the power spectrum at a specific frequency, when scaled appropriately, has an approximately chi-squared distribution.* Therefore the formula that says the 95% C.I. is approximately twice the standard deviation does not apply. That formula is for estimates that have a normal distribution - such as the mean of a set of measurements, which is approximately normal, due to the central limit theorem.
* I say "approximately" because, if overlapped segments are used to compute the spectrum, as is usually done, then the different components of the estimate are not all independent - due to the partial overlap. A true chi-squared random variable is the sum of the squares of independent standard normal random variables.
William Rose
William Rose on 26 Nov 2023
You said you are trying to understand the CIs given by pwelch(). I too am interested in this topic.
You may wonder if there is a simple formula, comparable to
which relates μ and σ for a group of power spectrum estimates to the upper 95% C.I. of the power spectrum. I do not think such a formula exists. The C.I. returned by pwelch() uses the formula I gave in an earlier comment. I have done millions of Monte Carlo simulations of power spectrum estimates. I conclude that the formula which pwelch() uses for confidence intervals is accurate when there is no overlap of segments in the spectrum estimate. The CI returned by pwelch() is somewhat narrower than it should be, when overlapping segments are used to estimate the spectrum. In other words, the actual 95% C.I. is slightly wider than the C.I. returned by pwelch().
If you wish to discuss this further, please email me separately by clicking on the envelope at the upper right corner of the box that pops up when you click on my name.

Sign in to comment.

Products


Release

R2019a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!