Request for help computing convolution of random variables via fft

Inspired by the answers of @David Goodmanson to this earlier question, I have been trying to understand exactly how in general to compute the pdf of a convolution of random variables (RVs) using ffts. The attached script shows two of the examples that I have attempted. In one example the fft produces the convolution pdf perfectly, but in the other it doesn't, and I'm hoping someone can identify the problem with my method.
The first example looks at the convolution of several identical exponential RVs. Here, the convolution pdf computed via fft agrees perfectly with the corresponding known gamma pdf for the convolution.
The second example shows the convolution of two different normal RVs, but the convolution pdf computed via fft does not agree with the corresponding known normal convolution. Actually, the fft convolution does have exactly the right shape, but it is offset relative to the true convolution distribution (i.e., it has a different mean).
I've also tried the fft approach with a number of other example distributions, and the same sort of shift problem that I'm having with the normals is quite common. In any particular example I can easily get rid of the shift in an ad hoc manner, but I haven't yet hit on a general method that works for any arbitrary set of RVs (identical or not).
As I have not worked with ffts before, I am probably missing something quite fundamental. It would be great if one of the experts in this group could get me straightened out.
Thanks very much for any hints.

 Accepted Answer

Hi Jeff,
It's nice to see a contribution of mine from almost four years ago come up; I'm just one of many people who have made contributions to this site and get referenced in this way. Besides providing answers to specific queries, the site appears to have become a searchable source of information.
The reason that the first conv example works and the second doesn't is the different x arrays used to create the pdfs. Suppose the fft transforms y = y(x) to z = z(ilam) (ilam being inverse wavelength, for which there seems to be no common symbol). The fft puts z( ilam=0 ), the DC value, as the first element of the resulting z array. Similarly, the fft assumes that y( x=0 ) is the first element of the y array. If x=0 is somewhere else, then z picks up a phase shift in the ilam domain that can lead to incorrect results.
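That phase shift is just the DFT shift theorem, and it is easy to see concretely. Below is a small sketch (in NumPy, purely for illustration; all values are made up for the demo): sampling y(x) with its x = 0 point at some interior index looks to the fft like a circular shift of the array, which multiplies the transform by exactly the linear phase ramp described above.

```python
import numpy as np

# Sketch of the DFT shift theorem. Sampling y(x) with x = 0 at interior
# index s looks to the fft like a circular shift, which multiplies the
# transform by a linear phase ramp exp(-2*pi*i*k*s/N).
N, dx = 256, 0.05
x = np.arange(N) * dx                   # grid with x = 0 at the first element
y = np.exp(-(x - 3.0)**2)               # a bump well inside the window

s = 10                                  # move the origin by s samples
y_shifted = np.roll(y, s)
k = np.arange(N)
ramp = np.exp(-2j * np.pi * k * s / N)
assert np.allclose(np.fft.fft(y_shifted), np.fft.fft(y) * ramp)
```

The magnitudes of the two transforms are identical; only the phases differ, which is why a product of such transforms can come back from the ifft with the right shape but the wrong offset.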
Your first example uses the correct x array and the method works. The second example starts out at a negative value of x to accommodate the gaussian, leading to problems as you can see.
The most straightforward way to allow negative values of x is to start with an x array that has x=0 at the midpoint (element n/2+1 for even n). Define y(x) and then use ifftshift to put the DC y value at y(1). I tend to think of the first array as a plotting array and the shifted array as the real array. After some ffts and then an ifft to get back to the x domain, use fftshift to put y( x=0 ) back in the center. For example,
N = 10000;
x0 = 5;
delx = 2*x0/N;
delilam = 1/(N*delx);
x = (-N/2:N/2-1)*delx;
mu = .1;
sig = .2;
m = 6; % number of convolutions
mp1 = m+1;
y = (1/(sqrt(pi)*sig))*exp(-(x-mu).^2/sig^2);
% exact result
yconv = (1/(sqrt(pi*mp1)*sig))*exp(-(x-mp1*mu).^2/(mp1*sig^2));
% by fft
z = fft(ifftshift(y)).^mp1*delx^mp1;
yconvfft = fftshift(ifft(z))*N*delilam;
% in the following plot the original gaussian is scaled down
figure(1)
plot(x,y/2,x,yconvfft,x(1:60:end),yconv(1:60:end),'o')
grid on
I1 = trapz(x,y)
I2 = trapz(x,yconv)
I3 = trapz(x,yconvfft)
For normalization: since you are using the fft (which only does sums) to approximate an integral, there is a factor of the array spacing delx for each forward transform. Coming back there is one ifft, so there is a single factor of delilam. But the ifft automatically divides by the number of points N, so you need to multiply by N to make up for that.
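As a cross-check of that bookkeeping, here is the same computation transcribed to NumPy as a sketch (the names delx, delilam, and mp1 mirror the MATLAB above): one factor of delx per forward transform, a single factor of delilam coming back, and a factor of N to undo the ifft's built-in 1/N.

```python
import numpy as np

# NumPy transcription of the normalization bookkeeping (names mirror the
# MATLAB snippet above).
N = 10000
x0 = 5.0
delx = 2 * x0 / N
delilam = 1 / (N * delx)
x = (np.arange(N) - N // 2) * delx      # x = 0 sits at index N/2
mu, sig = 0.1, 0.2
mp1 = 7                                 # m + 1 pdfs convolved together

y = np.exp(-(x - mu)**2 / sig**2) / (np.sqrt(np.pi) * sig)
# exact result for the mp1-fold self-convolution
yconv = np.exp(-(x - mp1 * mu)**2 / (mp1 * sig**2)) / (np.sqrt(np.pi * mp1) * sig)

# by fft: delx per forward transform, then N * delilam on the way back
z = np.fft.fft(np.fft.ifftshift(y))**mp1 * delx**mp1
yconvfft = np.fft.fftshift(np.fft.ifft(z)).real * N * delilam

assert np.max(np.abs(yconvfft - yconv)) < 1e-6
```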
I do have one question. In the second example you have 2^13 = 8192 points instead of, say, 10000. Could you let me know the rationale behind using 2^n points?

12 Comments

Hi David,
Thanks very much for your reply. It is very helpful to know that the crucial aspect is whether the x array crosses 0. (Convolving various uniforms, it appears that no special handling is needed if the x's are either all positive or all negative--just if there are some of each.)
I will try to implement your suggestion of putting 0 in the center of the x's where there are both positives and negatives.
As to your question about 2^13...well, I must admit that I am floundering in the tall, tall grass here. I've seen several descriptions of fft suggesting that it works better when the number of points is a power of 2, so I just used 2^13 points on that basis.
I also want to second your comment about this site. I often search it and find helpful tips and explanations. Hats off to MATLAB for supporting this resource.
Hi Jeff,
I don't know if you meant this the way that it sounds, but the important thing is not just that x have all positive or all negative values. For example, it would not work to define the array x = 5:.001:8, calculate y = pdf(x) and then do an fft directly on that. In that case y(1) corresponds to x=5, not x=0. In the ifftshift procedure, what ends up as y(1) does correspond to x=0.
Hi David,
Hmmm...well, yes, I did mean it the way that it sounds, so I am still not understanding the requirements of fft. Are you saying that the x array must always start at 0? That seems like it would be computationally inconvenient if all of the probability was associated with x values far from 0 (e.g., x within the range 1000 to 1001). Does one then need the x vector to start with a ton of 0's before hitting some probability out at 1000?
Thanks for your patience.
MATLAB supports a non-uniform FFT, nufft(), these days (since R2020a).
The Algorithms section of its documentation implies an implementation that pays attention to efficiency.
Hi Jeff,
Yes, the x array has to start at 0. Of course you are always free to set up a new shifted coordinate system whose origin is at 1000 in the old coordinate system. That avoids the problem you are talking about. But it depends on what you want to do. Let's say the mean of a pdf is at 1000.5 in the old system. Then it's at .5 in the new system. If you convolve two of those pdfs, the mean of their convolution is 2001 in the old system and 1 in the new system.
If you want to demonstrate 2001 in the old system, you will have to start at x = 0 and go out to something like 2100, which is indeed inconvenient. If you just want to look at the shape of the convolution and not worry about the overall mean, then the new system is a lot easier. At some point it just makes sense to take as a given that
mean(conv(pdf1,pdf2, ... pdfn)) = mean(pdf1) + mean(pdf2) + ... mean(pdfn)
and work with pdfs whose means are small.
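That identity is easy to verify numerically on a sampled grid; here is a short NumPy sketch (the two gaussian pdfs are arbitrary, picked just for the demo):

```python
import numpy as np

# Numerical check of the mean-addition identity for a convolution of pdfs.
dx = 0.01
x = np.arange(0.0, 20.0, dx)
pdf1 = np.exp(-(x - 2.0)**2 / 0.5)
pdf2 = np.exp(-(x - 3.0)**2 / 0.8)
pdf1 /= pdf1.sum() * dx                 # normalize to unit area
pdf2 /= pdf2.sum() * dx

convpdf = np.convolve(pdf1, pdf2) * dx  # support starts at x[0] + x[0]
xc = 2 * x[0] + np.arange(convpdf.size) * dx

m1 = (x * pdf1).sum() * dx
m2 = (x * pdf2).sum() * dx
mc = (xc * convpdf).sum() * dx
assert abs(mc - (m1 + m2)) < 1e-9       # mean(conv) = mean1 + mean2
```

The identity holds exactly for the discrete sums (not just approximately), which is why working in a shifted coordinate system and adding the means back at the end is safe.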
Thanks for all your help, David. It does seem to be working now, with additive shifts on all RVs to a new coordinate system where they all start at zero. I'd like to do a little more testing before accepting your answer, but I've looked at quite a few cases so far and all look perfect.
FWIW, I do want to be able to look at the exact distribution, not just the general shape of the convolution. For example, I want to compute maximum likelihood parameter estimates so I need to compute pdfs of individual x's.
Jeff,
Can you post an example using this method with two different distributions that both require the additive transformations as part of the process?
Paul, Sure, I will clean up the code a bit and do that.
added later:
* I have now attached a script with 2 examples, to this comment.
* Cupid will soon have a general convolution distribution based on this technique.
* In various timing tests, this FFT-based convolution is about 50 times faster than computing convolutions of two RVs by integration, and the speed gains over integration grow enormously as the number of convolved RVs increases.
This Answer thread started with suggestions to use fftshift and ifftshift. Frankly, I couldn't understand how those functions were relevant, which was part of my motivation to answer below. Now I see that your final code doesn't use those functions. Were they really relevant to the problem?
Will the RVs be allowed to come from different families of distributions, e.g., normal + exponential?
Suppose you have two normal RVs, both with sigma = 1, but mu1 = -1e5 and mu2 = 1e5. If I understand your code correctly, you'd end up with an X vector of very many points to cover all of the nearly zero values in between the means. Is that a concern?
I'm probably not the best person to answer your question about fftshift and ifftshift, since I'm barely functional with fft in the first place. As I understand it, the key was to transform all RVs so that they would have minima of 0.
Yes, the RVs will be allowed to come from different families (and there are lots of same-family cases where the closed form of the sum isn't known, e.g. exponentials with different rates [I think]). The two examples in the answer file don't reflect that, but I chose them specifically because the sum distribution was known and could be used to show the accuracy of the method. My actual immediate interest is in the convolution of an exponential with various non-exponential distributions (gamma, lognormal, logistic, etc). But I thought it would be nice to have the method worked out as generally as possible, in case new problems arose.
Regarding the two RVs with wildly different ranges, actually no, the X vector would not have a lot of zero values between the means. The X vector actually just represents steps from each RV's minimum to its maximum. The more problematic case would be summed RVs with vastly different variances, because in that case the small-variance RV's density would be concentrated at the beginning of the X vector, with a lot of zeros at the end (i.e., the X vector would need far more steps to "cover" the range of the large-variance RV). In such a case the large-variance RV would dominate the variance of the sum, though, so one might reasonably approximate the sum as a shifted version of the large-variance RV (shifted by the mean of the small-variance one).
Anyway, thanks very much for your interest in the problem, as well as your many illuminating answers throughout this forum.
I guess I didn't understand the code as well as I thought!
But it does appear that each pdf is evaluated over the same X vector (M(i,:)), which might be a lot longer than it needs to be to evaluate any one particular pdf. Probably only a very small effect on efficiency.
At the end of the day, it looks like the result will be an approximation to the true answer that is piecewise linear with a finite interval of support. Any concerns about missing the tails? Or put another way, does every density under consideration have a well defined xmin/xmax outside of which you're comfortable assuming the density is zero?
Will the code have to adaptively pick NxPoints to ensure that delx = X(2) - X(1) is small enough to adequately capture the features of all of the densities under consideration?
I guess the answers depend on what is actually going to be done with the result. Make a plot? Compute other figures-of-merit, like the third moment, etc.
Interesting problem.
Yes, every pdf is evaluated over the same X vector of increments to its minimum value. As I understand it, that's required to get the input needed by fft.
And yes, the end result will be a piecewise linear approximation over a finite support interval. To address concerns about missing the tails, the user can extend the bounds out further and pay the associated computation time penalty. With some trial and error it would presumably be clear that the results didn't change much once the bounds were extended out "far enough", so the analyst could be satisfied that the results were a useful approximation.
I imagine that the user would always want to pick NxPoints depending on the distributions and the computational goal, probably after trying various values to see how many are enough to achieve the required resolution. My rough impression is that computation time increases only about linearly with NxPoints, so it isn't necessarily going to be that computationally expensive to get a very good approximation within the finite interval examined.
FWIW, what I'm actually trying to do with this machinery is to get maximum likelihood parameter estimates with some convolutions for which I can't find closed-form pdf expressions. I was previously doing that by numerical integration, but this is much, much faster with hardly any loss of accuracy.


More Answers (1)

Here is one option, illustrated with two widely separated normal pdfs:
mu1 = 10; sigma1 = 2;
mu2 = -5; sigma2 = 1;
Zextreme = 4;
delx = 0.01;
% Determine range of x values
min1 = mu1 - Zextreme*sigma1;
min2 = mu2 - Zextreme*sigma2;
max1 = mu1 + Zextreme*sigma1;
max2 = mu2 + Zextreme*sigma2;
x1 = min1:delx:max1;
x2 = min2:delx:max2;
x = mu1 + mu2 + (-Zextreme*(sigma1 + sigma2):delx:Zextreme*(sigma1 + sigma2));
% Get the pdfs of the individual distributions and of the sum, which is the
% convolution of the individuals
M1 = normpdf(x1,mu1,sigma1);
M2 = normpdf(x2,mu2,sigma2);
truepdf = normpdf(x,mu1+mu2,sqrt(sigma1^2+sigma2^2));
% Check the computations so far by examining the true distributions
% of the individual RVs and their convolution
plot(x1,M1);
hold on
plot(x2,M2);
plot(x,truepdf);grid
legend('Normal 1', 'Normal 2', 'Convolution');
% approximate the continuous convolution with a convolution sum in the time domain
convpdf = conv(M1,M2)*delx;
shift1 = round(x1(1)/delx);
shift2 = round(x2(1)/delx);
% approximate the continuous convolution with a convolution sum computed via the frequency domain
N = numel(convpdf);
fftM1 = fft(M1,N);
fftM2 = fft(M2,N);
fftpdf = fftM2.*fftM1;
fftpdf = real(ifft(fftpdf));
% compare
subplot(311); plot(x,truepdf);grid
subplot(312); plot( ((0:N-1) + shift1 + shift2)*delx,convpdf),grid
subplot(313); plot( ((0:N-1) + shift1 + shift2)*delx,fftpdf*delx),grid
set(get(gcf,'Children'),'XLim',[x(1) x(end)]);
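The bookkeeping above can be cross-checked with a short NumPy sketch (different demo numbers, chosen only for illustration): the zero-padded FFT product reproduces the direct convolution sum, and the support of the result starts at x1(1) + x2(1).

```python
import numpy as np

# Check: zero-padded FFT product == direct convolution sum, with the
# result's support starting at x1[0] + x2[0].
dx = 0.01
x1 = np.arange(2.0, 18.0, dx)                    # support of pdf 1
x2 = np.arange(-9.0, -1.0, dx)                   # support of pdf 2
M1 = np.exp(-(x1 - 10.0)**2 / 8.0) / np.sqrt(8.0 * np.pi)   # N(10, 2^2)
M2 = np.exp(-(x2 + 5.0)**2 / 2.0) / np.sqrt(2.0 * np.pi)    # N(-5, 1^2)

N = M1.size + M2.size - 1                        # full convolution length
fftpdf = np.fft.ifft(np.fft.fft(M1, N) * np.fft.fft(M2, N)).real * dx
direct = np.convolve(M1, M2) * dx
xconv = x1[0] + x2[0] + np.arange(N) * dx        # axis for the result

assert np.allclose(fftpdf, direct)
assert abs(xconv[np.argmax(fftpdf)] - 5.0) < 0.05   # peak near mu1 + mu2
```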

3 Comments

Thanks for the suggestion, Paul--that's a nice clean example. But it's important for me to have a method that extends to k>2 RVs, and I'm not exactly sure how to do that with your approach. I guess this step would be no problem:
fftpdf = fftMk.*fftM2.*fftM1;
but I'm not sure how to get N for k>2. Would that require a loop of successive conv's?
Speed is also an issue, and I'm wondering if it would be slower to do a separate fft for each RV (as in your example) compared with the single fft required with David's method. Obviously I have a lot of work to do...
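For what it's worth, here is a hypothetical sketch (in NumPy, not anyone's posted code) of how the k > 2 case could go: the full convolution of k sequences has length sum(n_i) - (k - 1), so a single padded transform length serves all k ffts and one loop multiplies the spectra, with no successive conv calls.

```python
import numpy as np

# Hypothetical k-fold convolution of sampled pdfs via one FFT product.
def conv_pdfs(pdfs, dx):
    """Full convolution of k sampled pdfs; length is sum(n_i) - (k - 1)."""
    N = sum(p.size for p in pdfs) - (len(pdfs) - 1)
    spec = np.ones(N, dtype=complex)
    for p in pdfs:
        spec *= np.fft.fft(p, N)                # zero-padded transform
    # one dx factor per convolution performed
    return np.fft.ifft(spec).real * dx**(len(pdfs) - 1)

dx = 0.01
x = np.arange(0.0, 10.0, dx)
expo = np.exp(-x)                               # exponential(1) pdf
# 3-fold self-convolution should match successive direct convolutions
direct = np.convolve(np.convolve(expo, expo), expo) * dx**2
assert np.allclose(conv_pdfs([expo, expo, expo], dx), direct)
```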
I think that one important factor is if you are taking the sum of RVs that are i.i.d. as in your Example 1 and in @David Goodmanson's example or if the RVs are not i.i.d. as in your Example 2 and in the example I posted.
I definitely want to handle cases where the RVs are independent but have different distributions. David's method seems to be working with the appropriate transformations (see my reply to him).

