Generating a population randomly starting from some parameters

5 views (last 30 days)
Hello everybody, I am quite a beginner and I am sorry if my question seems trivial, but I hope somebody will help me.
Let's assume I have a quantity Q which is function of n inputs Xi:
Q = f(X1, X2, ... Xn)
Now, let's assume that some of these inputs are distributed according to a Gaussian. Thus, for example, X1, X2 and X3 are randomly distributed with a well define mean value and standard deviation, while X4 ... Xn are instead assumed to be constant.
I know how to generate randomly the populations X1, X2 and X3 on Matlab, with a command that should implement implicitly a Monte Carlo approach:
pop_X1 = X1_nom + randn(N,1) * X1_dev;
pop_X2 = X2_nom + randn(N,1) * X2_dev;
pop_X3 = X3_nom + randn(N,1) * X3_dev;
However, how do I generate Q taking into account all these input populations variations? Can I simply apply the function f aligning the vectors of X1, X2 and X3 previously generated?
Thanks!!
Paolo
  1 Comment
Image Analyst
Image Analyst on 20 Feb 2015
Why do you need to "align" the X? Who says f needs them aligned? You can "align" them if that's what you define f as, or you might not , again depending on what you define the function f as.

Sign in to comment.

Accepted Answer

John D'Errico
John D'Errico on 20 Feb 2015
Edited: John D'Errico on 20 Feb 2015
This is sometimes known as statistical tolerancing, to identify the distribution of a function of some random variable, or a set of random variables.
Monte Carlo simulation is indeed one way you can solve the problem, at least approximately so. Thus, just evaluate the function Q at all of those sets of values, then plot a histogram, etc.
There are other methods you can use to compute or approximate the distribution of Q. One can use Taylor series approximations, or even Taguchi methods. (I wrote several papers on the topic of modified Taguchi methods for this exact purpose. Find one of them here in Technometrics .)
A problem is that it is often true that if your function Q is nonlinear, that no common distribution exists to describe the distribution of Q. In those cases, you might use a member of the Pearson or Johnson families of distributions to yield an approximate distribution.
Edit: I pointed out some ideas in my comment below. So lets look at what happens for a simple nonlinear function. How about Q(x) = x^2?
Q = @(x) x.^2;
Really, how simple can you get, and still be nonlinear?
Now, suppose that x is normally distributed, with mean 0 and variance 1. I think that anyone will see, even without any computation that Q(x) cannot look normally distributed, since Q(x) can NEVER take on negative values. It turns out that in this case, Q(X) will have a simple distribution, a Chi-squared random variable, with one degree of freedom.
ezplot(@(x) chi2pdf(x,1),[0,10])
grid on
Sometimes you get lucky. As you can see though, this distribution looks quite non-normmally distributed. The reason is Q(x) has the property that around zero, we cannot simply truncate the Taylor series of x^2 to only the linear term and below. That quadratic terms is hugely significant.
We can try to convince ourselves that I got the distribution right by a simple Monte Carlo simulation.
hist(Q(randn(10000,1)),100)
Yep. Looks close enough for government work to me.
But suppose we changed x slightly? Suppose that x was normally distributed, with mean 100, and variance 1? In fact, the true distribution here should be a non-central chi-square, but I'm feeling too lazy to generate that one. A Monte Carlo simulation will suffice.
hist(Q(100 + randn(100000,1)),100)
Locally, it turns out that the Taylor series of Q(x), expanded around the mean of x (i.e., x==100), is now pretty well approximated by a linear polynomial, so dropping the quadratic term does not hurt us too much. That is, we can write
Q(u + 100) = 10000 + 200*u + u^2
Thus, for moderately small values of u, we can truncate the polynomial to ignore the u^2 term. Q is indeed sufficiently locally linear out there when u is a Normal random variable with mean zero and variance 1. (Then u+100 will clearly have mean 100 and variance 1.)
There is much more we can do with the techniques of statistical tolerancing. I've just brushed the surface here. The multidimensional case, where Q is a function of several random variables is really no more complicated.

More Answers (2)

Greig
Greig on 20 Feb 2015
If your inputs X are distributed then Q will also be distributed. So when you say you want to determine Q, do you mean the distribution or some statistic of that distribution (i.e., central tendency, or dispersion)?
Either way, depending on what your function f() is, there may or may not be an analytical solution. If not the best solution is to use a Monte Carlo approach as you mentioned.
If we let "n" be the number of inputs for column vectors Xi and "iter" represents the number of random elements in each Xi. A matrix, X, can be generated with a single call to randn using
X = randn(iter, n) .* repmat(Sigma, iter, 1) + repmat(Mu, iter, 1);
where Sigma and Mu are (n x 1) row vectors of the standard deviations and means, respectively. Any constant columns can be added as:
X(:, ii) = ones(iter, 1) .* Constant;
where ii, is the column index to be set a value of "Constant". Then you simply have to propagated these inputs through f() to determine the distribution of Q.
As I mentioned, depending on what f() does you may be able to approximate the distribution of Q with a parametric distribution (e.g., Gaussian, lognormal etc.) and determine parametric statistics (e.g., mean, variance etc.). If it cannot be well approximated by a parametric distribution then non-parametric statistics, such as the median or inter-quartile range, can be used to quantify the distribution of Q.

PaVi90
PaVi90 on 20 Feb 2015
Edited: PaVi90 on 20 Feb 2015
Thank you both the useful replies! I am planning to stick with the basic approach: generate the vectors of Xi inputs (using the Matlab instructions that I posted in my first post, knowing their respective mean value and standard deviation) and then create the resulting vector Q simply applying the function f element-wise considering all these input vectors simultaneously.
My only concern was from the mathematical standpoint: will Matlab automatically get that the resulting function is basically a result of some randomly Gaussian distributed variables? That is, will it be able to remember this aspect while computing the distribution of f and its variance? I tried this approach and the resulting Q is normally distributed, which is something that I would like to get.
  2 Comments
John D'Errico
John D'Errico on 20 Feb 2015
Edited: John D'Errico on 20 Feb 2015
No. MATLAB will NOT automatically know anything about what you are doing. Computers are stupid. They do as we tell them. In this case, you will be sending in a stream of input values to a function. MATLAB computes a result (hopefully in vectorized form) and it returns a stream of function values. Numbers are just that: numbers.
MATLAB does not understand where the numbers came from, nor what they represent. You need to put that interpretation on a set of numbers.
So if you will do a Monte Carlo simulation to estimate the distribution of Q, then you will just compute a histogram, or compute a mean and variance, possibly also skewness and Kurtosis, use one of the distribution fitting tools, etc.
You comment that the resulting Q seems to be pretty Normally (Gaussian) distributed. In fact, this is often the case. For example, if your inputs are Normally distributed, then any linear combination of a set of normal random variables is also Normally distributed!
Ok, but suppose your function is not a linear one. What happens then? in most cases, an exact known distribution will not be available, as there are only a rather limited set of distributions, and you might supply any nonlinear transformation. HOWEVER...
Assume that the function Q is nonlinear, but at least moderately smooth. So locally, Q can be represented sufficiently well by a low order Taylor series approximation. Now, for sufficiently small variances of the inputs, that function may be adequately well approximated by truncating the Taylor series to include only the linear terms. In that case, your function can be thought of as sufficiently locally linear, that the transformation is effectively linear. And what did I say before? That for any linear combination of Gaussian inputs, the output is also Gaussian!
Therefore it is often true that the distribution of even a nonlinear transformation of a set of variables appears to be sufficiently well approximated by a Normal distribution. That only relies on the variances of your inputs being small enough compared to the higher order coefficients of the local series approximation, that those higher order terms can be effectively ignored.
Of course, for sufficiently large parameter variances, all bets are off. Then it indeed will be necessary to use more sophisticated statistical tolerancing techniques. (I'll edit my answer to show some ideas for a simple function.)

Sign in to comment.

Products

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!