A distribution binning problem
Hi,
I have a problem in which I need to count the number of (probabilistic) occurrences falling in each non-uniform interval of a distribution.
While the final goal can apparently be achieved with histcounts, my problem is upstream: the population to be binned is not given by a sample, but by some known parameters of the (entire) population. Here is an example, using packages and their weights to illustrate the problem, which is more general. N.B. I do have the Statistics and Machine Learning Toolbox, but I'm not an expert statistician myself.
I have a set of N = 100 packages, and their total weight, W = 1000 kg. Let's say that we know how the weight of packages is distributed (about the mean), and that the variance is also a known, exogenous, parameter. The minimum and maximum weights of the packages in the lot are also known. To recap:
Number of packages N = 100;
Total weight W = 1000 kg;
minimum weight of package wmin = 2 kg;
maximum weight of package wmax = 20 kg;
mean weight = mu = W/N = 1000 kg/100 = 10 kg
variance = sigmacap = 4 (exogenously determined)
distribution of weights about the mean = N(mu,sigmacap) in case of normal distribution
With the above input, how should I proceed to get a (probabilistic) count of how many packages will fall in unequally spaced weight intervals of the type 2-5, 5-10, 10-12, 12-16 and 16-20 kilograms?
Thank you very much for any help or lead you can offer.
Daniele
5 Comments
John D'Errico
on 7 Nov 2019
This is not a question about MATLAB, but a question about statistics. You don't need to be an expert, but you do need to understand what the CDF of a normal distribution tells you, and how to use it. (I said a normal distribution, because you explicitly stated normality. If you did not know the distribution, then nothing can be done along these lines anyway.)
So, what does the Normal CDF tell you? And how does it help you? It is your homework, not mine. I'll even add a bit. If you take the difference between the normal CDF at two points, what would that tell you?
Daniele Rocchetta
on 7 Nov 2019
John D'Errico
on 7 Nov 2019
Edited: John D'Errico
on 7 Nov 2019
Let me be more clear, without totally doing your homework. What would this
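normcdf(2,mu,sqrt(sigmacap))   % normcdf expects the standard deviation, so take the square root of the variance sigmacap
tell you? Then, ask what does this mean?
normcdf(5,mu,sqrt(sigmacap))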
What are those calls doing? (Hint: each of them can be interpreted as a probability, although more explicitly, they are areas under a Normal PDF. But what would they mean in your problem?)
Now, what would the difference between those results imply? (Hint: it could also be interpreted as a probability.)
Now, you have N such packages. If you multiplied the above difference by N, what would that mean?
You are asking what fraction of events in different categories happen out of a total of N events. If you know the probability of that event arising, and you know the sample size, then what is the expected number of such events?
I'm sorry if I am not giving you explicit code to compute what you are asking (really, I almost did that if you look at what I wrote in this comment), but these are very basic questions about probability. If you are unable to answer basic questions about probability, then you really do need to crack those notes on probability, or maybe a simple book. You won't be able to handle the harder stuff when it comes up, if the most basic stuff is tossing you a curve.
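For readers who want to see the hints above put together, here is a minimal sketch, not the commenter's own code. It assumes that sigmacap = 4 is the variance (so the standard deviation passed to normcdf is 2), that the bin edges are the ones listed in the question, and that the known minimum and maximum weights are handled by renormalizing the normal distribution over [wmin, wmax]. The variable names (edges, pBin, expectedCount) are made up for illustration.

```matlab
% Sketch: expected package counts per weight interval.
% Requires normcdf from the Statistics and Machine Learning Toolbox.
N     = 100;                 % number of packages
mu    = 10;                  % mean weight in kg (W/N = 1000/100)
sigma = sqrt(4);             % standard deviation in kg, ASSUMING sigmacap = 4 is the variance
edges = [2 5 10 12 16 20];   % interval boundaries in kg, from wmin to wmax

cdfAtEdges = normcdf(edges, mu, sigma);   % P(weight <= edge) at each boundary
pBin       = diff(cdfAtEdges);            % probability of falling in each interval

% ASSUMPTION: no package can weigh less than wmin or more than wmax, so
% rescale by the probability mass inside [wmin, wmax] (truncated normal).
pBin = pBin / (cdfAtEdges(end) - cdfAtEdges(1));

expectedCount = N * pBin;                 % expected number of packages per interval

% Columns: lower edge, upper edge, expected count
disp([edges(1:end-1).' edges(2:end).' expectedCount.'])
```

With these inputs the expected counts come out at roughly 0.6, 49.4, 34.1, 15.7 and 0.1 packages for the five intervals. The truncation step changes almost nothing here, because 2 kg and 20 kg lie four and five standard deviations from the mean, but it would matter for a wider distribution.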
Daniele Rocchetta
on 7 Nov 2019
Daniele Rocchetta
on 7 Nov 2019