A distribution binning problem

Question


Hi,

I have a problem in which I need to count the number of (probabilistic) occurrences falling in each non-uniform interval of a distribution.

The final counting step can apparently be done with histcounts. My problem is upstream: the data on the population to be binned is not given as a sample, but only through a few known parameters of the (entire) population. Here is an example that uses packages and their weights to illustrate the problem, which is more general. N.B. I do have the Statistics and Machine Learning Toolbox, but I'm not an expert statistician myself.
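histcounts accepts arbitrary, unequally spaced bin edges, so once a sample exists the counting step is a one-liner. A minimal sketch, assuming a hypothetical sample w simulated here by rejection sampling from a normal distribution with mean 10 kg and standard deviation 2 kg (variance 4), restricted to [2, 20] kg. The question has no such sample, which is exactly the difficulty it describes:

```matlab
% Illustration only: 'w' is a hypothetical simulated sample, not data from the question.
N = 100;                               % number of packages
mu = 10;                               % mean weight in kg
sigma = 2;                             % standard deviation in kg (sqrt of variance 4)
wmin = 2; wmax = 20;                   % weight bounds in kg
edges = [wmin 5 10 12 16 wmax];        % unequally spaced bin edges in kg

rng(0);                                % fixed seed for reproducibility
w = [];
while numel(w) < N
    x = mu + sigma*randn(1, N);        % candidate weights from N(mu, sigma^2)
    w = [w, x(x >= wmin & x <= wmax)]; % keep only weights inside [wmin, wmax]
end
w = w(1:N);                            % exactly N packages

counts = histcounts(w, edges);         % one count per bin
disp(counts)
```

Each bin is [edges(k), edges(k+1)), except the last, which also includes 20 kg. The counts from any single simulated sample scatter around the expected values, so this illustrates the binning step rather than answering the question.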

I have a set of N = 100 packages with a total weight W = 1000 kg. Let's say that we know how the package weights are distributed about the mean, and that the variance is also a known, exogenous parameter. The minimum and maximum package weights in the lot are also known. To recap:

Number of packages N = 100;

Total weight W = 1000 kg;

Minimum package weight wmin = 2 kg;

Maximum package weight wmax = 20 kg;

Mean weight mu = W/N = 1000 kg / 100 = 10 kg;

Variance sigmacap = 4 kg² (exogenously determined);

Distribution of weights about the mean: N(mu, sigmacap) in the case of a normal distribution, i.e. mean mu and variance sigmacap.
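Because the recap gives the variance rather than the standard deviation, it may help to write the model out explicitly: the standard deviation is σ = 2 kg, which is the scale argument MATLAB's normal-distribution functions expect. The bounds then sit 4σ below and 5σ above the mean, so very little probability lies outside [wmin, wmax] for an untruncated normal:

```latex
\begin{aligned}
\mu &= \frac{W}{N} = \frac{1000\ \text{kg}}{100} = 10\ \text{kg}, \qquad
\sigma = \sqrt{\sigma^2} = \sqrt{4\ \text{kg}^2} = 2\ \text{kg} \\
w_{\min} &= \mu - 4\sigma, \qquad w_{\max} = \mu + 5\sigma \\
P(X < w_{\min}) &= \Phi(-4) \approx 3.2 \times 10^{-5}, \qquad
P(X > w_{\max}) = \Phi(-5) \approx 2.9 \times 10^{-7}
\end{aligned}
```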

With the above input, how should I proceed to obtain a (probabilistic) count of how many packages will fall in unequally spaced weight intervals such as 2-5, 5-10, 10-12, 12-16 and 16-20 kilograms?

Thank you very much for any help or lead you can offer.

Daniele

5 Comments

John D'Errico on 7 Nov 2019

Edited: John D'Errico on 7 Nov 2019


Let me be clearer, without doing your homework for you. What would this

normcdf(2,mu,sqrt(sigmacap))

tell you? Then, ask what does this mean?

normcdf(5,mu,sqrt(sigmacap))

What are those calls doing? (Hint: each of them can be interpreted as a probability, although more explicitly, each is an area under a Normal PDF. But what would they mean in your problem?)

Now, what would the difference between those results imply? (Hint: it could also be interpreted as a probability.)
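In symbols, writing F for the normal CDF that normcdf evaluates (with σ = √sigmacap = 2 kg as the standard deviation), the hinted difference is the probability that a single package falls in a bin. For the 2-5 kg bin, for example:

```latex
\begin{aligned}
F(x) &= \Phi\!\left(\frac{x - \mu}{\sigma}\right) \\
P(a < X \le b) &= F(b) - F(a) \\
P(2 < X \le 5) &= \Phi(-2.5) - \Phi(-4) \approx 0.00621 - 0.00003 \approx 0.0062
\end{aligned}
```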

Now, you have N such packages. If you multiplied the above difference by N, what would that mean?

You are asking what fraction of events in different categories happen out of a total of N events. If you know the probability of that event arising, and you know the sample size, then what is the expected number of such events?
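The hints above translate almost line for line into code. A minimal sketch, assuming an untruncated normal with the question's parameters and using normcdf from the Statistics and Machine Learning Toolbox; note that normcdf takes the standard deviation, so the variance is square-rooted first:

```matlab
% Sketch of the hint above with an untruncated normal distribution.
N = 100;                          % number of packages
mu = 10;                          % mean weight in kg (W/N)
sigma = sqrt(4);                  % normcdf expects the standard deviation, not the variance
edges = [2 5 10 12 16 20];        % unequal bin edges in kg

F = normcdf(edges, mu, sigma);    % P(X <= edge) for every edge at once
p = diff(F);                      % P(edges(k) < X <= edges(k+1)) for each bin
expectedCount = N * p;            % expected number of packages per bin
disp(expectedCount)
```

With these parameters the expected counts come out at roughly 0.62, 49.38, 34.13, 15.73 and 0.13 packages. Because an untruncated normal puts a little probability below 2 kg and above 20 kg, they sum to slightly less than N; truncating the distribution to [wmin, wmax], as in the answer below, rescales them to sum to N.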

I'm sorry if I am not giving you explicit code to compute what you are asking (really, I almost did that if you look at what I wrote in this comment), but these are very basic questions about probability. If you are unable to answer basic questions about probability, then you really do need to crack those notes on probability, or maybe a simple book. You won't be able to handle the harder stuff when it comes up, if the most basic stuff is tossing you a curve.

Daniele Rocchetta on 7 Nov 2019

Now, this is cut down to a size I think I can handle, rather than going through half of probability theory just to solve one isolated problem. I'll take it up as a challenge and put my head into it starting tonight, after work. Whatever knowledge of statistics I had has been left to rust for far too many years.

Thanks for your time and the good leads; I will report back once I'm through.

Kind regards

Daniele

Daniele Rocchetta on 7 Nov 2019

John,

thanks to your clues I managed to write the code I needed to answer my question. It was, after all, a good idea to ask. I'm posting it in a separate answer below for anyone interested.

If you have any further observations, they are of course welcome.

Thanks again.

All the best

Daniele


Answer 1

Daniele Rocchetta on 7 Nov 2019


Inspired and encouraged by John D'Errico's advice above, I post below the code that solves the distribution problem I submitted.

N = 100;                 % number of packages
W = 1000;                % total weight in kg
mu = W/N;                % mean weight in kg
sigmacap = 4;            % variance (exogenously determined)
wmin = 2;                % minimum package weight in kg
wmax = 20;               % maximum package weight in kg
interValues = [wmin 5 10 12 16 wmax];   % edges of the weight bins in kg
pd = makedist('Normal', 'mu', mu, 'sigma', sqrt(sigmacap)); % normal distribution; 'sigma' is the standard deviation
pdt = truncate(pd, wmin, wmax);         % truncate to exclude packages < 2 kg or > 20 kg
% packCount(i) is the expected number of packages in the bin [interValues(i), interValues(i+1)], rounded to the nearest integer
packCount = NaN(1, numel(interValues)-1);
for i = 1:numel(packCount)
    packCount(i) = round((cdf(pdt, interValues(i+1)) - cdf(pdt, interValues(i))) * N);
end
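For readers without the Statistics and Machine Learning Toolbox, the same truncated-normal bin probabilities can be computed with base MATLAB's erfc, using P(a < X ≤ b | wmin ≤ X ≤ wmax) = (F(b) - F(a)) / (F(wmax) - F(wmin)). A minimal sketch, assuming the same parameters as the answer; the anonymous function normCdf is a hypothetical helper, not a built-in. It also keeps the unrounded expected counts, because rounding each bin independently is not guaranteed to give integers that add up to N:

```matlab
% Toolbox-free sketch: normal CDF via erfc, truncated to [wmin, wmax].
N = 100;  W = 1000;
mu = W/N;                              % mean weight in kg
sigma = sqrt(4);                       % standard deviation from the variance sigmacap = 4
wmin = 2;  wmax = 20;
interValues = [wmin 5 10 12 16 wmax];  % bin edges in kg

% Hypothetical helper: normal CDF written with base MATLAB's erfc
normCdf = @(x) 0.5 * erfc(-(x - mu) ./ (sigma * sqrt(2)));

F = normCdf(interValues);
pTrunc = diff(F) / (F(end) - F(1));    % bin probabilities renormalized to [wmin, wmax]
expectedCount = N * pTrunc;            % unrounded expected counts; these sum to N
packCount = round(expectedCount);      % rounded per bin, as in the answer

disp(expectedCount)
disp(packCount)
```

For these parameters the rounded counts happen to add up to 100, but with other bins or parameters the per-bin rounding can leave the total off by one or more.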
