Somewhat lengthy question on distribution fitting

Question

0 votes

My apologies in advance for what I expect to be a lengthy background section leading up to my question.

I'm working on a set of decision analysis methodologies for making choices about alternatives that arrive asynchronously over a time period. One of the methods requires a cumulative distribution function (CDF) for one aspect of the alternatives. In exploring the performance of the methods, I try several different methods of generating the CDF for some sample data:

1. Using an empirically-derived CDF generated by MATLAB that precisely fits the observed data
2. Using an approximation via a triangular distribution (since those are easy when data is scarce)
3. Using an approximation that uses a "standard" distribution to fit the observed data

My question regards that third method. I have some sample data, and in my first shot at this I used the Arena Input Analyzer tool against three data sets. For two of the three it suggested distributions that performed almost as well as the empirically-derived exact fit. These were

25 + exponential(261)
127 + exponential(1030)

For the third data set, though, it suggested LogNormal(1.96, 3.23), which worked like a dog: using that CDF literally performed worse than just flipping a coin at each decision point.

So I figured I'd use MATLAB to fit distributions and see if I got better results. And for the one that Arena missed badly on, given the exact same text file of input data, MATLAB suggested LogNormal(0.0185, 1.1458). Note the significantly different parameters. This worked like a champ, again as good as the empirical one. So I figured I'd go on with MATLAB to fit the other two data sets. What MATLAB suggested was

LogNormal(5.2912, 0.8789)
LogNormal(6.7327, 0.8078)

And these two were dogs! My suspicion is that it has something to do with that "offset" that you see in the Arena suggested distributions. MATLAB seems to be trying to fit only a "straight" distribution with no offset term like that.

So here's my question: is there a way to get MATLAB to identify an offset term when examining a data set for distribution fitting?

Ideally my final methodology will just involve running a fit (if you have historical data to fit to) which I think is a fairly low bar. If you first have to examine the data and determine an appropriate offset manually and then adjust all the data to account for it, I think it's of less use.

I hope this makes sense, and that I haven't bored you to sleep yet. Any help would be greatly appreciated.

4 Comments

Image Analyst on 21 Apr 2015

Nonetheless, I agree with Star - screenshots would help us visualize, even if it's just for one example set of data.

Jeremy Hendrix on 21 Apr 2015

Okay, here is a screenshot of the output I get from the Arena Input Analyzer

And using the MATLAB Distribution Fitting Tool on the same data file

They choose significantly different binning. Arena estimates the distribution as 25 + expo(261) while MATLAB returns LogNormal(5.2912, 0.8789)

Answer 1

Image Analyst on 21 Apr 2015

0 votes

I haven't used those functions. I've never heard of the Arena Input Analyzer. What toolbox are they in? Please list it below your question. Is it the stats toolbox or curve fitting toolbox or something else?

What does the histogram of your actual data look like? Is it more like the bars in the top plot (like an exponential decay) or in the bottom plot (like a log-normal or Poisson)?

You say: "is there a way to get MatLab to identify an offset term in examining a data set for distribution fitting?" Can you subtract the mean and then see this: http://www.mathworks.com/matlabcentral/answers/94272-how-do-i-constrain-a-fitted-curve-through-specific-points-like-the-origin-in-matlab

By the way, for what it's worth, here's an interesting File Exchange submission that has dozens of distributions: http://www.mathworks.com/matlabcentral/fileexchange/7309-randraw

3 Comments

Image Analyst on 21 Apr 2015

Cool - what submission is that?

Jeremy Hendrix on 21 Apr 2015

http://www.mathworks.com/matlabcentral/fileexchange/34943-fit-all-valid-parametric-probability-distributions-to-data/content/allfitdist.m

Answer 2

Hannes Driessen on 20 May 2018

0 votes

Your confusion arises from the fact that the parameters MATLAB reports for a lognormal distribution are the mean and standard deviation of the underlying normal distribution, that is, of log(x). If you want to use those in Rockwell Arena, you'll first need to transform them into the mean and standard deviation of the lognormal distribution itself (https://en.wikipedia.org/wiki/Log-normal_distribution#Arithmetic_moments). Then you'll see that the parameters found with the Input Analyzer tool in Arena closely resemble the parameter estimates you get from MATLAB.
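As a rough check of that conversion, here is a minimal sketch that applies the arithmetic-moment formulas from the linked Wikipedia section to the MATLAB estimates quoted in the question for the third data set. It uses only base MATLAB. The variable names are illustrative, and it assumes Arena reports the lognormal's own mean and standard deviation.

```matlab
% Parameters reported by MATLAB for the third data set: the mean and
% standard deviation of log(x), i.e. of the underlying normal distribution.
mu    = 0.0185;
sigma = 1.1458;

% Arithmetic mean and standard deviation of the lognormal variable itself
% (assumed here to be the parameterization Arena uses).
m = exp(mu + sigma^2/2);
s = m * sqrt(exp(sigma^2) - 1);

fprintf('LogNormal(%.2f, %.2f)\n', m, s);
```

The result comes out close to the LogNormal(1.96, 3.23) that Arena suggested. If the Statistics Toolbox is available, lognstat(mu, sigma) returns the same mean together with the variance (s^2) rather than the standard deviation.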

Answer 3

Jeff Miller on 21 May 2018

Edited: Jeff Miller on 21 May 2018

0 votes

You might be able to do a lot of what you want with the routines here: Cupid

To create a standard distribution with an offset, you would write something like this:

% Create exponential distribution with an offset of 100.
mydist=AddTrans(Exponential(.01),100);

Assuming your to-be-fitted data is in an array x, you could then get maximum likelihood estimates of the exponential rate and additive offset with:

mydist.EstML(x)

Actually, in the case of an exponential plus a constant, the MLE of the constant will always be the minimum value in the data set (perhaps minus a few eps to avoid numerical problems).
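Because of that closed-form result, the exponential-plus-offset case can be estimated without any optimizer or toolbox. The following is only a sketch in base MATLAB: the synthetic sample stands in for your real data, the "true" values 25 and 261 are borrowed from the Arena fit in the question purely for illustration, and the exponential is parameterized by its mean (which I assume is what Arena's 25 + expo(261) denotes).

```matlab
% Synthetic stand-in for the real data: offset + exponential with given mean.
rng(1);                                   % reproducible illustration
n          = 5000;
offsetTrue = 25;                          % illustrative value only
meanTrue   = 261;                         % illustrative value only
x = offsetTrue - meanTrue * log(rand(n, 1));   % inverse-CDF sampling

% Maximum likelihood estimates for the shifted exponential:
% the offset is the sample minimum, and the exponential mean is the
% average excess over that minimum.
offsetHat = min(x);
meanHat   = mean(x) - offsetHat;
rateHat   = 1 / meanHat;                  % rate form, as in Exponential(.01)

fprintf('%.1f + exponential(mean = %.1f), rate = %.5f\n', ...
        offsetHat, meanHat, rateHat);
```

With real data you would replace the synthetic x with your own array. Note that the estimated offset can never fall below the true one, since it is the smallest observation.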
