Non-reproducible solution with non-linear fitting

1 view (last 30 days)
I have been trying to fit a power-law distribution: ns ~ s^-a (where ns is the probability density function; s is the empirical observation; and a is the scaling exponent) to my empirical data (see attached excel sheet) using MATLAB's fit function. However, I am having problems getting a reproducible solution. When I do not specify a startpoint, I get different results for each run -- in spite of the fact that I'm using the same dataset. Then, when I do specify a startpoint, the function gives me the same result as my specified startpoint. Also, the issue remains regardless of whether I specify an algorithm such as the Levenberg-marquardt, or if I used the default algorithm. My codes are as follows:
clc; clear; close all;
%%%%%%%%%%%%%%%%%%% Load data
[data, ~, ~] = xlsread('myDataset.xlsx',1,'A:A');
%%%%%%%%%%%%%%%%%% Histogram plot
data = sort(data);
[y, edges] = histcounts(data, 100000, 'Normalization','pdf');
edges = edges(2:end) - (edges(2)-edges(1))/2;
figure;
scatter(edges, y, 4, 'ro', 'Markerfacecolor', 'r');
hold on;
box on;
edges = edges';
y = y';
%%%%%%%%%%%%%%%%%% Specify input parameters
%(1): cut-off value
xmin = 1*10^5;
%(2): x-values after cut-off
x = edges(edges>=xmin);
ind = find(edges>=xmin,1,'first');
%(3): y-values after cut-off
y([1:ind-1]) = [];
%%%%%%%%%%%%%%%%%% fit power-law model
f = fittype('b*x.^-a');
fModel = fit(x, y, f, 'Algorithm','levenberg-marquardt');
coeffs = coeffvalues(fModel);
plot(fModel,'b--')
set(gca, 'xscale','log', 'yscale','log')
xlabel('s', 'fontsize',8)
ylabel('ns', 'fontsize',8)
The value of a is said to be around 2.189. Although I may not get this exact value - as it will depend on the dataset, I however expect to at least get values in the range of about 1.5 - 2.
I do appreciate any help, suggestions, or references that will be useful in solving the problem. Many thanks.

Answers (1)

John D'Errico
John D'Errico on 16 Oct 2019
Edited: John D'Errico on 16 Oct 2019
Do you understand that when you do not specify astart point, that fit uses a RANDOM start point? So why would you expect the same results from a random start?
When you start from the same point, you get the same result. That is not a surprise at all. Why not? Because your data is composed of unholy crap.
You are trying to fit a power model to data y = f(x) = b*x^(-a). Ok, a simple model. Trivial, right? LOOK AT YOUR DATA!!!!!!!!!
plot(x,y,'.')
untitled.jpg
That looks a little strange. Looking more closely at your data, we see:
numel(y)
ans =
93279
sum(y == 0)
ans =
93232
So, of 93279 data points, all but 47 of them have y IDENTICALLY equal to zero. 99.95% of your data is completely useless in fitting that model. The rest of it is not of much more value.
Are you serious? Do you rationally expect this to work? Sigh.
Get better data. And use intelligently chosen start points, if you seriously want a decent estimate in any nonlinear regression. That will also alleviate the randomness of your result.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!