how to know the distribution of my data

Dear MATLAB Community, I have attached an Excel file with some data I have. The data represent the percent of loads in each load bin, along with their histogram. My question is: how can I determine, using MATLAB, what distribution my data follow? Is it normal, exponential, or something else? And after that, how do I find the parameters of the distribution? Thanks a lot.
  8 Comments
dpb on 25 Sep 2018
Your institution may have access to more; ask your advisor for what is available for your use on university machines.


Accepted Answer

Image Analyst on 26 Sep 2018
When I fit the data to the sum of 3 Gaussians, the fit looks pretty reasonable. What do you think? And why do you need analytical equation(s) for the distribution rather than just using the ACTUAL distribution obtained from the histogram?
% Uses fitnlm() to fit a non-linear model (sum of three Gaussians with an offset) through noisy data.
% Requires the Statistics and Machine Learning Toolbox, which is where fitnlm() is contained.
% Initialization steps.
clc; % Clear the command window.
close all; % Close all figures (except those of imtool.)
clear; % Erase all existing variables. Or clearvars if you want.
workspace; % Make sure the workspace panel is showing.
format long g;
format compact;
fontSize = 20;
% % Create the X coordinates: 4000 points from 0 to 40000.
% X = linspace(0, 40000, 4000);
% mu1 = 6000; % Mean, center of Gaussian.
% sigma1 = 2000; % Standard deviation.
% mu2 = 13000; % Mean, center of Gaussian.
% sigma2 = 2500; % Standard deviation.
%
% % Define function that the X values obey.
% a = 0 % Arbitrary sample values I picked.
% b = 3
% c = 18
% Y = a + b * exp(-((X - mu1)/sigma1) .^ 2) + ...
% c * exp(-((X - mu2)/sigma2) .^ 2); % Get a vector. No noise in this Y yet.
% X=X';
% Y=Y';
data = xlsread('matlab.xlsx');
X = data(:, 1);
Y = data(:, 2);
% Now we have noisy training data that we can send to fitnlm().
% Plot the noisy initial data.
plot(X, Y, 'b.', 'LineWidth', 2, 'MarkerSize', 15);
grid on;
drawnow;
% Convert X and Y into a table, which is the form fitnlm() likes the input data to be in.
tbl = table(X, Y);
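% Define the model as an offset plus the sum of three Gaussians:
% Y = b1 + b2*exp(-((x-b3)/b4)^2) + b5*exp(-((x-b6)/b7)^2) + b8*exp(-((x-b9)/b10)^2)
% Note how this "x" of modelfun is related to big X: x(:, 1) is the predictor X, the first column of the table.
% The response Y is the last column of the table, which fitnlm() uses as the response by default.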
modelfun = @(b,x) b(1) + b(2) * exp(-((x(:, 1) - b(3))/b(4)).^2) + ...
b(5) * exp(-((x(:, 1) - b(6))/b(7)).^2) + ...
b(8) * exp(-((x(:, 1) - b(9))/b(10)).^2);
beta0 = [0, 2, 6000, 2000, 18, 13000, 2000, 2, 14000, 9000]; % Guess values to start with. Just make your best guess.
% Now the next line is where the actual model computation is done.
mdl = fitnlm(tbl, modelfun, beta0);
% Now the model creation is done and the coefficients have been determined.
% YAY!!!!
% Extract the coefficient values from the model object.
% The actual coefficients are in the "Estimate" column of the "Coefficients" table that's part of the model.
coefficients = mdl.Coefficients{:, 'Estimate'}
% Let's do a fit, but let's get more points on the fit, beyond just the widely spaced training points,
% so that we'll get a much smoother curve.
X = linspace(min(X), max(X), 1920); % Let's use 1920 points, which will fit across an HDTV screen about one sample per pixel.
% Create smoothed/regressed data using the model:
yFitted = coefficients(1) + coefficients(2) * exp(-((X - coefficients(3))/ coefficients(4)) .^2) + ...
coefficients(5) * exp(-((X - coefficients(6))/ coefficients(7)) .^2) + ...
coefficients(8) * exp(-((X - coefficients(9))/ coefficients(10)) .^2);
% Now we're done and we can plot the smooth model as a red line going through the noisy blue markers.
hold on;
plot(X, yFitted, 'r-', 'LineWidth', 2);
grid on;
title('Sum of Three Gaussians Fit with fitnlm()', 'FontSize', fontSize);
xlabel('X', 'FontSize', fontSize);
ylabel('Y', 'FontSize', fontSize);
legendHandle = legend('Noisy Y', 'Fitted Y', 'Location', 'northeast');
legendHandle.FontSize = 25;
% Set up figure properties:
% Enlarge figure to full screen.
set(gcf, 'Units', 'Normalized', 'OuterPosition', [0 0 1 1]);
% Get rid of tool bar and pulldown menus that are along top of figure.
% set(gcf, 'Toolbar', 'none', 'Menu', 'none');
% Give a name to the title bar.
set(gcf, 'Name', 'Demo by ImageAnalyst', 'NumberTitle', 'Off')
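On the question of using the actual distribution from the histogram instead of an analytical formula: here is a minimal sketch of how that could look, assuming the file holds bin centers and bin percentages. The bin values below are made up for illustration (they are not the poster's data), and the variable names are hypothetical.

```matlab
% Hypothetical stand-in for the histogram data (not the values in matlab.xlsx).
binCenters = (1000:2000:39000)';                      % Assumed load-bin centers.
binPercent = exp(-((binCenters - 13000) / 5000).^2);  % Assumed bin weights.

% Normalize the bin weights so they sum to 1 (an empirical probability mass function).
pmf = binPercent / sum(binPercent);

% The empirical cumulative distribution is the running sum of the PMF.
empCdf = cumsum(pmf);

% Summary statistics computed directly from the binned distribution.
meanLoad = sum(binCenters .* pmf);
stdLoad = sqrt(sum(((binCenters - meanLoad).^2) .* pmf));

% First bin at which the cumulative probability reaches 50%.
medianBin = binCenters(find(empCdf >= 0.5, 1));

fprintf('Mean = %.1f, Std = %.1f, Median bin = %g\n', meanLoad, stdLoad, medianBin);
```

With real data, binCenters and binPercent would come from the two columns read by xlsread(). The statistics are only as fine-grained as the bin width allows.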
  1 Comment
MAHMOUD ALZIOUD on 26 Sep 2018
this is genius and beautiful, I thank you very very much for your amazing help


More Answers (2)

dpb on 25 Sep 2018
Plotting the data, it definitely is not normal; it has a long RH tail and isn't symmetric.
For hypothesis testing it would be better to go back to the underlying data from which the histogram was made if you have it.
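A minimal sketch of what testing the underlying data could look like, assuming the Statistics and Machine Learning Toolbox is available. The rawLoads vector is synthetic and purely illustrative; it would be replaced by the real raw measurements.

```matlab
% Synthetic stand-in for the raw data (assumption: replace with the real values).
rng(0);                                   % Make the example reproducible.
rawLoads = [6000 + 2000 * randn(3000, 1); 13000 + 2500 * randn(7000, 1)];

% Fit a single normal distribution; its parameters are in pd.mu and pd.sigma.
pd = fitdist(rawLoads, 'Normal');
fprintf('Fitted normal: mu = %.1f, sigma = %.1f\n', pd.mu, pd.sigma);

% Lilliefors test: the null hypothesis is that the data come from a normal
% distribution with unspecified mean and variance. h = 1 means reject at 5%.
% MATLAB may warn when the p-value falls outside its tabulated range.
[hLillie, pLillie] = lillietest(rawLoads);

% One-sample Kolmogorov-Smirnov test against the fitted distribution.
% Caveat: the parameters were estimated from the same data, so this p-value
% is only approximate; the Lilliefors test accounts for that.
[hKS, pKS] = kstest(rawLoads, 'CDF', pd);

fprintf('lillietest: h = %d, p = %g\n', hLillie, pLillie);
fprintf('kstest:     h = %d, p = %g\n', hKS, pKS);
```

Other candidates (for example 'Exponential' or 'Lognormal') can be passed to fitdist() the same way and compared. With tens of thousands of rows, even small departures from normality tend to be flagged as significant, so a plot such as histfit() or probplot() is worth checking alongside the tests.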
  4 Comments
MAHMOUD ALZIOUD on 25 Sep 2018
Actually, when I went back to the original data (55000 rows), I found out that it is normal.
dpb on 25 Sep 2018
By what measure? As IA says, it looks bimodal (if not trimodal; that's a kinda suspicious hump at the LH side of the central lobe) and the RH tail is definitely not consistent with a Gaussian.
If the raw data look markedly different that would be surprising.



Image Analyst on 25 Sep 2018
Since your data didn't look like one Gaussian to me, I fit it to the sum of two Gaussians with the attached m-file. I got this:
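The attached m-file is not reproduced in this thread. As a rough sketch of what a two-Gaussian fit with fitnlm() might look like (not necessarily what the attachment did), here is a version run on synthetic data. The data values, starting guesses, and the names modelfun2, mdl2, and coefficients2 are all assumptions made for illustration.

```matlab
% Synthetic stand-in for the histogram data (assumed values, not the poster's file).
rng(0);                                   % Make the example reproducible.
X = linspace(0, 40000, 200)';
Y = 3 * exp(-((X - 6000) / 2000).^2) + ...
    18 * exp(-((X - 13000) / 2500).^2) + 0.2 * randn(size(X));
tbl = table(X, Y);

% Offset plus two Gaussians.
% b = [offset, amplitude1, center1, width1, amplitude2, center2, width2]
modelfun2 = @(b, x) b(1) + b(2) * exp(-((x(:, 1) - b(3)) / b(4)).^2) + ...
    b(5) * exp(-((x(:, 1) - b(6)) / b(7)).^2);

% Starting guesses; the fit can fail to converge if these are far off.
beta0 = [0, 2, 5000, 1500, 15, 12000, 3000];
mdl2 = fitnlm(tbl, modelfun2, beta0);
coefficients2 = mdl2.Coefficients{:, 'Estimate'};

% Plot the data and the fitted curve.
plot(X, Y, 'b.', 'MarkerSize', 15);
hold on;
plot(X, modelfun2(coefficients2, X), 'r-', 'LineWidth', 2);
grid on;
legend('Noisy Y', 'Fitted Y', 'Location', 'northeast');
```

Note that the widths b(4) and b(7) in this parameterization are not standard deviations: since exp(-((x - mu)/w)^2) equals exp(-(x - mu)^2 / (2*sigma^2)) when w = sigma*sqrt(2), the standard deviation is the fitted width divided by sqrt(2).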
