Generating a histogram from lognormally distributed data

Good Afternoon,
I am trying to modify the following script
mu = 0.015; % mean particle size
sg = 1.6; % standard deviation
Ntot = 1000; % total number concentration of particles
nbins = 50; % number of bins
data = lognrnd(mu,sg,[Ntot,1]); % generate data
h = histogram(data,nbins); % display data as histogram
Nconcs = h.Values; % determine number concentration
rMin = min(data); % find the minimum particle radius in the distribution
rMax = max(data); % find the maximum particle radius in the distribution
rs = linspace(rMin,rMax,nbins); % create vector containing particle radii
scatter(rs,Nconcs); % plot data
I would like to be able to provide the values for minumum and maximum particle radius but receive a similar output. The trouble I am having is I cannot figure out a way of generating the random data between these values. Because the data does not have to be random I have also tried using lognpdf and providing the values...
mu = 0;
sg = 1;
rs = linspace(0.05:0.01:1.25); % range of particle radii
data = lognpdf(rs,mu,sg); % lognormal pdf
...after which I am finding myself getting stuck. Below I shall state what Inputs I have and the Outputs I am trying to acheive.
Inputs:
  • mu and sg: the mean particle size and the standard deviation
  • rMin and rMax: the minimum and maximum particle size
  • Ntot: the total number concentration of particles
Outputs:
  • rs: a vector containing particle radii (ideally lognormally spaced)
  • Nconcs: a vector containing the approximate number concentration of each particle size corresponding to each element in rs
  • figure(1) : a plot with rs on the x axis and Nconcs on the y axis (ideally with a lognormally spaced x-axis)
  • figure(2): a histogram seperating radii into 50 bins of equal width showing the number concentration within each bin
I hope someone can help and i'm not missing anything too simple!!!

Answers (1)

What about something like this:
mu = 0.015; % mean particle size
sg = 1.6; % standard deviation
Ntot = 1000; % total number concentration of particles
nbins = 50; % number of bins
data = lognrnd(mu,sg,[Ntot,1]); % generate data
% only keep data from the range you want:
inRange = (data >= rMin) & (data <= rMax);
data = data(inRange);
h = histogram(data,nbins); % display data as histogram
Nconcs = h.Values; % determine number concentration
rMin = min(data); % find the minimum particle radius in the distribution
rMax = max(data); % find the maximum particle radius in the distribution
rs = linspace(rMin,rMax,nbins); % create vector containing particle radii
scatter(rs,Nconcs); % plot data

6 Comments

Hi Jeff,
Many thanks for your answer. I did find this useful however I'm not sure it had the desired effect. I have made some further modifications to my code but I am still having some trouble.
rMin = 0.05; % minumum particle radius
rMax = 1.25; % maximum particle radius
nr = 50; % number of radii being tracked
rs = logspace(log10(rMin),log10(rMax),nr); % vector containing radii
M = mean(rs); % mean radius
V = var(rs); % variance
mu = log(M^2/sqrt(V+M^2));
sg = sqrt(log(V/M^2 + 1));
Y = lognpdf(rs,mu,sg);
plot(rs,Y,'Linewidth',2);
xlabel('Particle Radius \mu m');
ylabel('PDF');
Is there a way I can use the PDF to create a vector showing the probability of a particle being a certain size given by rs? I am assuming the elements of such a vector would add up to one?
I am concerned that because the vector rs is lognormally spaced I will not be able to use a histogram to determine the bin counts. Unless there is a way to create 50 lognormally spaced histogram bins where the mean in each bin is equal to each element in rs?
Sorry, I can't really understand what you are trying to do. But it sounds like 'makedist' and 'truncate' would help.
lndist = makedist('lognormal',mu,sg); % create a lognormal distribution object
trunclndist = truncate(lndist,lnrMin,lnrMax); % make a truncated version of it
rspdf = pdf(trunclndist,rs); % compute the probability density at each point in rs, conditional on being in the required interval
(I am a bit confused about which numbers are values come from the underlying normal distribution and which numbers are logs of those values.)
About the vector elements adding up to one...no, that isn't correct. The integral of the pdf over its entire range is 1, but the individual pdf values can be anything consistent with that integral. For example, if the range is only 0.1 units wide, then the average pdf value has to be 10--already much more than 1 at each point--to make the integral 1.
Hope something in here is helpful...
Thanks Jeff,
I have found the 'truncate' function very helpful indeed!!! This has certainly made things a bit easier. My script now looks like this and hopefully it gives a clearer idea of what I am trying to acheive.
mu = 0.034; % distribution parameters
sigma = 2.1;
pd = makedist('Lognormal',mi,sigma); % create distribution object
rMin = 0.05; % minimum particle size
rMax = 1.25; % maximum particle size
Ntot = 1000; % total number concentration of particles
nbins = 50;
pdT = truncate(pd,rMin,rMax); % keep values of pd within specified range
Y = random(pdT,Ntot,1); % generate samples
figure(1)
h = histogram(Y,nbins); % create histogram
xlim = ([0,2.5]);
xlabel('Particle Radius, \mum');
ylabel('PDF');
Nconcs = h.Values; % number concentration in each bin
rs = ((h.BinEdges(2:end))+(h.BinEdges(1,(end-1))))/2; % mean radius in each bin
figure(2)
plot(rs,Nconcs,'Linewidth',2);
xlabel('Particle Radius, \mum');
ylabel('Number Concentration, cm^3');
Whilst this works the data is of course random. Is there any way of generating a continuous pdf with the same parameters (mu, sigma, rMin, rMax, Ntot) and then discretising this into bins and evaluating the approximate number concentration and midpoint in each bin? Ideally producing similar figures?
Thanks again for your help!!!
pdf(pdT,x)
is the continuous pdf, and you can set it with whatever parameters you want. Note that Ntot is not really a parameter of the pdf. The pdf is a representation of what happens with a single observation, whereas Ntot is the number of observations.
You can discretize into bins however you like. For any bin i extending from x_low(i) to x_hi(i), the probability of an observation being in that bin is
bin_probability(i) = cdf(pdT,x_hi(i))-cdf(pdT,x_low(i))
Multiply the bin probability by Ntot and you will have the theoretically expected number in each bin. Of course the midpoint of each bin is just the average of its low and hi values.
Still not exactly sure what you are trying to do, but note that you could (if you wanted) also look at each bin separately with something like
pdT(i) = truncate(pd,x_low(i),x_hi(i));
This would be useful if you wanted, say, the mean in each bin rather than the midpoint (these might be a bit difference if the pdf is not flat or symmetric within the bin).
I thought it best to provide some further background on what I am trying to acheive as I feel my explanations aren't very clear. I am trying to represent an aerosol population as described in the document below:
Available to me are the parameters Ntot, mu and sigma. I also have the lower and upper limits of r, which are rMin and rMax.
% code attempt for the first formula
pdf = Ntot/((sqrt(2*pi)*log(sigma))*exp(-(log(rs./mu).^2)./(2*log(sigma^2)));
I am not too sure how to go about the discrete approximation. The goal is to get two vectors containing the number concentraion and size at the mean value in each bin so I can create a plot similar to this one:
Hopefully this helps explain things a bit better.
Thanks again!!!
Sorry I am not being helpful. I 'm not sure I understand this background and have nothing to add to my previous suggestion.
BTW, your code attempt for pdf does not look like a legal pdf to me, because I think it will integrate to Ntot, whereas pdfs should integrate to 1.

Sign in to comment.

Products

Release

R2019b

Asked:

on 23 Feb 2021

Commented:

on 25 Feb 2021

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!