Generate random numbers given distribution/histogram
    31 views (last 30 days)
  
       Show older comments
    
    David C
      
 on 25 Oct 2012
  
    
    
    
    
    Commented: Himanshu Tanwar
 on 25 May 2022
            MATLAB provides built-in functions to generate random numbers with an uniform or Gaussian (normal) distribution. My question is: if I have a discrete distribution or histogram, how can I can generate random numbers that have such a distribution (if the population (numbers I generate) is large enough)?
Please post here if anyone knows of a good method of doing this.
Thanks, David
0 Comments
Accepted Answer
  Jonathan Epperl
      
 on 27 Oct 2012
        Since nobody has any suggestions, here's one. If you have a discrete distribution, say it is a Nx2 matrix PD, first column the discrete values, second the probabilities of the corresponding value -- so sum(PD(:,2))==1.
Then map the probablities to the unit interval and use rand. What mean by that:
% Those are your values and the corr. probabilities:
PD =[
  1.0000    0.1000
  2.0000    0.3000
  3.0000    0.4000
  4.0000    0.2000];
% Then make it into a cumulative distribution
D = cumsum(PD(:,2));
% D = [0.1000    0.4000    0.8000    1.0000]'
Now for every r generated by rand, if it is between D(i) and D(i+1), then it corresponds to an outcome PD(1,i+1), with the obvious extension at i==0. Here's a way you could do that, even though I'm sure there are better ones:
R = rand(100,1); % Your trials
p = @(r) find(r<pd,1,'first'); % find the 1st index s.t. r<D(i);
% Now this are your results of the random trials
rR = arrayfun(p,R);
% Check whether the distribution looks right:  
hist(rR,1:4)
% It does, roughly 10% are 1, 30% are 2 and so on
If you want more help you should post a minimal example of the form in which you have the discrete distribution.
5 Comments
  Aasheesh Dixit
 on 8 Jun 2020
				one change is required: 
p = @(r) find(r<d,1,'first'); % find the 1st index s.t. r<D(i);
  Himanshu Tanwar
 on 25 May 2022
				Overall code may be written (with some slight modifications) as:
dx = 0.001;
x = -100 : dx : 100; % limits depending on the random variable definition.
% For example, (dx : dx : 100) for Rayleigh (0 not recommended).
f = PDF(x);
F = cumsum(f) * dx; % CDF
F = F / F(end); % recommended when max(F) is close to 1. Otherwise increase x - points to achieve F(end) - max value - close to 1.
N = 10000;
U = rand(1, N);
pdf2rand = @(u) x(find(u <= F, 1, 'first'));
X = arrayfun(pdf2rand, U);
X % desired random variable points
function f = PDF(x)
% Probability Distribution Function
...
end
More Answers (2)
  Image Analyst
      
      
 on 28 Oct 2012
        I didn't notice your question or I would have answered, especially since it's asked so often. Try RANDRAW ( http://www.mathworks.com/matlabcentral/fileexchange/7309-randraw) for a list of common distributions. Or to refresh yourself on the theory of why using the CDF works, see Wikipedia: http://en.wikipedia.org/wiki/Inverse_transform_sampling
  Theron FARRELL
      
 on 30 Apr 2019
        
      Edited: Theron FARRELL
      
 on 30 Apr 2019
  
      Hi there,
I use this naive function to generate artificial outliers applied in machine learning. Hope that it will be a bit help in your case.
function [Out_Data, Out_PDF, CHist] = Complement_PDF(Hist, Data_Num, p)
% Generate a 1D vector of data with a PDF specified as the complementary PDF of input historgram. Note that the larger
% Data_Num is, the more Out_PDF will resemble to CHist
% Input
% Hist: PDF/Histogram of data
% Data_Num: Desired number of data to be generated
% p: Precision given by number of digits after 0
% Output
% Out_Data: Generated data as per the complementary PDF
% Out_PDF: The complementary PDF as per Out_Data
% CHist: The complementary PDF as per Hist
% Example
% Hist = [1, 6, 7, 100, 0, 0, 0, 2, 3, 5];
% Data_Number = 100000;
% p = 3
Hist = Hist/sum(Hist);
CHist = 1- Hist;
CHist = CHist/sum(CHist);
CDF_CHist = cumsum(CHist);
CDF_CHist = double(int32(CDF_CHist*10^p))/10^p;
Out_Data = zeros(1, Data_Num);
Out_PDF = zeros(1, length(CDF_CHist));
for i = 1:Data_Num
    % Generate a uniformly distributed variable
    x = double(int32(rand*10^p))/10^p;
    % Inversely index CDF
    Out_Data(i) = Inverse_CDF(x, CDF_CHist);
    temp = floor(Out_Data(i) * length(CDF_CHist));
    Out_PDF(temp) = Out_PDF(temp) + 1;
end
figure;
subplot 221, bar(Hist); 
subplot 222, bar(CHist); 
subplot 223, plot(CDF_CHist);
subplot 224, bar(Out_PDF);
end
function [y] = Inverse_CDF(x, CDF_CHist)
CDF_CHist_Ext = [0, CDF_CHist];
y = 1;
for ind = 1:length(CDF_CHist)
    if (x >= CDF_CHist_Ext(ind)) && (x < CDF_CHist_Ext(ind+1))
        y = ind/length(CDF_CHist);
        break;
    end
end
end

0 Comments
See Also
Categories
				Find more on Random Number Generation in Help Center and File Exchange
			
	Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!






