Zipf Based Number generation
Show older comments
I am trying to generate numbers based on Zipf distribution.
There is a function named RANDDRAW that generates number for the zipf exponent greater than 1. However, i need to generate Zipf numbers which can be less than 1.
Can anybody help me to generate random numbers based on zipf distribution with any value of zipf exponent?
Answers (4)
John D'Errico
on 31 Mar 2015
Edited: John D'Errico
on 31 Mar 2015
Whenever someone says something like "for ANY exponent", I can bet that they won't be going to throw an easy problem at any offered code.
So, looking at the zipf distribution, it turns out to be an easy one to deal with, sort of. The zipf pdf is simply
p_zipf(x,p) = -x^(p+1)/zeta(p+1)
This is a discrete distribution in x, over the positive integers.
If we look at the pdf, we recognize that zeta(p+1) is the limit of the sum of powers of x, x^-(p+1), where x ranges from 1 to inf. So zeta(p+1) merely normalizes that pdf to have unit area.
Here the zipf exponent that was referred to would be p, I assume, but there are unresolved questions hidden in the question. I'll quote the line you wrote:
"However, i need to generate Zipf numbers which can be less than 1."
So, IF your question is to generate zipf samples that are non-integer, this makes no sense, since the zipf distribution is a discrete one, that samples over the positive integers.
If you are calling the zipf exponent (p+1), and you wish to sample such that p+1 is less than 1, so p is less than zero, then you will have a great deal of trouble.
If your question is to sample for non-integer values of p that are less than one, but greater than zero, then this is possible, but a serious bit of an effort for many values of p less than 1.
By the way, since the zipf pdf involves zeta(p+1), this would employ the symbolic toolbox, because for small p, the zeta function converges pretty slowly.
Sadly, I cannot proceed further without some resolution of the above questions.
Edit: Though I've not yet gotten a response to my questions, I took a quick look at the randraw code. Randraw defines the zipf pdf such that
p_zipf(x,a) = -x^(-a)/zeta(a)
and then requires only that a>1.
It would appear that you want to generate zipf numbers for a<=1, as randraw defines that distribution. If this is so, then you cannot do so. The sum...
sum(x^-a)
where x is the set of positive integers, is inf for a<=1.
This is why randraw requires that a>1, as otherwise, the zeta function has a problem. Even for a near 1, I would predict that randraw becomes slow.
Roger Stafford
on 31 Mar 2015
If you want exponent values less than or equal to one, it is necessary to confine yourself to a specific finite number of terms, since the infinite sum of terms in the zipf distribution would be infinity. See
http://en.wikipedia.org/wiki/Zipf's_law
In the following, N is the number of possible integer values of the distribution, s is the exponent, and the desired size of the generated random matrix is m-by-n.
V = cumsum([0,1./(1:N).^s]');
[~,r] = histc(rand(m*n,1),V/V(end));
r = reshape(r,m,n);
The variable 'r' assumes random integer values from 1 to N in accordance with the zipf distribution.
10 Comments
Tamoor Hassan
on 1 Apr 2015
Roger Stafford
on 1 Apr 2015
"In your given code, i am not understanding the rank of each integer." In the first line the values in 1:N represent all the integer ranks that are to be considered possible in your finite set. In the second and third lines the quantity 'r' gives the particular random rank values from among these that are being generated with the use of 'rand'. Any rank from 1 to N can be generated. The choice of N is up to you. As I mentioned, if s <= 1, the sum of all terms in V would be infinite if N were allowed to be infinity, so you must select a finite value for N as an upper bound to the rank values you are generating. Again I refer you to the article at:
http://en.wikipedia.org/wiki/Zipf's_law
You state that "in each time instant, i generate a single value based on Zipf distribution" . For the above code it would be more efficient to generate ahead of time a 'V' vector for each of your two values of s, which could be used repeatedly, rather than recreating 'V' for each single random generation.
Tamoor Hassan
on 3 Apr 2015
Roger Stafford
on 3 Apr 2015
That's a very important restriction. You should have stated that in the very beginning, Tamoor, so that we would not waste our time!
Tamoor Hassan
on 3 Apr 2015
Roger Stafford
on 3 Apr 2015
To avoid repetitions, the right way to proceed, in my opinion, is to use a "rejection" method. Generate a set of n positive integers which are mutually independent random variables with Zipf distribution but then reject the entire set if there are any repetitions and repeat the operation. For this to be practical it is important for n to be much smaller than N, the number of possible rank levels in the distribution.
V = cumsum([0,1./(1:N).^s]'); V = V/V(end);
t = true;
while t
[c,r] = histc(rand(n,1),V);
t = any(c>1); % Repeat until there are no repetitions
end
At exit from the while-loop the column vector r will contain n positive integers each of which lies in the range 1:N and with no two alike.
Tamoor Hassan
on 4 Apr 2015
Roger Stafford
on 4 Apr 2015
Are you saying that your resulting matrix should have no repetitions along any row nor along any column? If so, that makes it more difficult to achieve. You can try rejecting entire matrices of results which fail this condition:
V = cumsum([0,1./(1:N).^s]'); V = V/V(end);
t = true;
while t
[c,r] = histc(rand(m,n),V);
r2 = diff(sort(r,2),1,2);
t = any(c(:)>1) | any(r2(:)==0);
end
Tamoor Hassan
on 4 Apr 2015
Roger Stafford
on 4 Apr 2015
Your statement "all the columns contain unique values" made me think you wanted no repetitions to occur along any column. If this is not the case, then things are a little easier.
V = cumsum([0,1./(1:N).^s]'); V = V/V(end);
t = true;
while t
[c,r] = histc(rand(n,m),V);
t = any(c(:)>1);
end
r = r';
Tuyen Tran
on 17 Oct 2015
0 votes
Check this out zipf_rand()
Wasseem Suhabuth
on 31 Aug 2016
0 votes
Can ihave the coding on matlab to generate zipf distribution please
Categories
Find more on Noncentral Chi-Square Distribution in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!