Variational Bayes one-dimensional Gaussian mixture model
Hi, I am trying to develop a variational Bayes one-dimensional Gaussian mixture model following the Penny and Roberts paper (there are N-D implementations on the MATLAB File Exchange, but their results do not always converge, so I decided to write my own code). The function I have written so far sometimes converges in 4-5 iterations and sometimes just returns NaN values. I have debugged my code and cross-checked it against the equations, but I haven't figured out what is causing it to behave so erratically. Would someone on this forum be willing to take a look and give me some feedback? I would really appreciate the help!
Thank you
X = Gaussian mixture of 1D data
function [label, model, L] = mixGaussVb_SB(X)
% Variational Bayesian inference for a 1D Gaussian mixture.
% Input:
%   X: 1 x n data vector
% Output:
%   label: 1 x n cluster labels
%   model: trained model structure
%   L:     variational lower bound history (currently unused)
fprintf('Variational Bayesian Gaussian mixture: running ... \n');
X = X(:); % work with a column internally; the per-component sums below assume n x 1
N = length(X);
%% assign priors
prior.m0 = mean(X); prior.v0 = ((max(X)-min(X))/3).^2; % prior on each component mean
prior.b0 = 1e3; prior.c0 = 1e-3;                       % Gamma prior on each component precision
prior.lambda0 = 5;                                     % Dirichlet prior on the mixing weights
%% VBGM
tol = 1e-8;
maxiter = 2000;
L = -inf(1,maxiter);
model = init(X,prior,N);
for iter = 2:maxiter
    iter % intentionally unsuppressed: displays the iteration number
    prevmu = model.m;
    model = expect(X,model);
    model = maximize(X,model,N,prior);
    if sum((abs(model.m)-abs(prevmu)).^2) < tol
        break;
    end
    % L(iter) = bound(X,model,prior,N)/N;
    % if abs(L(iter)-L(iter-1)) < tol*abs(L(iter)); break; end
end
L = L(2:iter);
label = zeros(1,N);
[~,label(:)] = max(model.R,[],2);
[~,~,label(:)] = unique(label);
function model = init(X,prior,N)
idx = kmeans(X,2); % seed with a 2-component k-means partition
K = length(unique(idx));
for s = 1:K
    Xs = X(idx==s);
    pi_hat(s) = length(Xs)/length(X);
    mu(s) = mean(Xs);
    beta(s) = 1./var(Xs);
end
m = mu;                       % initial posterior mean of each component mean
v = (1./beta)./(N.*pi_hat);   % initial posterior variance of each component mean
b = var(beta)./beta;          % initial Gamma scale of each precision
c = beta./b;                  % initial Gamma shape of each precision
lambda = 100.*pi_hat;         % initial Dirichlet counts for the weights
model.R = full(sparse(1:N,idx,1,N,max(idx),N)); % one-hot responsibilities from k-means
model.m = m;
model.v = v;
model.lambda = lambda;
model = maximize(X,model,N,prior);
function model = maximize(X, model,N,prior)
R = model.R;
m = model.m;
v = model.v;
for s = 1:length(m)
    pi_bar(s) = (1/N)*sum(R(:,s));      % average responsibility of component s
    N_bar(s)  = N*pi_bar(s);            % effective number of points in component s
    y_est(s)  = (1/N)*(R(:,s)'*X);      % responsibility-weighted first moment
    y_est2(s) = (1/N)*(R(:,s)'*X.^2);   % responsibility-weighted second moment
    % update hyperparameters
    % mixing weights (Dirichlet)
    lambda(s) = N_bar(s) + prior.lambda0;
    % precision (Gamma)
    sigma_bar2(s) = y_est2(s) + pi_bar(s)*(m(s).^2 + v(s)) - 2*m(s)*y_est(s);
    b(s) = 1/((N/2)*sigma_bar2(s) + 1/prior.b0);
    c(s) = 0.5*N_bar(s) + prior.c0;
    % mean (Gaussian)
    tau0 = 1/prior.v0;
    beta_bar(s) = b(s)*c(s);            % posterior mean of the precision
    tau(s) = tau0 + N_bar(s)*beta_bar(s);
    m(s) = (tau0/tau(s))*prior.m0 + ((N_bar(s)*beta_bar(s))/tau(s))*(y_est(s)/pi_bar(s));
end
tmp1 = gamrnd(c,b);                % one draw from each precision posterior
tmp2 = gamrnd(lambda,1);
v = 1./tau;
model.mean   = normrnd(m,sqrt(v)); % normrnd takes a std; v is a variance
model.sigma2 = 1./tmp1;
model.weight = tmp2./sum(tmp2);    % a Dirichlet draw via normalized Gammas
model.lambda = lambda;
model.b = b;
model.c = c;
model.v = v;
model.m = m;
model.beta_bar = beta_bar;
model.N_bar = N_bar;
% Done
function model = expect(X, model)
lambda = model.lambda;
c = model.c;
b = model.b;
m = model.m;
v = model.v;
beta_bar = model.beta_bar;
for s = 1:length(m)
    tmp_lambda = lambda(~ismember(lambda,lambda(s))); % Dirichlet counts excluding component s
    pi_est(s)   = exp(psi(lambda(s)) - psi(sum(tmp_lambda)));
    beta_est(s) = exp(psi(c(s)) + log(b(s)));          % exp(E[log beta_s])
    pdf_w(:,s)  = pi_est(s)*sqrt(beta_est(s))* ...
        exp(-0.5*beta_bar(s)*(X.^2 + m(s)^2 + v(s) - 2*m(s).*X));
end
R = bsxfun(@rdivide, pdf_w, sum(pdf_w, 2)); % normalize responsibilities row-wise
model.logR = log(R);
model.R = R;
model.pi_est = pi_est;
model.beta_est = beta_est;
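For reference, here is a minimal driver of the kind the function expects. The mixture parameters and sample sizes below are arbitrary illustrative choices, not values from an actual failing case:

% Synthetic 1D two-component mixture (all parameters made up for illustration)
rng(1);                                                % reproducible draws
X = [normrnd(-2, 0.5, 1, 300), normrnd(3, 1, 1, 200)]; % 1 x 500 data vector
[label, model, L] = mixGaussVb_SB(X);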
2 Comments
Walter Roberson
on 26 Sep 2016
We need two example inputs: one for which it converges, and one for which it gives NaN.
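For instance, a failing input can be captured and attached with:

save('nan_input.mat', 'X'); % save the X that produces NaN so others can reproduce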
Answers (1)
Walter Roberson
on 26 Sep 2016
I notice that the code you attached is not exactly the same as the code you posted; the attached file appears to be more thoroughly commented.
I loaded your X from nan_input.mat and ran
GMMVb_SB(X.')
By iteration #12, all entries of R(:,1) have become 0 in maximize. That makes N_bar(1) equal to 0, which in turn leaves c(1) assigned prior.c0. When you then reach beta_est(s)=exp(psi(c(s))+log(b(s))); in expect, that c(s) is so small that psi(c(1)) comes out as -789.025866305206, log(b(1)) only adds about 6.5 to that, and exp() of the result is 0. Your pdf_w(:,1) becomes entirely 0, the sum() of the pdf_w there is still 0, rdivide of 0/0 is NaN, and from then on the entire rest of your calculation is polluted.
The entries of R(:,1) do not all start out as 0, though some of them do.
I do not know anything about the calculation algorithm you are using (I am not familiar with the math of it). At the moment I do not see any inherent reason why it should not be possible for all of the values to "migrate" to one side or the other. When you are working with exp() and probabilities, underflow to 0 or overflow to inf is a perpetual risk.
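One standard safeguard, offered here as a suggestion rather than as part of the Penny and Roberts scheme, is to build the unnormalized responsibilities in the log domain and subtract each row's maximum before exponentiating, so the normalizing sum is never 0:

function R = normalizeLogResp(logW)
% Row-wise normalization of N x K unnormalized log responsibilities
% (the log-sum-exp trick).
M = max(logW, [], 2);                 % row-wise maximum; finite if any entry is finite
R = exp(bsxfun(@minus, logW, M));     % largest entry in each row becomes exp(0) = 1
R = bsxfun(@rdivide, R, sum(R, 2));   % denominator is at least 1, so no 0/0

In expect this would mean keeping psi(lambda(s)) - psi(sum(tmp_lambda)) and psi(c(s)) + log(b(s)) as logs, adding them into logW(:,s) together with the -0.5*beta_bar(s)*(...) term, and only exponentiating inside the normalization.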