Error in Hosmer Lemeshow GOF test calculation

3 views (last 30 days)
Dear all,
I am trying to calculate the Hosmer and Lemeshow Good-of-Fitness test, following the steps presented in the book "Logistic Regression. A Self- Learning Text" by David Kleinbaum and M. Klein.
The following functions calculates all the steps, gives as outputs the chi-squared, degrees of freedom and the p-value, as well as some interesting plots described in the manual, which look fine to me.
However, I get a completely different result when I run this test in R (holsem.test, from ResourceSelection package).
I was surprised I could not find any code online, so maybe this could be useful in the future. Neither could I see what I am doing wrong. Maybe R uses a very different method?
If you are also keen to find out and see this work, please feel free to join.
Thank you a lot,
% This function performs the Hoslem-Lemeshow Goodness-of-Fit test
% 06-January-2017 davidnielsen@id.uff.br
%
% Inputs
% Q: The number of groups or quantiles (normally Q=10).
% y: 1D array of observations. Must be a series of {0,1}.
% fit: 1D array of the model's fitted values.
% Outputs
% chi2: The Hosmer-Lemeshow chi-square statistics.
% df: Degrees-of-freedom.
% p: p-value
function [chi2,df,p]=hoslem(fit,y,Q)
[ordems,I]=sort(fit,'descend'); % Sorts the fitted values
% Calculates the length of each decile.
% Note that the division of length/10 may not be exact. Therefore, the
% considered series may differ from original inputs in up to 10
% observations, at most. This error shold be to non-significant for very
% long series.
qlen=round(length(y)/Q)-1;
qlens=zeros(Q+1,1);
for i=1:Q+1
qlens(i,1)=1+(i-1)*qlen; % Determines the thresholds of deciles
end
qlims=zeros(Q+1,1); % Positions of thresholds
for i=1:Q+1
qlims(i,1)=ordems(qlens(i,1));
end
plot(1:Q+1,qlims,'o')
% Couts Observed Cases por Decile (Ocq)
ocq=zeros(Q,1); t=1; count=0;
for i=1:max(qlens)-1
if count==qlen
t=t+1; count=0;
end
ocq(t,1)=ocq(t,1)+(y(I(i)));
count=count+1;
end
% Couts Non-Observed Cases por Decile (Oncq)
oncq=zeros(Q,1); t=1; count=0;
for i=1:max(qlens)-1
if count==qlen
t=t+1; count=0;
end
if (y(I(i)))==0
oncq(t,1)=oncq(t,1)+1;
end
count=count+1;
end
figure(1)
plot(ocq); hold on; plot(oncq)
legend('Observed cases','Non-observed cases'); hold off
% Expected cases (Ecq) e noncases (Encq) per Decile
ecq=zeros(Q,1);
t=1; count=0;
for i=1:max(qlens)-1
if count==qlen
t=t+1; count=0;
end
ecq(t,1)=ecq(t,1)+ordems(i);
count=count+1;
end
figure(2)
plot(ecq,ocq,'-o'); hold on;
xlabel('Expected cases'); ylabel('Oobserved cases');
encq=qlen-ecq;
% Calculates HL statistics
chi2= sum(sum(((ocq-ecq).^2)/ecq)) + sum(sum(((oncq-encq).^2)/encq));
df=Q-2;
p=1-chi2cdf(chi2,Q-2);
end

Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!