Fastest possible code for AUC between a continuous predictor and a binary target
Hi Folks,
I am on the hunt for the fastest possible MATLAB code that computes the exact empirical area under the receiver operating characteristic curve (AUC). MATLAB's built-in perfcurve returns the AUC as its 4th output, but it is terribly slow. Here is my current fastest code; I would appreciate suggestions on whether and how to make it faster. The cumsum(flipud(x)) steps appear optimizable, but I have not found a faster alternative so far. Another lead could be that all variables involved are integers until the very last normalization step, /(2*s0*s1).
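%AUC - Area under the receiver operating characteristic (ROC) curve for a binary target vector y
% and a continuous or binary predictor vector x (both column vectors)
function a=auc(y,x)
n=numel(x); s1=sum(y); s0=n-s1; % s1 = number of positives, s0 = number of negatives
if islogical(x) %Binary x case a = (1+tp-fp)/2
tp=sum(x & y); fp=(sum(x)-tp)/s0; tp=tp/s1; a=(1+tp-fp)/2; % tp and fp end up as rates
else %Continuous numerical x case
[x,j]=sort(x); % ascending order; j reorders y to match
i=[find(diff(x)~=0);n]; c=diff([0;i]); % i = last index of each run of tied x, c = run sizes
y=cumsum((y(j,:))); y=diff([0;y(i,:)]); % number of positives in each run
tp=[0;cumsum(flipud(y))]; fp=[0;cumsum(flipud(c))]-tp; % cumulative counts, highest x first
a=sum(diff(fp).*(tp(1:end-1,:)+tp(2:end,:)))/(2*s0*s1); % trapezoidal rule, then normalization
end
a=max(a,1-a); % report the AUC regardless of predictor direction (always >= 0.5)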
A quick speed test to beat (on a small portable laptop):
x=rand(1e7,1); y=rand(1e7,1)>.5; tic;a=auc(y,x);t=toc; [a t]
ans =
0.5001 1.7108
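One possible way to avoid the cumsum(flipud(...)) steps, offered as an untested-for-speed sketch rather than a proven improvement: sort in descending order, so that the cumulative sum of the reordered y, read at the last index of each tied run, already gives the true-positive counts. The toy data and the names xs, cy are hypothetical; the final max(a,1-a) step of the function above is left out.

```matlab
% Hypothetical toy data: column vectors, with ties in x
x = [0.1; 0.4; 0.4; 0.8; 0.35; 0.8];
y = logical([0; 1; 0; 1; 0; 1]);

n  = numel(x);
s1 = nnz(y);          % number of positives
s0 = n - s1;          % number of negatives

[xs, j] = sort(x, 'descend');        % highest predictor values first
i  = [find(diff(xs) ~= 0); n];       % last index of each run of tied values
cy = cumsum(double(y(j)));           % running count of positives
tp = [0; cy(i)];                     % cumulative positives at each threshold
fp = [0; i] - tp;                    % cumulative negatives at each threshold

% Trapezoidal area under the (fp, tp) curve, normalized once at the end
a = sum(diff(fp) .* (tp(1:end-1) + tp(2:end))) / (2*s0*s1);
```

For this toy input the result is 17/18. Whether the variant is actually faster on 1e7 elements would need to be measured, since sort is likely to dominate the run time either way.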
Answers (1)
nnz() on a logical matrix is faster than sum() on the matrix
foo = rand(1,1e7) < 0.1;
N = 50;
t1 = zeros(N,1);
t2 = zeros(N,1);
for K = 1 : N; t1(K) = timeit(@() sum(foo), 0); end
for K = 1 : N; t2(K) = timeit(@() nnz(foo), 0); end
plot([t1, t2])
legend({'sum', 'nnz'})
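Applied to the binary branch of the auc function from the question, the suggestion could look like the sketch below. It assumes x and y are logical column vectors (nnz counts nonzero entries, so it matches sum only for logical or 0/1 data); the input values are hypothetical.

```matlab
% Hypothetical logical inputs
y = logical([1; 0; 1; 1; 0; 0; 1; 0]);   % binary target
x = logical([1; 0; 0; 1; 1; 0; 1; 0]);   % binary predictor

n  = numel(x);
s1 = nnz(y);                 % positives, replaces sum(y)
s0 = n - s1;                 % negatives

tp = nnz(x & y);             % true positives, replaces sum(x & y)
fp = (nnz(x) - tp) / s0;     % false-positive rate
tp = tp / s1;                % true-positive rate
a  = (1 + tp - fp) / 2;      % a is 0.75 for this input
```

In the continuous branch only the s1 computation would benefit, so any gain there is probably small next to the cost of sort.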
3 Comments
dymitr ruta
on 10 Oct 2021
Walter Roberson
on 10 Oct 2021
The second section has no useful comments. I do not know what it is intended to do or why it is doing things that way.
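As a reading aid for the continuous branch, here is a sketch that runs its steps one statement per line on a small input and compares the result with a brute-force pairwise count. The toy data and the names xs, cy, p, d, a_ref are hypothetical, and the final max(a,1-a) step is omitted.

```matlab
% Hypothetical toy data: column vectors, with ties in x
x = [0.1; 0.4; 0.4; 0.8; 0.35; 0.8];
y = logical([0; 1; 0; 1; 0; 1]);
n = numel(x); s1 = nnz(y); s0 = n - s1;

% Steps of the continuous branch
[xs, j] = sort(x);                   % ascending predictor values
i  = [find(diff(xs) ~= 0); n];       % last index of each run of tied values
c  = diff([0; i]);                   % number of samples in each run
cy = cumsum(double(y(j)));           % running count of positives
p  = diff([0; cy(i)]);               % positives in each run
tp = [0; cumsum(flipud(p))];         % true-positive counts, threshold swept high to low
fp = [0; cumsum(flipud(c))] - tp;    % false-positive counts at the same thresholds
a  = sum(diff(fp) .* (tp(1:end-1) + tp(2:end))) / (2*s0*s1);   % trapezoidal area

% Brute-force reference: share of (positive, negative) pairs ranked correctly,
% with tied pairs counted as one half
d = bsxfun(@minus, x(y), x(~y).');   % s1-by-s0 matrix of pairwise differences
a_ref = (nnz(d > 0) + 0.5*nnz(d == 0)) / (s1*s0);
```

Both a and a_ref come out as 17/18 here. The reference builds an s1-by-s0 matrix, so it is only practical for checking small inputs.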
dymitr ruta
on 10 Oct 2021
Edited: dymitr ruta
on 10 Oct 2021
