Attempt on k-nearest neighbor pdf estimate in 1D

Question

Jonne Klockars on 31 Oct 2022

https://ch.mathworks.com/matlabcentral/answers/1839958-attempt-on-k-nearest-neighbor-pdf-estimate-in-1d

Answered: Akshat on 1 Sep 2023

I'm trying to write a function that estimates a pdf in one dimension with the k-nearest-neighbours method. I've gone through it several times and can't figure out what is wrong. The visualisation shows that my 'pdf' is clearly not what it should be: there's a peak on top of one sample, and the region where the samples are densest is flat. Any advice and corrections are appreciated! Here is my code; the test data 't122' is a 1x10 vector, i.e. ten 1D samples:

x = [0.553766713954610,0.683388501459509,0.274115313899635,0.586217332036812,0.531876523985898,0.369231170369473,0.456640797769432,0.534262446653865,0.857839693972576,0.776943702988488];
d = size(x,1);
d2 = size(x,2);
% k samples inside the Parzen window
k = 3; % sqrt(N) is a good guess for optimal k
% plotting the samples and the estimated pdf
xAxis = linspace(0,1,100);
plot(xAxis,nnPdf(xAxis,x,k));
title('t122 on the real line with nn-estimated pdf');
hold on;
plot(x,0,'o','MarkerSize',25);
legend(sprintf('%d nearest neighbours pdf',k),'t122');

And here is the function:

% k nearest neighbours 1D pdf-estimator function nnPdf()
% inputs:
% x0 = interval for the pdf
% x = data for which the pdf is estimated
% k = number of samples in every Parzen window
% output:
% V = 1D-pdf estimated with k nearest neighbours
function V = nnPdf(x0,x,k)
v = zeros(length(x0),size(x,2)); % for distances to all samples
V = zeros(length(x0),1); % for distance needed to include k samples
if k > size(x,2)
    disp('*Invalid value for k: not so many samples in the data.');
    return
end
standardize(x);
for i = 1:length(x0)
    for j = 1:size(x,2)
        % distance from interval point to all samples
        v(i,j) = abs(x0(i)-x(j));
    end
    % sorted distances so v_ik is the distance for reaching to the
    % kth sample from the point x0_i
    sort(v,2);
    % window size V at point x0_i based on the distance (volume in 1D)
    V(i) = (k/size(x,2)) * 1/v(i,k);
end

And the outcome:


Answer 1

Akshat on 1 Sep 2023

https://ch.mathworks.com/matlabcentral/answers/1839958-attempt-on-k-nearest-neighbor-pdf-estimate-in-1d#answer_1299486

Hi Jonne,

I reproduced your code on my end using MATLAB R2023a. Please note the following differences; the full code is pasted below as well. I have attached the resulting pdf graph here.

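In the "nnPdf" function, you call "standardize", which is not a built-in MATLAB function. The function "zscore" (Statistics and Machine Learning Toolbox) can standardize data, but it returns the standardized values rather than changing x in place, so the result has to be assigned, e.g. x = zscore(x). Keep in mind that standardizing only the samples moves them relative to the evaluation grid x0, so for this plot the call can simply be left out.
While sorting, you haven't assigned the sorted values back to v, so the result is discarded and v stays unsorted. Sort the current row and assign it back. Code: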

[v(i,:), ~] = sort(v(i,:));
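
To illustrate why the assignment matters, here is a minimal sketch showing that "sort" returns a sorted copy and leaves its input untouched. The variables a and b and their values are made up for this example and do not come from the question.

```matlab
a = [0.3 0.1 0.2];   % illustrative distances, not the question's data
sort(a);             % the sorted copy is discarded; a is unchanged
b = sort(a);         % assigning the output keeps the sorted copy
disp(a)              % a is still in its original order
disp(b)              % b is in ascending order
```

The same holds for sort(v,2) in the original function: without an assignment on the left-hand side, v keeps its original column order.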

The line where you calculate "V(i)" needs care. Once row i of v is sorted in ascending order, "v(i,k)" is already the distance to the k-th nearest neighbour: MATLAB indexing starts from 1, and the grid points in x0 are not themselves samples, so there is no zero self-distance to skip. Indexing "v(i,k+1)" would pick the (k+1)-th neighbour instead and would run out of bounds when k equals the number of samples. In addition, the 1D window that reaches the k-th neighbour extends v(i,k) on both sides of x0(i), so its length is 2*v(i,k) and the estimate is k/(N*2*v(i,k)).

The full code with these changes is:

x = [0.553766713954610,0.683388501459509,0.274115313899635,0.586217332036812,0.531876523985898,0.369231170369473,0.456640797769432,0.534262446653865,0.857839693972576,0.776943702988488];
d = size(x,1);
d2 = size(x,2);
% k samples inside the Parzen window
k = 3; % sqrt(N) is a good guess for optimal k
% plotting the samples and the estimated pdf
xAxis = linspace(0,1,100);
plot(xAxis,nnPdf(xAxis,x,k));
title('t122 on the real line with nn-estimated pdf');
hold on;
plot(x,0,'o','MarkerSize',25);
legend(sprintf('%d nearest neighbours pdf',k),'t122');
% k nearest neighbours 1D pdf-estimator function nnPdf()
% inputs:
% x0 = interval for the pdf
% x = data for which the pdf is estimated
% k = number of samples in every Parzen window
% output:
% V = 1D-pdf estimated with k nearest neighbours
function V = nnPdf(x0,x,k)
    v = zeros(length(x0),size(x,2)); % for distances to all samples
    V = zeros(length(x0),1); % estimated pdf at each point of x0
    if k > size(x,2)
        disp('*Invalid value for k: not so many samples in the data.');
        return
    end
    % x is left unscaled so that it stays on the same axis as x0
    for i = 1:length(x0)
        for j = 1:size(x, 2)
            % distance from interval point to all samples
            v(i, j) = abs(x0(i) - x(j));
        end
        % sorted distances so v(i,k) is the distance for reaching the
        % kth sample from the point x0(i)
        [v(i, :), ~] = sort(v(i, :));
        % the window in 1D has length 2*v(i,k) (k-th neighbour on either side)
        V(i) = k / (size(x, 2) * 2 * v(i, k));
    end
end
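
The same kind of estimate can also be written without loops. The sketch below assumes the standard 1D k-nearest-neighbour form p(x0) = k/(N*2*r_k(x0)), where r_k(x0) is the distance from x0 to its k-th nearest sample. The names xs, D, rk and p and the three-sample data set are made up for illustration, and the subtraction relies on implicit expansion (R2016b or later).

```matlab
% Minimal sketch, assuming the standard 1D k-NN density estimate
% p(x0) = k / (N * 2 * r_k(x0)); all names and data here are illustrative.
xs = [0.2 0.5 0.9];               % made-up samples (row vector)
k  = 2;                           % number of neighbours per window
x0 = [0 0.5 1];                   % made-up evaluation points

D  = abs(x0(:) - xs(:).');        % numel(x0)-by-N matrix of distances
D  = sort(D, 2);                  % sort each row ascending and keep the result
rk = D(:, k);                     % distance to the k-th nearest sample
p  = k ./ (numel(xs) * 2 * rk);   % window length in 1D is 2*rk
```

Note that a k-nearest-neighbour estimate is not guaranteed to integrate to 1, because its tails fall off only like 1/|x|, so the curve is best read as a local density estimate rather than a normalized pdf.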

Hope it helps!
