Euclidean distance-based clustering with predetermined number of members

33 views (last 30 days)
Hello to you all
I have a data point that contains points in the 2D coordinate, and I want to cluster these points based on the minimum distance between them to the K group. Each cluster will have a predetermined number of members, for example, five members, like the following picture. Note that remained data points, will be unclustered.
Is there any function at Matlab that help me?
  2 Comments
mohammad zahiri
mohammad zahiri on 4 Jun 2020
Edited: mohammad zahiri on 4 Jun 2020
I want only a certain number of points that are adjacent to each other to be clustered and the rest of the points to remain intact.
Can these functions do that?

Sign in to comment.

Accepted Answer

Image Analyst
Image Analyst on 4 Jun 2020
You could easily 😉 write your own loop to do it. Just start with the closest pair of points and keep assigning nearby neighbors to that cluster until you reach the required number of neighbors in the cluster. Then increment the cluster number of repeat to find the next cluster. Keep going until all points are gone (used up).
clc; % Clear the command window.
fprintf('Beginning to run %s.m.\n', mfilename);
close all; % Close all figures (except those of imtool.)
clear; % Erase all existing variables. Or clearvars if you want.
workspace; % Make sure the workspace panel is showing.
format long g;
format compact;
fontSize = 22;
numPoints = 200;
pointsPerCluster = 5;
numClusters = ceil(numPoints / pointsPerCluster)
coordinates = zeros(pointsPerCluster, 2, numClusters);
t = table(zeros(numPoints, 1), zeros(numPoints, 1), zeros(numPoints, 1), zeros(numPoints, 1), 'VariableNames', {'ClusterNumber', 'PointNumber', 'x', 'y'});
xy = rand(numPoints, 2);
x = xy(:, 1);
y = xy(:, 2);
subplot(1, 2, 1);
plot(x, y, 'b.', 'MarkerSize', 14);
grid on;
% Get distances of every point to every other point
distances = pdist2(xy, xy);
minDistance = min(distances(distances~=0))
[row, col] = find(distances == minDistance)
currentRow = row(1) % Get first point.
pointer = 1;
for k = 1 : numClusters
% Get distances of every point to every other point
distances = pdist2(xy, xy(currentRow, :));
% Find the closest points.
minDistances = mink(distances, pointsPerCluster);
[ia, ib] = ismember(distances, minDistances);
rows = find(ib);
% Store these coordinates as cluster #k
for n = 1 : length(rows)
t.ClusterNumber(pointer) = k;
t.PointNumber(pointer) = n;
t.x(pointer) = xy(rows(n), 1); % Store x value.
t.y(pointer) = xy(rows(n), 2); % Store y value.
pointer = pointer + 1;
end
if pointer >= numPoints
break; % Quit when all points are used up
end
% Set the current row coordinates to infinity so we know not to consider (use) them again.
xy(rows, :) = inf;
% Get new cluster -- new starting point.
% Get distances of every point to every other point
distances = pdist2(xy, xy);
minDistance = min(distances(distances~=0));
[row, col] = find(distances == minDistance);
currentRow = row(1); % Get first point.
end
% Show clusters in unique colors
subplot(1, 2, 2);
gscatter(t.x, t.y, t.ClusterNumber);
Be aware that as points in closely located clusters get used up, the points available for remaining clusters will be more spread out. I think that's obvious though, right? For example if there are only 2 clusters in 1-D, if your values are [1,3,5, 61,62,63,64,65, 99,100] then cluster #1 will be [61,62,63,64,65] (tightly grouped) and cluster %2 will be [1,3,5, 99,100] (widely spaced).
  5 Comments
stefano chiappini
stefano chiappini on 31 Aug 2021
Hi, How make i load my point cloud dataset in the followinf script?
Which row i must fill with the name of my dataset file?
Thank you so much.
stefano chiappini
stefano chiappini on 31 Aug 2021
My dubt is following: 1) how load my dataset test?
2) i run your script but on line44 shows me an error: Docked figures do not support 'WindowState'.
Error in dbscan_demo (line 44)
hFig.WindowState = 'maximized'; % May not work in earlier versions of MATLAB.
How can i solve these issues?
I am using Matlab R2021a.
Thank you so much.

Sign in to comment.

More Answers (2)

Image Analyst
Image Analyst on 1 Sep 2021
Stefano:
Regarding "How make i plot all cluster inside a unique figure? Because in your script, it show only one cluster at time and i don't able to plot the remain clusters."
Put hold on after a call to subplot or plot, and take away the call to hold off. See code below.
And regarding "Furthermore, How do i set different colour for each cluster on unique plot?" each cluster is already plotted in a different color. For the demo as-is, using default values, cluster 2 has so many points that you don't see the other clusters, which sometimes have only 2 points in them.
If this helps, can you click the "Accept this answer" link? Thanks in advance.
% dbscan attempt with Stefano's data by Image Analyst
clc; % Clear the command window.
close all; % Close all figures (except those of imtool.)
clear; % Erase all existing variables. Or clearvars if you want.
workspace; % Make sure the workspace panel is showing.
format long g;
format compact;
fontSize = 20;
%--------------------------------------------------------------------------------------------------------
% READ IN DATA.
data = readmatrix('test.txt');
x = data(:, 1);
y = data(:, 2);
z = data(:, 3);
% Subtract means
x = x - mean(x);
y = y - mean(y);
z = z - mean(z);
%--------------------------------------------------------------------------------------------------------
% Display the data.
subplot(1, 2, 1);
plot3(x, y, z, '.', 'MarkerSize', 3);
grid on;
xlabel('Column 1');
ylabel('Column 2');
zlabel('Column 3');
title('Original data not classified yet.');
hFig = gcf;
hFig.WindowState = 'maximized'; % May not work in earlier versions of MATLAB.
drawnow;
%--------------------------------------------------------------------------------------------------------------------
% Ask user for search radius.
defaultValue = {'0.3'};
titleBar = 'Enter search radius';
userPrompt = {'Enter search radius: '};
caUserInput = inputdlg(userPrompt, titleBar, 1, defaultValue);
if isempty(caUserInput),return,end % Bail out if they clicked Cancel.
% Convert to floating point from string.
usersValue1 = str2double(caUserInput{1})
% Check usersValue1 for validity.
if isnan(usersValue1)
% They didn't enter a number.
% They clicked Cancel, or entered a character, symbols, or something else not allowed.
% Convert the default from a string and stick that into usersValue1.
usersValue1 = str2double(defaultValue{1});
message = sprintf('I said it had to be a number.\nTry replacing the user.\nI will use %.2f and continue.', usersValue1);
uiwait(warndlg(message));
end
searchRadius = usersValue1;
%--------------------------------------------------------------------------------------------------------------------
% Do clustering with the "dbscan" algorithm.
% [classNumbers, corepts] = dbscan(distances, searchRadius, minPointsPerCluster, 'Distance','precomputed')
% searchRadius = 0.01; % It's in the same cluster if the point is within this of other points.
minPointsPerCluster = 2; % We need to have at least this many point to be considered a valid cluster.
startTime = tic;
fprintf('Starting dbscan() function at %s.\n', datestr(now));
[classNumbers, isACorePoint] = dbscan(data, searchRadius, minPointsPerCluster);
elapsedSeconds = toc(startTime);
fprintf('Done with dbscan() function after %.1f seconds.\nFound %d clusters.\n', elapsedSeconds, max(classNumbers));
%--------------------------------------------------------------------------------------------------------------------
% classNumbers = classNumbers - min(classNumbers) + 1 % Convert negative class numbers to all positive class numbers.
ucn = unique(classNumbers);
% Get a list of colors so that each data point / cluster will have its own unique color.
if length(ucn) <= 7
colorList = lines(length(ucn));
else
colorList = spring(length(ucn));
end
% Put the class label next to each point.
subplot(1, 2, 2);
% Plot each class in a different color.
hold on;
for k = 1 : length(ucn)
thisClassNumber = ucn(k);
thisClassesIndexes = classNumbers == thisClassNumber;
xSubset = x(thisClassesIndexes);
ySubset = y(thisClassesIndexes);
zSubset = z(thisClassesIndexes);
% Plot all the points in this class.
thisColor = colorList(k,:)
plot3(xSubset, ySubset, zSubset, '.', 'Color', thisColor, 'MarkerSize', 1);
% Find the mean (centroid).
meanx = mean(xSubset);
meany = mean(ySubset);
meanz = mean(zSubset);
numPointsInThisCluster = length(xSubset);
% Plot the centroid in yellow.
hold on;
plot3(meanx, meany, meanz, 'y.', 'MarkerSize', 30);
grid on;
if thisClassNumber == -1
fprintf(' Class %d (not in any cluster) has %d points in it and the centroid is at (%.1f, %.1f, %.1f).\n', thisClassNumber, numPointsInThisCluster, meanx, meany, meanz);
promptMessage = sprintf('This is class #%d of %d.\nThese are not in any cluster.\nIt has %d points in it.\nDo you want to Continue processing,\nor Quit processing?', ...
thisClassNumber, max(classNumbers), numPointsInThisCluster);
caption = sprintf('%d isolated points not in any cluster', numPointsInThisCluster);
title(caption, 'FontSize', fontSize);
else
fprintf(' Class %d has %d points in it and the centroid is at (%.1f, %.1f, %.1f).\n', thisClassNumber, numPointsInThisCluster, meanx, meany, meanz);
promptMessage = sprintf('This is class #%d of %d.\nIt has %d points in it.\nDo you want to Continue processing,\nor Quit processing?', ...
thisClassNumber, max(classNumbers), numPointsInThisCluster);
caption = sprintf('%d points in cluster %d.', numPointsInThisCluster, thisClassNumber);
title(caption, 'FontSize', fontSize);
end
titleBarCaption = 'Continue?';
buttonText = questdlg(promptMessage, titleBarCaption, 'Continue', 'Quit', 'Continue');
if contains(buttonText, 'Quit', 'IgnoreCase', true)
break; % or return or continue.
end
end
hold off;
% Update title with number of clusters.
numClusters = sum(ucn > 0);
numClusteredPoints = sum(isACorePoint);
numIsolatedPoints = sum(~isACorePoint);
caption = sprintf('Found %d clusters. Yellow spot at the cluster centroids.\n%d points are in numbered clusters.\n%d red points are not in any cluster.', ...
numClusters, numClusteredPoints, numIsolatedPoints);
title(caption, 'FontSize', fontSize, 'Interpreter', 'None');
g = gcf;
g.WindowState = 'maximized'; % May not work in earlier versions of MATLAB.
  5 Comments
stefano chiappini
stefano chiappini on 27 Sep 2021
Hi, i have a dubt: How i can to export a .txt file formed by x,y,z and clusters?
I would like to upload the output in CloudCompare software.
Thank you so much

Sign in to comment.


stefano chiappini
stefano chiappini on 30 Aug 2021
Hi, i have the same problem with a 3x3 matrix about olive tree point cloud. I want clustering only individual tree applying "Euclidean Clustering". How can I apply the script described above?
I attach the .txt file of input data.
Thank you so much.
  6 Comments
Image Analyst
Image Analyst on 31 Aug 2021
@stefano chiappini I played around with your data for you for about 2 hours. Like I suspected, dbscan is just not a good method. This is how far I got. Feel free to try different search radii and see what classes you can get. Most of the time the vast majority of points are either in no class or the same class, as I suspected in advance.
% dbscan attempt with Stefano's data by Image Analyst
clc; % Clear the command window.
close all; % Close all figures (except those of imtool.)
clear; % Erase all existing variables. Or clearvars if you want.
workspace; % Make sure the workspace panel is showing.
format long g;
format compact;
fontSize = 20;
%--------------------------------------------------------------------------------------------------------
% READ IN DATA.
data = readmatrix('test.txt');
x = data(:, 1);
y = data(:, 2);
z = data(:, 3);
% Subtract means
x = x - mean(x);
y = y - mean(y);
z = z - mean(z);
%--------------------------------------------------------------------------------------------------------
% Display the data.
subplot(1, 2, 1);
plot3(x, y, z, '.', 'MarkerSize', 3);
grid on;
xlabel('Column 1');
ylabel('Column 2');
zlabel('Column 3');
title('Original data not classified yet.');
hFig = gcf;
hFig.WindowState = 'maximized'; % May not work in earlier versions of MATLAB.
drawnow;
%--------------------------------------------------------------------------------------------------------------------
% Ask user for search radius.
defaultValue = {'0.3'};
titleBar = 'Enter search radius';
userPrompt = {'Enter search radius: '};
caUserInput = inputdlg(userPrompt, titleBar, 1, defaultValue);
if isempty(caUserInput),return,end % Bail out if they clicked Cancel.
% Convert to floating point from string.
usersValue1 = str2double(caUserInput{1})
% Check usersValue1 for validity.
if isnan(usersValue1)
% They didn't enter a number.
% They clicked Cancel, or entered a character, symbols, or something else not allowed.
% Convert the default from a string and stick that into usersValue1.
usersValue1 = str2double(defaultValue{1});
message = sprintf('I said it had to be a number.\nTry replacing the user.\nI will use %.2f and continue.', usersValue1);
uiwait(warndlg(message));
end
searchRadius = usersValue1;
%--------------------------------------------------------------------------------------------------------------------
% Do clustering with the "dbscan" algorithm.
% [classNumbers, corepts] = dbscan(distances, searchRadius, minPointsPerCluster, 'Distance','precomputed')
% searchRadius = 0.01; % It's in the same cluster if the point is within this of other points.
minPointsPerCluster = 2; % We need to have at least this many point to be considered a valid cluster.
startTime = tic;
fprintf('Starting dbscan() function at %s.\n', datestr(now));
[classNumbers, isACorePoint] = dbscan(data, searchRadius, minPointsPerCluster);
elapsedSeconds = toc(startTime);
fprintf('Done with dbscan() function after %.1f seconds.\nFound %d clusters.\n', elapsedSeconds, max(classNumbers));
%--------------------------------------------------------------------------------------------------------------------
% classNumbers = classNumbers - min(classNumbers) + 1 % Convert negative class numbers to all positive class numbers.
ucn = unique(classNumbers);
% Get a list of colors so that each data point / cluster will have its own unique color.
if length(ucn) <= 7
colorList = lines(length(ucn));
else
colorList = spring(length(ucn));
end
% Put the class label next to each point.
subplot(1, 2, 2);
% Plot each class in a different color.
% hold on;
for k = 1 : length(ucn)
thisClassNumber = ucn(k);
thisClassesIndexes = classNumbers == thisClassNumber;
xSubset = x(thisClassesIndexes);
ySubset = y(thisClassesIndexes);
zSubset = z(thisClassesIndexes);
% Plot all the points in this class.
plot3(xSubset, ySubset, zSubset, '.', 'Color', colorList(k,:), 'MarkerSize', 1);
% Find the mean (centroid).
meanx = mean(xSubset);
meany = mean(ySubset);
meanz = mean(zSubset);
numPointsInThisCluster = length(xSubset);
% Plot the centroid in yellow.
hold on;
plot3(meanx, meany, meanz, 'y.', 'MarkerSize', 30);
grid on;
hold off;
if thisClassNumber == -1
fprintf(' Class %d (not in any cluster) has %d points in it and the centroid is at (%.1f, %.1f, %.1f).\n', thisClassNumber, numPointsInThisCluster, meanx, meany, meanz);
promptMessage = sprintf('This is class #%d of %d.\nThese are not in any cluster.\nIt has %d points in it.\nDo you want to Continue processing,\nor Quit processing?', ...
thisClassNumber, max(classNumbers), numPointsInThisCluster);
caption = sprintf('%d isolated points not in any cluster', numPointsInThisCluster);
title(caption, 'FontSize', fontSize);
else
fprintf(' Class %d has %d points in it and the centroid is at (%.1f, %.1f, %.1f).\n', thisClassNumber, numPointsInThisCluster, meanx, meany, meanz);
promptMessage = sprintf('This is class #%d of %d.\nIt has %d points in it.\nDo you want to Continue processing,\nor Quit processing?', ...
thisClassNumber, max(classNumbers), numPointsInThisCluster);
caption = sprintf('%d points in cluster %d.', numPointsInThisCluster, thisClassNumber);
title(caption, 'FontSize', fontSize);
end
titleBarCaption = 'Continue?';
buttonText = questdlg(promptMessage, titleBarCaption, 'Continue', 'Quit', 'Continue');
if contains(buttonText, 'Quit', 'IgnoreCase', true)
break; % or return or continue.
end
end
% Update title with number of clusters.
numClusters = sum(ucn > 0);
numClusteredPoints = sum(isACorePoint);
numIsolatedPoints = sum(~isACorePoint);
caption = sprintf('Found %d clusters. Yellow spot at the cluster centroids.\n%d points are in numbered clusters.\n%d red points are not in any cluster.', ...
numClusters, numClusteredPoints, numIsolatedPoints);
title(caption, 'FontSize', fontSize, 'Interpreter', 'None');
g = gcf;
g.WindowState = 'maximized'; % May not work in earlier versions of MATLAB.
stefano chiappini
stefano chiappini on 1 Sep 2021
Edited: stefano chiappini on 1 Sep 2021
HI, thank you so much for your wonderful help in my issue. Since now i have used only DBSCAN algorithm provided by Mathworks(https://it.mathworks.com/help/stats/dbscan.html). The output are similar these ones. I have last question: How make i plot all cluster inside a unique figure? Because in your script, it show only one cluster at time and i don't able to plot the remain clusters. Furthermore, How do i set different colour for each cluster on unique plot?
Thank you so much.

Sign in to comment.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!