How to cluster discrete data
Hi!
I have a database containing discrete features, for example the number of hairpin loops, the number of elements, the length of a sequence, and the percentage of A nucleotides. Now I would like to apply some clustering algorithms. Does anyone know which algorithms in MATLAB are suited for discrete data?
Thanks a lot, Iene
Answers (1)
Purvaja
on 5 Feb 2025
There are various ways to obtain clusters. You can refer to the following methods:
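- K-means clustering: The function “kmeans” partitions data into k mutually exclusive clusters and returns the index of the cluster to which it assigns each observation. It requires the number of clusters in advance. By default it uses squared Euclidean distance, so features on very different scales (for example, sequence length versus a percentage) are usually standardized first. (https://www.mathworks.com/help/stats/k-means-clustering.html)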
[idx, C] = kmeans(data, k); % k is the number of clusters
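A fuller sketch of the k-means step, assuming the features are stored as the columns of a numeric matrix. The data here are synthetic and hypothetical (two made-up groups of sequences with integer-valued features). `zscore` puts the features on a common scale, and `evalclusters` with the silhouette criterion is one way to pick k; the candidate range 2:5 is an arbitrary assumption.

```matlab
% Hypothetical synthetic data: rows = sequences, columns =
% [hairpin loops, elements, sequence length, % A], all integer-valued
rng(0);  % make the random data and k-means initialization reproducible
n = 50;
groupA = [randi([0 3], n, 1), randi([5 10], n, 1), randi([100 200], n, 1), randi([20 30], n, 1)];
groupB = [randi([5 9], n, 1), randi([15 25], n, 1), randi([400 600], n, 1), randi([35 45], n, 1)];
data = [groupA; groupB];

% Standardize each column so sequence length does not dominate the distance
Z = zscore(data);

% Choose k with the silhouette criterion over an assumed range of candidates
eva = evalclusters(Z, 'kmeans', 'silhouette', 'KList', 2:5);
k = eva.OptimalK;

% Several replicates reduce the risk of ending in a poor local optimum
[idx, C] = kmeans(Z, k, 'Replicates', 5);
```

To cluster in the original units instead, pass `data` directly. With integer counts, `'Distance', 'cityblock'` (which uses component-wise medians as centroids) is another option worth trying.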
- K-medoids clustering: “kmedoids” is similar to “kmeans”, but each cluster center is an actual observation (a medoid), which makes it more robust to noise and outliers. It also requires the number of clusters. (https://www.mathworks.com/help/stats/kmedoids.html)
[idx, C] = kmedoids(data, k); % k is the number of clusters
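Because each medoid is a real observation, k-medoids keeps the cluster representatives on the original discrete grid (whole numbers of loops, elements, and so on). The sketch below uses hypothetical synthetic integer-valued data, clusters the standardized features with the city-block (L1) distance, which is one reasonable choice for counts and an assumption rather than a requirement, and then looks up the medoid rows in the original units. k = 2 is assumed.

```matlab
% Hypothetical synthetic data: [hairpin loops, elements, sequence length, % A]
rng(0);
n = 50;
groupA = [randi([0 3], n, 1), randi([5 10], n, 1), randi([100 200], n, 1), randi([20 30], n, 1)];
groupB = [randi([5 9], n, 1), randi([15 25], n, 1), randi([400 600], n, 1), randi([35 45], n, 1)];
data = [groupA; groupB];
Z = zscore(data);  % common scale for all features

k = 2;  % assumed number of clusters
% The 5th output, midx, holds the row indices of the medoids in Z
[idx, C, ~, ~, midx] = kmedoids(Z, k, 'Distance', 'cityblock', 'Replicates', 5);

% Medoids are actual rows, so the representatives are still integer-valued
representatives = data(midx, :);
disp(representatives)
```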
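- DBSCAN (Density-Based Spatial Clustering of Applications with Noise): Unlike “kmeans”, the “dbscan” function does not require prior knowledge of the number of clusters. Instead it needs a neighborhood radius (epsilon) and a minimum number of neighbors (minPts), and it labels points that belong to no cluster as noise (index -1). It accepts a choice of distance metric and can be applied to discrete data. (https://www.mathworks.com/help/stats/dbscan-clustering.html)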
epsilon = 0.5; % Neighborhood radius (distance threshold)
minPts = 5; % Minimum number of neighbors required for a core point
idx = dbscan(data, epsilon, minPts);
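The values epsilon = 0.5 and minPts = 5 are placeholders; a suitable epsilon depends on the scale of the data. A common heuristic, sketched below, is to compute each point's distance to its minPts-th nearest neighbor (counting the point itself), sort those distances, and pick epsilon near the "knee" where they start to rise sharply. Taking the 95th percentile stands in for reading the knee off a plot; that shortcut and the synthetic data are assumptions for illustration only.

```matlab
% Hypothetical synthetic data: [hairpin loops, elements, sequence length, % A]
rng(0);
n = 50;
groupA = [randi([0 3], n, 1), randi([5 10], n, 1), randi([100 200], n, 1), randi([20 30], n, 1)];
groupB = [randi([5 9], n, 1), randi([15 25], n, 1), randi([400 600], n, 1), randi([35 45], n, 1)];
data = [groupA; groupB];
Z = zscore(data);  % epsilon is easier to choose on a common scale

minPts = 5;
% Column 1 of D is each point's zero distance to itself;
% the last column is the distance to the minPts-th nearest neighbor
[~, D] = knnsearch(Z, Z, 'K', minPts);
kdist = sort(D(:, end));
% plot(kdist)  % inspect the curve and look for the knee
epsilon = prctile(kdist, 95);  % assumed shortcut for the knee

idx = dbscan(Z, epsilon, minPts);
nClusters = numel(unique(idx(idx > 0)));
nNoise = sum(idx == -1);
fprintf('%d clusters, %d noise points\n', nClusters, nNoise);
```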
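- Gaussian mixture models (GMM): GMM clustering can accommodate clusters that have different sizes and correlation structures. Note that it models each cluster as a continuous Gaussian distribution, so it is only an approximation when applied to discrete features. (https://www.mathworks.com/help/stats/clustering-using-gaussian-mixture-models.html)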
gm = fitgmdist(data, k); % k is the number of clusters
idx = cluster(gm, data);
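Discrete features often contain many repeated values, which can make a fitted covariance matrix ill-conditioned so that `fitgmdist` fails. A sketch under that assumption: add a small `RegularizationValue` (0.01 here is an arbitrary choice), compare candidate numbers of components by BIC (lower is better), and use `posterior` for soft cluster memberships. The data are hypothetical synthetic integer-valued features, and the candidate range 1:4 is also an assumption.

```matlab
% Hypothetical synthetic data: [hairpin loops, elements, sequence length, % A]
rng(0);
n = 50;
groupA = [randi([0 3], n, 1), randi([5 10], n, 1), randi([100 200], n, 1), randi([20 30], n, 1)];
groupB = [randi([5 9], n, 1), randi([15 25], n, 1), randi([400 600], n, 1), randi([35 45], n, 1)];
data = [groupA; groupB];
Z = zscore(data);

kList = 1:4;  % assumed candidate numbers of components
models = cell(size(kList));
bic = zeros(size(kList));
opts = statset('MaxIter', 500);
for i = 1:numel(kList)
    % RegularizationValue is added to the covariance diagonals to keep them positive definite
    models{i} = fitgmdist(Z, kList(i), 'RegularizationValue', 0.01, 'Options', opts);
    bic(i) = models{i}.BIC;
end
[~, best] = min(bic);
gm = models{best};

idx = cluster(gm, Z);   % hard assignment: most probable component
P = posterior(gm, Z);   % soft assignment: one probability per component
```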
To check out more methods, you can refer to the following resource:
You can also access release-specific documentation using these commands in your MATLAB command window:
web(fullfile(docroot, 'stats/k-means-clustering.html'))
web(fullfile(docroot, 'stats/kmedoids.html'))
web(fullfile(docroot, 'stats/dbscan-clustering.html'))
web(fullfile(docroot, 'stats/clustering-using-gaussian-mixture-models.html'))
Hope this helps you!