Error in calculated dendrogram (compared with R)

I am trying to make a dendrogram in MATLAB based on a dissimilarity matrix X. I converted it into a format that linkage accepts by using Y=squareform(X), which gives the following vector:
Y = [0 0 0 0 1.4476 0 0 0 0 0 0 0 0 0 1.7525 0 0 0 0 0 1.9634 11.3676 0 0 0 2.1579 14.4957 0 0 0 0 0 0 0 0 0];
I then tried to plot a dendrogram using complete linkage:
dendrogram(linkage(Y,'complete'))
and get the following result:
wrong_dendrogram.jpg
However, when I use the dissimilarity matrix X in R (also with the complete linkage method) I get a completely different dendrogram:
real_dendrogram.jpg
The dendrogram made with R makes a lot more sense for the input data than the MATLAB one.
Any idea why they are so different? And is there a way to reproduce the R result in MATLAB? I am trying to write an automated script that creates the dendrogram at the end, so having to switch programs every time would not be ideal.

Answers (1)

Darshak on 24 Jun 2025
This is a common issue when switching between R and MATLAB for hierarchical clustering. Both use similar terminology and functions, but there are subtle differences in how they treat the dissimilarity input, especially zero values, and in the default leaf ordering of the dendrogram. It took me a while to figure this out too.
Some points to investigate that might help replicate the R result more accurately in MATLAB:
  • Zero off-diagonal values in the dissimilarity matrix can cause unintended behavior in MATLAB. The “linkage” function treats a zero distance as a perfect match (i.e., identical observations), so those pairs are clustered immediately. R might not handle them the same way unless explicitly instructed.
  • Make sure the dissimilarity matrix you pass to “squareform” has zeros only on the diagonal, unless the samples in question are truly identical. If the off-diagonal zeros are just placeholders or artifacts, replacing them with a small value like “eps” can prevent misleading clustering.
  • Before passing the matrix to “linkage”, you can use a loop like the following to replace such values:
for i = 1:size(X,1)
    for j = 1:size(X,2)
        if i ~= j && X(i,j) == 0
            X(i,j) = eps; % very small positive number
        end
    end
end
  • Then convert to vector form with “squareform” and run “linkage”:
Y = squareform(X);
Z = linkage(Y, 'complete');
  • MATLAB and R also use different strategies for leaf ordering by default. In MATLAB, you can get a cleaner leaf order using:
dendrogram(Z, 0, 'Reorder', optimalleaforder(Z, Y));
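The function “optimalleaforder” flips branches so that adjacent leaves are as similar as possible. This does not change the tree itself, only the order in which the leaves are drawn, and it can give a more interpretable dendrogram that may align more closely with R's visual structure.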
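As a rough sketch, the check-and-replace steps above can also be written without nested loops by using a logical mask. The 4-by-4 matrix here is a made-up stand-in for X (not the data from the question), and the variable name offDiagZero is only illustrative. The sketch assumes the Statistics and Machine Learning Toolbox is available for “squareform” and “linkage”.

```matlab
% Hypothetical 4-by-4 dissimilarity matrix standing in for the asker's X
X = [0   0   2.0 3.0;
     0   0   2.5 3.5;
     2.0 2.5 0   1.0;
     3.0 3.5 1.0 0  ];

% squareform expects a symmetric matrix with a zero diagonal
assert(isequal(X, X.'), 'X must be symmetric');
assert(all(diag(X) == 0), 'X must have a zero diagonal');

% Locate off-diagonal zeros with a logical mask instead of nested loops
offDiagZero = (X == 0) & ~eye(size(X));
fprintf('%d off-diagonal zero entries\n', nnz(offDiagZero));

% Same replacement as the loop above: zeros become eps, diagonal untouched
X(offDiagZero) = eps;

Y = squareform(X);            % condensed distance vector
Z = linkage(Y, 'complete');   % complete-linkage tree
```

Counting the off-diagonal zeros first is a cheap way to see how much of the matrix is affected. Keep in mind that eps is about 2.2e-16, so pairs that received it will still be merged before any other pair; whether that is acceptable depends on what the zeros mean in your data.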
If there are still discrepancies, it could be worth printing the merge steps in both environments (the Z matrix in MATLAB and, if hclust was used in R, its merge and height components) and comparing step by step how the clusters are formed.
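One possible way to do that comparison on the MATLAB side is to print each row of Z as a readable merge step. This is only a sketch: the distance vector below is invented for illustration and is not the data from the question.

```matlab
% Hypothetical condensed distance vector for 4 observations (not the asker's data)
% Pair order: (1,2) (1,3) (1,4) (2,3) (2,4) (3,4)
Y = [1.0 4.0 5.0 4.5 5.5 2.0];
Z = linkage(Y, 'complete');
n = size(Z, 1) + 1;   % number of original observations

% Each row of Z is one merge: two cluster indices and the merge height.
% Indices <= n are original observations; index n+k is the cluster formed at step k.
for k = 1:size(Z, 1)
    fprintf('step %d: merge %d and %d at height %.4f -> cluster %d\n', ...
        k, Z(k, 1), Z(k, 2), Z(k, 3), n + k);
end
```

When lining this up against R, note that the indexing conventions differ: in the merge component of an hclust result, negative numbers denote original observations and positive numbers refer to earlier merge steps, while MATLAB numbers new clusters from n+1 upward. The merge heights are the easiest thing to compare first.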

Release

R2019b
