Davies-Bouldin, Dunn, and silhouette indices of cluster quality

June 17, 2026

Determining if one cluster solution is better than another can be challenging. A common approach is to calculate some kind of ratio of intra-cluster spread to inter-cluster spread. Ideally, things in the same cluster are all very close to each other (low intra-cluster spread) and things in different clusters are all very far from each other (high inter-cluster spread).

The bad thing (IMO) about this type of ratio-based approach is that it isn’t directly tied to the usefulness of a cluster solution. It’s possible for a solution splitting athletes by height to have a better ratio than one splitting them by play style, even though the latter might be much more useful for Moneyball-style decision making. The good thing about this type of approach is that it’s easy to calculate. Here are a few popular examples.

The Davies-Bouldin index

The Davies-Bouldin index is one example of this approach. Let’s say we’re given the following properties about a cluster solution:

The spread of each cluster: ( $𝑆$ )
The distances between every pair of clusters: ( $𝑀$ )

For any cluster $𝐶_{𝑖}$ , we know how spaced out it is based on the value $𝑆_{𝑖}$ . For any two clusters $𝐶_{𝑖}$ and $𝐶_{𝑗}$ , we know how far apart they are based on the value $𝑀_{𝑖 𝑗}$ .

We want to end up with a single ratio. How’s this:

𝑅_{𝑖 𝑗} = \frac{𝑆_{𝑖} + 𝑆_{𝑗}}{𝑀_{𝑖 𝑗}}

$𝑅$ is the sum of the two clusters’ spreads divided by their distance. Reducing $𝑅$ can be achieved by placing the two clusters farther apart or making either of the clusters denser.

It looks like a pretty decent measure so far. So… can we just average all the $𝑅$ s from all the pairs of clusters and call it a day? Yes! We can do anything! There aren’t any rules out here. But, we can do better.

The problem with this metric as it stands is that it accounts for a lot of unnecessary junk. Consider the green cluster below.

Davies and Bouldin figured that there’s no use in celebrating the separation between the green and red clusters when the green and purple clusters are so much closer together. It is the green cluster’s $𝑅$ value with the purple cluster (the closest one) that should define how neat of a cluster it is.

So they created a bonus intermediate value, $𝑅_{𝑖}$ , defined as the worst (biggest) $𝑅_{𝑖 𝑗}$ value for each cluster: $𝑅_{𝑖} = \max_{𝑗 \neq 𝑖} 𝑅_{𝑖 𝑗}$ . Taking the average of all $𝑅_{𝑖}$ over the $𝑁$ clusters, we get $\bar{𝑅}$ , the Davies-Bouldin index:

\bar{𝑅} \equiv \frac{1}{𝑁} \sum_{𝑖 = 1}^{𝑁} 𝑅_{𝑖}
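The recipe so far can be sketched in a few lines of Python. This is a minimal sketch, not anybody's official implementation: the function name `davies_bouldin` and the toy values are made up for illustration, and it assumes you already have the spreads in a list `S` and the cluster-to-cluster distances in a nested list `M`.

```python
def davies_bouldin(S, M):
    """Davies-Bouldin index from spreads S and cluster distances M.

    S is a list of N spreads. M is an N x N nested list where M[i][j] is the
    distance between clusters i and j (the diagonal is never read).
    """
    n = len(S)
    worst_ratios = []
    for i in range(n):
        # R_ij against every other cluster j; the biggest one becomes R_i.
        ratios = [(S[i] + S[j]) / M[i][j] for j in range(n) if j != i]
        worst_ratios.append(max(ratios))
    # The index is the plain average of the per-cluster worst ratios.
    return sum(worst_ratios) / n


# Hypothetical three-cluster solution: clusters 0 and 1 sit close together.
S = [1.0, 1.0, 2.0]
M = [
    [0.0, 4.0, 10.0],
    [4.0, 0.0, 8.0],
    [10.0, 8.0, 0.0],
]
db_index = davies_bouldin(S, M)
```

Lower is better here: shrinking any spread or pushing clusters apart can only reduce the ratios that feed the average.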

Conceptually, that’s all there is to it. To practically implement this thing, we still need formulae for $𝑆$ and $𝑀$ . We’ll start with the $𝑆$ presented in the original Dave & Buster’s paper [1]. Let’s let:

$| 𝐶_{𝑖} |$ be the number of observations in cluster $𝐶_{𝑖}$
$𝑋_{𝑘}$ be the $𝑘$-th observation in cluster $𝐶_{𝑖}$
$𝐴_{𝑖}$ be the middle (“centroid”) of cluster $𝐶_{𝑖}$

Then we could calculate $𝑆_{𝑖}$ (the spread) of cluster $𝐶_{𝑖}$ as:

𝑆_{𝑖} = \frac{1}{| 𝐶_{𝑖} |} \sum_{𝑘 = 1}^{| 𝐶_{𝑖} |} dist (𝑋_{𝑘}, 𝐴_{𝑖})

I.e., the average Euclidean distance from every point in the cluster to its centroid.
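If you do have the raw coordinates, the centroid-based spread is short to write down. A minimal sketch, assuming each observation is a tuple of floats; the helper names `centroid` and `centroid_spread` are mine, not from the paper.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def centroid_spread(points):
    """Average Euclidean distance from each point to the cluster centroid."""
    a = centroid(points)
    return sum(math.dist(p, a) for p in points) / len(points)


# Four corners of a 2 x 2 square; the centroid is the middle, (1, 1).
cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
spread = centroid_spread(cluster)
```

Every corner is the same distance from the middle, so the spread for this toy cluster is just that one distance.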

I often have to calculate quality measures using nothing more than a pre-computed distance matrix of observations, so I have to do things a little differently. One option is to take the average of all inter-observation distances within a cluster:

𝑆_{𝑖} = \frac{2}{| 𝐶_{𝑖} | \times (| 𝐶_{𝑖} | - 1)} \sum_{𝑗 < 𝑘} dist (𝑋_{𝑗}, 𝑋_{𝑘}) where 𝑋_{𝑗}, 𝑋_{𝑘} \in 𝐶_{𝑖}

Don’t worry for now if you’re not sure where the 2 and the other stuff scaling the summation came from.

Another option is to take the maximum of all inter-observation distances within a cluster:

𝑆_{𝑖} = max_{𝑗, 𝑘} (dist (𝑋_{𝑗}, 𝑋_{𝑘}))
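Both distance-matrix options for the spread can be sketched together. This is a rough sketch under my own assumptions: `dist` is a full symmetric matrix stored as nested lists, `members` holds the row indices belonging to one cluster, and a single-point cluster is given a spread of zero (the post does not say what to do there).

```python
from itertools import combinations


def pairwise_spreads(dist, members):
    """Return (mean, max) of the pairwise distances within one cluster."""
    # combinations() visits each unordered pair exactly once, so no
    # distance is counted twice and the diagonal is never touched.
    pair_dists = [dist[j][k] for j, k in combinations(members, 2)]
    if not pair_dists:
        return 0.0, 0.0  # assumption: a singleton cluster has zero spread
    return sum(pair_dists) / len(pair_dists), max(pair_dists)


# Toy 4 x 4 distance matrix; observations 0, 1, 2 form one cluster.
dist = [
    [0.0, 1.0, 2.0, 9.0],
    [1.0, 0.0, 3.0, 8.0],
    [2.0, 3.0, 0.0, 7.0],
    [9.0, 8.0, 7.0, 0.0],
]
mean_spread, max_spread = pairwise_spreads(dist, [0, 1, 2])
```

Dividing by the number of pairs actually collected sidesteps having to remember the scaling factor by hand.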

We next need to pick an $𝑀$ formula. The recommendation in the paper is the distance between the centroids of the two clusters. If you, like me, have nothing but a distance matrix to work from, you might want to consider some of the options below.

Let $𝑋_{𝑘}$ be the $𝑘$-th observation in cluster $𝐶_{𝑖}$ and $𝑌_{𝑙}$ be the $𝑙$-th observation in cluster $𝐶_{𝑗}$ .

Single-linkage: The smallest distance between $𝐶_{𝑖}$ and $𝐶_{𝑗}$ :

𝑀_{𝑖 𝑗} = min_{𝑘, 𝑙} (dist (𝑋_{𝑘}, 𝑌_{𝑙}))

Complete-linkage: The largest distance between $𝐶_{𝑖}$ and $𝐶_{𝑗}$ :

𝑀_{𝑖 𝑗} = max_{𝑘, 𝑙} (dist (𝑋_{𝑘}, 𝑌_{𝑙}))

Average: The average of all inter-distances between $𝐶_{𝑖}$ and $𝐶_{𝑗}$ :

𝑀_{𝑖 𝑗} = \frac{1}{| 𝐶_{𝑖} | | 𝐶_{𝑗} |} \sum_{𝑘, 𝑙} dist (𝑋_{𝑘}, 𝑌_{𝑙})

Where, $| 𝐶_{𝑖} |$ and $| 𝐶_{𝑗} |$ are the sizes of clusters $𝐶_{𝑖}$ and $𝐶_{𝑗}$ respectively.

Hausdorff: This one’s a mouthful. For every observation in cluster $𝐶_{𝑖}$ , find its shortest distance to cluster $𝐶_{𝑗}$ . Take the max of that and call it $𝐷_{𝑖 𝑗}$ . Do the same for every observation in cluster $𝐶_{𝑗}$ to cluster $𝐶_{𝑖}$ and call that $𝐷_{𝑗 𝑖}$ . Finally, take the max of $𝐷_{𝑖 𝑗}$ and $𝐷_{𝑗 𝑖}$ . I strongly encourage you to avert your gaze from the equation below; it does nothing but bestow on this calculation a flying facade of danger:

𝑀_{𝑖 𝑗} = max (max_{𝑘} (min_{𝑙} (dist (𝑋_{𝑘}, 𝑌_{𝑙}))), max_{𝑙} (min_{𝑘} (dist (𝑌_{𝑙}, 𝑋_{𝑘}))))

Horrible.
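In code, the four options look a lot less scary than the Hausdorff equation suggests. Here is a minimal sketch that computes all of them from one cross-cluster block of a distance matrix; the function name `cluster_distances` and the toy matrix are assumptions for illustration.

```python
def cluster_distances(dist, ci, cj):
    """Single, complete, average and Hausdorff distances between two clusters.

    dist is a full symmetric distance matrix (nested lists); ci and cj hold
    the observation indices of the two clusters.
    """
    # cross[a][b] is the distance from the a-th point of ci to the b-th of cj.
    cross = [[dist[k][l] for l in cj] for k in ci]
    flat = [d for row in cross for d in row]
    # For each point in ci, its nearest point in cj; keep the worst case.
    d_ij = max(min(row) for row in cross)
    # Same thing from cj's side, walking the columns instead of the rows.
    d_ji = max(min(col) for col in zip(*cross))
    return {
        "single": min(flat),
        "complete": max(flat),
        "average": sum(flat) / (len(ci) * len(cj)),
        "hausdorff": max(d_ij, d_ji),
    }


dist = [
    [0.0, 1.0, 4.0, 6.0],
    [1.0, 0.0, 3.0, 5.0],
    [4.0, 3.0, 0.0, 2.0],
    [6.0, 5.0, 2.0, 0.0],
]
M_options = cluster_distances(dist, [0, 1], [2, 3])
```

Whichever option you pick, use the same one for every pair of clusters so the $𝑅$ values stay comparable.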

The good news is, we’re done! You now have everything you need to work out the Davies-Bouldin index calculation for yourself.

One implementation note to watch out for: don’t forget to take your averages properly. The clv package’s calculation of the Davies-Bouldin index (whose tragic removal from CRAN has catalyzed this adventure) made a small slip-up on this. When calculating mean intra-cluster distance from a data matrix, the package uses the C code:

average_intracluster[cluster_i] +=
    2*(dist/(cluster_size[cluster_i]*(cluster_size[cluster_i]-1)));

cluster_scatter.c: Line 193

But when calculating the same thing from a distance matrix, the package uses:

average_intracluster[cluster_i] +=
    (dist/(cluster_size[cluster_i]*(cluster_size[cluster_i]-1)));

cluster_scatter.c: Line 433

$𝑁 \times 𝑁$ self-distance matrices are tricky in that their diagonals are useless and half of their off-diagonals are duplicates of the other half.

If you subtract off the diagonals and do the appropriate halving, the number of terms to divide over for the average is:

\frac{𝑁^{2} - 𝑁}{2}
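If that term count feels like it came out of nowhere, a brute-force check can help. This is a small sketch (the function name `count_unique_pairs` is made up) that counts the distinct off-diagonal pairs among $𝑁$ observations and puts the count next to the formula above.

```python
from itertools import combinations


def count_unique_pairs(n):
    """Count the unordered pairs (j, k) with j < k among n observations."""
    return sum(1 for _ in combinations(range(n), 2))


# Brute-force count next to (N^2 - N) / 2 for a few matrix sizes.
checks = {n: (count_unique_pairs(n), (n * n - n) // 2) for n in (2, 3, 5, 10)}
```

Each entry of `checks` pairs the counted value with the formula's value, so the two can be compared size by size.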

For both of the snippets presented above, dist is the sum of the upper triangle of a cluster_size[cluster_i] $\times$ cluster_size[cluster_i] distance matrix. If you’re bored, you can try to figure out which of the two snippets above is correct.

The Dunn index

Once you understand the Davies-Bouldin index, you get to understand the Dunn index for free. It’s the smallest inter-cluster distance divided by the largest intra-cluster distance [2]:

\frac{{min}_{𝑖, 𝑗, 𝑖 \neq 𝑗} 𝑀_{𝑖 𝑗}}{{max}_{𝑖} 𝑆_{𝑖}}
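As a minimal sketch, assuming the same hypothetical `S` list and `M` nested list used for the Davies-Bouldin calculation, the Dunn index is nearly a one-liner. The function name `dunn_index` is mine.

```python
def dunn_index(S, M):
    """Smallest inter-cluster distance divided by the largest spread."""
    n = len(S)
    # Only off-diagonal entries count as inter-cluster distances.
    min_separation = min(M[i][j] for i in range(n) for j in range(n) if i != j)
    return min_separation / max(S)


# Hypothetical three-cluster solution.
S = [1.0, 1.0, 2.0]
M = [
    [0.0, 4.0, 10.0],
    [4.0, 0.0, 8.0],
    [10.0, 8.0, 0.0],
]
dunn = dunn_index(S, M)
```

Note the direction flips relative to Davies-Bouldin: here a bigger value means a better solution.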

I dare say I find this index a little less impressive than the former. While the Davies-Bouldin index captures information about every cluster involved, the Dunn index exclusively focuses on the worst clusters. If you had a 10 cluster solution where 8 clusters were spectacularly compact and well-separated but two were fuzzy clouds right beside each other, it would appear awful by the Dunn index but, rightfully, quite good by the Davies-Bouldin index.

The silhouette

The most powerful and complicated of the tailed beasts is the lowercased silhouette value. The silhouette value (or “score”) was created by Peter Rousseeuw, a statistics professor currently at Katholieke Universiteit Leuven in Belgium. He seems to have constructed all sorts of zany methods apart from this one. But who could be surprised after knowing that his PhD advisor’s advisor’s advisor’s advisor’s advisor’s advisor’s advisor’s advisor’s two advisors were Laplace and Lagrange?

Rather than calculating a spread value for each cluster, the silhouette score starts from a per-observation average distance to all other observations in its own cluster. If a cluster has $𝑁$ observations, you’ll end up with $𝑁$ of these averages, each taken over $𝑁 - 1$ distances.

This time, let’s call $𝑋_{𝑘}^{𝑖}$ the $𝑘$-th observation in cluster $𝐶_{𝑖}$ and $𝑋_{𝑙}^{𝑗}$ the $𝑙$-th observation in cluster $𝐶_{𝑗}$ . Sorry for the change-up on the indexing; it’ll make things easier.

The average distance of $𝑋_{𝑘}^{𝑖}$ to all the other observations in its own cluster $𝐶_{𝑖}$ is denoted by $𝑎 (𝑘)$ :

𝑎 (𝑘) = \frac{1}{| 𝐶_{𝑖} | - 1} \sum_{𝑙, 𝑙 \neq 𝑘} dist (𝑋_{𝑘}^{𝑖}, 𝑋_{𝑙}^{𝑖})

Repeat for all $𝑋_{𝑘}$ to get all your $𝑎$ values.

The average distance of $𝑋_{𝑘}^{𝑖}$ to all the observations in a separate cluster $𝐶_{𝑗}$ is:

\frac{1}{| 𝐶_{𝑗} |} \sum_{𝑙} dist (𝑋_{𝑘}^{𝑖}, 𝑋_{𝑙}^{𝑗})

This handsome, nameless set of values tells us how far $𝑋_{𝑘}^{𝑖}$ is from $𝐶_{𝑗}$ , much like the $𝑀$ values did for whole clusters. If we minimize it across all clusters $𝐶_{𝑗}$ where $𝑗 \neq 𝑖$ , we’ll have found which cluster is specifically closest to our observation $𝑋_{𝑘}^{𝑖}$ of cluster $𝐶_{𝑖}$ . This is the $𝑏$ value:

𝑏 (𝑘) = min_{𝑗 \neq 𝑖} \frac{1}{| 𝐶_{𝑗} |} \sum_{𝑙} dist (𝑋_{𝑘}^{𝑖}, 𝑋_{𝑙}^{𝑗})

Repeat for all $𝑋_{𝑘}$ to get all your $𝑏$ values.

To recap, for each of our data points, we have calculated a single $𝑎$ value measuring its average distance to every other point in its cluster and a single $𝑏$ value measuring its average distance to the points of its closest neighboring cluster.

The hard part is done now. Each data point gets a silhouette score calculated as:

𝑠 (𝑘) = \frac{𝑏 (𝑘) - 𝑎 (𝑘)}{max (𝑎 (𝑘), 𝑏 (𝑘))}

Unless an entire cluster has only a single data point, in which case that data point gets a silhouette score of 0.
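Putting the $𝑎$ values, the $𝑏$ values, and the singleton rule together gives something like the sketch below. It is a minimal sketch under a few assumptions of mine: `dist` is a full symmetric distance matrix as nested lists, `labels` gives each observation's cluster, there are at least two clusters, and a point whose $𝑎$ and $𝑏$ are both zero gets a score of 0. The function name `silhouette_scores` is made up.

```python
def silhouette_scores(dist, labels):
    """Per-observation silhouette scores from a distance matrix and labels."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    scores = []
    for k, own_label in enumerate(labels):
        own = clusters[own_label]
        if len(own) == 1:
            scores.append(0.0)  # a singleton cluster scores 0 by definition
            continue
        # a(k): average distance to the other members of k's own cluster.
        a = sum(dist[k][l] for l in own if l != k) / (len(own) - 1)
        # b(k): smallest average distance to any other cluster.
        b = min(
            sum(dist[k][l] for l in members) / len(members)
            for label, members in clusters.items()
            if label != own_label
        )
        denom = max(a, b)
        # Assumption: if both averages are zero, call the score 0.
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


dist = [
    [0.0, 1.0, 4.0, 6.0],
    [1.0, 0.0, 3.0, 5.0],
    [4.0, 3.0, 0.0, 2.0],
    [6.0, 5.0, 2.0, 0.0],
]
scores = silhouette_scores(dist, [0, 0, 1, 1])
```

Scores near 1 mean a point sits much closer to its own cluster than to the next one over; scores below 0 suggest it might belong next door.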

This is a pretty neat measure and the most information-rich of the three discussed here.

Summary

Today we learned:

The Dunn index and Davies-Bouldin index are nearly identical to calculate
Cluster quality measures can be calculated many different ways because you have flexibility in what you consider to be distance
The silhouette score reveals information about every single data point, the Davies-Bouldin index reveals information about every single cluster, and the Dunn index reveals information about the worst clusters

Happy implementing!

Bibliography

[1] D. L. Davies and D. W. Bouldin, “A Cluster Separation Measure,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-1, no. 2, pp. 224–227, Apr. 1979, doi: 10.1109/TPAMI.1979.4766909.
[2] J. C. Dunn, “A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters,” Journal of Cybernetics, vol. 3, no. 3, pp. 32–57, Jan. 1973, doi: 10.1080/01969727308546046.