Davies-Bouldin, Dunn, and silhouette indices of cluster quality

June 17, 2026

Determining if one cluster solution is better than another can be challenging. A common approach is to calculate some kind of ratio between intra-cluster spread to inter-cluster spread. Ideally, things in the same cluster are all very close to each other (low intra-cluster spread) and things in different clusters are all very far from each other (high inter-cluster spread).

The bad thing (IMO) about this type of ratio-based approach is that it isn’t directly tied to the usefulness of a cluster solution. It’s possible for a solution splitting athletes by height to have a better ratio than one splitting them by play style, even though the latter might be much more useful for Moneyball-style decision making. The good thing about this type of approach is that it’s easy to calculate. Here are a few popular examples.

The Davies-Bouldin index

The Davies-Bouldin index is one example of this approach. Let’s say we’re given the following properties about a cluster solution:

  1. The spread of each cluster: (𝑆)
  2. The distances between every pair of clusters: (𝑀)

For any cluster 𝐶𝑖, we know how spaced out it is based on the value 𝑆𝑖. For any two clusters 𝐶𝑖 and 𝐶𝑗, we know how far apart they are based on the value 𝑀𝑖𝑗.

We want to end up with a single ratio. How’s this:

𝑅𝑖𝑗=𝑆𝑖+𝑆𝑗𝑀𝑖𝑗

𝑅 is the sum of the two clusters’ spreads divided by their distance. Reducing 𝑅 can be achieved by placing the two clusters farther apart or making either of the clusters denser.

It looks like a pretty decent measure so far. So… can we just average all the 𝑅s from all the pairs of clusters and call it a day? Yes! We can do anything! There aren’t any rules out here. But, we can do better.

The problem with this metric as is is that it accounts for a lot of unnecessary junk. Consider the green cluster below.

Davies and Bouldin figured that there’s no use in celebrating the separation between the green and red clusters when the green and purple clusters are so much closer together. It is the green cluster’s 𝑅 value with the purple cluster (the closest one) that should define how neat of a cluster it is.

So they created a bonus intermediate value, 𝑅𝑖, defined as the worst (biggest) 𝑅𝑖𝑗 value for each cluster. Taking the average of all 𝑅𝑖, we get 𝑅̄, the Davies-Bouldin index:

𝑅̄1𝑁𝑖𝑁𝑅𝑖

Conceptually, that’s all there is to it. To practically implement this thing, we still need formulae for 𝑆 and 𝑀. We’ll start with the 𝑆 presented in the original Dave & Buster’s paper [1]. Let’s let:

  1. |𝐶𝑖| be the number of observations in cluster 𝐶𝑖
  2. 𝑋𝑘 be the 𝑘th observation in cluster 𝐶𝑖
  3. 𝐴𝑖 be the middle (“centroid”) of cluster 𝐶𝑖

Then we could calculate 𝑆𝑖 (the spread) of cluster 𝐶𝑖 as:

𝑆𝑖=1|𝐶𝑖|𝑘=1|𝐶𝑖|dist(𝑋𝑘,𝐴𝑖)

I.e., the average Euclidean distance between every point in the cluster to its centroid.

I often have to calculate quality measures using nothing more than a pre-computed distance matrix of observations, so I have to do things a little differently. One option is to take the average of all inter-observation distances within a cluster:

𝑆𝑖=2|𝐶𝑖|×(|𝐶𝑖|1)𝑗𝑘dist(𝑋𝑗,𝑋𝑘)where𝑋𝑗,𝑋𝑘𝐶𝑖

Don’t worry for now if you’re not sure where the 2 and the other stuff scaling the summation came from.

Another option is to take the maximum of all inter-observation distances within a cluster:

𝑆𝑖=max𝑗,𝑘(dist(𝑋𝑗,𝑋𝑘))

We next need to pick an 𝑀 formula. The recommendation in the paper is the distance between the centroids of the two clusters. If you, like me, have nothing but a distance matrix to work from, you might want to consider some of the options below.

Let 𝑋𝑘 be the 𝑘th observation in cluster 𝐶𝑖 and 𝑌𝑙 be the 𝑙th observation in cluster 𝐶𝑗.

  1. Single-linkage: The smallest distance between 𝐶𝑖 and 𝐶𝑗:
𝑀𝑖𝑗=min𝑘,𝑙(dist(𝑋𝑘,𝑌𝑙))
  1. Complete-linkage: The largest distance between 𝐶𝑖 and 𝐶𝑗:
𝑀𝑖𝑗=max𝑘,𝑙(dist(𝑋𝑘,𝑌𝑙))
  1. Average: The average of all inter-distances between 𝐶𝑖 and 𝐶𝑗:
𝑀𝑖𝑗=1|𝐶𝑖𝐶𝑗|𝑘,𝑙(dist(𝑋𝑘,𝑌𝑙))

Where, |𝐶𝑖| and |𝐶𝑗| are the sizes of clusters 𝐶𝑖 and 𝐶𝑗 respectively.

  1. Hausdorff: This one’s a mouthful. For every observation in cluster 𝐶𝑖, find its shortest distance to cluster 𝐶𝑗. Take the max of that and call it 𝐷𝑖𝑗. Do the same for every observation in cluster 𝐶𝑗 to cluster 𝐶𝑖 and call that 𝐷𝑗𝑖. Finally, take the max of 𝐷𝑖𝑗 and 𝐷𝑗𝑖. I strongly encourage you to avert your gaze from the equation below; it does nothing but bestow on this calculation a flying facade of danger:
max(max𝑘(min𝑙(dist(𝑋𝑘,𝑌𝑙))), max𝑙(min𝑘(dist(𝑌𝑙,𝑋𝑘)))

Horrible.

The good news is, we’re done! You now have everything you need to work out the Davies-Bouldin index calculation for yourself.

One implementation note to watch out for: don’t forget to take your averages properly. The clv package’s calculation of the Davies-Bouldin index (whose tragic removal from CRAN has catalyzed this adventure) made a small slip up on this. When calculating mean intra-cluster distance from a data matrix, the package uses the C code:

average_intracluster[cluster_i] +=
    2*(dist/(cluster_size[cluster_i]*(cluster_size[cluster_i]-1)));

cluster_scatter.c: Line 193

But when calculating the same thing from a distance matrix, the package uses:

average_intracluster[cluster_i] +=
    (dist/(cluster_size[cluster_i]*(cluster_size[cluster_i]-1)));

cluster_scatter.c: Line 433

𝑁×𝑁 self-distance matrices are tricky in that their diagonals are useless and half of their off-diagonals are duplicates of the other half.

If you subtract off the diagonals and do the appropriate halving, the number of terms to divide over for the average is:

𝑁2𝑁2

For both of the snippets presented above, dist is the sum of the upper triangle of a cluster_size[cluster_i] × cluster_size[cluster_i] distance matrix. If you’re bored, you can try to figure out which of the two snippets above is correct.

The Dunn index

Once you understand the Davies-Bouldin index, you get to understand the Dunn index for free. It’s the smallest inter-cluster distance divided by the largest intra-cluster distance [2]:

min𝑖,𝑗,𝑖𝑗𝑀𝑖𝑗max𝑖𝑆𝑖

I dare say I find this index a little less impressive than the former. While the Davies-Bouldin index captures information about every cluster involved, the Dunn index exclusively focuses on the worst clusters. If you had a 10 cluster solution where 8 clusters were spectacularly compact and well-separated but two were fuzzy clouds right beside each other, it would appear awful by the Dunn index but a rightfully quite good by the Davies-Bouldin index.

The silhouette

The most powerful and complicated of the tailed beasts is the lowercased silhouette value. The silhouette value (or “score”) was created by Peter Rousseeuw, a statistics professor currently at Katholieke Universiteit Leuven in Belgium. He seems to have constructed all sorts of zany methods apart from this one. But who could be surprised after knowing that his PhD advisor’s advisor’s advisor’s advisor’s advisor’s advisor’s advisor’s advisor’s two advisors were Laplace and Lagrange?

Rather than calculating a spread value for each cluster, the silhouette score starts from a per-observation average distance to all other observations in its own cluster. If a cluster has 𝑁 observations, you’ll end up with 𝑁2𝑁2 of these averages.

This time, let’s call 𝑋𝑘𝑖 the 𝑘th observation in cluster 𝐶𝑖 and 𝑋𝑙𝑗 the 𝑙th observation in cluster 𝐶𝑗. Sorry for the change-up on the indexing; it’ll make things easier.

The average distance of 𝑋𝑘𝑖 to all the other observations in its own cluster 𝐶𝑖 is denoted by 𝑎(𝑘):

𝑎(𝑘)=1|𝐶𝑖|1𝑙,𝑙𝑘dist(𝑋𝑘𝑖,𝑋𝑙𝑖)

Repeat for all 𝑋𝑘 to get all your 𝑎 values.

The average distance of 𝑋𝑘𝑖 to all the observations in a separate cluster 𝐶𝑗 is:

1|𝐶𝑗|𝑙dist(𝑋𝑘𝑖,𝑋𝑙𝑗)

This handsome, nameless set of values tells us how far apart 𝐶𝑖 is from 𝐶𝑗, much like the 𝑀 values from before. If we minimize it across all clusters 𝐶𝑗 where 𝑗𝑖, we’ll have found which cluster is specifically closest to our observation 𝑋𝑘𝑖 of cluster 𝐶𝑖. This is the 𝑏 value:

𝑏(𝑘)=min𝑗1|𝐶𝑗|𝑙dist(𝑋𝑘𝑖,𝑋𝑙𝑗)

Repeat for all 𝑋𝑘 to get all your 𝑏 values.

To recap, for each of our data points, we have calculated a single 𝑎 value measuring its average distance to every other point in its cluster and a single 𝑏 value measuring its shortest distance to its closest cluster.

The hard part is done now. Each data point gets a silhouette score calculated as:

𝑠(𝑘)=𝑏(𝑘)𝑎(𝑘)max(𝑎(𝑖),𝑏(𝑖))

Unless an entire cluster has only a single data point, in which case that data point gets a silhouette score of 0.

This is a pretty neat measure and the most information-rich of the three discussed here.

Summary

Today we learned:

Happy implementing!

Bibliography