Extracting clusters from seaborn clustermap

Question

While using result.dendrogram_col.linkage or result.dendrogram_row.linkage currently works, it appears to be an implementation detail. The safest route is to compute the linkages explicitly first and pass them to the clustermap function, which has row_linkage and col_linkage parameters for exactly that purpose.

Replacing the last line in your example (result = …) with the following code gives the same result as before, but you will also have row_linkage and col_linkage variables that you can use with fcluster etc.

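import numpy as np
import seaborn as sns
from scipy.spatial import distance
from scipy.cluster import hierarchy

# df, network_colors and cmap come from the example in the question
correlations = df.corr()
correlations_array = np.asarray(correlations)

# Linkage over the rows of the correlation matrix
row_linkage = hierarchy.linkage(
    distance.pdist(correlations_array), method='average')

# Linkage over the columns (the rows of the transpose)
col_linkage = hierarchy.linkage(
    distance.pdist(correlations_array.T), method='average')

# Pass the precomputed linkages to clustermap
sns.clustermap(correlations, row_linkage=row_linkage, col_linkage=col_linkage,
               row_colors=network_colors, col_colors=network_colors,
               method="average", figsize=(13, 13), cmap=cmap)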
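As a rough illustration of what can be done with an explicit linkage, the sketch below cuts row_linkage into flat clusters with fcluster. It does not use the data from the question: the synthetic correlation matrix, the cluster count of 2, and the 'maxclust' criterion are assumptions chosen only to keep the example self-contained.

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial import distance

# Synthetic stand-in for df.corr(): two groups of three strongly
# correlated variables (an assumption, not the question's data).
rng = np.random.default_rng(0)
base_a = rng.normal(size=200)
base_b = rng.normal(size=200)
columns = [base_a + 0.1 * rng.normal(size=200) for _ in range(3)]
columns += [base_b + 0.1 * rng.normal(size=200) for _ in range(3)]
correlations_array = np.corrcoef(np.array(columns))  # shape (6, 6)

# Same linkage computation as in the answer's code
row_linkage = hierarchy.linkage(
    distance.pdist(correlations_array), method='average')

# Cut the tree into at most two flat clusters; labels are integers starting at 1
labels = hierarchy.fcluster(row_linkage, t=2, criterion='maxclust')
print(labels)
```

Each entry of labels is the cluster assigned to the corresponding row of the correlation matrix, so with a DataFrame it can be paired with correlations.index to see which variables ended up together.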

In this particular example, the code could be simplified further: the correlation matrix is symmetric, so row_linkage and col_linkage are identical.
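A quick way to convince yourself of this is to compute both linkages on a symmetric matrix and compare them. The sketch below uses random data as a stand-in for df (an assumption, not the data from the question).

```python
import numpy as np
from scipy.cluster import hierarchy
from scipy.spatial import distance

# Random data standing in for df; columns are the variables.
rng = np.random.default_rng(1)
data = rng.normal(size=(50, 5))
correlations_array = np.corrcoef(data, rowvar=False)  # symmetric 5x5 matrix

# Linkage over rows and over columns of the same symmetric matrix
linkage = hierarchy.linkage(
    distance.pdist(correlations_array), method='average')
col_linkage = hierarchy.linkage(
    distance.pdist(correlations_array.T), method='average')

# The two linkage matrices agree up to floating-point noise
same = np.allclose(linkage, col_linkage)
print(same)
```

When that holds, a single linkage can be computed once and passed as both row_linkage and col_linkage.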

Note: A previous answer included a call to distance.squareform, following what the code in seaborn does, but that is a bug.
