These functions cut hierarchical clusterings into flat clusterings
or find the roots of the forest formed by a cut by providing the flat
cluster ids of each observation.
fcluster(Z, t[, criterion, depth, R, monocrit]) | Form flat clusters from the hierarchical clustering defined by the given linkage matrix. |
fclusterdata(X, t[, criterion, metric, …]) | Cluster observation data using a given metric. |
leaders(Z, T) | Return the root nodes in a hierarchical clustering. |
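As a minimal sketch of how these functions fit together, the example below builds a linkage matrix from a small made-up data set, cuts it into two flat clusters with `fcluster`, and then asks `leaders` for the root node of each flat cluster. The data, the choice of single linkage, and the `maxclust` criterion are assumptions made for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, leaders

# Two well-separated groups of 2-D points (made-up data for illustration).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])

# Build the hierarchy; single linkage is an arbitrary choice here.
Z = linkage(X, method="single")

# Cut the hierarchy so that at most two flat clusters remain.
# T[i] is the flat cluster id of observation i.
T = fcluster(Z, t=2, criterion="maxclust")

# L[j] is the linkage node id at the root of a flat cluster,
# M[j] is the flat cluster id (as used in T) that the root belongs to.
L, M = leaders(Z, T)
```

Here `T` assigns the first three observations to one flat cluster and the last three to another, and each entry of `L` is the id of an internal node of the hierarchy rather than of an original observation.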
These are routines for agglomerative clustering.
linkage(y[, method, metric, optimal_ordering]) | Perform hierarchical/agglomerative clustering. |
single(y) | Perform single/min/nearest linkage on the condensed distance matrix y. |
complete(y) | Perform complete/max/farthest point linkage on a condensed distance matrix. |
average(y) | Perform average/UPGMA linkage on a condensed distance matrix. |
weighted(y) | Perform weighted/WPGMA linkage on the condensed distance matrix. |
centroid(y) | Perform centroid/UPGMC linkage. |
median(y) | Perform median/WPGMC linkage. |
ward(y) | Perform Ward’s linkage on a condensed distance matrix. |
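The sketch below illustrates the relationship between `linkage` and the method-specific helpers, using four made-up one-dimensional points. It assumes the condensed distance matrix comes from `scipy.spatial.distance.pdist`; each row of the resulting linkage matrix holds the two merged cluster ids, the merge distance, and the size of the new cluster.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, single

# Four 1-D observations (made-up data for illustration).
X = np.array([[0.0], [1.0], [3.0], [7.0]])

# Condensed distance matrix: n * (n - 1) / 2 = 6 pairwise distances.
y = pdist(X)

# General entry point with the method chosen by name.
Z = linkage(y, method="single")

# Method-specific helper; expected to produce the same hierarchy.
Z_single = single(y)

# Columns of Z: [cluster id a, cluster id b, merge distance, new cluster size]
merge_distances = Z[:, 2]
cluster_sizes = Z[:, 3]
```

With single linkage on this data the merges happen at distances 1, 2, and 4, producing clusters of 2, 3, and 4 observations.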
These routines compute statistics on hierarchies.
cophenet(Z[, Y]) | Calculate the cophenetic distances between each observation in the hierarchical clustering defined by the linkage Z. |
from_mlab_linkage(Z) | Convert a linkage matrix generated by MATLAB(TM) to a new linkage matrix compatible with this module. |
inconsistent(Z[, d]) | Calculate inconsistency statistics on a linkage matrix. |
maxinconsts(Z, R) | Return the maximum inconsistency coefficient for each non-singleton cluster and its descendants. |
maxdists(Z) | Return the maximum merge distance for each non-singleton cluster and its descendants. |
maxRstat(Z, R, i) | Return the maximum statistic for each non-singleton cluster and its descendants. |
to_mlab_linkage(Z) | Convert a linkage matrix to a MATLAB(TM) compatible one. |
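A short sketch of the statistics routines on a small made-up data set follows. The choice of average linkage is an assumption made for illustration. When `cophenet` is given the original condensed distances as its second argument, it returns both the cophenetic correlation coefficient and the condensed cophenetic distances.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, cophenet, inconsistent, maxdists

# Four 1-D observations (made-up data for illustration).
X = np.array([[0.0], [1.0], [3.0], [7.0]])
y = pdist(X)
Z = linkage(y, method="average")

# c: cophenetic correlation coefficient between y and the hierarchy.
# coph_dists: cophenetic distance for every pair, in the same order as y.
c, coph_dists = cophenet(Z, y)

# One row of inconsistency statistics per non-singleton cluster.
R = inconsistent(Z)

# Largest merge distance at or below each non-singleton cluster.
MD = maxdists(Z)
```

Because this average-linkage hierarchy is monotonic, `MD` coincides with the merge-distance column `Z[:, 2]`.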
Routines for visualizing hierarchical clusterings.
dendrogram(Z[, p, truncate_mode, …]) | Plot the hierarchical clustering as a dendrogram. |
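The following sketch calls `dendrogram` with `no_plot=True`, which skips the rendering step and only returns the computed layout, so it runs without matplotlib. The data and the linkage method are made up for illustration; dropping `no_plot=True` would draw the figure on the current matplotlib axes.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, dendrogram

# Four 1-D observations (made-up data for illustration).
X = np.array([[0.0], [1.0], [3.0], [7.0]])
Z = linkage(pdist(X), method="single")

# no_plot=True computes the dendrogram layout without rendering it.
layout = dendrogram(Z, no_plot=True)

# Observation ids in the left-to-right order used along the leaf axis.
leaf_order = layout["leaves"]

# One set of link coordinates per merge in the hierarchy.
n_links = len(layout["icoord"])
```

The returned dictionary also carries the leaf labels (`ivl`) and link heights (`dcoord`), which can be handy when the tree is drawn with a custom plotting routine.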
These are data structures and routines for representing hierarchies as
tree objects.
ClusterNode(id[, left, right, dist, count]) | A tree node class for representing a cluster. |
leaves_list(Z) | Return a list of leaf node ids. |
to_tree(Z[, rd]) | Convert a linkage matrix into an easy-to-use tree object. |
cut_tree(Z[, n_clusters, height]) | Given a linkage matrix Z, return the cut tree. |
optimal_leaf_ordering(Z, y[, metric]) | Given a linkage matrix Z and distance, reorder the cut tree. |
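As an illustrative sketch, the example below converts a linkage matrix into a `ClusterNode` tree with `to_tree`, inspects the root, and uses `cut_tree` to obtain flat labels for a chosen number of clusters. The data and the single-linkage method are assumptions for the example.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, to_tree, leaves_list, cut_tree

# Four 1-D observations (made-up data for illustration).
X = np.array([[0.0], [1.0], [3.0], [7.0]])
Z = linkage(pdist(X), method="single")

# Root ClusterNode of the hierarchy.
root = to_tree(Z)
root_id = root.get_id()        # id of the final merged cluster
root_count = root.get_count()  # number of observations under the root

# Leaf ids visited left to right, via the tree and directly from Z.
leaves_from_tree = root.pre_order()
leaves_from_linkage = leaves_list(Z)

# Flat cluster label per observation when the tree is cut into 2 clusters.
labels = cut_tree(Z, n_clusters=2)[:, 0]
```

For `n` observations the root id is `2 * n - 2`, since the original observations take ids `0` to `n - 1` and each merge creates the next id.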
These are predicates for checking the validity of linkage and
inconsistency matrices as well as for checking isomorphism of two
flat cluster assignments.
is_valid_im(R[, warning, throw, name]) | Return True if the inconsistency matrix passed is valid. |
is_valid_linkage(Z[, warning, throw, name]) | Check the validity of a linkage matrix. |
is_isomorphic(T1, T2) | Determine if two different cluster assignments are equivalent. |
is_monotonic(Z) | Return True if the linkage passed is monotonic. |
correspond(Z, Y) | Check for correspondence between linkage and condensed distance matrices. |
num_obs_linkage(Z) | Return the number of original observations of the linkage matrix passed. |
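The predicates can be sketched on a small made-up example as follows. The flat assignments passed to `is_isomorphic` are invented for illustration: two assignments count as equivalent when they group the observations identically and differ only in how the clusters are numbered.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import (
    linkage, inconsistent, is_valid_linkage, is_valid_im,
    is_monotonic, is_isomorphic, correspond, num_obs_linkage,
)

# Four 1-D observations (made-up data for illustration).
X = np.array([[0.0], [1.0], [3.0], [7.0]])
y = pdist(X)
Z = linkage(y, method="single")
R = inconsistent(Z)

valid_linkage = is_valid_linkage(Z)   # structural checks on Z
valid_im = is_valid_im(R)             # structural checks on R
monotonic = is_monotonic(Z)           # merge distances never decrease
matches = correspond(Z, y)            # Z and y describe the same number of observations
n_obs = num_obs_linkage(Z)            # number of original observations

# Same grouping with different cluster numbering: equivalent.
same = is_isomorphic(np.array([1, 1, 2, 3]), np.array([2, 2, 3, 1]))
# Different grouping: not equivalent.
different = is_isomorphic(np.array([1, 1, 2, 3]), np.array([1, 2, 2, 3]))
```

Both validity checks accept `warning` and `throw` flags, so invalid input can be reported or raised as an error instead of only returning False.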
Utility routines for plotting:
- MATLAB and MathWorks are registered trademarks of The MathWorks, Inc.
- Mathematica is a registered trademark of The Wolfram Research, Inc.