.BG
.FN hierclust
.TL
Hierarchical Clustering
.DN
Agglomerates the closest pair of observations, and replaces them with a
cluster.
Agglomerates the next closest pair of observations or clusters.
Continues through `n-1' agglomerations, assuming there are `n'
observations.
The output is an object of class `hierarchy'.
.CS
hierclust(a, method=1, bign=F, diagnostics=F, sim=" ", movie=F,
          option="thru", alpha=1.0, repres="chull", show="change")
.sp
or
.sp
hierclust(dist, method="compact", sim=)
.RA
.sp
First variant:
.sp
.AG a
data matrix to be analyzed, the rows representing observations and the
columns variables; or, alternatively, a lower-diagonal dissimilarity
matrix in vector or list format.
In the latter case, the number of values is `n(n-1)/2' where `n' is the
number of observations.
.OA
.sp
First variant:
.sp
.AG method
First possibility: an integer between 1 and 7.
These methods correspond to:
Ward's minimum variance or error sum of squares method (1);
single linkage or nearest neighbor method (2);
complete linkage or diameter method (3);
average linkage, group average, or UPGMA method (4);
McQuitty's or WPGMA method (5);
median, Gower's, or WPGMC method (6);
and centroid or UPGMC method (7).
If the `movie' option is used, only methods 1 to 4 are supported.
Second possibility: string "compact" (default; complete linkage),
"average" (average linkage), or "connected" (single linkage).
See the sketch following this argument list.
.AG sim
structure giving similarities rather than dissimilarities.
This can be either a symmetric matrix or a vector with a `Size'
attribute.
Only one of dissimilarities or similarities may be specified (or, of
course, as an alternative, a data matrix).
.AG bign
is `n' big?
If storage is problematic, a different implementation of the Ward
criterion may be tried.
This determines dissimilarities on the fly, and hence requires `O(n)'
storage.
.AG diagnostics
show, or suppress, some information directed to the command window.
Suppression is useful if functional programming capabilities are availed
of, leading to lazy evaluation; this often results in multiple
evaluations, with consequent repetition of diagnostic information.
.AG movie
whether an interactive display of the construction of the hierarchy is
wanted.
This necessitates an open plot window.
It does not produce an output object.
All remaining arguments apply only in the case of `movie=T'.
.AG option
(Only for `movie'.)
"thru" (default) or "prompt": go right through all `n-1' agglomerations,
or ask the user whether to continue at each agglomeration?
We recommend going right through on the first occasion, and checking the
cluster criterion values reported in the command window; then, using the
"prompt" option, going as far as the desired partition and printing it
out.
.AG alpha
(Only for `movie'.)
1.0 (default), or 0.0 < alpha <= 1.0.
Used only by the minimum variance agglomerative criterion.
Alpha downweights the variance component on the principal axis of the
cluster, and thereby redefines the cluster criterion such that linear
clusters are preferred.
A value of alpha = 1.0 gives exactly the traditional Ward's minimum
variance method.
See Murtagh and Raftery (1984) for more on this altered criterion.
.AG repres
(Only for `movie'.)
"chull" (default), "lines", or "gauss": representation of clusters.
For input data of dimensionality greater than 2, only "chull" is allowed
(the other representations were found to be too problematic, given that
the clustering takes place in m dimensions while the representation is
necessarily a 2-dimensional projection of this).
Representation "chull" is the convex hull, or a straight line in the
case of collinear points (collinearity is tested for using principal
components analysis).
Representation "lines" gives lines of best fit (a least squares fit is
used, and the extremities are defined by the min. and max. x values on
this LS fit line).
Representation "gauss" gives 1-sigma circles; a side-effect of the
latter is that warning messages indicate whenever circles overlap plot
boundaries.
.AG show
(Only for `movie'.)
"change" or "all": in the present implementation, the clusters at each
stage are highlighted by means of a different symbol.
Option show = "change" has this highlighting changed at each
agglomeration, so that only the most recent agglomerands are shown.
Option show = "all" does not turn off the highlighting in this way.
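.PP
For instance, the first variant might be invoked as follows (an
illustrative sketch, assuming `a' is a numeric data matrix; see also
EXAMPLES below):
.sp
   h <- hierclust(a, method=2)              # single link on data matrix `a'
   h <- hierclust(a, method=4)              # average link (UPGMA)
   hierclust(a, movie=T, option="prompt")   # step through the agglomerations
.sp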
.RA
.sp
Second variant (which re-wraps the S function `hclust', adding
appropriate attributes, including class `hierarchy'; cf. the help file
on `hclust'):
.sp
Exactly one of `dist' or `sim' must be specified.
.sp
.AG dist
a distance structure or distance matrix.
Normally this will be the result of the function `distance', but it can
be any data of the form returned by `distance', or a full, symmetric
matrix.
Missing values are not allowed.
.OA
.AG method
character string giving the clustering method.
The three methods currently implemented are "average", "connected"
(single linkage), and "compact" (complete linkage).
(The first three characters of the method are sufficient.)
.AG sim=
structure giving similarities rather than distances.
This can be either a symmetric matrix or a vector with a "Size"
attribute.
Missing values are not allowed.
.RT
when `movie' is not specified in the first variant, a list describing
the hierarchical clustering of the observations.
The `movie' option is considerably slower than the alternative options,
due to the processing involved; hence its use is recommended only on
small data sets (fewer than 100 observations).
.RC merge
an `n-1' by 2 matrix.
Row `i' of `merge' describes the merging of clusters at step `i' of the
clustering.
If an element `j' in the row is negative, then observation `-j' was
merged at this stage.
If `j' is positive, then the merge was with the cluster formed at the
(earlier) stage `j' of the algorithm.
Thus negative entries in `merge' indicate agglomerations of singletons,
and positive entries indicate agglomerations of non-singletons.
(A short illustration follows the component descriptions below.)
.RC height
a set of `n-1' non-decreasing real values: the clustering "height", that
is, the value of the criterion associated with the clustering `method'
for the particular agglomeration.
.RC order
a vector giving the permutation of the original observations suitable
for plotting, in the sense that a cluster plot using this ordering and
matrix `merge' will not have crossings of the branches.
.PP
In hierarchical cluster displays, a decision is needed at each merge to
specify which subtree should go on the left and which on the right.
Since, for `n' observations, there are `n-1' merges, there are
`2^(n-1)' possible orderings for the leaves in a cluster tree, or
dendrogram.
The default algorithm in `hierclust' is to order the subtrees so that
the tighter cluster is on the left (the last, i.e. most recent, merge of
the left subtree is at a lower value than the last merge of the right
subtree).
Observations are the tightest clusters possible, and merges involving
two observations place them in order by their observation sequence
number.
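.PP
To illustrate the encoding of `merge' and `height' (a sketch, in which
the particular values shown are hypothetical):
.sp
   h <- hierclust(a)
   h$merge[1,]    # e.g. -5 -9 : singletons 5 and 9 merged at step 1
   h$merge[2,]    # e.g. -3  1 : singleton 3 joins the step-1 cluster
   h$height[1:2]  # criterion values for these two agglomerations
.sp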
.PP
Attributes associated with the returned hierarchy include: the class,
"hierarchy"; the original data name; the metric used (the latter being
implemented, so far, mainly for Euclidean distance); the call which
created the object; and the agglomerative clustering criterion used.
.SH NOTE
The squared Euclidean distance is used in the `movie' case, and in the
case when a data matrix is input.
Note that the `movie' option returns invisibly.
.SH METHOD
For all methods, when `bign = F' (the default), the Lance-Williams
dissimilarity update formula is used to allow the 7 methods to be
implemented.
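.PP
For illustration, the update of the dissimilarity between a cluster `k'
and the agglomeration of clusters `i' and `j' has the general form shown
in the following sketch (the function name `lw.update' is illustrative
only; coefficient values for all 7 methods are tabulated in the
references):
.sp
   lw.update <- function(dki, dkj, dij, ai, aj, b, g)
      ai*dki + aj*dkj + b*dij + g*abs(dki - dkj)
   # single link:   ai = aj = 0.5, b = 0, g = -0.5, i.e. min(dki, dkj)
   # complete link: ai = aj = 0.5, b = 0, g =  0.5, i.e. max(dki, dkj)
.sp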
.PP
In the direct data matrix case, a list of nearest neighbors is
maintained, which allows the closest pair of clusters or observations to
be determined by scanning this list to find the pair of agglomerands on
each iteration.
Hence `O(n)' operations are required for each of the `n-1' iterations,
i.e. `O(n^2)' computation time in total.
The initial setting up of the `n' nearest neighbors also requires
`O(n^2)' time.
The processing to be carried out following each agglomeration is `O(n)'.
In total, all methods implemented in `hierclust' are of time complexity
`O(n^2)'.
Unless `bign = T' (see below), storage complexity is `O(n^2)'.
Typical times on a Sun SPARCstation 1 for 400x4 and 800x4 arrays: 12 and
48 secs.
.PP
Ward's minimum variance method aims at finding compact, spherical
clusters.
The complete link method finds similar clusters.
The single link method, which is closely related to the minimal spanning
tree, adopts a `friends of friends' clustering strategy.
The other methods can be regarded as aiming for clusters with
characteristics somewhere between those of the single and complete link
methods.
.PP
In the case of the Ward method with `bign = T', the nearest-neighbor
chain algorithm is used, which gives an `O(n^2)' computation time
algorithm.
The input matrix is stored, but not the dissimilarities produced.
Computation time is marginally inferior to the implementation of Ward's
method when `bign = F' (the default).
Sample times for a 400x4 and an 800x4 array on a Sun SPARCstation 1:
12.5 and 50 secs.
.SH REFERENCES
A small sample:
.sp
B. Everitt,
.ul
Cluster Analysis,
Heinemann Educational Books, London, 1974;
.sp
P.H.A. Sneath and R.R. Sokal,
.ul
Numerical Taxonomy,
Freeman, San Francisco, 1973;
.sp
M.R. Anderberg,
.ul
Cluster Analysis for Applications,
Academic Press, New York, 1973;
.sp
A.D. Gordon,
.ul
Classification,
Chapman and Hall, London, 1981; and
.sp
F. Murtagh,
.ul
Multidimensional Clustering Algorithms,
COMPSTAT Lectures 4, Physica-Verlag, Wuerzburg, 1985 (for details of the
algorithms used).
.SA
`plot', `summary', and `print' to look at hierarchical clustering
results; `distance' to produce (dis)similarity input; `members' to
obtain cluster assignments of all observations.
Function `hclust' is incorporated in `hierclust', with the addition of
appropriate attributes.
In a similar way, function `distance' supersedes `dist'.
.EX
# Default Ward criterion
h <- hierclust(a)
plot(h)
# Real-time display of clustering
hierclust(a, movie=T)
# Single link criterion
h <- hierclust(distance(a), method="connected")
# Get cluster assignments corresponding to the 5-cluster partition:
k <- members(h)
k$class[,4]
# Large `n'
h <- hierclust(a, bign=T)
# An example of the incorporated `hclust' function, which allows
# processing of dissimilarity data:
hierclust(distance(a))
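# A line-seeking variant of the minimum variance criterion, displayed
# step by step (an illustrative sketch; assumes `a' has two columns,
# since repres="lines" requires 2-dimensional input):
hierclust(a, movie=T, alpha=0.7, repres="lines", option="prompt")
.KW cluster
.KW multivariate
.WR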