.BG
.FN distance
.TL
Distance Matrix Calculation
.DN
Returns a distance structure that represents all of the pairwise distances
between objects in the data.
The choices for the metric are currently restricted to 
`"euclidean"', `"maximum"', `"manhattan"', and `"binary"'.
.CS
distance(x, metric="euclidean")
.RA
.AG x
matrix (typically a data matrix).  The distances computed
will be among the rows of `x'.  Missing values (`NA's) are
allowed.
.OA
.AG metric
character string specifying the distance metric to be used.
The currently available options are `"euclidean"',
`"maximum"', `"manhattan"', and `"binary"'.
Euclidean distances are the root sum-of-squares of differences,
`"maximum"' is the maximum absolute difference, `"manhattan"' is the sum
of absolute differences, and `"binary"' is the proportion
of non-zeros that two vectors do not have in common (the number 
of occurrences of a zero and a one, or a one and a zero,
divided by the number of times at least one vector has a one).
.RT
the distances among
the rows of `x'.  Since there are many distances
and since the result of `distance' is typically an argument to
`hclust' or `cmdscale', a vector is returned rather than a symmetric matrix.
For `i' less than `j', the distance between row `i' and row `j' is 
element `nrow(x)*(i-1) - i*(i-1)/2 + j-i' of the result.
.PP
The returned object has a number of attributes:
.Co Size ,
giving
the number of objects, that is, `nrow(x)'.
The length of the vector that is returned is `nrow(x)*(nrow(x)-1)/2',
that is, it is of order `nrow(x)' squared.
.Co class ,
indicating the type of data returned.  Currently only `"disddat"' is
supported, indicating dissimilarity data.  Dissimilarity is taken
here as a general class that also encompasses similarities and distances.
.Co origdata ,
giving the name of the original input data.
.Co metric ,
giving the option chosen for the `metric' parameter.
.DT
Missing values in a row of `x' are not included in any distances
involving that row.
If the metric is `"euclidean"' and `ng' is the number of columns in which
neither of the two rows has a missing value,
then the distance returned is `sqrt(ncol(x)/ng)' times the Euclidean
distance between the two vectors of length `ng' 
that remain after the columns containing `NA's are dropped.
The rule is similar for the `"manhattan"' metric, except that the coefficient
is `ncol(x)/ng'.
The `"binary"' metric excludes columns in which either row has an `NA'.
If all values for a particular distance are
excluded, the distance is `NA'.
.SH NOTE
This function incorporates the S function `dist'.  The only difference in the
present implementation is an additional set of attributes attached to the
function's output.
.PP
If the columns of a matrix are in different units, it is usually advisable
to scale the matrix before using `distance'.  A column that is much more 
variable than the others will dominate the distance measure.
.SH BACKGROUND
Distance measures are used in cluster analysis and in multidimensional scaling.
The choice of metric may have a large impact.
.SH REFERENCES
Everitt, B. (1980).
.ul
Cluster Analysis (second edition). 
Halsted, New York.
.sp
Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979).
.ul
Multivariate Analysis.
Academic Press, London.
.SA
`cmdscale', `hierclust', `hclust', `scale'.
.EX
distance(x, "max") # distances among rows by maximum
distance(t(x)) # distances among cols in Euclidean metric
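
# A minimal sketch of the missing-value rule described under DETAILS for
# the "euclidean" metric, written in plain S without calling `distance'.
# `eucna' is a hypothetical helper used only for illustration; it is not
# part of `distance' and may differ from the actual implementation.
eucna <- function(a, b)
{
        ok <- !is.na(a) & !is.na(b)     # columns where neither row is NA
        ng <- sum(ok)
        if(ng == 0)
                return(NA)              # all values excluded
        sqrt(length(a)/ng) * sqrt(sum((a[ok] - b[ok])^2))
}
eucna(c(1, 2, NA, 4), c(2, 4, 6, NA)) # ng is 2: sqrt(4/2) * sqrt(5)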

# below is a function that converts a distance structure to a matrix
dist2full <- function(dis)
{
        n <- attr(dis, "Size")
        full <- matrix(0, n, n)
        for(rowi in seq(n - 1))
                for(colj in seq(from = rowi + 1, to = n)) {
                        # position of the (rowi, colj) distance in the vector
                        k <- n * (rowi - 1) - (rowi * (rowi - 1))/2 + colj - rowi
                        full[rowi, colj] <- full[colj, rowi] <- dis[k]
                }
        full
}
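
# A small check of the index formula given in the VALUE section, assuming
# only base S.  `distindex' is a hypothetical helper, not part of `distance'.
distindex <- function(i, j, n)
        n * (i - 1) - (i * (i - 1))/2 + j - i
n <- 4
pos <- NULL
for(i in seq(n - 1))
        for(j in seq(from = i + 1, to = n))
                pos <- c(pos, distindex(i, j, n))
pos # 1 2 3 4 5 6: pairs (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)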
.KW multivariate
.KW cluster
.WR