Sample M-Dimensional Datasets for Cluster Analysis

SAMMON_DATA is a MATLAB program which generates 6 files of test data for multivariate data clustering.
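
As a rough illustration of what "test data for multivariate data clustering" can look like, the sketch below generates points scattered around a few centers in M dimensions and formats them as a plain text table. It is not the SAMMON_DATA code and does not reproduce its six files. The function names, the choice of Gaussian clusters, the centers, and the output layout are all assumptions made for this example, and it is written in Python rather than MATLAB.

```python
import random


def gaussian_clusters(centers, n_per_cluster, sigma=0.1, seed=0):
    """Return (points, labels) with n_per_cluster points around each center."""
    rng = random.Random(seed)  # fixed seed so the dataset is reproducible
    points, labels = [], []
    for label, center in enumerate(centers):
        for _ in range(n_per_cluster):
            # Perturb every coordinate of the center independently.
            points.append([rng.gauss(c, sigma) for c in center])
            labels.append(label)
    return points, labels


def format_table(points):
    """Format one point per line, coordinates separated by spaces."""
    return "\n".join(" ".join(f"{x:.6f}" for x in p) for p in points) + "\n"


# Three well-separated clusters in M = 4 dimensions (illustrative values).
centers = [[0.0] * 4, [5.0] * 4, [0.0, 5.0, 0.0, 5.0]]
points, labels = gaussian_clusters(centers, n_per_cluster=25)
table = format_table(points)
```

The resulting `table` string can be written to a file and passed to any clustering routine that reads whitespace-separated rows. The `labels` list records which cluster each point was drawn from, so a clustering result can be checked against it.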




Licensing:

The computer code and data files described and made available on this web page are distributed under the GNU LGPL license.

Related Data and Programs:

ASA113, a MATLAB library which implements the Banfield and Bassill clustering algorithm using transfers and swaps.

ASA136, a MATLAB library which implements the Hartigan and Wong clustering algorithm.

CLUSTER_ENERGY, a FORTRAN90 program which groups data into a given number of clusters to minimize the energy.

KMEANS, a MATLAB library which contains several different algorithms for the K-Means problem.

MARTINEZ, a dataset directory which contains datasets for computational statistics.

MDS, a dataset directory which contains datasets for multidimensional scaling.

PCL, a dataset directory which contains datasets from a gene expression experiment on Arabidopsis, which are candidates for data cluster analysis.

SPAETH, a dataset directory which contains datasets for cluster analysis.

SPAETH, a FORTRAN90 library which can cluster data according to various principles.

SPAETH2, a dataset directory which contains datasets for cluster analysis.

SPAETH2, a FORTRAN90 library which can cluster data according to various principles.


Reference:

  1. Ronald Fisher,
    The use of multiple measurements in taxonomic problems,
    Annals of Eugenics,
    Volume 7, part II, 1936, pages 179-188.
  2. John Sammon,
    A nonlinear mapping for data structure analysis,
    IEEE Transactions on Computers,
    Volume C-18, Number 5, May 1969, pages 401-409.

Source Code:

Examples and Tests:

List of Routines:


Last revised on 20 September 2010.