{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Multi-dimensional scaling\n\nAn illustration of metric, non-metric, and classical MDS on generated noisy data.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Authors: The scikit-learn developers\n# SPDX-License-Identifier: BSD-3-Clause" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dataset preparation\n\nWe start by generating 20 points with integer coordinates drawn uniformly at\nrandom in a 2D space.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\nfrom matplotlib import pyplot as plt\nfrom matplotlib.collections import LineCollection\n\nfrom sklearn import manifold\nfrom sklearn.decomposition import PCA\nfrom sklearn.metrics import euclidean_distances\n\n# Generate the data\nEPSILON = np.finfo(np.float32).eps\nn_samples = 20\nrng = np.random.RandomState(seed=3)\nX_true = rng.randint(0, 20, 2 * n_samples).astype(float)\nX_true = X_true.reshape((n_samples, 2))\n\n# Center the data\nX_true -= X_true.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we compute pairwise distances between all points and add\na small amount of noise to the distance matrix. 
We make sure\nto keep the noisy distance matrix symmetric.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Compute pairwise Euclidean distances\ndistances = euclidean_distances(X_true)\n\n# Add noise to the distances\nnoise = rng.rand(n_samples, n_samples)\nnoise = noise + noise.T\nnp.fill_diagonal(noise, 0)\ndistances += noise" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we compute metric, non-metric, and classical MDS of the noisy distance matrix.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "mds = manifold.MDS(\n    n_components=2,\n    max_iter=3000,\n    eps=1e-9,\n    n_init=1,\n    random_state=42,\n    metric=\"precomputed\",\n    n_jobs=1,\n    init=\"classical_mds\",\n)\nX_mds = mds.fit(distances).embedding_\n\nnmds = manifold.MDS(\n    n_components=2,\n    metric_mds=False,\n    max_iter=3000,\n    eps=1e-12,\n    metric=\"precomputed\",\n    random_state=42,\n    n_jobs=1,\n    n_init=1,\n    init=\"classical_mds\",\n)\nX_nmds = nmds.fit_transform(distances)\n\ncmds = manifold.ClassicalMDS(\n    n_components=2,\n    metric=\"precomputed\",\n)\nX_cmds = cmds.fit_transform(distances)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Non-metric MDS only preserves the rank order of the distances, so the scale of\nits solution is arbitrary. We rescale it to match the spread of the original data.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "X_nmds *= np.sqrt((X_true**2).sum()) / np.sqrt((X_nmds**2).sum())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To make the visual comparisons easier, we rotate the original data and all MDS\nsolutions to their PCA axes. 
We also flip the horizontal and vertical MDS axes, if needed,\nto match the original data orientation.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Rotate the data (CMDS does not need to be rotated because it is inherently PCA-aligned)\npca = PCA(n_components=2)\nX_true = pca.fit_transform(X_true)\nX_mds = pca.fit_transform(X_mds)\nX_nmds = pca.fit_transform(X_nmds)\n\n# Align the sign of PCs\nfor i in [0, 1]:\n    if np.corrcoef(X_mds[:, i], X_true[:, i])[0, 1] < 0:\n        X_mds[:, i] *= -1\n    if np.corrcoef(X_nmds[:, i], X_true[:, i])[0, 1] < 0:\n        X_nmds[:, i] *= -1\n    if np.corrcoef(X_cmds[:, i], X_true[:, i])[0, 1] < 0:\n        X_cmds[:, i] *= -1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we plot the original data and all MDS reconstructions.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "fig = plt.figure(1)\nax = plt.axes([0.0, 0.0, 1.0, 1.0])\n\ns = 100\nplt.scatter(X_true[:, 0], X_true[:, 1], color=\"navy\", s=s, lw=0, label=\"True Position\")\nplt.scatter(X_mds[:, 0], X_mds[:, 1], color=\"turquoise\", s=s, lw=0, label=\"MDS\")\nplt.scatter(\n    X_nmds[:, 0], X_nmds[:, 1], color=\"darkorange\", s=s, lw=0, label=\"Non-metric MDS\"\n)\nplt.scatter(\n    X_cmds[:, 0], X_cmds[:, 1], color=\"lightcoral\", s=s, lw=0, label=\"Classical MDS\"\n)\nplt.legend(scatterpoints=1, loc=\"best\", shadow=False)\n\n# Plot the edges\nstart_idx, end_idx = X_mds.nonzero()\n# a sequence of (*line0*, *line1*, *line2*), where::\n# linen = (x0, y0), (x1, y1), ... 
(xm, ym)\nsegments = [\n    [X_true[i, :], X_true[j, :]] for i in range(len(X_true)) for j in range(len(X_true))\n]\nedges = distances.max() / (distances + EPSILON) * 100\nnp.fill_diagonal(edges, 0)\nedges = np.abs(edges)\nlc = LineCollection(\n    segments, zorder=0, cmap=plt.cm.Blues, norm=plt.Normalize(0, edges.max())\n)\nlc.set_array(edges.flatten())\nlc.set_linewidths(np.full(len(segments), 0.5))\nax.add_collection(lc)\n\nplt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.14" } }, "nbformat": 4, "nbformat_minor": 0 }