{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Vector Quantization Example\n\nThis example shows how one can use :class:`~sklearn.preprocessing.KBinsDiscretizer`\nto perform vector quantization on a set of toy image, the raccoon face.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Authors: The scikit-learn developers\n# SPDX-License-Identifier: BSD-3-Clause" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Original image\n\nWe start by loading the raccoon face image from SciPy. We will additionally check\na couple of information regarding the image, such as the shape and data type used\nto store the image.\n\nNote that depending of the SciPy version, we have to adapt the import since the\nfunction returning the image is not located in the same module. Also, SciPy >= 1.10\nrequires the package `pooch` to be installed.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "try: # Scipy >= 1.10\n from scipy.datasets import face\nexcept ImportError:\n from scipy.misc import face\n\nraccoon_face = face(gray=True)\n\nprint(f\"The dimension of the image is {raccoon_face.shape}\")\nprint(f\"The data used to encode the image is of type {raccoon_face.dtype}\")\nprint(f\"The number of bytes taken in RAM is {raccoon_face.nbytes}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Thus the image is a 2D array of 768 pixels in height and 1024 pixels in width. Each\nvalue is a 8-bit unsigned integer, which means that the image is encoded using 8\nbits per pixel. The total memory usage of the image is 786 kilobytes (1 byte equals\n8 bits).\n\nUsing 8-bit unsigned integer means that the image is encoded using 256 different\nshades of gray, at most. We can check the distribution of these values.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n\nfig, ax = plt.subplots(ncols=2, figsize=(12, 4))\n\nax[0].imshow(raccoon_face, cmap=plt.cm.gray)\nax[0].axis(\"off\")\nax[0].set_title(\"Rendering of the image\")\nax[1].hist(raccoon_face.ravel(), bins=256)\nax[1].set_xlabel(\"Pixel value\")\nax[1].set_ylabel(\"Count of pixels\")\nax[1].set_title(\"Distribution of the pixel values\")\n_ = fig.suptitle(\"Original image of a raccoon face\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compression via vector quantization\n\nThe idea behind compression via vector quantization is to reduce the number of\ngray levels to represent an image. For instance, we can use 8 values instead\nof 256 values. Therefore, it means that we could efficiently use 3 bits instead\nof 8 bits to encode a single pixel and therefore reduce the memory usage by a\nfactor of approximately 2.5. We will later discuss about this memory usage.\n\n### Encoding strategy\n\nThe compression can be done using a\n:class:`~sklearn.preprocessing.KBinsDiscretizer`. We need to choose a strategy\nto define the 8 gray values to sub-sample. The simplest strategy is to define\nthem equally spaced, which correspond to setting `strategy=\"uniform\"`. From\nthe previous histogram, we know that this strategy is certainly not optimal.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from sklearn.preprocessing import KBinsDiscretizer\n\nn_bins = 8\nencoder = KBinsDiscretizer(\n n_bins=n_bins,\n encode=\"ordinal\",\n strategy=\"uniform\",\n random_state=0,\n)\ncompressed_raccoon_uniform = encoder.fit_transform(raccoon_face.reshape(-1, 1)).reshape(\n raccoon_face.shape\n)\n\nfig, ax = plt.subplots(ncols=2, figsize=(12, 4))\nax[0].imshow(compressed_raccoon_uniform, cmap=plt.cm.gray)\nax[0].axis(\"off\")\nax[0].set_title(\"Rendering of the image\")\nax[1].hist(compressed_raccoon_uniform.ravel(), bins=256)\nax[1].set_xlabel(\"Pixel value\")\nax[1].set_ylabel(\"Count of pixels\")\nax[1].set_title(\"Sub-sampled distribution of the pixel values\")\n_ = fig.suptitle(\"Raccoon face compressed using 3 bits and a uniform strategy\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Qualitatively, we can spot some small regions where we see the effect of the\ncompression (e.g. leaves on the bottom right corner). But after all, the resulting\nimage is still looking good.\n\nWe observe that the distribution of pixels values have been mapped to 8\ndifferent values. We can check the correspondence between such values and the\noriginal pixel values.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "bin_edges = encoder.bin_edges_[0]\nbin_center = bin_edges[:-1] + (bin_edges[1:] - bin_edges[:-1]) / 2\nbin_center" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "_, ax = plt.subplots()\nax.hist(raccoon_face.ravel(), bins=256)\ncolor = \"tab:orange\"\nfor center in bin_center:\n ax.axvline(center, color=color)\n ax.text(center - 10, ax.get_ybound()[1] + 100, f\"{center:.1f}\", color=color)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As previously stated, the uniform sampling strategy is not optimal. Notice for\ninstance that the pixels mapped to the value 7 will encode a rather small\namount of information, whereas the mapped value 3 will represent a large\namount of counts. We can instead use a clustering strategy such as k-means to\nfind a more optimal mapping.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "encoder = KBinsDiscretizer(\n n_bins=n_bins,\n encode=\"ordinal\",\n strategy=\"kmeans\",\n random_state=0,\n)\ncompressed_raccoon_kmeans = encoder.fit_transform(raccoon_face.reshape(-1, 1)).reshape(\n raccoon_face.shape\n)\n\nfig, ax = plt.subplots(ncols=2, figsize=(12, 4))\nax[0].imshow(compressed_raccoon_kmeans, cmap=plt.cm.gray)\nax[0].axis(\"off\")\nax[0].set_title(\"Rendering of the image\")\nax[1].hist(compressed_raccoon_kmeans.ravel(), bins=256)\nax[1].set_xlabel(\"Pixel value\")\nax[1].set_ylabel(\"Number of pixels\")\nax[1].set_title(\"Distribution of the pixel values\")\n_ = fig.suptitle(\"Raccoon face compressed using 3 bits and a K-means strategy\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "bin_edges = encoder.bin_edges_[0]\nbin_center = bin_edges[:-1] + (bin_edges[1:] - bin_edges[:-1]) / 2\nbin_center" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "_, ax = plt.subplots()\nax.hist(raccoon_face.ravel(), bins=256)\ncolor = \"tab:orange\"\nfor center in bin_center:\n ax.axvline(center, color=color)\n ax.text(center - 10, ax.get_ybound()[1] + 100, f\"{center:.1f}\", color=color)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The counts in the bins are now more balanced and their centers are no longer\nequally spaced. Note that we could enforce the same number of pixels per bin\nby using the `strategy=\"quantile\"` instead of `strategy=\"kmeans\"`.\n\n### Memory footprint\n\nWe previously stated that we should save 8 times less memory. Let's verify it.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(f\"The number of bytes taken in RAM is {compressed_raccoon_kmeans.nbytes}\")\nprint(f\"Compression ratio: {compressed_raccoon_kmeans.nbytes / raccoon_face.nbytes}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is quite surprising to see that our compressed image is taking x8 more\nmemory than the original image. This is indeed the opposite of what we\nexpected. The reason is mainly due to the type of data used to encode the\nimage.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print(f\"Type of the compressed image: {compressed_raccoon_kmeans.dtype}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Indeed, the output of the :class:`~sklearn.preprocessing.KBinsDiscretizer` is\nan array of 64-bit float. It means that it takes x8 more memory. However, we\nuse this 64-bit float representation to encode 8 values. Indeed, we will save\nmemory only if we cast the compressed image into an array of 3-bits integers. We\ncould use the method `numpy.ndarray.astype`. However, a 3-bits integer\nrepresentation does not exist and to encode the 8 values, we would need to use\nthe 8-bit unsigned integer representation as well.\n\nIn practice, observing a memory gain would require the original image to be in\na 64-bit float representation.\n\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.21" } }, "nbformat": 4, "nbformat_minor": 0 }