{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": false, "deletable": true, "editable": true }, "source": [ "Entropic Coding and Compression\n", "===============================\n", "\n", "*Important:* Please read the [installation page](http://gpeyre.github.io/numerical-tours/installation_python/) for details about how to install the toolboxes.\n", "$\\newcommand{\\dotp}[2]{\\langle #1, #2 \\rangle}$\n", "$\\newcommand{\\enscond}[2]{\\lbrace #1, #2 \\rbrace}$\n", "$\\newcommand{\\pd}[2]{ \\frac{ \\partial #1}{\\partial #2} }$\n", "$\\newcommand{\\umin}[1]{\\underset{#1}{\\min}\\;}$\n", "$\\newcommand{\\umax}[1]{\\underset{#1}{\\max}\\;}$\n", "$\\newcommand{\\umin}[1]{\\underset{#1}{\\min}\\;}$\n", "$\\newcommand{\\uargmin}[1]{\\underset{#1}{argmin}\\;}$\n", "$\\newcommand{\\norm}[1]{\\|#1\\|}$\n", "$\\newcommand{\\abs}[1]{\\left|#1\\right|}$\n", "$\\newcommand{\\choice}[1]{ \\left\\{ \\begin{array}{l} #1 \\end{array} \\right. }$\n", "$\\newcommand{\\pa}[1]{\\left(#1\\right)}$\n", "$\\newcommand{\\diag}[1]{{diag}\\left( #1 \\right)}$\n", "$\\newcommand{\\qandq}{\\quad\\text{and}\\quad}$\n", "$\\newcommand{\\qwhereq}{\\quad\\text{where}\\quad}$\n", "$\\newcommand{\\qifq}{ \\quad \\text{if} \\quad }$\n", "$\\newcommand{\\qarrq}{ \\quad \\Longrightarrow \\quad }$\n", "$\\newcommand{\\ZZ}{\\mathbb{Z}}$\n", "$\\newcommand{\\CC}{\\mathbb{C}}$\n", "$\\newcommand{\\RR}{\\mathbb{R}}$\n", "$\\newcommand{\\EE}{\\mathbb{E}}$\n", "$\\newcommand{\\Zz}{\\mathcal{Z}}$\n", "$\\newcommand{\\Ww}{\\mathcal{W}}$\n", "$\\newcommand{\\Vv}{\\mathcal{V}}$\n", "$\\newcommand{\\Nn}{\\mathcal{N}}$\n", "$\\newcommand{\\NN}{\\mathcal{N}}$\n", "$\\newcommand{\\Hh}{\\mathcal{H}}$\n", "$\\newcommand{\\Bb}{\\mathcal{B}}$\n", "$\\newcommand{\\Ee}{\\mathcal{E}}$\n", "$\\newcommand{\\Cc}{\\mathcal{C}}$\n", "$\\newcommand{\\Gg}{\\mathcal{G}}$\n", "$\\newcommand{\\Ss}{\\mathcal{S}}$\n", "$\\newcommand{\\Pp}{\\mathcal{P}}$\n", "$\\newcommand{\\Ff}{\\mathcal{F}}$\n", "$\\newcommand{\\Xx}{\\mathcal{X}}$\n", "$\\newcommand{\\Mm}{\\mathcal{M}}$\n", "$\\newcommand{\\Ii}{\\mathcal{I}}$\n", "$\\newcommand{\\Dd}{\\mathcal{D}}$\n", "$\\newcommand{\\Ll}{\\mathcal{L}}$\n", "$\\newcommand{\\Tt}{\\mathcal{T}}$\n", "$\\newcommand{\\si}{\\sigma}$\n", "$\\newcommand{\\al}{\\alpha}$\n", "$\\newcommand{\\la}{\\lambda}$\n", "$\\newcommand{\\ga}{\\gamma}$\n", "$\\newcommand{\\Ga}{\\Gamma}$\n", "$\\newcommand{\\La}{\\Lambda}$\n", "$\\newcommand{\\si}{\\sigma}$\n", "$\\newcommand{\\Si}{\\Sigma}$\n", "$\\newcommand{\\be}{\\beta}$\n", "$\\newcommand{\\de}{\\delta}$\n", "$\\newcommand{\\De}{\\Delta}$\n", "$\\newcommand{\\phi}{\\varphi}$\n", "$\\newcommand{\\th}{\\theta}$\n", "$\\newcommand{\\om}{\\omega}$\n", "$\\newcommand{\\Om}{\\Omega}$" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "This numerical tour studies source coding using entropic coders (Huffman and arithmetic).\n", "\n", "You need to install the package _ape_" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "options(repr.plot.width=3.5, repr.plot.height=3.5)\n", "options(warn=-1) # turns off warnings, to turn on: \"options(warn=0)\"\n", "\n", "library(Matrix)\n", "library(stringr)\n", "library(ape)\n", "\n", "# Importing the libraries\n", "for (f in list.files(path=\"nt_toolbox/toolbox_general/\", pattern=\"*.R\")) {\n", " source(paste(\"nt_toolbox/toolbox_general/\", f, sep=\"\"))\n", "}\n", "for (f in list.files(path=\"nt_toolbox/toolbox_signal/\", pattern=\"*.R\")) {\n", " source(paste(\"nt_toolbox/toolbox_signal/\", f, sep=\"\"))\n", "}" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Source Coding and Entropy\n", "-------------------------\n", "Entropic coding converts a vector $x$ of integers into a binary stream\n", "$y$. Entropic coding exploits the\n", "redundancies in the statistical distribution of the entries of $x$ to\n", "reduce as much as possible the size of $y$. The lower bound for the\n", "number of bits $p$ of $y$ is the Shannon bound :\n", "\n", "$$p=-\\sum_ih(i)\\log_2(h(i))$$\n", "\n", "where $h(i)$ is the probability of apparition of symbol $i$ in $x$.\n", "\n", "Fist we generate a simple binary signal $x$ so that $0$ has a probability $p$\n", "to appear in $x$.\n", "\n", "Probability of 0." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "p = 0.1" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Size.\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "n = 512" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Signal, should be with token 1,2." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "x = ceiling(runif(n, 0, 1) > p ) +1" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "One can check the probabilities by computing the empirical histogram." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"Empirical p = 0.083984375\"\n" ] } ], "source": [ "h = c(sum(x==1), sum(x==2))\n", "h = h/sum(h)\n", "\n", "print(paste(\"Empirical p =\" , h[1]))" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "We can compute the entropy of the distribution represented as a vector $h$ of proability that should sum to 1.\n", "We take a max to avoid problems with null probabilties." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"Entropy = 0.416065091363473\"\n" ] } ], "source": [ "e = - sum(h*log2(c(max(h[1],1e-20), max(h[2],1e-20))))\n", "print(paste(\"Entropy = \", e))" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Huffman Coding\n", "--------------\n", "A Hufman code $C$ associates to each symbol $i$ in $\\{1,...,m\\}$ a binary code $C_i$\n", "whose length is as close as possible to the optimal bound\n", "$-\\log_2\\left(h(i)\\right)$, where $h(i)$ is the probability of apparition of the\n", "symbol $i$.\n", "\n", "We select a set of proabilities." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "h = c(.1, .15, .4, .15, .2)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "The tree $T$ contains the codes and is generated by an iterative algorithm.\n", "The initial \"tree\" is a collection of empty trees, pointing to the symbols numbers." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "m = length(h)\n", "T = list(list())" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "We build iteratively the Huffman tree\n", "by grouping together the two erees that have the smallest probabilities.\n", "The merged tree has a probability which is the sum of the two selected\n", "probabilities.\n", "\n", "Initial probability." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "#we use the symbols i = 1,2,3,4,5 (as strings) with the associated probabilities h(i)\n", "\n", "for (i in (1:m))\n", "{\n", " T[[i]] = list(toString(i), h[i])\n", " \n", " }\n" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Iterative merging of the leading probabilities." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "while (length(T)>=2)\n", "{\n", " T = T[order(sapply(T,'[[',2))]\n", " q = as.numeric(T[[1]][2])+as.numeric(T[[2]][2])\n", " t = T[1:2]\n", " T = T[-(1:2)]\n", " T[[length(T)+1]] = list(t,q)\n", "}" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "We trim the computed tree by removing the probabilities." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "trim = function(T)\n", "{\n", " T0 = T[[1]][1]\n", " if (typeof(T0[[1]][1]) == 'character')\n", " {\n", " return (T0)\n", " }\n", " else\n", " {\n", " return (list(trim(T0[[1]][1]),trim(T0[[1]][2])))\n", " }\n", "}\n", "\n", "K = list()\n", "K[[1]] = list(trim(T)[[1]][[1]][1],trim(T)[[2]])\n", "T = K " ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "We display T " ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAA8AAAAEsCAIAAACDrU0cAAAABmJLR0QA/wD/AP+gvaeTAAAN\nVklEQVR4nO3dX2jV9R/H8c8xWxuUf6j1B/QmkSkLvJCyFWGF/cFFd2oWQWB2YUZIODCyhZmJ\nWAc0YuEQlAz8A5UYDoLQCg6IECwWmg2hKMi/DcvUTme/iwMi5XTvH9HnnM7jcSH63c0LQ3vu\nw9fzKQwPDycAAGB0xuQeAAAA9URAAwBAgIAGAIAAAQ0AAAECGgAAAgQ0AAAECGgAAAgQ0AAA\nECCgAQAgQEADAECAgAYAgAABDQAAAQIaAAACBDQAAAQIaAAACBDQAAAQIKABACBAQAMAQICA\nBgCAAAENAAABAhoAAAIENAAABAhoAAAIENAAABAgoAEAIEBAAwBAgIAGAIAAAQ0AAAECGgAA\nAgQ0AAAECGgAAAgQ0AAAECCgAQAgQEADAECAgAYAgAABDQAAAQIaAAACBDQAAAQIaAAACBDQ\nAAAQIKABACBAQAMAQICABgCAAAENAAABAhoAAAIENAAABAhoAAAIENAAABAgoAEAIEBAAwBA\ngIAGAIAAAQ0AAAECGgAAAgQ0AAAECGgAAAgQ0AAAECCgAQAgQEADAECAgAYAgAABDQAAAQIa\nAAACBDQAAAQIaAAACBDQAAAQIKABACBAQAMAQICABgCAAAENAAABAhoAAAIENAAABAhoAAAI\nENAAABAgoAEAIEBAAwBAgIAGAIAAAQ0AAAECGgAAAgQ0AAAECGgAAAgQ0AAAECCgAQAgQEAD\nAECAgAYAgAABDQAAAQIaAAACBDQAAAQIaAAACBDQAAAQIKABACBAQAMAQICABgCAAAENAAAB\nAhoAAAIENAAABAhoAAAIENAAABAgoAEAIEBAAwAwol27dhUKhc7OztxDaoiABgBgRN9++21K\n6bbbbluyZMmtt97a3Nx8xx13rF+/vlwu556WTWF4eDj3BgDg33PhwoW1a9e2tbWNGeMcjSsp\nl8t9fX2VSuX9998vFP4ajQ8//PCePXuuvfbaXPMyEtAA0FhWr169cuXK3CuoG7fccsvPP/88\nc+bM9evXz5o1a2hoaOfOnStWrPjtt9/eeOONl19+OffADAQ0ADSW7du3P/HEE8uWLevo6Mi9\nhZpWPYHu6elpaWn5y5d6e3sXL17c1tZ26NChLNvyEtAA0Fh27tw5f/78HTt2zJs3L/cW6tWJ\nEydaW1ubmprOnz+fe0sGXn4CACDmjz/+SCndcMMNuYfkIaABABhRa2troVD4y6sa27ZtSynd\nfffdmUZlJqABABjRo48+mlJasGDB/v37z50799NPP23YsKH671BfeOGF3OvyGJt7AAAAtWvN\nmjX79u3r7++///77L32+dOnSRx55JNOozJxAAwAwosmTJ/f3969cubK9vb2lpWX8+PH33Xff\n1q1bN27cmHtaNk6goQ649QD4B5VKpZRSpVLJPYS6MXHixFWrVq1atSr3kFohoKEOrFu3rru7\nO/cK4D9lYGAg9wSoVwIa6sDUqVNTSm49AP4RpVKpWCy2t7fnHgL1SkBDHai+udHR0eHWA+Af\nUSwWvRLG/+eHH36YMWPG6dOnG/kyPn94AAAYlUql8vTTT58+fTr3kMwENAAAo/Lmm2/u378/\n94r8BDQAAFd34MCB1157bcKECbmH5CegAQC4il9//fXJJ58sl8s9PT25t+QnoAEAuIrnn39+\ncHDwmWeeWbBgQe4t+fkUDgBoLNUrVKrXqcAVlMvlvr6+np6e3bt3b926dcqUKY18++ClBDQA\nNJYjR46klIrFYrFYzL2FOnDhwoW9e/eOHTv2gw8+uP7663PPqQkCGgAaS1dXV6VSaWtr81HQ\nXFn1BPq7774bGhpavXr1XXfdlXtRrSg08odgQ73YuXPn/Pnzd+zY4SIVAP5lhULhCl9tzJL0\nrScAAAR4hQMAgBH9/Yy5eibdmGfPVU6gAQAgQEADAECAgAYAgADvQOd04cKFtWvX+iAhrqp6\n30H17gMAyKuR336uEtA5rVu3rru7O/cK6sbAwEDuCQCAgM5q6tSpKaVly5Z1dHTk3kJNK5VK\nxWKxvb099xAAQEBnVX1zo6Ojw+0YXFWxWPSqDwDUAv8/BgDgSn7//ffVq1fPmDFj3LhxLS0t\n06ZNW758+enTp3PvysYJNAAAIzp79uzs2bMPHjx48cnhw4cPHz68Z8+eUqk0YcKEjNtycQIN\nAMCINm7cePDgwdbW1u3bt//yyy9DQ0Mff/zxpEmTDh061LCfhSCgAQAY0fbt21NK77777vz5\n88ePHz9u3LjHH398y5YtKaVdu3blXpeHgAYAYESDg4MppTlz5lz68M4770wpnTx5Ms+m3LwD\nDQDAiIaGhv7+8LPPPkspTZs27V+fUxMENAA0FvfgMkrlcrmvr6+np6elpeXS5/v27Vu0aFFK\nafny5ZmmZSagAaCxuAeXkKampk2bNlV/fvLkyRUrVvT29qaUuru7n3rqqazTshHQANBY3IPL\nKFVPoDds2JBSqlQq77333iuvvHLq1KmpU6f29PQ8+OCDuQdmI6ABoLG4B5fRW7hwYUrp2LFj\n8+bN+/zzzydOnPjWW28tXbq0qakp97ScBDQAACM6d+7cQw891N/f39nZuXnz5ptvvjn3ovwE\nNAAAI9q8eXN/f//s2bM/+uijsWOlY0o+BxoAgCvYs2dPSuntt99Wzxf5jQAAYERff/11Smnm\nzJmX/erw8PC/O6cmOIEGAGBEDXvd4BU4gYY6UKlUUkqlUin3EOC/oPqXSfUvFriqs2fP5p5Q\ncwQ01IEjR46klIrFYrFYzL0F+I8YGBjIPQHqlYCGOtDV1VWpVNy7C/wjSqVSsVhsb2/PPQTq\nlYCGOtDU1PTqq6/mXgH8dxSLRd+Q839YvHhxb29vY/7DwUv5w5NTuVy++CMAQC07e/bs7t27\nc6+oCQI6p08++eTijwAAtenUqVN79+597LHHjh07lntLTfAKR06dnZ3btm3r7OzMPQQAYEQ3\n3nhj7gm1RUDnVL3Rx70+AEAtu/jSc6FQyLukRniFAwAAApx9AkBjcTcTo1Qul/v6+np6elpa\nWnJvqS0CGgAai7uZCGlqatq0aVPuFbVFQANAY3E3E6NUPYHesGFD7iE1R0ADQGNxNxOjt3Dh\nwtwTapFvPQEAIEBAAwBAgIAGAIAAAQ0AAAECGgAAAnwKBwAAo3LxTu8G5wQaAAACBDQAAAQI\naAAARnT8+PHnnntu0qRJzc3Nt99++7x587766qvcozLzDjQAAJd35syZjo6OwcHB6i+PHj16\n9OjRDz/8cPfu3XPnzs27LSMn0AAAXF6xWBwcHJwyZcqXX3557ty5b775Zs6cOX/++WeD3wYv\noAEAuLxdu3allN5555177733uuuumz59+pYtW1JKAwMDuaflJKABALi8o0ePppTuueeei0/K\n5XJK6aabbsq2qQZ4BxoAgMs7c+bMpb/88ccfX3zxxZTSokWLMi2qCQI6p0qlklIqlUq5hwAA\nXEZzc/PcuXOvueaalFKhUKg+XLJkSYO/A52Gyef111/P/d8fAOBKPv3002q3XHwyZsyYZ599\n9vz583k7KiMn0Dl1dXVVKpW2trYxY7yMDgDUnObm5gceeKD680qlcvLkyVKp9NJLL/X29ra2\ntq5ZsybvvFwKw+40BwBg1A4cODBr1qzJkyd///33ubfk4eATAIDLa2lpKRQKJ06cuPTh9OnT\nU0rHjx/PNCo/AQ0AwOW1tbWllPbt23fpwy+++CKlNGXKlCyTaoFXOAAAIMAJNAAABAhoAAAI\nENAAABAgoAEAIEBAAwBAgIAGAIAAAQ0AAAECGgAAAgQ0AAAECGgAAAgQ0AAAECCgAQAgQEAD\nAECAgAYAgAABDQAAAQIaAAACBDQAAAQIaAAACBDQAAAQIKABACBAQAMAQICABgCAAAENAAAB\nAhoAAAIENAAABAhoAAAIENAAABAgoAEAIEBAAwBAgIAGAIAAAQ0AAAECGgAAAgQ0AAAECGgA\nAAgQ0AAAECCgAQAgQEADAECAgAYAgAABDQAAAQIaAAACBDQAAAQIaAAACBDQAAAQIKABACBA\nQAMAQICABgCAAAENAAABAhoAAAIENAAABAhoAAAIENAAABAgoAEAIEBAAwBAgIAGAIAAAQ0A\nAAECGgAAAgQ0AAAECGgAAAgQ0AAAECCgAQAgQEADAECAgAYAgAABDQAAAQIaAAACBDQAAAQI\naAAACBDQAAAQIKABACBAQAMAQICABgCAAAENAAABAhoAAAIENAAABAhoAAAIENAAABAgoAEA\nIEBAAwBAgIAGAIAAAQ0AAAECGgAAAgQ0AAAECGgAAAgQ0AAAECCgAQAgQEADAECAgAYAgAAB\nDQAAAQIaAAACBDQAAAQIaAAACBDQAAAQIKABACBAQAMAQICABgCAAAENAAABAhoAAAIENAAA\nBAhoAAAIENAAABAgoAEAIEBAAwBAgIAGAIAAAQ0AAAECGgAAAgQ0AAAECGgAAAgQ0AAAECCg\nAQAgQEADAECAgAYAgAABDQAAAQIaAAACBDQAAAQIaAAACBDQAAAQIKABACBAQAMAQICABgCA\nAAENAAABAhoAAAL+B4jjNYKvuKoiAAAAAElFTkSuQmCC", "text/plain": [ "plot without title" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# flatten list\n", "x2 = paste0(lapply(T, function(y) paste0(\"(\", paste0(y, collapse = \",\"), \")\")), collapse = \",\")\n", "\n", "# remove unwanted characters\n", "x2 = gsub('\\\"|c|list| ', \"\", x2)\n", "x2 = paste0(\"(\", x2, \");\")\n", "\n", "# remove brackets from single term list object\n", "x3 = str_replace_all(x2, \"\\\\([a-z]*\\\\)\", function(x) gsub(\"^\\\\(|\\\\)$\", \"\", x))\n", "\n", "# plot\n", "plot(read.tree(text = x3), color=\"blue\")" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Once the tree $T$ is computed, one can compute the code $C_{i}$\n", "associated to each symbol $i$. This requires to perform a deep first\n", "search in the tree and stop at each node." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "codes = list()\n", "c = compute_huffcode(h)\n", "for (i in (1:length(h)))\n", "{\n", " codes[[toString(i)]] = c[i]\n", "}" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Display the code." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"Code of token 1 = 100\"\n", "[1] \"Code of token 2 = 110\"\n", "[1] \"Code of token 3 = 0\"\n", "[1] \"Code of token 4 = 101\"\n", "[1] \"Code of token 5 = 111\"\n" ] } ], "source": [ "for (e in (1:length(codes)))\n", "{\n", " print(paste(\"Code of token\", e, \"=\", codes[[toString(e)]]))\n", "}" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "We draw a vector $x$ according to the distribution $h$.\n", "\n", "Size of the signal." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "n = 1024" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Randomization." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "rand_discr = function (p, m)\n", "{\n", " # rand_discr - discrete random generator \n", " \n", " # y = rand_discr(p, n);\n", " \n", " # y is a random vector of length n drawn from \n", " # a variable X such that\n", " # p(i) = Prob( X=i )\n", " p = p/sum(p)\n", " n = length(p)\n", " coin = runif(m, 0, 1)\n", " cumprob = c(0,cumsum(p))\n", " sample = matrix(0, nrow=1, ncol=m)\n", "\n", " for (j in (1:n))\n", " {\n", " ind = c((coin > cumprob[j]) & (coin <= cumprob[j+1]))\n", " sample[ind] = j\n", " }\n", " return (sample)\n", "}\n", "\n", "x = rand_discr(h, n)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "__Exercise 1__\n", "\n", "Implement the coding of the vector $x$ to obtain a binary vector $y$, which corresponds to replacing each sybmol $x(i)$ by the code $C_{x(i)}$." ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "source(\"nt_solutions/coding_2_entropic/exo1.R\")" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "## Insert your code here." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Compare the length of the code with the entropy bound." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"Entropy bound = 2197.95388894312\"\n", "[1] \"Huffman code = 2256\"\n" ] } ], "source": [ "e = - sum(h*log2(c(max(h[1],1e-20),max(h[2],1e-20),max(h[3],1e-20),max(h[4],1e-20),max(h[5],1e-20))))\n", "print(paste(\"Entropy bound = \", n*e))\n", "print(paste(\"Huffman code = \", nchar(y)))" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Decoding is more complicated, since it requires to iteratively parse the tree $T$." ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Initial empty decoded stream." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "x1 = c()" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Perform decoding." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "T0 = K\n", "for (e in strsplit(y,split=''))\n", "{\n", " if (e == '0')\n", " {\n", " T0 = T0[[1]]\n", " } \n", " else\n", " {\n", " T0 = T0[[1]][[2]]\n", " }\n", " if (typeof(T0) == 'character')\n", " {\n", " i = i+1\n", " x1 = c(x1,T0)\n", " T0 = T\n", " }\n", "}" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "We test if the decoding is correct." ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"Error (should be zero) : 0\"\n" ] } ], "source": [ "err = norm(x - as.double(x1))\n", "print(paste(\"Error (should be zero) : \", err))" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Huffman Block Coding\n", "--------------------\n", "A Huffman coder is inefficient because it can distribute only an integer\n", "number of bit per symbol. In particular, distribution where one of the\n", "symbol has a large probability are not well coded using a Huffman code.\n", "This can be aleviated by replacing the set of $m$ symbols by $m^q$\n", "symbols obtained by packing the symbols by blocks of $q$ (here we use $m=2$ for a binary alphabet). This breaks\n", "symbols with large probability into many symbols with smaller proablity,\n", "thus approaching the Shannon entropy bound.\n", "\n", "\n", "Generate a binary vector with a high probability of having 1, so that the\n", "Huffman code is not very efficient (far from Shanon bound).\n", "\n", "Proability of having 0." ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "t = .12" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Probability distriution." ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "h = c(t, 1-t)" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Generate signal." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "n = 4096 * 2\n", "x = (runif(n, 0, 1) >t) +1" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "For block of length $q=3$, create a new vector by coding each block\n", "with an integer in $\\{1,...,m^q=2^3\\}$. The new length of the vector is\n", "$n_1/q$ where $n_1=\\lceil n/q\\rceil q$.\n", "\n", "Block size." ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "q = 3" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Maximum token value." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "m = 2" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "New size." ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "n1 = (floor(n/q)+1)*q" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "New vector." ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "x1 = matrix(0, nrow=1, ncol=n1)\n", "x1[1:length(x)] = x\n", "x1[length(x):length(x1)] = 1\n", "x1 = x1 - 1\n", "x2 = c()\n", "\n", "for (i in seq(1,n1,by=q))\n", "{\n", " mult = m**c(1:q-1)\n", " x2 = c(x2, sum(x1[i:(i+q)]*mult))\n", "}" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "We generate the probability table $H$ of $x_1$ that represents the probability\n", "of each new block symbols in $\\{1,...,m^q\\}$." ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "H = h\n", "for (i in (1:(q-1)))\n", "{\n", " Hold = H\n", " H = c()\n", " for (j in (1:length(h)))\n", " {\n", " H = c(H, h[j]*Hold)\n", " }\n", "}\n" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "A simpler way to compute this block-histogram is to use the Kronecker product." ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/html": [ "