{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Optimize your Docker Infrastructure with Python\n",
    "\n",
    "## PyData NYC\n",
    "\n",
    "November 11, 2015\n",
    "\n",
    "* Ryan J. O'Neil (@ryanjoneil)\n",
    "* Day: Lead Engineer @ [Yhat](http://www.yhathq.com) (<r@yhathq.com>)\n",
    "* Night: PhD student in Operations Research @ GMU (<roneil1@gmu.edu>)\n",
    "* Blog: http://ryanjoneil.github.io/blog.html\n",
    "* Slides: https://github.com/ryanjoneil/talks/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Motivation\n",
    "\n",
    "Consider the DevOps Engineer."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### It's a noble profession...\n",
    "\n",
    "![](images/scotty.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### ...with grave responsibilities.\n",
    "\n",
    "One of the important functions of DevOps is building environments for:\n",
    "\n",
    "* Software development\n",
    "* Testing & quality assurance\n",
    "* Running operational systems\n",
    "* Whatever else you might need a computing environment for"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### There's even a Venn diagram about DevOps.\n",
    "\n",
    "And we all know how much we like Venn diagrams.\n",
    "\n",
    "![](images/devops-venn.svg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### A typical DevOps function\n",
    "\n",
    "You work on _Yet Another Enterprise Java Application (TM)_. YAEJA makes \\$\\$ for _Yet Another Enterprise Software Company_!\n",
    "\n",
    "You need to upgrade components where YAEJA runs.\n",
    "\n",
    "You can't upgrade production without testing. So you need two environments from your DevOps person.\n",
    "\n",
    "One for production:\n",
    "\n",
    "&nbsp;  \n",
    "\n",
    "|     | Command                                                        |\n",
    "|-----|----------------------------------------------------------------|\n",
    "| $A$ | Install the Java compiler and runtime environment.             |\n",
    "| $B$ | Download a set of external dependencies.                       |\n",
    "| $C$ | Set up a an EntepriseDB (TM) schema and populate it with data. |"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "And one for testing it with the upgrade:\n",
    "\n",
    "&nbsp;  \n",
    "\n",
    "|     | Command                                                        |\n",
    "|-----|----------------------------------------------------------------|\n",
    "| $A$ | Install the Java compiler and runtime environment.             |\n",
    "| $B$ | Download a set of external dependencies.                       |\n",
    "| $C$ | Set up a an EntepriseDB (TM) schema and populate it with data. |\n",
    "| $D$ | Update the underlying system utility.                          |\n",
    "\n",
    "&nbsp;  \n",
    "\n",
    "If YAJEA integrates with 3rd party apps, you need another test environment for each one.\n",
    "\n",
    "If those interact in interesting ways, you need test environments for different combinations of them."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Looks like your DevOps person will be working all weekend\n",
    "\n",
    "![](images/saturday.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### In the old days...\n",
    "\n",
    "At a big software shop, devops may setup and tear down dozens (or _hundreds_) of system configurations for a single software project.\n",
    "\n",
    "This used to be done with physical hardware on _(sometimes)_ fresh OS installs.\n",
    "\n",
    "Most medium-to-large software shops had an air conditioned room like this.\n",
    "\n",
    "![](images/cable-spaghetti.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Environmental setup was often pretty manual\n",
    "\n",
    "If you had to reproduce an environment, you had a few options:\n",
    "\n",
    "* Start with a fresh install of the OS\n",
    "\n",
    "\n",
    "* Setup and save the results to a CD\n",
    "\n",
    "\n",
    "* Hope that uninstalling and resintalling is good enough...\n",
    "\n",
    "    + A _lot_ of people did this\n",
    "    + Probably not good enough\n",
    "    + Software leaves relics\n",
    "        - Logs...\n",
    "        - Data files...\n",
    "        - Configuration..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Nowadays...\n",
    "\n",
    "![](images/cloud-docker.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### What's a container?\n",
    "\n",
    "Docker is so hot right now."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "I've been getting two questions a lot this week:\n",
    "\n",
    "1. What's Docker?\n",
    "2. Should I be using it?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### A1: Docker is a lightweight virtualization platform\n",
    "\n",
    "Docker provides two main things: images & containers.\n",
    "\n",
    "Images represent the environments containers run it.\n",
    "\n",
    "A running container acts like it is:\n",
    "    \n",
    "* In it own OS\n",
    "* On its own hardware\n",
    "\n",
    "But it doesn't load heavy stuff like a new kernel or device drivers."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "$${\\Huge\\text{DEMO TIME}}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### A2: I don't know what you should be doing\n",
    "\n",
    "But Yhat uses a ${\\large\\text{LOT}}$ of Docker.\n",
    "\n",
    "* Development (MySQL containers)\n",
    "* Functional tests for our software\n",
    "* Buiding binary packages for Linux distros\n",
    "* Productionizing predictive models (ScienceOps)\n",
    "* Environment management in collaborative tools (ScienceCluster)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "$${\\Huge\\text{DEMO TIME}}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### So containers are about isolation...\n",
    "\n",
    "Processes and system resources behave as if the are on their own computers.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "#### ...but containers are also about sharing.\n",
    "\n",
    "And this is what we care about here."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Specifically, saving and retrieving the results of a computation from the Docker image cache."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Why?\n",
    "\n",
    "&nbsp;  \n",
    "\n",
    "$$\\text{SMART CACHE USE} = \\text{DEVOPS TIME SAVED!}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "![](images/scotty-approved.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Docker cache mechanics\n",
    "\n",
    "Maybe I should call this section UnionFS mechanics, but then Docker is so hot right now."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### A Tale of Two Dockerfiles\n",
    "\n",
    "Let's say I collect old or non-standard revision control systems."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Everyone needs a hobby."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "I set up an environment with a few RCSs in it to play around with.\n",
    "\n",
    "Dockerfile: `rcs1`\n",
    "&nbsp;    \n",
    "```\n",
    "FROM ubuntu:trusty\n",
    "RUN apt-get install -y bzr\n",
    "RUN apt-get install -y cvs\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Let's build out our non-standard RCS image!\n",
    "\n",
    "```\n",
    "[ryan@localhost dockerfiles]$ docker build --file=rcs1 --tag=rcs1 .\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Step 0 would download the base Ubuntu image, but I've already got it.\n",
    "\n",
    "```\n",
    "Sending build context to Docker daemon 24.58 kB\n",
    "Step 0 : FROM ubuntu:trusty\n",
    " ---> a5a467fddcb8\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "`bzr` require a number of dependencies...\n",
    "\n",
    "```\n",
    "Step 1 : RUN apt-get install -y bzr\n",
    " ---> Running in 7679f0a1525e\n",
    "Reading package lists...\n",
    "Building dependency tree...\n",
    "Reading state information...\n",
    "The following extra packages will be installed:\n",
    "  ca-certificates dbus gir1.2-glib-2.0 libapparmor1 libassuan0\n",
    "  libdbus-glib-1-2 libgirepository-1.0-1 libglib2.0-0 libglib2.0-data libgmp10\n",
    "  [...snip...]\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "As does `cvs`.\n",
    "\n",
    "```\n",
    "Step 2 : RUN apt-get install -y cvs\n",
    " ---> Running in 54229eef94d2\n",
    "Reading package lists...\n",
    "Building dependency tree...\n",
    "Reading state information...\n",
    "The following extra packages will be installed:\n",
    "  krb5-locales libedit2 libgssapi-krb5-2 libk5crypto3 libkeyutils1 libkrb5-3\n",
    "  libkrb5support0 libx11-6 libx11-data libxau6 libxcb1 libxdmcp6 libxext6\n",
    "  [...snip...]\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "At the end of this I have a tagged image. I can easily use this to spin up new containers.\n",
    "\n",
    "```\n",
    "[ryan@localhost dockerfiles]$ docker run rcs1 bash -c \"which cvs && which bzr\"\n",
    "/usr/bin/cvs\n",
    "/usr/bin/bzr\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Oh snap.\n",
    "\n",
    "I found an old RCS repository from the 1980s I want to look at.\n",
    "\n",
    "![](images/rcs-sccs.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "I don't have `rcs` installed. I guess I'll have to build a new image.\n",
    "\n",
    "Dockerfile: `rcs2`\n",
    "&nbsp;  \n",
    "```\n",
    "FROM ubuntu:trusty\n",
    "RUN apt-get install -y bzr\n",
    "RUN apt-get install -y cvs\n",
    "RUN apt-get install -y software-properties-common\n",
    "RUN add-apt-repository \"deb http://archive.ubuntu.com/ubuntu trusty universe\"\n",
    "RUN apt-get update\n",
    "RUN apt-get install -y rcs\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "```\n",
    "[ryan@localhost dockerfiles]$ docker build --file=rcs2 --tag=rcs2 .\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "\n",
    "The first few steps have already been run before and can use the Docker cache!\n",
    "```\n",
    "Step 0 : FROM ubuntu:trusty\n",
    " ---> a5a467fddcb8\n",
    "\n",
    "Step 1 : RUN apt-get install -y bzr\n",
    " ---> Using cache\n",
    " ---> ed8a8efd04cc\n",
    "\n",
    "Step 2 : RUN apt-get install -y cvs\n",
    " ---> Using cache\n",
    " ---> 61fe05c24a19\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "After that it's business as usual.\n",
    "\n",
    "```\n",
    "Step 3 : RUN apt-get install -y software-properties-common\n",
    " ---> Running in 4adc2bb63a73\n",
    "The following extra packages will be installed:\n",
    "  iso-codes libasn1-8-heimdal libcurl3-gnutls libgssapi3-heimdal\n",
    "  libhcrypto4-heimdal libheimbase1-heimdal libheimntlm0-heimdal\n",
    "  [...snip...]\n",
    "\n",
    "Step 4 : RUN add-apt-repository \"deb http://archive.ubuntu.com/ubuntu trusty universe\"\n",
    " ---> Running in 5a023435448c\n",
    " ---> 02439f270099\n",
    "Removing intermediate container 5a023435448c\n",
    "\n",
    "Step 5 : RUN apt-get update\n",
    " ---> Running in 449c25ee3537\n",
    "Get:1 http://archive.ubuntu.com trusty-updates InRelease [64.4 kB]\n",
    "[...snip...]\n",
    "\n",
    "Step 6 : RUN apt-get install -y rcs\n",
    " ---> Running in 3574b94bbea3\n",
    "The following NEW packages will be installed:\n",
    "  rcs\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Adding to my stable of random revision control.\n",
    "\n",
    "_Also, why are they all three letters long?_\n",
    "\n",
    "So now I have `rcs` in addition to `bzr` and `cvs`. Note that my `rcs2` image was built off of my `rcs1` image. I didn't have to do any extra work to recreate those shared parts of it.\n",
    "\n",
    "```\n",
    "[ryan@localhost dockerfiles]$ docker run rcs2 bash -c \"which bzr && which cvs && which rcs\"\n",
    "/usr/bin/bzr\n",
    "/usr/bin/cvs\n",
    "/usr/bin/rcs\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### What happens if I rebuild the image again?\n",
    "\n",
    "Omigosh everything is cached!\n",
    "\n",
    "```\n",
    "[ryan@localhost dockerfiles]$ docker build --file=rcs2 --tag=rcs2 .\n",
    "Sending build context to Docker daemon 3.072 kB\n",
    "Step 0 : FROM ubuntu:trusty\n",
    " ---> a5a467fddcb8\n",
    " \n",
    "Step 1 : RUN apt-get install -y bzr\n",
    " ---> Using cache\n",
    " ---> ed8a8efd04cc\n",
    " \n",
    "Step 2 : RUN apt-get install -y cvs\n",
    " ---> Using cache\n",
    " ---> 61fe05c24a19\n",
    " \n",
    "Step 3 : RUN apt-get install -y software-properties-common\n",
    " ---> Using cache\n",
    " ---> 942207de6f7f \n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "```\n",
    "Step 4 : RUN add-apt-repository \"deb http://archive.ubuntu.com/ubuntu trusty universe\"\n",
    " ---> Using cache\n",
    " ---> 02439f270099\n",
    "\n",
    "Step 5 : RUN apt-get update\n",
    " ---> Using cache\n",
    " ---> 220536dfc3ff\n",
    "\n",
    "Step 6 : RUN apt-get install -y rcs\n",
    " ---> Using cache\n",
    " ---> 1a8cd22cb14d\n",
    "Successfully built 1a8cd22cb14d\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "![](images/spock-logical-awesome.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### What if I change the order of that Dockerfile?\n",
    "\n",
    "I'd really rather have the commands to add `universe` at the top.\n",
    "\n",
    "Dockerfile: `rcs3`\n",
    "&nbsp;  \n",
    "```\n",
    "FROM ubuntu:trusty\n",
    "RUN apt-get install -y software-properties-common\n",
    "RUN add-apt-repository \"deb http://archive.ubuntu.com/ubuntu trusty universe\"\n",
    "RUN apt-get update\n",
    "RUN apt-get install -y bzr\n",
    "RUN apt-get install -y cvs\n",
    "RUN apt-get install -y rcs\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### What happens when I build an image with the commands reordered?\n",
    "\n",
    "Nothing except the base image is cached!\n",
    "\n",
    "```\n",
    "[ryan@localhost dockerfiles]$ docker build --file=rcs3 --tag=rcs3 .\n",
    "Sending build context to Docker daemon 4.096 kB\n",
    "\n",
    "Step 0 : FROM ubuntu:trusty\n",
    " ---> a5a467fddcb8\n",
    "\n",
    "Step 1 : RUN apt-get install -y software-properties-common\n",
    " ---> Running in 713e49f1f323\n",
    "Reading package lists...\n",
    "Building dependency tree...\n",
    "Reading state information...\n",
    "The following extra packages will be installed:\n",
    "  ca-certificates gir1.2-glib-2.0 iso-codes krb5-locales libasn1-8-heimdal\n",
    "  libcurl3-gnutls libdbus-glib-1-2 libgirepository-1.0-1 libglib2.0-0\n",
    "  libglib2.0-data libgssapi-krb5-2 libgssapi3-heimdal libhcrypto4-heimdal\n",
    "  libheimbase1-heimdal libheimntlm0-heimdal libhx509-5-heimdal libidn11\n",
    "  libk5crypto3 libkeyutils1 libkrb5-26-heimdal libkrb5-3 libkrb5support0\n",
    "  libldap-2.4-2 libroken18-heim\n",
    "  [...etc...]\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### So Docker can use the cache when...\n",
    "\n",
    "It has seen the exact same commands in _the same order_.\n",
    "\n",
    "If it sees something new, it _stops using the cache_.\n",
    "\n",
    "Certain commands, like `COPY`, break the cache entirely."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Docker caching examples\n",
    "\n",
    "Building two images that share three commands:\n",
    "\n",
    "![](images/fig01.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "If we structure them as such, they share nothing:\n",
    "\n",
    "![](images/fig02.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Once images diverge, they can never merge back together\n",
    "\n",
    "This won't work:\n",
    "\n",
    "![](images/fig03.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "This is what would really happen:\n",
    "\n",
    "![](images/fig04.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "This would be better:\n",
    "\n",
    "![](images/fig05.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Problem formulation\n",
    "\n",
    "Let's write this thing in LaTeX!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### What are we trying to do again?\n",
    "\n",
    "Given a set of Docker images and their commands, find the best order to run the commands and\n",
    "minimize required computing time.\n",
    "\n",
    "_Could also read: maximize use of the Docker cache._"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Say we know a priori the time required by each command\n",
    "\n",
    "We could also use a prior.\n",
    "\n",
    "$$C = \\{\\text{set of commmands}\\}$$\n",
    "$$t_c > 0 = \\text{time required by command c}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "We also know the images we need to create and the commands each one requires.\n",
    "\n",
    "$$I = \\{\\text{set of images}\\}$$\n",
    "$$C_i = \\{\\text{commands requires for image I}\\}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### We need some decision variables\n",
    "\n",
    "We define a set $P$ of (image set, command set) pairs, starting with each image $i \\in I$ and, for each of these, every command $c \\in C_i$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "In our first example we had two images, $I = \\{1, 2\\}$.\n",
    "\n",
    "The commands for the images were: \n",
    "\n",
    "$$C_1 = \\{A, B, C\\}$$\n",
    "$$C_2 = \\{A, B, C, D\\}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "So our set $P$ initially contains:\n",
    "\n",
    "$$P = \\{(\\{1\\},\\{A\\}), (\\{1\\},\\{B\\}), (\\{1\\},\\{C\\}), \\\\ (\\{2\\},\\{A\\}), (\\{2\\},\\{B\\}), (\\{2\\},\\{C\\}), (\\{2\\},\\{D\\})\\}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Now define $x_p \\in \\{0,1\\}$ for each $p \\in P$. \n",
    "\n",
    "If $x_p = 1$, then we run the commands in $p$ for the images in $p$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### A first, misguided stab at our image construction model\n",
    "\n",
    "Find the minimum cost schedule based on our set $P$ and the requirements of our images. \n",
    "\n",
    "$$\n",
    "\\begin{array}{r l}\n",
    "    \\min        & t_a x_{1,a} + t_b x_{1,b} + t_c x_{1,c} + t_a x_{2,a} + t_b x_{2,b} + t_c x_{2,c} + t_c x_{2,d} \\\\  \n",
    "    \\text{s.t.} & x_{1,a} = 1 \\\\\n",
    "                & x_{1,b} = 1 \\\\\n",
    "                & x_{1,c} = 1 \\\\\n",
    "                & x_{2,a} = 1 \\\\\n",
    "                & x_{2,b} = 1 \\\\\n",
    "                & x_{2,c} = 1 \\\\\n",
    "                & x_{2,d} = 1 \\\\\n",
    "                & x_p \\in \\{0,1\\}\\, \\forall\\, p \\in P\n",
    "\\end{array}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Well that's just ridiculous."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "The solution is trivially:\n",
    "\n",
    "$$x_{1,a} =  x_{1,b} = x_{1,c} = x_{2,a} = x_{2,b} = x_{2,c} = x_{2,d} = 1$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "The cost of implementing this plan is actually the worst we could possibly do:\n",
    "\n",
    "$$2 \\left(t_a + t_b + t_c\\right) + t_d$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "In fact it looks pretty much like we did this:\n",
    "\n",
    "![](images/fig02.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### We need a variable that allows us to use the Docker cache\n",
    "\n",
    "In this problem, commands $\\{A, B, C\\}$ are shared between images $\\{1,2\\}$.\n",
    "\n",
    "So let's make a variable for that and add it to $P$.\n",
    "\n",
    "$$x_{12,abc} \\in \\{0,1\\}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Of course this changes the constraints.\n",
    "\n",
    "$$x_{1,a} + x_{12,abc} = 1$$\n",
    "$$x_{1,b} + x_{12,abc} = 1$$\n",
    "$$x_{1,c} + x_{12,abc} = 1$$\n",
    "$$x_{2,a} + x_{12,abc} = 1$$\n",
    "$$x_{b,b} + x_{12,abc} = 1$$\n",
    "$$x_{c,c} + x_{12,abc} = 1$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### This time, with feeling\n",
    "\n",
    "This one actually allows us to share the time required to run commands between images.\n",
    "\n",
    "$$\n",
    "\\begin{array}{r l}\n",
    "    \\min        & t_a x_{1,a} + t_b x_{1,b} + t_c x_{1,c} + t_a x_{2,a} + t_b x_{2,b} + t_c x_{2,c} + t_c x_{2,d} + \\left(t_a + t_b + t_c\\right) x_{12,abc}\\\\  \n",
    "    \\text{s.t.} & x_{1,a} + x_{12,abc} = 1 \\\\\n",
    "                & x_{1,b} + x_{12,abc} = 1 \\\\\n",
    "                & x_{1,c} + x_{12,abc} = 1 \\\\\n",
    "                & x_{2,a} + x_{12,abc} = 1 \\\\\n",
    "                & x_{2,b} + x_{12,abc} = 1 \\\\\n",
    "                & x_{2,c} + x_{12,abc} = 1 \\\\\n",
    "                & x_{2,d} = 1 \\\\\n",
    "                & x_p \\in \\{0,1\\}\\, \\forall\\, p \\in P\n",
    "\\end{array}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "This is much better. It gives us the correct solution.\n",
    "\n",
    "$$x_{1,a} =  x_{1,b} = x_{1,c} = x_{2,a} = x_{2,b} = x_{2,c} = 0$$\n",
    "$$x_{12,abc} = x_{2,d} = 1$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true,
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "The minimum time is:\n",
    "\n",
    "$$t_a + t_b + t_c + t_d$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true,
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Visually this could be represented as:\n",
    "\n",
    "![](images/fig01.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### But things are not always so simple.\n",
    "\n",
    "We created the variable $x_{12,abc}$ because it was obvious. $I_1 \\bigcap I_2 = \\{A, B, C\\}$, so it was the best possible option."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "But consider this set of 4 images:\n",
    "\n",
    "$$I_3 = \\{A, B, C\\}$$\n",
    "$$I_4 = \\{B, C, D\\}$$\n",
    "$$I_5 = \\{A, C, E\\}$$\n",
    "$$I_6 = \\{B, D, E\\}$$\n",
    "\n",
    "In this problem instance, the values $t_a, \\dots, t_e$ are what drive our final schedule."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "If $t_c$ is big enough, the optimal schedule might be:\n",
    "    \n",
    "```\n",
    "    I3:  C B A\n",
    "    I4:  C B D \n",
    "    I5:  C A E\n",
    "    I6:  B D E\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "And if $t_a > t_b$ it might be:\n",
    "    \n",
    "```\n",
    "    I3:  C A B\n",
    "    I4:  C B D \n",
    "    I5:  C A E\n",
    "    I6:  B D E\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "So if we set $x_{345,c} = 1$, we have the option to use two other variables to create our schedule:\n",
    "\n",
    "$$x_{345,c,34,b} \\in \\{0,1\\}$$\n",
    "$$x_{345,c,35,a} \\in \\{0,1\\}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "But this can only happen if $x_{345,c} = 1$. That is, $x_{345,c,34,b}$ and $x_{345,c,35,a}$ are **dependent** on it.\n",
    "\n",
    "$$x_{345,c,34,b} \\le x_{345,c}$$\n",
    "$$x_{345,c,35,a} \\le x_{345,c}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "We will represent these dependencies using another set, $R$.\n",
    "\n",
    "For all $\\left(x_m, x_n\\right) \\in R$, any feasible solution requires that $x_m \\le x_n$.\n",
    "\n",
    "Note that dependencies have structure. We may have:\n",
    "\n",
    "$$x_m \\le x_n$$\n",
    "$$x_n \\le x_o$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### We also have to worry about intersections...\n",
    "\n",
    "Recall our set of 4 images:\n",
    "\n",
    "$$I_3 = \\{A, B, C\\}$$\n",
    "$$I_4 = \\{B, C, D\\}$$\n",
    "$$I_5 = \\{A, C, E\\}$$\n",
    "$$I_6 = \\{B, D, E\\}$$\n",
    "\n",
    "In addition to the variables $x_{3,a}$, $x_{3,b}$, $\\dots$, $x_{6,e}$, we create $x_{345,c}$, $x_{34,bc}$, $\\dots$, $x_{56,e}$ and add the appropriate $\\left(\\text{images}, \\text{commands}\\right)$ pairs to $P$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "For each image $i \\in I$, and command $c \\in C_i$, exactly one $x_p$ that provides $c$ for $i$ must be set to $1$. This restricts **intersections** in our schedule.\n",
    "\n",
    "$$\\sum_{p = \\left(p_i,p_c\\right) \\in P,\\, i \\in p_i,\\, c \\in p_c} x_p = 1\\, \\forall\\, i \\in I,\\, c \\in C_i$$ "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Now we have everything we need to build our model!\n",
    "\n",
    "$$\n",
    "\\begin{array}{r l l l}\n",
    "    \\min        & \\sum_{p \\in I} \\sum_{c \\in C_i} t_c x_p & \\\\\n",
    "    \\text{s.t.} & \\sum_{p = \\left(p_i,p_c\\right) \\in P,\\, i \\in p_i,\\, c \\in p_c} x_p = 1 & \\forall & i \\in I,\\, c \\in C_i \\\\\n",
    "                & x_m \\le x_n & \\forall & \\left(m, n\\right) \\in R \\\\\n",
    "                & x_p \\in \\{0,1\\} & \\\\\n",
    "\\end{array}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "All we have to do is generate $P$ and $R$!\n",
    "\n",
    "Right?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Finding maximal cliques with NetworkX\n",
    "\n",
    "At some point, somebody is going to write some code."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "So how do we generate $P$ and $R$?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Naive option:\n",
    "\n",
    "* Make a graph with all images and commands\n",
    "* Connect each image to its commands\n",
    "* Connect images to each other\n",
    "* Connect commands to each other\n",
    "* Find maximal cliques!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "**Q:** What's a maximal clique?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "**A:** A subgraph where each pair of nodes is adjacent. If it can't be contained in another clique, it's maximal.\n",
    "\n",
    "![](images/clique.png)\n",
    "_A clique. Source: Wikipedia_"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Creating a graph and connect every image to its commands.\n",
    "\n",
    "![](images/graph_1.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Now connect every pair of image, and every pair of commands.\n",
    "\n",
    "![](images/graph_2.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {
    "collapsed": true,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "# Define the commands we have available.\n",
    "commands = ['A','B','C','D','E']\n",
    "\n",
    "# And the images we need to construct from them.\n",
    "images = [3,4,5,6]\n",
    "image_commands = {\n",
    "    3: ['A','B','C'],\n",
    "    4: ['B','C','D'],\n",
    "    5: ['A','C','E'],\n",
    "    6: ['B','D','E'],\n",
    "}\n",
    "\n",
    "# Make up some times we expect the commands to take.\n",
    "command_times = {\n",
    "    'A':  5,\n",
    "    'B': 10,\n",
    "    'C':  7,\n",
    "    'D': 12,\n",
    "    'E':  8,\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['cD', 'i3', 'cA', 'i5', 'i6', 'cE', 'i4', 'cC', 'cB']"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Construct the graph shown above.\n",
    "# Prefix images with 'i' and commands with 'c'.\n",
    "\n",
    "from itertools import combinations\n",
    "import networkx as nx\n",
    "\n",
    "def build_graph(image_dict):\n",
    "    g = nx.Graph()\n",
    "    \n",
    "    all_cmds = set()\n",
    "    for i, cmds in image_dict.items():\n",
    "        for c in cmds:\n",
    "            g.add_edge('i%d' % i, 'c%s' % c)\n",
    "            all_cmds.add(c)\n",
    "            \n",
    "    for i1, i2 in combinations(image_dict, 2):\n",
    "        g.add_edge('i%d' % i1, 'i%d' % i2)\n",
    "\n",
    "    for c1, c2 in combinations(all_cmds, 2):\n",
    "        g.add_edge('c%s' % c1, 'c%s' % c2)\n",
    "\n",
    "    return g\n",
    "    \n",
    "build_graph(image_commands).nodes()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[({4, 6}, {'B', 'D'}),\n",
       " ({3, 4, 6}, {'B'}),\n",
       " ({5, 6}, {'E'}),\n",
       " ({3, 4, 5}, {'C'}),\n",
       " ({3, 4}, {'B', 'C'}),\n",
       " ({3, 5}, {'A', 'C'})]"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Generate all maximal cliques with > 1 image and >= 1 command.\n",
    "\n",
    "from networkx.algorithms import find_cliques\n",
    "\n",
    "def command_cliques(image_dict):\n",
    "    max_cliques = []\n",
    "    \n",
    "    g = build_graph(image_dict)\n",
    "    for c in find_cliques(g):\n",
    "        imgs = set([int(x.replace('i','')) for x in c if x.startswith('i')])\n",
    "        cmds = set([x.replace('c','') for x in c if x.startswith('c')])\n",
    "        if len(imgs) > 1 and cmds:\n",
    "            max_cliques.append((imgs, cmds))\n",
    "    \n",
    "    return max_cliques\n",
    "        \n",
    "# Top-level cliques (these depend on nothing)\n",
    "top_cliques = command_cliques(image_commands)\n",
    "top_cliques"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Add each of these to $P$.\n",
    "\n",
    "Now consider each maximal clique. Let's look at $\\left(\\{3,4,5\\},\\{C\\}\\right)$.\n",
    "\n",
    "$x_{345,c} = 1$ if images $\\{3,4,5\\}$ share command $C$. \n",
    "\n",
    "Can they share anything after that?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Repeat the process with remaining commands for images $\\{3,4,5\\}$. Leave out $C$.\n",
    "\n",
    "![](images/graph_3.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "![](images/graph_4.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Max cliques:\n",
    "\n",
    "$$\\{3,4,B\\}$$\n",
    "$$\\{3,5,A\\}$$\n",
    "\n",
    "Add variables to $P$:\n",
    "\n",
    "$$x_{345,c,34,b} \\in \\{0,1\\}$$\n",
    "$$x_{345,c,35,a} \\in \\{0,1\\}$$\n",
    "\n",
    "Both depend on \"parent\" clique:\n",
    "\n",
    "$$x_{345,c,34,b} \\le x_{345,c}$$\n",
    "$$x_{345,c,35,a} \\le x_{345,c}$$\n",
    "\n",
    "So add dependencies to $R$:\n",
    "\n",
    "$$\\left(x_{345,c,34,b}, x_{345,c}\\right)$$\n",
    "$$\\left(x_{345,c,35,a}, x_{345,c}\\right)$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Maximal clique detection is NP-complete.\n",
    "\n",
    "But it works _better than you might think!_ Why?\n",
    "\n",
    "&nbsp;  \n",
    "&nbsp;    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "$${\\Huge\\text{THEORY} \\ne \\text{PRACTICE}}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "&nbsp;  \n",
    "&nbsp;    \n",
    "    \n",
    "There are more efficient ways, but that's for another day."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Partitioning sets with PuLP\n",
    "\n",
    "\"NP-Complete\" is a term used to scare graduate students."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### We have a \"set  partitioning problem\" with side constraints.\n",
    "\n",
    "Let's ignore dependencies. Say $R = \\emptyset$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "A optimal \"schedule\" is a set of $x_p$ variables we set to $1$ that:\n",
    "\n",
    "* Assigns each image all its commands **once**\n",
    "* Minimizes required compute time"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "Mathematically, set partitioning looks like:\n",
    "\n",
    "$$\n",
    "\\begin{array}{r l}\n",
    "    \\min|\\max    & c^\\intercal x \\\\  \n",
    "    \\text{s.t.} & Ax = 1 \\\\\n",
    "                & x \\in \\{0,1\\}^n \\\\\n",
    "\\end{array}\n",
    "$$\n",
    "\n",
    "Where $c^\\intercal \\in \\mathcal{R}^n$ and $A \\in \\{0,1\\}^{m \\times n}$\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Technically, set partitioning is NP-complete.\n",
    "\n",
    "But again...\n",
    "\n",
    "&nbsp;  \n",
    "&nbsp;  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "$${\\huge\\text{THEORY} \\ne \\text{PRACTICE}}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "&nbsp;  \n",
    "&nbsp;  \n",
    "\n",
    "Real world data is usually **NOT** pathological.\n",
    "\n",
    "We have _really_ good technology for solving set partitions.\n",
    "\n",
    "* Heuristics\n",
    "* Optimization (combinatorial, dynamic)\n",
    "\n",
    "So even with the dependency constraints, we're probably OK."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "# Let's build a model!\n",
    "\n",
    "import pulp\n",
    "\n",
    "m = pulp.LpProblem('Docker schedule', pulp.LpMinimize)\n",
    "x = {}    # binary variables\n",
    "obj = 0.0 # objective = minimize required time\n",
    "\n",
    "# These variables are on when images do not share the results of commands.\n",
    "for i, cmds in image_commands.items():\n",
    "    for c in cmds:\n",
    "        n = '%d-%s' % (i,c)\n",
    "        x[n] = pulp.LpVariable(name=n, cat=pulp.LpBinary)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {
    "collapsed": true,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "# Create variables for cliques. (Normally we'd do this automatically.)\n",
    "x['46-BD'] = pulp.LpVariable(name='46-BD', cat=pulp.LpBinary)\n",
    "x['346-B'] = pulp.LpVariable(name='346-B', cat=pulp.LpBinary)\n",
    "x['56-E' ] = pulp.LpVariable(name='56-E',  cat=pulp.LpBinary)\n",
    "x['345-C'] = pulp.LpVariable(name='345-C', cat=pulp.LpBinary)\n",
    "x['34-BC'] = pulp.LpVariable(name='34-BC', cat=pulp.LpBinary)\n",
    "x['35-AC'] = pulp.LpVariable(name='35-AC', cat=pulp.LpBinary)\n",
    "\n",
    "# These are child cliques we generate using subgraphs.\n",
    "x['346-B-34-C'] = pulp.LpVariable(name='346-B-34-C', cat=pulp.LpBinary)\n",
    "x['346-B-46-D'] = pulp.LpVariable(name='346-B-46-D', cat=pulp.LpBinary)\n",
    "x['345-C-34-B'] = pulp.LpVariable(name='345-C-34-B', cat=pulp.LpBinary)\n",
    "x['345-C-35-A'] = pulp.LpVariable(name='345-C-35-A', cat=pulp.LpBinary)\n",
    "\n",
    "# Dependencies from parent to child cliques.\n",
    "m += x['346-B-34-C'] <= x['346-B']\n",
    "m += x['346-B-46-D'] <= x['346-B']\n",
    "m += x['345-C-34-B'] <= x['345-C']\n",
    "m += x['345-C-35-A'] <= x['345-C']\n",
    "\n",
    "# Restrict intersecting cliques at the same level.\n",
    "m += x['346-B'] + x['345-C'] + x['34-BC'] + x['35-AC'] <= 1 # image 3\n",
    "m += x['46-BD'] + x['346-B'] + x['345-C'] + x['34-BC'] <= 1 # image 4\n",
    "m += x['56-E' ] + x['345-C'] + x['35-AC'] <= 1              # image 5\n",
    "m += x['46-BD'] + x['346-B'] + x['56-E' ] <= 1              # image 6\n",
    "m += x['346-B-34-C'] + x['346-B-46-D'] <= 1                 # image 4\n",
    "m += x['345-C-34-B'] + x['345-C-35-A'] <= 1                 # image 3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time required: 67.0\n",
      "Schedule:\n",
      "\t 35-AC\n",
      "\t 46-BD\n",
      "\t 6-E\n",
      "\t 3-B\n",
      "\t 5-E\n",
      "\t 4-C\n"
     ]
    }
   ],
   "source": [
    "# Each image runs each command exactly once.\n",
    "for i, cmds in image_commands.items():\n",
    "    for c in cmds:\n",
    "        m += sum(x[n] for n in x if str(i) in n and c in n) == 1\n",
    "\n",
    "# Objective: minimize the required compute time.\n",
    "obj = 0.0\n",
    "for n, x_p in x.items():\n",
    "    t = sum(command_times.get(c,0) for c in n) # a little hacky...\n",
    "    obj += t * x_p\n",
    "\n",
    "# Use a variable so it's easy to pull out the objective value.\n",
    "z = pulp.LpVariable('objective', lowBound=0)\n",
    "m += z == obj\n",
    "m.setObjective(z)\n",
    "\n",
    "assert m.solve() == pulp.LpStatusOptimal\n",
    "print('Time required:', pulp.value(z))\n",
    "print('Schedule:')\n",
    "for n, x_p in x.items():\n",
    "    if pulp.value(x_p) > 0.5:\n",
    "        print('\\t', n)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "To translate that output into a real schedule, first add any cliques to the schedule:\n",
    "\n",
    "```\n",
    "    I3:  A C\n",
    "    I4:  B D\n",
    "    I5:  A C\n",
    "    I6:  B D\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "The remaining commands go at the end:\n",
    "\n",
    "```\n",
    "    I3:  A C B\n",
    "    I4:  B D C\n",
    "    I5:  A C E\n",
    "    I6:  B D E\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Computational example\n",
    "\n",
    "No, really. That's how we do this thing."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### An example with 15 images and 25 commands.\n",
    "\n",
    "A randomly generated instance. These are the images and their commands.\n",
    "\n",
    "```json\n",
    "{\n",
    "    \"images\": {\n",
    "        \"01\": [\"03\", \"05\", \"07\", \"08\", \"09\", \"11\", \"14\", \"15\", \"18\", \"19\", \"24\"],\n",
    "        \"02\": [\"02\", \"05\", \"07\", \"10\", \"15\", \"20\", \"22\", \"23\", \"25\"],\n",
    "        \"03\": [\"08\", \"12\", \"14\", \"15\", \"16\", \"17\"],\n",
    "        \"04\": [\"03\", \"05\", \"13\", \"14\", \"17\"],\n",
    "        \"05\": [\"03\", \"05\", \"09\", \"15\", \"17\", \"18\", \"20\", \"21\", \"22\", \"23\", \"24\", \"25\"],\n",
    "        \"06\": [\"04\", \"08\", \"12\", \"13\", \"16\", \"17\"],\n",
    "        \"07\": [\"01\", \"03\", \"04\", \"05\", \"07\", \"08\", \"12\", \"15\", \"19\", \"24\", \"25\"],\n",
    "        \"08\": [\"11\", \"13\", \"15\", \"16\"],\n",
    "        \"09\": [\"01\", \"03\", \"11\", \"13\", \"17\", \"22\", \"24\"],\n",
    "        \"10\": [\"08\", \"12\", \"13\", \"14\", \"15\", \"19\", \"24\", \"25\"],\n",
    "        \"11\": [\"04\"],\n",
    "        \"12\": [\"07\", \"12\", \"16\", \"19\", \"22\", \"25\"],\n",
    "        \"13\": [\"01\", \"06\", \"07\", \"16\", \"18\", \"19\", \"21\", \"25\"],\n",
    "        \"14\": [\"05\", \"10\", \"11\", \"13\", \"14\", \"22\"],\n",
    "        \"15\": [\"01\", \"02\", \"03\", \"04\", \"11\", \"15\"]\n",
    "    },\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "And the time required for each command.\n",
    "\n",
    "```json\n",
    "    \"commands\": {\n",
    "        \"01\": 58,\n",
    "        \"02\": 98,\n",
    "        \"03\": 68,\n",
    "        \"04\": 152,\n",
    "        \"05\": 60,\n",
    "        \"06\": 79,\n",
    "        \"07\": 463,\n",
    "        \"08\": 95,\n",
    "        \"09\": 532,\n",
    "        \"10\": 51,\n",
    "        \"11\": 128,\n",
    "        \"12\": 88,\n",
    "        \"13\": 65,\n",
    "        \"14\": 155,\n",
    "        \"15\": 97,\n",
    "        \"16\": 148,\n",
    "        \"17\": 340,\n",
    "        \"18\": 115,\n",
    "        \"19\": 109,\n",
    "        \"20\": 14,\n",
    "        \"21\": 94,\n",
    "        \"22\": 61,\n",
    "        \"23\": 149,\n",
    "        \"24\": 43,\n",
    "        \"25\": 31\n",
    "    }\n",
    "}\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "And the optimal image construction schedule!\n",
    "\n",
    "```json\n",
    "{\n",
    "    \"unique_images\": 73,\n",
    "    \"compute_time\": 9414,\n",
    "    \"schedule\": {\n",
    "        \"01\": [\"15\", \"08\", \"24\", \"19\", \"05\", \"07\", \"03\", \"11\", \"14\", \"09\", \"18\"],\n",
    "        \"02\": [\"15\", \"05\", \"20\", \"23\", \"22\", \"25\", \"07\", \"10\", \"02\"],\n",
    "        \"03\": [\"15\", \"08\", \"14\", \"12\", \"16\", \"17\"],\n",
    "        \"04\": [\"13\", \"17\", \"03\", \"05\", \"14\"],\n",
    "        \"05\": [\"15\", \"05\", \"20\", \"23\", \"22\", \"25\", \"18\", \"24\", \"17\", \"09\", \"03\", \"21\"],\n",
    "        \"06\": [\"13\", \"17\", \"16\", \"08\", \"04\", \"12\"],\n",
    "        \"07\": [\"15\", \"08\", \"24\", \"19\", \"05\", \"07\", \"03\", \"12\", \"04\", \"25\", \"01\"],\n",
    "        \"08\": [\"15\", \"11\", \"16\", \"13\"],\n",
    "        \"09\": [\"13\", \"17\", \"03\", \"01\", \"22\", \"11\", \"24\"],\n",
    "        \"10\": [\"15\", \"08\", \"24\", \"19\", \"12\", \"14\", \"25\", \"13\"],\n",
    "        \"11\": [\"04\"],\n",
    "        \"12\": [\"19\", \"07\", \"25\", \"16\", \"12\", \"22\"],\n",
    "        \"13\": [\"19\", \"07\", \"25\", \"16\", \"18\", \"21\", \"01\", \"06\"],\n",
    "        \"14\": [\"13\", \"14\", \"05\", \"10\", \"11\", \"22\"],\n",
    "        \"15\": [\"15\", \"11\", \"01\", \"04\", \"02\", \"03\"]\n",
    "    }\n",
    "}\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### How does this compare to a heuristic solution?\n",
    "\n",
    "We'll call this the `most-common` heuristic. It makes the greediest choice possible at each step.\n",
    "\n",
    "* `most-common(images):`\n",
    "    + Choose the most common unscheduled command for `images`\n",
    "    + Add it to the schedule next for `images`\n",
    "    + `images2` = subset of `images` for which command was scheduled\n",
    "    + `most-common(images2)`\n",
    "    + `most-common(images \\ images2)`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "The heuristic is > 10% worse for this particular instance.\n",
    "\n",
    "```json\n",
    "{\n",
    "    \"unique_images\": 80,\n",
    "    \"compute_time\": 10531,\n",
    "    \"schedule\": {\n",
    "        \"01\": [\"15\", \"03\", \"24\", \"05\", \"11\", \"07\", \"19\", \"18\", \"08\", \"09\", \"14\"],\n",
    "        \"02\": [\"15\", \"02\", \"10\", \"25\", \"05\", \"20\", \"07\", \"22\", \"23\"],\n",
    "        \"03\": [\"15\", \"08\", \"12\", \"14\", \"17\", \"16\"],\n",
    "        \"04\": [\"13\", \"17\", \"03\", \"14\", \"05\"],\n",
    "        \"05\": [\"15\", \"03\", \"24\", \"05\", \"25\", \"20\", \"21\", \"18\", \"09\", \"17\", \"22\", \"23\"],\n",
    "        \"06\": [\"13\", \"17\", \"08\", \"12\", \"04\", \"16\"],\n",
    "        \"07\": [\"15\", \"03\", \"24\", \"05\", \"25\", \"12\", \"19\", \"08\", \"01\", \"07\", \"04\"],\n",
    "        \"08\": [\"15\", \"13\", \"11\", \"16\"],\n",
    "        \"09\": [\"13\", \"17\", \"03\", \"11\", \"24\", \"01\", \"22\"],\n",
    "        \"10\": [\"15\", \"13\", \"24\", \"19\", \"25\", \"08\", \"12\", \"14\"],\n",
    "        \"11\": [\"04\"],\n",
    "        \"12\": [\"25\", \"07\", \"16\", \"19\", \"12\", \"22\"],\n",
    "        \"13\": [\"25\", \"07\", \"16\", \"19\", \"18\", \"01\", \"06\", \"21\"],\n",
    "        \"14\": [\"13\", \"11\", \"10\", \"14\", \"22\", \"05\"],\n",
    "        \"15\": [\"15\", \"03\", \"02\", \"11\", \"01\", \"04\"]\n",
    "    }\n",
    "}\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "$${\\Huge\\text{?}}$$"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.4.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}