{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Optimize your Docker Infrastructure with Python\n", "\n", "## PyData NYC\n", "\n", "November 11, 2015\n", "\n", "* Ryan J. O'Neil (@ryanjoneil)\n", "* Day: Lead Engineer @ [Yhat](http://www.yhathq.com) ()\n", "* Night: PhD student in Operations Research @ GMU ()\n", "* Blog: http://ryanjoneil.github.io/blog.html\n", "* Slides: https://github.com/ryanjoneil/talks/" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Motivation\n", "\n", "Consider the DevOps Engineer." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### It's a noble profession...\n", "\n", "![](images/scotty.jpg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### ...with grave responsibilities.\n", "\n", "One of the important functions of DevOps is building environments for:\n", "\n", "* Software development\n", "* Testing & quality assurance\n", "* Running operational systems\n", "* Whatever else you might need a computing environment for" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### There's even a Venn diagram about DevOps.\n", "\n", "And we all know how much we like Venn diagrams.\n", "\n", "![](images/devops-venn.svg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### A typical DevOps function\n", "\n", "You work on _Yet Another Enterprise Java Application (TM)_. YAEJA makes \\$\\$ for _Yet Another Enterprise Software Company_!\n", "\n", "You need to upgrade components where YAEJA runs.\n", "\n", "You can't upgrade production without testing. So you need two environments from your DevOps person.\n", "\n", "One for production:\n", "\n", "  \n", "\n", "| | Command |\n", "|-----|----------------------------------------------------------------|\n", "| $A$ | Install the Java compiler and runtime environment. |\n", "| $B$ | Download a set of external dependencies. |\n", "| $C$ | Set up a an EntepriseDB (TM) schema and populate it with data. |" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "And one for testing it with the upgrade:\n", "\n", "  \n", "\n", "| | Command |\n", "|-----|----------------------------------------------------------------|\n", "| $A$ | Install the Java compiler and runtime environment. |\n", "| $B$ | Download a set of external dependencies. |\n", "| $C$ | Set up a an EntepriseDB (TM) schema and populate it with data. |\n", "| $D$ | Update the underlying system utility. |\n", "\n", "  \n", "\n", "If YAJEA integrates with 3rd party apps, you need another test environment for each one.\n", "\n", "If those interact in interesting ways, you need test environments for different combinations of them." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Looks like your DevOps person will be working all weekend\n", "\n", "![](images/saturday.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### In the old days...\n", "\n", "At a big software shop, devops may setup and tear down dozens (or _hundreds_) of system configurations for a single software project.\n", "\n", "This used to be done with physical hardware on _(sometimes)_ fresh OS installs.\n", "\n", "Most medium-to-large software shops had an air conditioned room like this.\n", "\n", "![](images/cable-spaghetti.jpg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Environmental setup was often pretty manual\n", "\n", "If you had to reproduce an environment, you had a few options:\n", "\n", "* Start with a fresh install of the OS\n", "\n", "\n", "* Setup and save the results to a CD\n", "\n", "\n", "* Hope that uninstalling and resintalling is good enough...\n", "\n", " + A _lot_ of people did this\n", " + Probably not good enough\n", " + Software leaves relics\n", " - Logs...\n", " - Data files...\n", " - Configuration..." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Nowadays...\n", "\n", "![](images/cloud-docker.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### What's a container?\n", "\n", "Docker is so hot right now." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "I've been getting two questions a lot this week:\n", "\n", "1. What's Docker?\n", "2. Should I be using it?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### A1: Docker is a lightweight virtualization platform\n", "\n", "Docker provides two main things: images & containers.\n", "\n", "Images represent the environments containers run it.\n", "\n", "A running container acts like it is:\n", " \n", "* In it own OS\n", "* On its own hardware\n", "\n", "But it doesn't load heavy stuff like a new kernel or device drivers." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "$${\\Huge\\text{DEMO TIME}}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### A2: I don't know what you should be doing\n", "\n", "But Yhat uses a ${\\large\\text{LOT}}$ of Docker.\n", "\n", "* Development (MySQL containers)\n", "* Functional tests for our software\n", "* Buiding binary packages for Linux distros\n", "* Productionizing predictive models (ScienceOps)\n", "* Environment management in collaborative tools (ScienceCluster)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "$${\\Huge\\text{DEMO TIME}}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### So containers are about isolation...\n", "\n", "Processes and system resources behave as if the are on their own computers.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "#### ...but containers are also about sharing.\n", "\n", "And this is what we care about here." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Specifically, saving and retrieving the results of a computation from the Docker image cache." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Why?\n", "\n", "  \n", "\n", "$$\\text{SMART CACHE USE} = \\text{DEVOPS TIME SAVED!}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![](images/scotty-approved.jpg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Docker cache mechanics\n", "\n", "Maybe I should call this section UnionFS mechanics, but then Docker is so hot right now." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### A Tale of Two Dockerfiles\n", "\n", "Let's say I collect old or non-standard revision control systems." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Everyone needs a hobby." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "I set up an environment with a few RCSs in it to play around with.\n", "\n", "Dockerfile: `rcs1`\n", "  \n", "```\n", "FROM ubuntu:trusty\n", "RUN apt-get install -y bzr\n", "RUN apt-get install -y cvs\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Let's build out our non-standard RCS image!\n", "\n", "```\n", "[ryan@localhost dockerfiles]$ docker build --file=rcs1 --tag=rcs1 .\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Step 0 would download the base Ubuntu image, but I've already got it.\n", "\n", "```\n", "Sending build context to Docker daemon 24.58 kB\n", "Step 0 : FROM ubuntu:trusty\n", " ---> a5a467fddcb8\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "`bzr` require a number of dependencies...\n", "\n", "```\n", "Step 1 : RUN apt-get install -y bzr\n", " ---> Running in 7679f0a1525e\n", "Reading package lists...\n", "Building dependency tree...\n", "Reading state information...\n", "The following extra packages will be installed:\n", " ca-certificates dbus gir1.2-glib-2.0 libapparmor1 libassuan0\n", " libdbus-glib-1-2 libgirepository-1.0-1 libglib2.0-0 libglib2.0-data libgmp10\n", " [...snip...]\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "As does `cvs`.\n", "\n", "```\n", "Step 2 : RUN apt-get install -y cvs\n", " ---> Running in 54229eef94d2\n", "Reading package lists...\n", "Building dependency tree...\n", "Reading state information...\n", "The following extra packages will be installed:\n", " krb5-locales libedit2 libgssapi-krb5-2 libk5crypto3 libkeyutils1 libkrb5-3\n", " libkrb5support0 libx11-6 libx11-data libxau6 libxcb1 libxdmcp6 libxext6\n", " [...snip...]\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "At the end of this I have a tagged image. I can easily use this to spin up new containers.\n", "\n", "```\n", "[ryan@localhost dockerfiles]$ docker run rcs1 bash -c \"which cvs && which bzr\"\n", "/usr/bin/cvs\n", "/usr/bin/bzr\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Oh snap.\n", "\n", "I found an old RCS repository from the 1980s I want to look at.\n", "\n", "![](images/rcs-sccs.jpg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "I don't have `rcs` installed. I guess I'll have to build a new image.\n", "\n", "Dockerfile: `rcs2`\n", "  \n", "```\n", "FROM ubuntu:trusty\n", "RUN apt-get install -y bzr\n", "RUN apt-get install -y cvs\n", "RUN apt-get install -y software-properties-common\n", "RUN add-apt-repository \"deb http://archive.ubuntu.com/ubuntu trusty universe\"\n", "RUN apt-get update\n", "RUN apt-get install -y rcs\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "```\n", "[ryan@localhost dockerfiles]$ docker build --file=rcs2 --tag=rcs2 .\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "\n", "The first few steps have already been run before and can use the Docker cache!\n", "```\n", "Step 0 : FROM ubuntu:trusty\n", " ---> a5a467fddcb8\n", "\n", "Step 1 : RUN apt-get install -y bzr\n", " ---> Using cache\n", " ---> ed8a8efd04cc\n", "\n", "Step 2 : RUN apt-get install -y cvs\n", " ---> Using cache\n", " ---> 61fe05c24a19\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "After that it's business as usual.\n", "\n", "```\n", "Step 3 : RUN apt-get install -y software-properties-common\n", " ---> Running in 4adc2bb63a73\n", "The following extra packages will be installed:\n", " iso-codes libasn1-8-heimdal libcurl3-gnutls libgssapi3-heimdal\n", " libhcrypto4-heimdal libheimbase1-heimdal libheimntlm0-heimdal\n", " [...snip...]\n", "\n", "Step 4 : RUN add-apt-repository \"deb http://archive.ubuntu.com/ubuntu trusty universe\"\n", " ---> Running in 5a023435448c\n", " ---> 02439f270099\n", "Removing intermediate container 5a023435448c\n", "\n", "Step 5 : RUN apt-get update\n", " ---> Running in 449c25ee3537\n", "Get:1 http://archive.ubuntu.com trusty-updates InRelease [64.4 kB]\n", "[...snip...]\n", "\n", "Step 6 : RUN apt-get install -y rcs\n", " ---> Running in 3574b94bbea3\n", "The following NEW packages will be installed:\n", " rcs\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Adding to my stable of random revision control.\n", "\n", "_Also, why are they all three letters long?_\n", "\n", "So now I have `rcs` in addition to `bzr` and `cvs`. Note that my `rcs2` image was built off of my `rcs1` image. I didn't have to do any extra work to recreate those shared parts of it.\n", "\n", "```\n", "[ryan@localhost dockerfiles]$ docker run rcs2 bash -c \"which bzr && which cvs && which rcs\"\n", "/usr/bin/bzr\n", "/usr/bin/cvs\n", "/usr/bin/rcs\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### What happens if I rebuild the image again?\n", "\n", "Omigosh everything is cached!\n", "\n", "```\n", "[ryan@localhost dockerfiles]$ docker build --file=rcs2 --tag=rcs2 .\n", "Sending build context to Docker daemon 3.072 kB\n", "Step 0 : FROM ubuntu:trusty\n", " ---> a5a467fddcb8\n", " \n", "Step 1 : RUN apt-get install -y bzr\n", " ---> Using cache\n", " ---> ed8a8efd04cc\n", " \n", "Step 2 : RUN apt-get install -y cvs\n", " ---> Using cache\n", " ---> 61fe05c24a19\n", " \n", "Step 3 : RUN apt-get install -y software-properties-common\n", " ---> Using cache\n", " ---> 942207de6f7f \n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "```\n", "Step 4 : RUN add-apt-repository \"deb http://archive.ubuntu.com/ubuntu trusty universe\"\n", " ---> Using cache\n", " ---> 02439f270099\n", "\n", "Step 5 : RUN apt-get update\n", " ---> Using cache\n", " ---> 220536dfc3ff\n", "\n", "Step 6 : RUN apt-get install -y rcs\n", " ---> Using cache\n", " ---> 1a8cd22cb14d\n", "Successfully built 1a8cd22cb14d\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![](images/spock-logical-awesome.jpg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### What if I change the order of that Dockerfile?\n", "\n", "I'd really rather have the commands to add `universe` at the top.\n", "\n", "Dockerfile: `rcs3`\n", "  \n", "```\n", "FROM ubuntu:trusty\n", "RUN apt-get install -y software-properties-common\n", "RUN add-apt-repository \"deb http://archive.ubuntu.com/ubuntu trusty universe\"\n", "RUN apt-get update\n", "RUN apt-get install -y bzr\n", "RUN apt-get install -y cvs\n", "RUN apt-get install -y rcs\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### What happens when I build an image with the commands reordered?\n", "\n", "Nothing except the base image is cached!\n", "\n", "```\n", "[ryan@localhost dockerfiles]$ docker build --file=rcs3 --tag=rcs3 .\n", "Sending build context to Docker daemon 4.096 kB\n", "\n", "Step 0 : FROM ubuntu:trusty\n", " ---> a5a467fddcb8\n", "\n", "Step 1 : RUN apt-get install -y software-properties-common\n", " ---> Running in 713e49f1f323\n", "Reading package lists...\n", "Building dependency tree...\n", "Reading state information...\n", "The following extra packages will be installed:\n", " ca-certificates gir1.2-glib-2.0 iso-codes krb5-locales libasn1-8-heimdal\n", " libcurl3-gnutls libdbus-glib-1-2 libgirepository-1.0-1 libglib2.0-0\n", " libglib2.0-data libgssapi-krb5-2 libgssapi3-heimdal libhcrypto4-heimdal\n", " libheimbase1-heimdal libheimntlm0-heimdal libhx509-5-heimdal libidn11\n", " libk5crypto3 libkeyutils1 libkrb5-26-heimdal libkrb5-3 libkrb5support0\n", " libldap-2.4-2 libroken18-heim\n", " [...etc...]\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### So Docker can use the cache when...\n", "\n", "It has seen the exact same commands in _the same order_.\n", "\n", "If it sees something new, it _stops using the cache_.\n", "\n", "Certain commands, like `COPY`, break the cache entirely." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Docker caching examples\n", "\n", "Building two images that share three commands:\n", "\n", "![](images/fig01.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "If we structure them as such, they share nothing:\n", "\n", "![](images/fig02.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Once images diverge, they can never merge back together\n", "\n", "This won't work:\n", "\n", "![](images/fig03.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This is what would really happen:\n", "\n", "![](images/fig04.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "This would be better:\n", "\n", "![](images/fig05.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Problem formulation\n", "\n", "Let's write this thing in LaTeX!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### What are we trying to do again?\n", "\n", "Given a set of Docker images and their commands, find the best order to run the commands and\n", "minimize required computing time.\n", "\n", "_Could also read: maximize use of the Docker cache._" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Say we know a priori the time required by each command\n", "\n", "We could also use a prior.\n", "\n", "$$C = \\{\\text{set of commmands}\\}$$\n", "$$t_c > 0 = \\text{time required by command c}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We also know the images we need to create and the commands each one requires.\n", "\n", "$$I = \\{\\text{set of images}\\}$$\n", "$$C_i = \\{\\text{commands requires for image I}\\}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### We need some decision variables\n", "\n", "We define a set $P$ of (image set, command set) pairs, starting with each image $i \\in I$ and, for each of these, every command $c \\in C_i$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In our first example we had two images, $I = \\{1, 2\\}$.\n", "\n", "The commands for the images were: \n", "\n", "$$C_1 = \\{A, B, C\\}$$\n", "$$C_2 = \\{A, B, C, D\\}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "So our set $P$ initially contains:\n", "\n", "$$P = \\{(\\{1\\},\\{A\\}), (\\{1\\},\\{B\\}), (\\{1\\},\\{C\\}), \\\\ (\\{2\\},\\{A\\}), (\\{2\\},\\{B\\}), (\\{2\\},\\{C\\}), (\\{2\\},\\{D\\})\\}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Now define $x_p \\in \\{0,1\\}$ for each $p \\in P$. \n", "\n", "If $x_p = 1$, then we run the commands in $p$ for the images in $p$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### A first, misguided stab at our image construction model\n", "\n", "Find the minimum cost schedule based on our set $P$ and the requirements of our images. \n", "\n", "$$\n", "\\begin{array}{r l}\n", " \\min & t_a x_{1,a} + t_b x_{1,b} + t_c x_{1,c} + t_a x_{2,a} + t_b x_{2,b} + t_c x_{2,c} + t_c x_{2,d} \\\\ \n", " \\text{s.t.} & x_{1,a} = 1 \\\\\n", " & x_{1,b} = 1 \\\\\n", " & x_{1,c} = 1 \\\\\n", " & x_{2,a} = 1 \\\\\n", " & x_{2,b} = 1 \\\\\n", " & x_{2,c} = 1 \\\\\n", " & x_{2,d} = 1 \\\\\n", " & x_p \\in \\{0,1\\}\\, \\forall\\, p \\in P\n", "\\end{array}\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Well that's just ridiculous." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The solution is trivially:\n", "\n", "$$x_{1,a} = x_{1,b} = x_{1,c} = x_{2,a} = x_{2,b} = x_{2,c} = x_{2,d} = 1$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The cost of implementing this plan is actually the worst we could possibly do:\n", "\n", "$$2 \\left(t_a + t_b + t_c\\right) + t_d$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In fact it looks pretty much like we did this:\n", "\n", "![](images/fig02.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### We need a variable that allows us to use the Docker cache\n", "\n", "In this problem, commands $\\{A, B, C\\}$ are shared between images $\\{1,2\\}$.\n", "\n", "So let's make a variable for that and add it to $P$.\n", "\n", "$$x_{12,abc} \\in \\{0,1\\}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Of course this changes the constraints.\n", "\n", "$$x_{1,a} + x_{12,abc} = 1$$\n", "$$x_{1,b} + x_{12,abc} = 1$$\n", "$$x_{1,c} + x_{12,abc} = 1$$\n", "$$x_{2,a} + x_{12,abc} = 1$$\n", "$$x_{b,b} + x_{12,abc} = 1$$\n", "$$x_{c,c} + x_{12,abc} = 1$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### This time, with feeling\n", "\n", "This one actually allows us to share the time required to run commands between images.\n", "\n", "$$\n", "\\begin{array}{r l}\n", " \\min & t_a x_{1,a} + t_b x_{1,b} + t_c x_{1,c} + t_a x_{2,a} + t_b x_{2,b} + t_c x_{2,c} + t_c x_{2,d} + \\left(t_a + t_b + t_c\\right) x_{12,abc}\\\\ \n", " \\text{s.t.} & x_{1,a} + x_{12,abc} = 1 \\\\\n", " & x_{1,b} + x_{12,abc} = 1 \\\\\n", " & x_{1,c} + x_{12,abc} = 1 \\\\\n", " & x_{2,a} + x_{12,abc} = 1 \\\\\n", " & x_{2,b} + x_{12,abc} = 1 \\\\\n", " & x_{2,c} + x_{12,abc} = 1 \\\\\n", " & x_{2,d} = 1 \\\\\n", " & x_p \\in \\{0,1\\}\\, \\forall\\, p \\in P\n", "\\end{array}\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "This is much better. It gives us the correct solution.\n", "\n", "$$x_{1,a} = x_{1,b} = x_{1,c} = x_{2,a} = x_{2,b} = x_{2,c} = 0$$\n", "$$x_{12,abc} = x_{2,d} = 1$$" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "slideshow": { "slide_type": "fragment" } }, "source": [ "The minimum time is:\n", "\n", "$$t_a + t_b + t_c + t_d$$" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "slideshow": { "slide_type": "fragment" } }, "source": [ "Visually this could be represented as:\n", "\n", "![](images/fig01.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### But things are not always so simple.\n", "\n", "We created the variable $x_{12,abc}$ because it was obvious. $I_1 \\bigcap I_2 = \\{A, B, C\\}$, so it was the best possible option." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "But consider this set of 4 images:\n", "\n", "$$I_3 = \\{A, B, C\\}$$\n", "$$I_4 = \\{B, C, D\\}$$\n", "$$I_5 = \\{A, C, E\\}$$\n", "$$I_6 = \\{B, D, E\\}$$\n", "\n", "In this problem instance, the values $t_a, \\dots, t_e$ are what drive our final schedule." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "If $t_c$ is big enough, the optimal schedule might be:\n", " \n", "```\n", " I3: C B A\n", " I4: C B D \n", " I5: C A E\n", " I6: B D E\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "And if $t_a > t_b$ it might be:\n", " \n", "```\n", " I3: C A B\n", " I4: C B D \n", " I5: C A E\n", " I6: B D E\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "So if we set $x_{345,c} = 1$, we have the option to use two other variables to create our schedule:\n", "\n", "$$x_{345,c,34,b} \\in \\{0,1\\}$$\n", "$$x_{345,c,35,a} \\in \\{0,1\\}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "But this can only happen if $x_{345,c} = 1$. That is, $x_{345,c,34,b}$ and $x_{345,c,35,a}$ are **dependent** on it.\n", "\n", "$$x_{345,c,34,b} \\le x_{345,c}$$\n", "$$x_{345,c,35,a} \\le x_{345,c}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We will represent these dependencies using another set, $R$.\n", "\n", "For all $\\left(x_m, x_n\\right) \\in R$, any feasible solution requires that $x_m \\le x_n$.\n", "\n", "Note that dependencies have structure. We may have:\n", "\n", "$$x_m \\le x_n$$\n", "$$x_n \\le x_o$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### We also have to worry about intersections...\n", "\n", "Recall our set of 4 images:\n", "\n", "$$I_3 = \\{A, B, C\\}$$\n", "$$I_4 = \\{B, C, D\\}$$\n", "$$I_5 = \\{A, C, E\\}$$\n", "$$I_6 = \\{B, D, E\\}$$\n", "\n", "In addition to the variables $x_{3,a}$, $x_{3,b}$, $\\dots$, $x_{6,e}$, we create $x_{345,c}$, $x_{34,bc}$, $\\dots$, $x_{56,e}$ and add the appropriate $\\left(\\text{images}, \\text{commands}\\right)$ pairs to $P$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "For each image $i \\in I$, and command $c \\in C_i$, exactly one $x_p$ that provides $c$ for $i$ must be set to $1$. This restricts **intersections** in our schedule.\n", "\n", "$$\\sum_{p = \\left(p_i,p_c\\right) \\in P,\\, i \\in p_i,\\, c \\in p_c} x_p = 1\\, \\forall\\, i \\in I,\\, c \\in C_i$$ " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Now we have everything we need to build our model!\n", "\n", "$$\n", "\\begin{array}{r l l l}\n", " \\min & \\sum_{p \\in I} \\sum_{c \\in C_i} t_c x_p & \\\\\n", " \\text{s.t.} & \\sum_{p = \\left(p_i,p_c\\right) \\in P,\\, i \\in p_i,\\, c \\in p_c} x_p = 1 & \\forall & i \\in I,\\, c \\in C_i \\\\\n", " & x_m \\le x_n & \\forall & \\left(m, n\\right) \\in R \\\\\n", " & x_p \\in \\{0,1\\} & \\\\\n", "\\end{array}\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "All we have to do is generate $P$ and $R$!\n", "\n", "Right?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Finding maximal cliques with NetworkX\n", "\n", "At some point, somebody is going to write some code." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "So how do we generate $P$ and $R$?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Naive option:\n", "\n", "* Make a graph with all images and commands\n", "* Connect each image to its commands\n", "* Connect images to each other\n", "* Connect commands to each other\n", "* Find maximal cliques!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "**Q:** What's a maximal clique?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "**A:** A subgraph where each pair of nodes is adjacent. If it can't be contained in another clique, it's maximal.\n", "\n", "![](images/clique.png)\n", "_A clique. Source: Wikipedia_" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Creating a graph and connect every image to its commands.\n", "\n", "![](images/graph_1.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Now connect every pair of image, and every pair of commands.\n", "\n", "![](images/graph_2.png)" ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# Define the commands we have available.\n", "commands = ['A','B','C','D','E']\n", "\n", "# And the images we need to construct from them.\n", "images = [3,4,5,6]\n", "image_commands = {\n", " 3: ['A','B','C'],\n", " 4: ['B','C','D'],\n", " 5: ['A','C','E'],\n", " 6: ['B','D','E'],\n", "}\n", "\n", "# Make up some times we expect the commands to take.\n", "command_times = {\n", " 'A': 5,\n", " 'B': 10,\n", " 'C': 7,\n", " 'D': 12,\n", " 'E': 8,\n", "}" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "['cD', 'i3', 'cA', 'i5', 'i6', 'cE', 'i4', 'cC', 'cB']" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Construct the graph shown above.\n", "# Prefix images with 'i' and commands with 'c'.\n", "\n", "from itertools import combinations\n", "import networkx as nx\n", "\n", "def build_graph(image_dict):\n", " g = nx.Graph()\n", " \n", " all_cmds = set()\n", " for i, cmds in image_dict.items():\n", " for c in cmds:\n", " g.add_edge('i%d' % i, 'c%s' % c)\n", " all_cmds.add(c)\n", " \n", " for i1, i2 in combinations(image_dict, 2):\n", " g.add_edge('i%d' % i1, 'i%d' % i2)\n", "\n", " for c1, c2 in combinations(all_cmds, 2):\n", " g.add_edge('c%s' % c1, 'c%s' % c2)\n", "\n", " return g\n", " \n", "build_graph(image_commands).nodes()" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "[({4, 6}, {'B', 'D'}),\n", " ({3, 4, 6}, {'B'}),\n", " ({5, 6}, {'E'}),\n", " ({3, 4, 5}, {'C'}),\n", " ({3, 4}, {'B', 'C'}),\n", " ({3, 5}, {'A', 'C'})]" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Generate all maximal cliques with > 1 image and >= 1 command.\n", "\n", "from networkx.algorithms import find_cliques\n", "\n", "def command_cliques(image_dict):\n", " max_cliques = []\n", " \n", " g = build_graph(image_dict)\n", " for c in find_cliques(g):\n", " imgs = set([int(x.replace('i','')) for x in c if x.startswith('i')])\n", " cmds = set([x.replace('c','') for x in c if x.startswith('c')])\n", " if len(imgs) > 1 and cmds:\n", " max_cliques.append((imgs, cmds))\n", " \n", " return max_cliques\n", " \n", "# Top-level cliques (these depend on nothing)\n", "top_cliques = command_cliques(image_commands)\n", "top_cliques" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Add each of these to $P$.\n", "\n", "Now consider each maximal clique. Let's look at $\\left(\\{3,4,5\\},\\{C\\}\\right)$.\n", "\n", "$x_{345,c} = 1$ if images $\\{3,4,5\\}$ share command $C$. \n", "\n", "Can they share anything after that?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Repeat the process with remaining commands for images $\\{3,4,5\\}$. Leave out $C$.\n", "\n", "![](images/graph_3.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "![](images/graph_4.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Max cliques:\n", "\n", "$$\\{3,4,B\\}$$\n", "$$\\{3,5,A\\}$$\n", "\n", "Add variables to $P$:\n", "\n", "$$x_{345,c,34,b} \\in \\{0,1\\}$$\n", "$$x_{345,c,35,a} \\in \\{0,1\\}$$\n", "\n", "Both depend on \"parent\" clique:\n", "\n", "$$x_{345,c,34,b} \\le x_{345,c}$$\n", "$$x_{345,c,35,a} \\le x_{345,c}$$\n", "\n", "So add dependencies to $R$:\n", "\n", "$$\\left(x_{345,c,34,b}, x_{345,c}\\right)$$\n", "$$\\left(x_{345,c,35,a}, x_{345,c}\\right)$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Maximal clique detection is NP-complete.\n", "\n", "But it works _better than you might think!_ Why?\n", "\n", "  \n", "  " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "$${\\Huge\\text{THEORY} \\ne \\text{PRACTICE}}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "  \n", "  \n", " \n", "There are more efficient ways, but that's for another day." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Partitioning sets with PuLP\n", "\n", "\"NP-Complete\" is a term used to scare graduate students." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### We have a \"set partitioning problem\" with side constraints.\n", "\n", "Let's ignore dependencies. Say $R = \\emptyset$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "A optimal \"schedule\" is a set of $x_p$ variables we set to $1$ that:\n", "\n", "* Assigns each image all its commands **once**\n", "* Minimizes required compute time" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Mathematically, set partitioning looks like:\n", "\n", "$$\n", "\\begin{array}{r l}\n", " \\min|\\max & c^\\intercal x \\\\ \n", " \\text{s.t.} & Ax = 1 \\\\\n", " & x \\in \\{0,1\\}^n \\\\\n", "\\end{array}\n", "$$\n", "\n", "Where $c^\\intercal \\in \\mathcal{R}^n$ and $A \\in \\{0,1\\}^{m \\times n}$\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Technically, set partitioning is NP-complete.\n", "\n", "But again...\n", "\n", "  \n", "  " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "$${\\huge\\text{THEORY} \\ne \\text{PRACTICE}}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "  \n", "  \n", "\n", "Real world data is usually **NOT** pathological.\n", "\n", "We have _really_ good technology for solving set partitions.\n", "\n", "* Heuristics\n", "* Optimization (combinatorial, dynamic)\n", "\n", "So even with the dependency constraints, we're probably OK." ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# Let's build a model!\n", "\n", "import pulp\n", "\n", "m = pulp.LpProblem('Docker schedule', pulp.LpMinimize)\n", "x = {} # binary variables\n", "obj = 0.0 # objective = minimize required time\n", "\n", "# These variables are on when images do not share the results of commands.\n", "for i, cmds in image_commands.items():\n", " for c in cmds:\n", " n = '%d-%s' % (i,c)\n", " x[n] = pulp.LpVariable(name=n, cat=pulp.LpBinary)" ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "collapsed": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# Create variables for cliques. (Normally we'd do this automatically.)\n", "x['46-BD'] = pulp.LpVariable(name='46-BD', cat=pulp.LpBinary)\n", "x['346-B'] = pulp.LpVariable(name='346-B', cat=pulp.LpBinary)\n", "x['56-E' ] = pulp.LpVariable(name='56-E', cat=pulp.LpBinary)\n", "x['345-C'] = pulp.LpVariable(name='345-C', cat=pulp.LpBinary)\n", "x['34-BC'] = pulp.LpVariable(name='34-BC', cat=pulp.LpBinary)\n", "x['35-AC'] = pulp.LpVariable(name='35-AC', cat=pulp.LpBinary)\n", "\n", "# These are child cliques we generate using subgraphs.\n", "x['346-B-34-C'] = pulp.LpVariable(name='346-B-34-C', cat=pulp.LpBinary)\n", "x['346-B-46-D'] = pulp.LpVariable(name='346-B-46-D', cat=pulp.LpBinary)\n", "x['345-C-34-B'] = pulp.LpVariable(name='345-C-34-B', cat=pulp.LpBinary)\n", "x['345-C-35-A'] = pulp.LpVariable(name='345-C-35-A', cat=pulp.LpBinary)\n", "\n", "# Dependencies from parent to child cliques.\n", "m += x['346-B-34-C'] <= x['346-B']\n", "m += x['346-B-46-D'] <= x['346-B']\n", "m += x['345-C-34-B'] <= x['345-C']\n", "m += x['345-C-35-A'] <= x['345-C']\n", "\n", "# Restrict intersecting cliques at the same level.\n", "m += x['346-B'] + x['345-C'] + x['34-BC'] + x['35-AC'] <= 1 # image 3\n", "m += x['46-BD'] + x['346-B'] + x['345-C'] + x['34-BC'] <= 1 # image 4\n", "m += x['56-E' ] + x['345-C'] + x['35-AC'] <= 1 # image 5\n", "m += x['46-BD'] + x['346-B'] + x['56-E' ] <= 1 # image 6\n", "m += x['346-B-34-C'] + x['346-B-46-D'] <= 1 # image 4\n", "m += x['345-C-34-B'] + x['345-C-35-A'] <= 1 # image 3" ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "collapsed": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Time required: 67.0\n", "Schedule:\n", "\t 35-AC\n", "\t 46-BD\n", "\t 6-E\n", "\t 3-B\n", "\t 5-E\n", "\t 4-C\n" ] } ], "source": [ "# Each image runs each command exactly once.\n", "for i, cmds in image_commands.items():\n", " for c in cmds:\n", " m += sum(x[n] for n in x if str(i) in n and c in n) == 1\n", "\n", "# Objective: minimize the required compute time.\n", "obj = 0.0\n", "for n, x_p in x.items():\n", " t = sum(command_times.get(c,0) for c in n) # a little hacky...\n", " obj += t * x_p\n", "\n", "# Use a variable so it's easy to pull out the objective value.\n", "z = pulp.LpVariable('objective', lowBound=0)\n", "m += z == obj\n", "m.setObjective(z)\n", "\n", "assert m.solve() == pulp.LpStatusOptimal\n", "print('Time required:', pulp.value(z))\n", "print('Schedule:')\n", "for n, x_p in x.items():\n", " if pulp.value(x_p) > 0.5:\n", " print('\\t', n)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "To translate that output into a real schedule, first add any cliques to the schedule:\n", "\n", "```\n", " I3: A C\n", " I4: B D\n", " I5: A C\n", " I6: B D\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The remaining commands go at the end:\n", "\n", "```\n", " I3: A C B\n", " I4: B D C\n", " I5: A C E\n", " I6: B D E\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Computational example\n", "\n", "No, really. That's how we do this thing." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### An example with 15 images and 25 commands.\n", "\n", "A randomly generated instance. These are the images and their commands.\n", "\n", "```json\n", "{\n", " \"images\": {\n", " \"01\": [\"03\", \"05\", \"07\", \"08\", \"09\", \"11\", \"14\", \"15\", \"18\", \"19\", \"24\"],\n", " \"02\": [\"02\", \"05\", \"07\", \"10\", \"15\", \"20\", \"22\", \"23\", \"25\"],\n", " \"03\": [\"08\", \"12\", \"14\", \"15\", \"16\", \"17\"],\n", " \"04\": [\"03\", \"05\", \"13\", \"14\", \"17\"],\n", " \"05\": [\"03\", \"05\", \"09\", \"15\", \"17\", \"18\", \"20\", \"21\", \"22\", \"23\", \"24\", \"25\"],\n", " \"06\": [\"04\", \"08\", \"12\", \"13\", \"16\", \"17\"],\n", " \"07\": [\"01\", \"03\", \"04\", \"05\", \"07\", \"08\", \"12\", \"15\", \"19\", \"24\", \"25\"],\n", " \"08\": [\"11\", \"13\", \"15\", \"16\"],\n", " \"09\": [\"01\", \"03\", \"11\", \"13\", \"17\", \"22\", \"24\"],\n", " \"10\": [\"08\", \"12\", \"13\", \"14\", \"15\", \"19\", \"24\", \"25\"],\n", " \"11\": [\"04\"],\n", " \"12\": [\"07\", \"12\", \"16\", \"19\", \"22\", \"25\"],\n", " \"13\": [\"01\", \"06\", \"07\", \"16\", \"18\", \"19\", \"21\", \"25\"],\n", " \"14\": [\"05\", \"10\", \"11\", \"13\", \"14\", \"22\"],\n", " \"15\": [\"01\", \"02\", \"03\", \"04\", \"11\", \"15\"]\n", " },\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "And the time required for each command.\n", "\n", "```json\n", " \"commands\": {\n", " \"01\": 58,\n", " \"02\": 98,\n", " \"03\": 68,\n", " \"04\": 152,\n", " \"05\": 60,\n", " \"06\": 79,\n", " \"07\": 463,\n", " \"08\": 95,\n", " \"09\": 532,\n", " \"10\": 51,\n", " \"11\": 128,\n", " \"12\": 88,\n", " \"13\": 65,\n", " \"14\": 155,\n", " \"15\": 97,\n", " \"16\": 148,\n", " \"17\": 340,\n", " \"18\": 115,\n", " \"19\": 109,\n", " \"20\": 14,\n", " \"21\": 94,\n", " \"22\": 61,\n", " \"23\": 149,\n", " \"24\": 43,\n", " \"25\": 31\n", " }\n", "}\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "And the optimal image construction schedule!\n", "\n", "```json\n", "{\n", " \"unique_images\": 73,\n", " \"compute_time\": 9414,\n", " \"schedule\": {\n", " \"01\": [\"15\", \"08\", \"24\", \"19\", \"05\", \"07\", \"03\", \"11\", \"14\", \"09\", \"18\"],\n", " \"02\": [\"15\", \"05\", \"20\", \"23\", \"22\", \"25\", \"07\", \"10\", \"02\"],\n", " \"03\": [\"15\", \"08\", \"14\", \"12\", \"16\", \"17\"],\n", " \"04\": [\"13\", \"17\", \"03\", \"05\", \"14\"],\n", " \"05\": [\"15\", \"05\", \"20\", \"23\", \"22\", \"25\", \"18\", \"24\", \"17\", \"09\", \"03\", \"21\"],\n", " \"06\": [\"13\", \"17\", \"16\", \"08\", \"04\", \"12\"],\n", " \"07\": [\"15\", \"08\", \"24\", \"19\", \"05\", \"07\", \"03\", \"12\", \"04\", \"25\", \"01\"],\n", " \"08\": [\"15\", \"11\", \"16\", \"13\"],\n", " \"09\": [\"13\", \"17\", \"03\", \"01\", \"22\", \"11\", \"24\"],\n", " \"10\": [\"15\", \"08\", \"24\", \"19\", \"12\", \"14\", \"25\", \"13\"],\n", " \"11\": [\"04\"],\n", " \"12\": [\"19\", \"07\", \"25\", \"16\", \"12\", \"22\"],\n", " \"13\": [\"19\", \"07\", \"25\", \"16\", \"18\", \"21\", \"01\", \"06\"],\n", " \"14\": [\"13\", \"14\", \"05\", \"10\", \"11\", \"22\"],\n", " \"15\": [\"15\", \"11\", \"01\", \"04\", \"02\", \"03\"]\n", " }\n", "}\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### How does this compare to a heuristic solution?\n", "\n", "We'll call this the `most-common` heuristic. It makes the greediest choice possible at each step.\n", "\n", "* `most-common(images):`\n", " + Choose the most common unscheduled command for `images`\n", " + Add it to the schedule next for `images`\n", " + `images2` = subset of `images` for which command was scheduled\n", " + `most-common(images2)`\n", " + `most-common(images \\ images2)`" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The heuristic is > 10% worse for this particular instance.\n", "\n", "```json\n", "{\n", " \"unique_images\": 80,\n", " \"compute_time\": 10531,\n", " \"schedule\": {\n", " \"01\": [\"15\", \"03\", \"24\", \"05\", \"11\", \"07\", \"19\", \"18\", \"08\", \"09\", \"14\"],\n", " \"02\": [\"15\", \"02\", \"10\", \"25\", \"05\", \"20\", \"07\", \"22\", \"23\"],\n", " \"03\": [\"15\", \"08\", \"12\", \"14\", \"17\", \"16\"],\n", " \"04\": [\"13\", \"17\", \"03\", \"14\", \"05\"],\n", " \"05\": [\"15\", \"03\", \"24\", \"05\", \"25\", \"20\", \"21\", \"18\", \"09\", \"17\", \"22\", \"23\"],\n", " \"06\": [\"13\", \"17\", \"08\", \"12\", \"04\", \"16\"],\n", " \"07\": [\"15\", \"03\", \"24\", \"05\", \"25\", \"12\", \"19\", \"08\", \"01\", \"07\", \"04\"],\n", " \"08\": [\"15\", \"13\", \"11\", \"16\"],\n", " \"09\": [\"13\", \"17\", \"03\", \"11\", \"24\", \"01\", \"22\"],\n", " \"10\": [\"15\", \"13\", \"24\", \"19\", \"25\", \"08\", \"12\", \"14\"],\n", " \"11\": [\"04\"],\n", " \"12\": [\"25\", \"07\", \"16\", \"19\", \"12\", \"22\"],\n", " \"13\": [\"25\", \"07\", \"16\", \"19\", \"18\", \"01\", \"06\", \"21\"],\n", " \"14\": [\"13\", \"11\", \"10\", \"14\", \"22\", \"05\"],\n", " \"15\": [\"15\", \"03\", \"02\", \"11\", \"01\", \"04\"]\n", " }\n", "}\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "$${\\Huge\\text{?}}$$" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.3" } }, "nbformat": 4, "nbformat_minor": 0 }