{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "> This is one of the 100 recipes of the [IPython Cookbook](http://ipython-books.github.io/), the definitive guide to high-performance scientific computing and data science in Python.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 14.3. Resolving dependencies in a Directed Acyclic Graph with a topological sort" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You need the `python-apt` package in order to build the package dependency graph. (https://pypi.python.org/pypi/python-apt/)\n", "\n", "We also assume that this notebook is executed on a Debian system (like Ubuntu). If you don't have such a system, you can download the data *Debian* directly on the book's website. Extract it in the current directory, and start this notebook directly at step 7. (http://ipython-books.github.io)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. We import the `apt` module and we build the list of packages." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import json\n", "import apt\n", "cache = apt.Cache()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. The `graph` dictionary will contain the adjacency list of a small portion of the dependency graph." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "graph = {}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. We define a function that returns the list of dependencies of a package." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def get_dependencies(package):\n", " if package not in cache:\n", " return []\n", " pack = cache[package]\n", " ver = pack.candidate or pack.versions[0]\n", " # We flatten the list of dependencies,\n", " # and we remove the duplicates.\n", " return sorted(set([item.name \n", " for sublist in ver.dependencies \n", " for item in sublist]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "4. We now define a *recursive* function that builds the dependency graph for a particular package. This function updates the `graph` variable." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def get_dep_recursive(package):\n", " if package not in cache:\n", " return []\n", " if package not in graph:\n", " dep = get_dependencies(package)\n", " graph[package] = dep\n", " for dep in graph[package]:\n", " if dep not in graph:\n", " graph[dep] = get_dep_recursive(dep)\n", " return graph[package]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "5. Let's build the dependency graph for IPython." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "get_dep_recursive('ipython');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "6. Finally, let's save the adjacency list in JSON." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "with open('data/apt.json', 'w') as f:\n", " json.dump(graph, f, indent=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "7. We import a few packages." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import json\n", "import numpy as np\n", "import networkx as nx\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "8. Let's load the adjacency list from the JSON file." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "with open('data/apt.json', 'r') as f:\n", " graph = json.load(f)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "9. Now, we create a directed graph (`DiGraph` in NetworkX) from our adjacency list. We reverse the graph to get a more natural ordering." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "g = nx.DiGraph(graph).reverse()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "10. A topological sort only exists when the graph is a **directed acyclic graph** (DAG). This means that there is no cycle in the graph, i.e. no circular dependency here. Is our graph a DAG?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "nx.is_directed_acyclic_graph(g)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "11. What are the packages responsible for the cycles? We can find it out with the `simple_cycles` function." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "set([cycle[0] for cycle in nx.simple_cycles(g)])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "12. Here, we can try to remove these packages. In an actual package manager, these cycles need to be carefully taken into account." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "g.remove_nodes_from(_)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "nx.is_directed_acyclic_graph(g)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "13. The graph is now a DAG. Let's display it first." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "ug = g.to_undirected()\n", "deg = ug.degree()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "plt.figure(figsize=(4,4));\n", "# The size of the nodes depends on the number of dependencies.\n", "nx.draw(ug, font_size=6, \n", " node_size=[20*deg[k] for k in ug.nodes()]);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "14. Finally, we can perform the topological sort, thereby obtaining a linear installation order satisfying all dependencies." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "strip_output": [ 3, 3 ] }, "outputs": [], "source": [ "nx.topological_sort(g)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).\n", "\n", "> [IPython Cookbook](http://ipython-books.github.io/), by [Cyrille Rossant](http://cyrille.rossant.net), Packt Publishing, 2014 (500 pages)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.4.2" } }, "nbformat": 4, "nbformat_minor": 0 }