{
 "metadata": {
  "name": "",
  "signature": "sha256:2c00f0ddb7215d7c4cea19200a00bf8d1c743b2cbb0d2fadb582c83fab517cbf"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Intro\n",
      "\n",
      "Below is:\n",
      "\n",
      "- A *summary* of key points that cropped up again and again.\n",
      "- A set of *proposed readings by topic*\n",
      "\n",
      "Note that these notes haven't been edited or extended with audio recordings I've made, and some need updating with significantly more material. See the \"Proposed Readings by Topic* section or ask me for more details.\n",
      "\n",
      "## Summary\n",
      "\n",
      "- **Both the financial and academic worlds are increasingly adopting Python** for the same reasons.\n",
      "    - They're often encumbered with extremely large, heterogenous, legacy systems.\n",
      "        - Old work isn't discarded. Incremental additions, re-use over interfaces.\n",
      "        - Think COBOL/Excel/VBA for finance, FORTRAN/C in academia.\n",
      "    - They're both seeking **one** paradigm as an end-to-end high-level solution for all users.\n",
      "    - Neither can sacrifice performance, yet are finding the time-to-market and development lifecycles too long with legacy homogenous systems.\n",
      "    - Python has long had a reputation for being unable to deliver performance, but this is now commonly acknowledged to no longer be true.\n",
      "        - Python easily serves as a glue to high performance wrappers such as NumPy and SciPy, file formats such as HDF5, and heterogenous computation backends such as shared-memory parallelism (SMP) (OpenMP via Cython), GPUs (CUDA) and FPGAs (OpenCL).\n",
      "        - Achieving C/C++ performance can be done with significantly simpler code and designs, yet requires sophisticated knowledge of memory cache hierarchies, disk I/O patterns, and SMP issues.\n",
      "    - The financial industry has long relied on Python as an interface to core, high-performance components, but are paranoid and extremely closed and the only way knowledge gets shared is by stealing employees from other financial companies.\n",
      "    - More information:\n",
      "        - [\"06 - Python in the Financial Industry (KEYNOTE)\"](06%20-%20Python%20in%20the%20Financial%20Industry%20%28KEYNOTE%29.ipynb)\n",
      "        - [\"15 - Building a Cutting-Edge Data Processing Environment on a Budget\"](15%20-%20Building%20a%20Cutting-Edge%20Data%20Processing%20Environment%20on%20a%20Budget.ipynb)\n",
      "        - [\"20 - Python for High Throughput Science\"](20%20-%20Python%20for%20High%20Throughput%20Science.ipynb)\n",
      "\n",
      "- **IPython Notebook** is universally used and loved.\n",
      "    - Most technical talks used IPython Notebook either for all the content or to host demos/examples.\n",
      "    - Of those talks around half provided direct links to a pre-initialised IPython Notebook, so attendees could either download and follow in real-time, or download later.\n",
      "    - Deep and expanding integration with entire scientific Python ecosystem, e.g. `matplotlib`, `pandas`, `numpy`, `bokeh`, `sympy`, ...\n",
      "    - Good examples are:\n",
      "        - [\"04 - Visualisations Using Bokeh\"](04%20-%20Visualisations%20Using%20Bokeh.ipynb)\n",
      "        - [\"16 - presenter notes\" (for \"Generator Showcase Showdown\")](16%20-%20presenter%20notes.ipynb)\n",
      "        - [\"18 - presenter notes\" (for \"Measuring Similarity and Clustering Data\")](18%20-%20presenter%20notes.ipynb)\n",
      "\n",
      "- **No clear future for visualisations in Python**.\n",
      "    - `matplotlib` is universally used for publication-quality charts. API is difficult to use but powerful and well engineered.\n",
      "    - It's clear that web-based visualisations are the future, and very important even for publications.\n",
      "    - In order to reach the browser it's also acknowledged that JavaScript is the ideal interface, rather than static images.\n",
      "    - But how to reach browser? Many different perspectives:\n",
      "        - IPython Notebook - use `matplotlib` magic incantation to draw charts, no interactivity, no JavaScript.\n",
      "            - Other libraries build on top of `matplotlib` of course work just as well: `ggplot`, `seaborn`, `prettyplotlib`\n",
      "        - [\"04 - Visualisations in Bokeh\"](04%20-%20Visualisations%20Using%20Bokeh.ipynb): people love `ggplot` in R because of the Grammar of Graphics, and people love Python because it's a one-stop stop\n",
      "            - So use Python with a ggplot-like grammar to auto-generate HTML5-canvas backed web visualisations using JavaScript.\n",
      "            - HTML5-canvas is an investment and should reap rewards over SVG-based libraries like d3.js for very complex visualisations.\n",
      "        - [\"12 - Getting it out there - Python-JS-web-viz\"](12%20-%20Getting%20it%20out%20there%20-%20Python-JS-web-viz.ipynb): forget Python, just code front-end in JavaScript and defer back-end and data cleaning to Python.\n",
      "            - d3.js, nvd3, crossfilter, rickshaw, ...\n",
      "        - Lightning talks\n",
      "            - One presenter uses Python over a websocket bridge to Angular.js to create an RShiny-type interactive chart environment.\n",
      "            - Another presenter showed off IPython version 2 (coming end of April 2014) functionality with interactive widgets, dynamically recreating charts based on user input.\n",
      "    - There is no clear conclusion, except `matplotlib` is fantastic work and stood the test of time.\n",
      "    - Bokeh seems very exciting but rough around the edges with a large and difficult to install set of dependencies, worth exploring the tutorials in full (!!AI which I will, and post a new article).\n",
      "    \n",
      "- **Cython is almost universally used, but more agile methods are being sought**\n",
      "    - Cython is a strict superset of Python that, with annotations, allow it to reach C-like speeds.\n",
      "    - These annotations no longer allow Python compatibility. Can the Python community do better? Perhaps, with Shedskin, Pythran, or Numba. PyPy may eventually reach the Holy Grail of numpy compatibility.\n",
      "    - See:\n",
      "        - [\"10 - The High Performance Python Landscape\"](10%20-%20The%20High%20Performance%20Python%20Landscape.ipynb)\n",
      "        - [\"15 - Building a Cutting-Edge Data Processing Environment on a Budget\"](15%20-%20Building%20a%20Cutting-Edge%20Data%20Processing%20Environment%20on%20a%20Budget.ipynb)\n",
      "        - [\"03 - Faster Python Programs through Optimization\"](03%20-%20Faster%20Python%20Programs%20through%20Optimization.ipynb) (!!AI there is a significant quantity of missing information, I will type up soon).\n",
      "        - [\"11 - Shared Memory Parallelism with Python\"](11%20-%20Shared%20Memory%20Parallelism%20with%20Python.ipynb)\n",
      "\n",
      "- **Everyone uses `scikit-learn`**\n",
      "    - Well thought out, very opinionated design, strong and diverse set of core contributors.\n",
      "    - Stands out amongst Python packages as having > 10 contributors who equally make the same volume of contributions.\n",
      "    - At the very least prototype in `scikit-learn` and `nltk`.\n",
      "        - If you hit scalability issues people usually scale vertically (bigger boxes) or use Cython.\n",
      "    - See:\n",
      "        - [\"15 - Building a Cutting-Edge Data Processing Environment on a Budget\"](15%20-%20Building%20a%20Cutting-Edge%20Data%20Processing%20Environment%20on%20a%20Budget.ipynb)\n",
      "        - [\"22 - Correcting 10 years of messy CRM data\"](22%20-%20Correcting%2010%20years%20of%20messy%20CRM%20data.ipynb)\n",
      "        - [\"19 - Gradient Boosted Regression Trees in scikit-learn\"](19%20-%20Gradient%20Boosted%20Regression%20Trees%20in%20scikit-learn.ipynb)\n",
      "\n",
      "- **MapReduce/clusters have less hype and traction than you'd expect**\n",
      "    - Certainly there are some who use it for their data processing pipeline, e.g. \"07 - Hierarchical Text Clustering in Python and Hive\".\n",
      "    - Given a large data set that cannot fit onto one disk, prefer to create large RDBMS clusters. See:\n",
      "        - [\"05 - Databases for Scientists\"](05%20-%20Databases%20for%20Scientists.ipynb)\n",
      "        - [\"08 - Massively Parallel Processing with Procedural Python\"](08%20-%20Massively%20Parallel%20Processing%20with%20Procedural%20Python.ipynb)\n",
      "            - [Presenter Notes](08%20-%20presenter%20notes.ipynb)\n",
      "    - Given a large data set that cannot fit into memory prefer to use e.g. HDF5 to disk-back it, or create additional abstractions on top of NumPy/HDF5, a la \"23 - Manipulating massive disk-backed arrays\".\n",
      "    - `scikit-learn` core contributors strongly prefer shared-memory parallelism to clusters, and are actively creating OpenMP-style abstractions (with better debugging and NumPy array performance).\n",
      "        - [\"15 - Building a Cutting Edge Data Processing Environment on a Budget\"](15%20-%20Building%20a%20Cutting-Edge%20Data%20Processing%20Environment%20on%20a%20Budget.ipynb)\n",
      "    - Fantastic lightning talk on end used [FireDrake](http://firedrakeproject.org/index.html) to easily switch computing backend from SMP to GPU, but again no mention of clusters.\n",
      "\n",
      "## Proposed readings by group\n",
      "\n",
      "### Culture, industry background\n",
      "\n",
      "- [\"06 - Python in the Financial Industry (KEYNOTE)\"](06%20-%20Python%20in%20the%20Financial%20Industry%20%28KEYNOTE%29.ipynb)\n",
      "- [\"14 - Panel Discussion - Shouldn't companies be doing more data science?\"](14%20-%20Panel%20Discussion%20-%20Shouldn%27t%20companies%20be%20doing%20more%20data%20science%3F.ipynb)\n",
      "\n",
      "### Case studies\n",
      "\n",
      "- [\"07 - Hierarchical Text Clustering in Python and Hive\"](07%20-%20Hierarchical%20Text%20Clustering%20in%20Python%20and%20Hive.ipynb)\n",
      "- [\"09 - Measuring the digital economy using big data\"](09%20-%20Measuring%20the%20digital%20economy%20using%20big%20data.ipynb)\n",
      "- [\"17 - Adaptive Filtering of Tweets with Machine Learning\"](17%20-%20Adaptive%20Filtering%20of%20Tweets%20with%20Machine%20Learning.ipynb)\n",
      "- [\"20 - Python for High Throughput Science\"](20%20-%20Python%20for%20High%20Throughput%20Science.ipynb)\n",
      "- [\"22 - Correcting 10 years of messy CRM data\"](22%20-%20Correcting%2010%20years%20of%20messy%20CRM%20data.ipynb)\n",
      "\n",
      "### Technical - software engineering \n",
      "\n",
      "- [\"03 - Faster Python Programs through Optimization\"](03%20-%20Faster%20Python%20Programs%20through%20Optimization.ipynb)\n",
      "    - Needs updating with significant amount of presenter material we didn't cover.\n",
      "- [\"10 - The High Performance Python Landscape\"](10%20-%20The%20High%20Performance%20Python%20Landscape.ipynb)\n",
      "- [\"11 - Shared Memory Parallelism with Python\"](11%20-%20Shared%20Memory%20Parallelism%20with%20Python.ipynb)\n",
      "- [\"15 - Building a Cutting-Edge Data Processing Environment on a Budget\"](15%20-%20Building%20a%20Cutting-Edge%20Data%20Processing%20Environment%20on%20a%20Budget.ipynb)\n",
      "\n",
      "### Technical - mathematical\n",
      "\n",
      "- [\"02 - Introduction to Action Recognition\"](02%20-%20Introduction%20to%20Action%20Recognition.ipynb)\n",
      "- [\"13 - Python for Optimization\"](13%20-%20Python%20for%20Optimization.ipynb)\n",
      "- [\"18 - Measuring Similarity and Clustering Data\"](18%20-%20Measuring%20Similarity%20and%20Clustering%20Data.ipynb)\n",
      "    - And presenter notes: [\"18 - presenter notes\"](18%20-%20presenter%20notes.ipynb)\n",
      "- [\"19 - Gradient Boosted Regression Trees in scikit-learn\"](19%20-%20Gradient%20Boosted%20Regression%20Trees%20in%20scikit-learn.ipynb)\n",
      "\n",
      "### Technical - other\n",
      "\n",
      "- [\"01 - Interactive Financial Analytics with Python and IPython\"](01%20-%20Interactive%20Financial%20Analytics%20with%20Python%20and%20IPython.ipynb)\n",
      "    - Presenter's tutorial where I followed along with exercises are here: [\"01 - YH_PyData_Eurex_Tutorial\"](01%20-%20YH_PyData_Eurex_Tutorial.ipynb)\n",
      "- [\"05 - Databases for Scientists\"](05%20-%20Databases%20for%20Scientists.ipynb)\n",
      "    - Needs updating with presenter's material.\n",
      "- [\"08 - Massively Parallel Processing with Procedural Python\"](08%20-%20Massively%20Parallel%20Processing%20with%20Procedural%20Python.ipynb)\n",
      "    - And presenter notes: [\"08 - presenter notes\"](08%20-%20presenter%20notes.ipynb)\n",
      "- [\"16 - Generator Showcase Showdown\"](16%20-%20Generator%20Showcase%20Showdown.ipynb)\n",
      "    - And presenter notes: [\"16 - presenter notes\"](16%20-%20presenter%20notes.ipynb)\n",
      "- [\"23 - Manipulating massive disk-backed arrays\"](23%20-%20Manipulating%20massive%20disk-backed%20arrays.ipynb)\n",
      "\n",
      "### Visualisations\n",
      "\n",
      "- [\"04 - Visualisations Using Bokeh\"](04%20-%20Visualisations%20Using%20Bokeh.ipynb)\n",
      "    - I need to significantly update with the rest of their tutorial examples.\n",
      "- [\"12 - Getting it out there - Python-JS-web-viz\"](12%20-%20Getting%20it%20out%20there%20-%20Python-JS-web-viz.ipynb)\n",
      "- [\"21 - Winning Ways for Your Visualization Plays\"](21%20-%20Winning%20Ways%20for%20Your%20Visualization%20Plays.ipynb)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from IPython.core.display import HTML\n",
      "def css_styling():\n",
      "    styles = open(\"styles/custom.css\", \"r\").read()\n",
      "    return HTML(styles)\n",
      "css_styling()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<style>\n",
        "    @font-face {\n",
        "      font-family: 'Noticia Text';\n",
        "      font-style: normal;\n",
        "      font-weight: 400;\n",
        "      src: local('Noticia Text'), local('NoticiaText-Regular'), url(http://themes.googleusercontent.com/static/fonts/noticiatext/v4/wdyV6x3eKpdeUPQ7BJ5uUBa1RVmPjeKy21_GQJaLlJI.woff) format('woff');\n",
        "    }\n",
        "    @font-face {\n",
        "      font-family: 'Noticia Text';\n",
        "      font-style: normal;\n",
        "      font-weight: 700;\n",
        "      src: local('Noticia Text Bold'), local('NoticiaText-Bold'), url(http://themes.googleusercontent.com/static/fonts/noticiatext/v4/pEko-RqEtp45bE2P80AAKRGYIVA-f1n-gQW6IKoy_-M.woff) format('woff');\n",
        "    }\n",
        "    @font-face {\n",
        "      font-family: 'Noticia Text';\n",
        "      font-style: italic;\n",
        "      font-weight: 400;\n",
        "      src: local('Noticia Text Italic'), local('NoticiaText-Italic'), url(http://themes.googleusercontent.com/static/fonts/noticiatext/v4/dAuxVpkYE_Q_IwIm6elsKISNse2zJAGBOX1vGC0HDBk.woff) format('woff');\n",
        "    }\n",
        "    @font-face {\n",
        "      font-family: 'Noticia Text';\n",
        "      font-style: italic;\n",
        "      font-weight: 700;\n",
        "      src: local('Noticia Text Bold Italic'), local('NoticiaText-BoldItalic'), url(http://themes.googleusercontent.com/static/fonts/noticiatext/v4/-rQ7V8ARjf28_b7kRa0Jusr_TVbVq9Kr5jxArfkA4r0.woff) format('woff');\n",
        "    }\n",
        "    @font-face {\n",
        "      font-family: 'Source Sans Pro';\n",
        "      font-style: normal;\n",
        "      font-weight: 400;\n",
        "      src: local('Source Sans Pro'), local('SourceSansPro-Regular'), url(http://themes.googleusercontent.com/static/fonts/sourcesanspro/v7/ODelI1aHBYDBqgeIAH2zlBBHWFfxJXS04xYOz0jw624.woff) format('woff');\n",
        "    }\n",
        "    @font-face {\n",
        "      font-family: 'Source Sans Pro';\n",
        "      font-style: normal;\n",
        "      font-weight: 700;\n",
        "      src: local('Source Sans Pro Bold'), local('SourceSansPro-Bold'), url(http://themes.googleusercontent.com/static/fonts/sourcesanspro/v7/toadOcfmlt9b38dHJxOBGAE-U1AYRUXXE0Dth8uKIE0.woff) format('woff');\n",
        "    }\n",
        "    @font-face {\n",
        "      font-family: 'Source Sans Pro';\n",
        "      font-style: italic;\n",
        "      font-weight: 400;\n",
        "      src: local('Source Sans Pro Italic'), local('SourceSansPro-It'), url(http://themes.googleusercontent.com/static/fonts/sourcesanspro/v7/M2Jd71oPJhLKp0zdtTvoM1xDqsnd7zNt-b9r25av6rY.woff) format('woff');\n",
        "    }\n",
        "    @font-face {\n",
        "      font-family: 'Source Sans Pro';\n",
        "      font-style: italic;\n",
        "      font-weight: 700;\n",
        "      src: local('Source Sans Pro Bold Italic'), local('SourceSansPro-BoldIt'), url(http://themes.googleusercontent.com/static/fonts/sourcesanspro/v7/fpTVHK8qsXbIeTHTrnQH6L7TcrrtjxQtUk4wnkGIFYE.woff) format('woff');\n",
        "    }\n",
        "\n",
        "    div.cell{\n",
        "        width: 80%;\n",
        "/*        margin-left:auto;*/\n",
        "/*        margin-right:auto;*/\n",
        "    }\n",
        "\n",
        "    .rendered_html p, .rendered_html li {\n",
        "        font-family: 'Noticia Text', Verdana, sans-serif;\n",
        "        line-height: 1.6em;\n",
        "        /*max-width: 840px;*/\n",
        "        text-align: left;\n",
        "    }\n",
        "\n",
        "    .rendered_html h1, .rendered_html h2, .rendered_html h3, .rendered_html h4, .rendered_html h5, .rendered_html h6 {\n",
        "        margin-bottom: 0.5em;\n",
        "    }\n",
        "\n",
        "    h1, h2, h3, h4, h5, h6  {\n",
        "        font-family: 'Source Sans Pro', Verdana, sans-serif;\n",
        "        font-style: normal;\n",
        "        font-weight: 400;\n",
        "        border-bottom: 1px dotted black;\n",
        "    }\n",
        "\n",
        "    div.prompt, pre {\n",
        "        font-family: 'Inconsolata', monospace;\n",
        "        padding: 0.4em;\n",
        "    }\n",
        "\n",
        "    .warning{\n",
        "        color: rgb( 240, 20, 20 )\n",
        "        }\n",
        "</style>\n",
        "<script>\n",
        "    MathJax.Hub.Config({\n",
        "                        TeX: {\n",
        "                           extensions: [\"AMSmath.js\"]\n",
        "                           },\n",
        "                tex2jax: {\n",
        "                    inlineMath: [ ['$','$'], [\"\\\\(\",\"\\\\)\"] ],\n",
        "                    displayMath: [ ['$$','$$'], [\"\\\\[\",\"\\\\]\"] ]\n",
        "                },\n",
        "                displayAlign: 'center', // Change this to 'center' to center equations.\n",
        "                \"HTML-CSS\": {\n",
        "                    styles: {'.MathJax_Display': {\"margin\": 4}}\n",
        "                }\n",
        "        });\n",
        "</script>\n"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 23,
       "text": [
        "<IPython.core.display.HTML at 0x2fe9590>"
       ]
      }
     ],
     "prompt_number": 23
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%autosave 10"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "javascript": [
        "IPython.notebook.set_autosave_interval(10000)"
       ],
       "metadata": {},
       "output_type": "display_data"
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Autosaving every 10 seconds\n"
       ]
      }
     ],
     "prompt_number": 19
    }
   ],
   "metadata": {}
  }
 ]
}