{
 "metadata": {
  "name": "",
  "signature": "sha256:91941a47b66e331df7feb5cec22b794c17f0a7ffd6c0e50d86f8d15efc55c5bc"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Anthony William Shannon\t\n",
      "----------------------\n",
      "\n",
      "Disruptor---Linking People & Technology.\n",
      "\n",
      "Charles---yes we can. \n",
      "\n",
      "Send over all your data over. We'll make killer plots and let you visualize all your data very efficiently\n",
      "\n",
      "anthony@plot.ly \n",
      "\n",
      "-AWS\n",
      "\n",
      "Sent from LinkedIn for iPhone\n",
      "http://lnkd.in/ios\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Charles Bombardier\n",
      "--------------------\n",
      "Hello Anthony\n",
      "\n",
      "I am doing a research on Crowdfunidng websites and I have some data to analyze, could you help me use plot.ly to extract the data I need for my thesis? \n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Anthony William Shannon\n",
      "--------------------\n",
      "Hi Charles, \n",
      "\n",
      "Thanks for the Add! I wanted to put Plot.ly on your radar--- a new data analysis & visualization platform allowing all engineers & management to collaborate & communicate efficiently. It's a complimentary product to MATLAB, which I know is widely used at Bombardier. \n",
      "\n",
      "We just closed a deal with Space X where all their flight data is being displayed via Plotly and are really interested in growing every vertical within Aerospace/Engineering.\n",
      "\n",
      "Is this of interest? Would love to get your eyes on it. \n",
      "\n",
      "AWS\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Charles Bombardier\n",
      "-------------------\n",
      "\n",
      "Hello Anthony\n",
      "\n",
      "Here is my data, \n",
      "\n",
      "\n",
      "\n",
      "\n",
      "I am trying to identify what make a the difference between a successful project on Kickstarter and the rest (Or the ones that almost succeeded)\n",
      "\n",
      "A successful project is\n",
      "\n",
      "1. A project that had a goal of raising at least 5 000\\$\n",
      "2. Who attracted at least 10 backers \n",
      "3. Who raised at least 100\\$ \n",
      "4. Who did meet its goal\n",
      "5. Who exceeded is goal by 50% \n",
      "6. Who attracted over 150 different backers in the end\n",
      "\n",
      "An unsuccessful project is\n",
      "\n",
      "1. A project that had a goal of raising at least 5 000\\$\n",
      "2. Who attracted at least 10 backers \n",
      "3. Who raised at least 100\\$ \n",
      "4. Who did not meet its goal\n",
      "\n",
      "I need to see if there is a **correlation** and **significance** between the data sets and the global average.\n",
      "\n",
      "What makes a difference?\n",
      "\n",
      "\n",
      "1. Does the length of the Title (Number of characters) make a difference? \n",
      "2. Are there some specific keywords in the Title that make a difference?\n",
      "3. Does the length of the Description (Number of characters) make a difference?\n",
      "4. Are there some specific keywords in the description that make a difference?\n",
      "5. Does the delay in days between the time it was created and launched make a difference?\n",
      "6. Does the day of the year (From 0 to 365) it was launched make a difference ? \n",
      "(PS: I was not able to identify the day of the year in numbers in excel because there are many years)  \n",
      "7. Does the day of the week make a difference?\n",
      "8. Does the number of backers make a difference?\n",
      "9. Does the number of comments make a difference?\n",
      "10. Does the number of comments per backer make a difference?\n",
      "11. Does the number of revisions make a difference?\n",
      "12. Does the project duration make a difference?\n",
      "13. Does the project location make a difference?\n",
      "14. Does the presence of a photo make a difference?\n",
      "15. Does the presence of a video make a difference?\n",
      "16. Does the number of pledge make a difference?\n",
      "17. How about the value of each pledge ?\n",
      "\n",
      "\n",
      "I\u2019ll wait to hear from you, thanks a lot!\n",
      "Best regards\n",
      "\n",
      "Charles"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "<br>"
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Import data and modules"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import plotly.plotly as py\n",
      "import plotly.tools as tls\n",
      "from plotly.graph_objs import *"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 1
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import pandas as pd\n",
      "import numpy as np"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 2
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df_all = pd.read_excel(\"Kickstarter Data.xls\", \"All projects\")\n",
      "df_S = pd.read_excel(\"Kickstarter Data.xls\", \"Sucessful +5K + 50% +150 back\")\n",
      "df_U = pd.read_excel(\"Kickstarter Data.xls\", \"Unsucessfull +5K +100$ +10 back\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 3
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df_all.shape, df_S.shape, df_U.shape"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 4,
       "text": [
        "((10159, 178), (1252, 178), (3728, 178))"
       ]
      }
     ],
     "prompt_number": 4
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>"
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Some definitions"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Colors and grid style"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "col_S = '#99FF00'\n",
      "col_U = '#CC0000'\n",
      "col_diff = '#0099ff'\n",
      "\n",
      "grid = dict(\n",
      "    showgrid=True,\n",
      "    gridcolor='#FFFFFF',\n",
      "    gridwidth=1.5\n",
      ")        \n",
      "\n",
      "width = 650\n",
      "plot_bgcolor = '#EFECEA'"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 5
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Colorbrewer color scale to plotly color scale function"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import colorbrewer as cb  \n",
      "\n",
      "def convert_cb_to_scl(cb_color,N=5):\n",
      "    '''\n",
      "    cb_color (positional): colorbrewer color dictionary\n",
      "    N (keyword): number of colors in color scale\n",
      "    '''\n",
      "    colors = cb_color[N]                   # get list of N color tuples from cb dict\n",
      "    levels = np.linspace(0,1,N).tolist()   # get list of N levels \n",
      "    \n",
      "    # Make color scale list of lists, conveting each tuple to 'rgb( , , )'\n",
      "    scl_cb = []\n",
      "    scl_cb += [[i, \"rgb(\"+','.join(map(str,color))+\")\"] \n",
      "                for i,color in zip(levels,colors)]\n",
      "    return scl_cb"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 6
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<hr>"
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "1. Does the length of the title (# of characters) make a difference?"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "S = df_S['Project Title'].apply(len).values\n",
      "U = df_U['Project Title'].apply(len).values"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 7
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def stats_text(X):\n",
      "    X_mean = np.mean(X)\n",
      "    X_std = np.std(X)\n",
      "    return [\"<b>Mean</b>: {:5.2f}<br><b>Stand. dev.:</b> {:5.2f}\".format(X_mean,X_std) \n",
      "            for i in range(len(X))]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 8
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "histnorm='percent'\n",
      "opacity=0.5\n",
      "height = 500\n",
      "\n",
      "trace1 = Histogram(\n",
      "    x = S,\n",
      "    name = 'Succesful projects',\n",
      "    histnorm= histnorm,\n",
      "    marker= Marker(\n",
      "        color= col_S\n",
      "    ),\n",
      "    opacity=opacity,\n",
      "    text= stats_text(S)\n",
      ")\n",
      "\n",
      "trace2 = Histogram(\n",
      "    x = U,\n",
      "    name = 'Unsuccesful projects',\n",
      "    histnorm= histnorm,\n",
      "    marker= Marker(\n",
      "        color= col_U\n",
      "    ),\n",
      "    opacity= opacity,\n",
      "    text= stats_text(U)\n",
      ")\n",
      "\n",
      "data = Data([trace1, trace2])\n",
      "\n",
      "layout = Layout(\n",
      "    title='Does the length of the title (# of characters) make a difference?',\n",
      "    barmode='overlay',\n",
      "    xaxis= XAxis(\n",
      "        title='Number of characters in title',\n",
      "    ),\n",
      "    yaxis= YAxis(\n",
      "        grid,\n",
      "        title='Percentage of S/U projects',\n",
      "    ),\n",
      "    legend= Legend(\n",
      "        x=0,\n",
      "        y=1,\n",
      "        bgcolor=\"rgba(0,0,0,0)\"\n",
      "    ),\n",
      "    autosize=False,\n",
      "    width=width,\n",
      "    height=height,\n",
      "    plot_bgcolor=plot_bgcolor\n",
      ")\n",
      "\n",
      "fig = Figure(data=data, layout=layout)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 9
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "py.iplot(fig, filename='cbombardier-1')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<iframe id=\"igraph\" scrolling=\"no\" style=\"border:none;\"seamless=\"seamless\" src=\"https://plot.ly/~etpinard/381\" height=\"525\" width=\"100%\"></iframe>"
       ],
       "metadata": {},
       "output_type": "display_data",
       "text": [
        "<IPython.core.display.HTML at 0x7f80e81e0690>"
       ]
      }
     ],
     "prompt_number": 10
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "<hr>"
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "2. Are there some specific keywords in the title that make a difference?"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Some [info](http://stackoverflow.com/questions/3406771/python-regex-string-to-list-of-words-including-words-with-hyphens) about regular expression."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import re\n",
      "from collections import Counter\n",
      "\n",
      "# Words to reject from counts\n",
      "rejects = ['the','for','a','your','and','canceled','39','to','s','with','in','of',\n",
      "           'on','an','by','you','that','it','4','5','way','from']\n",
      "\n",
      "def word_re(text):\n",
      "    p = re.compile('\\w(?:[-\\w]*\\w)?')  # words regex\n",
      "    text_lower = text.encode('ascii', 'replace').lower()  # to lowercase ascii\n",
      "    Words = p.findall(text_lower)      # get list of all words in text\n",
      "    return Words\n",
      "\n",
      "def get_count(df): \n",
      "    text = ' '.join(df['Project Title'].tolist())           # join all titles\n",
      "    iterables = zip(*Counter(word_re(text)).most_common())  # get 1 count and 1 index list\n",
      "    return pd.Series(*iterables[::-1])                      # output as pd Series\n",
      "\n",
      "cutoff = 30\n",
      "\n",
      "def to_plot(df0, df1, cutoff=cutoff):\n",
      "    \n",
      "    Df = pd.concat([get_count(df0), get_count(df1)], axis=1) # merge S and U Series\n",
      "    Df = Df.drop(rejects)                                    # delete rejects\n",
      "    Df = Df.fillna(0)                                        # fill in nan with 0\n",
      "    \n",
      "    Df['total'] = Df.ix[:,0:2].sum(axis=1)            # use S+U totals to cutoff\n",
      "    df = Df.sort('total', ascending=False)[0:cutoff] \n",
      "    \n",
      "    df['max'] = df.ix[:,0:2].max(axis=1)              # use S+U totals to sort\n",
      "    df = df.sort('max', ascending=True)               # in ascending order (for plot)\n",
      "    \n",
      "    df['ratio'] = (df.ix[:,0]-df.ix[:,1])/df.ix[:,2]  # compute S/U ratio\n",
      "    df['rel. perc.'] = df['ratio']*100                #   and relative percentage\n",
      "    \n",
      "    df['rank'] = df['rel. perc.'].rank(ascending=False)  # rank by rel. perc.\n",
      "    \n",
      "    return df\n",
      "\n",
      "df_q2 = to_plot(df_S,df_U)\n",
      "df_q2.tail()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>0</th>\n",
        "      <th>1</th>\n",
        "      <th>total</th>\n",
        "      <th>max</th>\n",
        "      <th>ratio</th>\n",
        "      <th>rel. perc.</th>\n",
        "      <th>rank</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>system</th>\n",
        "      <td> 32</td>\n",
        "      <td>  94</td>\n",
        "      <td> 126</td>\n",
        "      <td>  94</td>\n",
        "      <td>-0.492063</td>\n",
        "      <td>-49.206349</td>\n",
        "      <td> 17</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>stand</th>\n",
        "      <td> 24</td>\n",
        "      <td>  95</td>\n",
        "      <td> 119</td>\n",
        "      <td>  95</td>\n",
        "      <td>-0.596639</td>\n",
        "      <td>-59.663866</td>\n",
        "      <td> 25</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>ipad</th>\n",
        "      <td> 60</td>\n",
        "      <td> 173</td>\n",
        "      <td> 233</td>\n",
        "      <td> 173</td>\n",
        "      <td>-0.484979</td>\n",
        "      <td>-48.497854</td>\n",
        "      <td> 16</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>case</th>\n",
        "      <td> 35</td>\n",
        "      <td> 185</td>\n",
        "      <td> 220</td>\n",
        "      <td> 185</td>\n",
        "      <td>-0.681818</td>\n",
        "      <td>-68.181818</td>\n",
        "      <td> 28</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>iphone</th>\n",
        "      <td> 95</td>\n",
        "      <td> 298</td>\n",
        "      <td> 393</td>\n",
        "      <td> 298</td>\n",
        "      <td>-0.516539</td>\n",
        "      <td>-51.653944</td>\n",
        "      <td> 20</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "<p>5 rows \u00d7 7 columns</p>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 11,
       "text": [
        "         0    1  total  max     ratio  rel. perc.  rank\n",
        "system  32   94    126   94 -0.492063  -49.206349    17\n",
        "stand   24   95    119   95 -0.596639  -59.663866    25\n",
        "ipad    60  173    233  173 -0.484979  -48.497854    16\n",
        "case    35  185    220  185 -0.681818  -68.181818    28\n",
        "iphone  95  298    393  298 -0.516539  -51.653944    20\n",
        "\n",
        "[5 rows x 7 columns]"
       ]
      }
     ],
     "prompt_number": 11
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "height= 800\n",
      "opacity= 0.5\n",
      "\n",
      "trace1 = Bar(\n",
      "    x = df_q2.ix[:,0].values,\n",
      "    y = df_q2.index.values,\n",
      "    orientation='h',\n",
      "    name = 'Succesful projects',\n",
      "    marker= Marker(\n",
      "        color= col_S\n",
      "    ),\n",
      "    opacity= opacity\n",
      ")\n",
      "\n",
      "trace2 = Bar(\n",
      "    x = df_q2.ix[:,1].values,\n",
      "    y = df_q2.index.values,\n",
      "    orientation='h',\n",
      "    name = 'Unsuccesful projects',\n",
      "    marker= Marker(\n",
      "        color= col_U\n",
      "    ),\n",
      "    opacity= opacity\n",
      ")\n",
      "\n",
      "data = Data([trace1, trace2])\n",
      "\n",
      "layout = Layout(\n",
      "    title='Are there some specific keywords in the title that make a difference?',\n",
      "    barmode='group',\n",
      "    xaxis= XAxis(\n",
      "        grid,\n",
      "        title='Number of occurences in S/U projects titles',\n",
      "    ),\n",
      "    yaxis= YAxis(\n",
      "    ),\n",
      "    legend= Legend(\n",
      "        x=1,\n",
      "        y=0,\n",
      "        bgcolor=\"rgba(0,0,0,0)\"\n",
      "    ),\n",
      "    autosize=False,\n",
      "    width=width,\n",
      "    height=height,\n",
      "    plot_bgcolor=plot_bgcolor\n",
      ")\n",
      "\n",
      "fig = Figure(data=data, layout=layout)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 12
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "py.iplot(fig, filename='cbombardier-2', height=height)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<iframe id=\"igraph\" scrolling=\"no\" style=\"border:none;\"seamless=\"seamless\" src=\"https://plot.ly/~etpinard/382\" height=\"800\" width=\"100%\"></iframe>"
       ],
       "metadata": {},
       "output_type": "display_data",
       "text": [
        "<IPython.core.display.HTML at 0x7f80c51ece10>"
       ]
      }
     ],
     "prompt_number": 13
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "try:\n",
      "    del figb\n",
      "except NameError:\n",
      "    pass\n",
      "import copy\n",
      "figb = copy.deepcopy(fig)\n",
      "\n",
      "def make_text(df, cutoff=cutoff):\n",
      "    return '<br><b>Total # of occurences:</b> %s\\\n",
      "    <br><b>Rank:</b> %s of out %s' % (int(df['total']), int(df['rank']), cutoff)  \n",
      "\n",
      "def make_color(X, cutoff=cutoff, N=6):\n",
      "    scl = convert_cb_to_scl(cb.PuBu, N+2)[:1:-1]\n",
      "    I_scl = np.floor(X/cutoff*(N-1))\n",
      "    return [scl[int(i_scl)][1] for i_scl in I_scl]\n",
      "\n",
      "figb['data'] += [Bar(\n",
      "    x = df_q2['rel. perc.'].values,\n",
      "    y = df_q2.index.values,\n",
      "    orientation='h',\n",
      "    name = 'Relative difference',\n",
      "    text= df_q2.apply(make_text,axis=1).tolist(),\n",
      "    marker= Marker(\n",
      "        color= make_color(df_q2['rank'].values)\n",
      "    ),\n",
      "    opacity=opacity,\n",
      "    xaxis='x2',\n",
      "    showlegend=False\n",
      ")]\n",
      "\n",
      "figb['layout']['xaxis'].update( \n",
      "    domain=[0, 0.47],\n",
      "    title= 'Number of S/U projects'\n",
      ")\n",
      "\n",
      "figb['layout'].update(\n",
      "    xaxis2 = XAxis(\n",
      "        grid,\n",
      "        domain=[0.53, 1],\n",
      "        title= 'Relative S/U difference [%]',\n",
      "        autotick=False,\n",
      "        dtick=20\n",
      "    ),\n",
      ")\n",
      "\n",
      "figb['layout']['legend'].update(\n",
      "    x=0.45,\n",
      "    y=0,\n",
      "    xanchor='right'\n",
      ")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 14
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "py.iplot(figb, filename='cbombardier-2b', height=height)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<iframe id=\"igraph\" scrolling=\"no\" style=\"border:none;\"seamless=\"seamless\" src=\"https://plot.ly/~etpinard/386\" height=\"800\" width=\"100%\"></iframe>"
       ],
       "metadata": {},
       "output_type": "display_data",
       "text": [
        "<IPython.core.display.HTML at 0x7f80c559bf10>"
       ]
      }
     ],
     "prompt_number": 15
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "<hr>"
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "5. Does the delay in days between the time it was created <br> and launched make a difference?"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "S = df_S['Delay'].values\n",
      "U = df_U['Delay'].values"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 16
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "histnorm='percent'\n",
      "opacity=0.5\n",
      "height = 500\n",
      "\n",
      "bins = dict(\n",
      "    start=0,\n",
      "    end=365,\n",
      "    size=7\n",
      ")\n",
      "\n",
      "trace1 = Histogram(\n",
      "    x = S,\n",
      "    name = 'Succesful projects',\n",
      "    histnorm= histnorm,\n",
      "    marker= Marker(\n",
      "        color='#99FF00'\n",
      "    ),\n",
      "    opacity=opacity,\n",
      "    text = stats_text(S),\n",
      "    autobinx=False,\n",
      "    xbins= XBins(bins)\n",
      "    \n",
      ")\n",
      "\n",
      "trace2 = Histogram(\n",
      "    x = U,\n",
      "    name = 'Unsuccesful projects',\n",
      "    histnorm= histnorm,\n",
      "    marker= Marker(\n",
      "        color='#CC0000'\n",
      "    ),\n",
      "    opacity= opacity,\n",
      "    text= stats_text(U),\n",
      "    autobinx=False,\n",
      "    xbins= XBins(bins)\n",
      ")\n",
      "\n",
      "data = Data([trace1, trace2])\n",
      "\n",
      "layout = Layout(\n",
      "    title='Does the delay in days between the time it was created <br>\\\n",
      "and launched make a difference?',\n",
      "    barmode='overlay',\n",
      "    xaxis= XAxis(\n",
      "        grid,\n",
      "        title='Delay in days between creation and lauch times',\n",
      "    ),\n",
      "    yaxis= YAxis(\n",
      "        grid,\n",
      "        title='Percentage of total occurrences',\n",
      "        range=[0,15.5]\n",
      "    ),\n",
      "    legend= Legend(\n",
      "        x=1,\n",
      "        y=1,\n",
      "        bgcolor=\"rgba(0,0,0,0)\"\n",
      "    ),\n",
      "    autosize=False,\n",
      "    width=width,\n",
      "    height=height,\n",
      "    plot_bgcolor='#EFECEA'\n",
      ")\n",
      "\n",
      "fig = Figure(data=data, layout=layout)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 18
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "py.iplot(fig, filename='cbombardier-5')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<iframe id=\"igraph\" scrolling=\"no\" style=\"border:none;\"seamless=\"seamless\" src=\"https://plot.ly/~etpinard/385\" height=\"525\" width=\"100%\"></iframe>"
       ],
       "metadata": {},
       "output_type": "display_data",
       "text": [
        "<IPython.core.display.HTML at 0x7f80c8635f50>"
       ]
      }
     ],
     "prompt_number": 19
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "try:\n",
      "    del figb\n",
      "except NameError:\n",
      "    pass\n",
      "import copy\n",
      "figb = copy.deepcopy(fig)\n",
      "\n",
      "tmp = copy.deepcopy(fig['data'])\n",
      "\n",
      "tmp.update(dict(\n",
      "    xaxis='x2',\n",
      "    yaxis='y2',\n",
      "    xbins= XBins(\n",
      "        start=0,\n",
      "        end=100,\n",
      "        size=1\n",
      "    ),\n",
      "    showlegend=False\n",
      "))\n",
      "\n",
      "figb['data'] += tmp\n",
      "\n",
      "figb['layout'].update( \n",
      "    xaxis2=XAxis(\n",
      "        grid,\n",
      "        domain=[0.52, 1],\n",
      "        range=[0,99],\n",
      "        anchor='y2',\n",
      "        autotick=False,\n",
      "        dtick=20\n",
      "    ),\n",
      "    yaxis2=YAxis(\n",
      "        grid,\n",
      "        domain=[0.26, 0.82],\n",
      "        anchor='x2'\n",
      "    )\n",
      ")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 20
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "py.iplot(figb, filename='cbombardier-5b')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<iframe id=\"igraph\" scrolling=\"no\" style=\"border:none;\"seamless=\"seamless\" src=\"https://plot.ly/~etpinard/387\" height=\"525\" width=\"100%\"></iframe>"
       ],
       "metadata": {},
       "output_type": "display_data",
       "text": [
        "<IPython.core.display.HTML at 0x7f80c51ec9d0>"
       ]
      }
     ],
     "prompt_number": 21
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "<hr>"
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "13. Does the project location make a difference?"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def location_count(df): \n",
      "    return df.groupby(\"Location\").apply(lambda x: x.shape[0])  # count by Location\n",
      "\n",
      "cutoff = 30\n",
      "\n",
      "def to_plot(df0, df1, cutoff=cutoff):\n",
      "    \n",
      "    Df = pd.concat([location_count(df0), location_count(df1)], axis=1)\n",
      "    Df = Df.fillna(0)                                 # fill in nan with 0\n",
      "    \n",
      "    Df['total'] = Df.ix[:,0:2].sum(axis=1)\n",
      "    df = Df.sort('total', ascending=False)[0:cutoff] \n",
      "    \n",
      "    df['max'] = df.ix[:,0:2].max(axis=1)              # use S+U totals to sort\n",
      "    df = df.sort('max', ascending=True)               # in ascending order (for plot)\n",
      "    \n",
      "    df['ratio'] = (df.ix[:,0]-df.ix[:,1])/df.ix[:,2]  # compute S/U ratio\n",
      "    df['rel. perc.'] = df['ratio']*100                #   and relative percentage\n",
      "    \n",
      "    df['rank'] = df['rel. perc.'].rank(ascending=False)  # rank by rel. perc.\n",
      "    \n",
      "    return df\n",
      "\n",
      "df_q13 = to_plot(df_S,df_U)\n",
      "df_q13.tail()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>0</th>\n",
        "      <th>1</th>\n",
        "      <th>total</th>\n",
        "      <th>max</th>\n",
        "      <th>ratio</th>\n",
        "      <th>rel. perc.</th>\n",
        "      <th>rank</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>Chicago, IL</th>\n",
        "      <td>  52</td>\n",
        "      <td>  80</td>\n",
        "      <td> 132</td>\n",
        "      <td>  80</td>\n",
        "      <td>-0.212121</td>\n",
        "      <td>-21.212121</td>\n",
        "      <td>  4</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>London, UK</th>\n",
        "      <td>  41</td>\n",
        "      <td> 101</td>\n",
        "      <td> 142</td>\n",
        "      <td> 101</td>\n",
        "      <td>-0.422535</td>\n",
        "      <td>-42.253521</td>\n",
        "      <td> 13</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>San Francisco, CA</th>\n",
        "      <td> 110</td>\n",
        "      <td> 110</td>\n",
        "      <td> 220</td>\n",
        "      <td> 110</td>\n",
        "      <td> 0.000000</td>\n",
        "      <td>  0.000000</td>\n",
        "      <td>  2</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>New York, NY</th>\n",
        "      <td>  51</td>\n",
        "      <td> 136</td>\n",
        "      <td> 187</td>\n",
        "      <td> 136</td>\n",
        "      <td>-0.454545</td>\n",
        "      <td>-45.454545</td>\n",
        "      <td> 14</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>Los Angeles, CA</th>\n",
        "      <td>  55</td>\n",
        "      <td> 175</td>\n",
        "      <td> 230</td>\n",
        "      <td> 175</td>\n",
        "      <td>-0.521739</td>\n",
        "      <td>-52.173913</td>\n",
        "      <td> 17</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "<p>5 rows \u00d7 7 columns</p>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 22,
       "text": [
        "                     0    1  total  max     ratio  rel. perc.  rank\n",
        "Chicago, IL         52   80    132   80 -0.212121  -21.212121     4\n",
        "London, UK          41  101    142  101 -0.422535  -42.253521    13\n",
        "San Francisco, CA  110  110    220  110  0.000000    0.000000     2\n",
        "New York, NY        51  136    187  136 -0.454545  -45.454545    14\n",
        "Los Angeles, CA     55  175    230  175 -0.521739  -52.173913    17\n",
        "\n",
        "[5 rows x 7 columns]"
       ]
      }
     ],
     "prompt_number": 22
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "height= 800\n",
      "opacity= 0.5\n",
      "\n",
      "trace1 = Bar(\n",
      "    x = df_q13.ix[:,0].values,\n",
      "    y = df_q13.index.values,\n",
      "    orientation='h',\n",
      "    name = 'Succesful projects',\n",
      "    marker= Marker(\n",
      "        color='#99FF00'\n",
      "    ),\n",
      "    opacity=opacity\n",
      ")\n",
      "\n",
      "trace2 = Bar(\n",
      "    x = df_q13.ix[:,1].values,\n",
      "    y = df_q13.index.values,\n",
      "    orientation='h',\n",
      "    name = 'Unsuccessful projects',\n",
      "    marker= Marker(\n",
      "        color='#CC0000'\n",
      "    ),\n",
      "    opacity=opacity\n",
      ")\n",
      "\n",
      "data = Data([trace1, trace2])\n",
      "\n",
      "layout = Layout(\n",
      "    title='Does the project location make a difference?',\n",
      "    barmode='group',\n",
      "    xaxis= XAxis(\n",
      "        grid,\n",
      "        title='Number of S/U projects in each city',\n",
      "    ),\n",
      "    yaxis= YAxis(\n",
      "    ),\n",
      "    legend= Legend(\n",
      "        x=1,\n",
      "        y=0,\n",
      "        bgcolor=\"rgba(0,0,0,0)\"\n",
      "    ),\n",
      "    autosize=False,\n",
      "    width=width,\n",
      "    height=height,\n",
      "    plot_bgcolor=plot_bgcolor,\n",
      "    margin= Margin(\n",
      "        l=135\n",
      "    )\n",
      ")\n",
      "\n",
      "fig = Figure(data=data, layout=layout)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 23
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "py.iplot(fig, filename='cbombardier-13', height=height)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<iframe id=\"igraph\" scrolling=\"no\" style=\"border:none;\"seamless=\"seamless\" src=\"https://plot.ly/~etpinard/383\" height=\"800\" width=\"100%\"></iframe>"
       ],
       "metadata": {},
       "output_type": "display_data",
       "text": [
        "<IPython.core.display.HTML at 0x7f80c4a90a90>"
       ]
      }
     ],
     "prompt_number": 24
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "try:\n",
      "    del figb\n",
      "except NameError:\n",
      "    pass\n",
      "import copy\n",
      "figb = copy.deepcopy(fig)\n",
      "\n",
      "def make_text(df, cutoff=cutoff):\n",
      "    return '<br><b>Total # of projects:</b> %s\\\n",
      "    <br><b>Rank:</b> %s out of %s' % (int(df['total']), int(df['rank']), cutoff)  \n",
      "\n",
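      "def make_color(X, cutoff=cutoff, N=6):\n",
      "    # Map each rank in X to one of N colors taken from the reversed PuBu scale\n",
      "    scl = convert_cb_to_scl(cb.PuBu, N+2)[:1:-1]\n",
      "    # float() guards against integer division under Python 2\n",
      "    I_scl = np.floor(X/float(cutoff)*(N-1))\n",
      "    return [scl[int(i_scl)][1] for i_scl in I_scl]\n",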
      "\n",
      "figb['data'] += [Bar(\n",
      "    x = df_q13.ix[:,5].values,\n",
      "    y = df_q13.index.values,\n",
      "    orientation='h',\n",
      "    name = 'Relative difference',\n",
      "    text= df_q13.apply(make_text,axis=1).tolist(),\n",
      "    marker= Marker(\n",
      "        color= make_color(df_q13['rank'].values)\n",
      "    ),\n",
      "    opacity=opacity,\n",
      "    xaxis='x2',\n",
      "    showlegend=False\n",
      ")]\n",
      "\n",
      "figb['layout']['xaxis'].update( \n",
      "    domain=[0, 0.47],\n",
      "    title= 'Number of S/U projects'\n",
      ")\n",
      "\n",
      "figb['layout'].update(\n",
      "    xaxis2 = XAxis(\n",
      "        grid,\n",
      "        domain=[0.53, 1],\n",
      "        title= 'Relative S/U difference [%]',\n",
      "        autotick=False,\n",
      "        dtick=20\n",
      "    ),\n",
      ")\n",
      "\n",
      "figb['layout']['legend'].update(\n",
      "    x=0.45,\n",
      "    y=0,\n",
      "    xanchor='right'\n",
      ")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 25
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "py.iplot(figb, filename='cbombardier-13b', height=height)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<iframe id=\"igraph\" scrolling=\"no\" style=\"border:none;\"seamless=\"seamless\" src=\"https://plot.ly/~etpinard/384\" height=\"800\" width=\"100%\"></iframe>"
       ],
       "metadata": {},
       "output_type": "display_data",
       "text": [
        "<IPython.core.display.HTML at 0x7f80c4a909d0>"
       ]
      }
     ],
     "prompt_number": 26
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<br>\n",
      "<hr>\n",
      "<br>"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "<div style=\"float:right;\">\n",
      "    <img src=\"http://i.imgur.com/4vwuxdJ.png\" \n",
      " align=right style=\"float:right; margin-left: 5px; margin-top: -10px\" />\n",
      "</div>\n",
      "\n",
      "<h4 style=\"margin-top:60px;\"> Got Questions or Feedback? </h4>\n",
      "\n",
      "About <a href=\"https://plot.ly\" target=\"_blank\">Plotly</a>\n",
      "\n",
      "* email: feedback@plot.ly \n",
      "* tweet: \n",
      "<a href=\"https://twitter.com/plotlygraphs\" target=\"_blank\">@plotlygraphs</a>\n",
      "\n",
      "<h4 style=\"margin-top:30px;\">Notebook styling ideas</h4>\n",
      "\n",
      "Big thanks to\n",
      "\n",
      "* <a href=\"http://nbviewer.ipython.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Prologue/Prologue.ipynb\" target=\"_blank\">Cam Davidson-Pilon</a>\n",
      "* <a href=\"http://lorenabarba.com/blog/announcing-aeropython/#.U1ULXdX1LJ4.google_plusone_share\" target=\"_blank\">Lorena A. Barba</a>\n",
      "\n",
      "<br>"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from IPython.display import display, HTML\n",
      "import urllib2\n",
      "url = 'https://raw.githubusercontent.com/plotly/python-user-guide/master/custom.css'\n",
      "display(HTML(urllib2.urlopen(url).read()))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<style>\n",
        "    /*body {\n",
        "        background-color: #F5F5F5;\n",
        "    }*/\n",
        "    div.cell{\n",
        "        width: 850px;\n",
        "        margin-left: 10% !important;\n",
        "        margin-right: auto;\n",
        "    }\n",
        "    h1 {\n",
        "        font-family: \"Open sans\",verdana,arial,sans-serif;\n",
        "    }\n",
        "    .text_cell_render h1 {\n",
        "        font-weight: 200;\n",
        "        font-size: 40pt;\n",
        "        line-height: 100%;\n",
        "        color:#447adb;\n",
        "        margin-bottom: 0em;\n",
        "        margin-top: 0em;\n",
        "        display: block;\n",
        "        white-space: nowrap;\n",
        "    } \n",
        "    h2 {\n",
        "        font-family: \"Open sans\",verdana,arial,sans-serif;\n",
        "        text-indent:1em;\n",
        "    }\n",
        "    .text_cell_render h2 {\n",
        "        font-weight: 200;\n",
        "        font-size: 20pt;\n",
        "        font-style: italic;\n",
        "        line-height: 100%;\n",
        "        color:#447adb;\n",
        "        margin-bottom: 1.5em;\n",
        "        margin-top: 0.5em;\n",
        "        display: block;\n",
        "        white-space: nowrap;\n",
        "    } \n",
        "    h3 {\n",
        "        font-family: \"Open sans\",verdana,arial,sans-serif;\n",
        "    }\n",
        "    .text_cell_render h3 {\n",
        "        font-weight: 300;\n",
        "        font-size: 18pt;\n",
        "        line-height: 100%;\n",
        "        color:#447adb;\n",
        "        margin-bottom: 0.5em;\n",
        "        margin-top: 2em;\n",
        "        display: block;\n",
        "        white-space: nowrap;\n",
        "    }\n",
        "    h4 {\n",
        "        font-family: \"Open sans\",verdana,arial,sans-serif;\n",
        "    }\n",
        "    .text_cell_render h4 {\n",
        "        font-weight: 300;\n",
        "        font-size: 16pt;\n",
        "        color:#447adb;\n",
        "        margin-bottom: 0.5em;\n",
        "        margin-top: 0.5em;\n",
        "        display: block;\n",
        "        white-space: nowrap;\n",
        "    }\n",
        "    h5 {\n",
        "        font-family: \"Open sans\",verdana,arial,sans-serif;\n",
        "    }\n",
        "    .text_cell_render h5 {\n",
        "        font-weight: 300;\n",
        "        font-style: normal;\n",
        "        color: #1d3b84;\n",
        "        font-size: 16pt;\n",
        "        margin-bottom: 0em;\n",
        "        margin-top: 1.5em;\n",
        "        display: block;\n",
        "        white-space: nowrap;\n",
        "    }\n",
        "    div.text_cell_render{\n",
        "        font-family: \"Open sans\",verdana,arial,sans-serif;\n",
        "        line-height: 135%;\n",
        "        font-size: 125%;\n",
        "        width:750px;\n",
        "        margin-left:auto;\n",
        "        margin-right:auto;\n",
        "        text-align:justify;\n",
        "        text-justify:inter-word;\n",
        "    }\n",
        "    div.output_subarea.output_text.output_pyout {\n",
        "        overflow-x: auto;\n",
        "        overflow-y: scroll;\n",
        "        max-height: 300px;\n",
        "    }\n",
        "    div.output_subarea.output_stream.output_stdout.output_text {\n",
        "        overflow-x: auto;\n",
        "        overflow-y: scroll;\n",
        "        max-height: 300px;\n",
        "    }\n",
        "    div.output_subarea.output_html.rendered_html {\n",
        "        overflow-x: scroll;\n",
        "        max-width: 100%;\n",
        "      /*  overflow-y: scroll; */\n",
        "      /*  max-height: 300px;   */\n",
        "    }\n",
        "    code{\n",
        "      font-size: 78%;\n",
        "    }\n",
        "    .rendered_html code{\n",
        "    background-color: transparent;\n",
        "    }\n",
        "    ul{\n",
        "        /* color:#447adb; */  \n",
        "        margin: 2em;\n",
        "    }\n",
        "    ul li{\n",
        "        padding-left: 0.5em; \n",
        "        margin-bottom: 0.5em; \n",
        "        margin-top: 0.5em; \n",
        "    }\n",
        "    ul li li{\n",
        "        padding-left: 0.2em; \n",
        "        margin-bottom: 0.2em; \n",
        "        margin-top: 0.2em; \n",
        "    }\n",
        "    ol{\n",
        "        /* color:#447adb; */  \n",
        "        margin: 2em;\n",
        "    }\n",
        "    ol li{\n",
        "        padding-left: 0.5em; \n",
        "        margin-bottom: 0.5em; \n",
        "        margin-top: 0.5em; \n",
        "    }\n",
        "    /*.prompt{\n",
        "        display: None;\n",
        "    } */\n",
        "    ul li{\n",
        "        padding-left: 0.5em; \n",
        "        margin-bottom: 0.5em; \n",
        "        margin-top: 0.2em; \n",
        "    }\n",
        "    a:link{\n",
        "       font-weight: bold;\n",
        "       color:#447adb;\n",
        "    }\n",
        "    a:visited{\n",
        "       font-weight: bold;\n",
        "       color: #1d3b84;\n",
        "    }\n",
        "    a:hover{\n",
        "       font-weight: bold;\n",
        "       color: #1d3b84;\n",
        "    }\n",
        "    a:focus{\n",
        "       font-weight: bold;\n",
        "       color:#447adb;\n",
        "    }\n",
        "    a:active{\n",
        "       font-weight: bold;\n",
        "       color:#447adb;\n",
        "    }\n",
        "    .rendered_html :link {\n",
        "       text-decoration: none; \n",
        "    }\n",
        "    .rendered_html :hover {\n",
        "       text-decoration: none; \n",
        "    }\n",
        "    .rendered_html :visited {\n",
        "      text-decoration: none;\n",
        "    }\n",
        "    .rendered_html :focus {\n",
        "      text-decoration: none;\n",
        "    }\n",
        "    .rendered_html :active {\n",
        "      text-decoration: none;\n",
        "    }\n",
        "    .warning{\n",
        "        color: rgb( 240, 20, 20 )\n",
        "    } \n",
        "    hr {\n",
        "      color: #f3f3f3;\n",
        "      background-color: #f3f3f3;\n",
        "      height: 1px;\n",
        "    }\n",
        "    blockquote{\n",
        "      display:block;\n",
        "      background: #f3f3f3;\n",
        "      font-family: \"Open sans\",verdana,arial,sans-serif;\n",
        "      width:610px;\n",
        "      padding: 15px 15px 15px 15px;\n",
        "      text-align:justify;\n",
        "      text-justify:inter-word;\n",
        "      }\n",
        "      blockquote p {\n",
        "        margin-bottom: 0;\n",
        "        line-height: 125%;\n",
        "        font-size: 100%;\n",
        "      }\n",
        "   /* element.style {\n",
        "    } */  \n",
        "</style>\n",
        "<script>\n",
        "    MathJax.Hub.Config({\n",
        "                        TeX: {\n",
        "                           extensions: [\"AMSmath.js\"]\n",
        "                           },\n",
        "                tex2jax: {\n",
        "                    inlineMath: [ [\"$\",\"$\"], [\"\\\\(\",\"\\\\)\"] ],\n",
        "                    displayMath: [ [\"$$\",\"$$\"], [\"\\\\[\",\"\\\\]\"] ]\n",
        "                },\n",
        "                displayAlign: \"center\", // Change this to \"center\" to center equations.\n",
        "                \"HTML-CSS\": {\n",
        "                    styles: {\".MathJax_Display\": {\"margin\": 4}}\n",
        "                }\n",
        "        });\n",
        "</script>\n"
       ],
       "metadata": {},
       "output_type": "display_data",
       "text": [
        "<IPython.core.display.HTML at 0x7f80c4a90350>"
       ]
      }
     ],
     "prompt_number": 27
    }
   ],
   "metadata": {}
  }
 ]
}