{ "metadata": { "name": "", "signature": "sha256:91941a47b66e331df7feb5cec22b794c17f0a7ffd6c0e50d86f8d15efc55c5bc" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Anthony William Shannon\t\n", "----------------------\n", "\n", "Disruptor---Linking People & Technology.\n", "\n", "Charles---yes we can. \n", "\n", "Send over all your data. We'll make killer plots and let you visualize all your data very efficiently.\n", "\n", "anthony@plot.ly \n", "\n", "-AWS\n", "\n", "Sent from LinkedIn for iPhone\n", "http://lnkd.in/ios\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Charles Bombardier\n", "--------------------\n", "Hello Anthony\n", "\n", "I am doing research on crowdfunding websites and I have some data to analyze. Could you help me use plot.ly to extract the data I need for my thesis? \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Anthony William Shannon\n", "--------------------\n", "Hi Charles, \n", "\n", "Thanks for the Add! I wanted to put Plot.ly on your radar--- a new data analysis & visualization platform allowing all engineers & management to collaborate & communicate efficiently. It's a complementary product to MATLAB, which I know is widely used at Bombardier. \n", "\n", "We just closed a deal with Space X where all their flight data is being displayed via Plotly and are really interested in growing every vertical within Aerospace/Engineering.\n", "\n", "Is this of interest? Would love to get your eyes on it. \n", "\n", "AWS\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Charles Bombardier\n", "-------------------\n", "\n", "Hello Anthony\n", "\n", "Here is my data, \n", "\n", "\n", "\n", "\n", "I am trying to identify what makes the difference between a successful project on Kickstarter and the rest (or the ones that almost succeeded).\n", "\n", "A successful project is\n", "\n", "1. 
A project that had a goal of raising at least 5 000\\$\n", "2. That attracted at least 10 backers \n", "3. That raised at least 100\\$ \n", "4. That met its goal\n", "5. That exceeded its goal by 50% \n", "6. That attracted over 150 different backers in the end\n", "\n", "An unsuccessful project is\n", "\n", "1. A project that had a goal of raising at least 5 000\\$\n", "2. That attracted at least 10 backers \n", "3. That raised at least 100\\$ \n", "4. That did not meet its goal\n", "\n", "I need to see if there is a **correlation** and **significance** between the data sets and the global average.\n", "\n", "What makes a difference?\n", "\n", "\n", "1. Does the length of the Title (Number of characters) make a difference? \n", "2. Are there some specific keywords in the Title that make a difference?\n", "3. Does the length of the Description (Number of characters) make a difference?\n", "4. Are there some specific keywords in the description that make a difference?\n", "5. Does the delay in days between the time it was created and launched make a difference?\n", "6. Does the day of the year (from 0 to 365) it was launched make a difference? \n", "(PS: I was not able to identify the day of the year in numbers in Excel because there are many years) \n", "7. Does the day of the week make a difference?\n", "8. Does the number of backers make a difference?\n", "9. Does the number of comments make a difference?\n", "10. Does the number of comments per backer make a difference?\n", "11. Does the number of revisions make a difference?\n", "12. Does the project duration make a difference?\n", "13. Does the project location make a difference?\n", "14. Does the presence of a photo make a difference?\n", "15. Does the presence of a video make a difference?\n", "16. Does the number of pledges make a difference?\n", "17. How about the value of each pledge?\n", "\n", "\n", "I\u2019ll wait to hear from you, thanks a lot!\n", "Best regards\n", "\n", "Charles" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
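Question 6 in the email above asks for the launch day of the year, which Charles could not get out of Excel because the launches span several years. A minimal sketch using only the Python standard library, assuming the launch dates are available as YYYY-MM-DD strings; the helper name day_of_year and the sample dates are made up for illustration and are not taken from the Kickstarter data:

```python
from datetime import datetime

def day_of_year(date_string, fmt='%Y-%m-%d'):
    '''Return the day of the year (1 to 366) for a date string.'''
    return datetime.strptime(date_string, fmt).timetuple().tm_yday

# Made-up launch dates from different years (illustration only)
launch_dates = ['2012-01-01', '2013-03-01', '2012-03-01', '2013-12-31']
days = [day_of_year(d) for d in launch_dates]
```

Note that tm_yday counts from 1 (1 January) up to 366 in leap years, so subtract 1 if the 0-based numbering from the question is preferred.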
\n", "
" ] }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": [ "Import data and modules" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import plotly.plotly as py\n", "import plotly.tools as tls\n", "from plotly.graph_objs import *" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd\n", "import numpy as np" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "df_all = pd.read_excel(\"Kickstarter Data.xls\", \"All projects\")\n", "df_S = pd.read_excel(\"Kickstarter Data.xls\", \"Sucessful +5K + 50% +150 back\")\n", "df_U = pd.read_excel(\"Kickstarter Data.xls\", \"Unsucessfull +5K +100$ +10 back\")" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "df_all.shape, df_S.shape, df_U.shape" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 4, "text": [ "((10159, 178), (1252, 178), (3728, 178))" ] } ], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": [ "Some definitions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Colors and grid style" ] }, { "cell_type": "code", "collapsed": false, "input": [ "col_S = '#99FF00'\n", "col_U = '#CC0000'\n", "col_diff = '#0099ff'\n", "\n", "grid = dict(\n", " showgrid=True,\n", " gridcolor='#FFFFFF',\n", " gridwidth=1.5\n", ") \n", "\n", "width = 650\n", "plot_bgcolor = '#EFECEA'" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Colorbrewer color scale to plotly color scale function" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import colorbrewer as cb \n", "\n", "def convert_cb_to_scl(cb_color,N=5):\n", " '''\n", " cb_color (positional): colorbrewer color dictionary\n", " N (keyword): number of colors in color scale\n", " '''\n", " colors = cb_color[N] # get list of N color tuples from cb dict\n", " levels = np.linspace(0,1,N).tolist() # get list of N levels \n", " \n", " # Make color scale list of lists, conveting each tuple to 'rgb( , , )'\n", " scl_cb = []\n", " scl_cb += [[i, \"rgb(\"+','.join(map(str,color))+\")\"] \n", " for i,color in zip(levels,colors)]\n", " return scl_cb" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": [ "1. Does the length of the title (# of characters) make a difference?" ] }, { "cell_type": "code", "collapsed": false, "input": [ "S = df_S['Project Title'].apply(len).values\n", "U = df_U['Project Title'].apply(len).values" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [ "def stats_text(X):\n", " X_mean = np.mean(X)\n", " X_std = np.std(X)\n", " return [\"Mean: {:5.2f}
Stand. dev.: {:5.2f}\".format(X_mean,X_std) \n", " for i in range(len(X))]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 8 }, { "cell_type": "code", "collapsed": false, "input": [ "histnorm='percent'\n", "opacity=0.5\n", "height = 500\n", "\n", "trace1 = Histogram(\n", " x = S,\n", " name = 'Successful projects',\n", " histnorm= histnorm,\n", " marker= Marker(\n", " color= col_S\n", " ),\n", " opacity=opacity,\n", " text= stats_text(S)\n", ")\n", "\n", "trace2 = Histogram(\n", " x = U,\n", " name = 'Unsuccessful projects',\n", " histnorm= histnorm,\n", " marker= Marker(\n", " color= col_U\n", " ),\n", " opacity= opacity,\n", " text= stats_text(U)\n", ")\n", "\n", "data = Data([trace1, trace2])\n", "\n", "layout = Layout(\n", " title='Does the length of the title (# of characters) make a difference?',\n", " barmode='overlay',\n", " xaxis= XAxis(\n", " title='Number of characters in title',\n", " ),\n", " yaxis= YAxis(\n", " grid,\n", " title='Percentage of S/U projects',\n", " ),\n", " legend= Legend(\n", " x=0,\n", " y=1,\n", " bgcolor=\"rgba(0,0,0,0)\"\n", " ),\n", " autosize=False,\n", " width=width,\n", " height=height,\n", " plot_bgcolor=plot_bgcolor\n", ")\n", "\n", "fig = Figure(data=data, layout=layout)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 9 }, { "cell_type": "code", "collapsed": false, "input": [ "py.iplot(fig, filename='cbombardier-1')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] } ], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": {}, "source": [ "
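The overlaid histograms show the two distributions but do not say whether the difference in means is significant, which is what the email asks about. One possible check is Welch's t statistic. The sketch below is written for Python 3 and uses only the standard library; the helper name welch_t and the two small samples are made up for illustration. With the real data one would pass the S and U arrays instead, and scipy.stats.ttest_ind(S, U, equal_var=False) would also return a p-value.

```python
import math
import statistics

def welch_t(a, b):
    '''Welch t statistic and approximate degrees of freedom for two samples.'''
    va = statistics.variance(a) / len(a)   # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)   # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    dof = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, dof

# Made-up title lengths (illustration only, not the Kickstarter data)
sample_S = [30, 42, 38, 51, 45, 36]
sample_U = [28, 33, 25, 40, 31, 29]
t_stat, dof = welch_t(sample_S, sample_U)
```

statistics.variance is the sample variance (dividing by n - 1), whereas np.std as used in stats_text defaults to the population form, so the two will differ slightly on small samples.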
\n", "
" ] }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": [ "2. Are there some specific keywords in the title that make a difference?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some [info](http://stackoverflow.com/questions/3406771/python-regex-string-to-list-of-words-including-words-with-hyphens) about regular expression." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import re\n", "from collections import Counter\n", "\n", "# Words to reject from counts\n", "rejects = ['the','for','a','your','and','canceled','39','to','s','with','in','of',\n", " 'on','an','by','you','that','it','4','5','way','from']\n", "\n", "def word_re(text):\n", " p = re.compile('\\w(?:[-\\w]*\\w)?') # words regex\n", " text_lower = text.encode('ascii', 'replace').lower() # to lowercase ascii\n", " Words = p.findall(text_lower) # get list of all words in text\n", " return Words\n", "\n", "def get_count(df): \n", " text = ' '.join(df['Project Title'].tolist()) # join all titles\n", " iterables = zip(*Counter(word_re(text)).most_common()) # get 1 count and 1 index list\n", " return pd.Series(*iterables[::-1]) # output as pd Series\n", "\n", "cutoff = 30\n", "\n", "def to_plot(df0, df1, cutoff=cutoff):\n", " \n", " Df = pd.concat([get_count(df0), get_count(df1)], axis=1) # merge S and U Series\n", " Df = Df.drop(rejects) # delete rejects\n", " Df = Df.fillna(0) # fill in nan with 0\n", " \n", " Df['total'] = Df.ix[:,0:2].sum(axis=1) # use S+U totals to cutoff\n", " df = Df.sort('total', ascending=False)[0:cutoff] \n", " \n", " df['max'] = df.ix[:,0:2].max(axis=1) # use S+U totals to sort\n", " df = df.sort('max', ascending=True) # in ascending order (for plot)\n", " \n", " df['ratio'] = (df.ix[:,0]-df.ix[:,1])/df.ix[:,2] # compute S/U ratio\n", " df['rel. perc.'] = df['ratio']*100 # and relative percentage\n", " \n", " df['rank'] = df['rel. perc.'].rank(ascending=False) # rank by rel. 
perc.\n", " \n", " return df\n", "\n", "df_q2 = to_plot(df_S,df_U)\n", "df_q2.tail()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr><th></th><th>0</th><th>1</th><th>total</th><th>max</th><th>ratio</th><th>rel. perc.</th><th>rank</th></tr>\n", " </thead>\n", " <tbody>\n", " <tr><th>system</th><td>32</td><td>94</td><td>126</td><td>94</td><td>-0.492063</td><td>-49.206349</td><td>17</td></tr>\n", " <tr><th>stand</th><td>24</td><td>95</td><td>119</td><td>95</td><td>-0.596639</td><td>-59.663866</td><td>25</td></tr>\n", " <tr><th>ipad</th><td>60</td><td>173</td><td>233</td><td>173</td><td>-0.484979</td><td>-48.497854</td><td>16</td></tr>\n", " <tr><th>case</th><td>35</td><td>185</td><td>220</td><td>185</td><td>-0.681818</td><td>-68.181818</td><td>28</td></tr>\n", " <tr><th>iphone</th><td>95</td><td>298</td><td>393</td><td>298</td><td>-0.516539</td><td>-51.653944</td><td>20</td></tr>\n", " </tbody>\n", "</table>\n", "<p>5 rows \u00d7 7 columns</p>\n", "</div>
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 11, "text": [ " 0 1 total max ratio rel. perc. rank\n", "system 32 94 126 94 -0.492063 -49.206349 17\n", "stand 24 95 119 95 -0.596639 -59.663866 25\n", "ipad 60 173 233 173 -0.484979 -48.497854 16\n", "case 35 185 220 185 -0.681818 -68.181818 28\n", "iphone 95 298 393 298 -0.516539 -51.653944 20\n", "\n", "[5 rows x 7 columns]" ] } ], "prompt_number": 11 }, { "cell_type": "code", "collapsed": false, "input": [ "height= 800\n", "opacity= 0.5\n", "\n", "trace1 = Bar(\n", " x = df_q2.ix[:,0].values,\n", " y = df_q2.index.values,\n", " orientation='h',\n", " name = 'Succesful projects',\n", " marker= Marker(\n", " color= col_S\n", " ),\n", " opacity= opacity\n", ")\n", "\n", "trace2 = Bar(\n", " x = df_q2.ix[:,1].values,\n", " y = df_q2.index.values,\n", " orientation='h',\n", " name = 'Unsuccesful projects',\n", " marker= Marker(\n", " color= col_U\n", " ),\n", " opacity= opacity\n", ")\n", "\n", "data = Data([trace1, trace2])\n", "\n", "layout = Layout(\n", " title='Are there some specific keywords in the title that make a difference?',\n", " barmode='group',\n", " xaxis= XAxis(\n", " grid,\n", " title='Number of occurences in S/U projects titles',\n", " ),\n", " yaxis= YAxis(\n", " ),\n", " legend= Legend(\n", " x=1,\n", " y=0,\n", " bgcolor=\"rgba(0,0,0,0)\"\n", " ),\n", " autosize=False,\n", " width=width,\n", " height=height,\n", " plot_bgcolor=plot_bgcolor\n", ")\n", "\n", "fig = Figure(data=data, layout=layout)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 12 }, { "cell_type": "code", "collapsed": false, "input": [ "py.iplot(fig, filename='cbombardier-2', height=height)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] } ], "prompt_number": 13 }, { "cell_type": "code", "collapsed": false, "input": [ "try:\n", " del figb\n", "except NameError:\n", " pass\n", "import copy\n", "figb = 
copy.deepcopy(fig)\n", "\n", "def make_text(df, cutoff=cutoff):\n", " return '<br>Total # of occurrences: %s\\\n", "<br>Rank: %s out of %s' % (int(df['total']), int(df['rank']), cutoff) \n", "\n", "def make_color(X, cutoff=cutoff, N=6):\n", " scl = convert_cb_to_scl(cb.PuBu, N+2)[:1:-1]\n", " I_scl = np.floor(X/cutoff*(N-1))\n", " return [scl[int(i_scl)][1] for i_scl in I_scl]\n", "\n", "figb['data'] += [Bar(\n", " x = df_q2['rel. perc.'].values,\n", " y = df_q2.index.values,\n", " orientation='h',\n", " name = 'Relative difference',\n", " text= df_q2.apply(make_text,axis=1).tolist(),\n", " marker= Marker(\n", " color= make_color(df_q2['rank'].values)\n", " ),\n", " opacity=opacity,\n", " xaxis='x2',\n", " showlegend=False\n", ")]\n", "\n", "figb['layout']['xaxis'].update( \n", " domain=[0, 0.47],\n", " title= 'Number of S/U projects'\n", ")\n", "\n", "figb['layout'].update(\n", " xaxis2 = XAxis(\n", " grid,\n", " domain=[0.53, 1],\n", " title= 'Relative S/U difference [%]',\n", " autotick=False,\n", " dtick=20\n", " ),\n", ")\n", "\n", "figb['layout']['legend'].update(\n", " x=0.45,\n", " y=0,\n", " xanchor='right'\n", ")" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 14 }, { "cell_type": "code", "collapsed": false, "input": [ "py.iplot(figb, filename='cbombardier-2b', height=height)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] } ], "prompt_number": 15 }, { "cell_type": "markdown", "metadata": {}, "source": [ "
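The relative S/U difference plotted for each keyword is (S - U) / (S + U), expressed as a percentage, where S and U are the keyword counts in successful and unsuccessful project titles. As a quick worked check, this sketch recomputes three rows of the df_q2.tail() output with plain Python; the helper name relative_difference is hypothetical and not part of the notebook:

```python
def relative_difference(n_success, n_unsuccess):
    '''Return (S - U) / (S + U) as a percentage.'''
    total = n_success + n_unsuccess
    return 100.0 * (n_success - n_unsuccess) / total

# (S, U) title-keyword counts copied from the df_q2.tail() output
counts = {'iphone': (95, 298), 'case': (35, 185), 'ipad': (60, 173)}
rel_perc = {word: relative_difference(s, u) for word, (s, u) in counts.items()}
```

Keep in mind that the two sets differ in size (1252 successful versus 3728 unsuccessful projects), so raw counts lean negative for most keywords. Dividing each count by its set size before comparing may give a fairer picture.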
\n", "
" ] }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": [ "5. Does the delay in days between the time it was created
and launched make a difference?" ] }, { "cell_type": "code", "collapsed": false, "input": [ "S = df_S['Delay'].values\n", "U = df_U['Delay'].values" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 16 }, { "cell_type": "code", "collapsed": false, "input": [ "def stats_text(X):\n", " X_mean = np.mean(X)\n", " X_std = np.std(X)\n", " return [\"Mean: {:5.2f}<br>Stand. dev.: {:5.2f}\".format(X_mean,X_std) \n", " for i in range(len(X))]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 17 }, { "cell_type": "code", "collapsed": false, "input": [ "histnorm='percent'\n", "opacity=0.5\n", "height = 500\n", "\n", "bins = dict(\n", " start=0,\n", " end=365,\n", " size=7\n", ")\n", "\n", "trace1 = Histogram(\n", " x = S,\n", " name = 'Successful projects',\n", " histnorm= histnorm,\n", " marker= Marker(\n", " color='#99FF00'\n", " ),\n", " opacity=opacity,\n", " text = stats_text(S),\n", " autobinx=False,\n", " xbins= XBins(bins)\n", " \n", ")\n", "\n", "trace2 = Histogram(\n", " x = U,\n", " name = 'Unsuccessful projects',\n", " histnorm= histnorm,\n", " marker= Marker(\n", " color='#CC0000'\n", " ),\n", " opacity= opacity,\n", " text= stats_text(U),\n", " autobinx=False,\n", " xbins= XBins(bins)\n", ")\n", "\n", "data = Data([trace1, trace2])\n", "\n", "layout = Layout(\n", " title='Does the delay in days between the time it was created<br>\\\n", "and launched make a difference?',\n", " barmode='overlay',\n", " xaxis= XAxis(\n", " grid,\n", " title='Delay in days between creation and launch times',\n", " ),\n", " yaxis= YAxis(\n", " grid,\n", " title='Percentage of total occurrences',\n", " range=[0,15.5]\n", " ),\n", " legend= Legend(\n", " x=1,\n", " y=1,\n", " bgcolor=\"rgba(0,0,0,0)\"\n", " ),\n", " autosize=False,\n", " width=width,\n", " height=height,\n", " plot_bgcolor='#EFECEA'\n", ")\n", "\n", "fig = Figure(data=data, layout=layout)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 18 }, { "cell_type": "code", "collapsed": false, "input": [ "py.iplot(fig, filename='cbombardier-5')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] } ], "prompt_number": 19 }, { "cell_type": "code", "collapsed": false, "input": [ "try:\n", " del figb\n", "except NameError:\n", " pass\n", "import copy\n", "figb = copy.deepcopy(fig)\n", "\n", "tmp = copy.deepcopy(fig['data'])\n", "\n", "tmp.update(dict(\n", " xaxis='x2',\n", " yaxis='y2',\n", " xbins= XBins(\n", " start=0,\n", " end=100,\n", " size=1\n", " ),\n", " showlegend=False\n", "))\n", "\n", "figb['data'] += tmp\n", "\n", "figb['layout'].update( \n", " xaxis2=XAxis(\n", " grid,\n", " domain=[0.52, 1],\n", " range=[0,99],\n", " anchor='y2',\n", " autotick=False,\n", " dtick=20\n", " ),\n", " yaxis2=YAxis(\n", " grid,\n", " domain=[0.26, 0.82],\n", " anchor='x2'\n", " )\n", ")" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 20 }, { "cell_type": "code", "collapsed": false, "input": [ "py.iplot(figb, filename='cbombardier-5b')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] } ], "prompt_number": 21 }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
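The delay histograms use weekly bins (start=0, end=365, size=7) together with histnorm='percent'. As a rough illustration of what that normalization computes, here is a minimal standard-library sketch. The helper name weekly_percent and the sample delays are made up, and normalizing over only the values that fall inside the range is an assumption of this sketch, not a statement about how Plotly does it internally.

```python
from collections import Counter

def weekly_percent(delays, bin_size=7, end=365):
    '''Percent of in-range samples per bin, keyed by the start day of each bin.'''
    kept = [d for d in delays if 0 <= d < end]          # drop out-of-range delays
    counts = Counter(int(d // bin_size) for d in kept)  # bin index for each delay
    return {b * bin_size: 100.0 * n / len(kept) for b, n in sorted(counts.items())}

# Made-up delays in days (illustration only)
delays = [0, 3, 6, 7, 10, 15, 30, 400]
percent = weekly_percent(delays)
```

Here three of the seven in-range delays fall in the first week, so the bin starting at day 0 holds about 43 percent of the sample.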
" ] }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": [ "13. Does the project location make a difference?" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def location_count(df): \n", " return df.groupby(\"Location\").apply(lambda x: x.shape[0]) # count by Location\n", "\n", "cutoff = 30\n", "\n", "def to_plot(df0, df1, cutoff=cutoff):\n", " \n", " Df = pd.concat([location_count(df0), location_count(df1)], axis=1)\n", " Df = Df.fillna(0) # fill in nan with 0\n", " \n", " Df['total'] = Df.ix[:,0:2].sum(axis=1)\n", " df = Df.sort('total', ascending=False)[0:cutoff] \n", " \n", " df['max'] = df.ix[:,0:2].max(axis=1) # use S+U totals to sort\n", " df = df.sort('max', ascending=True) # in ascending order (for plot)\n", " \n", " df['ratio'] = (df.ix[:,0]-df.ix[:,1])/df.ix[:,2] # compute S/U ratio\n", " df['rel. perc.'] = df['ratio']*100 # and relative percentage\n", " \n", " df['rank'] = df['rel. perc.'].rank(ascending=False) # rank by rel. perc.\n", " \n", " return df\n", "\n", "df_q13 = to_plot(df_S,df_U)\n", "df_q13.tail()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr><th></th><th>0</th><th>1</th><th>total</th><th>max</th><th>ratio</th><th>rel. perc.</th><th>rank</th></tr>\n", " </thead>\n", " <tbody>\n", " <tr><th>Chicago, IL</th><td>52</td><td>80</td><td>132</td><td>80</td><td>-0.212121</td><td>-21.212121</td><td>4</td></tr>\n", " <tr><th>London, UK</th><td>41</td><td>101</td><td>142</td><td>101</td><td>-0.422535</td><td>-42.253521</td><td>13</td></tr>\n", " <tr><th>San Francisco, CA</th><td>110</td><td>110</td><td>220</td><td>110</td><td>0.000000</td><td>0.000000</td><td>2</td></tr>\n", " <tr><th>New York, NY</th><td>51</td><td>136</td><td>187</td><td>136</td><td>-0.454545</td><td>-45.454545</td><td>14</td></tr>\n", " <tr><th>Los Angeles, CA</th><td>55</td><td>175</td><td>230</td><td>175</td><td>-0.521739</td><td>-52.173913</td><td>17</td></tr>\n", " </tbody>\n", "</table>\n", "<p>5 rows \u00d7 7 columns</p>\n", "</div>
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 22, "text": [ " 0 1 total max ratio rel. perc. rank\n", "Chicago, IL 52 80 132 80 -0.212121 -21.212121 4\n", "London, UK 41 101 142 101 -0.422535 -42.253521 13\n", "San Francisco, CA 110 110 220 110 0.000000 0.000000 2\n", "New York, NY 51 136 187 136 -0.454545 -45.454545 14\n", "Los Angeles, CA 55 175 230 175 -0.521739 -52.173913 17\n", "\n", "[5 rows x 7 columns]" ] } ], "prompt_number": 22 }, { "cell_type": "code", "collapsed": false, "input": [ "height= 800\n", "opacity= 0.5\n", "\n", "trace1 = Bar(\n", " x = df_q13.ix[:,0].values,\n", " y = df_q13.index.values,\n", " orientation='h',\n", " name = 'Succesful projects',\n", " marker= Marker(\n", " color='#99FF00'\n", " ),\n", " opacity=opacity\n", ")\n", "\n", "trace2 = Bar(\n", " x = df_q13.ix[:,1].values,\n", " y = df_q13.index.values,\n", " orientation='h',\n", " name = 'Unsuccesful projects',\n", " marker= Marker(\n", " color='#CC0000'\n", " ),\n", " opacity=opacity\n", ")\n", "\n", "data = Data([trace1, trace2])\n", "\n", "layout = Layout(\n", " title='Does the project location make a difference?',\n", " barmode='group',\n", " xaxis= XAxis(\n", " grid,\n", " title='Number of S/U project in each city',\n", " ),\n", " yaxis= YAxis(\n", " ),\n", " legend= Legend(\n", " x=1,\n", " y=0,\n", " bgcolor=\"rgba(0,0,0,0)\"\n", " ),\n", " autosize=False,\n", " width=width,\n", " height=height,\n", " plot_bgcolor=plot_bgcolor,\n", " margin= Margin(\n", " l=135\n", " )\n", ")\n", "\n", "fig = Figure(data=data, layout=layout)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 23 }, { "cell_type": "code", "collapsed": false, "input": [ "py.iplot(fig, filename='cbombardier-13', height=height)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] } ], "prompt_number": 24 }, { "cell_type": "code", "collapsed": false, "input": [ "try:\n", " del figb\n", "except 
NameError:\n", " pass\n", "import copy\n", "figb = copy.deepcopy(fig)\n", "\n", "def make_text(df, cutoff=cutoff):\n", " return '<br>Total # of projects: %s\\\n", "<br>Rank: %s out of %s' % (int(df['total']), int(df['rank']), cutoff) \n", "\n", "def make_color(X, cutoff=cutoff, N=6):\n", " scl = convert_cb_to_scl(cb.PuBu, N+2)[:1:-1]\n", " I_scl = np.floor(X/cutoff*(N-1))\n", " return [scl[int(i_scl)][1] for i_scl in I_scl]\n", "\n", "figb['data'] += [Bar(\n", " x = df_q13['rel. perc.'].values,\n", " y = df_q13.index.values,\n", " orientation='h',\n", " name = 'Relative difference',\n", " text= df_q13.apply(make_text,axis=1).tolist(),\n", " marker= Marker(\n", " color= make_color(df_q13['rank'].values)\n", " ),\n", " opacity=opacity,\n", " xaxis='x2',\n", " showlegend=False\n", ")]\n", "\n", "figb['layout']['xaxis'].update( \n", " domain=[0, 0.47],\n", " title= 'Number of S/U projects'\n", ")\n", "\n", "figb['layout'].update(\n", " xaxis2 = XAxis(\n", " grid,\n", " domain=[0.53, 1],\n", " title= 'Relative S/U difference [%]',\n", " autotick=False,\n", " dtick=20\n", " ),\n", ")\n", "\n", "figb['layout']['legend'].update(\n", " x=0.45,\n", " y=0,\n", " xanchor='right'\n", ")" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 25 }, { "cell_type": "code", "collapsed": false, "input": [ "py.iplot(figb, filename='cbombardier-13b', height=height)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] } ], "prompt_number": 26 }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " \n", "
\n", "\n", "

Got Questions or Feedback?

\n", "\n", "About Plotly\n", "\n", "* email: feedback@plot.ly \n", "* tweet: \n", "@plotlygraphs\n", "\n", "

Notebook styling ideas

\n", "\n", "Big thanks to\n", "\n", "* Cam Davidson-Pilon\n", "* Lorena A. Barba\n", "\n", "
" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from IPython.display import display, HTML\n", "import urllib2\n", "url = 'https://raw.githubusercontent.com/plotly/python-user-guide/master/custom.css'\n", "display(HTML(urllib2.urlopen(url).read()))" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\n", "\n" ], "metadata": {}, "output_type": "display_data", "text": [ "" ] } ], "prompt_number": 27 } ], "metadata": {} } ] }