{
"metadata": {
"name": "",
"signature": "sha256:91941a47b66e331df7feb5cec22b794c17f0a7ffd6c0e50d86f8d15efc55c5bc"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Anthony William Shannon\n",
"----------------------\n",
"\n",
"Disruptor---Linking People & Technology.\n",
"\n",
"Charles---yes we can. \n",
"\n",
"Send all your data over. We'll make killer plots and let you visualize all your data very efficiently.\n",
"\n",
"anthony@plot.ly \n",
"\n",
"-AWS\n",
"\n",
"Sent from LinkedIn for iPhone\n",
"http://lnkd.in/ios\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Charles Bombardier\n",
"--------------------\n",
"Hello Anthony\n",
"\n",
"I am doing research on crowdfunding websites and I have some data to analyze. Could you help me use plot.ly to extract the data I need for my thesis?\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Anthony William Shannon\n",
"--------------------\n",
"Hi Charles, \n",
"\n",
"Thanks for the add! I wanted to put Plot.ly on your radar: a new data analysis & visualization platform that lets engineers and management collaborate and communicate efficiently. It's a complementary product to MATLAB, which I know is widely used at Bombardier.\n",
"\n",
"We just closed a deal with SpaceX, where all their flight data is being displayed via Plotly, and we are really interested in growing every vertical within Aerospace/Engineering.\n",
"\n",
"Is this of interest? Would love to get your eyes on it. \n",
"\n",
"AWS\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Charles Bombardier\n",
"-------------------\n",
"\n",
"Hello Anthony\n",
"\n",
"Here is my data, \n",
"\n",
"\n",
"\n",
"\n",
"I am trying to identify what makes the difference between a successful project on Kickstarter and the rest (or the ones that almost succeeded).\n",
"\n",
"A successful project is:\n",
"\n",
"1. A project that had a goal of raising at least 5 000\\$\n",
"2. That attracted at least 10 backers\n",
"3. That raised at least 100\\$\n",
"4. That met its goal\n",
"5. That exceeded its goal by 50%\n",
"6. That attracted over 150 different backers in the end\n",
"\n",
"An unsuccessful project is:\n",
"\n",
"1. A project that had a goal of raising at least 5 000\\$\n",
"2. That attracted at least 10 backers\n",
"3. That raised at least 100\\$\n",
"4. That did not meet its goal\n",
"\n",
"I need to see if there is a **correlation** and **significance** between the data sets and the global average.\n",
"\n",
"What makes a difference?\n",
"\n",
"\n",
"1. Does the length of the Title (Number of characters) make a difference? \n",
"2. Are there some specific keywords in the Title that make a difference?\n",
"3. Does the length of the Description (Number of characters) make a difference?\n",
"4. Are there some specific keywords in the description that make a difference?\n",
"5. Does the delay in days between the time it was created and launched make a difference?\n",
"6. Does the day of the year (from 0 to 365) it was launched make a difference?\n",
"(PS: I was not able to compute the day of the year as a number in Excel because the data spans many years.)\n",
"7. Does the day of the week make a difference?\n",
"8. Does the number of backers make a difference?\n",
"9. Does the number of comments make a difference?\n",
"10. Does the number of comments per backer make a difference?\n",
"11. Does the number of revisions make a difference?\n",
"12. Does the project duration make a difference?\n",
"13. Does the project location make a difference?\n",
"14. Does the presence of a photo make a difference?\n",
"15. Does the presence of a video make a difference?\n",
"16. Does the number of pledges make a difference?\n",
"17. How about the value of each pledge?\n",
"\n",
"\n",
"I\u2019ll wait to hear from you, thanks a lot!\n",
"Best regards\n",
"\n",
"Charles"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
]
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"Import data and modules"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import plotly.plotly as py\n",
"import plotly.tools as tls\n",
"from plotly.graph_objs import *"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"import numpy as np"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df_all = pd.read_excel(\"Kickstarter Data.xls\", \"All projects\")\n",
"df_S = pd.read_excel(\"Kickstarter Data.xls\", \"Sucessful +5K + 50% +150 back\")\n",
"df_U = pd.read_excel(\"Kickstarter Data.xls\", \"Unsucessfull +5K +100$ +10 back\")"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df_all.shape, df_S.shape, df_U.shape"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 4,
"text": [
"((10159, 178), (1252, 178), (3728, 178))"
]
}
],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
]
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"Some definitions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Colors and grid style"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"col_S = '#99FF00'\n",
"col_U = '#CC0000'\n",
"col_diff = '#0099ff'\n",
"\n",
"grid = dict(\n",
" showgrid=True,\n",
" gridcolor='#FFFFFF',\n",
" gridwidth=1.5\n",
") \n",
"\n",
"width = 650\n",
"plot_bgcolor = '#EFECEA'"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Colorbrewer color scale to plotly color scale function"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import colorbrewer as cb \n",
"\n",
"def convert_cb_to_scl(cb_color,N=5):\n",
" '''\n",
" cb_color (positional): colorbrewer color dictionary\n",
" N (keyword): number of colors in color scale\n",
" '''\n",
" colors = cb_color[N] # get list of N color tuples from cb dict\n",
" levels = np.linspace(0,1,N).tolist() # get list of N levels \n",
" \n",
" # Make color scale list of lists, converting each tuple to 'rgb( , , )'\n",
" scl_cb = [[i, \"rgb(\"+','.join(map(str,color))+\")\"] \n",
" for i,color in zip(levels,colors)]\n",
" return scl_cb"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 6
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"| | 0 | 1 | total | max | ratio | rel. perc. | rank |\n",
"|---|---|---|---|---|---|---|---|\n",
"| system | 32 | 94 | 126 | 94 | -0.492063 | -49.206349 | 17 |\n",
"| stand | 24 | 95 | 119 | 95 | -0.596639 | -59.663866 | 25 |\n",
"| ipad | 60 | 173 | 233 | 173 | -0.484979 | -48.497854 | 16 |\n",
"| case | 35 | 185 | 220 | 185 | -0.681818 | -68.181818 | 28 |\n",
"| iphone | 95 | 298 | 393 | 298 | -0.516539 | -51.653944 | 20 |\n",
"\n",
"5 rows \u00d7 7 columns\n",
"\n",
"| | 0 | 1 | total | max | ratio | rel. perc. | rank |\n",
"|---|---|---|---|---|---|---|---|\n",
"| Chicago, IL | 52 | 80 | 132 | 80 | -0.212121 | -21.212121 | 4 |\n",
"| London, UK | 41 | 101 | 142 | 101 | -0.422535 | -42.253521 | 13 |\n",
"| San Francisco, CA | 110 | 110 | 220 | 110 | 0.000000 | 0.000000 | 2 |\n",
"| New York, NY | 51 | 136 | 187 | 136 | -0.454545 | -45.454545 | 14 |\n",
"| Los Angeles, CA | 55 | 175 | 230 | 175 | -0.521739 | -52.173913 | 17 |\n",
"\n",
"5 rows \u00d7 7 columns
\n", "