{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Guided Project: Analyzing Startup Fundraising Deals from Crunchbase\n", "\n", "Throughout this guided project, we'll practice working with different memory constraints.\n", "\n", "In this step, let's assume we only have 10 megabytes of available memory.While crunchbase-investments.csv consumes 10.3 megabytes of disk space, we know from earlier missions that pandas often requires 4 to 6 times amount of space in memory as the file does on disk (especially when there's many string columns).\n", "\n", "The data set of investments we'll be exploring is current as of October 2013. You can download it from GitHub.\n", "https://github.com/datahoarder/crunchbase-october-2013/blob/master/crunchbase-investments.csv" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import sqlite3\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "crunch_iter = pd.read_csv('crunchbase_investments.csv', encoding='ISO-8859-1', low_memory=False)\n", "\n", "# dataframe\n", "col_types = {\n", " \"company_name\": \"category\", \"company_category_code\": \"category\", \"company_country_code\": \"category\",\n", " \"company_state_code\": \"category\", \"company_city\": \"category\", \"investor_name\": \"category\",\n", " \"investor_category_code\": \"category\", \"investor_country_code\": \"category\", \"investor_state_code\": \"category\",\n", " \"investor_city\": \"category\", \"funding_round_type\": \"category\", \"raised_amount_usd\": \"float64\"\n", "}\n", "use_cols = [\n", " \"company_name\", \"company_category_code\", \"company_country_code\", \"company_state_code\",\"company_city\",\n", " \"investor_name\", \"investor_category_code\", \"investor_country_code\", \"investor_state_code\", \"investor_city\",\n", " \"funding_round_type\", \"funded_at\", \"raised_amount_usd\"\n", "]\n", "\n", "crunch_iter_shape = pd.read_csv('crunchbase_investments.csv', usecols=use_cols, dtype=col_types, parse_dates=[\"funded_at\"], encoding='ISO-8859-1', chunksize=1000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# steps\n", "Steps:\n", "- Determine dtype for each column - DONE\n", "- count number of missing value counts - DONE\n", "- create dictionary of all values - DONE (but don't even think this is necessary)\n", "- count unique values and unique value percentage of total values - DONE\n", "- memory footprint - DONE\n", "- drop certain columns that aren't useful for analysis - DONE\n", "- specify dtypes - DONE\n", "- change types for efficiency - DONE\n", "- clean data, if necesssary - NOT NECESSARY\n", "- create new database - DONE\n", "- inject csv into new database" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "company_permalink\n", "object\n", "3.8699302673339844\n", "------------------------\n", "company_name\n", "object\n", "3.425077438354492\n", "------------------------\n", "company_category_code\n", "object\n", "3.2627410888671875\n", "------------------------\n", "company_country_code\n", "object\n", "3.0253448486328125\n", "------------------------\n", "company_state_code\n", "object\n", "2.962283134460449\n", "------------------------\n", "company_region\n", "object\n", "3.25362491607666\n", "------------------------\n", "company_city\n", "object\n", "3.343595504760742\n", "------------------------\n", "investor_permalink\n", "object\n", "4.749942779541016\n", "------------------------\n", "investor_name\n", "object\n", "3.7343921661376953\n", "------------------------\n", "investor_category_code\n", "object\n", "1.6893692016601562\n", "------------------------\n", "investor_country_code\n", "object\n", "2.704906463623047\n", "------------------------\n", "investor_state_code\n", "object\n", "2.542128562927246\n", "------------------------\n", "investor_region\n", "object\n", "3.239068031311035\n", "------------------------\n", "investor_city\n", "object\n", "2.9316816329956055\n", "------------------------\n", "funding_round_type\n", "object\n", "3.2528257369995117\n", "------------------------\n", "funded_at\n", "object\n", "3.3782129287719727\n", "------------------------\n", "funded_month\n", "object\n", "3.226959228515625\n", "------------------------\n", "funded_quarter\n", "object\n", "3.226959228515625\n", "------------------------\n", "funded_year\n", "float64\n", "0.4034881591796875\n", "------------------------\n", "raised_amount_usd\n", "float64\n", "0.4034881591796875\n", "------------------------\n" ] } ], "source": [ "# determine dtype for each column and memory usage in mb for each column\n", "for crunch in crunch_iter:\n", " print(crunch)\n", " print(crunch_iter[crunch].dtype)\n", " print((crunch_iter[crunch].memory_usage(deep=True)) / (2 ** 20))\n", " print('------------------------')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " company_name company_category_code company_country_code \\\n", "0 AdverCar advertising USA \n", "1 LaunchGram news USA \n", "2 uTaP messaging USA \n", "3 ZoopShop software USA \n", "4 eFuneral web USA \n", ".. ... ... ... \n", "995 SpotHero web USA \n", "996 Cureeo ecommerce USA \n", "997 Fibroblast enterprise USA \n", "998 Whimseybox ecommerce USA \n", "999 Moxie Jean ecommerce USA \n", "\n", " company_state_code company_city investor_name \\\n", "0 CA San Francisco 1-800-FLOWERS.COM \n", "1 CA Mountain View 10Xelerator \n", "2 NaN NaN 10Xelerator \n", "3 OH columbus 10Xelerator \n", "4 OH Cleveland 10Xelerator \n", ".. ... ... ... \n", "995 IL Chicago Excelerate Labs \n", "996 IL Chicago Excelerate Labs \n", "997 IL Chicago Excelerate Labs \n", "998 TX Houston Excelerate Labs \n", "999 IL Arlington Heights Excelerate Labs \n", "\n", " investor_category_code investor_country_code investor_state_code \\\n", "0 NaN USA NY \n", "1 finance USA OH \n", "2 finance USA OH \n", "3 finance USA OH \n", "4 finance USA OH \n", ".. ... ... ... \n", "995 consulting USA IL \n", "996 consulting USA IL \n", "997 consulting USA IL \n", "998 consulting USA IL \n", "999 consulting USA IL \n", "\n", " investor_city funding_round_type funded_at raised_amount_usd \n", "0 New York series-a 2012-10-30 2000000.0 \n", "1 Columbus other 2012-01-23 20000.0 \n", "2 Columbus other 2012-01-01 20000.0 \n", "3 Columbus angel 2012-02-15 20000.0 \n", "4 Columbus other 2011-09-08 20000.0 \n", ".. ... ... ... ... \n", "995 Chicago angel 2012-08-27 460000.0 \n", "996 Chicago angel 2012-06-01 75000.0 \n", "997 Chicago angel 2012-06-01 75000.0 \n", "998 Chicago angel 2012-06-01 75000.0 \n", "999 Chicago angel 2012-06-01 75000.0 \n", "\n", "[1000 rows x 13 columns]\n" ] }, { "ename": "TypeError", "evalue": "'TextFileReader' object is not subscriptable", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m\u001b[0m\n\u001b[0;32m 2\u001b[0m \u001b[1;32mfor\u001b[0m \u001b[0mcrunch\u001b[0m \u001b[1;32min\u001b[0m \u001b[0mcrunch_iter_shape\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 3\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mcrunch\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 4\u001b[1;33m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mcrunch_iter_shape\u001b[0m\u001b[1;33m[\u001b[0m\u001b[0mcrunch\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0misnull\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0msum\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 5\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'------------------------'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n", "\u001b[1;31mTypeError\u001b[0m: 'TextFileReader' object is not subscriptable" ] } ], "source": [ "# count number of missing value counts\n", "for crunch in crunch_iter_shape:\n", " print(crunch)\n", " print(crunch_iter_shape[crunch].isnull().sum())\n", " print('------------------------')" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "company_permalink\n", "0.21891431813883108\n", "------------------------\n", "company_name\n", "0.21891431813883108\n", "------------------------\n", "company_category_code\n", "0.0008322299981085682\n", "------------------------\n", "company_country_code\n", "5.6742954416493286e-05\n", "------------------------\n", "company_state_code\n", "0.0009646302250803858\n", "------------------------\n", "company_region\n", "0.010346132021940608\n", "------------------------\n", "company_city\n", "0.023264611310762247\n", "------------------------\n", "investor_permalink\n", "0.19960279931908453\n", "------------------------\n", "investor_name\n", "0.19795725364100625\n", "------------------------\n", "investor_category_code\n", "0.0006430868167202572\n", "------------------------\n", "investor_country_code\n", "0.0013807452241346699\n", "------------------------\n", "investor_state_code\n", "0.0009646302250803858\n", "------------------------\n", "investor_region\n", "0.011083790429355022\n", "------------------------\n", "investor_city\n", "0.018744089275581615\n", "------------------------\n", "funding_round_type\n", "0.00018914318138831096\n", "------------------------\n", "funded_at\n", "0.053130319651976544\n", "------------------------\n", "funded_month\n", "0.003650463400794401\n", "------------------------\n", "funded_quarter\n", "0.0013807452241346699\n", "------------------------\n", "funded_year\n", "0.000397200680915453\n", "------------------------\n", "raised_amount_usd\n", "0.02759599016455457\n", "------------------------\n", "/company/ecomom 58\n", "/company/fab-com 57\n", "/company/aperto-networks 49\n", "/company/practice-fusion 47\n", "/company/klout 46\n", " ..\n", "1204650.0 1\n", "1644500.0 1\n", "34995.0 1\n", "1512066.0 1\n", "83000000.0 1\n", "Length: 52322, dtype: int64\n" ] } ], "source": [ "# count unique values and unique value percentage of total values\n", "overall_vc = []\n", "for crunch in crunch_iter:\n", " print(crunch)\n", " num_unique_values = len(crunch_iter[crunch].unique())\n", " num_total_values = len(crunch_iter[crunch])\n", " print(num_unique_values / num_total_values)\n", " print('------------------------')\n", " overall_vc.append(crunch_iter[crunch].value_counts())\n", "combined_vc = pd.concat(overall_vc)\n", "print(combined_vc)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "tables to drop\n", "1. company_permalink\n", "2. company_region\n", "3. investor_permalink\n", "4. investor_region\n", "5. funded_month\n", "6. funded_quarter\n", "7. funded_year\n", "\n", "Tables to include and their data type to specify:\n", "1. \"company_name\": \"category\"\n", "2. \"company_category_code\": \"category\"\n", "3. \"company_country_code\": \"category\"\n", "4. \"company_state_code\": \"category\"\n", "5. \"company_city\": \"category\"\n", "6. \"investor_name\": \"category\"\n", "7. \"investor_category_code\": \"category\"\n", "8. \"investor_country_code\": \"category\"\n", "9. \"investor_state_code\": \"category\"\n", "10. \"investor_city\": \"category\"\n", "11. \"funding_round_type\": \"category\"\n", "12. \"funded_at\": \"datetime64\"\n", "13. \"raised_amount_usd\": \"int64\" <-- need to convert all float to int" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3200000000.0\n" ] } ], "source": [ "# find max integer/float in column\n", "\n", "print(crunch_iter['raised_amount_usd'].max())" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([ 6., 20., 13., 11., 0., 0., 0., 1., 0., 1.]),\n", " array([0.12555218, 0.13660088, 0.14764957, 0.15869827, 0.16974697,\n", " 0.18079567, 0.19184437, 0.20289307, 0.21394176, 0.22499046,\n", " 0.23603916]),\n", " )" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYEAAAD4CAYAAAAKA1qZAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAR3UlEQVR4nO3dfawld13H8ffHPmgElNZea+kDi1IbC7EFrwvIg8XS0hakahpp48MiJatYEokYUyWhBv6pGiBiiXWla4uB0igUm7SlXSumYFrgtm5LH4AttdpdavfCYkvBqAtf/7izcrk9d+/ZM3Pvucvv/UpO7sxvfjPz++Zkz2dn5pyZVBWSpDZ9z7QHIEmaHkNAkhpmCEhSwwwBSWqYISBJDTt02gMY5aijjqoNGzZMexiSdNC44447vlxVMwe63roMgQ0bNjA3NzftYUjSQSPJv02ynqeDJKlhhoAkNcwQkKSGGQKS1DBDQJIaZghIUsNWDIEkxyf5eJL7ktyb5He69iOTbEuyo/t7xDLrb+r67EiyaegCJEmTG+dIYC/wlqo6GXghcFGSk4GLgVuq6kTglm7+OyQ5ErgEeAGwEbhkubCQJK29FUOgqh6pqju76a8B9wPHAucCV3XdrgJ+YcTqrwS2VdWeqvoqsA04a4BxS5IGcEC/GE6yAXge8Cng6Kp6pFv0H8DRI1Y5Fnh40fzOrm3UtjcDmwFOOOGEAxnWurDh4uuntu+HLn3V1PYt6eA29oXhJE8FPgy8uaoeX7ysFh5P1usRZVW1papmq2p2ZuaAb38hSZrAWCGQ5DAWAuADVfWRrvnRJMd0y48Bdo9YdRdw/KL547o2SdI6MM63gwJcAdxfVe9atOg6YN+3fTYBfz9i9ZuAM5Mc0V0QPrNrkyStA+McCbwY+DXg55Js717nAJcCZyTZAbyimyfJbJL3AVTVHuAdwGe619u7NknSOrDiheGq+iSQZRafPqL/HPCGRfNbga2TDlCStHr8xbAkNcwQkKSGGQKS1DBDQJIaZghIUsMMAUlqmCEgSQ0zBCSpYYaAJDXMEJCkhhkCktQwQ0CSGmYISFLDDAFJapghIEkNMwQkqWErPlQmyVbg1cDuqnpu13YNcFLX5enAf1bVqSPWfQj4GvBNYG9VzQ4yaknSIFYMAeBK4DLg/fsaquq1+6aTvBN4bD/rv7yqvjzpACVJq2ecx0vemmTDqGXdQ+h/Gfi5gcclSVoDfa8JvBR4tKp2LLO8gJuT3JFkc899SZIGNs7poP25ALh6P8tfUlW7kvwwsC3J56rq1lEdu5DYDHDCCSf0HJYkaRwTHwkkORT4JeCa5fpU1a7u727gWmDjfvpuqarZqpqdmZmZdFiSpAPQ53TQK4DPVdXOUQuTPCXJ0/ZNA2cC9/TYnyRpYCuGQJKrgduAk5LsTHJht+h8lpwKSvKMJDd0s0cDn0xyF/Bp4Pqq+thwQ5ck9TXOt4MuWKb9dSPavgSc000/CJzSc3ySpFXkL4YlqWGGgCQ1zBCQpIYZApLUMENAkhpmCEhSwwwBSWqYISBJDTMEJKlhhoAkNcwQkKSGGQKS1DBDQJIaZghIUsMMAUlqmCEgSQ0zBCSpYeM8XnJrkt1J7lnU9kdJdiXZ3r3OWWbds5J8PskDSS4ecuCSpP7GORK4EjhrRPu7q+rU7nXD0oVJDgHeC5wNnAxckOTkPoOVJA1rxRCoqluBPRNseyPwQFU9WFX/A3wIOHeC7UiSVkmfawJvSnJ3d7roiBHLjwUeXjS/s2sbKcnmJHNJ5ubn53sMS5I0rklD4C+AHwNOBR4B3tl3IFW1papmq2p2Zmam7+YkSWOYKASq6tGq+mZVfQv4KxZO/Sy1Czh+0fxxXZskaZ2YKASSHLNo9heBe0Z0+wxwYpJnJTkcOB+4bpL9SZJWx6ErdUhyNXAacFSSncAlwGlJTgUKeAj4za7vM4D3VdU5VbU3yZuAm4BDgK1Vde9qFCFJmsyKIVBVF4xovmKZvl8Czlk0fwPwpK+PSpLWB38xLEkNMwQkqWGGgCQ1zBCQpIYZApLUMENAkhpmCEhSwwwBSWqYISBJDTMEJKlhhoAkNcwQkKSGGQKS1DBDQJIaZghIUsNWfJ6A1r8NF18/lf0+dOmrprJfScPxSECSGrZiCCTZmmR3knsWtf1pks8luTvJtUmevsy6DyX5bJLtSeYGHLckaQDjHAlcCZy1pG0b8Nyq+kngC8Af7Gf9l1fVqVU1O9kQJUmrZcUQqKpbgT1L2m6uqr3d7O3AcaswNknSKhvimsDrgRuXWVbAzUnuSLJ5fxtJsjnJXJK5+fn5AYYlSVpJrxBI8lZgL/CBZbq8pKqeD5wNXJTkZcttq6q2VNVsVc3OzMz0GZYkaUwTh0CS1wGvBn6lqmpUn6ra1f3dDVwLbJx0f5Kk4U0UAknOAn4feE1VfWOZPk9J8rR908CZwD2j+kqSpmOcr4heDdwGnJRkZ5ILgcuApwHbuq9/Xt71fUaSG7pVjwY+meQu4NPA9VX1sVWpQpI0kRV/MVxVF4xovmKZvl8CzummHwRO6TU6SdKq8hfDktQwQ0CSGmYISFLDDAFJapghIEkNMwQkqWGGgCQ1zBCQpIYZApLUMENAkhrmg+Y1sWk94B58yL00FI8EJKlhhoAkNcwQkKSGGQKS1DBDQJIaZghIUsPGCoEkW5PsTnLPorYjk2xLsqP7e8Qy627q+uxIsmmogUuS+hv3SOBK4KwlbRcDt1TVicAt3fx3SHIkcAnwAmAjcMlyYSFJWntjhUBV3QrsWdJ8LnBVN30V8AsjVn0lsK2q9lTVV4FtPDlMJElT0ueawNFV9Ug3/R/A0SP6HAs8vGh+Z9f2JEk2J5lLMjc/P99jWJKkcQ1yYbiqCqie29hSVbNVNTszMzPEsCRJK+gTAo8mOQag+7t7RJ9dwPGL5o/r2iRJ60CfELgO2Pdtn03A34/ocxNwZpIjugvCZ3ZtkqR1YNyviF4N3AaclGRnkguBS4EzkuwAXtHNk2Q2yfsAqmoP8A7gM93r7V2bJGkdGOtW0lV1wTKLTh/Rdw54w6L5rcDWiUYnSVpV/mJYkhpmCEhSwwwBSWqYISBJDTMEJKlhhoAkNcwQkKSGGQKS1DBDQJIaZghIUsMMAUlqmCEgSQ0zBCSpYYaAJDXMEJCkhhkCktQwQ0CSGjZxCCQ5Kcn2Ra/Hk7x5SZ/Tkjy2qM/beo9YkjSYsR4vOUpVfR44FSDJIcAu4NoRXT9RVa+edD+SpNUz1Omg04EvVtW/DbQ9SdIaGCoEzgeuXmbZi5LcleTGJM9ZbgNJNieZSzI3Pz8/0LAkSfvTOwSSHA68BvjbEYvvBJ5ZVacAfw58dLntVNWWqpqtqtmZmZm+w5IkjWGII4GzgTur6tGlC6rq8ap6opu+ATgsyVED7FOSNIAhQuACljkVlORHkqSb3tjt7ysD7FOSNICJvx0EkOQpwBnAby5q+y2AqrocOA94Y5K9wH8B51dV9dmnJGk4vUKgqr4O/NCStssXTV8GXNZnH5Kk1eMvhiWpYYaAJDXMEJCkhhkCktQwQ0CSGmYISFLDDAFJapghIEkNMwQkqWGGgCQ1rNdtI9ajDRdfP+0hSNJBwyMBSWqYISBJDTMEJKlhhoAkNcwQkKSGGQKS1LDeIZDkoSSfTbI9ydyI5UnyniQPJLk7yfP77lOSNIyhfifw8qr68jLLzgZO7F4vAP6i+ytJmrK1OB10LvD+WnA78PQkx6zBfiVJKxgiBAq4OckdSTaPWH4s8PCi+Z1d23dIsjnJXJK5+fn5AYYlSVrJECHwkqp6PgunfS5K8rJJNlJVW6pqtqpmZ2ZmBhiWJGklvUOgqnZ1f3cD1wIbl3TZBRy/aP64rk2SNGW9QiDJU5I8bd80cCZwz5Ju1wG/3n1L6IXAY1X1SJ/9SpKG0ffbQUcD1ybZt60PVtXHkvwWQFVdDtwAnAM8AHwD+I2e+5QkDaRXCFTVg8ApI9ovXzRdwEV99iNJWh3+YliSGmYISFLDDAFJapghIEkNMwQkqWGGgCQ1zBCQpIYZApLUMENAkhpmCEhSwwwBSWqYISBJDTMEJKlhhoAkNcwQkKSGGQKS1DBDQJIaNnEIJDk+yceT3Jfk3iS/M6LPaUkeS7K9e72t33AlSUPq83jJvcBbqurO7mHzdyTZVlX3Len3iap6dY/9SJJWycRHAlX1SFXd2U1/DbgfOHaogUmSVt8g1wSSbACeB3xqxOIXJbkryY1JnrOfbWxOMpdkbn5+fohhSZJW0DsEkjwV+DDw5qp6fMniO4FnVtUpwJ8DH11uO1W1papmq2p2Zmam77AkSWPoFQJJDmMhAD5QVR9ZuryqHq+qJ7rpG4DDkhzVZ5+SpOH0+XZQgCuA+6vqXcv0+ZGuH0k2dvv7yqT7lCQNq8+3g14M/Brw2STbu7Y/BE4AqKrLgfOANybZC/wXcH5VVY99SpIGNHEIVNUngazQ5zLgskn3IUlaXf5iWJIaZghIUsMMAUlqmCEgSQ0zBCSpYYaAJDXMEJCkhhkCktQwQ0CSGtbnthGSGrDh4uunst+HLn3VVPYLbdXskYAkNcwQkKSGGQKS1DBDQJIaZghIUsMMAUlqmCEgSQ3r+6D5s5J8PskDSS4esfx7k1zTLf9Ukg199idJGlafB80fArwXOBs4GbggyclLul0IfLWqng28G/jjSfcnSRpenyOBjcADVfVgVf0P8CHg3CV9zgWu6qb/Djg9yX6fSyxJWjt9bhtxLPDwovmdwAuW61NVe5M8BvwQ8OWlG0uyGdjczT6R5PM9xraeHcWI+r9LrFltWftjSt+3NTbQe7wua1vOAda8tLZnTrLPdXPvoKraAmyZ9jhWW5K5qpqd9jhWg7UdnKzt4DRUbX1OB+0Cjl80f1zXNrJPkkOBHwS+0mOfkqQB9QmBzwAnJnlWksOB84HrlvS5DtjUTZ8H/GNVVY99SpIGNPHpoO4c/5uAm4BDgK1VdW+StwNzVXUdcAXwN0keAPawEBSt+24+5WVtBydrOzgNUlv8j7kktctfDEtSwwwBSWqYITCQMW6h8bIkdybZm+S8Ect/IMnOJJetzYjH16e2JCckuTnJ/UnuW2+3DulZ258kuber7T3r7YeQY9T2u917cneSW5I8c9GyTUl2dK9NS9edtklrS3Jqktu69+3uJK9d+9GvrM971y0f//Okqnz1fLFwYfyLwI8ChwN3AScv6bMB+Eng/cB5I7bxZ8AHgcumXc+QtQH/BJzRTT8V+P5p1zREbcDPAP/cbeMQ4DbgtGnXdIC1vXzf+wG8Ebimmz4SeLD7e0Q3fcS0axqoth8HTuymnwE8Ajx92jUNVd+i5WN/nngkMIwVb6FRVQ9V1d3At5aunOSngKOBm9disAdo4tq6e0kdWlXbun5PVNU31mjc4+jzvhXwfSz8I/1e4DDg0dUf8tjGqe3ji96P21n4rQ/AK4FtVbWnqr4KbAPOWqNxj2Pi2qrqC1W1o5v+ErAbmFmzkY+nz3t3wJ8nhsAwRt1C49hxVkzyPcA7gd9bhXENYeLaWPhf138m+UiSf0nyp92NB9eLiWurqtuAj7PwP8lHgJuq6v7BRzi5A63tQuDGCddda31q+39JNrIQ4l8cdHT9TVzfJJ8n6+a2EQ37beCGqtq5zk4pD+FQ4KXA84B/B64BXsfC70cOakmeDfwE3/4f2LYkL62qT0xxWBNJ8qvALPCz0x7L0JarLckxwN8Am6rqSUfnB4sR9R3w54khMIxxbqGxnBcBL03y2yycMz88yRNV9aSLQVPSp7adwPaqehAgyUeBF7J+QqBPbb8I3F5VTwAkuZGF93K9hMBYtSV5BfBW4Ger6r8XrXvaknX/aVVGOZk+tZHkB4DrgbdW1e2rPNZJ9KnvwD9Ppn0R5LvhxUKYPgg8i29fyHnOMn2vZMSF4W7Z61h/F4Ynro2FC1x3ATPd/F8DF027poFqey3wD902DgNuAX5+2jUdSG0sHKF9ke5C6aL2I4F/ZeGi8BHd9JHTrmmg2g7v3qs3T7uO1ahvSZ+xPk+mXvB3yws4B/hC98a8tWt7O/CabvqnWfif8ddZuInevZO+aQdTbcAZwN3AZ7sP0sOnXc8QtXUB95fA/cB9wLumXcsEtf0DCxezt3ev6xat+3rgge71G9OuZajagF8F/ndR+3bg1GnXM+R7t2gbY32eeNsISWqY3w6SpIYZApLUMENAkhpmCEhSwwwBSWqYISBJDTMEJKlh/weRnPkifTiGpwAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# mapping the memory usage of each chunk\n", "\n", "memory_footprints = []\n", "for chunk in crunch_iter_shape:\n", " memory_footprints.append(chunk.memory_usage(deep=True).sum()/(2 ** 20))\n", "plt.hist(memory_footprints)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "7.870912551879883\n" ] } ], "source": [ "# totaling the chunks together to determine total memory used in dataframe\n", "print(sum(memory_footprints))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# steps for 3/4 - loading chunks into sqlite\n", "1. create and connect to a new sqlite database file - DONE\n", "2. expand on the existing chunk processing code to export each chunk to a new table in the sqlite db\n", "3. query the table and make sure the data types match up for each column" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# create and connecto a new sqlite database file\n", "\n", "conn = sqlite3.connect('crunchbase_investments.db')" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# export each chunk to a new table in the sqlite db\n", "\n", "for chunk in crunch_iter_shape:\n", " chunk.to_sql('investments', conn, if_exists='append', index=False)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
company_namecompany_category_codecompany_country_codecompany_state_codecompany_cityinvestor_nameinvestor_category_codeinvestor_country_codeinvestor_state_codeinvestor_cityfunding_round_typefunded_atraised_amount_usd
0AdverCaradvertisingUSACASan Francisco1-800-FLOWERS.COMNoneUSANYNew Yorkseries-a2012-10-30 00:00:002000000.0
1LaunchGramnewsUSACAMountain View10XeleratorfinanceUSAOHColumbusother2012-01-23 00:00:0020000.0
2uTaPmessagingUSANoneNone10XeleratorfinanceUSAOHColumbusother2012-01-01 00:00:0020000.0
3ZoopShopsoftwareUSAOHcolumbus10XeleratorfinanceUSAOHColumbusangel2012-02-15 00:00:0020000.0
4eFuneralwebUSAOHCleveland10XeleratorfinanceUSAOHColumbusother2011-09-08 00:00:0020000.0
..........................................
52865Garantia DataenterpriseUSACASanta ClaraZohar GilonNoneNoneNoneNoneseries-a2012-08-08 00:00:003800000.0
52866DudaMobilemobileUSACAPalo AltoZohar GilonNoneNoneNoneNoneseries-c+2013-04-08 00:00:0010300000.0
52867SiteBrainssoftwareUSACASan Franciscozohar israelNoneNoneNoneNoneangel2010-08-01 00:00:00350000.0
52868Comprehend SystemsenterpriseUSACAPalo AltoZorba LiebermanNoneNoneNoneNoneseries-a2013-07-11 00:00:008400000.0
52869SmartThingsmobileUSADCMinneapolisZorik GordonNoneNoneNoneNoneseries-a2012-12-04 00:00:003000000.0
\n", "

52870 rows × 13 columns

\n", "
" ], "text/plain": [ " company_name company_category_code company_country_code \\\n", "0 AdverCar advertising USA \n", "1 LaunchGram news USA \n", "2 uTaP messaging USA \n", "3 ZoopShop software USA \n", "4 eFuneral web USA \n", "... ... ... ... \n", "52865 Garantia Data enterprise USA \n", "52866 DudaMobile mobile USA \n", "52867 SiteBrains software USA \n", "52868 Comprehend Systems enterprise USA \n", "52869 SmartThings mobile USA \n", "\n", " company_state_code company_city investor_name \\\n", "0 CA San Francisco 1-800-FLOWERS.COM \n", "1 CA Mountain View 10Xelerator \n", "2 None None 10Xelerator \n", "3 OH columbus 10Xelerator \n", "4 OH Cleveland 10Xelerator \n", "... ... ... ... \n", "52865 CA Santa Clara Zohar Gilon \n", "52866 CA Palo Alto Zohar Gilon \n", "52867 CA San Francisco zohar israel \n", "52868 CA Palo Alto Zorba Lieberman \n", "52869 DC Minneapolis Zorik Gordon \n", "\n", " investor_category_code investor_country_code investor_state_code \\\n", "0 None USA NY \n", "1 finance USA OH \n", "2 finance USA OH \n", "3 finance USA OH \n", "4 finance USA OH \n", "... ... ... ... \n", "52865 None None None \n", "52866 None None None \n", "52867 None None None \n", "52868 None None None \n", "52869 None None None \n", "\n", " investor_city funding_round_type funded_at raised_amount_usd \n", "0 New York series-a 2012-10-30 00:00:00 2000000.0 \n", "1 Columbus other 2012-01-23 00:00:00 20000.0 \n", "2 Columbus other 2012-01-01 00:00:00 20000.0 \n", "3 Columbus angel 2012-02-15 00:00:00 20000.0 \n", "4 Columbus other 2011-09-08 00:00:00 20000.0 \n", "... ... ... ... ... \n", "52865 None series-a 2012-08-08 00:00:00 3800000.0 \n", "52866 None series-c+ 2013-04-08 00:00:00 10300000.0 \n", "52867 None angel 2010-08-01 00:00:00 350000.0 \n", "52868 None series-a 2013-07-11 00:00:00 8400000.0 \n", "52869 None series-a 2012-12-04 00:00:00 3000000.0 \n", "\n", "[52870 rows x 13 columns]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# query the table and make sure the data types match up for each column\n", "\n", "pd.read_sql('''SELECT * FROM investments;''', conn)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " cid name type notnull dflt_value pk\n", "0 0 company_name TEXT 0 None 0\n", "1 1 company_category_code TEXT 0 None 0\n", "2 2 company_country_code TEXT 0 None 0\n", "3 3 company_state_code TEXT 0 None 0\n", "4 4 company_city TEXT 0 None 0\n", "5 5 investor_name TEXT 0 None 0\n", "6 6 investor_category_code TEXT 0 None 0\n", "7 7 investor_country_code TEXT 0 None 0\n", "8 8 investor_state_code TEXT 0 None 0\n", "9 9 investor_city TEXT 0 None 0\n", "10 10 funding_round_type TEXT 0 None 0\n", "11 11 funded_at TIMESTAMP 0 None 0\n", "12 12 raised_amount_usd REAL 0 None 0\n" ] } ], "source": [ "results_df = pd.read_sql('''PRAGMA table_info(investments);''', conn)\n", "print(results_df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Questions to Answer\n", "1. What proportion of the total amount of funds did the top 10% raise? What about the top 1%? \n", "2. Compare these values to the proportions the bottom 10% and bottom 1% raised.\n", "3. Which category of company attracted the most investments?\n", "4. Which investor contributed the most money (across all startups)?\n", "5. Which investors contributed the most money per startup?\n", "6. Which funding round was the most popular?\n", "7. Which funding round was the least popular?" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
investor_nametotal_investment%_of_total
0Kleiner Perkins Caufield & Byers1.121783e+100.016455
1New Enterprise Associates9.692542e+090.014218
2Accel Partners6.472126e+090.009494
3Goldman Sachs6.375459e+090.009352
4Sequoia Capital6.039402e+090.008859
5Intel5.969200e+090.008756
6Google5.808800e+090.008521
7Time Warner5.730000e+090.008405
8Comcast5.669000e+090.008316
9Greylock Partners4.960983e+090.007277
\n", "
" ], "text/plain": [ " investor_name total_investment %_of_total\n", "0 Kleiner Perkins Caufield & Byers 1.121783e+10 0.016455\n", "1 New Enterprise Associates 9.692542e+09 0.014218\n", "2 Accel Partners 6.472126e+09 0.009494\n", "3 Goldman Sachs 6.375459e+09 0.009352\n", "4 Sequoia Capital 6.039402e+09 0.008859\n", "5 Intel 5.969200e+09 0.008756\n", "6 Google 5.808800e+09 0.008521\n", "7 Time Warner 5.730000e+09 0.008405\n", "8 Comcast 5.669000e+09 0.008316\n", "9 Greylock Partners 4.960983e+09 0.007277" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# What proportion of the total amount of funds did the top 10% raise?\n", "pd.read_sql('''\n", " SELECT investor_name,\n", " SUM(raised_amount_usd) AS total_investment,\n", " (SUM(raised_amount_usd) / 681732200000) AS '%_of_total'\n", " FROM investments\n", " GROUP BY 1\n", " ORDER BY 2 DESC\n", " LIMIT 10;\n", " ''', conn)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SUM(tti.total_investment)
05.654430e+11
\n", "
" ], "text/plain": [ " SUM(tti.total_investment)\n", "0 5.654430e+11" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# top ten percent of investors\n", "pd.read_sql('''\n", "WITH top_ten_investors AS\n", " ( \n", " SELECT investor_name,\n", " SUM(raised_amount_usd) AS total_investment\n", " FROM investments\n", " GROUP BY 1\n", " ORDER BY 2 DESC\n", " LIMIT 1046\n", " )\n", "\n", "SELECT SUM(tti.total_investment)\n", " FROM top_ten_investors AS tti;\n", "''', conn)\n", "\n", "# total amount of top ten percent of investors = $565,443,000,000" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SUM(raised_amount_usd)
06.817322e+11
\n", "
" ], "text/plain": [ " SUM(raised_amount_usd)\n", "0 6.817322e+11" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.read_sql('''\n", "SELECT SUM(raised_amount_usd)\n", " FROM investments\n", "\n", "''', conn)\n", "\n", "# total amount raised from all investors = $681,732,200,000" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SUM(tti.total_investment)
02.769699e+11
\n", "
" ], "text/plain": [ " SUM(tti.total_investment)\n", "0 2.769699e+11" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# top one percent of investors\n", "pd.read_sql('''\n", "WITH top_one_investors AS\n", " ( \n", " SELECT investor_name,\n", " SUM(raised_amount_usd) AS total_investment\n", " FROM investments\n", " GROUP BY 1\n", " ORDER BY 2 DESC\n", " LIMIT 105\n", " )\n", "\n", "SELECT SUM(tti.total_investment)\n", " FROM top_one_investors AS tti;\n", "''', conn)\n", "\n", "# total amount of top one percent of investors = $276,969,900,000" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SUM(tti.total_investment)
021409138.0
\n", "
" ], "text/plain": [ " SUM(tti.total_investment)\n", "0 21409138.0" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# bottom ten percent of investors\n", "pd.read_sql('''\n", "WITH bottom_ten_investors AS\n", " ( \n", " SELECT investor_name,\n", " SUM(raised_amount_usd) AS total_investment\n", " FROM investments\n", " GROUP BY 1\n", " ORDER BY 2 ASC\n", " LIMIT 1046\n", " )\n", "\n", "SELECT SUM(tti.total_investment)\n", " FROM bottom_ten_investors AS tti;\n", "''', conn)\n", "\n", "# total amount of bottom ten percent of investors = $21,409,138" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SUM(tti.total_investment)
0None
\n", "
" ], "text/plain": [ " SUM(tti.total_investment)\n", "0 None" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# bottom one percent of investors\n", "pd.read_sql('''\n", "WITH bottom_one_investors AS\n", " ( \n", " SELECT investor_name,\n", " SUM(raised_amount_usd) AS total_investment\n", " FROM investments\n", " GROUP BY 1\n", " ORDER BY 2 ASC\n", " LIMIT 105\n", " )\n", "\n", "SELECT SUM(tti.total_investment)\n", " FROM bottom_one_investors AS tti;\n", "''', conn)\n", "\n", "# total amount of bottom one percent of investors = $0.00" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
company_category_codenumber_of_investorstotal_investment
0biotech49511.103964e+11
\n", "
" ], "text/plain": [ " company_category_code number_of_investors total_investment\n", "0 biotech 4951 1.103964e+11" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Which category of company attracted the most investments?\n", "pd.read_sql('''\n", "SELECT company_category_code,\n", " COUNT(investor_name) AS number_of_investors,\n", " SUM(raised_amount_usd) AS total_investment\n", " FROM investments\n", " GROUP BY 1\n", " ORDER BY 3 DESC\n", " LIMIT 1\n", "''', conn)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
investor_nametotal_investment
0Kleiner Perkins Caufield & Byers1.121783e+10
\n", "
" ], "text/plain": [ " investor_name total_investment\n", "0 Kleiner Perkins Caufield & Byers 1.121783e+10" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Which investor contributed the most money (across all startups)?\n", "pd.read_sql('''\n", "SELECT investor_name,\n", " SUM(raised_amount_usd) As total_investment\n", " FROM investments\n", " GROUP BY 1\n", " ORDER BY 2 DESC\n", " LIMIT 1;\n", "''', conn)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
investor_namemoney_per_startup
0Hillman Ventures1.300000e+07
1City Light Capital5.000000e+06
2KLP Enterprises5.000000e+06
3Nestors Financial3.333333e+06
4Dan Ehrman2.500000e+06
\n", "
" ], "text/plain": [ " investor_name money_per_startup\n", "0 Hillman Ventures 1.300000e+07\n", "1 City Light Capital 5.000000e+06\n", "2 KLP Enterprises 5.000000e+06\n", "3 Nestors Financial 3.333333e+06\n", "4 Dan Ehrman 2.500000e+06" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Which investors contributed the most money per startup?\n", "pd.read_sql('''\n", "SELECT investor_name,\n", " (raised_amount_usd / company_name) AS money_per_startup\n", " FROM investments\n", " GROUP BY 1\n", " ORDER BY 2 DESC\n", " LIMIT 5;\n", "''', conn)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
funding_round_typeCOUNT(investor_name)
0series-a13938
\n", "
" ], "text/plain": [ " funding_round_type COUNT(investor_name)\n", "0 series-a 13938" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Which funding round was the most popular?\n", "pd.read_sql('''\n", "SELECT funding_round_type,\n", " COUNT(investor_name)\n", " FROM investments\n", " GROUP BY 1\n", " ORDER BY 2 DESC\n", " LIMIT 1;\n", "''', conn)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
company_namefunded_atCOUNT(investor_name)raised_amount_usd
0Priceonomics2012-05-04 00:00:00291500000.0
\n", "
" ], "text/plain": [ " company_name funded_at COUNT(investor_name) raised_amount_usd\n", "0 Priceonomics 2012-05-04 00:00:00 29 1500000.0" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Which funding round was the most popular?\n", "pd.read_sql('''\n", "SELECT company_name,\n", " funded_at,\n", " COUNT(investor_name),\n", " raised_amount_usd\n", " FROM investments\n", " WHERE raised_amount_usd > 0\n", " GROUP BY 2, 1, 4\n", " ORDER BY 3 DESC\n", " LIMIT 1;\n", "''', conn)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
funding_round_typeCOUNT(investor_name)
0crowdfunding5
\n", "
" ], "text/plain": [ " funding_round_type COUNT(investor_name)\n", "0 crowdfunding 5" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Which funding round was the least popular?\n", "pd.read_sql('''\n", "SELECT funding_round_type,\n", " COUNT(investor_name)\n", " FROM investments\n", " WHERE funding_round_type != 'None'\n", " GROUP BY 1\n", " ORDER BY 2 ASC\n", " LIMIT 1;\n", "''', conn)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
company_namefunded_atCOUNT(investor_name)raised_amount_usd
0Cisco1987-01-01 00:00:0012500000.0
\n", "
" ], "text/plain": [ " company_name funded_at COUNT(investor_name) raised_amount_usd\n", "0 Cisco 1987-01-01 00:00:00 1 2500000.0" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Which funding round was the most popular?\n", "pd.read_sql('''\n", "SELECT company_name,\n", " funded_at,\n", " COUNT(investor_name),\n", " raised_amount_usd\n", " FROM investments\n", " WHERE funded_at != 'None'\n", " GROUP BY 2, 1, 4\n", " ORDER BY 3 ASC\n", " LIMIT 1;\n", "''', conn)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
company_namecompany_category_codecompany_country_codecompany_state_codecompany_cityinvestor_nameinvestor_category_codeinvestor_country_codeinvestor_state_codeinvestor_cityfunding_round_typefunded_atraised_amount_usd
0SocialcammobileUSACASanta Clara CountyAtom FactoryNoneNoneNoneNoneangel2012-04-30 00:00:00None
1SocialcammobileUSACASanta Clara CountyA-Grade InvestmentsNoneUSACALos Angelesangel2012-04-30 00:00:00None
2SocialcammobileUSACASanta Clara CountyCrunchFundNoneUSACASan Franciscoangel2012-04-30 00:00:00None
3SocialcammobileUSACASanta Clara CountyStart FundNoneNoneNoneNoneangel2012-04-30 00:00:00None
4SocialcammobileUSACASanta Clara CountySV AngelNoneUSACASan Franciscoangel2012-04-30 00:00:00None
5SocialcammobileUSACASanta Clara CountyAlexis OhanianNoneNoneNoneNoneangel2012-04-30 00:00:00None
6SocialcammobileUSACASanta Clara CountyAri EmmanuelNoneNoneNoneNoneangel2012-04-30 00:00:00None
7SocialcammobileUSACASanta Clara CountyAshton KutcherNoneNoneNoneNoneangel2012-04-30 00:00:00None
8SocialcammobileUSACASanta Clara CountyBrian CheskyNoneNoneNoneNoneangel2012-04-30 00:00:00None
9SocialcammobileUSACASanta Clara CountyEmmett ShearNoneNoneNoneNoneangel2012-04-30 00:00:00None
10SocialcammobileUSACASanta Clara CountyErik MooreNoneNoneNoneNoneangel2012-04-30 00:00:00None
11SocialcammobileUSACASanta Clara CountyGarry TanNoneNoneNoneNoneangel2012-04-30 00:00:00None
12SocialcammobileUSACASanta Clara CountyHarjeet TaggarNoneNoneNoneNoneangel2012-04-30 00:00:00None
13SocialcammobileUSACASanta Clara CountyJason JohnsonNoneNoneNoneNoneangel2012-04-30 00:00:00None
14SocialcammobileUSACASanta Clara CountyJeff KapelNoneNoneNoneNoneangel2012-04-30 00:00:00None
15SocialcammobileUSACASanta Clara CountyJessica LivingstonNoneNoneNoneNoneangel2012-04-30 00:00:00None
16SocialcammobileUSACASanta Clara CountyJonathan AbramsNoneNoneNoneNoneangel2012-04-30 00:00:00None
17SocialcammobileUSACASanta Clara CountyJustin CaldbeckNoneNoneNoneNoneangel2012-04-30 00:00:00None
18SocialcammobileUSACASanta Clara CountyJustin KanNoneNoneNoneNoneangel2012-04-30 00:00:00None
19SocialcammobileUSACASanta Clara CountyKyle VogtNoneNoneNoneNoneangel2012-04-30 00:00:00None
20SocialcammobileUSACASanta Clara CountyLaurene Powell JobsNoneNoneNoneNoneangel2012-04-30 00:00:00None
21SocialcammobileUSACASanta Clara CountyMatt OckoNoneNoneNoneNoneangel2012-04-30 00:00:00None
22SocialcammobileUSACASanta Clara CountyMichael AbramsNoneNoneNoneNoneangel2012-04-30 00:00:00None
23SocialcammobileUSACASanta Clara CountyMichael LevitNoneNoneNoneNoneangel2012-04-30 00:00:00None
24SocialcammobileUSACASanta Clara CountyMichael RapinoNoneNoneNoneNoneangel2012-04-30 00:00:00None
25SocialcammobileUSACASanta Clara CountyPaul BuchheitNoneNoneNoneNoneangel2012-04-30 00:00:00None
26SocialcammobileUSACASanta Clara CountyPaul GrahamNoneNoneNoneNoneangel2012-04-30 00:00:00None
27SocialcammobileUSACASanta Clara CountyRam ShiramNoneNoneNoneNoneangel2012-04-30 00:00:00None
28SocialcammobileUSACASanta Clara CountyRonny ConwayNoneNoneNoneNoneangel2012-04-30 00:00:00None
29SocialcammobileUSACASanta Clara CountyShane BattierNoneNoneNoneNoneangel2012-04-30 00:00:00None
30SocialcammobileUSACASanta Clara CountyShervin PishevarNoneNoneNoneNoneangel2012-04-30 00:00:00None
31SocialcammobileUSACASanta Clara CountyStewart AlsopNoneNoneNoneNoneangel2012-04-30 00:00:00None
32SocialcammobileUSACASanta Clara CountyTimothy DraperNoneNoneNoneNoneangel2012-04-30 00:00:00None
33SocialcammobileUSACASanta Clara CountyTrajan LangdonNoneNoneNoneNoneangel2012-04-30 00:00:00None
34SocialcammobileUSACASanta Clara CountyYuri MilnerNoneNoneNoneNoneangel2012-04-30 00:00:00None
35SocialcammobileUSACASanta Clara CountyZachary BogueNoneNoneNoneNoneangel2012-04-30 00:00:00None
\n", "
" ], "text/plain": [ " company_name company_category_code company_country_code company_state_code \\\n", "0 Socialcam mobile USA CA \n", "1 Socialcam mobile USA CA \n", "2 Socialcam mobile USA CA \n", "3 Socialcam mobile USA CA \n", "4 Socialcam mobile USA CA \n", "5 Socialcam mobile USA CA \n", "6 Socialcam mobile USA CA \n", "7 Socialcam mobile USA CA \n", "8 Socialcam mobile USA CA \n", "9 Socialcam mobile USA CA \n", "10 Socialcam mobile USA CA \n", "11 Socialcam mobile USA CA \n", "12 Socialcam mobile USA CA \n", "13 Socialcam mobile USA CA \n", "14 Socialcam mobile USA CA \n", "15 Socialcam mobile USA CA \n", "16 Socialcam mobile USA CA \n", "17 Socialcam mobile USA CA \n", "18 Socialcam mobile USA CA \n", "19 Socialcam mobile USA CA \n", "20 Socialcam mobile USA CA \n", "21 Socialcam mobile USA CA \n", "22 Socialcam mobile USA CA \n", "23 Socialcam mobile USA CA \n", "24 Socialcam mobile USA CA \n", "25 Socialcam mobile USA CA \n", "26 Socialcam mobile USA CA \n", "27 Socialcam mobile USA CA \n", "28 Socialcam mobile USA CA \n", "29 Socialcam mobile USA CA \n", "30 Socialcam mobile USA CA \n", "31 Socialcam mobile USA CA \n", "32 Socialcam mobile USA CA \n", "33 Socialcam mobile USA CA \n", "34 Socialcam mobile USA CA \n", "35 Socialcam mobile USA CA \n", "\n", " company_city investor_name investor_category_code \\\n", "0 Santa Clara County Atom Factory None \n", "1 Santa Clara County A-Grade Investments None \n", "2 Santa Clara County CrunchFund None \n", "3 Santa Clara County Start Fund None \n", "4 Santa Clara County SV Angel None \n", "5 Santa Clara County Alexis Ohanian None \n", "6 Santa Clara County Ari Emmanuel None \n", "7 Santa Clara County Ashton Kutcher None \n", "8 Santa Clara County Brian Chesky None \n", "9 Santa Clara County Emmett Shear None \n", "10 Santa Clara County Erik Moore None \n", "11 Santa Clara County Garry Tan None \n", "12 Santa Clara County Harjeet Taggar None \n", "13 Santa Clara County Jason Johnson None \n", "14 Santa Clara County Jeff Kapel None \n", "15 Santa Clara County Jessica Livingston None \n", "16 Santa Clara County Jonathan Abrams None \n", "17 Santa Clara County Justin Caldbeck None \n", "18 Santa Clara County Justin Kan None \n", "19 Santa Clara County Kyle Vogt None \n", "20 Santa Clara County Laurene Powell Jobs None \n", "21 Santa Clara County Matt Ocko None \n", "22 Santa Clara County Michael Abrams None \n", "23 Santa Clara County Michael Levit None \n", "24 Santa Clara County Michael Rapino None \n", "25 Santa Clara County Paul Buchheit None \n", "26 Santa Clara County Paul Graham None \n", "27 Santa Clara County Ram Shiram None \n", "28 Santa Clara County Ronny Conway None \n", "29 Santa Clara County Shane Battier None \n", "30 Santa Clara County Shervin Pishevar None \n", "31 Santa Clara County Stewart Alsop None \n", "32 Santa Clara County Timothy Draper None \n", "33 Santa Clara County Trajan Langdon None \n", "34 Santa Clara County Yuri Milner None \n", "35 Santa Clara County Zachary Bogue None \n", "\n", " investor_country_code investor_state_code investor_city \\\n", "0 None None None \n", "1 USA CA Los Angeles \n", "2 USA CA San Francisco \n", "3 None None None \n", "4 USA CA San Francisco \n", "5 None None None \n", "6 None None None \n", "7 None None None \n", "8 None None None \n", "9 None None None \n", "10 None None None \n", "11 None None None \n", "12 None None None \n", "13 None None None \n", "14 None None None \n", "15 None None None \n", "16 None None None \n", "17 None None None \n", "18 None None None \n", "19 None None None \n", "20 None None None \n", "21 None None None \n", "22 None None None \n", "23 None None None \n", "24 None None None \n", "25 None None None \n", "26 None None None \n", "27 None None None \n", "28 None None None \n", "29 None None None \n", "30 None None None \n", "31 None None None \n", "32 None None None \n", "33 None None None \n", "34 None None None \n", "35 None None None \n", "\n", " funding_round_type funded_at raised_amount_usd \n", "0 angel 2012-04-30 00:00:00 None \n", "1 angel 2012-04-30 00:00:00 None \n", "2 angel 2012-04-30 00:00:00 None \n", "3 angel 2012-04-30 00:00:00 None \n", "4 angel 2012-04-30 00:00:00 None \n", "5 angel 2012-04-30 00:00:00 None \n", "6 angel 2012-04-30 00:00:00 None \n", "7 angel 2012-04-30 00:00:00 None \n", "8 angel 2012-04-30 00:00:00 None \n", "9 angel 2012-04-30 00:00:00 None \n", "10 angel 2012-04-30 00:00:00 None \n", "11 angel 2012-04-30 00:00:00 None \n", "12 angel 2012-04-30 00:00:00 None \n", "13 angel 2012-04-30 00:00:00 None \n", "14 angel 2012-04-30 00:00:00 None \n", "15 angel 2012-04-30 00:00:00 None \n", "16 angel 2012-04-30 00:00:00 None \n", "17 angel 2012-04-30 00:00:00 None \n", "18 angel 2012-04-30 00:00:00 None \n", "19 angel 2012-04-30 00:00:00 None \n", "20 angel 2012-04-30 00:00:00 None \n", "21 angel 2012-04-30 00:00:00 None \n", "22 angel 2012-04-30 00:00:00 None \n", "23 angel 2012-04-30 00:00:00 None \n", "24 angel 2012-04-30 00:00:00 None \n", "25 angel 2012-04-30 00:00:00 None \n", "26 angel 2012-04-30 00:00:00 None \n", "27 angel 2012-04-30 00:00:00 None \n", "28 angel 2012-04-30 00:00:00 None \n", "29 angel 2012-04-30 00:00:00 None \n", "30 angel 2012-04-30 00:00:00 None \n", "31 angel 2012-04-30 00:00:00 None \n", "32 angel 2012-04-30 00:00:00 None \n", "33 angel 2012-04-30 00:00:00 None \n", "34 angel 2012-04-30 00:00:00 None \n", "35 angel 2012-04-30 00:00:00 None " ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# query the table and make sure the data types match up for each column\n", "\n", "pd.read_sql('''\n", "SELECT *\n", " FROM investments\n", " WHERE company_name = 'Socialcam';\n", "\n", "''', conn)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2" } }, "nbformat": 4, "nbformat_minor": 4 }