{ "metadata": { "name": "", "signature": "sha256:a8fda131401e32bc525a34cfe4bd54bbb434caaaf8b83392c5b0410a531fa10e" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## \"We Have to Go Back!\" - A data-driven look at the LOST actors' careers" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from IPython.display import YouTubeVideo\n", "YouTubeVideo('0Q14rHLvMco')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\n", " \n", " " ], "metadata": {}, "output_type": "pyout", "prompt_number": 172, "text": [ "" ] } ], "prompt_number": 172 }, { "cell_type": "markdown", "metadata": {}, "source": [ "For some reason I've been thinking a lot about LOST lately, enough that I rewatched the pilot a few nights ago. It got me wondering: how have the actors fared in their post-LOST careers? Despite its trials and tribulations, did acting on LOST give them a sense of purpose, just like Jack found on the Island? Is it time yet for a career-revitalizing LOST reboot? \n", "\n", "Normally these questions are relegated to a very simple [slide show listicle](http://www.tvguide.com/galleries/lost-stars-1066587/). However, we don't have to settle for that! We've got data! We can do a far more interesting analysis than googling \"Matthew Fox.\" \n", "\n", "### The Data\n", "I scraped all of this data from IMDB, following the process below:\n", "1. Get each actor listed on the main IMDB page for [LOST](http://www.imdb.com/title/tt0411008/) (the top 15 actors by episode appearances).\n", "2. Go to each actor's page and grab the list of movies and TV shows they have been in.\n", "3. Go to each of those movie/TV pages and grab the title, score, year, and type info.\n", "\n", "### Additional Notes\n", "**Minor Roles:** To eliminate minor roles, I only counted roles where the actor appeared on the main cast list for that movie or TV show. 
For example, Jorge Garcia was in two episodes of How I Met Your Mother but doesn't appear on the main HIMYM IMDB cast page, so that role isn't counted.\n", "\n", "**Year info:** For TV shows, the year recorded is the year the show premiered, not the year a given actor first appeared on it. For example, everyone who appeared in LOST has a year of 2004, regardless of when they actually joined the show. The actors this affects:\n", "- Michael Emerson (first appeared 2006)\n", "- Elizabeth Mitchell (2006)\n", "- Ken Leung (2008)\n", "\n", "**Language:**\n", "I'll use *actors* to refer to both actors and actresses throughout this exploration, and the term *media* to refer to TV shows and movies in general." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import numpy as np\n", "import pandas as pd\n", "%matplotlib inline" ], "language": "python", "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "prompt_number": 173 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data Exploration\n", "\n", "We'll first read in the dataset and see what our data looks like." ] }, { "cell_type": "code", "collapsed": false, "input": [ "we_have_to_go_back = pd.read_csv('./data/LOST_clean.csv')\n", "print(\"Total rows: %d\" % len(we_have_to_go_back))\n", "we_have_to_go_back.head()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Total rows: 353\n" ] }, { "html": [ "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 161, "text": [ " actor title score start_year type\n", "0 Jorge Garcia The Wedding Ringer 6.8 2015 Movie\n", "1 Jorge Garcia Cooties 5.3 2014 Movie\n", "2 Jorge Garcia iSteve 5.4 2013 Movie\n", "3 Jorge Garcia The Ordained 6.9 2013 TV Movie\n", "4 Jorge Garcia Alcatraz 7.1 2012 TV Series" ] } ], "prompt_number": 161 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have 353 total rows listing the actor, the title of the media, the IMDB score, the year that media first aired, and the type of media. Let's first take a look at what different types of media we're working with." ] }, { "cell_type": "code", "collapsed": false, "input": [ "we_have_to_go_back['type'].value_counts()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 162, "text": [ "Movie 177\n", "TV Movie 79\n", "TV Series 55\n", "Other/Unknown 23\n", "Video Game 19\n", "dtype: int64" ] } ], "prompt_number": 162 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We're only going to keep the TV and film entries (Movie, TV Movie, and TV Series) and exclude Other/Unknown and Video Game, which leaves 311 rows." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Keep only TV and film entries; .copy() lets us add a column later without a SettingWithCopyWarning\n", "big_and_small_screen = we_have_to_go_back[we_have_to_go_back['type'].isin(['TV Series', 'Movie', 'TV Movie'])].copy()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 163 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we've got a clean dataset, let's get a little more information about the scores. LOST's IMDB score is 8.5, but without context we can't tell whether that's high or low. (Sidebar: [here](http://blog.moertel.com/posts/2006-01-17-mining-gold-from-the-internet-movie-database-part-1.html) is a good analysis of the distribution of all IMDB scores.) \n", "\n", "We also need to remove duplicates for this next step. 
LOST is listed 15 times (once for each actor) hence the spike around 8.5. We'll assume a duplicate is an item with the same title and score.\n", "\n", "Let's look at the distribution with a histogram, and also print out some summary statistics." ] }, { "cell_type": "code", "collapsed": false, "input": [ "big_and_small_screen.drop_duplicates(['title','score'])['score'].hist(bins=16)\n", "big_and_small_screen.drop_duplicates(['title','score'])['score'].describe()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 164, "text": [ "count 296.000000\n", "mean 6.399324\n", "std 1.059773\n", "min 2.900000\n", "25% 5.800000\n", "50% 6.500000\n", "75% 7.100000\n", "max 9.000000\n", "dtype: float64" ] }, { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAAW8AAAEACAYAAAB8nvebAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGRFJREFUeJzt3X+QZXdd5vH3MzNgGEDaGE1mJW6nLCOWBm8QLGoDyw0b\nNCLG4B8IKtAUWli6EHGX3UCVm6hbRmAT5g9rU6VJ6BERfyQSSakkIcwJpCyDCTOQH2BKKlNLNJlg\nIGg2kiKbz/5xT0/f6fTcc26fc+85n3ufV1XX3HP63k8/6Zz+9unn/lJEYGZmuezqOoCZmU3Pi7eZ\nWUJevM3MEvLibWaWkBdvM7OEvHibmSVUa/GWtFvSIUk3lNuXSnqg3HdI0vmzjWlmZuP21LzeRcC9\nwHPL7QCuiIgrZpLKzMwmqjzzlvR84NXAVYA2do9dNjOzOatTm3wAeBfw1Ni+AN4u6XOSrpa0MpN0\nZma2rYmLt6TXAA9HxCGOP9O+EjgDGAAPApfPLKGZmT2NJr22iaTfBt4IPAmcBHwrcF1EvGnsOqvA\nDRFx1ja39wunmJntQERMrKYnnnlHxHsi4vSIOAN4PfDJiHiTpH1jV3stcNeEGWk/Lrnkks4zLGN2\n598454kWPnb2M7js3/+uP+qo+2gTGNUmG1PfJ+mHyu37gbdNMSeNI0eOdB1hxzJnB+fvmvP3X+3F\nOyIKoCgvv3FGeczMrAY/w3KCtbW1riPsWObs4Pxdc/7+m3iHZePhUsxyvtmiksZbykaTaneo1h+S\niCZ3WC67oii6jrBjmbOD83fN+fvPi7eZWUKuTcx6yLXJcnNtYma2oLx4T5C5N8ucHZy/a87ff168\nzcwScudt1kPuvJebO28zswXlxXuCzL1Z5uyQN7+kVj66lvX7vyF7/jq8eJu1LoCDNH01QLNJ3Hmb\ntajNrtqd9/Jy521mtqC8eE+QuTfLnB3y5y9fPTmt7N//7Pnr8OJtZpZQrc5b0m7gDuCBiPhJSScD\nfwL8e+AI8LqIeHSb27nztqXiztva0GbnfRFwL5tH08XAzRFxJnBLuW1mZnNSuXhLej7wauAqRqcD\nABcAB8rLB4ALZ5KuY5l7s8zZIX9+d97dyp6/jjpn3h8A3gU8Nbbv1Ig4Wl4+CpzadjAzMzuxiW9A\nLOk1wMM
RcUjScLvrRERIOmGptra2xurqKgArKysMBgOGw9Gojd+Ofd3e2NeXPNNsD4fDXuVZpvwj\nQzbPvoflv91st/WMzYMHD46m9+z7vQjHT1EUrK+vAxxbL6tMvMNS0m8DbwSeBE4CvhX4c+AlwDAi\nHpK0DzgYES/Y5va+w9KWSh/vsPQdn/k0vsMyIt4TEadHxBnA64FPRsQbgY8Bby6v9mbg+jYC983x\nZ1K5ZM4O+fNn77yz589//FSb9nHeG796fwd4laT7gFeW22ZmNid+bROzFrk2sTb4tU3MzBaUF+8J\nMvdmmbND/vzZO+Ps+fMfP9W8eJuZJeTO26xF7rytDe68zcwWlBfvCTL3ZpmzQ/782Tvj7PnzHz/V\nvHibmSXkztusRe68rQ3uvM3MFpQX7wky92aZs0P+/Nk74+z58x8/1bx4m5kl5M7brEXuvK0N7rzN\nzBaUF+8JMvdmmbND/vzZO+Ps+fMfP9W8eJuZJeTO26xF7rytDa103pJOknS7pMOS7pV0Wbn/UkkP\nSDpUfpzfVnAzM5uscvGOiG8A50bEAHghcK6klzH6dX5FRJxdfnx8xlnnLnNvljk75M+fvTPOnj//\n8VOtVucdEY+XF58J7Aa+Vm5PPK03M7PZqNV5S9oFfBb4HuDKiPhvki4B3gJ8HbgD+C8R8eiW27nz\ntqXiztvaUKfz3lNnUEQ8BQwkPQ+4UdIQuBL4zfIqvwVcDrx1623X1tZYXV0FYGVlhcFgwHA4BDb/\ntPG2txdpe9PG9rDj7Xby9OX7u4jbRVGwvr4OcGy9rDL1o00k/TrwbxHxv8b2rQI3RMRZW66b+sy7\nKIpj3+hsMmeHvPk3z7wLNhfBHU2i2zPvguPz5zrzznr8bGjr0SanSFopLz8LeBVwSNJpY1d7LXBX\nk7BmZlZf5Zm3pLOAA4wW+l3AhyLi/ZL+ABgw+rV+P/C2iDi65bapz7zNpuXO29pQ58zbT9Ixa5EX\nb2uDX5iqoaffAZVH5uyQP3/2x0lnz5//+KnmxdvMLCHXJmYtcm1ibXBtYma2oLx4T5C5N8ucHfLn\nz94ZZ8+f//ipVusZlmaLbFR1mOXiztuWXns9NfSxq3bnnY87bzOzBeXFe4LMvVnm7JA/f/bOOHv+\n/MdPNS/eZmYJufO2pefOu94c/yzPjztvM7MF5cV7gsy9WebskD9/9s44e/78x081L95mZgm587al\n58673hz/LM+PO28zswU1cfGWdJKk2yUdlnSvpMvK/SdLulnSfZJu2nibtEWTuTfLnB3y58/eGWfP\nn//4qTZx8Y6IbwDnRsQAeCFwrqSXARcDN0fEmcAt5baZmc1J7c5b0l7gVmANuA54RUQcLd+IuIiI\nF2xzG3fe1nvuvOvN8c/y/LT17vG7JB0GjgIHI+Ie4NSxNxs+CpzaOK2ZmdVW+ZKwEfEUMJD0POBG\nSedu+XxIOuGv5LW1NVZXVwFYWVlhMBgwHA6BzV6qr9v79+9PlXd8e7zz60OePufftLE9bLi99XLT\neW3kmfb247cdfb5Px0fVdrbjvygK1tfXAY6tl1WmeqigpF8H/g34BWAYEQ9J2sfojHzhapOiKI59\no7PJnB3mm382tUnB5qLYZE5beaZVcHz+XLVJ9uO/Tm0ycfGWdArwZEQ8KulZwI3AbwA/BjwSEe+V\ndDGwEhFPu9My++Jty8Gdd705/lmenzqLd1Vtsg84IGkXo378QxFxi6RDwJ9KeitwBHhdG4HNzKye\nqocK3hURL4qIQUS8MCLeX+7/akScFxFnRsSPRsSj84k7X0/vRPPInB3y58/+OOns+fMfP9X8HpZm\nVkub7/XpCqY5v7aJLT133vOcM5rldWEyv7aJmdmC8uI9QebeLHN2yJ8/e2ecPX/+46eaF28zs4Tc\nedvSc+c9zzkbs9qxqOtLG4/zNjObgbZ+oSwv1yYTZO7NMmeH/Pmzd8bZ8
+c/fqp58TYzS8idty09\nd97znNPmrMV9vLgf521mtqC8eE+QuTfLnB3y58/eGWfPn//4qebF28wsIXfetvTcec9zTpuz3Hmb\nmVkydd6A+HRJByXdI+luSe8o918q6QFJh8qP82cfd74y92aZs0P+/Nk74+z58x8/1eo8w/KbwDsj\n4rCk5wB3SrqZ0d89V0TEFTNNaGZmTzN15y3peuB3gXOAxyLi8gnXdedtvefOe55z2pzlznuagavA\n2cDflrveLulzkq6WtLKjlGZmNrXai3dZmVwLXBQRjwFXAmcAA+BB4IRn4Fll7s0yZ4f8+bN3xtnz\n5z9+qtV6VUFJzwCuA/4wIq4HiIiHxz5/FXDDdrddW1tjdXUVgJWVFQaDAcPhENj8Bvd1+/Dhw73K\n4+3ZbG/a2B423G57nvNsvz36f9j18dPGdlEUrK+vAxxbL6tUdt4aFYIHgEci4p1j+/dFxIPl5XcC\nL4mIn91yW3fe1nvuvOc5p81Zy91511m8XwZ8Cvg8m9/x9wBvYFSZBHA/8LaIOLrltl68rfe8eM9z\nTpuzlnvxruy8I+K2iNgVEYOIOLv8+OuIeFNEvDAifigiLty6cC+CzL1Z5uyQP3/2zjh7/vzHTzU/\nw9LMLCG/toktPdcm85zT5izXJmZmlowX7wky92aZs0P+/Nk74+z58x8/1bx4m5kl5M7blp4773nO\naXOWO28zM0vGi/cEmXuzzNkhf/7snXH2/PmPn2pevM3MEnLnbUvPnfc857Q5y523mZkl48V7gsy9\nWebskD9/9s44e/78x081L95mZgm587al5857nnPanOXO28zMkvHiPUHm3ixzdsifP3tnnD1//uOn\nWuXiLel0SQcl3SPpbknvKPefLOlmSfdJusnvHm9mNj913gbtNOC0iDhcvoP8ncCFwFuAf46I90n6\n78C3RcTFW27rztt6z533POe0Ocud90QR8VBEHC4vPwZ8Afgu4AJGb0xM+e+FzeKamVldU3XeklaB\ns4HbgVPH3rfyKHBqq8l6IHNvljk75M+fvTPOnj//8VOt9uJdVibXARdFxL+Of67sRhbz7xczsx7a\nU+dKkp7BaOH+UERcX+4+Kum0iHhI0j7g4e1uu7a2xurqKgArKysMBgOGwyGw+duxr9sb+/qSZ5rt\n4XDYqzyzyD/qqttUlP8OG25v7GtrXht5pr39sGd5tts+8c9ntuO/KArW19cBjq2XVercYSlGnfYj\nEfHOsf3vK/e9V9LFwIrvsLR5au+Oxn7eGbeYc9qc5Tssq5wD/DxwrqRD5cf5wO8Ar5J0H/DKcnuh\nZO7NMmeH/Pmzd8bZ8+c/fqpV1iYRcRsnXuTPazeOmZnV4dc2sbRcm2Sc0+Ys1yZmZpaMF+8JMvdm\nmbND/vzZO+Ps+fMfP9W8eJuZJeTO29Jy551xTpuz3HmbmVkyXrwnyNybZc4O+fNn74yz589//FTz\n4m1mlpA7b0vLnXfGOW3OcudtZmbJePGeIHNvljk75M+fvTPOnj//8VPNi7eZWULuvC0td94Z57Q5\ny523mZkl48V7gsy9WebskD9/9s44e/78x081L95mZgnVeRu0a4CfAB6OiLPKfZcCvwB8pbzauyPi\n49vc1p23zYw774xz2pzlzrvKB4Hzt+wL4IqIOLv8eNrCbWZms1O5eEfEp4GvbfOptt+6u3cy92aZ\ns0P+/Nk74+z58x8/1Zp03m+X9DlJV0taaS2RmZlVqnwD4hO4EvjN8vJvAZcDb93uimtra6yurgKw\nsrLCYDBgOBwCm78d+7q9sa8veabZHg6Hvcozi/wjBTAcu8wOtqn4/E7mDVuc10aeaW8/7Fme7bY3\n7vdoLiI6Pd6LomB9fR3g2HpZpdaTdCStAjds3GE5xed8h6XNjO+wzDinzVmLe8fnzJ6kI2nf2OZr\ngbt2MqfvMvdmmbND/vzZO2Pn77/K2
kTSR4BXAKdI+jJwCTCUNGD0a+9+4G0zTWlmZsfxa5tYWq5N\nMs5pc5ZrEzMzS8aL9wSZe9fM2SF//vyda9F1gIaKrgPMnBdvM7OE3HlbWu68M85pc5Y7bzMzS8aL\n9wSZe9c+Z5fUyke/FV0HaKjoOkBDRdcBZs6Lt3UkKj4O1riO2fJy521z17+uerH73H7NaXOWO28z\nM0vGi/cEfe6Nq2TOPlJ0HaChousADRVdB2io6DrAzHnxNjNLyJ23zZ0772We0+Ysd95mZpaMF+8J\nMvfGmbOPFF0HaKjoOkBDRdcBGiq6DjBzXrzNzBJy521z5857mee0Ocudd9WQayQdlXTX2L6TJd0s\n6T5JN/nd483M5qtObfJB4Pwt+y4Gbo6IM4Fbyu2Fk7k3zpx9pOg6QENF1wEaKroO0FDRdYCZq1y8\nI+LTwNe27L4AOFBePgBc2HIuMzOboFbnLWkVuCEiziq3vxYR31ZeFvDVje0tt3PnbU/jznuZ57Q5\na7k778p3j68SESHphP/la2trrK6uArCyssJgMGA4HAKbf9p7e7m2N21sD3e4vbFvp7dvO8+s5jnP\n9tsb+9rJ0+XPR1EUrK+vAxxbL6vs9Mz7i8AwIh6StA84GBEv2OZ2qc+8i6I49o3Ops/Z6515Fxz/\nQ7rtpBpzaiVqac74rILq/HXmtJVnWgXH58925l1Q//uf88x7p4/z/hjw5vLym4HrdzjHzMx2oPLM\nW9JHgFcApwBHgf8B/AXwp8B3A0eA10XEo9vcNvWZt82GO+9lntPmrOXuvP0kHZs7L97LPKfNWcu9\nePvp8RNkfqz0LLLP970n288/X0XXARoqug7QUNF1gJlr/GgTWzZtnTGZWROuTay2xa07FrsS6Nec\nNme5NjEzs2S8eE/gzrtLRdcBGiq6DtBQ0XWAhoquA8ycO+8lUP9OQjPLwp33EnBXPa85bc5a1Dlt\nznLnbWZmyXjxniBzb5w5+0jRdYCGiq4DNFR0HaChousAM+fF28wsIXfeS8Cd97zmtDlrUee0Ocud\nt5mZJePFe4LMvXHm7CNF1wEaKroO0FDRdYCGiq4DzJwXbzOzhNx5LwF33vOa0+asRZ3T5ix33mZm\nlkyjxVvSEUmfl3RI0mfaCtUXmXvjzNlHiq4DNFR0HaChousADRVdB5i5pq9tEozeiPirbYQxM7N6\nGnXeku4HXhwRj5zg8+68e8Cd97zmtDlrUee0OcuddxMBfELSHZJ+seEsMzOrqWltck5EPCjpO4Cb\nJX0xIj49foW1tTVWV1cBWFlZYTAYMBwOgc1etq/b+/fvT5V3fHv7zntj33CH2xv7dnr7rZkmXX/8\nun3IM+288ctN57WRZ9rbj9+2D3m2297Yt93nx79WdZ62Xjb54MGDO/55XV9fBzi2XlZp7aGCki4B\nHouIy8f2pa5NiqI49o3OZjx7ztqk4Pgf0q7zTDuroDp/nTlt5ZlWwfH5s9UmBfW///2rX+rUJjte\nvCXtBXZHxL9KejZwE/AbEXHT2HVSL96LIufinXFOm7MWdU6bs/o3Z56Ld5Pa5FTgo+WfG3uAD48v\n3GZmNjs7vsMyIu6PiEH58YMRcVmbwfog82OlM2cfKboO0FDRdYCGiq4DNFR0HWDm/AxLM7OE/Nom\nS8Cd97zmtDlrUee0Oat/c+bZefvM28wsIS/eE2TujTNnHym6DtBQ0XWAhoquAzRUdB1g5rx4m5kl\n5M67ZW09U6t9/eoGF3NOm7MWdU6bs/o3J8vjvO2E+nRAbcwys0Xi2mSC3L1x0XWAhoquAzRUdB2g\noaLrAA0VXQeYOS/eZmYJufNuWf8eU93mLM+Z36xFndPmrP7NcedtZpbQPB+w4NpkAnfeXSq6DtBQ\n0XWAhoquAzRUdPR1o6WPal68zcwScufdMnfeyzynzVmLOqfNWYs6ZzTLnXcNl132fj71qTu6jmFm\nV
lujxVvS+cB+YDdwVUS8t5VUc3bjjbdx663fC7xky2fuAX5gikkfay9UYwXN3oarawXO36UC5++3\nHS/eknYDvwucB/wj8HeSPhYRX2gr3Hy9HPipLfv2Az8zxYz/A/xRa4maOUzug9f5u+X8fdfkDssf\nAf4hIo5ExDeBP+bpq19yj3YdoIHM2cH5u+b8fddk8f4u4Mtj2w+U+8zMbMaadN4L8zCS3bth797/\nyZ49Vx23//HHD7F375215zzxxD/wxBNtp9upI10HaOhI1wEaOtJ1gIaOdB2goSNdB5i5HT9UUNJL\ngUsj4vxy+93AU+N3WkpamAXezGyeqh4q2GTx3gP8PfCfgH8CPgO8Ie8dlmZmeey4NomIJyX9Z+BG\nRg8VvNoLt5nZfMz0GZZmZjYbrb+2iaTTJR2UdI+kuyW9o+2vMUuSTpJ0u6TDku6VdFnXmXZC0m5J\nhyTd0HWWaUk6IunzZf7PdJ1nWpJWJF0r6QvlMfTSrjPVJen7yu/7xsfXM/0MS3p3ufbcJemPJH1L\n15mmIemiMvvdki6aeN22z7wlnQacFhGHJT0HuBO4MFOlImlvRDxe9vq3Af81Im7rOtc0JP0a8MPA\ncyPigq7zTEPS/cAPR8RXu86yE5IOALdGxDXlMfTsiPh617mmJWkXoyfg/UhEfLnq+l2TtAp8Evj+\niHhC0p8AfxURBzoNVpOkHwQ+wuip3t8EPg78UkR8abvrt37mHREPRcTh8vJjwBeAf9f215mliHi8\nvPhMRn1+qkVE0vOBVwNXkfcNLFPmlvQ84OURcQ2M7hvKuHCXzgO+lGHhLv0Lo0Vvb/lLcy+jXz5Z\nvAC4PSK+ERH/D7gV+OkTXXmmLwlb/iY8G7h9ll+nbZJ2SToMHAUORsS9XWea0geAdwFPdR1khwL4\nhKQ7JP1i12GmdAbwFUkflPRZSb8vaW/XoXbo9fTn9R4qlX+pXc7odSr+CXg0Ij7Rbaqp3A28XNLJ\n5THzE8DzT3TlmS3eZWVyLXBReQaeRkQ8FREDRt+4/yhp2HGk2iS9Bng4Ig6R9OwVOCcizgZ+HPgV\nSS/vOtAU9gAvAv53RLwI+L/Axd1Gmp6kZwI/CfxZ11nqkvQ9wK8Cq4z+2n+OpJ/rNNQUIuKLwHuB\nm4C/Bg4x4QRsJou3pGcA1wF/GBHXz+JrzEP55+5fAi/uOssU/gNwQdkbfwR4paQ/6DjTVCLiwfLf\nrwAfZfQ6Olk8ADwQEX9Xbl/LaDHP5seBO8v/B1m8GPibiHgkIp4E/pzRz0MaEXFNRLw4Il7B6AVa\n/v5E153Fo00EXA3cGxH7254/a5JOkbRSXn4W8CpGvwFTiIj3RMTpEXEGoz97PxkRb+o6V12S9kp6\nbnn52cCPAnd1m6q+iHgI+LKkM8td5zF6beFs3sDol38mXwReKulZ5Tp0HpCq8pT0neW/3w28lgm1\n1SzejOEc4OeBz0vaWPTeHREfn8HXmoV9wIHynvZdwIci4paOMzWR7YH8pwIfLd/IdQ/w4Yi4qdtI\nU3s78OGyevgS8JaO80yl/KV5HpDq/oaI+Fz5V+YdjOqGzwK/122qqV0r6dsZ3fH6yxHxLye6op+k\nY2aWkN+A2MwsIS/eZmYJefE2M0vIi7eZWUJevM3MEvLibWaWkBdvM7OEvHibmSX0/wFd0AQDx7MA\nzgAAAABJRU5ErkJggg==\n", "text": [ "" ] } ], "prompt_number": 164 }, { "cell_type": "markdown", "metadata": {}, "source": [ "###Initial Scores Recap:\n", "Comparing LOST's 8.5 score to these numbers shows us a few things:\n", "- It's higher scored than average (6.3)\n", "- It's 
higher scored than the median (6.5)\n", "- It's higher scored than at least 75% of the scores (75th percentile: 7.1)\n", "\n", "Also notice that the top score for any actor's media is 9.0. Out of curiosity, let's take a look at the 5 highest-scored rows in our dataset:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "big_and_small_screen.sort_values('score', ascending=False).head(5)" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 165, "text": [ " actor title \\\n", "171 Terry O'Quinn Guts and Glory: The Rise and Fall of Oliver North \n", "295 Harold Perrineau Oz \n", "23 Naveen Andrews Lost \n", "284 Harold Perrineau Lost \n", "337 Ken Leung Lost \n", "\n", " score start_year type \n", "171 9.0 1989 TV Movie \n", "295 8.9 1997 TV Series \n", "23 8.5 2004 TV Series \n", "284 8.5 2004 TV Series \n", "337 8.5 2004 TV Series " ] } ], "prompt_number": 165 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Don't tell Terry O'Quinn what he can't do, because he can clearly star in a [highly rated 1989 TV Movie](https://www.youtube.com/watch?v=sZLmf7GQs2c)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "YouTubeVideo('arMtFxv7jlw')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\n", " \n", " " ], "metadata": {}, "output_type": "pyout", "prompt_number": 166, "text": [ "" ] } ], "prompt_number": 166 }, { "cell_type": "markdown", "metadata": {}, "source": [ "LOST already shows up three times in the top 5 (this list isn't deduplicated, so each actor gets their own LOST row). It seems time to ask the ultimate question:\n", "\n", "### Is LOST the best-rated thing these actors have starred in?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next cell finds each actor's maximum score, then prints the row where that score appears. If an actor has a tie, `idxmax` returns the first matching row.\n", "\n", "#### Top-Scored Media by Actor" ] }, { "cell_type": "code", "collapsed": false, "input": [ "big_and_small_screen.loc[big_and_small_screen.groupby('actor')['score'].idxmax()]" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 167, "text": [ " actor title \\\n", "80 Daniel Dae Kim Lost \n", "255 Dominic Monaghan Lost \n", "315 Elizabeth Mitchell Lost \n", "193 Emilie de Ravin Lost \n", "116 Evangeline Lilly Lost \n", "295 Harold Perrineau Oz \n", "233 Henry Ian Cusick Lost \n", "6 Jorge Garcia Lost \n", "67 Josh Holloway Lost \n", "337 Ken Leung Lost \n", "50 Matthew Fox Lost \n", "205 Michael Emerson Person of Interest \n", "23 Naveen Andrews Lost \n", "171 Terry O'Quinn Guts and Glory: The Rise and Fall of Oliver North \n", "102 Yunjin Kim Lost \n", "\n", " score start_year type \n", "80 8.5 2004 TV Series \n", "255 8.5 2004 TV Series \n", "315 8.5 2004 TV Series \n", "193 8.5 2004 TV Series \n", "116 8.5 2004 TV Series \n", "295 8.9 1997 TV Series \n", "233 8.5 2004 TV Series \n", "6 8.5 2004 TV Series \n", "67 8.5 2004 TV Series \n", "337 8.5 2004 TV Series \n", "50 8.5 2004 TV Series \n", "205 8.5 2011 TV Series \n", "23 8.5 2004 TV Series \n", "171 9.0 1989 TV Movie \n", "102 8.5 2004 TV Series " ] } ], "prompt_number": 167 }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Of the 15 most frequent actors on LOST, only 2 have ever had a major role in something scored higher than LOST.** Michael Emerson's *Person of Interest* ties LOST at 8.5 rather than beating it, so we're excluding him from the club.\n", "\n", "---\n", "### Did LOST help actors get more major roles?\n", "\n", "We can also explore how many major roles each actor had before and after LOST. To do that, we'll flag every entry as post-LOST if it started after 2004, then count the titles on each side. Note that LOST itself, like anything else that started in 2004, counts as pre-LOST." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# side note: not happy with this code... 
there must be a better way.\n", "\n", "# Flag titles that started after LOST premiered (2004)\n", "big_and_small_screen['post_lost'] = big_and_small_screen['start_year'] > 2004\n", "# Count each actor's titles on either side of that split\n", "before_and_after = pd.pivot_table(big_and_small_screen, columns=['post_lost'], \n", " values=['start_year'], index=['actor'], aggfunc=np.size).reset_index()\n", "# True when the post-LOST count exceeds the pre-LOST count\n", "before_and_after['more_after_lost'] = (before_and_after['start_year'][True] - before_and_after['start_year'][False] > 0)\n", "before_and_after" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 168, "text": [ " actor start_year more_after_lost\n", "post_lost False True \n", "0 Daniel Dae Kim 9 4 False\n", "1 Dominic Monaghan 6 10 True\n", "2 Elizabeth Mitchell 15 8 False\n", "3 Emilie de Ravin 3 12 True\n", "4 Evangeline Lilly 1 3 True\n", "5 Harold Perrineau 16 21 True\n", "6 Henry Ian Cusick 8 9 True\n", "7 Jorge Garcia 7 8 True\n", "8 Josh Holloway 6 8 True\n", "9 Ken Leung 12 8 False\n", "10 Matthew Fox 7 5 False\n", "11 Michael Emerson 9 10 True\n", "12 Naveen Andrews 18 9 False\n", "13 Terry O'Quinn 61 5 False\n", "14 Yunjin Kim 5 8 True" ] } ], "prompt_number": 168 }, { "cell_type": "markdown", "metadata": {}, "source": [ "**9 out of 15** actors had more major roles starting after 2004 than before. This is a pretty naive comparison, though: a multi-season TV role counts only once, while an actor who moves to the big screen can rack up several movies over the same period. It also doesn't account for career length, such as Terry O'Quinn's massive 61 roles before LOST.\n", "\n", "On that note, let's see if there's a difference in what type of media the actors starred in before and after LOST. We'll count each actor's Movies, TV Series, and TV Movies before and after LOST, then see which of those categories is the largest.\n", "\n", "#### Before LOST: Most Major Roles by Type" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Count each actor's pre-LOST roles by type, then keep the most common type per actor\n", "pre_LOST_roles = big_and_small_screen[~big_and_small_screen['post_lost']]\n", "actor_type_counts = pre_LOST_roles.groupby(['actor','type']).size().reset_index()\n", "actor_type_counts.columns = ['actor','type','occurrences']\n", "actor_type_counts.loc[actor_type_counts.groupby('actor')['occurrences'].idxmax()]" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 169, "text": [ " actor type occurrences\n", "0 Daniel Dae Kim Movie 4\n", "5 Dominic Monaghan TV Series 3\n", "6 Elizabeth Mitchell Movie 6\n", "10 Emilie de Ravin TV Series 2\n", "11 Evangeline Lilly TV Series 1\n", "12 Harold Perrineau Movie 13\n", "16 Henry Ian Cusick TV Movie 4\n", "18 Jorge Garcia Movie 5\n", "21 Josh Holloway Movie 4\n", "24 Ken Leung Movie 9\n", "29 Matthew Fox TV Series 4\n", "30 Michael Emerson Movie 6\n", "33 Naveen Andrews Movie 12\n", "37 Terry O'Quinn TV Movie 32\n", "39 Yunjin Kim Movie 4" ] } ], "prompt_number": 169 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that these counts include LOST itself. Henry Ian Cusick is in good company with Terry O'Quinn as a major TV Movie actor! Alright!\n", "\n", "Let's quickly tally up the types:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "actor_type_counts.loc[actor_type_counts.groupby('actor')['occurrences'].idxmax()]['type'].value_counts()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 170, "text": [ "Movie 9\n", "TV Series 4\n", "TV Movie 2\n", "dtype: int64" ] } ], "prompt_number": 170 }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### After LOST: Most Major Roles by Type" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Same as above, but for roles that started after 2004\n", "post_LOST_roles = big_and_small_screen[big_and_small_screen['post_lost']]\n", "actor_type_counts = post_LOST_roles.groupby(['actor','type']).size().reset_index()\n", "actor_type_counts.columns = ['actor','type','occurrences']\n", "actor_type_counts.loc[actor_type_counts.groupby('actor')['occurrences'].idxmax()]" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 171, "text": [ " actor type occurrences\n", "0 Daniel Dae Kim Movie 3\n", "2 Dominic Monaghan Movie 6\n", "5 Elizabeth Mitchell Movie 4\n", "8 Emilie de Ravin Movie 7\n", "11 Evangeline Lilly Movie 3\n", "12 Harold Perrineau Movie 15\n", "15 Henry Ian Cusick Movie 8\n", "17 Jorge Garcia Movie 6\n", "20 Josh Holloway Movie 6\n", "23 Ken Leung Movie 3\n", "26 Matthew Fox Movie 5\n", "27 Michael Emerson Movie 6\n", "30 Naveen Andrews Movie 5\n", "34 Terry O'Quinn TV Series 3\n", "35 Yunjin Kim Movie 6" ] } ], "prompt_number": 171 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Everyone but Terry O'Quinn seems to have gone to the big screen after LOST. One caveat: when an actor's type counts are tied, `idxmax` keeps the first row, and because `groupby` sorts the types alphabetically, any tie involving Movie is resolved in Movie's favor." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Wrap-Up:\n", "Our quick, rudimentary analysis gave us some wonderful insight into the acting lives of 15 actors from LOST. Here's what we've learned:\n", "- The average (mean) score of the TV and film titles in which a LOST actor had a major role is just under 6.4 (6.399).\n", "- Of the 15 main LOST actors, LOST was the highest-scored media (or tied for highest) for 13 of them.\n", "- 9 of the 15 actors had more major roles after LOST than before it.\n", "- 14 of the 15 actors appear to have gone to the big screen after LOST.\n", "\n", "### Some other stuff we could do\n", "- Build a cool [co-stardom network](http://en.wikipedia.org/wiki/Co-stardom_network) diagram from the data.\n", "- Include minor roles and conduct a more detailed analysis with that info.\n", "- [You tell me](mailto:petermbaumgartner@gmail.com) or [do your own thing](https://github.com/pmbaumgartner/LOST)." ] } ], "metadata": {} } ] }