{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# PyCon 2019: Data Science Best Practices with pandas ([video](https://www.youtube.com/watch?v=dPwLlJkSHLo&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=36))\n", "\n", "### GitHub repository: https://github.com/justmarkham/pycon-2019-tutorial\n", "\n", "### Instructor: Kevin Markham\n", "\n", "- Website: https://www.dataschool.io\n", "- YouTube: https://www.youtube.com/dataschool\n", "- Patreon: https://www.patreon.com/dataschool\n", "- Twitter: https://twitter.com/justmarkham\n", "- GitHub: https://github.com/justmarkham" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Introduction to the TED Talks dataset\n", "\n", "https://www.kaggle.com/rounakbanik/ted-talks" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'0.24.2'" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "pd.__version__" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "ted = pd.read_csv('ted.csv')" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead>\n",
"<tr><th></th><th>comments</th><th>description</th><th>duration</th><th>event</th><th>film_date</th><th>languages</th><th>main_speaker</th><th>name</th><th>num_speaker</th><th>published_date</th><th>ratings</th><th>related_talks</th><th>speaker_occupation</th><th>tags</th><th>title</th><th>url</th><th>views</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>4553</td><td>Sir Ken Robinson makes an entertaining and pro...</td><td>1164</td><td>TED2006</td><td>1140825600</td><td>60</td><td>Ken Robinson</td><td>Ken Robinson: Do schools kill creativity?</td><td>1</td><td>1151367060</td><td>[{'id': 7, 'name': 'Funny', 'count': 19645}, {...</td><td>[{'id': 865, 'hero': 'https://pe.tedcdn.com/im...</td><td>Author/educator</td><td>['children', 'creativity', 'culture', 'dance',...</td><td>Do schools kill creativity?</td><td>https://www.ted.com/talks/ken_robinson_says_sc...</td><td>47227110</td></tr>\n",
"<tr><th>1</th><td>265</td><td>With the same humor and humanity he exuded in ...</td><td>977</td><td>TED2006</td><td>1140825600</td><td>43</td><td>Al Gore</td><td>Al Gore: Averting the climate crisis</td><td>1</td><td>1151367060</td><td>[{'id': 7, 'name': 'Funny', 'count': 544}, {'i...</td><td>[{'id': 243, 'hero': 'https://pe.tedcdn.com/im...</td><td>Climate advocate</td><td>['alternative energy', 'cars', 'climate change...</td><td>Averting the climate crisis</td><td>https://www.ted.com/talks/al_gore_on_averting_...</td><td>3200520</td></tr>\n",
"<tr><th>2</th><td>124</td><td>New York Times columnist David Pogue takes aim...</td><td>1286</td><td>TED2006</td><td>1140739200</td><td>26</td><td>David Pogue</td><td>David Pogue: Simplicity sells</td><td>1</td><td>1151367060</td><td>[{'id': 7, 'name': 'Funny', 'count': 964}, {'i...</td><td>[{'id': 1725, 'hero': 'https://pe.tedcdn.com/i...</td><td>Technology columnist</td><td>['computers', 'entertainment', 'interface desi...</td><td>Simplicity sells</td><td>https://www.ted.com/talks/david_pogue_says_sim...</td><td>1636292</td></tr>\n",
"<tr><th>3</th><td>200</td><td>In an emotionally charged talk, MacArthur-winn...</td><td>1116</td><td>TED2006</td><td>1140912000</td><td>35</td><td>Majora Carter</td><td>Majora Carter: Greening the ghetto</td><td>1</td><td>1151367060</td><td>[{'id': 3, 'name': 'Courageous', 'count': 760}...</td><td>[{'id': 1041, 'hero': 'https://pe.tedcdn.com/i...</td><td>Activist for environmental justice</td><td>['MacArthur grant', 'activism', 'business', 'c...</td><td>Greening the ghetto</td><td>https://www.ted.com/talks/majora_carter_s_tale...</td><td>1697550</td></tr>\n",
"<tr><th>4</th><td>593</td><td>You've never seen data presented like this. Wi...</td><td>1190</td><td>TED2006</td><td>1140566400</td><td>48</td><td>Hans Rosling</td><td>Hans Rosling: The best stats you've ever seen</td><td>1</td><td>1151440680</td><td>[{'id': 9, 'name': 'Ingenious', 'count': 3202}...</td><td>[{'id': 2056, 'hero': 'https://pe.tedcdn.com/i...</td><td>Global health expert; data visionary</td><td>['Africa', 'Asia', 'Google', 'demo', 'economic...</td><td>The best stats you've ever seen</td><td>https://www.ted.com/talks/hans_rosling_shows_t...</td><td>12005869</td></tr>\n",
"</tbody>\n",
"</table>
" ], "text/plain": [ " comments description duration \\\n", "0 4553 Sir Ken Robinson makes an entertaining and pro... 1164 \n", "1 265 With the same humor and humanity he exuded in ... 977 \n", "2 124 New York Times columnist David Pogue takes aim... 1286 \n", "3 200 In an emotionally charged talk, MacArthur-winn... 1116 \n", "4 593 You've never seen data presented like this. Wi... 1190 \n", "\n", " event film_date languages main_speaker \\\n", "0 TED2006 1140825600 60 Ken Robinson \n", "1 TED2006 1140825600 43 Al Gore \n", "2 TED2006 1140739200 26 David Pogue \n", "3 TED2006 1140912000 35 Majora Carter \n", "4 TED2006 1140566400 48 Hans Rosling \n", "\n", " name num_speaker published_date \\\n", "0 Ken Robinson: Do schools kill creativity? 1 1151367060 \n", "1 Al Gore: Averting the climate crisis 1 1151367060 \n", "2 David Pogue: Simplicity sells 1 1151367060 \n", "3 Majora Carter: Greening the ghetto 1 1151367060 \n", "4 Hans Rosling: The best stats you've ever seen 1 1151440680 \n", "\n", " ratings \\\n", "0 [{'id': 7, 'name': 'Funny', 'count': 19645}, {... \n", "1 [{'id': 7, 'name': 'Funny', 'count': 544}, {'i... \n", "2 [{'id': 7, 'name': 'Funny', 'count': 964}, {'i... \n", "3 [{'id': 3, 'name': 'Courageous', 'count': 760}... \n", "4 [{'id': 9, 'name': 'Ingenious', 'count': 3202}... \n", "\n", " related_talks \\\n", "0 [{'id': 865, 'hero': 'https://pe.tedcdn.com/im... \n", "1 [{'id': 243, 'hero': 'https://pe.tedcdn.com/im... \n", "2 [{'id': 1725, 'hero': 'https://pe.tedcdn.com/i... \n", "3 [{'id': 1041, 'hero': 'https://pe.tedcdn.com/i... \n", "4 [{'id': 2056, 'hero': 'https://pe.tedcdn.com/i... \n", "\n", " speaker_occupation \\\n", "0 Author/educator \n", "1 Climate advocate \n", "2 Technology columnist \n", "3 Activist for environmental justice \n", "4 Global health expert; data visionary \n", "\n", " tags \\\n", "0 ['children', 'creativity', 'culture', 'dance',... \n", "1 ['alternative energy', 'cars', 'climate change... 
\n", "2 ['computers', 'entertainment', 'interface desi... \n", "3 ['MacArthur grant', 'activism', 'business', 'c... \n", "4 ['Africa', 'Asia', 'Google', 'demo', 'economic... \n", "\n", " title \\\n", "0 Do schools kill creativity? \n", "1 Averting the climate crisis \n", "2 Simplicity sells \n", "3 Greening the ghetto \n", "4 The best stats you've ever seen \n", "\n", " url views \n", "0 https://www.ted.com/talks/ken_robinson_says_sc... 47227110 \n", "1 https://www.ted.com/talks/al_gore_on_averting_... 3200520 \n", "2 https://www.ted.com/talks/david_pogue_says_sim... 1636292 \n", "3 https://www.ted.com/talks/majora_carter_s_tale... 1697550 \n", "4 https://www.ted.com/talks/hans_rosling_shows_t... 12005869 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# each row represents a single talk\n", "ted.head()" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2550, 17)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# rows, columns\n", "ted.shape" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "comments int64\n", "description object\n", "duration int64\n", "event object\n", "film_date int64\n", "languages int64\n", "main_speaker object\n", "name object\n", "num_speaker int64\n", "published_date int64\n", "ratings object\n", "related_talks object\n", "speaker_occupation object\n", "tags object\n", "title object\n", "url object\n", "views int64\n", "dtype: object" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# object columns are usually strings, but can also be arbitrary Python objects (lists, dictionaries)\n", "ted.dtypes" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "comments 0\n", "description 0\n", "duration 0\n", "event 0\n", "film_date 0\n", 
"languages 0\n", "main_speaker 0\n", "name 0\n", "num_speaker 0\n", "published_date 0\n", "ratings 0\n", "related_talks 0\n", "speaker_occupation 6\n", "tags 0\n", "title 0\n", "url 0\n", "views 0\n", "dtype: int64" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# count the number of missing values in each column\n", "ted.isna().sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Which talks provoke the most online discussion?" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead>\n",
"<tr><th></th><th>comments</th><th>description</th><th>duration</th><th>event</th><th>film_date</th><th>languages</th><th>main_speaker</th><th>name</th><th>num_speaker</th><th>published_date</th><th>ratings</th><th>related_talks</th><th>speaker_occupation</th><th>tags</th><th>title</th><th>url</th><th>views</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>1787</th><td>2673</td><td>Our consciousness is a fundamental aspect of o...</td><td>1117</td><td>TED2014</td><td>1395100800</td><td>33</td><td>David Chalmers</td><td>David Chalmers: How do you explain consciousness?</td><td>1</td><td>1405350484</td><td>[{'id': 25, 'name': 'OK', 'count': 280}, {'id'...</td><td>[{'id': 1308, 'hero': 'https://pe.tedcdn.com/i...</td><td>Philosopher</td><td>['brain', 'consciousness', 'neuroscience', 'ph...</td><td>How do you explain consciousness?</td><td>https://www.ted.com/talks/david_chalmers_how_d...</td><td>2162764</td></tr>\n",
"<tr><th>201</th><td>2877</td><td>Jill Bolte Taylor got a research opportunity f...</td><td>1099</td><td>TED2008</td><td>1204070400</td><td>49</td><td>Jill Bolte Taylor</td><td>Jill Bolte Taylor: My stroke of insight</td><td>1</td><td>1205284200</td><td>[{'id': 22, 'name': 'Fascinating', 'count': 14...</td><td>[{'id': 184, 'hero': 'https://pe.tedcdn.com/im...</td><td>Neuroanatomist</td><td>['biology', 'brain', 'consciousness', 'global ...</td><td>My stroke of insight</td><td>https://www.ted.com/talks/jill_bolte_taylor_s_...</td><td>21190883</td></tr>\n",
"<tr><th>644</th><td>3356</td><td>Questions of good and evil, right and wrong ar...</td><td>1386</td><td>TED2010</td><td>1265846400</td><td>39</td><td>Sam Harris</td><td>Sam Harris: Science can answer moral questions</td><td>1</td><td>1269249180</td><td>[{'id': 8, 'name': 'Informative', 'count': 923...</td><td>[{'id': 666, 'hero': 'https://pe.tedcdn.com/im...</td><td>Neuroscientist, philosopher</td><td>['culture', 'evolutionary psychology', 'global...</td><td>Science can answer moral questions</td><td>https://www.ted.com/talks/sam_harris_science_c...</td><td>3433437</td></tr>\n",
"<tr><th>0</th><td>4553</td><td>Sir Ken Robinson makes an entertaining and pro...</td><td>1164</td><td>TED2006</td><td>1140825600</td><td>60</td><td>Ken Robinson</td><td>Ken Robinson: Do schools kill creativity?</td><td>1</td><td>1151367060</td><td>[{'id': 7, 'name': 'Funny', 'count': 19645}, {...</td><td>[{'id': 865, 'hero': 'https://pe.tedcdn.com/im...</td><td>Author/educator</td><td>['children', 'creativity', 'culture', 'dance',...</td><td>Do schools kill creativity?</td><td>https://www.ted.com/talks/ken_robinson_says_sc...</td><td>47227110</td></tr>\n",
"<tr><th>96</th><td>6404</td><td>Richard Dawkins urges all atheists to openly s...</td><td>1750</td><td>TED2002</td><td>1012608000</td><td>42</td><td>Richard Dawkins</td><td>Richard Dawkins: Militant atheism</td><td>1</td><td>1176689220</td><td>[{'id': 3, 'name': 'Courageous', 'count': 3236...</td><td>[{'id': 86, 'hero': 'https://pe.tedcdn.com/ima...</td><td>Evolutionary biologist</td><td>['God', 'atheism', 'culture', 'religion', 'sci...</td><td>Militant atheism</td><td>https://www.ted.com/talks/richard_dawkins_on_m...</td><td>4374792</td></tr>\n",
"</tbody>\n",
"</table>
" ], "text/plain": [ " comments description duration \\\n", "1787 2673 Our consciousness is a fundamental aspect of o... 1117 \n", "201 2877 Jill Bolte Taylor got a research opportunity f... 1099 \n", "644 3356 Questions of good and evil, right and wrong ar... 1386 \n", "0 4553 Sir Ken Robinson makes an entertaining and pro... 1164 \n", "96 6404 Richard Dawkins urges all atheists to openly s... 1750 \n", "\n", " event film_date languages main_speaker \\\n", "1787 TED2014 1395100800 33 David Chalmers \n", "201 TED2008 1204070400 49 Jill Bolte Taylor \n", "644 TED2010 1265846400 39 Sam Harris \n", "0 TED2006 1140825600 60 Ken Robinson \n", "96 TED2002 1012608000 42 Richard Dawkins \n", "\n", " name num_speaker \\\n", "1787 David Chalmers: How do you explain consciousness? 1 \n", "201 Jill Bolte Taylor: My stroke of insight 1 \n", "644 Sam Harris: Science can answer moral questions 1 \n", "0 Ken Robinson: Do schools kill creativity? 1 \n", "96 Richard Dawkins: Militant atheism 1 \n", "\n", " published_date ratings \\\n", "1787 1405350484 [{'id': 25, 'name': 'OK', 'count': 280}, {'id'... \n", "201 1205284200 [{'id': 22, 'name': 'Fascinating', 'count': 14... \n", "644 1269249180 [{'id': 8, 'name': 'Informative', 'count': 923... \n", "0 1151367060 [{'id': 7, 'name': 'Funny', 'count': 19645}, {... \n", "96 1176689220 [{'id': 3, 'name': 'Courageous', 'count': 3236... \n", "\n", " related_talks \\\n", "1787 [{'id': 1308, 'hero': 'https://pe.tedcdn.com/i... \n", "201 [{'id': 184, 'hero': 'https://pe.tedcdn.com/im... \n", "644 [{'id': 666, 'hero': 'https://pe.tedcdn.com/im... \n", "0 [{'id': 865, 'hero': 'https://pe.tedcdn.com/im... \n", "96 [{'id': 86, 'hero': 'https://pe.tedcdn.com/ima... \n", "\n", " speaker_occupation \\\n", "1787 Philosopher \n", "201 Neuroanatomist \n", "644 Neuroscientist, philosopher \n", "0 Author/educator \n", "96 Evolutionary biologist \n", "\n", " tags \\\n", "1787 ['brain', 'consciousness', 'neuroscience', 'ph... 
\n", "201 ['biology', 'brain', 'consciousness', 'global ... \n", "644 ['culture', 'evolutionary psychology', 'global... \n", "0 ['children', 'creativity', 'culture', 'dance',... \n", "96 ['God', 'atheism', 'culture', 'religion', 'sci... \n", "\n", " title \\\n", "1787 How do you explain consciousness? \n", "201 My stroke of insight \n", "644 Science can answer moral questions \n", "0 Do schools kill creativity? \n", "96 Militant atheism \n", "\n", " url views \n", "1787 https://www.ted.com/talks/david_chalmers_how_d... 2162764 \n", "201 https://www.ted.com/talks/jill_bolte_taylor_s_... 21190883 \n", "644 https://www.ted.com/talks/sam_harris_science_c... 3433437 \n", "0 https://www.ted.com/talks/ken_robinson_says_sc... 47227110 \n", "96 https://www.ted.com/talks/richard_dawkins_on_m... 4374792 " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# sort by the number of first-level comments, though this is biased in favor of older talks\n", "ted.sort_values('comments').tail()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# correct for this bias by calculating the number of comments per view\n", "ted['comments_per_view'] = ted.comments / ted.views" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead>\n",
"<tr><th></th><th>comments</th><th>description</th><th>duration</th><th>event</th><th>film_date</th><th>languages</th><th>main_speaker</th><th>name</th><th>num_speaker</th><th>published_date</th><th>ratings</th><th>related_talks</th><th>speaker_occupation</th><th>tags</th><th>title</th><th>url</th><th>views</th><th>comments_per_view</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>954</th><td>2492</td><td>Janet Echelman found her true voice as an arti...</td><td>566</td><td>TED2011</td><td>1299110400</td><td>35</td><td>Janet Echelman</td><td>Janet Echelman: Taking imagination seriously</td><td>1</td><td>1307489760</td><td>[{'id': 23, 'name': 'Jaw-dropping', 'count': 3...</td><td>[{'id': 453, 'hero': 'https://pe.tedcdn.com/im...</td><td>Artist</td><td>['art', 'cities', 'culture', 'data', 'design',...</td><td>Taking imagination seriously</td><td>https://www.ted.com/talks/janet_echelman</td><td>1832930</td><td>0.001360</td></tr>\n",
"<tr><th>694</th><td>1502</td><td>Filmmaker Sharmeen Obaid-Chinoy takes on a ter...</td><td>489</td><td>TED2010</td><td>1265760000</td><td>32</td><td>Sharmeen Obaid-Chinoy</td><td>Sharmeen Obaid-Chinoy: Inside a school for sui...</td><td>1</td><td>1274865960</td><td>[{'id': 23, 'name': 'Jaw-dropping', 'count': 3...</td><td>[{'id': 171, 'hero': 'https://pe.tedcdn.com/im...</td><td>Filmmaker</td><td>['TED Fellows', 'children', 'culture', 'film',...</td><td>Inside a school for suicide bombers</td><td>https://www.ted.com/talks/sharmeen_obaid_chino...</td><td>1057238</td><td>0.001421</td></tr>\n",
"<tr><th>96</th><td>6404</td><td>Richard Dawkins urges all atheists to openly s...</td><td>1750</td><td>TED2002</td><td>1012608000</td><td>42</td><td>Richard Dawkins</td><td>Richard Dawkins: Militant atheism</td><td>1</td><td>1176689220</td><td>[{'id': 3, 'name': 'Courageous', 'count': 3236...</td><td>[{'id': 86, 'hero': 'https://pe.tedcdn.com/ima...</td><td>Evolutionary biologist</td><td>['God', 'atheism', 'culture', 'religion', 'sci...</td><td>Militant atheism</td><td>https://www.ted.com/talks/richard_dawkins_on_m...</td><td>4374792</td><td>0.001464</td></tr>\n",
"<tr><th>803</th><td>834</td><td>David Bismark demos a new system for voting th...</td><td>422</td><td>TEDGlobal 2010</td><td>1279065600</td><td>36</td><td>David Bismark</td><td>David Bismark: E-voting without fraud</td><td>1</td><td>1288685640</td><td>[{'id': 25, 'name': 'OK', 'count': 111}, {'id'...</td><td>[{'id': 803, 'hero': 'https://pe.tedcdn.com/im...</td><td>Voting system designer</td><td>['culture', 'democracy', 'design', 'global iss...</td><td>E-voting without fraud</td><td>https://www.ted.com/talks/david_bismark_e_voti...</td><td>543551</td><td>0.001534</td></tr>\n",
"<tr><th>744</th><td>649</td><td>Hours before New York lawmakers rejected a key...</td><td>453</td><td>New York State Senate</td><td>1259712000</td><td>0</td><td>Diane J. Savino</td><td>Diane J. Savino: The case for same-sex marriage</td><td>1</td><td>1282062180</td><td>[{'id': 25, 'name': 'OK', 'count': 100}, {'id'...</td><td>[{'id': 217, 'hero': 'https://pe.tedcdn.com/im...</td><td>Senator</td><td>['God', 'LGBT', 'culture', 'government', 'law'...</td><td>The case for same-sex marriage</td><td>https://www.ted.com/talks/diane_j_savino_the_c...</td><td>292395</td><td>0.002220</td></tr>\n",
"</tbody>\n",
"</table>
" ], "text/plain": [ " comments description duration \\\n", "954 2492 Janet Echelman found her true voice as an arti... 566 \n", "694 1502 Filmmaker Sharmeen Obaid-Chinoy takes on a ter... 489 \n", "96 6404 Richard Dawkins urges all atheists to openly s... 1750 \n", "803 834 David Bismark demos a new system for voting th... 422 \n", "744 649 Hours before New York lawmakers rejected a key... 453 \n", "\n", " event film_date languages main_speaker \\\n", "954 TED2011 1299110400 35 Janet Echelman \n", "694 TED2010 1265760000 32 Sharmeen Obaid-Chinoy \n", "96 TED2002 1012608000 42 Richard Dawkins \n", "803 TEDGlobal 2010 1279065600 36 David Bismark \n", "744 New York State Senate 1259712000 0 Diane J. Savino \n", "\n", " name num_speaker \\\n", "954 Janet Echelman: Taking imagination seriously 1 \n", "694 Sharmeen Obaid-Chinoy: Inside a school for sui... 1 \n", "96 Richard Dawkins: Militant atheism 1 \n", "803 David Bismark: E-voting without fraud 1 \n", "744 Diane J. Savino: The case for same-sex marriage 1 \n", "\n", " published_date ratings \\\n", "954 1307489760 [{'id': 23, 'name': 'Jaw-dropping', 'count': 3... \n", "694 1274865960 [{'id': 23, 'name': 'Jaw-dropping', 'count': 3... \n", "96 1176689220 [{'id': 3, 'name': 'Courageous', 'count': 3236... \n", "803 1288685640 [{'id': 25, 'name': 'OK', 'count': 111}, {'id'... \n", "744 1282062180 [{'id': 25, 'name': 'OK', 'count': 100}, {'id'... \n", "\n", " related_talks \\\n", "954 [{'id': 453, 'hero': 'https://pe.tedcdn.com/im... \n", "694 [{'id': 171, 'hero': 'https://pe.tedcdn.com/im... \n", "96 [{'id': 86, 'hero': 'https://pe.tedcdn.com/ima... \n", "803 [{'id': 803, 'hero': 'https://pe.tedcdn.com/im... \n", "744 [{'id': 217, 'hero': 'https://pe.tedcdn.com/im... \n", "\n", " speaker_occupation \\\n", "954 Artist \n", "694 Filmmaker \n", "96 Evolutionary biologist \n", "803 Voting system designer \n", "744 Senator \n", "\n", " tags \\\n", "954 ['art', 'cities', 'culture', 'data', 'design',... 
\n", "694 ['TED Fellows', 'children', 'culture', 'film',... \n", "96 ['God', 'atheism', 'culture', 'religion', 'sci... \n", "803 ['culture', 'democracy', 'design', 'global iss... \n", "744 ['God', 'LGBT', 'culture', 'government', 'law'... \n", "\n", " title \\\n", "954 Taking imagination seriously \n", "694 Inside a school for suicide bombers \n", "96 Militant atheism \n", "803 E-voting without fraud \n", "744 The case for same-sex marriage \n", "\n", " url views \\\n", "954 https://www.ted.com/talks/janet_echelman 1832930 \n", "694 https://www.ted.com/talks/sharmeen_obaid_chino... 1057238 \n", "96 https://www.ted.com/talks/richard_dawkins_on_m... 4374792 \n", "803 https://www.ted.com/talks/david_bismark_e_voti... 543551 \n", "744 https://www.ted.com/talks/diane_j_savino_the_c... 292395 \n", "\n", " comments_per_view \n", "954 0.001360 \n", "694 0.001421 \n", "96 0.001464 \n", "803 0.001534 \n", "744 0.002220 " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# interpretation: for every view of the same-sex marriage talk, there are 0.002 comments\n", "ted.sort_values('comments_per_view').tail()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# make this more interpretable by inverting the calculation\n", "ted['views_per_comment'] = ted.views / ted.comments" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead>\n",
"<tr><th></th><th>comments</th><th>description</th><th>duration</th><th>event</th><th>film_date</th><th>languages</th><th>main_speaker</th><th>name</th><th>num_speaker</th><th>published_date</th><th>ratings</th><th>related_talks</th><th>speaker_occupation</th><th>tags</th><th>title</th><th>url</th><th>views</th><th>comments_per_view</th><th>views_per_comment</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>744</th><td>649</td><td>Hours before New York lawmakers rejected a key...</td><td>453</td><td>New York State Senate</td><td>1259712000</td><td>0</td><td>Diane J. Savino</td><td>Diane J. Savino: The case for same-sex marriage</td><td>1</td><td>1282062180</td><td>[{'id': 25, 'name': 'OK', 'count': 100}, {'id'...</td><td>[{'id': 217, 'hero': 'https://pe.tedcdn.com/im...</td><td>Senator</td><td>['God', 'LGBT', 'culture', 'government', 'law'...</td><td>The case for same-sex marriage</td><td>https://www.ted.com/talks/diane_j_savino_the_c...</td><td>292395</td><td>0.002220</td><td>450.531587</td></tr>\n",
"<tr><th>803</th><td>834</td><td>David Bismark demos a new system for voting th...</td><td>422</td><td>TEDGlobal 2010</td><td>1279065600</td><td>36</td><td>David Bismark</td><td>David Bismark: E-voting without fraud</td><td>1</td><td>1288685640</td><td>[{'id': 25, 'name': 'OK', 'count': 111}, {'id'...</td><td>[{'id': 803, 'hero': 'https://pe.tedcdn.com/im...</td><td>Voting system designer</td><td>['culture', 'democracy', 'design', 'global iss...</td><td>E-voting without fraud</td><td>https://www.ted.com/talks/david_bismark_e_voti...</td><td>543551</td><td>0.001534</td><td>651.739808</td></tr>\n",
"<tr><th>96</th><td>6404</td><td>Richard Dawkins urges all atheists to openly s...</td><td>1750</td><td>TED2002</td><td>1012608000</td><td>42</td><td>Richard Dawkins</td><td>Richard Dawkins: Militant atheism</td><td>1</td><td>1176689220</td><td>[{'id': 3, 'name': 'Courageous', 'count': 3236...</td><td>[{'id': 86, 'hero': 'https://pe.tedcdn.com/ima...</td><td>Evolutionary biologist</td><td>['God', 'atheism', 'culture', 'religion', 'sci...</td><td>Militant atheism</td><td>https://www.ted.com/talks/richard_dawkins_on_m...</td><td>4374792</td><td>0.001464</td><td>683.134291</td></tr>\n",
"<tr><th>694</th><td>1502</td><td>Filmmaker Sharmeen Obaid-Chinoy takes on a ter...</td><td>489</td><td>TED2010</td><td>1265760000</td><td>32</td><td>Sharmeen Obaid-Chinoy</td><td>Sharmeen Obaid-Chinoy: Inside a school for sui...</td><td>1</td><td>1274865960</td><td>[{'id': 23, 'name': 'Jaw-dropping', 'count': 3...</td><td>[{'id': 171, 'hero': 'https://pe.tedcdn.com/im...</td><td>Filmmaker</td><td>['TED Fellows', 'children', 'culture', 'film',...</td><td>Inside a school for suicide bombers</td><td>https://www.ted.com/talks/sharmeen_obaid_chino...</td><td>1057238</td><td>0.001421</td><td>703.886818</td></tr>\n",
"<tr><th>954</th><td>2492</td><td>Janet Echelman found her true voice as an arti...</td><td>566</td><td>TED2011</td><td>1299110400</td><td>35</td><td>Janet Echelman</td><td>Janet Echelman: Taking imagination seriously</td><td>1</td><td>1307489760</td><td>[{'id': 23, 'name': 'Jaw-dropping', 'count': 3...</td><td>[{'id': 453, 'hero': 'https://pe.tedcdn.com/im...</td><td>Artist</td><td>['art', 'cities', 'culture', 'data', 'design',...</td><td>Taking imagination seriously</td><td>https://www.ted.com/talks/janet_echelman</td><td>1832930</td><td>0.001360</td><td>735.525682</td></tr>\n",
"</tbody>\n",
"</table>
" ], "text/plain": [ " comments description duration \\\n", "744 649 Hours before New York lawmakers rejected a key... 453 \n", "803 834 David Bismark demos a new system for voting th... 422 \n", "96 6404 Richard Dawkins urges all atheists to openly s... 1750 \n", "694 1502 Filmmaker Sharmeen Obaid-Chinoy takes on a ter... 489 \n", "954 2492 Janet Echelman found her true voice as an arti... 566 \n", "\n", " event film_date languages main_speaker \\\n", "744 New York State Senate 1259712000 0 Diane J. Savino \n", "803 TEDGlobal 2010 1279065600 36 David Bismark \n", "96 TED2002 1012608000 42 Richard Dawkins \n", "694 TED2010 1265760000 32 Sharmeen Obaid-Chinoy \n", "954 TED2011 1299110400 35 Janet Echelman \n", "\n", " name num_speaker \\\n", "744 Diane J. Savino: The case for same-sex marriage 1 \n", "803 David Bismark: E-voting without fraud 1 \n", "96 Richard Dawkins: Militant atheism 1 \n", "694 Sharmeen Obaid-Chinoy: Inside a school for sui... 1 \n", "954 Janet Echelman: Taking imagination seriously 1 \n", "\n", " published_date ratings \\\n", "744 1282062180 [{'id': 25, 'name': 'OK', 'count': 100}, {'id'... \n", "803 1288685640 [{'id': 25, 'name': 'OK', 'count': 111}, {'id'... \n", "96 1176689220 [{'id': 3, 'name': 'Courageous', 'count': 3236... \n", "694 1274865960 [{'id': 23, 'name': 'Jaw-dropping', 'count': 3... \n", "954 1307489760 [{'id': 23, 'name': 'Jaw-dropping', 'count': 3... \n", "\n", " related_talks \\\n", "744 [{'id': 217, 'hero': 'https://pe.tedcdn.com/im... \n", "803 [{'id': 803, 'hero': 'https://pe.tedcdn.com/im... \n", "96 [{'id': 86, 'hero': 'https://pe.tedcdn.com/ima... \n", "694 [{'id': 171, 'hero': 'https://pe.tedcdn.com/im... \n", "954 [{'id': 453, 'hero': 'https://pe.tedcdn.com/im... \n", "\n", " speaker_occupation \\\n", "744 Senator \n", "803 Voting system designer \n", "96 Evolutionary biologist \n", "694 Filmmaker \n", "954 Artist \n", "\n", " tags \\\n", "744 ['God', 'LGBT', 'culture', 'government', 'law'... 
\n", "803 ['culture', 'democracy', 'design', 'global iss... \n", "96 ['God', 'atheism', 'culture', 'religion', 'sci... \n", "694 ['TED Fellows', 'children', 'culture', 'film',... \n", "954 ['art', 'cities', 'culture', 'data', 'design',... \n", "\n", " title \\\n", "744 The case for same-sex marriage \n", "803 E-voting without fraud \n", "96 Militant atheism \n", "694 Inside a school for suicide bombers \n", "954 Taking imagination seriously \n", "\n", " url views \\\n", "744 https://www.ted.com/talks/diane_j_savino_the_c... 292395 \n", "803 https://www.ted.com/talks/david_bismark_e_voti... 543551 \n", "96 https://www.ted.com/talks/richard_dawkins_on_m... 4374792 \n", "694 https://www.ted.com/talks/sharmeen_obaid_chino... 1057238 \n", "954 https://www.ted.com/talks/janet_echelman 1832930 \n", "\n", " comments_per_view views_per_comment \n", "744 0.002220 450.531587 \n", "803 0.001534 651.739808 \n", "96 0.001464 683.134291 \n", "694 0.001421 703.886818 \n", "954 0.001360 735.525682 " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# interpretation: about 1 comment is left for every 450 views of the same-sex marriage talk\n", "ted.sort_values('views_per_comment').head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lessons:\n", "\n", "1. Consider the limitations and biases of your data when analyzing it\n", "2. Make your results understandable" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Visualize the distribution of comments" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# line plot is not appropriate here (use it to measure something over time)\n", "ted.comments.plot()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAY4AAAD8CAYAAABgmUMCAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAE7pJREFUeJzt3X+w5XV93/Hny+WHojQsYaHbZckuma2VZBToBnFIU6MRgTQSO7GFycQdYrKZBGZ0kplmMW2wyThjOlFTphbFsg1YleDvrZKSlZg4+UNgQeSHK+EGt7IuZdfQANFUAnn3j/O5cFju3ns+eM8957jPx8x3zvf7Pp/v+b4PHHjd74/zPakqJEka1Qsm3YAkabYYHJKkLgaHJKmLwSFJ6mJwSJK6GBySpC4GhySpi8EhSepicEiSuhwx6QbG4YQTTqgNGzZMug1Jmim33377t6pqzVLjvi+DY8OGDezatWvSbUjSTEnyv0cZ56EqSVKXsQVHkvVJvpBkd5J7k7y11d+R5JtJ7mzTBUPrXJ5kLsl9SV4/VD+v1eaSbBtXz5KkpY3zUNWTwG9U1R1JjgVuT7KzPffeqvr94cFJTgMuAn4E+CfA55P80/b0+4DXAXuB25LsqKqvjrF3SdIhjC04quoh4KE2/3iS3cC6RVa5ELi+qr4LfD3JHHBWe26uqh4ASHJ9G2twSNIErMg5jiQbgDOAW1rpsiR3JdmeZHWrrQMeHFptb6sdqn7wNrYm2ZVk14EDB5b5HUiS5o09OJK8BPgE8Laqegy4Cvhh4HQGeyTvnh+6wOq1SP3Zhaqrq2pzVW1es2bJq8kkSc/TWC/HTXIkg9D4cFV9EqCqHh56/oPAZ9viXmD90OonA/va/KHqkqQVNs6rqgJcA+yuqvcM1dcODXsjcE+b3wFclOToJBuBTcCtwG3ApiQbkxzF4AT6jnH1LUla3Dj3OM4BfgG4O8mdrfZ24OIkpzM43LQH+BWAqro3yQ0MTno/CVxaVU8BJLkMuAlYBWyvqnvH2LckaRGpes7pgpm3efPm+l6+Ob5h2+eWsZvR7XnXT09ku5IEkOT2qtq81Di/OS5J6mJwSJK6GBySpC4GhySpi8EhSepicEiSuhgckqQuBockqYvBIUnqYnBIkroYHJKkLgaHJKmLwSFJ6mJwSJK6GBySpC4GhySpi8EhSepicEiSuhgckqQuBockqYvBIUnqYnBIkroYHJKkLgaHJKmLwSFJ6mJwSJK6GBySpC4GhySpi8EhSepicEiSuhgckqQuBockqYvBIUnqMrbgSLI+yReS7E5yb5K3tvrxSXYmub89rm71JLkyyVySu5KcOfRaW9r4+5NsGVfPkqSljXOP40ngN6rqZcDZwKVJTgO2ATdX1Sbg5rYMcD6wqU1bgatgEDTAFcArgbOAK+bDRpK08sYWHFX1UFXd0eYfB3YD64ALgWvbsGuBn23zFwLX1cCXgOOSrAVeD+ysqkeq6v8CO4HzxtW3JGlxK3KOI8kG4AzgFuCkqnoIBuECnNiGrQMeHFptb6sdqi5JmoCxB0eSlwCfAN
5WVY8tNnSBWi1SP3g7W5PsSrLrwIEDz69ZSdKSxhocSY5kEBofrqpPtvLD7RAU7XF/q+8F1g+tfjKwb5H6s1TV1VW1uao2r1mzZnnfiCTpaeO8qirANcDuqnrP0FM7gPkro7YAnxmqv7ldXXU28Gg7lHUTcG6S1e2k+LmtJkmagCPG+NrnAL8A3J3kzlZ7O/Au4IYkbwG+AbypPXcjcAEwB3wHuASgqh5J8rvAbW3c71TVI2PsW5K0iLEFR1X9BQufnwB47QLjC7j0EK+1Hdi+fN1Jkp4vvzkuSepicEiSuhgckqQuBockqYvBIUnqYnBIkroYHJKkLgaHJKmLwSFJ6mJwSJK6GBySpC4GhySpi8EhSepicEiSuhgckqQuBockqYvBIUnqYnBIkroYHJKkLgaHJKmLwSFJ6mJwSJK6GBySpC4GhySpi8EhSepicEiSuhgckqQuBockqYvBIUnqYnBIkroYHJKkLiMFR5IfHXcjkqTZMOoex/uT3Jrk15IcN9aOJElTbaTgqKofB34eWA/sSvKRJK9bbJ0k25PsT3LPUO0dSb6Z5M42XTD03OVJ5pLcl+T1Q/XzWm0uybbudyhJWlYjn+OoqvuBfw/8JvAvgSuTfC3Jvz7EKn8InLdA/b1VdXqbbgRIchpwEfAjbZ3/mmRVklXA+4DzgdOAi9tYSdKEjHqO4+VJ3gvsBl4D/ExVvazNv3ehdarqi8AjI/ZxIXB9VX23qr4OzAFntWmuqh6oqieA69tYSdKEjLrH8V+AO4BXVNWlVXUHQFXtY7AX0uOyJHe1Q1mrW20d8ODQmL2tdqi6JGlCRg2OC4CPVNXfASR5QZJjAKrqQx3buwr4YeB04CHg3a2eBcbWIvXnSLI1ya4kuw4cONDRkiSpx6jB8XngRUPLx7Ral6p6uKqeqqp/AD7I4FAUDPYk1g8NPRnYt0h9ode+uqo2V9XmNWvW9LYmSRrRqMHxwqr62/mFNn9M78aSrB1afCMwf8XVDuCiJEcn2QhsAm4FbgM2JdmY5CgGJ9B39G5XkrR8jhhx3LeTnDl/biPJPwf+brEVknwUeDVwQpK9wBXAq5OczuBw0x7gVwCq6t4kNwBfBZ4ELq2qp9rrXAbcBKwCtlfVvV3vUJK0rEYNjrcBH0syf5hoLfBvF1uhqi5eoHzNIuPfCbxzgfqNwI0j9ilJGrORgqOqbkvyz4CXMjhh/bWq+vuxdiZJmkqj7nEA/Biwoa1zRhKq6rqxdCVJmlojBUeSDzG4jPZO4KlWLsDgkKTDzKh7HJuB06pqwe9QSJIOH6NejnsP8I/H2YgkaTaMusdxAvDVJLcC350vVtUbxtKVJGlqjRoc7xhnE5Kk2THq5bh/nuSHgE1V9fl2n6pV421NkjSNRr2t+i8DHwc+0ErrgE+PqylJ0vQa9eT4pcA5wGPw9I86nTiupiRJ02vU4Phu+yElAJIcwSFuby5J+v42anD8eZK3Ay9qvzX+MeB/jq8tSdK0GjU4tgEHgLsZ3NH2Rvp/+U+S9H1g1Kuq5n946YPjbUeSNO1GvVfV11ngnEZVnbrsHUmSplrPvarmvRB4E3D88rcjSZp2I53jqKq/Hpq+WVV/ALxmzL1JkqbQqIeqzhxafAGDPZBjx9KRJGmqjXqo6t1D808y+L3wf7Ps3UiSpt6oV1X95LgbkSTNhlEPVf36Ys9X1XuWpx1J0rTruarqx4AdbflngC8CD46jKUnS9Or5Iaczq+pxgCTvAD5WVb80rsYkSdNp1FuOnAI8MbT8BLBh2buRJE29Ufc4PgTcmuRTDL5B/kbgurF1JUmaWqNeVfXOJH8M/ItWuqSqvjy+tiRJ02rUQ1UAxwCPVdV/BvYm2TimniRJU2zUn469AvhN4PJWOhL4H+NqSpI0vUbd43gj8Abg2wBVtQ9vOSJJh6VRg+OJqirardWTvHh8LUmSptmowXFDkg8AxyX5ZeDz+KNOknRYGvWqqt9vvzX+GPBS4Lerau
dYO5MkTaUlgyPJKuCmqvopwLCQpMPckoeqquop4DtJfmAF+pEkTblRz3H8P+DuJNckuXJ+WmyFJNuT7E9yz1Dt+CQ7k9zfHle3etprziW5a/iHo5JsaePvT7Ll+bxJSdLyGTU4Pgf8BwZ3xL19aFrMHwLnHVTbBtxcVZuAm9sywPnApjZtBa6CQdAAVwCvBM4CrpgPG0nSZCx6jiPJKVX1jaq6tveFq+qLSTYcVL4QeHWbvxb4MwZfLLwQuK5d8vulJMclWdvG7qyqR1o/OxmE0Ud7+5EkLY+l9jg+PT+T5BPLsL2TquohgPZ4Yquv49m/7bG31Q5Vf44kW5PsSrLrwIEDy9CqJGkhSwVHhuZPHWMfWaBWi9SfW6y6uqo2V9XmNWvWLGtzkqRnLBUcdYj55+vhdgiK9ri/1fcC64fGnQzsW6QuSZqQpYLjFUkeS/I48PI2/1iSx5M89jy2twOYvzJqC/CZofqb29VVZwOPtkNZNwHnJlndToqf22qSpAlZ9OR4Va16vi+c5KMMTm6fkGQvg6uj3sXg9iVvAb4BvKkNvxG4AJgDvgNc0rb/SJLfBW5r435n/kS5JGkyRv0FwG5VdfEhnnrtAmMLuPQQr7Md2L6MrUmSvgc9P+QkSZLBIUnqY3BIkroYHJKkLgaHJKmLwSFJ6mJwSJK6GBySpC4GhySpi8EhSepicEiSuhgckqQuBockqYvBIUnqYnBIkroYHJKkLgaHJKmLwSFJ6mJwSJK6GBySpC4GhySpi8EhSepicEiSuhgckqQuBockqYvBIUnqYnBIkroYHJKkLgaHJKmLwSFJ6mJwSJK6GBySpC4GhySpy0SCI8meJHcnuTPJrlY7PsnOJPe3x9WtniRXJplLcleSMyfRsyRpYJJ7HD9ZVadX1ea2vA24uao2ATe3ZYDzgU1t2gpcteKdSpKeNk2Hqi4Erm3z1wI/O1S/rga+BByXZO0kGpQkTS44CviTJLcn2dpqJ1XVQwDt8cRWXwc8OLTu3laTJE3AERPa7jlVtS/JicDOJF9bZGwWqNVzBg0CaCvAKaecsjxdSpKeYyJ7HFW1rz3uBz4FnAU8PH8Iqj3ub8P3AuuHVj8Z2LfAa15dVZuravOaNWvG2b4kHdZWPDiSvDjJsfPzwLnAPcAOYEsbtgX4TJvfAby5XV11NvDo/CEtSdLKm8ShqpOATyWZ3/5Hqup/JbkNuCHJW4BvAG9q428ELgDmgO8Al6x8y5KkeSseHFX1APCKBep/Dbx2gXoBl65Aa5KkEUzT5biSpBlgcEiSuhgckqQuBockqYvBIUnqYnBIkroYHJKkLgaHJKmLwSFJ6mJwSJK6GBySpC4GhySpi8EhSepicEiSuhgckqQuBockqYvBIUnqYnBIkroYHJKkLgaHJKmLwSFJ6mJwSJK6GBySpC4GhySpi8EhSepicEiSuhgckqQuBockqYvBIUnqYnBIkrocMekG9IwN2z43sW3veddPT2zbkmaLexySpC4GhySpy8wER5LzktyXZC7Jtkn3I0mHq5kIjiSrgPcB5wOnARcnOW2yXUnS4WkmggM4C5irqgeq6gngeuDCCfckSYelWbmqah3w4NDyXuCVE+rl+9Ikr+iaBK8ik56/WQmOLFCrZw1ItgJb2+LfJrnve9jeCcC3vof1J2mWe4cV6j+/N5aX9Z/95Mxy7zA9/f/QKINmJTj2AuuHlk8G9g0PqKqrgauXY2NJdlXV5uV4rZU2y73DbPc/y73DbPc/y73D7PU/K+c4bgM2JdmY5CjgImDHhHuSpMPSTOxxVNWTSS4DbgJWAdur6t4JtyVJh6WZCA6AqroRuHGFNrcsh7wmZJZ7h9nuf5Z7h9nuf5Z7hxnrP1W19ChJkppZOcchSZoSBseQab2tSZLtSfYnuWeodnySnUnub4+rWz1Jrmzv4a4kZw6ts6WNvz/JlhXqfX2SLyTZneTeJG+dlf6TvDDJrUm+0nr/j62+McktrY8/ah
dskOTotjzXnt8w9FqXt/p9SV4/7t4Peh+rknw5yWdnrf8ke5LcneTOJLtabeo/O22bxyX5eJKvtc//q2al9yVVldPgcN0q4K+AU4GjgK8Ap026r9bbTwBnAvcM1f4TsK3NbwN+r81fAPwxg+++nA3c0urHAw+0x9VtfvUK9L4WOLPNHwv8JYPbxkx9/62Hl7T5I4FbWk83ABe1+vuBX23zvwa8v81fBPxRmz+tfZ6OBja2z9mqFfz8/DrwEeCzbXlm+gf2ACccVJv6z07b7rXAL7X5o4DjZqX3Jd/bpBuYlgl4FXDT0PLlwOWT7muonw08OzjuA9a2+bXAfW3+A8DFB48DLgY+MFR/1rgVfB+fAV43a/0DxwB3MLhjwbeAIw7+3DC46u9Vbf6INi4Hf5aGx61A3ycDNwOvAT7b+pml/vfw3OCY+s8O8I+Ar9POI89S76NMHqp6xkK3NVk3oV5GcVJVPQTQHk9s9UO9j4m/v3bo4wwGf7nPRP/tMM+dwH5gJ4O/tv+mqp5coI+ne2zPPwr84KR6b/4A+HfAP7TlH2S2+i/gT5LcnsHdIWA2PjunAgeA/94OE/63JC+ekd6XZHA8Y8nbmsyIQ72Pib6/JC8BPgG8raoeW2zoArWJ9V9VT1XV6Qz+cj8LeNkifUxV70n+FbC/qm4fLi/Sy1T135xTVWcyuDP2pUl+YpGx09T/EQwOL19VVWcA32ZwaOpQpqn3JRkcz1jytiZT5uEkawHa4/5WP9T7mNj7S3Ikg9D4cFV9spVnpn+Aqvob4M8YHH8+Lsn8d6CG+3i6x/b8DwCPMLnezwHekGQPgztKv4bBHsis9E9V7WuP+4FPMQjvWfjs7AX2VtUtbfnjDIJkFnpfksHxjFm7rckOYP4Kiy0Mzh3M19/crtI4G3i07RLfBJybZHW7kuPcVhurJAGuAXZX1Xtmqf8ka5Ic1+ZfBPwUsBv4AvBzh+h9/j39HPCnNTgwvQO4qF21tBHYBNw6zt4Bquryqjq5qjYw+Dz/aVX9/Kz0n+TFSY6dn2fw7/weZuCzU1X/B3gwyUtb6bXAV2eh95FM+iTLNE0Mrmz4SwbHsX9r0v0M9fVR4CHg7xn8BfIWBseebwbub4/Ht7Fh8KNXfwXcDWweep1fBObadMkK9f7jDHat7wLubNMFs9A/8HLgy633e4DfbvVTGfyPcw74GHB0q7+wLc+1508deq3fau/pPuD8CXyGXs0zV1XNRP+tz6+06d75/yZn4bPTtnk6sKt9fj7N4Kqomeh9qclvjkuSunioSpLUxeCQJHUxOCRJXQwOSVIXg0OS1MXgkCR1MTgkSV0MDklSl/8PI9ITMXQHUxUAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# histogram shows the frequency distribution of a single numeric variable\n", "ted.comments.plot(kind='hist')" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAY4AAAD8CAYAAABgmUMCAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAExFJREFUeJzt3X+w5XV93/HnS1ZBSJVfiyW7JBfqjtFxkkI3ijFtUzBGIHFJB1odJ27INtuZ0qohM3GxmZK20xmcsSJOO1QCJIu1RkUrW6BhyEqSyR8iizqCIN2NUrgukWtBSESD6Lt/nM+Fw7Lsns/uPfecc+/zMXPmfL+f7+ec7/tzvzu8+P4432+qCkmSRvWiSRcgSZotBockqYvBIUnqYnBIkroYHJKkLgaHJKmLwSFJ6mJwSJK6GBySpC5rJl3AOJx44ok1Nzc36TIkaabcdddd366qtQfrtyKDY25ujl27dk26DEmaKUn+7yj9PFQlSepicEiSuhgckqQuBockqYvBIUnqYnBIkroYHJKkLgaHJKmLwSFJ6rIifzl+uOa23TyR9T5w+XkTWa8k9XCPQ5LUxeCQJHUxOCRJXQwOSVIXg0OS1MXgkCR1MTgkSV0MDklSF4NDktTF4JAkdTE4JEldDA5JUheDQ5LUxeCQJHUZW3AkuS7JI0nuGWo7PsltSXa39+Nae5J8OMmeJF9JcsbQZza3/ruTbB5XvZKk0Yxzj+MPgbfs07YN2FlVG4CdbR7gHGBDe20FroJB0ACXAa8HXgdcthg2kqTJGFtwVNWfA4/u07wJ2N6mtwPnD7VfXwOfB45NcjLwS8BtVfVoVT0G3Mbzw0iStIyW+xzHK6rqYYD2flJrXwc8NNRvvrW9ULskaUKm5eR49tNWB2h//hckW5PsSrJrYWFhSYuTJD1ruYPjW+0QFO39kdY+D5wy1G89sPcA7c9TVVdX1caq2rh27dolL1ySNLDcwbEDWLwyajNw41D7O9vVVWcCj7dDWbcCb05yXDsp/ubWJkmakDXj+uIkHwd+ATgxyTyDq6MuBz6ZZAvwIHBh634LcC6wB3gSuAigqh5N8h+BO1u//1BV+55wlyQto7EFR1W9/QUWnb2fvgVc/ALfcx1w3RKWJkk6DNNyclySNCMMDklSF4NDktTF4JAkdTE4JEldDA5JUheDQ5LUxeCQJHUxOCRJXQwOSVIXg0OS1MXgkCR1MTgkSV0MDklSF4NDktTF4JAkdTE4JEldDA5JUheDQ5LUxeCQJHUxOCRJXQwOSVIXg0OS1MXgkCR1MTgkSV0MDklSF4NDktTF4JAkdTE4JEldJhIcSX4ryVeT3JPk40mOSnJqkjuS7E7yiSQvaX2PbPN72vK5SdQsSRpY9uBIsg54F7Cxql4LHAG8DXg/cEVVbQAeA7a0j2wBHquqVwJXtH6SpAmZ1KGqNcBLk6wBjgYeBs4CbmjLtwPnt+lNbZ62/OwkWcZaJUlDlj04quqbwAeABxkExuPAXcB3qurp1m0eWNem1wEPtc8+3fqfsO/3JtmaZFeSXQsLC+Mdh
CStYpM4VHUcg72IU4EfB44BztlP11r8yAGWPdtQdXVVbayqjWvXrl2qciVJ+5jEoao3Ad+oqoWq+gHwGeDngGPboSuA9cDeNj0PnALQlr8ceHR5S5YkLZpEcDwInJnk6Hau4mzgXuB24ILWZzNwY5ve0eZpyz9XVc/b45AkLY9JnOO4g8FJ7i8Cd7cargbeC1ySZA+DcxjXto9cC5zQ2i8Bti13zZKkZ605eJelV1WXAZft0/x14HX76ft94MLlqEuSdHD+clyS1MXgkCR1MTgkSV0MDklSF4NDktTF4JAkdTE4JEldDA5JUheDQ5LUxeCQJHUxOCRJXQwOSVKXkW5ymOS1VXXPuItZ7ea23TyR9T5w+XkTWa+k2TTqHsd/S/KFJP8qybFjrUiSNNVGCo6q+nngHQyexLcryf9I8otjrUySNJVGPsdRVbuB32XwwKV/DHw4ydeS/NNxFSdJmj4jBUeSn05yBXAfcBbwK1X16jZ9xRjrkyRNmVGfAPhfgN8H3ldV31tsrKq9SX53LJVJkqbSqMFxLvC9qvohQJIXAUdV1ZNV9dGxVSdJmjqjnuP4E+ClQ/NHtzZJ0iozanAcVVV/szjTpo8eT0mSpGk2anB8N8kZizNJ/gHwvQP0lyStUKOe43gP8Kkke9v8ycA/H09JkqRpNlJwVNWdSX4KeBUQ4GtV9YOxViZJmkqj7nEA/Cww1z5zehKq6vqxVCVJmlqj3uTwo8DfA74M/LA1F2BwSNIqM+oex0bgNVVV4yxGkjT9Rr2q6h7g746zEEnSbBh1j+NE4N4kXwD+drGxqt56KCttt2a/Bngtg0NevwHcD3yCwXmUB4B/VlWPJQlwJYNfrz8J/HpVffFQ1itJOnyjBsfvLfF6rwT+uKouSPISBj8mfB+ws6ouT7IN2MbgTrznABva6/XAVe1dkjQBoz6P488Y7AW8uE3fCRzS//UneRnwj4Br23c/VVXfATYB21u37cD5bXoTcH0NfB44NsnJh7JuSdLhG/W26r8J3AB8pDWtAz57iOs8DVgA/iDJl5Jck+QY4BVV9TBAez9paF0PDX1+vrVJkiZg1JPjFwNvBJ6AZx7qdNIBP/HC1gBnAFdV1enAdxkclnoh2U/b867uSrI1ya4kuxYWFg6xNEnSwYwaHH9bVU8tziRZw37+4z2ieWC+qu5o8zcwCJJvLR6Cau+PDPU/Zejz64G97KOqrq6qjVW1ce3atYdYmiTpYEYNjj9L8j7gpe1Z458C/tehrLCq/gp4KMmrWtPZwL3ADmBza9sM3NimdwDvzMCZwOOLh7QkSctv1KuqtgFbgLuBfwncwuBy2kP1b4CPtSuqvg5cxCDEPplkC/AgcGHrewuDS3H3MLgc96LDWK8k6TCNepPDHzF4dOzvL8VKq+rLDH6Nvq+z99O3GJxjkSRNgVHvVfUN9nNOo6pOW/KKJElTredeVYuOYnAY6filL0eSNO1G/QHg/xt6fbOqPgScNebaJElTaNRDVWcMzb6IwR7I3xlLRZKkqTbqoar/PDT9NO0mhEtejSRp6o16VdU/GXchkqTZMOqhqksOtLyqPrg05UiSpl3PVVU/y+BX3AC/Avw5z735oCRpFeh5kNMZVfXXAEl+D/hUVf2LcRUmSZpOo96r6ieAp4bmn2LwpD5J0ioz6h7HR4EvJPmfDH5B/qvA9WOrSpI0tUa9quo/JfnfwD9sTRdV1ZfGV5YkaVqNeqgKBs8Ff6KqrgTmk5w6ppokSVNs1EfHXga8F7i0Nb0Y+O/jKkqSNL1G3eP4VeCtDB7zSlXtxVuOSNKqNGpwPNWei1EASY4ZX0mSpGk2anB8MslHgGOT/CbwJyzRQ50kSbNl1KuqPtCeNf4E8Crg31XVbWOtTJI0lQ4aHEmOAG6tqjcBhoUkrXIHPVRVVT8Enkzy8mWoR5I05Ub95fj3gbuT3Ea7sgqgqt41lqokSVNr1OC4ub0kSavcAYMjyU9U1YNVtX25CpIkT
beDneP47OJEkk+PuRZJ0gw4WHBkaPq0cRYiSZoNBwuOeoFpSdIqdbCT4z+T5AkGex4vbdO0+aqql421OknS1DlgcFTVEctViCRpNvQ8j0OSpMkFR5IjknwpyU1t/tQkdyTZneQTSV7S2o9s83va8rlJ1SxJmuwex7uB+4bm3w9cUVUbgMeALa19C/BYVb0SuKL1kyRNyESCI8l64DzgmjYf4CzghtZlO3B+m97U5mnLz279JUkTMKk9jg8BvwP8qM2fAHynqp5u8/PAuja9DngIoC1/vPWXJE3AsgdHkl8GHqmqu4ab99O1Rlg2/L1bk+xKsmthYWEJKpUk7c8k9jjeCLw1yQPAHzE4RPUhBk8XXLw8eD2wt03PA6cAtOUvBx7d90ur6uqq2lhVG9euXTveEUjSKrbswVFVl1bV+qqaA94GfK6q3gHcDlzQum0GbmzTO9o8bfnn2vPPJUkTME2/43gvcEmSPQzOYVzb2q8FTmjtlwDbJlSfJInRn8cxFlX1p8CftumvA6/bT5/vAxcua2GSpBc0TXsckqQZYHBIkroYHJKkLgaHJKmLwSFJ6mJwSJK6GBySpC4T/R2HpsPctpsntu4HLj9vYuuWdGjc45AkdTE4JEldDA5JUheDQ5LUxeCQJHUxOCRJXQwOSVIXg0OS1MXgkCR1MTgkSV0MDklSF4NDktTF4JAkdTE4JEldDA5JUheDQ5LUxeCQJHUxOCRJXQwOSVIXg0OS1MXgkCR1WfbgSHJKktuT3Jfkq0ne3dqPT3Jbkt3t/bjWniQfTrInyVeSnLHcNUuSnjWJPY6ngd+uqlcDZwIXJ3kNsA3YWVUbgJ1tHuAcYEN7bQWuWv6SJUmLlj04qurhqvpim/5r4D5gHbAJ2N66bQfOb9ObgOtr4PPAsUlOXuayJUnNRM9xJJkDTgfuAF5RVQ/DIFyAk1q3dcBDQx+bb237ftfWJLuS7FpYWBhn2ZK0qk0sOJL8GPBp4D1V9cSBuu6nrZ7XUHV1VW2sqo1r165dqjIlSfuYSHAkeTGD0PhYVX2mNX9r8RBUe3+ktc8Dpwx9fD2wd7lqlSQ91ySuqgpwLXBfVX1waNEOYHOb3gzcONT+znZ11ZnA44uHtCRJy2/NBNb5RuDXgLuTfLm1vQ+4HPhkki3Ag8CFbdktwLnAHuBJ4KLlLVeSNGzZg6Oq/oL9n7cAOHs//Qu4eKxFSZJG5i/HJUldDA5JUpdJnOOQnjG37eaJrPeBy8+byHqllcA9DklSF4NDktTF4JAkdTE4JEldDA5JUheDQ5LUxeCQJHUxOCRJXQwOSVIXg0OS1MXgkCR1MTgkSV0MDklSF4NDktTF26prVZrU7dzBW7pr9rnHIUnqYnBIkroYHJKkLp7jkJaZj8vVrHOPQ5LUxeCQJHUxOCRJXQwOSVIXT45Lq4Q/etRScY9DktRlZvY4krwFuBI4Arimqi6fcEmSRuQlyCvLTARHkiOA/wr8IjAP3JlkR1XdO9nKJE2zSR6em5TlCMtZOVT1OmBPVX29qp4C/gjYNOGaJGlVmpXgWAc8NDQ/39okSctsJg5VAdlPWz2nQ7IV2Npm/ybJ/Ye4rhOBbx/iZ2fVahwzrM5xr8Yxwyoad97/zOShjPknR+k0K8ExD5wyNL8e2DvcoaquBq4+3BUl2VVVGw/3e2bJahwzrM5xr8Yxw+oc9zjHPCuHqu4ENiQ5NclLgLcBOyZckyStSjOxx1FVTyf518CtDC7Hva6qvjrhsiRpVZqJ4ACoqluAW5ZhVYd9uGsGrcYxw+oc92ocM6zOcY9tzKmqg/eSJKmZlXMckqQpYXA0Sd6S5P4ke5Jsm3Q9SynJKUluT3Jfkq8meXdrPz7JbUl2t/fjWnuSfLj9Lb6S5IzJjuDQJTkiyZeS3NTmT01yRxvzJ9rFFiQ5ss3vacvnJln34UhybJIbknytbfM3rPRtneS32r/te5J8PMlRK3FbJ7kuySNJ7hlq6962STa3/ruTb
O6tw+DgObc0OQd4DfD2JK+ZbFVL6mngt6vq1cCZwMVtfNuAnVW1AdjZ5mHwd9jQXluBq5a/5CXzbuC+ofn3A1e0MT8GbGntW4DHquqVwBWt36y6Evjjqvop4GcYjH/Fbusk64B3ARur6rUMLqB5GytzW/8h8JZ92rq2bZLjgcuA1zO4K8dli2Ezsqpa9S/gDcCtQ/OXApdOuq4xjvdGBvf9uh84ubWdDNzfpj8CvH2o/zP9ZunF4Pc+O4GzgJsY/JD028Cafbc7gyv23tCm17R+mfQYDmHMLwO+sW/tK3lb8+ydJY5v2+4m4JdW6rYG5oB7DnXbAm8HPjLU/px+o7zc4xhYNbc0abvlpwN3AK+oqocB2vtJrdtK+Xt8CPgd4Edt/gTgO1X1dJsfHtczY27LH2/9Z81pwALwB+0Q3TVJjmEFb+uq+ibwAeBB4GEG2+4uVv62XtS7bQ97mxscAwe9pclKkOTHgE8D76mqJw7UdT9tM/X3SPLLwCNVdddw83661gjLZska4Azgqqo6Hfguzx662J+ZH3c7zLIJOBX4ceAYBodp9rXStvXBvNA4D3v8BsfAQW9pMuuSvJhBaHysqj7Tmr+V5OS2/GTgkda+Ev4ebwTemuQBBndTPovBHsixSRZ/vzQ8rmfG3Ja/HHh0OQteIvPAfFXd0eZvYBAkK3lbvwn4RlUtVNUPgM8AP8fK39aLerftYW9zg2NgRd/SJEmAa4H7quqDQ4t2AItXVGxmcO5jsf2d7aqMM4HHF3eFZ0VVXVpV66tqjsH2/FxVvQO4Hbigddt3zIt/iwta/5n7v9Cq+ivgoSSvak1nA/eygrc1g0NUZyY5uv1bXxzzit7WQ3q37a3Am5Mc1/bW3tzaRjfpEz3T8gLOBf4P8JfAv510PUs8tp9nsCv6FeDL7XUug+O6O4Hd7f341j8MrjL7S+BuBlerTHwchzH+XwBuatOnAV8A9gCfAo5s7Ue1+T1t+WmTrvswxvv3gV1te38WOG6lb2vg3wNfA+4BPgocuRK3NfBxBudxfsBgz2HLoWxb4Dfa+PcAF/XW4S/HJUldPFQlSepicEiSuhgckqQuBockqYvBIUnqYnBIkroYHJKkLgaHJKnL/wfyZ+IhUPrvsAAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# modify the plot to be more informative\n", "ted[ted.comments < 1000].comments.plot(kind='hist')" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(32, 19)" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# check how many observations we removed from the plot\n", "ted[ted.comments >= 1000].shape" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAY4AAAD8CAYAAABgmUMCAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAExFJREFUeJzt3X+w5XV93/HnS1ZBSJVfiyW7JBfqjtFxkkI3ijFtUzBGIHFJB1odJ27INtuZ0qohM3GxmZK20xmcsSJOO1QCJIu1RkUrW6BhyEqSyR8iizqCIN2NUrgukWtBSESD6Lt/nM+Fw7Lsns/uPfecc+/zMXPmfL+f7+ec7/tzvzu8+P4432+qCkmSRvWiSRcgSZotBockqYvBIUnqYnBIkroYHJKkLgaHJKmLwSFJ6mJwSJK6GBySpC5rJl3AOJx44ok1Nzc36TIkaabcdddd366qtQfrtyKDY25ujl27dk26DEmaKUn+7yj9PFQlSepicEiSuhgckqQuBockqYvBIUnqYnBIkroYHJKkLgaHJKmLwSFJ6rIifzl+uOa23TyR9T5w+XkTWa8k9XCPQ5LUxeCQJHUxOCRJXQwOSVIXg0OS1MXgkCR1MTgkSV0MDklSF4NDktTF4JAkdTE4JEldDA5JUheDQ5LUxeCQJHUZW3AkuS7JI0nuGWo7PsltSXa39+Nae5J8OMmeJF9JcsbQZza3/ruTbB5XvZKk0Yxzj+MPgbfs07YN2FlVG4CdbR7gHGBDe20FroJB0ACXAa8HXgdcthg2kqTJGFtwVNWfA4/u07wJ2N6mtwPnD7VfXwOfB45NcjLwS8BtVfVoVT0G3Mbzw0iStIyW+xzHK6rqYYD2flJrXwc8NNRvvrW9ULskaUKm5eR49tNWB2h//hckW5PsSrJrYWFhSYuTJD1ruYPjW+0QFO39kdY+D5wy1G89sPcA7c9TVVdX1caq2rh27dolL1ySNLDcwbEDWLwyajNw41D7O9vVVWcCj7dDWbcCb05yXDsp/ubWJkmakDXj+uIkHwd+ATgxyTyDq6MuBz6ZZAvwIHBh634LcC6wB3gSuAigqh5N8h+BO1u//1BV+55wlyQto7EFR1W9/QUWnb2fvgVc/ALfcx1w3RKWJkk6DNNyclySNCMMDklSF4NDktTF4JAkdTE4JEldDA5JUheDQ5LUxeCQJHUxOCRJXQwOSVIXg0OS1MXgkCR1MTgkSV0MDklSF4NDktTF4JAkdTE4JEldDA5JUheDQ5L
UxeCQJHUxOCRJXQwOSVIXg0OS1MXgkCR1MTgkSV0MDklSF4NDktTF4JAkdTE4JEldJhIcSX4ryVeT3JPk40mOSnJqkjuS7E7yiSQvaX2PbPN72vK5SdQsSRpY9uBIsg54F7Cxql4LHAG8DXg/cEVVbQAeA7a0j2wBHquqVwJXtH6SpAmZ1KGqNcBLk6wBjgYeBs4CbmjLtwPnt+lNbZ62/OwkWcZaJUlDlj04quqbwAeABxkExuPAXcB3qurp1m0eWNem1wEPtc8+3fqfsO/3JtmaZFeSXQsLC+MdhCStYpM4VHUcg72IU4EfB44BztlP11r8yAGWPdtQdXVVbayqjWvXrl2qciVJ+5jEoao3Ad+oqoWq+gHwGeDngGPboSuA9cDeNj0PnALQlr8ceHR5S5YkLZpEcDwInJnk6Hau4mzgXuB24ILWZzNwY5ve0eZpyz9XVc/b45AkLY9JnOO4g8FJ7i8Cd7cargbeC1ySZA+DcxjXto9cC5zQ2i8Bti13zZKkZ605eJelV1WXAZft0/x14HX76ft94MLlqEuSdHD+clyS1MXgkCR1MTgkSV0MDklSF4NDktTF4JAkdTE4JEldDA5JUheDQ5LUxeCQJHUxOCRJXQwOSVKXkW5ymOS1VXXPuItZ7ea23TyR9T5w+XkTWa+k2TTqHsd/S/KFJP8qybFjrUiSNNVGCo6q+nngHQyexLcryf9I8otjrUySNJVGPsdRVbuB32XwwKV/DHw4ydeS/NNxFSdJmj4jBUeSn05yBXAfcBbwK1X16jZ9xRjrkyRNmVGfAPhfgN8H3ldV31tsrKq9SX53LJVJkqbSqMFxLvC9qvohQJIXAUdV1ZNV9dGxVSdJmjqjnuP4E+ClQ/NHtzZJ0iozanAcVVV/szjTpo8eT0mSpGk2anB8N8kZizNJ/gHwvQP0lyStUKOe43gP8Kkke9v8ycA/H09JkqRpNlJwVNWdSX4KeBUQ4GtV9YOxViZJmkqj7nEA/Cww1z5zehKq6vqxVCVJmlqj3uTwo8DfA74M/LA1F2BwSNIqM+oex0bgNVVV4yxGkjT9Rr2q6h7g746zEEnSbBh1j+NE4N4kXwD+drGxqt56KCttt2a/Bngtg0NevwHcD3yCwXmUB4B/VlWPJQlwJYNfrz8J/HpVffFQ1itJOnyjBsfvLfF6rwT+uKouSPISBj8mfB+ws6ouT7IN2MbgTrznABva6/XAVe1dkjQBoz6P488Y7AW8uE3fCRzS//UneRnwj4Br23c/VVXfATYB21u37cD5bXoTcH0NfB44NsnJh7JuSdLhG/W26r8J3AB8pDWtAz57iOs8DVgA/iDJl5Jck+QY4BVV9TBAez9paF0PDX1+vrVJkiZg1JPjFwNvBJ6AZx7qdNIBP/HC1gBnAFdV1enAdxkclnoh2U/b867uSrI1ya4kuxYWFg6xNEnSwYwaHH9bVU8tziRZw37+4z2ieWC+qu5o8zcwCJJvLR6Cau+PDPU/Zejz64G97KOqrq6qjVW1ce3atYdYmiTpYEYNjj9L8j7gpe1Z458C/tehrLCq/gp4KMmrWtPZwL3ADmBza9sM3NimdwDvzMCZwOOLh7QkSctv1KuqtgFbgLuBfwncwuBy2kP1b4CPtSuqvg5cxCDEPplkC/AgcGHrewuDS3H3MLgc96LDWK8k6TCNepPDHzF4dOzvL8VKq+rLDH6Nvq+z99O3GJxjkSRNgVHvVfUN9nNOo6pOW/KKJElTredeVYuOYnAY6filL0eSNO1G/QHg/xt6fbOqPgScNebaJElTaNRDVWcMzb6IwR7I3xlLRZKkqTbqoar/PDT9NO0mhEtejSRp6o16VdU/GXchkqTZMOqhqksOtLyqPrg05UiSpl3PVVU/y+BX3AC/Avw5z735oCRpFeh5kNMZVfXXAEl+D/hUVf2LcRUmSZpOo96r6ieAp4bmn2LwpD5J0ioz6h7HR4EvJPmfDH5B/qvA9WOrSpI0tUa9quo
/JfnfwD9sTRdV1ZfGV5YkaVqNeqgKBs8Ff6KqrgTmk5w6ppokSVNs1EfHXga8F7i0Nb0Y+O/jKkqSNL1G3eP4VeCtDB7zSlXtxVuOSNKqNGpwPNWei1EASY4ZX0mSpGk2anB8MslHgGOT/CbwJyzRQ50kSbNl1KuqPtCeNf4E8Crg31XVbWOtTJI0lQ4aHEmOAG6tqjcBhoUkrXIHPVRVVT8Enkzy8mWoR5I05Ub95fj3gbuT3Ea7sgqgqt41lqokSVNr1OC4ub0kSavcAYMjyU9U1YNVtX25CpIkTbeDneP47OJEkk+PuRZJ0gw4WHBkaPq0cRYiSZoNBwuOeoFpSdIqdbCT4z+T5AkGex4vbdO0+aqql421OknS1DlgcFTVEctViCRpNvQ8j0OSpMkFR5IjknwpyU1t/tQkdyTZneQTSV7S2o9s83va8rlJ1SxJmuwex7uB+4bm3w9cUVUbgMeALa19C/BYVb0SuKL1kyRNyESCI8l64DzgmjYf4CzghtZlO3B+m97U5mnLz279JUkTMKk9jg8BvwP8qM2fAHynqp5u8/PAuja9DngIoC1/vPWXJE3AsgdHkl8GHqmqu4ab99O1Rlg2/L1bk+xKsmthYWEJKpUk7c8k9jjeCLw1yQPAHzE4RPUhBk8XXLw8eD2wt03PA6cAtOUvBx7d90ur6uqq2lhVG9euXTveEUjSKrbswVFVl1bV+qqaA94GfK6q3gHcDlzQum0GbmzTO9o8bfnn2vPPJUkTME2/43gvcEmSPQzOYVzb2q8FTmjtlwDbJlSfJInRn8cxFlX1p8CftumvA6/bT5/vAxcua2GSpBc0TXsckqQZYHBIkroYHJKkLgaHJKmLwSFJ6mJwSJK6GBySpC4T/R2HpsPctpsntu4HLj9vYuuWdGjc45AkdTE4JEldDA5JUheDQ5LUxeCQJHUxOCRJXQwOSVIXg0OS1MXgkCR1MTgkSV0MDklSF4NDktTF4JAkdTE4JEldDA5JUheDQ5LUxeCQJHUxOCRJXQwOSVIXg0OS1MXgkCR1WfbgSHJKktuT3Jfkq0ne3dqPT3Jbkt3t/bjWniQfTrInyVeSnLHcNUuSnjWJPY6ngd+uqlcDZwIXJ3kNsA3YWVUbgJ1tHuAcYEN7bQWuWv6SJUmLlj04qurhqvpim/5r4D5gHbAJ2N66bQfOb9ObgOtr4PPAsUlOXuayJUnNRM9xJJkDTgfuAF5RVQ/DIFyAk1q3dcBDQx+bb237ftfWJLuS7FpYWBhn2ZK0qk0sOJL8GPBp4D1V9cSBuu6nrZ7XUHV1VW2sqo1r165dqjIlSfuYSHAkeTGD0PhYVX2mNX9r8RBUe3+ktc8Dpwx9fD2wd7lqlSQ91ySuqgpwLXBfVX1waNEOYHOb3gzcONT+znZ11ZnA44uHtCRJy2/NBNb5RuDXgLuTfLm1vQ+4HPhkki3Ag8CFbdktwLnAHuBJ4KLlLVeSNGzZg6Oq/oL9n7cAOHs//Qu4eKxFSZJG5i/HJUldDA5JUpdJnOOQnjG37eaJrPeBy8+byHqllcA9DklSF4NDktTF4JAkdTE4JEldDA5JUheDQ5LUxeCQJHUxOCRJXQwOSVIXg0OS1MXgkCR1MTgkSV0MDklSF4NDktTF26prVZrU7dzBW7pr9rnHIUnqYnBIkroYHJKkLp7jkJaZj8vVrHOPQ5LUxeCQJHUxOCRJXQwOSVIXT45Lq4Q/etRScY9DktRlZvY4krwFuBI4Arimqi6fcEmSRuQlyCvLTARHkiOA/wr8IjAP3JlkR1XdO9nKJE2zSR6em5TlCMtZOVT1OmBPVX29qp4C/gjYNOGaJGlVmpXgWAc8NDQ/39okSctsJg5VAdlPWz2nQ7IV2Npm/ybJ/Ye4rhOBbx/iZ2fVahwzrM5xr8Yxwyoad97/zOShjPknR+k0K8ExD5wyNL8e2DvcoaquBq4+3BUl2VVVGw/3e2bJahwzrM5xr8Yxw+oc9zjHPCuHqu4
ENiQ5NclLgLcBOyZckyStSjOxx1FVTyf518CtDC7Hva6qvjrhsiRpVZqJ4ACoqluAW5ZhVYd9uGsGrcYxw+oc92ocM6zOcY9tzKmqg/eSJKmZlXMckqQpYXA0Sd6S5P4ke5Jsm3Q9SynJKUluT3Jfkq8meXdrPz7JbUl2t/fjWnuSfLj9Lb6S5IzJjuDQJTkiyZeS3NTmT01yRxvzJ9rFFiQ5ss3vacvnJln34UhybJIbknytbfM3rPRtneS32r/te5J8PMlRK3FbJ7kuySNJ7hlq6962STa3/ruTbO6tw+DgObc0OQd4DfD2JK+ZbFVL6mngt6vq1cCZwMVtfNuAnVW1AdjZ5mHwd9jQXluBq5a/5CXzbuC+ofn3A1e0MT8GbGntW4DHquqVwBWt36y6Evjjqvop4GcYjH/Fbusk64B3ARur6rUMLqB5GytzW/8h8JZ92rq2bZLjgcuA1zO4K8dli2Ezsqpa9S/gDcCtQ/OXApdOuq4xjvdGBvf9uh84ubWdDNzfpj8CvH2o/zP9ZunF4Pc+O4GzgJsY/JD028Cafbc7gyv23tCm17R+mfQYDmHMLwO+sW/tK3lb8+ydJY5v2+4m4JdW6rYG5oB7DnXbAm8HPjLU/px+o7zc4xhYNbc0abvlpwN3AK+oqocB2vtJrdtK+Xt8CPgd4Edt/gTgO1X1dJsfHtczY27LH2/9Z81pwALwB+0Q3TVJjmEFb+uq+ibwAeBB4GEG2+4uVv62XtS7bQ97mxscAwe9pclKkOTHgE8D76mqJw7UdT9tM/X3SPLLwCNVdddw83661gjLZska4Azgqqo6Hfguzx662J+ZH3c7zLIJOBX4ceAYBodp9rXStvXBvNA4D3v8BsfAQW9pMuuSvJhBaHysqj7Tmr+V5OS2/GTgkda+Ev4ebwTemuQBBndTPovBHsixSRZ/vzQ8rmfG3Ja/HHh0OQteIvPAfFXd0eZvYBAkK3lbvwn4RlUtVNUPgM8AP8fK39aLerftYW9zg2NgRd/SJEmAa4H7quqDQ4t2AItXVGxmcO5jsf2d7aqMM4HHF3eFZ0VVXVpV66tqjsH2/FxVvQO4Hbigddt3zIt/iwta/5n7v9Cq+ivgoSSvak1nA/eygrc1g0NUZyY5uv1bXxzzit7WQ3q37a3Am5Mc1/bW3tzaRjfpEz3T8gLOBf4P8JfAv510PUs8tp9nsCv6FeDL7XUug+O6O4Hd7f341j8MrjL7S+BuBlerTHwchzH+XwBuatOnAV8A9gCfAo5s7Ue1+T1t+WmTrvswxvv3gV1te38WOG6lb2vg3wNfA+4BPgocuRK3NfBxBudxfsBgz2HLoWxb4Dfa+PcAF/XW4S/HJUldPFQlSepicEiSuhgckqQuBockqYvBIUnqYnBIkroYHJKkLgaHJKnL/wfyZ+IhUPrvsAAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# can also write this using the query method\n", "ted.query('comments < 1000').comments.plot(kind='hist')" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAY4AAAD8CAYAAABgmUMCAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAExFJREFUeJzt3X+w5XV93/HnS1ZBSJVfiyW7JBfqjtFxkkI3ijFtUzBGIHFJB1odJ27INtuZ0qohM3GxmZK20xmcsSJOO1QCJIu1RkUrW6BhyEqSyR8iizqCIN2NUrgukWtBSESD6Lt/nM+Fw7Lsns/uPfecc+/zMXPmfL+f7+ec7/tzvzu8+P4432+qCkmSRvWiSRcgSZotBockqYvBIUnqYnBIkroYHJKkLgaHJKmLwSFJ6mJwSJK6GBySpC5rJl3AOJx44ok1Nzc36TIkaabcdddd366qtQfrtyKDY25ujl27dk26DEmaKUn+7yj9PFQlSepicEiSuhgckqQuBockqYvBIUnqYnBIkroYHJKkLgaHJKmLwSFJ6rIifzl+uOa23TyR9T5w+XkTWa8k9XCPQ5LUxeCQJHUxOCRJXQwOSVIXg0OS1MXgkCR1MTgkSV0MDklSF4NDktTF4JAkdTE4JEldDA5JUheDQ5LUxeCQJHUZW3AkuS7JI0nuGWo7PsltSXa39+Nae5J8OMmeJF9JcsbQZza3/ruTbB5XvZKk0Yxzj+MPgbfs07YN2FlVG4CdbR7gHGBDe20FroJB0ACXAa8HXgdcthg2kqTJGFtwVNWfA4/u07wJ2N6mtwPnD7VfXwOfB45NcjLwS8BtVfVoVT0G3Mbzw0iStIyW+xzHK6rqYYD2flJrXwc8NNRvvrW9ULskaUKm5eR49tNWB2h//hckW5PsSrJrYWFhSYuTJD1ruYPjW+0QFO39kdY+D5wy1G89sPcA7c9TVVdX1caq2rh27dolL1ySNLDcwbEDWLwyajNw41D7O9vVVWcCj7dDWbcCb05yXDsp/ubWJkmakDXj+uIkHwd+ATgxyTyDq6MuBz6ZZAvwIHBh634LcC6wB3gSuAigqh5N8h+BO1u//1BV+55wlyQto7EFR1W9/QUWnb2fvgVc/ALfcx1w3RKWJkk6DNNyclySNCMMDklSF4NDktTF4JAkdTE4JEldDA5JUheDQ5LUxeCQJHUxOCRJXQwOSVIXg0OS1MXgkCR1MTgkSV0MDklSF4NDktTF4JAkdTE4JEldDA5JUheDQ5LUxeCQJHUxOCRJXQwOSVIXg0OS1MXgkCR1MTgkSV0MDklSF4NDktTF4JAkdTE4JEldJhIcSX4ryVeT3JPk40mOSnJqkjuS7E7yiSQvaX2PbPN72vK5SdQsSRpY9uBIsg54F7Cxql4LHAG8DXg/cEVVbQAeA7a0j2wBHquqVwJXtH6SpAmZ1KGqNcBLk6wBjgYeBs4CbmjLtwPnt+lNbZ62/OwkWcZaJUlDlj04quqbwAeABxkExuPAXcB3qurp1m0eWNem1wEPtc8+3fqfsO/3JtmaZFeSXQsLC+MdhCStY
pM4VHUcg72IU4EfB44BztlP11r8yAGWPdtQdXVVbayqjWvXrl2qciVJ+5jEoao3Ad+oqoWq+gHwGeDngGPboSuA9cDeNj0PnALQlr8ceHR5S5YkLZpEcDwInJnk6Hau4mzgXuB24ILWZzNwY5ve0eZpyz9XVc/b45AkLY9JnOO4g8FJ7i8Cd7cargbeC1ySZA+DcxjXto9cC5zQ2i8Bti13zZKkZ605eJelV1WXAZft0/x14HX76ft94MLlqEuSdHD+clyS1MXgkCR1MTgkSV0MDklSF4NDktTF4JAkdTE4JEldDA5JUheDQ5LUxeCQJHUxOCRJXQwOSVKXkW5ymOS1VXXPuItZ7ea23TyR9T5w+XkTWa+k2TTqHsd/S/KFJP8qybFjrUiSNNVGCo6q+nngHQyexLcryf9I8otjrUySNJVGPsdRVbuB32XwwKV/DHw4ydeS/NNxFSdJmj4jBUeSn05yBXAfcBbwK1X16jZ9xRjrkyRNmVGfAPhfgN8H3ldV31tsrKq9SX53LJVJkqbSqMFxLvC9qvohQJIXAUdV1ZNV9dGxVSdJmjqjnuP4E+ClQ/NHtzZJ0iozanAcVVV/szjTpo8eT0mSpGk2anB8N8kZizNJ/gHwvQP0lyStUKOe43gP8Kkke9v8ycA/H09JkqRpNlJwVNWdSX4KeBUQ4GtV9YOxViZJmkqj7nEA/Cww1z5zehKq6vqxVCVJmlqj3uTwo8DfA74M/LA1F2BwSNIqM+oex0bgNVVV4yxGkjT9Rr2q6h7g746zEEnSbBh1j+NE4N4kXwD+drGxqt56KCttt2a/Bngtg0NevwHcD3yCwXmUB4B/VlWPJQlwJYNfrz8J/HpVffFQ1itJOnyjBsfvLfF6rwT+uKouSPISBj8mfB+ws6ouT7IN2MbgTrznABva6/XAVe1dkjQBoz6P488Y7AW8uE3fCRzS//UneRnwj4Br23c/VVXfATYB21u37cD5bXoTcH0NfB44NsnJh7JuSdLhG/W26r8J3AB8pDWtAz57iOs8DVgA/iDJl5Jck+QY4BVV9TBAez9paF0PDX1+vrVJkiZg1JPjFwNvBJ6AZx7qdNIBP/HC1gBnAFdV1enAdxkclnoh2U/b867uSrI1ya4kuxYWFg6xNEnSwYwaHH9bVU8tziRZw37+4z2ieWC+qu5o8zcwCJJvLR6Cau+PDPU/Zejz64G97KOqrq6qjVW1ce3atYdYmiTpYEYNjj9L8j7gpe1Z458C/tehrLCq/gp4KMmrWtPZwL3ADmBza9sM3NimdwDvzMCZwOOLh7QkSctv1KuqtgFbgLuBfwncwuBy2kP1b4CPtSuqvg5cxCDEPplkC/AgcGHrewuDS3H3MLgc96LDWK8k6TCNepPDHzF4dOzvL8VKq+rLDH6Nvq+z99O3GJxjkSRNgVHvVfUN9nNOo6pOW/KKJElTredeVYuOYnAY6filL0eSNO1G/QHg/xt6fbOqPgScNebaJElTaNRDVWcMzb6IwR7I3xlLRZKkqTbqoar/PDT9NO0mhEtejSRp6o16VdU/GXchkqTZMOqhqksOtLyqPrg05UiSpl3PVVU/y+BX3AC/Avw5z735oCRpFeh5kNMZVfXXAEl+D/hUVf2LcRUmSZpOo96r6ieAp4bmn2LwpD5J0ioz6h7HR4EvJPmfDH5B/qvA9WOrSpI0tUa9quo/JfnfwD9sTRdV1ZfGV5YkaVqNeqgKBs8Ff6KqrgTmk5w6ppokSVNs1EfHXga8F7i0Nb0Y+O/jKkqSNL1G3eP4VeCtDB7zSlXtxVuOSNKqNGpwPNWei1EASY4ZX0mSpGk2anB8MslHgGOT/CbwJyzRQ50kSbNl1KuqPtCeNf4E8Crg31XVbWOtTJI0lQ4aHEmOAG6tqjcBhoUkrXIHPVRVVT8Enkzy8mWoR5I05Ub95fj3gbuT3Ea7sgqgqt41lqokSVNr1OC4ub0kSavcAYMjyU9U1YNVtX25CpIkTbeDn
eP47OJEkk+PuRZJ0gw4WHBkaPq0cRYiSZoNBwuOeoFpSdIqdbCT4z+T5AkGex4vbdO0+aqql421OknS1DlgcFTVEctViCRpNvQ8j0OSpMkFR5IjknwpyU1t/tQkdyTZneQTSV7S2o9s83va8rlJ1SxJmuwex7uB+4bm3w9cUVUbgMeALa19C/BYVb0SuKL1kyRNyESCI8l64DzgmjYf4CzghtZlO3B+m97U5mnLz279JUkTMKk9jg8BvwP8qM2fAHynqp5u8/PAuja9DngIoC1/vPWXJE3AsgdHkl8GHqmqu4ab99O1Rlg2/L1bk+xKsmthYWEJKpUk7c8k9jjeCLw1yQPAHzE4RPUhBk8XXLw8eD2wt03PA6cAtOUvBx7d90ur6uqq2lhVG9euXTveEUjSKrbswVFVl1bV+qqaA94GfK6q3gHcDlzQum0GbmzTO9o8bfnn2vPPJUkTME2/43gvcEmSPQzOYVzb2q8FTmjtlwDbJlSfJInRn8cxFlX1p8CftumvA6/bT5/vAxcua2GSpBc0TXsckqQZYHBIkroYHJKkLgaHJKmLwSFJ6mJwSJK6GBySpC4T/R2HpsPctpsntu4HLj9vYuuWdGjc45AkdTE4JEldDA5JUheDQ5LUxeCQJHUxOCRJXQwOSVIXg0OS1MXgkCR1MTgkSV0MDklSF4NDktTF4JAkdTE4JEldDA5JUheDQ5LUxeCQJHUxOCRJXQwOSVIXg0OS1MXgkCR1WfbgSHJKktuT3Jfkq0ne3dqPT3Jbkt3t/bjWniQfTrInyVeSnLHcNUuSnjWJPY6ngd+uqlcDZwIXJ3kNsA3YWVUbgJ1tHuAcYEN7bQWuWv6SJUmLlj04qurhqvpim/5r4D5gHbAJ2N66bQfOb9ObgOtr4PPAsUlOXuayJUnNRM9xJJkDTgfuAF5RVQ/DIFyAk1q3dcBDQx+bb237ftfWJLuS7FpYWBhn2ZK0qk0sOJL8GPBp4D1V9cSBuu6nrZ7XUHV1VW2sqo1r165dqjIlSfuYSHAkeTGD0PhYVX2mNX9r8RBUe3+ktc8Dpwx9fD2wd7lqlSQ91ySuqgpwLXBfVX1waNEOYHOb3gzcONT+znZ11ZnA44uHtCRJy2/NBNb5RuDXgLuTfLm1vQ+4HPhkki3Ag8CFbdktwLnAHuBJ4KLlLVeSNGzZg6Oq/oL9n7cAOHs//Qu4eKxFSZJG5i/HJUldDA5JUpdJnOOQnjG37eaJrPeBy8+byHqllcA9DklSF4NDktTF4JAkdTE4JEldDA5JUheDQ5LUxeCQJHUxOCRJXQwOSVIXg0OS1MXgkCR1MTgkSV0MDklSF4NDktTF26prVZrU7dzBW7pr9rnHIUnqYnBIkroYHJKkLp7jkJaZj8vVrHOPQ5LUxeCQJHUxOCRJXQwOSVIXT45Lq4Q/etRScY9DktRlZvY4krwFuBI4Arimqi6fcEmSRuQlyCvLTARHkiOA/wr8IjAP3JlkR1XdO9nKJE2zSR6em5TlCMtZOVT1OmBPVX29qp4C/gjYNOGaJGlVmpXgWAc8NDQ/39okSctsJg5VAdlPWz2nQ7IV2Npm/ybJ/Ye4rhOBbx/iZ2fVahwzrM5xr8Yxwyoad97/zOShjPknR+k0K8ExD5wyNL8e2DvcoaquBq4+3BUl2VVVGw/3e2bJahwzrM5xr8Yxw+oc9zjHPCuHqu4ENiQ5NclLgLcBOyZckyStSjOxx1FVTyf518CtDC7Hva6qvjrhsiRpVZqJ4ACoqluAW5ZhVYd9uGsGrcYxw+oc92ocM6zOcY9tzKmqg/eSJKmZlXMckqQpYXA0Sd6S5P4ke5Jsm3Q9SynJKUluT3Jfkq8meXdrPz7JbUl2t/fjWnuSfLj9Lb6S5IzJjuDQJTkiyZeS3NTmT01yRxvzJ9rFFiQ5ss3vacvnJln34UhybJIbknytbfM3rPRtneS32r/te5J8PMlRK3FbJ7kuySNJ7hlq6962STa3/ruTbO6tw
+DgObc0OQd4DfD2JK+ZbFVL6mngt6vq1cCZwMVtfNuAnVW1AdjZ5mHwd9jQXluBq5a/5CXzbuC+ofn3A1e0MT8GbGntW4DHquqVwBWt36y6Evjjqvop4GcYjH/Fbusk64B3ARur6rUMLqB5GytzW/8h8JZ92rq2bZLjgcuA1zO4K8dli2Ezsqpa9S/gDcCtQ/OXApdOuq4xjvdGBvf9uh84ubWdDNzfpj8CvH2o/zP9ZunF4Pc+O4GzgJsY/JD028Cafbc7gyv23tCm17R+mfQYDmHMLwO+sW/tK3lb8+ydJY5v2+4m4JdW6rYG5oB7DnXbAm8HPjLU/px+o7zc4xhYNbc0abvlpwN3AK+oqocB2vtJrdtK+Xt8CPgd4Edt/gTgO1X1dJsfHtczY27LH2/9Z81pwALwB+0Q3TVJjmEFb+uq+ibwAeBB4GEG2+4uVv62XtS7bQ97mxscAwe9pclKkOTHgE8D76mqJw7UdT9tM/X3SPLLwCNVdddw83661gjLZska4Azgqqo6Hfguzx662J+ZH3c7zLIJOBX4ceAYBodp9rXStvXBvNA4D3v8BsfAQW9pMuuSvJhBaHysqj7Tmr+V5OS2/GTgkda+Ev4ebwTemuQBBndTPovBHsixSRZ/vzQ8rmfG3Ja/HHh0OQteIvPAfFXd0eZvYBAkK3lbvwn4RlUtVNUPgM8AP8fK39aLerftYW9zg2NgRd/SJEmAa4H7quqDQ4t2AItXVGxmcO5jsf2d7aqMM4HHF3eFZ0VVXVpV66tqjsH2/FxVvQO4Hbigddt3zIt/iwta/5n7v9Cq+ivgoSSvak1nA/eygrc1g0NUZyY5uv1bXxzzit7WQ3q37a3Am5Mc1/bW3tzaRjfpEz3T8gLOBf4P8JfAv510PUs8tp9nsCv6FeDL7XUug+O6O4Hd7f341j8MrjL7S+BuBlerTHwchzH+XwBuatOnAV8A9gCfAo5s7Ue1+T1t+WmTrvswxvv3gV1te38WOG6lb2vg3wNfA+4BPgocuRK3NfBxBudxfsBgz2HLoWxb4Dfa+PcAF/XW4S/HJUldPFQlSepicEiSuhgckqQuBockqYvBIUnqYnBIkroYHJKkLgaHJKnL/wfyZ+IhUPrvsAAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# can also write this using the loc accessor\n", "ted.loc[ted.comments < 1000, 'comments'].plot(kind='hist')" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYgAAAD8CAYAAABthzNFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAE/NJREFUeJzt3X+w5XV93/HnS1AQEl2QC93uLrlQd4xMpsr2VteStgaM5Ufqmg40Gqfu0G226ZBGa2biYjM1mWlncCYVZJIhEjEuVEXEH2yBanDFZDpT0WWggILdFSl7s4RdIyxRVETf/eN8Lp4u3909d/d+77k/no+ZM+f7/Xw/33Pen/vd4cX350lVIUnSgV4w7gIkSQuTASFJ6mRASJI6GRCSpE4GhCSpkwEhSepkQEiSOhkQkqROBoQkqdOx4y7gaJxyyik1OTk57jIkaVG5++67v11VE4frt6gDYnJykh07doy7DElaVJL831H6eYhJktTJgJAkdTIgJEmdDAhJUicDQpLUyYCQJHUyICRJnQwISVKnXgMiyYokNyd5KMmDSV6X5OQkdyTZ2d5Pan2T5Ooku5Lcl2Rdn7VJkg6t7zupPwB8rqouTvIi4ATgPcD2qroiyRZgC/Bu4AJgbXu9FrimvS9Ik1tuO+J1H7niojmsRJL60dseRJKXAP8EuA6gqp6pqieBDcDW1m0r8OY2vQG4vga+DKxIsrKv+iRJh9bnIaYzgX3AnyW5J8mHkpwInFZVjwG091Nb/1XA7qH1p1ubJGkM+gyIY4F1wDVVdTbwPQaHkw4mHW31vE7J5iQ7kuzYt2/f3FQqSXqePgNiGpiuqrva/M0MAuPxmUNH7X3vUP81Q+uvBvYc+KFVdW1VTVXV1MTEYZ9WK0k6Qr0FRFX9NbA7ySta03nA14FtwMbWthG4pU1vA97ermZaD+yfORQlSZp/fV/F9O+Bj7YrmB4GLmUQSjcl2QQ8ClzS+t4OXAjsAp5ufSVJY9JrQFTVvcBUx6LzOvoWcFmf9UiSRued1JKkTgaEJKmTASFJ6mRASJI6GRCSpE4GhCSpkwEhSepkQEiSOhkQkqROBoQkqZMBIUnqZEBIkjoZEJKkTgaEJKmTASFJ6mRASJI6GRCSpE4GhCSpkwEhSepkQEiSOhkQkqROBoQkqZMBIUnqZEBIkjoZEJKkTr0GRJJHktyf5N4kO1rbyUnuSLKzvZ/U2pPk6iS7ktyXZF2ftUmSDm0+9iB+qapeXVVTbX4LsL2q1gLb2zzABcDa9toMXDMPtUmSDmIch5g2AFvb9FbgzUPt19fAl4EVSVaOoT5JEv0HRAF/nuTuJJtb22lV9RhAez+1ta8Cdg+tO93aJEljcGzPn39OVe1JcipwR5KHDtE3HW31vE6DoNkMcPrpp89NlZKk5+l1D6Kq9rT3vcBngNcAj88cOmrve1v3aWDN0OqrgT0dn3ltVU1V1dTExESf5UvSstZbQCQ5McnPzkwDbwQeALYBG1u3jcAtbXob8PZ2NdN6YP/MoShJ0vzr8xDTacBnksx8z8eq6nNJvg
rclGQT8ChwSet/O3AhsAt4Gri0x9qY3HJbnx8vSYtebwFRVQ8Dr+po/xvgvI72Ai7rqx5J0ux4J7UkqZMBIUnqZEBIkjoZEJKkTgaEJKmTASFJ6mRASJI6GRCSpE4GhCSpkwEhSepkQEiSOhkQkqROBoQkqZMBIUnq1PdPjqrD0fwWxSNXXDSHlUjSwbkHIUnqZEBIkjoZEJKkTgaEJKmTASFJ6mRASJI6GRCSpE4GhCSpkwEhSepkQEiSOvUeEEmOSXJPklvb/BlJ7kqyM8knkryotR/X5ne15ZN91yZJOrj52IN4B/Dg0Pz7gCurai3wBLCptW8CnqiqlwNXtn6SpDHpNSCSrAYuAj7U5gOcC9zcumwF3tymN7R52vLzWn9J0hj0vQdxFfC7wE/a/MuAJ6vq2TY/Daxq06uA3QBt+f7WX5I0Br0FRJJfAfZW1d3DzR1da4Rlw5+7OcmOJDv27ds3B5VKkrr0uQdxDvCmJI8ANzI4tHQVsCLJzO9QrAb2tOlpYA1AW/5S4DsHfmhVXVtVU1U1NTEx0WP5krS89RYQVXV5Va2uqkngLcAXq+ptwJ3Axa3bRuCWNr2tzdOWf7GqnrcHIUmaHyMFRJJfmMPvfDfwriS7GJxjuK61Xwe8rLW/C9gyh98pSZqlUX9y9E/a/QofAT5WVU/O5kuq6kvAl9r0w8BrOvr8ALhkNp8rSerPSHsQVfWLwNsYnCPYkeRjSX6518okSWM18jmIqtoJ/B6DQ0T/FLg6yUNJ/kVfxUmSxmfUcxB/P8mVDO6IPhf451X1yjZ9ZY/1SZLGZNRzEH8E/Cnwnqr6/kxjVe1J8nu9VCZJGqtRA+JC4PtV9WOAJC8Ajq+qp6vqht6qkySNzajnIL4AvHho/oTWJklaokYNiOOr6rszM236hH5KkiQtBKMGxPeSrJuZSfIPgO8for8kaZEb9RzEO4FPJpl5btJK4Nf6KUmStBCMFBBV9dUkPw+8gsFTVx+qqh/1WpkkaaxG3YMA+IfAZFvn7CRU1fW9VCVJGruRAiLJDcDfA+4FftyaCzAgJGmJGnUPYgo4y8dvS9LyMepVTA8Af6fPQiRJC8uoexCnAF9P8hXghzONVfWmXqqSJI3dqAHx+30WIUlaeEa9zPUvkvwcsLaqvpDkBOCYfkuTJI3TqI/7/g3gZuCDrWkV8Nm+ipIkjd+oJ6kvA84BnoLnfjzo1L6KkiSN36gB8cOqemZmJsmxDO6DkCQtUaMGxF8keQ/w4vZb1J8E/nt/ZUmSxm3UgNgC7APuB/4tcDuD36eWJC1Ro17F9BMGPzn6p/2WI0laKEZ9FtO36DjnUFVnznlFkqQFYTbPYppxPHAJcPLclyNJWihGOgdRVX8z9PqrqroKOPdQ6yQ5PslXkvzvJF9L8get/YwkdyXZmeQTSV7U2o9r87va8smjHJsk6SiMeqPcuqHXVJLfBH72MKv9EDi3ql4FvBo4P8l64H3AlVW1FngC2NT6bwKeqKqXA1e2fpKkMRn1ENN/HZp+FngE+JeHWqE9Gvy7bfaF7VUM9jx+vbVvZfCcp2uADfz0mU83A3+UJD5iXJLGY9SrmH7pSD48yTHA3cDLgT8Gvgk8WVXPti7TDB7bQXvf3b7v2ST7gZcB3z6S75YkHZ1Rr2J616GWV9X7D9L+Y+DVSVYAnwFe2dVt5msOsWy4ls3AZoDTTz/9UGVJko7CqDfKTQH/jsH/5a8CfhM4i8F5iMOdi6CqngS+BKwHVrRHdQCsBva06WlgDTz3KI+XAt/p+Kxrq2qqqqYmJiZGLF+SNFuz+cGgdVX1twBJfh/4ZFX9m4OtkGQC+FFVPZnkxcAbGJx4vhO4GLgR2Ajc0lbZ1ub/V1v+Rc8/SNL4jBoQpwPPDM0/A0weZp2VwNZ2HuIFwE1VdWuSrwM3JvnPwD3Ada3/dcANSXYx2HN4y4i1SZJ6MGpA3AB8JclnGJwX+FXg+kOtUFX3AWd3tD8MvKaj/QcMbs
CTJC0Ao17F9F+S/A/gH7emS6vqnv7KkiSN26gnqQFOAJ6qqg8A00nO6KkmSdICMOqd1O8F3g1c3ppeCPy3voqSJI3fqHsQvwq8CfgeQFXtYYTLWyVJi9eoAfFMu+S0AJKc2F9JkqSFYNSAuCnJBxnc5PYbwBfwx4MkaUkb9SqmP2y/Rf0U8ArgP1XVHb1WJkkaq8MGRLvR7fNV9QbAUJCkZeKwAVFVP07ydJKXVtX++ShKBze55bYjXveRKy6aw0okLXWj3kn9A+D+JHfQrmQCqKrf7qUqSdLYjRoQt7WXJGmZOGRAJDm9qh6tqq3zVZAkaWE43GWun52ZSPKpnmuRJC0ghwuI4V95O7PPQiRJC8vhAqIOMi1JWuIOd5L6VUmeYrAn8eI2TZuvqnpJr9VJksbmkAFRVcfMVyGSpIVlNr8HIUlaRgwISVInA0KS1MmAkCR1MiAkSZ0MCElSJwNCktTJgJAkdeotIJKsSXJnkgeTfC3JO1r7yUnuSLKzvZ/U2pPk6iS7ktyXZF1ftUmSDq/PPYhngd+pqlcC64HLkpwFbAG2V9VaYHubB7gAWNtem4FreqxNknQYo/5g0KxV1WPAY236b5M8CKwCNgCvb922Al8C3t3ar6+qAr6cZEWSle1zNAf8uVJJszEv5yCSTAJnA3cBp838R7+9n9q6rQJ2D6023dokSWPQe0Ak+RngU8A7q+qpQ3XtaHveI8aTbE6yI8mOffv2zVWZkqQD9BoQSV7IIBw+WlWfbs2PJ1nZlq8E9rb2aWDN0OqrgT0HfmZVXVtVU1U1NTEx0V/xkrTM9XkVU4DrgAer6v1Di7YBG9v0RuCWofa3t6uZ1gP7Pf8gSePT20lq4BzgXwH3J7m3tb0HuAK4Kckm4FHgkrbsduBCYBfwNHBpj7VJkg6jz6uY/ifd5xUAzuvoX8BlfdUjSZod76SWJHUyICRJnQwISVInA0KS1MmAkCR1MiAkSZ0MCElSJwNCktTJgJAkdTIgJEmdDAhJUicDQpLUyYCQJHUyICRJnQwISVInA0KS1MmAkCR1MiAkSZ0MCElSJwNCktTp2HEXoMVhcsttR7X+I1dcNEeVSJov7kFIkjoZEJKkTgaEJKmTASFJ6tRbQCT5cJK9SR4Yajs5yR1Jdrb3k1p7klydZFeS+5Ks66suSdJo+tyD+Ahw/gFtW4DtVbUW2N7mAS4A1rbXZuCaHuuSJI2gt4Coqr8EvnNA8wZga5veCrx5qP36GvgysCLJyr5qkyQd3nyfgzitqh4DaO+ntvZVwO6hftOtTZI0JgvlJHU62qqzY7I5yY4kO/bt29dzWZK0fM13QDw+c+iove9t7dPAmqF+q4E9XR9QVddW1VRVTU1MTPRarCQtZ/P9qI1twEbgivZ+y1D7byW5EXgtsH/mUJSWhqN5VIeP6ZDGo7eASPJx4PXAKUmmgfcyCIabkmwCHgUuad1vBy4EdgFPA5f2VZckaTS9BURVvfUgi87r6FvAZX3VIkmavYVyklqStMAYEJKkTgaEJKmTASFJ6mRASJI6GRCSpE4GhCSp03zfSS3NmndhS+PhHoQkqZMBIUnqZEBIkjoZEJKkTgaEJKmTASFJ6uRlrlrSvERWOnLuQUiSOhkQkqROHmKSDsLDU1ruDAhpATKctBB4iEmS1MmAkCR18hCT1IOjOUQkLRTuQUiSOrkHIS0x49p78eT40mNASFoQvHJr4VlQAZHkfOADwDHAh6rqijGXJGkRGFe4LPVQWzABkeQY4I+BXwamga8m2VZVXx9vZZI09472UOB8BMyCCQjgNcCuqnoYIMmNwAbAgJAWgcV65dZirXs+LKSrmFYBu4fmp1ubJGkMFtIeRDra6nmdks3A5jb73STfOMLvOwX49hGuu1gtxzHD8hz3chwzLKNx533PTR7JmH9ulE4LKSCmgTVD86uBPQd2qqprgWuP9suS7KiqqaP9nMVkOY4Zlue4l+OYYXmOu88xL6RDTF8F1i
Y5I8mLgLcA28ZckyQtWwtmD6Kqnk3yW8DnGVzm+uGq+tqYy5KkZWvBBARAVd0O3D5PX3fUh6kWoeU4Zlie416OY4blOe7expyq550HliRpQZ2DkCQtIMsuIJKcn+QbSXYl2TLueuZKkjVJ7kzyYJKvJXlHaz85yR1Jdrb3k1p7klzd/g73JVk33hEcnSTHJLknya1t/owkd7Vxf6Jd+ECS49r8rrZ8cpx1H6kkK5LcnOShts1ftxy2dZL/0P59P5Dk40mOX4rbOsmHk+xN8sBQ26y3b5KNrf/OJBtnW8eyCoihx3lcAJwFvDXJWeOtas48C/xOVb0SWA9c1sa2BdheVWuB7W0eBn+Dte21Gbhm/kueU+8AHhyafx9wZRv3E8Cm1r4JeKKqXg5c2fotRh8APldVPw+8isHYl/S2TrIK+G1gqqp+gcHFLG9haW7rjwDnH9A2q+2b5GTgvcBrGTyp4r0zoTKyqlo2L+B1wOeH5i8HLh93XT2N9RYGz7X6BrCyta0EvtGmPwi8daj/c/0W24vBPTPbgXOBWxncdPlt4NgDtzuDq+Re16aPbf0y7jHMcrwvAb51YN1LfVvz06ctnNy23a3AP1uq2xqYBB440u0LvBX44FD7/9dvlNey2oNgmTzOo+1Knw3cBZxWVY8BtPdTW7el9Le4Cvhd4Cdt/mXAk1X1bJsfHttz427L97f+i8mZwD7gz9phtQ8lOZElvq2r6q+APwQeBR5jsO3uZmlv62Gz3b5Hvd2XW0CM9DiPxSzJzwCfAt5ZVU8dqmtH26L7WyT5FWBvVd093NzRtUZYtlgcC6wDrqmqs4Hv8dPDDV2Wwphph0c2AGcAfxc4kcHhlQMtpW09ioON86jHv9wCYqTHeSxWSV7IIBw+WlWfbs2PJ1nZlq8E9rb2pfK3OAd4U5JHgBsZHGa6CliRZOY+n+GxPTfutvylwHfms+A5MA1MV9Vdbf5mBoGx1Lf1G4BvVdW+qvoR8GngH7G0t/Ww2W7fo97uyy0gluzjPJIEuA54sKreP7RoGzBz9cJGBucmZtrf3q6AWA/sn9l9XUyq6vKqWl1Vkwy25xer6m3AncDFrduB4575e1zc+i+q/6usqr8Gdid5RWs6j8Fj8Zf0tmZwaGl9khPav/eZcS/ZbX2A2W7fzwNvTHJS2/t6Y2sb3bhPxIzhxM+FwP8Bvgn8x3HXM4fj+kUGu4/3Afe214UMjrluB3a295Nb/zC4ouubwP0MrgwZ+ziO8m/weuDWNn0m8BVgF/BJ4LjWfnyb39WWnznuuo9wrK8GdrTt/VngpOWwrYE/AB4CHgBuAI5bitsa+DiD8yw/YrAnsOlIti/wr9v4dwGXzrYO76SWJHVaboeYJEkjMiAkSZ0MCElSJwNCktTJgJAkdTIgJEmdDAhJUicDQpLU6f8BGHUhs1TTeTsAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# increase the number of bins to see more detail\n", "ted.loc[ted.comments < 1000, 'comments'].plot(kind='hist', bins=20)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYAAAAD8CAYAAAB+UHOxAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAFC5JREFUeJzt3X+QXWV9x/H3d+9uspCQEGClMZu4dEhbCsUf3fFH7bSOVCJaCH/Ujo5TgjKmOIJVmdbQacehnQDSDladihMlGjtW4mBboNImKcZqp5WaIBUllWT4kayhsJBLJLkh2WW//WPPxl2yJLn3ZvfuzXm/Zu6cc5773Hu+m8nu5z7PPT8iM5EklU9HqwuQJLWGASBJJWUASFJJGQCSVFIGgCSVlAEgSSVlAEhSSRkAklRSBoAklVRnqws4mrPOOiv7+vpaXYYktZWtW7c+k5k9x+o3owOgr6+PLVu2tLoMSWorEfHE8fRzCkiSSsoAkKSSMgAkqaSOGQARsTYino6IH41rOyMiNkXE9mK5oGiPiPhMROyIiB9GxOvGvWZF0X97RKyYmh9HknS8jmcE8GXg7S9pWwXcl5lLgfuKbYBLgKXFYyVwG4wGBvAJ4A3A64FPjIWGJKk1jhkAmfkdYM9LmpcD64r1dcDl49q/kqO+B5weEQuBZcCmzNyTmVVgE0eGitQWli1bRkdHBxFBR0cHy5Yta3VJUkMa/Q7g7Mx8EqBYvqJoXwTsGtdvoGh7uXaprSxbtoyNGzdy9dVX89xzz3H11VezceNGQ0Bt6USfBxCTtOVR2o98g4iVjE4fsWTJkhNXmXQCbNq0iQ9+8IN87nOfAzi8/PznP9/KsqSGNDoCeKqY2qFYPl20DwCLx/XrBXYfpf0ImbkmM/szs7+n55gnsknTKjO56aabJrTddNNNeG9ttaNGA+BuYOxInhXAXeParyiOBnojsLeYItoAXBwRC4ovfy8u2qS2EhFcf/31E9quv/56IiYb5Eoz2zGngCLia8BbgLMiYoDRo3luBr4eEVcBO4F3Fd3vBd4B7ABqwPsAMnNPRPwl8P2i319k5ku/WJZmvLe97W3cdttt3HHHHVSrVRYsWEC1WuXiiy9udWlS3Y4ZAJn5npd56qJJ+ibwoZd5n7XA2rqqk2aYK6+8ku985ztUq1UAqtUq3d3dXHnlla0tTGqAZwJLdVi9ejXXXXcd559/Ph0dHZx//vlcd911rF69utWlSXWb0VcDlWaahx9+mKeffpo5c+YAsH//ftasWcMzzzzT4sqk+jkCkOpQqVQ4cOAAwOEjfw4cOEClUmllWVJDDACpDsPDw9RqNa699lr27dvHtddeS61WY3h4uNWlSXWLmXz8cn9/f3pDGM0kEcHSpUvZsWMHmUlEcO6557J9+3bPBdCMERFbM7P/WP0cAUh12r59+4RLQWzfvr3VJUkNcQQg1SEi6OrqAmBoaGjC+kz+XVK5HO8IwKOApDoNDQ1Nui61G6eAJKmkDACpAZdddhmDg4NcdtllrS5FaphTQFKdzjvvP
DZs2EBPTw+zZ8/mvPPOY9u2ba0uS6qbIwCpTk888QQLFy6ko6ODhQsX8sQTT7S6JKkhjgCkOkQEtVqNnTt3MjIycnjp5aDVjhwBSHU49dRTARgZGZmwHGuX2okjAKkO+/fv55RTTmF4ePjweQCdnZ3s37+/1aVJdXMEINXphhtu4NChQ2Qmhw4d4oYbbmh1SVJDDACpTrfeeiubN29maGiIzZs3c+utt7a6JKkhTgFJdejt7eX555/n/e9/Pzt37mTJkiUcOHCA3t7eVpcm1c0RgFSHW265hZGREX76059OWN5yyy2tLk2qmwEg1am7u5tFixYRESxatIju7u5WlyQ1xACQ6rB69WrWr1/PY489xsjICI899hjr16/3nsBqSwaAVIdt27YxMDDABRdcQKVS4YILLmBgYMBLQagteT8AqQ6LFy/mqaeemnAZ6K6uLs4++2x27drVwsqkn/OOYNIUGBwcZGhoiLlz5xIRzJ07l6GhIQYHB1tdmlQ3DwOV6nDw4EEign379gGwb98+IoKDBw+2uDKpfo4ApDq9dNp0Jk+jSkdjAEgN8IYwOhk4BSQ14Lvf/S49PT0sWLCg1aVIDXMEIDWgWq1OWErtyACQGjB37twJS6kdGQBSHebMmQMw4Sig8e1SO2kqACLioxHx44j4UUR8LSK6I+KciLg/IrZHxPqImFX0nV1s7yie7zsRP4A0nWq1Gl1dXRPaurq6qNVqLapIalzDARARi4APA/2ZeQFQAd4NfBL4VGYuBarAVcVLrgKqmXku8Kmin9RWKpUKs2bNoq+vj46ODvr6+pg1axaVSqXVpUl1a3YKqBM4JSI6gVOBJ4G3AncWz68DLi/WlxfbFM9fFN5JW21meHiYWq3Grl27GBkZYdeuXdRqNYaHh1tdmlS3hgMgM38K/DWwk9E//HuBrcBzmTn22zAALCrWFwG7itcOF/3PbHT/UqtkJmOfXSLCE8HUtpqZAlrA6Kf6c4BXAnOASybpOvbbMdmn/SN+cyJiZURsiYgtXl9FM9XYH33/+KudNTMF9DvAY5k5mJlDwD8AvwGcXkwJAfQCu4v1AWAxQPH8fGDPS980M9dkZn9m9vf09DRRnjR15s2bR0Qwb968VpciNayZANgJvDEiTi3m8i8CHgY2A79X9FkB3FWs311sUzz/rfTjk9pQR0cH1WqVzKRardLR4dHUak/NfAdwP6Nf5j4APFS81xrg48DHImIHo3P8txcvuR04s2j/GLCqibqllhkZGTnqttQuvCGMVIejHbg2k3+XVC7eEEaaQmMXgfNicGpnBoBUp/nz50+4GNz8+fNbXJHUGANAqtPevXsn3A9g7969rS5Jaoj3A5AacM8999DT03PU7wSkmc4RgFSnSqUy4UQwrwOkdmUASHXo7Oyku7t7wsXguru76ex0MK32YwBIdZg3bx61Wo0XXngBgBdeeIFareYZwWpLBoBUh2q1yty5c3n22WcZGRnh2WefZe7cud4aUm3JAJDqMGvWLBYuXHj48s/Dw8MsXLiQWbNmtbgyqX4GgFSHgwcP8sgjj3DppZcyODjIpZdeyiOPPMLBgwdbXZpUN7+5kurU19fHhg0b6OnpYfbs2fT19fH444+3uiypbo4ApDoNDAxw4403sn//fm688UYGBgZaXZLUEANAqtOFF17I2rVrOe2001i7di0XXnhhq0uSGuIUkFSnBx544PBF4Hbv3u0RQGpbjgCkOvT29tLZ2Um1WmVkZIRqtUpnZye9vb2tLk2qmwEg1aFWqzE8PHz48g+VSoXh4WFqtVqLK5PqZwBIddizZ88RF4CLCPbsOeL21tKMZwBIdapUKofvA9zR0eHF4NS2/BJYqtPw8PDhAHjxxRe9J7DaliMASSopA0BqwNinfj/9q50ZAFIDxn8HILUr//dKDXAEoJOBASBJJWUASFJJGQCSVFIGgCSVlAEgSSVlAEhSSRkAklRSBoAklVRTARARp0fEnRHxvxGxL
SLeFBFnRMSmiNheLBcUfSMiPhMROyLihxHxuhPzI0iSGtHsCODTwL9m5q8Arwa2AauA+zJzKXBfsQ1wCbC0eKwEbmty35KkJjQcABExD/gt4HaAzDyUmc8By4F1Rbd1wOXF+nLgKznqe8DpEbGw4colSU1pZgTwi8Ag8KWI+EFEfDEi5gBnZ+aTAMXyFUX/RcCuca8fKNokSS3QTAB0Aq8DbsvM1wL7+fl0z2RikrY8olPEyojYEhFbBgcHmyhPknQ0zQTAADCQmfcX23cyGghPjU3tFMunx/VfPO71vcDul75pZq7JzP7M7O/p6WmiPEnS0TQcAJn5f8CuiPjlouki4GHgbmBF0bYCuKtYvxu4ojga6I3A3rGpIqndLFiwYMJSakfN3hP4WuCrETELeBR4H6Oh8vWIuArYCbyr6Hsv8A5gB1Ar+kptad++fROWUjtqKgAy80Ggf5KnLpqkbwIfamZ/0kwxNDQ0YSm1I88ElqSSMgAkqaQMAEkqKQNAasCCBQuICI8CUltr9iggqZSq1eqEpdSOHAFIUkkZAJJUUgaAJJWUASBJJWUASFJJGQCSVFIGgCSVlAEgSSVlAEhSSRkAklRSBoAklZQBIEklZQBIUkkZAJJUUgaAJJWUASBJJWUASFJJGQCSVFIGgCSVlAEgSSVlAEgNiIgJS6kddba6AGkmqPcPeWZOWB7ve4zvL7WaIwCJ0T/Mx/O45ppriAgqlQoAlUqFiOCaa645rtdLM4kjAKkOn/3sZwH4whe+wIsvvkhnZycf+MAHDrdL7SRm8qeS/v7+3LJlS6vLkCbVt+qbPH7zO1tdhnSEiNiamf3H6ucUkCSVVNMBEBGViPhBRPxzsX1ORNwfEdsjYn1EzCraZxfbO4rn+5rdtySpcSdiBPBHwLZx258EPpWZS4EqcFXRfhVQzcxzgU8V/SRJLdJUAEREL/BO4IvFdgBvBe4suqwDLi/WlxfbFM9fFB5ELUkt0+wI4G+APwFGiu0zgecyc7jYHgAWFeuLgF0AxfN7i/4TRMTKiNgSEVsGBwebLE+S9HIaDoCI+F3g6czcOr55kq55HM/9vCFzTWb2Z2Z/T09Po+VJko6hmfMA3gxcFhHvALqBeYyOCE6PiM7iU34vsLvoPwAsBgYiohOYD+xpYv+SpCY0PALIzOszszcz+4B3A9/KzPcCm4HfK7qtAO4q1u8utime/1bO5JMQJOkkNxXnAXwc+FhE7GB0jv/2ov124Myi/WPAqinYtyTpOJ2QS0Fk5reBbxfrjwKvn6TPC8C7TsT+JEnN80xgSSopA0CSSsoAkKSSMgAkqaQMAEkqKQNAkkrKAJCkkjIAJKmkDABJKikDQJJKygCQpJIyACSppAwASSopA0CSSsoAkKSSMgAkqaQMAEkqKQNAkkrKAJCkkjIAJKmkDABJKikDQJJKygCQpJIyACSppAwASSopA0CSSsoAkKSSMgAkqaQMAEkqKQNAkkrKAJCkkups9IURsRj4CvALwAiwJjM/HRFnAOuBPuBx4PczsxoRAXwaeAdQA67MzAeaK1+a3Ktv2MjeA0NTvp++Vd+c0veff0oX//OJi6d0HyqvhgMAGAauy8wHIuI0YGtEbAKuBO7LzJsjYhWwCvg4cAmwtHi8AbitWEon3N4DQzx+8ztbXUbTpjpgVG4NTwFl5pNjn+Az83lgG7AIWA6sK7qtAy4v1pcDX8lR3wNOj4iFDVcuSWrKCfkOICL6gNcC9wNnZ+aTMBoSwCuKbouAXeNeNlC0vfS9VkbElojYMjg4eCLKkyRNoukAiIi5wDeAj2Tmz47WdZK2PKIhc01m9mdmf09PT7PlSZJeRlMBEBFdjP7x/2pm/kPR/NTY1E6xfLpoHwAWj3t5L7C7mf1LkhrXcAAUR/XcDmzLzFvHPXU3sKJYXwHcNa79ihj1RmDv2FSRJGn6NXMU0JuBPwAeiogHi7Y/BW4Gvh4RVwE7gXcVz93L6CGgOxg9DPR9T
exbktSkhgMgM/+Dyef1AS6apH8CH2p0f5KkE8szgSWppAwASSopA0CSSsoAkKSSMgAkqaQMAEkqKQNAkkrKAJCkkjIAJKmkDABJKikDQJJKqpmLwUkz1mnnreLX1q1qdRlNO+08gPa/taVmJgNAJ6Xnt93sPYGlY3AKSJJKygCQpJIyACSppAwASSopA0CSSsoAkKSSMgAkqaQMAEkqKQNAkkrKM4F10joZzqKdf0pXq0vQScwA0ElpOi4D0bfqmyfF5SZUXk4BSVJJGQCSVFIGgCSVlAEgSSVlAEhSSRkAklRSBoAkldS0B0BEvD0ifhIROyKi/W/aKkltaloDICIqwN8ClwC/CrwnIn51OmuQJI2a7hHA64EdmfloZh4C7gCWT3MNkiSm/1IQi4Bd47YHgDeM7xARK4GVAEuWLJm+ylRqEdHY6z5ZX//MbGg/0lSY7hHAZL9lE34jMnNNZvZnZn9PT880laWyy8xpeUgzyXQHwACweNx2L7B7mmuQJDH9AfB9YGlEnBMRs4B3A3dPcw2SJKb5O4DMHI6Ia4ANQAVYm5k/ns4aJEmjpv1+AJl5L3DvdO9XkjSRZwJLUkkZAJJUUgaAJJWUASBJJRUz+eSUiBgEnmh1HdLLOAt4ptVFSJN4VWYe80zaGR0A0kwWEVsys7/VdUiNcgpIkkrKAJCkkjIApMataXUBUjP8DkCSSsoRgCSVlAEgtVBEfCQiTm11HSonp4CkFoqIx4H+zPR8Ak07RwA6KUXEFRHxw4j4n4j4u4h4VUTcV7TdFxFLin5fjojbImJzRDwaEb8dEWsjYltEfHnc++2LiE9GxNaI+LeIeH1EfLt4zWVFn0pE/FVEfL/Yzx8W7W8p+t4ZEf8bEV+NUR8GXglsLvZfKer5UUQ8FBEfbcE/ncpkum6F58PHdD2A84GfAGcV22cA9wAriu33A/9UrH8ZuIPR25UuB34G/BqjH462Aq8p+iVwSbH+j8BGoAt4NfBg0b4S+LNifTawBTgHeAuwl9E74HUA/wX8ZtHv8XF1/jqwadzPcXqr/y19nNwPRwA6Gb0VuDOLaZXM3AO8Cfj74vm/A35zXP97MjOBh4CnMvOhzBwBfgz0FX0OAf9arD8E/HtmDhXrY30uBq6IiAeB+4EzgaXFc/+dmQPF+z447jXjPQr8YkR8NiLezmgYSVPGANDJKBj9xH40458/WCxHxq2PbY/dNGmoCIkJ/Yo/6GN9Arg2M19TPM7JzI0v2QfAi0xyM6bMrDI6ovg28CHgi8f4GaSmGAA6Gd0H/H5EnAkQEWcA/8noPagB3gv8xxTsdwPwwYjoKvb7SxEx5xiveR44reh/FtCRmd8A/hx43RTUKB027beElKZaZv44IlYD/x4RLwI/AD4MrI2IPwYGgfdNwa6/yOjUzgMREcV+Lj/Ga9YA/xIRTwIfAb4UEWMfzK6fghqlwzwMVJJKyikgSSopA0CSSsoAkKSSMgAkqaQMAEkqKQNAkkrKAJCkkjIAJKmk/h/OplPavCQBBQAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# boxplot can also show distributions, but it's far less useful for concentrated distributions because of outliers\n", "ted.loc[ted.comments < 1000, 'comments'].plot(kind='box')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lessons:\n", "\n", "1. Choose your plot type based on the question you are answering and the data type(s) you are working with\n", "2. Use pandas one-liners to iterate through plots quickly\n", "3. Try modifying the plot defaults\n", "4. Creating plots involves decision-making" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Plot the number of talks that took place each year\n", "\n", "Bonus exercise: calculate the average delay between filming and publishing" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2012 TEDxBoulder\n", "1307 TEDxUCL\n", "144 TEDGlobal 2007\n", "1739 TED2014\n", "1529 TEDGlobal 2013\n", "1181 TEDxWomen 2011\n", "2150 TEDYouth 2015\n", "1719 TED2014\n", "64 TED2007\n", "1178 TEDxCambridge\n", "Name: event, dtype: object" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# event column does not always include the year\n", "ted.event.sample(10)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1140825600\n", "1 1140825600\n", "2 1140739200\n", "3 1140912000\n", "4 1140566400\n", "Name: film_date, dtype: int64" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# dataset documentation for film_date says \"Unix timestamp of the filming\"\n", "ted.film_date.head()" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1970-01-01 00:00:01.140825600\n", "1 1970-01-01 00:00:01.140825600\n", "2 1970-01-01 00:00:01.140739200\n", "3 
1970-01-01 00:00:01.140912000\n", "4 1970-01-01 00:00:01.140566400\n", "Name: film_date, dtype: datetime64[ns]" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# results don't look right\n", "pd.to_datetime(ted.film_date).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[pandas documentation for `to_datetime`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 2006-02-25\n", "1 2006-02-25\n", "2 2006-02-24\n", "3 2006-02-26\n", "4 2006-02-22\n", "Name: film_date, dtype: datetime64[ns]" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# now the results look right\n", "pd.to_datetime(ted.film_date, unit='s').head()" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "ted['film_datetime'] = pd.to_datetime(ted.film_date, unit='s')" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
eventfilm_datetime
831TEDWomen 20102010-12-08
2464TED20172017-04-24
2392TEDxBeaconStreet2016-11-19
1307TEDxUCL2012-06-03
2234TED20162016-02-17
\n", "
" ], "text/plain": [ " event film_datetime\n", "831 TEDWomen 2010 2010-12-08\n", "2464 TED2017 2017-04-24\n", "2392 TEDxBeaconStreet 2016-11-19\n", "1307 TEDxUCL 2012-06-03\n", "2234 TED2016 2016-02-17" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# verify that event name matches film_datetime for a random sample\n", "ted[['event', 'film_datetime']].sample(5)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "comments int64\n", "description object\n", "duration int64\n", "event object\n", "film_date int64\n", "languages int64\n", "main_speaker object\n", "name object\n", "num_speaker int64\n", "published_date int64\n", "ratings object\n", "related_talks object\n", "speaker_occupation object\n", "tags object\n", "title object\n", "url object\n", "views int64\n", "comments_per_view float64\n", "views_per_comment float64\n", "film_datetime datetime64[ns]\n", "dtype: object" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# new column uses the datetime data type (this was an automatic conversion)\n", "ted.dtypes" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 2006\n", "1 2006\n", "2 2006\n", "3 2006\n", "4 2006\n", "Name: film_datetime, dtype: int64" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# datetime columns have convenient attributes under the dt namespace\n", "ted.film_datetime.dt.year.head()" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 ted2006\n", "1 ted2006\n", "2 ted2006\n", "3 ted2006\n", "4 ted2006\n", "Name: event, dtype: object" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# similar to string methods under the str namespace\n", "ted.event.str.lower().head()" ] }, { "cell_type": 
"code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2013 270\n", "2011 270\n", "2010 267\n", "2012 267\n", "2016 246\n", "2015 239\n", "2014 237\n", "2009 232\n", "2007 114\n", "2017 98\n", "2008 84\n", "2005 66\n", "2006 50\n", "2003 33\n", "2004 33\n", "2002 27\n", "1998 6\n", "2001 5\n", "1983 1\n", "1991 1\n", "1994 1\n", "1990 1\n", "1984 1\n", "1972 1\n", "Name: film_datetime, dtype: int64" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# count the number of talks each year using value_counts()\n", "ted.film_datetime.dt.year.value_counts()" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# points are plotted and connected in the order you give them to pandas\n", "ted.film_datetime.dt.year.value_counts().plot()" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# need to sort the index before plotting\n", "ted.film_datetime.dt.year.value_counts().sort_index().plot()" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "Timestamp('2017-08-27 00:00:00')" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# we only have partial data for 2017\n", "ted.film_datetime.max()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lessons:\n", "\n", "1. Read the documentation\n", "2. Use the datetime data type for dates and times\n", "3. Check your work as you go\n", "4. Consider excluding data if it might not be relevant" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. What were the \"best\" events in TED history to attend?" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "TED2014 84\n", "TED2009 83\n", "TED2013 77\n", "TED2016 77\n", "TED2015 75\n", "Name: event, dtype: int64" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# count the number of talks (great if you value variety, but they may not be great talks)\n", "ted.event.value_counts().head()" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "event\n", "AORN Congress 149818.0\n", "Arbejdsglaede Live 971594.0\n", "BBC TV 521974.0\n", "Bowery Poetry Club 676741.0\n", "Business Innovation Factory 304086.0\n", "Name: views, dtype: float64" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# use views as a proxy for \"quality of talk\"\n", "ted.groupby('event').views.mean().head()" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "event\n", "TEDxNorrkoping 6569493.0\n", 
"TEDxCreativeCoast 8444981.0\n", "TEDxBloomington 9484259.5\n", "TEDxHouston 16140250.5\n", "TEDxPuget Sound 34309432.0\n", "Name: views, dtype: float64" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# find the largest values, but we don't know how many talks are being averaged\n", "ted.groupby('event').views.mean().sort_values().tail()" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countmean
event
TEDxNorrkoping16569493.0
TEDxCreativeCoast18444981.0
TEDxBloomington29484259.5
TEDxHouston216140250.5
TEDxPuget Sound134309432.0
\n", "
" ], "text/plain": [ " count mean\n", "event \n", "TEDxNorrkoping 1 6569493.0\n", "TEDxCreativeCoast 1 8444981.0\n", "TEDxBloomington 2 9484259.5\n", "TEDxHouston 2 16140250.5\n", "TEDxPuget Sound 1 34309432.0" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# show the number of talks along with the mean (events with the highest means had only 1 or 2 talks)\n", "ted.groupby('event').views.agg(['count', 'mean']).sort_values('mean').tail()" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countmeansum
event
TED2006453.274345e+06147345533
TED2015752.011017e+06150826305
TEDGlobal 2013662.584163e+06170554736
TED2014842.072874e+06174121423
TED2013772.302700e+06177307937
\n", "
" ], "text/plain": [ " count mean sum\n", "event \n", "TED2006 45 3.274345e+06 147345533\n", "TED2015 75 2.011017e+06 150826305\n", "TEDGlobal 2013 66 2.584163e+06 170554736\n", "TED2014 84 2.072874e+06 174121423\n", "TED2013 77 2.302700e+06 177307937" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# calculate the total views per event\n", "ted.groupby('event').views.agg(['count', 'mean', 'sum']).sort_values('sum').tail()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lessons:\n", "\n", "1. Think creatively for how you can use the data you have to answer your question\n", "2. Watch out for small sample sizes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Unpack the ratings data" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 [{'id': 7, 'name': 'Funny', 'count': 19645}, {...\n", "1 [{'id': 7, 'name': 'Funny', 'count': 544}, {'i...\n", "2 [{'id': 7, 'name': 'Funny', 'count': 964}, {'i...\n", "3 [{'id': 3, 'name': 'Courageous', 'count': 760}...\n", "4 [{'id': 9, 'name': 'Ingenious', 'count': 3202}...\n", "Name: ratings, dtype: object" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# previously, users could tag talks on the TED website (funny, inspiring, confusing, etc.)\n", "ted.ratings.head()" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\"[{'id': 7, 'name': 'Funny', 'count': 19645}, {'id': 1, 'name': 'Beautiful', 'count': 4573}, {'id': 9, 'name': 'Ingenious', 'count': 6073}, {'id': 3, 'name': 'Courageous', 'count': 3253}, {'id': 11, 'name': 'Longwinded', 'count': 387}, {'id': 2, 'name': 'Confusing', 'count': 242}, {'id': 8, 'name': 'Informative', 'count': 7346}, {'id': 22, 'name': 'Fascinating', 'count': 10581}, {'id': 21, 'name': 'Unconvincing', 'count': 300}, {'id': 24, 'name': 'Persuasive', 'count': 10704}, 
{'id': 23, 'name': 'Jaw-dropping', 'count': 4439}, {'id': 25, 'name': 'OK', 'count': 1174}, {'id': 26, 'name': 'Obnoxious', 'count': 209}, {'id': 10, 'name': 'Inspiring', 'count': 24924}]\"" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# two ways to examine the ratings data for the first talk\n", "ted.loc[0, 'ratings']\n", "ted.ratings[0]" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "str" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# this is a string not a list\n", "type(ted.ratings[0])" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "# convert this into something useful using Python's ast module (Abstract Syntax Tree)\n", "import ast" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 2, 3]" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# literal_eval() allows you to evaluate a string containing a Python literal or container\n", "ast.literal_eval('[1, 2, 3]')" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "list" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# if you have a string representation of something, you can retrieve what it actually represents\n", "type(ast.literal_eval('[1, 2, 3]'))" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'id': 7, 'name': 'Funny', 'count': 19645},\n", " {'id': 1, 'name': 'Beautiful', 'count': 4573},\n", " {'id': 9, 'name': 'Ingenious', 'count': 6073},\n", " {'id': 3, 'name': 'Courageous', 'count': 3253},\n", " {'id': 11, 'name': 'Longwinded', 'count': 387},\n", " {'id': 2, 'name': 'Confusing', 'count': 242},\n", " {'id': 8, 'name': 
'Informative', 'count': 7346},\n", " {'id': 22, 'name': 'Fascinating', 'count': 10581},\n", " {'id': 21, 'name': 'Unconvincing', 'count': 300},\n", " {'id': 24, 'name': 'Persuasive', 'count': 10704},\n", " {'id': 23, 'name': 'Jaw-dropping', 'count': 4439},\n", " {'id': 25, 'name': 'OK', 'count': 1174},\n", " {'id': 26, 'name': 'Obnoxious', 'count': 209},\n", " {'id': 10, 'name': 'Inspiring', 'count': 24924}]" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# unpack the ratings data for the first talk\n", "ast.literal_eval(ted.ratings[0])" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "list" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# now we have a list (of dictionaries)\n", "type(ast.literal_eval(ted.ratings[0]))" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [], "source": [ "# define a function to convert an element in the ratings Series from string to list\n", "def str_to_list(ratings_str):\n", " return ast.literal_eval(ratings_str)" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'id': 7, 'name': 'Funny', 'count': 19645},\n", " {'id': 1, 'name': 'Beautiful', 'count': 4573},\n", " {'id': 9, 'name': 'Ingenious', 'count': 6073},\n", " {'id': 3, 'name': 'Courageous', 'count': 3253},\n", " {'id': 11, 'name': 'Longwinded', 'count': 387},\n", " {'id': 2, 'name': 'Confusing', 'count': 242},\n", " {'id': 8, 'name': 'Informative', 'count': 7346},\n", " {'id': 22, 'name': 'Fascinating', 'count': 10581},\n", " {'id': 21, 'name': 'Unconvincing', 'count': 300},\n", " {'id': 24, 'name': 'Persuasive', 'count': 10704},\n", " {'id': 23, 'name': 'Jaw-dropping', 'count': 4439},\n", " {'id': 25, 'name': 'OK', 'count': 1174},\n", " {'id': 26, 'name': 'Obnoxious', 'count': 209},\n", " {'id': 10, 'name': 'Inspiring', 'count': 24924}]" 
] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# test the function\n", "str_to_list(ted.ratings[0])" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 [{'id': 7, 'name': 'Funny', 'count': 19645}, {...\n", "1 [{'id': 7, 'name': 'Funny', 'count': 544}, {'i...\n", "2 [{'id': 7, 'name': 'Funny', 'count': 964}, {'i...\n", "3 [{'id': 3, 'name': 'Courageous', 'count': 760}...\n", "4 [{'id': 9, 'name': 'Ingenious', 'count': 3202}...\n", "Name: ratings, dtype: object" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Series apply method applies a function to every element in a Series and returns a Series\n", "ted.ratings.apply(str_to_list).head()" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 [{'id': 7, 'name': 'Funny', 'count': 19645}, {...\n", "1 [{'id': 7, 'name': 'Funny', 'count': 544}, {'i...\n", "2 [{'id': 7, 'name': 'Funny', 'count': 964}, {'i...\n", "3 [{'id': 3, 'name': 'Courageous', 'count': 760}...\n", "4 [{'id': 9, 'name': 'Ingenious', 'count': 3202}...\n", "Name: ratings, dtype: object" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# lambda is a shorter alternative\n", "ted.ratings.apply(lambda x: ast.literal_eval(x)).head()" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 [{'id': 7, 'name': 'Funny', 'count': 19645}, {...\n", "1 [{'id': 7, 'name': 'Funny', 'count': 544}, {'i...\n", "2 [{'id': 7, 'name': 'Funny', 'count': 964}, {'i...\n", "3 [{'id': 3, 'name': 'Courageous', 'count': 760}...\n", "4 [{'id': 9, 'name': 'Ingenious', 'count': 3202}...\n", "Name: ratings, dtype: object" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# an even shorter alternative is to apply the function directly 
(without lambda)\n", "ted.ratings.apply(ast.literal_eval).head()" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [], "source": [ "ted['ratings_list'] = ted.ratings.apply(lambda x: ast.literal_eval(x))" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'id': 7, 'name': 'Funny', 'count': 19645},\n", " {'id': 1, 'name': 'Beautiful', 'count': 4573},\n", " {'id': 9, 'name': 'Ingenious', 'count': 6073},\n", " {'id': 3, 'name': 'Courageous', 'count': 3253},\n", " {'id': 11, 'name': 'Longwinded', 'count': 387},\n", " {'id': 2, 'name': 'Confusing', 'count': 242},\n", " {'id': 8, 'name': 'Informative', 'count': 7346},\n", " {'id': 22, 'name': 'Fascinating', 'count': 10581},\n", " {'id': 21, 'name': 'Unconvincing', 'count': 300},\n", " {'id': 24, 'name': 'Persuasive', 'count': 10704},\n", " {'id': 23, 'name': 'Jaw-dropping', 'count': 4439},\n", " {'id': 25, 'name': 'OK', 'count': 1174},\n", " {'id': 26, 'name': 'Obnoxious', 'count': 209},\n", " {'id': 10, 'name': 'Inspiring', 'count': 24924}]" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# check that the new Series looks as expected\n", "ted.ratings_list[0]" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "list" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# each element in the Series is a list\n", "type(ted.ratings_list[0])" ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "dtype('O')" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# data type of the new Series is object\n", "ted.ratings_list.dtype" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "comments int64\n", "description object\n", "duration 
int64\n", "event object\n", "film_date int64\n", "languages int64\n", "main_speaker object\n", "name object\n", "num_speaker int64\n", "published_date int64\n", "ratings object\n", "related_talks object\n", "speaker_occupation object\n", "tags object\n", "title object\n", "url object\n", "views int64\n", "comments_per_view float64\n", "views_per_comment float64\n", "film_datetime datetime64[ns]\n", "ratings_list object\n", "dtype: object" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# object is not just for strings\n", "ted.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lessons:\n", "\n", "1. Pay attention to data types in pandas\n", "2. Use apply any time it is necessary" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7. Count the total number of ratings received by each talk\n", "\n", "Bonus exercises:\n", "\n", "- for each talk, calculate the percentage of ratings that were negative\n", "- for each talk, calculate the average number of ratings it received per day since it was published" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'id': 7, 'name': 'Funny', 'count': 19645},\n", " {'id': 1, 'name': 'Beautiful', 'count': 4573},\n", " {'id': 9, 'name': 'Ingenious', 'count': 6073},\n", " {'id': 3, 'name': 'Courageous', 'count': 3253},\n", " {'id': 11, 'name': 'Longwinded', 'count': 387},\n", " {'id': 2, 'name': 'Confusing', 'count': 242},\n", " {'id': 8, 'name': 'Informative', 'count': 7346},\n", " {'id': 22, 'name': 'Fascinating', 'count': 10581},\n", " {'id': 21, 'name': 'Unconvincing', 'count': 300},\n", " {'id': 24, 'name': 'Persuasive', 'count': 10704},\n", " {'id': 23, 'name': 'Jaw-dropping', 'count': 4439},\n", " {'id': 25, 'name': 'OK', 'count': 1174},\n", " {'id': 26, 'name': 'Obnoxious', 'count': 209},\n", " {'id': 10, 'name': 'Inspiring', 'count': 24924}]" ] }, "execution_count": 57, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "# expected result (for each talk) is sum of count\n", "ted.ratings_list[0]" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [], "source": [ "# start by building a simple function\n", "def get_num_ratings(list_of_dicts):\n", " return list_of_dicts[0]" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'id': 7, 'name': 'Funny', 'count': 19645}" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# pass it a list, and it returns the first element in the list, which is a dictionary\n", "get_num_ratings(ted.ratings_list[0])" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [], "source": [ "# modify the function to return the vote count\n", "def get_num_ratings(list_of_dicts):\n", " return list_of_dicts[0]['count']" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "19645" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# pass it a list, and it returns a value from the first dictionary in the list\n", "get_num_ratings(ted.ratings_list[0])" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [], "source": [ "# modify the function to get the sum of count\n", "def get_num_ratings(list_of_dicts):\n", " num = 0\n", " for d in list_of_dicts:\n", " num = num + d['count']\n", " return num" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "93850" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# looks about right\n", "get_num_ratings(ted.ratings_list[0])" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'id': 7, 'name': 'Funny', 'count': 544},\n", " {'id': 3, 'name': 'Courageous', 
'count': 139},\n", " {'id': 2, 'name': 'Confusing', 'count': 62},\n", " {'id': 1, 'name': 'Beautiful', 'count': 58},\n", " {'id': 21, 'name': 'Unconvincing', 'count': 258},\n", " {'id': 11, 'name': 'Longwinded', 'count': 113},\n", " {'id': 8, 'name': 'Informative', 'count': 443},\n", " {'id': 10, 'name': 'Inspiring', 'count': 413},\n", " {'id': 22, 'name': 'Fascinating', 'count': 132},\n", " {'id': 9, 'name': 'Ingenious', 'count': 56},\n", " {'id': 24, 'name': 'Persuasive', 'count': 268},\n", " {'id': 23, 'name': 'Jaw-dropping', 'count': 116},\n", " {'id': 26, 'name': 'Obnoxious', 'count': 131},\n", " {'id': 25, 'name': 'OK', 'count': 203}]" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# check with another record\n", "ted.ratings_list[1]" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2936" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# looks about right\n", "get_num_ratings(ted.ratings_list[1])" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 93850\n", "1 2936\n", "2 2824\n", "3 3728\n", "4 25620\n", "Name: ratings_list, dtype: int64" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# apply it to every element in the Series\n", "ted.ratings_list.apply(get_num_ratings).head()" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "93850" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# another alternative is to use a generator expression\n", "sum((d['count'] for d in ted.ratings_list[0]))" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 93850\n", "1 2936\n", "2 2824\n", "3 3728\n", "4 25620\n", "Name: ratings_list, dtype: int64" ] }, 
"execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# use lambda to apply this method\n", "ted.ratings_list.apply(lambda x: sum((d['count'] for d in x))).head()" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "93850" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# another alternative is to use pd.DataFrame()\n", "pd.DataFrame(ted.ratings_list[0])['count'].sum()" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 93850\n", "1 2936\n", "2 2824\n", "3 3728\n", "4 25620\n", "Name: ratings_list, dtype: int64" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# use lambda to apply this method\n", "ted.ratings_list.apply(lambda x: pd.DataFrame(x)['count'].sum()).head()" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [], "source": [ "ted['num_ratings'] = ted.ratings_list.apply(get_num_ratings)" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "count 2550.000000\n", "mean 2436.408235\n", "std 4226.795631\n", "min 68.000000\n", "25% 870.750000\n", "50% 1452.500000\n", "75% 2506.750000\n", "max 93850.000000\n", "Name: num_ratings, dtype: float64" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# do one more check\n", "ted.num_ratings.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lessons:\n", "\n", "1. Write your code in small chunks, and check your work as you go\n", "2. Lambda is best for simple functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 8. 
Which occupations deliver the funniest TED talks on average?\n", "\n", "Bonus exercises:\n", "\n", "- for each talk, calculate the most frequent rating\n", "- for each talk, clean the occupation data so that there's only one occupation per talk" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 1: Count the number of funny ratings" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 [{'id': 7, 'name': 'Funny', 'count': 19645}, {...\n", "1 [{'id': 7, 'name': 'Funny', 'count': 544}, {'i...\n", "2 [{'id': 7, 'name': 'Funny', 'count': 964}, {'i...\n", "3 [{'id': 3, 'name': 'Courageous', 'count': 760}...\n", "4 [{'id': 9, 'name': 'Ingenious', 'count': 3202}...\n", "Name: ratings_list, dtype: object" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# \"Funny\" is not always the first dictionary in the list\n", "ted.ratings_list.head()" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True 2550\n", "Name: ratings, dtype: int64" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# check ratings (not ratings_list) to see if \"Funny\" is always a rating type\n", "ted.ratings.str.contains('Funny').value_counts()" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [], "source": [ "# write a custom function\n", "def get_funny_ratings(list_of_dicts):\n", "    for d in list_of_dicts:\n", "        if d['name'] == 'Funny':\n", "            return d['count']" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'id': 3, 'name': 'Courageous', 'count': 760},\n", " {'id': 1, 'name': 'Beautiful', 'count': 291},\n", " {'id': 2, 'name': 'Confusing', 'count': 32},\n", " {'id': 7, 'name': 'Funny', 'count': 59},\n", " {'id': 9, 'name': 'Ingenious', 'count': 105},\n", " {'id': 21, 'name': 'Unconvincing', 
'count': 36},\n", " {'id': 11, 'name': 'Longwinded', 'count': 53},\n", " {'id': 8, 'name': 'Informative', 'count': 380},\n", " {'id': 10, 'name': 'Inspiring', 'count': 1070},\n", " {'id': 22, 'name': 'Fascinating', 'count': 132},\n", " {'id': 24, 'name': 'Persuasive', 'count': 460},\n", " {'id': 23, 'name': 'Jaw-dropping', 'count': 230},\n", " {'id': 26, 'name': 'Obnoxious', 'count': 35},\n", " {'id': 25, 'name': 'OK', 'count': 85}]" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# examine a record in which \"Funny\" is not the first dictionary\n", "ted.ratings_list[3]" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "59" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# check that the function works\n", "get_funny_ratings(ted.ratings_list[3])" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 19645\n", "1 544\n", "2 964\n", "3 59\n", "4 1390\n", "Name: funny_ratings, dtype: int64" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# apply it to every element in the Series\n", "ted['funny_ratings'] = ted.ratings_list.apply(get_funny_ratings)\n", "ted.funny_ratings.head()" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# check for missing values\n", "ted.funny_ratings.isna().sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 2: Calculate the percentage of ratings that are funny" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [], "source": [ "ted['funny_rate'] = ted.funny_ratings / ted.num_ratings" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "data": { "text/plain": 
[ "1849 Science humorist\n", "337 Comedian\n", "124 Performance poet, multimedia artist\n", "315 Expert\n", "1168 Social energy entrepreneur\n", "1468 Ornithologist\n", "595 Comedian, voice artist\n", "1534 Cartoon editor\n", "97 Satirist\n", "2297 Actor, writer\n", "568 Comedian\n", "675 Data scientist\n", "21 Humorist, web artist\n", "194 Jugglers\n", "2273 Comedian and writer\n", "2114 Comedian and writer\n", "173 Investor\n", "747 Comedian\n", "1398 Comedian\n", "685 Actor, comedian, playwright\n", "Name: speaker_occupation, dtype: object" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# \"gut check\" that this calculation makes sense by examining the occupations of the funniest talks\n", "ted.sort_values('funny_rate').speaker_occupation.tail(20)" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2549 Game designer\n", "1612 Biologist\n", "612 Sculptor\n", "998 Penguin expert\n", "593 Engineer\n", "284 Space activist\n", "1041 Biomedical engineer\n", "1618 Spinal cord researcher\n", "2132 Computational geneticist\n", "442 Sculptor\n", "426 Author, thinker\n", "458 Educator\n", "2437 Environmental engineer\n", "1491 Photojournalist\n", "1893 Forensic anthropologist\n", "783 Marine biologist\n", "195 Kenyan MP\n", "772 HIV/AIDS fighter\n", "788 Building activist\n", "936 Neuroengineer\n", "Name: speaker_occupation, dtype: object" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# examine the occupations of the least funny talks\n", "ted.sort_values('funny_rate').speaker_occupation.head(20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 3: Analyze the funny rate by occupation" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "speaker_occupation\n", "Comedian 0.512457\n", "Actor, writer 0.515152\n", "Actor, comedian, playwright 
0.558107\n", "Jugglers 0.566828\n", "Comedian and writer 0.602085\n", "Name: funny_rate, dtype: float64" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# calculate the mean funny rate for each occupation\n", "ted.groupby('speaker_occupation').funny_rate.mean().sort_values().tail()" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "count 2544\n", "unique 1458\n", "top Writer\n", "freq 45\n", "Name: speaker_occupation, dtype: object" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# however, most of the occupations have a sample size of 1\n", "ted.speaker_occupation.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 4: Focus on occupations that are well-represented in the data" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Writer 45\n", "Artist 34\n", "Designer 34\n", "Journalist 33\n", "Entrepreneur 31\n", "Architect 30\n", "Inventor 27\n", "Psychologist 26\n", "Photographer 25\n", "Filmmaker 21\n", "Author 20\n", "Economist 20\n", "Neuroscientist 20\n", "Educator 20\n", "Roboticist 16\n", "Philosopher 16\n", "Biologist 15\n", "Physicist 14\n", "Musician 11\n", "Marine biologist 11\n", "Technologist 10\n", "Activist 10\n", "Global health expert; data visionary 10\n", "Historian 9\n", "Singer/songwriter 9\n", "Oceanographer 9\n", "Behavioral economist 9\n", "Poet 9\n", "Astronomer 9\n", "Graphic designer 9\n", " ..\n", "Anatomical artist 1\n", "Literary scholar 1\n", "Social entrepreneur, lawyer 1\n", "Physician, bioengineer and entrepreneur 1\n", "medical inventor 1\n", "Mental health advocate 1\n", "Public sector researcher 1\n", "Speleologist 1\n", "Disaster relief expert 1\n", "Artist and curator 1\n", "Finance journalist 1\n", "Wildlife conservationist 1\n", "Sex worker and activist 1\n", "Connector 1\n", "Sociologist, 
human rights activist 1\n", "Author, producer 1\n", "Painter 1\n", "Policy expert 1\n", "Environmental economist 1\n", "Sound artist, composer 1\n", "Senator 1\n", "High school principal 1\n", "Poet of code 1\n", "Healthcare revolutionary 1\n", "Circular economy advocate 1\n", "Caregiver 1\n", "Transportation geek 1\n", "Music icon 1\n", "Surprisologist 1\n", "Psychiatrist and writer 1\n", "Name: speaker_occupation, Length: 1458, dtype: int64" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# count how many times each occupation appears\n", "ted.speaker_occupation.value_counts()" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas.core.series.Series" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# value_counts() outputs a pandas Series, thus we can use pandas to manipulate the output\n", "occupation_counts = ted.speaker_occupation.value_counts()\n", "type(occupation_counts)" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Writer 45\n", "Artist 34\n", "Designer 34\n", "Journalist 33\n", "Entrepreneur 31\n", "Architect 30\n", "Inventor 27\n", "Psychologist 26\n", "Photographer 25\n", "Filmmaker 21\n", "Author 20\n", "Economist 20\n", "Neuroscientist 20\n", "Educator 20\n", "Roboticist 16\n", "Philosopher 16\n", "Biologist 15\n", "Physicist 14\n", "Musician 11\n", "Marine biologist 11\n", "Technologist 10\n", "Activist 10\n", "Global health expert; data visionary 10\n", "Historian 9\n", "Singer/songwriter 9\n", "Oceanographer 9\n", "Behavioral economist 9\n", "Poet 9\n", "Astronomer 9\n", "Graphic designer 9\n", " ..\n", "Legal activist 6\n", "Photojournalist 6\n", "Evolutionary biologist 6\n", "Singer-songwriter 6\n", "Performance poet, multimedia artist 6\n", "Climate advocate 6\n", "Techno-illusionist 6\n", "Social entrepreneur 6\n", "Comedian 6\n", 
"Reporter 6\n", "Writer, activist 6\n", "Investor and advocate for moral leadership 5\n", "Surgeon 5\n", "Paleontologist 5\n", "Physician 5\n", "Tech visionary 5\n", "Chef 5\n", "Science writer 5\n", "Game designer 5\n", "Cartoonist 5\n", "Producer 5\n", "Violinist 5\n", "Researcher 5\n", "Social Media Theorist 5\n", "Environmentalist, futurist 5\n", "Data scientist 5\n", "Musician, activist 5\n", "Sculptor 5\n", "Chemist 5\n", "Sound consultant 5\n", "Name: speaker_occupation, Length: 68, dtype: int64" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# show occupations which appear at least 5 times\n", "occupation_counts[occupation_counts >= 5]" ] }, { "cell_type": "code", "execution_count": 88, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['Writer', 'Artist', 'Designer', 'Journalist', 'Entrepreneur',\n", " 'Architect', 'Inventor', 'Psychologist', 'Photographer', 'Filmmaker',\n", " 'Author', 'Economist', 'Neuroscientist', 'Educator', 'Roboticist',\n", " 'Philosopher', 'Biologist', 'Physicist', 'Musician', 'Marine biologist',\n", " 'Technologist', 'Activist', 'Global health expert; data visionary',\n", " 'Historian', 'Singer/songwriter', 'Oceanographer',\n", " 'Behavioral economist', 'Poet', 'Astronomer', 'Graphic designer',\n", " 'Philanthropist', 'Novelist', 'Social psychologist', 'Engineer',\n", " 'Computer scientist', 'Futurist', 'Astrophysicist', 'Mathematician',\n", " 'Legal activist', 'Photojournalist', 'Evolutionary biologist',\n", " 'Singer-songwriter', 'Performance poet, multimedia artist',\n", " 'Climate advocate', 'Techno-illusionist', 'Social entrepreneur',\n", " 'Comedian', 'Reporter', 'Writer, activist',\n", " 'Investor and advocate for moral leadership', 'Surgeon',\n", " 'Paleontologist', 'Physician', 'Tech visionary', 'Chef',\n", " 'Science writer', 'Game designer', 'Cartoonist', 'Producer',\n", " 'Violinist', 'Researcher', 'Social Media Theorist',\n", " 'Environmentalist, futurist', 
'Data scientist', 'Musician, activist',\n", " 'Sculptor', 'Chemist', 'Sound consultant'],\n", " dtype='object')" ] }, "execution_count": 88, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# save the index of this Series\n", "top_occupations = occupation_counts[occupation_counts >= 5].index\n", "top_occupations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 5: Re-analyze the funny rate by occupation (for top occupations only)" ] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(786, 24)" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# filter DataFrame to include only those occupations\n", "ted_top_occupations = ted[ted.speaker_occupation.isin(top_occupations)]\n", "ted_top_occupations.shape" ] }, { "cell_type": "code", "execution_count": 90, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "speaker_occupation\n", "Surgeon 0.002465\n", "Physician 0.004515\n", "Photojournalist 0.004908\n", "Investor and advocate for moral leadership 0.005198\n", "Photographer 0.007152\n", "Environmentalist, futurist 0.007317\n", "Violinist 0.009534\n", "Singer-songwriter 0.010597\n", "Chemist 0.010970\n", "Philanthropist 0.012522\n", "Activist 0.012539\n", "Astrophysicist 0.013147\n", "Oceanographer 0.014596\n", "Paleontologist 0.015780\n", "Social psychologist 0.015887\n", "Tech visionary 0.016654\n", "Sculptor 0.016960\n", "Social Media Theorist 0.017450\n", "Social entrepreneur 0.017921\n", "Inventor 0.021801\n", "Sound consultant 0.022011\n", "Legal activist 0.022303\n", "Historian 0.023215\n", "Musician, activist 0.023395\n", "Economist 0.025488\n", "Writer, activist 0.026665\n", "Journalist 0.027997\n", "Computer scientist 0.029070\n", "Architect 0.030579\n", "Engineer 0.031711\n", " ... 
\n", "Roboticist 0.042777\n", "Astronomer 0.044581\n", "Psychologist 0.044984\n", "Musician 0.045336\n", "Physicist 0.046302\n", "Filmmaker 0.048603\n", "Futurist 0.050460\n", "Behavioral economist 0.050460\n", "Technologist 0.050965\n", "Chef 0.054207\n", "Science writer 0.055993\n", "Designer 0.059287\n", "Writer 0.060745\n", "Game designer 0.062317\n", "Reporter 0.066250\n", "Evolutionary biologist 0.069157\n", "Novelist 0.070876\n", "Entrepreneur 0.073295\n", "Author 0.075508\n", "Artist 0.078939\n", "Global health expert; data visionary 0.090306\n", "Poet 0.107398\n", "Graphic designer 0.135718\n", "Techno-illusionist 0.152171\n", "Cartoonist 0.162120\n", "Data scientist 0.184076\n", "Producer 0.202531\n", "Singer/songwriter 0.252205\n", "Performance poet, multimedia artist 0.306468\n", "Comedian 0.512457\n", "Name: funny_rate, Length: 68, dtype: float64" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# redo the previous groupby\n", "ted_top_occupations.groupby('speaker_occupation').funny_rate.mean().sort_values()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lessons:\n", "\n", "1. Check your assumptions about your data\n", "2. Check whether your results are reasonable\n", "3. Take advantage of the fact that pandas operations often output a DataFrame or a Series\n", "4. Watch out for small sample sizes\n", "5. Consider the impact of missing data\n", "6. Data scientists are hilarious" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }