{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction\n", "\n", "My friend [Johannes Giorgis] and I are developing a series of [Data Science Challenges] to help others\n", "become better data scientists. Why did we do this?\n", "\n", "\n", "> Because that's what heroes do!\n", ">\n", "> --Johannes Giorgis\n", "\n", "I now present my response to the first challenge, Exploring the Meetup API in the city of my choice.\n", "\n", "**San Francisco, CA**, I choose you!\n", "\n", "\n", "\n", "# Challenge 01: Explore the Meetup API\n", "Use the [Meetup API] to explore meetups in your city of choice.\n", "\n", "\n", "**Guide Questions**:\n", "\n", "Below are some guideline questions to get you started:\n", "\n", "- What is the largest meetup in your location of choice (city, cities, country, etc.)?\n", "- How many meetups of a certain category (e.g. Tech, Art, etc.) are in your city?\n", "- Basic statistics of meetups\n", "\t- What is the average size of meetups?\n", "\t- How frequently do meetups host events?\n", " \n", "## Okay, but what I really want to know is\n", "\n", "What is the biggest Tech Group in San Francisco that meets regularly and has a growing and enthusiastic membership?\n", "\n", "\n", "## Prerequisites:\n", "Add a [Meetup API Key] to your environment.\n", "\n", "[//]: # (References)\n", "\n", "[Meetup API]: https://www.meetup.com/meetup_api/\n", "[Meetup API Key]: https://secure.meetup.com/meetup_api/key/\n", "[Johannes Giorgis]: http://johannesgiorgis.com/\n", "[Data Science Challenges]: https://medium.com/red-panda-ai/introducing-data-science-challenges-4ae4a103d67b" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import datetime\n", "import json\n", "import math\n", "import meetup.api\n", "import os\n", "import pprint\n", "import requests\n", "\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import numpy as np\n", "import 
seaborn as sb\n", "\n", "from tqdm import tnrange, tqdm_notebook\n", "\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Double check your environment\n", "\n", "Nothing works without **MEETUP_API_KEY**." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "assert 'MEETUP_API_KEY' in os.environ, (\n", "    \"You need a MEETUP_API_KEY in your environment; please look at the \"\n", "    \"README for instructions.\")\n", "client = meetup.api.Client()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Categories\n", "\n", "There are multiple categories of groups in Meetup. Let's use Python's meetup.api to [GetCategories](https://meetup-api.readthedocs.io/en/latest/meetup_api.html#meetup.api.meetup.api.Client.GetCategories)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "%%capture --no-display\n", "categories = client.GetCategories()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### What are the attributes of the response object? \n", "\n", "First, \n", "let's create a helper function to parse out the two most useful\n", "pieces:\n", "\n", "1. **meta**: An object containing metadata about the response object itself\n", "2. 
**results**: A page of actual data from the entire result of our API call" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def parse_response(response):\n", "    \"\"\"Returns two dataframes, meta and results:\n", "    meta: a vertically aligned dataframe, where each row is an element\n", "        of the response.meta dictionary\n", "    results: a horizontally aligned dataframe, where each column is\n", "        an element of the response.results dictionary\"\"\"\n", "    meta = pd.DataFrame.from_dict(response.meta, orient='index')\n", "    results = pd.DataFrame.from_dict(response.results)\n", "    return meta, results\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exploring the response object\n", "\n", "We received a response object when we called ```client.GetCategories()```.\n", "\n", "By looking at the categories **meta** dataframe, we can see that there are 33 different categories." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " 0\n", "next \n", "method Categories\n", "total_count 33\n", "link https://api.meetup.com/2/categories\n", "count 33\n", "description Returns a list of Meetup group categories\n", "lon None\n", "title Categories\n", "url https://api.meetup.com/2/categories?offset=0&f...\n", "id \n", "updated 1450292956000\n", "lat None" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cats_meta_df, cats_df = parse_response(categories)\n", "cats_meta_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The **total_count** and **count** rows both read 33, so all 33 categories came back in this one response.\n", "I wonder what they are. \n", "\n", "### How the Meetup API works\n", "\n", "Notice that the value of **next** (above) is an empty string. Meetup API v2 response payloads come in **pages**, one at a time, and each page provides the URI of the **next** API call in the sequence in **response.meta\\[\"next\"\\]**. We can use this to programmatically fetch each subsequent **page** until the complete result is returned.\n", "\n", "As we can see, the **response.meta\\[\"next\"\\]** for this page is an empty string, so all of the categories\n", "fit into our first API call.\n", "\n", "#### Next, let's review the categories results dataframe" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " id name shortname sort_name\n", "0 1 Arts & Culture Arts Arts & Culture\n", "1 18 Book Clubs Book Clubs Book Clubs\n", "2 2 Career & Business Business Career & Business\n", "3 3 Cars & Motorcycles Auto Cars & Motorcycles\n", "4 4 Community & Environment Community Community & Environment\n", "5 5 Dancing Dancing Dancing\n", "6 6 Education & Learning Education Education & Learning\n", "7 8 Fashion & Beauty Fashion Fashion & Beauty\n", "8 9 Fitness Fitness Fitness\n", "9 10 Food & Drink Food & Drink Food & Drink\n", "10 11 Games Games Games\n", "11 13 Movements & Politics Movements Movements & Politics\n", "12 14 Health & Wellbeing Well-being Health & Wellbeing\n", "13 15 Hobbies & Crafts Crafts Hobbies & Crafts\n", "14 16 Language & Ethnic Identity Languages Language & Ethnic Identity\n", "15 12 LGBT LGBT LGBT\n", "16 17 Lifestyle Lifestyle Lifestyle\n", "17 20 Movies & Film Films Movies & Film\n", "18 21 Music Music Music\n", "19 22 New Age & Spirituality Spirituality New Age & Spirituality\n", "20 23 Outdoors & Adventure Outdoors Outdoors & Adventure\n", "21 24 Paranormal Paranormal Paranormal\n", "22 25 Parents & Family Moms & Dads Parents & Family\n", "23 26 Pets & Animals Pets Pets & Animals\n", "24 27 Photography Photography Photography\n", "25 28 Religion & Beliefs Beliefs Religion & Beliefs\n", "26 29 Sci-Fi & Fantasy Sci fi Sci-Fi & Fantasy\n", "27 30 Singles Singles Singles\n", "28 31 Socializing Social Socializing\n", "29 32 Sports & Recreation Sports Sports & Recreation\n", "30 33 Support Support Support\n", "31 34 Tech Tech Tech\n", "32 36 Writing Writing Writing" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cats_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### So, if we want to work with a particular category\n", "In this case, I want **Tech**. Let's query the dataframe for categories named **Tech**." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " id name shortname sort_name\n", "31 34 Tech Tech Tech" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tech_df = cats_df.loc[cats_df['name'] == 'Tech']\n", "tech_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Let's store the category ID number for later use" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "34" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tech_category_id = tech_df['id'].values[0]\n", "tech_category_id" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Explore Cities\n", "### Now let's look at cities in the United States named San Francisco" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "%%capture --no-display\n", "cities_resp = client.GetCities(country='us', query='San Francisco')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we used the [GetCities] method of the [Python Meetup API client].\n", "\n", "I used a query for cities in the **United States** called **San Francisco**.\n", "\n", "[//]: # (References)\n", "\n", "[GetCities]: https://meetup-api.readthedocs.io/en/latest/meetup_api.html#meetup.api.meetup.api.Client.GetCities\n", "[Python Meetup API client]: https://meetup-api.readthedocs.io/en/latest/meetup_api.html#api-client-details\n", "\n", "Now let's take a look at the **meta** for our results." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " 0\n", "next \n", "method Cities\n", "total_count 4\n", "link https://api.meetup.com/2/cities\n", "count 4\n", "description Returns Meetup cities. This method supports se...\n", "lon None\n", "title Cities\n", "url https://api.meetup.com/2/cities?country=us&off...\n", "id \n", "updated 1263132740000\n", "lat None" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cities_meta_df, cities_df = parse_response(cities_resp)\n", "cities_meta_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Hmm, a count of 4 cities is suspicious...\n", "\n", "I only know of one San Francisco. Why are there 4 cities?" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " city country id lat localized_country_name lon \\\n", "0 San Francisco us 94101 37.779999 USA -122.419998 \n", "1 Bosque us 87006 34.560001 USA -106.779999 \n", "2 San Luis us 81152 37.080002 USA -105.620003 \n", "3 Reserve us 87830 33.650002 USA -108.769997 \n", "\n", " member_count name_string ranking state zip \n", "0 60351 San Francisco, California, USA 0 CA 94101 \n", "1 5 San Francisco, New Mexico, USA 1 NM 87006 \n", "2 4 San Francisco, Colorado, USA 2 CO 81152 \n", "3 1 San Francisco Plaza, New Mexico, USA 3 NM 87830 " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cities_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Oh, there are lots of San Franciscos! \n", "\n", "### Let's filter the dataframe with a query to give us only cities in California, US" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " city country id lat localized_country_name lon \\\n", "0 San Francisco us 94101 37.779999 USA -122.419998 \n", "\n", " member_count name_string ranking state zip \n", "0 60351 San Francisco, California, USA 0 CA 94101 " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "single_city_df = cities_df.loc[\n", "    (cities_df['state'] == 'CA')]\n", "\n", "single_city_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One San Francisco, perfect! \n", "\n", "### Let's store the latitude and longitude for later use as well" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(37.779998779296875, -122.41999816894531)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Use positional access so this works whatever the row's index label is\n", "latitude = single_city_df['lat'].iloc[0]\n", "longitude = single_city_df['lon'].iloc[0]\n", "latitude, longitude" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Now let's look at groups in San Francisco, CA" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Since we are going to grab lots of groups, let's make a function to help us call the API\n", "\n", "**Note**: This function will use the **tech_category_id, latitude, and longitude** values that we \n", "found earlier." 
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "def get_a_group(page_number, category_id=tech_category_id, lat=latitude,\n", "                lon=longitude):\n", "    \"\"\"Fetch one page of groups, retrying up to retry_max times.\"\"\"\n", "    group = None\n", "    retry_counter, retry_max = 0, 3\n", "    print(f\"Getting page {page_number}\")\n", "    while retry_counter < retry_max:\n", "        try:\n", "            group = client.GetGroups(\n", "                category_id=category_id, lat=lat, lon=lon, offset=page_number)\n", "            return group\n", "        except Exception:\n", "            # Catch Exception rather than using a bare except, so that\n", "            # KeyboardInterrupt can still stop the loop\n", "            print(f\"Fetch failure {retry_counter + 1}\")\n", "            retry_counter += 1\n", "\n", "    raise Exception(f\"Unable to fetch page after {retry_counter} attempts\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Now, after grabbing the first page of groups\n", "\n", "Let's review the **meta** to help us see what we are getting into" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " 0\n", "next https://api.meetup.com/2/groups?offset=1&forma...\n", "method Groups\n", "total_count 2197\n", "link https://api.meetup.com/2/groups\n", "count 200\n", "description None\n", "lon -122.42\n", "title Meetup Groups v2\n", "url https://api.meetup.com/2/groups?offset=0&forma...\n", "id \n", "updated 1550965553000\n", "lat 37.78" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%capture --no-display\n", "group_resp = get_a_group(0)\n", "group_meta, _ = parse_response(group_resp)\n", "group_meta" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Wait! There's a meta[\\\"next\\\"].\n", "\n", "Remember earlier when I spoke about **response.meta\\[\"next\"\\]**? \n", "\n", "With a **total_count** of 2197 and a **count** of 200, our result will span multiple API calls, each returning up to 200 new groups in \n", "a **page**.\n", "\n", "Let's make a new helper that will grab each **page** in a series of API calls until we obtain the entire data set.\n", "\n", "We will use the pandas [concat] function to collate all pages into a single useful dataframe.\n", "\n", "[//]: # (References)\n", "\n", "[concat]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "def get_all_groups_as_a_df():\n", "    \"\"\"Returns a single dataframe composed from data from multiple\n", "    successive calls to get_a_group.\n", "\n", "    We will loop through get_a_group pages while our page.meta['next'] is\n", "    not the empty string.\n", "    \"\"\"\n", "    page_df_list = []\n", "    next_page = None\n", "    page_number = 0\n", "    while next_page != '':\n", "        page = get_a_group(page_number)\n", "        next_page = page.meta[\"next\"]\n", "        _, frame = parse_response(page)\n", "        page_number += 1\n", "        page_df_list.append(frame)\n", "\n", "    return pd.concat(page_df_list, ignore_index=True)" ] }, { "cell_type": "code", "execution_count": 17, 
"metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Getting page 0\n", "28/30 (5 seconds remaining)\n", "Getting page 1\n", "27/30 (3 seconds remaining)\n", "Getting page 2\n", "26/30 (3 seconds remaining)\n", "Getting page 3\n", "25/30 (2 seconds remaining)\n", "Getting page 4\n", "24/30 (1 seconds remaining)\n", "Getting page 5\n", "29/30 (10 seconds remaining)\n", "Getting page 6\n", "28/30 (9 seconds remaining)\n", "Getting page 7\n", "27/30 (8 seconds remaining)\n", "Getting page 8\n", "26/30 (7 seconds remaining)\n", "Getting page 9\n", "25/30 (6 seconds remaining)\n", "Getting page 10\n", "24/30 (5 seconds remaining)\n" ] }, { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " category city country \\\n", "0 {'name': 'tech', 'id': 34, 'shortname': 'tech'} San Francisco US \n", "\n", " created description \\\n", "0 1034097740000 <p>The SF PHP Community Meetup is an open foru... \n", "\n", " group_photo id join_mode lat \\\n", "0 {'highres_link': 'https://secure.meetupstatic.... 120903 open 37.77 \n", "\n", " link ... name \\\n", "0 https://www.meetup.com/sf-php/ ... SF PHP Community \n", "\n", " organizer rating state timezone \\\n", "0 {'member_id': 126468982, 'name': 'Andre Marigo... 4.38 CA US/Pacific \n", "\n", " topics urlname utc_offset \\\n", "0 [{'urlkey': 'php', 'name': 'PHP', 'id': 455}, ... sf-php -28800000 \n", "\n", " visibility who \n", "0 public PHP Developers \n", "\n", "[1 rows x 22 columns]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Collect all groups into a single dataframe\n", "all_groups_df = get_all_groups_as_a_df()\n", "\n", "# Show the first row in the dataframe\n", "all_groups_df.head(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### That's too many columns\n", "I really only care about a small list of columns, so let's exclude the rest." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "

\n", "
" ], "text/plain": [ " id name members rating join_mode urlname\n", "0 120903 SF PHP Community 2702 4.38 open sf-php" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "column_list = ['id', 'name', 'members', 'rating', 'join_mode', 'urlname']\n", "all_groups_df = all_groups_df[column_list]\n", "all_groups_df.head(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Let's double check the size of our new dataframe" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2197, 6)" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "all_groups_df.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That matches the **total_count** of 2197 reported in the groups meta:\n", "* 2197 rows\n", "* 6 columns\n", "\n", "---\n", "## Explore Members per Group\n", "\n", "Each group has a different sized membership, so let's explore this first!\n", "\n", "\n", "### Let's start with a histogram\n", "\n", "This visualization should give us a basic idea of how big our groups are." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Using seaborn's distplot function\n", "plt.rcParams['figure.figsize'] = [11, 6]\n", "sb.distplot(all_groups_df['members'], kde=False, color=\"g\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### It appears that most groups are relatively small\n", "\n", "Let's take a closer look at some basic stats for our data in a tabular \n", "format for some hard numbers:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " members\n", "count 2,197.00\n", "mean 811.79\n", "std 1,741.14\n", "min 1.00\n", "25% 86.00\n", "50% 256.00\n", "75% 780.00\n", "max 36,058.00" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.options.display.float_format = '{:20,.2f}'.format\n", "all_groups_df[[\"members\"]].describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The table gives us some hard numbers:\n", "\n", "1. The average group size is about 812 members.\n", "2. Half of the groups have 256 members or fewer.\n", "3. The smallest group has a single member.\n", "4. **The largest group has 36,000+ members!**\n", "\n", "What an outlier! But are there other **mega-groups** like this?\n", "\n", "### Maybe a box and whisker plot can visualize these stats for us" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.rcParams['figure.figsize'] = [6, 20]\n", "all_groups_df['members'].plot.box();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Wow, there are quite a few **mega-groups**, as indicated by the circles above our top whisker!\n", "\n", "---\n", "\n", "Why are the groups so big?\n", "\n", "In fact...\n", "\n", "### What are the 10 biggest tech groups in the area?" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " name members\n", "19 Silicon Valley Entrepreneurs & Startups 36058\n", "107 SFHTML5 17718\n", "106 Designers + Geeks 15467\n", "426 SF Data Science 14874\n", "28 The SF JavaScript Meetup 13359\n", "250 Tech in Motion Events: San Francisco 13090\n", "540 Docker Online Meetup 12475\n", "191 SF Data Mining 12378\n", "201 Women Who Code SF 12334\n", "706 SF Big Analytics 11889" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "biggest_ten_df = all_groups_df.sort_values('members',\n", "                                           ascending=False).head(10)\n", "biggest_ten_df[[\"name\", \"members\"]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Get Group Events" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### We need to do some data shaping before the next API call\n", "\n", "Mostly we need to:\n", "1. Build a comma-separated string of group IDs from our 10 biggest groups\n", "2. Convert our human-readable date range to milliseconds since Jan 1, 1970\n", "3. 
Call the GetEvents API filtering for past events using our group IDs and our date range\n", "\n", "#### First, let's make that string of group ids\n" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'844726,1619955,1615633,9226282,1060260,3483762,13402242,2065031,2252591,18354966'" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "id_list = biggest_ten_df['id'].tolist()\n", "ids = ','.join(str(x) for x in id_list)\n", "ids" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Now, let's get the epoch milliseconds for a date range between now and 180 days (about six months) ago" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Now: 1550937159472, nine months ago: 1535385159472\n" ] } ], "source": [ "def to_millis(dt):\n", "    # Timestamp.value is nanoseconds since the epoch; convert to milliseconds\n", "    return pd.to_datetime(dt).value // 1_000_000\n", "\n", "right_now = to_millis(datetime.datetime.now())\n", "# NOTE: despite the variable name, 180 days is roughly six months, not nine\n", "nine_months_ago = right_now - 180 * 24 * 60 * 60 * 1000\n", "print(f\"Now: {right_now}, nine months ago: {nine_months_ago}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Finally, let's look at those events." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "%%capture --no-display\n", "events_resp = client.GetEvents(group_id=ids, status='past',\n", "                               time=f\"{nine_months_ago},{right_now}\")\n", "\n", "events_meta, events_df = parse_response(events_resp)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Again, our events_df dataframe has extra columns that I don't care about" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
createddescriptiondurationevent_urlgroupheadcounthow_to_find_usidmaybe_rsvp_countname...ratingrsvp_limitstatustimeupdatedutc_offsetvenuevisibilitywaitlist_countyes_rsvp_count
01534368847000<p>Demo Session is free to meetup attendees. U...7200000https://www.meetup.com/sventrepreneurs/events/...{'join_mode': 'open', 'created': 1196203591000...0NaN2538245060Demo Session @ Mars Blockchain Summit by Mars ......{'count': 0, 'average': 0}nanpast15354954000001535527171000-25200000{'country': 'us', 'localized_country_name': 'U...public066
\n", "

1 rows × 21 columns

\n", "
" ], "text/plain": [ " created description duration \\\n", "0 1534368847000

Demo Session is free to meetup attendees. U... 7200000 \n", "\n", " event_url \\\n", "0 https://www.meetup.com/sventrepreneurs/events/... \n", "\n", " group headcount \\\n", "0 {'join_mode': 'open', 'created': 1196203591000... 0 \n", "\n", " how_to_find_us id maybe_rsvp_count \\\n", "0 NaN 253824506 0 \n", "\n", " name ... \\\n", "0 Demo Session @ Mars Blockchain Summit by Mars ... ... \n", "\n", " rating rsvp_limit status time \\\n", "0 {'count': 0, 'average': 0} nan past 1535495400000 \n", "\n", " updated utc_offset \\\n", "0 1535527171000 -25200000 \n", "\n", " venue visibility \\\n", "0 {'country': 'us', 'localized_country_name': 'U... public \n", "\n", " waitlist_count yes_rsvp_count \n", "0 0 66 \n", "\n", "[1 rows x 21 columns]" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "events_df.head(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### So again, let's filter down to just what's relevant" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "

\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
grouptimedurationyes_rsvp_count
0{'join_mode': 'open', 'created': 1196203591000...1535495400000720000066
\n", "
" ], "text/plain": [ " group time duration \\\n", "0 {'join_mode': 'open', 'created': 1196203591000... 1535495400000 7200000 \n", "\n", " yes_rsvp_count \n", "0 66 " ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "column_list = ['group', 'time', 'duration', 'yes_rsvp_count']\n", "events_df = events_df[column_list]\n", "events_df.head(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The group column\n", "\n", "The **group** column is actually a JSON object full of metadata about the group.\n", "\n", "I really only need the **group\\[\"id\"\\]** for now, so let's focus on that." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idtimedurationyes_rsvp_count
08447261535495400000720000066
\n", "
" ], "text/plain": [ " id time duration yes_rsvp_count\n", "0 844726 1535495400000 7200000 66" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def get_id(my_dict):\n", " \"\"\"Extract the id member of a Python dictionary\"\"\"\n", " return my_dict[\"id\"]\n", "\n", "events_df[\"id\"] = events_df[\"group\"].apply(get_id)\n", "\n", "# Keep the new id column and drop the raw group metadata\n", "columns = ['id', 'time', 'duration', 'yes_rsvp_count']\n", "events_df = events_df[columns]\n", "events_df.head(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Hmm, it seems that our **time** is numeric\n", "\n", "The **time** is stored in **Epoch milliseconds** format.\n", "\n", "This is great if you want to see time as the number of milliseconds since Jan 1, 1970.\n", "\n", "This is not-so-great if you just want to see a human-readable date and time equivalent.\n", "\n", "Let's make a new human-readable column called **time_dt** (note that the converted values are in UTC, not San Francisco local time)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idtimetime_dtdurationyes_rsvp_count
0844726153549540000008/28/18 22:30720000066
\n", "
" ], "text/plain": [ " id time time_dt duration yes_rsvp_count\n", "0 844726 1535495400000 08/28/18 22:30 7200000 66" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "events_df[\"time_dt\"] = pd.to_datetime(\n", " events_df[\"time\"], unit='ms').dt.strftime('%m/%d/%y %H:%M')\n", " \n", "columns = ['id', 'time','time_dt', 'duration', 'yes_rsvp_count']\n", "events_df = events_df[columns]\n", "events_df.head(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Now let's convert the duration column to something human-readable\n", "\n", "Let's convert the column to a string that shows hours and minutes." ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idtimetime_dtdurationyes_rsvp_count
0844726153549540000008/28/18 22:302 hours, 0 minutes66
\n", "
" ], "text/plain": [ " id time time_dt duration yes_rsvp_count\n", "0 844726 1535495400000 08/28/18 22:30 2 hours, 0 minutes 66" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def millis_2_hours_and_minutes(ms):\n", " \"\"\"Converts milliseconds to hours and minutes.\"\"\"\n", " seconds = ms / 1000\n", " minutes, seconds = divmod(seconds, 60)\n", " hours, minutes = divmod(minutes, 60)\n", "\n", " return f\"{int(hours)} hours, {int(minutes)} minutes\"\n", "\n", "events_df[\"duration\"] = events_df[\"duration\"].apply(\n", " millis_2_hours_and_minutes)\n", "\n", "events_df.head(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Now let's join our top ten mega-groups dataframe with our events dataframe\n", "\n", "If you are familiar with SQL, this is similar to a left join from **events_df**\n", "to **biggest_ten_df** on **id**.\n", "\n", "Then we sort the output by **name** ascending and then **time** descending." ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nametimetime_dtdurationyes_rsvp_countid
101Designers + Geeks155080440000002/22/19 03:002 hours, 0 minutes1781615633
55Designers + Geeks154233720000011/16/18 03:002 hours, 0 minutes711615633
46Designers + Geeks154112400000011/02/18 02:002 hours, 0 minutes711615633
32Designers + Geeks153991440000010/19/18 02:002 hours, 0 minutes411615633
24Designers + Geeks153870480000010/05/18 02:002 hours, 0 minutes291615633
14Designers + Geeks153749520000009/21/18 02:002 hours, 0 minutes371615633
4Designers + Geeks153628560000009/07/18 02:002 hours, 0 minutes871615633
40Docker Online Meetup154091520000010/30/18 16:001 hours, 0 minutes113402242
97SF Big Analytics154959120000002/08/19 02:002 hours, 30 minutes41818354966
80SF Big Analytics154769040000001/17/19 02:003 hours, 0 minutes45018354966
78SF Big Analytics154760400000001/16/19 02:003 hours, 0 minutes518354966
64SF Big Analytics154406160000012/06/18 02:003 hours, 0 minutes718354966
53SF Big Analytics154224720000011/15/18 02:003 hours, 0 minutes1618354966
51SF Big Analytics154216080000011/14/18 02:002 hours, 30 minutes47818354966
50SF Big Analytics154178280000011/09/18 17:0057 hours, 0 minutes1618354966
44SF Big Analytics154112040000011/02/18 01:003 hours, 0 minutes15218354966
29SF Big Analytics153990900000010/19/18 00:302 hours, 30 minutes39018354966
16SF Big Analytics153801000000009/27/18 01:002 hours, 45 minutes24518354966
8SF Big Analytics153680040000009/13/18 01:002 hours, 30 minutes21818354966
10SF Data Mining153720000000009/17/18 16:00104 hours, 0 minutes62065031
93SF Data Science154898820000002/01/19 02:302 hours, 0 minutes949226282
89SF Data Science154838340000001/25/19 02:302 hours, 0 minutes489226282
82SF Data Science154777860000001/18/19 02:302 hours, 0 minutes269226282
81SF Data Science154769220000001/17/19 02:302 hours, 0 minutes599226282
60SF Data Science154353960000011/30/18 01:003 hours, 30 minutes189226282
52SF Data Science154221120000011/14/18 16:0010 hours, 0 minutes19226282
38SF Data Science154077480000010/29/18 01:004 hours, 0 minutes219226282
30SF Data Science153991080000010/19/18 01:002 hours, 0 minutes959226282
26SF Data Science153930600000010/12/18 01:003 hours, 0 minutes159226282
102SFHTML5155088360000002/23/19 01:004 hours, 0 minutes3361619955
.....................
95Women Who Code SF154942020000002/06/19 02:302 hours, 0 minutes472252591
92Women Who Code SF154881540000001/30/19 02:302 hours, 0 minutes372252591
86Women Who Code SF154821060000001/23/19 02:302 hours, 0 minutes292252591
79Women Who Code SF154760580000001/16/19 02:302 hours, 0 minutes322252591
76Women Who Code SF154700100000001/09/19 02:302 hours, 0 minutes572252591
71Women Who Code SF154527120000012/20/18 02:002 hours, 0 minutes1992252591
69Women Who Code SF154518660000012/19/18 02:301 hours, 30 minutes502252591
70Women Who Code SF154518660000012/19/18 02:302 hours, 0 minutes232252591
67Women Who Code SF154458180000012/12/18 02:302 hours, 0 minutes222252591
65Women Who Code SF154414980000012/07/18 02:302 hours, 0 minutes172252591
63Women Who Code SF154397700000012/05/18 02:302 hours, 0 minutes292252591
62Women Who Code SF154397520000012/05/18 02:003 hours, 0 minutes12252591
58Women Who Code SF154276740000011/21/18 02:302 hours, 0 minutes312252591
54Women Who Code SF154233540000011/16/18 02:302 hours, 0 minutes502252591
49Women Who Code SF154155780000011/07/18 02:302 hours, 0 minutes272252591
45Women Who Code SF154112220000011/02/18 01:302 hours, 0 minutes502252591
43Women Who Code SF154094940000010/31/18 01:302 hours, 0 minutes292252591
39Women Who Code SF154085940000010/30/18 00:302 hours, 30 minutes642252591
34Women Who Code SF154040040000010/24/18 17:001 hours, 0 minutes22252591
33Women Who Code SF154034460000010/24/18 01:302 hours, 0 minutes362252591
31Women Who Code SF153991260000010/19/18 01:302 hours, 0 minutes502252591
25Women Who Code SF153913500000010/10/18 01:302 hours, 0 minutes252252591
23Women Who Code SF153870300000010/05/18 01:302 hours, 0 minutes502252591
21Women Who Code SF153853020000010/03/18 01:302 hours, 0 minutes102252591
22Women Who Code SF153853020000010/03/18 01:302 hours, 0 minutes272252591
13Women Who Code SF153749340000009/21/18 01:302 hours, 0 minutes332252591
11Women Who Code SF153732060000009/19/18 01:302 hours, 0 minutes142252591
9Women Who Code SF153703080000009/15/18 17:003 hours, 0 minutes102252591
7Women Who Code SF153671580000009/12/18 01:302 hours, 0 minutes252252591
1Women Who Code SF153550620000008/29/18 01:302 hours, 0 minutes352252591
\n", "

103 rows × 6 columns

\n", "
" ], "text/plain": [ " name time time_dt \\\n", "101 Designers + Geeks 1550804400000 02/22/19 03:00 \n", "55 Designers + Geeks 1542337200000 11/16/18 03:00 \n", "46 Designers + Geeks 1541124000000 11/02/18 02:00 \n", "32 Designers + Geeks 1539914400000 10/19/18 02:00 \n", "24 Designers + Geeks 1538704800000 10/05/18 02:00 \n", "14 Designers + Geeks 1537495200000 09/21/18 02:00 \n", "4 Designers + Geeks 1536285600000 09/07/18 02:00 \n", "40 Docker Online Meetup 1540915200000 10/30/18 16:00 \n", "97 SF Big Analytics 1549591200000 02/08/19 02:00 \n", "80 SF Big Analytics 1547690400000 01/17/19 02:00 \n", "78 SF Big Analytics 1547604000000 01/16/19 02:00 \n", "64 SF Big Analytics 1544061600000 12/06/18 02:00 \n", "53 SF Big Analytics 1542247200000 11/15/18 02:00 \n", "51 SF Big Analytics 1542160800000 11/14/18 02:00 \n", "50 SF Big Analytics 1541782800000 11/09/18 17:00 \n", "44 SF Big Analytics 1541120400000 11/02/18 01:00 \n", "29 SF Big Analytics 1539909000000 10/19/18 00:30 \n", "16 SF Big Analytics 1538010000000 09/27/18 01:00 \n", "8 SF Big Analytics 1536800400000 09/13/18 01:00 \n", "10 SF Data Mining 1537200000000 09/17/18 16:00 \n", "93 SF Data Science 1548988200000 02/01/19 02:30 \n", "89 SF Data Science 1548383400000 01/25/19 02:30 \n", "82 SF Data Science 1547778600000 01/18/19 02:30 \n", "81 SF Data Science 1547692200000 01/17/19 02:30 \n", "60 SF Data Science 1543539600000 11/30/18 01:00 \n", "52 SF Data Science 1542211200000 11/14/18 16:00 \n", "38 SF Data Science 1540774800000 10/29/18 01:00 \n", "30 SF Data Science 1539910800000 10/19/18 01:00 \n", "26 SF Data Science 1539306000000 10/12/18 01:00 \n", "102 SFHTML5 1550883600000 02/23/19 01:00 \n", ".. ... ... ... 
\n", "95 Women Who Code SF 1549420200000 02/06/19 02:30 \n", "92 Women Who Code SF 1548815400000 01/30/19 02:30 \n", "86 Women Who Code SF 1548210600000 01/23/19 02:30 \n", "79 Women Who Code SF 1547605800000 01/16/19 02:30 \n", "76 Women Who Code SF 1547001000000 01/09/19 02:30 \n", "71 Women Who Code SF 1545271200000 12/20/18 02:00 \n", "69 Women Who Code SF 1545186600000 12/19/18 02:30 \n", "70 Women Who Code SF 1545186600000 12/19/18 02:30 \n", "67 Women Who Code SF 1544581800000 12/12/18 02:30 \n", "65 Women Who Code SF 1544149800000 12/07/18 02:30 \n", "63 Women Who Code SF 1543977000000 12/05/18 02:30 \n", "62 Women Who Code SF 1543975200000 12/05/18 02:00 \n", "58 Women Who Code SF 1542767400000 11/21/18 02:30 \n", "54 Women Who Code SF 1542335400000 11/16/18 02:30 \n", "49 Women Who Code SF 1541557800000 11/07/18 02:30 \n", "45 Women Who Code SF 1541122200000 11/02/18 01:30 \n", "43 Women Who Code SF 1540949400000 10/31/18 01:30 \n", "39 Women Who Code SF 1540859400000 10/30/18 00:30 \n", "34 Women Who Code SF 1540400400000 10/24/18 17:00 \n", "33 Women Who Code SF 1540344600000 10/24/18 01:30 \n", "31 Women Who Code SF 1539912600000 10/19/18 01:30 \n", "25 Women Who Code SF 1539135000000 10/10/18 01:30 \n", "23 Women Who Code SF 1538703000000 10/05/18 01:30 \n", "21 Women Who Code SF 1538530200000 10/03/18 01:30 \n", "22 Women Who Code SF 1538530200000 10/03/18 01:30 \n", "13 Women Who Code SF 1537493400000 09/21/18 01:30 \n", "11 Women Who Code SF 1537320600000 09/19/18 01:30 \n", "9 Women Who Code SF 1537030800000 09/15/18 17:00 \n", "7 Women Who Code SF 1536715800000 09/12/18 01:30 \n", "1 Women Who Code SF 1535506200000 08/29/18 01:30 \n", "\n", " duration yes_rsvp_count id \n", "101 2 hours, 0 minutes 178 1615633 \n", "55 2 hours, 0 minutes 71 1615633 \n", "46 2 hours, 0 minutes 71 1615633 \n", "32 2 hours, 0 minutes 41 1615633 \n", "24 2 hours, 0 minutes 29 1615633 \n", "14 2 hours, 0 minutes 37 1615633 \n", "4 2 hours, 0 minutes 87 1615633 \n", "40 
1 hours, 0 minutes 1 13402242 \n", "97 2 hours, 30 minutes 418 18354966 \n", "80 3 hours, 0 minutes 450 18354966 \n", "78 3 hours, 0 minutes 5 18354966 \n", "64 3 hours, 0 minutes 7 18354966 \n", "53 3 hours, 0 minutes 16 18354966 \n", "51 2 hours, 30 minutes 478 18354966 \n", "50 57 hours, 0 minutes 16 18354966 \n", "44 3 hours, 0 minutes 152 18354966 \n", "29 2 hours, 30 minutes 390 18354966 \n", "16 2 hours, 45 minutes 245 18354966 \n", "8 2 hours, 30 minutes 218 18354966 \n", "10 104 hours, 0 minutes 6 2065031 \n", "93 2 hours, 0 minutes 94 9226282 \n", "89 2 hours, 0 minutes 48 9226282 \n", "82 2 hours, 0 minutes 26 9226282 \n", "81 2 hours, 0 minutes 59 9226282 \n", "60 3 hours, 30 minutes 18 9226282 \n", "52 10 hours, 0 minutes 1 9226282 \n", "38 4 hours, 0 minutes 21 9226282 \n", "30 2 hours, 0 minutes 95 9226282 \n", "26 3 hours, 0 minutes 15 9226282 \n", "102 4 hours, 0 minutes 336 1619955 \n", ".. ... ... ... \n", "95 2 hours, 0 minutes 47 2252591 \n", "92 2 hours, 0 minutes 37 2252591 \n", "86 2 hours, 0 minutes 29 2252591 \n", "79 2 hours, 0 minutes 32 2252591 \n", "76 2 hours, 0 minutes 57 2252591 \n", "71 2 hours, 0 minutes 199 2252591 \n", "69 1 hours, 30 minutes 50 2252591 \n", "70 2 hours, 0 minutes 23 2252591 \n", "67 2 hours, 0 minutes 22 2252591 \n", "65 2 hours, 0 minutes 17 2252591 \n", "63 2 hours, 0 minutes 29 2252591 \n", "62 3 hours, 0 minutes 1 2252591 \n", "58 2 hours, 0 minutes 31 2252591 \n", "54 2 hours, 0 minutes 50 2252591 \n", "49 2 hours, 0 minutes 27 2252591 \n", "45 2 hours, 0 minutes 50 2252591 \n", "43 2 hours, 0 minutes 29 2252591 \n", "39 2 hours, 30 minutes 64 2252591 \n", "34 1 hours, 0 minutes 2 2252591 \n", "33 2 hours, 0 minutes 36 2252591 \n", "31 2 hours, 0 minutes 50 2252591 \n", "25 2 hours, 0 minutes 25 2252591 \n", "23 2 hours, 0 minutes 50 2252591 \n", "21 2 hours, 0 minutes 10 2252591 \n", "22 2 hours, 0 minutes 27 2252591 \n", "13 2 hours, 0 minutes 33 2252591 \n", "11 2 hours, 0 minutes 14 2252591 \n", "9 3 
hours, 0 minutes 10 2252591 \n", "7 2 hours, 0 minutes 25 2252591 \n", "1 2 hours, 0 minutes 35 2252591 \n", "\n", "[103 rows x 6 columns]" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "merged_df = pd.merge(\n", " events_df,\n", " biggest_ten_df[['id', 'name']],\n", " on='id',\n", " how='left')\n", "\n", "columns = ['name', 'time', 'time_dt', 'duration', 'yes_rsvp_count', 'id']\n", "final_df = merged_df[columns]\n", "\n", "# Sort the output by name and time\n", "final_df = final_df.sort_values(by=['name', 'time'], ascending=[True, False])\n", "final_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Hmm, those ID numbers are numeric, but take a while to type\n", "\n", "Let's convert those to something easier.\n" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nametimetime_dtdurationyes_rsvp_countid
101Designers + Geeks155080440000002/22/19 03:002 hours, 0 minutes1780
55Designers + Geeks154233720000011/16/18 03:002 hours, 0 minutes710
46Designers + Geeks154112400000011/02/18 02:002 hours, 0 minutes710
32Designers + Geeks153991440000010/19/18 02:002 hours, 0 minutes410
24Designers + Geeks153870480000010/05/18 02:002 hours, 0 minutes290
14Designers + Geeks153749520000009/21/18 02:002 hours, 0 minutes370
4Designers + Geeks153628560000009/07/18 02:002 hours, 0 minutes870
40Docker Online Meetup154091520000010/30/18 16:001 hours, 0 minutes11
97SF Big Analytics154959120000002/08/19 02:002 hours, 30 minutes4182
80SF Big Analytics154769040000001/17/19 02:003 hours, 0 minutes4502
78SF Big Analytics154760400000001/16/19 02:003 hours, 0 minutes52
64SF Big Analytics154406160000012/06/18 02:003 hours, 0 minutes72
53SF Big Analytics154224720000011/15/18 02:003 hours, 0 minutes162
51SF Big Analytics154216080000011/14/18 02:002 hours, 30 minutes4782
50SF Big Analytics154178280000011/09/18 17:0057 hours, 0 minutes162
44SF Big Analytics154112040000011/02/18 01:003 hours, 0 minutes1522
29SF Big Analytics153990900000010/19/18 00:302 hours, 30 minutes3902
16SF Big Analytics153801000000009/27/18 01:002 hours, 45 minutes2452
8SF Big Analytics153680040000009/13/18 01:002 hours, 30 minutes2182
10SF Data Mining153720000000009/17/18 16:00104 hours, 0 minutes63
93SF Data Science154898820000002/01/19 02:302 hours, 0 minutes944
89SF Data Science154838340000001/25/19 02:302 hours, 0 minutes484
82SF Data Science154777860000001/18/19 02:302 hours, 0 minutes264
81SF Data Science154769220000001/17/19 02:302 hours, 0 minutes594
60SF Data Science154353960000011/30/18 01:003 hours, 30 minutes184
52SF Data Science154221120000011/14/18 16:0010 hours, 0 minutes14
38SF Data Science154077480000010/29/18 01:004 hours, 0 minutes214
30SF Data Science153991080000010/19/18 01:002 hours, 0 minutes954
26SF Data Science153930600000010/12/18 01:003 hours, 0 minutes154
102SFHTML5155088360000002/23/19 01:004 hours, 0 minutes3365
\n", "
" ], "text/plain": [ " name time time_dt \\\n", "101 Designers + Geeks 1550804400000 02/22/19 03:00 \n", "55 Designers + Geeks 1542337200000 11/16/18 03:00 \n", "46 Designers + Geeks 1541124000000 11/02/18 02:00 \n", "32 Designers + Geeks 1539914400000 10/19/18 02:00 \n", "24 Designers + Geeks 1538704800000 10/05/18 02:00 \n", "14 Designers + Geeks 1537495200000 09/21/18 02:00 \n", "4 Designers + Geeks 1536285600000 09/07/18 02:00 \n", "40 Docker Online Meetup 1540915200000 10/30/18 16:00 \n", "97 SF Big Analytics 1549591200000 02/08/19 02:00 \n", "80 SF Big Analytics 1547690400000 01/17/19 02:00 \n", "78 SF Big Analytics 1547604000000 01/16/19 02:00 \n", "64 SF Big Analytics 1544061600000 12/06/18 02:00 \n", "53 SF Big Analytics 1542247200000 11/15/18 02:00 \n", "51 SF Big Analytics 1542160800000 11/14/18 02:00 \n", "50 SF Big Analytics 1541782800000 11/09/18 17:00 \n", "44 SF Big Analytics 1541120400000 11/02/18 01:00 \n", "29 SF Big Analytics 1539909000000 10/19/18 00:30 \n", "16 SF Big Analytics 1538010000000 09/27/18 01:00 \n", "8 SF Big Analytics 1536800400000 09/13/18 01:00 \n", "10 SF Data Mining 1537200000000 09/17/18 16:00 \n", "93 SF Data Science 1548988200000 02/01/19 02:30 \n", "89 SF Data Science 1548383400000 01/25/19 02:30 \n", "82 SF Data Science 1547778600000 01/18/19 02:30 \n", "81 SF Data Science 1547692200000 01/17/19 02:30 \n", "60 SF Data Science 1543539600000 11/30/18 01:00 \n", "52 SF Data Science 1542211200000 11/14/18 16:00 \n", "38 SF Data Science 1540774800000 10/29/18 01:00 \n", "30 SF Data Science 1539910800000 10/19/18 01:00 \n", "26 SF Data Science 1539306000000 10/12/18 01:00 \n", "102 SFHTML5 1550883600000 02/23/19 01:00 \n", "\n", " duration yes_rsvp_count id \n", "101 2 hours, 0 minutes 178 0 \n", "55 2 hours, 0 minutes 71 0 \n", "46 2 hours, 0 minutes 71 0 \n", "32 2 hours, 0 minutes 41 0 \n", "24 2 hours, 0 minutes 29 0 \n", "14 2 hours, 0 minutes 37 0 \n", "4 2 hours, 0 minutes 87 0 \n", "40 1 hours, 0 minutes 1 1 \n", "97 2 
hours, 30 minutes 418 2 \n", "80 3 hours, 0 minutes 450 2 \n", "78 3 hours, 0 minutes 5 2 \n", "64 3 hours, 0 minutes 7 2 \n", "53 3 hours, 0 minutes 16 2 \n", "51 2 hours, 30 minutes 478 2 \n", "50 57 hours, 0 minutes 16 2 \n", "44 3 hours, 0 minutes 152 2 \n", "29 2 hours, 30 minutes 390 2 \n", "16 2 hours, 45 minutes 245 2 \n", "8 2 hours, 30 minutes 218 2 \n", "10 104 hours, 0 minutes 6 3 \n", "93 2 hours, 0 minutes 94 4 \n", "89 2 hours, 0 minutes 48 4 \n", "82 2 hours, 0 minutes 26 4 \n", "81 2 hours, 0 minutes 59 4 \n", "60 3 hours, 30 minutes 18 4 \n", "52 10 hours, 0 minutes 1 4 \n", "38 4 hours, 0 minutes 21 4 \n", "30 2 hours, 0 minutes 95 4 \n", "26 3 hours, 0 minutes 15 4 \n", "102 4 hours, 0 minutes 336 5 " ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Map each long Meetup group id to a short sequential id (0, 1, 2, ...)\n", "unique_ids = final_df['id'].unique()\n", "ids_2_new_ids = {old_id: new_id for new_id, old_id in enumerate(unique_ids)}\n", "\n", "def get_new_id(old_id):\n", " \"\"\"Look up the short id for a Meetup group id.\"\"\"\n", " return ids_2_new_ids[old_id]\n", "\n", "# Rewrite the ids as something simpler\n", "final_df['id'] = final_df['id'].apply(get_new_id)\n", "\n", "final_df.head(30)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "This is nice, but which groups meet regularly, and which ones have growing interest (increasing yes-RSVP counts)?\n", "How do these groups compare to each other?\n", "\n", "### Let's visualize the trend lines for the top ten mega-groups\n", "\n", "Let's just use linear regression to draw trend lines for each mega-group.\n", "\n", "We'll use seaborn's **lmplot** to visualize all ten mega-groups.\n", "\n", "We need to use the epoch-milliseconds **time** column of our dataframe, since it is numeric and can be used to fit trend lines."
 ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Fit one linear trend line (order=1) per group, without confidence bands.\n", "# x and y are passed as keywords; recent seaborn releases no longer accept them positionally.\n", "ax = sb.lmplot(x=\"time\", y=\"yes_rsvp_count\", data=final_df, hue=\"name\",\n", " height=9, aspect=0.75, order=1, ci=None)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Ah, this is interesting\n", "\n", "From the top 10 mega-groups I can see that several groups have major problems.\n", "For instance, the **Docker Online Meetup** group has almost no yes RSVPs.\n", "\n", "Let's take a closer look." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nametimetime_dtdurationyes_rsvp_countid
40Docker Online Meetup154091520000010/30/18 16:001 hours, 0 minutes11
\n", "
" ], "text/plain": [ " name time time_dt duration \\\n", "40 Docker Online Meetup 1540915200000 10/30/18 16:00 1 hours, 0 minutes \n", "\n", " yes_rsvp_count id \n", "40 1 1 " ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "interesting_group_df = final_df[final_df['name'] == \"Docker Online Meetup\"]\n", "interesting_group_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Yikes, are there other mega-groups with no RSVPs?" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
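<table>\n",
"  <thead>\n",
"    <tr><th></th><th>name</th><th>time</th><th>yes_rsvp_count</th><th>id</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>1</th><td>Docker Online Meetup</td><td>1,540,915,200,000.00</td><td>1.00</td><td>1.00</td></tr>\n",
"    <tr><th>3</th><td>SF Data Mining</td><td>1,537,200,000,000.00</td><td>6.00</td><td>3.00</td></tr>\n",
"    <tr><th>9</th><td>Women Who Code SF</td><td>1,542,342,600,000.00</td><td>37.03</td><td>9.00</td></tr>\n",
"    <tr><th>4</th><td>SF Data Science</td><td>1,544,287,200,000.00</td><td>41.89</td><td>4.00</td></tr>\n",
"    <tr><th>0</th><td>Designers + Geeks</td><td>1,540,952,228,571.43</td><td>73.43</td><td>0.00</td></tr>\n",
"    <tr><th>8</th><td>The SF JavaScript Meetup</td><td>1,544,289,840,000.00</td><td>83.80</td><td>8.00</td></tr>\n",
"    <tr><th>7</th><td>Tech in Motion Events: San Francisco</td><td>1,544,405,400,000.00</td><td>113.50</td><td>7.00</td></tr>\n",
"    <tr><th>6</th><td>Silicon Valley Entrepreneurs &amp; Startups</td><td>1,542,303,840,000.00</td><td>130.30</td><td>6.00</td></tr>\n",
"    <tr><th>2</th><td>SF Big Analytics</td><td>1,542,816,163,636.36</td><td>217.73</td><td>2.00</td></tr>\n",
"    <tr><th>5</th><td>SFHTML5</td><td>1,546,360,500,000.00</td><td>218.33</td><td>5.00</td></tr>\n",
"  </tbody>\n",
"</table>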
\n", "
" ], "text/plain": [ " name time \\\n", "1 Docker Online Meetup 1,540,915,200,000.00 \n", "3 SF Data Mining 1,537,200,000,000.00 \n", "9 Women Who Code SF 1,542,342,600,000.00 \n", "4 SF Data Science 1,544,287,200,000.00 \n", "0 Designers + Geeks 1,540,952,228,571.43 \n", "8 The SF JavaScript Meetup 1,544,289,840,000.00 \n", "7 Tech in Motion Events: San Francisco 1,544,405,400,000.00 \n", "6 Silicon Valley Entrepreneurs & Startups 1,542,303,840,000.00 \n", "2 SF Big Analytics 1,542,816,163,636.36 \n", "5 SFHTML5 1,546,360,500,000.00 \n", "\n", " yes_rsvp_count id \n", "1 1.00 1.00 \n", "3 6.00 3.00 \n", "9 37.03 9.00 \n", "4 41.89 4.00 \n", "0 73.43 0.00 \n", "8 83.80 8.00 \n", "7 113.50 7.00 \n", "6 130.30 6.00 \n", "2 217.73 2.00 \n", "5 218.33 5.00 " ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "rsvp_df = final_df.groupby(['name'], as_index=False).mean()\n", "\n", "rsvp_df = rsvp_df.sort_values(by=['yes_rsvp_count', 'name'],\n", " ascending=[True, True])\n", "rsvp_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Okay, we have two mega-groups that nobody really RSVPs for....\n", "\n", "Let's exclude those now." ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "final_df = final_df[~final_df['id'].isin([1, 3])]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Let's also make sure that our groups meet frequently enough\n" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
<table>\n",
"  <thead>\n",
"    <tr><th></th><th>id</th><th>time</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>5</th><td>7</td><td>2</td></tr>\n",
"    <tr><th>6</th><td>8</td><td>5</td></tr>\n",
"    <tr><th>3</th><td>5</td><td>6</td></tr>\n",
"    <tr><th>0</th><td>0</td><td>7</td></tr>\n",
"    <tr><th>2</th><td>4</td><td>9</td></tr>\n",
"    <tr><th>1</th><td>2</td><td>11</td></tr>\n",
"    <tr><th>4</th><td>6</td><td>30</td></tr>\n",
"    <tr><th>7</th><td>9</td><td>31</td></tr>\n",
"  </tbody>\n",
"</table>
\n", "
" ], "text/plain": [ " id time\n", "5 7 2\n", "6 8 5\n", "3 5 6\n", "0 0 7\n", "2 4 9\n", "1 2 11\n", "4 6 30\n", "7 9 31" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "event_count_df = final_df.groupby(['id'], as_index=False).count()\n", "event_count_df = event_count_df.sort_values(by=['name', 'id'], ascending=[True, True])\n", "event_count_df[[\"id\", \"time\"]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Hmm... a few of our groups meet rather infrequently\n", "\n", "Let's remove the four groups with fewer than nine events (ids 7, 8, 5 and 0) as well." ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [], "source": [ "final_df = final_df[~final_df['id'].isin([7, 8, 5, 0])]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Let's try this again, only with the remaining 4 mega-groups\n", "\n", "This time we're going to do two more things:\n", "1. Show a confidence interval. Setting `ci=68` adds a translucent band around each regression line: a 68% confidence interval for the fitted trend, roughly one standard error on either side of the estimate.\n", "2. Estimate a [**robust regression**](https://en.wikipedia.org/wiki/Robust_regression), to de-weight outliers" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "%%capture --no-display\n", "sb.lmplot(\"time\", \"yes_rsvp_count\", data=final_df, hue=\"name\",\n", "          height=9, aspect=0.75, order=1,\n", "          ci=68,        # 1. 68% confidence interval for the fit (about one standard error)\n", "          robust=True)  # 2. estimate robust regression, to de-weight outliers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Looking at the four remaining trend lines, we can see that\n", "\n", "### Gold Medal -- [SF Big Analytics](https://www.meetup.com/SF-Big-Analytics/)\n", "* PROs\n", "    * the largest mean number of RSVPs (about 218 per event)\n", "    * a rising trend line, which suggests that it's getting even more popular\n", "* CONs\n", "    * the confidence band is a pretty thick blue bar, which means the trend is uncertain and the yes-RSVPs are pretty hit-and-miss. Check each event in advance, so that you're not surprised\n", " \n", "### Silver Medal -- [Silicon Valley Entrepreneurs & Startups](https://www.meetup.com/sventrepreneurs/)\n", "* PROs\n", "    * holding onto a flat trend of about 100 yes-RSVPs per meeting\n", "    * Met 30 times in the past 9 months\n", "* CONs\n", "    * The trend is flat or slightly down over the last nine months, so it's not really showing signs of getting more popular\n", " \n", "### Bronze Medal # 1 -- [SF Data Science](https://www.meetup.com/SF-Data-Science/)\n", "* PROs\n", "    * Shows a pretty aggressive slope for the trend line; it's definitely rising in popularity.\n", "* CONs \n", "    * Meets up the least frequently of the four top mega-groups, just about monthly\n", " \n", "### Bronze Medal # 2 -- [Women Who Code SF](https://www.meetup.com/Women-Who-Code-SF/)\n", "* PROs\n", "    * Met up 31 times in the past 9 months (that's almost weekly)\n", "    * Smallest variance in RSVPs, perhaps indicating a loyal following\n", "* CONs\n", "    * Smallest average number of RSVPs\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Goal Achieved!\n", "\n", "At 
last! We found the four best Tech groups in San Francisco, CA that:\n", "1. are among the 10 biggest in the city\n", "2. are popular and staying popular\n", "3. hold events at least monthly\n", "\n", "## Conclusion\n", "\n", "We achieved our objectives and demonstrated several useful techniques along the way. We:\n", "1. Worked with the [Python Meetup API client]\n", "2. Built a helper function to parse response objects into **meta** and **results** dataframes\n", "3. Built a helper function to loop through multiple API calls and [concat]-enate a list of pages into a single useful dataframe\n", "4. Used pandas.DataFrame.[query] to sort and filter data of interest\n", "5. Used pandas.DataFrame.[apply] to clean columns of data using custom helper functions\n", "6. Used pandas.DataFrame.[describe] to get descriptive statistics that summarize\n", "    * the central tendency\n", "    * the dispersion\n", "    * the shape of our dataset's distribution\n", "7. Used pandas.DataFrame.[merge] to join the **events** and **groups** dataframes to create a report of the events for our 10 biggest mega-groups in technology\n", "8. 
Used visualizations and statistics to filter those 10 mega-groups into the 4 very best tech groups in San Francisco\n", "\n", "\n", "[//]: # (References)\n", "\n", "[Python Meetup API client]: https://meetup-api.readthedocs.io/en/latest/meetup_api.html#api-client-details\n", "[concat]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html\n", "[query]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html\n", "[apply]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html\n", "[describe]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html\n", "[merge]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html" ] } ], "metadata": { "kernelspec": { "display_name": "Python (datascience_challenges)", "language": "python", "name": "datascience_challenges" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }