{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction\n",
"\n",
"My friend [Johannes Giorgis] and I are developing a series of [Data Science Challenges] to help others\n",
"become better data scientists by presenting a series of challenges. Why did we do this?\n",
"\n",
"\n",
"> Because that's what heroes do!\n",
">\n",
"> --Johannes Giorgis\n",
"\n",
"I now present my response to the first challenge, Exploring the Meetup API in the city of my choice.\n",
"\n",
"**San Francisco, CA**, I choose you!\n",
"\n",
"\n",
"\n",
"# Challenge 01: Explore the Meetup API\n",
"Use the [Meetup API] to explore meetups in your city of choice.\n",
"\n",
"\n",
"**Guide Questions**:\n",
"\n",
"Below are some guide line questions to get you started:\n",
"\n",
"- What is the largest meetup in your location of choice (city, cities, country...etc)?\n",
"- How many meetups of a certain category (e.g. Tech, Art...etc) are in your city?\n",
"- Basic statistics of meetups\n",
"\t- What is the average size of meetups?\n",
"\t- How frequently do meetups host events?\n",
" \n",
"## Okay, but what I really want to know is\n",
"\n",
"What is the biggest Tech Group in San Francisco that meets regularly and has a growing and enthusiastic membership?\n",
"\n",
"\n",
"## Prerequisites:\n",
"Add a [Meetup API Key] to your environment.\n",
"\n",
"[//]: # (References)\n",
"\n",
"[Meetup API]: https://www.meetup.com/meetup_api/\n",
"[Meetup API Key]: https://secure.meetup.com/meetup_api/key/\n",
"[Johannes Giorgis]: http://johannesgiorgis.com/\n",
"[Data Science Challenges]: https://medium.com/red-panda-ai/introducing-data-science-challenges-4ae4a103d67b"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import datetime\n",
"import json\n",
"import math\n",
"import meetup.api\n",
"import os\n",
"import pprint\n",
"import requests\n",
"\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"import numpy as np\n",
"import seaborn as sb\n",
"\n",
"from tqdm import tnrange, tqdm_notebook\n",
"\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Double check your environment\n",
"\n",
"Nothing works without **MEETUP_API_KEY**."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"assert 'MEETUP_API_KEY' in os.environ, (\n",
" \"You need a MEETUP_API_KEY in your environment please look at the \"\n",
" \"README for instructions.\")\n",
"client = meetup.api.Client()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Categories\n",
"\n",
"There are multiple categories of groups in Meetup, let's use Python's meetup.api to [GetCategories](https://meetup-api.readthedocs.io/en/latest/meetup_api.html#meetup.api.meetup.api.Client.GetCategories)."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"%%capture --no-display\n",
"categories = client.GetCategories()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### What are the attributes of the response object? \n",
"\n",
"First, \n",
"let's create ahelper function to help us parse out the two most useful\n",
"different pieces:\n",
"\n",
"1. **meta**: an object containing meta-data about the response object itself\n",
"2. **results**: A page of actual data from the entire result of our API call"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"def parse_response(response):\n",
" \"\"\"Returns two dataframes, meta and results:\n",
" meta: a vertically aligned dataframe, where each row is an element \n",
" of the response.meta dictionary\n",
" results: a horizontally aligned dataframe, where each column is\n",
" an element of the response.results dictionary\"\"\"\n",
" meta = pd.DataFrame.from_dict(response.meta, orient='index')\n",
" results = pd.DataFrame.from_dict(response.results)\n",
" return meta, results\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exploring the response object\n",
"\n",
"We received a response object when we called ```client.GetCategories()```.\n",
"\n",
"By looking at the categories **meta** dataframe, we can see that there are 33 different categories."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
0
\n",
"
\n",
" \n",
" \n",
"
\n",
"
next
\n",
"
\n",
"
\n",
"
\n",
"
method
\n",
"
Categories
\n",
"
\n",
"
\n",
"
total_count
\n",
"
33
\n",
"
\n",
"
\n",
"
link
\n",
"
https://api.meetup.com/2/categories
\n",
"
\n",
"
\n",
"
count
\n",
"
33
\n",
"
\n",
"
\n",
"
description
\n",
"
Returns a list of Meetup group categories
\n",
"
\n",
"
\n",
"
lon
\n",
"
None
\n",
"
\n",
"
\n",
"
title
\n",
"
Categories
\n",
"
\n",
"
\n",
"
url
\n",
"
https://api.meetup.com/2/categories?offset=0&f...
\n",
"
\n",
"
\n",
"
id
\n",
"
\n",
"
\n",
"
\n",
"
updated
\n",
"
1450292956000
\n",
"
\n",
"
\n",
"
lat
\n",
"
None
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 0\n",
"next \n",
"method Categories\n",
"total_count 33\n",
"link https://api.meetup.com/2/categories\n",
"count 33\n",
"description Returns a list of Meetup group categories\n",
"lon None\n",
"title Categories\n",
"url https://api.meetup.com/2/categories?offset=0&f...\n",
"id \n",
"updated 1450292956000\n",
"lat None"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cats_meta_df, cats_df = parse_response(categories)\n",
"cats_meta_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see from the meta that there are 33 categories available to us.\n",
"I wonder what they are. \n",
"\n",
"### How the Meetup API works\n",
"\n",
"Notice that the value of **next** (above) is an empty string. Meetup API v2 response payloads come in **pages**, one at a time, but provide the URI of the **next** API call in the sequence. We can use this to programmatically get each next **page** in **response.meta\\[\"next\"\\]**. until the complete result is returned.\n",
"\n",
"As we can see, the **response.meta\\[\"next\"\\]** for this page is an empty string, so all of the categories\n",
"fit into our first API call.\n",
"\n",
"#### Secondly, let's review the categories results dataframe"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
id
\n",
"
name
\n",
"
shortname
\n",
"
sort_name
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
1
\n",
"
Arts & Culture
\n",
"
Arts
\n",
"
Arts & Culture
\n",
"
\n",
"
\n",
"
1
\n",
"
18
\n",
"
Book Clubs
\n",
"
Book Clubs
\n",
"
Book Clubs
\n",
"
\n",
"
\n",
"
2
\n",
"
2
\n",
"
Career & Business
\n",
"
Business
\n",
"
Career & Business
\n",
"
\n",
"
\n",
"
3
\n",
"
3
\n",
"
Cars & Motorcycles
\n",
"
Auto
\n",
"
Cars & Motorcycles
\n",
"
\n",
"
\n",
"
4
\n",
"
4
\n",
"
Community & Environment
\n",
"
Community
\n",
"
Community & Environment
\n",
"
\n",
"
\n",
"
5
\n",
"
5
\n",
"
Dancing
\n",
"
Dancing
\n",
"
Dancing
\n",
"
\n",
"
\n",
"
6
\n",
"
6
\n",
"
Education & Learning
\n",
"
Education
\n",
"
Education & Learning
\n",
"
\n",
"
\n",
"
7
\n",
"
8
\n",
"
Fashion & Beauty
\n",
"
Fashion
\n",
"
Fashion & Beauty
\n",
"
\n",
"
\n",
"
8
\n",
"
9
\n",
"
Fitness
\n",
"
Fitness
\n",
"
Fitness
\n",
"
\n",
"
\n",
"
9
\n",
"
10
\n",
"
Food & Drink
\n",
"
Food & Drink
\n",
"
Food & Drink
\n",
"
\n",
"
\n",
"
10
\n",
"
11
\n",
"
Games
\n",
"
Games
\n",
"
Games
\n",
"
\n",
"
\n",
"
11
\n",
"
13
\n",
"
Movements & Politics
\n",
"
Movements
\n",
"
Movements & Politics
\n",
"
\n",
"
\n",
"
12
\n",
"
14
\n",
"
Health & Wellbeing
\n",
"
Well-being
\n",
"
Health & Wellbeing
\n",
"
\n",
"
\n",
"
13
\n",
"
15
\n",
"
Hobbies & Crafts
\n",
"
Crafts
\n",
"
Hobbies & Crafts
\n",
"
\n",
"
\n",
"
14
\n",
"
16
\n",
"
Language & Ethnic Identity
\n",
"
Languages
\n",
"
Language & Ethnic Identity
\n",
"
\n",
"
\n",
"
15
\n",
"
12
\n",
"
LGBT
\n",
"
LGBT
\n",
"
LGBT
\n",
"
\n",
"
\n",
"
16
\n",
"
17
\n",
"
Lifestyle
\n",
"
Lifestyle
\n",
"
Lifestyle
\n",
"
\n",
"
\n",
"
17
\n",
"
20
\n",
"
Movies & Film
\n",
"
Films
\n",
"
Movies & Film
\n",
"
\n",
"
\n",
"
18
\n",
"
21
\n",
"
Music
\n",
"
Music
\n",
"
Music
\n",
"
\n",
"
\n",
"
19
\n",
"
22
\n",
"
New Age & Spirituality
\n",
"
Spirituality
\n",
"
New Age & Spirituality
\n",
"
\n",
"
\n",
"
20
\n",
"
23
\n",
"
Outdoors & Adventure
\n",
"
Outdoors
\n",
"
Outdoors & Adventure
\n",
"
\n",
"
\n",
"
21
\n",
"
24
\n",
"
Paranormal
\n",
"
Paranormal
\n",
"
Paranormal
\n",
"
\n",
"
\n",
"
22
\n",
"
25
\n",
"
Parents & Family
\n",
"
Moms & Dads
\n",
"
Parents & Family
\n",
"
\n",
"
\n",
"
23
\n",
"
26
\n",
"
Pets & Animals
\n",
"
Pets
\n",
"
Pets & Animals
\n",
"
\n",
"
\n",
"
24
\n",
"
27
\n",
"
Photography
\n",
"
Photography
\n",
"
Photography
\n",
"
\n",
"
\n",
"
25
\n",
"
28
\n",
"
Religion & Beliefs
\n",
"
Beliefs
\n",
"
Religion & Beliefs
\n",
"
\n",
"
\n",
"
26
\n",
"
29
\n",
"
Sci-Fi & Fantasy
\n",
"
Sci fi
\n",
"
Sci-Fi & Fantasy
\n",
"
\n",
"
\n",
"
27
\n",
"
30
\n",
"
Singles
\n",
"
Singles
\n",
"
Singles
\n",
"
\n",
"
\n",
"
28
\n",
"
31
\n",
"
Socializing
\n",
"
Social
\n",
"
Socializing
\n",
"
\n",
"
\n",
"
29
\n",
"
32
\n",
"
Sports & Recreation
\n",
"
Sports
\n",
"
Sports & Recreation
\n",
"
\n",
"
\n",
"
30
\n",
"
33
\n",
"
Support
\n",
"
Support
\n",
"
Support
\n",
"
\n",
"
\n",
"
31
\n",
"
34
\n",
"
Tech
\n",
"
Tech
\n",
"
Tech
\n",
"
\n",
"
\n",
"
32
\n",
"
36
\n",
"
Writing
\n",
"
Writing
\n",
"
Writing
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id name shortname sort_name\n",
"0 1 Arts & Culture Arts Arts & Culture\n",
"1 18 Book Clubs Book Clubs Book Clubs\n",
"2 2 Career & Business Business Career & Business\n",
"3 3 Cars & Motorcycles Auto Cars & Motorcycles\n",
"4 4 Community & Environment Community Community & Environment\n",
"5 5 Dancing Dancing Dancing\n",
"6 6 Education & Learning Education Education & Learning\n",
"7 8 Fashion & Beauty Fashion Fashion & Beauty\n",
"8 9 Fitness Fitness Fitness\n",
"9 10 Food & Drink Food & Drink Food & Drink\n",
"10 11 Games Games Games\n",
"11 13 Movements & Politics Movements Movements & Politics\n",
"12 14 Health & Wellbeing Well-being Health & Wellbeing\n",
"13 15 Hobbies & Crafts Crafts Hobbies & Crafts\n",
"14 16 Language & Ethnic Identity Languages Language & Ethnic Identity\n",
"15 12 LGBT LGBT LGBT\n",
"16 17 Lifestyle Lifestyle Lifestyle\n",
"17 20 Movies & Film Films Movies & Film\n",
"18 21 Music Music Music\n",
"19 22 New Age & Spirituality Spirituality New Age & Spirituality\n",
"20 23 Outdoors & Adventure Outdoors Outdoors & Adventure\n",
"21 24 Paranormal Paranormal Paranormal\n",
"22 25 Parents & Family Moms & Dads Parents & Family\n",
"23 26 Pets & Animals Pets Pets & Animals\n",
"24 27 Photography Photography Photography\n",
"25 28 Religion & Beliefs Beliefs Religion & Beliefs\n",
"26 29 Sci-Fi & Fantasy Sci fi Sci-Fi & Fantasy\n",
"27 30 Singles Singles Singles\n",
"28 31 Socializing Social Socializing\n",
"29 32 Sports & Recreation Sports Sports & Recreation\n",
"30 33 Support Support Support\n",
"31 34 Tech Tech Tech\n",
"32 36 Writing Writing Writing"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cats_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### So, if we want to work with a particular category\n",
"In this case, I want **Tech**. Let's query the dataframe for categories named **Tech**."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
id
\n",
"
name
\n",
"
shortname
\n",
"
sort_name
\n",
"
\n",
" \n",
" \n",
"
\n",
"
31
\n",
"
34
\n",
"
Tech
\n",
"
Tech
\n",
"
Tech
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id name shortname sort_name\n",
"31 34 Tech Tech Tech"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tech_df = cats_df.loc[cats_df['name'] == 'Tech']\n",
"tech_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Let's store the category ID number for later use"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"34"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tech_category_id = tech_df['id'].values[0]\n",
"tech_category_id"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Explore Cities\n",
"### Now let's look at cities in the United States named San Francisco"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"%%capture --no-display\n",
"cities_resp = client.GetCities(country='us', query='San Francisco')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here we used the [GetCities] method of the [Python Meetup API client]\n",
"\n",
"I used a query for cities in **United States** called **San Francisco**.\n",
"\n",
"[//]: # (References)\n",
"\n",
"[GetCities]: https://meetup-api.readthedocs.io/en/latest/meetup_api.html#meetup.api.meetup.api.Client.GetCities\n",
"[Python Meetup API client]: https://meetup-api.readthedocs.io/en/latest/meetup_api.html#api-client-details\n",
"\n",
"Now let's take a look at the **meta** for our results."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
0
\n",
"
\n",
" \n",
" \n",
"
\n",
"
next
\n",
"
\n",
"
\n",
"
\n",
"
method
\n",
"
Cities
\n",
"
\n",
"
\n",
"
total_count
\n",
"
4
\n",
"
\n",
"
\n",
"
link
\n",
"
https://api.meetup.com/2/cities
\n",
"
\n",
"
\n",
"
count
\n",
"
4
\n",
"
\n",
"
\n",
"
description
\n",
"
Returns Meetup cities. This method supports se...
\n",
"
\n",
"
\n",
"
lon
\n",
"
None
\n",
"
\n",
"
\n",
"
title
\n",
"
Cities
\n",
"
\n",
"
\n",
"
url
\n",
"
https://api.meetup.com/2/cities?country=us&off...
\n",
"
\n",
"
\n",
"
id
\n",
"
\n",
"
\n",
"
\n",
"
updated
\n",
"
1263132740000
\n",
"
\n",
"
\n",
"
lat
\n",
"
None
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 0\n",
"next \n",
"method Cities\n",
"total_count 4\n",
"link https://api.meetup.com/2/cities\n",
"count 4\n",
"description Returns Meetup cities. This method supports se...\n",
"lon None\n",
"title Cities\n",
"url https://api.meetup.com/2/cities?country=us&off...\n",
"id \n",
"updated 1263132740000\n",
"lat None"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cities_meta_df, cities_df = parse_response(cities_resp)\n",
"cities_meta_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Hmm, a count of 4 cities is suspicious...\n",
"\n",
"I only know of the one San Francisco, why are there 4 cities?"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
city
\n",
"
country
\n",
"
id
\n",
"
lat
\n",
"
localized_country_name
\n",
"
lon
\n",
"
member_count
\n",
"
name_string
\n",
"
ranking
\n",
"
state
\n",
"
zip
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
San Francisco
\n",
"
us
\n",
"
94101
\n",
"
37.779999
\n",
"
USA
\n",
"
-122.419998
\n",
"
60351
\n",
"
San Francisco, California, USA
\n",
"
0
\n",
"
CA
\n",
"
94101
\n",
"
\n",
"
\n",
"
1
\n",
"
Bosque
\n",
"
us
\n",
"
87006
\n",
"
34.560001
\n",
"
USA
\n",
"
-106.779999
\n",
"
5
\n",
"
San Francisco, New Mexico, USA
\n",
"
1
\n",
"
NM
\n",
"
87006
\n",
"
\n",
"
\n",
"
2
\n",
"
San Luis
\n",
"
us
\n",
"
81152
\n",
"
37.080002
\n",
"
USA
\n",
"
-105.620003
\n",
"
4
\n",
"
San Francisco, Colorado, USA
\n",
"
2
\n",
"
CO
\n",
"
81152
\n",
"
\n",
"
\n",
"
3
\n",
"
Reserve
\n",
"
us
\n",
"
87830
\n",
"
33.650002
\n",
"
USA
\n",
"
-108.769997
\n",
"
1
\n",
"
San Francisco Plaza, New Mexico, USA
\n",
"
3
\n",
"
NM
\n",
"
87830
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" city country id lat localized_country_name lon \\\n",
"0 San Francisco us 94101 37.779999 USA -122.419998 \n",
"1 Bosque us 87006 34.560001 USA -106.779999 \n",
"2 San Luis us 81152 37.080002 USA -105.620003 \n",
"3 Reserve us 87830 33.650002 USA -108.769997 \n",
"\n",
" member_count name_string ranking state zip \n",
"0 60351 San Francisco, California, USA 0 CA 94101 \n",
"1 5 San Francisco, New Mexico, USA 1 NM 87006 \n",
"2 4 San Francisco, Colorado, USA 2 CO 81152 \n",
"3 1 San Francisco Plaza, New Mexico, USA 3 NM 87830 "
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cities_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Oh, there are lots of San Franciscos! \n",
"\n",
"### Let's filter the dataframe with a query to give us only cities in California, US"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
city
\n",
"
country
\n",
"
id
\n",
"
lat
\n",
"
localized_country_name
\n",
"
lon
\n",
"
member_count
\n",
"
name_string
\n",
"
ranking
\n",
"
state
\n",
"
zip
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
San Francisco
\n",
"
us
\n",
"
94101
\n",
"
37.779999
\n",
"
USA
\n",
"
-122.419998
\n",
"
60351
\n",
"
San Francisco, California, USA
\n",
"
0
\n",
"
CA
\n",
"
94101
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" city country id lat localized_country_name lon \\\n",
"0 San Francisco us 94101 37.779999 USA -122.419998 \n",
"\n",
" member_count name_string ranking state zip \n",
"0 60351 San Francisco, California, USA 0 CA 94101 "
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"single_city_df = cities_df.loc[\n",
" (cities_df['state'] == 'CA')]\n",
"\n",
"single_city_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One San Francisco, perfect! \n",
"\n",
"### Let's store the latitude and longitude for later use as well"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(37.779998779296875, -122.41999816894531)"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"latitude = single_city_df['lat'][0]\n",
"longitude = single_city_df['lon'][0]\n",
"latitude, longitude"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Now let's look at groups in San Francisco, CA"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Since we are going to grab lots of groups, let's make a function to help us call the API\n",
"\n",
"**Note**: This function will use the **tech_category_id, latitude, and longitude** values that we \n",
"found eariler."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"def get_a_group(page_number, category_id=tech_category_id, lat=latitude,\n",
" lon=longitude):\n",
" group = None\n",
" retry_counter, retry_max = 0, 3\n",
" print(f\"Getting page {page_number}\")\n",
" while retry_counter < retry_max:\n",
" try:\n",
" group = client.GetGroups(\n",
" category_id=category_id, lat=lat, lon=lon, offset=page_number)\n",
" return group\n",
" except:\n",
" print(f\"Fetch failure {retry_counter + 1}\")\n",
" retry_counter += 1\n",
"\n",
" raise Exception(f\"Unable to fetch page after {retry_counter} attempts\")"
]
},
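{
"cell_type": "markdown",
"metadata": {},
"source": [
"The retry loop in `get_a_group` tries again immediately after a failure. A common refinement, especially against rate-limited APIs, is exponential backoff: wait longer after each failed attempt. The sketch below is generic; `with_backoff`, `flaky`, and their parameters are hypothetical names introduced here, not part of `meetup.api`, and `flaky` merely simulates a call that fails twice before succeeding.\n",
"\n",
"```python\n",
"import time\n",
"from typing import Callable, TypeVar\n",
"\n",
"T = TypeVar('T')\n",
"\n",
"def with_backoff(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,\n",
"                 sleep: Callable[[float], None] = time.sleep) -> T:\n",
"    \"\"\"Call fn(); after each failure wait base_delay * 2**i seconds and retry.\"\"\"\n",
"    for i in range(attempts):\n",
"        try:\n",
"            return fn()\n",
"        except Exception:\n",
"            if i == attempts - 1:\n",
"                raise  # out of attempts: surface the last error\n",
"            sleep(base_delay * 2 ** i)  # 1s, 2s, 4s, ... with the defaults\n",
"    raise ValueError('attempts must be at least 1')\n",
"\n",
"# Usage: a fake call that fails twice, then succeeds.\n",
"calls = {'n': 0}\n",
"\n",
"def flaky():\n",
"    calls['n'] += 1\n",
"    if calls['n'] < 3:\n",
"        raise ConnectionError('temporary failure')\n",
"    return 'page data'\n",
"\n",
"delays = []\n",
"print(with_backoff(flaky, sleep=delays.append))  # page data\n",
"print(delays)  # [1.0, 2.0]\n",
"```\n",
"\n",
"In this notebook the same helper could wrap the API call, e.g. `with_backoff(lambda: client.GetGroups(category_id=tech_category_id, lat=latitude, lon=longitude, offset=0))`; passing a recording `sleep` as above keeps the example fast and testable."
]
},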
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Now, After grabbing the first group\n",
"\n",
"Let's review the **meta** to help us see what we are getting into"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
0
\n",
"
\n",
" \n",
" \n",
"
\n",
"
next
\n",
"
https://api.meetup.com/2/groups?offset=1&forma...
\n",
"
\n",
"
\n",
"
method
\n",
"
Groups
\n",
"
\n",
"
\n",
"
total_count
\n",
"
2197
\n",
"
\n",
"
\n",
"
link
\n",
"
https://api.meetup.com/2/groups
\n",
"
\n",
"
\n",
"
count
\n",
"
200
\n",
"
\n",
"
\n",
"
description
\n",
"
None
\n",
"
\n",
"
\n",
"
lon
\n",
"
-122.42
\n",
"
\n",
"
\n",
"
title
\n",
"
Meetup Groups v2
\n",
"
\n",
"
\n",
"
url
\n",
"
https://api.meetup.com/2/groups?offset=0&forma...
\n",
"
\n",
"
\n",
"
id
\n",
"
\n",
"
\n",
"
\n",
"
updated
\n",
"
1550965553000
\n",
"
\n",
"
\n",
"
lat
\n",
"
37.78
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 0\n",
"next https://api.meetup.com/2/groups?offset=1&forma...\n",
"method Groups\n",
"total_count 2197\n",
"link https://api.meetup.com/2/groups\n",
"count 200\n",
"description None\n",
"lon -122.42\n",
"title Meetup Groups v2\n",
"url https://api.meetup.com/2/groups?offset=0&forma...\n",
"id \n",
"updated 1550965553000\n",
"lat 37.78"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%capture --no-display\n",
"group_resp = get_a_group(0)\n",
"group_meta, _ = parse_response(group_resp)\n",
"group_meta"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Wait! There's a meta[\\\"next\\\"].\n",
"\n",
"Remember earlier when I spoke about **response.meta\\[\"next\"\\]**? \n",
"\n",
"It seems as though our result will span mulitple API calls, each returning 200 new groups in \n",
"a **page**.\n",
"\n",
"Let's make a new helper that will grab each **page** in a series of API calls until we obtain the entire data set:\n",
"\n",
"We will use the pandas.DataFrame.[concat] function to collate all pages into a single useful dataframe\n",
"\n",
"[//]: # (References)\n",
"\n",
"[concat]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"def get_all_groups_as_a_df():\n",
" \"\"\"Returns a single dataframe composed from data from multiple\n",
" successive calls to get_a_group.\n",
" \n",
" We will loop through get_a_group pages while our page.meta['next'] is \n",
" not the empty string.\n",
" \"\"\"\n",
" page_df_list = []\n",
" next_page = None\n",
" page_number = 0\n",
" while next_page != '': \n",
" page = get_a_group(page_number)\n",
" next_page = page.meta[\"next\"]\n",
" _, frame = parse_response(page)\n",
" page_number += 1\n",
" page_df_list.append(frame)\n",
" \n",
" return pd.concat(page_df_list, ignore_index=True)"
]
},
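{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before running the helper, here is a small stand-alone sketch of what `pd.concat(..., ignore_index=True)` does with per-page frames, plus the page-count arithmetic. The two frames are made-up stand-ins for `parse_response` results; with a **total_count** of 2197 and 200 groups per page we should expect 11 pages (offsets 0 through 10).\n",
"\n",
"```python\n",
"import math\n",
"\n",
"import pandas as pd\n",
"\n",
"# Two fake pages standing in for parsed API results.\n",
"page_frames = [\n",
"    pd.DataFrame({'id': [101, 102], 'members': [50, 75]}),\n",
"    pd.DataFrame({'id': [103], 'members': [20]}),\n",
"]\n",
"\n",
"# ignore_index=True renumbers rows 0..n-1 instead of repeating each page's 0, 1, ...\n",
"combined = pd.concat(page_frames, ignore_index=True)\n",
"print(combined.index.tolist())  # [0, 1, 2]\n",
"\n",
"# Expected number of pages for our query.\n",
"total_count, page_size = 2197, 200\n",
"print(math.ceil(total_count / page_size))  # 11\n",
"```"
]
},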
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Getting page 0\n",
"28/30 (5 seconds remaining)\n",
"Getting page 1\n",
"27/30 (3 seconds remaining)\n",
"Getting page 2\n",
"26/30 (3 seconds remaining)\n",
"Getting page 3\n",
"25/30 (2 seconds remaining)\n",
"Getting page 4\n",
"24/30 (1 seconds remaining)\n",
"Getting page 5\n",
"29/30 (10 seconds remaining)\n",
"Getting page 6\n",
"28/30 (9 seconds remaining)\n",
"Getting page 7\n",
"27/30 (8 seconds remaining)\n",
"Getting page 8\n",
"26/30 (7 seconds remaining)\n",
"Getting page 9\n",
"25/30 (6 seconds remaining)\n",
"Getting page 10\n",
"24/30 (5 seconds remaining)\n"
]
},
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
category
\n",
"
city
\n",
"
country
\n",
"
created
\n",
"
description
\n",
"
group_photo
\n",
"
id
\n",
"
join_mode
\n",
"
lat
\n",
"
link
\n",
"
...
\n",
"
name
\n",
"
organizer
\n",
"
rating
\n",
"
state
\n",
"
timezone
\n",
"
topics
\n",
"
urlname
\n",
"
utc_offset
\n",
"
visibility
\n",
"
who
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
{'name': 'tech', 'id': 34, 'shortname': 'tech'}
\n",
"
San Francisco
\n",
"
US
\n",
"
1034097740000
\n",
"
<p>The SF PHP Community Meetup is an open foru...
\n",
"
{'highres_link': 'https://secure.meetupstatic....
\n",
"
120903
\n",
"
open
\n",
"
37.77
\n",
"
https://www.meetup.com/sf-php/
\n",
"
...
\n",
"
SF PHP Community
\n",
"
{'member_id': 126468982, 'name': 'Andre Marigo...
\n",
"
4.38
\n",
"
CA
\n",
"
US/Pacific
\n",
"
[{'urlkey': 'php', 'name': 'PHP', 'id': 455}, ...
\n",
"
sf-php
\n",
"
-28800000
\n",
"
public
\n",
"
PHP Developers
\n",
"
\n",
" \n",
"
\n",
"
1 rows × 22 columns
\n",
"
"
],
"text/plain": [
" category city country \\\n",
"0 {'name': 'tech', 'id': 34, 'shortname': 'tech'} San Francisco US \n",
"\n",
" created description \\\n",
"0 1034097740000
The SF PHP Community Meetup is an open foru... \n",
"\n",
" group_photo id join_mode lat \\\n",
"0 {'highres_link': 'https://secure.meetupstatic.... 120903 open 37.77 \n",
"\n",
" link ... name \\\n",
"0 https://www.meetup.com/sf-php/ ... SF PHP Community \n",
"\n",
" organizer rating state timezone \\\n",
"0 {'member_id': 126468982, 'name': 'Andre Marigo... 4.38 CA US/Pacific \n",
"\n",
" topics urlname utc_offset \\\n",
"0 [{'urlkey': 'php', 'name': 'PHP', 'id': 455}, ... sf-php -28800000 \n",
"\n",
" visibility who \n",
"0 public PHP Developers \n",
"\n",
"[1 rows x 22 columns]"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Collect all groups into a single dataframe\n",
"all_groups_df = get_all_groups_as_a_df()\n",
"\n",
"# Show the first row in the dataframe\n",
"all_groups_df.head(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### That's too many columns\n",
"I really only care about a small list of columns, let's exclude the unneeded columns."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
id
\n",
"
name
\n",
"
members
\n",
"
rating
\n",
"
join_mode
\n",
"
urlname
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
120903
\n",
"
SF PHP Community
\n",
"
2702
\n",
"
4.38
\n",
"
open
\n",
"
sf-php
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id name members rating join_mode urlname\n",
"0 120903 SF PHP Community 2702 4.38 open sf-php"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"column_list = ['id', 'name', 'members', 'rating', 'join_mode', 'urlname']\n",
"all_groups_df = all_groups_df[column_list]\n",
"all_groups_df.head(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Let's double check the size of our new dataframe"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(2197, 6)"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"all_groups_df.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That looks just about right:\n",
"* 2197 rows\n",
"* 6 columns\n",
"\n",
"---\n",
"## Explore Members per Group\n",
"\n",
"Each group has a different sized membership, let's explore this first!\n",
"\n",
"\n",
"### Let's start with with a histogram\n",
"\n",
"This visualization should give us a basic idea of how big our groups are."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"
"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Using seaborn's distplot function\n",
"plt.rcParams['figure.figsize'] = [11, 6]\n",
"sb.distplot(all_groups_df['members'], kde=False, color=\"g\");"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### It appears that most groups are relatively small\n",
"\n",
"Let's take a closer look at some basic stats for our data in a tabular \n",
"format for some hard numbers:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
members
\n",
"
\n",
" \n",
" \n",
"
\n",
"
count
\n",
"
2,197.00
\n",
"
\n",
"
\n",
"
mean
\n",
"
811.79
\n",
"
\n",
"
\n",
"
std
\n",
"
1,741.14
\n",
"
\n",
"
\n",
"
min
\n",
"
1.00
\n",
"
\n",
"
\n",
"
25%
\n",
"
86.00
\n",
"
\n",
"
\n",
"
50%
\n",
"
256.00
\n",
"
\n",
"
\n",
"
75%
\n",
"
780.00
\n",
"
\n",
"
\n",
"
max
\n",
"
36,058.00
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" members\n",
"count 2,197.00\n",
"mean 811.79\n",
"std 1,741.14\n",
"min 1.00\n",
"25% 86.00\n",
"50% 256.00\n",
"75% 780.00\n",
"max 36,058.00"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.options.display.float_format = '{:20,.2f}'.format\n",
"all_groups_df[[\"members\"]].describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a table I can see some numbers:\n",
"\n",
"1. It looks like the average group size is about 812 persons.\n",
"2. Half of the group sits at or under 256 members.\n",
"3. The smallest group has a single person.\n",
"4. **the largest group has 36,000+ members!**\n",
"\n",
"What an outlier! But are there other **mega-groups** like this?\n",
"\n",
"### Maybe a box and whisker plot can visualize these stats for us"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"
"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"plt.rcParams['figure.figsize'] = [6, 20]\n",
"all_groups_df['members'].plot.box();"
]
},
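{
"cell_type": "markdown",
"metadata": {},
"source": [
"The circles in a box plot mark values beyond the whiskers. By default, pandas' box plot uses matplotlib's whisker length of 1.5 × IQR, so the upper cutoff is Q3 + 1.5 × IQR. Using the quartiles from the table above, that is 780 + 1.5 × (780 - 86) = 1,821 members. The sketch below computes that cutoff; it runs on a made-up Series so it stands alone, but `upper_fence` (a hypothetical helper name) would work the same way on `all_groups_df['members']`.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"def upper_fence(values: pd.Series, whis: float = 1.5) -> float:\n",
"    \"\"\"Upper outlier cutoff of a standard box plot: Q3 + whis * IQR.\"\"\"\n",
"    q1, q3 = values.quantile(0.25), values.quantile(0.75)\n",
"    return q3 + whis * (q3 - q1)\n",
"\n",
"# Made-up membership counts with one obvious mega-group.\n",
"members = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 1000])\n",
"fence = upper_fence(members)\n",
"print(fence)  # 130.0\n",
"print(members[members > fence].tolist())  # [1000]\n",
"```"
]
},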
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Wow, there are quite a few **mega-groups**, as indicated by the circles above our top whisker!\n",
"\n",
"---\n",
"\n",
"Why are the groups so big?\n",
"\n",
"In fact...\n",
"\n",
"### What are the 10 biggest tech groups in the area?"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
name
\n",
"
members
\n",
"
\n",
" \n",
" \n",
"
\n",
"
19
\n",
"
Silicon Valley Entrepreneurs & Startups
\n",
"
36058
\n",
"
\n",
"
\n",
"
107
\n",
"
SFHTML5
\n",
"
17718
\n",
"
\n",
"
\n",
"
106
\n",
"
Designers + Geeks
\n",
"
15467
\n",
"
\n",
"
\n",
"
426
\n",
"
SF Data Science
\n",
"
14874
\n",
"
\n",
"
\n",
"
28
\n",
"
The SF JavaScript Meetup
\n",
"
13359
\n",
"
\n",
"
\n",
"
250
\n",
"
Tech in Motion Events: San Francisco
\n",
"
13090
\n",
"
\n",
"
\n",
"
540
\n",
"
Docker Online Meetup
\n",
"
12475
\n",
"
\n",
"
\n",
"
191
\n",
"
SF Data Mining
\n",
"
12378
\n",
"
\n",
"
\n",
"
201
\n",
"
Women Who Code SF
\n",
"
12334
\n",
"
\n",
"
\n",
"
706
\n",
"
SF Big Analytics
\n",
"
11889
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" name members\n",
"19 Silicon Valley Entrepreneurs & Startups 36058\n",
"107 SFHTML5 17718\n",
"106 Designers + Geeks 15467\n",
"426 SF Data Science 14874\n",
"28 The SF JavaScript Meetup 13359\n",
"250 Tech in Motion Events: San Francisco 13090\n",
"540 Docker Online Meetup 12475\n",
"191 SF Data Mining 12378\n",
"201 Women Who Code SF 12334\n",
"706 SF Big Analytics 11889"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"biggest_ten_df = all_groups_df.sort_values('members',\n",
" ascending=False).head(10)\n",
"biggest_ten_df[[\"name\", \"members\"]]"
]
},
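{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an aside, pandas has a shortcut for this pattern: `DataFrame.nlargest(n, column)` returns the same rows as sorting in descending order and taking `head(n)`, apart from tie-breaking. A quick check on a made-up frame:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"groups = pd.DataFrame({\n",
"    'name': ['A', 'B', 'C', 'D'],\n",
"    'members': [120, 4500, 87, 900],\n",
"})\n",
"\n",
"# Same result as groups.sort_values('members', ascending=False).head(2)\n",
"top_two = groups.nlargest(2, 'members')\n",
"print(top_two['name'].tolist())  # ['B', 'D']\n",
"```"
]
},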
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get Group Events"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### We need to do some data shaping before the next api call\n",
"\n",
"Mostly we need to:\n",
"1. pass in a string with group ids from our 10 biggest groups\n",
"2. convert our human-readable date ranges to milliseconds since Jan 1, 1970\n",
"3. Call the GetEvents API filtering for past events using our group IDs and our date range\n",
"\n",
"#### First, let's make that string of group ids\n"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'844726,1619955,1615633,9226282,1060260,3483762,13402242,2065031,2252591,18354966'"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"id_list = biggest_ten_df['id'].tolist()\n",
"id_list\n",
"ids = ','.join(str(x) for x in id_list)\n",
"ids"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Now, let's get the epoch milliseconds for a date range between now and six months (180 days) ago"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Now: 1550937159472, six months ago: 1535385159472\n"
]
}
],
"source": [
"def to_millis(dt):\n",
"    \"\"\"Convert a datetime to epoch milliseconds (ms since 1970-01-01 UTC).\"\"\"\n",
"    return pd.to_datetime(dt).value // 1_000_000\n",
"\n",
"# Use a timezone-aware 'now': a naive local datetime would be treated as UTC\n",
"right_now = to_millis(datetime.datetime.now(datetime.timezone.utc))\n",
"six_months_ago = right_now - 180 * 24 * 60 * 60 * 1000  # 180 days in ms\n",
"print(f\"Now: {right_now}, six months ago: {six_months_ago}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Finally, let's look at those events."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"%%capture --no-display\n",
"events_resp = client.GetEvents(group_id=ids, status='past',\n",
"                               time=f\"{six_months_ago},{right_now}\")\n",
"\n",
"events_meta, events_df = parse_response(events_resp)"
]
},
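The `time` argument above is a `start,end` pair of epoch milliseconds. A fixed number of days only approximates calendar months, so the sketch below subtracts whole calendar months with `pd.DateOffset` and starts from a timezone-aware "now", which keeps the result in true milliseconds since 1970-01-01 UTC. The helper name `epoch_ms_window` is hypothetical.

```python
from typing import Optional, Tuple

import pandas as pd


def epoch_ms_window(months: int, now: Optional[pd.Timestamp] = None) -> Tuple[int, int]:
    """Hypothetical helper: (start_ms, end_ms) covering the last `months` calendar months."""
    if now is None:
        now = pd.Timestamp.now(tz="UTC")
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    start = now - pd.DateOffset(months=months)
    # Timestamp.value is nanoseconds since the epoch (UTC for tz-aware timestamps)
    return start.value // 1_000_000, now.value // 1_000_000


start_ms, end_ms = epoch_ms_window(6, now=pd.Timestamp("2019-02-23", tz="UTC"))
print(start_ms, end_ms)  # 1534982400000 1550880000000
```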
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Again, our events_df dataframe has extra columns that I don't care about"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
Demo Session is free to meetup attendees. U... 7200000 \n",
"\n",
" event_url \\\n",
"0 https://www.meetup.com/sventrepreneurs/events/... \n",
"\n",
" group headcount \\\n",
"0 {'join_mode': 'open', 'created': 1196203591000... 0 \n",
"\n",
" how_to_find_us id maybe_rsvp_count \\\n",
"0 NaN 253824506 0 \n",
"\n",
" name ... \\\n",
"0 Demo Session @ Mars Blockchain Summit by Mars ... ... \n",
"\n",
" rating rsvp_limit status time \\\n",
"0 {'count': 0, 'average': 0} nan past 1535495400000 \n",
"\n",
" updated utc_offset \\\n",
"0 1535527171000 -25200000 \n",
"\n",
" venue visibility \\\n",
"0 {'country': 'us', 'localized_country_name': 'U... public \n",
"\n",
" waitlist_count yes_rsvp_count \n",
"0 0 66 \n",
"\n",
"[1 rows x 21 columns]"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"events_df.head(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### So again, let's filter down to just what's relevant"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" group time duration \\\n",
"0 {'join_mode': 'open', 'created': 1196203591000... 1535495400000 7200000 \n",
"\n",
" yes_rsvp_count \n",
"0 66 "
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"column_list = ['group', 'time', 'duration', 'yes_rsvp_count']\n",
"events_df = events_df[column_list].copy()  # a copy, so adding columns later doesn't raise SettingWithCopyWarning\n",
"events_df.head(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The group column\n",
"\n",
"The **group** column actually holds a Python dictionary (parsed from the API's JSON) full of metadata about the group.\n",
"\n",
"I really only need the **group\\[\"id\"\\]** for now, so let's focus on that."
]
},
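If more than the id is ever needed, `pd.json_normalize` can flatten the whole **group** dictionary into columns in one step. A small sketch with made-up group dictionaries (only `id`, `name` and `join_mode` are included here; real API responses carry more keys):

```python
import pandas as pd

# Toy stand-in for events_df: each 'group' cell is a dict, as returned by the API
events = pd.DataFrame(
    {
        "group": [
            {"id": 844726, "name": "Silicon Valley Entrepreneurs & Startups", "join_mode": "open"},
            {"id": 1615633, "name": "Designers + Geeks", "join_mode": "open"},
        ],
        "yes_rsvp_count": [66, 178],
    },
    index=[10, 11],
)

# Flatten the dicts into columns, reusing the events index so the rows line up
group_cols = pd.json_normalize(events["group"].tolist()).set_index(events.index)
flat = events.drop(columns="group").join(group_cols[["id", "name"]])
print(flat)
```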
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" id time duration yes_rsvp_count\n",
"0 844726 1535495400000 7200000 66"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def get_id(my_dict):\n",
" \"\"\"Extract the id member of a python dictionary\"\"\"\n",
" return my_dict[\"id\"]\n",
"\n",
"events_df[\"id\"] = events_df[\"group\"].apply(get_id)\n",
"\n",
"# Keep only the group id plus the event columns we need\n",
"columns = ['id', 'time', 'duration', 'yes_rsvp_count']\n",
"events_df = events_df[columns]\n",
"events_df.head(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Hmm, it seems that our **time** is numeric\n",
"\n",
"The **time** is stored in **Epoch milliseconds** format.\n",
"\n",
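"This is great if you want to see time as the number of milliseconds since midnight UTC on Jan 1, 1970.\n",
"\n",
"This is not-so-great if you just want to see a human-readable date and time equivalent.\n",
"\n",
"Let's make a new human-readable column called **time_dt**. Note that `pd.to_datetime(..., unit='ms')` treats these values as UTC, so **time_dt** shows UTC times, not San Francisco local time."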
]
},
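Because `unit='ms'` yields UTC, the first event's 22:30 start is really 15:30 Pacific Daylight Time. A sketch of converting to local time instead, assuming the IANA zone `America/Los_Angeles` fits all of these groups (the per-event `utc_offset` field from the API would be the more exact source):

```python
import pandas as pd


def to_local_time(epoch_ms: pd.Series, tz: str = "America/Los_Angeles") -> pd.Series:
    """Convert epoch milliseconds to timezone-aware local timestamps."""
    # utc=True marks the parsed timestamps as UTC before converting to the target zone
    return pd.to_datetime(epoch_ms, unit="ms", utc=True).dt.tz_convert(tz)


local = to_local_time(pd.Series([1535495400000]))
print(local.dt.strftime("%m/%d/%y %H:%M").iloc[0])  # 08/28/18 15:30
```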
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" id time time_dt duration yes_rsvp_count\n",
"0 844726 1535495400000 08/28/18 22:30 7200000 66"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"events_df[\"time_dt\"] = pd.to_datetime(\n",
" events_df[\"time\"], unit='ms').dt.strftime('%m/%d/%y %H:%M')\n",
"\n",
"columns = ['id', 'time', 'time_dt', 'duration', 'yes_rsvp_count']\n",
"events_df = events_df[columns]\n",
"events_df.head(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Now let's convert the duration column to something human-readable\n",
"\n",
"Let's convert the column to a string that shows hours and minutes."
]
},
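pandas also has a vectorised route: `pd.to_timedelta` understands milliseconds directly. The sketch below keeps a numeric hours column for later analysis alongside the same "H hours, M minutes" label; it assumes there are no missing durations (a NaN would break the integer cast).

```python
import pandas as pd


def describe_durations(duration_ms: pd.Series) -> pd.DataFrame:
    """Return numeric hours plus an 'H hours, M minutes' label for each duration."""
    td = pd.to_timedelta(duration_ms, unit="ms")
    total_minutes = (td.dt.total_seconds() // 60).astype(int)
    # Use total minutes (not td.dt.components) so multi-day events keep all their hours
    hours, minutes = total_minutes // 60, total_minutes % 60
    label = hours.astype(str) + " hours, " + minutes.astype(str) + " minutes"
    return pd.DataFrame({"duration_hours": td.dt.total_seconds() / 3600, "duration": label})


print(describe_durations(pd.Series([7200000, 9000000, 205200000])))
```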
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" id time time_dt duration yes_rsvp_count\n",
"0 844726 1535495400000 08/28/18 22:30 2 hours, 0 minutes 66"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def millis_2_hours_and_minutes(ms):\n",
" \"\"\"Converts milliseconds to hours and minutes.\"\"\"\n",
" seconds = ms / 1000\n",
" minutes, seconds = divmod(seconds, 60)\n",
" hours, minutes = divmod(minutes, 60)\n",
"\n",
" return f\"{int(hours)} hours, {int(minutes)} minutes\" \n",
"\n",
"events_df[\"duration\"] = events_df[\"duration\"].apply(\n",
" millis_2_hours_and_minutes)\n",
"\n",
"events_df.head(1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Now let's join our top ten mega-groups dataframe with our events dataframe\n",
"\n",
"If you are familiar with SQL, this is similar to a LEFT JOIN from **events_df**\n",
"to **biggest_ten_df** on **id**.\n",
"\n",
"Then we sort the output by **name** ascending and then **time** descending."
]
},
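`pd.merge` can also sanity-check a join like this: `validate='many_to_one'` raises if a group id appears twice on the right-hand side, and `indicator=True` adds a `_merge` column recording whether each event found its group. A toy sketch with made-up rows (the id `999` deliberately has no matching group):

```python
import pandas as pd

events = pd.DataFrame({"id": [1615633, 1615633, 999], "yes_rsvp_count": [178, 71, 5]})
groups = pd.DataFrame({
    "id": [1615633, 13402242],
    "name": ["Designers + Geeks", "Docker Online Meetup"],
})

merged = pd.merge(events, groups, on="id", how="left",
                  validate="many_to_one", indicator=True)
# Rows marked 'left_only' are events whose group id was not found in `groups`
unmatched = merged[merged["_merge"] == "left_only"]
print(merged)
print(len(unmatched), "unmatched event(s)")
```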
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" name time time_dt \\\n",
"101 Designers + Geeks 1550804400000 02/22/19 03:00 \n",
"55 Designers + Geeks 1542337200000 11/16/18 03:00 \n",
"46 Designers + Geeks 1541124000000 11/02/18 02:00 \n",
"32 Designers + Geeks 1539914400000 10/19/18 02:00 \n",
"24 Designers + Geeks 1538704800000 10/05/18 02:00 \n",
"14 Designers + Geeks 1537495200000 09/21/18 02:00 \n",
"4 Designers + Geeks 1536285600000 09/07/18 02:00 \n",
"40 Docker Online Meetup 1540915200000 10/30/18 16:00 \n",
"97 SF Big Analytics 1549591200000 02/08/19 02:00 \n",
"80 SF Big Analytics 1547690400000 01/17/19 02:00 \n",
"78 SF Big Analytics 1547604000000 01/16/19 02:00 \n",
"64 SF Big Analytics 1544061600000 12/06/18 02:00 \n",
"53 SF Big Analytics 1542247200000 11/15/18 02:00 \n",
"51 SF Big Analytics 1542160800000 11/14/18 02:00 \n",
"50 SF Big Analytics 1541782800000 11/09/18 17:00 \n",
"44 SF Big Analytics 1541120400000 11/02/18 01:00 \n",
"29 SF Big Analytics 1539909000000 10/19/18 00:30 \n",
"16 SF Big Analytics 1538010000000 09/27/18 01:00 \n",
"8 SF Big Analytics 1536800400000 09/13/18 01:00 \n",
"10 SF Data Mining 1537200000000 09/17/18 16:00 \n",
"93 SF Data Science 1548988200000 02/01/19 02:30 \n",
"89 SF Data Science 1548383400000 01/25/19 02:30 \n",
"82 SF Data Science 1547778600000 01/18/19 02:30 \n",
"81 SF Data Science 1547692200000 01/17/19 02:30 \n",
"60 SF Data Science 1543539600000 11/30/18 01:00 \n",
"52 SF Data Science 1542211200000 11/14/18 16:00 \n",
"38 SF Data Science 1540774800000 10/29/18 01:00 \n",
"30 SF Data Science 1539910800000 10/19/18 01:00 \n",
"26 SF Data Science 1539306000000 10/12/18 01:00 \n",
"102 SFHTML5 1550883600000 02/23/19 01:00 \n",
".. ... ... ... \n",
"95 Women Who Code SF 1549420200000 02/06/19 02:30 \n",
"92 Women Who Code SF 1548815400000 01/30/19 02:30 \n",
"86 Women Who Code SF 1548210600000 01/23/19 02:30 \n",
"79 Women Who Code SF 1547605800000 01/16/19 02:30 \n",
"76 Women Who Code SF 1547001000000 01/09/19 02:30 \n",
"71 Women Who Code SF 1545271200000 12/20/18 02:00 \n",
"69 Women Who Code SF 1545186600000 12/19/18 02:30 \n",
"70 Women Who Code SF 1545186600000 12/19/18 02:30 \n",
"67 Women Who Code SF 1544581800000 12/12/18 02:30 \n",
"65 Women Who Code SF 1544149800000 12/07/18 02:30 \n",
"63 Women Who Code SF 1543977000000 12/05/18 02:30 \n",
"62 Women Who Code SF 1543975200000 12/05/18 02:00 \n",
"58 Women Who Code SF 1542767400000 11/21/18 02:30 \n",
"54 Women Who Code SF 1542335400000 11/16/18 02:30 \n",
"49 Women Who Code SF 1541557800000 11/07/18 02:30 \n",
"45 Women Who Code SF 1541122200000 11/02/18 01:30 \n",
"43 Women Who Code SF 1540949400000 10/31/18 01:30 \n",
"39 Women Who Code SF 1540859400000 10/30/18 00:30 \n",
"34 Women Who Code SF 1540400400000 10/24/18 17:00 \n",
"33 Women Who Code SF 1540344600000 10/24/18 01:30 \n",
"31 Women Who Code SF 1539912600000 10/19/18 01:30 \n",
"25 Women Who Code SF 1539135000000 10/10/18 01:30 \n",
"23 Women Who Code SF 1538703000000 10/05/18 01:30 \n",
"21 Women Who Code SF 1538530200000 10/03/18 01:30 \n",
"22 Women Who Code SF 1538530200000 10/03/18 01:30 \n",
"13 Women Who Code SF 1537493400000 09/21/18 01:30 \n",
"11 Women Who Code SF 1537320600000 09/19/18 01:30 \n",
"9 Women Who Code SF 1537030800000 09/15/18 17:00 \n",
"7 Women Who Code SF 1536715800000 09/12/18 01:30 \n",
"1 Women Who Code SF 1535506200000 08/29/18 01:30 \n",
"\n",
" duration yes_rsvp_count id \n",
"101 2 hours, 0 minutes 178 1615633 \n",
"55 2 hours, 0 minutes 71 1615633 \n",
"46 2 hours, 0 minutes 71 1615633 \n",
"32 2 hours, 0 minutes 41 1615633 \n",
"24 2 hours, 0 minutes 29 1615633 \n",
"14 2 hours, 0 minutes 37 1615633 \n",
"4 2 hours, 0 minutes 87 1615633 \n",
"40 1 hours, 0 minutes 1 13402242 \n",
"97 2 hours, 30 minutes 418 18354966 \n",
"80 3 hours, 0 minutes 450 18354966 \n",
"78 3 hours, 0 minutes 5 18354966 \n",
"64 3 hours, 0 minutes 7 18354966 \n",
"53 3 hours, 0 minutes 16 18354966 \n",
"51 2 hours, 30 minutes 478 18354966 \n",
"50 57 hours, 0 minutes 16 18354966 \n",
"44 3 hours, 0 minutes 152 18354966 \n",
"29 2 hours, 30 minutes 390 18354966 \n",
"16 2 hours, 45 minutes 245 18354966 \n",
"8 2 hours, 30 minutes 218 18354966 \n",
"10 104 hours, 0 minutes 6 2065031 \n",
"93 2 hours, 0 minutes 94 9226282 \n",
"89 2 hours, 0 minutes 48 9226282 \n",
"82 2 hours, 0 minutes 26 9226282 \n",
"81 2 hours, 0 minutes 59 9226282 \n",
"60 3 hours, 30 minutes 18 9226282 \n",
"52 10 hours, 0 minutes 1 9226282 \n",
"38 4 hours, 0 minutes 21 9226282 \n",
"30 2 hours, 0 minutes 95 9226282 \n",
"26 3 hours, 0 minutes 15 9226282 \n",
"102 4 hours, 0 minutes 336 1619955 \n",
".. ... ... ... \n",
"95 2 hours, 0 minutes 47 2252591 \n",
"92 2 hours, 0 minutes 37 2252591 \n",
"86 2 hours, 0 minutes 29 2252591 \n",
"79 2 hours, 0 minutes 32 2252591 \n",
"76 2 hours, 0 minutes 57 2252591 \n",
"71 2 hours, 0 minutes 199 2252591 \n",
"69 1 hours, 30 minutes 50 2252591 \n",
"70 2 hours, 0 minutes 23 2252591 \n",
"67 2 hours, 0 minutes 22 2252591 \n",
"65 2 hours, 0 minutes 17 2252591 \n",
"63 2 hours, 0 minutes 29 2252591 \n",
"62 3 hours, 0 minutes 1 2252591 \n",
"58 2 hours, 0 minutes 31 2252591 \n",
"54 2 hours, 0 minutes 50 2252591 \n",
"49 2 hours, 0 minutes 27 2252591 \n",
"45 2 hours, 0 minutes 50 2252591 \n",
"43 2 hours, 0 minutes 29 2252591 \n",
"39 2 hours, 30 minutes 64 2252591 \n",
"34 1 hours, 0 minutes 2 2252591 \n",
"33 2 hours, 0 minutes 36 2252591 \n",
"31 2 hours, 0 minutes 50 2252591 \n",
"25 2 hours, 0 minutes 25 2252591 \n",
"23 2 hours, 0 minutes 50 2252591 \n",
"21 2 hours, 0 minutes 10 2252591 \n",
"22 2 hours, 0 minutes 27 2252591 \n",
"13 2 hours, 0 minutes 33 2252591 \n",
"11 2 hours, 0 minutes 14 2252591 \n",
"9 3 hours, 0 minutes 10 2252591 \n",
"7 2 hours, 0 minutes 25 2252591 \n",
"1 2 hours, 0 minutes 35 2252591 \n",
"\n",
"[103 rows x 6 columns]"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"merged_df = pd.merge(\n",
" events_df,\n",
" biggest_ten_df[['id', 'name']],\n",
" on='id',\n",
" how='left')\n",
"\n",
"columns = ['name', 'time', 'time_dt', 'duration', 'yes_rsvp_count', 'id']\n",
"final_df = merged_df[columns]\n",
"\n",
"# Sort the output by name and time\n",
"final_df = final_df.sort_values(by=['name', 'time'], ascending=[True, False])\n",
"final_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Hmm, those group IDs are long and tedious to type\n",
"\n",
"Let's map each one to a small integer (0, 1, 2, ...) in the order the groups appear.\n"
]
},
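`pd.factorize` does this relabelling in a single call: it returns integer codes in order of first appearance (the same order as `enumerate(unique())`) plus the original values, so the mapping can be reversed later. A sketch on a few of the group ids:

```python
import pandas as pd

group_ids = pd.Series([1615633, 1615633, 13402242, 18354966, 18354966])
codes, uniques = pd.factorize(group_ids)
print(codes.tolist())    # [0, 0, 1, 2, 2]
print(uniques.tolist())  # [1615633, 13402242, 18354966]

# Reverse lookup: short code -> original Meetup group id
print(uniques[2])        # 18354966
```

Applied here, that would be `final_df['id'] = pd.factorize(final_df['id'])[0]`.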
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" name time time_dt \\\n",
"101 Designers + Geeks 1550804400000 02/22/19 03:00 \n",
"55 Designers + Geeks 1542337200000 11/16/18 03:00 \n",
"46 Designers + Geeks 1541124000000 11/02/18 02:00 \n",
"32 Designers + Geeks 1539914400000 10/19/18 02:00 \n",
"24 Designers + Geeks 1538704800000 10/05/18 02:00 \n",
"14 Designers + Geeks 1537495200000 09/21/18 02:00 \n",
"4 Designers + Geeks 1536285600000 09/07/18 02:00 \n",
"40 Docker Online Meetup 1540915200000 10/30/18 16:00 \n",
"97 SF Big Analytics 1549591200000 02/08/19 02:00 \n",
"80 SF Big Analytics 1547690400000 01/17/19 02:00 \n",
"78 SF Big Analytics 1547604000000 01/16/19 02:00 \n",
"64 SF Big Analytics 1544061600000 12/06/18 02:00 \n",
"53 SF Big Analytics 1542247200000 11/15/18 02:00 \n",
"51 SF Big Analytics 1542160800000 11/14/18 02:00 \n",
"50 SF Big Analytics 1541782800000 11/09/18 17:00 \n",
"44 SF Big Analytics 1541120400000 11/02/18 01:00 \n",
"29 SF Big Analytics 1539909000000 10/19/18 00:30 \n",
"16 SF Big Analytics 1538010000000 09/27/18 01:00 \n",
"8 SF Big Analytics 1536800400000 09/13/18 01:00 \n",
"10 SF Data Mining 1537200000000 09/17/18 16:00 \n",
"93 SF Data Science 1548988200000 02/01/19 02:30 \n",
"89 SF Data Science 1548383400000 01/25/19 02:30 \n",
"82 SF Data Science 1547778600000 01/18/19 02:30 \n",
"81 SF Data Science 1547692200000 01/17/19 02:30 \n",
"60 SF Data Science 1543539600000 11/30/18 01:00 \n",
"52 SF Data Science 1542211200000 11/14/18 16:00 \n",
"38 SF Data Science 1540774800000 10/29/18 01:00 \n",
"30 SF Data Science 1539910800000 10/19/18 01:00 \n",
"26 SF Data Science 1539306000000 10/12/18 01:00 \n",
"102 SFHTML5 1550883600000 02/23/19 01:00 \n",
"\n",
" duration yes_rsvp_count id \n",
"101 2 hours, 0 minutes 178 0 \n",
"55 2 hours, 0 minutes 71 0 \n",
"46 2 hours, 0 minutes 71 0 \n",
"32 2 hours, 0 minutes 41 0 \n",
"24 2 hours, 0 minutes 29 0 \n",
"14 2 hours, 0 minutes 37 0 \n",
"4 2 hours, 0 minutes 87 0 \n",
"40 1 hours, 0 minutes 1 1 \n",
"97 2 hours, 30 minutes 418 2 \n",
"80 3 hours, 0 minutes 450 2 \n",
"78 3 hours, 0 minutes 5 2 \n",
"64 3 hours, 0 minutes 7 2 \n",
"53 3 hours, 0 minutes 16 2 \n",
"51 2 hours, 30 minutes 478 2 \n",
"50 57 hours, 0 minutes 16 2 \n",
"44 3 hours, 0 minutes 152 2 \n",
"29 2 hours, 30 minutes 390 2 \n",
"16 2 hours, 45 minutes 245 2 \n",
"8 2 hours, 30 minutes 218 2 \n",
"10 104 hours, 0 minutes 6 3 \n",
"93 2 hours, 0 minutes 94 4 \n",
"89 2 hours, 0 minutes 48 4 \n",
"82 2 hours, 0 minutes 26 4 \n",
"81 2 hours, 0 minutes 59 4 \n",
"60 3 hours, 30 minutes 18 4 \n",
"52 10 hours, 0 minutes 1 4 \n",
"38 4 hours, 0 minutes 21 4 \n",
"30 2 hours, 0 minutes 95 4 \n",
"26 3 hours, 0 minutes 15 4 \n",
"102 4 hours, 0 minutes 336 5 "
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Map each long Meetup group id to a short integer, in order of first appearance\n",
"unique_ids = final_df['id'].unique()\n",
"ids_2_new_ids = {old_id: new_id for new_id, old_id in enumerate(unique_ids)}\n",
"\n",
"# Re-write those ids as something simpler\n",
"final_df['id'] = final_df['id'].map(ids_2_new_ids)\n",
"\n",
"final_df.head(30)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"This is nice, but which groups meet regularly, and which ones have growing interest (increasing \"yes\" RSVP counts)?\n",
"How do these groups compare to each other?\n",
"\n",
"### Let's visualize the trend lines for the top ten mega-groups\n",
"\n",
"Let's just use linear regression to draw trend lines for each mega-group.\n",
"\n",
"We'll use seaborn's **lmplot** to visualize all ten mega-groups.\n",
"\n",
"We need to use the epoch-milliseconds **time** column rather than the **time_dt** string, since the regression needs a numeric x-axis."
]
},
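The plot shows the trend lines, but the slopes themselves make the comparison easier to read. Below is a sketch, assuming the `name`, `time` (epoch ms) and `yes_rsvp_count` columns used here; `rsvp_trend_per_30_days` is a hypothetical helper that fits an ordinary least-squares line per group with `np.polyfit` and reports the change in "yes" RSVPs per 30 days.

```python
import numpy as np
import pandas as pd

MS_PER_DAY = 24 * 60 * 60 * 1000


def rsvp_trend_per_30_days(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: OLS slope of yes_rsvp_count over time, per group name."""
    def slope(group: pd.DataFrame) -> float:
        if len(group) < 2:
            return float("nan")  # a single event has no trend
        # Measure time in days from the group's first event to keep the fit well conditioned
        days = (group["time"] - group["time"].min()) / MS_PER_DAY
        per_day = np.polyfit(days, group["yes_rsvp_count"], deg=1)[0]
        return per_day * 30

    return df.groupby("name")[["time", "yes_rsvp_count"]].apply(slope)


base = 1_535_000_000_000  # an arbitrary epoch-ms starting point
toy = pd.DataFrame({
    "name": ["A", "A", "A", "B", "B", "C"],
    "time": [base, base + 30 * MS_PER_DAY, base + 60 * MS_PER_DAY,
             base, base + 30 * MS_PER_DAY, base],
    "yes_rsvp_count": [10, 20, 30, 50, 40, 7],
})
print(rsvp_trend_per_30_days(toy))
```

On `final_df`, a positive value would suggest growing attendance; with only a handful of events per group these slopes are noisy.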
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure: seaborn lmplot of yes_rsvp_count vs. time, one linear trend line per group>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# x and y must be passed as keywords in recent seaborn releases\n",
"grid = sb.lmplot(x=\"time\", y=\"yes_rsvp_count\", data=final_df, hue=\"name\",\n",
"                 height=9, aspect=0.75, order=1, ci=None)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Ah, this is interesting\n",
"\n",
"Among the top 10 mega-groups, several draw very few RSVPs.\n",
"For instance, the **Docker Online Meetup** group has almost no \"yes\" RSVPs.\n",
"\n",
"Let's take a closer look."
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" name time \\\n",
"1 Docker Online Meetup 1,540,915,200,000.00 \n",
"3 SF Data Mining 1,537,200,000,000.00 \n",
"9 Women Who Code SF 1,542,342,600,000.00 \n",
"4 SF Data Science 1,544,287,200,000.00 \n",
"0 Designers + Geeks 1,540,952,228,571.43 \n",
"8 The SF JavaScript Meetup 1,544,289,840,000.00 \n",
"7 Tech in Motion Events: San Francisco 1,544,405,400,000.00 \n",
"6 Silicon Valley Entrepreneurs & Startups 1,542,303,840,000.00 \n",
"2 SF Big Analytics 1,542,816,163,636.36 \n",
"5 SFHTML5 1,546,360,500,000.00 \n",
"\n",
" yes_rsvp_count id \n",
"1 1.00 1.00 \n",
"3 6.00 3.00 \n",
"9 37.03 9.00 \n",
"4 41.89 4.00 \n",
"0 73.43 0.00 \n",
"8 83.80 8.00 \n",
"7 113.50 7.00 \n",
"6 130.30 6.00 \n",
"2 217.73 2.00 \n",
"5 218.33 5.00 "
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
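"# numeric_only=True skips the text columns (time_dt, duration) when averaging\n",
"rsvp_df = final_df.groupby(['name'], as_index=False).mean(numeric_only=True)\n",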
"\n",
"rsvp_df = rsvp_df.sort_values(by=['yes_rsvp_count', 'name'],\n",
" ascending=[True, True])\n",
"rsvp_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Okay, we have two mega-groups that nobody really RSVPs for\n",
"\n",
"**Docker Online Meetup** averages just 1 yes-RSVP per event and **SF Data Mining** only 6, so let's exclude both."
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [],
"source": [
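"# Drop Docker Online Meetup (id 1) and SF Data Mining (id 3), which average the fewest yes-RSVPs\n",
"final_df = final_df[~final_df['id'].isin([1, 3])]"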
]
},
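{
"cell_type": "markdown",
"metadata": {},
"source": [
"Hard-coding the ids `[1, 3]` works for this snapshot, but those row ids may point at different groups if the data is pulled again. The minimal sketch below is a threshold-based alternative. It assumes a hypothetical cut-off of 10 mean yes-RSVPs per event, which drops exactly ids 1 and 3 in the table above. `drop_low_rsvp_groups` is a hypothetical helper name.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"def drop_low_rsvp_groups(df, min_mean_rsvps=10):\n",
"    # Average yes-RSVPs per event for each group id.\n",
"    mean_rsvps = df.groupby('id')['yes_rsvp_count'].mean()\n",
"    # Keep only rows belonging to groups at or above the (hypothetical) cut-off.\n",
"    keep_ids = mean_rsvps[mean_rsvps >= min_mean_rsvps].index\n",
"    return df[df['id'].isin(keep_ids)]\n",
"\n",
"# Hypothetical toy data: groups 1 and 3 barely get RSVPs, group 2 is popular.\n",
"toy_df = pd.DataFrame({\n",
"    'id': [1, 1, 3, 3, 2, 2],\n",
"    'yes_rsvp_count': [1, 1, 5, 7, 200, 235],\n",
"})\n",
"print(drop_low_rsvp_groups(toy_df))\n",
"```"
]
},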
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Let's also make sure that our groups meet frequently enough\n"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" id time\n",
"5 7 2\n",
"6 8 5\n",
"3 5 6\n",
"0 0 7\n",
"2 4 9\n",
"1 2 11\n",
"4 6 30\n",
"7 9 31"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# count() tallies non-null values per column, so 'time' holds the number of events per group\n",
"event_count_df = final_df.groupby(['id'], as_index=False).count()\n",
"event_count_df = event_count_df.sort_values(by=['time', 'id'], ascending=[True, True])\n",
"event_count_df[[\"id\", \"time\"]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Hmm... a few of our groups meet rather infrequently\n",
"\n",
"Four groups (ids 7, 8, 5 and 0) held 7 or fewer events each, so let's remove those as well."
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [],
"source": [
"# Drop the four groups that held 7 or fewer events: ids 7, 8, 5 and 0\n",
"final_df = final_df[~final_df['id'].isin([7, 8, 5, 0])]"
]
},
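{
"cell_type": "markdown",
"metadata": {},
"source": [
"Like the RSVP filter, the ids `[7, 8, 5, 0]` were picked by eye from the event counts. Because the goal is groups that meet at least monthly, one alternative is to turn each group's event count into a rate. The minimal sketch below makes two assumptions. Every group is measured over the same observation window, taken here as the earliest to latest `time` in the whole dataframe. The threshold is a hypothetical one event per 30 days. The helper names are also hypothetical.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"MS_PER_30_DAYS = 30 * 24 * 60 * 60 * 1000  # epoch milliseconds in 30 days\n",
"\n",
"def events_per_30_days(df):\n",
"    # Length of the whole observation window, measured in 30-day months.\n",
"    window_months = (df['time'].max() - df['time'].min()) / MS_PER_30_DAYS\n",
"    # Number of events (rows) per group, divided by the window length.\n",
"    return df.groupby('id').size() / window_months\n",
"\n",
"def keep_at_least_monthly(df, min_rate=1.0):\n",
"    rates = events_per_30_days(df)\n",
"    keep_ids = rates[rates >= min_rate].index\n",
"    return df[df['id'].isin(keep_ids)]\n",
"\n",
"# Hypothetical toy data over a 3-month window: group 0 meets 4 times, group 1 once.\n",
"toy_df = pd.DataFrame({\n",
"    'id': [0, 0, 0, 0, 1],\n",
"    'time': [0, MS_PER_30_DAYS, 2 * MS_PER_30_DAYS, 3 * MS_PER_30_DAYS, MS_PER_30_DAYS],\n",
"})\n",
"print(events_per_30_days(toy_df))\n",
"print(keep_at_least_monthly(toy_df)['id'].unique())\n",
"```\n",
"\n",
"Whether this rate-based cut-off keeps the same four groups as the hand-picked list depends on how long the real observation window is."
]
},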
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Let's try this again, only with the remaining 4 mega-groups\n",
"\n",
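"This time we're going to do two more things:\n",
"1. Show a 68% confidence interval around each trend line. Seaborn estimates it by bootstrapping the regression. The translucent band therefore shows how uncertain the fitted line is (roughly plus or minus one standard error). It does not show which individual events fell within one standard deviation of the mean.\n",
"2. Estimate a [**robust regression**](https://en.wikipedia.org/wiki/Robust_regression) to de-weight outliers. Seaborn delegates this to `statsmodels`, which must be installed."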
]
},
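{
"cell_type": "markdown",
"metadata": {},
"source": [
"Seaborn builds that band by bootstrapping. It resamples the events with replacement and refits the line each time. It then takes the central 68% of the refitted lines' predictions at each point on the x-axis. The sketch below approximates that procedure with plain NumPy and ordinary least squares. It does not reproduce the statsmodels robust fit used by `robust=True`, and `bootstrap_line_ci` is a hypothetical helper name. Its default of 1,000 resamples matches seaborn's `n_boot` default.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def bootstrap_line_ci(x, y, x_grid, ci=68, n_boot=1000, seed=0):\n",
"    rng = np.random.default_rng(seed)\n",
"    x = np.asarray(x, dtype=float)\n",
"    y = np.asarray(y, dtype=float)\n",
"    x_grid = np.asarray(x_grid, dtype=float)\n",
"    preds = np.empty((n_boot, x_grid.size))\n",
"    for i in range(n_boot):\n",
"        # Resample (x, y) pairs with replacement and refit a least-squares line.\n",
"        idx = rng.integers(0, x.size, size=x.size)\n",
"        slope, intercept = np.polyfit(x[idx], y[idx], deg=1)\n",
"        preds[i] = slope * x_grid + intercept\n",
"    # The band is the central ci% of the refitted predictions at each x.\n",
"    tail = (100 - ci) / 2\n",
"    low, high = np.percentile(preds, [tail, 100 - tail], axis=0)\n",
"    return low, high\n",
"\n",
"# Hypothetical noisy data around the line y = 2x + 1.\n",
"rng = np.random.default_rng(42)\n",
"x = np.arange(20, dtype=float)\n",
"y = 2 * x + 1 + rng.normal(0, 3, size=x.size)\n",
"low, high = bootstrap_line_ci(x, y, x)\n",
"print(np.round(high - low, 2))  # width of the 68% band at each x\n",
"```\n",
"\n",
"The band gets wider when RSVP counts are noisy or a group has few events. It therefore describes how reliable a trend line is, not how any single event turned out."
]
},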
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"%%capture --no-display\n",
"sb.lmplot(x=\"time\", y=\"yes_rsvp_count\", data=final_df, hue=\"name\",\n",
"          height=9, aspect=0.75, order=1,\n",
"          ci=68,        # 1. 68% bootstrap confidence interval (about +/- 1 standard error)\n",
"          robust=True)  # 2. robust regression (requires statsmodels) to de-weight outliers"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Looking at the four remaining trend lines, here's how they rank\n",
"\n",
"### Gold Medal -- [SF Big Analytics](https://www.meetup.com/SF-Big-Analytics/)\n",
"* PROs\n",
"    * The largest mean number of yes-RSVPs among the four (about 218 per event)\n",
"    * An upward-sloping trend line, so it's getting even more popular\n",
"* CONs\n",
"    * The confidence band around its (blue) trend line is wide. Yes-RSVPs vary a lot from event to event, and there are only 11 events to fit. Attendance is hit-and-miss, so check each event in advance so that you're not surprised\n",
" \n",
"### Silver Medal -- [Silicon Valley Entrepreneurs & Startups](https://www.meetup.com/sventrepreneurs/)\n",
"* PROs\n",
"    * Holding onto a flat trend of about 100 yes-RSVPs per meeting\n",
"    * Met 30 times in the past nine months\n",
"* CONs\n",
"    * The trend is flat or slightly down over the last nine months, so it's not really showing signs of getting more popular\n",
" \n",
"### Bronze Medal # 1 -- [SF Data Science](https://www.meetup.com/SF-Data-Science/)\n",
"* PROs\n",
"    * A steep upward slope for the trend line; it's clearly rising in popularity\n",
"* CONs\n",
"    * Meets the least frequently of the four remaining mega-groups: 9 events, or roughly monthly\n",
" \n",
"### Bronze Medal # 2 -- [Women Who Code SF](https://www.meetup.com/Women-Who-Code-SF/)\n",
"* PROs\n",
"    * Met 31 times in the past nine months (that's almost weekly)\n",
"    * Smallest variance in RSVPs, perhaps indicating a loyal following\n",
"* CONs\n",
"    * Smallest average number of yes-RSVPs among the four (about 37 per event)\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Goal Achieved!\n",
"\n",
"At last! We found the four best Tech groups in San Francisco, CA that:\n",
"1. are among the 10 biggest in the city\n",
"2. are popular and staying popular\n",
"3. hold events at least monthly\n",
"\n",
"## Conclusion\n",
"\n",
"We achieved our objectives and demonstrated several useful techniques along the way. We:\n",
"1. worked with the [Python Meetup API client]\n",
"2. built a helper function to parse response objects into **meta** and **results** dataframes\n",
"3. built a helper function to loop through multiple API calls and [concat]-enate a list of pages into a single useful dataframe\n",
"4. used pandas.DataFrame.[query] to filter data of interest\n",
"5. used pandas.DataFrame.[apply] to clean columns of data using custom helper functions\n",
"6. used pandas.DataFrame.[describe] to get descriptive statistics that summarize\n",
"    * the central tendency\n",
"    * dispersion\n",
"    * shape of our dataset's distribution\n",
"7. used pandas.DataFrame.[merge] to join the **events** and **groups** dataframes to create a report of the events for our 10 biggest mega-groups in technology\n",
"8. used visualizations and statistics to narrow those 10 mega-groups down to the 4 best tech groups in San Francisco\n",
"\n",
"\n",
"[//]: # (References)\n",
"\n",
"[Python Meetup API client]: https://meetup-api.readthedocs.io/en/latest/meetup_api.html#api-client-details\n",
"[concat]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html\n",
"[query]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html\n",
"[apply]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html\n",
"[describe]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html\n",
"[merge]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python (datascience_challenges)",
"language": "python",
"name": "datascience_challenges"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.2"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": true,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}