{ "metadata": { "name": "", "signature": "sha256:a687f14352fd60d7dfd3aede3941413ec1dc0ea88eb84849e1d9203985839acf" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Building a Foursquare Location Graph\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initialization\n", "\n", "First, you need credentials to access the Foursquare API with reasonable rate limits.\n", "\n", "If you already have an access token, use it; otherwise, register an app and use its client ID and client secret for the following steps." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import foursquare\n", "import pandas as pd\n", "\n", "# Option 1: authenticate with an existing access token\n", "#ACCESS_TOKEN = \"\"\n", "#client = foursquare.Foursquare(access_token=ACCESS_TOKEN)\n", "\n", "# Option 2: authenticate with the client ID and secret of a registered app\n", "CLIENT_ID = \"\"\n", "CLIENT_SECRET = \"\"\n", "client = foursquare.Foursquare(client_id=CLIENT_ID, client_secret=CLIENT_SECRET)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 64 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fetching the Foursquare location links\n", "\n", "1. Start at the seed venues (Munich Marienplatz, the airport and the central station).\n", "2. For each venue, fetch up to 5 next venues, i.e. venues that people often check in to afterwards (API endpoint `nextvenues`, see: https://developer.foursquare.com/docs/venues/nextvenues).\n", "3. For each of these venues, fetch its next venues in turn.\n", "4. 
Repeat until saturation (no new locations) or until `depth` steps have run." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# bounding box as [min_lng, min_lat, max_lng, max_lat]\n", "# bbox = [11.109872,47.815652,12.068588,48.397136] # bounding box for Munich\n", "# bbox = [13.088400,52.338120,13.761340,52.675499] # bounding box for Berlin\n", "bbox = [5.866240,47.270210,15.042050,55.058140] # bounding box for Germany" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 219 }, { "cell_type": "code", "collapsed": false, "input": [ "new_crawl = [] # venue ids queued for the next crawl step\n", "done = [] # venue ids that have already been crawled\n", "links = [] # (from_id, to_id) tuples linking a venue to one of its next venues\n", "venues = pd.DataFrame() # venue metadata (name, users, checkins, lat, lng), indexed by venue id" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 730 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set the seed venues: Marienplatz, the airport and the central station.\n", "\n", "`depth` is the number of crawl steps; each step fetches the next venues of all venues queued in the previous step." ] }, { "cell_type": "code", "collapsed": false, "input": [ "to_crawl = [\"4ade0ccef964a520246921e3\", \"4cbd1bfaf50e224b160503fc\", \"4b0674e2f964a520f4eb22e3\"]\n", "depth = 8" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 731 }, { "cell_type": "code", "collapsed": false, "input": [ "for i in range(depth):\n", "    new_crawl = []\n", "    print(\"Step \" + str(i) + \": \" + str(len(venues)) + \" locations and \" + str(len(links)) + \" links. \" + str(len(to_crawl)) + \" venues to go.\")\n", 
"    for v in to_crawl:\n", "        # `v in venues` would test the column labels, so look the id up in the index\n", "        if v not in venues.index:\n", "            res = client.venues(v)\n", "            venues = pd.concat([venues, pd.DataFrame({\"name\": res[\"venue\"][\"name\"], \"users\": res[\"venue\"][\"stats\"][\"usersCount\"],\n", "                                                      \"checkins\": res[\"venue\"][\"stats\"][\"checkinsCount\"], \"lat\": res[\"venue\"][\"location\"][\"lat\"],\n", "                                                      \"lng\": res[\"venue\"][\"location\"][\"lng\"]}, index=[v])])\n", "        next_venues = client.venues.nextvenues(v)\n", "        for nv in next_venues[\"nextVenues\"][\"items\"]:\n", "            # keep only venues inside the bounding box [min_lng, min_lat, max_lng, max_lat]\n", "            if (bbox[1] < nv[\"location\"][\"lat\"] < bbox[3]) and (bbox[0] < nv[\"location\"][\"lng\"] < bbox[2]):\n", "                if nv[\"id\"] not in venues.index:\n", "                    venues = pd.concat([venues, pd.DataFrame({\"name\": nv[\"name\"], \"users\": nv[\"stats\"][\"usersCount\"],\n", "                                                              \"checkins\": nv[\"stats\"][\"checkinsCount\"], \"lat\": nv[\"location\"][\"lat\"],\n", "                                                              \"lng\": nv[\"location\"][\"lng\"]}, index=[nv[\"id\"]])])\n", "                # queue the venue for the next step unless it was already crawled or queued\n", "                if (nv[\"id\"] not in done) and (nv[\"id\"] not in to_crawl) and (nv[\"id\"] not in new_crawl):\n", "                    new_crawl.append(nv[\"id\"])\n", "                links.append((v, nv[\"id\"]))\n", "        done.append(v)\n", "    to_crawl = new_crawl" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Step 0: 0 locations and 0 links. 3 venues to go.\n", "Step 1: 12 locations and 9 links. 7 venues to go." ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Step 2: 53 locations and 43 links. 17 venues to go." ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Step 3: 153 locations and 126 links. 17 venues to go." ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Step 4: 235 locations and 191 links. 12 venues to go." ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Step 5: 291 locations and 235 links. 13 venues to go." 
] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Step 6: 348 locations and 279 links. 22 venues to go." ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Step 7: 461 locations and 370 links. 28 venues to go." ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n" ] } ], "prompt_number": 732 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generating the network\n", "\n", "We use networkx to build the network from the crawled venues (nodes) and the links between them (edges).\n", "\n", "First, we drop any duplicate venue rows, keeping the last entry fetched for each venue id." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# drop duplicate venue rows, keeping the last one per venue id\n", "venues = venues.reset_index().drop_duplicates(subset='index', keep='last').set_index('index')\n", "venues.head()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
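<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n",
"    <tr>\n", "      <th></th>\n", "      <th>checkins</th>\n", "      <th>lat</th>\n", "      <th>lng</th>\n", "      <th>name</th>\n", "      <th>users</th>\n", "    </tr>\n",
"    <tr>\n", "      <th>index</th>\n", "      <th></th>\n", "      <th></th>\n", "      <th></th>\n", "      <th></th>\n", "      <th></th>\n", "    </tr>\n",
"  </thead>\n", "  <tbody>\n",
"    <tr>\n", "      <th>4cbd1bfaf50e224b160503fc</th>\n", "      <td>224872</td>\n", "      <td>48.352599</td>\n", "      <td>11.780992</td>\n", "      <td>M\u00fcnchen Flughafen \"Franz Josef Strau\u00df\" (MUC)</td>\n", "      <td>83604</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4b0674e2f964a520f4eb22e3</th>\n", "      <td>88327</td>\n", "      <td>48.140547</td>\n", "      <td>11.555772</td>\n", "      <td>M\u00fcnchen Hauptbahnhof</td>\n", "      <td>18833</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4b56f6eef964a520ec2028e3</th>\n", "      <td>1845</td>\n", "      <td>48.137558</td>\n", "      <td>11.579466</td>\n", "      <td>Augustiner am Platzl</td>\n", "      <td>1471</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4ade0d1df964a520dc6a21e3</th>\n", "      <td>2053</td>\n", "      <td>48.136930</td>\n", "      <td>11.574156</td>\n", "      <td>Sporthaus Schuster</td>\n", "      <td>1137</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4bbc6329afe1b7136d4d304b</th>\n", "      <td>2702</td>\n", "      <td>48.135282</td>\n", "      <td>11.576350</td>\n", "      <td>Biergarten am Viktualienmarkt</td>\n", "      <td>1534</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n", "<p>5 rows \u00d7 5 columns</p>\n", "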