{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "# Connecting to Facebook API\n", "\n", "Written by Kat Chuang [@katychuang](http://twitter.com/katychuang)\n", "\n", "## Objective\n", "\n", "The goal of this exercise is to connect with Facebook Graph Api to collect information about my most recent posts, and also to collect each posts' subsequent comments and likes.\n", "\n", "## Setup\n", "\n", "- I first created a virtual environment for my notebooks: `mkvirtualenv --python=/usr/local/bin/python3 dataAnalysis`\n", "- Then installed the notebook server to the machine. Instructions can be [found here](http://jupyter.org/install.html)\n", "- Start server in the root directory, `jupyter notebook`\n", "\n", "In addition to getting the console ready, I saved my access token information in a separate local folder, `_keys` in a file name `facebook.py`. Inside you want to save a string variable like the following:\n", "\n", "```\n", "ACCESS_TOKEN=\"XXXXXX\"\n", "```\n" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Making a request" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dict_keys(['message', 'story', 'created_time', 'id']) \n", "\n" ] } ], "source": [ "from _keys.facebook import USER_ID, ACCESS_TOKEN, paging_token\n", "import requests\n", "\n", "host = 'https://graph.facebook.com/v2.8'\n", "\n", "u = '{}/{}/posts?access_token={}'.format(host, USER_ID, ACCESS_TOKEN)\n", "data1 = requests.get(u).json()\n", "\n", "pg2 = '{}/{}/posts?limit=25&until=1486832400&__paging_token={}&access_token={}'.format(host, USER_ID, paging_token, ACCESS_TOKEN)\n", "data2 = requests.get(pg2).json()\n", "\n", "data = []\n", "data.extend(data1[\"data\"])\n", "data.extend(data2[\"data\"])\n", "\n", "print(data[0].keys(), \"\\n\")" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "The JSON data is saved into the variable `data`,which is a list of dictionaries, so we can now use the data structure functions to access information for each story. Here are the two most recent posts' IDs and timestamps." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2017-03-19T22:23:36+0000\n" ] } ], "source": [ "# Two most post time:\n", "print(data[0]['created_time'])" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Parsing data\n", "\n", "The next step is to save this data for analysis. We could save it into a text file. We could save it into a database. Since this is a small amount of data (n=25), I chose to iterate quickly on developing the code by loading the JSON response directly into a Python dataframe.\n", "\n", "Let's see what happens when we use parse the timestamps:\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [], "source": [ "import datetime\n", "from dateutil import parser\n", "\n", "# return \n", "def scrub(timestamp):\n", " d = parser.parse(timestamp)\n", " return dow(d), hod(d)\n", "\n", "# returns day of week\n", "def dow(date): return date.strftime(\"%A\")\n", "\n", "# returns hour of day\n", "def hod(time): return time.strftime(\"%-I:%M%p\")\n", "\n", "a = list(map((lambda x: scrub(x['created_time'])), data ))" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "day\t\tposts\n", "=========== =======\n", "Monday \t 2\n", "Tuesday \t 3\n", "Wednesday \t 8\n", "Thursday \t 8\n", "Friday \t 7\n", "Saturday \t 6\n", "Sunday \t 12\n" ] } ], "source": [ "days=[\"Monday\",\"Tuesday\",\"Wednesday\",\"Thursday\",\"Friday\",\"Saturday\",\"Sunday\"]\n", "\n", "dow = list(map( (lambda x: x[0]) , a))\n", "\n", "print(\"day\\t\\tposts\")\n", "print(\"=========== =======\")\n", "for day in days:\n", " print (day, \" \\t\", dow.count(day))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Saving data to file\n", "\n", "You can easily save the list of dictionaries to a text file using the metho `json.dump()` like so:\n", "\n", "```\n", "import json\n", "\n", "with open(output_file, 'w') as jsonfile:\n", " json.dump(data, jsonfile)\n", "```\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.0" } }, "nbformat": 4, "nbformat_minor": 2 }