{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tokyo Photographs" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [ "remove-input" ] }, "outputs": [ { "data": { "text/markdown": [ "# Tokyo data\n", "\n", "This dataset contains a sample of geotagged images uploaded to Flickr for the Tokyo region. The original extract (generated by Meixu Chen, `meixu@liverpool.ac.uk`) is stored for archival purposes as `tokyo.csv`.\n", "\n", "- `Source`: Yahoo Flickr Creative Commons 100 Million Dataset\n", "- `URL`:\n", "\n", "> [https://yahooresearch.tumblr.com/post/89783581601/one-hundred-million-creative-commons-flickr-images](https://yahooresearch.tumblr.com/post/89783581601/one-hundred-million-creative-commons-flickr-images)\n", "\n", "- `Processing`: transformations applied to the original extract, including random subsetting to a more manageable size for pedagogical purposes, are documented in `tokyo_cleaning.ipynb`\n", " - Clean file: `tokyo_clean.csv`\n", "\n", "## Metadata\n", "\n", "For every record, the following information is provided:\n", "\n", "* `user_id`: the unique ID number of each Flickr user.\n", "\n", "* `longitude`: longitude of the geotagged Flickr photo in decimal format,\n", "in the WGS 1984 geographic coordinate system.\n", "\n", "* `latitude`: latitude of the geotagged Flickr photo in decimal format,\n", "in the WGS 1984 geographic coordinate system.\n", "\n", "* `date_taken`: the date when the photo was taken.\n", "\n", "* `photo/video_page_url`: a URL where the photo/video content is\n", "available.\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import display_markdown\n", "\n", "display_markdown(open(\"README.md\").read(), raw=True)" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "import geopandas as gpd\n", "import pandas as 
pd\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from shapely.geometry import Point" ] }, { "cell_type": "code", "execution_count": 72, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "db = pd.read_csv('data/tokyo.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Random subsetting" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [], "source": [ "# Set the \"seed\" so every run generates the same random numbers\n", "np.random.seed(1234)\n", "# Create a sequence of length equal to the number of rows in the table\n", "ri = np.arange(len(db))\n", "# Randomly reorganize (shuffle) the values\n", "np.random.shuffle(ri)\n", "# Reindex the table by using only the first 10,000 numbers\n", "# of the (now randomly arranged) sequence\n", "db = db.iloc[ri[:10000], :]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reproject and store XY coordinates in separate columns" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 431 ms, sys: 4.86 ms, total: 436 ms\n", "Wall time: 436 ms\n" ] } ], "source": [ "%%time\n", "# Build a shapely Point for every photo from its longitude/latitude\n", "pts = db.apply(lambda r: Point(r.longitude, r.latitude), axis=1)" ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "# Attach the points as geometry and declare them as WGS84 (EPSG:4326)\n", "gdb = gpd.GeoDataFrame(db.assign(geometry=pts),\n", "                       crs={'init': 'epsg:4326'})" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 529 ms, sys: 7.46 ms, total: 536 ms\n", "Wall time: 747 ms\n" ] } ], "source": [ "%%time\n", "# Reproject from WGS84 to Web Mercator (EPSG:3857)\n", "gdb = gdb.to_crs(epsg=3857)" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 2.13 s, sys: 20.3 ms, total: 2.15 s\n", "Wall 
time: 2.16 s\n" ] } ], "source": [ "%%time\n", "# Read the projected coordinates directly from the point geometries\n", "gdb['x'] = gdb['geometry'].x\n", "gdb['y'] = gdb['geometry'].y" ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "collapsed": true, "jupyter": { "outputs_hidden": true } }, "outputs": [], "source": [ "# Drop the geometry column and write the clean table to disk\n", "gdb.drop('geometry', axis=1).to_csv('tokyo_clean.csv', index=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## Download link\n", "\n", "{download}`[Download the *tokyo_clean.csv* file] `" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.8" } }, "nbformat": 4, "nbformat_minor": 4 }