{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Generate enclosures for the Great Britain\n",
    "\n",
    "This notebook generates _enclosures_ for the area of Great Britain based on set barriers. Note that the final version is different from the [proof of a concept](Parallelized_enclosures). It does not use dask since we found that parallelisation is not necessary and it uses different preprocessing steps for railway.\n",
    "\n",
    "Note: An algorithm to generate enclosures has been implemented in momepy 0.4.0 as [`momepy.enclosures`](http://docs.momepy.org/en/latest/generated/momepy.enclosures.html#momepy.enclosures).\n",
    "\n",
    "Used barriers:\n",
    "\n",
    "- road network (OS OpenRoads)\n",
    "- railway network (OS OpenMap Local)\n",
    "- rivers (OS OpenRivers)\n",
    "- coastline (OS Strategi®)\n",
    "\n",
    "Connect to db:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "import geopandas as gpd\n",
    "\n",
    "from sqlalchemy import create_engine\n",
    "\n",
    "user = os.environ.get('DB_USER')\n",
    "pwd = os.environ.get('DB_PWD')\n",
    "host = os.environ.get('DB_HOST')\n",
    "port = os.environ.get('DB_PORT')\n",
    "\n",
    "db_connection_url = f\"postgres+psycopg2://{user}:{pwd}@{host}:{port}/built_env\"\n",
    "engine = create_engine(db_connection_url)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Load initial data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 2min 7s, sys: 4.94 s, total: 2min 12s\n",
      "Wall time: 2min 17s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "sql = f'SELECT * FROM openroads_200803_topological'\n",
    "roads = gpd.read_postgis(sql, engine, geom_col='geometry')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 1.65 s, sys: 18.1 ms, total: 1.67 s\n",
      "Wall time: 1.66 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "# filter out tunnels\n",
    "roads = roads[roads.roadStructure != 'Road In Tunnel']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 330 ms, sys: 7.78 ms, total: 337 ms\n",
      "Wall time: 378 ms\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "sql = f'SELECT * FROM gb_coastline_2016'\n",
    "coastline = gpd.read_postgis(sql, engine, geom_col='geometry')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generate enclosures\n",
    "\n",
    "### First level\n",
    "\n",
    "The first level enclosures are composed of road network and external boundary, which in our case is a coastline."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pygeos\n",
    "import pandas as pd\n",
    "\n",
    "from shapely.ops import polygonize\n",
    "from shapely.geometry import Point\n",
    "\n",
    "from snap import line_to_line, close_gaps\n",
    "from consolidate import topology"
   ]
  },
  {
   "source": [
    "There three simple steps:\n",
    "\n",
    "1. merge layers to a single GeoSeries.\n",
    "2. union geometries to a single MultiLineString. That helps with precision of `polygonize`.\n",
    "3. polygonize data and save them as GeometryArray."
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 25.5 ms, sys: 0 ns, total: 25.5 ms\n",
      "Wall time: 23 ms\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "barriers = pd.concat([roads.geometry, coastline.geometry])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 14min 16s, sys: 7.33 s, total: 14min 23s\n",
      "Wall time: 14min 13s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "unioned = barriers.unary_union"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 1min 4s, sys: 482 ms, total: 1min 4s\n",
      "Wall time: 1min 3s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "polygons = polygonize(unioned)\n",
    "enclosures = gpd.array.from_shapely(list(polygons), crs=roads.crs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Additional barriers\n",
    "\n",
    "Additional barriers are used to further subdivide those generated above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 2.95 s, sys: 15.8 ms, total: 2.97 s\n",
      "Wall time: 3.06 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "sql = f'SELECT * FROM openmap_railwaytrack_200824'\n",
    "railway = gpd.read_postgis(sql, engine, geom_col='geometry')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 9.97 s, sys: 190 ms, total: 10.2 s\n",
      "Wall time: 10.8 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "sql = f'SELECT * FROM openrivers_200909'\n",
    "rivers = gpd.read_postgis(sql, engine, geom_col='geometry')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Preprocess railways\n",
    "\n",
    "Railways are not optimal input. OS OpenMap Local is cartographic resource and LineStrings representing railways are split into multiple pieces, sometimes even with gaps in between. Moreover, there is almost always a gap where railways cross roads. All that needs to be fixed, otherwise we won't get enclosed geometry.\n",
    "\n",
    "Preprocessing fucntions `topology`, `close_gaps` and `line_to_line` have been implemented in momepy 0.4.0 as [`momepy.remove_false_nodes`](http://docs.momepy.org/en/latest/generated/momepy.remove_false_nodes.html#momepy.remove_false_nodes), [`momepy.close_gaps`](http://docs.momepy.org/en/latest/generated/momepy.close_gaps.html#momepy.close_gaps) and [`momepy.extend_lines`](http://docs.momepy.org/en/latest/generated/momepy.extend_lines.html#momepy.extend_lines).\n",
    "\n",
    "The first step is to clean topology - remove nodes of a degree 2."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 4min 42s, sys: 7.19 ms, total: 4min 42s\n",
      "Wall time: 4min 42s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "railway_topo = topology(railway)"
   ]
  },
  {
   "source": [
    "Second step closes gaps between LineStrings and then fixes resulting topology again."
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 5min 8s, sys: 506 ms, total: 5min 8s\n",
      "Wall time: 5min 7s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "closed = close_gaps(railway_topo, tolerance=25)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 10.6 s, sys: 3.96 ms, total: 10.6 s\n",
      "Wall time: 10.6 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "closed_topo = topology(gpd.GeoDataFrame(geometry=closed))"
   ]
  },
  {
   "source": [
    "Finally, we extend lines to adjacent road geometry to close the area."
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 13.5 s, sys: 19.5 ms, total: 13.5 s\n",
      "Wall time: 13.5 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "extended_topo = line_to_line(closed_topo, roads, 25)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Subdivide enclosures\n",
    "\n",
    "With the preprocessed data, we can subdivide first level enclosures into final ones.\n",
    "\n",
    "Due to the current transition between pygeos and shapely 2.0, we are using here private `_pygeos_to_shapely` function from GeoPandas. That will not be needed in future.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "import itertools\n",
    "\n",
    "import numpy as np\n",
    "import dask.bag as db\n",
    "\n",
    "from tqdm.notebook import tqdm\n",
    "from geopandas._vectorized import _pygeos_to_shapely"
   ]
  },
  {
   "source": [
    "We need a single GeoSeries of additional barriers."
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 3.93 ms, sys: 36 µs, total: 3.96 ms\n",
      "Wall time: 2.6 ms\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "additional = pd.concat([rivers.geometry, extended_topo.geometry])"
   ]
  },
  {
   "source": [
    "Using spatial index, we link additional barriers to existing enclosures."
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 27.8 s, sys: 43.9 ms, total: 27.9 s\n",
      "Wall time: 27.7 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "sindex = gpd.GeoSeries(enclosures).sindex\n",
    "inp, res = sindex.query_bulk(additional.geometry, predicate='intersects')"
   ]
  },
  {
   "source": [
    "Unique enclosure indices in `res` mark those enclosures which needs to be subdivided."
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 12.2 ms, sys: 2 µs, total: 12.2 ms\n",
      "Wall time: 10.6 ms\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "unique = np.unique(res)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We loop over polygons and generate new geometry using additional barriers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "d3682898384d4f3993a0c74de276aec5",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "HBox(children=(FloatProgress(value=0.0, max=66823.0), HTML(value='')))"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "CPU times: user 3min 7s, sys: 607 ms, total: 3min 7s\n",
      "Wall time: 3min 6s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "new = []\n",
    "\n",
    "for i in tqdm(unique, total=len(unique)):\n",
    "    poly = enclosures.data[i]  # get enclosure polygon\n",
    "    crossing = inp[res==i]  # get relevant additional barriers\n",
    "    buf = pygeos.buffer(poly, 0.01)  # to avoid floating point errors\n",
    "    crossing_ins = pygeos.intersection(buf, additional.values.data[crossing])  # keeping only parts of additional barriers within polygon\n",
    "    union = pygeos.union_all(np.append(crossing_ins, pygeos.boundary(poly)))  # union\n",
    "    polygons = np.array(list(polygonize(_pygeos_to_shapely(union))))  # polygonize\n",
    "    within = pygeos.covered_by(pygeos.from_shapely(polygons), buf)  # keep only those within original polygon\n",
    "    new += list(polygons[within])"
   ]
  },
  {
   "source": [
    "Now we replace those polygons which needed subdivision with a new geometry."
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 334 ms, sys: 5 µs, total: 334 ms\n",
      "Wall time: 331 ms\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "final_enclosures = gpd.GeoSeries(enclosures).drop(unique).append(gpd.GeoSeries(new))"
   ]
  },
  {
   "source": [
    "Before we save it to a file, let's check the difference between initial and final enclosures."
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(735372,)"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "final_enclosures.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(619191,)"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "enclosures.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:1: UserWarning: this is an initial implementation of Parquet/Feather file support and associated metadata.  This is tracking version 0.1.0 of the metadata specification at https://github.com/geopandas/geo-arrow-spec\n",
      "\n",
      "This metadata specification does not yet make stability promises.  We do not yet recommend using this in a production setting unless you are able to rewrite your Parquet/Feather files.\n",
      "\n",
      "To further ignore this warning, you can do: \n",
      "import warnings; warnings.filterwarnings('ignore', message='.*initial implementation of Parquet.*')\n",
      "  \"\"\"Entry point for launching an IPython kernel.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 4.81 s, sys: 626 ms, total: 5.43 s\n",
      "Wall time: 6.81 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "gpd.GeoDataFrame(geometry=final_enclosures, crs=roads.crs).to_parquet('../../urbangrammar_samba/enclosures.pq')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.8-final"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}