{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Scraping data from ESRI Map Servers\n", "In the last notebook, we looked at the `Export Map` operation supported by many ESRI Map Services. Here we look at the `Query` operation which enables us to download specific features hosted in the layers included a map service. This can be a very useful means for collecting data from a wide variety of sources. \n", "\n", "We'll revisit the NCOneMap REST Endpoint ([link](https://services.nconemap.gov/secure/rest/services)) and download cenus 2010 tract data for NC." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exploring the map service in your browser.\n", "It's always good to familiarize yourself with a service before invoking it. Here's a nice sequence to get you started.\n", "\n", "\n", "* Open your browser to the metadata page for the [NC1Map_Census MapServer](https://services.nconemap.gov/secure/rest/services/NC1Map_Census/MapServer)\n", "\n", "\n", "* First, it's useful to preview the layers and features served in the map service. To do this, click on the [**View In: `ArcGIS Online Map Viewer`**](http://www.arcgis.com/home/webmap/viewer.html?url=https%3A%2F%2Fservices.nconemap.gov%2Fsecure%2Frest%2Fservices%2FNC1Map_Census%2FMapServer&source=sd). This will open a new window where you can look at and interact with the layers served in the map service. \n", " * Expand the layer in the contents to interact with specific layers.\n", " * Zoom to specific locations; note that some layers appear and disappear with the maps zoom level. \n", " * Search for Durham and display only the \"2010 Census - Block Group\" layer. \n", " * In the contents section, extend the details for this layer, then click the `Show Table` button beneath the layer's name. *This gives you an idea of how you can query records from this layer.*\n", " * Try filtering the records by clicking the `Filter` button (next to the `Show Table` button), and filter for `GEOID10 start with 37063` (that's the FIPS for Durham Co.)\n", " \n", " \n", "* Go back to the map service metadata page, and take note of the **layers** included in this map service. These, of course, corresponde to the ones we viewed in the ArcGIS Online interface. We'll stick to interacting with the 2010 Census Block Group data\n", "\n", "\n", "* Click on the link for the [2010 Census Block Group (8)](https://services.nconemap.gov/secure/rest/services/NC1Map_Census/MapServer/8) layer.\n", " * Examine the properties assocatied with this layer. In particular, look at:\n", " * The `MaxRecordCount` attribute. This defines how many features can be returned by a query at one time.\n", " * The `Fields` attribute, which lists the attribute fields included in this layer. \n", " \n", " \n", "* And finally, at the bottom of the page, examine the `Supported Operations`. We'll use the **Query** operation to query, and download, specific features." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Quering features in the ArcGIS Rest Services\n", "[Resource: https://developers.arcgis.com/rest/services-reference/query-map-service-layer-.htm ]\n", "\n", "* Click on the [**Query**](https://services.nconemap.gov/secure/rest/services/NC1Map_Census/MapServer/8/query) opertaion for the layer. This brings up the interface to build the REST-based request to query data from the map service. \n", "\n", "We'll start slowly and then build up to a working query that does just what we want. First, we'll just fetch one record and examine the query results. \n", "\n", "* In the **Object IDs:** box, enter `1` to get the first record. Then execute the query by hitting Enter. \n", "\n", "You'll see the result at the bottom of the page.The full result, however, is cut off. We'll change the output from HTML to JSON to see the entire record. \n", "\n", "* In the **Format:** box, switch the value to `JSON` and run the query. \n", "\n", "This result is a raw JSON result, but we can browse it. Specifically, see how it returns, under the `features` section, an attribute of `GEOID10` with the 12-digit FIPS or \"GeoID\" code for the census block. We'll use that info to form a query. \n", "\n", "* Navigate your broswer back to the Query page. In the **Out Fields:** box, type a `*` to return **all** fields in our query. Run the query, keeping output in JSON format. Run the query." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now you again see all the fields we can use to build our query. Next, we'll tweak the query to return all block groups falling in Durham county, that is all features with a GEOID10 starting with 37063.\n", "\n", "* Go back to the query page again. \n", " * Now in the **Where:** box, enter a string to return all records with a GEOID10 that begin with 37063: `GEOID10 = '370319703013'`. \n", " * Also, clear out the `0` in the **Object IDs:** box. \n", " * Run the query. You should again get one feature returned. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now to expand our query to return multiple records. This is a bit tricky as you need to know SQL commands, but they're just like the where clauses in ArcGIS Pro:\n", "\n", "* Change the contents of the **Where:** box to `GEOID10 LIKE '37063%'`\n", "\n", "The result is a looooong JSON object, but it contains all the information needed to build a feature class from these results. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary so far and what's next...\n", "Ok, we have a feel for constructing and executing a query, but we have two more steps to accomplish. First, we need to run this query programmatically, from our coding environment. And second, we have to deal with getting the JSON back to a workable feature class." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Running queries from Python\n", "As seen in earlier examples, we can use the `requests` package to send REST based queries and handle the response. This case is just the same: we just need to manage the request URL that we created above. The URL for the Query request is copied and pasted below:\n", "\n", "```html\n", "https://services.nconemap.gov/secure/rest/services/NC1Map_Census/MapServer/8/query?where=GEOID10+LIKE+%2737063%25%27&text=&objectIds=&time=&geometry=&geometryType=esriGeometryEnvelope&inSR=&spatialRel=esriSpatialRelIntersects&relationParam=&outFields=&returnGeometry=true&returnTrueCurves=false&maxAllowableOffset=&geometryPrecision=&outSR=&returnIdsOnly=false&returnCountOnly=false&orderByFields=&groupByFieldsForStatistics=&outStatistics=&returnZ=false&returnM=false&gdbVersion=&returnDistinctValues=false&resultOffset=&resultRecordCount=&queryByDistance=&returnExtentsOnly=false&datumTransformation=¶meterValues=&rangeValues=&f=pjson\n", "```\n", "\n", "Dissected into components, we have the code block below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Construct the request elements\n", "theURL = 'https://services.nconemap.gov/secure/rest/services/NC1Map_Census/MapServer/8/query'\n", "params = {'where':\"GEOID10 LIKE '37063%'\",\n", " 'text':'','objectIds':'',\n", " 'time':'',\n", " 'geometry':'',\n", " 'geometryType':'esriGeometryEnvelope',\n", " 'inSR':'',\n", " 'spatialRel':'esriSpatialRelIntersects',\n", " 'relationParam':'',\n", " 'outFields':'*',\n", " 'returnGeometry':'true',\n", " 'returnTrueCurves':'false',\n", " 'maxAllowableOffset':'',\n", " 'geometryPrecision':'', \n", " 'outSR':'4326',\n", " 'returnIdsOnly':'false',\n", " 'returnCountOnly':'false',\n", " 'orderByFields':'',\n", " 'groupByFieldsForStatistics':'',\n", " 'outStatistics':'',\n", " 'returnZ':'false',\n", " 'returnM':'false',\n", " 'gdbVersion':'',\n", " 'returnDistinctValues':'false',\n", " 'resultOffset':'',\n", " 'resultRecordCount':'',\n", " 'queryByDistance':'',\n", " 'returnExtentsOnly':'false',\n", " 'datumTransformation':'',\n", " 'parameterValues':'',\n", " 'rangeValues':'',\n", " 'f':'pjson'\n", " }" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Import request and send the request\n", "import requests\n", "result = requests.get(theURL,params)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Convert to JSON\n", "theJSON = result.json()\n", "theJSON.keys()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Get the features contained in the result\n", "features=theJSON['features']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Converting the response to a local shapefile\n", "Geopandas is capable of converting JSON back to a feature class. We'll come back to explain this in more detail later, but here's the set of steps to do that. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Import pandas and geopandas\n", "import geopandas as gpd\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Create a dataFrame from the results, \n", "# keeping just the attributes and geometry objects\n", "df = pd.DataFrame(features,columns=('attributes','geometry'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#\"Explode\" the dictionary of values into fields\n", "dfAttribs = df['attributes'].apply(pd.Series)\n", "dfGeom = df['geometry'].apply(pd.Series)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Combine the two\n", "dfAll = pd.merge(dfAttribs,dfGeom,how='inner',left_index=True,right_index=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Write a function that converts JSON rings to geometry objects\n", "#https://shapely.readthedocs.io/en/latest/manual.html#polygons\n", "from shapely.geometry import Polygon\n", "from shapely.geometry import LinearRing\n", "def polyFromRing(ring):\n", " r = LinearRing(ring)\n", " s = Polygon(r)\n", " return r" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Apply the function to all rows\n", "dfAll['geometry']=dfAll.apply(lambda x: Polygon(x.rings[0]),axis=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Convert the dataframe to geopandas dataframe\n", "gdf=gpd.GeoDataFrame(dfAll)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Drop the rings field\n", "gdf.drop('rings',axis='columns',inplace=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Convert to a shapefile\n", "gdf.to_file(driver='ESRI Shapefile',filename=\"DurhamBlockGroups.shp\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "gdf.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "gdf.plot()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.12" } }, "nbformat": 4, "nbformat_minor": 2 }