{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ " # Spatial Thinking and Skills for Archaeology " ] }, { "cell_type": "markdown", "metadata": { "toc": true }, "source": [ "

Table of Contents

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Chapter 1: Introduction: notebooks motivations \n", "\n", "This notebook has stemmed from 'A course on Spatial Thinking and Methods for Archaeologists' given to the Honours Students in Archaeology at the University of Glasgow by Dr Rachel Opitz. The course enabled the students to think about how to use spatial analysis to analyse archaeological data.
\n", "\n", "This notebook has two goals: \n", " * **conceptual goal**\n", " * Introduce Spatial Techniques and Concepts (STC) within archaeology;\n", " * Understand the underlying principles, tools and methods used with data, as well as the problems and limitations associated with that data.\n", " * **technical goal** \n", " * Acquire more advanced computer skills;\n", " * Promote the thoughtful use of computer-based techniques within archaeology. \n", "\n", "**By reading this notebook, we want you to be able to create your own notebooks and experiment with real archaeological data.** \n", "\n", "More specifically, the learning outcomes are to:\n", "\n", " * Apply spatial tools and thinking to archaeological problems in an appropriate manner, with accuracy and precision;\n", " * Understand the underlying principles of graphical representations of data and their problems and limitations;\n", " * Develop a critical awareness of the limitations and shortcomings of STC;\n", " * Understand the range of STC applications within archaeology.\n", " \n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "COZLcSnE2k30" }, "source": [ "## 1.1 What do we mean by spatial thinking?\n", "\n", "\n", "Spatial thinking means thinking critically about space and place in archaeology in order to answer questions:
\n", "Why do we care about the 'where' in archaeology? What is the relationship between space and place? How do you know where something is located? How do we define location? How do we recognise a spatial relationship? Why are spatial patterns important?\n", "\n", " \n", " \n", "* **What do we mean by Spatial Data?**\n", "\n", "Science is based on gathering evidence and interpreting the evidence to draw logical conclusions. Data science is a branch of computer science dealing with capturing, processing, and analysing data to gain new insights about the systems being studied ([ref. to open source](https://opensource.com/resources/data-science)). \n", "\n", "From an archaeological point of view, this is all about learning how to:\n", " * design spatial data;\n", " * collect it;\n", " * tell if it is dirty data and clean it; \n", " * combine it with other data;\n", " * select from it for your *purposes*, your *research question*;\n", " * take care of it.\n", " \n", "**This data has to be digital (or made digital) and spatial!** \n", " \n", "| | \n", "| ------------- |\n", "|**Spatial data for ARCHAEOLOGY!** | \n", "| | \n", "\n", "
\n", "\n", "* **Which tools and which platform?**\n", "\n", " Because data science is a broad field that's expanding rapidly, we need to find techniques and methods to systematically question and analyse data. Data can be untangled and analysed with multiple tools:\n", " \n", "
\n", " \n", " [Adapted from Modern Data Scientist skill set](https://mywebvault.wordpress.com/2017/05/18/modern-data-scientist-skill-set-marketing-distillery/) by Krzysztof Zawadzki
\n", " \n", "
\n", " \n", " To capture, observe, collect, store, and manage geographic data, and to analyse and visualise this spatial data, you need a system that deals with geographic information, called a GIS (Longley et al., 2001). A GIS comprises hardware (computer, mobile device, GPS), software (ArcGIS, QGIS or online mapping) and data (geographic information). The process of developing software is called programming: a program instructs the computer to accomplish a task based on a set of orders (Yang, 2017). There are many different levels of programming: CPU instructions for hardware; C to develop software; C++ for programme organisation; Java and JavaScript to manipulate objects in a web browser; and [R](https://www.r-project.org/) and [Python](https://www.python.org/), which are both object-oriented, interactive programming languages for data science. It is not the intention of this book to teach you to develop programmes, but rather to use practical exercises to introduce you gradually to programming so that you can question and analyse real archaeological data using one of these languages: Python. \n", " \n", " Python was originally developed by a Dutch programmer, Guido van Rossum, in 1990. Van Rossum was reportedly a fan of the British Monty Python's Flying Circus, from which he borrowed the name 'Python' when he developed the open-source programming language (and named his non-profit institution, the Python Software Foundation). Python is described as a 'fully object-oriented language, simple yet elegant, and stable and mature' (Yang, 2017). \n", " \n", " These notebooks are your first step towards experimenting with spatial data using the Python programming language. \n", "\n", "\n", "\n", "There are several steps to learning Python, and to ease your way into these practical exercises a few fundamental concepts are introduced succinctly first:\n", "\n", " * **What is a GIS model?** GIS models are used to capture the essential geo-spatial elements of a specific problem.
\n", "Lock and Pouncett (2017) remind us \"that thinking spatially entails knowing about: \n", " * **Concepts of space**: providing the conceptual and analytical framework within which data can be integrated, related and structured into a whole.\n", " * **Tools of representation**: providing the forms within which structured information can be stored, analysed, comprehended and communicated.\n", " * **Processes of reasoning**: providing the means of manipulating, interpreting and explaining the structured information.\"\n", "\n", " Spatial analysis will help you to formulate problems, think creatively about solutions, and express a solution clearly and accurately, as schematised in the figure below:
\n", " \n", "| | | \n", "| ------------- |-------------|\n", "| | | \n", "| | Spatial analysis|
\n", " \n", " * **Which kind of Spatial Data?** There are three types of data models: vector data (points, polylines and polygons), raster data, and special data (network and linear data).
\n", "\n", "\n", "A **‘Feature-oriented’** approach: spatial data stored as points (0-dimensional), lines (1-dimensional) or areas (2-dimensional) is **vector** data: \n", "\n", "| | \n", "| ------------- |\n", "| Data stored as vector | \n", "|
|\n", "|*Vector files can be Shapefiles, WKT, geoJSON*|\n", "\n", "\n", "
\n", "A **‘Coverage-oriented’** approach: spatial data can also be stored in matrices or grids where each cell represents a point, line or area; this is **raster** data:\n", "\n", "| | \n", "| ------------- |\n", "| Data stored as raster |\n", "|
|\n", "|*Raster files can be geoTiff, HDF, BSQ, ASC, etc.*|\n", "\n", "
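\n", "To make these two models concrete, here is a minimal sketch (assuming the open-source [Shapely](https://shapely.readthedocs.io/) library, which GeoPandas uses under the hood, is available) of how vector geometries look as Python objects. The coordinates and feature names are made up purely for illustration.\n", "\n", "```python\n", "from shapely.geometry import Point, LineString, Polygon\n", "\n", "# a 0-dimensional feature: a single findspot\n", "findspot = Point(-1.29, 60.15)\n", "\n", "# a 1-dimensional feature: the line of a ditch or wall\n", "ditch = LineString([(0, 0), (10, 0), (10, 5)])\n", "\n", "# a 2-dimensional feature: the outline of a trench\n", "trench = Polygon([(0, 0), (20, 0), (20, 10), (0, 10)])\n", "\n", "print(findspot.geom_type, ditch.length, trench.area)\n", "```\n", "A raster, by contrast, is simply a grid of cell values (for example a NumPy array) plus information recording where that grid sits in the world.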
\n", "\n", " * **How is this data stored?**\n", " \n", "A **database** is an integrated set of attributes on a particular subject (a structured collection of data files), and \n", "a **geographic database / geospatial database / geodatabase** is a set of attributes on a particular subject for a particular geographic area (a structured collection of data files including spatial locations).
\n", "\n", "| | \n", "| ------------- |\n", "| GeoDatabase stores geometry information and relationships between different geometry layers |\n", "|
 |\n", "\n", "\n", "\n", "Each object will have **attributes** that define the values of its properties, and these are stored differently depending on whether the data is raster or vector. \n", "Attributes can be broadly split into numerical and categorical (see image below), and the difference between them is crucial to grasp, as it will determine which tests (spatial analysis) and commands (from a script) you can perform. These attributes can be: \n", "\n", " * Nominal – qualitative data defining the kind or type of object\n", " * Ordinal – ranking or place in a hierarchy\n", " * Interval – quantitative measurement with an arbitrary unit and zero\n", "     * Mathematical operators limited\n", " * Ratio – quantitative measurement with a non-arbitrary zero\n", "     * Full range of mathematical operators applicable. \n", " \n", "
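\n", "As a small illustration (a sketch assuming the pandas library, which is introduced in the next section; the site names and values are hypothetical), here is how numerical and categorical attributes might look in a table, and why the distinction matters:\n", "\n", "```python\n", "import pandas as pd\n", "\n", "# a tiny, made-up attribute table\n", "sites = pd.DataFrame({\n", "    'SiteType': ['cairn', 'broch', 'cairn'],   # nominal (categorical)\n", "    'ConditionRank': [1, 3, 2],                # ordinal (a ranking)\n", "    'SizeHa': [0.2, 1.5, 0.7]                  # ratio (numerical)\n", "})\n", "\n", "print(sites['SiteType'].value_counts())  # counting categories makes sense\n", "print(sites['SizeHa'].mean())            # arithmetic only makes sense for numerical attributes\n", "```\n", "Categorical attributes support grouping and counting; numerical attributes additionally support arithmetic and the full range of statistical summaries.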
\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1.2 Getting started \n", "\n", "\n", "Material for this series of practical exercises is built on top of individual [Jupyter Notebooks](https://jupyter.org) using Python. The code has all been written for you; you just need to execute it. Most code cells come with extra explanatory information that details how the code works. So, when confused by the Python language, look for the 'Learning a new language' cells.\n", "\n", "### What's a notebook? And why Jupyter notebooks? \n", "\n", "[Jupyter notebooks](https://jupyter.org) are open-source web applications that allow you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more. Here are the user interfaces you can work with. \n", "\n", " \n", "\n", "[Jupyter notebooks](https://jupyter.org) have been used in these practical exercises because they ease the creation and sharing of documents containing live code such as [Python](https://www.python.org/) or [R](https://www.r-project.org/). You can explore a [gallery](https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks) of interesting Jupyter notebooks and read why Jupyter is data scientists’ computational notebook of choice ([i](https://www.nature.com/news/interactive-notebooks-sharing-the-code-1.16261) and [ii](https://www.nature.com/articles/d41586-018-07196-1)).
\n", " \n", "[GitHub](https://github.com/) is widely known as one of the most popular code hosting platforms for version control and collaboration. This is where these notebooks will be saved and accessed publicly as an open-source project. **All working material can be found in https://github.com/Francoz-Charlotte/Spatial_teaching_CFediting** (NEED THE FINAL LINK HERE!) \n", "\n", "\n", "### How does it work? \n", "\n", "Jupyter Notebooks are made of a list of **CELLS**. Cells contain either explanatory **text** or executable **code** (here we are using Python 3) and its output (once you run the code). \n", "#### **Text cells: Markdown**\n", "This cell box is a text cell. Text cells use a language called [Markdown](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html). To learn more, the [markdown guide](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html) provides examples of how to edit Markdown cells; there is also an excellent [markdown cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet), or you could follow this brief [tutorial](https://github.com/twistedhardware/mltutorial/blob/master/notebooks/jupyter/2.%20Markdown%20%26%20LaTeX.ipynb).\n", "#### **Code cells: Working with Python**\n", "We will use these code cells to do things. They are executable code, mostly written in a programming [language](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Running%20Code.html#Code-cells-allow-you-to-enter-and-run-code). Code cells commonly use a language called [Python](https://www.python.org/) or [R](https://www.r-project.org/). In this course we'll mostly be using Python. As you know, stereotypically archaeologists hate snakes. Python is the exception to this.\n", "\n", "\n", "### What do you need? \n", "You can run the entirety of these notebooks here, or make your own copies and run the script with your own data. \n", "\n", "To run the notebooks, you can install Python and work with these Jupyter notebooks; for simplicity, [Anaconda](https://www.anaconda.com/distribution/) is an easy way to have everything in place on your computer. \n", "\n", "To make your own copy, you'll need to:\n", " * Sign up for a GitHub account at https://github.com/\n", " * Create your own spatial analysis course repository: https://help.github.com/en/articles/create-a-repo \n", " * Open your notebook and make a copy in your own drive, where you can work directly on and with your notebooks. **How?** Go to File, then locate the .ipynb file in your drive.\n", " \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1.3 Introductory exercise: learning rudimentary working steps\n", "\n", "In this introductory exercise, you will learn how to get some tools to work with simple spatial data and you will also learn how to open up some data. \n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Let's get the tools we need - we will always start practical exercises by getting some tools like this. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | \n", "| ------------- | ------------- |------------- |\n", "|
| = |
|" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "In these notebooks, to make things happen we need to speak Python language. The language is compiled in different libraries that need to be imported to make things happen. These tools are open source libraries from Python that enable powerful data visualisation. Importing these libraries is in fact the first bit of python code you will be using. \n", "\n", "In this first exercise, we will be using \n", "[Pandas](https://pandas.pydata.org/) and [Geopandas](http://geopandas.org/) libraries. **Pandas** is a powerful and rich library that is designed for data analysis in Python. **Geopandas** is a library used to make working with geospatial data in python easier. GeoPandas extends the datatypes used by pandas to allow spatial operations on geometric types. \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Let's import our first tool" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": {}, "colab_type": "code", "id": "vMmtCROs70ej" }, "outputs": [], "source": [ "#codecell_introduction_importyourlibraries\n", "\n", "#start by importing a tool you want. \n", "#Pandas is a basic tool for working with text and spreadsheet type data. \n", "#Just say 'import' and the name of the tool\n", "\n", "import pandas as pd\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " \n", " ### Learning a new language \n", " \n", " WHAT IS THIS? YOUR FIRST PROGRAMMING CODE CELL!
\n", "\n", "\n", "
\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "XxJMOyFU4dbQ" }, "source": [ "#### Let's import another tool\n", "Add another code cell below this one and try to import geopandas and give it the abbreviation 'gpd'" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": {}, "colab_type": "code", "id": "YgPNKDBC38TQ", "scrolled": false }, "outputs": [], "source": [ "#codecell_introduction_importyourlibraries\n", "\n", "\n", "import geopandas as gpd" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Odds are that didn't work... Some tools need an additional 'install' step. \n", "# This happens when the tool isn't in the cloud environment of the software you are working on.\n", "# If something doesn't import, I pretty much always try to '!pip install' it.\n", "# pip is python's tool for getting tools. Very meta.\n", "\n", "!pip install geopandas\n", "import geopandas as gpd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " \n", " ### Learning a new language \n", " \n", "\n", "\n", "You now have three tools: **Pandas, Geopandas and Matplotlib**. Whenever you need more tools, use **import** or **!pip install**; pip is a package management system used to install and manage software packages. \n", " \n", " \n", "Geopandas is notoriously difficult to install, especially on Windows. If you intend to work directly on the Jupyter notebooks, and pip install does not work for you, [this video]( https://www.youtube.com/watch?v=nLezhLLp3oQ ) will help you to get started with installation. If you are still out of luck, I can only recommend following [this]( https://medium.com/@chrieke/howto-install-python-for-geospatial-applications-1dbc82433c05 ) and, as a last resort, [this link]( https://geoffboeing.com/2014/09/using-geopandas-windows/ )." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
|| " ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "Gy9N6RBt4ISP" }, "source": [ "#### Now try to import another tool\n", "Add another code cell below this one and figure out how to import **[matplotlib.pyplot](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.html)**. Matplotlib.pyplot lets you plot in 2D with an extensive plotting library for quality publication & provides a [MATLAB](https://www.mathworks.com/help/matlab/creating_plots/types-of-matlab-plots.html)-like way of plotting." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
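For reference, here is a minimal sketch of what that import usually looks like (try the exercise yourself first; and if the import fails, the '!pip install' approach described above applies here too):\n", "\n", "```python\n", "# import the pyplot interface of matplotlib and give it the conventional short name 'plt'\n", "import matplotlib.pyplot as plt\n", "```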
\n", " \n", " ### Learning a new language \n", " \n", "\n", "\n", "You can **add text or code cells** to your version of the notebook: use the `+` button in your toolbar and choose which type of cell you need (e.g. *Markdown*, *Raw NB convert*, *Heading*) as the default is `Code` cell. By hitting the up or down arrow button, you can move the cell to wherever you want. \n", "\n", "**Editing the cell**: click the cell to select it, and, double-click to edit it.\n", "\n", "**To make things happen in a Code cell** you can click on the cell and then type '*ctrl* + *enter*' and your command can be executed. An asterisk in square brackets `In [*]:` will appear while the code is being executed, and this will change to a number `In [1]:` when the code is finished. ***The order in which you execute the code blocks matters, they must be run in sequence.*** \n", "Inside blocks of python code there are comments indicated by lines that start with `#`. These lines are not computer code but rather comments providing information about what the code is doing to help you follow along. \n", "\n", "You are ready to go! and remember to save regularly.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
|| " ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "hBYtz1065Hzl" }, "source": [ "### Now let's add data.\n", "\n", "
**Exploring the data you already know**
\n", "You can see some **geojson** (vector for the internet) data from [Canmore](https://canmore.org.uk/site/search/result?SITECOUNTRY=0&view=map) that represents locations of sites on Shetland as points with descriptive attributes [here](http://ropitz.github.io/digitalantiquity/data/CanmoreShetlandPoints.geojson). \n", "Have a look and identify the coordinates (spatial data), attributes, and coordinate system EPSG code.\n", "Write an example of each one in a new text field below this one in your notebook.
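\n", "\n", "To help you spot these pieces, here is a tiny, made-up GeoJSON skeleton written out as a Python dictionary (the site name, period and coordinates are hypothetical and much shorter than the real Canmore file):\n", "\n", "```python\n", "tiny_geojson = {\n", "    'type': 'FeatureCollection',\n", "    'crs': {'type': 'name', 'properties': {'name': 'urn:ogc:def:crs:EPSG::4326'}},  # the coordinate system (EPSG code)\n", "    'features': [{\n", "        'type': 'Feature',\n", "        'geometry': {'type': 'Point', 'coordinates': [-1.2926, 60.3539]},  # x (longitude), y (latitude)\n", "        'properties': {'SITE_NAME': 'Example cairn', 'PERIOD': 'Prehistoric'}  # the descriptive attributes\n", "    }]\n", "}\n", "```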
\n", "\n", "**What's a GeoJSON?** GeoJSON is an open standard format designed for representing simple geographical features, along with their non-spatial attributes. It is based on JavaScript Object Notation (JSON). \n", "GeoJSON embeds attributes and the coordinate projection information (we'll talk a bit more about these in chapter 2). It is vector data.\n", "\n", "| | | \n", "| ------------- |-------------| \n", "|
|
| \n", "| |*GeoJSON example*|\n", "\n", "\n", "**What's a WKT?** Well-known text (WKT) is a text markup language for representing vector geometry objects on a map. A binary equivalent is known as well-known binary (WKB). It was designed and approved by the Open Geospatial Consortium (OGC); the current standard definition is in the ISO/IEC 13249-3:2016 standard.\n", "\n", " \n", "| | | \n", "| ------------- |-------------|\n", "|GEOMETRYCOLLECTION(POINT(4 6) ~ LINESTRING(4 6,7 10)) ~ POINT ZM (1 1 5 60) ~ POINT M (1 1 80) ~ POINT EMPTY ~ MULTIPOLYGON EMPTY ~ TRIANGLE((0 0 0,0 1 0,1 1 0,0 0 0)) ~ TIN (((0 0 0, 0 0 1, 0 1 0, 0 0 0)), ((0 0 0, 0 1 0, 1 1 0, 0 0 0))) ~ POLYHEDRALSURFACE Z ( PATCHES ((0 0 0, 0 1 0, 1 1 0, 1 0 0, 0 0 0)), ((1 1 1, 1 0 1, 0 0 1, 0 1 1, 1 1 1)), ((1 1 1, 1 0 1, 1 0 0, 1 1 0, 1 1 1)),((1 1 1, 1 1 0, 0 1 0, 0 1 1, 1 1 1)))|
| \n", "|*Well-known text (WKT)details* | *WKT example*\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "xHidimvO6Sll" }, "source": [ "**Let's import some real data**
\n", "\n", "We want to bring some data into this notebook. It is done by telling the notebook where the data is located and asking it to read that file using the geopandas tool using the data's URL (web address). So this data is digital.\n", "Instead of using this long web url, we can give the file a name within the notebook. Here I've called it 'df'" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "ehdvcH7l3tWx" }, "outputs": [], "source": [ "#codecell_introduction_getyourdata\n", "\n", "url = \"http://ropitz.github.io/digitalantiquity/data/CanmoreShetlandPoints.geojson\"\n", "df = gpd.read_file(url)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "IKFBwPHu7Ivp" }, "source": [ "#### How do you see what's in your data?\n", "\n", "The command **.head()** will show you the first part of any file in a notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "yuxHVE-V7GG4" }, "outputs": [], "source": [ "#codecell_introduction_takeapick@yourdata\n", "\n", "# go ahead and look at the first few lines of the file. \n", "#Note this is slightly easier to read than the raw geojson\n", "\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "493PX4Ve7rgs" }, "source": [ "#### Well, if it's spatial data, you should be able to plot it\n", "\n", "This is usually fairly easy. For any spatial dataset, just call its name and say .plot()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "p4CFpEus711c" }, "outputs": [], "source": [ "#codecell_introduction_plot@yourdata\n", "\n", "df.plot()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "fF0MquuG_hDH" }, "source": [ "Not very pretty... Next week we'll learn about making pretty maps. " ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "oZ9cqoIf8Umk" }, "source": [ "## 1.4 Challenge\n", "\n", "Go find some other archaeological, read it into the notebook, and plot it. \n", "\n", "Like the scheduled monuments from York:\n", "[https://opendata.arcgis.com/datasets/26857683681542b798ef97708cbdaf61_14.geojson](https://opendata.arcgis.com/datasets/26857683681542b798ef97708cbdaf61_14.geojson)\n", "\n", "Or know wrecks off the coast of Northern Ireland: [https://www.opendatani.gov.uk/dataset/c5fceed7-b07a-4bc4-a0a0-8c45b9033083/resource/2d8da801-39f7-48b7-81ad-8db58d107962/download/protected_wrecks.geojson](https://www.opendatani.gov.uk/dataset/c5fceed7-b07a-4bc4-a0a0-8c45b9033083/resource/2d8da801-39f7-48b7-81ad-8db58d107962/download/protected_wrecks.geojson)\n", "\n", "Or any other open data you find in the geojson format..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1.5 References\n", "\n", "Howey, M. C., & Burg, M. B. (2017). Assessing the state of archaeological GIS research: Unbinding analyses of past landscapes. Journal of Archaeological Science, 84, 1-9.\n", "\n", "Lock, G., & Pouncett, J. (2017). Spatial thinking in archaeology: Is GIS the answer?. Journal of Archaeological Science, 84, 129-135.\n", "\n", "Longley, P. A., Goodchild, M. F., Maguire, D. J. & Rhind, D. W. (2001) Geographic Information Systems and Science. Wiley, New York.\n", "\n", "Yang, C. (2017). Introduction to GIS Programming and Fundamentals with Python and ArcGIS. Boca Raton, FL : CRC Press, Taylor & Francis Group, 2017.\n", "\n", "Verhagen, P. (2018). 
Spatial analysis in archaeology: moving into new territories. In Digital Geoarchaeology (pp. 11-25). Springer, Cham.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Chapter 2: Make A Basic Map" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "vVM1mmNj6SWE" }, "source": [ "\n", "## 2.1 Introduction: exercise motivations\n", "\n", "In this practical exercise, you will make some basic maps. You're making them **interactively** in this notebook.\n", "\n", "A basic map allows the viewer to:\n", " * think visually, for:\n", " * data exploration\n", " * confirming ideas\n", " * visual presentation\n", " * synthesis \n", " * presentation\n", "\n", "So, even a basic map is centred around a research question. As such, this exercise sets some goals:\n", " * answer the question visually (and quantitatively)\n", " * choose a specific design/symbology and show why it matters\n", " * understand a first type of spatial analysis using **spatial density**\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | \n", "| ------------- | ------------- |------------- |\n", "|
| = |
|" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " \n", "\n", "To get started, you will get the tools required. In this lab, we are using Panda, Folium and Branca libraries." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "ws-jEcyKk0MF" }, "outputs": [], "source": [ "#codecell_makeabasicmap_importyourlibraries\n", "import folium\n", "import branca\n", "import pandas as pd\n", "print(folium.__file__)\n", "print(folium.__version__)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "14eCo5OlnD6b" }, "source": [ "# 2.2 Maps should answer questions\n", "\n", "We've discussed how to come up with a good, explicitly spatial question. We've also discussed how to design a good map. Here you're going to start putting all this into practice.\n", "\n", "The data in Palmisano et al. (2018) article provides information about settlement and population patterns in central Italy and how they change over time. Where people were living and working at various times in the past is a basic archaeological question.\n", "\n", "Let's say your question is about **how many iron age sites** are present in the region, and **what their distribution** is like **in space** - that is where they are located and how many relatively speaking are in each area. \n", "\n", "How would you go about answering these questions with a map?\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "ZnRemIE7mUQK" }, "source": [ "# 2.3 To make a map we need to get some data. \n", "\n", "This data must be explicitly spatial - that is have spatial coordinates that tell us about locations in a way that machines can understand. Spatial coordinates are pairs of numbers organised on a x-horizontal and y-vertical axis – this is the bit of geometry you will be using the most as an archaeologist! \n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "\n", " \n", "**Projections**: Distances and directions to all places are true only from the centre point of projection.\n", "\n", "| | |\n", "|--| --|\n", "|
|
|\n", "\n", "**Projections and coordinate systems**: Coordinates (x,y) can also be called eastings & northings or latitude & longitudes depending on the type projection used to transform the map. The most common coordinate format for maps on the internet are [latitude and longitude coordinates](https://www.latlong.net/). \n", "\n", "| | |\n", "|--| --|\n", "|
|
|\n", "\n", " \n", "**Transformations**: When working with different dataset ALL YOUR DATA MUST BE IN THE SAME COORDINATE SYSTEM. If they are different you will need to transform it... below are illustrations of the relationships between different projections.\n", " \n", "| | | \n", "| ------------- |-------------|\n", "|
|
|\n", "| *How to unfold a cylindrical projection of the world to a Mercator grid projection?* |*Or from Geographic (left image)to Cartesian coordinates (right image)?* |\n", "|
|
 |\n", "| *South America in many different projections*|*Ways to transform (with a bit of math too!)* |\n", " \n", " \n", "\n", "Whilst the conversion from geodetic to cartesian is fairly straightforward, converting cartesian to geodetic is a complex problem (and if you are interested in how this is done mathematically, have a look at [this](https://www.movable-type.co.uk/scripts/latlong-os-gridref.html)).\n", "\n", "Here are some useful links to use with your dataset:\n", "* **[OS](https://www.ordnancesurvey.co.uk/gps/transformation/batch)**\n", "* Some Python tools: **[Rasterio tool](https://www.earthdatascience.org/courses/earth-analytics-python/lidar-raster-data/reproject-raster/)** and **[Pyproj tool](http://geologyandpython.com/maps.html)**.\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this notebook, we'll experiment with the spatial data from:\n", "\n", " Palmisano, A., Bevan, A. and Shennan, S., 2018. Regional Demographic Trends and Settlement Patterns in Central Italy: Archaeological Sites and Radiocarbon Dates. **Journal of Open Archaeology Data**, 6(1), p.2. DOI: [http://doi.org/10.5334/joad.43](http://doi.org/10.5334/joad.43)\n", " \n", "Spend some time exploring and reading about it before jumping into this practical...\n", " \n", "\n", "---\n", "\n", "\n", "These researchers have been very thoughtful and provided the data behind their analysis so we can re-use it. Their old school shapefile has been converted to a CSV for you to make things easier. If you ever have to do this yourself, there are lots of online converters like [this one](https://mygeodata.cloud/converter/shp-to-csv). \n", "\n", "Later in the course, you will also learn to read data from shapefiles directly into your notebook. This is a little more complicated, so we are skipping it for now." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
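Before the reminders below, here is a minimal, hedged sketch of the kind of coordinate transformation discussed in the projections section above, using the open-source pyproj library (an assumption about your environment; you may need to '!pip install pyproj', and the target EPSG code here is chosen purely for illustration):\n", "\n", "```python\n", "from pyproj import Transformer\n", "\n", "# build a transformer from latitude/longitude (EPSG:4326) to a projected system;\n", "# here we assume UTM zone 32N (EPSG:32632), which covers central Italy\n", "transformer = Transformer.from_crs('EPSG:4326', 'EPSG:32632', always_xy=True)\n", "\n", "lon, lat = 11.0, 43.0          # a made-up point in central Italy (x = longitude, y = latitude)\n", "easting, northing = transformer.transform(lon, lat)\n", "print(easting, northing)\n", "```\n", "Swap the EPSG codes for whatever your own source and target coordinate systems are.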
\n", " \n", "**Reminders**:\n", " \n", "\n", " \n", "A **CSV** file is a simple tabular file format that uses commas (or other separators) to distinguish elements within it.\n", " \n", "A **shapefile** is a vector data storage file format that can be used in spatial software such as [QGIS](https://www.qgis.org/en/site/). \n", " \n", " \n", "| | |\n", "|--| --|\n", "|
|
|\n", "| *Mandatory files*: •\t**.shp** — shape format; the feature geometry itself. Variable-record-length file in which each record describes a shape with a list of its vertices. •\t**.shx** — shape index format; a positional index of the feature geometry to allow seeking forwards and backwards quickly. The index file (.shx) contains a 100-byte header followed by 8-byte, fixed-length records. •\t**.dbf** — attribute format; one-to-one relationship between geometry and attributes is based on record number. Attribute records in the dBASE file must be in the same order as records in the main file. |*Optional files*: •\t**.prj** — provides projection information and coordinate system;, a plain text file describing the projection using well-known text format •\t**.sbn** and **.sbx **— a spatial index of the features •\t**.fbn**and **.fbx** — a spatial index of the features for shapefiles that are read-only •\t**.ain** and **.aih** — an attribute index of the active fields in a table •\t**.ixs** — a geocoding index for read-write shapefiles •\t**.mxs** — a geocoding index for read-write shapefiles (ODB format) •\t**.atx** — an attribute index for the .dbf file in the form of shapefile.columnname.atx (ArcGIS 8 and later) •\t**.shp.xml** — contain geospatial metadata in XML format. •\t**.cpg** — used to specify the code page (only for .dbf) for identifying the character encoding to be used.\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
|| " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.3.1 Let's get your data" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "F0B_wkOblGUW" }, "outputs": [], "source": [ "#codecell_makeabasicmap_GetUrdataReady\n", "\n", "#Get the data by reading it into the notebook\n", "palmisano_tuscany_sites = pd.read_csv(\"https://raw.githubusercontent.com/ropitz/spatialarchaeology/master/data/site_centriods_tuscany.txt\")\n", "\n", "palmisano_tuscany_sites.head()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "kyRsKLy6AtbQ" }, "source": [ "
\n", " \n", " ### Learning a new language – decomposing the code\n", " \n", "\n", "\n", "\n", "In *#codecell_makeabasicmap_GetUrdataReady*, we have a simple piece of code which allows you to open the CSV file containing the archaeological sites information. The image below will help you to understand the code better and reuse it easily.
\n", "\n", "\n", "
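In case the image does not display, here is the same codeline written out as a commented sketch (it assumes pandas has already been imported as pd, as in the earlier import cell):\n", "\n", "```python\n", "# new_name = library.function(argument)\n", "# palmisano_tuscany_sites : the name we choose for the new DataFrame holding the table\n", "# pd                      : the short name we gave the pandas library when importing it\n", "# .read_csv()             : the pandas function that reads a CSV/text table\n", "# the argument            : where the file lives (here, a URL on the web)\n", "palmisano_tuscany_sites = pd.read_csv('https://raw.githubusercontent.com/ropitz/spatialarchaeology/master/data/site_centriods_tuscany.txt')\n", "```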
\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
|| \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 2.3.2 Let's filter this big dataset to get only the data you need\n", "\n", "The original data file contains over 10,970 entries for all archaeological settlements in Central Italy. So you need to filter or SUBSELECT your big data file to get just the Iron Age sites.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "1PiNejEBmrhR" }, "outputs": [], "source": [ "#codecell_makeabasicmap_SubSelectData\n", "\n", "#tell the notebook you only want to see stuff where the period is the iron age.\n", "palmisano_tuscany_sites_iron_age = palmisano_tuscany_sites[(palmisano_tuscany_sites['Period']==\"Iron Age\") ]\n", "\n", "palmisano_tuscany_sites_iron_age.head()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "FpEzAptwEiKz" }, "source": [ "
\n", " \n", " ### Learning a new language – decomposing the code \n", " \n", "\n", "\n", "\n", "In *#codecell_makeabasicmap_SubSelectData*, we have a simple piece of code which filters the table of archaeological sites so that only the rows you want are kept. The image below decomposes this codeline, so you can understand how the Python language works and re-use it easily. \n", "
\n", "
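In case the image does not display, here is the same codeline written out as a commented sketch:\n", "\n", "```python\n", "# new_name = old_dataframe[ condition ]\n", "# palmisano_tuscany_sites['Period'] == 'Iron Age'  -> a True/False answer for every row\n", "# wrapping that test in palmisano_tuscany_sites[ ... ] keeps only the rows where the answer is True\n", "palmisano_tuscany_sites_iron_age = palmisano_tuscany_sites[palmisano_tuscany_sites['Period'] == 'Iron Age']\n", "```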
\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
|| " ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "lTRfAfasnd1A" }, "source": [ "\n", "\n", "## 2.3.3 Let's put the data onto a map\n", "Now we want to add this data to a simple map. \n", "\n", "You need to start by getting the coordinates of all the points so you can center the map - that is focus on the area where your data is. Probably it will be a good idea to put the middle of your map roughly where the middle of your dataset is located. \n", "\n", "Think about it... if your data is in Italy, putting Antarctica as the center of your map is not going to be very effective.\n", "\n", "![Antarctica is not where the map should be](https://upload.wikimedia.org/wikipedia/commons/thumb/e/e0/Antarctica_6400px_from_Blue_Marble.jpg/310px-Antarctica_6400px_from_Blue_Marble.jpg)\n", "\n", "\n", "Now, you want to add a marker for each Iron Age site. A marker, also called recall, is a graphic icon to represent the coordinates where a site is centred. \n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "DhHMGaQMneNF" }, "outputs": [], "source": [ "#codecell_makeabasicmap_BringingUrData2theMap\n", "\n", "#location is the mean of every lat and long point to centre the map.\n", "location = palmisano_tuscany_sites_iron_age['Latitude'].mean(), palmisano_tuscany_sites_iron_age['Longitude'].mean()\n", "\n", "#A basemap is then created using the location to centre on and the zoom level to start.\n", "m = folium.Map(location=location,zoom_start=10)\n", "\n", "#Each location in the DataFrame is then added as a marker to the basemap points are then added to the map\n", "for i in range(0,len(palmisano_tuscany_sites_iron_age)):\n", " folium.Marker([palmisano_tuscany_sites_iron_age['Latitude'].iloc[i],palmisano_tuscany_sites_iron_age['Longitude'].iloc[i]]).add_to(m)\n", " \n", "#To display the map simply ask for the object (called 'm') representation\n", "m" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Yay! \n", "\n", "You have a map. " ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "9og4onHTCWr9" }, "source": [ "
\n", " \n", " ### Learning a new language – decomposing the code \n", " \n", "\n", "\n", "\n", "The *#codecell_makeabasicmap_BringingUrData2theMap* contains four codelines that aim to:\n", "\n", "* centre your map by calculating the mean of the latitude coordinates and the mean of the longitude coordinates;\n", "\n", "* zoom your map to a suitable level - e.g. world, country, region or street level;\n", "\n", "* place a marker for each archaeological site location;\n", "\n", "* show the results interactively.\n", "\n", "\n", "Below, these codelines are decomposed for you, so you can adapt them easily in the future.\n", "\n", "
\n", "
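In case the image does not display, here is the first codeline written out as a commented sketch:\n", "\n", "```python\n", "# 'location' becomes a pair of numbers: (average latitude, average longitude) of the filtered sites\n", "# .mean() adds up a column and divides by the number of rows\n", "location = (palmisano_tuscany_sites_iron_age['Latitude'].mean(),\n", "            palmisano_tuscany_sites_iron_age['Longitude'].mean())\n", "```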
\n", "\n", "**Reminder**: the mean or average of a set is the total sum of all numbers divided by the number of items in the set. So, for instance, the mean of latitude coordinates = sum of all latitude coordinates/ divided by the number of latitude coordinates in the dataframe.\n", "\n", "\n", "---\n", "\n", "
\n", "
\n", "\n", "This codeline shows the powerful capabilities of the **Folium** library you've imported at the start of this practical lab. You can find more detailed guidance about it [here](https://python-visualization.github.io/folium/).\n", "\n", "\n", "---\n", "\n", "\n", "
\n", "
\n", "\n", "\n", "This codeline introduces you to two important concepts in coding: the **[range](https://pynative.com/python-range-function/)** and the **[loop](https://www.datacamp.com/community/tutorials/for-loops-in-python)**, used here together with the iloc command. \n", "Furthermore, by using range(), len() and iloc[] you can implement a loop in Python that iterates over the rows of your filtered pandas dataframe (palmisano_tuscany_sites_iron_age).\n", "\n", "---\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
|| " ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "xT89bOyooeE9" }, "source": [ "\n", "## 2.3.4 Let's play with this code\n", "\n", "Now, a design question: what is a good starting zoom level?\n", "\n", "Do you want your map zoomed further out or further in given the extent of your data? \n", "\n", "In the code cell above, play around with the '**zoom_start**' parameter and find a good zoom level that makes you happy.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**You just modified the code to suit your own needs**\n", "\n", "Rather than just executing code by pushing play, you've edited the code by changing a variable. \n", "\n", "Easy, right?\n", "\n", "This is where many people get started with scripting in archaeology. You find a bit of [open source code](https://opensource.com/resources/what-open-source) that seems to do what you need, and then you modify it. There's no sense in writing code from scratch when there's lots of useful open source code available that is meant to be shared and modified.\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
|| \n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "T6kjhm_lpmEu" }, "source": [ "# 2.4 Representing information and relative values or quantities with graphics\n", "\n", "So far, we have seen how the data is spread visually, but we want to look at this data **quantitatively**. How do these sites compare?\n", "\n", "Let's look at how to represent larger and smaller sites. The archaeologists in this project have a qualitative ranking of site size from large (A) to small (F). You can see the categories they've used, and that there are more small sites than larger ones. Say you want to make the two categories of sites that are the largest different colours. Why do this? Maybe larger sites have more people, so this provides insights into demographics.\n", "\n", "In the step above, #codecell_makeabasicmap_SubSelectData, you've set up your new dataset by filtering the iron age settlements from the original dataset. It's what we would call a 'subset' of your original data.\n", "\n", "Now you are ready to make a new map." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "-x3bmE-3ov3c" }, "outputs": [], "source": [ "#codecell_makeabasicmap_ManipulatingyourData_ValueCounting\n", "\n", "# Size surely matters... group the sites by size category and get the number of sites in each size category.\n", "# SizeQual is the name of the 'attribute', or column in the table, that has information on site size. \n", "# An attribute is anything attached to spatial data other than the coordinates, such as period, type, size in hectare,etc. \n", "\n", "palmisano_tuscany_sites_iron_age['SizeQual'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " \n", " ### Learning a new language – decomposing the code \n", " \n", "\n", "
\n", " \n", "**.value_counts()** is a function that allows you to count numerical and categorical data (e.g. numbers, objects or labels such as attributes). The results are returned as a series of numbers in descending order, so, that you can further evaluate the frequency and distribution of these values.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "_vu_cUOzp77n" }, "outputs": [], "source": [ "#codecell_makeabasicmap_ManipulatingyourData_UsingSymbology\n", "\n", "#now make a map just like you did before. Note that this time we're adding a scale bar with 'control_scale'\n", "location = palmisano_tuscany_sites_iron_age['Latitude'].mean(), palmisano_tuscany_sites_iron_age['Longitude'].mean()\n", "m = folium.Map(location=location,zoom_start=10,control_scale = True)\n", "\n", "#Assign different colours to the two large site categories - B and C in this case\n", "for i in range(0,len(palmisano_tuscany_sites_iron_age)):\n", "\n", "\n", " site_size = palmisano_tuscany_sites_iron_age['SizeQual'].iloc[i]\n", " if site_size == 'C':\n", " color = 'blue'\n", " elif site_size == 'B':\n", " color = 'green'\n", " else:\n", " color = 'red'\n", " \n", "# add the markers to the map, using the locations and colours \n", " folium.Marker([palmisano_tuscany_sites_iron_age['Latitude'].iloc[i],palmisano_tuscany_sites_iron_age['Longitude'].iloc[i]],icon=folium.Icon(color=color)).add_to(m)\n", "\n", "#type 'm' for map (the variable you set above) to tell the notebook to display your map\n", "m" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "yUZmFVuyJ9tN" }, "source": [ "
\n", " \n", " ### Learning a new language – decomposing the code \n", " \n", "\n", "
\n", " \n", "The *#codecell_makeabasicmap_ManipulatingyourData_UsingSymbology* cell can be decomposed like this:\n", " \n", " \n", "![](https://github.com/ropitz/spatial-thinking-archaeology-notebook/blob/master/Chap2_makeabasicmap_ManipulatingyourData_UsingSymbology_.jpg?raw=1)\n", "\n", "\n", "\n", "\n", "---\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
|| \n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "VjuGVxl8rKue" }, "source": [ "# 2.5 Designing your map - Symbology\n", "\n", "Why design matters? Why it is important? Some map design can observe some principles following a simple equation: **Aesthetics + Information Communication + Argumentation**. In other words, your map should be beautiful, well designed, and tell a clear true story. \n", "\n", "Research, design, revision, and evaluation are the standard process for making a good map following a simple workflow : \n", " Careful planning -> Data analysis -> Presentation -> Critique = Feedback -> Production.\n", "\n", "For starters, you can look at this [document](https://open.lib.umn.edu/mapping/chapter/4-design-and-symbolization/) to familiarise yourself with some of the concepts\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "ZcN5jSj13Ive" }, "source": [ "## 2.5.1 What symbol scheme is best for your map?\n", "\n", "**Changing Symbol Colour**\n", "\n", "Now, go back into *#codecell_makeabasicmap_ManipulatingyourData_UsingSymbology* and experiment a bit. Try out some different colours. Do certain combinations work well? Try adding different colours for each of the smaller categories of sites. Does that make the map clearer or more confusing?\n", "\n", "---\n", "**Changing Symbol size**\n", "\n", "Maybe it makes more sense to show size by changing the size of the icon than the colour. Let's make another map that varies the size of the icon for each site based on its size in hectares. This is making a **proportional symbol map**. \n", "\n", "Here are two example maps from Kenneth Field illustrating the power of map symbology:\n", "\n", "| | | \n", "| ------------- |-------------|\n", "|
|
|\n", "| *Basic geocoded structure of the dataset* |*Proportional symbol map of poverty* |\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "8d_u8uL5wKBg" }, "outputs": [], "source": [ "#codecell_makeabasicmap_ManipulatingyourData_UsingSymbologyExperimenting\n", "\n", "#now make a map just like you did before. \n", "location = palmisano_tuscany_sites_iron_age['Latitude'].mean(), palmisano_tuscany_sites_iron_age['Longitude'].mean()\n", "m = folium.Map(location=location,zoom_start=8,control_scale = True)\n", "\n", "\n", "# Set the size for each circle icon by defining the 'radius' which is the radius of the circle\n", "# Here we are multiplying the size in hectares (the SizeHa attribute) by 15. Try different values here to get icons a size you like\n", "for i in range(0,len(palmisano_tuscany_sites_iron_age)):\n", " folium.Circle(\n", " location=[palmisano_tuscany_sites_iron_age['Latitude'].iloc[i],palmisano_tuscany_sites_iron_age['Longitude'].iloc[i]],\n", " popup=palmisano_tuscany_sites_iron_age.iloc[i]['Toponyms'],\n", " radius=palmisano_tuscany_sites_iron_age.iloc[i]['SizeHa']*15, #this is where we set the value of 15 - change this variable to get differrent size icons\n", " color='crimson',\n", " fill=True,\n", " fill_color='crimson'\n", " ).add_to(m)\n", " \n", "#type 'm' for map (the variable you set above) to tell the notebook to display your map\n", "m" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "0lgx9Jyg9FAz" }, "source": [ "
\n", " \n", " ### Learning a new language – decomposing the code \n", " \n", "\n", "\n", "The *#codecell_makeabasicmap_ManipulatingyourData_UsingSymbologyExperimenting* cell can be decomposed like this:\n", "\n", " ![](https://github.com/ropitz/spatial-thinking-archaeology-notebook/blob/master/Chap2_makeabasicmap_ManipulatingyourData_UsingSymbologyExperimenting.jpg?raw=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
|| \n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "UGrxvlO1sn1m" }, "source": [ "# 2.5.2 Thinking about spatial density or 'clustering' or concentration\n", "\n", "\n", "Many things in life naturally tend to cluster. Patterns of higher density or cluster can be clearly identified in urban and rural landscapes and populations. Forming general understandings of the processes at particular points in time (e.g. socioeconomic, demographic and environmental structures of different places) that give rise to the geographical patterning of landscapes and populations are essential to predict and simulate how they settlements have evolved in the past and might evolve in the future.\n", "\n", "*When might this be important? What kinds of archaeological questions involve finding the spatial centre of a distribution of sites or finds or features?*\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Research question** Now what if you want to make a map that shows the concentration of sites in the iron age, that is areas with more and fewer sites? \n", "\n", "In order to show areas of higher density of activity, a **'heatmap'** is often created by using a colour gradient. A heatmap represents the geographic density of point features on a map by using colored areas to represent those points. The areas will be of high intensity colour where the most points are concentrated and least intensity where they are less concentrated. Essentially it shows areas with more sites as 'hotter' (generally red in colour) and areas with fewer sites as cooler. \n", "\n", "For our dataset, in this kind of map surely larger sites should count more - we call this **'weighting'**. We want to add the weight of the size of the settlement for our Iron Age settlement map. A site that is twice as large should count for twice as much in our heatmap.\n", "\n", "For this need to think how the points are distributed geographically. This entails obtaining basic spatial statistics about the dataset, a first step in the analysis of a geographic data.\n", "\n", "Just a reminder of some basic terminologies in data statistics:\n", "
\n", "\n", "The overarching aim is to determine the default search radius or bandwidth which will define the area of influence and how the colour gradient will be applied as illustrated in the image below. \n", "\n", "| | | \n", "| ------------- |------------- |\n", "|
|
|\n", "\n", "To find the (default) search radius consists in 5 steps (Yang,2017) where you calculate:\n", " * the mean center of the number of the input. This is the geographic center for a set of features (remember that this center can change a lot depending if there are outliers or not). We want to weight this centre to our attribute field size of the settlement.\n", " * the distance from the weighted mean center for all points. \n", " * the weighted median of the distances (D).\n", " * calculate the weighted standard distance (SD). This is the degree to which features are concentrated or dispersed around the geometric mean center. This shows the concentration or dispersal of the data.\n", " * calculate the bandwidth based on this data statistics. You can learn a bit more how density formula equation is working in this [Esri webpage] (https://pro.arcgis.com/en/pro-app/tool-reference/spatial-analyst/how-kernel-density-works.htm).\n", " \n", "Here is an example of population density map to illustrate this concept:\n", "\n", "| | \n", "| ------------- |\n", "|
| \n", "| *Density surfaces show where point or line features are concentrated. For example, you might have a point value for each town representing the total number of people in the town, but you want to learn more about the spread of population over the region. Since all the people in each town do not live at the population point, by calculating density, you can create a surface showing the predicted distribution of the population throughout the landscape. The following graphic gives an example of a density surface. When added together, the population values of the cells equal the sum of the population of the original point layer. [ref]( https://pro.arcgis.com/en/pro-app/tool-reference/spatial-analyst/understanding-density-analysis.htm)* |\n", "\n", " \n", "\n", "**Folium** is the library will be using to create our heatmap, you can see more example in [Anthony Ivan page]( https://towardsdatascience.com/data-101s-spatial-visualizations-and-analysis-in-python-with-folium-39730da2adf) or [Aly Sivji]( https://alysivji.github.io/getting-started-with-folium.html).\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "dMDhTabesLOc" }, "outputs": [], "source": [ "#codecell_makeabasicmap_ManipulatingyourData_Heatmap\n", "\n", "#first make a list of the coordinates of each site and its size in hectares, which we will use for the size-based weighting\n", "data_heat = palmisano_tuscany_sites_iron_age[['Latitude','Longitude','SizeHa']].values.tolist()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "pzzi69aXtyMN" }, "outputs": [], "source": [ "#look at the first line of your list to check it seems to have worked\n", "data_heat[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "TVAorbp9uKIS" }, "outputs": [], "source": [ "#to make the heatmap, we need to get an extra tool, so...\n", "import folium.plugins as plugins\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "ZIe4sfeVuM80" }, "outputs": [], "source": [ "# now make a map a bit like you did before, set your zoom level and your centre location. Then use the plugin to make the heatmap. \n", "\n", "m = folium.Map(location=location, zoom_start=10)\n", "\n", "#tiles='stamentoner'\n", "\n", "plugins.HeatMap(data_heat).add_to(m)\n", "\n", "#type 'm' for map (the variable you set above) to tell the notebook to display your map\n", "m" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "5e19_h8BFXjE" }, "source": [ "
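Before decomposing the code, here is a small, simplified sketch of the weighted mean centre and weighted standard distance described in the steps above (an illustration only, working in plain latitude/longitude degrees rather than the exact kernel density formula used by GIS software):\n", "\n", "```python\n", "import numpy as np\n", "\n", "df = palmisano_tuscany_sites_iron_age\n", "w = df['SizeHa']                       # the weights: site size in hectares\n", "\n", "# 1. the weighted mean centre\n", "lat_c = np.average(df['Latitude'], weights=w)\n", "lon_c = np.average(df['Longitude'], weights=w)\n", "\n", "# 2. the distance of every site from that centre\n", "dist = np.sqrt((df['Latitude'] - lat_c)**2 + (df['Longitude'] - lon_c)**2)\n", "\n", "# 3. the weighted standard distance: how concentrated or dispersed the sites are around the centre\n", "sd = np.sqrt(np.average(dist**2, weights=w))\n", "print(lat_c, lon_c, sd)\n", "```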
\n", " \n", " ### Learning a new language – decomposing the code \n", " \n", "\n", "\n", "The *#codecell_makeabasicmap_ManipulatingyourData_Heatmap* cell can be decomposed like this:\n", "\n", "\n", "![](https://github.com/ropitz/spatial-thinking-archaeology-notebook/blob/master/Chap2_makeabasicmap_ManipulatingyourData_Heatmap.jpg?raw=1)\n", "\n", "\n", "---\n", "\n", "\n", "\n", "For further documentation on folium plugins, follow this [link](https://python-visualization.github.io/folium/plugins.html)." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "rRVg43qE5Y_Z" }, "source": [ "# Success!\n", "\n", "You should have a heatmap showing the concentrations of Iron Age sites in the region.\n", "\n", "You could repeat the exercise with sites from any period in which you are interested.\n", "\n", "Hopefully you've learned to make some basic maps and are starting to understand how to put into practice some of the theory of map design and spatial visualisation we've been discussing.\n", "\n", "That's it for today... Remember to save your notebook (under a new name) so you can come back to it and practice making basic static maps. \n", "\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "H4FKa3tQHVsi" }, "source": [ "## 2.6 LexiCode \n", " \n", "In this practical you have learned new commands that you can now reuse for your own datasets:\n", "*\t== , () []\n", "*\t.head()\n", "*\t.read_csv()\n", "*\tmean()\n", "*\tfolium.Map()\n", "*\trange()\n", "*\tlen()\n", "*\tiloc[]\n", "*\t.value_counts()\n", "*\tif ==:\n", "*\telif ==:\n", "*\telse:\n", "*\tfolium.Marker()\n", "*\tfolium.Icon()\n", "*\tfolium.Circle()\n", "*\tpopup=\n", "*\tradius=\n", "*\t.values.tolist()\n", "*\tplugins.HeatMap()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.7 References\n", "\n", "Palmisano, A., Bevan, A. and Shennan, S. (2018) Regional Demographic Trends and Settlement Patterns in Central Italy: Archaeological Sites and Radiocarbon Dates. Journal of Open Archaeology Data, 6(1), p.2. DOI: [http://doi.org/10.5334/joad.43](http://doi.org/10.5334/joad.43)\n", "\n", "Pimpler, E. (2017) Spatial Analytics with ArcGIS. Packt Publishing.\n", "\n", "Yang, C. (2017). Introduction to GIS Programming and Fundamentals with Python and ArcGIS. Boca Raton, FL: CRC Press, Taylor & Francis Group.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Chapter 3: Interactive maps" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "-1zyXWFvOql5" }, "source": [ "\n", "## 3.1 Introduction: exercise motivations ##\n", "\n", " ~ déjà vu ~ In Chapter 2, we focused on making \"mostly static\" maps, that is maps where you mostly just expect your user to look at the product you've prepared. We looked at a research question about the distribution of Iron Age sites in Central Italy, and we focused the practical exercise on using the map for:\n", " * data exploration => filtering and creating subsets of data to show different aspects of the overall dataset;\n", " * visualisation => zoom level (visual balance), visual variables (symbols, fonts, generalisation, colour);\n", " * organisation => balancing geospatial and attribute (descriptive) data, and deciding which attributes you will highlight.\n", "\n", "\n", " ~ new ~ This week we'll continue exploring different types of maps. Specifically, we'll look at making maps where interactivity is a key part of the design. Beyond the question of how to present spatial data (i.e. 
designing the map), we will spend more time on an important topic in archaeology: **spatial distributions**. Central to investigating spatial distributions (patterns) is our ability to manipulate and rearrange spatial data to answer spatially explicit questions. As such, this practical exercise aims to help you with:\n", "\n", " * Data preparation, data cleaning and data manipulation: by transforming databases, merging databases and creating data layers;\n", " * Presenting your data interactively.\n", " \n", " " ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 3.1.1 Why interactive maps?\n", "\n", " *Why does this data need to be presented in an interactive map? What spatial relationships are important here? What kinds of questions might a user ask?*\n", "\n", "**Interactive maps are for exploring spatial data** where people can:\n", "* ask their questions about the data\n", "* ask their questions and control how the response is represented\n", "* create and add their own data\n", "\n", "**When should you use interactive maps?**\n", "* When the ability to explore data and query to make different maps is important.\n", "* When the ability to manipulate the visual aspects of the map to show different combinations of data is important.\n", "* Archaeological motivations/example: \n", " * How do we understand change over time and space in archaeology? \n", " * How can we explore this using (interactive at different levels) maps?\n", "\n", "**Interactive maps are spatial and descriptive data interfaces**\n", "\n", "| | |\n", "|---------|---------|\n", "|**Visual design *(Front end)***| **Data for design *(Back end)***|\n", "|Symbols|\tData Quality (accuracy and resolution) |\n", "|Fonts\t|Data Cleanliness|\n", "|Labels|\tData Consistency|\n", "|Scale |\tData Accessibility|\n", "|North Arrow |\tMetadata|\n", "|Generalisation|\tData Architecture|\n", "|**Visual Interface & Query Interface**| **Data architecture & Spatial Data resource**|\n", "\n", " \n", "### 3.1.2 Open Data and Data Legacy\n", "\n", "**Open source software** \n", "In Chapter 1, when introducing Jupyter notebooks, we mentioned open source software. Open data operates under the same broad ethos, and follows many of the same principles. Sharing, reuse, and attribution are key. If you continue to reuse the Antikythera data, be sure to continue to link back to and cite the source.\n", "\n", "Perhaps the most relevant example in the UK is **OS (Ordnance Survey) OpenData**, which was made freely available for the first time in 2010. The last review ([2018](https://www.ordnancesurvey.co.uk/business-government/tools-support/open-data-support)) counted 1.9 million open data downloads, the equivalent of 150 people downloading OS OpenData every day!\n", "\n", "In 2015, the Environment Agency made **lidar (light detection and ranging)** data available to the public, for free, as open data. Within the first year of release, 500,000 lidar downloads were made, equating to nearly 13 million km2 of data! 
\n", "\n", "**Working with Legacy data: the case of West Heslerton, North Yorkshire, excavation of the Early Anglo-Saxon or Anglian Settlement between 1986 and 1995,** \n", "\n", "| | |\n", "|---------|---------|\n", "|**Interactive Map**| **Settlement Archive**|\n", "|The West Heslerton Interactive Map: [This is the map]( https://lrc.cast.uark.edu/map)|\tThe West Heslerton Settlement Archive: [This is the data that drives the map]( https://lrc.cast.uark.edu/settle/home.htm) |\n", "|**Front end**|\t**Back end** |\n", "\n", "\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3.2 Let's start" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "nVvxjDYFOkK8" }, "source": [ "| | | | \n", "| ------------- | ------------- |------------- |\n", "|
| = |
|" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Start by getting tools, always. In this Chapter, you still be working with panda and folium and you will be replacing the branca library with [numpy](https://numpy.org/). \n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-11-19T05:33:33.494092Z", "start_time": "2018-11-19T05:33:24.204201Z" }, "colab": {}, "colab_type": "code", "id": "mLiflC-JiIir" }, "outputs": [], "source": [ "#codecell_Webmaps&Distributions_ImportUrLibraries\n", "\n", "#Like last time, get the tools we need at the start\n", "import pandas as pd\n", "import folium\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "TN6AB-EgPF-q" }, "source": [ "## 3.3 Let's get the data: the case of Antikythera\n", "\n", "This time, we are working with the data from the Antikythera survey project.\n", "\n", "It's citation is: Bevan, A. and Conolly, J. 2012. Intensive Survey Data from Antikythera, Greece. **Journal of Open Archaeology Data** 1(1), DOI: http://dx.doi.org/10.5334/4f3bcb3f7f21d.\n", "\n", "This data has been made available at the Archaeological Data Service (ADS) archive at https://archaeologydataservice.ac.uk/archives/view/antikythera_ahrc_2012/ and completely open for re-use. You can [see](https://archaeologydataservice.ac.uk/archives/view/antikythera_ahrc_2012/stats.cfm) how many times their database has been viewed and downloaded. \n", "\n", "**Working with other people's data** \n", "\n", "Have a quick look around Antikythera dataset as it's described on the ADS site. You'll notice that they've split up their dataset in ways that made sense to them at the time. Specifically they've divided up the artefact data into three discrete elements: 'pottery', 'lithics' and 'other' into separate files (very much like we did last week by filtering and creating a subset). This is a pretty normal archaeological data thing to do. \n", "\n", "Here's the trick, we want to focus on both **ceramics and small finds** and to look at these datasets together. This means you'll have to grab both of them and combine them. When you are combining them, you will also need to re-organize the attribute data in order to reuse it for something new. This follows the model of a **relational database** \n", "\n", "**What’s a database vs a graph database?**\n", "\n", "| | \n", "| ------------- | \n", "|
| \n", "\n", "**How to query this data?**\n", "Query languages are the languages used to ask questions of databases. **Python** acts as a query language for pandas data structures. **SQL** and **SPARQL** are specific database query languages\n", "\n", "| | | \n", "| ------------- | ------------- |\n", "|Querying Python | Querying SQL|\n", "|
|
|\n", "|*Example python script (top) & Your script in 2.3.2 (bottom)* | *Example of SQL query language*|\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2.1 It is all about data preparation###" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "up4gTxO6Ukzb" }, "outputs": [], "source": [ "#codecell_Webmaps&Distributions_OrganisingUrdata\n", "\n", "#Like last time, get the data we'll need at the start. I've been nice again and converted their coordinates to latitude and longitude for you. \n", "#You'll learn to do this yourself later in the course.\n", "\n", "# we label the first dataset 'pottery'\n", "pottery = pd.read_csv('https://raw.githubusercontent.com/ropitz/spatialarchaeology/master/data/antikythera_survey_pottery.csv')\n", "\n", "# we label the second dataset 'small finds'\n", "small_finds = pd.read_csv('https://raw.githubusercontent.com/ropitz/spatialarchaeology/master/data/antikythera_survey_small_finds.csv')\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "g3MV8nj-VKzB" }, "outputs": [], "source": [ "#codecell_Webmaps&Distributions_CheckingZeData\n", "\n", "#let's check the individual pottery file. \n", "pottery.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "SRR2Nyugbnh-" }, "outputs": [], "source": [ "#codecell_Webmaps&Distributions_CheckingZeData\n", "\n", "#let's check the individual pottery file to see \n", "small_finds.head()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "mrtIpk4DS-DK" }, "source": [ "
\n", " \n", " ### Learning a new language – decomposing the code \n", " \n", "\n", "\n", " in *#codecell_Webmaps & Distributions_CheckingZeData*:\n", "\n", "* **=** allows you to provide a new name, often more convenient than the link pathname\n", "* **pd.read_csv** allows you to open a CSV format file\n", "\n", "* **.head() function** take a peek at the first part of the dataset \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "\n", "**Spend some time understanding your data**\n", "\n", "Now have a look at these two datasets and how they are structured. Also, the coordinates have been transformed for you from cartesian co-ordinates : a grid system - shown in column *'Xsugg' &\t'Ysugg'* - back to latitude/longitude, a geodetic system - shown in column *'DDlat' &\t'DDlong'* -. (we've talked about projections, coordinates systems and transformations in Chapter 2, section 2.3)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2.2 Re-organising your data" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "VIQ_n9BCY1cu" }, "outputs": [], "source": [ "#codecell_Webmaps&Distributions_Concatenation\n", "\n", "# then we combine the two datasets together to make a big dataset we call 'survey data'\n", "survey_data = pd.concat([pottery,small_finds], sort=False, ignore_index=True)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "SjpgcdojY5NJ" }, "source": [ "
\n", " \n", " ### Learning a new language – decomposing the code \n", " \n", "\n", "\n", "\n", "*Codecell__Webmaps&Distributions_Concatenation* can be decomposed like this:\n", " \n", "![](https://github.com/ropitz/spatial-thinking-archaeology-notebook/blob/master/Chap3_Webmaps%26Distributions_Concatenation_.jpg?raw=1)\n", "\n", "\n", "\n", "---\n", "\n", "\n", "**An array** is a group of elements/objects/items organised as a set where columns (of equal size) are multiplied by rows (of equal size). Below are two illustration showing what is happening with an array and how it differs from a list:\n", "\n", "| | |\n", "| ------------- | ------------- | \n", "|
|
|\n", " \n", "The advantage of using arrays is that all elements can be accessed at any time randomly. In Chapter 2, you had done something similar to call all items from the range [i] ( in *#codecell_makeabasicmap_BringingUrData2theMap* ). This time, you are linking together 2 arrays. Pandas library provides various ways to combine together two arrays such as merge, join and concatenate ([see user guide](https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html)). \n", "\n", "| | \n", "| ------------- |\n", "|In this example, df1 and df4 are concatenated in the same way we have concatenated pottery and small finds|\n", "|
| \n", "|The result shows that the axis have been appended them but the overlapping indexes have been ignored. To do this, we have used the argument ignore_index=True)\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "_62uXk_TQwVY" }, "source": [ "\n", "Let's make sure nothing went wrong..." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "pv4XxSSQU2YT" }, "outputs": [], "source": [ "#codecell_Webmaps&Distributions_CheckingZeData\n", "\n", "#check things loaded in and combined OK\n", "survey_data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "PZqevW03Q4ig" }, "source": [ "## 3.2.3 Now let's ask a research question\n", "\n", "Like we said last week, we are using maps and spatial analysis to pose, explore and respond to spatial questions. \n", "\n", "My question is a bit like last chapter's question. I want to know about **how many sites are in each period, so I can try and understand changing patterns over time**. \n", "\n", "However, you may have noticed when you read in the data that it's structure is a bit different from last week's data. Instead of each site belonging to one period, it's assigned with varying probability to several different periods. \n", "\n", "This is a totally legit archaeological thing to do. Many sites have activity from multiple periods, and depending on the available evidence, you might be more or less confident about the presence or absence of activity in a specific period. \n", "\n", "### The most likely period/date?\n", "\n", "We might start simply by assigning each site primarily to its 'most likely period'. \n", "\n", "**Data cleaning** is the most time-consuming and important part of any analysis - and not only in archaeology but for all scientific analyses. This takes a few steps.... \n", "\n", "Therefore we will walk through the steps of data cleaning in this exercise.\n", "\n", "\n", "#### step 1 - prepare the data\n", "\n", "...you have to re-organize the dataset to answer our question: **'how many sites are in each period?'**.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "bbjLzluu1kx9" }, "outputs": [], "source": [ "#codecell_Webmaps&Distributions_Subselect\n", "\n", "# first we create a subset of our data that only contains the columns with information about time\n", "# this is in part because we want to do some operations where everything has to be a number, and some of the other fields contain text\n", "# it's also just to make things simpler when we look at them\n", "\n", "survey_data_time = survey_data[['MNLN', 'FNEB1',\t'EB2',\t'LPrePal', 'FPal', 'SPal', 'TPal', 'PPalPG', 'Geom', 'Arch', 'Class', 'Hell', 'ERom', 'MRom', 'LRom', 'EByz', 'MByz', 'EVen', 'MVen', 'LVen', 'Recent', 'Other']]\n", "survey_data_time.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "1P23K0yo2y5U" }, "outputs": [], "source": [ "#codecell_Webmaps&Distributions_ChangingDataType\n", "\n", "# if you were to look through this data\n", "survey_data_time.dtypes" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "s9ZRoNq_kY-N" }, "source": [ "
\n", " \n", " ### Learning a new language – decomposing the code \n", " \n", "\n", "\n", "\n", "In *#codecell__Webmaps&Distributions_ChangingDataType*, the ~ .dtypes ~ function allows you to see which type of data is in our *'survey_data_time'* dataframe. Three types of data are returned **int64** which are integers (whole numbers -numerical data), **float64** which are floating data (contains floating decimal points -numerical data) and **object** which is a string (not a number -categorical data). \n", "\n", "At the very start, in Chapter 1.1, we said that the type of data you are using matters, and this matters because tests and commands will differ depending the data is numerical or categorical. The image below is here as a reminder:\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looking further in this dataset, *'survey_data_time'*, we can also see that some that some fields contain missing values (NaN). Let's see what we can do about that and why it matters. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "rDnMwtylkVec" }, "outputs": [], "source": [ "#codecell_Webmaps&Distributions_AllNumbers\n", "\n", "# We can get rid of NaN (null) values by 'filling' them.\n", "# This is important because null values can break number-based operations.\n", "# Let's get rid of missing values and make sure everything is a number.\n", "\n", "survey_data_time.astype('float64').fillna(0)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "mjRxTDQQhqn2" }, "source": [ "
\n", " \n", " ### Learning a new language – decomposing the code \n", "\n", " \n", "\n", "\n", "\n", "In the *#codecell_Webmaps&Distributions_AllNumbers*, you are making sure all these missing values, NaN, becomes null numbers (0=zero). \n", "\n", "Why do missing values need to be removed? Simply because you cannot apply maths to them (e.g. adding or multiplying columns together, reordering values of a column from max to min value, etc.).\n", "\n", "And this is how it can be done:\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "ym3UIQjPZ3Y5" }, "source": [ "#### step 2 - Reshaping your data\n", "\n", "Now that you've removed the null values (NaN), you can move to the next step in data cleaning, which is applying a transformation to arrange the data the way you want it for your analysis. Data transformation is an important thing to learn to do. \n", "\n", "Right now we have a bunch of columns with information about time. What we want is one single column that contains the most likely period - which is represented in each row by the column with the greatest value. \n", "\n", "*think about that for a moment*\n", "\n", "Right now the 'most likely period' is represented by a number in each row, but that's not the piece of information we want in our new column - we want the name of the column that contains that number.**In other words, we want to know the the item in the table that satisfies the condition 'the most likely period', and to be able to do something with this item.** \n", "\n", "\n", "Once you have identified the maximum values, you can extract them...\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "-VTV0zNi8TBA" }, "outputs": [], "source": [ "#codecell_Webmaps&Distributions_MaximumValue\n", "\n", "#here we take the columns from all the different periods, get the one with the maximum value, and write that column's name to our new 'colmax' field\n", "def returncolname(row, colnames):\n", " return colnames[np.argmax(row.values)]\n", "\n", "survey_data_time['colmax'] = survey_data_time.apply(lambda x: returncolname(x, survey_data_time.columns), axis=1)\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "hyFFUO0NIzR2" }, "source": [ "
\n", " \n", " ### Learning a new language – decomposing the code \n", " \n", "\n", " \n", "In the *#codecell_Webmaps&Distributions_MaximumValue* you are creating a **function**. A function is a small script with multiple steps that you can run over your whole dataset. In this function, for each your you return the name of the column where the maximum (greatest) value in that row is found. You then create a new column \"colmax\" and store that value. \n", "\n", "This can be done using a [lambda function](https://www.w3schools.com/python/python_lambda.asp). Let's break down the code the steps...\n", "\n", "* 1) define a function : \n", " \n", "
\n", "\n", "
\n", " \n", "* 2) pass/apply this function as an argument:\n", " \n", "
\n", " \n", "
\n", "\n", "***How does this really work?***\n", "\n", "Remember that we are working with arrays, so, getting maximum or minimum values can be schematised like this:\n", " \n", "
\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "jcza17og5zeo" }, "outputs": [], "source": [ "#we can check it has all gone well\n", "survey_data_time.head()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "O5iNwhgraxn9" }, "source": [ "### step 3 - Merging tables\n", "\n", "OK now we have a single column with the information we need - the most likely date. To create this column, we broke off some of our data (the columns with numbers) from the rest of the data (important descriptive text). We might well want to stick these two datasets back together before proceeding. \n", "\n", "**splitting and merging tables is another basic skill when working with data**" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "6LxENtT79Rvl" }, "outputs": [], "source": [ "#codecell_Webmaps&Distributions_MergingZeData\n", "\n", "#now we can also add our new column back to our original data table by doing a 'merge'\n", "#create a new table 'survey_data_maxtime' by merging our original 'survey_data' with ONLY the 'colmax' column from our new table\n", "survey_data_maxtime = pd.merge(survey_data, survey_data_time['colmax'], how='inner', left_index=True, right_index=True)\n", "survey_data_maxtime.head()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "9tzHj_vOz4Aa" }, "source": [ "
\n", " \n", " ### Learning a new language – decomposing the code \n", " \n", "\n", "\n", "\n", "In the *#codecell_Webmaps&Distributions_AllNumbers*, you learned how to merge using pandas, and, if you want to further explore and apply pd.merge() to your data, check which [parameters](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html) apply.\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "8tbZd-bqVx7p" }, "source": [ "### The curse of abbreviations\n", "\n", "Have a look at the resulting table. What do all those column names mean? Right now you are probably justifiably confused. We'll be talking more about the mess that is 'other people's data' next week. For now, have a look at the documentation for these datasets at: https://archaeologydataservice.ac.uk/catalogue/adsdata/arch-1115-2/dissemination/csv/pottery/documentation/pottery.txt\n", "\n", "You'll see they explain that many of those weird abbreviations are periods and that the number in each one represents the chance that a given find belongs to that period. Sometimes I wish people wouldn't use abbreviations like this, but they've defined them in their metadata file, so we can't compain too much." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3.3 Finally, we make maps!\n", "\n", "Broadly speaking, there are two ways to approach interpreting spatial patterns. There's visualisation and interpretation, where you might visually compare distributions or densities or locations of two or more datasets by plotting them on a map and intepreting what you see. Then there's statistical analysis. \n", "\n", "We'll start with the tools to do the first one, and introduce statistical analysis later in the course. We'll also discuss the value of each approach, and when to apply it.\n", "\n", "### 3.3.1 Maps for visualisation and interpretation\n", "\n", "here are illustrative examples of data distribution and how we can interpolate this data:\n", "\n", "*Types of Distribution: Data Sampling*\n", "\n", "| | |\n", "| :--|:--| \n", "|
|
|\n", "| | Bonnier et al. (2019)|\n", "\n", "*Examples of Applications: Interpolating the data*\n", "\n", "| | |\n", "| :--|:--| \n", "|
|
|\n", "| | Orengo et al. (2019)|\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "TZeKlPYrbzT3" }, "source": [ "### 3.3.2 As always, start with a question\n", "\n", "As we said at the beginning of today, we're interested in change over time.\n", "\n", "**How does the distribution of finds change between different periods?**\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | | \n", "| ------------- | ------------- |------------- |\n", "|
| = |
|\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "hYZwq0DesMEw" }, "outputs": [], "source": [ "#codecell_Webmaps&Distributions_ImportUrLibraries\n", "\n", "# we're going to get geopandas, another tool for making maps\n", "!pip install geopandas\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "_RdAEMkRsl11" }, "outputs": [], "source": [ "#codecell_Webmaps&Distributions_ImportUrLibraries\n", "\n", "# get some more tools for making maps (and other things)\n", "\n", "%matplotlib inline\n", "import geopandas as gpd\n", "import seaborn as sns\n", "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "T6MYggBtwUnd" }, "source": [ "#### Geopandas\n", "\n", "We're going to use geopandas for the next few steps. You've used geopandas before, in the first chapter.\n", "\n", "It can do many of the same things as folium, which we were using last time.\n", "\n", "Geopandas is particularly useful for showing categorical data. \n", "\n", "|*Categorical data is data about groups*|\n", "|---|\n", "|
|\n", "\n", "Categories can be anything where you are defining groups. Archaeological periods are categories. Types of lithics or ceramics are categories. As above, types of cars are categories. The name of each category is sometimes referred to as a label.\n", "\n", "We have seen above in *#codecell_Webmaps&Distributions_ChangingDataType* that the function **.dtype()** allows you to see which type of data is your columns. \n", "\n", "We could start tryping to see and understand the distributions of our sites by periods simply by mapping the period labels with different colors.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "MpBpnIxBrWq6" }, "outputs": [], "source": [ "#codecell_Webmaps&Distributions_dataframesINTOgeodataframes\n", "# take our big dataset from above and turn it from a 'dataframe' which is the data that folium uses to make maps into a 'geodataframe' which is the data geopandas uses to make maps\n", "\n", "gdf_survey = gpd.GeoDataFrame(\n", " survey_data_maxtime, geometry=gpd.points_from_xy(survey_data_maxtime.DDLon, survey_data_maxtime.DDLat))\n", "print(gdf_survey.head())" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "qKa9a2Yu0cWq" }, "source": [ "
\n", " \n", " ### Learning a new language – decomposing the code \n", " \n", "\n", "\n", "\n", "In the *#codecell_Webmaps&Distributions_dataframesINTOgeodataframes*, gpd.GeoDataFrame() allows you to define the geometry (geometry=) of the new dataframe *'survey_data_maxtime'*, so it can be mapped. \n", "\n", "Here, the location of the points is defined by the x and y centroids (exact middles of polygon shapes)(gpd.points_from_xy ) following their respective geodetic coordinates, which can be found in the column *'survey_data_maxtime.DDLon'* for the Xs and *'survey_data_maxtime.DDLat'* for Ys.\n", "\n", "print()~ allows you to see the results." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.3.3 Visualising the results " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "u6Hmn-KFvz8X" }, "outputs": [], "source": [ "#Codecell_Webmaps&Distributions_PlotGeodataframes\n", "\n", "#plot your data colouring the points by the period to which they belong. You are grouping your sites by their category label when you do this. \n", "\n", "#the plot requires you to define which type of data it is (categorical or numerical)\n", "#figsize =(width, height)\n", "gdf_survey.plot(column='colmax', categorical=True, legend=True, figsize=(15,15))" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "N0k9G6d0yVPO" }, "source": [ "### 3.3.4 Assess the resulting map\n", "\n", "Is it useful? Why or why not?\n", "\n", "Can you see the distribution of sites from individual periods easily?\n", "\n", "Can you easily discern change over time?\n", "\n", "I'm not overly convinced by the result here. If you think about the map design principles we discused last week, you will probably also conclude that this is not a successful map.\n", "\n", "What other approach might we take?\n", "\n", "\n", "Let's try something else.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "CZFoGw070Yq_" }, "outputs": [], "source": [ "#codecell_Webmaps&Distributions_SplittingUrData\n", "\n", "#Maybe it would be better to only look at two or three periods at a time. \n", "# Recall last week's discussion about selecting appropriate data that matches our question, and also about appropriate levels of generalisation.\n", "\n", "#let's select a subset of our periods to see change from the early bronze age to hellenistic to late roman periods\n", "# note that some crazy abbreviations have been used for the names of periods...\n", "\n", "# list the types (periods) of ceramics we are interested in seeing.\n", "types = ['EB2','Hell','LRom']\n", "\n", "#define 'classic' (as in classical archaeology) as this group of periods. Create a subset of the data to contain only the types you just listed\n", "classic = gdf_survey.loc[gdf_survey['colmax'].isin(types)]\n", "\n", "# check you get the expected result - only sites from the periods you just defined as belonging to the 'classic' group\n", "classic.head()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "sO4kepty1EAV" }, "source": [ "
\n", " \n", " ### Learning a new language – decomposing the code \n", " \n", "\n", "\n", "\n", "in *#codecell__Webmaps&Distributions_SplittingUrData*, the function gdf_survey.loc is similar to .loc[] & .iloc[] used in pandas last chapter (in *#codecell_makeabasicmap_BringingUrData2theMap*). Note that .loc is using a label to index data whereas .iloc uses an integer position. This is the difference between working by columns or by rows. \n", "\n", "The .isin()~ command allows you to exclude data." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "ROpEOeRw1TZp" }, "outputs": [], "source": [ "#codecell_Webmaps&Distributions_PlotUrData\n", "\n", "#plot your data colouring the points by the period to which they belong\n", "# Do this by setting 'categorical' to 'true' so that the plot is coloured by category\n", "classic.plot(column='colmax', categorical=True, legend=True, figsize=(15,15))" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "VQu68G8o1bDi" }, "source": [ "## 3.4 Thinking about data visualisation and map design\n", "\n", "That's a bit better perhaps. The map is less crowded.\n", "\n", "Recall our discussions about how to design a map well. Clearly too much data introduces design problems.\n", "\n", "Well, now we can see the distributions a bit, and maybe say something about change over time, but there are still a lot of dots, and it's pretty clear dots from some periods are hidden under dots from other periods and we have no way to separate them. \n", "\n", "\n", "**What we need here is:**\n", " * **layers, so we can group our data and work with it interactively; and,**\n", " * **some context for all those dots**\n", "\n", "To address our design problem, we will make **interactive maps with layers** that can be toggled on and off\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | | \n", "| ------------- | ------------- |------------- |\n", "|
| = |
|\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "C8fCQvSyUrx-" }, "outputs": [], "source": [ "#codecell_Webmaps&Distributions_ImportUrLibraries\n", "\n", "#Like last time, we'll use folium and one of it's plugins. Import the tools you'll need, as usual. \n", "from folium.plugins import HeatMapWithTime\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "NBXk0ENHiIjR" }, "source": [ "### 3.4.1 Map Visualisations with Folium" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "wPAgyDlciIjR" }, "source": [ "\n", "To see the survey data in context and build our interactive maps, we'll start by generating the base map that will be used throughout this notebook.\n", "\n", "'Basemaps' are generic background maps, like a satellite image or an image of the street map. You know the different backgrounds you can show on google maps? Those are 'basemaps'. Very much like the basemap imported folium.Map (*#codecell_makeabasicmap_BringingUrData2theMap*).\n", "\n", "\n", "Have a look around the web, and you'll see that most modern online maps use a basemap, so we're going to do so as well.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-11-19T05:34:04.673013Z", "start_time": "2018-11-19T05:34:04.670120Z" }, "colab": {}, "colab_type": "code", "id": "0H-uRYF1iIjS" }, "outputs": [], "source": [ "#codecell_Webmaps&Distributions_BringingUrData2theMap\n", "\n", "#get the survey area centre, like you did last week, so you can centre the map where the data is located\n", "\n", "location_survey=survey_data_maxtime['DDLat'].mean(), survey_data_maxtime['DDLon'].mean()\n", "print(location_survey)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "wUBTM83AbAeX" }, "outputs": [], "source": [ "#define a basemap we can reuse. Use the coordiantes for the centre you generated just above to centre the basemap\n", "#This is a variant on how we did things last time...\n", "\n", "def generateBaseMap(default_location=[35.870086207930626, 23.301798820980512], default_zoom_start=11):\n", " base_map = folium.Map(location=default_location, control_scale=True, zoom_start=default_zoom_start)\n", " return base_map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "4-qIFOBUiIjU" }, "source": [ "
\n", " \n", " ### Review - basic map controls \n", " \n", "\n", "\n", "* generateBaseMap(default_location=[],default_zoom_start=) \n", "* location=: Define the default location to zoom at when rendering the map
\n", "* zoom_start=: The zoom level that the map will default to when rendering the map
\n", "* control_scale=: Shows the map scale for a given zoom level" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-11-19T05:34:04.687135Z", "start_time": "2018-11-19T05:34:04.674745Z" }, "colab": {}, "colab_type": "code", "id": "ZhOzNzEEiIjV" }, "outputs": [], "source": [ "#codecell_Webmaps&Distributions_CheckingZeData \n", "\n", "#check the basemap is working\n", "base_map = generateBaseMap()\n", "base_map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | | \n", "| ------------- | ------------- |------------- |\n", "|**MORE TOOLS**| |
|" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-11-19T06:05:45.764352Z", "start_time": "2018-11-19T06:05:45.761769Z" }, "colab": {}, "colab_type": "code", "id": "--pBpUiviIjc" }, "outputs": [], "source": [ "#codecell_Webmaps&Distributions_ImportUrLibraries\n", "\n", "#lets get the heatmap tool, like last time, let's also get a measure control so we can measure distance\n", "from folium import plugins\n", "from folium.plugins import HeatMap\n", "from folium.plugins import MeasureControl\n", "\n", "# Measure controls are pretty neat. Rather than just having a scale bar, like you would in a static map, and needing to visually estimate the size of features, you can mesure them.\n", "# The ability to measure is a benefit of moving up the 'interactivity spectrum'." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.4.2 Visualise with a Heatmap" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "g-4G4lb9iIje" }, "source": [ "Let's start by visually comparing MRom to LRom, that is middle roman to late roman sites by putting their data in separate layers." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-11-19T05:34:05.037298Z", "start_time": "2018-11-19T05:34:04.818915Z" }, "colab": {}, "colab_type": "code", "id": "2p4cDhYeiIjf" }, "outputs": [], "source": [ "#codecell_Webmaps&Distributions_Splitting your data \n", "\n", "# make a layer for when each period is more than 50% likely, so you have all the sites that are probably in that period\n", "survey_data_MRom = survey_data_maxtime.loc[(survey_data_maxtime['MRom'] > 50)]\n", "survey_data_ERom = survey_data_maxtime.loc[(survey_data_maxtime['ERom'] > 50)]\n", "\n", "# Yes, I know choosing a 50% cut-off is arbitrary. You could choose a different cut-off and change the values listed above. \n", "# If you do so, all your maps (and all your conclusions about change over time) will be affected." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "RQIM47Vs2Gtv" }, "source": [ "
\n", " \n", " ### Learning a new language – decomposing the code \n", " \n", "\n", " \n", "In the last practical lab in *#codecell_makeabasicmap_ManipulatingyourData_UsingSymbology*, you used the symbol **==** . This is similar, you are using a mathematical symbol to filter (in this case split it in two using 50%)your dataframe: **>** greater than and **<** less than." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "Da1dgQn7fsXx" }, "source": [ "### 3.4.3 Practicing the concept of layers\n", "\n", "We've introduced the concept of 'Spatial Data' in Chapter 1.1, and presented them as layers. Similarly, in our dataset, each layer contains information that can be turned on and off. Think of this like a stack of transparent paper. Each sheet of paper is a layer, and can be added to or taken away from the stack. Their order can also be changed.\n", "\n", "
\n", "\n", "\n", "
\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-11-19T10:09:37.970112Z", "start_time": "2018-11-19T10:09:34.940649Z" }, "colab": {}, "colab_type": "code", "id": "DxWjo2YUiIjo" }, "outputs": [], "source": [ "#codecell_Webmaps&Distributions_PrepareUrBasemaps_CreateLayers \n", "\n", "\n", "# like last time, make heatmaps, but one for each period, put them in different layers.\n", "\n", "# give your map a name\n", "base_map = generateBaseMap()\n", "\n", "# add two layers, one for each period\n", "\n", "mrom = HeatMap(data=survey_data_MRom[['DDLat', 'DDLon', 'MRom']].groupby(['DDLat', 'DDLon']).sum().reset_index().values.tolist(), radius=8, max_zoom=13).add_to(base_map)\n", "erom = HeatMap(data=survey_data_ERom[['DDLat', 'DDLon', 'ERom']].groupby(['DDLat', 'DDLon']).sum().reset_index().values.tolist(), radius=8, max_zoom=13).add_to(base_map)\n", "\n", "#give the layers sensible names that human can read\n", "mrom.layer_name = 'Middle Roman Distribution'\n", "erom.layer_name = 'Early Roman Distribution'\n", "\n", "# add the layer control. This is the tool that lets you turn different layers in your map on and off\n", "folium.LayerControl().add_to(base_map)\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "XxO-yK-h2RT6" }, "source": [ "
\n", " \n", " ### Learning a new language – decomposing the code \n", " \n", "\n", "\n", "In *#codecell__Webmaps&Distributions_PrepareUrBasemaps_CreateLayers*, you should refer to *#codecell_makeabasicmap_ManipulatingyourData_Heatmap* to review the steps taken. \n", " \n", "Last time, to create your heatmap you've used a size-based weighting (making larger sites have larger symbols). This time, you are creating two heatmaps named mrom ('Middle Roman Distribution')and erom ('Early Roman Distribution'). Because the data is located within the same island, the maps will overlap.\n", "\n", "The layer control plugin allows you to switch on and off the visibility of one or the other (or both). you can find some documentation on **LayerControl()** [here](https://python-visualization.github.io/folium/modules.html).\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "ExecuteTime": { "end_time": "2018-11-19T10:09:48.682799Z", "start_time": "2018-11-19T10:09:39.486706Z" }, "colab": {}, "colab_type": "code", "id": "mzgBodwBiIjr" }, "outputs": [], "source": [ "#codecell__Webmaps&Distributions_GenerateUrBasemap\n", "\n", "#Now generate your map by calling it by its name\n", "base_map" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "wSCv-r8eXuXE" }, "source": [ "### 3.4.4 An exercise\n", "\n", "Now try and add some more layers to the map to show other periods! What other periods might it be relevant to consider if you are trying to understand change over time? Edit the cell above to add more layers, or add a new cell below and follow the steps above to make a new map, and add your extra layers to it. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "## 3.5 Back end to Front end: Tidying, manipulating, rearranging again and again for a better design" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.5.1 Tidying" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "engTEEWWe454" }, "outputs": [], "source": [ "#codecell_makeabasicmap_ManipulatingyourData\n", "\n", "# Here I'm doing the same thing as before but with different periods\n", "# make a layer for when the max period is LRom or MRom to compare these periods\n", "survey_data_lrommax = survey_data_maxtime.loc[(survey_data_maxtime['colmax'] =='LRom')]\n", "survey_data_mrommax = survey_data_maxtime.loc[(survey_data_maxtime['colmax'] =='MRom')]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "u_ucaKUkfW_F" }, "outputs": [], "source": [ "\n", "#codecell_Webmaps&Distributions_SplittingUrData_CreateLayers \n", "\n", "# like last time, make heatmaps, but one for each period, put them in different layers\n", "base_map = generateBaseMap()\n", "\n", "lrommax = HeatMap(data=survey_data_lrommax[['DDLat', 'DDLon']].groupby(['DDLat', 'DDLon']).sum().reset_index().values.tolist(), radius=8, max_zoom=13).add_to(base_map)\n", "mrommax = HeatMap(data=survey_data_mrommax[['DDLat', 'DDLon']].groupby(['DDLat', 'DDLon']).sum().reset_index().values.tolist(), radius=8, max_zoom=13).add_to(base_map)\n", "\n", "#give the layers sensible names\n", "lrommax.layer_name = 'Late Roman Distribution'\n", "mrommax.layer_name = 'Middle Roman Distribution'\n", "\n", "# add the layer control\n", "folium.LayerControl().add_to(base_map)\n", "base_map\n", "\n", "\n", "# Adds a measure tool to the top right\n", "\n", "base_map.add_child(MeasureControl())\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "M85UnCT-lCKz" }, "source": [ "### 3.5.2 Making visual comparisons\n", "\n", "Stacking two layers on top of one another is one way to visually compare distributions. Do you think it is effective in this case?\n", "\n", "Perhaps it will be more useful to be able to view our distributions and explore them side by side, to help us compare what is happening in these two periods.\n", "\n", "\n", "We'll import a new tool to allow us to see two maps where the views are synced - that is identical and moving together - to have another way to compare distributions." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "2NEHqdsAhYYy" }, "outputs": [], "source": [ "# get another plugin for side by side maps\n", "from folium.plugins import DualMap" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "EBBVGMamhNNY" }, "outputs": [], "source": [ "# declare you are making a new map \"m\"\n", "\n", "# set the location to the location of your survey, set your starting zoom\n", "m = plugins.DualMap(location=location_survey, tiles=None, zoom_start=13)\n", "\n", "# the dual maps plugin automatically defines two buddy maps, \"m1\" and \"m2\" which pan and zoom together\n", "\n", "# give yourself some options in life for your base layers, add them to both maps 'm1' and 'm2' by just using \"m\"\n", "folium.TileLayer('cartodbpositron').add_to(m)\n", "folium.TileLayer('openstreetmap').add_to(m)\n", "\n", "\n", "# like last time, make heatmaps, one for each period, put them in different layers\n", "# put one layer in the left hand map 'm' and the other in the right hand map 'm2'\n", "\n", "lrommax = HeatMap(data=survey_data_lrommax[['DDLat', 'DDLon']].groupby(['DDLat', 'DDLon']).sum().reset_index().values.tolist(), radius=8, max_zoom=13).add_to(m.m1)\n", "mrommax = HeatMap(data=survey_data_mrommax[['DDLat', 'DDLon']].groupby(['DDLat', 'DDLon']).sum().reset_index().values.tolist(), radius=8, max_zoom=13).add_to(m.m2)\n", "\n", "#give the layers sensible names\n", "lrommax.layer_name = 'Late Roman Distribution'\n", "mrommax.layer_name = 'Middle Roman Distribution'\n", "\n", "# layer control time\n", "folium.LayerControl(collapsed=False).add_to(m)\n", "\n", "# Adds a measure tool to the top right\n", "\n", "m.add_child(MeasureControl())\n", "\n", "#draw your side by side maps\n", "\n", "m" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " \n", "\n", "\n", "Once you have done this exercise, I recommend you diving in NLS website and see the differences between [overlays](https://maps.nls.uk/geo/explore/#zoom=16&lat=55.8716&lon=-4.2894&layers=1&b=1) (remember to slide the transparency cursor) and [side by side](https://maps.nls.uk/geo/explore/side-by-side/#zoom=16&lat=55.8716&lon=-4.2894&layers=1&right=BingHyb) maps." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "ZYz4hI_1kImv" }, "source": [ "## 3.6 Visualisation and interpretation\n", "\n", "Thought exercise: The results of these two maps should be similar but slightly different. What is making the difference?\n", "\n", "How good are you at interpreting these distributions and comparing them visually? *This is me hinting at you that you are going to end up wanting to use statistics eventually*\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "XWAQRddKcogi" }, "source": [ " \n", "## 3.7 Think about basic principles\n", "\n", "The principles of what we've done this week are the same as the principles of what we did last week. \n", "\n", "I think it's important to learn to do things more than one way, and to adapt to slightly different tools. The software and code packages used for modern spatial analysis and mapping are pretty diverse and are always developing as people improve things. It doesn't make much sense to just learn one way of making maps mechanistically. The important thing is to understand the principles and aims of what you're doing. \n", "\n", "Key principles from this week are:\n", "\n", "* transforming, or re-shaping data to prepare it for mapping (or analysis)\n", "* visualizing categorical data\n", "* making layers in maps\n", "* adding tools to toggle layer visibility\n", "* adding tools to interactively measure things on maps\n", "\n", "In any code package that is meant to be used for making maps, odds are good you will find a way to set the zoom level, set the centre starting location, and set the initial scale. \n", "\n", "You will be able to set up colour schemes, map attributes, and make layers. Knowing keywords and princples is the important thing. \n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "0nGkpE2cU9xj" }, "source": [ "## 3.8 LexiCode\n", "\n", "In the past two practical labs, you have learned and experimented with programming commands - \n", "\n", "To re-use the codes - you will need to first load their respective libraries. So far, you have used ...:\n", "\n", "
\n", "\n", "> **libraries** |**plugins** |\n", ">--- |--- | \n", ">folium | HeatMapWithTime\n", ">branca| HeatMap\n", ">pandas| MeasureControl\n", ">geopandas| PrepareUrBasemaps_CreateLayers from [folium.plugins]\n", "\n", "\n", "your lexicode is non-exhaustive, keep expanding, find your own 'best way' to reuse these code/scripts...\n", "\n", "
\n", "\n", ">Lexicode_MakingaBasicMap | Lexicode_Webmaps&Distributions\n", ">--- | ---\n", ">\t== () [] | pd.concat()\n", ">.head_csv() | .dtype()\n", ">.read_csv() | astype()\n", ">mean() | fillna()\n", ">folium.Map | def return\n", ">range() | .apply(lambda x:*function*,axis=)\n", ">len() | pd.merge()\n", ">iloc[]| how= , left_index= ,left_index= \n", ">.value_counts()| gpd.GeoDataFrame()\n", ">if =:| geometry=gpd.points_from_xy\n", ">elif =: |print() \n", ">else =:| .isin()\n", ">folium.Marker()| classic.plot()\n", ">folium.Icon()| generateBaseMap()\n", ">folium.Circle| .groupby(['', ''])\n", ">popup= | .reset_index()\n", ">radius= | max_zoom=\n", ">.values.tolist() |folium.TileLayer()\n", "> .add_to()| plugins.DualMap(location= , tiles= , zoom_start= )\n", "> | \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3.9 References\n", "\n", "Orengo, H.A. Garcia-Molsosa, A. (2019) A brave new world for archaeological survey: Automated machine learning-based potsherd detection using high-resolution drone imagery. Journal of Archaeological Science,Volume 112.\n", "\n", "Bonnier, A., Finné, M., & Weiberg, E. (2019). Examining land-use through GIS-based kernel density estimation: A re-evaluation of legacy data from the berbati-limnes survey. Journal of Field Archaeology, 44(2), 70-83.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 3.10 The End\n", "\n", "That's all for this chapter. Be sure to save your copy of the notebook in your own repo so you can re-use it!" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "9Oe23Q7tlMna" }, "source": [] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "l8JEjBNu-BQW" }, "source": [ "# Chapter 4: More about spatial patterns: The case of Gabii\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.1 Introduction: exercise motivations ?\n", "\n", "Understanding the meanings behind patterns of finds recovered through excavation is a tricky problem. We hope to distinguish activity areas, places devoted to domestic and industrial use, or inhabited places that are distinct from liminal ones. We often want to discern change over time, identifying areas with finds associated with different temporal periods. \n", "\n", "To successfully unravel these patterns, we must look not only at the **distributions** of different types of finds, but how they **correlate** with one another, the character of the contexts in which they were recovered, and their own physical and social characteristics. Are they likely to be curated? Are they light and likely to be moved from one area to another by post-depositional processes? It's all a bit of a mess. \n", "\n", "Importantly, all these processes are spatial. Alignments or proximity between areas with similar (or quite different) finds is potentially meaningful. \n", "\n", "The aims of this exercise is for you to:\n", "* learn to work real special finds data from an excavation, in all its messiness, to look for spatial patterns and relationships. **-> helping you to better identify spatial patterns**\n", "* start thinking about quantitative and spatial approaches to finds data from excavations and how they can help us better understand the patterns we see. **-> thelping you to better quantifying spatial patterns**\n", "\n", "You'll do this using data collected by the Gabii Project, a 10+ year excavation in central Italy. 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | | \n", "| ------------- | ------------- |------------- |\n", "|
| = |
|" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 641 }, "colab_type": "code", "id": "awel6EMM-BQY", "outputId": "b892dee6-b268-46d6-cc37-0d43133c960d" }, "outputs": [], "source": [ "##codecell_SpatialPatterns_ImportUrLibraries\n", "\n", "# as usual, start by getting your tools: your prerequisites.\n", "\n", "\n", "%matplotlib inline\n", "# Matplotlib is your tool for drawing graphs and basic maps. You need this!\n", "!pip install fiona\n", "!pip install geopandas\n", "import pandas as pd\n", "import requests\n", "import fiona\n", "import geopandas as gpd\n", "import ipywidgets as widgets\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "vpZN9KC_ZxJ5" }, "source": [ "
\n", "\n", "\n", " In *#codecell_SpatialPatterns_ImportUrLibraries*. These are what we call **prerequisites**. You know by now that they are basic tools so you can get started. Let's review them broadly:\n", " \n", "* *Pandas* manipulate data. \n", "* *Geo-pandas* manipulate geographic data. They're also black and white and like to eat bamboo... You need these to manipulate your data!\n", "* *Fiona* helps with geographic data (find more about [fiona](https://pypi.org/project/Fiona/)).\n", "* *Requests* are for asking for things. It's good to be able to ask for things.\n", "* *ipywidgets* supports interactivity." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "YDGLKQ8_iwBc" }, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "\n", "## 4.2 Getting to know your Data...##" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 158 }, "colab_type": "code", "id": "siGQqpwn-BQc", "outputId": "720a8a11-68b1-457a-d39a-85a84465603a" }, "outputs": [], "source": [ "##codecell_SpatialPatterns_ImportUrData\n", "\n", "# then get your data\n", "# This is where I put the data. It's in a format called geojson, used to represent spatial geometry (shapes) and attributes (text).\n", "url = 'http://ropitz.github.io/digitalantiquity/data/gabii_SU.geojson'\n", "\n", "# Please get me the data at that web address (url):\n", "# use requests.get to retrieve data from any destination\n", "request = requests.get(url)\n", "\n", "# I will use the letter 'b' to refer to the data, like a nickname\n", "#we can use requests to read the response content in bytes\n", "b = bytes(request.content)\n", "\n", "#So we will use fiona.BytesCollection referred by the letter 'f':\n", "# to read the raw data (as single-file formats or zipped shapefiles)\n", "# to wrap up all the data from 'b'\n", "# check the coordinate refereence system (crs) listed in the features\n", "with fiona.BytesCollection(b) as f:\n", " crs = f.crs\n", " #by using also GeoDataFrame.from_features you can read geospatial data that's in the url without saving that data to disk (your PC) first\n", " gabii_su_poly = gpd.GeoDataFrame.from_features(f, crs=crs)\n", " # and print out the first few lines of the file, so I can check everything looks ok: you know this by now...we will call .head()\n", " print(gabii_su_poly.head())\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "8bXO2Veyf4b-" }, "source": [ "
\n", " \n", " ### Learning a new language – decomposing the code \n", " \n", "\n", "\n", " In *#codecell_SpatialPatterns_ImportUrData*:
\n", " \n", " *GeoJSON* is used to store the excavation data. GeoJSON format allows to encode a variety of geographic data structures which contains features with spatial attributes (e.g. points, line strings, polygons, multiparts geometries) and non-spatial attributes (text). This is a really useful format to use when creating a GIS.
\n", " *bytes()* method returns bytes object which is an immmutable (cannot be modified) sequence of integers. We tend to use this to compress data, save or send it.
\n", " *requests.get* is used to retrieve data from any destination & *requests.content* to read all content which is in bytes (for non-text requests using the .content property).
\n", " *BytesCollection()* takes a buffer of bytes and maps to a virtual file that can then be opened by fiona. By using both fiona.BytesCollection and GeoDataFrame.from_features you can:\n", "* to read the raw data (as single-file formats or zipped shapefiles)\n", "* to wrap up all the data from 'b'\n", "* check the coordinate refereence system (crs) listed in the features\n", "\n", "\n", " **Open source**\n", "\n", "So far, we have abundantly benefitted from open-source software, data, tools, code, design documents, or content. It is only natural to open, share and use the results of 10 years excavation..." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "4aXd2B1bre1_" }, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "## 4.3 Assessing your data visually...\n", "\n", " So you know what's in this dataset... Maybe you want to see it before you start querying or analysing it? ... Start by visualising the spatial data for all the contexts (stratigraphic units) from the excavation we'll be exploring.\n", "\n", "So far you have dealt with survey data where all data has been logged with coordinates (x, y and sometimes z for the elevation height). However, it is not always possible, or even meaningful, to record the coordinates of all artefacts retrieved during an excavation, especially if it lasts over several years. It is then more appropriate to use stratigraphical units (SU). The spatial analysis of these finds is more challenging as they don't have a spatial location per se. We need to think carefully about organising and presenting them. Colour can be a great help for this!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 874 }, "colab_type": "code", "id": "wVIYnLr0-BQg", "outputId": "9b581c0b-4a65-4c17-d5d1-f3faf0356959" }, "outputs": [], "source": [ "##codecell_SpatialPatterns_PlottingUrData\n", "\n", "\n", "# Now we have polygons, the shapes of our contexts. Let's visualise the data to double check that all is well\n", "# We'll use again the function .plot (see lab_Webmaps&Distributions)\n", "# 'plot' means draw me an image showing the geometry of each feature in my data. \n", "# We want to control things like the color of different types of features on our map. \n", "# I used the 'Blues' colorscale command (cmap stands for 'colour map') \n", "# and asked it to draw the polygons differently based on the type of feature.\n", "\n", "gabii_map1 = gabii_su_poly.plot(column='DESCRIPTIO', cmap='Blues', edgecolor='grey', figsize=(15, 15));\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "yVgg9fN1-BQj" }, "source": [ "
\n", " \n", " ### Learning a new language – decomposing the code \n", " \n", "\n", "\n", "In *#codecell_SpatialPatterns_PlottingUrData*, when using .plot(), when column=' ' is specifed, the plot colouring is based on its values. The colour scale is defined by using cmap='' , the edge of the features (our polygons) are defined by edgecolor='' and the size of the plot by figsize=(width, height)) .\n", "\n", "you can of course add parameters/symbologies to your plots. The choice of parameters is largely dictated by your data analysis. [Here](http://geopandas.org/mapping.html) is some documentation on plot generation.\n", "\n", "The colorscale options are: Accent, Accent_r, Blues, Blues_r, BrBG, BrBG_r, BuGn, BuGn_r, BuPu, BuPu_r, CMRmap, CMRmap_r, Dark2, Dark2_r, GnBu, GnBu_r, Greens, Greens_r, Greys, Greys_r, OrRd, OrRd_r, Oranges, Oranges_r, PRGn, PRGn_r, Paired, Paired_r, Pastel1, Pastel1_r, Pastel2, Pastel2_r, PiYG, PiYG_r, PuBu, PuBuGn, PuBuGn_r, PuBu_r, PuOr, PuOr_r, PuRd, PuRd_r, Purples, Purples_r, RdBu, RdBu_r, RdGy, RdGy_r, RdPu, RdPu_r, RdYlBu, RdYlBu_r, RdYlGn, RdYlGn_r, Reds, Reds_r, Set1, Set1_r, Set2, Set2_r, Set3, Set3_r, Spectral, Spectral_r, Wistia, Wistia_r, YlGn, YlGnBu, YlGnBu_r, YlGn_r, YlOrBr, YlOrBr_r, YlOrRd, YlOrRd_r, afmhot, afmhot_r, autumn, autumn_r, binary, binary_r, bone, bone_r, brg, brg_r, bwr, bwr_r, cividis, cividis_r, cool, cool_r, coolwarm, coolwarm_r, copper, copper_r, cubehelix, cubehelix_r, flag, flag_r, gist_earth, gist_earth_r, gist_gray, gist_gray_r, gist_heat, gist_heat_r, gist_ncar, gist_ncar_r, gist_rainbow, gist_rainbow_r, gist_stern, gist_stern_r, gist_yarg, gist_yarg_r, gnuplot, gnuplot2, gnuplot2_r, gnuplot_r, gray, gray_r, hot, hot_r, hsv, hsv_r, inferno, inferno_r, jet, jet_r, magma, magma_r, nipy_spectral, nipy_spectral_r, ocean, ocean_r, pink, pink_r, plasma, plasma_r, prism, prism_r, rainbow, rainbow_r, seismic, seismic_r, spring, spring_r, summer, summer_r, tab10, tab10_r, tab20, tab20_r, tab20b, tab20b_r, tab20c, tab20c_r, terrain, terrain_r, viridis, viridis_r, winter, winter_r\n", "\n", "Swap out 'Blues' in the cell above for any of these options...\n", "\n", "
\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "asEFZLAQr04v" }, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "## 4.4 Loading the special finds\n", "\n", "Like many excavations, not every special finds in this datset has spatial coordinates associated with it (because in real archaeology life things are found in the sieve, the wheelbarrow, and during washing). \n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 406 }, "colab_type": "code", "id": "zikTZlaj-BQk", "outputId": "b497e506-2592-4a80-a42b-45c4c558799c" }, "outputs": [], "source": [ "###codecell_SpatialPatterns_WhichTypeOfSpecialFinds&fromWhere?\n", "\n", "# Now I'm going to bring in all the basic Gabii special finds data - descriptions, object types, IDs and the contexts from which they come.\n", "# We've had a few special finds over the years.\n", "sf_su = pd.read_csv(\"https://raw.githubusercontent.com/ropitz/gabii_experiments/master/spf_SU.csv\")\n", "sf_su" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "colab_type": "code", "id": "JhUKJYdCLfGt", "outputId": "5f546b1c-65c6-43dd-c3c7-d51a9323b86e" }, "outputs": [], "source": [ "###codecell_SpatialPatterns_WhichTypeOfSpecialFinds?\n", "\n", "#the set()function allows to return all values (here our special finds type) without duplicates \n", "#this is a useful tool when you need to standardise your finds labels (and check your metadata for spelling!)\n", "sf_su_desc = sf_su['SF_OBJECT_TYPE']\n", "set(sf_su_desc)\n", "\n", "#however, this is a really long list to deal with, so we need to find ways to prepare this dataset to really see what has been happening on this site." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "1JPhVomY-BQn" }, "source": [ "One of our area supervisors, Troy, is super excited about tools related to textile production. They're a great example of how we think about special finds at Gabii. Multiple types of finds are related to textile production. Do we find all types everywhere? Are certain types of tools more concentrated in one type of context or one area than others? Troy has lots of questions about the patterns of places where we find these tools. Do they provide evidence for early textile production? Are they a major factor in the city's early wealth? Do we find the same things in later periods? After all, people under the Republic and Empire wore clothes... Loom Weights, spools, and spindle whorls are the most common weaving tools at Gabii.\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "MHlxZSplR5W6" }, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "\n", "## 4.5 Preparing your data, a prerequisite to all analysis ##\n", "\n", "### 4.5.1 Selection ### \n", "As this data is not yet spatial and only associated with a stratigraphic unit, logically, we can merge our non-spatial special finds data with our spatial stratigraphic units data to make all our data spatial." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 406 }, "colab_type": "code", "id": "OCrCdSIQ-BQo", "outputId": "ea40b1b1-8f9b-463c-d588-f1e8d2e1901c" }, "outputs": [], "source": [ "###codecell_SpatialPatterns_SpecialFindsSelection\n", "\n", "#Let's pull all those find types out of the big list. \n", "#We're selecting the finds data we want to work with before merging with the spatial data. We could do these operations in reverse if we wanted to.\n", "#here very much like in lab1,#codecell_makeabasicmap_BringingUrData2theMap, & lab2, #codecell_Webmaps&Distributions_SplittingUrData, we are using iloc and isin functions\n", "\n", "types = ['Loom Weight','Spool','Spindle Whorl']\n", "textile_tools = sf_su.loc[sf_su['SF_OBJECT_TYPE'].isin(types)]\n", "textile_tools\n", "\n", "#we now have a new dataframe containing only textile_tools" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "ChQKlGPcs9qM" }, "source": [ "### 4.5.2 Listing and merging to become spatial ### \n", "Presence or absence isn't everything. You may want to know how many of a certain type of find is present in a given area." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 230 }, "colab_type": "code", "id": "QG9Hbhn0-BQq", "outputId": "3f928824-7949-48c6-8274-aae8e977caec" }, "outputs": [], "source": [ "###codecell_SpatialPatterns_TextileToolsListing\n", "\n", "# Now let's count up how many of these tools appear in each context (SU).\n", "# pd.value_counts() functioon returns a series containing counts of unique values.\n", "# So we can print out a list of the number of textile_tools in each SU next to that SU number.\n", "\n", "pd.value_counts(textile_tools['SU'].values, sort=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 197 }, "colab_type": "code", "id": "q3x4foKL-BQt", "outputId": "dbb90939-1125-4158-bf30-29955710ba0f", "scrolled": true }, "outputs": [], "source": [ "###codecell_SpatialPatterns_TextileToolsBecomesSpatial\n", "\n", "#Then let's combine the special finds data with our polygons representing context with shape and a spatial location\n", "# We do this with a command called 'merge'. 
In lab2,#codecell_Webmaps&Distributions_MergingZeData, you have used pandas pd.merge()\n", "\n", "gabii_textools = gabii_su_poly.merge(textile_tools, on='SU')\n", "\n", "# very much like p.merge(), you have now created a new dataframe ('gabii_textools') by merging dataframe 'textile_tools' on= SU, the stratigraphical unit\n", "# let's have a look at the new dataframe using .head() to print out just the first few rows.\n", "gabii_textools.head()\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "LhKJFX1wXMkP" }, "source": [ "### 4.5.3 Visual assessment" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 892 }, "colab_type": "code", "id": "KXxiPahi-BQw", "outputId": "534dc2f8-6622-460f-d62f-9eda8483a7f8" }, "outputs": [], "source": [ "###codecell_SpatialPatterns_SeeingTextileToolsinContext\n", "\n", "# If we want to see this result as a map, we just add the .plot command to the end of the dataframe's name\n", "# here .plot() symbology is expanded to transparency with 'alpha=' where value of 1 is complete opacity and 0 complete transparency \n", "\n", "gabii_textools.plot(column='SF_OBJECT_TYPE', cmap='Accent', figsize=(15, 15), legend=True, alpha=0.5)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "87-PZM6l-BQz" }, "source": [ "#### But what do you see really?\n", "OK, what do you see here? Compare the distribution of each type of textile tool. Do some types seem to be **concentrated** in certain areas? How might you check? What **factors** might contribute to this pattern? Do big layer simply aggregate lots of stuff? Do late dumps contain early materials? Why would one type of tool appear where the others don't?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.5.4 Sorting and tidying" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 406 }, "colab_type": "code", "id": "HY3cT_mV-BQ0", "outputId": "d5b18843-648f-4ab6-b0db-1be8dd3f61bf" }, "outputs": [], "source": [ "###codecell_SpatialPatterns_SortingDataTextileTools\n", "\n", "# We can try and see the relationship between layer size and count by sorting\n", "#our list of finds by the surface area of each layer.\n", "# We use the command 'sort_values' \n", "\n", "gabii_textools.sort_values(by=['Shape_Area'],ascending=False)\n", "\n", "# '.sort_values' function sort along their axis (here the axis is defined by 'Shape_Area' ). \n", "# the default sorting is on ascending values (smallest to largest) if you are happy with this =True, however, we want to see them in descending order, so we select =False" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "dQKRWndWJkqb" }, "source": [ "#### Knowing your site and refining your analysis ####\n", "\n", "Gabii excavations have revealed that there are enormous colluvial layers. This is an important consideration as these large areas will contribute to a bias distribution of the artefacts across the site. Therefore, very large areas should probably be excluded." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "Vs3Lvg3O-BQ4" }, "outputs": [], "source": [ "###codecell_SpatialPatterns_RefiningDataSorting\n", "\n", "# Outliers will mess with any analysis. Here large stratigraphical layer are our outliers\n", "# By cutting out these layers i.e. 
excluding SUs with a surface area greater than 800 we can deal with these outliers\n", "\n", "gabii_textools2 = gabii_textools.loc[gabii_textools['Shape_Area']<800]\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 892 }, "colab_type": "code", "id": "vXgBTzkX-BQ7", "outputId": "f9a5f2a3-53c8-4d8c-fe07-36ebafa07b21" }, "outputs": [], "source": [ "###codecell_SpatialPatterns_VisualisingDataSorting\n", "\n", "# If we want to see this result as a map, we just add the .plot command to the end again.\n", "\n", "gabii_textools2.plot(column='SF_OBJECT_TYPE', cmap='Accent', figsize=(15, 15), legend=True, alpha=0.5)\n", "\n", "# That's better. Plot the results to see that you've removed the big colluvial layers." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "skqDgXvhNKgq" }, "source": [ "#### Grouping and merging further #### \n", "\n", "to answer to question: *how many of each tool type appears in each SU?* You will need to further group and merge your data." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 436 }, "colab_type": "code", "id": "tW8PUkQI-BQ-", "outputId": "a0202828-010e-469b-a3e5-9eba1b5531d9" }, "outputs": [], "source": [ "###codecell_SpatialPatterns_GroupingData\n", "\n", "# OK, count up how many of each tool type appears in each SU using the 'groupby' command. \n", "# You have used this command before in #codecell_Webmaps&Distributions_SplittingUrData_CreateLayers \n", "## and .fillna() was explained in codecell_Webmaps&Distributions_AllNumbers \n", "\n", "textools_counts = gabii_textools2.groupby('SU')['SF_OBJECT_TYPE'].value_counts().unstack().fillna(0)\n", "\n", "\n", "# Sort the list so that the SUs with the most stuff end up at the top.\n", "textools_counts.sort_values(by=['Loom Weight','Spindle Whorl','Spool'], ascending=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 197 }, "colab_type": "code", "id": "uZsp42WE-BRA", "outputId": "87ecb666-883a-41c2-aa7f-ef02e58962ea" }, "outputs": [], "source": [ "###codecell_SpatialPatterns_MergingData\n", "\n", "# Merge your textile tool counts with your spatial data for the contexts\n", "# Because both dataframes have a 'SU' column, you can use this to match up the rows. \n", "# so the merger will occur on='SU'\n", "\n", "gabii_textools_counts = gabii_su_poly.merge(textools_counts, on='SU')\n", "gabii_textools_counts.head()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "dQVlzdgZtU8c" }, "source": [ "### 4.5.5 Visual assessment: exploring your data ### \n", " \n", " Side by side plots of different variables can help you to visualize the differences between the spatial patterns you're exploring. Very much like in lab2_MakeaBasicMap, when you compared Late Roman and\n", "Middle Roman artefact distributions using two heatmaps side-by-side. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "colab_type": "code", "id": "sr_Wmixp-BRD", "outputId": "88aa6e47-3e3d-4a77-ce9b-eca99a3265c9" }, "outputs": [], "source": [ "###codecell_SpatialPatterns_AssessingFindType\n", "\n", "# Let's start by looking at each class of textile tool individually. 
\n", "# Plot the counts of each type of find spatially\n", "\n", "gabii_textools_counts.plot(column='Loom Weight', cmap='Accent', figsize=(15, 15), legend=True, alpha=0.5, legend_kwds={'label': \"Number of Loom weight\",'orientation': \"vertical\"})\n", "gabii_textools_counts.plot(column='Spindle Whorl', cmap='Accent', figsize=(15, 15), legend=True, alpha=0.5, legend_kwds={'label': \"Number of Spindle Whorl\",'orientation': \"vertical\"})\n", "gabii_textools_counts.plot(column='Spool', cmap='Accent', figsize=(15, 15), legend=True, alpha=0.5, legend_kwds={'label': \"Number of Spool\",'orientation': \"vertical\"})\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 874 }, "colab_type": "code", "id": "QBDAmDtW-BRG", "outputId": "636bc085-c5ef-4410-b734-4a0ce61a8c20" }, "outputs": [], "source": [ "###codecell_SpatialPatterns_PlottingAllFindType\n", "# here's another visualisation. I've chosen a single colour scale - so shades of red, shades of blue, shades of green, to show the quantity of each type of find in a single map.\n", "\n", "base = gabii_textools_counts.plot(column='Loom Weight', cmap='Blues', figsize=(15, 15), alpha=0.7)\n", "gabii_textools_counts.plot(ax=base, column='Spindle Whorl', cmap='Reds', alpha=0.7)\n", "gabii_textools_counts.plot(ax=base, column='Spool', cmap='Greens', alpha=0.7);\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "stOFt2mulEpQ" }, "source": [ "#### Let's get another library to help visualisation ####\n", "So far, it has been difficult to see what's happening, to identify activities between the buildings and to compare the maps when we have to scroll.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "mPhlEUHClPK4" }, "outputs": [], "source": [ "##codecell_SpatialPatterns_ImportUrLibraries\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 362 }, "colab_type": "code", "id": "6qjbSNcx-BRJ", "outputId": "a1780041-9dec-43aa-be29-25a349586e1d" }, "outputs": [], "source": [ "###codecell_SpatialPatterns_AllFindTypeSidebySide\n", "\n", "# Let's put the maps side by side to help with comparative visualisation.\n", "fig, axes = plt.subplots(ncols=3,figsize=(15, 5))\n", "gabii_textools_counts.plot(column='Loom Weight', cmap='autumn', ax=axes[0], legend=True, legend_kwds={'label': \"Number of Loom weight\",'orientation': \"vertical\"}).axis('equal')\n", "gabii_textools_counts.plot(column='Spindle Whorl', cmap='autumn', ax=axes[1], legend=True, legend_kwds={'label': \"Number of Spindle Whorl\",'orientation': \"vertical\"}).axis('equal')\n", "gabii_textools_counts.plot(column='Spool', cmap='autumn',ax=axes[2], legend=True, legend_kwds={'label': \"Number of Spool\",'orientation': \"vertical\"}).axis('equal')" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "82KrdqwQ-BRM" }, "source": [ "### 4.5.6 Questioning your maps ###\n", "\n", "Can you see any **patterns** here? Do the different types of tools **concentrate** in the same parts of the site? Why might different types of tools have different **distributions**? " ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "i_WBqeTEtcPh" }, "source": [ "
\n", " \n", " ### ~ warning ~ \n", " \n", "\n", "\n", "*OK, this next cell is a big scary cell is because something has broken after I drafted this exercise. Push run to fix the thing they've broken (hopefully).*" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "both", "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "colab_type": "code", "id": "-l6FaKTG92ZJ", "outputId": "1c600f88-7937-4ea6-b876-1fb7358bab32" }, "outputs": [], "source": [ "\n", "#@title\n", "!apt-get install -qq curl g++ make\n", "#@title\n", "!curl -L http://download.osgeo.org/libspatialindex/spatialindex-src-1.8.5.tar.gz | tar xz\n", "#@title\n", "import os\n", "os.chdir('spatialindex-src-1.8.5')\n", "#@title\n", "!./configure\n", "#@title\n", "!make\n", "#@title\n", "!make install\n", "#@title\n", "!pip install rtree\n", "#@title\n", "!ldconfig\n", "#Working through the example at http://toblerity.org/rtree/examples.html\n", "#@title\n", "from rtree import index\n", "from rtree.index import Rtree\n", "#@title\n", "p = index.Property()\n", "idx = index.Index(properties=p)\n", "idx" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "MeUS-o2Jm6dW" }, "source": [ "## 4.6 Quantifying these patterns ##\n", "\n", "### 4.6.1 Using statistics to explore, characterize and quantify spatial patterns\n", "\n", "There is a human limitation to recognise more than 3+ attributes/variables out of 100 point data such as this four-dimensional tic-tac-toe grid:\n", "\n", "
\n", "\n", "That's why we have to use statistics, spatial statistics to recognise and more to the point quantify patterns." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | | \n", "| ------------- | ------------- |------------- |\n", "|
| = |
|\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 781 }, "colab_type": "code", "id": "EVtbdZPW-BRN", "outputId": "f5cb484e-7e48-4dbe-9ae9-2b209a1a5140" }, "outputs": [], "source": [ "##codecell_SpatialPatterns_ImportUrLibraries\n", "\n", "# I think the distributions of different weaving tools vary.\n", "# To investigate further, we are going to need more tools. Specifically we need statistical tools. \n", "# pysal, numpy and sklearn are all useful for statistics. Seaborn is useful for visualisation. \n", "!pip install pysal\n", "import pysal\n", "from sklearn import cluster\n", "import seaborn as sns\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "0qHclmig-BRP" }, "source": [ "### 4.6.2 Data Clustering ### \n", "\n", "We're going to use **cluster analysis** to try and better understand our patterns. Clustering is a broad set of techniques for finding groups within a data set. Cluster analysis has as its objective grouping together similar observations (unlike factor analysis works by searching for similar variables).\n", "
\n", "Given a set of data points, we can use a clustering algorithm to classify each data point into a specific group. The clustering algorithm tests the hypothesis that data points of the same group have similar properties, and that data points in different groups have dissimilar properties. So, when we cluster observations, we want items in the same group to be similar and items in different groups to be dissimilar.
\n", " \n", "M.Fortin & M. Dale (2005) explain that\n", "> \" Arising from time series statistics and the more familiar parametric statistics, spatial statistics quantify the degree of self-similarity of a variable as a function of distance. These spatial statistics assume that, within the study area, the parameters of the function defining the underlying process, such as the mean and the variance, are constant regardless of the distance and direction between the sampling locations. Then the goal of spatial statistics is **to test the null hypothesis of absence of ‘spatial pattern’**. The null hypothesis implies that nearby locations (or attributes, measures) do not affect one another such that there is independence and spatial randomness (Figure 6.2(b)). The **alternatives** are that there is clustering and thus positive spatial autocorrelation (Figure 6.2(a)) or repulsion and negative spatial auotocorrelation (Figure 6.2(c)) \n", "\n", "
\n", "\n", "\n", "
\n", "\n", "**K-Means clustering** is probably the most well-known clustering algorithm that solve clustering problem by splitting a dataset into a set of k (k being an arbitrary number you get to choose) groups. \n", "\n", "We can only recommand Ben Alex Keen [blog]( https://benalexkeen.com/k-means-clustering-in-python/) to see how Python works through clustering! Here is a simple version of K-Means Clustering explained in 5 visual steps:
\n", "\n", "\n", "|Step1|Step2|Step3|Step4|Step5|\n", "|------|------|------|------|------|\n", "|Assign each points to similar centre: we need to identify the number of classes to use by looking at the data and identifying any discrete groupings. |Identify the cluster centroids (3 coloured symbols on graph).|Reassign the points based on the minimum distance or closest from cluster centroids. Here we should emphasise that there many possible definitions that may be used for “closest” (e.g.[nearest neighbour]( https://towardsdatascience.com/machine-learning-basics-with-the-k-nearest-neighbors-algorithm-6a6e71d01761), Ward's method). |Identify the new centroids by taking the average of all points in the cluster. |Reassign the groupings -points and assignment- until points stop changing clusters in a loop. |\n", "|
|
|
|
|
|\n", "\n", "\n", "\n", "Because clustering allows us to identify which things are alike on the basis of multiple characteristics, we will do just that. We will use cluster analysis to see which SUs are similar in terms of all three types of textile tools: loom weights; spools; spindle whorls. In #codecell_SpatialPatterns_ImportUrLibraries, we import a cluster tool from sklearn to help us with it." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "aImzRH4J3Xdp" }, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "#### Clustering types of textile tools ####\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "Yk1ZjHSFoHkK" }, "source": [ "**Let's use python to run K-means function** \n", "\n", "First, we want to cluster together **contexts** where the pattern **of the three types of textile tools** are similar." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 70 }, "colab_type": "code", "id": "XkkAU1rK-BRR", "outputId": "faabfdd7-d7e1-4908-c6dc-9a29efc9081e" }, "outputs": [], "source": [ "##codecell_SpatialPatterns_ClusterTextileTools\n", "\n", "# Next step: cluster together contexts where the pattern of the three types of textile tools are similar, \n", "# with and without respect to the size of the SU.\n", "# In this cell we are including information on the size of the SU\n", "# we will use three functions .cluster.Kmeans, .fit() and .drop():\n", "\n", "km5 = cluster.KMeans(n_clusters=5)\n", "km5cls = km5.fit(gabii_textools_counts.drop(['geometry', 'OBJECTID','DESCRIPTIO','Shape_Length','SU'], axis=1).values)\n", "km5cls\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "34muNG5z6UGu" }, "source": [ "
\n", " \n", " #### Learning a new language – decomposing the code \n", " \n", "\n", "\n", "In #codecell_SpatialPatterns_ClusterTextileTools, by:\n", "* using .cluster.Kmeans() function,\n", " we have arbitrarily given k (n_clusters) an arbitrary value of 5 groups\n", "
\n", "* using .fit() function, we wanted to estimate the best representative function for the data points on account for the size of the context (*'Shape_Area'*) and counts of different types of tools(*Loom Weight, Spindle Whorl, Spool*). \n", "* using .drop() function Drop all the other fields (*'geometry', 'OBJECTID','DESCRIPTIO','Shape_Length','SU'*)." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "BqmxImFv-BRW" }, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "#### Let's visualise this first step ####\n", "\n", "Each cluster produced should contain the **SUs that are similar to one another on the basis of the number of each type of textile tool and the size of the surface area of the SU**. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 845 }, "colab_type": "code", "id": "ZibZbvtQ-BRX", "outputId": "f770b153-ba26-482c-ad84-99907f5c9e1c" }, "outputs": [], "source": [ "##codecell_SpatialPatterns_SubPlotClusterTextileTools\n", "\n", "# Plot the clusters, groups of contexts that have similar textile tool assemblages.\n", "# Give a different colour to the SUs that belong to each cluster.\n", "\n", "#creating a single chart\n", "f1, ax = plt.subplots(1, figsize=(15,15))\n", "#plt.subplots() is a function that returns a tuple (=sequence of immutable Python objects like a list)\n", "#this tuple contains a figure and axes object(s)\n", "#so, when using fig, ax = plt.subplots() you unpack this tuple into the variables fig and ax.\n", "#here f1 is your figure\n", "\n", "#here we want to assign() our new clustering labels 'km5cls.labels_'to the dataframe\n", "#and also plot the cluster ('cl'). (nb: that ax=ax are the features/objects here)\n", "gabii_textools_counts.assign(cl=km5cls.labels_)\\\n", " .plot(column='cl', categorical=True, legend=True, \\\n", " linewidth=0.1, cmap='Accent', edgecolor='white', ax=ax)\n", "\n", "ax.set_axis_off()\n", "#turn x and y-axis off which will affect the axis lines, ticks, ticklabels, grid and axis labels\n", "\n", "#let's see the plot using command plt.show()\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 845 }, "colab_type": "code", "id": "Cee0MaMR-BRc", "outputId": "179b8e11-dbc5-4374-a1f2-4bfbb37308c7" }, "outputs": [], "source": [ "##codecell_SpatialPatterns_ClusterTextileTools_2\n", "\n", "#Do the same as in ##codecell_SpatialPatterns_ClusterTextileTools\n", "#but let's ignore the size of the context so we also .drop('Shape_Area').\n", "\n", "\n", "km5 = cluster.KMeans(n_clusters=5)\n", "km5cls2 = km5.fit(gabii_textools_counts.drop(['geometry', 'OBJECTID','DESCRIPTIO','Shape_Length','SU','Shape_Area'], axis=1).values)\n", "\n", "#we plot as in ##codecell_SpatialPatterns_SubPlotClusterTextileTools\n", "#this time a'f2' and our new cluster 'cl2'\n", "\n", "f2, ax = plt.subplots(1, figsize=(15,15))\n", "\n", "gabii_textools_counts.assign(cl2=km5cls2.labels_)\\\n", " .plot(column='cl2', categorical=True, legend=True, \\\n", " linewidth=0.1, cmap='Accent', edgecolor='white', ax=ax)\n", "\n", "ax.set_axis_off()\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "cGObkUxd-BRf" }, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "## 4.7 Interpreting the results: the cluster way ##\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "YUiX07wuhHnw" }, "source": [ "#### 4.7.1 Visualising clustering analysis differences ####\n", "\n", "The patterns are definitely different. How can we interpret the fact that context size affects the pattern of the distribution of textile tools? Do big units, which perhaps represent dumps or colluvial mashups, have a fundamentally different character than the varied small contexts?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 348 }, "colab_type": "code", "id": "T9dmn2j4-BRg", "outputId": "5b4d5986-9aa7-4ae7-c4da-ba4fc0b61692" }, "outputs": [], "source": [ "##codecell_SpatialPatterns_SubPlotClustersSidebySide\n", "\n", "# Look at the difference with and without context size taken into account.\n", "\n", "#we are plotting just like in##codecell_SpatialPatterns_SubPlotClusterTextileTools\n", "#however we want to see them side by side to see effect of our selection.\n", "#To do so, we define our plt.subplots as having 2 columns (ncols=2)\n", "\n", "fig, axes = plt.subplots(ncols=2,figsize=(15, 5))\n", "\n", "\n", "gabii_textools_counts.assign(cl2=km5cls2.labels_)\\\n", " .plot(column='cl2', categorical=True, legend=True, \\\n", " linewidth=0.1, cmap='Accent', edgecolor='white', ax=axes[0]).axis('equal')\n", "gabii_textools_counts.assign(cl=km5cls.labels_)\\\n", " .plot(column='cl', categorical=True, legend=True, \\\n", " linewidth=0.1, cmap='Accent', edgecolor='white', ax=axes[1]).axis('equal')\n", "\n", "# note here that we have to 2 axes for our to feature classes: \n", "# ax=axes[0] for the first subplot and ax=axes[1] for the second subplot " ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "EhrydzJshzPs" }, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "#### 4.7.1 Characterising our cluster analyses ####\n", "\n", "There are many ways to label your data in matplolib, [here](https://nikkimarinsek.com/blog/7-ways-to-label-a-cluster-plot-python) are some examples." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 197 }, "colab_type": "code", "id": "Zevk9cjl-BRi", "outputId": "00cf37b5-baed-405f-b6c7-fe6e033b3c30" }, "outputs": [], "source": [ "##codecell_SpatialPatterns_AddLabelClusterstoDataframe\n", "\n", "# assign the cluster IDs to each context permanently\n", "gabiitextools_clas = gabii_textools_counts.assign(cl=km5cls.labels_)\n", "gabiitextools_class = gabiitextools_clas.assign(cl2=km5cls2.labels_)\n", "\n", "#and let's check how your dataframe looks like\n", "gabiitextools_class.head()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "nOKka5YAjLWl" }, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "#### 4.7.2 Data interpretation: push your cluster analyses further ####\n", "\n", "Data exploration can help us to understand how well the cluster analysis performed.\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "_J5gcA1Uz7Yg" }, "source": [ "##### Let's compare the effect of area size to one cluster group/class #####\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 348 }, "colab_type": "code", "id": "3AIxnf2H-BRp", "outputId": "840e4c46-f9af-4a86-ad50-e685f2685829" }, "outputs": [], "source": [ "##codecell_SpatialPatterns_SubselectClustergroupN.0\n", "\n", "# Now let's look at some individual classes, with and without context size accounted for in the analyses.\n", "# We choose the cluster group/class 0\n", "# we use .loc[] command (see##codecell_makeabasicmap_BringingUrData2theMap & #codecell__Webmaps&Distributions_SplittingUrData)\n", "# and create 2 dataframes:\n", "# With=> [gabiitextools_class['cl']where we only select (==) group 0\n", "# Without=> [gabiitextools_class['cl2']where we only select (==) group 0\n", "\n", "gabiitextools_class0=gabiitextools_class.loc[gabiitextools_class['cl']==0]\n", "gabiitextools_class0noarea=gabiitextools_class.loc[gabiitextools_class['cl2']==0]\n", "\n", "#we are plotting just like in##codecell_SpatialPatterns_SubPlotClustersSidebySide\n", "\n", "fig, axes = plt.subplots(ncols=2,figsize=(15, 5))\n", "gabiitextools_class0.plot(ax=axes[0], legend=True).axis('equal')\n", "gabiitextools_class0noarea.plot(ax=axes[1]).axis('equal')" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "8Lh1gTWMmwo7" }, "source": [ "##### Effect of changing cluster numbers ####" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 845 }, "colab_type": "code", "id": "KdCiHX2N-BRt", "outputId": "636957ad-8818-4601-8016-28caabaa5a8c" }, "outputs": [], "source": [ "##codecell_SpatialPatterns_ClusterTextileTools_7types_WithContextSize \n", "\n", "# What happens when we change the number of clusters (groups)?\n", "# let's repeat our first analysis (##codecell_SpatialPatterns_ClusterTextileTools) \n", "# and let's keep the context size ('Shape_area')\n", "km7 = cluster.KMeans(n_clusters=7)\n", "km7cls3 = km7.fit(gabii_textools_counts.drop(['geometry', 'OBJECTID','DESCRIPTIO','Shape_Length','SU'], axis=1).values)\n", "f3, ax = plt.subplots(1, figsize=(15,15))\n", "\n", "gabii_textools_counts.assign(cl3=km7cls3.labels_)\\\n", " .plot(column='cl3', categorical=True, legend=True, \\\n", " linewidth=0.1, cmap='Accent', edgecolor='white', ax=ax)\n", "\n", "ax.set_axis_off()\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "mfWmcs0StUCo" }, "source": [ "That also changes things. Without going into too much detail, finding the ideal number of clusters is challenging. There are many statistical techniques to help you choose the best number of clusters. However, this is for a more advanced stats course! For now, just try playing around with the number of clusters in the notebook, or the size cut-off for inclusion. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 845 }, "colab_type": "code", "id": "UwtRjnKo-BRx", "outputId": "e7538a4b-722c-41e5-c057-76a9fe8494ca" }, "outputs": [], "source": [ "##codecell_SpatialPatterns_ClusterTextileTools_7types_WithoutContextSize\n", "\n", "\n", "# Use 7 clusters folowing the same procedure than in ##codecell_SpatialPatterns_ClusterTextileTools_7types_WithContextSize \n", "# and let's drop the context size ('Shape_area')\n", "\n", "km7 = cluster.KMeans(n_clusters=7)\n", "km7cls4 = km7.fit(gabii_textools_counts.drop(['geometry', 'OBJECTID','DESCRIPTIO','Shape_Length','SU','Shape_Area'], axis=1).values)\n", "f4, ax = plt.subplots(1, figsize=(15,15))\n", "\n", "gabii_textools_counts.assign(cl4=km7cls4.labels_)\\\n", " .plot(column='cl4', categorical=True, legend=True, \\\n", " linewidth=0.1, cmap='Accent', edgecolor='white', ax=ax)\n", "\n", "ax.set_axis_off()\n", "\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 197 }, "colab_type": "code", "id": "yfkWHiOZ-BR1", "outputId": "2ca4fa92-3614-4b28-c65b-40e802eec904" }, "outputs": [], "source": [ "##codecell_SpatialPatterns_AddLabelto7ClusterGroups\n", "\n", "# assign the cluster IDs to each context permanently just like in ##codecell_SpatialPatterns_AddLabelClusterstoDataframe\n", "\n", "# Let's set up to investigate some of the individual clusters\n", "gabiitextools_class3=gabiitextools_class.assign(cl3=km7cls3.labels_)\n", "gabiitextools_class4=gabiitextools_class3.assign(cl4=km7cls4.labels_)\n", "gabiitextools_class4.head()\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "b6CSTrxDxGD3" }, "source": [ "##### Let's compare the effect of changing cluster number (from 5 to 7) and area size to one cluster group/class #####" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 624 }, "colab_type": "code", "id": "_8MuyIpe-BR5", "outputId": "83ab6752-409a-446c-ce11-b9f37073caba" }, "outputs": [], "source": [ "##codecell_SpatialPatterns_EffectofClusterNumber&AreaSizetoClustergroupN.0\n", "\n", "#we are following the same step than in ##codecell_SpatialPatterns_SubselectClustergroupN.0 but with our 7 cluster groups/classes\n", "# set up variables to store several classes, with and without context size taken into account.\n", "gabiitextools_class0=gabiitextools_class4.loc[gabiitextools_class4['cl']==0]\n", "gabiitextools_class0noarea=gabiitextools_class4.loc[gabiitextools_class4['cl2']==0]\n", "gabiitextools_k7_class0=gabiitextools_class4.loc[gabiitextools_class4['cl3']==0]\n", "gabiitextools_k7_class0noarea=gabiitextools_class4.loc[gabiitextools_class4['cl4']==0]\n", "\n", "# setting the subplots to the four dataframes, we have just identified above\n", "# this can be done by arranging the subplots like a 2X2 table/array with 'ncols=2' & 'nrows=2'\n", "fig, axes = plt.subplots(ncols=2,nrows=2,figsize=(15, 10))\n", "\n", "gabiitextools_class0.plot(ax=axes[0,0]).axis('equal')\n", "#note here the ax=axes defines the position of the figure in this 2X2 array (in this case is cell column 0, row0)\n", "axes[0,0].set_title(' 5 clusters - area')\n", "#for ease, we have added a title to the subplots\n", "\n", "gabiitextools_class0noarea.plot(ax=axes[0,1]).axis('equal')\n", "axes[1,0].set_title(' 5 clusters - no area')\n", 
"gabiitextools_k7_class0.plot(ax=axes[1,0]).axis('equal')\n", "axes[0,1].set_title('7 clusters - area')\n", "gabiitextools_k7_class0noarea.plot(ax=axes[1,1]).axis('equal')\n", "axes[1,1].set_title('7 clusters - no area')" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "FuUJafidw3Y_" }, "source": [ "##### Let's compare the effect of changing cluster number (from 5 to 7) and area size to other cluster groups/classes #####" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 624 }, "colab_type": "code", "id": "aoiyIy7o-BR8", "outputId": "452957d6-fc96-4b10-f4ce-ff670fb791aa" }, "outputs": [], "source": [ "##codecell_SpatialPatterns_EffectofClusterNumber&AreaSizetoClustergroupN.3\n", "\n", "# now try some different cluster groups\n", "# we are choosing cluster no.3 this time\n", "\n", "#we are following our previous steps\n", "\n", "gabiitextools_class3=gabiitextools_class4.loc[gabiitextools_class4['cl']==3]\n", "gabiitextools_class3noarea=gabiitextools_class4.loc[gabiitextools_class4['cl2']==3]\n", "gabiitextools_k7_class3=gabiitextools_class4.loc[gabiitextools_class4['cl3']==3]\n", "gabiitextools_k7_class3noarea=gabiitextools_class4.loc[gabiitextools_class4['cl4']==3]\n", "\n", "\n", "fig, axes = plt.subplots(ncols=2,nrows=2,figsize=(15, 10))\n", "gabiitextools_class0.plot(ax=axes[0,0]).axis('equal')\n", "axes[0,0].set_title('5 clusters - area')\n", "gabiitextools_class0noarea.plot(ax=axes[0,1]).axis('equal')\n", "axes[1,0].set_title(' 5 clusters - no area')\n", "gabiitextools_k7_class0.plot(ax=axes[1,0]).axis('equal')\n", "axes[0,1].set_title('7 clusters - area')\n", "gabiitextools_k7_class0noarea.plot(ax=axes[1,1]).axis('equal')\n", "axes[1,1].set_title('7 clusters - no area')" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "vOd4CQew-BRw" }, "source": [ "
\n", " \n", " ### Learning a bit more about Cluster analysis \n", " \n", "\n", " \n", "Cluster analysis is an important statistical technique. While not the main focus of this course, it's worth learning more about it. \n", "\n", "We have used K-Means clustering because it presents the advantage to be fast, easy to understand and fairly robust and efficient; however, it has disadvantages too, such as the number of groups/classes selection (and as you have seen which isn’t always inconsequential; ideally, you'd want the algorithm to figure this out). Furthermore, K-Means clustering cannot deal with non-linear dataset and if there are highly overlapping data then k-means will not be able to resolve the differentiation between clusters. Best results can be achieved when datasets are distinct or well separated from each other.\n", "\n", " **Reminder: linear versus non-linear datapoints:**
\n", "
\n", "
\n", "
\n", " \n", "**Assessing spatial patterns: Degree of Clustering** Degree of Clustering is a measure of the degree to which nodes in a graph tend to cluster together. You can follow this [video](https://www.youtube.com/watch?v=K2WF4pT5pFY) to understand the concept behind clustering coefficient and below presents various degree of clustering results. \n", "
\n", "
\n", "**Assessing spatial patterns: Cluster Validity**\n", "
\n", "
\n", "\n", "K-means is not the only clustering technique. I encourage you to do some independent reading on different approaches to clustering. You may want to have a look at [this](https://scikit-learn.org/stable/modules/clustering.html#clustering) to start off. \n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "MgLtpOqX0k4f" }, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "## 4.8 Interpreting the results: Using association and relationships ####\n", "\n", "Although we can see differences, it is difficult to:\n", "\n", "* see them all\n", "* quantify them\n", "* express them\n", "\n", "Let's turn to Statitics to do this for us.
\n", "In *#codecell_SpatialPatterns_ImportUrLibraries*, you have imported [seaborn library](https://seaborn.pydata.org/generated/seaborn.pairplot.html) to do just that.\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "bhAjsR6Y4Fj0" }, "source": [ "
\n", " \n", " ### Learning a bit more about Correlation \n", " \n", "\n", "\n", "Correlations are statistical associations between two variables. When we say that two variables are associated, we are suggesting that they vary (change) together. This implies that the processes that affect these variables may be related. We try to detect correlation to understand the relationship between two variables.\n", "\n", "For example: should we expect the pattern of loom weights to be similar to the pattern of spindle whorls? should we expect them to be correlated?\n", "\n", "\n", "Note that: **Correlation does not imply causation!** That is, the pattern of loom weights does **not** cause or create the pattern of spindle whorls.
\n", "
\n", "* Correlation tests describe the statistical relationship between two different variables, in contrast to comparison tests which evaluate the differences between variables (e.g. t-test, sign test, Mann-Whitney). \n", "* Another important characteristic of the correlation statistics is that they do not only identify relationships but they characterise both the strength and the form of these relationships. \n", "* The robustness of the correlation calculations is, again, evaluated in terms of statistical significance using the p-value. \n", "\n", "The most frequently used Parametric correlation statistic is the **Pearson’s correlation r**. Pearson’s r tests if there is a linear relationship between two variables, and can only range between -1 and 1, where -1 indicates perfect negative **linear** (one to one) correlation and +1 indicates the perfect positive linear correlation (see Figure below). \n", "\n", "|**Examples of Pearson's correlation coefficient (r) for different degrees of linearity**|\n", "|------|\n", "|
|\n", "\n", "
\n", "Note that besides the sign of the correlation (positive or negative) the key term here is the term linear. We can only argue about linear relationships when we apply **parametric** correlation. A general **Non-Parametric** correlation statistic is the Spearman’s ρ (rho) correlation.\n", "\n", "|**Examples of Pearson's correlation coefficient (r) for different degrees of linearity**|\n", "|------|\n", "|
\n", "\n", "
\n", "**Spatial covariation and implied correlation**\n", "\n", "|**Examples of positive and negative covariance**|\n", "|------|\n", "|
\n", "
\n", "As you can see, a fundamental concept is hidden behind these types of statitistical tests, and, in fact behind all statistics, which is parametric vs. nonparametric distributions... you can start finding more about this important set of statistical concepts [here]( https://blog.minitab.com/blog/applying-statistics-in-quality-projects/a-correspondence-table-for-non-parametric-and-parametric-tests) and [here]( https://machinelearningmastery.com/how-to-use-correlation-to-understand-the-relationship-between-variables/).\n", "
\n", "\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.8.1 Effect of Cluster numbers ##\n", "\n", "We will be using [seaborn.pairplot]( https://seaborn.pydata.org/generated/seaborn.pairplot.html) to visualise the relationships between our variables. A pairs plot allows us to see both distribution of single variables and relationships between two variables. You can learn a bit more about it in [Will Koehrsen's blog]( https://towardsdatascience.com/visualizing-data-with-pair-plots-in-python-f228cf529166)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 544 }, "colab_type": "code", "id": "kkk4vCFv-7ur", "outputId": "beb70df6-c635-4ab9-cac1-2c5740d6b56f" }, "outputs": [], "source": [ "##codecell_SpatialPatterns_EffectofClusterNumber\n", "\n", "# Do 7 clusters vs. 5 result in more correlation between the spatial distribution patterns of pairs of find types?\n", "# we are simply using sns.pairplot function to visualize relationships (correlations) between the find types and the clusters they appear in. \n", "# We are asking: do certain pairs of variables appear in the same clusters most of the time? \n", "\n", "sns.pairplot(gabiitextools_k7_class0.drop(['OBJECTID','DESCRIPTIO','Shape_Length','Shape_Area','SU','geometry','cl','cl2','cl3','cl4'], axis=1), kind=\"reg\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.8.2 Effect of Cluster numbers-II ##" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 544 }, "colab_type": "code", "id": "YZe1WBJ_-7M-", "outputId": "495df1df-0fe7-481c-cac3-0724044ba349" }, "outputs": [], "source": [ "##codecell_SpatialPatterns_EffectofClusterNumber\n", "\n", "# Are some cluster classes/groups more correlated than others?\n", "sns.pairplot(gabiitextools_class0.drop(['OBJECTID','DESCRIPTIO','Shape_Length','Shape_Area','SU','geometry','cl','cl2','cl3','cl4'], axis=1), kind=\"reg\")\n", "plt.show()\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "e0gT92Av-U7U" }, "source": [ "## 4.9 What can you conclude? ##\n", "\n", "Is there strong positive or negative correlation between pairs of variables (textile tools and the clusters they end up in)? What might this say about the processes behind these patterns? \n", "\n", "Statistically assessing correlation is a good starting point for understanding patterns in your data. \n", "\n", "Because this isn't a stats course, we're not taking this forward any further today. But it's worth noting that if you want to do spatial analysis properly at some stage you will need to learn a bit about statistics. \n", "\n", "**That concludes this practical**\n", "\n", "Hopefully you have:\n", "* started thinking (and perhaps are a bit confused) about how spatial patterns of different types of finds are created, and how we can interpret them when studying data from an excavation.\n", "* learned to combine spatial data and descriptive tables. \n", "* learned to use some basic clustering tools, and reinforced your knowledge about how to make charts and maps. " ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "XNeDT5daQxgm" }, "source": [ "## 4.10 LexiCode\n", "To re-use the codes - you will need to first load their respective libraries. So far, you have used ...:\n", "\n", "\n", "
\n", "\n", "> **libraries** | | |\n", ">--- |--- | --- | \n", ">folium | numpy | \n", ">branca| rtree | \n", ">pandas| osmnx| \n", ">geopandas| requests | \n", ">seaborn | fiona| \n", ">matplotlib.pyplot | ipywidgets|\n", "> pysal |seaborn |\n", "> \n", "
\n", "\n", " **plugins**| |\n", "--- |--- |\n", "HeatMapWithTime\n", "HeatMap\n", "MeasureControl\n", "PrepareUrBasemaps_CreateLayers from [folium.plugins]\n", "cluster (from sklearn)\n", "\n", "
\n", "\n", "your lexicode is non-exhaustive, keep expanding, find your own 'best way' to reuse these code/scripts...\n", "\n", "
\n", "\n", ">Lexicode_MakingaBasicMap | Lexicode_Webmaps&Distributions |Lexicode_StreetGridOrientations | Lexicode_SpatialPatterns\n", ">--- | --- | ---|---|\n", ">\t== () [] | pd.concat() | { } *subselection from list*|\n", ">.head_csv() | .dtype() | ox.gdf_from_places()|requests.get()|requests.get()\n", ">.read_csv() | astype() | ox.plot_shape()|request.content()\n", ">mean() | fillna()|network_type= ''|.bytes()\n", ">folium.Map | def return |ox.add_edge_bearings(ox.get_undirected())|gpd.GeoDataFrame.from_features()\n", ">range() | .apply(lambda x:*function*,axis=) |count_and_merge()|Set()\n", ">len() | pd.merge() |np.arrange()|pd.value_counts() \n", ">iloc[]| how= , left_index= ,left_index= |np.histogram()|.merge()\n", ">.value_counts()| gpd.GeoDataFrame()| ax.set_theta_location()|.sort_values\n", ">if =:| geometry=gpd.points_from_xy |ax.set_ylim()|cluster.KMeans()\n", ">elif =: |print() |ax.set_title()|.fit()\n", ">else =:| .isin()|ax.set_yticks()|.drop()\n", ">folium.Marker()| classic.plot()|ax.set_xlabels() & ax.set_yticklabels|.assign()\n", ">folium.Icon()| generateBaseMap()|plt.subplots()|plt.show()\n", ">folium.Circle| .groupby(['', ''])|.dropna()|.set_title\n", ">popup= | .reset_index() |polar_plot()|sns.pairplot()\n", ">radius= | max_zoom= |pd.Series()|\n", ">.values.tolist() |folium.TileLayer()|np.pi|\n", "> .add_to()| plugins.DualMap(location= , tiles= , zoom_start= )|\n", "> | \n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.11 References\n", "\n", "Fortin, M., & Dale, M. (2005). Spatial Analysis: A Guide for Ecologists. Cambridge: Cambridge University Press." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "L3bffymKpXqt" }, "source": [ "# Chapter 5: Land use and archaeological visibility \n", "\n", "## 5.1 Introduction: Exercise motivations\n", " \n", " In this exercise, we'll be relating learning to access information about the environment or landscape in a raster format. We'll then be relating the locations of visible sites to various environmental factors.
So far in this course you've been working mostly with vector data (points, lines, polygons) but now you're going to work with rasters as well. Environmental data is often held in a raster format because the data is continuous (see *Raster versus Features* below for more details). Elevation is everywhere, right? So if you have a big image file (a raster) you can have a value for elevation in pretty much every pixel of that file. \n", "\n", "Therefore this week practical lab aims to use rasters to explore (and investigate of course):\n", "* **environmental properties of landscapes** using image rasters\n", "* **physical properties of landscapes** using attribute rasters\n", "\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "9aWKNSMdnE6A" }, "source": [ "## 5.2 Rasters versus Features ##\n", "\n", "In #*codecell_StreetGridOrientations_geodataframe*, we have started to talk about vector feature data and why topology characterisation was important in geographical information systems. Vectors are not the only way to represent the world and what's in it, raster is another format used ... because:\n", "*\tDifferent representation is more suitable for different types of data\n", "*\tDifferent representation is more suitable for different types of data processing and analysis \n", "\n", "> ||\n", ">--- |--- |\n", ">|\n", ">
|\n", ">
|\n", ">
|\n", "\n", "There are different **types** of raster data Image rasters and attributes rasters as illustrated below:\n", "> ||\n", ">--- |--- |\n", ">Image raster (geology.com)|Attributes rasters (met.office)|\n", ">
|
|\n", "\n", "A raster format file is composed of **cells** where each each cell have a **pixel** value (=picture element and relates to resolution & to their position on the surface) and a **grid** cell value (size in units) and each cell has **attributes** stored in the array. For instance, this attribute can relate to spectral reflectance (ranging from 0-255), colour (RGB is 3 attributes) or numeric values (which can be associated with a full description through a join with a Look-Up_Code -LU_code- such as a reference tables for Land Use map).

\n", "\n", ">|*Raster Characteristics*|*Example of a Raster Array composition*|\n", ">|--- |--- |\n", ">|
|
\n", "\n", "**Raster Chararcteristics** \n", "* Space divided into cells of the same size (tessellation)- usually square\n", "* Size of cell determines resolution\n", "* Feature location defined by position within grid (row, column) - usually implicit\n", "* Cells homogenous with respect to attribute\n", "* Grids represent single attribute\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | | \n", "| ------------- | ------------- |------------- |\n", "|
| = |
|\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 344 }, "colab_type": "code", "id": "LIdl_WyK_FSH", "outputId": "60bf2d9a-8633-4570-eb36-8d80a9770c4b" }, "outputs": [], "source": [ "##codecell_RasterLandscape_ImportUrLibraries\n", "\n", "!pip install rasterio\n", "!pip install elevation\n", "!pip install richdem" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "UIm-91A4vZg4" }, "source": [ "
\n", "\n", "\n", "\n", "In *#codecell_RasterLandscape_ImportUrLibraries*, we are using reference data for this practical which include:\n", "* *rasterio* reads and writes raster formats to manipulate them and creates new ones ([rasterio ref](https://0rasterio.readthedocs.io/en/stable/quickstart.html)); \n", "* *richdem* are sets of digital elevation model or digital representaion of elevation heights in a three-dimensional space/landscape ([richdem ref](https://richdem.readthedocs.io/en/latest/)); \n", "* *elevation* provides the height data ([pypi ref](https://pypi.org/project/elevation/) & [elevation intensive tutorials](https://www.earthdatascience.org/tutorials/python/elevation/)).\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 53 }, "colab_type": "code", "id": "ToBCOWiGArka", "outputId": "b1e88199-5166-4ff3-cc67-4b9d46f7288b" }, "outputs": [], "source": [ "##codecell_RasterLandscape_ImportUrLibraries\n", "\n", "import rasterio \n", "# import the main rasterio function\n", "#read, read-write, and write access to raster data files\n", "\n", "import matplotlib \n", "# We have use this library lab4SpatialPatterns\n", "# matplotlib is the primary python plotting and viz library\n", "\n", "%matplotlib inline \n", "# this bit of magic allows matplotlib to plot inline in a jupyter notebook\n", "\n", "\n", "# We can check which version we're running by printing the \"__version__\" variable\n", "print(\"rasterio's version is: \" + rasterio.__version__)\n", "print(rasterio)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "F7CtijN0poYo" }, "source": [ "## 5.3 Raster dataset to inform on the landscape environmental properties: Soil Type ##\n", "\n", ">|Implications of spatial detail of land use / land cover? Implications of temporal detail of land use / land cover mapping episodes?|Land cover indicates the physical land type such as forest or open water whereas land use documents how people are using the land|\n", ">|--- |--- |\n", ">|
| Why do archaeologists care about land cover and land use? - Links past and present human activity to vegetation and soil conditions - Supports spatially explicit modelling of past environments - Provides insights into archaeological preservation and visibility|\n", "\n", " \n", "\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "yBptREpClWjn" }, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "### 5.3.1 Let's open a raster image ###" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 35 }, "colab_type": "code", "id": "0Y7Sr4pTBd3N", "outputId": "8b4aa89d-5441-4db3-9620-6a64f87898be" }, "outputs": [], "source": [ "##codecell_RasterLandscape_OpenImage\n", "\n", "# filepath to our image\n", "img_fp = 'http://ropitz.github.io/digitalantiquity/data/LE70220492002106EDC00_stack.gtif'\n", "\n", "# Open a geospatial dataset\n", "# open() handles your file\n", "# this function allows you to open a file and return it as a file object\n", "dataset = rasterio.open(img_fp)\n", "print(dataset)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "UFmF6nmnEebY" }, "source": [ "### 5.3.2 Which kind of Raster it is? ###\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "\n", "Let's learn some basic things about the image data." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 455 }, "colab_type": "code", "id": "9u2F3TZJD7mj", "outputId": "25d58980-b192-4bd3-cc32-5d675361586b" }, "outputs": [], "source": [ "##codecell_RasterLandscape_ImageCharacteristics\n", "\n", "# what is the name of this image\n", "img_name = dataset.name\n", "print('Image filename: {n}\\n'.format(n=img_name))\n", "\n", "# How many bands does this image have?\n", "num_bands = dataset.count\n", "print('Number of bands in image: {n}\\n'.format(n=num_bands))\n", "\n", "# How many rows and columns?\n", "rows, cols = dataset.shape\n", "print('Image size is: {r} rows x {c} columns\\n'.format(r=rows, c=cols))\n", "\n", "# Does the raster have a description or metadata?\n", "desc = dataset.descriptions\n", "print('Raster description: {desc}\\n'.format(desc=desc))\n", "\n", "# What driver was used to open the raster?\n", "driver = dataset.driver\n", "print('Raster driver: {d}\\n'.format(d=driver))\n", "\n", "# What is the raster's projection?\n", "proj = dataset.crs\n", "print('Image projection:')\n", "print(proj, '\\n')\n", "\n", "# What is the raster's \"geo-transform\"\n", "gt = dataset.transform\n", "\n", "print('Image geo-transform:\\n{gt}\\n'.format(gt=gt))\n", "\n", "# Does the raster have a metadata?\n", "metadata = dataset.meta\n", "print('All raster metadata:')\n", "print(metadata)\n", "print('\\n')" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "B4OsIqTXDOk3" }, "source": [ "
\n", " \n", " ### Learning a new language – decomposing the code \n", " \n", "\n", "\n", "\n", "In *#codecell_RasterLandscape_OpenImage*, rasterio allows you to open the raster via the .open() function and returns it as a Dataset object.
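\n",
"\n",
"As a small aside (a sketch, not a replacement for the cell above), the same rasterio .open() call can be wrapped in a with block so that the file is closed automatically once you are done with it:\n",
"\n",
"```python\n",
"import rasterio\n",
"\n",
"img_fp = 'http://ropitz.github.io/digitalantiquity/data/LE70220492002106EDC00_stack.gtif'\n",
"with rasterio.open(img_fp) as src:\n",
"    # src behaves like the dataset object above while the block is open\n",
"    print(src.count, src.shape)\n",
"```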
\n", "\n", "In *#codecell_RasterLandscape_ImageCharacteristics*, you wanted to know a bit more about this raster, in particular its **characteristics** & **metadata**, by asking for information on:\n", "\n", "* the image name, the number of bands and the image size, which describes the array in terms of its number of rows and columns:
\n", "\n", "\n", "
\n", "
\n", "\n", "* the raster's description and its file format (driver). Here, the description covers the 8 bands (*see above, from the command dataset.count*):\n", " \n", "\n", "
\n", "
\n", "\n", "* the image's projection:\n", " \n", "\n", "
\n", "
\n", "EPSG:32615 (WGS 84 / UTM zone 15N) is a projected coordinate system used principally in the northern hemisphere, between 96°W and 90°W (red box in image below), for large and medium scale topographic mapping and engineering surveys. This [webpage](https://spatialreference.org/ref/epsg/wgs-84-utm-zone-15n/) provides you with other possible formatting/coding options for using this projection in other software.
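\n",
"\n",
"If you want to check a projection in code (a wee sketch, not part of the original exercise, reusing the dataset object opened above), rasterio can build a CRS object from an EPSG code so you can compare it with the image's own projection:\n",
"\n",
"```python\n",
"from rasterio.crs import CRS\n",
"\n",
"# build the reference CRS from its EPSG code...\n",
"utm15n = CRS.from_epsg(32615)\n",
"print(utm15n)\n",
"\n",
"# ...and compare it with the projection read from the image\n",
"print(dataset.crs == utm15n)\n",
"```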
\n", "
\n", "
\n", "\n", "* by calling dataset.transform, you are asking how the original image pixels have been transformed to the projected coordinates:\n", " \n", "\n", "
\n", "
\n", "\n", "*what does this mean?* \n", " \n", "

\n", "\n", "
\n", "You can see that the answer from dataset.transform is very much like what you've experienced in lab3_StreetGrid_Orientation with the north-up transformation, when we needed to rearrange the street axes to run northward (*##codecell_StreetGridOrientations_RearrangeAxes*), and what you have seen in lab2, especially in *#codecell__Webmaps&Distributions_CheckingZeData*. In our case, the results are showing an affine transformation:\n", "
\n", "\n", "
\n", " \n", "
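\n",
"\n",
"To make the affine transformation a bit more concrete, here is a minimal sketch (reusing the dataset object from above) that turns pixel positions into projected coordinates and back again:\n",
"\n",
"```python\n",
"# the affine transform maps (column, row) pixel indices to projected x, y coordinates\n",
"x, y = dataset.transform * (0, 0)   # top-left corner of the image\n",
"print(x, y)\n",
"\n",
"# rasterio wraps the same idea for you:\n",
"# xy() gives the coordinates of a row/column (the pixel centre by default)\n",
"# index() goes back from coordinates to a row/column\n",
"print(dataset.xy(0, 0))\n",
"print(dataset.index(x, y))\n",
"```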
\n", "...but there are other ways to transform an image, for instance by using a conformal or polynomial transformation for a 2D projection (x & y axes). You can see below what a 2D transformation can do to your image (if you want a more detailed explanation of how and why coordinates are transformed, see [here](https://kartoweb.itc.nl/geometrics/Coordinate%20transformations/coordtrans.html)):\n", " \n", "

\n", "\n", "
\n", "\n", "* Finally, we are calling for the raster metadata, which is an important part of your data that you will need to provide:\n", "\n", "\n", "
\n", "\n", "---\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "kA5qzNQMEp1N" }, "source": [ "\n", "### 5.3.3 Image raster bands: let's visualise ###\n", "That's the fun part...\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "Sqct6oc0ZA_v" }, "source": [ "
\n", " \n", " #### Learning a bit more about image bands \n", " \n", "\n", "\n", " \n", "**But what are image bands?**\n", " \n", "A raster dataset contains one or more layers called bands. For instance, a colour image can have three bands using RGB, with 1 band Red, 1 band Green, and 1 band Blue:
\n", "
\n", "
\n", " \n", "
An image can also be a multispectral image with many bands, as in remote sensing data, where different parts of the electromagnetic spectrum are recorded as separate bands and combined into a multiband raster dataset. This is where every cell location has more than one value associated with it. You might have heard of Near Infrared (NIR): these raster images can have 8 or more bands, and they often come with each band as a separate file (bands 2, 3, 4 and 8 being blue, green, red, and NIR respectively) (see more [here](https://www.usgs.gov/faqs/what-are-band-designations-landsat-satellites?qt-news_science_products=0#qt-news_science_products)). Infrared (and ultraviolet) are the bits of the spectrum we cannot see with our own eyes.
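\n",
"\n",
"To see what 'bands' means in practice, here is a tiny made-up sketch (nothing to do with the exercise data) that builds a 2 x 2 pixel image with three bands (red, green and blue) as a numpy array and displays it:\n",
"\n",
"```python\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# a 2 x 2 pixel image with 3 bands: red, green, blue\n",
"rgb = np.zeros((2, 2, 3), dtype=np.uint8)\n",
"rgb[0, 0] = [255, 0, 0]    # a red pixel\n",
"rgb[0, 1] = [0, 255, 0]    # a green pixel\n",
"rgb[1, 0] = [0, 0, 255]    # a blue pixel\n",
"rgb[1, 1] = [255, 255, 0]  # red + green = yellow\n",
"\n",
"plt.imshow(rgb)\n",
"plt.show()\n",
"```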

\n", "
\n", "
\n", "
\n", "
\n", "*You can learn more about how band combinations work in [this](http://biodiversityinformatics.amnh.org/interactives/bandcombination.php) interactive platform, using 3 types of remotely sensed images (MODIS, ASTER and Landsat)*.

\n", "And, an image can also have just one band, as in a digital elevation model (DEM), where each cell holds an elevation value.\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "EqqU-JX7Jcrp" }, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "The rasterio dataset object we've created contains a lot of useful information but it is not ready to be read in the raster image. Instead, we will need to access the raster's bands using the read() ) method:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 35 }, "colab_type": "code", "id": "UdPP46lfEqbg", "outputId": "be3b00eb-93e3-4c24-fad0-72f1cc136e10" }, "outputs": [], "source": [ "##codecell_RasterLandscape_CheckShapOfBands\n", "\n", "# Let's open fourth band in our image\n", "# read() function that allow us to specify a subset of the raster band in the brackets ()\n", "nir = dataset.read(4)\n", "\n", "# Using function .shape we want to check out the dimensions of the image\n", "\n", "nir.shape \n", "# #same as in ##codecell_RasterLandscape_ImageMetadata \n", "# by using .read() for all bands, the answer should have been (8, 250, 250)\n", "# .shape answer provides you with # bands, rows, cols\n", "# but images can be very very large files, so it is better to do this bit by bit" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | \n", "| ------------- | \n", "|
|\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "sKrcNQ-cFCqn" }, "outputs": [], "source": [ "##codecell_RasterLandscape_ImportUrLibraries\n", "\n", "#we get numpy again\n", "# because raster tends to be large(heavy) files, \n", "#numpy allows to represent the data very efficiently without using a lot of computer memory \n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 90 }, "colab_type": "code", "id": "G3huVdTmFFGX", "outputId": "edbe346b-ac73-4133-8759-703b61befee2" }, "outputs": [], "source": [ "##codecell_RasterLandscape_BandsDataType\n", "\n", "# What are the band's datatypes?\n", "# remember we have used the dtype function in lab2Webmaps&Distribution\n", "datatype = dataset.dtypes\n", "print('Band datatypes: {dt}'.format(dt=datatype))\n", "\n", "# How about some band statistics?\n", "band_mean = np.mean(nir)\n", "band_min = np.amin(nir)\n", "band_max = np.amax(nir)\n", "band_stddev = np.std(nir)\n", "print('Band range: {minimum} - {maximum}'.format(maximum=band_max,\n", " minimum=band_min))\n", "print('Band mean, stddev: {m}, {s}\\n'.format(m=band_mean, s=band_stddev))" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "l-K8zVywQ8eo" }, "source": [ "
\n", " \n", " ### Learning a new language – decomposing the code \n", " \n", "\n", "\n", "In *#codecell_RasterLandscape_BandsDataType*, we are asking numpy (np) to:\n", "* tell us which ( datatype ) are the bands. the answer is *'int16'* which is an integer of a storage capacity of 16 bit. Each type of integer has a different range of storage capacity:\n", "\n", " Type -- Capacity\n", " \n", " Int16 -- (-32,768 to +32,767)\n", "\n", " Int32 -- (-2,147,483,648 to +2,147,483,647)\n", "\n", " Int64 -- (-9,223,372,036,854,775,808 to +9,223,372,036,854,775,807)\n", "\n", "* provide us with the mean ( np.mean() ), minimum ( np.amin() ), maximum ( np.amax() ) and its standard deviation ( np.std() ). So, this is just like asking how the data (band no.4) is spread really:\n", "\n", "\n", "\n", "and how the data is distributed:\n", "\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "colab_type": "code", "id": "oOl-BVwVFIna", "outputId": "bd0d540c-ae9f-4304-fece-15a5eb0e3035" }, "outputs": [], "source": [ "##codecell_RasterLandscape_ImageDisplay\n", "\n", "#read in the image \n", "full_img = dataset.read()\n", "\n", "#we are checking the .shape of the image (do I have all my bands? See ##codecell_RasterLandscape_CheckShapOfBands)\n", "full_img.shape \n", "\n", "# we are adding a wee bit of rasterio library here \n", "# import the show function which allows us to display the image\n", "from rasterio.plot import show \n", "\n", "#then print the nir band with the whole image shape\n", "#we would like to have a grayscale map (cmap see also ##codecell_SpatialPatterns_PlottingUrData)\n", "print(\"Image dimensions: \", full_img.shape)\n", "show(nir, transform=dataset.transform, cmap='gray')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "_gfI3SxPFV37" }, "outputs": [], "source": [ "##codecell_RasterLandscape_ImportUrLibraries\n", "\n", "#more tools\n", "import matplotlib # matplotlib is the primary python plotting and viz library\n", "import matplotlib.pyplot as plt\n", "\n", "# this bit of magic allows matplotlib to plot inline in a jupyter notebook\n", "#see also lab4_spatial patterns\n", "%matplotlib inline \n", "import folium # folium is an interactive mapping library\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "VOxbNdV_qEAC" }, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "\n", "## 5.4 Raster dataset & physical properties of landscape: Vegetation ## \n", "Some raster datasets will only have one band. These behave a bit like a black and white image. Rasters can contain as many bands as you like, and single band images can be combined into multiple band images (see above in *But what are image bands?*). Multiple band images are useful because you can calculate ratios between the different bands to enhance the visibility of certain features in your image." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "nVfS6aLEdsRO" }, "source": [ "### 5.4.1 Let's visualise all bands ###" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 35 }, "colab_type": "code", "id": "k9fQgZ3dFiLa", "outputId": "bf68d04e-84c3-4f19-d902-d42e044910e4" }, "outputs": [], "source": [ "##codecell_RasterLandscape_RasterCombiningBands\n", "\n", "# to make a multiple band raster add all the file paths to a list that we call s2_bands\n", "s2_bands = [\n", " \"http://ropitz.github.io/digitalantiquity/data/sentinel-2/2018-10-13_Sentinel-2BL1C_B02.tiff\",\n", " \"http://ropitz.github.io/digitalantiquity/data/sentinel-2/2018-10-13_Sentinel-2BL1C_B03.tiff\",\n", " \"http://ropitz.github.io/digitalantiquity/data/sentinel-2/2018-10-13_Sentinel-2BL1C_B04.tiff\",\n", " \"http://ropitz.github.io/digitalantiquity/data/sentinel-2/2018-10-13_Sentinel-2BL1C_B08.tiff\"\n", "]\n", "\n", "# open these files and add all bands together \n", "# and you are setting an argument 'for' 'in' 'with' 'as'\n", "# in new dataframe name for this list => arrs [] \n", "# in which we are appending the files\n", "\n", "arrs = []\n", "for band in s2_bands:\n", " with rasterio.open(band) as f:\n", " arrs.append(f.read(1))\n", "\n", "# convert the list of tiffs(our multiuple bands appended into 1 list = arrs) to a numpy array\n", "# if you forgot what was an array you simply revisit decomposing the code of #codecell_Webmaps&Distributions_AllNumbers \n", "sentinel_img = np.array(arrs, dtype=arrs[0].dtype)\n", "\n", "# let's check the shape of this array\n", "sentinel_img.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 237 }, "colab_type": "code", "id": "_UYq2o5tIOjD", "outputId": "a49b550b-74ad-4ac8-bf47-1ccfcf63fdfe" }, "outputs": [], "source": [ "##codecell_RasterLandscape_MultibandRasterClipped&View\n", "\n", "#clip to get a smaller area and show the multiple band image data\n", "#this is done by setting the extents of the y and x axis to the desired (here smaller)extent \n", "#using our subset special symbol []\n", "clipped_img = sentinel_img[:, 0:750:, 0:1500]\n", "clipped_img.shape\n", "show(clipped_img[[2,1,0], :, :])\n", "#we only asked to plot the first 3 bands RGB [3,2,1]" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "hvU31vVSLj8S" }, "source": [ "
\n", " \n", " ### Learning a new language – decomposing the code \n", " \n", "\n", "\n", "In *#codecell_RasterLandscape_MultibandRasterClipped&View*, you have clipped the image by using the subset [ ], and you then asked rasterio to show() and plot only the raster's first three bands, so it will be an RGB image." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "od_q4SMLqefD" }, "source": [ "\n", "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "### 5.4.2 Let's check what's going on in each band by plotting an histogram ###\n", "\n", "You can see the different information held in each band by plotting a histogram of the raster dataset pixel values." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 295 }, "colab_type": "code", "id": "3WrVkXQfIbIB", "outputId": "2f9d0765-664c-4f20-b2bf-e0fcbc36f24f" }, "outputs": [], "source": [ "##codecell_RasterLandscape_MultibandRasterHistogram\n", "\n", "# plot a histogram of the data in each band\n", "#revisit your lab SpatialPatterns for further info on how to plot with matplotlib\n", "rasterio.plot.show_hist(clipped_img, bins=50, histtype='stepfilled', lw=0.0, stacked=False, alpha=0.3)\n", "\n", "#here note bin= corresponds to the interval used between pixel values " ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "XEFGBBRjIie7" }, "source": [ "Even from simple visualisations, we can see the contrast between the red and the near-infared (NIR) bands. In the histogramm, the peak at 255 is simply from areas with no data (labelled 255 on all bands). \n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 35 }, "colab_type": "code", "id": "D3tnxTZOIjcz", "outputId": "bf8107ed-c76a-45e1-e456-f3ad1e157704" }, "outputs": [], "source": [ "##codecell_RasterLandscape_MultibandRasterNodata\n", "\n", "#we can ckeck where on the image are the cell with no value\n", "# we use the clipped image and subset to only [:0,0] data\n", "clipped_img[:,0,0]" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "o7IIARF0ql8n" }, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "### 5.4.3 Let's interpret what's going on in each band ####\n", "\n", "Different band ratios will highlight different features in a raster dataset. We might be interested in finding places where cropmarks are located, for example. \n", "\n", "There are lots of band ratios that will highlight more green and more stressed vegetation. A common ratio that does this is called **NDVI- the normalized difference vegetation index**.
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "*How does this work?* \n", "
\n", "The density of vegetation (NDVI) at a certain point of the image is equal to the difference in the intensities of reflected light in the near-infrared and red ranges divided by the sum of these intensities. The NDVI formula is based on how much red and NIR light is reflected by the plant leaves, and can be formulated as follows:\n", "\n", "NDVI = (NIR - Red) / (NIR + Red)\n", "\n", " \n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "Iu5JDFItIm51" }, "outputs": [], "source": [ "##codecell_RasterLandscape_NDVI\n", "\n", "# let's calculate NDVI\n", "# but we want to make sure that to ignore errors due to NaN pixels \n", "#you remember these... revisit in #codecell__Webmaps&Distributions_ChangingDataType\n", "\n", "np.seterr(divide='ignore', invalid='ignore')\n", "\n", "bandNIR = clipped_img[3] # fourth band\n", "bandRed = clipped_img[2] # second band\n", "\n", "#Let's now apply the NDVI formula\n", "# note that in python division of integers leads to integers\n", "# so we need to specify (.astype) floats in order to get floats for NDVI results\n", "# if you forgot what this means, revisit #codecell__Webmaps&Distributions_ChangingDataType\n", "\n", "ndvi = (bandNIR.astype(float)-bandRed.astype(float))/(bandNIR.astype(float)+bandRed.astype(float))\n", "#we now have a simple matrix" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 252 }, "colab_type": "code", "id": "-xhq-V7QIxYW", "outputId": "50b81aac-9c0a-41b4-c068-5f061612076a" }, "outputs": [], "source": [ "##codecell_RasterLandscape_showNDVI\n", "\n", "#show NDVI\n", "\n", "#here we are using matplotlib instead of the rasterio \n", "# because we are now dealing with a matrix\n", "plt.imshow(ndvi, cmap=\"RdYlGn\")\n", "plt.colorbar()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "eWjqd5osVtzb" }, "source": [ "There is a lot of red for the water and green for the vegetation, so the formula looks like to apply correctly." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "dydmvQsLd0c8" }, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "## 5.5 Raster dataset & physical properties of landscapes: Topography ## " ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "_I-eT9OiM8Ng" }, "source": [ "In this second part of the exercise, we'll be calculating some properties of the landscape like **slope** and **aspect** that are commonly used in analyses and models of where sites are likely to be visible.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 271 }, "colab_type": "code", "id": "6VbXF4Dn_crq", "outputId": "89a078ff-c350-46c0-b399-32c502ef3476" }, "outputs": [], "source": [ "##codecell_RasterLandscape_ImportUrLibraries\n", "\n", "#let's work with geopandas again\n", "!pip install geopandas" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "YwXML27MJsxW" }, "outputs": [], "source": [ "##codecell_RasterLandscape_ImportUrLibraries\n", "\n", "#more tools. all the tools really!\n", "from shapely.geometry import mapping\n", "from shapely.geometry import Point\n", "from rasterio.mask import mask\n", "import geopandas as gpd\n", "import zipfile\n", "import io\n", "import requests\n", "import elevation\n", "import richdem as rd\n", "import gdal\n", "import os" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "9oC0ip5sa9ss" }, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "### 5.5.1 Attribute raster: elevation dataset ###\n", "\n", "And now get some data to start the second part of the exercise- a raster dataset representing elevation referred to as a **DEM** or 'digital elevation model'. \n", "\n", "You may want to revisit Chap 3-3 showing how data can be interpolated. However, there are many ways (>40! with the most common IDW, Spline, Trend Surface, Natural Neighbour, Kriging) to interpolate elevation points to create a surface.
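\n",
"\n",
"As a very small illustration of the idea (a sketch with made-up points, not the method actually used to build the elevation data below), scattered elevation measurements can be interpolated onto a regular grid, for example with scipy:\n",
"\n",
"```python\n",
"import numpy as np\n",
"from scipy.interpolate import griddata\n",
"\n",
"# a handful of made-up surveyed points: their x, y positions and elevations z\n",
"points = np.array([[0, 0], [0, 10], [10, 0], [10, 10], [5, 5]])\n",
"z = np.array([100.0, 120.0, 110.0, 130.0, 150.0])\n",
"\n",
"# a regular grid covering the same area\n",
"grid_x, grid_y = np.mgrid[0:10:50j, 0:10:50j]\n",
"\n",
"# interpolate the scattered points onto the grid (linear here; many other methods exist)\n",
"dem_grid = griddata(points, z, (grid_x, grid_y), method='linear')\n",
"print(dem_grid.shape)\n",
"```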
\n", "\n", "| | | | | \n", "| ------------- |-------------|-------------|-------------|\n", "|||||\n", "| The raw point cloud contains terrain and vegetation and surface objects|The filtered point cloud contains only selected classes of points, like terrain points|The bare earth terrain model is made from the classified point cloud|visualizations are created based on the bare earth terrain model| \n", "\n", "*(Images: Opitz, Nuninger, and Fruchart. Data: MSHE C.N. Ledoux)*" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "GPf59uViMWjt" }, "outputs": [], "source": [ "##codecell_RasterLandscape_GetaDEM&clipIt\n", "\n", "#fetch elevation data from the SRTM server and clip to our area of interest\n", "dem_path = os.path.join(os.getcwd(),'areaDEM.tif')\n", "elevation.clip(bounds=(5.1, 43.65, 5.5, 43.95), output=dem_path)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 257 }, "colab_type": "code", "id": "xtyeWz48kCUE", "outputId": "0720d74e-8338-41fb-b70f-ab2a7aaccb5e" }, "outputs": [], "source": [ "##codecell_RasterLandscape_LetsPlotDEM\n", "\n", "#check the DEM has loaded nicely by plotting it\n", "areaDEM = rd.LoadGDAL(dem_path)\n", "plt.imshow(areaDEM, interpolation='none')\n", "plt.colorbar()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "ktgtma3coyVs" }, "source": [ "\n", "\n", "### 5.5.2 What can we do with a DEM? ##\n", "\n", "DEMS are useful for 3D visualisation, mapping purposes and various other applications, and, can also derive additional types of topographical models from a DEM such as slope, aspect, hillshape, 3d surface views, visibility analysis.\n", "\n", "DEMs can be then analysed:\n", "\n", "| DEM analysis | 3D Analysis | \n", "| ------------- |-------------|\n", "|
|
|\n", "\n", "\n", "\n", "\n", "We will compute and plot in this practical lab **slope** and **aspect**. \n", "\n", "***Why use a Slope model?***\n", "*\tPredicting landslide risk\n", "*\tChoosing optimal alignment for new roads\n", "*\tDesigning drainage networks\n", "*\tLand suitability (e.g. residential construction, commercial forest plantation)\n", "*\tAssessing vulnerability to coastal erosion\n", "*\tOptimising route for hiking, cycling, walking.\n", "\n", "***What can we use an Aspect Model for?***\n", "*\tChoosing optimal location for a wind farm\n", "*\tDesigning layout of new housing developments\n", "*\tDefining slopes suitable for commercial crops\n", "*\tInferring soil moisture content and land slip risk (e.g. north and west facing slopes in Scotland generally wetter)\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "i-p2Bq6prCTS" }, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "### 5.5.3 Slope mapping###" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 393 }, "colab_type": "code", "id": "YmnSX3kUhG88", "outputId": "17b2a70c-5ee7-4d09-b6a5-eb71fe864b70" }, "outputs": [], "source": [ "##codecell_RasterLandscape_DEMSlope\n", "\n", "slope = rd.TerrainAttribute(areaDEM, attrib='slope_riserun')\n", "rd.rdShow(slope, axes=False, cmap='magma', figsize=(8, 5.5))\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "b3IHT7jEoenx" }, "source": [ "
\n", " \n", " ### Learning a new language – decomposing the code \n", " \n", "\n", "\n", "\n", "In *#codecell_RasterLandscape_DEMSlope*, you have used the rd.TerrainAttribute() function to compute the **slope** for each pixel containing an elevation value, and used the rd.rdShow() function to visualise the slope map.\n", "
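\n",
"\n",
"For example (a small variation on the cell above, assuming the attribute names listed in the richdem documentation), you could ask for the slope in degrees rather than as a rise/run ratio:\n",
"\n",
"```python\n",
"# the same call as above, but asking richdem for the slope in degrees\n",
"slope_deg = rd.TerrainAttribute(areaDEM, attrib='slope_degrees')\n",
"rd.rdShow(slope_deg, axes=False, cmap='magma', figsize=(8, 5.5))\n",
"plt.show()\n",
"```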
\n", "There are multiple ways to represent the slope values; read the richdem docs (see the lexicode for a link) for more options!\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "BTH_cIYurHZo" }, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "### 5.5.4 Aspect mapping ###" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 392 }, "colab_type": "code", "id": "4RNQ4_gPpFbm", "outputId": "699fa948-69ae-483a-8682-9c68b81e600b" }, "outputs": [], "source": [ "##codecell_RasterLandscape_DEMAspect\n", "\n", "# now do the same thing to calculate and plot the aspect data\n", "aspect = rd.TerrainAttribute(areaDEM, attrib='aspect')\n", "rd.rdShow(aspect, axes=False, cmap='jet', figsize=(8, 5.5))\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "tROFCIP0pycv" }, "source": [ "
\n", " \n", " ### Learning a new language – decomposing the code \n", " \n", "\n", "\n", "\n", "In *#codecell_RasterLandscape_DEMAspect*, you have used the rd.TerrainAttribute() function to compute the **aspect** for each pixel containing an elevation value, and used the rd.rdShow() function to visualise the aspect map.\n", "
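\n",
"\n",
"If it helps interpretation, here is a minimal sketch (assuming the aspect values are compass degrees, 0-360, measured clockwise from north, and ignoring any nodata cells) that groups each pixel into one of the four main facing directions:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# group the aspect raster computed above into four broad facing directions\n",
"aspect_arr = np.asarray(aspect)\n",
"labels = np.array(['N', 'E', 'S', 'W'])\n",
"sector = ((aspect_arr + 45) // 90).astype(int) % 4\n",
"facing = labels[sector]\n",
"print(facing[:3, :3])  # a peek at the top-left corner\n",
"```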
" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "rMCr440t087F" }, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "### 5.5.5 Contour mapping ###\n", "Contours, a common way to visualize elevation data, can also be derived from a raster.\n", "\n", "*visual representation of Iceland’s elevation levels with contours, published by CartoDB*
\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "b4Tf20aB0D-H" }, "outputs": [], "source": [ "##codecell_RasterLandscape_DEMrastertoArray\n", "\n", "#this script is using gdal library (see lexicode)\n", "#to avoid having to bring your raster block by block\n", "\n", "\n", "# we need to open the raster file using gdal library\n", "#and essentially let gdal read what is in our 1 band DEM\n", "gdal_data = gdal.Open(dem_path)\n", "gdal_band = gdal_data.GetRasterBand(1)\n", "nodataval = gdal_band.GetNoDataValue()\n", "\n", "# let's convert to a numpy array\n", "data_array = gdal_data.ReadAsArray().astype(np.float)\n", "data_array\n", "\n", "# replace missing values if necessary\n", "# you knopw this by now ... we are talking about NaN values (the very annoying ones)\n", "if np.any(data_array == nodataval):\n", " data_array[data_array == nodataval] = np.nan" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 472 }, "colab_type": "code", "id": "Pzt-Ih8Xz7VB", "outputId": "a416b1f6-90b5-4409-bd49-de9b020ec45a" }, "outputs": [], "source": [ "##codecell_RasterLandscape_DEMrasterPlotCountoursLines\n", "\n", "#Plot out data using Matplotlib tool 'contour'\n", "#the script layout is very similar to last week's practical on spatialpatterns\n", "\n", "fig = plt.figure(figsize = (12, 8))\n", "ax = fig.add_subplot(111)\n", "plt.contour(data_array, cmap = \"viridis\", \n", " levels = list(range(0, 5000, 100)))\n", "plt.title(\"Elevation Contours\")\n", "cbar = plt.colorbar()\n", "plt.gca().set_aspect('equal', adjustable='box')\n", "plt.show()\n", "\n", "# and here we go!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 476 }, "colab_type": "code", "id": "SYuAf_L11RzO", "outputId": "29b23d51-defe-433b-ab3a-e7221b782019" }, "outputs": [], "source": [ "##codecell_RasterLandscape_DEMrasterPlotCountoursFilled\n", "\n", "#Plot our data using Matplotlib tool 'contourf' to get filled contours\n", "fig = plt.figure(figsize = (12, 8))\n", "ax = fig.add_subplot(111)\n", "plt.contourf(data_array, cmap = \"viridis\", \n", " levels = list(range(0, 5000, 500)))\n", "plt.title(\"Elevation Contours\")\n", "cbar = plt.colorbar()\n", "plt.gca().set_aspect('equal', adjustable='box')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "rd91jFvh1zuV" }, "source": [ "That's it for this lesson. Hopefully you've seen how you can access and manipulate raster data to represent the landscape or environmental factors. " ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "0U2G_IqKe2Wl" }, "source": [ "## 5.6 LexiCode ##\n", "To re-use the codes - you will need to first load their respective libraries. So far, you have used ...:\n", "\n", "\n", "
\n", "\n", "> **libraries** | | | |\n", ">--- |--- | --- | --- |\n", ">folium | numpy | [rasterio](https://rasterio.readthedocs.io/en/stable/quickstart.html)|\n", ">branca| rtree | [richdem](https://richdem.readthedocs.io/en/latest/)|\n", ">pandas| osmnx| [elevation](https://pypi.org/project/elevation/)|\n", ">geopandas| requests | zipfile |\n", ">seaborn | fiona| io |\n", ">matplotlib.pyplot | ipywidgets|requests |\n", "> pysal |seaborn |[gdal](https://gdal.org/)|\n", "> os |\n", "\n", "
\n", "\n", "\n", " **plugins**| |\n", "--- |--- |\n", "HeatMapWithTime | |\n", "HeatMap| |\n", "MeasureControl| |\n", "PrepareUrBasemaps_CreateLayers from [folium.plugins]| |\n", "cluster (from sklearn)| |\n", "rasterio.plot| |\n", "mapping (from shapely.geometry)| |\n", "Point (from shapely.geometry) | |\n", "mask (from rasterio.mask) | |\n", "\n", "
\n", "\n", "your lexicode is non-exhaustive, keep expanding, find your own 'best way' to reuse these code/scripts...\n", "\n", "
\n", "\n", ">Lexicode_MakingaBasicMap | Lexicode_Webmaps&Distributions |Lexicode_StreetGridOrientations | Lexicode_SpatialPatterns | Lexicode_RasterLandscape\n", ">--- | --- | ---|---|----|\n", "> == () [] | pd.concat() | { } *subselection from list*|%matplotlib inline | .open()\n", ">.head_csv() | .dtype() | ox.gdf_from_places()|requests.get()|.print()\n", ">.read_csv() | astype() | ox.plot_shape()|request.content()|dataset.name|\n", ">mean() | fillna()|network_type= ''|.bytes()|dataset.count|\n", ">folium.Map | def return |ox.add_edge_bearings(ox.get_undirected())|gpd.GeoDataFrame.from_features()|dataset.shape|\n", ">range() | .apply(lambda x:*function*,axis=) |count_and_merge()|Set()|dataset.descriptions|\n", ">len() | pd.merge() |np.arrange()|pd.value_counts() |dataset.meta|\n", ">iloc[]| how= , left_index= ,left_index= |np.histogram()|.merge()|dataset.driver|\n", ">.value_counts()| gpd.GeoDataFrame()| ax.set_theta_location()|.sort_values|dataset.read\n", ">if =:| geometry=gpd.points_from_xy |ax.set_ylim()|cluster.KMeans()|.shape\n", ">elif =: |print() |ax.set_title()|.fit()|np.amean()\n", ">else =:| .isin()|ax.set_yticks()|.drop() |np.amin ()\n", ">folium.Marker()| classic.plot()|ax.set_xlabels() & ax.set_yticklabels|.assign()|np.amax()\n", ">folium.Icon()| generateBaseMap()|plt.subplots()|plt.show()|np.std()\n", ">folium.Circle| .groupby(['', ''])|.dropna()|.set_title|show()\n", ">popup= | .reset_index() |polar_plot()|sns.pairplot()|cmap=\n", ">radius= | max_zoom= |pd.Series()|arrs.append()|np.seterr()\n", ">.values.tolist() |folium.TileLayer()|np.pi|.show_hist()|[plt.imshow](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.imshow.html)\n", ">.add_to()| plugins.DualMap(location= , tiles= , zoom_start= )| | |clipped_img| \n", "> | | | |rd.TerrainAttribute()\n", "> | | | |rdShow() \n", "> | | | |gdal_data.GetRasterBand()\n", "> | | | |gdal_band.GetNoDataValue()\n", "> | | | |gdal_data.ReadAsArray\n", "> | | | |gdal_data.ReadAsArray().astype(np.float)\n", "> | | | |plt.contour()\n", "> | | | |plt.contourf()\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "oGsxVmn2-tsK" }, "source": [ "# Chapter 6: Working with Pysal on Weights, Moran and Lisa: Spatial Statistical Analysis #\n", "\n", "## 6.1 Introduction: Exercise motivations\n", "\n", "In this exercise we're going to get into some key spatial statistics. So far in this course we've mostly been visualising spatial distributions and patterns. Here we will run statistical tests to determine whether nor not a pattern or spatial structure exists, and to test what kind of pattern (dispersed vs. random vs. clustered) is present.\n", "\n", "This kind of analysis is what we would call **spatial statistical analysis**. This is different from ** deterministic analysis** -which is when you analyse things like which polygons intersect one another in a dataset, and which polygons entirely contain other polygons. **Spatial statistical analysis** is a subset of **probabilistic analysis** - which is simply analysis of how likely it is that something happened. In this exercise we will start by working with vector-based data (points, lines and polygons). Then we will briefly look at working with raster datasets.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| | | | \n", "| ------------- | ------------- |------------- |\n", "|
| = |
|\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "colab_type": "code", "id": "9LlGBn4RQ1wL", "outputId": "cccb107d-7b4f-4718-f43c-3a5919a2479a" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_ImportUrLibraries\n", "\n", "# start by installing tools as usual\n", "!pip install geopandas\n", "!pip install descartes\n", "!pip install mapclassify\n", "!pip install pysal\n", "\n", "#@title\n", "!apt-get install -qq curl g++ make\n", "#@title\n", "!curl -L http://download.osgeo.org/libspatialindex/spatialindex-src-1.8.5.tar.gz | tar xz\n", "#@title\n", "import os\n", "os.chdir('spatialindex-src-1.8.5')\n", "#@title\n", "!./configure\n", "#@title\n", "!make\n", "#@title\n", "!make install\n", "#@title\n", "!pip install rtree\n", "#@title\n", "!ldconfig\n", "#Working through the example at http://toblerity.org/rtree/examples.html\n", "#@title\n", "from rtree import index\n", "from rtree.index import Rtree\n", "#@title\n", "p = index.Property()\n", "idx = index.Index(properties=p)\n", "idx" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "HbWJQH1CQrHj" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_ImportUrLibraries\n", "\n", "\n", "#and importing tools...\n", "import geopandas as gpd \n", "import requests \n", "import zipfile\n", "import io\n", "import matplotlib.pyplot as plt\n", "\n", "%matplotlib inline\n", "\n", "import seaborn as sns\n", "import pandas as pd \n", "import pysal as ps\n", "import numpy as np\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "vpZN9KC_ZxJ5" }, "source": [ "
\n", " \n", " ### Learning a new language – decomposing the code \n", " \n", "\n", "\n", " in *#codecell_Spatial_Statistical_Analysis_ImportUrLibraries*. These are what we call **prerequisites**. You know by now that they are basic tools so you can get started.\n", "* *Pandas* lets you manipulate your data. ~ déjà vu ~\n", "* *Geo-pandas* lets you manipulate your geographic data. ~ déjà vu ~\n", "* *requests* lets you access easily Hypertext Transfer Protocol or [HTTP](https://www.w3schools.com/tags/ref_httpmethods.asp) library ([doc.](https://realpython.com/python-requests/)).\n", "* *zipfile* lets you reate, read, write, append, and list a ZIP file. ~ déjà vu ~\n", "* *io* (input/open) lets you to access files and streams (a stream is sequence of data elements made available over time - processed 1 at a time) ~ déjà vu ~\n", "* *matplotlib* lets you plot in 2D which extensive plotting library for quality publication & *matplotlib.pyplot* provides a MATLAB-like way of plotting ~ déjà vu ~\n", "* *seaborn* lets you do statitistical data visualisation. ~ déjà vu ~\n", "* *numpy* lets you do statistics especially statistics using numpy. ~ déjà vu ~\n", "* *pysal* ~ déjà vu ~ we have use it before in lab4_SpatialPatterns. This is the one I'd like you to spend a bit more time with...Why? because this library is used to conduct exploratory spatial data analysis. You can have a look at its [documentation](https://pysal.readthedocs.io/en/v1.11.0/users/introduction.html), a [video]((https://www.youtube.com/watch?v=FN1nH4Fkd_Y)) of one of its creator and high level applications for spatial analysis (see image below).\n", "
\n", " \n", "\n", "
\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "F0UceQ6k5aby" }, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "## 6.2 Preparing your Data ##\n", "\n", "You must have realised by now that the most time consuming part of your work is preparing your data ... Data preparation is important, and your ability to use or re-use a dataset depends on how well it has been prepared. When you are selecting and preparing your data, remember to think about your research aims and the questions you are trying to answer, and make sure your data is set up to respond to those aims and questions.\n", "\n", "The following scripts repeat what you have done in previous practicals, in particular Lab4_spatial_patterns_in_excavation where you complete the following basic steps:\n", "* import\n", "* read\n", "* select/extract\n", "* merge " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 70 }, "colab_type": "code", "id": "czrB1ZVBQwou", "outputId": "e1ca041f-b669-4e9a-c80d-2902f5ea5e51" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_ImportOurData\n", "\n", "# And now, as usual, get the data\n", "url = 'https://github.com/ropitz/spatialarchaeology/blob/master/gabii_spatial.zip?raw=true'\n", "local_path = 'temp/'\n", "\n", "print('Downloading shapefile...')\n", "r = requests.get(url)\n", "z = zipfile.ZipFile(io.BytesIO(r.content))\n", "print(\"Done\")\n", "\n", "#here this is very much like what we are doing on your PC file system\n", "z.extractall(path=local_path) # extract to folder\n", "\n", "#here we want to download the shapefiles\n", "# so we sort them using sorted()\n", "#so we have a comprehensive list of all variables available in our dataset\n", "# print them\n", "filenames = [y for y in sorted(z.namelist()) for ending in ['dbf', 'prj', 'shp', 'shx'] if y.endswith(ending)] \n", "print(filenames)\n", "\n", "#here you know it ...\n", "#we use function read() to access the data from this notebook and work with it\n", "dbf, shp, shx = [filename for filename in filenames]\n", "gabii = gpd.read_file(local_path + 'gabii_SU_poly.shp')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 316 }, "colab_type": "code", "id": "FoSWTv5nQ-XN", "outputId": "ef93383d-2da9-40dd-86be-b9ab944bab17" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_Peek@OurData\n", "\n", "# As you've done before, print out some information on the data \n", "# To check the number of records in the file, so it has loaded in ok\n", "# And preview the data \n", "print(\"Shape of the dataframe: {}\".format(gabii.shape))\n", "print(\"Projection of dataframe: {}\".format(gabii.crs))\n", "gabii.tail() #last 5 records in dataframe" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 197 }, "colab_type": "code", "id": "8omTerlfvRB1", "outputId": "b4bc9b2b-b88e-4a8d-d958-9b3e96ba964f" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_ObjectType&SU\n", "\n", "# As we've done before (returning to the Gabii finds data - see ###codecell_SpatialPatterns_WhichTypeOfSpecialFinds&fromWhere?) 
\n", "# Get the non-spatial special finds data\n", "# And they are archived per SU /Stratigraphical Unit\n", "sf_su = pd.read_csv(\"https://raw.githubusercontent.com/ropitz/gabii_experiments/master/spf_SU.csv\")\n", "sf_su.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 364 }, "colab_type": "code", "id": "msdQ_Jaq2S8t", "outputId": "4c44f539-05ab-4a9c-8519-498d13f9a1c6" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_FindsBecomesSpatial\n", "\n", "#Then let's combine our polygons representing context shape and location\n", "# with the special finds data\n", "# We can use known command 'merge()' for this (see especially explanation for this in #codecell_Webmaps&Distributions_MergingZeData and also ###codecell_SpatialPatterns_TextileToolsBecomesSpatial)\n", "\n", "gabii_textools = gabii.merge(sf_su, on='SU')\n", "gabii_textools.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 484 }, "colab_type": "code", "id": "toSx3S1kzsI4", "outputId": "86f050b1-2e80-4c6e-d520-54facd96b3ca" }, "outputs": [], "source": [ "##codecell_WorkingWithPysal_DataPreparation\n", "\n", "#Let's pull all those find types out of the big list. \n", "#These commands should look familiar because you've done them before (###codecell_SpatialPatterns_SpecialFindsSelection)\n", "types = ['Loom Weight','Spool','Spindle Whorl']\n", "textile_tools = gabii_textools.loc[gabii_textools['SF_OBJECT_TYPE'].isin(types)]\n", "\n", "# Now let's count up how many of these tools appear in each context (SU).\n", "# This command will print out a list of the number of textile tools in each SU next to that SU number.\n", "textile_tool_counts = textile_tools.groupby('SU')['SF_OBJECT_TYPE'].value_counts().unstack().fillna(0)\n", "\n", "\n", "gts = gabii_textools.merge(textile_tool_counts, on='SU')\n", "gts_new = gts.drop_duplicates(subset=\"SU\")\n", "gts_new.head()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "MdEkesFQAKby" }, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "## 6.3 Visualising your Data ##\n", "\n", "### 6.3.1 Let's plot our data\n", "\n", "Now plot your data to visualise it with a focus on the Spool Distribution of Gabii excavations. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 584 }, "colab_type": "code", "id": "_gxMADn3dSge", "outputId": "82947cc1-3bdb-4501-d402-3f1897817f96" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_PlotZeData\n", "\n", "# Set up figure and axis\n", "f, ax = plt.subplots(1, figsize=(9, 9))\n", "# Plot SUs\n", "#gabii.plot(ax=ax, facecolor='0.85', linewidth=0)\n", "\n", "# Quantile choropleth of deaths at the street level\n", "gts_new.plot(column='Spool', scheme='fisher_jenks', ax=ax, \\\n", " cmap='YlGn', legend=True, linewidth=3)\n", "# Plot pumps\n", "#xys = np.array([(pt.x, pt.y) for pt in pumps.geometry])\n", "#ax.scatter(xys[:, 0], xys[:, 1], marker='^', color='k', s=50)\n", "# Remove axis frame, also called graticule\n", "ax.set_axis_off()\n", "# Change background color of the figure\n", "f.set_facecolor('0.75')\n", "# Keep axes (your x and y) proportionate\n", "plt.axis('equal')\n", "# Title \n", "# f.suptitle()command allows you to add a centered title to the figure\n", "f.suptitle('Spool Distribution', size=30)\n", "# Draw\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "m4k2ydSEE8-c" }, "source": [ "### 6.3.2 Exploring your data: Descriptive Statistics Vs Visualisation\n", "\n", "They are different ways to explore your data. \n", "\n", "\n", "
\n", " " ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "I9lXmkgOAPio" }, "source": [ "## 6.4 Spatial Statistical Analysis ##\n", "\n", "So far, you've (rapidly) repeated the steps you've done in a previous exercise to visualise a spatial pattern - this time we are exploring the spools artefacts discovered while excavating at Gabii. \n", "\n", "Now we want to test statistically if there is a pattern? Statistical tests are needed when it is not obvious from just looking at the distribution whether or not a pattern exists. \n", "\n", "We can start with some of the more basic tests: **Moran's** and **local Moran's**, which are tests for spatial autocorrelation. Moran's I statistic (1948, 1950) is one of the classic (as well as one of the most common) ways of measuring the degree of spatial autocorrelation in data. \n", "\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "YO-8JxfLWFlZ" }, "source": [ "### 6.4.1 Moran's I test ###\n", "\n", "**But how does this work? ** \n", "\n", "You have seen that objects in space are rarely randomly distributed. In fact, they usually have some degree of patchiness (i.e., they are spatially clustered) which can produce a variety of distinct spatial patterns.
These patterns can be quantified according to the degree of similarity between the attribute values of adjacent spatial objects (polygons, lines, points, raster cells).
We will start by testing if nearby objects tend to have similar attributes than expected from a random distribution . The degree to which similar values cluster together spatially is the degree of autocorrelation. This value is reported as **Moran's I**.\n", "\n", "In a nutshell, the Moran’s I statistic provides a correlation coefficient (a measurement of similarity) for the relationship between a measured value and its surrounding measured values. \n", "\n", "To understand the meaning of this measurement of similarity, the Moran's I value, you need to understand how \"statistical significance\" works. Basically, a measurement is considered \"statistically significant\" if it very different from what you would get by chance. A spatial pattern is \"statistically significant' if it is very different from a random spatial pattern. \n", "\n", "In summary:\n", "* null hypothesis = the data is randomly disbursed.\n", "* alternative hypothesis = the data is more spatially clustered than you would expect by chance alone. \n", "\n", "Conditions:\n", "* All features should have at least one neighbor (8 is good).\n", "* No feature should have all other features as a neighbor.\n", "* At least 30 points.\n", "* Global statistics assess the overall pattern. They are most effective when the spatial pattern is consistent across the study area. \n", "\n", "**Z-score** and **p-value** are used to evaluate the significance of the Index value. The p-value is the probability that the observed spatial pattern was created by some random process. When the p-value is very small, it means it is very unlikely (small probability) that the observed spatial pattern is the result of random processes. There are two possible scenarios:\n", "* Very high or very low (negative) z-scores, associated with very small p-values, are found in the tails of the normal distribution. \n", "* Small p-values and either a very high or a very low z-score indicates it is unlikely that the observed spatial pattern reflects a random pattern.\n", "\n", "
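\n",
"\n",
"Once you have run the test further down (in *##codecell_Spatial_Statistical_Analysis_RUNMoransI*), a minimal sketch for pulling these significance measures out of the resulting mi object (assuming the usual pysal/esda Moran attributes) looks like this:\n",
"\n",
"```python\n",
"# run this after the Moran test below has created mi\n",
"print(mi.I)      # the Moran's I statistic itself\n",
"print(mi.z_sim)  # z-score based on random permutations of the data\n",
"print(mi.p_sim)  # pseudo p-value: a small value means the pattern is unlikely to be random\n",
"```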
\n", "\n", "\n", "\n", "You can read about [Moran's](https://mgimond.github.io/Spatial/spatial-autocorrelation.html). If you prefer a video, this one [this](https://www.youtube.com/watch?v=_J_bmWmOF3I) one is very clear on how to interpret the resulst of a Moran's test.\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "ozpmT-c_iALL" }, "source": [ "#### To start your Moran's I test ###\n", "\n", "As you may have suspected, we need to do something first before running the Moran's I. And, as said above, the emphasis of the Moran's I spatial analysis is on the role of the weights. So, one of the vital steps of spatial autocorrelation modelling is to construct a spatial weights matrix. The spatial weights can be based on:\n", "\n", "* Contiguity [Pysal-notes](https://pysal.readthedocs.io/en/v1.11.0/users/tutorials/weights.html#contiguity-based-weights)\n", "* Distance [Pysal-notes](https://pysal.readthedocs.io/en/v1.11.0/users/tutorials/weights.html#distance-based-weights)\n", "* Distance band [Pysal-notes](https://pysal.readthedocs.io/en/v1.11.0/users/tutorials/weights.html#distance-band-weights)\n", "* Kernel [Pysal-notes](https://pysal.readthedocs.io/en/v1.11.0/users/tutorials/weights.html#kernel-weights)\n", "* K-nearest neighbours [Pysal-notes](https://pysal.readthedocs.io/en/v1.11.0/users/tutorials/weights.html#k-nearest-neighbor-weights)\n", "which can be illustrated as such:\n", "\n", "
*where 0= no weights and 1\n", "\n", "\n", "As we've discussed in class, designing a good weights matrix is part of the interpretive process. It's an important step because if we fail to design the weights matrix well, the resulting of spatial analysis will likely not produce results that are meaningful. It is important, therefore, to understand the **kinds of spatial relationships** listed above, and to use the one that best represents the kind of spatial relationship presesnt in your data.\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "BD_Oz57v8L6S" }, "source": [ "#### Let's define the weights ####\n", "\n", " Let's create some weights for a real dataset that you know: finds from the Gabii excavations." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 407 }, "colab_type": "code", "id": "6DOM4mY88XQe", "outputId": "987b92f4-e74b-471d-ba22-25548c66ca23" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_MoranStatistics_defineweights\n", "\n", "# To start your Moran's statistical test, you need to create weights that define how strongly you think things near to one another influence one another.\n", "# see the types of weights available to you by looking in pysals help file\n", "help(ps.lib.weights)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 52 }, "colab_type": "code", "id": "KSwfX24T8Jwr", "outputId": "acda0c82-9274-49fe-9c13-746418ff88d6" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_MoranStatistics_KNN_weights\n", "\n", "#create some weights. I've gone with KNN weights. Read about this in the pysal documentation linked above...\n", "#we want to see what is happening with the Spools \n", "\n", "#let's first subselect them and create a gts_spool\n", "gts_spool = gts_new[['SU','Spool']]\n", "\n", "#let's have the nearest neighbours weights \n", "#we choose a k-distance of 5\n", "gts_spool_weights = ps.lib.weights.KNN(gts_spool,5)\n", "#here we Ignore the warnings because\n", "#we know that not all the SU areas connect up physically" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "92a8jWAjDUrr" }, "source": [ "#### Let's add the knn-weights as an attribute to your data ####\n", "\n", "Here we create a matrix, or a grid of values, which contains the weights assigned to each spatial entity: .\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "W5M2Z7HkAIah" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_MoranStatistics_DefineMatrix\n", "\n", "# Rename IDs to match those in the `segIdStr` column\n", "gts_spool_weights.remap_ids(gts_spool.index)\n", "\n", "# Row standardise the matrix\n", "gts_spool_weights.transform = 'R'" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "8-ByLSt8fg2M" }, "source": [ "
\n", " \n", " #### Learning a new language – decomposing the code \n", " \n", "\n", "\n", "
\n", " in *##codecell_Spatial_Statistical_Analysis_MoranStatistics_DefineMatrix*, to keep things aligned, you use .remap_ids() to rename the IDs of the matrix to match those in the table.\n", "\n", "Often there is a need to apply a transformation .transform = 'R' to the spatial weights, such as in the case of row standardisation. You can use .transform = 'b' for binary and \n", ".transform = 'v' for variance stabilising [doc](https://pysal.readthedocs.io/en/v1.11.0/users/tutorials/weights.html#weight-transformations). \n", "\n", "**~ Behind the scenes ~** the transform property is updating all other characteristics of the spatial weights that are a function of the values and these standardisation operations, freeing the user from having to keep these other attributes updated. " ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "NvoI0eAmgy9H" }, "source": [ "#### Let's add Spatial lag ####\n", "Now you have the data and the spatial weights matrix ready.\n", "\n", "Let's start by computing the spatial lag of the spool distribution. The spatial lag is the product (multiplication) of the spatial weights matrix and a given variable. If gts_spool_weight is row-standardised, the result is the average value of the variable in the neighborhood of each observation.\n", "\n", "Multiplication of two matrices looks like this:\n", "
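\n",
"\n",
"Before turning to the Gabii data, here is a tiny made-up sketch of what that multiplication does: a row-standardised weights matrix multiplied by a column of values returns, for each observation, the average value of its neighbours:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# three observations; each row of W gives equal weight to the other two (row-standardised)\n",
"W = np.array([[0.0, 0.5, 0.5],\n",
"              [0.5, 0.0, 0.5],\n",
"              [0.5, 0.5, 0.0]])\n",
"\n",
"# a made-up count of spools for each observation\n",
"spools = np.array([2.0, 4.0, 6.0])\n",
"\n",
"# the spatial lag: the average of each observation's neighbours\n",
"print(W @ spools)  # prints [5. 4. 3.]\n",
"```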
\n", "\n", "\n", "We can calculate the spatial lag for the variable 'Spool' and store it directly in the main table with .lag_spatial() :" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 304 }, "colab_type": "code", "id": "VTPTQRiICR7Y", "outputId": "5f7b6a64-9cd2-46fc-f8b2-1c3947926423" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_MoranStatistics_SpatialLag\n", "\n", "#add the weights you've created to the attribute table\n", "gts_spool['gts_spool_weights'] = ps.lib.weights.lag_spatial(gts_spool_weights, gts_spool['Spool'])\n", "gts_spool.head()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "DK-Sx1HwDuwd" }, "source": [ "#### Standardisation against biaised dataset ####\n", "\n", "Whenever there is a risk that the distribution of your features (see decomposing the code of *#codecell_RasterLandscape_BandsDataType* as a reminder) is potentially biased due to sampling design or an imposed aggregation scheme, it is important to row standardize the data. \n", "\n", "Read about [standardisation](http://desktop.arcgis.com/en/arcmap/10.3/tools/spatial-statistics-toolbox/modeling-spatial-relationships.htm#GUID-DB9C20A7-51DB-4704-A0D7-1D4EA22C23A7) in spatial modelling.\n", "\n", "\n", "So now we need to remove potentially biased distribution in our dataset by standardising the counts and standardising the means." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 411 }, "colab_type": "code", "id": "Z8uUaea3DHAe", "outputId": "a74158ae-eb74-4162-ccfa-510da4ab9d03" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_MoranStatistics_Standardisation&SpatialLag\n", "\n", "# standardise the counts of the number of spools in each context and the weights\n", "#we are applying some math here\n", "gts_spool['spool_std'] = (gts_spool['Spool'] - gts_spool['Spool'].mean()) / gts_spool['Spool'].std()\n", "gts_spool['w_spool_std'] = ps.lib.weights.lag_spatial(gts_spool_weights, gts_spool['spool_std'])\n", "gts_spool.head()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "Fk5JjmcJr_LN" }, "source": [ "
\n", " \n", " #### Learning a new language – decomposing the code \n", " \n", "\n", "\n", "
\n", " in *##codecell_Spatial_Statistical_Analysis_MoranStatisticstandardisation&SpatialLag*, to make sure our varaibles distribution is not biaised, we apply some basic algebra:\n", "\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "7GDFkBUZElPe" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_MoransI_ImportUrLibraries\n", "\n", "\n", "#get some more tools for the Moran test\n", "from pysal.explore.esda.moran import Moran" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "colab_type": "code", "id": "iAMa3J_IEA6y", "outputId": "7ffb3ebd-01da-4438-96d4-7e6a824163c7" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_RUNMoransI\n", "\n", "# Run the Moran test\n", "mi = Moran(gts_spool['Spool'], gts_spool_weights)\n", "mi.I\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "HtGalQ97D4FE" }, "source": [ "#### Interpreting your Moran's I ####\n", "\n", "*##codecell_Spatial_Statistical_Analysis__RUNMoransI* mean? \n", "* Read how to [interpret the results](https://www.statisticshowto.datasciencecentral.com/morans-i/).\n", "* Are your spools actually clustered?\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "uRL98mVV4DIo" }, "source": [ "#### Let's plot your Moran's I ####\n", "\n", " Now let's plot the results.\n", "\n", "The cluster/outlier type (COType) field distinguishes between a statistically significant cluster of:\n", "* high values (HH), \n", "* cluster of low values (LL), \n", "* outlier in which a high value is surrounded primarily by low values (HL)\n", "* outlier in which a low value is surrounded primarily by high values (LH). \n", "\n", "\n", "Statistical significance is set at the 95 percent confidence level (really quite confident, we could say). " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 552 }, "colab_type": "code", "id": "R8ybhc4zC4Ug", "outputId": "ba221168-9c9c-4eb0-f2f0-d44d89a25246" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_PLOTMoransI\n", "\n", "#Here we are using similar matplotlib commands than in Lab3_StreetOrientation & Lab5_RasterLandscape \n", "\n", "# Setup the figure and axis\n", "f, ax = plt.subplots(1, figsize=(9, 9))\n", "\n", "# Plot values \n", "#command sns.regplot()allows you to plot the data/results and a linear regression model fit \n", "#such as our results of Moran's I \n", "#\n", "sns.regplot(x='spool_std', y='w_spool_std', data=gts_spool)\n", "\n", "# Add vertical and horizontal lines\n", "plt.axvline(0, c='k', alpha=0.5)\n", "plt.axhline(0, c='k', alpha=0.5)\n", "ax.set_xlim(-2, 7)\n", "ax.set_ylim(-2.5, 2.5)\n", "\n", "#add text within each quandrant\n", "plt.text(3, 1.5, \"HH\", fontsize=25)\n", "plt.text(3, -1.5, \"HL\", fontsize=25)\n", "plt.text(-1, 1.5, \"LH\", fontsize=25)\n", "plt.text(-1, -1.5, \"LL\", fontsize=25)\n", "\n", "# Display\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "0d3N1txj6iAO" }, "source": [ "
\n", " \n", " #### Learning a new language – Interpreting the results \n", " \n", "\n", "\n", "\n", "What's the figure generated from *##codecell_Spatial_Statistical_Analysis__PLOTMoransI* mean? \n", "\n", "In order to guide the interpretation of the plot, a linear fit is also included in the plot, together with confidence intervals. This line represents **the best linear fit** to the scatter plot or, in other words, what is the best way to represent the relationship between the two variables as a straight line. Because the line comes from a regression, we can also include a measure of the uncertainty about the fit in the form of confidence intervals (the shaded blue area around the line).\n", "\n", "The plot displays a positive relationship between both variables. This is associated with the presence of positive spatial autocorrelation: similar values tend to be located close to each other. If we had to summarise the main pattern of the data in terms of how clustered similar values are, the best way would be to say they are positively correlated and, hence, clustered over space.\n", "\n", "At the core of this is a classification of the observations in a dataset into four groups derived from the Moran Plot: high values surrounded by high values (HH), low values nearby other low values (LL), high values among low values (HL), and viceversa (LH). Each of these groups are typically called \"quadrants\"" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "L-wxTR_aElbr" }, "source": [ "### 6.4.2 Moran's I to Moran's local: LISA ###\n", "\n", "![](https://pro.arcgis.com/en/pro-app/tool-reference/spatial-statistics/GUID-203ADC3E-C9BC-4FF3-890F-B11B05F1FC41-web.png)\n", "\n", "Actually, as you may have realised, Moran's I can tell you about the complete spatial pattern so it tells you about the clustering BUT it tells us nothing about where the clusters might be ! It only tells you that the pattern is more clustered than it would be if it would be under spatial randomness. \n", "\n", "So why do we need Moran's I then? well you need to test the significance of your spatial analysis, i.e. that your regression is not violating your assumptions... so Moran's is a global test.\n", "\n", "And, now, what do we do to get test the location of spatial clusters? We need a local statistic...To test this, we use the local variant of the Moran's test." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "6ywdvVrCtmaq" }, "source": [ "#### But are there local patterns inside the global one? ###" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "neClfc59E4m7" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_ImportUrLibraries\n", "\n", "# get the tools for the local test\n", "from pysal.explore.esda.moran import Moran_Local" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "DpwGWNguE9lS" }, "source": [ "#### Using your Moran's local ###\n", "\n", "The local test breaks the global pattern down to **test for the presence of local clusters**. You can check at each SU whether or not it is likely (in a statistical significance sense) for it to participate in a local cluster.\n", "\n", "*How does this work?* Local measures consider each single observation in a dataset and operate on them (as oposed to on the overall data) as global measures do. 
\n", "\n", " **LISAs (Local Indicators of Spatial Association)** are widely used in many fields to identify clusters of values in space. They are a very useful tool that can quickly return areas in which values are concentrated and provide suggestive evidence about the processes that might be at work. Therefore, they are a key part of the exploratory toolset. In Python, we can calculate LISAs in a very streamlined way thanks to PySAL:\n", "\n", "* **A positive value** for I indicates that a feature has neighboring features with similarly high or low attribute values; this feature is part of a cluster.\n", "* **A negative value** for I indicates that a feature has neighboring features with dissimilar values; this feature is an outlier. \n", "* In either instance, the **p-value** for the feature must be small enough for the cluster or outlier to be considered statistically significant. \n", "* **Note** that the local Moran's I index (I) is a relative measure and can only be interpreted within the context of its computed z-score or p-value." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "w3-URUxzEx3B" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_MoransLocal\n", "\n", "# run the local test\n", "lisa = Moran_Local(gts_spool['Spool'].values, gts_spool_weights)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "a0dukgj__PNt" }, "source": [ "#### Applying your Moran's local ###\n", "\n", "All we need to pass is the .values variable of interest \"gts_spool['Spool']\" and the spatial weights \"gts_spool_weights\" that describes the neighborhood relations between the different observation that make up the dataset.\n", "\n", "Looking at the numerical result of LISAs is not always the most useful way to exploit all the information they can provide. Remember that we are calculating a statistic for every sigle observation in the data so, if we have many of them, it will be difficult to extract any meaningful pattern. Instead, what is typically done is to create a **cluster map** that extracts the significant observations (those that are highly unlikely to have come from pure chance) and plots them with a specific color depending on their quadrant category.\n", "\n", "All of the information needed is contained in the lisa object you have just created in *##codecell_Spatial_Statistical_Analysis_MoransLocal*. \n", "\n", "\n", "It is convenient to pull them out and insert them in the main data frame that we will call \"gts_spool\" as demonstrated in *##codecell_Spatial_Statistical_Analysis_MoransLocal_Significance*. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 230 }, "colab_type": "code", "id": "8JnBisEIFqhV", "outputId": "60afab58-7218-43b2-c25f-303776b0ef83" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_MoransLocal_Significance\n", "\n", "# Break observations into significant or not\n", "# using .o_sim < we will set it at 95% confidence interval \n", "# so significant values are all the ones which are less than 5% \n", "gts_spool['significant'] = lisa.p_sim < 0.05\n", "\n", "# By using .q command\n", "# you can store the quadrant each observation belongs to:\n", "# - the high-high, high-low, low-high, low-low \n", "# just as before (in ##codecell_Spatial_Statistical_Analysis_PLOTMoransI) are quads 1-4 \n", "gts_spool['quadrant'] = lisa.q\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "l5S17HgEB1Lr" }, "source": [ "
\n", " \n", " ### Learning a new language – Interpreting the results \n", " \n", "\n", "\n", " First, the 'significance' column. As with global Moran's I, PySAL is automatically computing a p-value for each LISA. Because not every observation represents a statistically significant one, we want to identify those with a p-value small enough that rules out the possibility of obtaining a similar situation from pure chance. Since we have been working with a 95% confidence interval, we select 5% using p.sim< as the threshold for statistical significance (just like with global Moran's I). \n", "\n", "To identify these values, we create in *##codecell_Spatial_Statistical_Analysis_MoransLocal_SignificanceResults* below a variable, significant, that contains :\n", " * **True** if the p-value of the observation is satisfies the condition, and \n", " * **False** otherwise." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 390 }, "colab_type": "code", "id": "KcB0rHtvB67c", "outputId": "bd1f3055-8a07-46cc-9779-5070924fd1eb" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_MoransLocal_SignificancereadingUrResults\n", "\n", "gts_spool['significant'][:20]\n", "# true means it is in a cluster, false means it is not" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 70 }, "colab_type": "code", "id": "goVU6rWyGDCo", "outputId": "288a47ed-647d-496c-8326-0af48255015e" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_MoransLocal_SignificancereadingUrResults\n", "\n", "# You can read out the calculated p values for each \n", "lisa.p_sim[:20]" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "-resv_GpGy_Z" }, "source": [ "#### Bringing your Moran's local to your dataset ####" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 467 }, "colab_type": "code", "id": "lksLfhAyH76s", "outputId": "193c56a6-ffe3-410b-8078-5b7fbc4db5fe" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_AddingUrMoransLocal2Gabiidataset\n", "\n", "#add this info back onto the spatial data\n", "gabii_spool_lisa = gabii.merge(gts_spool, on='SU')\n", "gabii_spool_lisa.head()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "cglqoAI4JAyC" }, "source": [ "#### Let's plot your Moran's Local #####\n", "\n", "Now we have these elements, the signifigance and quadrant , we can make a map showing which quadrant each SU belongs to, essentially a display of where the local clusters are located.\n", "\n", "**Parameters** are for:\n", "* High-high = 4\n", "* Low-low = 2" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 530 }, "colab_type": "code", "id": "AP20J-TWGMsx", "outputId": "35671272-cba8-4551-d506-3a120febd400" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_MappingGabiiMoransLocal\n", "\n", "# Setup the figure and axis\n", "f, ax = plt.subplots(1, figsize=(9, 9))\n", "\n", "# Plot baseline su poly\n", "gabii_spool_lisa.plot(column='quadrant', ax=ax, \\\n", " cmap='Accent', legend=True, linewidth=3)\n", "\n", "ax.set_axis_off()\n", "plt.axis('equal')\n", "plt.show()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "ZcivQCnTJ4-r" }, "source": [ "#### Interpreting your 
Moran's Local ####\n", "\n", "How would you interpret the results of this analysis? \n", "\n", "**Remember**: The z-scores and p-values are measures of statistical significance which tell you whether or not the local pattern is random, feature by feature. In effect, they indicate whether the apparent similarity (a spatial clustering of either high or low values) or dissimilarity (a spatial outlier) is more pronounced than one would expect in a random distribution.\n", "* A high positive z-score for a feature indicates that the surrounding features have similar values (either high values or low values). \n", "* A low negative z-score (for example, less than -3.96) for a feature indicates a statistically significant spatial data outlier. " ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "auhfktQSqVAU" }, "source": [ "## 6.5 From Vector to Raster ##\n", "\n", "You can run similar spatial statistics on raster data to understand trends and distributions of data. \n", "\n", "It is possible to convert point vector data to raster data via the process of interpolation. \n", "\n", " ~ déjà vu ~ We have talked about this before:\n", "* in Chapter 3.3, when presenting types of data distributions and examples of applications; and,\n", "* in Chapter 5, when working with DEMs: rasters with an elevation attribute which were interpolated to create a surface.\n", "\n", "
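One of the simplest interpolation methods, and the one the Whitebox tool below relies on, is **inverse distance weighting (IDW)**. A minimal, hand-rolled sketch of the idea (our own toy numbers, not the Whitebox implementation):\n", "\n", "```python\n", "import numpy as np\n", "\n", "# three known points (x, y, value) and one query location\n", "pts = np.array([[0.0, 0.0, 10.0],\n", "                [4.0, 0.0, 20.0],\n", "                [0.0, 3.0, 30.0]])\n", "query = np.array([1.0, 1.0])\n", "\n", "d = np.sqrt(((pts[:, :2] - query) ** 2).sum(axis=1))  # distances to the query point\n", "w = 1 / d ** 2                                        # inverse-distance weights (power = 2)\n", "print((w * pts[:, 2]).sum() / w.sum())                # weighted average = the IDW estimate\n", "```\n", "\n", "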
\n", "*IDW interpolation calculates the value of an unknown cell center value (a query point) using surrounding points with the assumption that closest points will be the most similar to the value of interest*\n", "\n", "There are many ways to interpolate data, which you can read about on this [site](https://www.neonscience.org/spatial-interpolation-basics).\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "5W9_wwCjrNwM" }, "source": [ "### 6.5.1 Whitebox is an open source GIS ###\n", "\n", " **Whitebox** is full of hundreds (412 for version 1.0.2!) of useful spatial tools. \n", "\n", "You can download it and run it standalone (without installing!) or you can call its functions from the jupyter notebook. \n", "\n", "It will be well worth checking out for your own independent projects: Read more about [Whitebox Tools here](https://github.com/jblindsay/whitebox-tools).\n", "\n", "| | | | \n", "| ------------- | ------------- |------------- |\n", "|
| = |
|\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 212 }, "colab_type": "code", "id": "h63q0-mr7LNa", "outputId": "44db8b02-6f22-41bb-f5d9-5a3a48c58e3d" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_InstallUrLibraries\n", "\n", "# we're going to use the whitebox tools to interpolate our data and imageio to view it\n", "!pip install whitebox # acts as a spatial toolbox\n", "!pip install imageio \n", "!pip install imageio tifffile # stores numpy arrays in TIFF format files & read image&metadata from TIFFs " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 478 }, "colab_type": "code", "id": "qpg4bRw07N3n", "outputId": "f9eff073-23af-470e-fdca-bfc937f0132e" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_ImportUrLibraries\n", "\n", "# import your tools and print the version to check it worked\n", "\n", "import pkg_resources # provides runtime facilities for finding, introspecting, activating and using installed Python distributions\n", "import whitebox #advanced geospatial data analysis platform. Well you need its toolbox for that (see 3 codeline down)!\n", "import imageio # provides an easy interface to read and write a wide range of image data, including animated images, volumetric data, and scientific formats\n", "import IPython # for displaying html outputs\n", "import os\n", "\n", "wbt = whitebox.WhiteboxTools()\n", "print(wbt.version())\n", "print(wbt.help())" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 443 }, "colab_type": "code", "id": "0Ep29_-C7hOH", "outputId": "e05e0d19-ebf4-45ab-931f-0a8df0b02482" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_CheckUrNewSpatialToolbox\n", "\n", "# when there are lots of tools, it's useful to call their help info to find out what the parameters are\n", "print(wbt.tool_help(\"IdwInterpolation\"))" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "FgiUOkxApm8Z" }, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "### 6.5.2 Let's interpolate ###" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 197 }, "colab_type": "code", "id": "4Mr8zlMvlMUs", "outputId": "da20d736-6e08-42e6-8806-826385ea3bfb" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_PrepareUrData\n", "\n", "# To interpolate, as you may have noticed in your reading, we need points rather than polygons. \n", "# To have points, we get the center of each polygon, called centroids. \n", "\n", "# In these lines we essentially strip out the geometry information when we get the center of each polygon. \n", "spool_centroids = gabii_spool_lisa.geometry.centroid\n", "spoolgdf = gpd.GeoDataFrame(geometry=gpd.GeoSeries(spool_centroids))\n", "\n", "#Then we 'merge' to glue the attributes we are interested in interpolating back on to our data.\n", "spool_centroids_attr = spoolgdf.merge(gabii_spool_lisa.spool_std,right_index=True,left_index=True)\n", "\n", "#Let's have a sneaky peek at the dataframe\n", "spool_centroids_attr.head()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "lPEPVLUm2xgM" }, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "### 6.5.3 Let's get our working directory sorted ####" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "colab_type": "code", "id": "Kt38Uahqybi4", "outputId": "50bca1d9-6b5c-45cd-a1a1-70e9c0b302fc" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_SetUrWorkingDirectory\n", "\n", "# to work in whitebox effectively, we put this data out to a temporary shapefile\n", "# we also set a data directory where we can put this file\n", "data_dir = '/usr/local/lib/python3.6/dist-packages/whitebox/testdata/'\n", "wbt.set_working_dir('data_dir')\n", "print(data_dir)\n", "\n", "#wbt.verbose command allows you to decide if you want output messages such as warnings, progress updates and other notifications\n", "wbt.verbose = True\n", "\n", "#Let's place our shapefile in a file \n", "gabii_spool_shapes = spool_centroids_attr.to_file(\"/usr/local/lib/python3.6/dist-packages/whitebox/testdata/gabii_spool.shp\")\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "colab_type": "code", "id": "R9oPY80wGWTg", "outputId": "29e78185-3dc8-4fa9-9f7d-aadc59969433" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_InterpolateRaster\n", "\n", "# now we interpolate our centriods using the standardised spool weights and show the result\n", "# note you can play around with the interpolation parameters to have different kernel sizes and cell sizes...\n", "\n", "wbt.idw_interpolation(\n", " i=\"/usr/local/lib/python3.6/dist-packages/whitebox/testdata/gabii_spool.shp\", \n", " field=\"spool_std\",\n", " output=\"/usr/local/lib/python3.6/dist-packages/whitebox/testdata/gabii_spool.tif\", \n", " use_z=False, \n", " weight=2.0, \n", " radius=5, \n", " min_points=3, \n", " cell_size=1, \n", " base=None\n", ")\n", "\n", "raster = imageio.imread(\"/usr/local/lib/python3.6/dist-packages/whitebox/testdata/gabii_spool.tif\")\n", "plt.imshow(raster)\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 34 }, "colab_type": "code", "id": "_yoIwIzgJm_P", "outputId": "1f4cb9c0-46a9-4e0b-bf58-192209c72129" }, "outputs": [], "source": [ "#List data under the data folder\n", "print(os.listdir('testdata'))" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "jLkWKM_rvit-" }, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "### 6.5.4 Let's get the spatial statistical Analysis running ###\n", "\n", "Now you can calculate statistics on this dataset, for example using **rasterio** like we did last week, or using **whitebox's tools**.\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "Un1fJHefODET" }, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "### 6.5.5 Let's check distribution ###\n", "\n", "For example, you can check to see if the data forms a normal distribution. This lets you know what kinds of statistical tests will be valid. As we discussed last week, often archaeological distributions are not normal." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "colab_type": "code", "id": "MnUQMaqSy_Kw", "outputId": "ed55a12d-1045-4ae0-ffec-26ed0f09e0e6" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_RasterSpatialStats_normality\n", "\n", "wbt.ks_test_for_normality(\n", " \"/usr/local/lib/python3.6/dist-packages/whitebox/testdata/gabii_spool.tif\", \n", " \"/usr/local/lib/python3.6/dist-packages/whitebox/testdata/normal.html\", \n", " num_samples=None\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 710 }, "colab_type": "code", "id": "w41lhagV1P_A", "outputId": "e8e05d5d-ec59-4d9f-bbcf-3a9fa2b05f79" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_RasterSpatialStats_ViewNormality\n", "\n", "# view your results\n", "IPython.display.HTML(filename=\"/usr/local/lib/python3.6/dist-packages/whitebox/testdata/normal.html\")" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "thKxqY3F2AvM" }, "source": [ "| | | \n", "| ------------- |-------------|\n", "|
||\n", "\n", "### 6.5.6 Let's check for autocorrelation ###\n", "\n", "\n", "You can also run a similar autocorrelation test to the LISA one we ran above with the vector data.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "colab_type": "code", "id": "KmqoKZKP2GDr", "outputId": "ebafcec5-1a5b-4980-b734-8fd846a18053" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_RasterSpatialStats_Autocorrelation\n", "\n", "wbt.image_autocorrelation(\n", " \" /usr/local/lib/python3.6/dist-packages/whitebox/testdata/gabii_spool.tif\", \n", " \"/usr/local/lib/python3.6/dist-packages/whitebox/testdata/gabii_autocorr.html\", \n", " contiguity=\"Rook\"\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 457 }, "colab_type": "code", "id": "Mw95v0Qw2hU9", "outputId": "c4b6c1ea-118e-44b9-93e9-bba793d25475" }, "outputs": [], "source": [ "##codecell_Spatial_Statistical_Analysis_RasterSpatialStats_ViewAutocorrelation\n", "\n", "# view your results\n", "IPython.display.HTML(filename='/usr/local/lib/python3.6/dist-packages/whitebox/testdata/gabii_autocorr.html')" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "o4x4PpWcJ76O" }, "source": [ "This exercise ends here. Hopefully you've learned that there are statistical tests for spatial patterns and that these let us go beyond 'just visualising' to look for patterns. These tests can be run on either vector or raster data.\n", "\n", "Clearly there's a lot to learn and explore in the world of spatial stats..." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "0U2G_IqKe2Wl" }, "source": [ "## 6.6. LexiCode ##\n", "To re-use the codes - you will need to first load their respective libraries. So far, you have used ...:\n", "\n", "
\n", "\n", ">libraries \t|\t \t|\t \t|\n", ">--- \t|\t--- \t|\t --- \t|\n", ">[folium](https://python-visualization.github.io/folium/) \t|\t [numpy](https://numpy.org/)  \t|\t [rasterio](https://rasterio.readthedocs.io/en/stable/quickstart.html)\t|\n", ">[branca](https://pypi.org/project/branca/)\t|\t [rtree](https://pypi.org/project/Rtree/) \t|\t [richdem](https://richdem.readthedocs.io/en/latest/)\t|\n", ">[pandas](https://pandas.pydata.org/)\t|\t [osmnx](https://osmnx.readthedocs.io/en/stable/)\t|\t [elevation](https://pypi.org/project/elevation/)\t|\n", ">[geopandas](http://geopandas.org/)\t|\t [requests](https://realpython.com/python-requests/) \t|\t [zipfile](https://docs.python.org/3/library/zipfile.html) \t|\n", ">[seaborn](https://seaborn.pydata.org/index.html) \t|\t [fiona](https://pypi.org/project/Fiona/)\t|\t [io](https://docs.python.org/3/library/io.html#overview) \t|\n", ">[matplotlib.pyplot](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.html) \t|\t [ipywidgets](https://github.com/jupyter-widgets/ipywidgets)\t|\t[seaborn](https://seaborn.pydata.org/) \t|\n", "> [pysal](https://pysal.readthedocs.io/en/v1.11.0/users/introduction.html) \t|\t [os](https://docs.python.org/3/library/os.html)\t|\t[gdal](https://gdal.org/)\t|\n", ">  [pkg_resources](https://setuptools.readthedocs.io/en/latest/pkg_resources.html#overview)\t|\t[whitebox](https://jblindsay.github.io/wbt_book/intro.html) \t|\t[imageio](https://pypi.org/project/imageio/) \t|\n", "> [IPython](https://ipython.org/)\t|\t[imageio tifffile](https://pypi.org/project/tifffile/)\t|\t\t|\n", "\n", "\n", "
\n", "\n", "\n", " plugins| |\n", "--- |--- |\n", "HeatMapWithTime \n", "HeatMap\n", "MeasureControl\n", "PrepareUrBasemaps_CreateLayers from [folium.plugins]\n", "cluster (from sklearn)\n", "rasterio.plot\n", "mapping (from shapely.geometry)\n", "Point(from shapely.geometry) \n", "mask (from rasterio.mask) \n", "Moran (from pysal.explore.esda.moran) \n", "Moran_Local (from pysal.explore.esda.moran) \n", "\n", "
\n", "\n", "At this point you have learned so many thigs that the lexicode is non-exhaustive. It might be useful to make your own custum 'lexicode' with the commands you use the most often.\n", "\n", "
\n", "\n", ">Lexicode_MakingaBasicMap \t|\t Lexicode_Webmaps&Distributions \t|\t\t|\t Lexicode_SpatialPatterns \t|\t Lexicode_RasterLandscape \t|\t Lexicode_Pysal_weights_Moran_Lisa\t|\n", ">--- \t|\t --- \t|\t ---\t|\t---\t|\t----\t|\t---\t|\n", "> ==   () [] \t|\t pd.concat() \t|\t { } *subselection from list*\t|\t%matplotlib inline  \t|\t .open()\t|\t.tail()\t|\n", ">.head_csv() \t|\t .dtype() \t|\t ox.gdf_from_places()\t|\trequests.get()\t|\t.print()\t|\tax.set_axis_off()\t|\n", ">.read_csv() \t|\t astype() \t|\t ox.plot_shape()\t|\trequest.content()\t|\tdataset.name\t|\tf.set_facecolor\t|\n", ">mean()  \t|\t fillna()\t|\tnetwork_type= ''\t|\t.bytes()\t|\tdataset.count\t|\tplt.axis\t|\n", ">folium.Map \t|\t def return \t|\tox.add_edge_bearings(ox.get_undirected())\t|\tgpd.GeoDataFrame.from_features()\t|\tdataset.shape\t|\tf.suptitle()\t|\n", ">range() \t|\t .apply(lambda x:*function*,axis=) \t|\tcount_and_merge()\t|\tSet()\t|\tdataset.descriptions\t|\tf.set_facecolor\t|\n", ">len() \t|\t pd.merge() \t|\tnp.arrange()\t|\tpd.value_counts() \t|\tdataset.meta\t|\tplt.axis\t|\n", ">iloc[]\t|\t how= , left_index= ,left_index= \t|\tnp.histogram()\t|\t.merge()\t|\tdataset.driver\t|\tf.suptitle()\t|\n", ">.value_counts()\t|\t gpd.GeoDataFrame()\t|\t ax.set_theta_location()\t|\t.sort_values\t|\tdataset.read\t|\t.remap_ids\t|\n", ">if =:\t|\t geometry=gpd.points_from_xy \t|\tax.set_ylim()\t|\tcluster.KMeans()\t|\t.shape\t|\t.transform\t|\n", ">elif =: \t|\tprint() \t|\tax.set_title()\t|\t.fit()\t|\tnp.amean()\t|\tlag_spatial()\t|\n", ">else =:\t|\t .isin()\t|\tax.set_yticks()\t|\t.drop() \t|\tnp.amin ()\t|\t.mean()\t|\n", ">folium.Marker()\t|\t classic.plot()\t|\tax.set_xlabels() & ax.set_yticklabels\t|\t.assign()\t|\tnp.amax()\t|\t.std()\t|\n", ">folium.Icon()\t|\t generateBaseMap()\t|\tplt.subplots()\t|\tplt.show()\t|\tnp.std()\t|\tsns.regplot()\t|\n", ">folium.Circle\t|\t .groupby(['', ''])\t|\t.dropna()\t|\t.set_title\t|\tshow()\t|\tMoran_local\t|\n", ">popup= \t|\t .reset_index() \t|\tpolar_plot()\t|\tsns.pairplot()\t|\tcmap=\t|\tp.sim()\t|\n", ">radius= \t|\t  max_zoom= \t|\tpd.Series()\t|\tarrs.append()\t|\tnp.seterr()\t|\tLisa.q\t|\n", ">.values.tolist() \t|\tfolium.TileLayer()\t|\tnp.pi\t|\t.show_hist()\t|\t[plt.imshow](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.imshow.html)\t|\twbt.set_working_dir()\t|\n", ">.add_to()\t|\t plugins.DualMap(location= , tiles= , zoom_start= )\t|\t \t|\t \t|\tclipped_img\t|\twbt.verbose()\t|\n", "> \t|\t \t|\t \t|\t \t|\trd.TerrainAttribute()\t|\t.to_file(\" \")\t|\n", ">  \t|\t \t|\t \t|\t \t|\trdShow() \t|\twbt.idw_interpolation()\t|\n", ">  \t|\t \t|\t \t|\t \t|\tgdal_data.GetRasterBand()\t|\timageio.imread(os.path.join())\t|\n", ">  \t|\t \t|\t \t|\t \t|\tgdal_band.GetNoDataValue()\t|\twbt.ks_test_for_normality()\t|\n", ">  \t|\t \t|\t \t|\t \t|\tgdal_data.ReadAsArray\t|\tIPython.display.HTML()\t|\n", ">  \t|\t \t|\t \t|\t \t|\tgdal_data.ReadAsArray().astype(np.float)\t|\twbt.image_autocorrelation()\t|\n", ">  \t|\t \t|\t \t|\t \t|\tplt.contour()\t|\t\t|\n", ">  \t|\t \t|\t \t|\t \t|\tplt.contourf()\t|\t\t|\n", "\n", "\n" ] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "0_GetDataAndTools.ipynb", "provenance": [] }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": 
"3.7.7" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": { "height": "570.4px", "left": "48px", "top": "219.533px", "width": "371.95px" }, "toc_section_display": true, "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 1 }