{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "Copy of 2_webmaps_and_distributions.ipynb", "provenance": [], "collapsed_sections": [], "include_colab_link": true }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "view-in-github", "colab_type": "text" }, "source": [ "\"Open" ] }, { "cell_type": "markdown", "metadata": { "id": "XEKQmmmkiIio", "colab_type": "text" }, "source": [ "## Getting started ### \n", "\n", " ~ déjà vu ~ if you haven't already done so: You need your own copy of this notebook. Go to \"File\" and 'save a copy in github' (give access if needed.... put it into the repository you made for this course). Now you have your own copy of the notebook. Click 'open in colab' to get started working on the practical exercise." ] }, { "cell_type": "markdown", "metadata": { "id": "-1zyXWFvOql5", "colab_type": "text" }, "source": [ "# Interactive maps and looking more at distributions\n", "\n", " ~ déjà vu ~ Last week, we focused on making \"mostly static\" maps, that is maps where you mostly just expect your user to look at the end-product, the map, you've prepared. We looked at a research question visualizing the distribution of Iron Age sites in Central Italy, and we focused the practical exercise on using the map for:\n", " * data exploration => filtering and creating subsets of data to show different aspects of the overall dataset, one at a time\n", " * visualisation => zoom level (visual balance), visual variables (symbols, fonts, generalisation, color) \n", " * organisation => Balancing geospatial and attribute (descriptive) data, and which attributes will be emphasized in a given map\n", "\n", " ~ new ~ This week we'll continue exploring different types of maps. Specifically, we'll look at making maps where interactivity is a key part of the design. We'll see how interactivity allows you to include more variables and more information in a single map.\n", "\n", "Beyond the question of how to present spatial data (i.e. designing the map), we will spend time on an important topic in archaeology: **spatial distributions**. Spatial distributions are patterns in space. Are things clustered together or spaced out regularly or randomly distributed?\n", "\n", "*Types of Distribution: Data Sampling*\n", "\n", "\n", " \n", "\n", "*Examples of Applications*\n", "\n", " \n", "\n", "Central to investigating spatial distributions (patterns) is our ability to manipulate and rearrange spatial data, as we work to answer spatially explicit questions. 
\n", " \n", "###This practical lab will provide you ways to do so through:\n", "\n", " * transforming database \n", " * merging database\n", " * creating layers\n", " \n", "\n" ] }, { "cell_type": "markdown", "metadata": { "id": "nVvxjDYFOkK8", "colab_type": "text" }, "source": [ "##Start by getting tools, always\n", "\n" ] }, { "cell_type": "code", "metadata": { "ExecuteTime": { "end_time": "2018-11-19T05:33:33.494092Z", "start_time": "2018-11-19T05:33:24.204201Z" }, "id": "mLiflC-JiIir", "colab_type": "code", "colab": {} }, "source": [ "#codecell_Webmaps&Distributions_ImportUrLibraries\n", "\n", "#Like last time, get the tools we need at the start\n", "import pandas as pd\n", "import folium\n", "import numpy as np" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "1lJa2mq5FkDQ", "colab_type": "text" }, "source": [ "in #codecell__Webmaps&Distributions_ImportUrLibraries:\n", "\n", " ~ déjà vu ~ Last week, we worked with the folium, branca and pandas libraries. \n", "\n", " ~ new ~ This week, you still be working with panda and folium and you will be replacing the branca library with [numpy](https://numpy.org/). " ] }, { "cell_type": "markdown", "metadata": { "id": "TN6AB-EgPF-q", "colab_type": "text" }, "source": [ "## And then by getting the data\n", "\n", "This week we are working with the data from the Antikythera survey project.\n", "\n", "\n", "It's citation is: Bevan, A. and Conolly, J. 2012. Intensive Survey Data from Antikythera, Greece. **Journal of Open Archaeology Data** 1(1), DOI: http://dx.doi.org/10.5334/4f3bcb3f7f21d.\n", "\n", "This data has been made available at the Archaeological Data Service (ADS) archive at https://archaeologydataservice.ac.uk/archives/view/antikythera_ahrc_2012/ and completely open for re-use. You can [see](https://archaeologydataservice.ac.uk/archives/view/antikythera_ahrc_2012/stats.cfm) how many times their database has been viewed and downloaded. \n", "\n", "\n", "### Open Data\n", "\n", "Last week we mentioned open source software. Open data operates under the same broad ethos, and follows many of the same principles. Sharing, reuse, and attribution are key. If you continue to reuse the Antikythera data, be sure to continue to link back to and cite the source.\n", "\n", "Perhaps the most relevant example in the UK is **OS (Ordnance survey) OpenData** which was made freely available for the first time in 2010. Last review ([2018](https://www.ordnancesurvey.co.uk/business-government/tools-support/open-data-support)) counted 1.9 million open data downloads, the equivalent of 150 people download OS OpenData every day!\n", "\n", "2015 has seen Environmental Agency made **lidar (light detection and ranging)** data available to the public, for free, as open data. Within the first year of release 500,000 lidar downloads were made equating to nearly 13 million km2 of data! \n", "\n", "\n", "### Working with other people's data: the case of Antikythera \n", "\n", "Have a quick look around the dataset as it's described on the ADS site. You'll notice that they've split up their dataset in ways that made sense to them at the time. Specifically they've divided up the artefact data into three discrete elements: 'pottery', 'lithics' and 'other' into separate files (very much like we did last week by filtering and creating a subset). This is a pretty normal archaeological data thing to do. \n", "\n", "Here's the trick, we want to focus on both **ceramics and small finds** and to look at these datasets together. 
"This means you'll have to grab both of them and combine them. When you are combining them, you will also need to re-organize the attribute data in order to reuse it for something new. This follows the model of a **relational database** (google this if you are curious about different ways of combining data)." ] }, { "cell_type": "code", "metadata": { "id": "up4gTxO6Ukzb", "colab_type": "code", "colab": {} }, "source": [ "#codecell_Webmaps&Distributions_OrganisingUrdata\n", "\n", "#Like last time, get the data we'll need at the start. I've been nice again and converted their coordinates to latitude and longitude for you. \n", "#You'll learn to do this yourself later in the course.\n", "\n", "# we label the first dataset 'pottery'\n", "pottery = pd.read_csv('https://raw.githubusercontent.com/ropitz/spatialarchaeology/master/data/antikythera_survey_pottery.csv')\n", "\n", "# we label the second dataset 'small finds'\n", "small_finds = pd.read_csv('https://raw.githubusercontent.com/ropitz/spatialarchaeology/master/data/antikythera_survey_small_finds.csv')\n", "\n" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "g3MV8nj-VKzB", "colab_type": "code", "colab": {} }, "source": [ "#codecell_Webmaps&Distributions_CheckingZeData\n", "\n", "#let's check the individual pottery file. \n", "pottery.head()" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "SRR2Nyugbnh-", "colab_type": "code", "colab": {} }, "source": [ "#codecell_Webmaps&Distributions_CheckingZeData\n", "\n", "#let's check the individual small finds file too. \n", "small_finds.head()" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "mrtIpk4DS-DK", "colab_type": "text" }, "source": [ "\n", "\n", "---\n", "\n", "\n", "### Learning a new language – decomposing the code\n", " ~ déjà vu ~ in #codecell_Webmaps&Distributions_CheckingZeData:\n", "\n", "* **=** allows you to provide a new name, often more convenient than the link pathname\n", "* **pd.read_csv** allows you to open a CSV format file\n", "\n", "* **.head() function** takes a peek at the first few rows of the dataset \n", "\n", "\n", "---\n", "\n", "\n", "Now have a look at these two datasets and how they are structured. Also, the coordinates have been transformed for you from cartesian co-ordinates (a grid system - shown in columns *'Xsugg'* & *'Ysugg'*) back to latitude/longitude (geodetic system - shown in columns *'DDLat'* & *'DDLon'*). Below is a very crude diagram demonstrating this transformation.\n",
\n", "\n", "*The relationship between Geographic (top left image)& Cartesian coordinates (bottom left image)*\n", "\n", " \n", "\n", "\n", "While the conversion from geodetic to cartesian is fairly straightforward, converting cartesian to geodetic is a complex problem (& if you are interested on how this is done mathematically have a look at [this](https://www.movable-type.co.uk/scripts/latlong-os-gridref.html)).\n", "\n", "\n", "---\n", "\n" ] }, { "cell_type": "code", "metadata": { "id": "VIQ_n9BCY1cu", "colab_type": "code", "colab": {} }, "source": [ "#codecell_Webmaps&Distributions_Concatenation\n", "\n", "# then we combine the two datasets together to make a big dataset we call 'survey data'\n", "survey_data = pd.concat([pottery,small_finds], sort=False, ignore_index=True)" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "SjpgcdojY5NJ", "colab_type": "text" }, "source": [ "##Learning a new language – decomposing the code\n", "\n", "Codecell__Webmaps&Distributions_Concatenation can be decomposed like this:\n", " \n", "![](https://github.com/Francoz-Charlotte/Spatial_teaching_CFediting/blob/master/Webmaps&Distributions_Concatenation_.jpg?raw=1)\n", "\n", "\n", "\n", "---\n", "\n", "\n", "* **Reminder**: an array is a group of elements/objects/items organised as a set where columns (of equal size) are multiplied by rows (of equal size). The advantage of using arrays is that all elements can be accessed at any time randomly. Last week, you had done something similar to call all items from the range [i] ( in #codecell_makeabasicmap_BringingUrData2theMap ). This week, you are linking together 2 arrays. Pandas library provides various ways to combine together Series or DataFrame such as merge, join and concatenate ([see user guide](https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html)). \n", "\n", "\n", "---\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "id": "_62uXk_TQwVY", "colab_type": "text" }, "source": [ "\n", "Let's make sure nothing went wrong..." ] }, { "cell_type": "code", "metadata": { "id": "pv4XxSSQU2YT", "colab_type": "code", "colab": {} }, "source": [ "#codecell_Webmaps&Distributions_CheckingZeData\n", "\n", "#check things loaded in and combined OK\n", "survey_data.head()" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "PZqevW03Q4ig", "colab_type": "text" }, "source": [ "# Now ask a question\n", "\n", "Like we said last week, we are using maps and spatial analysis to pose, explore and respond to spatial questions. \n", "\n", "My question is a bit like last week's question. I want to know about how many sites are in each period, so I can try and understand changing patterns over time. \n", "\n", "However, you may have noticed when you read in the data that it's structure is a bit different from last week's data. Instead of each site belonging to one period, it's assigned with varying probability to several different periods. \n", "\n", "This is a totally legit archaeological thing to do. Many sites have activity from multiple periods, and depending on the available evidence, you might be more or less confident about the presence or absence of activity in a specific period. \n", "\n", "### So what do we do now?\n", "\n", "We might start simply by assigning each site primarily to its 'most likely period'. 
\n", "\n", "This takes a few steps....\n", "\n", "\n", "### step 1 - prepare the data\n", "\n", "...you have to re-organize the dataset to answer our question: **'how many sites are in each period?'**. Data cleaning is the most tiem-consuming part of any analysis - and not only in archaeology but for all scientific analyses. Therefore we will walk through the steps of data cleaning in this exercise.\n" ] }, { "cell_type": "code", "metadata": { "id": "bbjLzluu1kx9", "colab_type": "code", "colab": {} }, "source": [ "#codecell_Webmaps&Distributions_Subselect\n", "\n", "# first we create a subset of our data that only contains the columns with information about time\n", "# this is in part because we want to do some operations where everything has to be a number, and some of the other fields contain text\n", "# it's also just to make things simpler when we look at them\n", "\n", "survey_data_time = survey_data[['MNLN', 'FNEB1',\t'EB2',\t'LPrePal', 'FPal', 'SPal', 'TPal', 'PPalPG', 'Geom', 'Arch', 'Class', 'Hell', 'ERom', 'MRom', 'LRom', 'EByz', 'MByz', 'EVen', 'MVen', 'LVen', 'Recent', 'Other']]\n", "survey_data_time.head()" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "T1CkxzZPdzeh", "colab_type": "text" }, "source": [ "---\n", "####Learning a new language – decomposing the code \n", "\n", " ~ déjà vu ~ in codecell_Webmaps&Distributions_Subselect\n", "\n", "**[ ]** allows you to subselect within your new *'survey_data'* dataframe.\n", " \n", "---" ] }, { "cell_type": "code", "metadata": { "id": "1P23K0yo2y5U", "colab_type": "code", "colab": {} }, "source": [ "#codecell_Webmaps&Distributions_ChangingDataType\n", "\n", "# if you were to look through this data\n", "survey_data_time.dtypes" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "s9ZRoNq_kY-N", "colab_type": "text" }, "source": [ "---\n", "####Learning a new language – decomposing the code\n", "\n", "In #codecell__Webmaps&Distributions_ChangingDataType, the ~ .dtypes ~ function allows you to see which type of data is in our *'survey_data_time'* dataframe. Three types of data are returned **int64** which are integers (whole numbers -numerical data), **float64** which are floating data (contains floating decimal points -numerical data) and **object** which is a string (not a number -categorical data). \n", "\n", "Looking further in this dataset, *'survey_data_time'*, we can also see that some that some fields contain missing values (NaN). Let's see what we can do about that and why it matters. \n", "\n", "---\n", "\n" ] }, { "cell_type": "code", "metadata": { "id": "rDnMwtylkVec", "colab_type": "code", "colab": {} }, "source": [ "#codecell_Webmaps&Distributions_AllNumbers\n", "\n", "# We can get rid of NaN (null) values by 'filling' them.\n", "# This is important because null values can break number-based operations.\n", "# Let's get rid of missing values and make sure everything is a number.\n", "\n", "survey_data_time.astype('float64').fillna(0)" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "mjRxTDQQhqn2", "colab_type": "text" }, "source": [ "---\n", "####Learning a new language – decomposing the code\n", "\n", "In the #codecell_Webmaps&Distributions_AllNumbers, you are making sure all these missing values, NaN, becomes null numbers (0=zero) . \n", "\n", "Why do missing values need to be removed? Simply because you cannot apply maths to them (e.g. 
"adding or multiplying columns together, reordering the values of a column from max to min, etc.).\n", "\n", "And the code cell above shows how this is done.\n", "\n", "\n", "---\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "id": "ym3UIQjPZ3Y5", "colab_type": "text" }, "source": [ "### Step 2 - Reshape your data\n", "\n", "Now that you've removed the null values, you can move to the next step in data cleaning, which is applying a transformation to arrange the data the way you want it for your analysis. Data transformation is an important thing to learn to do. \n", "\n", "Right now we have a bunch of columns with information about time. What we want is one single column that contains the most likely period - which is represented in each row by the column with the greatest value. \n", "\n", "*think about that for a moment*\n", "\n", "Right now the 'most likely period' is represented by a number in each row, but that's not the piece of information we want in our new column - we want the name of the column that contains that number. **In other words, we want to know the item in the table that satisfies the condition 'the most likely period', and to be able to do something with this item.** \n", "\n", "\n", "Once you have identified the maximum values, you can extract them...\n", "\n" ] }, { "cell_type": "code", "metadata": { "id": "-VTV0zNi8TBA", "colab_type": "code", "colab": {} }, "source": [ "#codecell_Webmaps&Distributions_MaximumValue\n", "\n", "#here we take the columns from all the different periods, get the one with the maximum value, and write that column's name to our new 'colmax' field\n", "def returncolname(row, colnames):\n", "    return colnames[np.argmax(row.values)]\n", "\n", "survey_data_time['colmax'] = survey_data_time.apply(lambda x: returncolname(x, survey_data_time.columns), axis=1)\n" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "hyFFUO0NIzR2", "colab_type": "text" }, "source": [ "---\n", "#### Learning a new language – decomposing the code\n", "\n", "In #codecell_Webmaps&Distributions_MaximumValue, \n", "\n", "you are creating a function. A function is a small script with multiple steps that you can run over your whole dataset. In this function, for each row you return the name of the column where the maximum (greatest) value in that row is found. You then create a new column \"colmax\" and store that name in it. \n", "\n", "This can be done using a [lambda function](https://www.w3schools.com/python/python_lambda.asp).\n", "\n", "Breaking down the steps...\n", "\n", "\n", "##### 1) Define a function: \n", "\n", "\n", "##### 2) Pass/apply this function as an argument:\n", "\n", "---\n", "\n", "\n", "\n", "##### *How does this really work?* \n", "\n", "Remember that we are working with arrays, so getting maximum or minimum values is just a matter of scanning along each row and noting the position of the largest (or smallest) entry.\n", "\n", "\n", "---\n", "\n" ] }, { "cell_type": "code", "metadata": { "id": "jcza17og5zeo", "colab_type": "code", "colab": {} }, "source": [ "#we can check it has all gone well\n", "survey_data_time.head()" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "O5iNwhgraxn9", "colab_type": "text" }, "source": [ "## Merging tables\n", "\n", "OK, now we have a single column with the information we need - the most likely date.\n",
"To create this column, we broke off some of our data (the columns with numbers) from the rest of the data (important descriptive text). We might well want to stick these two datasets back together before proceeding. \n", "\n", "**splitting and merging tables is another basic skill when working with data**\n", "\n", "\n" ] }, { "cell_type": "code", "metadata": { "id": "6LxENtT79Rvl", "colab_type": "code", "colab": {} }, "source": [ "#codecell_Webmaps&Distributions_MergingZeData\n", "\n", "#now we can also add our new column back to our original data table by doing a 'merge'\n", "#create a new table 'survey_data_maxtime' by merging our original 'survey_data' with ONLY the 'colmax' column from our new table\n", "survey_data_maxtime = pd.merge(survey_data, survey_data_time['colmax'], how='inner', left_index=True, right_index=True)\n", "survey_data_maxtime.head()" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "9tzHj_vOz4Aa", "colab_type": "text" }, "source": [ "#### Learning a new language – decomposing the code\n", "\n", "In #codecell_Webmaps&Distributions_MergingZeData, you learned how to merge using pandas. If you want to explore further and apply pd.merge() to your own data, check which [parameters](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html) apply.\n", "\n", " " ] }, { "cell_type": "markdown", "metadata": { "id": "8tbZd-bqVx7p", "colab_type": "text" }, "source": [ "## The curse of abbreviations\n", "\n", "Have a look at the resulting table. What do all those column names mean? Right now you are probably justifiably confused. We'll be talking more about the mess that is 'other people's data' next week. For now, have a look at the documentation for these datasets at: https://archaeologydataservice.ac.uk/catalogue/adsdata/arch-1115-2/dissemination/csv/pottery/documentation/pottery.txt\n", "\n", "You'll see they explain that many of those weird abbreviations are periods and that the number in each one represents the chance that a given find belongs to that period. Sometimes I wish people wouldn't use abbreviations like this, but they've defined them in their metadata file, so we can't complain too much." ] }, { "cell_type": "markdown", "metadata": { "id": "TZeKlPYrbzT3", "colab_type": "text" }, "source": [ "## Finally, we make maps!\n", "\n", "\n", "We're going to look at a couple of different ways of making maps in this exercise, because there are lots of tools we can use to do this.\n", "\n", "### Maps for visualization and interpretation\n", "\n", "Broadly speaking, there are two ways to approach interpreting spatial patterns. There's visualisation and interpretation, where you might visually compare the distributions, densities or locations of two or more datasets by plotting them on a map and interpreting what you see. Then there's statistical analysis. \n", "\n", "We'll start with the tools to do the first one, and introduce statistical analysis later in the course. \n", "\n", "We'll also discuss the value of each approach, and when to apply it.\n", "\n", "### As always, start with a question\n", "\n", "As we said at the beginning of today, we're interested in change over time.\n", "\n", "**Analysis Question:**\n",
\n", "How does the distribution of finds change between different periods?\n", "\n", "\n" ] }, { "cell_type": "code", "metadata": { "id": "hYZwq0DesMEw", "colab_type": "code", "colab": {} }, "source": [ "#codecell_Webmaps&Distributions_ImportUrLibraries\n", "\n", "# we're going to get geopandas, another tool for making maps\n", "!pip install geopandas\n" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "_RdAEMkRsl11", "colab_type": "code", "colab": {} }, "source": [ "#codecell_Webmaps&Distributions_ImportUrLibraries\n", "\n", "# get some more tools for making maps (and other things)\n", "\n", "%matplotlib inline\n", "import geopandas as gpd\n", "import seaborn as sns\n", "import numpy as np\n", "import matplotlib.pyplot as plt" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "T6MYggBtwUnd", "colab_type": "text" }, "source": [ "## Geopandas\n", "\n", "We're going to use geopandas for the next few steps. You've used geopandas before, in the first lesson.\n", "\n", "It can do many of the same things as folium, which we were using last time.\n", "\n", "Geopandas is particularly useful for showing categorical data. \n", "![Categorical data is data about groups.](https://d2h0cx97tjks2p.cloudfront.net/blogs/wp-content/uploads/sites/2/2018/04/categorical-data-sample-image.jpg)\n", "\n", "Categories can be anything where you are defining groups. Archaeological periods are categories. Types of lithics or ceramics are categories. As above, types of cars are categories. The name of each category is sometimes referred to as a label.\n", "\n", "We have seen above in #codecell_Webmaps&Distributions_ChangingDataType that the function **.dtype()** allows you to see which type of data is your columns. Data can be broadly split between numerical and categorical (see image below), and the difference that lies between them is crucial to grasp as it will define which tests and commands you can perform. \n", "\n", " \n", "\n", "We could start trying to see and understand the distributions of our sites by periods simply by mapping the period labels with different colors.\n" ] }, { "cell_type": "code", "metadata": { "id": "MpBpnIxBrWq6", "colab_type": "code", "colab": {} }, "source": [ "# take our big dataset from above and turn it from a 'dataframe' which is the data that folium uses to make maps into a 'geodataframe' which is the data geopandas uses to make maps\n", "\n", "gdf_survey = gpd.GeoDataFrame(\n", " survey_data_maxtime, geometry=gpd.points_from_xy(survey_data_maxtime.DDLon, survey_data_maxtime.DDLat))\n", "print(gdf_survey.head())" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "qKa9a2Yu0cWq", "colab_type": "text" }, "source": [ "####Learning a new language – decomposing the code\n", "\n", "In the #codecell_Webmaps&Distributions_dataframesINTOgeodataframes, gpd.GeoDataFrame()~ the code allows you to define the geometry (geometry=) of the new dataframe *'survey_data_maxtime'*, so it can be mapped. \n", "\n", "Here, the location of the points is defined by the x and y centroids (exact middles of polygon shapes)(gpd.points_from_xy~ ) following their respective geodetic coordinates, which can be found in the column *'survey_data_maxtime.DDLon'* for the Xs and *'survey_data_maxtime.DDLat'* for Ys.\n", "\n", "print()~ allows you to see the results." 
] }, { "cell_type": "code", "metadata": { "id": "u6Hmn-KFvz8X", "colab_type": "code", "colab": {} }, "source": [ "#Codecell_Webmaps&Distributions_PlotGeodataframes\n", "\n", "#plot your data colouring the points by the period to which they belong. You are grouping your sites by their category label when you do this. \n", "\n", "#the plot requires you to define which type of data it is (categorical or numerical)\n", "#figsize =(width, height)\n", "gdf_survey.plot(column='colmax', categorical=True, legend=True, figsize=(15,15))" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "N0k9G6d0yVPO", "colab_type": "text" }, "source": [ "### Assess the resulting map\n", "\n", "Is it useful? Why or why not?\n", "\n", "Can you see the distribution of sites from individual periods easily?\n", "\n", "Can you easily discern change over time?\n", "\n", "I'm not overly convinced by the result here. If you think about the map design principles we discused last week, you will probably also conclude that this is not a successful map.\n", "\n", "What other approach might we take?\n", "\n", "\n", "Let's try something else.\n" ] }, { "cell_type": "code", "metadata": { "id": "CZFoGw070Yq_", "colab_type": "code", "colab": {} }, "source": [ "#codecell_Webmaps&Distributions_SplittingUrData\n", "\n", "#Maybe it would be better to only look at two or three periods at a time. \n", "# Recall last week's discussion about selecting appropriate data that matches our question, and also about appropriate levels of generalisation.\n", "\n", "#let's select a subset of our periods to see change from the early bronze age to hellenistic to late roman periods\n", "# note that some crazy abbreviations have been used for the names of periods...\n", "\n", "# list the types (periods) of ceramics we are interested in seeing.\n", "types = ['EB2','Hell','LRom']\n", "\n", "#define 'classic' (as in classical archaeology) as this group of periods. Create a subset of the data to contain only the types you just listed\n", "classic = gdf_survey.loc[gdf_survey['colmax'].isin(types)]\n", "\n", "# check you get the expected result - only sites from the periods you just defined as belonging to the 'classic' group\n", "classic.head()" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "sO4kepty1EAV", "colab_type": "text" }, "source": [ "####Learning a new language – decomposing the code\n", "\n", "in #codecell__Webmaps&Distributions_SplittingUrData, \n", "\n", "\n", " ~ déjà vu ~ the function .loc (gdf_survey.loc~ ) is similar to .loc[] & .iloc[] used in pandas last week (in #codecell_makeabasicmap_BringingUrData2theMap). Note that .loc is using a label to index data whereas .iloc uses an integer position. This is the difference between working by columns or by rows. \n", "\n", "The .isin()~ command allows you to exclude data." ] }, { "cell_type": "code", "metadata": { "id": "ROpEOeRw1TZp", "colab_type": "code", "colab": {} }, "source": [ "#codecell_Webmaps&Distributions_PlotUrData\n", "\n", "#plot your data colouring the points by the period to which they belong\n", "# Do this by setting 'categorical' to 'true' so that the plot is coloured by category\n", "classic.plot(column='colmax', categorical=True, legend=True, figsize=(15,15))" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "VQu68G8o1bDi", "colab_type": "text" }, "source": [ "## Thinking about data visualization and map design\n", "\n", "That's a bit better perhaps. 
"The map is less crowded.\n", "\n", "Recall our discussions about how to design a map well. Clearly too much data introduces design problems.\n", "\n", "Well, now we can see the distributions a bit, and maybe say something about change over time, but there are still a lot of dots, and it's pretty clear dots from some periods are hidden under dots from other periods and we have no way to separate them. \n", "\n", "\n", "\n", "**what we need here is:**\n", " * **layers, so we can group our data and work with it interactively; and,**\n", " * **some context for all those dots**\n", "\n", "## To address our design problem, we will make interactive maps with layers that can be toggled on and off\n" ] }, { "cell_type": "code", "metadata": { "id": "C8fCQvSyUrx-", "colab_type": "code", "colab": {} }, "source": [ "#codecell_Webmaps&Distributions_ImportUrLibraries\n", "\n", "#Like last time, we'll use folium and one of its plugins. Import the tools you'll need, as usual. \n", "from folium.plugins import HeatMapWithTime\n" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "NBXk0ENHiIjR", "colab_type": "text" }, "source": [ "### Map Visualizations with Folium" ] }, { "cell_type": "markdown", "metadata": { "id": "wPAgyDlciIjR", "colab_type": "text" }, "source": [ "\n", "To see the survey data in context and build our interactive maps, we'll start by generating the base map that will be used throughout this notebook.\n", "\n", "'Basemaps' are generic background maps, like a satellite image or an image of the street map. You know the different backgrounds you can show on Google Maps? Those are 'basemaps'. ~ déjà vu ~ This is very much like the basemap imported with folium.Map (#codecell_makeabasicmap_BringingUrData2theMap).\n", "\n", "\n", "Have a look around the web, and you'll see that most modern online maps use a basemap, so we're going to do so as well.\n" ] }, { "cell_type": "code", "metadata": { "ExecuteTime": { "end_time": "2018-11-19T05:34:04.673013Z", "start_time": "2018-11-19T05:34:04.670120Z" }, "id": "0H-uRYF1iIjS", "colab_type": "code", "colab": {} }, "source": [ "#codecell_Webmaps&Distributions_BringingUrData2theMap\n", "\n", "#get the survey area centre, like you did last week, so you can centre the map where the data is located\n", "\n", "location_survey=survey_data_maxtime['DDLat'].mean(), survey_data_maxtime['DDLon'].mean()\n", "print(location_survey)\n" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "wUBTM83AbAeX", "colab_type": "code", "colab": {} }, "source": [ "#define a basemap we can reuse. Use the coordinates for the centre you generated just above to centre the basemap\n", "#This is a variant on how we did things last time...\n", "\n", "def generateBaseMap(default_location=[35.870086207930626, 23.301798820980512], default_zoom_start=11):\n", "    base_map = folium.Map(location=default_location, control_scale=True, zoom_start=default_zoom_start)\n", "    return base_map" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "4-qIFOBUiIjU", "colab_type": "text" }, "source": [ "### Review - basic map controls\n", "\n", "Arguments:\n",
\n", "* generateBaseMap(default_location=[],default_zoom_start=) \n", "* location=: Define the default location to zoom at when rendering the map
\n", "* zoom_start=: The zoom level that the map will default to when rendering the map
\n", "* control_scale=: Shows the map scale for a given zoom level" ] }, { "cell_type": "code", "metadata": { "ExecuteTime": { "end_time": "2018-11-19T05:34:04.687135Z", "start_time": "2018-11-19T05:34:04.674745Z" }, "id": "ZhOzNzEEiIjV", "colab_type": "code", "colab": {} }, "source": [ "#codecell_Webmaps&Distributions_CheckingZeData \n", "\n", "#check the basemap is working\n", "base_map = generateBaseMap()\n", "base_map" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "ExecuteTime": { "end_time": "2018-11-19T06:05:45.764352Z", "start_time": "2018-11-19T06:05:45.761769Z" }, "id": "--pBpUiviIjc", "colab_type": "code", "colab": {} }, "source": [ "#codecell_Webmaps&Distributions_ImportUrLibraries\n", "\n", "#lets get the heatmap tool, like last time, let's also get a measure control so we can measure distance\n", "from folium import plugins\n", "from folium.plugins import HeatMap\n", "from folium.plugins import MeasureControl\n", "\n", "# Measure controls are pretty neat. Rather than just having a scale bar, like you would in a static map, and needing to visually estimate the size of features, you can mesure them.\n", "# The ability to measure is a benefit of moving up the 'interactivity spectrum'." ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "g-4G4lb9iIje", "colab_type": "text" }, "source": [ "Let's start by visually comparing MRom to LRom, that is middle roman to late roman sites by putting their data in separate layers." ] }, { "cell_type": "code", "metadata": { "ExecuteTime": { "end_time": "2018-11-19T05:34:05.037298Z", "start_time": "2018-11-19T05:34:04.818915Z" }, "id": "2p4cDhYeiIjf", "colab_type": "code", "colab": {} }, "source": [ "#codecell_Webmaps&Distributions_Splitting your data \n", "\n", "# make a layer for when each period is more than 50% likely, so you have all the sites that are probably in that period\n", "survey_data_MRom = survey_data_maxtime.loc[(survey_data_maxtime['MRom'] > 50)]\n", "survey_data_ERom = survey_data_maxtime.loc[(survey_data_maxtime['ERom'] > 50)]\n", "\n", "# Yes, I know choosing a 50% cut-off is arbitrary. You could choose a different cut-off and change the values listed above. \n", "# If you do so, all your maps (and all your conclusions about change over time) will be affected." ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "RQIM47Vs2Gtv", "colab_type": "text" }, "source": [ "####Learning a new language – decomposing the code\n", "\n", " ~ déjà vu ~ in the last practical lab in ##codecell_makeabasicmap_ManipulatingyourData_UsingSymbology, you used the symbol **==** . This is similar, you are using a mathematical symbol to filter (in this case split it 2 with 50%)your dataframe: **>** greater than and **<** less than." ] }, { "cell_type": "markdown", "metadata": { "id": "Da1dgQn7fsXx", "colab_type": "text" }, "source": [ "### The concept of layers\n", "\n", "We've introduced a new concept here. Maps have 'layers'. Each layer contains information and can be turned on and off. Think of this like a stack of transparent paper. Each sheet of paper is a layer, and can be added to or taken away from the stack. Their order can also be changed.\n", "\n", "
\n", "\n", "\n", " \n" ] }, { "cell_type": "code", "metadata": { "ExecuteTime": { "end_time": "2018-11-19T10:09:37.970112Z", "start_time": "2018-11-19T10:09:34.940649Z" }, "id": "DxWjo2YUiIjo", "colab_type": "code", "colab": {} }, "source": [ "#codecell_Webmaps&Distributions_PrepareUrBasemaps_CreateLayers \n", "\n", "\n", "# like last time, make heatmaps, but one for each period, put them in different layers.\n", "\n", "# give your map a name\n", "base_map = generateBaseMap()\n", "\n", "# add two layers, one for each period\n", "\n", "mrom = HeatMap(data=survey_data_MRom[['DDLat', 'DDLon', 'MRom']].groupby(['DDLat', 'DDLon']).sum().reset_index().values.tolist(), radius=8, max_zoom=13).add_to(base_map)\n", "erom = HeatMap(data=survey_data_ERom[['DDLat', 'DDLon', 'ERom']].groupby(['DDLat', 'DDLon']).sum().reset_index().values.tolist(), radius=8, max_zoom=13).add_to(base_map)\n", "\n", "#give the layers sensible names that human can read\n", "mrom.layer_name = 'Middle Roman Distribution'\n", "erom.layer_name = 'Early Roman Distribution'\n", "\n", "# add the layer control. This is the tool that lets you turn different layers in your map on and off\n", "folium.LayerControl().add_to(base_map)\n" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "XxO-yK-h2RT6", "colab_type": "text" }, "source": [ "####Learning a new language – decomposing the code\n", "\n", "In #codecell__Webmaps&Distributions_PrepareUrBasemaps_CreateLayers: \n", "\n", "\n", " ~ déjà vu ~ you should refer to #codecell_makeabasicmap_ManipulatingyourData_Heatmap to review the steps taken last time. \n", "\n", "Last time you had used to create your heatmap using a size-based weighting (making larger sites have larger symbols). This time you are creating two heatmaps named mrom ('Middle Roman Distribution')and erom ('Early Roman Distribution'). Because the data is located within the same island, the maps will overlap.\n", "\n", "The layer control plugin allows you to switch on and off the visibility of one or the other (or both). Layer control () [documentation](https://python-visualization.github.io/folium/modules.html).\n", "\n" ] }, { "cell_type": "code", "metadata": { "ExecuteTime": { "end_time": "2018-11-19T10:09:48.682799Z", "start_time": "2018-11-19T10:09:39.486706Z" }, "id": "mzgBodwBiIjr", "colab_type": "code", "colab": {} }, "source": [ "#codecell__Webmaps&Distributions_GenerateUrBasemap\n", "\n", "#Now generate your map by calling it by its name\n", "base_map" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "wSCv-r8eXuXE", "colab_type": "text" }, "source": [ "## An exercise\n", "\n", "Now try and add some more layers to the map to show other periods! What other periods might it be relevant to consider if you are trying to understand change over time? Edit the cell above to add more layers, or add a new cell below and follow the steps above to make a new map, and add your extra layers to it. 
" ] }, { "cell_type": "code", "metadata": { "id": "engTEEWWe454", "colab_type": "code", "colab": {} }, "source": [ "#codecell_makeabasicmap_ManipulatingyourData\n", "\n", "# Here I'm doing the same thing as before but with different periods\n", "# make a layer for when the max period is LRom or MRom to compare these periods\n", "survey_data_lrommax = survey_data_maxtime.loc[(survey_data_maxtime['colmax'] =='LRom')]\n", "survey_data_mrommax = survey_data_maxtime.loc[(survey_data_maxtime['colmax'] =='MRom')]" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "u_ucaKUkfW_F", "colab_type": "code", "colab": {} }, "source": [ "\n", "#codecell_Webmaps&Distributions_SplittingUrData_CreateLayers \n", "\n", "# like last time, make heatmaps, but one for each period, put them in different layers\n", "base_map = generateBaseMap()\n", "\n", "lrommax = HeatMap(data=survey_data_lrommax[['DDLat', 'DDLon']].groupby(['DDLat', 'DDLon']).sum().reset_index().values.tolist(), radius=8, max_zoom=13).add_to(base_map)\n", "mrommax = HeatMap(data=survey_data_mrommax[['DDLat', 'DDLon']].groupby(['DDLat', 'DDLon']).sum().reset_index().values.tolist(), radius=8, max_zoom=13).add_to(base_map)\n", "\n", "#give the layers sensible names\n", "lrommax.layer_name = 'Late Roman Distribution'\n", "mrommax.layer_name = 'Middle Roman Distribution'\n", "\n", "# add the layer control\n", "folium.LayerControl().add_to(base_map)\n", "base_map\n", "\n", "\n", "# Adds a measure tool to the top right\n", "\n", "base_map.add_child(MeasureControl())\n" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "M85UnCT-lCKz", "colab_type": "text" }, "source": [ "# Making visual comparisons\n", "\n", "Stacking two layers on top of one another is one way to visually compare distributions. Do you think it is effective in this case?\n", "\n", "Perhaps it will be more useful to be able to view our distributions and explore them side by side, to help us compare what is happening in these two periods.\n", "\n", "\n", "We'll import a new tool to allow us to see two maps where the views are synced - that is identical and moving together - to have another way to compare distributions.\n", "\n", "\n", "* Once you have done this exercise, I recommend you diving in NLS website and see the differences between [overlays](https://maps.nls.uk/geo/explore/#zoom=16&lat=55.8716&lon=-4.2894&layers=1&b=1) (remember to slide the transparency cursor) and [side by side](https://maps.nls.uk/geo/explore/side-by-side/#zoom=16&lat=55.8716&lon=-4.2894&layers=1&right=BingHyb) maps." 
] }, { "cell_type": "code", "metadata": { "id": "2NEHqdsAhYYy", "colab_type": "code", "colab": {} }, "source": [ "# get another plugin for side by side maps\n", "from folium.plugins import DualMap" ], "execution_count": 0, "outputs": [] }, { "cell_type": "code", "metadata": { "id": "EBBVGMamhNNY", "colab_type": "code", "colab": {} }, "source": [ "# declare you are making a new map \"m\"\n", "\n", "# set the location to the location of your survey, set your starting zoom\n", "m = plugins.DualMap(location=location_survey, tiles=None, zoom_start=13)\n", "\n", "# the dual maps plugin automatically defines two buddy maps, \"m1\" and \"m2\" which pan and zoom together\n", "\n", "# give yourself some options in life for your base layers, add them to both maps 'm1' and 'm2' by just using \"m\"\n", "folium.TileLayer('cartodbpositron').add_to(m)\n", "folium.TileLayer('openstreetmap').add_to(m)\n", "\n", "\n", "# like last time, make heatmaps, one for each period, put them in different layers\n", "# put one layer in the left hand map 'm' and the other in the right hand map 'm2'\n", "\n", "lrommax = HeatMap(data=survey_data_lrommax[['DDLat', 'DDLon']].groupby(['DDLat', 'DDLon']).sum().reset_index().values.tolist(), radius=8, max_zoom=13).add_to(m.m1)\n", "mrommax = HeatMap(data=survey_data_mrommax[['DDLat', 'DDLon']].groupby(['DDLat', 'DDLon']).sum().reset_index().values.tolist(), radius=8, max_zoom=13).add_to(m.m2)\n", "\n", "#give the layers sensible names\n", "lrommax.layer_name = 'Late Roman Distribution'\n", "mrommax.layer_name = 'Middle Roman Distribution'\n", "\n", "# layer control time\n", "folium.LayerControl(collapsed=False).add_to(m)\n", "\n", "# Adds a measure tool to the top right\n", "\n", "m.add_child(MeasureControl())\n", "\n", "#draw your side by side maps\n", "\n", "m" ], "execution_count": 0, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "ZYz4hI_1kImv", "colab_type": "text" }, "source": [ "## visualization and interpretation\n", "\n", "Thought exercise: The results of these two maps should be similar but slightly different. What is making the difference?\n", "\n", "How good are you at interpreting these distributions and comparing them visually?\n", "\n", "*This is me hinting at you that you are going to end up wanting to use statistics eventually*\n" ] }, { "cell_type": "markdown", "metadata": { "id": "XWAQRddKcogi", "colab_type": "text" }, "source": [ " \n", "\n", "# Think about basic principles\n", "\n", "The principles of what we've done this week are the same as the principles of what we did last week. \n", "\n", "I think it's important to learn to do things more than one way, and to adapt to slightly different tools. The software and code packages used for modern spatial analysis and mapping are pretty diverse and are always developing as people improve things. It doesn't make much sense to just learn one way of making maps mechanistically. The important thing is to understand the principles and aims of what you're doing. \n", "\n", "Key principles from this week are:\n", "\n", "* transforming, or re-shaping data to prepare it for mapping (or analysis)\n", "* visualizing categorical data\n", "* making layers in maps\n", "* adding tools to toggle layer visibility\n", "* adding tools to interactively measure things on maps\n", "\n", "\n", "\n", "\n", "\n", "\n", "In any code package that is meant to be used for making maps, odds are good you will find a way to set the zoom level, set the centre starting location, and set the initial scale. 
\n", "\n", "You will be able to set up colour schemes, map attributes, and make layers. Knowing keywords and princples is the important thing. " ] }, { "cell_type": "markdown", "metadata": { "id": "3ZCz_MQuj8ZF", "colab_type": "text" }, "source": [ "## The End\n", "\n", "That's all for today. Be sure to save your copy of the notebook in your own repo so I can see it!" ] }, { "cell_type": "markdown", "metadata": { "id": "0nGkpE2cU9xj", "colab_type": "text" }, "source": [ "#LexiCode\n", "In the past two practical labs, you have learned and experimented with programming commands - !Remember to load first your libraries (folium, branca, pandas, geopandas, seaborn, matplotlib.pyplot and numpy( and some plugins (HeatMapWithTime, HeatMap, MeasureControl, PrepareUrBasemaps_CreateLayers from [folium.plugins](https://python-visualization.github.io/folium/plugins.html)) to use this new language! - that you can now reuse with your own datasets :\n", "\n", "
\n", "\n", ">Lexicode_MakingaBasicMap | Lexicode_Webmaps&Distributions\n", ">--- | ---\n", ">\t== () [] | pd.concat()\n", ">.head_csv() | .dtype()\n", ">.read_csv() | astype()\n", ">mean() | fillna()\n", ">folium.Map | def return\n", ">range() | .apply(lambda x:*function*,axis=)\n", ">len() | pd.merge()\n", ">iloc[]| how= , left_index= ,left_index= \n", ">.value_counts()| gpd.GeoDataFrame()\n", ">if =:| geometry=gpd.points_from_xy\n", ">elif =: |print() \n", ">else =:| .isin()\n", ">folium.Marker()| classic.plot()\n", ">folium.Icon()| generateBaseMap()\n", ">folium.Circle| .groupby(['', ''])\n", ">popup= | .reset_index()\n", ">radius= | max_zoom=\n", ">.values.tolist() |folium.TileLayer()\n", "> .add_to()| plugins.DualMap(location= , tiles= , zoom_start= )\n", "> | \n" ] } ] }