\n",
"\n",
" In *#codecell_SpatialPatterns_ImportUrLibraries* we imported our libraries. These are what we call **prerequisites**. You know by now that they are the basic tools you need to get started. Let's review them broadly:\n",
" \n",
"* *Pandas* manipulates data. \n",
"* *GeoPandas* manipulates geographic data. (Real pandas are also black and white and like to eat bamboo...) You need these to manipulate your data!\n",
"* *Fiona* helps with geographic data (find out more about [fiona](https://pypi.org/project/Fiona/)).\n",
"* *Requests* asks for things over the web. It's good to be able to ask for things.\n",
"* *ipywidgets* supports interactivity."
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "YDGLKQ8_iwBc"
},
"source": [
"## 4.2 Getting to know your Data...##"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 158
},
"colab_type": "code",
"id": "siGQqpwn-BQc",
"outputId": "720a8a11-68b1-457a-d39a-85a84465603a"
},
"outputs": [],
"source": [
"##codecell_SpatialPatterns_ImportUrData\n",
"\n",
"# then get your data\n",
"# This is where I put the data. It's in a format called GeoJSON, used to represent spatial geometry (shapes) and attributes (text).\n",
"url = 'http://ropitz.github.io/digitalantiquity/data/gabii_SU.geojson'\n",
"\n",
"# Please get me the data at that web address (url):\n",
"# use requests.get to retrieve data from any destination\n",
"request = requests.get(url)\n",
"\n",
"# I will use the letter 'b' to refer to the data, like a nickname\n",
"#we can use requests to read the response content in bytes\n",
"b = bytes(request.content)\n",
"\n",
"#So we will use fiona.BytesCollection referred by the letter 'f':\n",
"# to read the raw data (as single-file formats or zipped shapefiles)\n",
"# to wrap up all the data from 'b'\n",
"# check the coordinate reference system (crs) listed in the features\n",
"with fiona.BytesCollection(b) as f:\n",
" crs = f.crs\n",
" #by using also GeoDataFrame.from_features you can read geospatial data that's in the url without saving that data to disk (your PC) first\n",
" gabii_su_poly = gpd.GeoDataFrame.from_features(f, crs=crs)\n",
" # and print out the first few lines of the file, so I can check everything looks ok: you know this by now...we will call .head()\n",
" print(gabii_su_poly.head())\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "8bXO2Veyf4b-"
},
"source": [
"### Learning a new language – decomposing the code\n",
"\n",
"In *#codecell_SpatialPatterns_ImportUrData*:\n",
"\n",
"*GeoJSON* is used to store the excavation data. The GeoJSON format can encode a variety of geographic data structures: features with spatial attributes (e.g. points, line strings, polygons, multipart geometries) and non-spatial attributes (text). This is a really useful format when creating a GIS.\n",
"\n",
"The *bytes()* method returns a bytes object, which is an immutable (cannot be modified) sequence of integers. We tend to use this to compress data, save it or send it.\n",
"\n",
"*requests.get* is used to retrieve data from any destination, and *request.content* reads all of the response content as bytes (for non-text requests, use the .content property).\n",
"\n",
"*BytesCollection()* takes a buffer of bytes and maps it to a virtual file that can then be opened by fiona. By using both fiona.BytesCollection and GeoDataFrame.from_features you can:\n",
"* read the raw data (as single-file formats or zipped shapefiles)\n",
"* wrap up all the data from 'b'\n",
"* check the coordinate reference system (crs) listed in the features\n",
"\n",
"\n",
" **Open source**\n",
"\n",
"So far, we have benefited abundantly from open-source software, data, tools, code, design documents and content. It is only natural to open, share and use the results of 10 years of excavation..."
]
},
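The *bytes()* behaviour described in the cell above is easy to poke at offline. A minimal sketch, assuming only the Python standard library; the `payload` string is a made-up stand-in for `request.content`, so no download is involved:

```python
# A made-up stand-in for request.content: a tiny GeoJSON-ish payload
payload = '{"type": "FeatureCollection", "features": []}'.encode('utf-8')

# bytes() returns an immutable sequence of integers
b = bytes(payload)
print(b[:6])   # the first few raw bytes
print(b[0])    # 123, the integer code for '{'

# "immutable" really does mean cannot be modified
try:
    b[0] = 0
except TypeError as err:
    print('cannot modify bytes:', err)
```

An in-memory buffer like `b` is the kind of object the notebook hands to `fiona.BytesCollection`.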
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "4aXd2B1bre1_"
},
"source": [
"\n",
"## 4.3 Assessing your data visually...\n",
"\n",
" So you know what's in this dataset... Maybe you want to see it before you start querying or analysing it? ... Start by visualising the spatial data for all the contexts (stratigraphic units) from the excavation we'll be exploring.\n",
"\n",
"So far you have dealt with survey data, where everything was logged with coordinates (x, y, and sometimes z for elevation). However, it is not always possible, or even meaningful, to record the coordinates of every artefact retrieved during an excavation, especially one that runs over several years. It is then more appropriate to record finds by stratigraphic unit (SU). The spatial analysis of these finds is more challenging because they don't have a spatial location per se, so we need to think carefully about organising and presenting them. Colour can be a great help for this!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 874
},
"colab_type": "code",
"id": "wVIYnLr0-BQg",
"outputId": "9b581c0b-4a65-4c17-d5d1-f3faf0356959"
},
"outputs": [],
"source": [
"##codecell_SpatialPatterns_PlottingUrData\n",
"\n",
"\n",
"# Now we have polygons, the shapes of our contexts. Let's visualise the data to double check that all is well\n",
"# We'll use the .plot function again (see lab_Webmaps&Distributions)\n",
"# 'plot' means draw me an image showing the geometry of each feature in my data. \n",
"# We want to control things like the color of different types of features on our map. \n",
"# I used the 'Blues' colorscale command (cmap stands for 'colour map') \n",
"# and asked it to draw the polygons differently based on the type of feature.\n",
"\n",
"gabii_map1 = gabii_su_poly.plot(column='DESCRIPTIO', cmap='Blues', edgecolor='grey', figsize=(15, 15));\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "yVgg9fN1-BQj"
},
"source": [
"### Learning a new language – decomposing the code\n",
"\n",
"In *#codecell_SpatialPatterns_PlottingUrData*, when using .plot(), if column=' ' is specified the plot colouring is based on that column's values. The colour scale is defined with cmap='', the edges of the features (our polygons) with edgecolor='', and the size of the plot with figsize=(width, height).\n",
"\n",
"You can of course add parameters/symbologies to your plots. The choice of parameters is largely dictated by your data analysis. [Here](http://geopandas.org/mapping.html) is some documentation on plot generation.\n",
"\n",
"The colorscale options are: Accent, Accent_r, Blues, Blues_r, BrBG, BrBG_r, BuGn, BuGn_r, BuPu, BuPu_r, CMRmap, CMRmap_r, Dark2, Dark2_r, GnBu, GnBu_r, Greens, Greens_r, Greys, Greys_r, OrRd, OrRd_r, Oranges, Oranges_r, PRGn, PRGn_r, Paired, Paired_r, Pastel1, Pastel1_r, Pastel2, Pastel2_r, PiYG, PiYG_r, PuBu, PuBuGn, PuBuGn_r, PuBu_r, PuOr, PuOr_r, PuRd, PuRd_r, Purples, Purples_r, RdBu, RdBu_r, RdGy, RdGy_r, RdPu, RdPu_r, RdYlBu, RdYlBu_r, RdYlGn, RdYlGn_r, Reds, Reds_r, Set1, Set1_r, Set2, Set2_r, Set3, Set3_r, Spectral, Spectral_r, Wistia, Wistia_r, YlGn, YlGnBu, YlGnBu_r, YlGn_r, YlOrBr, YlOrBr_r, YlOrRd, YlOrRd_r, afmhot, afmhot_r, autumn, autumn_r, binary, binary_r, bone, bone_r, brg, brg_r, bwr, bwr_r, cividis, cividis_r, cool, cool_r, coolwarm, coolwarm_r, copper, copper_r, cubehelix, cubehelix_r, flag, flag_r, gist_earth, gist_earth_r, gist_gray, gist_gray_r, gist_heat, gist_heat_r, gist_ncar, gist_ncar_r, gist_rainbow, gist_rainbow_r, gist_stern, gist_stern_r, gist_yarg, gist_yarg_r, gnuplot, gnuplot2, gnuplot2_r, gnuplot_r, gray, gray_r, hot, hot_r, hsv, hsv_r, inferno, inferno_r, jet, jet_r, magma, magma_r, nipy_spectral, nipy_spectral_r, ocean, ocean_r, pink, pink_r, plasma, plasma_r, prism, prism_r, rainbow, rainbow_r, seismic, seismic_r, spring, spring_r, summer, summer_r, tab10, tab10_r, tab20, tab20_r, tab20b, tab20b_r, tab20c, tab20c_r, terrain, terrain_r, viridis, viridis_r, winter, winter_r\n",
"\n",
"Swap out 'Blues' in the cell above for any of these options...\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "asEFZLAQr04v"
},
"source": [
"\n",
"## 4.4 Loading the special finds\n",
"\n",
"Like many excavations, not every special find in this dataset has spatial coordinates associated with it (because in real archaeology life things are found in the sieve, the wheelbarrow, and during washing). \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 406
},
"colab_type": "code",
"id": "zikTZlaj-BQk",
"outputId": "b497e506-2592-4a80-a42b-45c4c558799c"
},
"outputs": [],
"source": [
"###codecell_SpatialPatterns_WhichTypeOfSpecialFinds&fromWhere?\n",
"\n",
"# Now I'm going to bring in all the basic Gabii special finds data - descriptions, object types, IDs and the contexts from which they come.\n",
"# We've had a few special finds over the years.\n",
"sf_su = pd.read_csv(\"https://raw.githubusercontent.com/ropitz/gabii_experiments/master/spf_SU.csv\")\n",
"sf_su"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"colab_type": "code",
"id": "JhUKJYdCLfGt",
"outputId": "5f546b1c-65c6-43dd-c3c7-d51a9323b86e"
},
"outputs": [],
"source": [
"###codecell_SpatialPatterns_WhichTypeOfSpecialFinds?\n",
"\n",
"#the set() function returns all values (here our special find types) without duplicates \n",
"#this is a useful tool when you need to standardise your finds labels (and check your metadata for spelling!)\n",
"sf_su_desc = sf_su['SF_OBJECT_TYPE']\n",
"set(sf_su_desc)\n",
"\n",
"#however, this is a really long list to deal with, so we need to find ways to prepare this dataset to really see what has been happening on this site."
]
},
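The *set()* trick above can be sketched on a toy table. The column name `SF_OBJECT_TYPE` matches the Gabii data, but the rows below are invented, including a deliberately lowercase 'spool' to show how a spelling slip jumps out:

```python
import pandas as pd

# Invented finds labels, with one deliberate typo ('spool')
toy = pd.DataFrame({'SF_OBJECT_TYPE': ['Spool', 'Loom Weight', 'spool',
                                       'Spool', 'Loom Weight']})

# set() collapses duplicates, so each distinct label appears exactly once
labels = set(toy['SF_OBJECT_TYPE'])
print(labels)  # three labels survive, and the stray lowercase 'spool' stands out
```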
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "1JPhVomY-BQn"
},
"source": [
"One of our area supervisors, Troy, is super excited about tools related to textile production. They're a great example of how we think about special finds at Gabii. Multiple types of finds are related to textile production. Do we find all types everywhere? Are certain types of tools more concentrated in one type of context or one area than others? Troy has lots of questions about the patterns of places where we find these tools. Do they provide evidence for early textile production? Are they a major factor in the city's early wealth? Do we find the same things in later periods? After all, people under the Republic and Empire wore clothes... Loom Weights, spools, and spindle whorls are the most common weaving tools at Gabii.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "MHlxZSplR5W6"
},
"source": [
"| | | \n",
"| ------------- |-------------|\n",
"|
||\n",
"\n",
"\n",
"## 4.5 Preparing your data, a prerequisite to all analysis ##\n",
"\n",
"### 4.5.1 Selection ### \n",
"As this finds data is not yet spatial, only associated with a stratigraphic unit, we can logically merge our non-spatial special finds data with our spatial stratigraphic units data to make all our data spatial."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 406
},
"colab_type": "code",
"id": "OCrCdSIQ-BQo",
"outputId": "ea40b1b1-8f9b-463c-d588-f1e8d2e1901c"
},
"outputs": [],
"source": [
"###codecell_SpatialPatterns_SpecialFindsSelection\n",
"\n",
"#Let's pull all those find types out of the big list. \n",
"#We're selecting the finds data we want to work with before merging with the spatial data. We could do these operations in reverse if we wanted to.\n",
"#here, very much like in lab1, #codecell_makeabasicmap_BringingUrData2theMap, & lab2, #codecell_Webmaps&Distributions_SplittingUrData, we are using the .loc and .isin functions\n",
"\n",
"types = ['Loom Weight','Spool','Spindle Whorl']\n",
"textile_tools = sf_su.loc[sf_su['SF_OBJECT_TYPE'].isin(types)]\n",
"textile_tools\n",
"\n",
"#we now have a new dataframe containing only textile_tools"
]
},
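The selection above can be sketched with an invented miniature of `sf_su`; the column names match the notebook, the values are made up:

```python
import pandas as pd

# A made-up miniature of the finds register
toy_sf = pd.DataFrame({
    'SU': [10, 11, 12, 13],
    'SF_OBJECT_TYPE': ['Coin', 'Spool', 'Loom Weight', 'Nail'],
})

types = ['Loom Weight', 'Spool', 'Spindle Whorl']

# .isin builds a True/False mask; .loc keeps only the True rows
tools = toy_sf.loc[toy_sf['SF_OBJECT_TYPE'].isin(types)]
print(tools)  # only the Spool and Loom Weight rows remain
```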
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "ChQKlGPcs9qM"
},
"source": [
"### 4.5.2 Listing and merging to become spatial ### \n",
"Presence or absence isn't everything. You may want to know how many finds of a certain type are present in a given area."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 230
},
"colab_type": "code",
"id": "QG9Hbhn0-BQq",
"outputId": "3f928824-7949-48c6-8274-aae8e977caec"
},
"outputs": [],
"source": [
"###codecell_SpatialPatterns_TextileToolsListing\n",
"\n",
"# Now let's count up how many of these tools appear in each context (SU).\n",
"# The pd.value_counts() function returns a series containing counts of unique values.\n",
"# So we can print out a list of the number of textile_tools in each SU next to that SU number.\n",
"\n",
"pd.value_counts(textile_tools['SU'].values, sort=True)"
]
},
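The counting step can be sketched on invented SU numbers. (Recent pandas versions prefer the Series method `.value_counts()` over the top-level `pd.value_counts`; both do the same tally.)

```python
import pandas as pd

# Invented SU numbers for five textile tools
su = pd.Series([1178, 1178, 1178, 1300, 1300])

# value_counts tallies each unique value; sort=True puts the busiest SU first
counts = su.value_counts(sort=True)
print(counts)  # SU 1178 -> 3 tools, SU 1300 -> 2 tools
```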
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 197
},
"colab_type": "code",
"id": "q3x4foKL-BQt",
"outputId": "dbb90939-1125-4158-bf30-29955710ba0f",
"scrolled": true
},
"outputs": [],
"source": [
"###codecell_SpatialPatterns_TextileToolsBecomesSpatial\n",
"\n",
"#Then let's combine the special finds data with our polygons representing context with shape and a spatial location\n",
"# We do this with a command called 'merge'. In lab2,#codecell_Webmaps&Distributions_MergingZeData, you have used pandas pd.merge()\n",
"\n",
"gabii_textools = gabii_su_poly.merge(textile_tools, on='SU')\n",
"\n",
"# Very much like pd.merge(), you have now created a new dataframe ('gabii_textools') by merging 'gabii_su_poly' with 'textile_tools' on 'SU', the stratigraphic unit\n",
"# let's have a look at the new dataframe using .head() to print out just the first few rows.\n",
"gabii_textools.head()\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "LhKJFX1wXMkP"
},
"source": [
"### 4.5.3 Visual assessment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 892
},
"colab_type": "code",
"id": "KXxiPahi-BQw",
"outputId": "534dc2f8-6622-460f-d62f-9eda8483a7f8"
},
"outputs": [],
"source": [
"###codecell_SpatialPatterns_SeeingTextileToolsinContext\n",
"\n",
"# If we want to see this result as a map, we just add the .plot command to the end of the dataframe's name\n",
"# here .plot() symbology is expanded with transparency via 'alpha=', where a value of 1 is complete opacity and 0 complete transparency \n",
"\n",
"gabii_textools.plot(column='SF_OBJECT_TYPE', cmap='Accent', figsize=(15, 15), legend=True, alpha=0.5)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "87-PZM6l-BQz"
},
"source": [
"#### But what do you see really?\n",
"OK, what do you see here? Compare the distribution of each type of textile tool. Do some types seem to be **concentrated** in certain areas? How might you check? What **factors** might contribute to this pattern? Do big layers simply aggregate lots of stuff? Do late dumps contain early materials? Why would one type of tool appear where the others don't?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4.5.4 Sorting and tidying"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 406
},
"colab_type": "code",
"id": "HY3cT_mV-BQ0",
"outputId": "d5b18843-648f-4ab6-b0db-1be8dd3f61bf"
},
"outputs": [],
"source": [
"###codecell_SpatialPatterns_SortingDataTextileTools\n",
"\n",
"# We can try and see the relationship between layer size and count by sorting\n",
"#our list of finds by the surface area of each layer.\n",
"# We use the command 'sort_values' \n",
"\n",
"gabii_textools.sort_values(by=['Shape_Area'],ascending=False)\n",
"\n",
"# The '.sort_values' function sorts along an axis (here defined by 'Shape_Area'). \n",
"# Sorting is in ascending order (smallest to largest) by default, i.e. ascending=True; we want to see the values in descending order, so we set ascending=False"
]
},
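A toy version of the sort, with made-up surface areas, shows what `ascending=False` does:

```python
import pandas as pd

# Made-up contexts with made-up surface areas
toy = pd.DataFrame({'SU': [1, 2, 3], 'Shape_Area': [50.0, 900.0, 120.0]})

# ascending=False puts the largest layer at the top
ordered = toy.sort_values(by=['Shape_Area'], ascending=False)
print(ordered)  # SU 2 (900.0) first, then SU 3, then SU 1
```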
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "dQKRWndWJkqb"
},
"source": [
"#### Knowing your site and refining your analysis ####\n",
"\n",
"Gabii excavations have revealed enormous colluvial layers. This is an important consideration, as these very large contexts will bias the distribution of artefacts across the site. Therefore, very large areas should probably be excluded."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "Vs3Lvg3O-BQ4"
},
"outputs": [],
"source": [
"###codecell_SpatialPatterns_RefiningDataSorting\n",
"\n",
"# Outliers will mess with any analysis. Here, large stratigraphic layers are our outliers\n",
"# By cutting out these layers, i.e. excluding SUs with a surface area greater than 800, we can deal with these outliers\n",
"\n",
"gabii_textools2 = gabii_textools.loc[gabii_textools['Shape_Area']<800]\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 892
},
"colab_type": "code",
"id": "vXgBTzkX-BQ7",
"outputId": "f9a5f2a3-53c8-4d8c-fe07-36ebafa07b21"
},
"outputs": [],
"source": [
"###codecell_SpatialPatterns_VisualisingDataSorting\n",
"\n",
"# If we want to see this result as a map, we just add the .plot command to the end again.\n",
"\n",
"gabii_textools2.plot(column='SF_OBJECT_TYPE', cmap='Accent', figsize=(15, 15), legend=True, alpha=0.5)\n",
"\n",
"# That's better. Plot the results to see that you've removed the big colluvial layers."
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "skqDgXvhNKgq"
},
"source": [
"#### Grouping and merging further #### \n",
"\n",
"To answer the question *how many of each tool type appears in each SU?* you will need to further group and merge your data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 436
},
"colab_type": "code",
"id": "tW8PUkQI-BQ-",
"outputId": "a0202828-010e-469b-a3e5-9eba1b5531d9"
},
"outputs": [],
"source": [
"###codecell_SpatialPatterns_GroupingData\n",
"\n",
"# OK, count up how many of each tool type appears in each SU using the 'groupby' command. \n",
"# You have used this command before in #codecell_Webmaps&Distributions_SplittingUrData_CreateLayers \n",
"## and .fillna() was explained in codecell_Webmaps&Distributions_AllNumbers \n",
"\n",
"textools_counts = gabii_textools2.groupby('SU')['SF_OBJECT_TYPE'].value_counts().unstack().fillna(0)\n",
"\n",
"\n",
"# Sort the list so that the SUs with the most stuff end up at the top.\n",
"textools_counts.sort_values(by=['Loom Weight','Spindle Whorl','Spool'], ascending=False)"
]
},
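The groupby chain above is dense, so here is the same pipeline on an invented five-row finds table (column names as in the notebook, values made up):

```python
import pandas as pd

# Invented finds: which tool came from which SU
toy = pd.DataFrame({
    'SU': [1, 1, 1, 2, 2],
    'SF_OBJECT_TYPE': ['Spool', 'Spool', 'Loom Weight', 'Spool', 'Spindle Whorl'],
})

# groupby + value_counts gives one row per (SU, tool type) pair;
# unstack pivots the tool types into columns; fillna(0) fills the gaps
table = toy.groupby('SU')['SF_OBJECT_TYPE'].value_counts().unstack().fillna(0)
print(table)  # one row per SU, one column per tool type
```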
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 197
},
"colab_type": "code",
"id": "uZsp42WE-BRA",
"outputId": "87ecb666-883a-41c2-aa7f-ef02e58962ea"
},
"outputs": [],
"source": [
"###codecell_SpatialPatterns_MergingData\n",
"\n",
"# Merge your textile tool counts with your spatial data for the contexts\n",
"# Because both dataframes have a 'SU' column, you can use this to match up the rows. \n",
"# so the merger will occur on='SU'\n",
"\n",
"gabii_textools_counts = gabii_su_poly.merge(textools_counts, on='SU')\n",
"gabii_textools_counts.head()"
]
},
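A sketch of the same merge with two invented tables, both keyed by 'SU'. One design point worth knowing: `merge` defaults to an inner join, so an SU with no textile tools drops out of the result (and therefore off the map):

```python
import pandas as pd

# Invented spatial table (a stand-in for the polygons) and counts table
spatial = pd.DataFrame({'SU': [1, 2, 3], 'Shape_Area': [50.0, 120.0, 70.0]})
counts = pd.DataFrame({'SU': [1, 3], 'Spool': [2, 1]})

# on='SU' matches rows across the two tables; SU 2 has no finds, so it disappears
merged = spatial.merge(counts, on='SU')
print(merged)
```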
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "dQVlzdgZtU8c"
},
"source": [
"### 4.5.5 Visual assessment: exploring your data ### \n",
" \n",
" Side-by-side plots of different variables can help you visualise the differences between the spatial patterns you're exploring, very much like in lab2_MakeaBasicMap, when you compared Late Roman and\n",
"Middle Roman artefact distributions using two heatmaps side by side. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"colab_type": "code",
"id": "sr_Wmixp-BRD",
"outputId": "88aa6e47-3e3d-4a77-ce9b-eca99a3265c9"
},
"outputs": [],
"source": [
"###codecell_SpatialPatterns_AssessingFindType\n",
"\n",
"# Let's start by looking at each class of textile tool individually. \n",
"# Plot the counts of each type of find spatially\n",
"\n",
"gabii_textools_counts.plot(column='Loom Weight', cmap='Accent', figsize=(15, 15), legend=True, alpha=0.5, legend_kwds={'label': \"Number of Loom weight\",'orientation': \"vertical\"})\n",
"gabii_textools_counts.plot(column='Spindle Whorl', cmap='Accent', figsize=(15, 15), legend=True, alpha=0.5, legend_kwds={'label': \"Number of Spindle Whorl\",'orientation': \"vertical\"})\n",
"gabii_textools_counts.plot(column='Spool', cmap='Accent', figsize=(15, 15), legend=True, alpha=0.5, legend_kwds={'label': \"Number of Spool\",'orientation': \"vertical\"})\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 874
},
"colab_type": "code",
"id": "QBDAmDtW-BRG",
"outputId": "636bc085-c5ef-4410-b734-4a0ce61a8c20"
},
"outputs": [],
"source": [
"###codecell_SpatialPatterns_PlottingAllFindType\n",
"# here's another visualisation. I've chosen single-colour scales (shades of blue, shades of red, shades of green) to show the quantity of each type of find in a single map.\n",
"\n",
"base = gabii_textools_counts.plot(column='Loom Weight', cmap='Blues', figsize=(15, 15), alpha=0.7)\n",
"gabii_textools_counts.plot(ax=base, column='Spindle Whorl', cmap='Reds', alpha=0.7)\n",
"gabii_textools_counts.plot(ax=base, column='Spool', cmap='Greens', alpha=0.7);\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "stOFt2mulEpQ"
},
"source": [
"#### Let's get another library to help visualisation ####\n",
"So far, it has been difficult to see what's happening, to identify activities between the buildings and to compare the maps when we have to scroll.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "mPhlEUHClPK4"
},
"outputs": [],
"source": [
"##codecell_SpatialPatterns_ImportUrLibraries\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 362
},
"colab_type": "code",
"id": "6qjbSNcx-BRJ",
"outputId": "a1780041-9dec-43aa-be29-25a349586e1d"
},
"outputs": [],
"source": [
"###codecell_SpatialPatterns_AllFindTypeSidebySide\n",
"\n",
"# Let's put the maps side by side to help with comparative visualisation.\n",
"fig, axes = plt.subplots(ncols=3,figsize=(15, 5))\n",
"gabii_textools_counts.plot(column='Loom Weight', cmap='autumn', ax=axes[0], legend=True, legend_kwds={'label': \"Number of Loom weight\",'orientation': \"vertical\"}).axis('equal')\n",
"gabii_textools_counts.plot(column='Spindle Whorl', cmap='autumn', ax=axes[1], legend=True, legend_kwds={'label': \"Number of Spindle Whorl\",'orientation': \"vertical\"}).axis('equal')\n",
"gabii_textools_counts.plot(column='Spool', cmap='autumn',ax=axes[2], legend=True, legend_kwds={'label': \"Number of Spool\",'orientation': \"vertical\"}).axis('equal')"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "82KrdqwQ-BRM"
},
"source": [
"### 4.5.6 Questioning your maps ###\n",
"\n",
"Can you see any **patterns** here? Do the different types of tools **concentrate** in the same parts of the site? Why might different types of tools have different **distributions**? "
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "i_WBqeTEtcPh"
},
"source": [
"### ~ warning ~\n",
"\n",
"\n",
"*OK, this next cell is a big scary cell because something broke after I drafted this exercise. Push run to fix the thing that's broken (hopefully).*"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "both",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"colab_type": "code",
"id": "-l6FaKTG92ZJ",
"outputId": "1c600f88-7937-4ea6-b876-1fb7358bab32"
},
"outputs": [],
"source": [
"\n",
"#@title\n",
"!apt-get install -qq curl g++ make\n",
"#@title\n",
"!curl -L http://download.osgeo.org/libspatialindex/spatialindex-src-1.8.5.tar.gz | tar xz\n",
"#@title\n",
"import os\n",
"os.chdir('spatialindex-src-1.8.5')\n",
"#@title\n",
"!./configure\n",
"#@title\n",
"!make\n",
"#@title\n",
"!make install\n",
"#@title\n",
"!pip install rtree\n",
"#@title\n",
"!ldconfig\n",
"#Working through the example at http://toblerity.org/rtree/examples.html\n",
"#@title\n",
"from rtree import index\n",
"from rtree.index import Rtree\n",
"#@title\n",
"p = index.Property()\n",
"idx = index.Index(properties=p)\n",
"idx"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "MeUS-o2Jm6dW"
},
"source": [
"## 4.6 Quantifying these patterns ##\n",
"\n",
"### 4.6.1 Using statistics to explore, characterize and quantify spatial patterns\n",
"\n",
"There is a human limitation to recognising patterns across more than about three attributes/variables in even 100 data points, such as in this four-dimensional tic-tac-toe grid:\n",
"\n",
"*(image: a four-dimensional tic-tac-toe grid)*\n",
"\n",
"That's why we have to use statistics, specifically spatial statistics, to recognise and, more to the point, quantify patterns."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 781
},
"colab_type": "code",
"id": "EVtbdZPW-BRN",
"outputId": "f5cb484e-7e48-4dbe-9ae9-2b209a1a5140"
},
"outputs": [],
"source": [
"##codecell_SpatialPatterns_ImportUrLibraries\n",
"\n",
"# I think the distributions of different weaving tools vary.\n",
"# To investigate further, we are going to need more tools. Specifically we need statistical tools. \n",
"# pysal, numpy and sklearn are all useful for statistics. Seaborn is useful for visualisation. \n",
"!pip install pysal\n",
"import pysal\n",
"from sklearn import cluster\n",
"import seaborn as sns\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "0qHclmig-BRP"
},
"source": [
"### 4.6.2 Data Clustering ### \n",
"\n",
"We're going to use **cluster analysis** to try and better understand our patterns. Clustering is a broad set of techniques for finding groups within a data set. Cluster analysis has as its objective grouping together similar observations (unlike factor analysis, which works by searching for similar variables).\n",
"\n",
"Given a set of data points, we can use a clustering algorithm to classify each data point into a specific group. The clustering algorithm tests the hypothesis that data points in the same group have similar properties, and that data points in different groups have dissimilar properties. So, when we cluster observations, we want items in the same group to be similar and items in different groups to be dissimilar.\n",
" \n",
"M.Fortin & M. Dale (2005) explain that\n",
"> \" Arising from time series statistics and the more familiar parametric statistics, spatial statistics quantify the degree of self-similarity of a variable as a function of distance. These spatial statistics assume that, within the study area, the parameters of the function defining the underlying process, such as the mean and the variance, are constant regardless of the distance and direction between the sampling locations. Then the goal of spatial statistics is **to test the null hypothesis of absence of ‘spatial pattern’**. The null hypothesis implies that nearby locations (or attributes, measures) do not affect one another such that there is independence and spatial randomness (Figure 6.2(b)). The **alternatives** are that there is clustering and thus positive spatial autocorrelation (Figure 6.2(a)) or repulsion and negative spatial autocorrelation (Figure 6.2(c)).\" \n",
"\n",
"*(image: Figure 6.2, spatial patterns: (a) clustering, (b) randomness, (c) repulsion)*\n",
"\n",
"**K-Means clustering** is probably the best-known clustering algorithm. It solves the clustering problem by splitting a dataset into a set of k groups (k being an arbitrary number you get to choose). \n",
"\n",
"We can only recommend Ben Alex Keen's [blog](https://benalexkeen.com/k-means-clustering-in-python/) to see how Python works through clustering! Here is a simple version of K-Means clustering explained in 5 visual steps:\n",
"\n",
"\n",
"|Step1|Step2|Step3|Step4|Step5|\n",
"|------|------|------|------|------|\n",
"|Identify the number of classes (clusters) to use by looking at the data and spotting any discrete groupings.|Identify the cluster centroids (3 coloured symbols on the graph).|Assign each point to its closest cluster centroid. Here we should emphasise that there are many possible definitions of “closest” (e.g. [nearest neighbour](https://towardsdatascience.com/machine-learning-basics-with-the-k-nearest-neighbors-algorithm-6a6e71d01761), Ward's method).|Identify the new centroids by taking the average of all points in each cluster.|Repeat the reassignment of points and centroids in a loop until points stop changing clusters.|\n",
"|