{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2020-03-23T02:46:22.625477Z", "start_time": "2020-03-23T02:46:20.899614Z" } }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "from IPython.display import Image\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lecture 24:\n", "\n", "- Find out about Machine Learning\n", "- Learn about using the **scikit-learn** Python package for clustering analysis.\n", "- Apply clustering analysis to Earth Science problems\n", "\n", "In the next couple of lectures, we are going to learn how to use the **scikit-learn** package to perform some machine learning techniques." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### What is machine learning anyway?\n", "\n", "Machine Learning is one of those buzzwords you seem to hear all over the place these days, but what does it actually mean?\n", "\n", "At its simplest, machine learning is a way of finding 'features' in your data and assigning 'labels' to it, which together allow you to build models.\n", "\n", "There are two main types of machine learning, _supervised_ and _unsupervised_. \n", "\n", "_Supervised_ machine learning involves labeling a subset of your data and fitting a known type of model to explain these labels. The computer then applies this model to new data and can label the new data automatically. The labels you apply to data can be continuous values, in which case the task is known as _regression_, or they can be discrete, in which case it is known as _classification_. \n", "\n", "We're all pretty familiar with supervised machine learning already. Every time you try to log into a website and you have to click on pictures of cars or road signs to log in, you're teaching Google how to recognise those objects in images. Google then uses those data to create models that it uses to teach its self-driving cars how to 'see', for example. 
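The difference between classification and regression can be sketched with **scikit-learn**. This is only a minimal illustration on made-up data, not part of the lecture dataset: the two point clouds, the straight-line relationship, and the choice of KNeighborsClassifier and LinearRegression are all assumptions picked for brevity.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Classification: two made-up clouds of points carrying discrete labels 0 and 1
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
group_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
X_train = np.vstack([group_a, group_b])
y_train = np.array([0] * 50 + [1] * 50)

# Fit a model to the labeled subset, then let it label points it has not seen
classifier = KNeighborsClassifier(n_neighbors=3)
classifier.fit(X_train, y_train)
new_points = np.array([[0.2, -0.1], [4.8, 5.3]])
predicted_labels = classifier.predict(new_points)

# Regression: continuous labels following an assumed line y = 2x + 1 plus noise
x = np.linspace(0, 10, 50).reshape(-1, 1)
y = 2.0 * x.ravel() + 1.0 + rng.normal(scale=0.1, size=50)
regressor = LinearRegression()
regressor.fit(x, y)
predicted_value = regressor.predict(np.array([[4.0]]))

print(predicted_labels, predicted_value)
```

In both cases the pattern is the same: call fit on labeled data, then call predict on new data. Only the kind of label (discrete or continuous) changes.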
\n", "\n", "_Unsupervised_ machine learning is fundamentally different because it can be applied to a dataset without labeling any of the data first. This type of learning looks for features in the dataset that it can use to categorize different parts of the data (known as _clustering_) or to define a coordinate space that makes the data easier to see (known as _dimensionality reduction_).\n", "\n", "This all might seem quite confusing and abstract right now, but with the examples below you should start to get an idea of what you can use machine learning for. This lecture covers just a small fraction of what can be done. If you want to really learn how to do it, start with this book: https://jakevdp.github.io/PythonDataScienceHandbook/. It is like this class, in that it is a bunch of Jupyter notebooks, which you are now a pro at. \n", "\n", "Today we are going to use a form of unsupervised learning (clustering) to do some geology!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Orocopio Mountains Dataset\n", "The dataset poles_data contains poles to bedding planes from the Orocopio Mountains. We learned about poles to planes in Lecture 22. If a rock is composed of sediments that are laid down flat on top of one another, then we would expect the pole to the plane to be vertical (because the plane itself is horizontal). If instead the plane is tilted, we would expect the pole to the plane to point in some other direction. Let's peek at a dataset of poles from bedding planes measured in the Orocopio Mountains." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2020-03-19T23:34:41.422081Z", "start_time": "2020-03-19T23:34:41.188737Z" } }, "outputs": [ { "data": { "text/html": [ "
|   | Lon | Lat | Pole_Az | Pole_Plunge |
|---|---|---|---|---|
| 0 | -115.684573 | 33.547327 | 340.0 | 13.0 |
| 1 | -115.684341 | 33.547283 | 359.0 | 14.0 |
| 2 | -115.684341 | 33.547283 | 347.0 | 11.0 |
| 3 | -115.685112 | 33.547408 | 332.0 | 13.0 |
| 4 | -115.685650 | 33.547545 | 12.0 | 32.0 |