{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#The curse of hunting rare things\n", "###What are the chances of intersecting features with a grid of cross-sections?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I'd like to know the probability of intersecting features with a grid of cross-sections? These sections or transects might be 2D seismic lines, or outcrops.\n", "\n", "This notebook goes with the Agile blog post — [The curse of hunting rare things](http://www.agilegeoscience.com/blog/2015/5/21/the-curse-of-hunting-rare-things) — from May 2015. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Linear sampling theory" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I got this approach from [some lab notes by Dave Oleyar](http://www.webpages.uidaho.edu/wlf448/lab5line.htm) for an ecology class at the Unisity of Idaho. He in turn refers to the following publication:\n", "\n", "> Buckland, S. T., Anderson, D. R., Burnham, K. P., and Laake, J. L. 1993. Distance sampling: estimating abundance of biological populations. Chapman and Hall, London.\n", "\n", "The equation in the notes expresses density in terms of observations, but I was interested in the inverse problem: expected intersections given some population. Of course, we don't know the population, but we can tune our intuition with some modeling." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Expected number of features intersected: 37.5\n" ] } ], "source": [ "area = 120000.0 # km^2, area covered by transects\n", "population = 120 # Total number of features (guess)\n", "no_lines = 250 # Total number of transects\n", "line_length = 150 # km, mean length of a transect\n", "feature_width = 0.5 # km, width of features\n", "\n", "density = population / area\n", "length = no_lines * line_length\n", "\n", "observed = 2 * density * length * feature_width\n", "print \"Expected number of features intersected:\", observed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Just for fun, we can — using the original formula in the reference — calculate the population size from an observation:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Population: 120.0\n" ] } ], "source": [ "observed = 37.5\n", "population = (observed * area) / (2. * length * feature_width)\n", "\n", "print \"Population:\", population" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's a linear relationship, let's plot it." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYcAAAEPCAYAAACp/QjLAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAF7ZJREFUeJzt3XuQJVV9wPHvLLsjT8FVssvTQdQqiY9ZTMKagAwKaKpE\nY5LSREWClWhIKRhBdhdiMuYPg6vxUUmsVEWJssRVUYIQyxWITEwiL4WF5THCIsgua0ANZdCoaPbm\nj3Nmp+/svTP3zty+ffr091N1a7r79nSfX9fu78yvTz9AkiRJkiRJkiRJkiRJkiRJys6lwKPAtjnL\n3wHcC9wFvL+wfANwPzANnD6MBkqShu8kYA3tncMpwHXAijh/aPx5HLA1Lh8DtgPLhtJKSdLQjdHe\nOXwOeFmH9TYA6wrzW4C15TVLkjSfYf91/hzgpcBNwBTwK3H54cDOwno7gSOG2jJJ0h7LK9jf0whV\nwa8SKolndVm3NaxGSZLaDbtz2AlcGadvBXYDzwAeAY4qrHdkXDbXduDYMhsoSRl6AHh21Y0oGqN9\nzOFtwHvj9HOBh+P0zID0KHAMIZCRDtuzmpg1WXUDEjJZdQMSMll1AxIyWXUDEtJ37iyzctgMnAw8\nHdgB/Dnh8tZLCR3Gk8Cb47r3EE4x3QP8AvgT7AgkST2yw5g1WXUDEjJZdQMSMll1AxIyWXUDhqc1\nCq23QWufbiv0u0XvJaivqaobkJCpqhuQkKmqG5CQqaobMBytNYQx3DOAAytuTGWsHCQJiNXCe6H1\nGLTeDK1O47R7Vh5asyqSfYCStLDWGmjdAa1/gdbhvfxC6U2qWPYBSlJ3fVULbb9YarMSkH2AktRZ\n39VC2y+X0qSEZB+gJLVbdLXQtpGBNysx2QcoSbOWVC20bWhgTUpU9gFK0oCqhbYNDqRZCcs+QElN\nN7BqoW2jA9pOsrIPUFJTDbxaaNv4ALeVpOwDlNREpVQLbTsoYZtJyT5ASU1SarXQtqOStpuM7AOU\n1BR7qoVrSqoW2nZW8vYrl32AknLXVi2cWWK10LbTIeyjUtkHKClnQ60W2nY8xH1VIvsAJeWokmqh\nrQFD3t/QZR+gpNxUVi20NaKi/Q5N9gFKykXl1UJbYyrc91BkH6CkHJR+30K/ksqdlwKPAts6fHc+\nsBtYWVi2AbgfmAZO77LNpAKUpHZDu2+hX0nlzpOANezdORwFbAEeZLZzOA7YCqwAxoDtdH6/dVIB\nStKs5KqFouRy5xh7dw5XAC+kvXPYAKwrrLMFWNthe8kFKKnpkq0WivrOncvLaMU8XgPsBO6cs/xw\n4KbC/E7giGE1SpIWpzUOfJKQs8ZhZFe17RmcYXYO+wMXAacVls3Xw1olSEpUaxS4GDgHuADYBCNZ\n5axhdg7HEk4z3RHnjwS+CZwAPEIYi6Dw3SNdtjNZmJ6KH0kaklpUCxPxk6wxOl+tBJ0HpEeBY4AH\n6FxVZNUzS6qTWowtdJNU7twM7AJ+BuwAzp7z/bdpv5T1IsJVStPAK7psM6kAJTVFaxxaWxO9EqkX\n2efO7AOUlJK2auGsmlULRdnnzuwDlJSKpO9b6Ff2uTP7ACVVrdZjC91knzuzD1BSlbKqFoqyz53Z\nByipCllWC0XZ587sA5Q0bEm8b6Fs2efO7AOUNCxJvW+hbNnnzuwDlDQMjagWirLPndkHKKlMjaoW\nirLPndkHKKksjasWirLPndkHKGnQGlstFGWfO7MPUNIgNbpaKMo+d2YfoKRBsFqYI/vcmX2AkpbK\naqGD7HNn9gFKWqzs73JeiuxzZ/YBSlqMbJ+JNCjZ587sA5TUD6uFHmWfO7MPUFKvWuNWCz3LPndm\nH6CkhVgtLEL2uTP7ACXNp/bvcq5KUrnzUuBRYFth2QeAe4E7gCuBgwvfbQDuB6aB07tsM6kAJQ2L\n1cISJZU7TwLW0N45nAYsi9OXxA/AccBWYAUwBmwvrFeUVICShsFqYQCSy51jtHcORa8FLo/TG4B1\nhe+2AGs7/E5yAUoqS1u1cJbVwpL0nTuXl9GKHr0F2BynDwduKny3Ezhi6C2SlIjWGuCTwA5gHEZ2\nVdue5qmqc7gYeBL49DzrdOvpJgvTU/EjKQutUUJ+OAe4ANgEI54x6N9E/CRrjL1PK/0B8J/AvoVl\n6+NnxhbghA7b8x+JlC3vci5RcrlzjPbO4ZXA3cAz5qw3MyA9ChwDPAB0Or+YXICSlsorkYYgqdy5\nGdhFOH20gzDGcD/wHeD2+PlYYf2LCFcpTQOv6LLNpAKUtFRWC0OSfe7MPkCpGawWhiz73Jl9gFL+\nrBYqkH3uzD5AKV9WCxXKPndmH6CUJ6uFimWfO7MPUMqL1UIiss+d2Qco5cNqISHZ587sA5Tqz2oh\nQdnnzuwDlOrNaiFR2efO7AOU6slqIXHZ587sA5Tqx2qhBrLPndkHKNWH1UKNZJ87sw9QqgerhZrJ\nPndmH6CUNquFmso+d2YfoJQuq4Uayz53Zh+glJ62auFMq4Vayj53Zh+glJY91cI1Vgu1ln3uzD5A\nKQ1WC5nJPndmH6BUPauFDGWfO7MPUKqO1ULGss+d2QcoVcNqIXNJ5c5LgUeBbYVlK4HrgPuAa4FD\nCt9tAO4HpoHTu2wzqQCl+rNaaIikcudJwBraO4eNwIVxeh1wSZw+DtgKrADGgO3Asg7bTCpAqd68\nb6FBksudY7R3DtPAqji9Os5DqBrWFdbbAqztsL3kApTqx7ucG6jv3Lm8jFbMYxXhVBPx50xHcThw\nU2G9ncARQ2yX1BCtNcAngR3AOIzsqrY9StWwO4eiFvP3Zt2+myxMT8WPpHm1RoGLgXOAC4BNMGIl\nnq+J+EnWGHufVlodpw9j9rTS+viZsQU4ocP2/Mcs9c2xBaWXO8fYe0B6ZmxhPXsPSI8CxwAPAJ3O\ngyYXoJQuxxa0R2m5cx/CuMDRhc9CNgO7gCcJ5zfPJlzKej2dL2W9iHCV0jTwii7btHOQemK1oDal\n5M53AN8H7iFUATOfKtg5SPOyWlBHpeTOB4Cnl7HhRbBzkLqyWlBXpeTOGwg3p6XAzkHai9WCFlTK\nfQ4PEjqILxHGD2Z29KF+dyZp0LxvQeXopXN4OH5G42cE/4KXKuZ9C0rHQfFTJf/xS44tqH+l5M4X\nALczW0F8E3h+GTvqgZ2DGsyxBS1aKbnzRuCUwvwE8PUydtQDOwc1lNWClqSU3HlHj8uGwc5BDeP7\nFjQQpV2t9B5gE2Ew+o3At/vdkaR+7bkS6WG8EkkJWgn8DXBb/HwUeFpFbbFyUANYLWjgss+d2Qeo\npvNdzirFQE8rfRQ4D7imy45e3e/OJHXTdt/C+cDl3regKs3XOVwWf/51h+/8RysNjGMLqqd39rhs\nGOyUlBHHFjQ0peTO2zss21rGjnpg56BMOLagoRromMPvA28gvJmtOO5wEPCDfnckCXwmkupivs7h\n68B3gUOBDzL72s4nqO4mOKnGfIKqVBb/wlIN+UwkVa6U3PkS4FbgR8DPgd3A/5Sxox7YOahmWuM+\nE0kJKCV3fhN4DmFgeh/gbOCSJW5zA3A34V3UnwaeQrgT+zrgPuBa4JAOv2fnoJqwWlBSSuscAO4s\nLFvK1UpjhGczPSXOfxY4C9gIXBiXraNzB2TnoBpojUNrq9WCElJK7vwaIZFvIiTwd7G0AemVwLcI\nz2daTrgS6jRgGlgV11kd5+eyc1DCrBaUrFJy5xiwH3AwMEl4d/Szl7jNtxKuenqM0OkAPF74fmTO\n/Aw7ByXKakFJK+WR3Q/Fnz8hdA5LdSzhDusx4IfAFcCb5qzTonswxTZMxY9Ukbb7Ft4NXOZ9C0rA\nRPyUYts8nzvn+b2FvB74eGH+TODvgHsJp5MADsPTSkqeb2dTbQy0cjhjCQ2ZzzTh5UH7AT8FTgVu\nAX5MGJh+f/x5VUn7l5bIJ6hKZbmQ2UtZPwWsIAxUX4+XsippPhNJtVRK7vwRYfD4CeBneBOcGqk1\nCq2/9AmqqqlSBqQPLEwvI7zkZ22/O5Lqq3U84ZlI38FnIknz8pHdagCrBWWjlMrhdwrTy4AXEy5r\nlTJmtaBm66VzOIPZXucXhPseXlNWg6RqtUaBPwP+GK9EkmrD/6QqkVciKVul5M5jCc8/+j7wPeCL\nwLPK2FEP7BxUAt/lrOyVkjtvJtzFvCJ+3hSXVcHOQQNmtaBGKCV3dnpURlWvCbVz0IBYLahRSrla\n6cuEl/NsjvOvj8tWxvn/7nenUrX2vMv5YbwSSeqol7+WHqJ7r9NiuOMPLXprs9RB2zORLgA2eSWS\nGiL73Ol/ZC2ST1BVo5WSO0eB84AvAJ8H3kEYmK6CnYP65NvZJBaRO3v5j/IJwtjEp+L6ZxJuhvvD\nfnc2ANmXRhqkPWMLO4C3OragBisld3a6WmkpL/tZCisH9cBqQZqjlKuVfkF4Z/T2OH9sXCYlqDVO\nqBZ24pVIUqleTrjkbwr4N8KDyF5WUVusHNSF1YI0j1Jy536EB5F9FbgSuAjYt4wd9cDOQR20xqG1\n1SuRpK5KyZ1XEAalTyFUDB+Py6pg56ACqwWpR6Xkznt6XDYMdg6KrBakPpSSOy8HXlKYXwtsWuI2\nDyHcM3EvoaM5gfA4juuA+4Br4zpz2Tk0ntWCtAil5M5pYDdhIPqhOH0vsI3FX9L6KeAtcXo5cDCw\nEbgwLlsHXNLh9+wcGs27nKVFKiV3ji3w6dfBwLc7LJ8GVsXp1XF+LjuHRrJakJaoFrlznPA+iH8E\nbgP+ATgAeLywzsic+Rm1CFCD5PsWpAEo5Sa4QVsOHA+8HbgV+Aiwfs46LboHM1mYnoofZaftCaq+\ny1nqz0T81Mpq4MHC/InAlwjjGKvjssPwtFKDWS1IA9Z37lxWRisW8F+EB6E9N86fCtxNeE/1WXHZ\nWcBVw2+aqjUztsBXgA8Cr/bxF1KzvIhwSukOwl3XBxMuZb0eL2VtKKsFqUTZ587sA2we3+UsDUH2\nuTP7AJvFakEakuxzZ/YBNoPVgjRk2efO7APMn9WCVIHsc2f2AebLu5ylCmWfO7MPME8+E0mqWPa5\nM/sA82K1ICUi+9yZfYD5sFqQEpJ97sw+wPqzWpASlH3uzD7AevPtbFKiss+d2QdYT1YLUuKyz53Z\nB1g/VgtSDWSfO7MPsD6sFqQayT53Zh9gPbRVC0dU3RpJC8o+d2YfYNqsFqSayj53Zh9gurxvQaqx\n7HNn9gGmx2pBykD2uTP7ANPiE1SlTGSfO7MPMA2+b0HKTPa5M/sAq2e1IGWoVrlzH+B24Jo4vxK4\nDrgPuBY4pMPv1CrAerFakDLWd+5cVkYrenQecA+zjV5P6ByeC/xrnNdQtNYAtwLHA+MwsglG7Igl\nDd2RwPXAKcxWDtPAqji9Os7PZcIaKK9EkhqiNrnzCmANcDKzncPjhe9H5szPqE2A6XNsQWqQvnPn\n8jJasYBXAY8RxhsmuqzTonswk4XpqfhRz1qjwMXAOcD5wOWeQpKyM0H3/Jqs9wE7gAeB7wI/BjYR\nTiOtjuschqeVSmC1IDVU7XJn8bTSRmBdnF4PXNJh/doFmAbHFqSGq13uPBm4Ok6vJAxSeynrQPlM\nJEn5587sAxwcqwVJe2SfO7MPcDCsFiS1yT53Zh/g0lgtSOoo+9yZfYCL55VIkrrKPndmH2D/fCaS\npAVlnzuzD7A/VguSepJ97sw+wN5YLUjqS/a5M/sAF2a1IKlv2efO7APszmpB0qJlnzuzD7AzqwVJ\nS5J97sw+wHZWC5IGIvvcmX2As6wWJA1M9rkz+wC9y1lSCbLPnZkH6DORJJUi89yZbYBWC5JKlWnu\nnJVhgFYLkkqXYe5sl1GAVguShiaj3NlZJgFaLUgaqkxyZ3c1D9BqQVIlap47F1bjAK0WJFWmFrnz\nKOAG4G7gLuDcuHwlcB1wH3AtcEiH361FgO2sFiRVrha5czUwHqcPBL4FPA/YCFwYl68DLunwu7UI\ncJbVgqQk1Cx3BlcBpwLTwKq4bHWcn6smAVotSEpKTXLnrDHgO8BBwOOF5SNz5mfUIECrBUnJ6Tt3\nLi+jFT06EPgCcB7wxJzvWnQPZrIwPRU/CWiNAhcD5wAXAJtgpAadmaQMTcRP7awAvgK8s7BsmnA6\nCeAwanVayWpBUtISzZ3tRoDLgA/PWb6RMBANsJ5aDEg7tiCpFhLLnZ2dCOwGtgK3x88rCZeyXk9t\nLmX1fQuSaiOh3FmOBAL07WySaieB3FmuigO0WpBUS3YOJe3WakFSndk5lLBLqwVJdWfnMMBdWS1I\nyoWdw4B2Y7UgKSd2DkvcvPctSMqRncMSNu1dzpJyZeewiE1aLUjKnZ1Dn5uzWpDUBHYOPW7GakFS\nk9g59LCJcWhttVqQ1CB2DvP8qtWCpKayc+jya1YLkprMzmHO6lYLkmTn0LZqsVo4orwmSVLy7Bys\nFiRpL03vHBxbkKQOmto5WC1I0jxq3zm8EpgG7gfWdfi+Q4BWC5K0gFp3DvsA24ExYAWwFXjenHUK\nATa+WpiougEJmai6AQmZqLoBCZmougEJ6btzWFZGKxbp1widw0PAz4HPAK/pvGprHLgFeDEwDiOX\nwUite8ZFmKi6AQmZqLoBCZmougEJmai6AXWWUudwBLCjML8zLpuj9V7gWuBDwBkwsmsYjZOkJlle\ndQMKev3Lf6ZasFOQpJKkdJ5+LTBJGJQG2ADsBt5fWGc7cOxwmyVJtfcA8OyqG7FYywkBjAGjdB6Q\nliQ10G8C3yJUCBsqboskSZKkOlroBrmcXQo8CmwrLFsJXAfcR7h665AK2lWFo4AbgLuBu4Bz4/Im\nHo99gZsJp2DvAf4qLm/isZixD3A7cE2cb+qxeAi4k3AsbonLsjwWvdwgl7OTgDW0dw4bgQvj9Drg\nkmE3qiKrgfE4fSDhNOTzaO7x2D/+XA7cBJxIc48FwLuAfwKujvNNPRYPEjqDoiyPxUuALYX59fHT\nJGO0dw7TwKo4vTrON9FVwKl4PPYHbgV+meYeiyOB64FTmK0cmnosHgSePmdZX8cipZvg5tPjDXKN\nsopwqon4c9U86+ZqjFBR3Uxzj8cyQiX9KLOn25p6LD4MvJtwCfyMph6LFqGj/AbwR3FZX8cipZvg\n5tO0R2P0q0XzjtGBwBeA84An5nzXpOOxm3Ca7WDgK4S/mouacixeBTxGOMc+0WWdphwLgN8Avgsc\nShhnmFslLHgs6lI5PEIYiJxxFKF6aLJHCaUhwGGE/xhNsYLQMWwinFaCZh8PgB8CXyI8QaCJx+LX\ngVcTTqdsBl5G+PfRxGMBoWMA+B7wz4Rn1/V1LOrSOXwDeA6zN8i9ntkBp6a6GjgrTp/FbJLM3Qjw\nCcLVOR8pLG/i8XgGs1ec7AecRvjLuYnH4iLCH43HAL8HfBU4k2Yei/2Bg+L0AcDphPHKbI9Fk2+Q\n2wzsAp4kjL2cTbgS4XoyuyytBycSTqVsJSTC2wmXOTfxeLwAuI1wLO4knG+HZh6LopOZ/eOxicfi\nGMK/ia2Ey71n8mUTj4UkSZIkSZIkSZIkSZIkSZIk1dEY7Q81TMUU4Y5naejqcoe0VDeDeG5Zk54F\npMTYOaiJ3kWoFLYRHtzXIiTzywmP5biC8DgKCM+8vxu4A/hAXHYo8HnCS1RuITzXB2CS8Dyf/wAu\nA24Ejivsdwo4nvBIg0sJT5O9jfBMIOI+PxPbcGWcHxlAvJKkBbyY8KiJ/QhJ+i7CU013E94bAuHZ\nTecTHjdQfJrlU+PPTxOeeglwNCGZQ+gcbgWeEuffGZdBeNDZzLbeB7wxTh9CeCzM/oRO6+Nx+QuA\nnxM6E2norBzUNCcS/ir/CfDjOP1SwjOrbozrXB7X+yHwU0Jn8dr4OxBeLvS3hOc6fZHwkLMDCBXI\n1cDP4nqfA343Tr+OUJFAeBDa+vj7NxA6k6MJb/y7PK6zjdCJSZWoy/scpEFpsfepmrnn9kfi/P8R\nHnX8ckKSf3ucHgFOIDwIca7/LUzvAn5AqAJeB7yt8N1vE96HPpenkZQEKwc1zb8Dv8XsaaXXxmVH\nA2vjOm+Iyw4gnPb5MuGUz4vi99cC5xa2+SK6+yzhfb1PJZzCgvBSnuLvr4k/vxb3DfB84IW9hyVJ\nWqo/ZXZA+lzgmcC9hMHkmQHpfQnjBDcTBqPvJLwfAMK7eT8Tl98NfCwu/wtCJ1L0S4Sxg/cUlu0L\n/H3c5l3MPl56X8Lj2e8hvMzoRhxzkCRJkiRJkiRJkiRJkiRJkiRJkiRJavf/t0uEVarRQq4AAAAA\nSUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Make that last expression into a quick function\n", "def pop(obs, area, length, width):\n", " return (obs * area) / (2. * length * width)\n", "\n", "# Pass in an array of values\n", "obs = np.arange(50)\n", "pop = pop(obs, area, length, feature_width)\n", "\n", "plt.plot(obs, pop)\n", "plt.xlabel('observed')\n", "plt.ylabel('population')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "## Geometric reasoning" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we think of a 2D line, we can reason that if a feature lies more than its radius from the line then it is not intersected. Here's the situation for a grid:\n", "\n", "\n", "\n", "If there's a set of lines, then the problem is symmetric across the gaps between lines. The width of the 'invisible strip' (grey in the figure, which shows a grid rather than a swath) is the size of the gap minus the width of the feature. If we divide the width of the invisible strip by the width of the gap, we get a probability of randomly distributed features falling into the invisible strip." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Parallel 2D lines" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Probability of intersecting a given feature: 0.166666666667\n" ] } ], "source": [ "line_spacing = 3.0 # km, the width of the gap\n", "\n", "# 'Invisible' means 'not intersected'\n", "width_invisible = line_spacing - feature_width\n", "prob_invisible = width_invisible / line_spacing\n", "prob_visible = 1 - prob_invisible\n", "\n", "print \"Probability of intersecting a given feature:\", prob_visible" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Orthogonal grid of 2D lines" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Probability of intersecting a given feature: 0.305555555556\n" ] } ], "source": [ "x_spacing = 3.0 # km\n", "y_spacing = 3.0 # km\n", "\n", "# Think of the quadrilaterals between lines as 'units'\n", "area_of_unit = x_spacing * y_spacing\n", "area_invisible = (x_spacing - feature_width) * (y_spacing - feature_width)\n", "area_visible = area_of_unit - area_invisible\n", "prob_visible = area_visible / area_of_unit\n", "\n", "print \"Probability of intersecting a given feature:\", prob_visible" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We're going to need a binomial distribution, `scipy.stats.binom`." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import scipy.stats" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use the distribution to estimate the probability of seeing no features. Then we can use the survival function (or, equivalently, 1 - the cumulative distribution function), `sf(x, n, p)`, to tell us the probability of drawing more than *x* in *n* trials, given a success probability *p*:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Probability of intersecting no features: 9.91975505873e-20\n", "Probability of intersecting at least one: 1.0\n", "Probability of intersecting at least two: 1.0\n", "Probability of intersecting all features: 1.62488311945e-62\n" ] } ], "source": [ "p = \"Probability of intersecting\"\n", "print p, \"no features:\", scipy.stats.binom.pmf(0, population, prob_visible)\n", "print p, \"at least one:\", scipy.stats.binom.sf(0, population, prob_visible)\n", "print p, \"at least two:\", scipy.stats.binom.sf(1, population, prob_visible)\n", "print p, \"all features:\", scipy.stats.binom.sf(population-1, population, prob_visible)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "## Interpretation accuracy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can apply Bayes' theorem to update the prior probability (above) with the reliability of our interpretation (due to lack of resolution, data quality, or skill). " ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [], "source": [ "reliability = 0.75\n", "trials = 120\n", "\n", "intersect_interpret = prob_visible * reliability * trials\n", "intersect_xinterpret = prob_visible * (1 - reliability) * trials\n", "xintersect_interpret = (1 - prob_visible) * (1 - reliability) * trials\n", "xintersect_xinterpret = (1 - prob_visible) * reliability * trials\n", "\n", "t = [[intersect_interpret, intersect_xinterpret], [xintersect_interpret, xintersect_xinterpret]]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use a `pandas` DataFrame to show a quick table:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
InterpretedNot interpreted
Intersected 27.500000 9.166667
Not intersected 20.833333 62.500000
\n", "

2 rows × 2 columns

\n", "
" ], "text/plain": [ " Interpreted Not interpreted\n", "Intersected 27.500000 9.166667\n", "Not intersected 20.833333 62.500000\n", "\n", "[2 rows x 2 columns]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pandas import DataFrame\n", "df = DataFrame(t, index=['Intersected', 'Not intersected'], columns=['Interpreted','Not interpreted'])\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can compute the probability of a given feature being correctly interpreted:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Probability of a feature existing if interpreted: 0.568965517241\n" ] } ], "source": [ "prob_correct = intersect_interpret / (intersect_interpret + xintersect_interpret)\n", "print \"Probability of a feature existing if interpreted:\", prob_correct" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

© 2015 Agile GeoscienceCC-BY — Have fun!   

" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.9" } }, "nbformat": 4, "nbformat_minor": 0 }