--- title: "Homework #10: Density Estimation" author: "**Your Name Here**" format: ds6030hw-html --- ```{r config, include=FALSE} # Set global configurations and settings here knitr::opts_chunk$set() # set global chunk options ggplot2::theme_set(ggplot2::theme_bw()) # set ggplot2 theme ``` # Required R packages and Directories {.unnumbered .unlisted} ```{r packages, message=FALSE, warning=FALSE} data_dir = 'https://mdporter.github.io/teaching/data' # data directory library(ks) # functions for KDE library(tidyverse) # functions for data manipulation ``` # Problem 1 Geographic Profiling Geographic profiling, a method developed in criminology, can be used to estimate the [home location (roost) of animals](https://www.sciencedirect.com/science/article/pii/S0022519305004157) based on a collection of sightings. The approach requires an estimate of the distribution the animal will travel from their roost to forage for food. A sample of $283$ distances that pipistrelle bats traveled (in meters) from their roost can be found at: - **Bat Data**: <`r file.path(data_dir, 'geo_profile.csv')`> One probability model for the distance these bats will travel is: \begin{align*} f(x; \theta) = \frac{x}{\theta} \exp \left( - \frac{x^2}{2 \theta} \right) \end{align*} where the parameter $\theta > 0$ controls how far they are willing to travel. ## a. Derive a closed-form expression for the MLE for $\theta$ (i.e., show the math). ::: {.callout-note title="Solution"} Add solution here. ::: ## b. Estimate $\theta$ for the bat data using MLE? Calculate using the solution to part a, or use computational methods. ::: {.callout-note title="Solution"} Add solution here. ::: ## c. Plot the estimated density Using the MLE value of $\theta$ from part b, calculate the estimated density at a set of evaluation points between 0 and 8 meters. Plot the estimated density. - The x-axis should be distance and y-axis should be density (pdf). ::: {.callout-note title="Solution"} Add solution here. ::: ## d. Estimate the density using KDE. Report the bandwidth you selected and produce a plot of the estimated density. ::: {.callout-note title="Solution"} Add solution here. ::: ## e. Which model do you prefer, the parametric or KDE? Why? ::: {.callout-note title="Solution"} Add solution here. ::: # Problem 2: Interstate Crash Density Interstate 64 (I-64) is a major east-west road that passes just south of Charlottesville. Where and when are the most dangerous places/times to be on I-64? The crash data (link below) gives the mile marker and fractional time-of-week for crashes that occurred on I-64 between mile marker 87 and 136 in 2016. The time-of-week data takes a numeric value of *\.\*, where the dow starts at 0 for Sunday (6 for Sat) and the decimal gives the time of day information. Thus `time=0.0417` corresponds to Sun at 1am and `time=6.5` corresponds to Sat at noon. - **Crash Data**: <`r file.path(data_dir, 'crashes16.csv')`> ## a. Crash Data Extract the crashes and make a scatter plot with mile marker on x-axis and time on y-axis. ::: {.callout-note title="Solution"} Add solution here. ::: ## b. Use KDE to estimate the *mile marker* density. - Report the bandwidth. - Plot the density estimate. ::: {.callout-note title="Solution"} Add solution here. ::: ## c. Use KDE to estimate the temporal *time-of-week* density. - Report the bandwidth. - Plot the density estimate. ::: {.callout-note title="Solution"} Add solution here. ::: ## d. Use KDE to estimate the bivariate mile-time density. - Report the bandwidth parameters. - Plot the bivariate density estimate. ::: {.callout-note title="Solution"} Add solution here. ::: ## e. Crash Hotspot Based on the estimated density, approximate the most dangerous place and time to drive on this stretch of road. Identify the mile marker and time-of-week pair (within a few miles and hours). ::: {.callout-note title="Solution"} Add solution here. :::