{ "cells": [ { "cell_type": "markdown", "id": "8b07a7e2-21e6-44f1-a2fc-1f4754ff851f", "metadata": {}, "source": [ "# Tests using the \"distance\" between the empirical and the null" ] }, { "cell_type": "markdown", "id": "08304b78-3acf-46c0-bba7-02a3c51c277d", "metadata": {}, "source": [ "1. Show that the collection of all sets of the form $(-\\infty, x] \\times (-\\infty, y]$\n", "is a Vapnik-Chervonenkis class (V-C class) over the plane.\n", "\n", "1. Show that if $\\mathcal{V}$ is the set of all closed intervals in $\\Re$,\n", "$m^{\\mathcal{V}}(n) = 1 + n + {{n}\\choose{2}}$.\n", "\n", "1. Show that intersections and finite unions of V-C classes are V-C classes.\n", "\n", "1. Show that countable unions of V-C classes need not be V-C classes.\n", "\n", "1. Code up Romano's approach for testing whether a set of $k$ real-valued random variables is independent\n", "based on observing $n$ IID $k$-tuples of values, using group invariance (not the bootstrap approach).\n", "That is, we observe $\\{X_j\\}_{j=1}^n$ where each $X_j = (X_{j1}, \\ldots, X_{jk})$ takes values in $\\Re^k$.\n", "The null hypothesis is that for each $j$, the components $\\{X_{j1}, \\ldots, X_{jk}\\}$ are independent.\n", "Explain why you used the particular V-C class you chose.\n", "Is the relevant group of transformations for the hypothesis a finite or infinite group?\n", "As usual, provide unit tests and a coverage report for your code.\n", "\n", "1. 
Write a program to simulate data from $k$-variate normal distributions with different covariance matrices and apply the\n", "test you programmed in the previous question.\n", "Confirm that the level of the test is approximately correct by simulating from a multivariate normal distribution\n", "with a diagonal covariance matrix with various values of $k$ and $n$.\n", "Simulate the power of the test for level $\\alpha = 0.05$ as a function of $\\rho$\n", "for $k = 3$, $n=10$, $100$, and $1000$, and a covariance matrix of the form \n", "\\begin{equation}\n", "\\Sigma = \\left [ \\begin{array}{ccc}\n", " 1 & \\rho & 0\\\\\n", " \\rho & 1 & 0 \\\\\n", " 0 & 0 & 1 \n", " \\end{array}\n", " \\right ]\n", "\\end{equation}\n", "for $\\rho \\in \\{-1, -.75, -.5, -.25, .25, .5, .75, 1\\}$.\n", "Provide unit tests and a coverage report for your code.\n", "\n", "1. The file https://www.stat.berkeley.edu/~stark/Java/Data/lomaPrieta.dat contains 221 observations of the times of putative aftershocks of the 17 October 1989 earthquake in Loma Prieta, California.\n", "There are 222 lines in the file.\n", "The first is 0, the main shock, which occurred at 4:15:43pm.\n", "The other lines are the times in days from the main event to the aftershocks,\n", "defined as earthquakes determined to have magnitude 3.0 and above, focal depth\n", "of 0--20km, and epicenter within 40km of the epicenter of the Loma Prieta earthquake.\n", "The data are from the UC Berkeley Seismographic Stations, courtesy of\n", "Dr. Bob Uhrhammer. \n", "A common model for earthquakes (\"main\" shocks, not aftershocks) is that they are a spatially heterogeneous but temporally homogeneous Poisson\n", "process.\n", "If so, inter-event times have an exponential distribution, and conditional on the number $n$ of events in the time interval $[0, T]$, the times of the events are IID uniform. 
\n", "Treat the time of the first event as 0, and let $T = 805$.\n", "Find the $P$-value of the hypothesis that the 222 events are a realization of a Poisson process for three tests:\n", " + The Kolmogorov-Smirnov test that the inter-event times are exponentially distributed\n", " + The Kolmogorov-Smirnov test that the times of the 221 events after the first are IID uniform on $[0, T]$\n", " + A 2-sample permutation test that compares the number of events in the first half of the interval, $[0, T/2)$, to the number in the second half, $[T/2, T]$. \n", "Provide unit tests and a coverage report.\n", "Comment on the test results.\n", "Which test would you recommend? Why? (\"It gave the smallest $P$-value\" isn't a good reason: selecting the\n", "test after peeking at the data introduces multiplicity and selection\n", "that are hard to take into account in the $P$-value.)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "141ed16d-e57e-4fd3-9e2f-e2ecde867236", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.4" } }, "nbformat": 4, "nbformat_minor": 5 }