{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Statistics 157, Fall 2017\n", "## Nonparametric Inference and Sensitivity Auditing with Applications to Social Good\n", "\n", "## Philip B. Stark http://www.stat.berkeley.edu/~stark\n", "\n", "## Course outline: \n", "\n", "**Theory and Philosophy:** pseudo-random number generation, algorithms for random\n", "sampling, elements of group theory in probability (invariance, orbits,\n", "exchangeability), permutation tests, stratified permutation tests, selecting\n", "test statistics, nonparametric inference about effect size, sampling and\n", "inference for finite populations, useful probability inequalities for\n", "statistical inference, Wald's sequential probability ratio test, cargo-cult\n", "statistics, the ontogeny of probability in applied statistics, post-normal\n", "science, sensitivity analysis and sensitivity auditing, uncertainty\n", "quantification.\n", "\n", "**Applications:** election auditing, gender bias in teaching evaluations,\n", "forecasting the price of solar cells, forecasting the economic impact of climate\n", "change, predictive policing, ethical considerations in applying data science to\n", "societal problems, policy implications of models and uncertainty, communicating\n", "statistical ideas to a lay audience.\n", "\n", "**Term projects:** contribute to an open-source project for permutation methods and\n", "to an open-source project for election auditing, and analyze data relevant to an\n", "important societal issue, e.g., the partial recount of the 2016 US presidential\n", "election or the 2017 Kenyan election, crime or recidivism or \"predictive policing,\" \n", "consumer lending or college loans, alternative energy, farm subsidies, \n", "mass transportation, bicycle commuting, natural disasters, or climate. 
\n", "Written and oral presentations are\n", "required; some projects might involve building an interactive website to present\n", "results or enable others to use the techniques. Grades will be based in part on\n", "the quality of the writing and the programming, the relevance and acuity of the\n", "analysis, the reproducibility of the computations, and the effectiveness of the\n", "communication.\n", "\n", "Written assignments will be submitted using GitHub. There will be emphasis on\n", "using good computational hygiene, including revision control systems, unit\n", "tests, regression tests, and coverage tests, and on documenting one's code\n", "adequately. We will occasionally have code reviews in class. \n", "\n", "**Philosophy:** Learn something, teach something, make a contribution to\n", "something you feel good about.\n", "\n", "**Prerequisites:** Statistics 133, 134, 135, willingness to work with a team of\n", "peers, dedication. Students are expected to be comfortable with LaTeX, Markdown,\n", "HTML5, git, GitHub, Python, JavaScript, and Jupyter. 
Some projects might require\n", "jQuery and D3.\n", "\n", "**Code of conduct; attribution of work:**\n", "The high academic standard at the University of California, Berkeley, is reflected in each degree \n", "awarded.\n", "Every student is expected to maintain this high standard by ensuring that all\n", "academic work reflects unique ideas or properly attributes the ideas to the original sources.\n", "\n", "These are some basic expectations of students with regard to academic integrity:\n", "Any work submitted should be your own individual thoughts, and should not have been submitted\n", "for credit in another course unless you have prior written permission to re-use it in this \n", "course from this instructor.\n", "\n", "All assignments must use \"proper attribution,\" meaning that you have identified the original\n", "source and extent of the words or ideas that you reproduce or use in your assignment.\n", "This includes drafts and homework assignments!\n", "If you are unclear about expectations, ask your instructor.\n", "\n", "Do not collaborate or work with other students on assignments or projects unless the \n", "instructor gives you permission or instruction to do so.\n", "\n", "**Disability accommodations:**\n", "If you need an accommodation for a disability, if you have information you wish to share with \n", "the instructor about a medical emergency,\n", "or if you need special arrangements if the building needs to be evacuated, please inform the \n", "instructor as soon as possible.\n", "\n", "If you are not currently listed with DSP (the Disabled Students' Program) and believe you might \n", "benefit from their support, please apply online at dsp.berkeley.edu" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## GitHub classroom for this course:\n", "\n", "+ Assignment 1: https://classroom.github.com/a/HoXDOIbu\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Resources\n", "\n", "+ Scientific Python:\n", " - 
http://www.scipy-lectures.org/intro/intro.html\n", " - http://fperez.org/py4science/\n", " - https://www.enthought.com/ (the Enthought suite is a good start)\n", "+ Jupyter:\n", " - http://jupyter.org/\n", "+ Git and GitHub:\n", " - https://git-scm.com/book/en/v2/Getting-Started-Git-Basics\n", " - https://github.com/\n", "+ Travis Continuous Integration:\n", " - https://travis-ci.org/\n", "+ Software development \"hygiene\" for repositories, pull requests, etc.:\n", " - https://github.com/kellieotto/git-fundamentals\n", " - https://statlab.github.io/permute/dev/index.html\n", "+ Markdown: https://daringfireball.net/projects/markdown/\n", "+ Pandoc: https://pandoc.org/\n", "+ LaTeX: https://www.latex-project.org/about/\n", "+ HTML5, jQuery, jQuery UI, D3:\n", " - https://www.w3schools.com/html/html5_intro.asp\n", " - https://developer.mozilla.org/en-US/docs/Web/Guide/HTML/HTML5\n", " - http://jquery.com/\n", " - https://www.w3schools.com/jquery/\n", " - https://jqueryui.com/\n", " - https://d3js.org/\n", "+ Mathematics and Statistics:\n", " - Clark, A., 1984. _Elements of Abstract Algebra_, Dover.\n", " - Hardy, G., J.E. Littlewood, and G. Pólya, 1952. _Inequalities_, 2nd ed., Cambridge University Press, Cambridge.\n", " - Lehmann, E., _Nonparametrics: Statistical Methods Based on Ranks_, 1st ed. 1975; revised edition 2006.\n", " - Lugosi, G., 2006. _Concentration-of-Measure Inequalities_.\n", " - Stark, P.B., 1997–2017. _SticiGui_, statistics.berkeley.edu/users/stark/SticiGui" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Software \"stack\" for this course\n", "\n", "+ Jupyter\n", "+ Python\n", " - SciPy\n", " - NumPy\n", " - *not* Pandas, for the most part\n", " - nose, unittest, or other test suites\n", "+ Git/GitHub\n", "+ Travis CI\n", "+ Coveralls\n", "+ LaTeX / Markdown / Pandoc" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Rough weekly schedule\n", "\n", "1. 
mathematical preliminaries; talk about term projects. \n", " - first assignment, due 9/5\n", "2. mathematical preliminaries, permutations, invariance; the permute package; PRNGs\n", "3. PRNGs, introduction to election auditing, Wald's sequential probability ratio test\n", "4. nonparametric inference about the mean of finite populations\n", "5. risk-limiting audits\n", "6. guest lecture by Kristian Lum on predictive policing\n", "7. permutation tests, application to gender bias\n", "8. the 2-sample problem; probability inequalities\n", "9. sensitivity auditing" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 1 }