{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "# Participant Selection for Python in Astronomy 2017\n", "\n", "This notebook documents the participant selection procedure for Python in Astronomy 2017. \n", "For privacy reasons, this notebook uses data that has been completely randomized within categories, so that no candidate is individually identifiable (and names and other markers of identity have been removed completely).\n", "\n", "For this reason, the results of this procedure do not exactly mirror the results of our participant selection: the candidates in our data set here are random combinations that follow the input distributions of our real data, and not actual people. But with over 285 applications for only 55 spots within Python in Astronomy, we felt it was important to be both transparent about and accountable for our selection procedure. This notebook is designed to give the reader an overview of the procedure from start to finish, and we have added our reasoning for certain choices where those were part of the selection. The notebook is also an example of what this kind of procedure can look like, and thus a kind of tutorial for other conference organizers.\n", "\n", "Our procedure for admitting participants is constantly evolving as we tweak, make mistakes and learn from them. If you have any suggestions for future procedures (or more generally have thoughts about participant selection), we would love to hear from you either via an issue on this repository, or an e-mail to **python.in.astronomy.soc [at] gmail.com**.\n", "\n", "## Asking The Right Questions\n", "\n", "Designing the application form for PyAstro17 was perhaps the most difficult task, and it is at this stage that conference organizers will already want to put serious thought into the goals of the workshop and the ideal mix of participants to achieve those goals. 
It should be obvious, but it bears repeating: you will only be able to include categories in your selection that you actually ask for! \n", "Additional considerations include asking well-calibrated questions (everyone may have different ideas of what an \"expert programmer\" is), and asking questions in such a way that stereotype threat is reduced. \n", "\n", "We include all questions from our application form for PyAstro17 in this repository for reference.\n", "\n", "## Pre-selection\n", "\n", "Our participant selection proceeded in two parts. In the first part, we anonymized our applicant pool by replacing names and other identifying information with a unique identifier. The only candidates we rejected outright were (1) duplicate entries and (2) candidates who had informed us that they would not be able to come. \n", "\n", "For the remainder of the pool, each member of the organizing committee was then asked to make a simple yes/no decision for a candidate based solely on their responses to the following two questions:\n", "* _Can you tell us in 2-3 sentences why you would like to attend this workshop?_\n", "* _Other possible contributions_\n", "\n", "We ensured that each response was reviewed by at least three independent members of the committee, and reviewers could not see the answers that other members had already given, to further reduce bias.\n", "\n", "This allowed us to remove candidates whose objectives for attending the workshop were very far removed from our actual objectives for the workshop. For example, a sizeable number of applicants stated that their goal at the workshop was to learn Python. Because the workshop website and description explicitly said that Python in Astronomy is *not* a Python tutorial, we excluded these candidates from the pool. Converting yes/no into numerical scores (1/0), we then accepted candidates whose average score across reviewers was at least 0.5. 
\n", "\n", "In the next step, we asked members of the committee to review both talk proposals and tutorial proposals in the same way. Because we had a much larger number of talks that all reviewers voted yes on than we had talk slots, we gave those entries a value of 1 and set the entry for the rest to 0 (these scores will be included in the selection procedure below).\n", "\n", "We also pre-selected a number of candidates. These included the organizing committee, who need to be present at the meeting, and one candidate whom we had selected blindly based on their abstract for a longer (invited) talk. Finally, we allowed each member of the committee to select a single person based on some specific expertise they would bring to the workshop; thus a total of 12 participants (6 organizers, 5 pre-selects, 1 speaker) were pre-selected. \n", "\n", "## Participant Selection\n", "\n", "For the remaining 43 slots, we used `Entrofy` to optimize our participant set based on a set of well-defined criteria which the organizers discussed at length before performing the selection. It's worth noting here that this selection depended entirely on a discussion about the _goals_ of the workshop and was independent of the input data set. \n", "\n", "### Coding\n", "\n", "While the questions on the application form collecting data for the second step of the application procedure were all multiple-choice questions, some of them allowed specification via an \"Other\" category. Additionally, some categories had a very low response rate. Therefore we re-coded some responses either into a new category or by combining several categories into a new one that was more manageable. For example, we re-coded a response like `PhD Student` into our existing category `Graduate/Postgraduate student` and created a new `other` category for responses such as `Astroparticle Physics`, `science software`, and `Solar System`. 
Finally, we also concatenated the responses in the \"Geographical Location\" question into 3 categories: `Western Europe`, `North America`, and `Other`.\n", "\n", "### Imports" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "import numpy as np\n", "import entrofy\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "## Data Loading\n", "\n", "Pandas to the rescue!" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true, "deletable": true, "editable": true }, "outputs": [], "source": [ "data_ready = pd.read_csv(\"../data/pyastro17_randomized.csv\", sep=\"\\t\", index_col=\"ID\", )" ] }, { "cell_type": "markdown", "metadata": { "deletable": true, "editable": true }, "source": [ "Let's have a look at the data:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false, "deletable": true, "editable": true }, "outputs": [ { "data": { "text/html": [ "
| ID | current_position | education | gender | geography | open_source | preselect | previous_unconference | prog_bkg | race | reject | research_area | talks | tutorials |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.0 | Graduate/Postgraduate student | No | Yes | Others | Yes | 0.0 | No | program Python a little (e.g. for coursework o... | No | 0.0 | non-astronomy | 0.0 | 0.0 |
| 2.0 | Graduate/Postgraduate student | No | No | North America | No | 0.0 | No | release Python code for others to use (e.g. fo... | Yes | 0.0 | main sequence & late type stars | 0.0 | 0.0 |
| 3.0 | Undergraduate/College student | No | No | Western Europe | No | 0.0 | Yes | program Python heavily (e.g. regularly for my ... | No | 0.0 | galaxies and galaxy clusters | 0.0 | 0.0 |
| 4.0 | Graduate/Postgraduate student | No | No | Western Europe | No | 0.0 | Yes | program Python a little (e.g. for coursework o... | Yes | 0.0 | exoplanets and planet formation | 0.0 | 0.0 |
| 5.0 | Postdoc | No | NaN | Others | No | 0.0 | Yes | program Python a little (e.g. for coursework o... | Yes | 0.0 | other | 0.0 | 0.0 |