{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Handicapping pub trivia\n", "\n", "Allen B. Downey\n", "\n", "[MIT License](https://en.wikipedia.org/wiki/MIT_License)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# If we're running in Colab, install empiricaldist\n", "\n", "import sys\n", "IN_COLAB = 'google.colab' in sys.modules\n", "\n", "if IN_COLAB:\n", "    !pip install empiricaldist" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Introduction\n", "\n", "This notebook is inspired by [this question on Reddit's statistics forum](https://www.reddit.com/r/statistics/comments/e7bqbr/q_normalize_trivia_results_with_different_team/):\n", "\n", "> If there is a quiz of `x` questions with varying results between teams of different sizes, how could you logically handicap the larger teams to bring some sort of equivalence in performance measure?\n", "\n", "> [Suppose there are] 25 questions and a team of two scores 11/25. A team of 4 scores 17/25. Who did better in terms of average performance of each member?\n", "\n", "One respondent suggested a binomial model, in which every player has the same probability of answering any question correctly.\n", "\n", "I suggested a model based on item response theory, in which each question has a different level of difficulty, `d`, each player has a different level of efficacy, `e`, and the probability that a player answers a question correctly is\n", "\n", "```\n", "expit(e-d+c)\n", "```\n", "\n", "where `c` is a constant offset for all players and questions, and `expit` is the inverse of the logit function.\n", "\n", "Another respondent pointed out that group dynamics will come into play. On a given team, it is not enough if one player knows the answer; they also have to persuade their teammates.\n", "\n", "So let's explore these models and see how far we get. Among other things, this will be a good exercise in using NumPy n-dimensional arrays." 
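, "\n", "\n", "For reference, the item response probability above can be written out as follows, assuming the standard logistic definition of `expit`:\n", "\n", "$$P(\\text{correct}) = \\mathrm{expit}(e - d + c) = \\frac{1}{1 + \\exp(-(e - d + c))}$$\n", "\n", "So a player whose efficacy equals the question's difficulty answers correctly with probability `expit(c)`; for example, `c = -1` gives about 0.27."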
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The binomial model\n", "\n", "I'll start with an array with dimensions for `k` players, `n` questions, and `m=10000` simulated iterations." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(4, 25, 10000)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "\n", "k = 4      # number of players\n", "n = 25     # number of questions\n", "m = 10000  # number of iterations\n", "\n", "a = np.random.random((k, n, m))\n", "a.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now suppose each player has a 30% chance of answering each question correctly. We can compute a Boolean array that indicates which questions each player got." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(4, 25, 10000)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p = 0.3\n", "\n", "b = (a < p)\n", "b.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's assume for now that a team gets the question right if any of the players gets it.\n", "\n", "If you like, you can think of \"gets it\" as a combination of \"knows the answer\" and \"successfully convinces teammates\".\n", "\n", "With this assumption, we can use the logical OR operator to reduce the answers along the player axis." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(25, 10000)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c = np.logical_or.reduce(b, axis=0)\n", "c.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result is an array with one dimension for questions and one for iterations.\n", "\n", "Now we can compute the sum along the questions." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(10000,)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = np.sum(c, axis=0)\n", "d.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result is a one-dimensional array, with one element per iteration, that approximates the distribution of scores for a team of `k=4` players.\n", "\n", "Here's what the distribution looks like." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "from empiricaldist import Cdf" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "cdf = Cdf.from_seq(d)\n", "cdf.plot()\n", "\n", "plt.xlabel('Number of correct responses')\n", "plt.ylabel('CDF');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following function encapsulates the code we have so far, so we can run it with different values of `k`:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def simulate(p, k, n=25, m=10000):\n", " a = np.random.random((k, n, m))\n", " b = (a < p)\n", " c = np.logical_or.reduce(b, axis=0)\n", " d = np.sum(c, axis=0)\n", " return d" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's what the distribution of correct responses looks like for a range of values of `k`." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "7.0\n", "13.0\n", "16.0\n", "19.0\n", "21.0\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "p = 0.3\n", "\n", "for k in range(1, 6):\n", " d = simulate(p, k)\n", " cdf = Cdf.from_seq(d)\n", " cdf.plot(label=k)\n", " print(cdf.median())\n", "\n", "plt.xlabel('Number of correct responses')\n", "plt.ylabel('CDF');\n", "plt.legend();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use these CDFs to compare scores between teams with different sizes.\n", "\n", "A team of two that scores 11/25 is in the 31nd percentile. " ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(0.309)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = simulate(p=0.3, k=2)\n", "cdf = Cdf.from_seq(d)\n", "cdf(11)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A team of 4 that scores 17/25 is in the 22nd percentile." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(0.2415)" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d = simulate(p=0.3, k=4)\n", "cdf = Cdf.from_seq(d)\n", "cdf(17)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So I would say the team of two out-performed the team of 4." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Item response theory\n", "\n", "Now suppose we have players with different levels of efficacy, drawn from a standard normal distribution." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(5, 1, 10000)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "players = np.random.normal(size=(k, 1, m))\n", "players.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And questions with different levels of difficulty, also drawn from a standard normal distribution." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(1, 25, 10000)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "questions = np.random.normal(size=(1, n, m))\n", "questions.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use the `expit` function to compute the probability that each player answers each question." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(5, 25, 10000)" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from scipy.special import expit\n", "\n", "c = -1\n", "p = expit(players - questions + c)\n", "p.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I chose `c=-1` so that the average probability is about the same as in the binomial model." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.3237610597780847" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.mean(p)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use the same `simulate` function with the new model; the difference is that `p` is an array now, rather than a constant." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "d = simulate(p, k)\n", "cdf = Cdf.from_seq(d)\n", "cdf.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are the distributions with a range of values for `k`:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "8.0\n", "13.0\n", "16.0\n", "18.0\n", "20.0\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "questions = np.random.normal(size=(1, n, m))\n", "\n", "for k in range(1, 6):\n", " players = np.random.normal(size=(k, 1, m))\n", " p = expit(players - questions + c)\n", " d = simulate(p, k)\n", " cdf = Cdf.from_seq(d)\n", " cdf.plot(label=k)\n", " print(cdf.median())\n", " \n", "plt.legend();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, we can use these CDFs to compare scores between teams with different sizes.\n", "\n", "A team of two that scores 11/25 is in the 39th percentile. " ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(0.3921)" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "k = 2\n", "players = np.random.normal(size=(k, 1, m))\n", "p = expit(players - questions + c)\n", "d = simulate(p, k)\n", "cdf = Cdf.from_seq(d)\n", "cdf(11)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A team of 4 that scores 17/25 is in the 40th percentile." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(0.4097)" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "k = 4\n", "players = np.random.normal(size=(k, 1, m))\n", "p = expit(players - questions + c)\n", "d = simulate(p, k)\n", "cdf = Cdf.from_seq(d)\n", "cdf(17)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So under this model we would say the team of 4 out-performed the team of 2, which is the opposite of our conclusion under the multinomial model!" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }