{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Analysis of results of the 2015 FINA World Swimming Championships\n", "> In this chapter, you will practice your EDA, parameter estimation, and hypothesis testing skills on the results of the 2015 FINA World Swimming Championships. This is a summary of the lecture \"Case Studies in Statistical Thinking\" on DataCamp.\n", "\n", "- toc: true \n", "- badges: true\n", "- comments: true\n", "- author: Chanseok Kang\n", "- categories: [Python, Datacamp, Statistics]\n", "- image: images/swim_slowdown.png" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import dc_stat_think as dcst\n", "\n", "plt.rcParams['figure.figsize'] = (10, 5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction to swimming data\n", "- Strokes at the World Championships\n", "    - Freestyle\n", "    - Breaststroke\n", "    - Butterfly\n", "    - Backstroke\n", "- Events at the World Championships\n", "    - Defined by gender, distance, stroke\n", "- Rounds of events\n", "    - Heats: First round\n", "    - Semifinals: Penultimate round in some events\n", "    - Finals: The final round; the winner is champion\n", "- Data source\n", "    - Data are freely available from [OMEGA](http://www.omegatiming.com)\n", "- Domain-specific knowledge\n", "    - Imperative\n", "    - An absolute pleasure" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Graphical EDA of men's 200 free heats\n", "In the heats, all contestants swim, from the very fast to the very slow. To explore how the swim times are distributed, plot an ECDF of the men's 200 freestyle heat times.\n", "\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | athleteid | lastname | firstname | birthdate | gender | name | code | eventid | heat | lane | ... | swimtime | split | cumswimtime | splitdistance | daytime | round | distance | relaycount | stroke | splitswimtime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100784 | BORSHI | NOEL | 1996-02-13 | F | Albania | ALB | 1 | 1 | 4 | ... | 63.65 | 1 | 29.63 | 50 | 930.0 | PRE | 100 | 1 | FLY | 29.63 |
| 1 | 100784 | BORSHI | NOEL | 1996-02-13 | F | Albania | ALB | 1 | 1 | 4 | ... | 63.65 | 2 | 63.65 | 100 | 930.0 | PRE | 100 | 1 | FLY | 34.02 |
| 2 | 100784 | BORSHI | NOEL | 1996-02-13 | F | Albania | ALB | 20 | 1 | 8 | ... | 140.28 | 1 | 31.33 | 50 | 1014.0 | PRE | 200 | 1 | FLY | 31.33 |
| 3 | 100784 | BORSHI | NOEL | 1996-02-13 | F | Albania | ALB | 20 | 1 | 8 | ... | 140.28 | 2 | 66.81 | 100 | 1014.0 | PRE | 200 | 1 | FLY | 35.48 |
| 4 | 100784 | BORSHI | NOEL | 1996-02-13 | F | Albania | ALB | 20 | 1 | 8 | ... | 140.28 | 3 | 103.29 | 150 | 1014.0 | PRE | 200 | 1 | FLY | 36.48 |

5 rows × 22 columns

|  | athleteid | lastname | firstname | birthdate | gender | name | code | eventid | heat | lane | ... | swimtime | split | cumswimtime | splitdistance | daytime | round | distance | relaycount | stroke | splitswimtime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 303 | 100537 | CAMPBELL | BRONTE | 1994-05-14 | F | Australia | AUS | 223 | 2 | 5 | ... | 53.00 | 2 | 53.00 | 100 | 1732.0 | SEM | 100 | 1 | FREE | 27.44 |
| 305 | 100537 | CAMPBELL | BRONTE | 1994-05-14 | F | Australia | AUS | 123 | 1 | 3 | ... | 52.52 | 2 | 52.52 | 100 | 1732.0 | FIN | 100 | 1 | FREE | 27.37 |
| 307 | 100537 | CAMPBELL | BRONTE | 1994-05-14 | F | Australia | AUS | 234 | 2 | 6 | ... | 24.32 | 1 | 24.32 | 50 | 1828.0 | SEM | 50 | 1 | FREE | 24.32 |
| 308 | 100537 | CAMPBELL | BRONTE | 1994-05-14 | F | Australia | AUS | 134 | 1 | 6 | ... | 24.12 | 1 | 24.12 | 50 | 1805.0 | FIN | 50 | 1 | FREE | 24.12 |
| 315 | 100631 | CAMPBELL | CATE | 1992-05-20 | F | Australia | AUS | 223 | 1 | 4 | ... | 52.84 | 2 | 52.84 | 100 | 1732.0 | SEM | 100 | 1 | FREE | 27.49 |

5 rows × 22 columns

|  | athleteid | stroke | distance | lastname | final_swimtime | semi_swimtime |
|---|---|---|---|---|---|---|
| 0 | 100537 | FREE | 100 | CAMPBELL | 52.52 | 53.00 |
| 1 | 100537 | FREE | 50 | CAMPBELL | 24.12 | 24.32 |
| 2 | 100631 | FREE | 100 | CAMPBELL | 52.82 | 52.84 |
| 3 | 100631 | FREE | 50 | CAMPBELL | 24.36 | 24.22 |
| 4 | 100650 | FLY | 100 | MCKEON | 57.67 | 57.59 |
| ... | ... | ... | ... | ... | ... | ... |
| 91 | 105595 | BACK | 200 | FRANKLIN | 126.34 | 127.79 |
| 92 | 105607 | BREAST | 50 | HARDY | 30.20 | 30.25 |
| 93 | 105640 | FLY | 200 | MCLAUGHLIN | 126.95 | 127.52 |
| 94 | 105676 | BACK | 100 | BAKER | 59.99 | 59.63 |
| 95 | 105686 | FLY | 200 | ADAMS | 126.40 | 127.57 |

96 rows × 6 columns