{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction to Probability Theory and Mathematical Statistics: describing samples, practice session No. 1\n", "\n", "*Alla Tambovtseva, HSE University*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Part 1: working with a sample in NumPy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Import the `numpy` library for working with arrays, primarily numeric ones (*NumPy* stands for *Numerical Python*):" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create an array with the sample from the seminar worksheet:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 2 1 0 20 3 1 0]\n" ] } ], "source": [ "sample = np.array([2, 1, 0, 20, 3, 1, 0])\n", "print(sample)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Numeric arrays have several methods that return descriptive statistics of a sample:\n", "\n", "* `min()` and `max()`: the minimum and maximum values;\n", "* `mean()`: the mean."
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Min: 0\n", "Max: 20\n", "Mean: 3.857142857142857\n" ] } ], "source": [ "print(\"Min:\", sample.min())\n", "print(\"Max:\", sample.max())\n", "print(\"Mean:\", sample.mean())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Arrays have no methods for the median and quartiles; use the corresponding `NumPy` functions instead. By default, `np.quantile()` interpolates linearly between neighboring values of the sorted sample, so quartiles computed by hand with a different convention may differ slightly:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Median: 1.0\n", "25%: 0.5\n", "75%: 2.5\n" ] } ], "source": [ "print(\"Median:\", np.median(sample))\n", "print(\"25%:\", np.quantile(sample, 0.25))\n", "print(\"75%:\", np.quantile(sample, 0.75))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To find the interquartile range (IQR), store the upper and lower quartiles in variables and subtract one from the other:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2.0\n" ] } ], "source": [ "Q1 = np.quantile(sample, 0.25)\n", "Q3 = np.quantile(sample, 0.75)\n", "IQR = Q3 - Q1\n", "print(IQR)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compute the bounds of the typical values, `Q1 - 1.5 * IQR` and `Q3 + 1.5 * IQR`, and print them as a list:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[-2.5, 5.5]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To select the values that fall outside this interval (atypical values, or outliers), store the interval bounds in variables:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "left = Q1 - 1.5 * IQR\n", 
"right = Q3 + 1.5 * IQR" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then write the selection condition inside square brackets (`|` means *or*):" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([20])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sample[(sample < left) | (sample > right)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The logic is simple: the condition in square brackets produces a Boolean array, and indexing `sample` with it keeps only the values at positions where it is `True`. Each comparison must be wrapped in parentheses, because in Python the `|` operator binds more tightly than `<` and `>`:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([False, False, False, True, False, False, False])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(sample < left) | (sample > right)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's build a plot that visualizes these descriptive statistics and shows whether atypical values are present. This plot is called a *box plot* (*boxplot* or *box-and-whiskers plot*). 
Для этого импортируем из библиотеки `matplotlib` модуль `pyplot` для отрисовки графиков:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "from matplotlib import pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "И построим сам график для нашей выборки:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAD4CAYAAADiry33AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAPr0lEQVR4nO3df6zddX3H8edrBZdMUalcEQGt2QgpNgPJSdXADJ2TFULELbrRLBubXaoGjSb+MbSJOJcuLouaDIykswxc3NVtipJYhYY1wSb445aAlFUHIziuJfRiGWiUaPW9P/rtcrme2557vrf3th+ej+TkfL+fz+f7/bxrzOt++ZzvOd9UFZKkdv3achcgSTq2DHpJapxBL0mNM+glqXEGvSQ17qTlLmCY0047rVatWrXcZUjSCWP37t1PVNXEsL7jMuhXrVrF1NTUcpchSSeMJN+fr8+lG0lqnEEvSY0z6CWpcQa9JDXOoJekxh016JOcnWRnkr1JHkjy3q59ZZIdSR7s3k+d5/iruzEPJrl6sf8B0lKYnJxkzZo1rFixgjVr1jA5ObncJUkjG+WK/iDw/qpaDbwOuCbJecC1wJ1VdQ5wZ7f/LElWAtcBrwXWAtfN9wdBOl5NTk6yefNmrr/+ep555hmuv/56Nm/ebNjrhHHUoK+qx6rqnm77R8Be4EzgSuCWbtgtwFuGHP77wI6qOlBVTwI7gPWLUbi0VLZs2cK2bdtYt24dJ598MuvWrWPbtm1s2bJluUuTRrKgNfokq4DXAN8ETq+qx+DQHwPgpUMOORN4dNb+dNc27NybkkwlmZqZmVlIWdIxtXfvXi6++OJntV188cXs3bt3mSqSFmbkoE/yAuALwPuq6ulRDxvSNvRJJ1W1taoGVTWYmBj6LV5pWaxevZpdu3Y9q23Xrl2sXr16mSqSFmakoE9yModC/rNV9cWu+fEkZ3T9ZwD7hxw6DZw9a/8sYN/45UpLb/PmzWzcuJGdO3fy85//nJ07d7Jx40Y2b9683KVJIznqb90kCbAN2FtVH5/VdRtwNfDR7v3LQw6/HfjbWR/AXgp8oFfF0hLbsGEDAO95z3vYu3cvq1evZsuWLf/fLh3vcrRnxia5GPg6cD/wy675gxxap/9X4BXA/wBvq6oDSQbAO6vqL7vj396NB9hSVf90tKIGg0H5o2aSNLoku6tqMLTveHw4uEEvSQtzpKD3m7GS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMaN8ijBm4ArgP1VtaZr+zxwbjfkxcD/VtUFQ459BPgR8Avg4Hw/ii9JOnaOGvTAzcANwGcON1TVHx/eTvIx4KkjHL+uqp4Yt0BJUj9HDfqquivJqmF93YPD/wj43cUtS5K0WPqu0f8O8HhVPThPfwF3JNmdZNORTpRkU5KpJFMzMzM9y5IkHdY36DcAk0fov6iqLgQuA65J8ob5BlbV1qoaVNVgYmKiZ1mSpMPGDvokJwF/
CHx+vjFVta973w/cCqwddz5J0nj6XNH/HvDdqpoe1pnk+UlOObwNXArs6TGfJGkMRw36JJPA3cC5SaaTbOy6rmLOsk2SlyfZ3u2eDuxKch/wLeArVfW1xStdkjSKUe662TBP+58PadsHXN5tPwyc37M+SVJPfjNWkhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktS4UZ4wdVOS/Un2zGr7cJIfJLm3e10+z7Hrk3wvyUNJrl3MwiVJoxnliv5mYP2Q9k9U1QXda/vcziQrgE8ClwHnARuSnNenWEnSwh016KvqLuDAGOdeCzxUVQ9X1c+AzwFXjnEeSVIPfdbo353kO93SzqlD+s8EHp21P921DZVkU5KpJFMzMzM9ypIkzTZu0H8K+E3gAuAx4GNDxmRIW813wqraWlWDqhpMTEyMWZYkaa6xgr6qHq+qX1TVL4F/5NAyzVzTwNmz9s8C9o0znyRpfGMFfZIzZu3+AbBnyLBvA+ckeVWS5wFXAbeNM58kaXwnHW1AkkngEuC0JNPAdcAlSS7g0FLMI8A7urEvBz5dVZdX1cEk7wZuB1YAN1XVA8fkXyFJmleq5l02XzaDwaCmpqaWuwxJOmEk2V1Vg2F9fjNWkhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxh016LuHf+9PsmdW298n+W73cPBbk7x4nmMfSXJ/knuT+APzkrQMRrmivxlYP6dtB7Cmqn4b+C/gA0c4fl1VXTDfD+JLko6towZ9Vd0FHJjTdkdVHex2v8GhB39Lko5Di7FG/3bgq/P0FXBHkt1JNh3pJEk2JZlKMjUzM7MIZUmSoGfQJ9kMHAQ+O8+Qi6rqQuAy4Jokb5jvXFW1taoGVTWYmJjoU5YkaZaxgz7J1cAVwJ/UPE8Yr6p93ft+4FZg7bjzSZLGM1bQJ1kP/BXw5qr6yTxjnp/klMPbwKXAnmFjJUnHzii3V04CdwPnJplOshG4ATgF2NHdOnljN/blSbZ3h54O7EpyH/At4CtV9bVj8q+QJM3rpKMNqKoNQ5q3zTN2H3B5t/0wcH6v6iRJvfnNWElqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS40YK+iQ3JdmfZM+stpVJdiR5sHs/dZ5jr+7GPNg9Z1aStIRGvaK/GVg/p+1a4M6qOge4s9t/liQrgeuA13LoweDXzfcHQZJ0bIwU9FV1F3BgTvOVwC3d9i3AW4Yc+vvAjqo6UFVPAjv41T8YkqRjqM8a/elV9RhA9/7SIWPOBB6dtT/dtf2KJJuSTCWZmpmZ6VGWJGm2Y/1hbIa01bCBVbW1qgZVNZiYmDjGZUnSc0efoH88yRkA3fv+IWOmgbNn7Z8F7OsxpyRpgfoE/W3A4btorga+PGTM7cClSU7tPoS9tGuTJC2RUW+vnATuBs5NMp1kI/BR4E1JHgTe1O2TZJDk0wBVdQD4G+Db3esjXZskaYmkauiS+bIaDAY1NTW13GVI0gkjye6qGgzr85uxktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGjR30Sc5Ncu+s19NJ3jdnzCVJnpo15kP9S5YkLcRJ4x5YVd8DLgBIsgL4AXDrkKFfr6orxp1HktTPYi3dvBH476r6/iKdT5K0SBYr6K8CJufpe32S+5J8Ncmr5ztBkk1JppJMzczMLFJZkqTeQZ/kecCbgX8b0n0P8MqqOh+4HvjSfOepqq1VNaiqwcTERN+yJEmdxbiivwy4p6oen9tRVU9X1Y+77e3A
yUlOW4Q5JUkjWoyg38A8yzZJXpYk3fbabr4fLsKckqQRjX3XDUCS3wDeBLxjVts7AarqRuCtwLuSHAR+ClxVVdVnTknSwvQK+qr6CfCSOW03ztq+AbihzxySpH78ZqwkNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNW4xnxj6S5P4k9yaZGtKfJP+Q5KEk30lyYd85JUmj6/XgkVnWVdUT8/RdBpzTvV4LfKp7lyQtgaVYurkS+Ewd8g3gxUnOWIJ5JUksTtAXcEeS3Uk2Dek/E3h01v501/YsSTYlmUoyNTMzswhlSZJgcYL+oqq6kENLNNckecOc/gw55lceEF5VW6tqUFWDiYmJRShLkgSLEPRVta973w/cCqydM2QaOHvW/lnAvr7zSpJG0yvokzw/ySmHt4FLgT1zht0G/Fl3983rgKeq6rE+80qSRtf3rpvTgVuTHD7Xv1TV15K8E6CqbgS2A5cDDwE/Af6i55ySpAXoFfRV9TBw/pD2G2dtF3BNn3kkSePzm7GS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXF9f49eOmF1z1FYEod+rVtaHga9nrPGCd8khrZOOGMv3SQ5O8nOJHuTPJDkvUPGXJLkqST3dq8P9StXkrRQfa7oDwLvr6p7uufG7k6yo6r+c864r1fVFT3mkST1MPYVfVU9VlX3dNs/AvYCZy5WYZKkxbEod90kWQW8BvjmkO7XJ7kvyVeTvPoI59iUZCrJ1MzMzGKUpeeYlStXkuSYvoBjPsfKlSuX+X9Jtab3h7FJXgB8AXhfVT09p/se4JVV9eMklwNfAs4Zdp6q2gpsBRgMBn7apQV78sknm/igdCnvBtJzQ68r+iQncyjkP1tVX5zbX1VPV9WPu+3twMlJTuszpyRpYfrcdRNgG7C3qj4+z5iXdeNIsrab74fjzilJWrg+SzcXAX8K3J/k3q7tg8ArAKrqRuCtwLuSHAR+ClxVLfy3tSSdQMYO+qraBRxxMbGqbgBuGHcOSVJ//taNJDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zkcJqhl13Qvhwy9a7jJ6q+teuNwlqDEGvZqRv366mZ8prg8vdxVqiUs3ktQ4g16SGmfQS1LjXKNXU1p4DN+pp5663CWoMQa9mrEUH8QmaeIDXz239H1m7Pok30vyUJJrh/T/epLPd/3fTLKqz3ySpIXr88zYFcAngcuA84ANSc6bM2wj8GRV/RbwCeDvxp1PkjSePlf0a4GHqurhqvoZ8DngyjljrgRu6bb/HXhjWlhElaQTSJ+gPxN4dNb+dNc2dExVHQSeAl4y7GRJNiWZSjI1MzPToyxpNEkW/OpznLRc+gT9sP/3zv2UapQxhxqrtlbVoKoGExMTPcqSRlNVS/aSllOfoJ8Gzp61fxawb74xSU4CXgQc6DGnJGmB+gT9t4FzkrwqyfOAq4Db5oy5Dbi6234r8B/l5Y0kLamx76OvqoNJ3g3cDqwAbqqqB5J8BJiqqtuAbcA/J3mIQ1fyVy1G0ZKk0fX6wlRVbQe2z2n70KztZ4C39ZlDktSPv3UjSY0z6CWpcQa9JDXOoJekxuV4vNsxyQzw/eWuQxriNOCJ5S5CGuKVVTX026bHZdBLx6skU1U1WO46pIVw6UaSGmfQS1LjDHppYbYudwHSQrlGL0mN84pekhpn0EtS4wx6aQRJbkqyP8me5a5FWiiDXhrNzcD65S5CGodBL42gqu7Cp6PpBGXQS1LjDHpJapxBL0mNM+glqXEGvTSCJJPA3cC5SaaTbFzumqRR+RMIktQ4r+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWrc/wHlD3LrIeTfggAAAABJRU5ErkJg
gg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# добавление ; в конце позволяет отрисовать график \n", "# в чистом виде, без лишней информации перед картинкой\n", "\n", "plt.boxplot(sample);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Границы ящика соответствуют нижнему и верхнему квартилю, линия внутри ящика – медиане. Усы ящика строятся по-разному, в зависимости от данных:\n", "\n", "* если нехарактерных значений нет, граница нижнего и верхнего уса – просто минимальное и максимальное значение в выборке;\n", "* если нехарактерные значения есть, граница нижнего уса – минимальное значение среди типичных, а граница верхнего уса – максимальное значение среди типичных, нетипичные значения обозначаются точками вне ящика.\n", "\n", "Добавим цвет заливки для ящика и цвет его границ:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAD4CAYAAADiry33AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAPu0lEQVR4nO3df6zddX3H8edrBZdMcYJcAQGp2QgpNgPJSdXADJ0TCyHiFt1olo3NLlWDRhP/GNpEnEsXl0XNBkbSWQYu7uo2ZZJYhYY1wyb445SAlFUHIzhqGb1YBxolWn3vj367XK7ntuee721v++H5SE7O9/v5fL7fz7vGvO6X7/me80lVIUlq1y8tdQGSpCPLoJekxhn0ktQ4g16SGmfQS1LjTljqAkY59dRTa/ny5UtdhiQdN3bs2PFkVU2N6jsmg3758uUMh8OlLkOSjhtJvjNfn7duJKlxBr0kNc6gl6TGGfSS1DiDXpIad9igT3J2km1JdiV5MMm7u/ZTkmxN8lD3fvI8x1/TjXkoyTWL/Q+Qjobp6WlWrlzJsmXLWLlyJdPT00tdkjS2ca7o9wPvraoVwKuBa5OcD1wH3FVV5wJ3dfvPkuQU4HrgVcAq4Pr5/iBIx6rp6Wk2bNjADTfcwDPPPMMNN9zAhg0bDHsdNw4b9FX1eFXd223/ANgFnAlcBdzaDbsVeNOIw98AbK2qfVX1fWArsGYxCpeOlo0bN7J582ZWr17NiSeeyOrVq9m8eTMbN25c6tKksSzoHn2S5cArga8Bp1XV43DgjwHwkhGHnAk8Nmt/d9c26tzrkwyTDGdmZhZSlnRE7dq1i0suueRZbZdccgm7du1aooqkhRk76JO8APgc8J6qenrcw0a0jVzppKo2VdWgqgZTUyO/xSstiRUrVrB9+/ZntW3fvp0VK1YsUUXSwowV9ElO5EDIf7qqPt81P5HkjK7/DGDviEN3A2fP2j8L2DN5udLRt2HDBtatW8e2bdv46U9/yrZt21i3bh0bNmx
Y6tKksRz2t26SBNgM7Kqqj87quh24Bvhw9/6FEYffAfzlrA9gLwPe16ti6Shbu3YtAO9617vYtWsXK1asYOPGjf/fLh3rcrg1Y5NcAnwFeAD4edf8fg7cp/8n4GXAfwNvqap9SQbA26vqT7vj39qNB9hYVX9/uKIGg0H5o2aSNL4kO6pqMLLvWFwc3KCXpIU5VND7zVhJapxBL0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuPGWUrwZuBKYG9VrezaPguc1w15EfC/VXXhiGMfBX4A/AzYP9+P4kuSjpzDBj1wC3Aj8KmDDVX1+we3k3wEeOoQx6+uqicnLVCS1M9hg76q7k6yfFRft3D47wG/tbhlSZIWS9979L8JPFFVD83TX8CdSXYkWX+oEyVZn2SYZDgzM9OzLEnSQX2Dfi0wfYj+i6vqIuBy4Nokr51vYFVtqqpBVQ2mpqZ6liVJOmjioE9yAvC7wGfnG1NVe7r3vcBtwKpJ55MkTabPFf1vA9+qqt2jOpM8P8lJB7eBy4CdPeaTJE3gsEGfZBq4Bzgvye4k67quq5lz2ybJS5Ns6XZPA7YnuR/4OvDFqvry4pUuSRrHOE/drJ2n/Y9HtO0Brui2HwEu6FmfJKknvxkrSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWrcOCtM3Zxkb5Kds9o+mOS7Se7rXlfMc+yaJN9O8nCS6xazcEnSeMa5or8FWDOi/WNVdWH32jK3M8ky4OPA5cD5wNok5/cpVpK0cIcN+qq6G9g3wblXAQ9X1SNV9RPgM8BVE5xHktRDn3v070zyze7Wzskj+s8EHpu1v7trGynJ+iTDJMOZmZkeZUmSZps06D8B/BpwIfA48JERYzKireY7YVVtqqpBVQ2mpqYmLEuSNNdEQV9VT1TVz6rq58DfceA2zVy7gbNn7Z8F7JlkPknS5CYK+iRnzNr9HWDniGHfAM5N8vIkzwOuBm6fZD5J0uROONyAJNPApcCpSXYD1wOXJrmQA7diHgXe1o19KfDJqrqiqvYneSdwB7AMuLmqHjwi/wpJ0rxSNe9t8yUzGAxqOBwudRmSdNxIsqOqBqP6/GasJDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTts0HeLf+9NsnNW218n+Va3OPhtSV40z7GPJnkgyX1J/IF5SVoC41zR3wKsmdO2FVhZVb8B/CfwvkMcv7qqLpzvB/ElSUfWYYO+qu4G9s1pu7Oq9ne7X+XAwt+SpGPQYtyjfyvwpXn6CrgzyY4k6w91kiTrkwyTDGdmZhahLEkS9Az6JBuA/cCn5xlycVVdBFwOXJvktfOdq6o2VdWgqgZTU1N9ypIkzTJx0Ce5BrgS+IOaZ4XxqtrTve8FbgNWTTqfJGkyEwV9kjXAnwFvrKofzTPm+UlOOrgNXAbsHDVWknTkjPN45TRwD3Bekt1J1gE3AicBW7tHJ2/qxr40yZbu0NOA7UnuB74OfLGqvnxE/hWSpHmdcLgBVbV2RPPmecbuAa7oth8BLuhVnSSpN78ZK0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklq3FhBn+TmJHuT7JzVdkqSrUke6t5PnufYa7oxD3XrzEqSjqJxr+hvAdbMabsOuKuqzgXu6vafJckpwPXAqziwMPj18/1BkCQdGWMFfVXdDeyb03wVcGu3fSvwphGHvgHYWlX7qur7wFZ+8Q+GJOkI6nOP/rSqehyge3/JiDFnAo/N2t/dtf2CJOuTDJMMZ2Z
mepQlSZrtSH8YmxFtNWpgVW2qqkFVDaampo5wWZL03NEn6J9IcgZA9753xJjdwNmz9s8C9vSYU5K0QH2C/nbg4FM01wBfGDHmDuCyJCd3H8Je1rVJko6ScR+vnAbuAc5LsjvJOuDDwOuTPAS8vtsnySDJJwGqah/wF8A3uteHujZJ0lGSqpG3zJfUYDCo4XC41GVI0nEjyY6qGozq85uxktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGTRz0Sc5Lct+s19NJ3jNnzKVJnpo15gP9S5YkLcQJkx5YVd8GLgRIsgz4LnDbiKFfqaorJ51HktTPYt26eR3wX1X1nUU6nyRpkSxW0F8NTM/T95ok9yf5UpJXzHeCJOuTDJMMZ2ZmFqksSVLvoE/yPOCNwD+P6L4XOKeqLgBuAP51vvNU1aaqGlTVYGpqqm9ZkqTOYlzRXw7cW1VPzO2oqqer6ofd9hbgxCSnLsKckqQxLUbQr2We2zZJTk+SbntVN9/3FmFOSdKYJn7qBiDJrwCvB942q+3tAFV1E/Bm4B1J9gM/Bq6uquozpyRpYXoFfVX9CHjxnLabZm3fCNzYZw5JUj9+M1aSGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGLcaasY8meSDJfUmGI/qT5G+TPJzkm0ku6junJGl8vRYemWV1VT05T9/lwLnd61XAJ7p3SdJRcDRu3VwFfKoO+CrwoiRnHIV5JUksTtAXcGeSHUnWj+g/E3hs1v7uru1ZkqxPMkwynJmZWYSyJEmwOEF/cVVdxIFbNNcmee2c/ow45hcWCK+qTVU1qKrB1NTUIpQlSYJFCPqq2tO97wVuA1bNGbIbOHvW/lnAnr7zSpLG0yvokzw/yUkHt4HLgJ1zht0O/FH39M2rgaeq6vE+80qSxtf3qZvTgNuSHDzXP1bVl5O8HaCqbgK2AFcADwM/Av6k55ySpAXoFfRV9QhwwYj2m2ZtF3Btn3kkSZPzm7GS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXF9f49eOm516ygcFQd+rVtaGga9nrMmCd8khraOOxPfuklydpJtSXYleTDJu0eMuTTJU0nu614f6FeuJGmh+lzR7wfeW1X3duvG7kiytar+Y864r1TVlT3mkST1MPEVfVU9XlX3dts/AHYBZy5WYZKkxbEoT90kWQ68EvjaiO7XJLk/yZeSvOIQ51ifZJhkODMzsxhl6Tlm+fLTSXJEX8ARn2P58tOX+H9JtSZ9P1hK8gLg34GNVfX5OX0vBH5eVT9McgXwN1V17uHOORgMajgc9qpLzz0HPihd6ir6S3xKRwuXZEdVDUb19bqiT3Ii8Dng03NDHqCqnq6qH3bbW4ATk5zaZ05J0sL0eeomwGZgV1V9dJ4xp3fjSLKqm+97k84pSVq4Pk/dXAz8IfBAkvu6tvcDLwOoqpuANwPvSLIf+DFwdfnfpJJ0VE0c9FW1HTjkVwur6kbgxknnkCT152/dSFLjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS41xKUM2o618IH1zqKvqr65e6ArXGoFcz8udPt/MzxR9c6irUEm/dSFLjDHpJapxBL0mN8x69mnHOOaeRPLHUZfR2zjmnLXUJaoxBr2Y8+uj/HPE5DqxL28AnvnpO6btm7Jok307ycJLrRvT/cpLPdv1fS7K8z3ySpIXrs2bsMuDjwOXA+cDaJOfPGbYO+H5V/TrwMeCvJp1PkjSZPlf0q4CHq+qRqvoJ8BngqjljrgJu7bb/BXjdwcXCJUlHR5+gPxN4bNb+7q5t5Jiq2g88Bbx41Mm
SrE8yTDKcmZnpUZY0niQLfvU5TloqfYJ+1P97535KNc6YA41Vm6pqUFWDqampHmVJ46mqo/aSllKfoN8NnD1r/yxgz3xjkpwA/Cqwr8eckqQF6hP03wDOTfLyJM8DrgZunzPmduCabvvNwL+VlzeSdFRN/Bx9Ve1P8k7gDmAZcHNVPZjkQ8Cwqm4HNgP/kORhDlzJX70YRUuSxtfrC1NVtQXYMqftA7O2nwHe0mcOSVI//taNJDXOoJekxhn0ktQ4g16SGpdj8WnHJDPAd5a6DmmEU4Enl7oIaYRzqmrkt02PyaCXjlVJhlU1WOo6pIXw1o0kNc6gl6TGGfTSwmxa6gKkhfIevSQ1zit6SWqcQS9JjTPopTEkuTnJ3iQ7l7oWaaEMemk8twBrlroIaRIGvTSGqrobV0fTccqgl6TGGfSS1DiDXpIaZ9BLUuMMemkMSaaBe4DzkuxOsm6pa5LG5U8gSFLjvKKXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalx/wefO3piUVlJJwAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.boxplot(sample, \n", " patch_artist = True, \n", " boxprops = dict(facecolor = \"yellow\", color = \"black\"));" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Если в конце строки для построения графика мы не поставим `;`, Python нам покажет словарь, на основе которого отрисовывается график:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'whiskers': [,\n", " ],\n", " 'caps': [,\n", " ],\n", " 'boxes': [],\n", " 'medians': [],\n", " 'fliers': [],\n", " 'means': []}" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAD4CAYAAADiry33AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAPr0lEQVR4nO3df6zddX3H8edrBZdMUalcEQGt2QgpNgPJSdXADJ2TFULELbrRLBubXaoGjSb+MbSJOJcuLouaDIykswxc3NVtipJYhYY1wSb445aAlFUHIziuJfRiGWiUaPW9P/rtcrme2557vrf3th+ej+TkfL+fz+f7/bxrzOt++ZzvOd9UFZKkdv3achcgSTq2DHpJapxBL0mNM+glqXEGvSQ17qTlLmCY0047rVatWrXcZUjSCWP37t1PVNXEsL7jMuhXrVrF1NTUcpchSSeMJN+fr8+lG0lqnEEvSY0z6CWpcQa9JDXOoJekxh016JOcnWRnkr1JHkjy3q59ZZIdSR7s3k+d5/iruzEPJrl6sf8B0lKYnJxkzZo1rFixgjVr1jA5ObncJUkjG+WK/iDw/qpaDbwOuCbJecC1wJ1VdQ5wZ7f/LElWAtcBrwXWAtfN9wdBOl5NTk6yefNmrr/+ep555hmuv/56Nm/ebNjrhHHUoK+qx6rqnm77R8Be4EzgSuCWbtgtwFuGHP77wI6qOlBVTwI7gPWLUbi0VLZs2cK2bdtYt24dJ598MuvWrWPbtm1s2bJluUuTRrKgNfokq4DXAN8ETq+qx+DQHwPgpUMOORN4dNb+dNc27NybkkwlmZqZmVlIWdIxtXfvXi6++OJntV188cXs3bt3mSqSFmbkoE/yAuALwPuq6ulRDxvSNvRJJ1W1taoGVTWYmBj6LV5pWaxevZpdu3Y9q23Xrl2sXr16mSqSFmakoE9yModC/rNV9cWu+fEkZ3T9ZwD7hxw6DZw9a/8sYN/45UpLb/PmzWzcuJGdO3fy85//nJ07d7Jx40Y2b9683KVJIznqb90kCbAN2FtVH5/VdRtwNfDR7v3LQw6/HfjbWR/AXgp8oFfF0hLbsGEDAO95z3vYu3cvq1evZsuWLf/fLh3vcrRnxia5GPg6cD/wy675gxxap/9X4BXA/wBvq6oDSQbAO6vqL7vj396NB9hSVf90tKIGg0H5o2aSNLoku6tqMLTveHw4uEEvSQtzpKD3m7GS1DiDXpI
aZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMaN8ijBm4ArgP1VtaZr+zxwbjfkxcD/VtUFQ459BPgR8Avg4Hw/ii9JOnaOGvTAzcANwGcON1TVHx/eTvIx4KkjHL+uqp4Yt0BJUj9HDfqquivJqmF93YPD/wj43cUtS5K0WPqu0f8O8HhVPThPfwF3JNmdZNORTpRkU5KpJFMzMzM9y5IkHdY36DcAk0fov6iqLgQuA65J8ob5BlbV1qoaVNVgYmKiZ1mSpMPGDvokJwF/CHx+vjFVta973w/cCqwddz5J0nj6XNH/HvDdqpoe1pnk+UlOObwNXArs6TGfJGkMRw36JJPA3cC5SaaTbOy6rmLOsk2SlyfZ3u2eDuxKch/wLeArVfW1xStdkjSKUe662TBP+58PadsHXN5tPwyc37M+SVJPfjNWkhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktS4UZ4wdVOS/Un2zGr7cJIfJLm3e10+z7Hrk3wvyUNJrl3MwiVJoxnliv5mYP2Q9k9U1QXda/vcziQrgE8ClwHnARuSnNenWEnSwh016KvqLuDAGOdeCzxUVQ9X1c+AzwFXjnEeSVIPfdbo353kO93SzqlD+s8EHp21P921DZVkU5KpJFMzMzM9ypIkzTZu0H8K+E3gAuAx4GNDxmRIW813wqraWlWDqhpMTEyMWZYkaa6xgr6qHq+qX1TVL4F/5NAyzVzTwNmz9s8C9o0znyRpfGMFfZIzZu3+AbBnyLBvA+ckeVWS5wFXAbeNM58kaXwnHW1AkkngEuC0JNPAdcAlSS7g0FLMI8A7urEvBz5dVZdX1cEk7wZuB1YAN1XVA8fkXyFJmleq5l02XzaDwaCmpqaWuwxJOmEk2V1Vg2F9fjNWkhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxh016LuHf+9PsmdW298n+W73cPBbk7x4nmMfSXJ/knuT+APzkrQMRrmivxlYP6dtB7Cmqn4b+C/gA0c4fl1VXTDfD+JLko6towZ9Vd0FHJjTdkdVHex2v8GhB39Lko5Di7FG/3bgq/P0FXBHkt1JNh3pJEk2JZlKMjUzM7MIZUmSoGfQJ9kMHAQ+O8+Qi6rqQuAy4Jokb5jvXFW1taoGVTWYmJjoU5YkaZaxgz7J1cAVwJ/UPE8Yr6p93ft+4FZg7bjzSZLGM1bQJ1kP/BXw5qr6yTxjnp/klMPbwKXAnmFjJUnHzii3V04CdwPnJplOshG4ATgF2NHdOnljN/blSbZ3h54O7EpyH/At4CtV9bVj8q+QJM3rpKMNqKoNQ5q3zTN2H3B5t/0wcH6v6iRJvfnNWElqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS40YK+iQ3JdmfZM+stpVJdiR5sHs/dZ5jr+7GPNg9Z1aStIRGvaK/GVg/p+1a4M6qOge4s9t/liQrgeuA13LoweDXzfcHQZJ0bIwU9FV1F3BgTvOVwC3d9i3AW4Yc+vvAjqo6UFVPAjv41T8YkqRjqM8a/elV9RhA9/7SIWPOBB6dtT/dtf2KJJuSTCWZmpmZ6VGWJGm2Y/1hbIa01bCBVbW1qgZVNZiYmDjGZUnSc0efoH88yRkA3fv+IWOmgbNn7Z8F7OsxpyRpgfoE/W3A4btorga+PGTM7cClSU7tPoS9tGuTJC2RUW+vnATuBs5NMp1kI/BR4E1JHgTe1O2TZJDk0wBVdQD4G+Db3esjXZskaYmkauiS+bIaDAY1NTW13GVI0gkjye6qGgz
r85uxktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGjR30Sc5Ncu+s19NJ3jdnzCVJnpo15kP9S5YkLcRJ4x5YVd8DLgBIsgL4AXDrkKFfr6orxp1HktTPYi3dvBH476r6/iKdT5K0SBYr6K8CJufpe32S+5J8Ncmr5ztBkk1JppJMzczMLFJZkqTeQZ/kecCbgX8b0n0P8MqqOh+4HvjSfOepqq1VNaiqwcTERN+yJEmdxbiivwy4p6oen9tRVU9X1Y+77e3AyUlOW4Q5JUkjWoyg38A8yzZJXpYk3fbabr4fLsKckqQRjX3XDUCS3wDeBLxjVts7AarqRuCtwLuSHAR+ClxVVdVnTknSwvQK+qr6CfCSOW03ztq+AbihzxySpH78ZqwkNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNW4xnxj6S5P4k9yaZGtKfJP+Q5KEk30lyYd85JUmj6/XgkVnWVdUT8/RdBpzTvV4LfKp7lyQtgaVYurkS+Ewd8g3gxUnOWIJ5JUksTtAXcEeS3Uk2Dek/E3h01v501/YsSTYlmUoyNTMzswhlSZJgcYL+oqq6kENLNNckecOc/gw55lceEF5VW6tqUFWDiYmJRShLkgSLEPRVta973w/cCqydM2QaOHvW/lnAvr7zSpJG0yvokzw/ySmHt4FLgT1zht0G/Fl3983rgKeq6rE+80qSRtf3rpvTgVuTHD7Xv1TV15K8E6CqbgS2A5cDDwE/Af6i55ySpAXoFfRV9TBw/pD2G2dtF3BNn3kkSePzm7GS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXF9f49eOmF1z1FYEod+rVtaHga9nrPGCd8khrZOOGMv3SQ5O8nOJHuTPJDkvUPGXJLkqST3dq8P9StXkrRQfa7oDwLvr6p7uufG7k6yo6r+c864r1fVFT3mkST1MPYVfVU9VlX3dNs/AvYCZy5WYZKkxbEod90kWQW8BvjmkO7XJ7kvyVeTvPoI59iUZCrJ1MzMzGKUpeeYlStXkuSYvoBjPsfKlSuX+X9Jtab3h7FJXgB8AXhfVT09p/se4JVV9eMklwNfAs4Zdp6q2gpsBRgMBn7apQV78sknm/igdCnvBtJzQ68r+iQncyjkP1tVX5zbX1VPV9WPu+3twMlJTuszpyRpYfrcdRNgG7C3qj4+z5iXdeNIsrab74fjzilJWrg+SzcXAX8K3J/k3q7tg8ArAKrqRuCtwLuSHAR+ClxVLfy3tSSdQMYO+qraBRxxMbGqbgBuGHcOSVJ//taNJDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zkcJqhl13Qvhwy9a7jJ6q+teuNwlqDEGvZqRv366mZ8prg8vdxVqiUs3ktQ4g16SGmfQS1LjXKNXU1p4DN+pp5663CWoMQa9mrEUH8QmaeIDXz239H1m7Pok30vyUJJrh/T/epLPd/3fTLKqz3ySpIXr88zYFcAngcuA84ANSc6bM2wj8GRV/RbwCeDvxp1PkjSePlf0a4GHqurhqvoZ8DngyjljrgRu6bb/HXhjWlhElaQTSJ+gPxN4dNb+dNc2dExVHQSeAl4y7GRJNiWZSjI1MzPToyxpNEkW/OpznLRc+gT9sP/3zv2UapQxhxqrtlbVoKoGExMTPcqSRlNVS/aSllOfoJ8Gzp61fxawb74xSU4CXgQc6DGnJGmB+gT9t4FzkrwqyfOAq4Db5oy5Dbi6234r8B/l5Y0kLamx76OvqoNJ3g3cDqwAbqqqB5J8BJiqqtuAbcA/J3mIQ1fyVy1G0ZKk0fX6wlRVbQe2z2n70Kz
tZ4C39ZlDktSPv3UjSY0z6CWpcQa9JDXOoJekxuV4vNsxyQzw/eWuQxriNOCJ5S5CGuKVVTX026bHZdBLx6skU1U1WO46pIVw6UaSGmfQS1LjDHppYbYudwHSQrlGL0mN84pekhpn0EtS4wx6aQRJbkqyP8me5a5FWiiDXhrNzcD65S5CGodBL42gqu7Cp6PpBGXQS1LjDHpJapxBL0mNM+glqXEGvTSCJJPA3cC5SaaTbFzumqRR+RMIktQ4r+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWrc/wHlD3LrIeTfggAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.boxplot(sample)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Если сохранить этот словарь в переменную, из него можно будет извлечь вспомогательные данные." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAD4CAYAAADiry33AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAPr0lEQVR4nO3df6zddX3H8edrBZdMUalcEQGt2QgpNgPJSdXADJ2TFULELbrRLBubXaoGjSb+MbSJOJcuLouaDIykswxc3NVtipJYhYY1wSb445aAlFUHIziuJfRiGWiUaPW9P/rtcrme2557vrf3th+ej+TkfL+fz+f7/bxrzOt++ZzvOd9UFZKkdv3achcgSTq2DHpJapxBL0mNM+glqXEGvSQ17qTlLmCY0047rVatWrXcZUjSCWP37t1PVNXEsL7jMuhXrVrF1NTUcpchSSeMJN+fr8+lG0lqnEEvSY0z6CWpcQa9JDXOoJekxh016JOcnWRnkr1JHkjy3q59ZZIdSR7s3k+d5/iruzEPJrl6sf8B0lKYnJxkzZo1rFixgjVr1jA5ObncJUkjG+WK/iDw/qpaDbwOuCbJecC1wJ1VdQ5wZ7f/LElWAtcBrwXWAtfN9wdBOl5NTk6yefNmrr/+ep555hmuv/56Nm/ebNjrhHHUoK+qx6rqnm77R8Be4EzgSuCWbtgtwFuGHP77wI6qOlBVTwI7gPWLUbi0VLZs2cK2bdtYt24dJ598MuvWrWPbtm1s2bJluUuTRrKgNfokq4DXAN8ETq+qx+DQHwPgpUMOORN4dNb+dNc27NybkkwlmZqZmVlIWdIxtXfvXi6++OJntV188cXs3bt3mSqSFmbkoE/yAuALwPuq6ulRDxvSNvRJJ1W1taoGVTWYmBj6LV5pWaxevZpdu3Y9q23Xrl2sXr16mSqSFmakoE9yModC/rNV9cWu+fEkZ3T9ZwD7hxw6DZw9a/8sYN/45UpLb/PmzWzcuJGdO3fy85//nJ07d7Jx40Y2b9683KVJIznqb90kCbAN2FtVH5/VdRtwNfDR7v3LQw6/HfjbWR/AXgp8oFfF0hLbsGEDAO95z3vYu3cvq1evZsuWLf/fLh3vcrRnxia5GPg6cD/wy675gxxap/9X4BXA/wBvq6oDSQbAO6vqL7vj396NB9hSVf90tKIGg0H5o2aSNLoku6tqMLTveHw4uEEvSQtzpKD3m7GS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMaN8ijBm4ArgP1VtaZr+zxwbjfkxcD/VtUFQ459BPgR8Avg4Hw/ii9JOnaOGvTAzcANwGcON1TVHx/eTvIx4KkjHL+uqp4Yt0BJUj9HDfqquivJqmF93YPD/wj43cUtS5K0WPqu0f8O8HhVPThPfwF3JNmdZNORTpRkU5KpJFMzMzM9y5IkHdY36DcAk0fov6iqLgQuA65J8ob5BlbV1qoaVNVgYmKiZ1mSpMPGDvokJwF/CHx+vjFVta973w/cCqwddz5J0nj6XNH/HvDdqpoe1pnk+UlOO
bwNXArs6TGfJGkMRw36JJPA3cC5SaaTbOy6rmLOsk2SlyfZ3u2eDuxKch/wLeArVfW1xStdkjSKUe662TBP+58PadsHXN5tPwyc37M+SVJPfjNWkhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktS4UZ4wdVOS/Un2zGr7cJIfJLm3e10+z7Hrk3wvyUNJrl3MwiVJoxnliv5mYP2Q9k9U1QXda/vcziQrgE8ClwHnARuSnNenWEnSwh016KvqLuDAGOdeCzxUVQ9X1c+AzwFXjnEeSVIPfdbo353kO93SzqlD+s8EHp21P921DZVkU5KpJFMzMzM9ypIkzTZu0H8K+E3gAuAx4GNDxmRIW813wqraWlWDqhpMTEyMWZYkaa6xgr6qHq+qX1TVL4F/5NAyzVzTwNmz9s8C9o0znyRpfGMFfZIzZu3+AbBnyLBvA+ckeVWS5wFXAbeNM58kaXwnHW1AkkngEuC0JNPAdcAlSS7g0FLMI8A7urEvBz5dVZdX1cEk7wZuB1YAN1XVA8fkXyFJmleq5l02XzaDwaCmpqaWuwxJOmEk2V1Vg2F9fjNWkhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxh016LuHf+9PsmdW298n+W73cPBbk7x4nmMfSXJ/knuT+APzkrQMRrmivxlYP6dtB7Cmqn4b+C/gA0c4fl1VXTDfD+JLko6towZ9Vd0FHJjTdkdVHex2v8GhB39Lko5Di7FG/3bgq/P0FXBHkt1JNh3pJEk2JZlKMjUzM7MIZUmSoGfQJ9kMHAQ+O8+Qi6rqQuAy4Jokb5jvXFW1taoGVTWYmJjoU5YkaZaxgz7J1cAVwJ/UPE8Yr6p93ft+4FZg7bjzSZLGM1bQJ1kP/BXw5qr6yTxjnp/klMPbwKXAnmFjJUnHzii3V04CdwPnJplOshG4ATgF2NHdOnljN/blSbZ3h54O7EpyH/At4CtV9bVj8q+QJM3rpKMNqKoNQ5q3zTN2H3B5t/0wcH6v6iRJvfnNWElqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS40YK+iQ3JdmfZM+stpVJdiR5sHs/dZ5jr+7GPNg9Z1aStIRGvaK/GVg/p+1a4M6qOge4s9t/liQrgeuA13LoweDXzfcHQZJ0bIwU9FV1F3BgTvOVwC3d9i3AW4Yc+vvAjqo6UFVPAjv41T8YkqRjqM8a/elV9RhA9/7SIWPOBB6dtT/dtf2KJJuSTCWZmpmZ6VGWJGm2Y/1hbIa01bCBVbW1qgZVNZiYmDjGZUnSc0efoH88yRkA3fv+IWOmgbNn7Z8F7OsxpyRpgfoE/W3A4btorga+PGTM7cClSU7tPoS9tGuTJC2RUW+vnATuBs5NMp1kI/BR4E1JHgTe1O2TZJDk0wBVdQD4G+Db3esjXZskaYmkauiS+bIaDAY1NTW13GVI0gkjye6qGgzr85uxktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWqcQS9JjTPoJalxBr0kNc6gl6TGjR30Sc5Ncu+s19NJ3jdnzCVJnpo15kP9S5YkLcRJ4x5YVd8DLgBIsgL4AXDrkKFfr6orxp1HktTPYi3dvBH476r6/iKdT5K0SBYr6K8CJufpe32S+5J8Ncmr5ztBkk1JppJMzczMLFJZkqTeQZ/kecCbgX8b0n0P8MqqOh+4HvjSfOepqq1VNaiqwcTERN+yJEmdxbiivwy4p6oen9tRVU9X1Y+77e3AyUlOW4Q5JUkjWoyg38A8yzZJXpYk3fbabr4fLsKckqQRjX3XD
UCS3wDeBLxjVts7AarqRuCtwLuSHAR+ClxVVdVnTknSwvQK+qr6CfCSOW03ztq+AbihzxySpH78ZqwkNc6gl6TGGfSS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNW4xnxj6S5P4k9yaZGtKfJP+Q5KEk30lyYd85JUmj6/XgkVnWVdUT8/RdBpzTvV4LfKp7lyQtgaVYurkS+Ewd8g3gxUnOWIJ5JUksTtAXcEeS3Uk2Dek/E3h01v501/YsSTYlmUoyNTMzswhlSZJgcYL+oqq6kENLNNckecOc/gw55lceEF5VW6tqUFWDiYmJRShLkgSLEPRVta973w/cCqydM2QaOHvW/lnAvr7zSpJG0yvokzw/ySmHt4FLgT1zht0G/Fl3983rgKeq6rE+80qSRtf3rpvTgVuTHD7Xv1TV15K8E6CqbgS2A5cDDwE/Af6i55ySpAXoFfRV9TBw/pD2G2dtF3BNn3kkSePzm7GS1DiDXpIaZ9BLUuMMeklqnEEvSY0z6CWpcQa9JDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXF9f49eOmF1z1FYEod+rVtaHga9nrPGCd8khrZOOGMv3SQ5O8nOJHuTPJDkvUPGXJLkqST3dq8P9StXkrRQfa7oDwLvr6p7uufG7k6yo6r+c864r1fVFT3mkST1MPYVfVU9VlX3dNs/AvYCZy5WYZKkxbEod90kWQW8BvjmkO7XJ7kvyVeTvPoI59iUZCrJ1MzMzGKUpeeYlStXkuSYvoBjPsfKlSuX+X9Jtab3h7FJXgB8AXhfVT09p/se4JVV9eMklwNfAs4Zdp6q2gpsBRgMBn7apQV78sknm/igdCnvBtJzQ68r+iQncyjkP1tVX5zbX1VPV9WPu+3twMlJTuszpyRpYfrcdRNgG7C3qj4+z5iXdeNIsrab74fjzilJWrg+SzcXAX8K3J/k3q7tg8ArAKrqRuCtwLuSHAR+ClxVLfy3tSSdQMYO+qraBRxxMbGqbgBuGHcOSVJ//taNJDXOoJekxhn0ktQ4g16SGmfQS1LjDHpJapxBL0mNM+glqXEGvSQ1zkcJqhl13Qvhwy9a7jJ6q+teuNwlqDEGvZqRv366mZ8prg8vdxVqiUs3ktQ4g16SGmfQS1LjXKNXU1p4DN+pp5663CWoMQa9mrEUH8QmaeIDXz239H1m7Pok30vyUJJrh/T/epLPd/3fTLKqz3ySpIXr88zYFcAngcuA84ANSc6bM2wj8GRV/RbwCeDvxp1PkjSePlf0a4GHqurhqvoZ8DngyjljrgRu6bb/HXhjWlhElaQTSJ+gPxN4dNb+dNc2dExVHQSeAl4y7GRJNiWZSjI1MzPToyxpNEkW/OpznLRc+gT9sP/3zv2UapQxhxqrtlbVoKoGExMTPcqSRlNVS/aSllOfoJ8Gzp61fxawb74xSU4CXgQc6DGnJGmB+gT9t4FzkrwqyfOAq4Db5oy5Dbi6234r8B/l5Y0kLamx76OvqoNJ3g3cDqwAbqqqB5J8BJiqqtuAbcA/J3mIQ1fyVy1G0ZKk0fX6wlRVbQe2z2n70KztZ4C39ZlDktSPv3UjSY0z6CWpcQa9JDXOoJekxuV4vNsxyQzw/eWuQxriNOCJ5S5CGuKVVTX026bHZdBLx6skU1U1WO46pIVw6UaSGmfQS1LjDHppYbYudwHSQrlGL0mN84pekhpn0EtS4wx6aQRJbkqyP8me5a5FWiiDXhrNzcD65S5CGodBL42gqu7Cp6PpBGXQS1LjDHpJapxBL0mNM+glqXEGvTSCJJPA3cC5SaaTbFzumqRR+RMIktQ4r+glqXEGvSQ1zqCXpMYZ9JLUOINekhpn0EtS4wx6SWrc/wHlD3LrIeTfggAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "box = plt.boxplot(sample)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For example, the same outliers as an array:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([20])" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# take the entry stored under the key fliers (fliers = outliers),\n", "# then its element with index 0 (the Line2D object for the first box),\n", "# and from it the values along the y axis\n", "\n", "box[\"fliers\"][0].get_ydata()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For more details on box plots, see the official [documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.boxplot.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Part 2: working with a sample in SciPy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Import the `stats` module from the `scipy` library for scientific computing (*SciPy* stands for *Scientific Python*):" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "from scipy import stats" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Call the `describe()` function from this module to get descriptive statistics:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DescribeResult(nobs=7, minmax=(0, 20), mean=3.857142857142857, variance=51.80952380952382, skewness=1.9493974630335922, kurtosis=1.9618625310877995)" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stats.describe(sample)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here:\n", "\n", "* `nobs`: the number of observations;\n", "* `minmax`: a tuple with the minimum and the maximum;\n", "* `mean`: the mean;\n", "* `variance`: the sample variance (with n - 1 in the denominator);\n", "* `skewness`: the [skewness](https://en.wikipedia.org/wiki/Skewness) coefficient (how asymmetric the distribution is; a positive value means it is skewed to the right, a negative one means it is skewed to the left);\n", "* `kurtosis`: the [excess kurtosis](https://en.wikipedia.org/wiki/Kurtosis) coefficient (by default Fisher's definition is used, so a normal distribution gives 0; informally, how peaked and heavy-tailed the distribution is compared with a normal one)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can store the result above in a variable and extract each statistic separately:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "N: 7\n", "Mean: 3.857142857142857\n", "Min: 0\n", "Max: 20\n", "Range: 20\n" ] } ], "source": [ "desc = stats.describe(sample)\n", "print(\"N:\", desc.nobs)\n", "print(\"Mean:\", desc.mean)\n", "print(\"Min:\", desc.minmax[0])\n", "print(\"Max:\", desc.minmax[1])\n", "\n", "# range: max - min\n", "print(\"Range:\", desc.minmax[1] - desc.minmax[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`stats` also has a handy `iqr()` function that computes the interquartile range directly; with its default settings it gives the same result as `Q3 - Q1` computed with `np.quantile()` above:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.0" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stats.iqr(sample)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, `stats` has a `rankdata()` function that ranks the observations; tied values get the average of their ranks (the default `method='average'`), just as we did by hand:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([5. , 3.5, 1.5, 7. , 6. 
, 3.5, 1.5])" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stats.rankdata(sample)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Part 3: working with real data as Pandas dataframes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The file `coffee_and_code.csv` contains the results of a survey of programmers:\n", "\n", "* `CodingHours`: the time the respondent spends writing code (hours per day);\n", "* `CoffeeCupsPerDay`: the number of cups of coffee the respondent drinks per day;\n", "* `CoffeeTime`: when the respondent drinks coffee (before coding, while coding, all day, and other options);\n", "* `CodingWithoutCoffee`: whether the respondent writes code without coffee (yes, no, sometimes);\n", "* `CoffeeType`: the type or brand of coffee the respondent prefers;\n", "* `CoffeeSolveBugs`: whether coffee helps the respondent solve bugs in their code (yes, no, sometimes);\n", "* `Gender`: the respondent's gender;\n", "* `Country`: the respondent's country;\n", "* `AgeRange`: the respondent's age group." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Import the `pandas` library for loading and processing tabular data; it is conventionally imported under the short alias `pd`:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load the data from the CSV file (the `.csv` extension stands for *comma-separated values*):" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv(\"coffee_and_code.csv\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's look at the dataframe:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " CodingHours CoffeeCupsPerDay CoffeeTime CodingWithoutCoffee \\\n", "0 8 2 Before coding Yes \n", "1 3 2 Before coding Yes \n", "2 5 3 While coding No \n", "3 8 2 Before coding No \n", "4 10 3 While coding Sometimes \n", ".. ... ... ... ... \n", "95 6 2 Before coding Yes \n", "96 4 1 Before coding Sometimes \n", "97 10 3 Before coding Yes \n", "98 2 2 While coding Sometimes \n", "99 10 4 Before coding Sometimes \n", "\n", " CoffeeType CoffeeSolveBugs Gender Country AgeRange \n", "0 Caffè latte Sometimes Female Lebanon 18 to 29 \n", "1 Americano Yes Female Lebanon 30 to 39 \n", "2 Nescafe Yes Female Lebanon 18 to 29 \n", "3 Nescafe Yes Male Lebanon NaN \n", "4 Turkish No Male Lebanon 18 to 29 \n", ".. ... ... ... ... ... \n", "95 Nescafe Yes Male Lebanon 18 to 29 \n", "96 Nescafe Sometimes Female Lebanon 18 to 29 \n", "97 Cappuccino Yes Male Lebanon Under 18 \n", "98 Espresso (Short Black) Sometimes Female Lebanon 18 to 29 \n", "99 Double Espresso (Doppio) Sometimes Male Lebanon 18 to 29 \n", "\n", "[100 rows x 9 columns]" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can display the first few rows:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " CodingHours CoffeeCupsPerDay CoffeeTime CodingWithoutCoffee \\\n", "0 8 2 Before coding Yes \n", "1 3 2 Before coding Yes \n", "2 5 3 While coding No \n", "\n", " CoffeeType CoffeeSolveBugs Gender Country AgeRange \n", "0 Caffè latte Sometimes Female Lebanon 18 to 29 \n", "1 Americano Yes Female Lebanon 30 to 39 \n", "2 Nescafe Yes Female Lebanon 18 to 29 " ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or the last few:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " CodingHours CoffeeCupsPerDay CoffeeTime CodingWithoutCoffee \\\n", "97 10 3 Before coding Yes \n", "98 2 2 While coding Sometimes \n", "99 10 4 Before coding Sometimes \n", "\n", " CoffeeType CoffeeSolveBugs Gender Country AgeRange \n", "97 Cappuccino Yes Male Lebanon Under 18 \n", "98 Espresso (Short Black) Sometimes Female Lebanon 18 to 29 \n", "99 Double Espresso (Doppio) Sometimes Male Lebanon 18 to 29 " ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.tail(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's request technical information about the dataframe:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "RangeIndex: 100 entries, 0 to 99\n", "Data columns (total 9 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 CodingHours 100 non-null int64 \n", " 1 CoffeeCupsPerDay 100 non-null int64 \n", " 2 CoffeeTime 100 non-null object\n", " 3 CodingWithoutCoffee 100 non-null object\n", " 4 CoffeeType 99 non-null object\n", " 5 CoffeeSolveBugs 100 non-null object\n", " 6 Gender 100 non-null object\n", " 7 Country 100 non-null object\n", " 8 AgeRange 98 non-null object\n", "dtypes: int64(2), object(7)\n", "memory usage: 7.2+ KB\n" ] } ], "source": [ "df.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `.info()` method prints the number of rows in the dataframe (`100 entries`), the list of columns with the data type stored in each of them (`Dtype`), and the number of non-missing cells in each column (`Non-Null Count`). For example, `CoffeeType` has 99 and `AgeRange` has 98 non-missing values out of 100, so these columns contain gaps. Note that in the pandas version used here the text columns have the `object` dtype rather than a dedicated `string` type.\n", "\n", "Now let's get a statistical description of the dataframe, that is, its main descriptive statistics:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " CodingHours CoffeeCupsPerDay\n", "count 100.000000 100.000000\n", "mean 6.410000 2.890000\n", "std 2.644205 1.613673\n", "min 1.000000 1.000000\n", "25% 4.000000 2.000000\n", "50% 7.000000 2.500000\n", "75% 8.000000 4.000000\n", "max 10.000000 8.000000" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, the `.describe()` method returns statistics for the numeric columns only:\n", "\n", "* `count`: the number of non-missing cells in the column;\n", "* `mean`: the arithmetic mean;\n", "* `std`: the standard deviation of the column (the sample version, with `ddof=1`, i.e. dividing by n - 1);\n", "* `25%`, `50%`, `75%`: the lower quartile, the median, and the upper quartile;\n", "* `min` and `max`: the minimum and maximum values." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To describe the text columns, add the `include` argument and pass the corresponding type to it:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " CoffeeTime CodingWithoutCoffee CoffeeType CoffeeSolveBugs Gender \\\n", "count 100 100 99 100 100 \n", "unique 7 3 8 3 2 \n", "top While coding Sometimes Nescafe Sometimes Male \n", "freq 61 51 32 43 74 \n", "\n", " Country AgeRange \n", "count 100 98 \n", "unique 1 5 \n", "top Lebanon 18 to 29 \n", "freq 100 60 " ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe(include=\"object\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For text columns, the `.describe()` method returns the following statistics:\n", " \n", "* `count`: the number of non-missing cells in the column;\n", "* `unique`: the number of unique values in the column;\n", "* `top`: the mode, i.e. the most frequent value in the column;\n", "* `freq`: the frequency of the mode (how many times the value shown in `top` occurs)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To see the frequencies of all unique values, use the `.value_counts()` method. Select the coffee type column by its name (just like a dictionary key!) and get a frequency table for it (missing values are excluded by default, so the counts add up to 99): " ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Nescafe 32\n", "American Coffee 23\n", "Turkish 19\n", "Espresso (Short Black) 8\n", "Cappuccino 7\n", "Caffè latte 5\n", "Double Espresso (Doppio) 3\n", "Americano 2\n", "Name: CoffeeType, dtype: int64" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[\"CoffeeType\"].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's return to the numeric columns. 
Let's output descriptive statistics for the number of cups of coffee the programmers drink per day:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "count 100.000000\n", "mean 2.890000\n", "std 1.613673\n", "min 1.000000\n", "25% 2.000000\n", "50% 2.500000\n", "75% 4.000000\n", "max 8.000000\n", "Name: CoffeeCupsPerDay, dtype: float64" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[\"CoffeeCupsPerDay\"].describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or one at a time:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1\n", "8\n", "2.89\n" ] } ], "source": [ "print(df[\"CoffeeCupsPerDay\"].min())\n", "print(df[\"CoffeeCupsPerDay\"].max())\n", "print(df[\"CoffeeCupsPerDay\"].mean())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The quartiles and the median:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2.0\n", "2.5\n", "4.0\n" ] } ], "source": [ "print(df[\"CoffeeCupsPerDay\"].quantile(0.25))\n", "print(df[\"CoffeeCupsPerDay\"].quantile(0.5))\n", "print(df[\"CoffeeCupsPerDay\"].quantile(0.75))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's compute the ranks of the observations (by default, tied values get the average of their ranks, as with `rankdata()`):" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 33.5\n", "1 33.5\n", "2 62.0\n", "3 33.5\n", "4 62.0\n", " ... \n", "95 33.5\n", "96 8.5\n", "97 62.0\n", "98 33.5\n", "99 81.0\n", "Name: CoffeeCupsPerDay, Length: 100, dtype: float64" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[\"CoffeeCupsPerDay\"].rank()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To wrap up this introduction to describing data in Python, let's compute descriptive statistics by group. 
We will compare programmers who answered differently whether coffee helps them solve bugs (the `CoffeeSolveBugs` column).\n", "\n", "First, let's look at the means of the numeric columns:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " CodingHours CoffeeCupsPerDay\n", "CoffeeSolveBugs \n", "No 6.518519 2.407407\n", "Sometimes 6.046512 2.790698\n", "Yes 6.833333 3.466667" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# in .groupby(): the column to group the data by\n", "# then: the function to apply; numeric_only=True keeps only the numeric\n", "# columns (newer pandas versions raise an error on text columns without it)\n", "\n", "df.groupby(\"CoffeeSolveBugs\").mean(numeric_only=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's look at the minimum and maximum values, the means, and the medians:" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " CodingHours CoffeeCupsPerDay \\\n", " min max mean median min max \n", "CoffeeSolveBugs \n", "No 2 10 6.518519 7.0 1 6 \n", "Sometimes 2 10 6.046512 6.0 1 8 \n", "Yes 1 10 6.833333 7.5 1 8 \n", "\n", " \n", " mean median \n", "CoffeeSolveBugs \n", "No 2.407407 2.0 \n", "Sometimes 2.790698 3.0 \n", "Yes 3.466667 3.0 " ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# in .groupby(): the column to group the data by\n", "# in [[...]]: the numeric columns to summarize\n", "# in .agg(): a list of function names as strings\n", "\n", "df.groupby(\"CoffeeSolveBugs\")[[\"CodingHours\", \"CoffeeCupsPerDay\"]].agg([\"min\", \"max\", \"mean\", \"median\"])" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 2 }