{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Основы программирования в Python\n",
"\n",
"*Алла Тамбовцева, НИУ ВШЭ*\n",
"\n",
"## Работа с таблицами. Основы работы с датафреймами `pandas`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"В этой и последующих лекциях мы будем работать с таблицами. В социальных науках термины «база данных» и «таблица» часто используются как синонимы. Вообще, между этими терминами есть существенная разница, так как база данных – это набор таблиц, связанных друг с другом (при определённых условиях можно думать о ней как о файле Excel с разными листами). Давайте для простоты считать эти термины эквивалентными, основы работы с «настоящими» базами данных (SQL, PyMongo) мы обсуждать не будем. Кроме того, в качестве синонима слова таблица мы будем использовать слово датафрейм как кальку с термина data frame.\n",
"\n",
"Библиотека pandas используется для удобной и более эффективной работы с таблицами. Её функционал достаточно разнообразен, но давайте начнем с каких-то базовых функций и методов.\n",
"\n",
"Для начала импортируем саму библиотеку."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Здесь мы использовали такой приём: импортировали библиотеку и присвоили ей сокращённое имя, которое будет использоваться в пределах данного ipynb-файла. Чтобы не писать перед каждой библиотечной функцией длинное `pandas`. и не импортировать сразу все функции из этой библиотеки, мы сократили название до `pd`, и в дальнейшем Python будет понимать, что мы имеем в виду. Можно было бы сократить и до `p`, но тогда есть риск забыть про это и создать переменную с таким же именем, что в какой-то момент приведёт к проблемам. К тому же `pd` – распространенное сокращение."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Загрузка таблицы из файла и описание переменных\n",
"А теперь давайте загрузим какую-нибудь реальную базу данных из файла. Библиотека `pandas` достаточно гибкая, она позволяет загружать данные из файлов разных форматов. Пока остановимся на самом простом – файле csv, что расшифровывается как *comma separated values*. Столбцы в таком файле по умолчанию отделяются друг от друга запятой. Например, такая таблица"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
0
\n",
"
1
\n",
"
2
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
1
\n",
"
4
\n",
"
9
\n",
"
\n",
"
\n",
"
1
\n",
"
4
\n",
"
8
\n",
"
6
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" 0 1 2\n",
"0 1 4 9\n",
"1 4 8 6"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame([[1, 4, 9], [4, 8, 6]])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"сохраненная в формате csv без названий строк и столбцов будет выглядеть так:"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"1, 4, 9\n",
"4, 8, 6"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Но разделитель столбцов в таблице может быть и другим, например, точкой с запятой:"
]
},
{
"cell_type": "raw",
"metadata": {},
"source": [
"1; 4; 9\n",
"4; 8; 6"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"В таких случаях нам потребуется дополнительно выставлять параметр `sep = \";\"`, чтобы Python понимал, как правильно отделять один столбец от другого. Посмотрим на примере двух файлов: `test1.xlsx` и `test2.csv`."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
"
],
"text/plain": [
" A B C\n",
"0 2 2.5 1.8\n",
"1 3 4.2 0.0\n",
"2 4 4.3 1.6"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# загружаем test2.csv – тоже все хорошо\n",
"d2 = pd.read_csv(\"test2.csv\")\n",
"d2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Теперь поэкспериментируем: откроем файл `test2.csv` (можно в блокноте, а можно прямо в Jupyter, он открывает текстовые файлы) и изменим разделитель столбцов. Заменим запятые на точки с запятой:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```\n",
"A;B;C\n",
"2;2.5;1.8\n",
"3;4.2;0\n",
"4;4.3;1.6\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
A;B;C
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
2;2.5;1.8
\n",
"
\n",
"
\n",
"
1
\n",
"
3;4.2;0
\n",
"
\n",
"
\n",
"
2
\n",
"
4;4.3;1.6
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A;B;C\n",
"0 2;2.5;1.8\n",
"1 3;4.2;0\n",
"2 4;4.3;1.6"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# теперь при загрузке получим что-то не то\n",
"pd.read_csv(\"test2.csv\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Это из-за разделителя столбцов по умолчанию (запятая), укажем явно, что теперь это точка с запятой:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
A
\n",
"
B
\n",
"
C
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
2
\n",
"
2.5
\n",
"
1.8
\n",
"
\n",
"
\n",
"
1
\n",
"
3
\n",
"
4.2
\n",
"
0.0
\n",
"
\n",
"
\n",
"
2
\n",
"
4
\n",
"
4.3
\n",
"
1.6
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A B C\n",
"0 2 2.5 1.8\n",
"1 3 4.2 0.0\n",
"2 4 4.3 1.6"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# все хорошо\n",
"pd.read_csv(\"test2.csv\", sep = \";\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Если мы при этом еще изменим десятичный разделитель в дробях, нас тоже будут ожидать странности:\n",
"\n",
"```\n",
"A;B;C\n",
"2;2,5;1,8\n",
"3;4,2;0\n",
"4;4,3;1,6\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
A
\n",
"
B
\n",
"
C
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
2
\n",
"
2,5
\n",
"
1,8
\n",
"
\n",
"
\n",
"
1
\n",
"
3
\n",
"
4,2
\n",
"
0
\n",
"
\n",
"
\n",
"
2
\n",
"
4
\n",
"
4,3
\n",
"
1,6
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" A B C\n",
"0 2 2,5 1,8\n",
"1 3 4,2 0\n",
"2 4 4,3 1,6"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# визуально все так же\n",
"dd = pd.read_csv(\"test2.csv\", sep = \";\")\n",
"dd"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"RangeIndex: 3 entries, 0 to 2\n",
"Data columns (total 3 columns):\n",
"A 3 non-null int64\n",
"B 3 non-null object\n",
"C 3 non-null object\n",
"dtypes: int64(1), object(2)\n",
"memory usage: 152.0+ bytes\n"
]
}
],
"source": [
"dd.info() # тип object, не float"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"RangeIndex: 3 entries, 0 to 2\n",
"Data columns (total 3 columns):\n",
"A 3 non-null int64\n",
"B 3 non-null float64\n",
"C 3 non-null float64\n",
"dtypes: float64(2), int64(1)\n",
"memory usage: 152.0 bytes\n"
]
}
],
"source": [
"# изменим десятичный разделитель\n",
"dd = pd.read_csv(\"test2.csv\", sep = \";\", decimal = \",\")\n",
"dd.info()"
]
},
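{
"cell_type": "markdown",
"metadata": {},
"source": [
"The effect of `sep` and `decimal` can also be checked without editing any file. The following is a minimal sketch, not part of the original lecture: it assumes made-up file contents held in a string (`text`) and wraps them in `io.StringIO`, which `read_csv()` accepts in place of a file path.\n",
"\n",
"```python\n",
"import io\n",
"import pandas as pd\n",
"\n",
"# Hypothetical contents: semicolon-separated, comma as the decimal mark\n",
"text = '''A;B;C\n",
"2;2,5;1,8\n",
"3;4,2;0\n",
"4;4,3;1,6\n",
"'''\n",
"\n",
"# Only the column separator is set: B and C are not parsed as numbers\n",
"raw = pd.read_csv(io.StringIO(text), sep=';')\n",
"\n",
"# Separator and decimal mark are both set: B and C become float64\n",
"fixed = pd.read_csv(io.StringIO(text), sep=';', decimal=',')\n",
"\n",
"print(raw.dtypes)\n",
"print(fixed.dtypes)\n",
"```\n",
"\n",
"Comparing the two `dtypes` printouts is a quick way to confirm that numeric columns were actually read as numbers."
]
},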
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Пока загрузим файл по ссылке: пропишем путь к нему внутри функции `read_csv()` из библиотеки `pandas`. Плюс, сделаем так, чтобы первый столбец (с индексом 0) был использован в качестве названий строк (строки будут иметь не номер от 0 до N, а названия, которые мы захотим, важно только, чтобы они все были уникальными, без повторов):"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv(\"scores2.csv\", index_col = 0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Иногда такой подход может быть полезен. Представьте, что все переменные в таблице, кроме *id*, измерены в количественной шкале, и мы планируем реализовать на них статистический метод, который работает исключительно с числовыми данными. Если мы просто выкинем столбец с *id*, мы потеряем информацию о наблюдении, если мы его оставим, нам придется собирать в отдельную таблицу показатели, к которым будем применять метод, так как сохраненный в исходной таблице текст будет мешать. Если же мы назовем строки в соответствии с *id*, мы убьем сразу двух зайцев: избавимся от столбца с текстом и не потеряем информацию о наблюдении (код, имя респондента, название страны и прочее)."
]
},
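{
"cell_type": "markdown",
"metadata": {},
"source": [
"The point about moving *id* into the row names can be illustrated on a tiny table. This is a minimal sketch under stated assumptions: the ids (`s01`, `s02`, `s03`), the column names and the grades are invented for illustration and are not taken from `scores2.csv`.\n",
"\n",
"```python\n",
"import io\n",
"import pandas as pd\n",
"\n",
"# Hypothetical data: a text id column followed by numeric grades\n",
"text = '''id,math,stats\n",
"s01,8,9\n",
"s02,6,7\n",
"s03,10,8\n",
"'''\n",
"\n",
"# The first column becomes the row labels instead of a data column\n",
"scores = pd.read_csv(io.StringIO(text), index_col=0)\n",
"\n",
"print(scores.mean())      # column means, the ids do not get in the way\n",
"print(scores.loc['s02'])  # the id still identifies the observation\n",
"```\n",
"\n",
"All remaining columns are numeric, so numeric methods can be applied to the whole table, while each row can still be looked up by its id."
]
},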
{
"cell_type": "markdown",
"metadata": {},
"source": [
"В файле `scores2.csv` сохранены оценки студентов-политологов по ряду курсов. Оценки реальные, взяты из кумулятивного рейтинга, но имена студентов зашифрованы – вместо них задействованы номера студенческих билетов. Посмотрим на датафрейм:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
"
],
"text/plain": [
" econ eng polth mstat2 phist law\n",
"id \n",
"М141БПЛТЛ024 8 9 8 10 8.0 7\n",
"М141БПЛТЛ031 10 10 10 10 9.0 9\n",
"М141БПЛТЛ075 10 9 10 9 8.0 9\n",
"М141БПЛТЛ017 8 9 9 10 6.0 9\n",
"М141БПЛТЛ069 10 10 10 9 8.0 8\n",
"М141БПЛТЛ072 10 9 8 9 8.0 8\n",
"М141БПЛТЛ020 6 9 10 8 8.0 7\n",
"М141БПЛТЛ026 7 10 7 9 8.0 8\n",
"М141БПЛТЛ073 8 9 8 9 8.0 8\n",
"М141БПЛТЛ078 5 6 10 7 6.0 8\n",
"М141БПЛТЛ060 7 9 8 8 5.0 7\n",
"М141БПЛТЛ040 6 9 7 8 6.0 9\n",
"М141БПЛТЛ065 4 8 8 7 9.0 8\n",
"М141БПЛТЛ053 5 9 8 7 8.0 8\n",
"М141БПЛТЛ015 6 9 7 9 4.0 7\n",
"М141БПЛТЛ021 8 9 8 8 7.0 7\n",
"М141БПЛТЛ018 7 9 7 8 6.0 6\n",
"М141БПЛТЛ039 8 8 8 6 8.0 7\n",
"М141БПЛТЛ036 8 8 6 9 4.0 8\n",
"М141БПЛТЛ049 6 8 6 8 4.0 8\n",
"06114043 5 8 8 8 10.0 7\n",
"М141БПЛТЛ048 6 9 6 4 4.0 6\n",
"М141БПЛТЛ034 6 9 6 8 6.0 7\n",
"М141БПЛТЛ045 7 8 6 7 6.0 7\n",
"М141БПЛТЛ033 7 9 7 9 7.0 7\n",
"М141БПЛТЛ083 5 8 7 6 5.0 7\n",
"М141БПЛТЛ008 9 8 10 9 8.0 9\n",
"М141БПЛТЛ001 4 10 7 7 6.0 8\n",
"М141БПЛТЛ038 4 9 6 7 6.0 7\n",
"М141БПЛТЛ052 7 8 6 6 6.0 8\n",
"М141БПЛТЛ011 6 9 6 6 5.0 6\n",
"М141БПЛТЛ004 6 8 6 6 5.0 5\n",
"М141БПЛТЛ010 6 9 7 7 6.0 7\n",
"М141БПЛТЛ071 7 9 6 8 4.0 6\n",
"М141БПЛТЛ035 6 8 5 5 4.0 6\n",
"М141БПЛТЛ030 6 7 6 6 4.0 8\n",
"М141БПЛТЛ070 4 8 6 5 5.0 6\n",
"М141БПЛТЛ051 6 8 7 6 7.0 6\n",
"М141БПЛТЛ046 4 7 5 8 5.0 7\n",
"М141БПЛТЛ047 4 7 5 9 5.0 6\n",
"М141БПЛТЛ063 4 8 4 4 4.0 5\n",
"М141БПЛТЛ029 7 9 5 6 7.0 6\n",
"М141БПЛТЛ064 7 6 6 8 4.0 6\n",
"М141БПЛТЛ076 6 8 6 6 6.0 8\n",
"М141БПЛТЛ062 6 9 6 6 5.0 6\n",
"М141БПЛТЛ074 4 7 6 5 6.0 6\n",
"130232038 5 8 4 8 4.0 8\n",
"М141БПЛТЛ023 8 9 6 9 4.0 7\n",
"М141БПЛТЛ054 4 8 6 4 4.0 6\n",
"М141БПЛТЛ012 4 10 6 5 4.0 7\n",
"М141БПЛТЛ006 5 8 5 5 5.0 6\n",
"М141БПЛТЛ055 4 7 7 4 8.0 5\n",
"М141БПЛТЛ007 6 7 6 7 4.0 5\n",
"М141БПЛТЛ050 6 8 4 5 4.0 5\n",
"М141БПЛТЛ066 7 9 5 8 4.0 6\n",
"М141БПЛТЛ043 5 8 5 6 5.0 6\n",
"М141БПЛТЛ084 4 8 5 5 NaN 8\n",
"М141БПЛТЛ005 5 7 4 7 4.0 5\n",
"М141БПЛТЛ044 4 6 4 4 5.0 4\n",
"13051038 4 9 5 5 5.0 5"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[:, 'econ' : 'law']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Откуда в квадратных скобках взялось двоеточие? Дело в том, что метод `.loc` – более универсальный, и позволяет выбирать не только столбцы, но и строки. При этом нужные строки указываются на первом месте, а столбцы – на втором. Когда мы пишем `.loc[:, 1]`, мы сообщаем Python, что нам нужны все строки (`:`) и столбцы, начиная с `Econ` и до `Law` включительно.\n",
"\n",
"**Внимание:** выбор столбцов по названиям через двоеточие очень напоминает срезы (*slices*) в списках. Но есть важное отличие. В случае текстовых названий, оба конца среза (левый и правый) включаются. Если бы срезы по названиям были бы устроены как срезы по числовым индексам, код выше выдавал бы столбцы с `Econ` и до `Phist`, не включая колонку `Law`, так как в обычных срезах правый конец исключается."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Выбор столбцов по номеру**\n",
"\n",
"Иногда может возникнуть необходимость выбрать столбец по его порядковому номеру. Например, когда названий столбцов нет как таковых или когда названия слишком длинные, а переименовывать их нежелательно. Сделать это можно с помощью метода `.iloc`:"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"id\n",
"М141БПЛТЛ024 9\n",
"М141БПЛТЛ031 10\n",
"М141БПЛТЛ075 9\n",
"М141БПЛТЛ017 9\n",
"М141БПЛТЛ069 10\n",
"М141БПЛТЛ072 9\n",
"М141БПЛТЛ020 7\n",
"М141БПЛТЛ026 10\n",
"М141БПЛТЛ073 9\n",
"М141БПЛТЛ078 6\n",
"М141БПЛТЛ060 8\n",
"М141БПЛТЛ040 9\n",
"М141БПЛТЛ065 9\n",
"М141БПЛТЛ053 7\n",
"М141БПЛТЛ015 9\n",
"М141БПЛТЛ021 9\n",
"М141БПЛТЛ018 7\n",
"М141БПЛТЛ039 8\n",
"М141БПЛТЛ036 10\n",
"М141БПЛТЛ049 7\n",
"06114043 8\n",
"М141БПЛТЛ048 6\n",
"М141БПЛТЛ034 9\n",
"М141БПЛТЛ045 8\n",
"М141БПЛТЛ033 9\n",
"М141БПЛТЛ083 5\n",
"М141БПЛТЛ008 8\n",
"М141БПЛТЛ001 7\n",
"М141БПЛТЛ038 9\n",
"М141БПЛТЛ052 7\n",
"М141БПЛТЛ011 6\n",
"М141БПЛТЛ004 7\n",
"М141БПЛТЛ010 6\n",
"М141БПЛТЛ071 9\n",
"М141БПЛТЛ035 6\n",
"М141БПЛТЛ030 6\n",
"М141БПЛТЛ070 5\n",
"М141БПЛТЛ051 9\n",
"М141БПЛТЛ046 7\n",
"М141БПЛТЛ047 8\n",
"М141БПЛТЛ063 5\n",
"М141БПЛТЛ029 8\n",
"М141БПЛТЛ064 8\n",
"М141БПЛТЛ076 7\n",
"М141БПЛТЛ062 7\n",
"М141БПЛТЛ074 6\n",
"130232038 7\n",
"М141БПЛТЛ023 9\n",
"М141БПЛТЛ054 8\n",
"М141БПЛТЛ012 6\n",
"М141БПЛТЛ006 5\n",
"М141БПЛТЛ055 5\n",
"М141БПЛТЛ007 7\n",
"М141БПЛТЛ050 6\n",
"М141БПЛТЛ066 10\n",
"М141БПЛТЛ043 5\n",
"М141БПЛТЛ084 7\n",
"М141БПЛТЛ005 7\n",
"М141БПЛТЛ044 5\n",
"13051038 4\n",
"Name: mstat, dtype: int64"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.iloc[:, 1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Синтаксис кода с `.iloc` несильно отличается от синтаксиса `.loc`. В чем разница? Разница заключается в том, что метод `.loc` работает с текстовыми названиями, а метод `.iloc` – с числовыми индексами. Отсюда и префикс `i` в названии (*i* – индекс, *loc* – location). Если мы попытаемся в `.iloc` указать названия столбцов, Python выдаст ошибку:"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"ename": "TypeError",
"evalue": "cannot do slice indexing on with these indexers [mstat] of ",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mdf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0miloc\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'mstat'\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'econ'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py\u001b[0m in \u001b[0;36m__getitem__\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m 1470\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mKeyError\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mIndexError\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1471\u001b[0m \u001b[0;32mpass\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1472\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_getitem_tuple\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1473\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1474\u001b[0m \u001b[0;31m# we by definition only have the 0th axis\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py\u001b[0m in \u001b[0;36m_getitem_tuple\u001b[0;34m(self, tup)\u001b[0m\n\u001b[1;32m 2027\u001b[0m \u001b[0;32mcontinue\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2028\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2029\u001b[0;31m \u001b[0mretval\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgetattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mretval\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_getitem_axis\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0maxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2030\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2031\u001b[0m \u001b[0;31m# if the dim was reduced, then pass a lower-dim the next time\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py\u001b[0m in \u001b[0;36m_getitem_axis\u001b[0;34m(self, key, axis)\u001b[0m\n\u001b[1;32m 2078\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2079\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mslice\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2080\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_slice_axis\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0maxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2081\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2082\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlist\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py\u001b[0m in \u001b[0;36m_get_slice_axis\u001b[0;34m(self, slice_obj, axis)\u001b[0m\n\u001b[1;32m 2046\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mobj\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcopy\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdeep\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mFalse\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2047\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2048\u001b[0;31m \u001b[0mslice_obj\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_convert_slice_indexer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mslice_obj\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2049\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mslice_obj\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mslice\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2050\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_slice\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mslice_obj\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0maxis\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkind\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'iloc'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py\u001b[0m in \u001b[0;36m_convert_slice_indexer\u001b[0;34m(self, key, axis)\u001b[0m\n\u001b[1;32m 264\u001b[0m \u001b[0;31m# if we are accessing via lowered dim, use the last dim\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 265\u001b[0m \u001b[0max\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mobj\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_axis\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmin\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0maxis\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mndim\u001b[0m \u001b[0;34m-\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 266\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0max\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_convert_slice_indexer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkind\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 267\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 268\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_has_valid_setitem_indexer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mindexer\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py\u001b[0m in \u001b[0;36m_convert_slice_indexer\u001b[0;34m(self, key, kind)\u001b[0m\n\u001b[1;32m 1688\u001b[0m \u001b[0;31m# validate iloc\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1689\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mkind\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m'iloc'\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1690\u001b[0;31m return slice(self._validate_indexer('slice', key.start, kind),\n\u001b[0m\u001b[1;32m 1691\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_validate_indexer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'slice'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstop\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkind\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1692\u001b[0m self._validate_indexer('slice', key.step, kind))\n",
"\u001b[0;32m/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py\u001b[0m in \u001b[0;36m_validate_indexer\u001b[0;34m(self, form, key, kind)\u001b[0m\n\u001b[1;32m 4126\u001b[0m \u001b[0;32mpass\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4127\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mkind\u001b[0m \u001b[0;32min\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m'iloc'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'getitem'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 4128\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_invalid_indexer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mform\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 4129\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4130\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py\u001b[0m in \u001b[0;36m_invalid_indexer\u001b[0;34m(self, form, key)\u001b[0m\n\u001b[1;32m 1846\u001b[0m \"indexers [{key}] of {kind}\".format(\n\u001b[1;32m 1847\u001b[0m \u001b[0mform\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mform\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mklass\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1848\u001b[0;31m kind=type(key)))\n\u001b[0m\u001b[1;32m 1849\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1850\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mget_duplicates\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mTypeError\u001b[0m: cannot do slice indexing on with these indexers [mstat] of "
]
}
],
"source": [
"df.iloc[:, 'mstat': 'econ']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Python пишет, что невозможно взять срез по индексам, которые имеют строковый тип (`class 'str'`), так как в квадратных скобках ожидаются числовые (целочисленные) индексы.\n",
"\n",
"Если нужно выбрать несколько столбцов подряд, можно воспользоваться срезами:"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
mstat
\n",
"
soc
\n",
"
\n",
"
\n",
"
id
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
"
\n",
"
М141БПЛТЛ024
\n",
"
9
\n",
"
8
\n",
"
\n",
"
\n",
"
М141БПЛТЛ031
\n",
"
10
\n",
"
10
\n",
"
\n",
"
\n",
"
М141БПЛТЛ075
\n",
"
9
\n",
"
9
\n",
"
\n",
"
\n",
"
М141БПЛТЛ017
\n",
"
9
\n",
"
8
\n",
"
\n",
"
\n",
"
М141БПЛТЛ069
\n",
"
10
\n",
"
10
\n",
"
\n",
"
\n",
"
М141БПЛТЛ072
\n",
"
9
\n",
"
8
\n",
"
\n",
"
\n",
"
М141БПЛТЛ020
\n",
"
7
\n",
"
7
\n",
"
\n",
"
\n",
"
М141БПЛТЛ026
\n",
"
10
\n",
"
8
\n",
"
\n",
"
\n",
"
М141БПЛТЛ073
\n",
"
9
\n",
"
8
\n",
"
\n",
"
\n",
"
М141БПЛТЛ078
\n",
"
6
\n",
"
9
\n",
"
\n",
"
\n",
"
М141БПЛТЛ060
\n",
"
8
\n",
"
7
\n",
"
\n",
"
\n",
"
М141БПЛТЛ040
\n",
"
9
\n",
"
8
\n",
"
\n",
"
\n",
"
М141БПЛТЛ065
\n",
"
9
\n",
"
8
\n",
"
\n",
"
\n",
"
М141БПЛТЛ053
\n",
"
7
\n",
"
7
\n",
"
\n",
"
\n",
"
М141БПЛТЛ015
\n",
"
9
\n",
"
7
\n",
"
\n",
"
\n",
"
М141БПЛТЛ021
\n",
"
9
\n",
"
8
\n",
"
\n",
"
\n",
"
М141БПЛТЛ018
\n",
"
7
\n",
"
9
\n",
"
\n",
"
\n",
"
М141БПЛТЛ039
\n",
"
8
\n",
"
9
\n",
"
\n",
"
\n",
"
М141БПЛТЛ036
\n",
"
10
\n",
"
7
\n",
"
\n",
"
\n",
"
М141БПЛТЛ049
\n",
"
7
\n",
"
6
\n",
"
\n",
"
\n",
"
06114043
\n",
"
8
\n",
"
10
\n",
"
\n",
"
\n",
"
М141БПЛТЛ048
\n",
"
6
\n",
"
8
\n",
"
\n",
"
\n",
"
М141БПЛТЛ034
\n",
"
9
\n",
"
7
\n",
"
\n",
"
\n",
"
М141БПЛТЛ045
\n",
"
8
\n",
"
8
\n",
"
\n",
"
\n",
"
М141БПЛТЛ033
\n",
"
9
\n",
"
8
\n",
"
\n",
"
\n",
"
М141БПЛТЛ083
\n",
"
5
\n",
"
6
\n",
"
\n",
"
\n",
"
М141БПЛТЛ008
\n",
"
8
\n",
"
8
\n",
"
\n",
"
\n",
"
М141БПЛТЛ001
\n",
"
7
\n",
"
7
\n",
"
\n",
"
\n",
"
М141БПЛТЛ038
\n",
"
9
\n",
"
6
\n",
"
\n",
"
\n",
"
М141БПЛТЛ052
\n",
"
7
\n",
"
7
\n",
"
\n",
"
\n",
"
М141БПЛТЛ011
\n",
"
6
\n",
"
8
\n",
"
\n",
"
\n",
"
М141БПЛТЛ004
\n",
"
7
\n",
"
6
\n",
"
\n",
"
\n",
"
М141БПЛТЛ010
\n",
"
6
\n",
"
7
\n",
"
\n",
"
\n",
"
М141БПЛТЛ071
\n",
"
9
\n",
"
7
\n",
"
\n",
"
\n",
"
М141БПЛТЛ035
\n",
"
6
\n",
"
7
\n",
"
\n",
"
\n",
"
М141БПЛТЛ030
\n",
"
6
\n",
"
6
\n",
"
\n",
"
\n",
"
М141БПЛТЛ070
\n",
"
5
\n",
"
6
\n",
"
\n",
"
\n",
"
М141БПЛТЛ051
\n",
"
9
\n",
"
8
\n",
"
\n",
"
\n",
"
М141БПЛТЛ046
\n",
"
7
\n",
"
7
\n",
"
\n",
"
\n",
"
М141БПЛТЛ047
\n",
"
8
\n",
"
6
\n",
"
\n",
"
\n",
"
М141БПЛТЛ063
\n",
"
5
\n",
"
6
\n",
"
\n",
"
\n",
"
М141БПЛТЛ029
\n",
"
8
\n",
"
8
\n",
"
\n",
"
\n",
"
М141БПЛТЛ064
\n",
"
8
\n",
"
6
\n",
"
\n",
"
\n",
"
М141БПЛТЛ076
\n",
"
7
\n",
"
8
\n",
"
\n",
"
\n",
"
М141БПЛТЛ062
\n",
"
7
\n",
"
7
\n",
"
\n",
"
\n",
"
М141БПЛТЛ074
\n",
"
6
\n",
"
7
\n",
"
\n",
"
\n",
"
130232038
\n",
"
7
\n",
"
6
\n",
"
\n",
"
\n",
"
М141БПЛТЛ023
\n",
"
9
\n",
"
6
\n",
"
\n",
"
\n",
"
М141БПЛТЛ054
\n",
"
8
\n",
"
6
\n",
"
\n",
"
\n",
"
М141БПЛТЛ012
\n",
"
6
\n",
"
7
\n",
"
\n",
"
\n",
"
М141БПЛТЛ006
\n",
"
5
\n",
"
6
\n",
"
\n",
"
\n",
"
М141БПЛТЛ055
\n",
"
5
\n",
"
6
\n",
"
\n",
"
\n",
"
М141БПЛТЛ007
\n",
"
7
\n",
"
7
\n",
"
\n",
"
\n",
"
М141БПЛТЛ050
\n",
"
6
\n",
"
6
\n",
"
\n",
"
\n",
"
М141БПЛТЛ066
\n",
"
10
\n",
"
7
\n",
"
\n",
"
\n",
"
М141БПЛТЛ043
\n",
"
5
\n",
"
6
\n",
"
\n",
"
\n",
"
М141БПЛТЛ084
\n",
"
7
\n",
"
8
\n",
"
\n",
"
\n",
"
М141БПЛТЛ005
\n",
"
7
\n",
"
5
\n",
"
\n",
"
\n",
"
М141БПЛТЛ044
\n",
"
5
\n",
"
7
\n",
"
\n",
"
\n",
"
13051038
\n",
"
4
\n",
"
4
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" mstat soc\n",
"id \n",
"М141БПЛТЛ024 9 8\n",
"М141БПЛТЛ031 10 10\n",
"М141БПЛТЛ075 9 9\n",
"М141БПЛТЛ017 9 8\n",
"М141БПЛТЛ069 10 10\n",
"М141БПЛТЛ072 9 8\n",
"М141БПЛТЛ020 7 7\n",
"М141БПЛТЛ026 10 8\n",
"М141БПЛТЛ073 9 8\n",
"М141БПЛТЛ078 6 9\n",
"М141БПЛТЛ060 8 7\n",
"М141БПЛТЛ040 9 8\n",
"М141БПЛТЛ065 9 8\n",
"М141БПЛТЛ053 7 7\n",
"М141БПЛТЛ015 9 7\n",
"М141БПЛТЛ021 9 8\n",
"М141БПЛТЛ018 7 9\n",
"М141БПЛТЛ039 8 9\n",
"М141БПЛТЛ036 10 7\n",
"М141БПЛТЛ049 7 6\n",
"06114043 8 10\n",
"М141БПЛТЛ048 6 8\n",
"М141БПЛТЛ034 9 7\n",
"М141БПЛТЛ045 8 8\n",
"М141БПЛТЛ033 9 8\n",
"М141БПЛТЛ083 5 6\n",
"М141БПЛТЛ008 8 8\n",
"М141БПЛТЛ001 7 7\n",
"М141БПЛТЛ038 9 6\n",
"М141БПЛТЛ052 7 7\n",
"М141БПЛТЛ011 6 8\n",
"М141БПЛТЛ004 7 6\n",
"М141БПЛТЛ010 6 7\n",
"М141БПЛТЛ071 9 7\n",
"М141БПЛТЛ035 6 7\n",
"М141БПЛТЛ030 6 6\n",
"М141БПЛТЛ070 5 6\n",
"М141БПЛТЛ051 9 8\n",
"М141БПЛТЛ046 7 7\n",
"М141БПЛТЛ047 8 6\n",
"М141БПЛТЛ063 5 6\n",
"М141БПЛТЛ029 8 8\n",
"М141БПЛТЛ064 8 6\n",
"М141БПЛТЛ076 7 8\n",
"М141БПЛТЛ062 7 7\n",
"М141БПЛТЛ074 6 7\n",
"130232038 7 6\n",
"М141БПЛТЛ023 9 6\n",
"М141БПЛТЛ054 8 6\n",
"М141БПЛТЛ012 6 7\n",
"М141БПЛТЛ006 5 6\n",
"М141БПЛТЛ055 5 6\n",
"М141БПЛТЛ007 7 7\n",
"М141БПЛТЛ050 6 6\n",
"М141БПЛТЛ066 10 7\n",
"М141БПЛТЛ043 5 6\n",
"М141БПЛТЛ084 7 8\n",
"М141БПЛТЛ005 7 5\n",
"М141БПЛТЛ044 5 7\n",
"13051038 4 4"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.iloc[:, 1:3]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Числовые срезы в `pandas` уже ничем не отличаются от списковых срезов: правый конец среза не включается. В нашем случае мы выбрали только столбцы с индексами 1 и 2."
]
},
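{
"cell_type": "markdown",
"metadata": {},
"source": [
"The difference between label slices and positional slices can be seen side by side on a tiny table. This is a minimal sketch with a made-up dataframe `t` (its columns `a`, `b`, `c` are hypothetical and unrelated to the grades data).\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical table with three columns and two rows\n",
"t = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})\n",
"\n",
"by_label = t.loc[:, 'a':'b']   # label slice: both ends are included\n",
"by_position = t.iloc[:, 0:1]   # positional slice: the right end is excluded\n",
"\n",
"print(list(by_label.columns))\n",
"print(list(by_position.columns))\n",
"```\n",
"\n",
"The label slice returns two columns, while the positional slice with the matching start and stop positions returns only the first one."
]
},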
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Выбор строк по названию**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Выбор строки по названию происходит аналогичным образом, только здесь метод `.loc` уже обязателен."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"catps 8.0\n",
"mstat 10.0\n",
"soc 10.0\n",
"econ 10.0\n",
"eng 10.0\n",
"polth 10.0\n",
"mstat2 10.0\n",
"phist 9.0\n",
"law 9.0\n",
"phil 10.0\n",
"polsoc 10.0\n",
"ptheo 9.0\n",
"preg 8.0\n",
"compp 8.0\n",
"game 9.0\n",
"wpol 10.0\n",
"male 1.0\n",
"Name: М141БПЛТЛ031, dtype: float64"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc['М141БПЛТЛ031'] # строка для студента с номером М141БПЛТЛ031"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"При этом ставить запятую и двоеточие, показывая, что нам нужна одна строка и все столбцы, уже не нужно. Если нам нужно выбрать несколько строк подряд, то `.loc` не нужен:"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
catps
\n",
"
mstat
\n",
"
soc
\n",
"
econ
\n",
"
eng
\n",
"
polth
\n",
"
mstat2
\n",
"
phist
\n",
"
law
\n",
"
phil
\n",
"
polsoc
\n",
"
ptheo
\n",
"
preg
\n",
"
compp
\n",
"
game
\n",
"
wpol
\n",
"
male
\n",
"
\n",
"
\n",
"
id
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
"
\n",
"
М141БПЛТЛ024
\n",
"
7
\n",
"
9
\n",
"
8
\n",
"
8
\n",
"
9
\n",
"
8
\n",
"
10
\n",
"
8.0
\n",
"
7
\n",
"
9
\n",
"
9
\n",
"
7.0
\n",
"
8
\n",
"
8.0
\n",
"
6
\n",
"
10
\n",
"
1
\n",
"
\n",
"
\n",
"
М141БПЛТЛ031
\n",
"
8
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
9.0
\n",
"
9
\n",
"
10
\n",
"
10
\n",
"
9.0
\n",
"
8
\n",
"
8.0
\n",
"
9
\n",
"
10
\n",
"
1
\n",
"
\n",
"
\n",
"
М141БПЛТЛ075
\n",
"
9
\n",
"
9
\n",
"
9
\n",
"
10
\n",
"
9
\n",
"
10
\n",
"
9
\n",
"
8.0
\n",
"
9
\n",
"
10
\n",
"
9
\n",
"
9.0
\n",
"
8
\n",
"
8.0
\n",
"
7
\n",
"
9
\n",
"
1
\n",
"
\n",
"
\n",
"
М141БПЛТЛ017
\n",
"
9
\n",
"
9
\n",
"
8
\n",
"
8
\n",
"
9
\n",
"
9
\n",
"
10
\n",
"
6.0
\n",
"
9
\n",
"
9
\n",
"
9
\n",
"
8.0
\n",
"
8
\n",
"
8.0
\n",
"
8
\n",
"
9
\n",
"
0
\n",
"
\n",
"
\n",
"
М141БПЛТЛ069
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
9
\n",
"
8.0
\n",
"
8
\n",
"
10
\n",
"
9
\n",
"
7.0
\n",
"
6
\n",
"
5.0
\n",
"
8
\n",
"
10
\n",
"
1
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" catps mstat soc econ eng polth mstat2 phist law phil \\\n",
"id \n",
"М141БПЛТЛ024 7 9 8 8 9 8 10 8.0 7 9 \n",
"М141БПЛТЛ031 8 10 10 10 10 10 10 9.0 9 10 \n",
"М141БПЛТЛ075 9 9 9 10 9 10 9 8.0 9 10 \n",
"М141БПЛТЛ017 9 9 8 8 9 9 10 6.0 9 9 \n",
"М141БПЛТЛ069 10 10 10 10 10 10 9 8.0 8 10 \n",
"\n",
" polsoc ptheo preg compp game wpol male \n",
"id \n",
"М141БПЛТЛ024 9 7.0 8 8.0 6 10 1 \n",
"М141БПЛТЛ031 10 9.0 8 8.0 9 10 1 \n",
"М141БПЛТЛ075 9 9.0 8 8.0 7 9 1 \n",
"М141БПЛТЛ017 9 8.0 8 8.0 8 9 0 \n",
"М141БПЛТЛ069 9 7.0 6 5.0 8 10 1 "
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[\"М141БПЛТЛ024\":'М141БПЛТЛ069']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Как Python понимает, что мы просим вывести именно строки с такими названиями, а не столбцы? Потому что у нас стоят одинарные квадратные скобки, а не двойные, как в случае со столбцами. (Да, в `pandas` много всяких тонкостей, но чтобы хорошо в них разбираться, нужно просто попрактиковаться и привыкнуть).\n",
"\n",
"Обратите внимание: разницы между двойными и одинарными кавычками нет, строки можно вводить в любых кавычках, как в примере выше."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Выбор строк по номеру**\n",
"\n",
"В этом случае достаточно указать номер в квадратных скобках в `.iloc`:"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"catps 9.0\n",
"mstat 9.0\n",
"soc 9.0\n",
"econ 10.0\n",
"eng 9.0\n",
"polth 10.0\n",
"mstat2 9.0\n",
"phist 8.0\n",
"law 9.0\n",
"phil 10.0\n",
"polsoc 9.0\n",
"ptheo 9.0\n",
"preg 8.0\n",
"compp 8.0\n",
"game 7.0\n",
"wpol 9.0\n",
"male 1.0\n",
"Name: М141БПЛТЛ075, dtype: float64"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.iloc[2]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Если нужно несколько строк подряд, можно воспользоваться срезами:"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
catps
\n",
"
mstat
\n",
"
soc
\n",
"
econ
\n",
"
eng
\n",
"
polth
\n",
"
mstat2
\n",
"
phist
\n",
"
law
\n",
"
phil
\n",
"
polsoc
\n",
"
ptheo
\n",
"
preg
\n",
"
compp
\n",
"
game
\n",
"
wpol
\n",
"
male
\n",
"
\n",
"
\n",
"
id
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
"
\n",
"
М141БПЛТЛ031
\n",
"
8
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
9.0
\n",
"
9
\n",
"
10
\n",
"
10
\n",
"
9.0
\n",
"
8
\n",
"
8.0
\n",
"
9
\n",
"
10
\n",
"
1
\n",
"
\n",
"
\n",
"
М141БПЛТЛ075
\n",
"
9
\n",
"
9
\n",
"
9
\n",
"
10
\n",
"
9
\n",
"
10
\n",
"
9
\n",
"
8.0
\n",
"
9
\n",
"
10
\n",
"
9
\n",
"
9.0
\n",
"
8
\n",
"
8.0
\n",
"
7
\n",
"
9
\n",
"
1
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" catps mstat soc econ eng polth mstat2 phist law phil \\\n",
"id \n",
"М141БПЛТЛ031 8 10 10 10 10 10 10 9.0 9 10 \n",
"М141БПЛТЛ075 9 9 9 10 9 10 9 8.0 9 10 \n",
"\n",
" polsoc ptheo preg compp game wpol male \n",
"id \n",
"М141БПЛТЛ031 10 9.0 8 8.0 9 10 1 \n",
"М141БПЛТЛ075 9 9.0 8 8.0 7 9 1 "
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df[1:3] # и без iloc"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Если нужно несколько строк не подряд, можно просто перечислить внутри списка в `.iloc`:"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
catps
\n",
"
mstat
\n",
"
soc
\n",
"
econ
\n",
"
eng
\n",
"
polth
\n",
"
mstat2
\n",
"
phist
\n",
"
law
\n",
"
phil
\n",
"
polsoc
\n",
"
ptheo
\n",
"
preg
\n",
"
compp
\n",
"
game
\n",
"
wpol
\n",
"
male
\n",
"
\n",
"
\n",
"
id
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
"
\n",
"
М141БПЛТЛ031
\n",
"
8
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
9.0
\n",
"
9
\n",
"
10
\n",
"
10
\n",
"
9.0
\n",
"
8
\n",
"
8.0
\n",
"
9
\n",
"
10
\n",
"
1
\n",
"
\n",
"
\n",
"
М141БПЛТЛ075
\n",
"
9
\n",
"
9
\n",
"
9
\n",
"
10
\n",
"
9
\n",
"
10
\n",
"
9
\n",
"
8.0
\n",
"
9
\n",
"
10
\n",
"
9
\n",
"
9.0
\n",
"
8
\n",
"
8.0
\n",
"
7
\n",
"
9
\n",
"
1
\n",
"
\n",
"
\n",
"
М141БПЛТЛ072
\n",
"
10
\n",
"
9
\n",
"
8
\n",
"
10
\n",
"
9
\n",
"
8
\n",
"
9
\n",
"
8.0
\n",
"
8
\n",
"
10
\n",
"
9
\n",
"
7.0
\n",
"
8
\n",
"
8.0
\n",
"
9
\n",
"
9
\n",
"
0
\n",
"
\n",
"
\n",
"
М141БПЛТЛ060
\n",
"
7
\n",
"
8
\n",
"
7
\n",
"
7
\n",
"
9
\n",
"
8
\n",
"
8
\n",
"
5.0
\n",
"
7
\n",
"
5
\n",
"
8
\n",
"
5.0
\n",
"
7
\n",
"
8.0
\n",
"
7
\n",
"
9
\n",
"
1
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" catps mstat soc econ eng polth mstat2 phist law phil \\\n",
"id \n",
"М141БПЛТЛ031 8 10 10 10 10 10 10 9.0 9 10 \n",
"М141БПЛТЛ075 9 9 9 10 9 10 9 8.0 9 10 \n",
"М141БПЛТЛ072 10 9 8 10 9 8 9 8.0 8 10 \n",
"М141БПЛТЛ060 7 8 7 7 9 8 8 5.0 7 5 \n",
"\n",
" polsoc ptheo preg compp game wpol male \n",
"id \n",
"М141БПЛТЛ031 10 9.0 8 8.0 9 10 1 \n",
"М141БПЛТЛ075 9 9.0 8 8.0 7 9 1 \n",
"М141БПЛТЛ072 9 7.0 8 8.0 9 9 0 \n",
"М141БПЛТЛ060 8 5.0 7 8.0 7 9 1 "
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.iloc[[1, 2, 5, 10]]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Удаление пропущенных значений\n",
"\n",
"Мы уже видели, что в данном датафрейме есть строки (и столбцы) с пропущенными значениями (`NaN`). Из-за наличия этих таких значений содержащие их столбцы, даже если остальные значения являются целыми, имеют тип `float`. \n",
"\n",
"Удалим строки с пропущенными значениями из датафрейма совсем:"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [],
"source": [
"df = df.dropna()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Однако, если посмотрим на обновленный датасет, тип `float` никуда не исчез:"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
catps
\n",
"
mstat
\n",
"
soc
\n",
"
econ
\n",
"
eng
\n",
"
polth
\n",
"
mstat2
\n",
"
phist
\n",
"
law
\n",
"
phil
\n",
"
polsoc
\n",
"
ptheo
\n",
"
preg
\n",
"
compp
\n",
"
game
\n",
"
wpol
\n",
"
male
\n",
"
\n",
"
\n",
"
id
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
"
\n",
"
М141БПЛТЛ024
\n",
"
7
\n",
"
9
\n",
"
8
\n",
"
8
\n",
"
9
\n",
"
8
\n",
"
10
\n",
"
8.0
\n",
"
7
\n",
"
9
\n",
"
9
\n",
"
7.0
\n",
"
8
\n",
"
8.0
\n",
"
6
\n",
"
10
\n",
"
1
\n",
"
\n",
"
\n",
"
М141БПЛТЛ031
\n",
"
8
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
9.0
\n",
"
9
\n",
"
10
\n",
"
10
\n",
"
9.0
\n",
"
8
\n",
"
8.0
\n",
"
9
\n",
"
10
\n",
"
1
\n",
"
\n",
"
\n",
"
М141БПЛТЛ075
\n",
"
9
\n",
"
9
\n",
"
9
\n",
"
10
\n",
"
9
\n",
"
10
\n",
"
9
\n",
"
8.0
\n",
"
9
\n",
"
10
\n",
"
9
\n",
"
9.0
\n",
"
8
\n",
"
8.0
\n",
"
7
\n",
"
9
\n",
"
1
\n",
"
\n",
"
\n",
"
М141БПЛТЛ017
\n",
"
9
\n",
"
9
\n",
"
8
\n",
"
8
\n",
"
9
\n",
"
9
\n",
"
10
\n",
"
6.0
\n",
"
9
\n",
"
9
\n",
"
9
\n",
"
8.0
\n",
"
8
\n",
"
8.0
\n",
"
8
\n",
"
9
\n",
"
0
\n",
"
\n",
"
\n",
"
М141БПЛТЛ069
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
9
\n",
"
8.0
\n",
"
8
\n",
"
10
\n",
"
9
\n",
"
7.0
\n",
"
6
\n",
"
5.0
\n",
"
8
\n",
"
10
\n",
"
1
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" catps mstat soc econ eng polth mstat2 phist law phil \\\n",
"id \n",
"М141БПЛТЛ024 7 9 8 8 9 8 10 8.0 7 9 \n",
"М141БПЛТЛ031 8 10 10 10 10 10 10 9.0 9 10 \n",
"М141БПЛТЛ075 9 9 9 10 9 10 9 8.0 9 10 \n",
"М141БПЛТЛ017 9 9 8 8 9 9 10 6.0 9 9 \n",
"М141БПЛТЛ069 10 10 10 10 10 10 9 8.0 8 10 \n",
"\n",
" polsoc ptheo preg compp game wpol male \n",
"id \n",
"М141БПЛТЛ024 9 7.0 8 8.0 6 10 1 \n",
"М141БПЛТЛ031 10 9.0 8 8.0 9 10 1 \n",
"М141БПЛТЛ075 9 9.0 8 8.0 7 9 1 \n",
"М141БПЛТЛ017 9 8.0 8 8.0 8 9 0 \n",
"М141БПЛТЛ069 9 7.0 6 5.0 8 10 1 "
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Применим преобразование типов."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Преобразование типов столбцов"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Просто воспользуемся методом `.astype()`, который преобразует тип столбца в тот, который мы укажем (если это возможно, разумеется):"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy\n",
" \"\"\"Entry point for launching an IPython kernel.\n"
]
},
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
catps
\n",
"
mstat
\n",
"
soc
\n",
"
econ
\n",
"
eng
\n",
"
polth
\n",
"
mstat2
\n",
"
phist
\n",
"
law
\n",
"
phil
\n",
"
polsoc
\n",
"
ptheo
\n",
"
preg
\n",
"
compp
\n",
"
game
\n",
"
wpol
\n",
"
male
\n",
"
\n",
"
\n",
"
id
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
"
\n",
"
М141БПЛТЛ024
\n",
"
7
\n",
"
9
\n",
"
8
\n",
"
8
\n",
"
9
\n",
"
8
\n",
"
10
\n",
"
8
\n",
"
7
\n",
"
9
\n",
"
9
\n",
"
7.0
\n",
"
8
\n",
"
8.0
\n",
"
6
\n",
"
10
\n",
"
1
\n",
"
\n",
"
\n",
"
М141БПЛТЛ031
\n",
"
8
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
9
\n",
"
9
\n",
"
10
\n",
"
10
\n",
"
9.0
\n",
"
8
\n",
"
8.0
\n",
"
9
\n",
"
10
\n",
"
1
\n",
"
\n",
"
\n",
"
М141БПЛТЛ075
\n",
"
9
\n",
"
9
\n",
"
9
\n",
"
10
\n",
"
9
\n",
"
10
\n",
"
9
\n",
"
8
\n",
"
9
\n",
"
10
\n",
"
9
\n",
"
9.0
\n",
"
8
\n",
"
8.0
\n",
"
7
\n",
"
9
\n",
"
1
\n",
"
\n",
"
\n",
"
М141БПЛТЛ017
\n",
"
9
\n",
"
9
\n",
"
8
\n",
"
8
\n",
"
9
\n",
"
9
\n",
"
10
\n",
"
6
\n",
"
9
\n",
"
9
\n",
"
9
\n",
"
8.0
\n",
"
8
\n",
"
8.0
\n",
"
8
\n",
"
9
\n",
"
0
\n",
"
\n",
"
\n",
"
М141БПЛТЛ069
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
10
\n",
"
9
\n",
"
8
\n",
"
8
\n",
"
10
\n",
"
9
\n",
"
7.0
\n",
"
6
\n",
"
5.0
\n",
"
8
\n",
"
10
\n",
"
1
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" catps mstat soc econ eng polth mstat2 phist law phil \\\n",
"id \n",
"М141БПЛТЛ024 7 9 8 8 9 8 10 8 7 9 \n",
"М141БПЛТЛ031 8 10 10 10 10 10 10 9 9 10 \n",
"М141БПЛТЛ075 9 9 9 10 9 10 9 8 9 10 \n",
"М141БПЛТЛ017 9 9 8 8 9 9 10 6 9 9 \n",
"М141БПЛТЛ069 10 10 10 10 10 10 9 8 8 10 \n",
"\n",
" polsoc ptheo preg compp game wpol male \n",
"id \n",
"М141БПЛТЛ024 9 7.0 8 8.0 6 10 1 \n",
"М141БПЛТЛ031 10 9.0 8 8.0 9 10 1 \n",
"М141БПЛТЛ075 9 9.0 8 8.0 7 9 1 \n",
"М141БПЛТЛ017 9 8.0 8 8.0 8 9 0 \n",
"М141БПЛТЛ069 9 7.0 6 5.0 8 10 1 "
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['phist'] = df['phist'].astype(int)\n",
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Описательные статистики и базовые графики"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"В самом начале мы обсуждали описание базы данных с помощью метода `.describe()`. Помимо этого метода существует много методов, которые выводят отдельные статистики."
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"catps 7.0\n",
"mstat 7.5\n",
"soc 7.0\n",
"econ 6.0\n",
"eng 8.5\n",
"polth 6.0\n",
"mstat2 7.0\n",
"phist 6.0\n",
"law 7.0\n",
"phil 6.0\n",
"polsoc 8.0\n",
"ptheo 5.0\n",
"preg 7.0\n",
"compp 5.0\n",
"game 6.0\n",
"wpol 8.0\n",
"male 0.0\n",
"dtype: float64"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.median() # медиана (для всех показателей)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Можно запрашивать статистики по отдельным переменным (столбцам):"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"5.833333333333333"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.phist.mean() # среднее арифметическое Phist"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Или по наблюдениям (строкам):"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"6.235294117647059"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.loc[\"М141БПЛТЛ023\"].mean() # средний балл студента по всем курсам"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Давайте теперь построим какие-нибудь графики. Библиотеку pandas удобно использовать в сочетании с библиотекой для построения графиков `matplotlib`. Давайте её импортируем (эта библиотека должна была быть установлена на ваш компьютер вместе с Anaconda)."
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Построим гистограмму для оценок по теории игр."
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAD8CAYAAAB6paOMAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAADvJJREFUeJzt3XusZWV9xvHv44wGhmq9cLAKjgcaghpSBY/GSksrSKOioKa2mNpQax2TWkXbREdjiv80wcR6adpYR7zgDSN4rVoEsWqaWHQGaLiMBqsjjKAz1raoUAH99Y+9pg4jMnv2OXu9c877/SQne601e/b7rNkzPKx7qgpJUr/u0zqAJKkti0CSOmcRSFLnLAJJ6pxFIEmdswgkqXMWgSR1ziKQpM5ZBJLUufWtA0zj8MMPr8XFxdYxJGlV2bZt2/eramF/71sVRbC4uMjWrVtbx5CkVSXJt6d5n7uGJKlzFoEkdc4ikKTOWQSS1DmLQJI6ZxFIUucsAknqnEUgSZ2zCCSpc6viyuLlWNz86Sbj7jjv9Cbjalyt/n6Bf8e0ctwikKTOWQSS1DmLQJI6ZxFIUucsAknqnEUgSZ2zCCSpcxaBJHXOIpCkzlkEktQ5i0CSOmcRSFLnLAJJ6pxFIEmdm1sRJHlXkl1Jrt1r2YOTXJbkhuH1QfMaX5I0nXluEbwHeNo+yzYDl1fVscDlw7wkqaG5FUFVfQn4wT6LzwQuGKYvAJ49r/ElSdMZ+xjBQ6vqFoDh9YiRx5ck7eOgPVicZFOSrUm27t69u3UcSVqzxi6C7yV5GMDwuuuXvbGqtlTVUlUtLSwsjBZQknozdhF8Ejh7mD4b+MTI40uS9jHP00cvBL4MHJdkZ5IXAecBpyW5AThtmJckNbR+Xh9cVc//Jb906rzGlCQduIP2YLEkaRwWgSR1ziKQpM5ZBJLUOYtAkjpnEUhS5ywCSeqcRSBJnbMIJKlzFoEkdc4ikKTOWQSS1DmLQJI6N7e7j/ZucfOnm42947zTm40tafVxi0CSOmcRSFLnLAJJ6pxFIEmdswgkqXMWgSR1ziKQpM5ZBJLUOYtAkjpnEUhS5ywCSeqcRSBJnbMIJKlzFoEkdc4ikKTONSmCJK9Mcl2Sa5NcmOSQFjkkSQ2KIMmRwMuBpao6HlgHnDV2DknSRKtdQ+uBQ5OsBzYANzfKIUndG/1RlVX1nSRvBG4EbgcurapL931fkk3AJoCNGzeOG1Iz8fGc42r1593jn/Va12LX0IOAM4GjgYcDhyV5wb7vq6otVbVUVUsLCwtjx5SkbrTYNfRU4FtVtbuq7gQ+Cjy5QQ5JEm2K4EbgSUk2JAlwKrC9QQ5JEg2KoKquAC4GrgSuGTJsGTuHJGli9IPFAFV1LnBui7ElSXfnlcWS1DmLQJI6ZxFIUucsAknqnEUgSZ2zCCSpcxaBJHXOIpCkzlkEktQ5i0CSOmcRSFLnLAJJ6txURZDk+HkHkSS1Me0WwT8m+UqSP0/ywLkmkiSNaqoiqKrfAv4IeASwNckHk5w212SSpFFMfYygqm4AXge8Gvgd4O+SfC3Jc+cVTpI0f9MeI/iNJG9m8kjJU4BnVdWjh+k3zzGfJGnOpn1C2d8D7wBeW1W371lYVTcned1ckkmSRjFtETwDuL2qfgqQ5D7AIVV1W1W9b27pJElzN+0xgs8Bh+41v2FYJkla5aYtgkOq6kd7ZobpDfOJJEka07RF8OMkJ+6ZSfJ44PZ7eb8kaZWY9hjBK4CLktw8zD8M+MP5RJIkjWmqIqiqryZ5FHAcEOBrVXXnXJNJkkYx7RYBwBOAxeH3nJCEqnrvXFJJkkYzVREkeR/w68DVwE+HxQVYBJK0yk27RbAEPKaqap5hJEnjm/asoWuBX5tnEElSG9NuERwOXJ/kK8BP9iysqjNmGXS4lfX5wPFMdjH9aVV9eZbPkiQtz7RF8PoVHvetwCVV9ftJ7o
cXp0lSM9OePvrFJI8Ejq2qzyXZAKybZcAkDwBOBv5k+Ow7gDtm+SxJ0vJNexvqFwMXA28fFh0JfHzGMY8BdgPvTnJVkvOTHDbjZ0mSlmnag8UvBU4CboX/f0jNETOOuR44EXhbVZ0A/BjYvO+bkmxKsjXJ1t27d884lCRpf6Ytgp8Mu3AASLKeyUHeWewEdlbVFcP8xUyK4W6qaktVLVXV0sLCwoxDSZL2Z9oi+GKS1wKHDs8qvgj4p1kGrKrvAjclOW5YdCpw/SyfJUlavmnPGtoMvAi4BngJ8Bkmp3/O6mXAB4Yzhr4JvHAZnyVJWoZpzxr6GZNHVb5jJQatqquZXK0sSWps2nsNfYt7OCZQVceseCJJ0qgO5F5DexwCPA948MrHkSSNbaqDxVX1n3v9fKeq3gKcMudskqQRTLtraO/TO+/DZAvh/nNJJEka1bS7hv52r+m7gB3AH6x4GknS6KY9a+gp8w4iSWpj2l1Df3lvv15Vb1qZOJKksR3IWUNPAD45zD8L+BJw0zxCSZLGcyAPpjmxqn4IkOT1wEVV9WfzCiZJGse09xrayN2fGXAHsLjiaSRJo5t2i+B9wFeSfIzJFcbPAd47t1SSpNFMe9bQ3yT5Z+C3h0UvrKqr5hdLkjSWaXcNweS5wrdW1VuBnUmOnlMmSdKIpn1U5bnAq4HXDIvuC7x/XqEkSeOZdovgOcAZTB4rSVXdjLeYkKQ1YdoiuKOqiuFW1D5sXpLWjmmL4MNJ3g48MMmLgc+xQg+pkSS1Ne1ZQ28cnlV8K3Ac8NdVddlck0mSRrHfIkiyDvhsVT0V8D/+krTG7HfXUFX9FLgtya+OkEeSNLJpryz+X+CaJJcxnDkEUFUvn0sqSdJopi2CTw8/kqQ15l6LIMnGqrqxqi4YK5AkaVz7O0bw8T0TST4y5yySpAb2VwTZa/qYeQaRJLWxvyKoXzItSVoj9new+LFJbmWyZXDoMM0wX1X1gLmmkyTN3b0WQVWtGyuIJKmNA3kegSRpDWpWBEnWJbkqyadaZZAktd0iOAfY3nB8SRKNiiDJUcDpwPktxpck/dy0t5hYaW8BXsW9POUsySZgE8DGjRtHiiXpYLa4uc2dbnacd3qTcccy+hZBkmcCu6pq2729r6q2VNVSVS0tLCyMlE6S+tNi19BJwBlJdgAfAk5J8v4GOSRJNCiCqnpNVR1VVYvAWcDnq+oFY+eQJE14HYEkda7VwWIAquoLwBdaZpCk3rlFIEmdswgkqXMWgSR1ziKQpM5ZBJLUOYtAkjpnEUhS5ywCSeqcRSBJnbMIJKlzFoEkdc4ikKTOWQSS1Lmmdx+VpNVgrT8i0y0CSeqcRSBJnbMIJKlzFoEkdc4ikKTOWQSS1DmLQJI6ZxFIUucsAknqnEUgSZ2zCCSpcxaBJHXOIpCkzlkEktS50YsgySOS/EuS7UmuS3LO2BkkST/X4nkEdwF/VVVXJrk/sC3JZVV1fYMsktS90bcIquqWqrpymP4hsB04cuwckqSJpscIkiwCJwBXtMwhST1rVgRJfgX4CPCKqrr1Hn59U5KtSbbu3r17/ICS1IkmRZDkvkxK4ANV9dF7ek9VbamqpapaWlhYGDegJHWkxVlDAd4JbK+qN409viTp7lpsEZwE/DFwSpKrh59nNMghSaLB6aNV9a9Axh5XknTPvLJYkjpnEUhS5ywCSeqcRSBJnbMIJKlzFoEkdc4ikKTOWQSS1DmLQJI6ZxFIUucsAknqnEUgSZ2zCCSpcxaBJHXOIpCkzlkEktQ5i0CSOmcRSFLnLAJJ6pxFIEmdswgkqXMWgSR1ziKQpM5ZBJLUOYtAkjpnEUhS5ywCSeqcRSBJnbMIJKlzTYogydOSfD3JN5JsbpFBkjQxehEkWQf8A/B04DHA85M8ZuwckqSJFlsETwS+UVXfrKo7gA8BZzbIIUmiTREcCdy01/zOYZkkqYH1DcbMPSyrX3hTsgnYNMz+KMnXZxzvcOD7M/7eg8
1U65I3jJBkeVb8O2m4zv79Ojitie8lb1j2ejxymje1KIKdwCP2mj8KuHnfN1XVFmDLcgdLsrWqlpb7OQeDtbIua2U9wHU5WK2VdRlrPVrsGvoqcGySo5PcDzgL+GSDHJIkGmwRVNVdSf4C+CywDnhXVV03dg5J0kSLXUNU1WeAz4w03LJ3Lx1E1sq6rJX1ANflYLVW1mWU9UjVLxynlSR1xFtMSFLn1nQRJFmX5Kokn2qdZTmS7EhyTZKrk2xtnWc5kjwwycVJvpZke5LfbJ1pFkmOG76PPT+3JnlF61yzSPLKJNcluTbJhUkOaZ1pVknOGdbjutX2fSR5V5JdSa7da9mDk1yW5Ibh9UHzGHtNFwFwDrC9dYgV8pSqetwaOCXurcAlVfUo4LGs0u+nqr4+fB+PAx4P3AZ8rHGsA5bkSODlwFJVHc/kBI6z2qaaTZLjgRczuXvBY4FnJjm2baoD8h7gafss2wxcXlXHApcP8ytuzRZBkqOA04HzW2fRRJIHACcD7wSoqjuq6r/bploRpwL/UVXfbh1kRuuBQ5OsBzZwD9f1rBKPBv6tqm6rqruALwLPaZxpalX1JeAH+yw+E7hgmL4AePY8xl6zRQC8BXgV8LPWQVZAAZcm2TZccb1aHQPsBt497LI7P8lhrUOtgLOAC1uHmEVVfQd4I3AjcAvwP1V1adtUM7sWODnJQ5JsAJ7B3S9eXY0eWlW3AAyvR8xjkDVZBEmeCeyqqm2ts6yQk6rqRCZ3bH1pkpNbB5rReuBE4G1VdQLwY+a0qTuW4aLIM4CLWmeZxbDP+UzgaODhwGFJXtA21WyqajvwBuAy4BLg34G7moZaJdZkEQAnAWck2cHk7qanJHl/20izq6qbh9ddTPZDP7FtopntBHZW1RXD/MVMimE1ezpwZVV9r3WQGT0V+FZV7a6qO4GPAk9unGlmVfXOqjqxqk5mspvlhtaZlul7SR4GMLzumscga7IIquo1VXVUVS0y2Wz/fFWtyv/LSXJYkvvvmQZ+j8km8KpTVd8Fbkpy3LDoVOD6hpFWwvNZpbuFBjcCT0qyIUmYfCer8gA+QJIjhteNwHNZ3d8NTG6/c/YwfTbwiXkM0uTKYh2QhwIfm/wbZT3wwaq6pG2kZXkZ8IFhl8o3gRc2zjOzYT/0acBLWmeZVVVdkeRi4Eomu1GuYnVflfuRJA8B7gReWlX/1TrQtJJcCPwucHiSncC5wHnAh5O8iElpP28uY3tlsST1bU3uGpIkTc8ikKTOWQSS1DmLQJI6ZxFIUucsAknqnEUgSZ2zCCSpc/8HBCSFFd5RE6UAAAAASUVORK5CYII=\n",
"text/plain": [
"
"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df[\"game\"].plot.hist() # histogram"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Что показывает этот график? Он показывает, сколько студентов получили те или иные оценки. По гистограмме видно, что больше всего по этому курсу оценок 4 и 7.\n",
"\n",
"Можно поменять цвет гистограммы:"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAD8CAYAAAB6paOMAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAADuFJREFUeJzt3XusZWV9xvHv44wGhmpROVgFx4GGoIZUwaOx0tIK0ngFNbXF1IZa65jUKtomisYU/2liE+ulaWMd8YI3jOC11iKIVdPEojNAw2U0WEUYQWesbVGhAvrrH3tNHcaR2bPP2euds9/vJznZe61Zs99nzZ7hYd1TVUiS+nWf1gEkSW1ZBJLUOYtAkjpnEUhS5ywCSeqcRSBJnbMIJKlzFoEkdc4ikKTOrW8dYBpHHHFEbdq0qXUMSVpTtm3b9r2qWtrfcmuiCDZt2sTWrVtbx5CkNSXJt6ZZzl1DktQ5i0CSOmcRSFLnLAJJ6pxFIEmdswgkqXMWgSR1ziKQpM5ZBJLUuTVxZfGKJG3GrWozrsbV6u8X+HdMq8YtAknqnEUgSZ2zCCSpcxaBJHXOIpCkzlkEktQ5i0CSOmcRSFLnLAJJ6pxFIEmdswgkqXMWgSR1ziKQpM5ZBJLUubkVQZJ3JdmZ5No95j0oyWVJbhheHziv8SVJ05nnFsF7gKfuNe9c4PKqOg64fJiWJDU0tyKoqi8C399r9pnABcP7C4Bnz2t8SdJ0xj5G8JCquhVgeD1y5PElSXs5aA8WJ9mcZGuSrbt27WodR5IW1thF8N0kDwUYXnf+ogWraktVLVfV8tLS0mgBJak3YxfBJ4Gzh/dnA58YeXxJ0l7mefrohcCXgOOT7EjyIuANwOlJbgBOH6YlSQ2tn9cHV9Xzf8EvnTavMSVJB+6gPVgsSRqHRSBJnbMIJKlzFoEkdc4ikKTOWQSS1DmLQJI6ZxFIUucsAknqnEUgSZ2zCCSpcxaBJHXOIpCkzs3t7qPdS9qNXdVubElrjlsEktQ5i0CSOmcRSFLnLAJJ6pxFIEmdswgkqXMWgSR1ziKQpM5ZBJLUOYtAkjpnEUhS5ywCSeqcRSBJnbMIJKlzFoEkda5JESR5ZZLrklyb5MIkh7TIIUlqUARJjgJeDixX1QnAOuCssXNIkiZa7RpaDxyaZD2wAbilUQ5J6t7oj6qsqm8neSNwE3AHcGlVXbr3ckk2A5sBNm7cOG5IzcbHc46r1Z93j3/WC67FrqEHAmcCxwAPAw5L8oK9l6uqLVW1XFXLS0tLY8eUpG602DX0FOCbVbWrqu4CPgo8qUEOSRJtiuAm4IlJNiQJcBqwvUEOSRINiqCqrgAuBq4ErhkybBk7hyRpYvSDxQBVdR5wXouxJUn35JXFktQ5i0CSOmcRSFLnLAJJ6pxFIEmdswgkqXMWgSR1ziKQpM5ZBJLUOYtAkjpnEUhS5ywCSercVEWQ5IR5B5EktTHtFsE/JPlykj9NcvhcE0mSRjVVEVTVbwB/ADwc2Jrkg0lOn2sySdIopj5GUFU3AK8DXg38FvC3Sb6a5LnzCidJmr9pjxH8WpI3M3mk5KnAs6rqUcP7N88xnyRpzqZ9QtnfAe8AXltVd+yeWVW3JHndXJJJkkYxbRE8Hbijqn4CkOQ+wCFVdXtVvW9u6SRJczftMYLPAofuMb1hmCdJWuOmLYJDquqHuyeG9xvmE0mSNKZpi+BHSU7aPZHkccAd97K8JGmNmPYYwSuAi5LcMkw/FPj9+USSJI1pqiKoqq8keSRwPBDgq1V111yTSZJGMe0WAcDjgU3D7zkxCVX13rmkkiSNZqoiSPI+4FeBq4GfDLMLsAgkaY2bdotgGXh0VdU8w0iSxjftWUPXAr8yzyCSpDam3SI4Arg+yZeBH++eWVVnzDLocCvr84ETmOxi+uOq+tIsnyVJWplpi+D1qzzuW4FLqup3k9wPL06TpGamPX30C0
keARxXVZ9NsgFYN8uASR4AnAL80fDZdwJ3zvJZkqSVm/Y21C8GLgbePsw6Cvj4jGMeC+wC3p3kqiTnJzlsxs+SJK3QtAeLXwqcDNwG//+QmiNnHHM9cBLwtqo6EfgRcO7eCyXZnGRrkq27du2acShJ0v5MWwQ/HnbhAJBkPZODvLPYAeyoqiuG6YuZFMM9VNWWqlququWlpaUZh5Ik7c+0RfCFJK8FDh2eVXwR8I+zDFhV3wFuTnL8MOs04PpZPkuStHLTnjV0LvAi4BrgJcCnmZz+OauXAR8Yzhj6BvDCFXyWJGkFpj1r6KdMHlX5jtUYtKquZnK1siSpsWnvNfRN9nFMoKqOXfVEkqRRHci9hnY7BHge8KDVjyNJGttUB4ur6j/3+Pl2Vb0FOHXO2SRJI5h219Cep3feh8kWwv3nkkiSNKppdw39zR7v7wZuBH5v1dNIkkY37VlDT553EElSG9PuGvrze/v1qnrT6sSRJI3tQM4aejzwyWH6WcAXgZvnEUqSNJ4DeTDNSVX1A4Akrwcuqqo/mVcwSdI4pr3X0Ebu+cyAO4FNq55GkjS6abcI3gd8OcnHmFxh/BzgvXNLJUkazbRnDf1Vkn8GfnOY9cKqump+sSRJY5l21xBMnit8W1W9FdiR5Jg5ZZIkjWjaR1WeB7waeM0w677A++cVSpI0nmm3CJ4DnMHksZJU1S14iwlJWgjTFsGdVVUMt6L2YfOStDimLYIPJ3k7cHiSFwOfZZUeUiNJamvas4beODyr+DbgeOAvq+qyuSaTJI1iv0WQZB3wmap6CuB//CVpwex311BV/QS4Pckvj5BHkjSyaa8s/l/gmiSXMZw5BFBVL59LKknSaKYtgn8afiRJC+ZeiyDJxqq6qaouGCuQJGlc+ztG8PHdb5J8ZM5ZJEkN7K8Issf7Y+cZRJLUxv6KoH7Be0nSgtjfweLHJLmNyZbBocN7humqqgfMNZ0kae7utQiqat1YQSRJbRzI8wgkSQuoWREkWZfkqiSfapVBktR2i+AcYHvD8SVJNCqCJEcDzwDObzG+JOlnpr3FxGp7C/Aq7uUpZ0k2A5sBNm7cOFIsSQe1ZP/LzEMt9tnzo28RJHkmsLOqtt3bclW1paqWq2p5aWlppHSS1J8Wu4ZOBs5IciPwIeDUJO9vkEOSRIMiqKrXVNXRVbUJOAv4XFW9YOwckqQJryOQpM61OlgMQFV9Hvh8ywyS1Du3CCSpcxaBJHXOIpCkzlkEktQ5i0CSOmcRSFLnLAJJ6pxFIEmdswgkqXMWgSR1ziKQpM5ZBJLUOYtAkjrX9O6jkrQmLPgjMt0ikKTOWQSS1DmLQJI6ZxFIUucsAknqnEUgSZ2zCCSpcxaBJHXOIpCkzlkEktQ5i0CSOmcRSFLnLAJJ6pxFIEmdG70Ikjw8yb8k2Z7kuiTnjJ1BkvQzLZ5HcDfwF1V1ZZL7A9uSXFZV1zfIIkndG32LoKpuraorh/c/ALYDR42dQ5I00fQYQZJNwInAFS1zSFLPmhVBkl8CPgK8oqpu28evb06yNcnWXbt2jR9QkjrRpAiS3JdJCXygqj66r2WqaktVLVfV8tLS0rgBJakjLc4aCvBOYHtVvWns8SVJ99Rii+Bk4A+BU5NcPfw8vUEOSRINTh+tqn8FMva4kqR988piSeqcRSBJnbMIJKlzFoEkdc4ikKTOWQSS1DmLQJI6ZxFIUucsAknqnEUgSZ2zCCSpcxaBJHXOIpCkzlkEktQ5i0CSOmcRSFLnLAJJ6pxFIEmdswgkqXMWgSR1ziKQpM5ZBJLUOYtAkjpnEUhS5ywCSeqcRSBJnbMIJKlzFoEkdc4ikKTONSmCJE9N8rUkX09ybosMkqSJ0YsgyTrg74GnAY8Gnp/k0WPnkCRNtNgieALw9ar6RlXdCXwIOLNBDkkSbYrgKODmPaZ3DPMkSQ2sbzBm9jGvfm6hZDOweZj8YZKvzTjeEcD3Zvy9B5vp1iX7+iM+qKz+d9Junf
37dXBajO8lWel6PGKahVoUwQ7g4XtMHw3csvdCVbUF2LLSwZJsrarllX7OwWBR1mVR1gNcl4PVoqzLWOvRYtfQV4DjkhyT5H7AWcAnG+SQJNFgi6Cq7k7yZ8BngHXAu6rqurFzSJImWuwaoqo+DXx6pOFWvHvpILIo67Io6wGuy8FqUdZllPVI1c8dp5UkdcRbTEhS5xa6CJKsS3JVkk+1zrISSW5Mck2Sq5NsbZ1nJZIcnuTiJF9Nsj3Jr7fONIskxw/fx+6f25K8onWuWSR5ZZLrklyb5MIkh7TONKsk5wzrcd1a+z6SvCvJziTX7jHvQUkuS3LD8PrAeYy90EUAnANsbx1ilTy5qh67AKfEvRW4pKoeCTyGNfr9VNXXhu/jscDjgNuBjzWOdcCSHAW8HFiuqhOYnMBxVttUs0lyAvBiJncveAzwzCTHtU11QN4DPHWveecCl1fVccDlw/SqW9giSHI08Azg/NZZNJHkAcApwDsBqurOqvrvtqlWxWnAf1TVt1oHmdF64NAk64EN7OO6njXiUcC/VdXtVXU38AXgOY0zTa2qvgh8f6/ZZwIXDO8vAJ49j7EXtgiAtwCvAn7aOsgqKODSJNuGK67XqmOBXcC7h1125yc5rHWoVXAWcGHrELOoqm8DbwRuAm4F/qeqLm2bambXAqckeXCSDcDTuefFq2vRQ6rqVoDh9ch5DLKQRZDkmcDOqtrWOssqObmqTmJyx9aXJjmldaAZrQdOAt5WVScCP2JOm7pjGS6KPAO4qHWWWQz7nM8EjgEeBhyW5AVtU82mqrYDfw1cBlwC/Dtwd9NQa8RCFgFwMnBGkhuZ3N301CTvbxtpdlV1y/C6k8l+6Ce0TTSzHcCOqrpimL6YSTGsZU8Drqyq77YOMqOnAN+sql1VdRfwUeBJjTPNrKreWVUnVdUpTHaz3NA60wp9N8lDAYbXnfMYZCGLoKpeU1VHV9UmJpvtn6uqNfl/OUkOS3L/3e+B32GyCbzmVNV3gJuTHD/MOg24vmGk1fB81uhuocFNwBOTbEgSJt/JmjyAD5DkyOF1I/Bc1vZ3A5Pb75w9vD8b+MQ8BmlyZbEOyEOAj03+jbIe+GBVXdI20oq8DPjAsEvlG8ALG+eZ2bAf+nTgJa2zzKqqrkhyMXAlk90oV7G2r8r9SJIHA3cBL62q/2odaFpJLgR+GzgiyQ7gPOANwIeTvIhJaT9vLmN7ZbEk9W0hdw1JkqZnEUhS5ywCSeqcRSBJnbMIJKlzFoEkdc4ikKTOWQSS1Ln/A972e/WEttqAAAAAAElFTkSuQmCC\n",
"text/plain": [
"
"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"df[\"game\"].plot.box() # boxplot"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Этот график визуализирует основные описательные статистики переменной и отображает форму её распределения. Нижняя граница яшика – это нижний квартиль, верхняя – верхний квартиль, линяя внутри ящика – медиана. Усы графика могут откладываться по-разному: если в переменной встречаются нетипичные значения (выбросы), то границы усов совпадают с границами типичных значений, если нетипичных значений нет, границы усов соответствуют минимальному и максимальному значению переменной. Подробнее про ящик с усами см. [здесь](https://ru.wikipedia.org/wiki/%D0%AF%D1%89%D0%B8%D0%BA_%D1%81_%D1%83%D1%81%D0%B0%D0%BC%D0%B8)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Фильтрация строк по условиям"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Часто в исследованиях нас не интересует выбор отдельных строк по названию или номеру, мы хотим отбирать строки в таблице согласно некорому условию (условиям). Другими словами, проводить фильтрацию наблюдений. Для этого интересующее нас условие необходимо указать в квадратных скобках. Выберем из датафрейма `df` строки, которые соответствуют студентам с оценкой по экономике выше 6."
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"