{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true,
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"%matplotlib inline\n",
"\n",
"plt.style.use('ggplot')\n",
"plt.rcParams['figure.figsize'] = (12,5)\n",
"\n",
"# Font that can render Cyrillic text on plots\n",
"font = {'family': 'Verdana',\n",
" 'weight': 'normal'}\n",
"plt.rc('font', **font)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Data Analysis Minor, Group ИАД-2\n",
"## 10/04/2017 Feature Selection and Dimensionality Reduction"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"# The Curse of Dimensionality"
]
},
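{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick numerical illustration. This is a sketch, not part of the original lecture, and it assumes points drawn uniformly from the unit hypercube $[0, 1]^d$. As the dimension $d$ grows, the distances from a query point to its nearest and farthest neighbors become almost equal, so distance-based methods lose their ability to tell points apart. `relative_contrast` is a hypothetical helper written only for this example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Assumption: points are uniform in the unit hypercube [0, 1]^d\n",
"rng = np.random.default_rng(0)\n",
"\n",
"def relative_contrast(d, n_points=500):\n",
"    # (max - min) / min of the distances from one random query point to n_points points\n",
"    X = rng.random((n_points, d))\n",
"    q = rng.random(d)\n",
"    dist = np.linalg.norm(X - q, axis=1)\n",
"    return (dist.max() - dist.min()) / dist.min()\n",
"\n",
"# The contrast shrinks as d grows: all points look roughly equally far away\n",
"for d in [2, 10, 100, 1000]:\n",
"    print(d, round(relative_contrast(d), 3))"
]
},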
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
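"## Approaches to dimensionality reduction\n",
"\n",
"Dimensionality can be reduced in two ways. **Feature selection** methods keep a subset of the original features. **Feature reduction** (dimensionality reduction) methods build new, lower-dimensional features from the original ones."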
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Feature Selection\n",
"The methods fall into three groups:\n",
"* Filter methods\n",
"    * Each feature is evaluated independently of the others\n",
"    * The individual \"contribution\" of a feature to the target variable is measured\n",
"    * Fast to compute\n",
"* Wrapper methods\n",
"    * Whole subsets of features are evaluated together by training a model on them\n",
"    * Can be veeery slow, but the quality is usually better than with filter methods\n",
"* Embedded methods\n",
"    * Feature selection is \"built into\" model training\n",
"    * *Example?*"
]
},
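{
"cell_type": "markdown",
"metadata": {},
"source": [
"One common example of an embedded method is logistic regression with an L1 penalty. The penalty drives some coefficients exactly to zero, so features are discarded while the model is being trained. Below is a minimal sketch on synthetic data from `make_classification`, not the Titanic set. The value `C=0.05` is an arbitrary choice for illustration: a smaller `C` means stronger regularization and fewer surviving features."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.datasets import make_classification\n",
"from sklearn.linear_model import LogisticRegression\n",
"\n",
"# Hypothetical synthetic data: 10 features, only the first 3 are informative\n",
"X, y = make_classification(n_samples=500, n_features=10, n_informative=3,\n",
"                           n_redundant=0, shuffle=False, random_state=0)\n",
"\n",
"# The L1 penalty sets some coefficients exactly to zero while the model is fit\n",
"model = LogisticRegression(penalty='l1', C=0.05, solver='liblinear')\n",
"model.fit(X, y)\n",
"selected = np.flatnonzero(model.coef_[0])\n",
"print('Indices of features with non-zero coefficients:', selected)"
]
},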
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Filter method - Mutual Information\n",
"$$MI(y,x) = \\sum_{x,y} p(x,y) \\ln\\left[\\frac{p(x,y)}{p(x)p(y)}\\right]$$\n",
"How much information $x$ carries about $y$ (in nats, since the natural logarithm is used).\n",
"$$NormalizedMI(y,x) = \\frac{MI(y,x)}{H(y)}, \\qquad H(y) = -\\sum_{y} p(y) \\ln p(y)$$"
]
},
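{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small worked example of the two formulas above, computed directly from a joint probability table with NumPy. This is a sketch, and the helper names `mutual_information` and `entropy` are hypothetical. For independent variables $p(x,y) = p(x)p(y)$, so every logarithm is zero and $MI = 0$. When $y$ is a copy of $x$, $MI = H(y)$ and the normalized value equals 1. Empty cells are skipped, following the convention $0 \\ln 0 = 0$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def mutual_information(P):\n",
"    # P is a joint probability table p(x, y): rows index x, columns index y\n",
"    P = np.asarray(P, dtype=float)\n",
"    px = P.sum(axis=1, keepdims=True)  # p(x) as a column\n",
"    py = P.sum(axis=0, keepdims=True)  # p(y) as a row\n",
"    nz = P > 0                         # skip empty cells: 0 * ln 0 = 0\n",
"    return float((P[nz] * np.log(P[nz] / (px * py)[nz])).sum())\n",
"\n",
"def entropy(p):\n",
"    p = np.asarray(p, dtype=float)\n",
"    p = p[p > 0]\n",
"    return float(-(p * np.log(p)).sum())\n",
"\n",
"independent = np.outer([0.5, 0.5], [0.3, 0.7])  # p(x, y) = p(x) p(y)\n",
"identical = np.array([[0.5, 0.0], [0.0, 0.5]])  # y is a copy of x\n",
"\n",
"for P in (independent, identical):\n",
"    mi = mutual_information(P)\n",
"    h_y = entropy(P.sum(axis=0))  # H(y) from the column marginals\n",
"    print('MI =', round(mi, 4), ' normalized MI =', round(mi / h_y, 4))"
]
},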
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's load a fairly well-known dataset on passenger survival in the Titanic disaster."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" PassengerId Survived Pclass \\\n",
"0 1 0 3 \n",
"1 2 1 1 \n",
"2 3 1 3 \n",
"3 4 1 1 \n",
"4 5 0 3 \n",
"\n",
" Name Sex Age SibSp \\\n",
"0 Braund, Mr. Owen Harris male 22.0 1 \n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
"2 Heikkinen, Miss. Laina female 26.0 0 \n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
"4 Allen, Mr. William Henry male 35.0 0 \n",
"\n",
" Parch Ticket Fare Cabin Embarked \n",
"0 0 A/5 21171 7.2500 NaN S \n",
"1 0 PC 17599 71.2833 C85 C \n",
"2 0 STON/O2. 3101282 7.9250 NaN S \n",
"3 0 113803 53.1000 C123 S \n",
"4 0 373450 8.0500 NaN S "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_titanic = pd.read_csv('titanic.csv')\n",
"df_titanic.head()"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Sex female male\n",
"Survived \n",
"0 0.090909 0.525253\n",
"1 0.261504 0.122334"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.crosstab(df_titanic.Survived, df_titanic.Sex, normalize=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's compute the MI between survival and the other features."
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
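"def calc_mutual_information(y, x):\n",
"    # Joint distribution p(x, y): rows are values of x, columns are values of y\n",
"    P = pd.crosstab(x, y, normalize=True).values\n",
"    # Product of the marginals p(x) p(y) for every cell\n",
"    PxPy = np.outer(P.sum(axis=1), P.sum(axis=0))\n",
"    # Empty cells contribute 0 (convention 0 * ln 0 = 0) instead of producing NaN\n",
"    nz = P > 0\n",
"    return (P[nz] * np.log(P[nz] / PxPy[nz])).sum()"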
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.15087048925218172"
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"calc_mutual_information(df_titanic.Survived, df_titanic.Sex)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from sklearn.metrics import mutual_info_score\n",
"from sklearn.metrics import normalized_mutual_info_score"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.66591197352676512"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mutual_info_score(df_titanic.Survived, df_titanic.Sex)"
]
},
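{
"cell_type": "markdown",
"metadata": {},
"source": [
"The normalized variant from the formula above divides MI by $H(y)$. Below is a sketch on hypothetical toy labels, not the Titanic data. `scipy.stats.entropy` computes $H(y)$ and, like `mutual_info_score`, uses the natural logarithm by default. Note that `normalized_mutual_info_score` divides by a mean of $H(x)$ and $H(y)$ rather than by $H(y)$ alone, so its value generally differs from $MI / H(y)$. Also note that the sklearn output above (≈0.6659) equals the entropy of `Survived` rather than the ≈0.1509 returned by `calc_mutual_information`. Since MI should not depend on the implementation, it is worth re-running that cell with your sklearn version."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from scipy.stats import entropy\n",
"from sklearn.metrics import mutual_info_score, normalized_mutual_info_score\n",
"\n",
"# Hypothetical toy labels, not the Titanic data\n",
"y = np.array([0, 0, 0, 1, 1, 1, 1, 0])\n",
"x = np.array([0, 0, 0, 1, 1, 1, 1, 1])\n",
"\n",
"mi = mutual_info_score(y, x)            # MI in nats\n",
"h_y = entropy(np.bincount(y) / len(y))  # H(y), natural log by default\n",
"print('MI =', round(mi, 4))\n",
"print('MI / H(y) =', round(mi / h_y, 4))\n",
"print('sklearn NMI =', round(normalized_mutual_info_score(y, x), 4))"
]
},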
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Wrapper Methods - Recursive Feature Elimination\n",
"\n",
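"In this approach, features are removed from a (linear) model one at a time. At each step, the feature with the smallest coefficient (in absolute value) is dropped, and the model is refit on the remaining ones.\n"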
]
},
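{
"cell_type": "markdown",
"metadata": {},
"source": [
"The elimination loop can be sketched by hand. This is a simplified illustration on synthetic data, not sklearn's actual implementation: fit the model, drop the single feature whose coefficient has the smallest absolute value, refit on the rest, and repeat. sklearn's `RFE` follows the same idea, removing `step` features per iteration. Comparing raw coefficients only makes sense when features are on comparable scales."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.datasets import make_classification\n",
"from sklearn.linear_model import LogisticRegression\n",
"\n",
"# Hypothetical synthetic data: 8 features on comparable scales, 3 informative\n",
"X, y = make_classification(n_samples=400, n_features=8, n_informative=3,\n",
"                           n_redundant=0, shuffle=False, random_state=1)\n",
"\n",
"remaining = list(range(X.shape[1]))\n",
"eliminated = []\n",
"while len(remaining) > 1:\n",
"    model = LogisticRegression().fit(X[:, remaining], y)\n",
"    # Drop the feature with the smallest |coefficient|, then refit on the rest\n",
"    weakest = remaining[int(np.argmin(np.abs(model.coef_[0])))]\n",
"    remaining.remove(weakest)\n",
"    eliminated.append(weakest)\n",
"\n",
"# The earlier a feature is eliminated, the less important it is considered\n",
"print('Eliminated (first to last):', eliminated)\n",
"print('Last remaining feature:', remaining)"
]
},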
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use sklearn's implementation of RFE with cross-validation (`RFECV`).\n",
"\n",
"* Train the model\n",
"* Plot the size of the feature space against the resulting quality\n",
"* Print the feature weights in the selected feature space"
]
},
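{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the last item, the fitted `RFECV` object exposes several attributes. `support_` is a boolean mask of the selected features, and `n_features_` is their count. `estimator_` is the model refit on the selected features only, so `estimator_.coef_` lines up with the masked feature names. Below is a sketch on synthetic data; the feature names `f0` to `f5` are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.datasets import make_classification\n",
"from sklearn.feature_selection import RFECV\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.model_selection import StratifiedKFold\n",
"\n",
"# Hypothetical synthetic data and feature names, not the Titanic set\n",
"X, y = make_classification(n_samples=400, n_features=6, n_informative=2,\n",
"                           n_redundant=0, random_state=0)\n",
"names = np.array(['f%d' % i for i in range(X.shape[1])])\n",
"\n",
"cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=123)\n",
"rfe = RFECV(LogisticRegression(), step=1, cv=cv, scoring='roc_auc').fit(X, y)\n",
"\n",
"# estimator_ was refit on the selected features only,\n",
"# so coef_[0] is aligned with names[rfe.support_]\n",
"print('Number of selected features:', rfe.n_features_)\n",
"for name, w in zip(names[rfe.support_], rfe.estimator_.coef_[0]):\n",
"    print(name, round(w, 3))"
]
},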
{
"cell_type": "code",
"execution_count": 49,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.feature_selection import RFECV\n",
"from sklearn.model_selection import StratifiedKFold"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=123)"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def titanic_preproc(df_input):\n",
"    df = df_input.copy()\n",
"\n",
"    # Create has_cabin before Cabin is dropped: 1 if the cabin is known, 0 otherwise\n",
"    df['has_cabin'] = df['Cabin'].notnull().astype(int)\n",
"\n",
"    # Drop columns that are not used as features\n",
"    cols2drop = ['PassengerId', 'Name', 'Ticket', 'Cabin']\n",
"    df = df.drop(cols2drop, axis=1)\n",
"\n",
"    # Drop rows with missing values (after removing Cabin, so passengers\n",
"    # without a recorded cabin are not all discarded)\n",
"    df = df.dropna()\n",
"\n",
"    # Standardize Age, Fare and SibSp (not ideal: the mean and std are computed\n",
"    # on the whole dataset, so they leak into the cross-validation folds)\n",
"    for col in ['Age', 'Fare', 'SibSp']:\n",
"        df[col] = (df[col] - df[col].mean()) / df[col].std()\n",
"\n",
"    # Encode Sex as 0/1\n",
"    df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})\n",
"\n",
"    # Pclass and Embarked can be treated as categorical features\n",
"    df = pd.get_dummies(df, prefix_sep='=', columns=['Pclass', 'Embarked'], drop_first=True)\n",
"\n",
"    return df"
]
},
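{
"cell_type": "markdown",
"metadata": {},
"source": [
"The caveat about standardization comes from a leak. The mean and standard deviation are computed on all rows, including those that later end up in the validation folds. One way to avoid this, sketched here on synthetic data, is to put `StandardScaler` and the model into a `Pipeline`. The scaler is then refit on each training fold only."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.datasets import make_classification\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.model_selection import StratifiedKFold, cross_val_score\n",
"from sklearn.pipeline import make_pipeline\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"# Hypothetical unscaled data standing in for the raw numeric features\n",
"X, y = make_classification(n_samples=300, n_features=5, random_state=0)\n",
"X = X * [1, 10, 100, 1, 1000]  # put the columns on very different scales\n",
"\n",
"# The scaler is refit inside every training fold, so the statistics of the\n",
"# validation fold never enter preprocessing\n",
"pipe = make_pipeline(StandardScaler(), LogisticRegression())\n",
"cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=123)\n",
"scores = cross_val_score(pipe, X, y, cv=cv, scoring='roc_auc')\n",
"print('ROC AUC per fold:', scores.round(3), 'mean:', round(scores.mean(), 3))"
]
},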
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Survived Sex Age SibSp Parch Fare has_cabin Pclass=2 \\\n",
"1 1 1 0.148657 0.831347 0 -0.096914 0 0 \n",
"3 1 1 -0.043111 0.831347 0 -0.335078 0 0 \n",
"6 0 0 1.171422 -0.721066 0 -0.351287 0 0 \n",
"10 1 1 -2.024719 0.831347 1 -0.811843 0 0 \n",
"11 1 1 1.427114 -0.721066 0 -0.682828 0 0 \n",
"\n",
" Pclass=3 Embarked=Q Embarked=S \n",
"1 0 0 0 \n",
"3 0 0 1 \n",
"6 0 0 1 \n",
"10 1 0 1 \n",
"11 0 0 1 "
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_prep = df_titanic.pipe(titanic_preproc)\n",
"df_prep.head()"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"df_prep = df_titanic.pipe(titanic_preproc)\n",
"X, y = df_prep.iloc[:, 1:].values, df_prep.iloc[:, 0].values"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"model = LogisticRegression(random_state=123)\n",
"rfe = RFECV(model, step=1, cv=cv, scoring='roc_auc', verbose=1, n_jobs=-1)"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Fitting estimator with 10 features.\n",
"Fitting estimator with 10 features.\n",
"Fitting estimator with 10 features.\n",
"Fitting estimator with 10 features.\n",
"Fitting estimator with 9 features.\n",
"Fitting estimator with 9 features.\n",
"Fitting estimator with 9 features.\n",
"Fitting estimator with 9 features.\n",
"Fitting estimator with 8 features.\n",
"Fitting estimator with 8 features.\n",
"Fitting estimator with 8 features.\n",
"Fitting estimator with 8 features.\n",
"Fitting estimator with 7 features.\n",
"Fitting estimator with 7 features.\n",
"Fitting estimator with 7 features.\n",
"Fitting estimator with 7 features.\n",
"Fitting estimator with 6 features.\n",
"Fitting estimator with 6 features.\n",
"Fitting estimator with 6 features.\n",
"Fitting estimator with 6 features.\n",
"Fitting estimator with 5 features.\n",
"Fitting estimator with 5 features.\n",
"Fitting estimator with 5 features.\n",
"Fitting estimator with 5 features.\n",
"Fitting estimator with 4 features.\n",
"Fitting estimator with 4 features.\n",
"Fitting estimator with 4 features.\n",
"Fitting estimator with 4 features.\n",
"Fitting estimator with 3 features.\n",
"Fitting estimator with 3 features.\n",
"Fitting estimator with 3 features.\n",
"Fitting estimator with 3 features.\n",
"Fitting estimator with 2 features.\n",
"Fitting estimator with 2 features.\n",
"Fitting estimator with 10 features.\n",
"Fitting estimator with 2 features.\n",
"Fitting estimator with 2 features.\n",
"Fitting estimator with 9 features.\n",
"Fitting estimator with 8 features.\n",
"Fitting estimator with 7 features.\n",
"Fitting estimator with 6 features.\n",
"Fitting estimator with 5 features.\n",
"Fitting estimator with 4 features.\n",
"Fitting estimator with 3 features.\n",
"Fitting estimator with 2 features.\n"
]
},
{
"data": {
"text/plain": [
"RFECV(cv=StratifiedKFold(n_splits=5, random_state=123, shuffle=True),\n",
" estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n",
" intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,\n",
" penalty='l2', random_state=123, solver='liblinear', tol=0.0001,\n",
" verbose=0, warm_start=False),\n",
" n_jobs=-1, scoring='roc_auc', step=1, verbose=1)"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"rfe.fit(X, y)"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0.78383333, 0.780375 , 0.80856944, 0.82218056, 0.82697222,\n",
" 0.82897222, 0.82772222, 0.82772222, 0.82838889, 0.82838889])"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"rfe.grid_scores_"
]
},
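{
"cell_type": "markdown",
"metadata": {},
"source": [
"`grid_scores_` exists only in older sklearn releases. Newer versions (1.0 and later) store the same per-size mean scores in `cv_results_['mean_test_score']`, and `grid_scores_` was removed in later releases. Below is a version-tolerant sketch on synthetic data that also shows how the index maps to the number of features."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"from sklearn.datasets import make_classification\n",
"from sklearn.feature_selection import RFECV\n",
"from sklearn.linear_model import LogisticRegression\n",
"\n",
"# Hypothetical synthetic data, not the Titanic set\n",
"X, y = make_classification(n_samples=300, n_features=6, random_state=0)\n",
"rfe = RFECV(LogisticRegression(), step=1, cv=5, scoring='roc_auc').fit(X, y)\n",
"\n",
"# Newer sklearn: cv_results_; older sklearn (as in this notebook): grid_scores_\n",
"if hasattr(rfe, 'cv_results_'):\n",
"    scores = rfe.cv_results_['mean_test_score']\n",
"else:\n",
"    scores = rfe.grid_scores_\n",
"\n",
"# With step=1 and the default min_features_to_select=1, scores[i] is the\n",
"# mean CV score obtained with i + 1 features\n",
"plt.plot(range(1, len(scores) + 1), scores, marker='o')\n",
"plt.xlabel('Number of features')\n",
"plt.ylabel('Mean CV ROC AUC')"
]
},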
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAt0AAAFCCAYAAAAt72H5AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzs3Xl4lNXB/vHvmez7EEKCAcIiAQKCiKJxQRbDorYqaFt3\nK33dilWh/dUqxYJSXvq6gXXBiktVFDWuWBEINqAsKpuAghAQZA9LFkJIYGbO74/BYISBgJk8k+T+\nXBcXzMwz89yTHOXO4cxzjLXWIiIiIiIiQeNyOoCIiIiISEOn0i0iIiIiEmQq3SIiIiIiQabSLSIi\nIiISZCrdIiIiIiJBptItIiIiIhJkKt0iIiIiIkGm0i0iIiIiEmQq3SIiIiIiQabSLSIiIiISZOFO\nBwiWrVu3Oh2h0UtJSWHXrl1Ox5AQpLEhx6LxIYFobEggTo6N9PT0Gh2nmW4RERERkSBT6RYRERER\nCbI6WV6yZMkSpkyZgsfjoXfv3gwZMuSIY/Lz85k2bRoej4eMjAyGDRtGdHQ08+bN4+2338bn8xER\nEcHvfvc7OnXqVBexRURERERqRdBnuisqKpg8eTKjRo3iscceY9myZaxfv77aMcXFxeTm5jJ27Fgm\nTpxIUlIS06dPByAtLY2HHnqICRMmcP311zN16tRgRxYRERERqVVBL90FBQW0bdsWt9tNWFgY2dnZ\nLF26tNoxHo+HyspKKioqAHC73YSH+yfh27dvT1xcHADbt28nIyMj2JFFRERERGpV0JeXFBUVkZSU\nVHU7MTGRbdu2VTsmJSWFSy+9lOHDh5OdnU1xcTEjRoyoenzfvn388Y9/JCYmhr/97W9HPU9eXh55\neXkAjB8/npSUlCC8GzkR4eHh+j7IUWlsyLFofEggGhsSSH0YG3Wyptvlqj6h7vF4qt0uLy9n0aJF\njB07lk2bNvHWW2+xcuVKevToAUBcXByTJk1i2bJlTJgwgdGjRx9xjpycHHJycqpu65JCztOlnSQQ\njQ05Fo0PCURjQwLRJQPxLxUpLS2tul1aWorb7a52zPLly2nRogUtW7bk3HPP5YYbbmDmzJlHvFa3\nbt3YsGFDsCOLiIiIiNSqoJfuzMxM1q1bR0lJCV6vl4ULF9K1a1fKy8urfiJJTU1l9erVlJWVAbBu\n3bqqnxoWLVpUtdb7iy++oE2bNsGOLCIiIiJSq4K+vCQ6OpqhQ4cyZswYvF4vvXr1onPnzuTn55Of\nn8/o0aNp164dAwcOZOTIkbhcLlq3bs1tt90GwPr163nxxRcJDw8nOTmZ22+/PdiRRUSkDllrobQY\ntm/B7t5BRVpzLGGQ1ASSmmAiIp2OKCLysxlrrXU6RDBoG3jnae2dBKKx0TjZiv2wYyt2xxbYvuXw\nn3dsgYr9gZ8YE3e4gCe6/X9ObAJJbkxik0OPuSE+EeMKq7s3JHVO/++QQOrDmu46+SCliIg0Dtbr\nhd07/LPWO7Ye+v1QsS7ec/hAYyC5GaS1wJzbD5q3wKS1gJQ03DFRFG/cgC3ZAyVF/lnwkiJsaRF2\n4zooLaoq6dVmjYwLEpPgUDE3h4o5iU1+dPvQr+gYjDF1+rURkcZNpVtERE6ItRb2FsP2wzPV9tDM\nNTu3g/dHV6iKS/AX6qzukJaOad4S0tIh9RRMZNRRXz8iJQWT2JRjVWJbWVFVxiktwpYU+8t4SRH2\nh5K+9XsoKa7KU62gR0ZWlXES3ZijzZ4nHnosIuJnf81EToS1Fg4e8P9wWVnh//3QL1uxHyrK4cAB\nfjKqG7X9zdKg29lOxzgmlW4RETkqW1lxaAnIVtix2f/nH8r1/n2HDwwPh9R0SG+FOeMcSGuJSUv3\nl+34xKBkM1HR0Ky5/xcELOjW54PyMn/5Li3CHirp1W4XbsOu/QbK/FfaOqLGxMYftZyT2ATjbnK4\nvMclYFxBvz6BhCBrLXgOVivH/sL846L84/sr
Dt9feeTxVOwHn8/pt1Wv7EtLV+kWEZHQZX1e2FV4\nqFxvPlyyt2+Bop+sj0xu5p+tzu7tXxaS1sI/a920WciupTYuF8Qn+n+1yDj27LnHA3tLDs+YVy1t\n2VM1k26/W+OfXT9Q6X/Oj1/A5fIvbUk8cv25+dEyF5LcEKXlLU7yl2TPoYJb/pPiW3FkSf5hlrny\n6KWaygrwemt28vBwiI6BqBj/79Ex/s8tNEnBRP/ovqjoqj9X3f/j50RGg0tj6AfJKSns2V/pdIxj\nUukWEWngrLX+Wdwf1lf/sN56xxbYuc1fPn4QG+cv1B27Hlpn7Z+xplk6Juroy0EaChMeDk2a+n8R\nePYcDn0o9Kcz5tVm04uxm77zL8Pxeo+cPY+Mqpo9/+l682pr0ePij5OkMbH4Souxu3YcUXz9Jbni\niAJtj7I8o6os17Qkh4UfLrrVSnJTTFTMkY9FxVQvzz8p0SZcy5WCwRWXACrdIiJSF2xlJezcevQP\nMZb/aDlIWDiknuIv1916Hv4QY/MW/iuAaAb2uKpKVar/qgXHXN6yrwxK9hyx9pySYmxpEWzbjP12\nJezb639OHb2H+mhnTQ76cUn+0Wwx7uQfleToQ7PGsT+ZSY4+oiyrJEttUekWEalHrM8Lu3dWu9xe\n1TrrPT+pJE1S/MtBzr7Q/3vaoQ8xNk3FhIXmcpCGxrhckJDo/0WbY8+eHzzonxn/8Yx5eVldRa0X\n4t1NKPP6/CU5UIHWB18lRKl0i4iEIFu1HMT/IcaqddaF2/wf2PpBTKx/xjqzs3+m+kdrrU1UtHNv\nQE6YiYjwr5tPbua/7XCeUBSbkkK5rtMt9ZRKt4iIg+zuQthQcHitdeGhcn1oqQEAYWH+q3SktcCc\ndmb1tdYJbi0HERGpB1S6RUTqmN23F7toHnZhPhR8c/gBd7K/WJ95/qFrWvtnrklJ03IQEZF6TqVb\nRKQO2IMHYcWX+Bbkw4pF/g1bTmmFGXwDpssZ/pIdHet0TBERCRKVbhGRILE+HxR8g12Yj108z38F\nkUQ3pu+lmOw+kNFOS0NERBoJlW4RkVpmt23yF+3P58DuQoiMwvQ4F3NOH8g6XUtFREQaIZVuEZFa\nYEuKsF/M9a/T/n4dGBd06Y654npM93P8lzgTEZFGS6VbROQk2Yr92GUL/UX7m6/A+qB1e8xvfofp\neSEmqYnTEUVEJESodIuInADr9cKqZf7lI0sXwoFK/2YzF1+Fye6NOaWV0xFFRCQEqXSLiByHtRa+\nX+cv2l/MhdJiiI3DZPfxr9Nun+XfeVBERCQAlW4RkQDsrh3Yz+f4PxC5bROEhUO3s3Bl94GuPbXd\ntIiI1JhKt4jIj9h9ZdjFn/nXaa89tHFNZmfMDb/HnHk+Ji7B0XwiIlI/qXSLSKPn37hmEb7P82H5\nl+Dx+Ldav+J6zNkXYpo1dzqiiIjUcyrdItIoWZ8P1q3GLvwvdtFn/o1rEpIwvS/GnNsXMk7VxjUi\nIlJrVLpFpFGx2zYf2rgm//DGNWdk+3eIzOqujWtERCQoVLpFpMGzpUXYLz71r9PeWODfuCbrdMwV\n12G6Z2vjGhERCTqVbhFpkGxlBXbpQv+M9jfLwOeDjHaYX/8O07MXxp3sdEQREWlEVLpFpMGwXi+s\nXn5o45oFUFkByc0wA4dgzumDaZHhdEQREWmkVLpFpF6z1sKm9dgF+dgv50JJEcTE+a86kt0H2nfW\nxjUiIuI4lW4RqZfs7kL/xjUL8w9vXNP10MY13c7CREQ6HVFERKSKSreI1Bu2vAy7eL6/aK9Z6b+z\nfRbmujswPS/QxjUiIhKyVLpFJKRZz0FYsRjfwnxY/oV/45q0FpjLr8Oc01sb14iISL2g0i0iIcda\nC+tW+T8QuWge7Nt7eOOac/pAm/bauEZEROoVlW4RCRl2++bD67R37YDISP91tLP7+q+rHa7/ZYmI\nSP2kv8FE
xFG2tBj75aGNazasPbRxTTfML6/B9MjGRMc6HVFERORnU+kWkTpnKyvwfT4H+/kc+HqJ\nf+OaVm0xvxqKObsXxt3U6YgiIiK1SqVbROqUb95sdk59DltRDskpmIGDD21c09rpaCIiIkGj0i0i\ndcY3/xPsv58gossZeAddCZldtHGNiIg0CirdIlInfJ/Pwb70BHTqRpO/PsruvXudjiQiIlJnNMUk\nIkFnF8/DvvA4ZHbGNeyvmKgopyOJiIjUKZVuEQkqu2whvucegXYdcf1hlAq3iIg0SirdIhI0dvmX\n+Cb9H2Sciuuuv2GiY5yOJCIi4giVbhEJCrtyCb5n/hdatsF1z2hMjK63LSIijZdKt4jUOrvqK3xP\nj4PmrXANH4OJjXc6koiIiKNUukWkVtk1K/E9ORZST8E14iFMXILTkURERBxXJ5cMXLJkCVOmTMHj\n8dC7d2+GDBlyxDH5+flMmzYNj8dDRkYGw4YNIzo6mg8++IBZs2YRFhZGamoqd955J4mJiXURW0RO\nkC1Yhe+JByG5Ga4RD2IS9N+qiIgI1MFMd0VFBZMnT2bUqFE89thjLFu2jPXr11c7pri4mNzcXMaO\nHcvEiRNJSkpi+vTpALRp04ZHHnmECRMm0KlTJ955551gRxaRk2C/W4Nv4mhISsb1x7GYxCZORxIR\nEQkZQS/dBQUFtG3bFrfbTVhYGNnZ2SxdurTaMR6Ph8rKSioqKgBwu92Eh/sn4bt160bUoUuMZWRk\nUFxcHOzIInKC7MZ1+B7/GyQk+Qu3O9npSCIiIiEl6MtLioqKSEpKqrqdmJjItm3bqh2TkpLCpZde\nyvDhw8nOzqa4uJgRI0Yc8Vqffvop3bp1O+p58vLyyMvLA2D8+PGkpKTU4ruQkxEeHq7vQyNw8Lu1\nFE34G674BJLHPkVY6inHfY7GhhyLxocEorEhgdSHsVEna7pdruoT6h6Pp9rt8vJyFi1axNixY9m0\naRNvvfUWK1eupEePHlXHzJgxg7KyMvr27XvUc+Tk5JCTk1N1e9euXbX4DuRkpKSk6PvQwNkt3+N7\n5H6IiMQMf5AiVwTU4HuusSHHovEhgWhsSCBOjo309PQaHRf00u12uyktLa26XVpaitvtrnbM8uXL\nadGiBS1btqRly5ZER0czY8aMqtI9Z84c5s6dy8iRI48o8CLiDLttM77H/gph4f4lJc2aOx1JREQk\nZAW9wWZmZrJu3TpKSkrwer0sXLiQrl27Ul5eXvUTSWpqKqtXr6asrAyAdevWVf3U8MOykfvvv5/Y\nWG2uIRIK7I6t+B79K1jrL9xpNfspX0REpLEK+kx3dHQ0Q4cOZcyYMXi9Xnr16kXnzp3Jz88nPz+f\n0aNH065dOwYOHFg1k926dWtuu+02AN59910ARo4cWfWaEyZMCHZsEQnA7tzuL9xeD64/jcOc0tLp\nSCIiIiHPWGut0yGCYevWrU5HaPS09q7hsbsL8T18P1Ts989wt2p7Uq+jsSHHovEhgWhsSCBa0y0i\nDYbds8s/w71/H64RJ1+4RUREGiN9KlFEjssW7/YX7r0luO4Zg2l9qtORRERE6hWVbhE5JltahO/R\nUVCyB9fdozFtOzgdSUREpN5R6RaRgOzeEn/h3rMT110PYNpnOR1JRESkXlLpFpGjsvv24nvsAdi5\nHdedf8V0OM3pSCIiIvWWSreIHMGWl+F7/G+wfROuYSMxWac7HUlERKReU+kWkWrs/nJ8E0bD5g24\n7rgP0+UMpyOJiIjUeyrdIlLFVuzHN3E0fL8O1+1/xnTr6XQkERGRBkGlW0QAsJUV+P75IHy3Btct\nf8J0z3Y6koiISIOh0i0i2AOV+J76O6xdhRk6HHPm+U5HEhERaVBUukUaOXvwAL6nx8Hq5Zjf3oXr\nnN5ORxIREWlwVLpFGjHrOYjvmfHw9VLMDcNwndfP6UgiIiINkkq3SCNlPR
58zz4MKxZhrrsDV68B\nTkcSERFpsFS6RRoh6/ViJz8KyxZirr4FV5+LnY4kIiLSoKl0izQy1ufFvjABu3ge5lc347rol05H\nEhERafBUukUaEevzYV/6J/aLOZghN+IaMNjpSCIiIo2CSrdII2F9PuyrT2MXfIK57FpcF1/ldCQR\nEZFGQ6VbpBGw1mJffxb76UzMJb/G/OI3TkcSERFpVFS6RRo4ay32jcnY/OmYgYMxV1yHMcbpWCIi\nIo2KSrdIA2atxea+hJ09DXPRLzFX/laFW0RExAEq3SINlLUW+96r2JnvYvpcgvnN/6hwi4iIOESl\nW6SBstOmYj96C9NrAOaaW1W4RUREHKTSLdIA+f7zJnba65jzLsJc/3uMS/+pi4iIOEl/E4s0ML4Z\n72LfexVzdm/MTXeqcIuIiIQA/W0s0oD4Zk/D5r6IOfN8zNB7MK4wpyOJiIgIKt0iDYYv/yPs1Ofg\njGzM//wRE6bCLSIiEipUukUaAN+nM7FTJkG3nrhu/X+Y8HCnI4mIiMiPqHSL1HO++bOxrzwFp/XA\ndftfMOERTkcSERGRn1DpFqnHfJ/Pwb70BHTqhuuO+zARKtwiIiKhSKVbpJ6yiz7DvvA4ZHbBNeyv\nmMgopyOJiIhIACrdIvWQXboQ3+RHoV1HXH8YhYlS4RYREQllKt0i9Yz96kt8z/4fZJyK666/YaJj\nnI4kIiIix6HSLVKP2JVL8E36X2jZBtc9ozExsU5HEhERkRpQ6RapJ+yqr/A9PQ6at8I1fAwmNt7p\nSCIiIlJDKt0i9YD9diW+Jx+C1FNwjXgIE5fgdCQRERE5ASrdIiHOFnyD758PQnIqrhEPYhISnY4k\nIiIiJ0ilWySE2fXf4ps4BpKScf1xLCaxidORRERE5CQct3Q/8sgjLFmyBJ/PVxd5ROQQu7EA34TR\nkJDkL9zuZKcjiYiIyEk6bunOzMzktdde44477uDVV19ly5YtdZFLpFGzm77D99gDEBvnL9zJKU5H\nEhERkZ8h/HgHXH755Vx++eVs2LCBTz/9lIceeoimTZvSt29fcnJy6iKjSKNit2zE99goiIr2F+6m\nqU5HEhERkZ+pxmu627Rpw3XXXcfQoUMpLi7mhRdeCGYukUbJbtuE79G/Qli4v3A3a+50JBEREakF\nx53pBigoKGDu3LksWLAAt9vNoEGD6NWrV7CziTQqdvsWf+E2xl+409KdjiQiIiK15Lil+5577mHv\n3r2cf/753HfffbRr1+6ET7JkyRKmTJmCx+Ohd+/eDBky5Ihj8vPzmTZtGh6Ph4yMDIYNG0Z0dDQA\nFRUV/P3vf+e3v/0tp5566gmfXyTU2cJt/sLt9eL60zjMKS2djiQiIiK16Lil++qrr+ass84iPLxG\nk+JHqKioYPLkyYwbN46EhATGjBlD9+7dq5X34uJicnNzefjhh4mJiWHy5MlMnz6dwYMHs2bNGh59\n9FFKS0tP6vwioc7uLvQX7oMH/DPcLTKcjiQiIiK17Lhrurdt28Z7771X7b6ZM2fy8ccf1+gEBQUF\ntG3bFrfbTVhYGNnZ2SxdurTaMR6Ph8rKSioqKgBwu91VJb9Dhw48++yzdOzYsUbnE6lP7J6d+B4Z\nCRXluIY/iGnV1ulIIiIiEgTHnb7+5JNP+POf/1ztvm7dujFu3DgGDRp03BMUFRWRlJRUdTsxMZFt\n27ZVOyYlJYVLL72U4cOHk52dTXFxMSNGjKjpewAgLy+PvLw8AMaPH09Kii6x5rTw8HB9H47Bu2cn\nRRP+htm3lyajnyCiQ2enI9UZjQ05Fo0PCURjQwKpD2PjuKW7tLSUtLS0avc1bdqUkpKSGp/E5ao+\noe7xeKrdLi8vZ9GiRYwdO5ZNmzbx1ltvsXLlSnr06FHjc+Tk5FS7hOGuXbtq/FwJjpSUFH0fArCl\nRfgevh+K9uC6ZzQlyanQiL5WGhtyLB
ofEojGhgTi5NhIT6/ZhQ+Ou7ykffv2zJ07t9p9c+bMoU2b\nNjU6gdvtrrYeu7S0FLfbXe2Y5cuX06JFC1q2bMm5557LDTfcwMyZM2v0+iL1jd1bgu/RUbBnF667\nRmHaZzkdSURERILsuDPdN910E3//+9/55JNPaNasGYWFhezcuZP777+/RifIzMxk0qRJlJSUEB8f\nz8KFC7n66qspLy+nvLyclJQUUlNTWb16NWVlZcTHx7Nu3boa/9QgUp/YslL/xjc7t+P6wyhMh9Oc\njiQiIiJ1wFhr7fEOqqioYPHixezevZvk5GS6d+9OfHx8jU+yePFipkyZgtfrpVevXlx11VXk5+eT\nn5/P6NGjAfjoo4+YMWMGLpeL1q1bc9tttxETE0NBQQGTJ09m69atNG3alDPOOIMbb7zxuOfcunVr\njfNJcOifAauz+8r8hXvrRlx3jsJ0OcPpSI7R2JBj0fiQQDQ2JJD6sLykRqW7PlLpdp7+53iYLd+H\n7/EHYNN3uH5/H6ZbT6cjOUpjQ45F40MC0diQQOpD6a7xNvAicnKsx4Pvnw/CpvW4bv9zoy/cIiIi\njZFKt0iQ2S/mQsEqzI1/wHTPdjqOiIiIOEClWySIrM+H/fhtaNEac25fp+OIiIiIQwKWbo/Hw65d\nu/D5fEc8Vl5eTgNdCi5Su5Z/Ads2YQZdiTHG6TQiIiLikICl+9133+XJJ588YmMbgMcff5xp06YF\nNZhIfWetxfdRLqSkYXr2cjqOiIiIOChg6V60aBG/+tWvjvrYlVdeyfz584MWSqRB+HYFfLcGM3Aw\nJizM6TQiIiLioICle/v27XTs2PGoj7Vv354tW7YELZRIQ+CbnguJbsz5OU5HEREREYcFLN0ul4v9\n+/cf9bHy8vKjLjsRET+7sQC+WYbJuRwTEel0HBEREXFYwObcsWNHvvzyy6M+NnfuXNq3bx+0UCL1\nnW96LsTEYnoPcjqKiIiIhICApfvXv/41r7zyCrm5uWzdupUDBw6wa9cu3n33Xd58882A671FGju7\nfTMsWYDpcwkmNs7pOCIiIhICwgM90K5dO+6//35effVVcnNzqy4RmJmZyb333kunTp3qLKRIfWI/\nfgfCIzA5lzkdRUREREJEwNIN/oI9ZswYDhw4QFlZGXFxcURFRdVVNpF6x+7ZhV2Yj7lwACbR7XQc\nERERCREBS/fKlSuPuC8yMpLU1FTcbpUJkaOxs94H68MMGOx0FBEREQkhAUv3M888c8R9Pp+P0tJS\n2rRpw4gRI2jatGlQw4nUJ7asFPvpDMzZF2JS0pyOIyIiIiEkYOl+6qmnjnp/RUUFL774Is8//zx/\n/vOfgxZMpL6xn3wIlRWYQVc5HUVERERCzAlfbDs6OprrrruOtWvXBiOPSL1kK/ZjZ38Ip5+NaZHh\ndBwREREJMSe1w014eDg+n6+2s4jUW/bTmVBehutizXKLiIjIkU6qdE+bNk2b44gcYg8exM58Dzqc\nhjlVl9IUERGRIwVc0/3AAw9gjKl2n8/nY9euXbhcLkaNGhX0cCL1gV34XyjejeumPzgdRUREREJU\nwNLdr1+/I+4LCwujSZMmdOrUifDwY17iW6RRsD6vfzOcjHbQ5Qyn44iIiEiICtic+/Tpc8wnrl69\nWrtSiixZAIVbcd325yP+ZUhERETkByc0Xb1x40Y+++wz5s2bx/79+3nxxReDlUsk5Flr8U1/G1LT\noce5TscRERGREHbc0r1z504+++wzPvvsMzZv3kz//v35/e9/T1ZWVl3kEwld3yyD79dhbrwT4wpz\nOo2IiIiEsICl++OPP2bevHls3ryZc889l1tuuYWHH36YIUOGkJycXJcZRUKSb3ouuJMx2X2djiIi\nIiIhLmDpfvHFF7nooosYNWoUkZGRdZlJJOTZdavh2xWYXw3FREQ4HUdERERCXMDrdF999dWsWbOG\nP/
7xj7z99tvs2rVLHxQTOcQ3PRfiEjAXDnQ6ioiIiNQDAWe6Bw8ezODBg9m4cSPz5s1j9OjR7N27\nl/z8fM477zyaN29elzlFQobd8j189QXml1djomOcjiMiIiL1wHE/SNm6dWtat27Ntddey6pVq5g3\nbx5//etfiY+PZ8KECXWRUSSk2I/fhsgoTL9fOB1FRERE6okTumRgVlYWWVlZDB06lGXLlgUrk0jI\nsrt2YL+Yg+n3C0x8otNxREREpJ4IuKb7aN5++23/k1wuevToEZRAIqHMznwXjAvT/wqno4iIiEg9\nckKlOzc3N1g5REKeLS3GfpaHye6DSU5xOo6IiIjUIydUukUaMzt7GngOYgYNcTqKiIiI1DMq3SI1\nYPeXY//7EZxxLqZ5S6fjiIiISD1zQqX7n//8Z7ByiIQ0mz8d9u/DdclVTkcRERGReihg6d60aRPv\nvPMOXq+36r6UlBSstcyaNYvCwsI6CSjiNHugEpv3PnTujmnd3uk4IiIiUg8FLN3vvfce+/fvJyws\nrNr9xhiKi4t55513gh5OJBTY+Z9AaTGuizXLLSIiIicnYOlevXo1/fv3P+pjffr0Yfny5UELJRIq\nrNeLnfEOtO0AHbs6HUdERETqqYClu6SkhJSUo18WrWnTppSUlAQtlEiosIs+g107cF18FcYYp+OI\niIhIPRWwdCclJbFjx46jPrZt2zYSEhKCFkokFFhrsdNz4ZRWcPrZTscRERGReixg6T7nnHN48803\nq32QEsDn8/HKK69w1llnBT2ciKOWL4ItGzGDrsS4dHVNEREROXnhgR749a9/zUMPPcSIESM455xz\nSElJobS0lAULFuByufjDH/5QlzlF6pzv41xIboY5+0Kno4iIiEg9F7B0R0dHM2bMGObOncuKFStY\nv3498fHxDBo0iN69exMZGVnjkyxZsoQpU6bg8Xjo3bs3Q4YcuaNffn4+06ZNw+PxkJGRwbBhw4iO\njmbv3r088cQTFBYWkpqayt133018fPzJvVuRGrJrvoaCVZirb8WEB/zPRERERKRGjtkmwsPD6dev\nH/369TvpE1RUVDB58mTGjRtHQkICY8aMoXv37rRr167qmOLiYnJzc3n44YeJiYlh8uTJTJ8+ncGD\nB/PKK69w9tln079/f2bNmsWbb77J0KFDTzqPSE34pudCQhLmgqNfwUdERETkRByzdG/evJm3336b\nr7/+mrKyMhISEujSpQtXXXUV6enpNTpBQUEBbdu2xe12A5Cdnc3SpUurlW6Px0NlZSUVFRXExMTg\ndrsJPzQ3SAPaAAAfdUlEQVS7uHLlSn77298CcP7553PvvfeezPsUqTH7/XpYuRhzxfWYqCin44iI\niEgDEPDTYevXr2fkyJFERUVx55138o9//INhw4YRGRnJyJEj2bhxY41OUFRURFJSUtXtxMREiouL\nqx2TkpLCpZdeyvDhw5k0aRIFBQUMHDgQgL179xIbGwtAbGwsZWVlJ/wmRU6E/fhtiI7B9L3E6Sgi\nIiLSQASc6X7jjTe4/PLLq62/btWqFd26dSM1NZXXXnuN++67r0Yncf3kyg8ej6fa7fLychYtWsTY\nsWPZtGkTb731FitXrqRHjx7Hfe4P8vLyyMvLA2D8+PEBrzEudSc8PLzefR882zaze/E8Yi+/hoSM\nNk7HabDq49iQuqPxIYFobEgg9WFsBCzd3377LcOGDTvqYxdddBHvv/9+jU7gdrspLS2tul1aWlq1\n1OQHy5cvp0WLFrRs2ZKWLVsSHR3NjBkz6NGjB7GxsVRUVBAdHU15eXnAD1Hm5OSQk5NTdXvXrl01\nyifBk5KSUu++D76pz4MrjIrz+lNZz7LXJ/VxbEjd0fiQQDQ2JBAnx0ZNl1wHXF7i9Xqr1lX/VHh4\nOD6fr0YnyMzMZN26dZSUlOD1elm4cCFdu3alvLy86ouTmprK6tWr
q5aOrFu3ruoNnHbaacyfPx+A\nefPm0bWrtuKW4LDFu7HzZ2POuwjjTnY6joiIiDQgAWe627Vrx4IFC7jooouOeGzBggW0bdu2RieI\njo5m6NChjBkzBq/XS69evejcuTP5+fnk5+czevRo2rVrx8CBAxk5ciQul4vWrVtz2223AXDDDTfw\nxBNP8P7779OsWTPuuuuuk3yrIsdmZ30AXh9m4GCno4iIiEgDY6y19mgPrFixgkceeYTBgweTnZ1N\nkyZNKCoqYv78+bz//vv8v//3/zjttNPqOm+Nbd261ekIjV59+mdAu68M372/w5zeE9ctf3I6ToNX\nn8aG1D2NDwlEY0MCqQ/LSwLOdHft2pU//elPvPHGG0ydOhVrLcYYOnbsyMiRI+nQoUOthRVxmv3v\nf6ByP+biK52OIiIiIg3QMa/T3bVrV7p27UplZSX79u0jLi6OqEPXLa6srKz6s0h9ZisrsbOnQdez\nMC1rtmxKRERE5EQE/CDlj0VFRZGcnFxVsr///nv+8pe/BDWYSF2xn82EslJcF1/ldBQRERFpoALO\ndO/fv59///vfFBUV0bNnz6rL8c2cOZOXX36ZCy+8sM5CigSL9RzEznwX2nfGZHZ2Oo6IiIg0UAFL\n9/PPP09xcTFnnXUWs2fPpqKigjVr1vD1119z991307Nnz7rMKRIU9ou5sGcXrut/73QUERERacAC\nlu7FixczYcIEkpKS6NatG8OHD+f000/n0UcfPWJzG5H6yPp82I/fgZZt4LQznY4jIiIiDVjANd0V\nFRUkJSUB/kuhhIWFcf/996twS8Ox7HPYtgkz6EqMMU6nERERkQYs4Ey3tZaVK1dW3TbGVLsNhPR1\nukWOxVqLb3ouNGuOOesCp+OIiIhIAxewdDdt2pRnnnmm6rbb7a522xjDk08+Gdx0IsGyejlsWIu5\n/veYsDCn04iIiEgDF7B0P/XUU3WZQ6RO+abnQlITzHn9nI4iIiIijUCNrtMt0pDYDWth1VeY/pdj\nIiKdjiMiIiKNgEq3NDq+6bkQG4e5cJDTUURERKSRUOmWRsVu2wxLF2L6XIqJiXU6joiIiDQSKt3S\nqNgZb0NEBCbnl05HERERkUZEpVsaDbtnJ3ZhPuaCAZiEJKfjiIiISCOi0i2Nhp35HgBmwBUOJxER\nEZHGRqVbGgW7txT76UzM2RdimqY6HUdEREQaGZVuaRTsJ9PgQCVm0JVORxEREZFGSKVbGjxbUY79\n5D/QPRuTnuF0HBEREWmEVLqlwbNzZ0B5Ga6LNcstIiIizlDplgbNHjyInfU+dOyKadfR6TgiIiLS\nSKl0S4NmF3wCxXtwXXKV01FERESkEVPplgbL+rzYGe9A6/aQ1d3pOCIiItKIqXRLg2UXL4DCbbgu\nvgpjjNNxREREpBFT6ZYGyVqLnf4WNG8BZ2Q7HUdEREQaOZVuaZi+XgKbvsMMHIJxaZiLiIiIs9RG\npEHyTc+FJimY7D5ORxERERFR6ZaGxxasgjVfYwZcjgmPcDqOiIiIiEq3NDy+j9+G+ARMr4FORxER\nEREBVLqlgbFbNsJXX2D6/RITFe10HBERERFApVsaGDs9F6KiMf0udTqKiIiISBWVbmkw7M7t2C8/\nxVw4EBOX4HQcERERkSoq3dJg2JnvgXFh+l/hdBQRERGRalS6pUGwpUXYeXmY8/phmjR1Oo6IiIhI\nNSrd0iDYvA/AcxAzcIjTUURERESOoNIt9Z4t34fNn47pcR4mLd3pOCIiIiJHUOmWes/mfwT7yzGX\nXOV0FBEREZGjUumWes0eqPQvLelyBibjVKfjiIiIiByVSrfUa3bebNhbguviXzkdRURERCQglW6p\nt6zXi53xDrTrCB26OB1HREREJCCVbqm37JdzYXchrouvwhjjdBwRERGRgFS6pV6yPh92+tuQngHd\nejodR0REROSYwuviJEuWLGHK
lCl4PB569+7NkCHVr6W8YcMGJkyYUHXb6/XStGlTRo8ezXfffcfz\nzz/Pvn37OOWUU7jjjjtISNAW343eikWw9XvM74ZjXPrZUUREREJb0Et3RUUFkydPZty4cSQkJDBm\nzBi6d+9Ou3btqo5p06ZNtdKdl5fH5s2bAZg4cSLDhw+ndevWfPTRR0ydOpVbbrkl2LElhFlr8U3P\nhaapmJ4XOh1HRERE5LiCPkVYUFBA27ZtcbvdhIWFkZ2dzdKlSwMe7/V6+fDDD7nssssoLS3F6/XS\nunVrAAYMGMCiRYuCHVlC3ZqvYd1qzMDBmLAwp9OIiIiIHFfQS3dRURFJSUlVtxMTEykuLg54/Ny5\nc8nKyiI5OZmEhAQ8Hg8FBQUAFBcXs3///mBHlhDnm/4WJCRhzs9xOoqIiIhIjdTJmm7XT9bcejye\nox7n8/n44IMPuPfeewEwxnDPPffwwgsvUFlZSatWrYiPjz/qc/Py8sjLywNg/PjxpKSk1OI7kJMR\nHh5e69+Hg+u/Zc/XS4m//nbi0lvU6mtL3QnG2JCGQ+NDAtHYkEDqw9gIeul2u92UlpZW3S4tLcXt\ndh/12Hnz5tG2bVuaN29edV/Hjh0ZN24cAKtXr6aysvKoz83JySEn5/DM565du2ojvvwMKSkptf59\n8L3+PMTEUt6zN/v1Pa63gjE2pOHQ+JBANDYkECfHRnp6eo2OC/rykszMTNatW0dJSQler5eFCxfS\ntWtXysvLq31xfD4f7777LldccUW15/t8PsBf1l9++WUuu+yyYEeWEGV3bMUuno/pczEmNs7pOCIi\nIiI1FvSZ7ujoaIYOHcqYMWPwer306tWLzp07k5+fT35+PqNHjwbg888/p3nz5mRkZFR7/qxZs/jw\nww+Jjo5m8ODBZGVlBTuyhCg74x0IC8NcpB+8REREpH4x1lrrdIhg2Lp1q9MRGr3a/KceW7Qb3323\nYHr1x3XdHbXymuIc/ROxHIvGhwSisSGBaHmJSC2xee+D9WEGDHY6ioiIiMgJU+mWkGf37cXO+RjT\nsxemWfPjP0FEREQkxKh0S8izn/wHKiswg650OoqIiIjISVHplpBmKyuwn0yDbj0xLds4HUdERETk\npKh0S0izn86Esr24Lr7K6SgiIiIiJ02lW0KW9RzEznwPOnTBtNelIkVERKT+UumWkGU/nwNFuzTL\nLSIiIvWeSreEJOvzYj9+G1q1hS49nI4jIiIi8rOodEtoWvo5bN+CufgqjDFOpxERERH5WVS6JeRY\na/FNz4XUUzBnnud0HBEREZGfTaVbQs+qr2BjAWbgEIwrzOk0IiIiIj+bSreEHN/0XEhKxpzbz+ko\nIiIiIrVCpVtCiv1uDaxejul/OSYiwuk4IiIiIrVCpVtCiu+jXIiNx/Qe6HQUERERkVqj0i0hw27b\nBMsWYvpdiomOdTqOiIiISK1R6ZaQYae/DZFRmH6/dDqKiIiISK1S6ZaQYHfvxH4xB9NrACYh0ek4\nIiIiIrVKpVtCgp35LgCm/xUOJxERERGpfSrd4ji7twT72UzMOX0wTZs5HUdERESk1ql0i+Ps7Glw\n8CBm0JVORxEREREJCpVucZTdX47973/gjGzMKS2djiMiIiISFCrd4ig792Mo34dr0FVORxEREREJ\nGpVucYw9eAA7633IOh3TNtPpOCIiIiJBo9ItjrELPoGSIlwXa5ZbREREGjaV7lpifT6stU7HqDes\n14v9+B1okwmdujkdR0RERCSowp0O0GCsWIxvyjOYTl2h0+mYTt0wySlOpwpZdvE82Lkd11U3Y4xx\nOo6IiIhIUKl015a4eEy7jtgVi2DBf7EAaS0wWd0wnbpBx66YeO20CGCt9W/53rwldD/H6TgiIiIi\nQafSXUtM+yxM+yyszwdbNmJXfYVdvRy7IB+bPx2MgZZtMFn+WXAyu2CiY5yO7YyVi2Hzd5jf3o
1x\naYWTiIiINHwq3bXMuFzQqi2mVVsYcAXW44ENa/0FfPVy7CcfYme+B2Fh0LaDfxlKp9OhXUdMRITT\n8euEb3ouJKdgzrnQ6SgiIiIidUKlO8hMeDgcmgXnF7/BHqiEglWHS/h/3sJ++AZERkL7zodLeOt2\nGFeY0/FrnS34BtZ+g7n6Fkx44/ghQ0RERESlu46ZyCjo3B3TuTsAtnwfrFl5uIS/87J/PXhMHHQ8\n7XAJT2/VID5w6PsoF+ITMRcMcDqKiIiISJ1R6XaYiY2D7udgDn2g0JYWYVevgB9K+LLP/SU80e1f\nC97J/8FM06y5o7lPht38HaxYhLn8OkxUlNNxREREROqMSneIMYlNMGdfCGf71zvbXTuwq76C1Suw\n3y6HL+b6S3hKWvUSntTE0dw1Yae/A1ExmL6XOh1FREREpE6pdIc4k5KG6TUAeg3wb76zbRN21aFZ\n8CXz4bNZ/hKennHoyihdocNpmNh4p6NXY3dux375Kab/5Zi40MomIiIiEmwq3fWIMcZfrtMz4KJf\nYH1e+H79oRL+FfbTGdjZ08C4oPWph68Rfmpnx5dz2BnvQJgL0/8yR3OIiIiIOEGlux4zrjBok4lp\nkwkXX4k9eBDWf3voQ5lfYWe+59+EJjwc2nU6VMJP9z8nvO6+9bakCDtvNua8izDupnV2XhEREZFQ\nodLdgJiICP8VTzqeBpdfi63YD2u/OVzCP3gd+/5rEBUDHbpgOnX1l/CWbYK6SY2d9T54vZiBg4N2\nDhEREZFQptLdgJnoGOh6JqbrmQDYslL4dqW/gK9ejl2xyL8ePD7Bv019p0O7Zaal19rlCW15GXbO\ndMxZ52NS02vlNUVERETqG5XuRsTEJ8KZ52HOPA8Au2cX9tsV8MOW9Yvn+0t4kxT/BzIPlXCTnHLS\n57T//Qgq9mMGXVk7b0JERESkHlLpbsRMcgrm3L5wbl//lVEKt2FXL/eX8BWLYMF//SU8rYV/KUrW\n6f4Z8fjEGr2+razwf7DztDMxGe2C+l5EREREQplKtwCHroySlo5JS4feg7A+H2zZiP1hFnzhHOyc\nj8EY/xrwrENLUTK7+JexHMX+2R/C3hJcF19Vx+9GREREJLSodMtRGZcLWrXFtGoLA67AejywseBw\nCf/kQ+zM9yDs0BVUfijh7TphIiKwHg/73psCp3aCzM5Ovx0RERERR9VJ6V6yZAlTpkzB4/HQu3dv\nhgwZUu3xDRs2MGHChKrbXq+Xpk2bMnr0aLZv386zzz5LcXExERER3HzzzWRlZdVFbPkREx4Op3bC\nnNoJfvEb7IFKKFh16Mooy7H/eQv74RsQEQntszBJydidO3D95tZa+1CmiIiISH0V9NJdUVHB5MmT\nGTduHAkJCYwZM4bu3bvTrt3hNb5t2rSpVrrz8vLYvHkzAC+99BKXXHIJPXv2ZO3atTz55JNMnDgx\n2LHlOExkFHTujuncHQBbvg/WrDxcwld9RXjbTHzdznI4qYiIiIjzgl66CwoKaNu2LW63G4Ds7GyW\nLl1arXT/mNfr5cMPP+SBBx4A4ODBg5SWlgLQpEkTwutwUxepORMbB93PwXQ/BwBbWkyTtObs2V/h\ncDIRERER5wW9wRYVFZGUlFR1OzExkW3btgU8fu7cuWRlZZGcnAzAzTffzAMPPMDChQux1nL77bcf\n9Xl5eXnk5eUBMH78eFJSTv4yd1ILUlIIDw8nJc7jdBIJQeHh4fpvVALS+JBANDYkkPowNupk2tj1\nk90OPZ6jFzGfz8cHH3zAvffeW3VfXl4eV199NZmZmbz//vtMnz6dzMzMI56bk5NDTk5O1e1du3bV\nUno5WSkpKfo+yFFpbMixaHxIIBobEoiTYyM9vWab/wVv7+9D3G531fIQgNLS0qqlJj81b9482rZt\nS/Pmzavuy8/Pp3///rRt25Z77rmHFStWVHs9EREREZFQF/
TSnZmZybp16ygpKcHr9bJw4UK6du1K\neXl5tZ9IfD4f7777LldccUW15zdr1ozFixcDsHXrViIiIoiPjw92bBERERGRWhP05SXR0dEMHTqU\nMWPG4PV66dWrF507dyY/P5/8/HxGjx4NwOeff07z5s3JyMio9vxhw4bx3HPP8eqrrxITE8Ndd911\nxHIVEREREZFQZqy11ukQwbB161anIzR6WnsngWhsyLFofEggGhsSiNZ0i4iIiIiISreIiIiISLCp\ndIuIiIiIBJlKt4iIiIhIkKl0i4iIiIgEWYO9eomIiIiISKjQTLcEzV/+8henI0iI0tiQY9H4kEA0\nNiSQ+jA2VLpFRERERIJMpVtEREREJMhUuiVocnJynI4gIUpjQ45F40MC0diQQOrD2NAHKUVERERE\ngkwz3SIiIiIiQabSLSIiIiISZOFOB5CG58CBA/zjH/+gsLAQl8tF7969GTJkiNOxJIR88MEHzJkz\nh0cffdTpKBJCKisreeWVV/jqq6/weDw8/PDDxMfHOx1LQkB+fj7Tpk3D4/GQkZHBsGHDiI6OdjqW\nOGj9+vU8/fTTPPLIIwDs3buXJ554gsLCQlJTU7n77rtD7v8fKt0SFJdffjndunXjwIEDjBw5kh49\netCmTRunY0kIWL16NZ999pnTMSQEvfDCCyQnJ/PEE084HUVCSHFxMbm5uTz88MPExMQwefJkpk+f\nzuDBg52OJg55+eWXyc/Pp0mTJlX3vfLKK5x99tn079+fWbNm8eabbzJ06FAHUx5Jy0uk1kVGRtKt\nW7eqP6elpVFcXOxwKgkFpaWl/Pvf/+bWW291OoqEmOLiYtauXcuvfvUrjDFVv0Q8Hg+VlZVUVFQA\n4Ha7CQ/XnGFjduONNzJ+/Phq961cuZLzzz8fgPPPP5+lS5c6Ee2YNGolqH74i/SOO+5wOoo4zFrL\nU089xfXXX09iYqLTcSTEfP/99xhjePDBBykqKuLUU0/l1ltv1RICISUlhUsvvZThw4eTnZ1NcXEx\nI0aMcDqWhJi9e/cSGxsLQGxsLGVlZQ4nOpJmuiVoDhw4wOOPP84111xDXFyc03HEYf/5z3/o2LEj\nXbp0cTqKhKDS0lJOOeUURo4cyeOPP05SUhK5ublOx5IQUF5ezqJFixg7diynn346hYWFrFy50ulY\nEmJcruqV1uPxOJQkMM10S1AcPHiQxx57jO7du9OnTx+n40gIKCws5KuvvmLu3Ll4vV52797NAw88\nwIMPPuh0NAkBcXFxREVFERERAUDPnj354IMPHE4loWD58uW0aNGCli1b0rJlS6Kjo5kxYwY9evRw\nOpqEkNjYWCoqKoiOjqa8vDzkPkQJKt0SBJWVlTzyyCN06dKFK664wuk4EiJ+/IGWwsJC/vGPf6hw\nS5WOHTvy3HPPVV15YNmyZWRmZjodS0JAamoqq1evpqysjPj4eNatW0d6errTsSTEnHbaacyfP59+\n/foxb948unbt6nSkI2hHSql1X3/9NX//+99JTU2tuu/ss8/m2muvdTCVhJIfSrcuGSg/tnz5cl55\n5RU8Hg8dOnTgf/7nf6pmvqVx++ijj5gxYwYul4vWrVtz2223ERMT43Qsccgbb7zBl19+ybZt22jV\nqhU33ngjLVu25IknnmDnzp00a9aMu+66K+Q+P6TSLSIiIiISZPogpYiIiIhIkKl0i4iIiIgEmUq3\niIiIiEiQqXSLiIiIiASZSreISAgZNmwYy5cvd+TcHo+HiRMncuONN3LTTTdRXl5+xDHLli3jrrvu\n4tprr2Xq1Kknfa7CwkJ+/etf4/V6f05kEZF6Q9fpFhERAJYuXcratWuZNGkSkZGRhIWFHXHMa6+9\nxiWXXMLAgQNVmEVEToAuGSgichzDhg0jOzubb775hi1btpCVlcU999xDTEwM+fn5zJ49m4ceeggA\nr9fLNddcw5NPPklqai
pPPfUUlZWVVFZWsmrVKlJTU7n55puZPXs2X331FdHR0dx+++1VGzkMGzaM\nnj17smrVKrZv384ZZ5zB7bffTnR0NACffPIJ06ZNY8+ePXTo0IHbbruNlJQUCgsLufPOO7nxxhuZ\nNWsWhYWFvP7660e8l/Lycl566SUWL15MVFQU/fv354orrmDXrl3ceeedWGtxuVycd9553HXXXdWe\n+8wzz/Df//4XYwzGGB577DGaNWvG66+/zoIFCzhw4ADZ2dncdNNNREZG8t133zFhwgT27NlDeHg4\nWVlZ3H777SQmJvKHP/yBHTt2VG3dfN999/Htt9+yffv2qvP+8Oc333wTgNGjR9OyZUu2bNnC2rVr\nufXWW7ngggv44IMPmD17Nnv37uX000/nlltuIT4+ntLSUv71r3/x9ddf43K5yMzM5K677iI2NjY4\nA0VE5Bi0vEREpAa2bNnC8OHDefrpp9mzZw+zZ8+u8XM3bdrElVdeyXPPPUdaWhqPPPIIZ511FpMm\nTSInJ4eXX3652vHFxcWMGDGCJ598kt27d/Phhx8C8MUXX/Duu+8yfPhw/vWvf9GiRQsmTZp0xHP/\n93//l5deeumoWV566SUqKiqYOHEiDzzwAHPnzmXOnDk0a9aMO+64g44dOzJ16tQjCjfAHXfcQbNm\nzRg5ciRTp04lPT2dKVOmsHXrVsaPH89jjz3Gpk2bqvKmpKRw77338uKLL/Lss8/idruZMmUKAKNG\njQJgypQpTJ06ldNPP71GX8t169Zx88038/LLL5Odnc306dP5/PPPGTVqFE8//TTW2qpzvPbaa8TF\nxfHPf/6T//u//6NLly54PJ4anUdEpLapdIuI1MAll1xCamoq8fHxdO3alS1bttT4ueeccw4dOnQg\nKiqKrl270qZNG8477zwiIiI488wz2bZtW7Xj+/XrR1paGgkJCQwYMIClS5cCkJeXx+DBg8nIyCAq\nKorBgwezYsWKakXymmuuITY2lqioqCNy+Hw+PvvsM66++mri4+Np3rw5v/jFL5gzZ85JfU2stcye\nPZubb76ZpKQkkpKSGDRoUFXeqKgo5s+fzwMPPMDvf/978vPz2b59+0md6wc5OTlkZGTgcrmIjIwk\nLy+P6667jtTUVGJjY7nsssuqzl9RUcHevXupqKigadOm/PKXvwy5HepEpPHQmm4RkRMUGRnJ3r17\nT+q5ERER/HhVX2Rk5DFnXxMTEykrKwNg586dPPfcczz//PNVjxtjKC4urtG5S0tL8Xg8NG3atOq+\npk2bsmfPnhN9G1WvV1lZyYgRI6rus9bSrFkzwD+rXlBQwO9+9zsyMjL4/PPPT+hfCGpi586djBs3\nDmNM1X0+nw+A66+/npdeeokRI0YQExNDr169uPbaa6uWtIiI1CWVbhGRnyE8PDyoSxZ27txJamoq\nAMnJyQwePJgLL7zwiOMKCwuP+1qJiYmEh4eze/du0tPTAdi9ezfJycknlS0hIYGIiAgmTpxISkrK\nEY+vWrWKa6+9lo4dOx7x2A/F98c/gJzM1zI5OZk77riDrKysIx5LSUnhT3/6Ez6fjw0bNvDQQw/R\nuXNnevTocULnEBGpDfpxX0TkZ0hPT2fTpk1s3bqV0tJScnNzf/Zrbt68GY/Hw5YtW5g2bRp9+/YF\noG/fvrz11lusXbsWj8fDjh07mDVrVo1f1+VyccEFF/Daa69RVlbG9u3b+fDDD+ndu/dJ5XS5XPTu\n3ZvJkyezY8cODh48yIYNG5g/fz4AaWlpfP3113g8HjZu3MiMGTOqnut2uwkPD2fZsmXs27eP8vJy\nWrRowZo1ayguLmbPnj289957x83Qt29fXnnlFb7//vuqr1l+fj4Ar7/+Ot988w0ej4f4+HgiIiKI\nj48/qfcqIvJzaaZbRORnaNeuHYMGDeL+++8nLi6O/v37/+zXnDNnDm+88QbR0dFcdNFF
nHfeeQBc\ncMEFVFZWMmnSJHbs2EFiYiJnnnnmCb32b3/7W1588UXuvvtuIiMjGTBgwEmXboCbbrqJt956iwcf\nfJCSkhLS0tK4+OKLAbjxxhuZOHEiN910E23btqV169ZVa+HD/387d2xiIRSEYfQXXmgg2IMdGNiE\n2IAtmFuEqWBL1mILwka76cIuk53TwGWyj8swn0/Wdc15nnnfN/u+ZxzH3PedbdvS932mafr1/Xme\n0zRNjuP4+bX/nqdt21zXled50nVdlmXJMAx/nhXgP5wMBACAYtZLAACgmOgGAIBiohsAAIqJbgAA\nKCa6AQCgmOgGAIBiohsAAIqJbgAAKCa6AQCg2Bdwevtn6iZZwwAAAABJRU5ErkJggg==\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"d = rfe.grid_scores_.shape[0]\n",
"plt.plot(range(1, d+1), rfe.grid_scores_)\n",
"plt.xlabel('number of features')\n",
"plt.ylabel('ROC-AUC cv')"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Selected features\n"
]
},
{
"data": {
"text/plain": [
"array(['Sex', 'Age', 'Parch', 'Pclass=3', 'Embarked=Q', 'Embarked=S'], dtype=object)"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print('Selected features')\n",
"fnames = df_prep.columns[1:].values\n",
"fnames[rfe.support_]"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Sex 2.372815\n",
"Age -0.554682\n",
"Parch -0.243279\n",
"Pclass=3 -0.931482\n",
"Embarked=Q -0.329946\n",
"Embarked=S -0.338861\n",
"dtype: float64"
]
},
"execution_count": 66,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.Series(index=fnames[rfe.support_], data=rfe.estimator_.coef_[0])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Embedded methods"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Train a random forest on the data\n",
"* Print the feature importances and compare them with the results of the Filter and Wrapper approaches"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"## Your code here"
]
},
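{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of an embedded method, assuming synthetic data from `make_classification` instead of the Titanic features above (the names `X_syn`, `y_syn`, `rf` and `importances` are hypothetical). A random forest computes impurity-based importances while it trains, so the feature ranking is a by-product of fitting the model. With `shuffle=False` the informative features are the first three columns, so one would expect them near the top of the ranking."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch on synthetic data (assumption: not the Titanic frame df_prep used above)\n",
"import pandas as pd\n",
"from sklearn.datasets import make_classification\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"\n",
"# 8 features, 3 of them informative; shuffle=False keeps those in columns 0-2\n",
"X_syn, y_syn = make_classification(n_samples=500, n_features=8, n_informative=3,\n",
"                                   n_redundant=0, shuffle=False, random_state=0)\n",
"\n",
"rf = RandomForestClassifier(n_estimators=200, random_state=0)\n",
"rf.fit(X_syn, y_syn)\n",
"\n",
"# Impurity-based importances are computed during training (they sum to 1)\n",
"importances = pd.Series(rf.feature_importances_,\n",
"                        index=['f%d' % i for i in range(X_syn.shape[1])])\n",
"importances.sort_values(ascending=False)"
]
},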
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Principal Component Analysis (PCA)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"# PCA\n",
"\n",
"* Reduces the number of variables by building new ones that retain as much of the variance of the data as possible\n",
"* The new variables are linear combinations of the original variables\n",
"* Amounts to a change to an orthonormal basis\n",
"\n",
""
]
},
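{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small sketch (the data and the names `X3`, `pca3` are made up for illustration) that checks the three bullets above with scikit-learn's `PCA`: the rows of `components_` are orthonormal, each new variable is a linear combination of the centered original variables, and `explained_variance_ratio_` shows how much variance each component keeps. Since the third feature is nearly a copy of the first, the last component should carry almost no variance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.decomposition import PCA\n",
"\n",
"rng = np.random.RandomState(0)\n",
"# Hypothetical 3-D data: the third feature is almost a copy of the first\n",
"Z = rng.randn(300, 2)\n",
"X3 = np.column_stack([Z[:, 0], Z[:, 1], Z[:, 0] + 0.1 * rng.randn(300)])\n",
"\n",
"pca3 = PCA(n_components=3).fit(X3)\n",
"\n",
"# Rows of components_ form an orthonormal basis\n",
"gram = pca3.components_.dot(pca3.components_.T)\n",
"# New variables = linear combinations of the centered old ones\n",
"pc_manual = (X3 - X3.mean(axis=0)).dot(pca3.components_.T)\n",
"\n",
"pca3.explained_variance_ratio_"
]
},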
{
"cell_type": "markdown",
"metadata": {
"heading_collapsed": true
},
"source": [
"## FYI (look through this at home if you are interested)"
]
},
{
"cell_type": "markdown",
"metadata": {
"hidden": true,
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Deriving PCA\n",
"* Let $x \\in \\mathbb{R}^d$ be the feature vector of an object. Assume that $x$ is centered and scaled: $E[x_i] = 0, V[x_i] = 1, \\quad i=1 \\dots d$\n",
"* We look for a linear transformation given by an orthogonal matrix $A$:\n",
"$$ pc = A^\\top x $$\n",
"\n",
"* $pc_i = a_i^\\top x = x^\\top a_i$, where $a_i$ is the $i$-th column of $A$\n",
"* $cov[x] = E[(x - E[x])(x - E[x])^\\top] = E[xx^\\top] = \\Sigma$ is the covariance matrix"
]
},
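{
"cell_type": "markdown",
"metadata": {},
"source": [
"The setup above can be sketched in NumPy (the random data and the names `Xs`, `Sigma`, `A` are illustrative assumptions): standardize the columns, estimate $\\Sigma = E[xx^\\top]$ by its sample average, and take the eigenvectors of $\\Sigma$ as the columns of the orthogonal matrix $A$. Applied to all rows at once, $pc = A^\\top x$ becomes `Xs.dot(A)`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.RandomState(1)\n",
"mix = np.array([[2., 0., 0.], [1., 1., 0.], [0., 0.5, 0.3]])\n",
"raw = rng.randn(500, 3).dot(mix)   # hypothetical correlated features\n",
"\n",
"# Center and scale: E[x_i] = 0, V[x_i] = 1\n",
"Xs = (raw - raw.mean(axis=0)) / raw.std(axis=0)\n",
"\n",
"# Sample estimate of Sigma = E[x x^T]\n",
"Sigma = Xs.T.dot(Xs) / Xs.shape[0]\n",
"\n",
"# eigh is meant for symmetric matrices and returns ascending eigenvalues\n",
"eigvals, A = np.linalg.eigh(Sigma)\n",
"order = np.argsort(eigvals)[::-1]\n",
"eigvals, A = eigvals[order], A[:, order]\n",
"\n",
"# Row-wise version of pc = A^T x\n",
"PCs = Xs.dot(A)"
]
},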
{
"cell_type": "markdown",
"metadata": {
"hidden": true,
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"* $E[pc_i] = E[a_i^\\top x] = a_i^\\top E[x] = 0$\n",
"* $cov[pc_i, pc_j] = E[pc_i \\cdot pc_j] = a_i^\\top \\Sigma a_j$\n",
"* $\\Sigma$ is a symmetric positive semi-definite matrix, so:\n",
" * its eigenvalues are real and non-negative: $\\lambda_i \\in \\mathbb{R}, \\lambda_i \\geq 0$ (we assume $\\lambda_1 > \\lambda_2 > \\dots > \\lambda_d$)\n",
" * eigenvectors for $\\lambda_i \\neq \\lambda_j$ are orthogonal: $v_i^\\top v_j = 0$\n",
" * each $\\lambda_i$ has a unique unit eigenvector $v_i$ (up to sign)"
]
},
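{
"cell_type": "markdown",
"metadata": {},
"source": [
"A numerical check of these properties on random data (a sketch; `Xd`, `Sigma_d`, `lam_d`, `V_d` are hypothetical names): $\\Sigma$ is symmetric, its eigenvalues are non-negative, and the matrix of covariances $a_i^\\top \\Sigma a_j$ between the components is diagonal with the eigenvalues on the diagonal, i.e. the components are uncorrelated."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.RandomState(2)\n",
"Xd = rng.randn(400, 4).dot(rng.randn(4, 4))\n",
"Xd = Xd - Xd.mean(axis=0)   # centered, so E[x] = 0\n",
"\n",
"Sigma_d = np.cov(Xd, rowvar=False, bias=True)\n",
"lam_d, V_d = np.linalg.eigh(Sigma_d)\n",
"\n",
"# Entry (i, j) of cov_pc equals a_i^T Sigma a_j\n",
"PC_d = Xd.dot(V_d)\n",
"cov_pc = np.cov(PC_d, rowvar=False, bias=True)"
]
},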
{
"cell_type": "markdown",
"metadata": {
"hidden": true,
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### First component\n",
"$$ pc_1 = a_1 ^\\top x $$\n",
"\n",
"\\begin{equation}\n",
"\\begin{cases}\n",
"V[pc_1] = a_1^\\top \\Sigma a_1 \\rightarrow \\max_a \\\\\n",
"a_1^\\top a_1 = 1\n",
"\\end{cases}\n",
"\\end{equation}\n",
"* Form the Lagrangian\n",
"$$ \\mathcal{L}(a_1, \\nu) = a_1^\\top \\Sigma a_1 - \\nu (a_1^\\top a_1 - 1) \\rightarrow \\max_{a_1, \\nu}$$\n",
"* Take the derivative with respect to $a_1$\n",
"$$ \\frac{\\partial\\mathcal{L}}{\\partial a_1} = 2\\Sigma a_1 - 2\\nu a_1 = 0 $$\n",
"* Hence $\\Sigma a_1 = \\nu a_1$, i.e. $a_1$ is an eigenvector of $\\Sigma$ with some eigenvalue $\\lambda_i$, and\n",
"$$ V[pc_1] = a_1^\\top \\Sigma a_1 = \\lambda_i a_1^\\top a_1 = \\lambda_i, $$\n",
"so the variance is maximized by the eigenvector for the largest eigenvalue $\\lambda_1$."
]
},
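{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see this result numerically, here is a sketch on hypothetical data (the names `Xe`, `a1`, `var_random` are made up): the variance of the projection onto the top eigenvector equals the largest eigenvalue, and no random unit direction does better."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.RandomState(3)\n",
"Xe = rng.randn(1000, 3).dot(np.diag([3., 1., 0.5]))\n",
"Xe = Xe - Xe.mean(axis=0)\n",
"Sigma_e = Xe.T.dot(Xe) / Xe.shape[0]\n",
"\n",
"lam_e, V_e = np.linalg.eigh(Sigma_e)   # ascending eigenvalues\n",
"a1 = V_e[:, -1]                        # eigenvector for the largest one\n",
"var_pc1 = Xe.dot(a1).var()             # equals a1^T Sigma a1\n",
"\n",
"# Variance along 1000 random unit directions a (a^T a = 1)\n",
"dirs = rng.randn(1000, 3)\n",
"dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)\n",
"var_random = Xe.dot(dirs.T).var(axis=0)"
]
},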
{
"cell_type": "markdown",
"metadata": {
"hidden": true,
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Second component\n",
"$$ pc_2 = a_2 ^\\top x $$\n",
"\n",
"\\begin{equation}\n",
"\\begin{cases}\n",
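"V[pc_2] = a_2^\\top \\Sigma a_2 \\rightarrow \\max_a \\\\\n",
"a_2^\\top a_2 = 1 \\\\\n",
"cov[pc_1, pc_2] = a_2^\\top \\Sigma a_1 = \\lambda_1 a_2^\\top a_1 = 0\n",
"\\end{cases}\n",
"\\end{equation}\n",
"* Form the Lagrangian\n",
"$$ \\mathcal{L}(a_2, \\nu, \\tau) = a_2^\\top \\Sigma a_2 - \\nu (a_2^\\top a_2 - 1) - \\tau a_2^\\top a_1 \\rightarrow \\max_{a_2, \\nu, \\tau}$$\n",
"\n",
"Setting the derivative to zero gives $2\\Sigma a_2 - 2\\nu a_2 - \\tau a_1 = 0$. Multiplying on the left by $a_1^\\top$ and using $a_1^\\top \\Sigma a_2 = 0$, $a_1^\\top a_2 = 0$, $a_1^\\top a_1 = 1$ gives $\\tau = 0$. So $\\Sigma a_2 = \\nu a_2$, and by the same argument as for the first component (now restricted to vectors orthogonal to $a_1$), $a_2$ is the eigenvector of $\\Sigma$ for $\\lambda_2$."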
]
},
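{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same kind of check for the second component (again a sketch with made-up data and names): $pc_2$ is uncorrelated with $pc_1$, its variance equals $\\lambda_2$, and among unit directions orthogonal to $a_1$ none has a larger variance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.RandomState(4)\n",
"Xf = rng.randn(1000, 3).dot(np.diag([3., 1., 0.5]))\n",
"Xf = Xf - Xf.mean(axis=0)\n",
"Sigma_f = Xf.T.dot(Xf) / Xf.shape[0]\n",
"\n",
"lam_f, V_f = np.linalg.eigh(Sigma_f)   # ascending eigenvalues\n",
"a1_f, a2_f = V_f[:, -1], V_f[:, -2]\n",
"pc1_f, pc2_f = Xf.dot(a1_f), Xf.dot(a2_f)\n",
"\n",
"# Random unit directions with the a1 component removed\n",
"dirs_f = rng.randn(1000, 3)\n",
"dirs_f -= np.outer(dirs_f.dot(a1_f), a1_f)\n",
"dirs_f /= np.linalg.norm(dirs_f, axis=1, keepdims=True)\n",
"var_orth = Xf.dot(dirs_f.T).var(axis=0)"
]
},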
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Singular Value Decomposition\n",
"\n",
"Any matrix $X$ of size $n \\times m$ can be decomposed as\n",
"$$ X = U S V^\\top ,$$\n",
"where\n",
"* $U$ is an orthogonal (unitary) matrix whose columns are eigenvectors of $XX^\\top$\n",
"* $V$ is an orthogonal (unitary) matrix whose columns are eigenvectors of $X^\\top X$\n",
"* $S$ is a diagonal matrix of singular values $s_i = \\sqrt{\\lambda_i}$, where $\\lambda_i$ are the eigenvalues of $X^\\top X$"
]
},
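{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick NumPy check of the decomposition on a hypothetical random matrix (the names `Xg`, `U_g`, `s_g`, `Vt_g` are illustrative). Note that `np.linalg.svd` returns $V^\\top$ rather than $V$, and with `full_matrices=False` it returns the thin version, in which $U$ has only $\\min(n, m)$ columns."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.RandomState(5)\n",
"Xg = rng.randn(6, 4)\n",
"\n",
"# s_g holds the singular values in descending order; Vt_g is V transposed\n",
"U_g, s_g, Vt_g = np.linalg.svd(Xg, full_matrices=False)\n",
"recon = U_g.dot(np.diag(s_g)).dot(Vt_g)   # X = U S V^T\n",
"\n",
"# Eigenvalues of X^T X, sorted descending to match s_g\n",
"lam_g = np.linalg.eigvalsh(Xg.T.dot(Xg))[::-1]"
]
},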
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The matrices $U$ and $V$ are orthogonal, and $V$ can be used to change to an orthogonal basis:\n",
"$$ XV = US$$\n",
"\n",
"Dimensionality reduction means multiplying $X$ not by the whole matrix $V$ but only by its first $k$ columns: $XV_k = U_k S_k$ (for PCA, the columns of $X$ are centered first)."
]
},
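{
"cell_type": "markdown",
"metadata": {},
"source": [
"The truncation can be sketched as follows (hypothetical data; `k = 2` is an arbitrary choice). Keeping the first $k$ columns of $V$ gives the reduced representation $XV_k = U_k S_k$, and mapping back with $V_k^\\top$ gives a rank-$k$ approximation whose squared Frobenius error equals the sum of the discarded squared singular values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.RandomState(6)\n",
"Xh = rng.randn(100, 5).dot(rng.randn(5, 5))\n",
"Xh = Xh - Xh.mean(axis=0)          # center the columns, as PCA does\n",
"U_h, s_h, Vt_h = np.linalg.svd(Xh, full_matrices=False)\n",
"\n",
"k = 2\n",
"V_k = Vt_h[:k].T                   # first k columns of V\n",
"Z_k = Xh.dot(V_k)                  # reduced data, shape (100, k)\n",
"X_approx = Z_k.dot(V_k.T)          # back in the original 5-D space\n",
"err2 = np.linalg.norm(Xh - X_approx) ** 2"
]
},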
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## MNIST PCA\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Toy example"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from sklearn.decomposition import PCA\n",
"from numpy.linalg import svd\n",
"from sklearn.datasets import load_digits"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"C = np.array([[0., -0.7], [1.5, 0.7]])\n",
"X = np.dot(np.random.randn(200, 2) + np.array([4, 2]), C)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"plt.scatter(X[:, 0], X[:, 1])\n",
"plt.axis('equal')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"pca = PCA(n_components=2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"PC = pca.fit_transform(X)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"coef = pca.components_\n",
"coef"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"m = np.mean(X,axis=0)\n",
"\n",
"fig, ax = plt.subplots(1,2)\n",
"\n",
"ax[0].plot([0, coef[0,0]*2]+m[0], [0, coef[0,1]*2]+m[1],'--k')\n",
"ax[0].plot([0, coef[1,0]*2]+m[0], [0, coef[1,1]*2]+m[1],'--k')\n",
"ax[0].scatter(X[:,0], X[:,1])\n",
"ax[0].set_xlabel('$x_1$')\n",
"ax[0].set_ylabel('$x_2$')\n",
"\n",
"ax[1].scatter(PC[:,0], PC[:,1])\n",
"ax[1].set_xlabel('$pc_1$')\n",
"ax[1].set_ylabel('$pc_2$')\n",
"\n",
"ax[0].axis('equal')\n",
"ax[1].axis('equal')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Now do the same via SVD"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"## Your Code Here"
]
},
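{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible way to do this (a sketch, not the only solution; it regenerates the toy data under hypothetical names such as `X_toy` with a fixed seed, so it does not depend on the cells above): center the data, take the SVD, project onto $V$, and compare with `PCA.fit_transform`. The columns agree only up to sign, because singular vectors are defined up to a factor of $-1$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.decomposition import PCA\n",
"\n",
"rng = np.random.RandomState(0)\n",
"C_toy = np.array([[0., -0.7], [1.5, 0.7]])\n",
"X_toy = np.dot(rng.randn(200, 2) + np.array([4, 2]), C_toy)\n",
"\n",
"pc_sklearn = PCA(n_components=2).fit_transform(X_toy)\n",
"\n",
"# PCA via SVD: center first, then project onto the right singular vectors\n",
"Xc = X_toy - X_toy.mean(axis=0)\n",
"U_t, s_t, Vt_t = np.linalg.svd(Xc, full_matrices=False)\n",
"pc_svd = Xc.dot(Vt_t.T)\n",
"\n",
"# Column signs are arbitrary, so align them before comparing\n",
"signs = np.sign(np.sum(pc_svd * pc_sklearn, axis=0))\n",
"pc_svd_aligned = pc_svd * signs"
]
},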
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Digits"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"digits = load_digits()\n",
"X = digits.images\n",
"y = digits.target"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"plt.imshow(X[2,:], cmap='Greys', interpolation='none')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Task\n",
"* Convert the images to an object-feature matrix (reshape)\n",
"* Run PCA with two components and plot the resulting points on the plane, coloring each point according to its label `y`\n",
"* Center the data, run SVD, multiply `X` by the appropriate matrix and check that you get the same result (up to the signs of the components)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"### Your Code Here"
]
},
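{
"cell_type": "markdown",
"metadata": {},
"source": [
"A sketch of the first two steps (the names `digits_`, `X_flat`, `Z_digits` are hypothetical; the plotting line is left as a comment). `images` has shape `(n_samples, 8, 8)`, so flattening each image gives 64 pixel features; the result should coincide with `digits.data`, which scikit-learn already stores in this flattened form."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.datasets import load_digits\n",
"from sklearn.decomposition import PCA\n",
"\n",
"digits_ = load_digits()\n",
"n_images = digits_.images.shape[0]\n",
"# (n_samples, 8, 8) -> (n_samples, 64): one row per object, one column per pixel\n",
"X_flat = digits_.images.reshape(n_images, -1)\n",
"\n",
"Z_digits = PCA(n_components=2).fit_transform(X_flat)\n",
"# plt.scatter(Z_digits[:, 0], Z_digits[:, 1], c=digits_.target, cmap='tab10')"
]
},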
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Nutritional value of food"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Load the dataset on diets in different countries of the world, `diet.csv`\n",
"* Apply PCA with 2 components to the data\n",
"* Plot the objects in the reduced space"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"df = pd.read_csv('diet.csv', sep=';')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"df = df.dropna(axis=1)\n",
"df = df.drop('Energy (kcal/day)', axis=1)\n",
"df = df.set_index('Countries')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"X = df.values\n",
"X = (X - X.mean(axis=0))/X.std(axis=0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"## Your Code Here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* You will most likely find some outliers; this cannot be helped, since PCA is sensitive to outliers\n",
"* Remove the outlying objects and repeat the procedure\n",
"* Try to interpret the principal components"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"## Your Code Here"
]
},
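{
"cell_type": "markdown",
"metadata": {},
"source": [
"A small illustration of this sensitivity (synthetic data; the point `(0, 60)` is an arbitrary, deliberately extreme outlier): a single object is enough to rotate the first principal component away from the long axis of the cloud and toward the outlier."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.decomposition import PCA\n",
"\n",
"rng = np.random.RandomState(7)\n",
"# Elongated cloud: large spread along x, small along y\n",
"X_clean = rng.randn(200, 2) * np.array([3., 0.5])\n",
"# The same cloud plus one extreme point far out along y\n",
"X_out = np.vstack([X_clean, [[0., 60.]]])\n",
"\n",
"pc1_clean = PCA(n_components=1).fit(X_clean).components_[0]\n",
"pc1_out = PCA(n_components=1).fit(X_out).components_[0]\n",
"pc1_clean, pc1_out"
]
},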
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Bonus: T-distributed stochastic neighbor embedding"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"* [Derivation](http://jmlr.csail.mit.edu/papers/volume9/vandermaaten08a/vandermaaten08a.pdf)\n",
"* [Examples](http://lvdmaaten.github.io/tsne/)\n",
"* [Demo](http://distill.pub/2016/misread-tsne/)"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python [conda root]",
"language": "python",
"name": "conda-root-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.13"
},
"livereveal": {
"theme": "serif",
"transition": "concave",
"width": "1024px"
},
"nav_menu": {},
"toc": {
"navigate_menu": true,
"number_sections": false,
"sideBar": true,
"threshold": 6,
"toc_cell": false,
"toc_section_display": "none",
"toc_window_display": false
},
"toc_position": {
"height": "973px",
"left": "0px",
"right": "1708px",
"top": "109px",
"width": "212px"
}
},
"nbformat": 4,
"nbformat_minor": 2
}