{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Data Analysis Minor, Group ИАД-2\n", "## 26/04/2017 Clustering Algorithms" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true, "slideshow": { "slide_type": "notes" } }, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "%matplotlib inline\n", "\n", "plt.style.use('ggplot')\n", "plt.rcParams['figure.figsize'] = (12,5)\n", "\n", "# Font with Cyrillic support for plot labels\n", "font = {'family': 'Verdana',\n", " 'weight': 'normal'}\n", "plt.rc('font', **font)\n", "\n", "# ipywidgets is optional: it is only needed for interactive controls\n", "try:\n", " from ipywidgets import interact, IntSlider, fixed, FloatSlider\n", "except ImportError:\n", " print('ipywidgets not found; that is fine')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Nutritional Value of Foods" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load the file `food.txt`. It contains information on the nutritional value of various foods." ] }, { "cell_type": "raw", "metadata": {}, "source": [ "# \"Name\" is the name of the item.\n", "#\n", "# \"Energy\" is the number of calories.\n", "#\n", "# \"Protein\" is the amount of protein in grams.\n", "#\n", "# \"Fat\" is the amount of fat in grams.\n", "#\n", "# \"Calcium\" is the amount of calcium in milligrams.\n", "#\n", "# \"Iron\" is the amount of iron in milligrams." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Prepare the data for clustering: normalize the features\n", "* Run hierarchical clustering on this dataset\n", "* Plot the dendrogram\n", "* Choose the number of clusters and interpret them\n", "\n", "Why do the features need to be normalized before clustering?"
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from scipy.cluster.hierarchy import dendrogram, fcluster, linkage" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "df = pd.read_csv('food.txt', sep=' ')" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
 | Name | Energy | Protein | Fat | Calcium | Iron |
---|---|---|---|---|---|---|
0 | Braised beef | 340 | 20 | 28 | 9 | 2.6 |
1 | Hamburger | 245 | 21 | 17 | 9 | 2.7 |
2 | Roast beef | 420 | 15 | 39 | 7 | 2.0 |
3 | Beefsteak | 375 | 19 | 32 | 9 | 2.6 |
4 | Canned beef | 180 | 22 | 10 | 17 | 3.7 |