{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# pandas 기초 (2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 싸이그래머 : 생물심리Py\n",
    "* 발표자 : 김무성    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 차례"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 추천 사이트\n",
    "* pandas로 데이터 다루기"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 추천 사이트"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 통계 기초 ppt : Basic statistics: a survival guide - https://education.med.imperial.ac.uk/ext/intercalate11-12/statistics.ppt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# pandas로 데이터 다루기"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 기본 연산, 함수, 정렬, 중복색인\n",
    "* 기본 자료 특성 탐색 \n",
    "* 누락값 처리\n",
    "* 계층적 색인\n",
    "* 로우 색인과 컬럼 색인 교환\n",
    "* Panel 데이터\n",
    "* 데이터 입출력\n",
    "* 데이터 다듬기\n",
    "* 데이터 그룹 연산"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 기본 연산, 함수, 정렬, 중복 색인"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 산술연산\n",
    "* 함수 적용과 매핑\n",
    "* 데이터 정렬, 순위\n",
    "* 중복 색인"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 산술연산"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* pandas에서 중요한 기능은 색인이 다른 객체 간의 산술연산이다. \n",
    "* 객체를 더할 때 짝이 맞지 않는 색인이 있다면 결과에 두 색인이 통합된다"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from pandas import Series, DataFrame\n",
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    7.3\n",
       "c   -2.5\n",
       "d    3.4\n",
       "e    1.5\n",
       "dtype: float64"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s1 = Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd', 'e'])\n",
    "s1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a   -2.1\n",
       "c    3.6\n",
       "e   -1.5\n",
       "f    4.0\n",
       "g    3.1\n",
       "dtype: float64"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s2 = Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])\n",
    "s2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    5.2\n",
       "c    1.1\n",
       "d    NaN\n",
       "e    0.0\n",
       "f    NaN\n",
       "g    NaN\n",
       "dtype: float64"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 더하기. 겹치는 색인이 없다면 데이터는 NA 값이 됨. \n",
    "s1 + s2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "      <th>d</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Ohio</th>\n",
       "      <td> 0</td>\n",
       "      <td> 1</td>\n",
       "      <td> 2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td> 3</td>\n",
       "      <td> 4</td>\n",
       "      <td> 5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Colorado</th>\n",
       "      <td> 6</td>\n",
       "      <td> 7</td>\n",
       "      <td> 8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          b  c  d\n",
       "Ohio      0  1  2\n",
       "Texas     3  4  5\n",
       "Colorado  6  7  8"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# DataFrame은 로우와 컬럼 모두에 전파됨\n",
    "df1 = DataFrame(np.arange(9.).reshape((3, 3)), columns=list('bcd'),\n",
    "                index=['Ohio', 'Texas', 'Colorado'])\n",
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>b</th>\n",
       "      <th>d</th>\n",
       "      <th>e</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Utah</th>\n",
       "      <td> 0</td>\n",
       "      <td>  1</td>\n",
       "      <td>  2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Ohio</th>\n",
       "      <td> 3</td>\n",
       "      <td>  4</td>\n",
       "      <td>  5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td> 6</td>\n",
       "      <td>  7</td>\n",
       "      <td>  8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Oregon</th>\n",
       "      <td> 9</td>\n",
       "      <td> 10</td>\n",
       "      <td> 11</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        b   d   e\n",
       "Utah    0   1   2\n",
       "Ohio    3   4   5\n",
       "Texas   6   7   8\n",
       "Oregon  9  10  11"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2 = DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'),\n",
    "                index=['Utah', 'Ohio', 'Texas', 'Oregon'])\n",
    "df2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "      <th>d</th>\n",
       "      <th>e</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Colorado</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Ohio</th>\n",
       "      <td>  3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>  6</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Oregon</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td>  9</td>\n",
       "      <td>NaN</td>\n",
       "      <td> 12</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Utah</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           b   c   d   e\n",
       "Colorado NaN NaN NaN NaN\n",
       "Ohio       3 NaN   6 NaN\n",
       "Oregon   NaN NaN NaN NaN\n",
       "Texas      9 NaN  12 NaN\n",
       "Utah     NaN NaN NaN NaN"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1 + df2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>b</th>\n",
       "      <th>d</th>\n",
       "      <th>e</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Utah</th>\n",
       "      <td> 0</td>\n",
       "      <td>  1</td>\n",
       "      <td>  2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Ohio</th>\n",
       "      <td> 3</td>\n",
       "      <td>  4</td>\n",
       "      <td>  5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td> 6</td>\n",
       "      <td>  7</td>\n",
       "      <td>  8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Oregon</th>\n",
       "      <td> 9</td>\n",
       "      <td> 10</td>\n",
       "      <td> 11</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        b   d   e\n",
       "Utah    0   1   2\n",
       "Ohio    3   4   5\n",
       "Texas   6   7   8\n",
       "Oregon  9  10  11"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# DataFrame과 Series 간의 연산\n",
    "frame = DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'),\n",
    "                  index=['Utah', 'Ohio', 'Texas', 'Oregon'])\n",
    "frame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "b    0\n",
       "d    1\n",
       "e    2\n",
       "Name: Utah, dtype: float64"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "series = frame.ix[0]\n",
    "series"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>b</th>\n",
       "      <th>d</th>\n",
       "      <th>e</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Utah</th>\n",
       "      <td> 0</td>\n",
       "      <td> 0</td>\n",
       "      <td> 0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Ohio</th>\n",
       "      <td> 3</td>\n",
       "      <td> 3</td>\n",
       "      <td> 3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td> 6</td>\n",
       "      <td> 6</td>\n",
       "      <td> 6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Oregon</th>\n",
       "      <td> 9</td>\n",
       "      <td> 9</td>\n",
       "      <td> 9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        b  d  e\n",
       "Utah    0  0  0\n",
       "Ohio    3  3  3\n",
       "Texas   6  6  6\n",
       "Oregon  9  9  9"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 기본적으로 DataFrame 과 Series 간의 산술연산은 \n",
    "# Series의 색인을 DataFrame의 칼럼에 맞추고 아래 로우로 전파\n",
    "# 즉 broadcasting\n",
    "frame - series"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "b    0\n",
       "e    1\n",
       "f    2\n",
       "dtype: int64"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 만약 색인 값을 DataFrame의 칼럼이나 Series의 색인에서 찾을 수 없다면 \n",
    "# 그 객체는 형식을 맞추기 위해 재색인\n",
    "series2 = Series(range(3), index=['b', 'e', 'f'])\n",
    "series2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>b</th>\n",
       "      <th>d</th>\n",
       "      <th>e</th>\n",
       "      <th>f</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Utah</th>\n",
       "      <td> 0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>  3</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Ohio</th>\n",
       "      <td> 3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>  6</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td> 6</td>\n",
       "      <td>NaN</td>\n",
       "      <td>  9</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Oregon</th>\n",
       "      <td> 9</td>\n",
       "      <td>NaN</td>\n",
       "      <td> 12</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        b   d   e   f\n",
       "Utah    0 NaN   3 NaN\n",
       "Ohio    3 NaN   6 NaN\n",
       "Texas   6 NaN   9 NaN\n",
       "Oregon  9 NaN  12 NaN"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame + series2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 함수 적용과 매핑"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>b</th>\n",
       "      <th>d</th>\n",
       "      <th>e</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Utah</th>\n",
       "      <td> 1.230536</td>\n",
       "      <td> 0.954398</td>\n",
       "      <td> 0.779991</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Ohio</th>\n",
       "      <td> 0.786592</td>\n",
       "      <td>-0.289047</td>\n",
       "      <td> 0.694158</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td> 0.374949</td>\n",
       "      <td> 0.655600</td>\n",
       "      <td> 0.573002</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Oregon</th>\n",
       "      <td>-1.460670</td>\n",
       "      <td>-1.101064</td>\n",
       "      <td>-0.326258</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               b         d         e\n",
       "Utah    1.230536  0.954398  0.779991\n",
       "Ohio    0.786592 -0.289047  0.694158\n",
       "Texas   0.374949  0.655600  0.573002\n",
       "Oregon -1.460670 -1.101064 -0.326258"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame = DataFrame(np.random.randn(4, 3), columns=list('bde'),\n",
    "                  index=['Utah', 'Ohio', 'Texas', 'Oregon'])\n",
    "frame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>b</th>\n",
       "      <th>d</th>\n",
       "      <th>e</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Utah</th>\n",
       "      <td> 1.230536</td>\n",
       "      <td> 0.954398</td>\n",
       "      <td> 0.779991</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Ohio</th>\n",
       "      <td> 0.786592</td>\n",
       "      <td> 0.289047</td>\n",
       "      <td> 0.694158</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td> 0.374949</td>\n",
       "      <td> 0.655600</td>\n",
       "      <td> 0.573002</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Oregon</th>\n",
       "      <td> 1.460670</td>\n",
       "      <td> 1.101064</td>\n",
       "      <td> 0.326258</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               b         d         e\n",
       "Utah    1.230536  0.954398  0.779991\n",
       "Ohio    0.786592  0.289047  0.694158\n",
       "Texas   0.374949  0.655600  0.573002\n",
       "Oregon  1.460670  1.101064  0.326258"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# NumPy의 유니버셜 함수(배열의 각 원소에 적용되는 메서드) 적용 가능\n",
    "np.abs(frame)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "b    2.691205\n",
       "d    2.055462\n",
       "e    1.106249\n",
       "dtype: float64"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# apply를 통해 각 로우나 칼럼의 1차원 배열에 함수를 적용\n",
    "f = lambda x: x.max() - x.min()\n",
    "\n",
    "frame.apply(f)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Utah      0.450544\n",
       "Ohio      1.075639\n",
       "Texas     0.280651\n",
       "Oregon    1.134412\n",
       "dtype: float64"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame.apply(f, axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>b</th>\n",
       "      <th>d</th>\n",
       "      <th>e</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>-1.460670</td>\n",
       "      <td>-1.101064</td>\n",
       "      <td>-0.326258</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td> 1.230536</td>\n",
       "      <td> 0.954398</td>\n",
       "      <td> 0.779991</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            b         d         e\n",
       "min -1.460670 -1.101064 -0.326258\n",
       "max  1.230536  0.954398  0.779991"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# apply 메서드에 전달된 함수는 스칼라 값을 반환할 필요 없으며, \n",
    "# Series 또는 여러 값을 반환해도 된다.\n",
    "def f(x):\n",
    "    return Series([x.min(), x.max()], index=['min', 'max'])\n",
    "\n",
    "frame.apply(f)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>b</th>\n",
       "      <th>d</th>\n",
       "      <th>e</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Utah</th>\n",
       "      <td>  1.23</td>\n",
       "      <td>  0.95</td>\n",
       "      <td>  0.78</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Ohio</th>\n",
       "      <td>  0.79</td>\n",
       "      <td> -0.29</td>\n",
       "      <td>  0.69</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Texas</th>\n",
       "      <td>  0.37</td>\n",
       "      <td>  0.66</td>\n",
       "      <td>  0.57</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Oregon</th>\n",
       "      <td> -1.46</td>\n",
       "      <td> -1.10</td>\n",
       "      <td> -0.33</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            b      d      e\n",
       "Utah     1.23   0.95   0.78\n",
       "Ohio     0.79  -0.29   0.69\n",
       "Texas    0.37   0.66   0.57\n",
       "Oregon  -1.46  -1.10  -0.33"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 각 원소에 함수가 적용되게 하고 싶으면 applymap\n",
    "format = lambda x: '%.2f' % x\n",
    "\n",
    "frame.applymap(format)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 데이터 정렬, 순위"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 색인을 기준으로 정렬\n",
    "* 값을 기준으로 정렬\n",
    "* 특정 로우나 컬럼만 정렬"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 색인을 기준으로 정렬"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "d    0\n",
       "a    1\n",
       "b    2\n",
       "c    3\n",
       "dtype: int64"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obj = Series(range(4), index=['d', 'a', 'b', 'c'])\n",
    "obj"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    1\n",
       "b    2\n",
       "c    3\n",
       "d    0\n",
       "dtype: int64"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obj.sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>d</th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>three</th>\n",
       "      <td> 0</td>\n",
       "      <td> 1</td>\n",
       "      <td> 2</td>\n",
       "      <td> 3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>one</th>\n",
       "      <td> 4</td>\n",
       "      <td> 5</td>\n",
       "      <td> 6</td>\n",
       "      <td> 7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       d  a  b  c\n",
       "three  0  1  2  3\n",
       "one    4  5  6  7"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame = DataFrame(np.arange(8).reshape((2, 4)), index=['three', 'one'],\n",
    "                  columns=['d', 'a', 'b', 'c'])\n",
    "frame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>d</th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>one</th>\n",
       "      <td> 4</td>\n",
       "      <td> 5</td>\n",
       "      <td> 6</td>\n",
       "      <td> 7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>three</th>\n",
       "      <td> 0</td>\n",
       "      <td> 1</td>\n",
       "      <td> 2</td>\n",
       "      <td> 3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       d  a  b  c\n",
       "one    4  5  6  7\n",
       "three  0  1  2  3"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame.sort_index()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "      <th>d</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>three</th>\n",
       "      <td> 1</td>\n",
       "      <td> 2</td>\n",
       "      <td> 3</td>\n",
       "      <td> 0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>one</th>\n",
       "      <td> 5</td>\n",
       "      <td> 6</td>\n",
       "      <td> 7</td>\n",
       "      <td> 4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       a  b  c  d\n",
       "three  1  2  3  0\n",
       "one    5  6  7  4"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame.sort_index(axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>d</th>\n",
       "      <th>c</th>\n",
       "      <th>b</th>\n",
       "      <th>a</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>three</th>\n",
       "      <td> 0</td>\n",
       "      <td> 3</td>\n",
       "      <td> 2</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>one</th>\n",
       "      <td> 4</td>\n",
       "      <td> 7</td>\n",
       "      <td> 6</td>\n",
       "      <td> 5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       d  c  b  a\n",
       "three  0  3  2  1\n",
       "one    4  7  6  5"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame.sort_index(axis=1, ascending=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 값을 기준으로 정렬"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    4\n",
       "1    7\n",
       "2   -3\n",
       "3    2\n",
       "dtype: int64"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obj = Series([4, 7, -3, 2])\n",
    "obj"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2   -3\n",
       "3    2\n",
       "0    4\n",
       "1    7\n",
       "dtype: int64"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obj.order()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     4\n",
       "1   NaN\n",
       "2     7\n",
       "3   NaN\n",
       "4    -3\n",
       "5     2\n",
       "dtype: float64"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 정렬시 NaN은 가장 마지막에 위치\n",
    "obj = Series([4, np.nan, 7, np.nan, -3, 2])\n",
    "obj"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4    -3\n",
       "5     2\n",
       "0     4\n",
       "2     7\n",
       "1   NaN\n",
       "3   NaN\n",
       "dtype: float64"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obj.order()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 특정 로우나 컬럼만 정렬"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 0</td>\n",
       "      <td> 4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 1</td>\n",
       "      <td> 7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 0</td>\n",
       "      <td>-3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td> 1</td>\n",
       "      <td> 2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   a  b\n",
       "0  0  4\n",
       "1  1  7\n",
       "2  0 -3\n",
       "3  1  2"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame = DataFrame({'b': [4, 7, -3, 2],\n",
    "                   'a': [0, 1, 0, 1]})\n",
    "frame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 0</td>\n",
       "      <td>-3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td> 1</td>\n",
       "      <td> 2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 0</td>\n",
       "      <td> 4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 1</td>\n",
       "      <td> 7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   a  b\n",
       "2  0 -3\n",
       "3  1  2\n",
       "0  0  4\n",
       "1  1  7"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame.sort_index(by='b')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 0</td>\n",
       "      <td> 4</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 1</td>\n",
       "      <td> 7</td>\n",
       "      <td>-2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 0</td>\n",
       "      <td>-3</td>\n",
       "      <td> 0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td> 1</td>\n",
       "      <td> 2</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   a  b  c\n",
       "0  0  4  1\n",
       "1  1  7 -2\n",
       "2  0 -3  0\n",
       "3  1  2  1"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 여러 개의 컬럼 정렬\n",
    "frame2 = DataFrame({'b': [4, 7, -3, 2],\n",
    "                   'a': [0, 1, 0, 1],\n",
    "                   'c': [1, -2, 0, 1]})\n",
    "frame2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 0</td>\n",
       "      <td>-3</td>\n",
       "      <td> 0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 0</td>\n",
       "      <td> 4</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 1</td>\n",
       "      <td> 7</td>\n",
       "      <td>-2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td> 1</td>\n",
       "      <td> 2</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   a  b  c\n",
       "2  0 -3  0\n",
       "0  0  4  1\n",
       "1  1  7 -2\n",
       "3  1  2  1"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame2.sort_index(by=['a', 'c'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 기본 자료 특성 탐색"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 자료특성 요약\n",
    "* 상관관계와 공분산\n",
    "* 유일 값, 카운트, 멤버십"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 자료특성 요약"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>one</th>\n",
       "      <th>two</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td> 1.40</td>\n",
       "      <td> NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td> 7.10</td>\n",
       "      <td>-4.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>  NaN</td>\n",
       "      <td> NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td> 0.75</td>\n",
       "      <td>-1.3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    one  two\n",
       "a  1.40  NaN\n",
       "b  7.10 -4.5\n",
       "c   NaN  NaN\n",
       "d  0.75 -1.3"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = DataFrame([[1.4, np.nan], [7.1, -4.5],\n",
    "                [np.nan, np.nan], [0.75, -1.3]],\n",
    "               index=['a', 'b', 'c', 'd'],\n",
    "               columns=['one', 'two'])\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>one</th>\n",
       "      <th>two</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td> 3.000000</td>\n",
       "      <td> 2.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td> 3.083333</td>\n",
       "      <td>-2.900000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td> 3.493685</td>\n",
       "      <td> 2.262742</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td> 0.750000</td>\n",
       "      <td>-4.500000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td> 1.075000</td>\n",
       "      <td>-3.700000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td> 1.400000</td>\n",
       "      <td>-2.900000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td> 4.250000</td>\n",
       "      <td>-2.100000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td> 7.100000</td>\n",
       "      <td>-1.300000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            one       two\n",
       "count  3.000000  2.000000\n",
       "mean   3.083333 -2.900000\n",
       "std    3.493685  2.262742\n",
       "min    0.750000 -4.500000\n",
       "25%    1.075000 -3.700000\n",
       "50%    1.400000 -2.900000\n",
       "75%    4.250000 -2.100000\n",
       "max    7.100000 -1.300000"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "one    9.25\n",
       "two   -5.80\n",
       "dtype: float64"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    1.40\n",
       "b    2.60\n",
       "c     NaN\n",
       "d   -0.55\n",
       "dtype: float64"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.sum(axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a      NaN\n",
       "b    1.300\n",
       "c      NaN\n",
       "d   -0.275\n",
       "dtype: float64"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.mean(axis=1, skipna=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "one    b\n",
       "two    d\n",
       "dtype: object"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.idxmax()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     a\n",
       "1     a\n",
       "2     b\n",
       "3     c\n",
       "4     a\n",
       "5     a\n",
       "6     b\n",
       "7     c\n",
       "8     a\n",
       "9     a\n",
       "10    b\n",
       "11    c\n",
       "12    a\n",
       "13    a\n",
       "14    b\n",
       "15    c\n",
       "dtype: object"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 수치 데이터가 아니면 빈도 관련 기술요약치를 반환\n",
    "obj = Series(['a', 'a', 'b', 'c'] * 4)\n",
    "obj"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "count     16\n",
       "unique     3\n",
       "top        a\n",
       "freq       8\n",
       "dtype: object"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obj.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 상관관계와 공분산"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# 주식 데이터 가져오기\n",
    "import pandas.io.data as web\n",
    "\n",
    "all_data = {}\n",
    "for ticker in ['AAPL', 'IBM', 'MSFT']:\n",
    "    all_data[ticker] = web.get_data_yahoo(ticker, '1/1/2000', '1/1/2010')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>AAPL</th>\n",
       "      <th>IBM</th>\n",
       "      <th>MSFT</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Date</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2000-01-03</th>\n",
       "      <td> 3.77</td>\n",
       "      <td> 94.16</td>\n",
       "      <td> 41.47</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2000-01-04</th>\n",
       "      <td> 3.45</td>\n",
       "      <td> 90.97</td>\n",
       "      <td> 40.07</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2000-01-05</th>\n",
       "      <td> 3.50</td>\n",
       "      <td> 94.16</td>\n",
       "      <td> 40.49</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2000-01-06</th>\n",
       "      <td> 3.20</td>\n",
       "      <td> 92.54</td>\n",
       "      <td> 39.14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2000-01-07</th>\n",
       "      <td> 3.35</td>\n",
       "      <td> 92.13</td>\n",
       "      <td> 39.65</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            AAPL    IBM   MSFT\n",
       "Date                          \n",
       "2000-01-03  3.77  94.16  41.47\n",
       "2000-01-04  3.45  90.97  40.07\n",
       "2000-01-05  3.50  94.16  40.49\n",
       "2000-01-06  3.20  92.54  39.14\n",
       "2000-01-07  3.35  92.13  39.65"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "price = DataFrame({tic: data['Adj Close']\n",
    "                   for tic, data in all_data.iteritems()})\n",
    "price.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>AAPL</th>\n",
       "      <th>IBM</th>\n",
       "      <th>MSFT</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Date</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2000-01-03</th>\n",
       "      <td> 133949200</td>\n",
       "      <td> 10347700</td>\n",
       "      <td> 53228400</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2000-01-04</th>\n",
       "      <td> 128094400</td>\n",
       "      <td>  8227800</td>\n",
       "      <td> 54119000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2000-01-05</th>\n",
       "      <td> 194580400</td>\n",
       "      <td> 12733200</td>\n",
       "      <td> 64059600</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2000-01-06</th>\n",
       "      <td> 191993200</td>\n",
       "      <td>  7971900</td>\n",
       "      <td> 54976600</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2000-01-07</th>\n",
       "      <td> 115183600</td>\n",
       "      <td> 11856700</td>\n",
       "      <td> 62013600</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 AAPL       IBM      MSFT\n",
       "Date                                     \n",
       "2000-01-03  133949200  10347700  53228400\n",
       "2000-01-04  128094400   8227800  54119000\n",
       "2000-01-05  194580400  12733200  64059600\n",
       "2000-01-06  191993200   7971900  54976600\n",
       "2000-01-07  115183600  11856700  62013600"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "volume = DataFrame({tic: data['Volume']\n",
    "                    for tic, data in all_data.iteritems()})\n",
    "volume.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>AAPL</th>\n",
       "      <th>IBM</th>\n",
       "      <th>MSFT</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Date</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2009-12-24</th>\n",
       "      <td> 0.034521</td>\n",
       "      <td> 0.004432</td>\n",
       "      <td> 0.002600</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2009-12-28</th>\n",
       "      <td> 0.012070</td>\n",
       "      <td> 0.013323</td>\n",
       "      <td> 0.005187</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2009-12-29</th>\n",
       "      <td>-0.011575</td>\n",
       "      <td>-0.003517</td>\n",
       "      <td> 0.007372</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2009-12-30</th>\n",
       "      <td> 0.012065</td>\n",
       "      <td> 0.005463</td>\n",
       "      <td>-0.013904</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2009-12-31</th>\n",
       "      <td>-0.004208</td>\n",
       "      <td>-0.012621</td>\n",
       "      <td>-0.015584</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                AAPL       IBM      MSFT\n",
       "Date                                    \n",
       "2009-12-24  0.034521  0.004432  0.002600\n",
       "2009-12-28  0.012070  0.013323  0.005187\n",
       "2009-12-29 -0.011575 -0.003517  0.007372\n",
       "2009-12-30  0.012065  0.005463 -0.013904\n",
       "2009-12-31 -0.004208 -0.012621 -0.015584"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 각 주식의 퍼센트 변화율\n",
    "returns = price.pct_change()\n",
    "returns.tail()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.text.Text at 0x4ca69d0>"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": [
       "iVBORw0KGgoAAAANSUhEUgAAAZcAAAEPCAYAAACOU4kjAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\n",
       "AAALEgAACxIB0t1+/AAAIABJREFUeJztnX94HdV55z+vbAQyFrZlG9uAgUYJUQguiKaps86u1Bbb\n",
       "JN11Au6G0KarJinQbovBFomhUHAbsSENhpSkWxbKDzdpfj3Nksc0qYRhkRvSNCktGIeEJKSB8jsx\n",
       "JC00SgXRu3+cGd25c+dKV/fO/SHp+3meeTQz95yZc0fS+c553/e8x9wdIYQQIk/amt0AIYQQcw+J\n",
       "ixBCiNyRuAghhMgdiYsQQojckbgIIYTIHYmLEEKI3GmquJjZmWb2iJl9x8x2ZnzeY2ZfMbOfmNlg\n",
       "6rPHzOwhM3vAzL7WuFYLIYSYjoXNurGZLQA+BpwBPAX8g5ntdfdvJoo9D1wIvD3jEg70u/sLdW+s\n",
       "EEKIGdHMkcsbgUfd/TF3fxn4NPC2ZAF3/4G73w+8XOYaVuc2CiGEqIJmisuxwBOJ4yejc5XiwN1m\n",
       "dr+ZnZdry4QQQtRE08xiBHGohQ3u/oyZrQT2mdkj7v6lPBomhBCiNpopLk8BaxPHawmjl4pw92ei\n",
       "nz8wszsIZrYicTEzJU4TQogqcPea3A7NNIvdD7zGzE40s3bgHGBvmbJFX9LMFplZZ7R/JLAJOJhV\n",
       "0d1barvqqqua3obZ0KZWbZfapDbNh3blQdNGLu7+ipn9HjACLABucfdvmtkF0ef/x8xWA/8AHAVM\n",
       "mNlFwMnA0cD/NTMI3+Ev3f2uZnwPIYQQpTTTLIa7/w3wN6lz/yex/yzFprOYl4DT6ts6IYQQ1aIZ\n",
       "+g2mv7+/2U0ooRXbBK3ZLrWpMtSmymnVdtWK5WVfa0XMzOfy9xNCiHpgZvgsdugLIYSYo0hchBBC\n",
       "5I7ERQghRO5IXIQQQuSOxEUIIUTuSFyEEELkjsRFCCFE7khchBBC5I7ERQghRO5IXIQQQuSOxEWI\n",
       "WcjIyAibNm1l06atjIyMNLs5QpSg3GJCzDJGRkY466wBxsY+BEBHx07uuGMPmzdvbnLLxFwhj9xi\n",
       "EhchZhmbNm1l374twEB0Zg8bN+7lrrs+18xmiTmEElcKMW85CGyNtsxFWFsCme/mLxIXIaqkWR1n\n",
       "X9/pwM3Almi7OTrXWsTmu337trBv3xbOOmtAAjOPaOpKlELMVtJ+j/vuG2iY32P//n8CbqBgFoP9\n",
       "+/dy+eV1v/WM2L37puj5hHaOjYVz8g3NDzRyEaIKijvOIDK7d9/UtPYcOvS8zE+ipdDIRYhZxuDg\n",
       "+dx33wBjY+G4vf19PPzwy4yPfwRo7ChqKtLt7OjYyeDgnqa2STQORYsJUQXNDgceGRmZHCkdOvQc\n",
       "DzxwHq0YPZZs5+Dg+U0XPFEZCkWeBomLqCet0nEqNFnkjcRlGiQuYj7Q7FGUmHvM+nkuZnammT1i\n",
       "Zt8xs50Zn/eY2VfM7CdmNjiTukLMFzZv3swdd4TRysaNeyUsoiVo2sjFzBYA3wLOAJ4C/gE4192/\n",
       "mSizEjgBeDvwQ3ffXWndqJxGLkIIMUNm+8jljcCj7v6Yu78MfBp4W7KAu//A3e8HXp5pXSGEEM2j\n",
       "meJyLPBE4vjJ6Fy96wohhKgzzZznUou9quK6u3btmtzv7++nv7+/htsKIaB1IuVEPoyOjjI6Oprr\n",
       "NZvpc1kP7HL3M6Pjy4AJd/9QRtmrgJcSPpeK6srnIkT+KDpt7jPbfS73A68xsxPNrB04B9hbpmz6\n",
       "S86krhAiR1ot9Y1oTZomLu7+CvB7wAjwDeAz7v5NM7vAzC4AMLPVZvYEsB24wsz+xcwWl6vbnG8i\n",
       "xOxCafBFI9AkSiHmEXmYtGQWm/tohv40SFyEKCavVDFy6M9t8hAXZUUWQsyYzZs3S1DElEhchJhH\n",
       "KA2+aBQyiwkxz5BJS0yHfC7TIHERQoiZM9vnuQgh6kirhhy3artEvmjkIkSDaKQ5qlXDhVu1XaKY\n",
       "PEYuuPuc3cLXE6IxDA8P+8aNZ/vGjWf78PBwyWcdHascbne43Ts6VpWUyZONG8+O7uXRdrtv3Hh2\n",
       "3e4329slion6zpr6X0WLCZED6Tfy++4bKHojL06ZAmNj4Zze2MVcRT4XIXKgXvm2qvVPDA6eT0fH\n",
       "TmAPsCcKOT6/5vbUSqu2S+SPxEWIBlBNpxqPhvbt28K+fVs466yBigUmvfTx5ZdfyO7dNzXdia4l\n",
       "mecRtdrVWnlDPhfRICrxqUzlk8kiL/9Eo/09YvaDfC5CtAbxG3khGqz0jbxZKVPk7xHNQOIiRE7k\n",
       "LR6tnqpFM/3FVMjnIkSDmKlzvhL/RCXX7Os7HdhG7O+BbdG56toV16nWHyTmCbXa1Vp5Qz4X0SLU\n",
       "w+9R6TWD72bQ4exoG5z03VTbLs1XmduQg89FIxchciZrJFCPUOWZXXMd8LloW1flNYSoHPlchKiC\n",
       "cv6GcpMpm0k9fDet7g8SLUCtQ59W3pBZTNSBqUxJ5cxFjTCLtbUt86GhobJls8Kga2nXTEOrxeyB\n",
       "HMxiTReAem4SF1EPpvI3TPVZPTrjoaEhb2tb7rDeYbAq0ZJIiDR5iIvMYmLek2dI7VTmompDldPt\n",
       "AyaPDx16jomJ3RTmsKyb8RwWLVks6kKt6tTKGxq5iGmYziyU9VZfTZ282tfevtTb21cWmcJCJFjp\n",
       "SEmIakFmMYmLqI3pzFjlRGQqH0byfK1CU9q+9SXtDWYxpXYR+ZGHuDTVLGZmZwIfARYAf+7uH8oo\n",
       "cwPwFuDHwG+6+wPR+ceAfwN+Crzs7m9sVLtFZVx99dVcd91tAOzY8W4uv/zyJrdoZkyVNiXLlJSO\n",
       "FNu//zeAlxkf/whQmoY/L0499RRWrNgLMGmC27Rpa3SsmfOiSdSqTtVuBEF5FDgROAx4EHhdqsxb\n",
       "gS9G+78A/H3is+8BXdPcIwcNF9UwNDTkcNTkGzUcVTaSqZlUE/lVjqzyYaRRWf1K2pc2i7W3r/Te\n",
       "3g1FI6V0BFlv7waNZhrEXAmOYDabxYA3AcOJ40uBS1NlbgTOSRw/Aqzygrgsn+YeeTxnUQVdXd0l\n",
       "HW1XV3ezm5VJuQ5hppFY9RCXrPYNDQ15V1e3d3au9YULlxQJY3f3aZltkLms/syl7NOzXVx+Fbg5\n",
       "cfwu4KOpMncC/ylxfDdwerT/z8ADwP3AeWXukc+TFjNmtohLZfM/Bh2WeUfH6ilHX6WjjJXe3r50\n",
       "ys5mpm+6xYJ3SuoZh3aWisvZcvQ3gLmUEicPcWmmz8UrLGdlzr/Z3Z82s5XAPjN7xN2/lC60a9eu\n",
       "yf3+/n76+/tn2k5RBTt2vJsrrtiWOLONHTve37T2QHZIb7mliQv+ltXATuB6xsbgiiu2cdttn+FP\n",
       "//TDmSn1i9PufxygbBr+q6++miuvvJ6JidcAx3LPPb/Oqaeewgc/eFmmn2RkZIQrr9zNxMT10ZnB\n",
       "5KfAZ4BVwPsS57cDnwKeneHTKkVZkOcuo6OjjI6O5nvRWtWp2g1YT7FZ7DJgZ6rMjcA7E8eTZrFU\n",
       "uauAwYzztUu4qJrYfNPV1d10f0uWyaK3t6/sm2bhLTTb1FWryWN4eDgKI45HRkf5dCOcMBpc7zCc\n",
       "GKksjX6uKBplwfEOPdFWu4mmkSaf2eq3kFmsdcxiC4HvEhz67Uzv0F9P5NAHFgGd0f6RwJeBTRn3\n",
       "yO1hi9lNlskiy3QXd2i9vX2R+amnrJkpdqRP1QmW6yiL21M+ZczGjWd7b++GIic+rIoE5nY36/TO\n",
       "zuMjUdmQMIuFIIo1a47PpZNulMlntnfQs1UY08xqcQnt5y3AtwhRY5dF5y4ALkiU+Vj0+QEK/pZX\n",
       "RWL0IPD1uG7G9XN72GJ2k9U59vZuKOnIhoaGis6FEcWijI59MDHyKD/ayLr+xo1np4Qtq219Ud1B\n",
       "h+NSI5YweopzifX2bohGLqVzYMyW5dLJNUpc5pLfYjYz68Wl3pvERcSUeyNOv2lmmcpiZ34YFQx6\n",
       "8cTF8p1g1joq0BX9TJrCis1i8YgDtnowe5WOWLq6uidFo9DmLBPecbl0zo0aUUhcWoM8xEW5xcS8\n",
       "YKo17pPp8g8c+HpG7dfy5jcfQ1/f6Vx33W2Mj3+S8fGFjI/fSHD4h/r//M//XDR58dCh54C/Ba6N\n",
       "rnMJcDTwCcKKkNDWdhETEwuAZcBtUbljeOaZ54B7CHOMBwgO+5+J9v+dE074Oe6//352776Jxx9/\n",
       "Mqp3PvGEz8L9VlT1vNJM9fzyRKn85xC1qlMrb2jkMm/Iw9ZdGGkkzWJhtNLb2xf5PZLO89uj/UGH\n",
       "JQ5HFr3ZZ885SY4yBt1sSeJaK1Mjla7o53CqTfE9jyozCuryEKZ8lLe3L60o71kr+QpaqS3zFWQW\n",
       "k7iIyk02yU4r9n0kO7CCSWY46vzXO/R4R8cqX7PmxOi429OJImG1F3weBb9IsV9lOFV/vZtlzUlJ\n",
       "TrqMI79KfSkFE9jZk2W7urq9u3udd3Ss8YULj/bu7pMrWrtltjvRRf5IXCQuwiuz05dOiiwN/U13\n",
       "ssH38SofGBhIdPJx+O8Gj/0fhRFGPGroi0Y7G8qMdo7ytrbDE20oJy63e4hWO64CcSkNUEinhin3\n",
       "nLKDHfqa88sULUEe4iKfi8iVRk60i+/1j/94ANhS9Nk//uMBTj+9H3iFFStWcejQ84kklFuBG0gn\n",
       "pLzrrs/xjnecyZ49FwM9wEaeeWaYPXs+D/xJdOWdwHsJ0e/vIuRT/R0Kky3fHX12C8uXv4GJiR8T\n",
       "/CuvJumfmZjYHtW9JNHqbcB/EDIjQQiE/BlCbtaLEuUuidq+DTgPuIS2ttt59NEjixJtjo/DAw/c\n",
       "CGzhvvsG6OnpqfjZHjjwdUZGRjRRUlSNxKVJzMXZzuXWj6/Hdyu+188QOtqYbbzwwnm88MI6YAew\n",
       "HrPRiq575533EZzoTwLXExzt51DsKL8ROAbYANxKEJO7CGLzCSB8/7vvvoiQnzV26A8QO/LhtdHP\n",
       "/wp8gCAqY0AH8NvRZzuADxME6RLCbPtjgRW0td3Ob/zGWTz00N9z4MA3mJi4nhdfvDHjGx0DxA7y\n",
       "m+no2JnpLL/nnnOZmIjr7GRi4jdnvOiYEEXUOvRp5Y0WNYvNVRt3I8NIS+81GCVzjCcUJk1IfQlz\n",
       "VqlZrL195eTzD36SZI6u2IGenGOy3GGoxNRVmusry8y13oNzfjAyeaXn1MRtj300K6Jyi0o+6+rq\n",
       "jua4JP066WCDQruT5rG0szxcZ31kZhuu6+9OtD7I5zI7xWWuxvJX+r3yi+wqvVfWrPvgRI/9F2d7\n",
       "8JcsjjrT9Q6LfPHiNT40NBQtFbA04xqxU/0ohxM8e+Z+uXrJ4y4P81e6PESYZZUfjq4Vty+eyDno\n",
       "6cix0pUoByPR6Zs2aWb6d1LrC4+ivOYOEheJS0tRSQeV16it3HUKs9WTb+893t6+0hcuPDLqrJMO\n",
       "8uHEuUVRPrQTMjr9Li/k8UqPMjzaP9JLRzPp2f2DkdgNe/ZIZ1kkXOnvcIKHEVMscnFE2ykJoRws\n",
       "ygLQ29tX5NCv5JlWKw5zdTQ+X5G4tLC4TPWPOpf/EafroPIU1qx7DQ0NuVlhVGK22Ht7N/jQ0FAi\n",
       "P1eygy6eP9LRscKhw9Mz5sMoI21uS85JWerFnf76SGxiQTvbgyltvcNaD6Onnoz7HJ4QlrS4rYlE\n",
       "MD0XJ8x7aWtb5gMDAyWLhTUiaehcfWGar0hcWlRcKn2Dn48mhHp2QoXnHuaRtLUtn+xYi+8b+yay\n",
       "5o8sT4ww+ryQriXL93JKJBoneiEcOXmt9V7wq5SbfLnYCyawuEzW5MtVDl2R+JVfsyXLLNjWtnzy\n",
       "b6xef3cSl7lFHuKiaLE6MNXa6zFZa7DPB+qZ3iM893cRFik9homJDezf/09cfjkcOvR8ouRmYACz\n",
       "2wjvIAUWLJjgpz8dj46+A5wEbIyOXw38FiF67FbgPcAh4AVCSPDFiSv9LtAJvAL8CvBZQtTYQKLM\n",
       "XkJe1huBr0Tn1gE3p64VhxyvY2wsGbpcGRMTr5mMTKxXNJ/Stog0EhfRMOLw656eVwO3sWLF8lxz\n",
       "VIVcXqPAddGZHRw69DquvvpqHnzwIUJo753Aw8D3WbBgglde2Z64wvZIWE4jdPA3ROffCRxGCAuG\n",
       "0PEfA/w5IdQ4njOzLLpHW3T+mqj8NmAcOEjIEXYT8HRUBkLY81bgdOCvo+PXE+a2HBbdayPxHBm4\n",
       "hra27YnQ4TDvpaNjJzt2XMiVV24vCisOIdLfq+ilp1oalXtMzCJqHfq08kYLm8XmE8XrowzW7Zlk\n",
       "5fJqb+9ys6Rv5KjIFJV0zJ/kcLTDER58JGmzU5b5LL7m6tRnJ5cxW4WAgYL5LPajxFFg6czIi7wQ\n",
       "TRabxeKMAOsnHfVZTvvipZAHJ5+1TFeiUpDPpTXFxX3++lTSlKZUKXSS03VsM32G2WHIWalT1ntI\n",
       "EjnsISw4jvyK/TBpMckSl9UefDLJNC7xtbLKry1z/pRoP93xl5szs2IySGG60OJ0HrX0omPz/aVH\n",
       "lEfi0sLiIgJZb8txRxqLS5aIVDP6K4QhD0Yd8XIPo5Ks+w96WAo4Ln926rNFkTCtdjgsOi7Miyl2\n",
       "8g94CC+ORyVDXjoJ88QyQpclLsOeHSDQ5WFkVfnoL/0c29uXem9v37x/6RFTI3GRuLQ82eJSWIO+\n",
       "nIhUY8IJEyDTYcRpU9QqL51dv8ILqfCHow48fY2kaW2xw7pIUE7xEFbsXhwmHAvcUg/mtmVR+eR1\n",
       "l0bC1ROJ2JJEvXS48RIvjIyGPDlhciqhkClMVEMe4iKHvqgr6SiitrbtnHrqyXzwg8Hhu2nT1kwn\n",
       "80wZGRnh6qs/CpxKyM01kPj0ZuAPgH8nJJb8a0ojt64iOMavJUR5/TohmgvCqtpxssgRgsP+GQo5\n",
       "w7YBa4EfALdQWOCLqN4thDxlAIMER/9LhASYfwZ8n0IQwjagPfoOeyg4/9cA9xGixm4D3gDs4YUX\n",
       "rmXfvvrmcROiGiQuoq6URhF9qqIOMCu0ta/vwqKVHpMrSP7ar/0uY2M/AxyecbVnCQLxVUKSyR9k\n",
       "lPk3Qvbhi4EJQsd+LSHCa5QQLrya0NmfTKmAbScknXwlqhPzZYrFhui6hxOiy7qAodTngwSRuYGQ\n",
       "7XkQ+Mvoe9xISHK5i6RAlov8Uohw6zAXk9VOSa1Dn1bekFms5ckyiyUd0LF/YGhoqKKFrkqjrGJn\n",
       "fXJ/2It9IukyscksPYO/KzrOcszH51Z4sU8my3cSrwcTO/9jn0/s7+n2Uh/POo9XvDzssMXe2bk2\n",
       "dd1B7+w8vmxGiFrSukxXV8Er0zPbIkiRz0XiMhdIRzZV6oOJhae08+6JOvB4oa5CBxxE5bhIgI7z\n",
       "EIK8wYPPZa0X0rUUAg+Kr73Cg88l6cdJz9xf78GHsjQShqR/J25THNk25KUpYDrKiFdID9PWtsS7\n",
       "u0/2QjDCKZ7MexanfKm1029krri5zmzzfeUhLjKLiRlRj6F9MlvBTHwwBw58nVNPPSXjk5OBSwn+\n",
       "jnVxywmmrtj3sY0w0/4x4G6Cn2Ux8K+ENVZ2Aiszrr2CYJ76SVTmp1Fb089hBbAEeBz4EXAF0A18\n",
       "PKr/8ajcP5FcuCxwRcZ9jyGY4q5lYmIx3/vet6LveWviO+0E9jAxcT1/8AfbOeywhYyPh4mf1fhk\n",
       "Kpl0Wc+JmWJ2I3ERFZNeDGz//t/g9a8/iRUrVtXdhjw4eD533/2ruMcLYh1kYmIFBw4cJPg7DhKE\n",
       "JHSwoQM/ksIiYjdS6sS/mTBbP15lMk6t8jfRtf6O4pUiLybMmP9IonxfdJ11iXNjgAO/Twge+AWC\n",
       "3+Y/gH0EJ/4awmz+o0mvoglLo+8Rk/xOTwOvY2Li+qj+9anvdBOwBfcjGR8v+HLU6TeXeen7qnXo\n",
       "U8sGnAk8QkjitLNMmRuizw8AvTOsm88YUbh7+bDiPM0h5cwsw8PDiZn28TyUpLlpiSfngBTS3ccT\n",
       "I6da5yX9fVZ7wSczHJXrcTgmo3zsJ+nyMK+mx4N/JL5WZ6qd6TDnJV6clj8+LjV3FSfUnGqCaJy2\n",
       "vzYzjMxi+TKbfFPMZp8LIbHSo8CJhNfBB4HXpcq8FfhitP8LwN9XWtclLrlTfkJkdZ1XObL+CYvv\n",
       "Hae0z5qQuCz62RPtr4i2rV48V+UoDz6WrM55aer6sV8ka9b82al6izz4ZOI1WdLpYcqlkkk677cm\n",
       "rtfl6RUiCytUDnlxwEEsKj3e3r60qtn46Wcvh/78JA9xaaZZ7I3Ao+7+GICZfRp4G/DNRJktRIuO\n",
       "u/tXzWypma0mLJo+XV2RM+mhfTABfSL3+1SfMXoxYY35J4EegtX3W8Ai4P8Bv0wwj32b4P/4FsG0\n",
       "FHMJ8CLBnPYYcCHB3PZlwgD6jymY2Yj2N1JstrqYMH/ly4TszB0Vtv27wMvA71CYP7MH+J+EP+vf\n",
       "jq5/EYsXL+KllwaAywnzXXZF3/k9tLXdGs0jCqa7mSSSTJs9Yz/NXXd9bsp68zXDt5iaZorLscAT\n",
       "ieMnCaOT6cocS/BuTldX5ExyzsqhQ8/z8MOvMD7+LLBnynkoeVAsbAY8RLE/ZBvBp3E4hUmIB4F/\n",
       "oeBTuZgwyfIkgjX1fxPmrnyQ0LlPEIQontB4CcE/EgvEfxAmMcaTK88jCMAnCA79PQRR+zLhT/RJ\n",
       "QtBA0n/yECHbccxFhLkxvww8RcF3Q9T+doJY3Qg8wsDA2zn33HMjEQhl29u/G/m+vpc5j2j37psm\n",
       "RWaq34mc8yJPmikuXmE5q+Umu3btmtzv7++nv7+/lsvNe5JvqcnIsb6+C7n66o9OuVZILZFmsbBd\n",
       "dtkHOHDgG0xMHB998gHgOEJHfwth9BI7srdSiMQaIfy5v4HQ+cc56TdH2yWECK4/pthBfiPwdULQ\n",
       "wLGEzj8eWVxCiBbbRXDUf4KQ3v726NzXovLJmfbHE9Z3uZjwp/1bUZnthIF6cmSUnu2/h4ceuo3b\n",
       "b09PTP145rMsNxKRWIg0o6OjjI6O5nvRWu1q1W7AemA4cXwZKcc84T/7nYnjR4BVldR1+VwaynRx\n",
       "/DNdnTOeSJm24xfuUy64oC/lm0nuJ/N1xZMY4wCBctmMY1/IUodXZdTJyv21wbOd91nzYZL3WuvF\n",
       "EypLfTzJVSVr/Z2kkXNexDDLfS73A68xsxMJr3TnAOemyuwFfg/4tJmtB37k7s+Z2fMV1BUtxHQm\n",
       "l/Rb9r59hdUXs9+4z6d4hBGX30j4c4DgmotHAk9HW9yGvYR3lL2EP8UbCGas5MjhEoIf5DMEf8fe\n",
       "6JrvJ5ir0vNTdhLMYY9HZT9HCFO+hDBSejel82GSvETw9cQ+jksIIxoIJrJbmZg4lssu+0BdRh9a\n",
       "8EvkSq3qVMsGvIXgVX0UuCw6dwFwQaLMx6LPDwCnT1U34/o56biYjuneegtv0cMeR3v19m7I+Dz5\n",
       "Jr+hpOzw8HAiCioOAT7Nw7oqi7wQvtsT1R3yQnRVMhVLsj3LE6OhONQ3Dj/e4MUjjThKa7mXtndp\n",
       "NHpJp9tf7HCCF4ckx+nzk+Xi0VAynHogakMy0m2JDw0N1fw7EaIczOZQ5EZsEpfGMlVIahCFpUUd\n",
       "bHv7yjKhxu6FVC2hrNnSKNfYBl+4cIkXm6tOjjridBqVZCcdm63iTjrePy7aL5czrMcLc0diAYvr\n",
       "J+8Xz0Hp8pBif2203xOVjevG7T4qOhffN2ku604IXdbiZeXNY/Gqn11d3d7bu6GseVGIqZC4SFxm\n",
       "Fb29fSWdZOwDCEvzJt/4l5aULe1oh6PjIz2MJJKd9NboXJw7LBaJDi8sL5zs6IcyRgiLMgQrHgGd\n",
       "FglTeg5KPNflyEg44uWTY7GL5+IsisQlK7FlYe5QV1e3d3Yen/ks0v6T4lFdGA21ty+VqIgZk4e4\n",
       "KP2LaBgrViwvOXfo0POTa7FMTLwHuJG2tu+watXRPPNMuvQxif0Rgm9lDSEK7DcJUWDnEsKRf0zw\n",
       "iUDwoxjBp3IEIeXKIUK4MhTm61xKmFeyjbBmy+GUplfZS4jq2ksIH76FwhyUeC2WXyZEj30oqnMx\n",
       "haivawnRY1cQotqepOBXidsahzhvY8eO9/OGN7yBt771XCbiADd2EqLSvlf0dHbvvinKJVZo7/j4\n",
       "jU0NJ553aeZFgVrVqZU35tnIpdVnSpd7sw7LExe/mff2bkil0o8jrYaj/RO8kNolvWpjnDIl+aZ/\n",
       "vBfSvWSlfenykLH4CC+YvZZllEvOyD/Kg0luZVR/qxcix5L3zzK3JVO3xCOwOEtzIf1+8cguHp0N\n",
       "ZvpPyqXnqTRzQt5/P/L5zF6QWUziEjNb/pGDkPREHXyfx0v1ZpnL4s6ut7cv8teE77Zw4ZEJc1K5\n",
       "VDDpMN/jEmLRl1E+ThsTO/ljh37STJZ0uscpZdJ+l+GM+xf7jxYsWBaJWNrBH/uNCv6XdDj3VD6t\n",
       "3t4NifxrMzOL1ePvZ7almRcF6iouhOXvdkQ/09uOWm/ciG0+iUvWP3JXV3fLCUwQl+L167u71yU6\n",
       "tkE3W+adncd7b++GyfYXhGZDNPpJCkFW5NbyxD2O8rBWfZxEMna8x58v8xCVtcwLo6B4RBQLWZyv\n",
       "LB6hlBO1sxPXTIrSVo8XG4sd7WadUTtXefD3JIVpZvnAgviuj9q5uOT5TUc9hEDiMnupt7hMAA8A\n",
       "/4uwwHjRVuuNG7HNd3GB9S03gsly6vf29pV98zZbXNRJFoc0xya29AgimahypYdRS1ckMEd4IYor\n",
       "Dls+PHEcC0y8CuWgF5JgdkfHaYEqfubh8yO8OCw6jmwrmLqGh4czR23lVpTMYnh42BcvXuNpwU6G\n",
       "eSfLlhv51EMIZstoWpRSb3E5jeCRfJCwItFGoK3WGzZym0/iUrrcb8FE0yybexZTdWLFnyX9ED2p\n",
       "FSnjWewbog5/qQdfRTxDf0OibtoXc1TJ/eEkL14aOenfidP1b/WCKa/HQ7bjk714hLLEgy8oTpmf\n",
       "/B7LHLaWdLC1dMCFuqWp97u6usuUzb5PvYSg1f2AIpuG+FwIYTb/CfgoIT3rllpv2qhtPomLe/JN\n",
       "uHjeRCXi0qi3zKnuE0xmsUAkRzBhNNHdfbIfcUTsm0g774vn0ITjrBT5WSHOWY772OQVj1LSqVti\n",
       "s9tg4l5iBrX4AAAZHklEQVSxPyZrmeLjypopq+2AC2LcV3K/3t6+MmVLRb3Wdoi5R6PE5Wjgd4H9\n",
       "hPjKN9V600Zt801c3KsXiUbax7M6saxIsjD3JPZtrPHS/F/D0Vv7ai+Yso6PygyVEY3FnjYhFY7T\n",
       "ghMv3LXEs81u8QJh6ci0JRn36PGuru5cO+5sE2Hx5NTSsvX//YrZT73NYu8lTCYYJeT3WlXrzRq9\n",
       "zUdxca/uDbTZnU+2zyjtlI/NYbenxCNpyop9GMMZnfxyL4xECuG+wcyVNIslZ9vf7vECXMH01u2F\n",
       "tDSDnh3WvNpLJ2nGprJ6rdo56G1tyyf9V1OXlf9DTE0jHPoPAXdmbHtrvXEjtvkqLtXQSLNYMvNx\n",
       "b+8G7+rq9o6OrCWE0+HEsbkvXqExq3zsvI/9EIVcZmHUEpu4kuazw6NzKxPbcdEW+17ilSi7JkUi\n",
       "nNvqWaOUsPVF4hMLWrFwT/cSkPcqkDJ7iUqpt7j0R1tfYn/yXK03bsQmcZkZ9e580m/ahTXu4+OC\n",
       "6cksPREx6S85xbPDgNd6YQJjnJolK8V9MlggjjRLi04sHKs8jJKSoctLPYxetnoYpRyZaFNylBK3\n",
       "rbStvb19TXGwC1EJmkQpcZlVFJu+suaJxGam9d7VdaybxevHD3p7+0o/44wzfOHCo70w1ySd9+sE\n",
       "L52tn842nBSj2DR2u2evDxObvEqjsQrluz3MkYlHTIf7GWecUSQM6fXs29qWeXf3ySXXTJohm22m\n",
       "FPObPMSlbG4xMzsJ+H3gBcK6rzcD/4WQ4v633P0fytUVc5v65YtaR8iXtYUXXojzcQFczCuvjPG3\n",
       "f3uAV175MGFtk1vIXnL4yxTWbIn5A8Lyxbcl7hPnE7upgnb9OOPc14D/B7we+ALJ1Smff36Myy+/\n",
       "kOuu+wAAO3aE5ZivvHKQiYmVTEwcy3e/+xTBwjxQcmUh5gTlVIfwX3o+8D7CKkvvICwmvhH4aq2q\n",
       "1ogNjVxypxZzTXF+rK1ebBZLjjLiiYyeMGGl/RZZYcaLPTvUOG3SOsELqfOTo5oss9hRHiZELkl9\n",
       "NpjaL9yvs3NtyTMKk0fTo6rCddIRXjKLiWZCnX0uDyb2Hy33WStvEpf8qdZcUzrJc4kHv8UGD+HD\n",
       "y7ywwFecyHLYSydBxmlS4sSScZqW2CyWXqwryxy2OlEudv6vd1jnwW+zOqq3zONklGaLo+PjMq5X\n",
       "HHiQlSK/MP8oLXxHO6zwjo7VJb4uOeBFs8hDXKZKue+J/Ren+EyIaUkvcwxgth339xAWFH0Pway1\n",
       "hWD22gacTKmJ60bCIBoKKfV/j5CO/wXgROBTBHPX04Sgx3Wp1iwk/An/LGHp4dXRPeIU+TsJZrYR\n",
       "4K+AIAULFgzy05/+OON6j0Rth/b29/HqV5/EAw8UlzjhhNX86EePJNLmxxwNPMvY2DXs20fRks7x\n",
       "JsRsZCpx6TGzg9F+d2IfoLuObRItTF/f6dxzz/bJTrKjYyeDg3umrXfo0PMl59yNxYs/yUsv9QGf\n",
       "J/hFrqCr63B27Hg/V1xxXcaV4jVdfpsgCCOENVp+Pzq/Lfq5hSASmwnrqcTsAMajz2+Mzm0miMMu\n",
       "wvoq7yK4GM8ruvOiRUfy4os/n7reJQShuoaurpf55Cc/DsBZZw0wNhZKdHTs5IMf3MP999/PlVdu\n",
       "TwjMJcBro59BQMfG4LLLPqg1UMTsp9yQhvAKWHardcjUiA2ZxXKlYNoKy++2tS2vaC139zi1SzKl\n",
       "S7hOR0dsgiqYsbq7T3Z3z0jIGOdLixNBnu3Z6fOP8+LVIRd78YTGeHb9Cal7r/RCXrLizzo6VvnQ\n",
       "0FAU9ZW8//DkPZPpXcqZtOLzwXSWnBRaaH/wS8nXIpoHCkWWuDSCuEMMfoPSyYCVEHw1pZ3yggUr\n",
       "SzrXzs7j3T0EAATHe3LyYvF8mOzJlMWdc7yOfHf3aQmhiee/DHpn59qoQ4+d9LGIbfUFC1Z6V1f3\n",
       "pIiWy2RcTZr8glAXBDQs9VzdM07/vuSrEdVSV3EBXiL4WrK2f6v1xo3YJC61U+qIL07gOJOMy2HN\n",
       "kUJH2t6+MhqdlDq/Y4aGhryrq9vb2pZ4IUNxem5MOnXLkIe5MsWJIoPAlTrV4044zKGJhWe4qK1J\n",
       "0cjzmcRr1PT29k3u1zK/RVFmIg80cpG41J1y68SUm1U+XTqTON1Lb2+fDw0NRZMJi81iaVNb8Wgh\n",
       "qz1xqpXC6COrbeEa8RorhZFCcVbmFUXfsVwnn8dorlwCz1rEQZMvRR5IXCQudafcCpdZvoSZdIrF\n",
       "ZqFuD2G+i/2II7qKxKXUfJT8Ga+TEq97P+hmXd7ZubZogbEwvyYZnhwWC4t9RsnllMMSynFqmOk7\n",
       "6bCiZEGssjIST/39s0dG1Zq1JC4iD2atuABdhPT93wbuApaWKXcmIc7zO8DOxPldhLCeB6LtzDL1\n",
       "83rW85ZKRWOmnVph0a/0PJYeh0WTAlN83SGH49xsUZR7rNChx6alYHortHVgYMCzUu/HJrP091u4\n",
       "cIl3dq71zs61vnDhkim/d8HUFy8FsN4XLlzS1CUOZBYTeZCHuLTlEnI2cy4F9rn7ScA90XERZrYA\n",
       "+BhBYE4GzjWz10UfO3Cdu/dG23CD2j0rGBkZYdOmrWzatJWRkZGarrV582buuGMPGzfuZePGvZNz\n",
       "MPJpSzJVywAh3ctSYBHXXPOniXIHCflSrwfOwf1ncf+TyXrj4x8GFrJixXLGxz8yeX5s7F3s2bOX\n",
       "EO5bzM/93Kls3rw5Nf9mNa+84rz44rG8+OKxAPT23lz2e+/efRPj4z2E+TZfAb7CK6/8yWQYcTOo\n",
       "9fclRG7Uqk7VbITRyKpofzXwSEaZNwHDieNLgUuj/auAwQruU6uAzzqa9eaadd8Qurt08q3ebPGk\n",
       "uWp4eDgRclt4e499KgsXHu3uHo08kj6ZVZ4VfhzWMkk7w+Pw4+IRUtLPUjyCSPpcgqM+ay36mKkC\n",
       "BKp5XjNZ3ljRYKKeMIvNYj9M7FvyOHH+V4GbE8fvAj7qBXF5DDhAyGBYzqyWz5OeRTTT5p7u9Iod\n",
       "5HFUVc9kR1rqC4lDgG/37u7TphCg9HVDmpZ0GvtCeHFhbg4sK+PTie9fPnIt6/tmRcDVUyRk9hKN\n",
       "oKXFheBTOZixbUmLCfBCRv2tU4jL0ZEoGTAE3FKmDX7VVVdNbvfee2+Oj781aRWHbiG0Ny0M3UVt\n",
       "CiLU58lFuBYuXD7Z8Wbn41rvwYlfSMkfjzLi0OXOzrW+Zs2rorT9W6ORU1fmpM+CEJb6ZtJr0Zer\n",
       "G0fA1bujb5Xfr5hb3HvvvUV9ZUuLy5Q3DWax1dH+mjJmsfUps9hlJJz6ifMnAgfL3CeP5z6raIU3\n",
       "20IbsjIX9zkMlkScZb3FZzv9j/LFi9f4mjWvKrl2d/e6jPkngx4W81rlZsvKZhTIupfZ0pYbFUhc\n",
       "RCOYzeLyx7FQRL6UazLKLAS+G4lHO/Ag8LroszWJctuBT5a5T06PenbRKJt8ufsUOsBhDylVkiHA\n",
       "xYt8TSV+xWHI8fyUQY/9Jun5Jdkz5/uKzFblBKO4zWHENJW/ZbrnUK/fQSu8PIi5z2wWly7gblKh\n",
       "yISshF9IlHsLIWXuo8BlifN/ATwU+Vw+HwcHZNwnv6ctigj+hoJwJH0NxW/XYT2Wzs7jJ81HM3nz\n",
       "Lp6sOHUOrmBeyzbDTWfqSnfabW3lRzlT1YsDGeopAHLoi3oza8WlUZvEpX5kdeRxpz3V23W1Zp2s\n",
       "ep2da4t8HdlpWXpK6pVz0hcvZjZYkSiUm2Qq05WYzeQhLlOl3BeiLI8//mTZc/Fci0La+MJci8HB\n",
       "87nvvuJ09NOl7B8ZGeHQoedoa0umq9/Giy+eB6xjbGxnyX1Div/X8sgj32Js7KLE1S7hhBNK570A\n",
       "7N//T0xM7KaQ/n4du3ffpHkiQlRDrerUyhvzaOTSaFNJVphxPXwUxaORQW9rW55IVz/9yKAQLhzm\n",
       "vLS3l3fSVzOqaoZZTIh6g8xiEhf35jh5Z9JpV3KtZPvjdC7lsgRX67eZTnirfY6NdOgL0QgkLhIX\n",
       "d29eeGpeHehUmZcLUWGFKK7u7pPrJqYSBSHyERf5XETV1HeN92OAASYmwOxC3PcA1wLwxBPv48or\n",
       "L2L//r1AsU+nGkZGRoqWFb7rrs/V2ngh5j0SlzlANU7yViLd/rCm/CcmP1+8eBkvvvhHxI728XHY\n",
       "v39vVSKQFhKI17v/EAD33TegZI9C5IDEZQ4wVXTWbCAd5fXww68wPv4ssIeOjp28+tU9PPBA7fcZ\n",
       "GRkpEZKenp5EVmQYG0MRYkLkgMRljlBfE1X9Sba/eHQRRmBBFELZakdmxen1g5A8/vgHam+8EKIE\n",
       "iYtoObKEsl4jsxNOWM3Y2M5Za1IUolWxEBgwNzEzn8vfT8yMtFmso2Mnd9wRhCTph5nNI0Ah8sDM\n",
       "cHer6RpzufOVuIg0aYe+hESIUiQu0yBxERITIWaOxGUaJC7zm3JmMAmMEFOTh7i05dUYIVqN4uiw\n",
       "IDLxKAaC+GzatJVNm7YyMjLStHYKMRdRtJiYl2TNedGoRoj8kLiIOctUmQuy5rxo8qQQ+SFxEXOW\n",
       "2Z65QIjZjBz6Yl4iZ78Q5VG02DRIXMRUKExZiGwkLtMgcRFCiJmjUGQhhBAticRFCCFE7khchBBC\n",
       "5E5TxMXMusxsn5l928zuMrOlZcrdambPmdnBauoLIYRoDs0auVwK7HP3k4B7ouMsbgPOrKG+EEKI\n",
       "JtAscdkCxCsy7QHenlXI3b8E/LDa+kK0GspnJuYLzZqhv8rdn4v2nwNWNbi+EA1H+czEfKJu4mJm\n",
       "+4DVGR9dnjxwdzezqiejTFd/165dk/v9/f309/dXeyshakL5zESrMjo6yujoaK7XrJu4uPvGcp9F\n",
       "TvrV7v6sma0Bvj/Dy1dcPykuQgghSkm/eP/hH/5hzddsls9lL/HrW/j5+QbXF6LhDA6eT0fHToKb\n",
       "cE+Upfn8ZjdLiLrQlPQvZtYFfBY4HngMeIe7/8jMjgFudvdficp9CugDlhNGJ1e6+23l6mfcR+lf\n",
       "REuhfGZiNqDcYtMgcRFCiJmj3GJCCCFaEomLEEKI3JG4CCGEyB2JixBCiNyRuAghhMgdiYsQQojc\n",
       "kbgIIYTIHYmLEEKI3JG4CCGEyB2JixBCiNyRuAghhMgdiYsQQojckbgIIYTIHYmLEEKI3JG4CCGE\n",
       "yB2JixBCiNyRuAghhMgdiYsQQojckbgIIYTIHYmLEEKI3JG4CCGEyB2JixBCiNyRuAghhMidpoiL\n",
       "mXWZ2T4z+7aZ3WVmS8uUu9XMnjOzg6nzu8zsSTN7INrObEzLhRBCVEKzRi6XAvvc/STgnug4i9uA\n",
       "LOFw4Dp374224Tq1UwghRBU0S1y2AHui/T3A27MKufuXgB+WuYbVoV1CCCFyoFnissrdn4v2nwNW\n",
       "VXGNC83sgJndUs6sJoQQojksrNeFzWwfsDrjo8uTB+7uZuYzvPyfAX8U7X8A2A28N6vgrl27Jvf7\n",
       "+/vp7++f4a2EyJeRkRF2774JgMHB89m8eXOTWyTmO6Ojo4yOjuZ6TXOfab+ew03NHgH63f1ZM1sD\n",
       "3OvuPWXKngjc6e7rZvq5mXkzvp8Q5RgZGeGsswYYG/sQAB0dO7njjj0SGNFSmBnuXpProVlmsb3A\n",
       "QLQ/AHx+JpUjQYo5CzhYrqwQrcTu3TdFwjIABJGJRzFCzCWaJS7XABvN7NvAL0XHmNkxZvaFuJCZ\n",
       "fQr4O+AkM3vCzN4dffQhM3vIzA4AfcD2xjZfCCHEVDTFLNYoZBYTrYbMYmI2kIdZTOIiRIORQ1+0\n",
       "OhKXaZC4CCHEzJnNDn0hhBBzGImLEEKI3JG4CCGEyB2JixBCiNyRuAghhMgdiYsQQojckbgIIYTI\n",
       "HYmLEEKI3JG4CCGEyB2JixBCiNyRuAghhMgdiYsQQojckbgIIYTIHYmLEEKI3JG4CCGEyB2JixBC\n",
       "iNyRuAghhMgdiYsQQojckbgIIYTIHYmLEEKI3GmKuJhZl5ntM7Nvm9ldZrY0o8xaM7vXzB42s6+b\n",
       "2baZ1BdCCNE8mjVyuRTY5+4nAfdEx2leBra7++uB9cDvmlnPDOq3JKOjo81uQgmt2CZozXapTZWh\n",
       "NlVOq7arVpolLluAPdH+HuDt6QLu/qy7PxjtvwR8Ezi20vqtSiv+IbVim6A126U2VYbaVDmt2q5a\n",
       "aZa4rHL356L954BVUxU2sxOBXuCr1dQXQgjRWBbW68Jmtg9YnfHR5ckDd3cz8ymusxj4K+CiaART\n",
       "xHT1hRBCNB5zb3y/bGaPAP3u/qyZrQHudfeejHKHAX8N/I27f6SK+hIdIYSoAne3WurXbeQyDXuB\n",
       "AeBD0c/PpwuYmQG3AN9ICkul9aH2hyOEEKI6mjVy6QI+CxwPPAa8w91/ZGbHADe7+6+Y2ZuBvwUe\n",
       "AuJGXubuw+XqN/hrCCGEKENTxEUIIcTcZtbP0G/FCZmVXtPMbjWz58zsYOr8LjN70sweiLYzW6BN\n",
       "zXxOZ5rZI2b2HTPbmTif23Mqd49UmRuizw+YWe9M6japXY+Z2UPRs/lao9pkZj1m9hUz+4mZDc70\n",
       "+zShTc16Tr8e/c4eMrMvm9nPVlq3SW2a2XNy91m9AX8MvD/a3wlck1FmNXBatL8Y+BbQU2n9erQp\n",
       "+uw/E0KsD6bOXwXsaPRzmqZNTXlOwALgUeBE4DDgQeB1eT6nqe6RKPNW4IvR/i8Af19p3Wa0Kzr+\n",
       "HtCV899RJW1aCbwBGAIGZ1K30W1q8nN6E7Ak2j+z3n9TtbSpmuc060cutOaEzIqu6e5fAn5Y5hp5\n",
       "ByPU2qZmPac3Ao+6+2Pu/jLwaeBtic/zeE7T3aOore7+VWCpma2usG6j25Wc95X339G0bXL3H7j7\n",
       "/YQsGzOq24Q2xTTjOX3F3f81OvwqcFyldZvQppiKn9NcEJdWnJCZxzUvjIant+RhgsqhTc16TscC\n",
       "TySOn6TwYgD5PKfp7jFVmWMqqFsttbQLQiDM3WZ2v5md18A21aNuPa/bCs/pvcAXq6zbiDbBDJ9T\n",
       "s0KRZ4S14ITMvNpUhj8D/ija/wCwm/CLbmabqqqfQ5umuk9Vz2mG90jS6ND2Wtv1Znd/2sxWAvvM\n",
       "7JFoZNqINuVdt57X3eDuzzTrOZnZLwLvATbMtO4MqaVNMMPnNCvExd03lvvMgvN5tRcmVH6/TLnD\n",
       "gM8Bn3D35LyYiurXo01TXHuyvJn9OXBns9tE857TU8DaxPFawhtX1c9pJveYosxxUZnDKqhbLdW2\n",
       "6ykAd386+vkDM7uDYBaptdOspE31qFu367r7M9HPhj+nyGF+M3Cmu/9wJnUb3KYZP6e5YBaLJ1RC\n",
       "bRMyy9avR5umIupoY84CDpYr26g25VC/2mveD7zGzE40s3bgnKhens+p7D1Sbf0f0X3XAz+KTHqV\n",
       "1K2WqttlZovMrDM6fySwiXz+jmbyfdMjqno9q6rb1MznZGbHA/8XeJe7P1rl92lIm6p6TrVGIDR7\n",
       "A7qAu4FvA3cBS6PzxwBfiPbfDEwQoiMeiLYzp6pf7zZFx58Cngb+g2ALfXd0/i8Ik0cPEDrcVS3Q\n",
       "pmY+p7cQIvweJUykjc/n9pyy7gFcAFyQKPOx6PMDwOnTtS+nv++q2gW8Kvp7fxD4ep7tmq5NBDPo\n",
       "E8C/EoJD/gVYXM9nVW2bmvyc/hx4nkKf9LV6/01V26ZqnpMmUQohhMiduWAWE0II0WJIXIQQQuSO\n",
       "xEUIIUTuSFyEEELkjsRFCCFE7khchBBC5I7ERYgpMLMJM/t44nihmf3AzO6Mjv9bHinRzazNQur8\n",
       "g1Fa869FefCmqvPfzewbZnaPmfWZ2ZtqbYcQeTEr0r8I0UT+HXi9mR3h7j8BNhJSZjiAu99JhWln\n",
       "okwRePbksnOANe6+Lip7DPDjaS75XuC33P3vzGwX8CLwlUraIkS90chFiOn5IvAr0f65hCwGBmBm\n",
       "v2lmH432V5nZHWb2YLStj1JtfMvM9hDSZaw1sw8nRijviK67GngmvqG7P+3R0t1mdm5U9qCZXROd\n",
       "u5KQVPBWM/ssYZb1dgsLOb253g9EiOmQuAgxPZ8B3mlmhwPrKCzXkOYG4F53Pw04HfhGdP7VwJ+6\n",
       "+ynAzwOnAj8LnAF82ML6K58F/lskDtea2WkwOYK5BvhF4DTg583sbe7+R4RcUb/m7u8AbgSuc/de\n",
       "d78v7wcgxEyRuAgxDe5+kLB637nAF6Yo+ouEZQBw9wl3/7fo/OPuHi8LuwH4pAe+D+wH3ujuTwGv\n",
       "BS4j5MG7x8x+iSBGo+7+vLv/FPhL4L+UuX+jlwQQoizyuQhRGXuBa4E+wpK55cjq4P99mjKx/2Yc\n",
       "GAaGzew5wsqcd2fUVUJA0fJo5CJEZdwK7HL3h6cocw/wOwBmtsDMjsoo8yXgnCg6bCVhFPI1M+uN\n",
       "TGCYWRvBdPYY8DWgz8yWm9kC4J2E0U6aF4HO6r6aEPkjcRFiauJRxVPu/rHEOc/Yvwj4RTN7iOAP\n",
       "eV3yGtF17qCwTMA9wPsi89jRwF4zOxh9Ng58zN2fBS4F7iWkO78/ilBLcydwVuSz2ZDxuRANRSn3\n",
       "hRBC5I5GLkIIIXJH4iKEECJ3JC5CCCFyR+IihBAidyQuQgghckfiIoQQInckLkIIIXJH4iKEECJ3\n",
       "/j9j16Co5MuwagAAAABJRU5ErkJggg==\n"
      ],
      "text/plain": [
       "<matplotlib.figure.Figure at 0x4d0eb90>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# MS와 IBM의 산포도\n",
    "fig = plt.figure()\n",
    "ax = fig.add_subplot(1, 1, 1)\n",
    "ax.scatter(returns.MSFT, returns.IBM)\n",
    "ax.set_xlabel('MicroSoft')\n",
    "ax.set_ylabel('IBM')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.49608477623932762"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# MicroSoft와 IBM의 주식 퍼센트 변화율의 상관관계 구하기\n",
    "returns.MSFT.corr(returns.IBM)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.00021609902459138997"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 공분산\n",
    "returns.MSFT.cov(returns.IBM)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>AAPL</th>\n",
       "      <th>IBM</th>\n",
       "      <th>MSFT</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>AAPL</th>\n",
       "      <td> 1.000000</td>\n",
       "      <td> 0.409605</td>\n",
       "      <td> 0.423189</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>IBM</th>\n",
       "      <td> 0.409605</td>\n",
       "      <td> 1.000000</td>\n",
       "      <td> 0.496085</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>MSFT</th>\n",
       "      <td> 0.423189</td>\n",
       "      <td> 0.496085</td>\n",
       "      <td> 1.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          AAPL       IBM      MSFT\n",
       "AAPL  1.000000  0.409605  0.423189\n",
       "IBM   0.409605  1.000000  0.496085\n",
       "MSFT  0.423189  0.496085  1.000000"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "returns.corr()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>AAPL</th>\n",
       "      <th>IBM</th>\n",
       "      <th>MSFT</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>AAPL</th>\n",
       "      <td> 0.001032</td>\n",
       "      <td> 0.000252</td>\n",
       "      <td> 0.000309</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>IBM</th>\n",
       "      <td> 0.000252</td>\n",
       "      <td> 0.000367</td>\n",
       "      <td> 0.000216</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>MSFT</th>\n",
       "      <td> 0.000309</td>\n",
       "      <td> 0.000216</td>\n",
       "      <td> 0.000517</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          AAPL       IBM      MSFT\n",
       "AAPL  0.001032  0.000252  0.000309\n",
       "IBM   0.000252  0.000367  0.000216\n",
       "MSFT  0.000309  0.000216  0.000517"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "returns.cov()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "AAPL    0.409605\n",
       "IBM     1.000000\n",
       "MSFT    0.496085\n",
       "dtype: float64"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "returns.corrwith(returns.IBM)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>AAPL</th>\n",
       "      <th>IBM</th>\n",
       "      <th>MSFT</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Date</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2000-01-03</th>\n",
       "      <td> 133949200</td>\n",
       "      <td> 10347700</td>\n",
       "      <td> 53228400</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2000-01-04</th>\n",
       "      <td> 128094400</td>\n",
       "      <td>  8227800</td>\n",
       "      <td> 54119000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2000-01-05</th>\n",
       "      <td> 194580400</td>\n",
       "      <td> 12733200</td>\n",
       "      <td> 64059600</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2000-01-06</th>\n",
       "      <td> 191993200</td>\n",
       "      <td>  7971900</td>\n",
       "      <td> 54976600</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2000-01-07</th>\n",
       "      <td> 115183600</td>\n",
       "      <td> 11856700</td>\n",
       "      <td> 62013600</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 AAPL       IBM      MSFT\n",
       "Date                                     \n",
       "2000-01-03  133949200  10347700  53228400\n",
       "2000-01-04  128094400   8227800  54119000\n",
       "2000-01-05  194580400  12733200  64059600\n",
       "2000-01-06  191993200   7971900  54976600\n",
       "2000-01-07  115183600  11856700  62013600"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "volume.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "AAPL   -0.057955\n",
       "IBM    -0.007892\n",
       "MSFT   -0.014295\n",
       "dtype: float64"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 시가총액과 주식 퍼센트 변화율간의 상관관계\n",
    "# DataFrame을 넘기면 맞아떨어지는 이름의 칼럼의 이름에 대한 상관관계를 계산\n",
    "returns.corrwith(volume)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 유일 값, 카운트, 멤버십"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    c\n",
       "1    a\n",
       "2    d\n",
       "3    a\n",
       "4    a\n",
       "5    b\n",
       "6    b\n",
       "7    c\n",
       "8    c\n",
       "dtype: object"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 유일 값\n",
    "obj = Series(['c', 'a', 'd', 'a', 'a', 'b', 'b', 'c', 'c'])\n",
    "obj"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['c', 'a', 'd', 'b'], dtype=object)"
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "uniques = obj.unique()\n",
    "uniques"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "c    3\n",
       "a    3\n",
       "b    2\n",
       "d    1\n",
       "dtype: int64"
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 카운트\n",
    "obj.value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "c    3\n",
       "a    3\n",
       "b    2\n",
       "d    1\n",
       "dtype: int64"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.value_counts(obj.values, sort=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    c\n",
       "1    a\n",
       "2    d\n",
       "3    a\n",
       "4    a\n",
       "5    b\n",
       "6    b\n",
       "7    c\n",
       "8    c\n",
       "dtype: object"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 멤버십\n",
    "obj"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     True\n",
       "1    False\n",
       "2    False\n",
       "3    False\n",
       "4    False\n",
       "5     True\n",
       "6     True\n",
       "7     True\n",
       "8     True\n",
       "dtype: bool"
      ]
     },
     "execution_count": 68,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mask = obj.isin(['b', 'c'])\n",
    "mask"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 누락값 처리"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 누락값 찾기 : isnull\n",
    "* 누락값 제거 : dropna\n",
    "* 누락값 채우기 : fillna"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 누락값 찾기 : isnull"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* pandas는 누락된 데이터를 실수든 아니든 모두 NaN으로 취급"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     aardvark\n",
       "1    artichoke\n",
       "2          NaN\n",
       "3      avocado\n",
       "dtype: object"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "string_data = Series(['aardvark', 'artichoke', np.nan, 'avocado'])\n",
    "string_data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    False\n",
       "1    False\n",
       "2     True\n",
       "3    False\n",
       "dtype: bool"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "string_data.isnull()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0         None\n",
       "1    artichoke\n",
       "2          NaN\n",
       "3      avocado\n",
       "dtype: object"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "string_data[0] = None\n",
    "string_data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     True\n",
       "1    False\n",
       "2     True\n",
       "3    False\n",
       "dtype: bool"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "string_data.isnull()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 누락값 제거 : dropna"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    1.0\n",
       "1    NaN\n",
       "2    3.5\n",
       "3    NaN\n",
       "4    7.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from numpy import nan as NA\n",
    "\n",
    "data = Series([1, NA, 3.5, NA, 7])\n",
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    1.0\n",
       "2    3.5\n",
       "4    7.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.dropna()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>  1</td>\n",
       "      <td> 6.5</td>\n",
       "      <td>  3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>  1</td>\n",
       "      <td> NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>NaN</td>\n",
       "      <td> NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>NaN</td>\n",
       "      <td> 9.0</td>\n",
       "      <td>  2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    0    1   2\n",
       "0   1  6.5   3\n",
       "1   1  NaN NaN\n",
       "2 NaN  NaN NaN\n",
       "3 NaN  9.0   2"
      ]
     },
     "execution_count": 77,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = DataFrame([[1., 6.5, 3.], [1., NA, NA],\n",
    "                  [NA, NA, NA], [NA, 9, 2]])\n",
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 1</td>\n",
       "      <td> 6.5</td>\n",
       "      <td> 3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   0    1  2\n",
       "0  1  6.5  3"
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cleaned = data.dropna()\n",
    "cleaned"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>  1</td>\n",
       "      <td> 6.5</td>\n",
       "      <td>  3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>  1</td>\n",
       "      <td> NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>NaN</td>\n",
       "      <td> 9.0</td>\n",
       "      <td>  2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    0    1   2\n",
       "0   1  6.5   3\n",
       "1   1  NaN NaN\n",
       "3 NaN  9.0   2"
      ]
     },
     "execution_count": 79,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 모든 값이 NA인 로우만 제외\n",
    "data.dropna(how='all')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>  1</td>\n",
       "      <td> 6.5</td>\n",
       "      <td> 3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>NaN</td>\n",
       "      <td> 9.0</td>\n",
       "      <td> 2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    0    1  2\n",
       "0   1  6.5  3\n",
       "3 NaN  9.0  2"
      ]
     },
     "execution_count": 80,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 몇개 이상 값이 들어있는가를 기준으로\n",
    "data.dropna(thresh=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 누락값 채우기 : fillna"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>-0.864298</td>\n",
       "      <td>      NaN</td>\n",
       "      <td>      NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-0.056151</td>\n",
       "      <td>      NaN</td>\n",
       "      <td>      NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 0.226936</td>\n",
       "      <td>      NaN</td>\n",
       "      <td>      NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td> 1.825479</td>\n",
       "      <td>      NaN</td>\n",
       "      <td> 0.528780</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>-0.040804</td>\n",
       "      <td>      NaN</td>\n",
       "      <td>-0.996493</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td> 1.586317</td>\n",
       "      <td>-0.004980</td>\n",
       "      <td>-0.039019</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>-0.311639</td>\n",
       "      <td>-2.297522</td>\n",
       "      <td>-0.975941</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          0         1         2\n",
       "0 -0.864298       NaN       NaN\n",
       "1 -0.056151       NaN       NaN\n",
       "2  0.226936       NaN       NaN\n",
       "3  1.825479       NaN  0.528780\n",
       "4 -0.040804       NaN -0.996493\n",
       "5  1.586317 -0.004980 -0.039019\n",
       "6 -0.311639 -2.297522 -0.975941"
      ]
     },
     "execution_count": 83,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = DataFrame(np.random.randn(7, 3))\n",
    "df.ix[:4, 1] = NA; df.ix[:2, 2] = NA\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>-0.864298</td>\n",
       "      <td> 0.000000</td>\n",
       "      <td> 0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-0.056151</td>\n",
       "      <td> 0.000000</td>\n",
       "      <td> 0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 0.226936</td>\n",
       "      <td> 0.000000</td>\n",
       "      <td> 0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td> 1.825479</td>\n",
       "      <td> 0.000000</td>\n",
       "      <td> 0.528780</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>-0.040804</td>\n",
       "      <td> 0.000000</td>\n",
       "      <td>-0.996493</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td> 1.586317</td>\n",
       "      <td>-0.004980</td>\n",
       "      <td>-0.039019</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>-0.311639</td>\n",
       "      <td>-2.297522</td>\n",
       "      <td>-0.975941</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          0         1         2\n",
       "0 -0.864298  0.000000  0.000000\n",
       "1 -0.056151  0.000000  0.000000\n",
       "2  0.226936  0.000000  0.000000\n",
       "3  1.825479  0.000000  0.528780\n",
       "4 -0.040804  0.000000 -0.996493\n",
       "5  1.586317 -0.004980 -0.039019\n",
       "6 -0.311639 -2.297522 -0.975941"
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.fillna(0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>-0.864298</td>\n",
       "      <td> 0.500000</td>\n",
       "      <td>      NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-0.056151</td>\n",
       "      <td> 0.500000</td>\n",
       "      <td>      NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 0.226936</td>\n",
       "      <td> 0.500000</td>\n",
       "      <td>      NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td> 1.825479</td>\n",
       "      <td> 0.500000</td>\n",
       "      <td> 0.528780</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>-0.040804</td>\n",
       "      <td> 0.500000</td>\n",
       "      <td>-0.996493</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td> 1.586317</td>\n",
       "      <td>-0.004980</td>\n",
       "      <td>-0.039019</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>-0.311639</td>\n",
       "      <td>-2.297522</td>\n",
       "      <td>-0.975941</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          0         1         2\n",
       "0 -0.864298  0.500000       NaN\n",
       "1 -0.056151  0.500000       NaN\n",
       "2  0.226936  0.500000       NaN\n",
       "3  1.825479  0.500000  0.528780\n",
       "4 -0.040804  0.500000 -0.996493\n",
       "5  1.586317 -0.004980 -0.039019\n",
       "6 -0.311639 -2.297522 -0.975941"
      ]
     },
     "execution_count": 86,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# key는 컬럼 색인을 뜻함\n",
    "df.fillna({1: 0.5, 3: -1})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 1.070430</td>\n",
       "      <td> 0.929998</td>\n",
       "      <td> 1.464021</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 0.834314</td>\n",
       "      <td> 0.040356</td>\n",
       "      <td>-0.028218</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>-1.251520</td>\n",
       "      <td>      NaN</td>\n",
       "      <td> 0.568477</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>-0.546254</td>\n",
       "      <td>      NaN</td>\n",
       "      <td>-0.464190</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>-0.729633</td>\n",
       "      <td>      NaN</td>\n",
       "      <td>      NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>-1.925968</td>\n",
       "      <td>      NaN</td>\n",
       "      <td>      NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          0         1         2\n",
       "0  1.070430  0.929998  1.464021\n",
       "1  0.834314  0.040356 -0.028218\n",
       "2 -1.251520       NaN  0.568477\n",
       "3 -0.546254       NaN -0.464190\n",
       "4 -0.729633       NaN       NaN\n",
       "5 -1.925968       NaN       NaN"
      ]
     },
     "execution_count": 87,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 보간법 가능\n",
    "df = DataFrame(np.random.randn(6, 3))\n",
    "df.ix[2:, 1] = NA; df.ix[4:, 2] = NA\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 1.070430</td>\n",
       "      <td> 0.929998</td>\n",
       "      <td> 1.464021</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 0.834314</td>\n",
       "      <td> 0.040356</td>\n",
       "      <td>-0.028218</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>-1.251520</td>\n",
       "      <td> 0.040356</td>\n",
       "      <td> 0.568477</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>-0.546254</td>\n",
       "      <td> 0.040356</td>\n",
       "      <td>-0.464190</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>-0.729633</td>\n",
       "      <td> 0.040356</td>\n",
       "      <td>-0.464190</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>-1.925968</td>\n",
       "      <td> 0.040356</td>\n",
       "      <td>-0.464190</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          0         1         2\n",
       "0  1.070430  0.929998  1.464021\n",
       "1  0.834314  0.040356 -0.028218\n",
       "2 -1.251520  0.040356  0.568477\n",
       "3 -0.546254  0.040356 -0.464190\n",
       "4 -0.729633  0.040356 -0.464190\n",
       "5 -1.925968  0.040356 -0.464190"
      ]
     },
     "execution_count": 88,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.fillna(method='ffill')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 1.070430</td>\n",
       "      <td> 0.929998</td>\n",
       "      <td> 1.464021</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 0.834314</td>\n",
       "      <td> 0.040356</td>\n",
       "      <td>-0.028218</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>-1.251520</td>\n",
       "      <td> 0.485177</td>\n",
       "      <td> 0.568477</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>-0.546254</td>\n",
       "      <td> 0.485177</td>\n",
       "      <td>-0.464190</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>-0.729633</td>\n",
       "      <td> 0.485177</td>\n",
       "      <td> 0.385023</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>-1.925968</td>\n",
       "      <td> 0.485177</td>\n",
       "      <td> 0.385023</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          0         1         2\n",
       "0  1.070430  0.929998  1.464021\n",
       "1  0.834314  0.040356 -0.028218\n",
       "2 -1.251520  0.485177  0.568477\n",
       "3 -0.546254  0.485177 -0.464190\n",
       "4 -0.729633  0.485177  0.385023\n",
       "5 -1.925968  0.485177  0.385023"
      ]
     },
     "execution_count": 91,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 함수도 넘길 수 있다.\n",
    "df.fillna(df.mean(0))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 계층적 색인"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a  1    0.299331\n",
       "   2   -2.320418\n",
       "   3   -1.333153\n",
       "b  1    0.757166\n",
       "   2   -1.473082\n",
       "   3   -0.300510\n",
       "c  1    0.986354\n",
       "   2    1.393783\n",
       "d  2   -1.309509\n",
       "   3   -0.417365\n",
       "dtype: float64"
      ]
     },
     "execution_count": 92,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = Series(np.random.randn(10),\n",
    "              index = [['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'd'],\n",
    "                       [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]])\n",
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "MultiIndex(levels=[[u'a', u'b', u'c', u'd'], [1, 2, 3]],\n",
       "           labels=[[0, 0, 0, 1, 1, 1, 2, 2, 3, 3], [0, 1, 2, 0, 1, 2, 0, 1, 1, 2]])"
      ]
     },
     "execution_count": 93,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1    0.757166\n",
       "2   -1.473082\n",
       "3   -0.300510\n",
       "dtype: float64"
      ]
     },
     "execution_count": 94,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data['b']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "b  1    0.757166\n",
       "   2   -1.473082\n",
       "   3   -0.300510\n",
       "c  1    0.986354\n",
       "   2    1.393783\n",
       "dtype: float64"
      ]
     },
     "execution_count": 95,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data['b':'c']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td> 0.299331</td>\n",
       "      <td>-2.320418</td>\n",
       "      <td>-1.333153</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td> 0.757166</td>\n",
       "      <td>-1.473082</td>\n",
       "      <td>-0.300510</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td> 0.986354</td>\n",
       "      <td> 1.393783</td>\n",
       "      <td>      NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>      NaN</td>\n",
       "      <td>-1.309509</td>\n",
       "      <td>-0.417365</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          1         2         3\n",
       "a  0.299331 -2.320418 -1.333153\n",
       "b  0.757166 -1.473082 -0.300510\n",
       "c  0.986354  1.393783       NaN\n",
       "d       NaN -1.309509 -0.417365"
      ]
     },
     "execution_count": 96,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 계층 색인 풀기 - wide 형\n",
    "data.unstack()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a  1    0.299331\n",
       "   2   -2.320418\n",
       "   3   -1.333153\n",
       "b  1    0.757166\n",
       "   2   -1.473082\n",
       "   3   -0.300510\n",
       "c  1    0.986354\n",
       "   2    1.393783\n",
       "d  2   -1.309509\n",
       "   3   -0.417365\n",
       "dtype: float64"
      ]
     },
     "execution_count": 97,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 데이터프레임 로우,칼럼 색인을 계층 색인으로 묶어내기 - long 형\n",
    "data.unstack().stack()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 98,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th colspan=\"2\" halign=\"left\">Ohio</th>\n",
       "      <th>Colorado</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>Green</th>\n",
       "      <th>Red</th>\n",
       "      <th>Green</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">a</th>\n",
       "      <th>1</th>\n",
       "      <td> 0</td>\n",
       "      <td>  1</td>\n",
       "      <td>  2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 3</td>\n",
       "      <td>  4</td>\n",
       "      <td>  5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">b</th>\n",
       "      <th>1</th>\n",
       "      <td> 6</td>\n",
       "      <td>  7</td>\n",
       "      <td>  8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 9</td>\n",
       "      <td> 10</td>\n",
       "      <td> 11</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Ohio     Colorado\n",
       "    Green Red    Green\n",
       "a 1     0   1        2\n",
       "  2     3   4        5\n",
       "b 1     6   7        8\n",
       "  2     9  10       11"
      ]
     },
     "execution_count": 98,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame = DataFrame(np.arange(12).reshape((4, 3)),\n",
    "                  index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],\n",
    "                  columns=[['Ohio', 'Ohio', 'Colorado'],\n",
    "                           ['Green', 'Red', 'Green']])\n",
    "frame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th colspan=\"2\" halign=\"left\">Ohio</th>\n",
       "      <th>Colorado</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>Green</th>\n",
       "      <th>Red</th>\n",
       "      <th>Green</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>key1</th>\n",
       "      <th>key2</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">a</th>\n",
       "      <th>1</th>\n",
       "      <td> 0</td>\n",
       "      <td>  1</td>\n",
       "      <td>  2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 3</td>\n",
       "      <td>  4</td>\n",
       "      <td>  5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">b</th>\n",
       "      <th>1</th>\n",
       "      <td> 6</td>\n",
       "      <td>  7</td>\n",
       "      <td>  8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 9</td>\n",
       "      <td> 10</td>\n",
       "      <td> 11</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           Ohio     Colorado\n",
       "          Green Red    Green\n",
       "key1 key2                   \n",
       "a    1        0   1        2\n",
       "     2        3   4        5\n",
       "b    1        6   7        8\n",
       "     2        9  10       11"
      ]
     },
     "execution_count": 100,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 계층 색인에 이름 주기\n",
    "frame.index.names = ['key1', 'key2']\n",
    "frame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>state</th>\n",
       "      <th colspan=\"2\" halign=\"left\">Ohio</th>\n",
       "      <th>Colorado</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>color</th>\n",
       "      <th>Green</th>\n",
       "      <th>Red</th>\n",
       "      <th>Green</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>key1</th>\n",
       "      <th>key2</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">a</th>\n",
       "      <th>1</th>\n",
       "      <td> 0</td>\n",
       "      <td>  1</td>\n",
       "      <td>  2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 3</td>\n",
       "      <td>  4</td>\n",
       "      <td>  5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">b</th>\n",
       "      <th>1</th>\n",
       "      <td> 6</td>\n",
       "      <td>  7</td>\n",
       "      <td>  8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 9</td>\n",
       "      <td> 10</td>\n",
       "      <td> 11</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "state      Ohio     Colorado\n",
       "color     Green Red    Green\n",
       "key1 key2                   \n",
       "a    1        0   1        2\n",
       "     2        3   4        5\n",
       "b    1        6   7        8\n",
       "     2        9  10       11"
      ]
     },
     "execution_count": 101,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame.columns.names = ['state', 'color']\n",
    "frame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 104,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>color</th>\n",
       "      <th>Green</th>\n",
       "      <th>Red</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>key1</th>\n",
       "      <th>key2</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">a</th>\n",
       "      <th>1</th>\n",
       "      <td> 0</td>\n",
       "      <td>  1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 3</td>\n",
       "      <td>  4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">b</th>\n",
       "      <th>1</th>\n",
       "      <td> 6</td>\n",
       "      <td>  7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 9</td>\n",
       "      <td> 10</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "color      Green  Red\n",
       "key1 key2            \n",
       "a    1         0    1\n",
       "     2         3    4\n",
       "b    1         6    7\n",
       "     2         9   10"
      ]
     },
     "execution_count": 104,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame['Ohio']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 105,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>color</th>\n",
       "      <th>Green</th>\n",
       "      <th>Red</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>key2</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 0</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 3</td>\n",
       "      <td> 4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "color  Green  Red\n",
       "key2             \n",
       "1          0    1\n",
       "2          3    4"
      ]
     },
     "execution_count": 105,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame['Ohio'].ix['a']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th>state</th>\n",
       "      <th colspan=\"2\" halign=\"left\">Ohio</th>\n",
       "      <th>Colorado</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>color</th>\n",
       "      <th>Green</th>\n",
       "      <th>Red</th>\n",
       "      <th>Green</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>key2</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>  6</td>\n",
       "      <td>  8</td>\n",
       "      <td> 10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 12</td>\n",
       "      <td> 14</td>\n",
       "      <td> 16</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "state  Ohio     Colorado\n",
       "color Green Red    Green\n",
       "key2                    \n",
       "1         6   8       10\n",
       "2        12  14       16"
      ]
     },
     "execution_count": 106,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame.sum(level='key2')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 107,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>color</th>\n",
       "      <th>Green</th>\n",
       "      <th>Red</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>key1</th>\n",
       "      <th>key2</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">a</th>\n",
       "      <th>1</th>\n",
       "      <td>  2</td>\n",
       "      <td>  1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>  8</td>\n",
       "      <td>  4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">b</th>\n",
       "      <th>1</th>\n",
       "      <td> 14</td>\n",
       "      <td>  7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 20</td>\n",
       "      <td> 10</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "color      Green  Red\n",
       "key1 key2            \n",
       "a    1         2    1\n",
       "     2         8    4\n",
       "b    1        14    7\n",
       "     2        20   10"
      ]
     },
     "execution_count": 107,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame.sum(level='color', axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 로우 색인과 컬럼 색인 교환"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 108,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "      <th>d</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 0</td>\n",
       "      <td> 7</td>\n",
       "      <td> one</td>\n",
       "      <td> 0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 1</td>\n",
       "      <td> 6</td>\n",
       "      <td> one</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 2</td>\n",
       "      <td> 5</td>\n",
       "      <td> one</td>\n",
       "      <td> 2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td> 3</td>\n",
       "      <td> 4</td>\n",
       "      <td> two</td>\n",
       "      <td> 0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td> 4</td>\n",
       "      <td> 3</td>\n",
       "      <td> two</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td> 5</td>\n",
       "      <td> 2</td>\n",
       "      <td> two</td>\n",
       "      <td> 2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td> 6</td>\n",
       "      <td> 1</td>\n",
       "      <td> two</td>\n",
       "      <td> 3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   a  b    c  d\n",
       "0  0  7  one  0\n",
       "1  1  6  one  1\n",
       "2  2  5  one  2\n",
       "3  3  4  two  0\n",
       "4  4  3  two  1\n",
       "5  5  2  two  2\n",
       "6  6  1  two  3"
      ]
     },
     "execution_count": 108,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "frame = DataFrame({'a': range(7),\n",
    "                   'b': range(7, 0, -1),\n",
    "                   'c': ['one', 'one', 'one', 'two', 'two', 'two', 'two'],\n",
    "                   'd': [0, 1, 2, 0, 1, 2, 3]})\n",
    "frame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 110,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <th>d</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">one</th>\n",
       "      <th>0</th>\n",
       "      <td> 0</td>\n",
       "      <td> 7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 1</td>\n",
       "      <td> 6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 2</td>\n",
       "      <td> 5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">two</th>\n",
       "      <th>0</th>\n",
       "      <td> 3</td>\n",
       "      <td> 4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 4</td>\n",
       "      <td> 3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 5</td>\n",
       "      <td> 2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td> 6</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       a  b\n",
       "c   d      \n",
       "one 0  0  7\n",
       "    1  1  6\n",
       "    2  2  5\n",
       "two 0  3  4\n",
       "    1  4  3\n",
       "    2  5  2\n",
       "    3  6  1"
      ]
     },
     "execution_count": 110,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#컬럼 색인을 로우 색인으로 옮길 수 있다.\n",
    "frame2 = frame.set_index(['c', 'd'])\n",
    "frame2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 113,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>c</th>\n",
       "      <th>d</th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> one</td>\n",
       "      <td> 0</td>\n",
       "      <td> 0</td>\n",
       "      <td> 7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> one</td>\n",
       "      <td> 1</td>\n",
       "      <td> 1</td>\n",
       "      <td> 6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> one</td>\n",
       "      <td> 2</td>\n",
       "      <td> 2</td>\n",
       "      <td> 5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td> two</td>\n",
       "      <td> 0</td>\n",
       "      <td> 3</td>\n",
       "      <td> 4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td> two</td>\n",
       "      <td> 1</td>\n",
       "      <td> 4</td>\n",
       "      <td> 3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td> two</td>\n",
       "      <td> 2</td>\n",
       "      <td> 5</td>\n",
       "      <td> 2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td> two</td>\n",
       "      <td> 3</td>\n",
       "      <td> 6</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     c  d  a  b\n",
       "0  one  0  0  7\n",
       "1  one  1  1  6\n",
       "2  one  2  2  5\n",
       "3  two  0  3  4\n",
       "4  two  1  4  3\n",
       "5  two  2  5  2\n",
       "6  two  3  6  1"
      ]
     },
     "execution_count": 113,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#로우 색인 -> 컬럼 색인으로\n",
    "frame2.reset_index()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 데이터 입출력"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 파일 읽기\n",
    "* 파일 쓰기\n",
    "* 웹 내용 긁어오기\n",
    "* 데이터베이스 사용"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 파일 읽기"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "함수 | 설명 |\n",
    "---------- | ----------|\n",
    "read_csv | 파일, URL 또는 파일과 유사한 객체로부터 구분된 데이터 읽어옴. 구분자는 쉼표(,)를 기본으로.\n",
    "read_table | read_csv와 유사. 구분자는 탭('\\t')을 기본으로.\n",
    "read_fwf | 고정폭 컬럼 형식에서 읽어옴(구분자 없는 경우)\n",
    "read_clipboard | 클립보드에 있는 데이터를 읽어오는 read_table 함수. 웹페이지에서 표를 긁어올 때 유용"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 152,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from pandas import DataFrame, Series\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 157,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a,b,c,d,message\r\n",
      "1,2,3,4,hello\r\n",
      "5,6,7,8,world\r\n",
      "9,10,11,12,foo\r\n"
     ]
    }
   ],
   "source": [
    "!cat ex1.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 159,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "      <th>d</th>\n",
       "      <th>message</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 1</td>\n",
       "      <td>  2</td>\n",
       "      <td>  3</td>\n",
       "      <td>  4</td>\n",
       "      <td> hello</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 5</td>\n",
       "      <td>  6</td>\n",
       "      <td>  7</td>\n",
       "      <td>  8</td>\n",
       "      <td> world</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 9</td>\n",
       "      <td> 10</td>\n",
       "      <td> 11</td>\n",
       "      <td> 12</td>\n",
       "      <td>   foo</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   a   b   c   d message\n",
       "0  1   2   3   4   hello\n",
       "1  5   6   7   8   world\n",
       "2  9  10  11  12     foo"
      ]
     },
     "execution_count": 159,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('ex1.csv')\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 160,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "      <th>d</th>\n",
       "      <th>message</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 1</td>\n",
       "      <td>  2</td>\n",
       "      <td>  3</td>\n",
       "      <td>  4</td>\n",
       "      <td> hello</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 5</td>\n",
       "      <td>  6</td>\n",
       "      <td>  7</td>\n",
       "      <td>  8</td>\n",
       "      <td> world</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 9</td>\n",
       "      <td> 10</td>\n",
       "      <td> 11</td>\n",
       "      <td> 12</td>\n",
       "      <td>   foo</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   a   b   c   d message\n",
       "0  1   2   3   4   hello\n",
       "1  5   6   7   8   world\n",
       "2  9  10  11  12     foo"
      ]
     },
     "execution_count": 160,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.read_table('ex1.csv', sep=',')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 161,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> a</td>\n",
       "      <td>  b</td>\n",
       "      <td>  c</td>\n",
       "      <td>  d</td>\n",
       "      <td> message</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 1</td>\n",
       "      <td>  2</td>\n",
       "      <td>  3</td>\n",
       "      <td>  4</td>\n",
       "      <td>   hello</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 5</td>\n",
       "      <td>  6</td>\n",
       "      <td>  7</td>\n",
       "      <td>  8</td>\n",
       "      <td>   world</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td> 9</td>\n",
       "      <td> 10</td>\n",
       "      <td> 11</td>\n",
       "      <td> 12</td>\n",
       "      <td>     foo</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   0   1   2   3        4\n",
       "0  a   b   c   d  message\n",
       "1  1   2   3   4    hello\n",
       "2  5   6   7   8    world\n",
       "3  9  10  11  12      foo"
      ]
     },
     "execution_count": 161,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.read_csv('ex1.csv', header=None)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 164,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>d</th>\n",
       "      <th>e</th>\n",
       "      <th>f</th>\n",
       "      <th>message2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>  b</td>\n",
       "      <td>  c</td>\n",
       "      <td>  d</td>\n",
       "      <td> message</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>  2</td>\n",
       "      <td>  3</td>\n",
       "      <td>  4</td>\n",
       "      <td>   hello</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>  6</td>\n",
       "      <td>  7</td>\n",
       "      <td>  8</td>\n",
       "      <td>   world</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td> 10</td>\n",
       "      <td> 11</td>\n",
       "      <td> 12</td>\n",
       "      <td>     foo</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    d   e   f message2\n",
       "a   b   c   d  message\n",
       "1   2   3   4    hello\n",
       "5   6   7   8    world\n",
       "9  10  11  12      foo"
      ]
     },
     "execution_count": 164,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.read_csv('ex1.csv', header=None, names=['d', 'e', 'f', 'message2'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 165,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "      <th>d</th>\n",
       "      <th>message</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 1</td>\n",
       "      <td> 2</td>\n",
       "      <td> 3</td>\n",
       "      <td> 4</td>\n",
       "      <td> hello</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 5</td>\n",
       "      <td> 6</td>\n",
       "      <td> 7</td>\n",
       "      <td> 8</td>\n",
       "      <td> world</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   a  b  c  d message\n",
       "0  1  2  3  4   hello\n",
       "1  5  6  7  8   world"
      ]
     },
     "execution_count": 165,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 몇 줄만 읽어오기\n",
    "pd.read_csv('ex1.csv', nrows=2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 171,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pandas.io.parsers.TextFileReader at 0x5436210>"
      ]
     },
     "execution_count": 171,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 로우 단위로, 여러 조각으로 읽어오기\n",
    "chunker = pd.read_csv('ex6.csv', chunksize=1000)\n",
    "chunker"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 170,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "10001 03_data/file/ex6.csv\r\n"
     ]
    }
   ],
   "source": [
    "!wc -l ex6.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 169,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "one,two,three,four,key\r\n",
      "0.467976300189,-0.0386485396255,-0.295344251987,-1.82472622729,L\r\n",
      "-0.358893469543,1.40445260007,0.704964644926,-0.200638304015,B\r\n",
      "-0.50184039929,0.659253707223,-0.421690619312,-0.0576883018364,G\r\n",
      "0.204886212202,1.07413396504,1.38836131252,-0.982404023494,R\r\n",
      "0.354627914484,-0.133115852296,0.283762637978,-0.837062961653,Q\r\n",
      "1.81748001608,0.742272722638,0.419394843928,-2.25103520513,Q\r\n",
      "-0.776764319165,0.935517747061,-0.332871759623,-1.87564085416,U\r\n",
      "-0.913134961617,1.53062351168,-0.572656719239,0.477252252981,K\r\n",
      "0.358479538224,-0.49757199147,-0.367016188009,0.507701778685,S\r\n"
     ]
    }
   ],
   "source": [
    "!head ex6.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 172,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "E    368\n",
       "X    364\n",
       "L    346\n",
       "O    343\n",
       "Q    340\n",
       "M    338\n",
       "J    337\n",
       "F    335\n",
       "K    334\n",
       "H    330\n",
       "V    328\n",
       "I    327\n",
       "U    326\n",
       "P    324\n",
       "D    320\n",
       "A    320\n",
       "R    318\n",
       "Y    314\n",
       "G    308\n",
       "S    308\n",
       "N    306\n",
       "W    305\n",
       "T    304\n",
       "B    302\n",
       "Z    288\n",
       "C    286\n",
       "4    171\n",
       "6    166\n",
       "7    164\n",
       "8    162\n",
       "3    162\n",
       "5    157\n",
       "2    152\n",
       "0    151\n",
       "9    150\n",
       "1    146\n",
       "dtype: float64"
      ]
     },
     "execution_count": 172,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# chunker로 쪼개진 파일을 순회하면서, key 칼럼에 있는 값 세어보기\n",
    "tot = Series([])\n",
    "for piece in chunker :\n",
    "    tot = tot.add(piece['key'].value_counts(), fill_value=0)\n",
    "    \n",
    "tot = tot.order(ascending=False)\n",
    "tot"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 173,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>one</th>\n",
       "      <th>two</th>\n",
       "      <th>three</th>\n",
       "      <th>four</th>\n",
       "      <th>key</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 0.467976</td>\n",
       "      <td>-0.038649</td>\n",
       "      <td>-0.295344</td>\n",
       "      <td>-1.824726</td>\n",
       "      <td> B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-0.358893</td>\n",
       "      <td> 1.404453</td>\n",
       "      <td> 0.704965</td>\n",
       "      <td>-0.200638</td>\n",
       "      <td> M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>-0.501840</td>\n",
       "      <td> 0.659254</td>\n",
       "      <td>-0.421691</td>\n",
       "      <td>-0.057688</td>\n",
       "      <td> N</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td> 0.204886</td>\n",
       "      <td> 1.074134</td>\n",
       "      <td> 1.388361</td>\n",
       "      <td>-0.982404</td>\n",
       "      <td> N</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td> 0.354628</td>\n",
       "      <td>-0.133116</td>\n",
       "      <td> 0.283763</td>\n",
       "      <td>-0.837063</td>\n",
       "      <td> Y</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        one       two     three      four key\n",
       "0  0.467976 -0.038649 -0.295344 -1.824726   B\n",
       "1 -0.358893  1.404453  0.704965 -0.200638   M\n",
       "2 -0.501840  0.659254 -0.421691 -0.057688   N\n",
       "3  0.204886  1.074134  1.388361 -0.982404   N\n",
       "4  0.354628 -0.133116  0.283763 -0.837063   Y"
      ]
     },
     "execution_count": 173,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "piece.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 176,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "K    42\n",
       "M    41\n",
       "U    39\n",
       "Y    38\n",
       "P    38\n",
       "E    35\n",
       "R    34\n",
       "Q    34\n",
       "I    33\n",
       "B    33\n",
       "T    33\n",
       "F    33\n",
       "H    33\n",
       "A    32\n",
       "G    31\n",
       "Z    31\n",
       "L    30\n",
       "J    29\n",
       "N    29\n",
       "S    29\n",
       "X    28\n",
       "W    28\n",
       "C    28\n",
       "3    26\n",
       "D    25\n",
       "V    23\n",
       "6    22\n",
       "4    21\n",
       "O    19\n",
       "0    17\n",
       "2    15\n",
       "8    15\n",
       "5    15\n",
       "9    15\n",
       "7    14\n",
       "1    12\n",
       "dtype: int64"
      ]
     },
     "execution_count": 176,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "piece['key'].value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 파일 쓰기"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 178,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "      <th>d</th>\n",
       "      <th>message</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 1</td>\n",
       "      <td>  2</td>\n",
       "      <td>  3</td>\n",
       "      <td>  4</td>\n",
       "      <td> hello</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 5</td>\n",
       "      <td>  6</td>\n",
       "      <td>  7</td>\n",
       "      <td>  8</td>\n",
       "      <td> world</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 9</td>\n",
       "      <td> 10</td>\n",
       "      <td> 11</td>\n",
       "      <td> 12</td>\n",
       "      <td>   foo</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   a   b   c   d message\n",
       "0  1   2   3   4   hello\n",
       "1  5   6   7   8   world\n",
       "2  9  10  11  12     foo"
      ]
     },
     "execution_count": 178,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = pd.read_csv('ex1.csv')\n",
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 179,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# 쓰기\n",
    "data.to_csv('out.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 180,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      ",a,b,c,d,message\r\n",
      "0,1,2,3,4,hello\r\n",
      "1,5,6,7,8,world\r\n",
      "2,9,10,11,12,foo\r\n"
     ]
    }
   ],
   "source": [
    "!cat out.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 181,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# 다른 구분자 사용\n",
    "data.to_csv('out2.csv', sep='|')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 183,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "|a|b|c|d|message\r\n",
      "0|1|2|3|4|hello\r\n",
      "1|5|6|7|8|world\r\n",
      "2|9|10|11|12|foo\r\n"
     ]
    }
   ],
   "source": [
    "!cat out2.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 185,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# 컬럼의 일부분반 저장\n",
    "data.to_csv('out3.csv', columns=['a', 'b', 'd'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 186,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      ",a,b,d\r\n",
      "0,1,2,4\r\n",
      "1,5,6,8\r\n",
      "2,9,10,12\r\n"
     ]
    }
   ],
   "source": [
    "!cat out3.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 웹 내용 긁어오기"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* json 데이터 다루기\n",
    "* 웹 API"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### json 데이터 다루기"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 191,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# JSON 데이터\n",
    "obj = \"\"\"\n",
    "{\n",
    "    \"name\": \"Wes\",\n",
    "    \"places_lived\": [\"United States\", \"Spain\", \"Germany\"],\n",
    "    \"pet\": null, \"siblings\": [{\"name\": \"Scott\", \"age\":25, \"pet\":\"Zuko\"},\n",
    "                                {\"name\": \"Katie\", \"age\":33, \"pet\": \"Cisco\"}]\n",
    "}\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 192,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{u'name': u'Wes',\n",
       " u'pet': None,\n",
       " u'places_lived': [u'United States', u'Spain', u'Germany'],\n",
       " u'siblings': [{u'age': 25, u'name': u'Scott', u'pet': u'Zuko'},\n",
       "  {u'age': 33, u'name': u'Katie', u'pet': u'Cisco'}]}"
      ]
     },
     "execution_count": 192,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import json\n",
    "result = json.loads(obj)\n",
    "result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 195,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>age</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> Scott</td>\n",
       "      <td> 25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> Katie</td>\n",
       "      <td> 33</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    name  age\n",
       "0  Scott   25\n",
       "1  Katie   33"
      ]
     },
     "execution_count": 195,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# JSON 객체의 리스트를 DataFrame 생성자로 넘기고 데이터 필드 선택 가능\n",
    "siblings = DataFrame(result['siblings'], columns=['name', 'age'])\n",
    "siblings"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 웹 API"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 209,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Response [200]>"
      ]
     },
     "execution_count": 209,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import requests\n",
    "\n",
    "url = 'http://www.burgerking.co.kr/api/store/searchmap/empty/?areacd='\n",
    "resp = requests.get(url)\n",
    "resp"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 233,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import json\n",
    "data = json.loads(resp.text)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 237,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{u'AllHour': u'N',\n",
       "  u'CloseCleaning': u'',\n",
       "  u'ClosePeakSeason': u'',\n",
       "  u'CloseWeekday': u'22:00',\n",
       "  u'Delivery': u'N',\n",
       "  u'DriveThrough': u'N',\n",
       "  u'Morning': u'N',\n",
       "  u'NewAddr': u'\\uc11c\\uc6b8\\ud2b9\\ubcc4\\uc2dc \\uae08\\ucc9c\\uad6c \\uac00\\uc0b0\\ub514\\uc9c0\\ud1381\\ub85c ',\n",
       "  u'NewAddr2': u'168 \\uc6b0\\ub9bc\\ub77c\\uc774\\uc628\\uc2a4\\ubc38\\ub9ac A\\ub3d9',\n",
       "  u'OpenTime': u'9:00',\n",
       "  u'PhoneNumber': u'02-853-0332',\n",
       "  u'PointX': 37.4799652,\n",
       "  u'PointY': 126.882637,\n",
       "  u'StoreNM': u'\\uac00\\uc0b0\\ub514\\uc9c0\\ud138\\uc810',\n",
       "  u'StoreSQ': 1},\n",
       " {u'AllHour': u'N',\n",
       "  u'CloseCleaning': u'',\n",
       "  u'ClosePeakSeason': u'/ \\uae08, \\ud1a0: 11:00~24:00',\n",
       "  u'CloseWeekday': u'22:30',\n",
       "  u'Delivery': u'N',\n",
       "  u'DriveThrough': u'N',\n",
       "  u'Morning': u'N',\n",
       "  u'NewAddr': u'\\uc11c\\uc6b8\\ud2b9\\ubcc4\\uc2dc \\uae08\\ucc9c\\uad6c \\ub514\\uc9c0\\ud138\\ub85c ',\n",
       "  u'NewAddr2': u'10\\uae38 9 \\ud604\\ub300\\uc544\\uc6b8\\ub81b 6\\uce35',\n",
       "  u'OpenTime': u'11:00',\n",
       "  u'PhoneNumber': u'02-2136-9962',\n",
       "  u'PointX': 37.47762,\n",
       "  u'PointY': 126.889053,\n",
       "  u'StoreNM': u'\\uac00\\uc0b0\\ud604\\ub300\\uc544\\uc6b8\\ub81b\\uc810',\n",
       "  u'StoreSQ': 2},\n",
       " {u'AllHour': u'N',\n",
       "  u'CloseCleaning': u'',\n",
       "  u'ClosePeakSeason': u'',\n",
       "  u'CloseWeekday': u'23:30',\n",
       "  u'Delivery': u'N',\n",
       "  u'DriveThrough': u'N',\n",
       "  u'Morning': u'N',\n",
       "  u'NewAddr': u'\\uc11c\\uc6b8\\ud2b9\\ubcc4\\uc2dc \\uac15\\uc11c\\uad6c \\uc591\\ucc9c\\ub85c',\n",
       "  u'NewAddr2': u'559 \\uc774\\ub9c8\\ud2b83\\uce35 (\\uac00\\uc591\\ub3d9)',\n",
       "  u'OpenTime': u'10:00',\n",
       "  u'PhoneNumber': u'02-3664-0221',\n",
       "  u'PointX': 37.5581932,\n",
       "  u'PointY': 126.861816,\n",
       "  u'StoreNM': u'\\uac00\\uc591\\uc774\\ub9c8\\ud2b8\\uc810',\n",
       "  u'StoreSQ': 3}]"
      ]
     },
     "execution_count": 237,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[:3]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 238,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[u'Delivery',\n",
       " u'OpenTime',\n",
       " u'ClosePeakSeason',\n",
       " u'NewAddr',\n",
       " u'CloseWeekday',\n",
       " u'StoreNM',\n",
       " u'DriveThrough',\n",
       " u'Morning',\n",
       " u'StoreSQ',\n",
       " u'AllHour',\n",
       " u'PhoneNumber',\n",
       " u'NewAddr2',\n",
       " u'PointX',\n",
       " u'PointY',\n",
       " u'CloseCleaning']"
      ]
     },
     "execution_count": 238,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[0].keys()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 229,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "fields = ['StoreNM', 'Delivery', 'OpenTime', 'PointX', 'PointY']\n",
    "burger_store = DataFrame(data, columns=fields)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 230,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>StoreNM</th>\n",
       "      <th>Delivery</th>\n",
       "      <th>OpenTime</th>\n",
       "      <th>PointX</th>\n",
       "      <th>PointY</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>   가산디지털점</td>\n",
       "      <td> N</td>\n",
       "      <td>  9:00</td>\n",
       "      <td> 37.479965</td>\n",
       "      <td> 126.882637</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 가산현대아울렛점</td>\n",
       "      <td> N</td>\n",
       "      <td> 11:00</td>\n",
       "      <td> 37.477620</td>\n",
       "      <td> 126.889053</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>   가양이마트점</td>\n",
       "      <td> N</td>\n",
       "      <td> 10:00</td>\n",
       "      <td> 37.558193</td>\n",
       "      <td> 126.861816</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>    강남NC점</td>\n",
       "      <td> N</td>\n",
       "      <td>      </td>\n",
       "      <td> 37.509400</td>\n",
       "      <td> 127.007309</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>    강남교보점</td>\n",
       "      <td> N</td>\n",
       "      <td>      </td>\n",
       "      <td> 37.504570</td>\n",
       "      <td> 127.023590</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    StoreNM Delivery OpenTime     PointX      PointY\n",
       "0    가산디지털점        N     9:00  37.479965  126.882637\n",
       "1  가산현대아울렛점        N    11:00  37.477620  126.889053\n",
       "2    가양이마트점        N    10:00  37.558193  126.861816\n",
       "3     강남NC점        N           37.509400  127.007309\n",
       "4     강남교보점        N           37.504570  127.023590"
      ]
     },
     "execution_count": 230,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "burger_store.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 데이터 다듬기"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 데이터 합치기 \n",
    "* 재형성과 피벗\n",
    "* 데이터 변형\n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 데이터 합치기"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* merge\n",
    "* concat\n",
    "* combine_first"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### merge"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 하나 이상의 키를 기준으로 DataFrame의 로우를 합친다. SQL이나 다른 관계형 데이터베이스의 join 연산과 비슷하다.\n",
    "* 기본적으로 내부조인inner join을 수행하여 교집합을 반환한다.\n",
    "* how 인자로 왼쪽 우선 외부조인(left), 오른쪽 우선 외부조인(right), 완전 외부조인(outer)를 수행할 수 있다. \n",
    "* 색인도 merge key가 될 수 있다."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 내부조인inner join"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 259,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>data1</th>\n",
       "      <th>key</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 0</td>\n",
       "      <td> b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 1</td>\n",
       "      <td> b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 2</td>\n",
       "      <td> a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td> 3</td>\n",
       "      <td> c</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td> 4</td>\n",
       "      <td> a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td> 5</td>\n",
       "      <td> a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td> 6</td>\n",
       "      <td> b</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   data1 key\n",
       "0      0   b\n",
       "1      1   b\n",
       "2      2   a\n",
       "3      3   c\n",
       "4      4   a\n",
       "5      5   a\n",
       "6      6   b"
      ]
     },
     "execution_count": 259,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1 = DataFrame({'key' : ['b','b','a','c','a','a','b'],\n",
    "                'data1': range(7)})\n",
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 260,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>data2</th>\n",
       "      <th>key</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 0</td>\n",
       "      <td> a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 1</td>\n",
       "      <td> b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 2</td>\n",
       "      <td> d</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   data2 key\n",
       "0      0   a\n",
       "1      1   b\n",
       "2      2   d"
      ]
     },
     "execution_count": 260,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2 = DataFrame({'key' : ['a','b','d'],\n",
    "                'data2': range(3)})\n",
    "df2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 252,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>data1</th>\n",
       "      <th>key</th>\n",
       "      <th>data2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 0</td>\n",
       "      <td> b</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 1</td>\n",
       "      <td> b</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 6</td>\n",
       "      <td> b</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td> 2</td>\n",
       "      <td> a</td>\n",
       "      <td> 0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td> 4</td>\n",
       "      <td> a</td>\n",
       "      <td> 0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td> 5</td>\n",
       "      <td> a</td>\n",
       "      <td> 0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   data1 key  data2\n",
       "0      0   b      1\n",
       "1      1   b      1\n",
       "2      6   b      1\n",
       "3      2   a      0\n",
       "4      4   a      0\n",
       "5      5   a      0"
      ]
     },
     "execution_count": 252,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 암묵적으로 겹치는 컬럼을 키로 사용해서 합침\n",
    "pd.merge(df1,df2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 253,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>data1</th>\n",
       "      <th>key</th>\n",
       "      <th>data2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 0</td>\n",
       "      <td> b</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 1</td>\n",
       "      <td> b</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 6</td>\n",
       "      <td> b</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td> 2</td>\n",
       "      <td> a</td>\n",
       "      <td> 0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td> 4</td>\n",
       "      <td> a</td>\n",
       "      <td> 0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td> 5</td>\n",
       "      <td> a</td>\n",
       "      <td> 0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   data1 key  data2\n",
       "0      0   b      1\n",
       "1      1   b      1\n",
       "2      6   b      1\n",
       "3      2   a      0\n",
       "4      4   a      0\n",
       "5      5   a      0"
      ]
     },
     "execution_count": 253,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 명시적으로 키를 설정할 수 있다.\n",
    "pd.merge(df1,df2, on='key')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 256,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>data1</th>\n",
       "      <th>lkey</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 0</td>\n",
       "      <td> b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 1</td>\n",
       "      <td> b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 2</td>\n",
       "      <td> a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td> 3</td>\n",
       "      <td> c</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td> 4</td>\n",
       "      <td> a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td> 5</td>\n",
       "      <td> a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td> 6</td>\n",
       "      <td> b</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   data1 lkey\n",
       "0      0    b\n",
       "1      1    b\n",
       "2      2    a\n",
       "3      3    c\n",
       "4      4    a\n",
       "5      5    a\n",
       "6      6    b"
      ]
     },
     "execution_count": 256,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 공통되는 칼럼이 없드면 정해줄수 있다.\n",
    "\n",
    "df3 = DataFrame({'lkey' : ['b','b','a','c','a','a','b'],\n",
    "                'data1': range(7)})\n",
    "df3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 257,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>data2</th>\n",
       "      <th>rkey</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 0</td>\n",
       "      <td> a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 1</td>\n",
       "      <td> b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 2</td>\n",
       "      <td> d</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   data2 rkey\n",
       "0      0    a\n",
       "1      1    b\n",
       "2      2    d"
      ]
     },
     "execution_count": 257,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df4 = DataFrame({'rkey' : ['a','b','d'],\n",
    "                'data2': range(3)})\n",
    "df4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 258,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>data1</th>\n",
       "      <th>lkey</th>\n",
       "      <th>data2</th>\n",
       "      <th>rkey</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 0</td>\n",
       "      <td> b</td>\n",
       "      <td> 1</td>\n",
       "      <td> b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 1</td>\n",
       "      <td> b</td>\n",
       "      <td> 1</td>\n",
       "      <td> b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 6</td>\n",
       "      <td> b</td>\n",
       "      <td> 1</td>\n",
       "      <td> b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td> 2</td>\n",
       "      <td> a</td>\n",
       "      <td> 0</td>\n",
       "      <td> a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td> 4</td>\n",
       "      <td> a</td>\n",
       "      <td> 0</td>\n",
       "      <td> a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td> 5</td>\n",
       "      <td> a</td>\n",
       "      <td> 0</td>\n",
       "      <td> a</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   data1 lkey  data2 rkey\n",
       "0      0    b      1    b\n",
       "1      1    b      1    b\n",
       "2      6    b      1    b\n",
       "3      2    a      0    a\n",
       "4      4    a      0    a\n",
       "5      5    a      0    a"
      ]
     },
     "execution_count": 258,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 따로 정해줌\n",
    "pd.merge(df3, df4, left_on='lkey', right_on='rkey')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 완전 외부조인(outer)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 261,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>data1</th>\n",
       "      <th>key</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 0</td>\n",
       "      <td> b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 1</td>\n",
       "      <td> b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 2</td>\n",
       "      <td> a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td> 3</td>\n",
       "      <td> c</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td> 4</td>\n",
       "      <td> a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td> 5</td>\n",
       "      <td> a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td> 6</td>\n",
       "      <td> b</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   data1 key\n",
       "0      0   b\n",
       "1      1   b\n",
       "2      2   a\n",
       "3      3   c\n",
       "4      4   a\n",
       "5      5   a\n",
       "6      6   b"
      ]
     },
     "execution_count": 261,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 262,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>data2</th>\n",
       "      <th>key</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 0</td>\n",
       "      <td> a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 1</td>\n",
       "      <td> b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 2</td>\n",
       "      <td> d</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   data2 key\n",
       "0      0   a\n",
       "1      1   b\n",
       "2      2   d"
      ]
     },
     "execution_count": 262,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 263,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>data1</th>\n",
       "      <th>key</th>\n",
       "      <th>data2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>  0</td>\n",
       "      <td> b</td>\n",
       "      <td>  1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>  1</td>\n",
       "      <td> b</td>\n",
       "      <td>  1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>  6</td>\n",
       "      <td> b</td>\n",
       "      <td>  1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>  2</td>\n",
       "      <td> a</td>\n",
       "      <td>  0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>  4</td>\n",
       "      <td> a</td>\n",
       "      <td>  0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>  5</td>\n",
       "      <td> a</td>\n",
       "      <td>  0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>  3</td>\n",
       "      <td> c</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>NaN</td>\n",
       "      <td> d</td>\n",
       "      <td>  2</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   data1 key  data2\n",
       "0      0   b      1\n",
       "1      1   b      1\n",
       "2      6   b      1\n",
       "3      2   a      0\n",
       "4      4   a      0\n",
       "5      5   a      0\n",
       "6      3   c    NaN\n",
       "7    NaN   d      2"
      ]
     },
     "execution_count": 263,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.merge(df1, df2, how='outer')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 왼쪽 우선 외부조인(left)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 264,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "df1 = DataFrame({'key' : ['b','b','a','c','a','b'],\n",
    "                'data1': range(6)})\n",
    "df2 = DataFrame({'key' : ['a','b','a', 'b', 'd'],\n",
    "                'data2': range(5)})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 266,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>data1</th>\n",
       "      <th>key</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 0</td>\n",
       "      <td> b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 1</td>\n",
       "      <td> b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 2</td>\n",
       "      <td> a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td> 3</td>\n",
       "      <td> c</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td> 4</td>\n",
       "      <td> a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td> 5</td>\n",
       "      <td> b</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   data1 key\n",
       "0      0   b\n",
       "1      1   b\n",
       "2      2   a\n",
       "3      3   c\n",
       "4      4   a\n",
       "5      5   b"
      ]
     },
     "execution_count": 266,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 267,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>data2</th>\n",
       "      <th>key</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 0</td>\n",
       "      <td> a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 1</td>\n",
       "      <td> b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 2</td>\n",
       "      <td> a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td> 3</td>\n",
       "      <td> b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td> 4</td>\n",
       "      <td> d</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   data2 key\n",
       "0      0   a\n",
       "1      1   b\n",
       "2      2   a\n",
       "3      3   b\n",
       "4      4   d"
      ]
     },
     "execution_count": 267,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 268,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>data1</th>\n",
       "      <th>key</th>\n",
       "      <th>data2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0 </th>\n",
       "      <td> 0</td>\n",
       "      <td> b</td>\n",
       "      <td>  1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1 </th>\n",
       "      <td> 0</td>\n",
       "      <td> b</td>\n",
       "      <td>  3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2 </th>\n",
       "      <td> 1</td>\n",
       "      <td> b</td>\n",
       "      <td>  1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3 </th>\n",
       "      <td> 1</td>\n",
       "      <td> b</td>\n",
       "      <td>  3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4 </th>\n",
       "      <td> 2</td>\n",
       "      <td> a</td>\n",
       "      <td>  0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5 </th>\n",
       "      <td> 2</td>\n",
       "      <td> a</td>\n",
       "      <td>  2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6 </th>\n",
       "      <td> 3</td>\n",
       "      <td> c</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7 </th>\n",
       "      <td> 4</td>\n",
       "      <td> a</td>\n",
       "      <td>  0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8 </th>\n",
       "      <td> 4</td>\n",
       "      <td> a</td>\n",
       "      <td>  2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9 </th>\n",
       "      <td> 5</td>\n",
       "      <td> b</td>\n",
       "      <td>  1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td> 5</td>\n",
       "      <td> b</td>\n",
       "      <td>  3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    data1 key  data2\n",
       "0       0   b      1\n",
       "1       0   b      3\n",
       "2       1   b      1\n",
       "3       1   b      3\n",
       "4       2   a      0\n",
       "5       2   a      2\n",
       "6       3   c    NaN\n",
       "7       4   a      0\n",
       "8       4   a      2\n",
       "9       5   b      1\n",
       "10      5   b      3"
      ]
     },
     "execution_count": 268,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 왼쪽 우선 외부조인은 왼쪽의 모든 로우를 포함하는 결과를 반환한다\n",
    "pd.merge(df1, df2, on='key', how='left')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 오른쪽 우선 외부조인(right)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 269,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>data1</th>\n",
       "      <th>key</th>\n",
       "      <th>data2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0 </th>\n",
       "      <td>  0</td>\n",
       "      <td> b</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1 </th>\n",
       "      <td>  1</td>\n",
       "      <td> b</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2 </th>\n",
       "      <td>  5</td>\n",
       "      <td> b</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3 </th>\n",
       "      <td>  0</td>\n",
       "      <td> b</td>\n",
       "      <td> 3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4 </th>\n",
       "      <td>  1</td>\n",
       "      <td> b</td>\n",
       "      <td> 3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5 </th>\n",
       "      <td>  5</td>\n",
       "      <td> b</td>\n",
       "      <td> 3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6 </th>\n",
       "      <td>  2</td>\n",
       "      <td> a</td>\n",
       "      <td> 0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7 </th>\n",
       "      <td>  4</td>\n",
       "      <td> a</td>\n",
       "      <td> 0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8 </th>\n",
       "      <td>  2</td>\n",
       "      <td> a</td>\n",
       "      <td> 2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9 </th>\n",
       "      <td>  4</td>\n",
       "      <td> a</td>\n",
       "      <td> 2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>NaN</td>\n",
       "      <td> d</td>\n",
       "      <td> 4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    data1 key  data2\n",
       "0       0   b      1\n",
       "1       1   b      1\n",
       "2       5   b      1\n",
       "3       0   b      3\n",
       "4       1   b      3\n",
       "5       5   b      3\n",
       "6       2   a      0\n",
       "7       4   a      0\n",
       "8       2   a      2\n",
       "9       4   a      2\n",
       "10    NaN   d      4"
      ]
     },
     "execution_count": 269,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 오른쪽 우선 외부조인은 오른쪽의 모든 로우를 포함하는 결과를 반환한다\n",
    "pd.merge(df1, df2, on='key', how='right')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### 색인 merge"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 273,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>key</th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> a</td>\n",
       "      <td> 0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> b</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> a</td>\n",
       "      <td> 2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td> a</td>\n",
       "      <td> 3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td> b</td>\n",
       "      <td> 4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td> c</td>\n",
       "      <td> 5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  key  value\n",
       "0   a      0\n",
       "1   b      1\n",
       "2   a      2\n",
       "3   a      3\n",
       "4   b      4\n",
       "5   c      5"
      ]
     },
     "execution_count": 273,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "left1 = DataFrame({'key' : ['a', 'b', 'a', 'a', 'b', 'c'],\n",
    "                           'value' : range(6)})\n",
    "left1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 274,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>group_val</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td> 3.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td> 7.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   group_val\n",
       "a        3.5\n",
       "b        7.0"
      ]
     },
     "execution_count": 274,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "right1 = DataFrame({'group_val' : [3.5, 7]}, index=['a', 'b'])\n",
    "right1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 275,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>key</th>\n",
       "      <th>value</th>\n",
       "      <th>group_val</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> a</td>\n",
       "      <td> 0</td>\n",
       "      <td> 3.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> a</td>\n",
       "      <td> 2</td>\n",
       "      <td> 3.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td> a</td>\n",
       "      <td> 3</td>\n",
       "      <td> 3.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> b</td>\n",
       "      <td> 1</td>\n",
       "      <td> 7.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td> b</td>\n",
       "      <td> 4</td>\n",
       "      <td> 7.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  key  value  group_val\n",
       "0   a      0        3.5\n",
       "2   a      2        3.5\n",
       "3   a      3        3.5\n",
       "1   b      1        7.0\n",
       "4   b      4        7.0"
      ]
     },
     "execution_count": 275,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 기본은 교집합\n",
    "pd.merge(left1, right1, left_on='key', right_index=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 276,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>key</th>\n",
       "      <th>value</th>\n",
       "      <th>group_val</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> a</td>\n",
       "      <td> 0</td>\n",
       "      <td> 3.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> a</td>\n",
       "      <td> 2</td>\n",
       "      <td> 3.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td> a</td>\n",
       "      <td> 3</td>\n",
       "      <td> 3.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> b</td>\n",
       "      <td> 1</td>\n",
       "      <td> 7.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td> b</td>\n",
       "      <td> 4</td>\n",
       "      <td> 7.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td> c</td>\n",
       "      <td> 5</td>\n",
       "      <td> NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  key  value  group_val\n",
       "0   a      0        3.5\n",
       "2   a      2        3.5\n",
       "3   a      3        3.5\n",
       "1   b      1        7.0\n",
       "4   b      4        7.0\n",
       "5   c      5        NaN"
      ]
     },
     "execution_count": 276,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#  외부조인\n",
    "pd.merge(left1, right1, left_on='key', right_index=True, how='outer')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### concat"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "하나의 축을 따라 객체를 이어붙인다."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* numpy\n",
    "  - concatenate 함수\n",
    "* pandas\n",
    "  * concat 함수\n",
    "      * Series\n",
    "      * DataFrame\n",
    "      * axis 인자\n",
    "      * join_axes 인자\n",
    "      * keys 인자\n",
    "      * ignore_index 인자"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 279,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0,  1,  2,  3],\n",
       "       [ 4,  5,  6,  7],\n",
       "       [ 8,  9, 10, 11]])"
      ]
     },
     "execution_count": 279,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# numpy의 concatenate 함수\n",
    "arr = np.arange(12).reshape((3, 4))\n",
    "arr"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 280,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0,  1,  2,  3,  0,  1,  2,  3],\n",
       "       [ 4,  5,  6,  7,  4,  5,  6,  7],\n",
       "       [ 8,  9, 10, 11,  8,  9, 10, 11]])"
      ]
     },
     "execution_count": 280,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.concatenate([arr, arr], axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 284,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# pandas의 concatenate 함수\n",
    "\n",
    "# Series 객체\n",
    "s1 = Series([0, 1], index=['a', 'b'])\n",
    "s2 = Series([2, 3, 4], index=['c', 'd', 'e'])\n",
    "s3 = Series([5, 6], index=['f', 'g'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 283,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    0\n",
       "b    1\n",
       "dtype: int64"
      ]
     },
     "execution_count": 283,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 286,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "c    2\n",
       "d    3\n",
       "e    4\n",
       "dtype: int64"
      ]
     },
     "execution_count": 286,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 287,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "f    5\n",
       "g    6\n",
       "dtype: int64"
      ]
     },
     "execution_count": 287,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 288,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    0\n",
       "b    1\n",
       "c    2\n",
       "d    3\n",
       "e    4\n",
       "f    5\n",
       "g    6\n",
       "dtype: int64"
      ]
     },
     "execution_count": 288,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 합치기\n",
    "pd.concat([s1, s2, s3])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 282,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>  0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>  1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>NaN</td>\n",
       "      <td>  2</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>d</th>\n",
       "      <td>NaN</td>\n",
       "      <td>  3</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>e</th>\n",
       "      <td>NaN</td>\n",
       "      <td>  4</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>f</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>  5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>g</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>  6</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    0   1   2\n",
       "a   0 NaN NaN\n",
       "b   1 NaN NaN\n",
       "c NaN   2 NaN\n",
       "d NaN   3 NaN\n",
       "e NaN   4 NaN\n",
       "f NaN NaN   5\n",
       "g NaN NaN   6"
      ]
     },
     "execution_count": 282,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 축 바꿔서\n",
    "pd.concat([s1, s2, s3], axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 290,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    0\n",
       "b    5\n",
       "f    5\n",
       "g    6\n",
       "dtype: int64"
      ]
     },
     "execution_count": 290,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s4 = pd.concat([s1*5, s3])\n",
    "s4"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 291,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>  0</td>\n",
       "      <td>  0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>  1</td>\n",
       "      <td>  5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>e</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    0   1\n",
       "a   0   0\n",
       "c NaN NaN\n",
       "b   1   5\n",
       "e NaN NaN"
      ]
     },
     "execution_count": 291,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.concat([s1, s4], axis=1, join_axes=[['a', 'c', 'b', 'e']])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 292,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# DataFrame도 비슷하게 작동한다\n",
    "\n",
    "df1 = DataFrame(np.arange(6).reshape(3, 2), index = ['a', 'b', 'c'],\n",
    "                        columns = ['one', 'two'])\n",
    "\n",
    "df2 = DataFrame(5 + np.arange(4).reshape(2, 2), index = ['a',  'c'],\n",
    "                        columns = ['three', 'four'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 293,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>one</th>\n",
       "      <th>two</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td> 0</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td> 2</td>\n",
       "      <td> 3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td> 4</td>\n",
       "      <td> 5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   one  two\n",
       "a    0    1\n",
       "b    2    3\n",
       "c    4    5"
      ]
     },
     "execution_count": 293,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 294,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>three</th>\n",
       "      <th>four</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td> 5</td>\n",
       "      <td> 6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td> 7</td>\n",
       "      <td> 8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   three  four\n",
       "a      5     6\n",
       "c      7     8"
      ]
     },
     "execution_count": 294,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 295,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>four</th>\n",
       "      <th>one</th>\n",
       "      <th>three</th>\n",
       "      <th>two</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>NaN</td>\n",
       "      <td>  0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>  1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td>NaN</td>\n",
       "      <td>  2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>  3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>NaN</td>\n",
       "      <td>  4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>  5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td>  6</td>\n",
       "      <td>NaN</td>\n",
       "      <td>  5</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td>  8</td>\n",
       "      <td>NaN</td>\n",
       "      <td>  7</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   four  one  three  two\n",
       "a   NaN    0    NaN    1\n",
       "b   NaN    2    NaN    3\n",
       "c   NaN    4    NaN    5\n",
       "a     6  NaN      5  NaN\n",
       "c     8  NaN      7  NaN"
      ]
     },
     "execution_count": 295,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.concat([df1, df2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 296,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>one</th>\n",
       "      <th>two</th>\n",
       "      <th>three</th>\n",
       "      <th>four</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td> 0</td>\n",
       "      <td> 1</td>\n",
       "      <td>  5</td>\n",
       "      <td>  6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td> 2</td>\n",
       "      <td> 3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td> 4</td>\n",
       "      <td> 5</td>\n",
       "      <td>  7</td>\n",
       "      <td>  8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   one  two  three  four\n",
       "a    0    1      5     6\n",
       "b    2    3    NaN   NaN\n",
       "c    4    5      7     8"
      ]
     },
     "execution_count": 296,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.concat([df1, df2], axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 297,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"2\" halign=\"left\">level1</th>\n",
       "      <th colspan=\"2\" halign=\"left\">level2</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>one</th>\n",
       "      <th>two</th>\n",
       "      <th>three</th>\n",
       "      <th>four</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>a</th>\n",
       "      <td> 0</td>\n",
       "      <td> 1</td>\n",
       "      <td>  5</td>\n",
       "      <td>  6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>b</th>\n",
       "      <td> 2</td>\n",
       "      <td> 3</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>c</th>\n",
       "      <td> 4</td>\n",
       "      <td> 5</td>\n",
       "      <td>  7</td>\n",
       "      <td>  8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  level1     level2     \n",
       "     one two  three four\n",
       "a      0   1      5    6\n",
       "b      2   3    NaN  NaN\n",
       "c      4   5      7    8"
      ]
     },
     "execution_count": 297,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.concat([df1, df2], axis=1, keys=['level1', 'level2'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### combine_first"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "두 객체를 포개서 한 객체에서 누락된 데이터를 다른 객체에 있는 값으로 채울 수 있게 한다."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 298,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "f    NaN\n",
       "e    2.5\n",
       "d    NaN\n",
       "c    3.5\n",
       "b    4.5\n",
       "a    NaN\n",
       "dtype: float64"
      ]
     },
     "execution_count": 298,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a = Series([np.nan, 2.5, np.nan, 3.5, 4.5, np.nan], index=['f', 'e', 'd', 'c', 'b', 'a'])\n",
    "a\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 299,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "f     0\n",
       "e     1\n",
       "d     2\n",
       "c     3\n",
       "b     4\n",
       "a   NaN\n",
       "dtype: float64"
      ]
     },
     "execution_count": 299,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "b = Series(np.arange(len(a), dtype=np.float64), index=['f', 'e', 'd', 'c', 'b', 'a'])\n",
    "b[-1] = np.nan\n",
    "b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 303,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 0. ,  2.5,  2. ,  3.5,  4.5,  nan])"
      ]
     },
     "execution_count": 303,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 두 자료의 색인이 겹친다. 이러면(색인이 일부겹치거나 완전히 같거나) 머지나 이어붙이기로는 불가능\n",
    "# 이때 조건절로 합침\n",
    "np.where(pd.isnull(a), b, a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 304,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    NaN\n",
       "b    4.5\n",
       "c    3.0\n",
       "d    2.0\n",
       "e    1.0\n",
       "f    0.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 304,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 위의 것과 동일하다.\n",
    "b[:-2].combine_first(a[2:])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 300,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "d    NaN\n",
       "c    3.5\n",
       "b    4.5\n",
       "a    NaN\n",
       "dtype: float64"
      ]
     },
     "execution_count": 300,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a[2:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 301,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "f    0\n",
       "e    1\n",
       "d    2\n",
       "c    3\n",
       "dtype: float64"
      ]
     },
     "execution_count": 301,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "b[:-2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 305,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>  1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>  2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>NaN</td>\n",
       "      <td>  2</td>\n",
       "      <td>  6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>  5</td>\n",
       "      <td>NaN</td>\n",
       "      <td> 10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>NaN</td>\n",
       "      <td>  6</td>\n",
       "      <td> 14</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    a   b   c\n",
       "0   1 NaN   2\n",
       "1 NaN   2   6\n",
       "2   5 NaN  10\n",
       "3 NaN   6  14"
      ]
     },
     "execution_count": 305,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# DataFrame은 컬럼에 대해 위와 같이 동작\n",
    "df1 = DataFrame({'a': [1., np.nan, 5., np.nan],\n",
    "                            'b': [np.nan, 2., np.nan, 6.],\n",
    "                            'c': range(2, 18, 4)})\n",
    "df1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 306,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>  5</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>  4</td>\n",
       "      <td>  3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>NaN</td>\n",
       "      <td>  4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>  3</td>\n",
       "      <td>  6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>  7</td>\n",
       "      <td>  8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    a   b\n",
       "0   5 NaN\n",
       "1   4   3\n",
       "2 NaN   4\n",
       "3   3   6\n",
       "4   7   8"
      ]
     },
     "execution_count": 306,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2 = DataFrame({'a': [5., 4., np.nan, 3., 7],\n",
    "                            'b': [np.nan, 3., 4., 6., 8.]})\n",
    "df2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 307,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>  2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 4</td>\n",
       "      <td>  2</td>\n",
       "      <td>  6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 5</td>\n",
       "      <td>  4</td>\n",
       "      <td> 10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td> 3</td>\n",
       "      <td>  6</td>\n",
       "      <td> 14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td> 7</td>\n",
       "      <td>  8</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   a   b   c\n",
       "0  1 NaN   2\n",
       "1  4   2   6\n",
       "2  5   4  10\n",
       "3  3   6  14\n",
       "4  7   8 NaN"
      ]
     },
     "execution_count": 307,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1.combine_first(df2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 재형성과 피벗"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "표 형식의 데이터를 재배치하는 다양한 기본 연산이 존재하는데, 이런 연산을 재형성reshaping 또는 피벗이라고 한다.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 계층적 색인으로 재형성\n",
    "* 피버팅"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 계층적 색인으로 재형성"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* stack : 데이터의 컬럼을 로우로 피벗 또는 회전\n",
    "* unstack : 로우를 칼럼으로 피벗시킨다"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 309,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>number</th>\n",
       "      <th>one</th>\n",
       "      <th>two</th>\n",
       "      <th>three</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>state</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Ohio</th>\n",
       "      <td> 0</td>\n",
       "      <td> 1</td>\n",
       "      <td> 2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Colorado</th>\n",
       "      <td> 3</td>\n",
       "      <td> 4</td>\n",
       "      <td> 5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "number    one  two  three\n",
       "state                    \n",
       "Ohio        0    1      2\n",
       "Colorado    3    4      5"
      ]
     },
     "execution_count": 309,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = DataFrame(np.arange(6).reshape((2, 3)),\n",
    "                         index=pd.Index(['Ohio', 'Colorado'], name='state'),\n",
    "                        columns = pd.Index(['one', 'two', 'three'], name='number'))\n",
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 310,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "state     number\n",
       "Ohio      one       0\n",
       "          two       1\n",
       "          three     2\n",
       "Colorado  one       3\n",
       "          two       4\n",
       "          three     5\n",
       "dtype: int64"
      ]
     },
     "execution_count": 310,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# long 포맷으로 변형\n",
    "result = data.stack()\n",
    "result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 311,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>number</th>\n",
       "      <th>one</th>\n",
       "      <th>two</th>\n",
       "      <th>three</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>state</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Ohio</th>\n",
       "      <td> 0</td>\n",
       "      <td> 1</td>\n",
       "      <td> 2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Colorado</th>\n",
       "      <td> 3</td>\n",
       "      <td> 4</td>\n",
       "      <td> 5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "number    one  two  three\n",
       "state                    \n",
       "Ohio        0    1      2\n",
       "Colorado    3    4      5"
      ]
     },
     "execution_count": 311,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# wide 포맷으로 변형\n",
    "result.unstack()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 피버팅"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 337,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "ldata = DataFrame({'date': np.random.rand(9),\n",
    "                          'item':['a','a','c', 'b', 'a', 'b', 'c', 'a', 'a'],\n",
    "                          'value': np.arange(9)})       "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 338,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>item</th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 0.091794</td>\n",
       "      <td> a</td>\n",
       "      <td> 0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> 0.524387</td>\n",
       "      <td> a</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 0.412385</td>\n",
       "      <td> c</td>\n",
       "      <td> 2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td> 0.247175</td>\n",
       "      <td> b</td>\n",
       "      <td> 3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td> 0.731769</td>\n",
       "      <td> a</td>\n",
       "      <td> 4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td> 0.286297</td>\n",
       "      <td> b</td>\n",
       "      <td> 5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td> 0.881872</td>\n",
       "      <td> c</td>\n",
       "      <td> 6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td> 0.404667</td>\n",
       "      <td> a</td>\n",
       "      <td> 7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td> 0.383669</td>\n",
       "      <td> a</td>\n",
       "      <td> 8</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       date item  value\n",
       "0  0.091794    a      0\n",
       "1  0.524387    a      1\n",
       "2  0.412385    c      2\n",
       "3  0.247175    b      3\n",
       "4  0.731769    a      4\n",
       "5  0.286297    b      5\n",
       "6  0.881872    c      6\n",
       "7  0.404667    a      7\n",
       "8  0.383669    a      8"
      ]
     },
     "execution_count": 338,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ldata"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 339,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>item</th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>c</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>date</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0.091794</th>\n",
       "      <td>  0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0.247175</th>\n",
       "      <td>NaN</td>\n",
       "      <td>  3</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0.286297</th>\n",
       "      <td>NaN</td>\n",
       "      <td>  5</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0.383669</th>\n",
       "      <td>  8</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0.404667</th>\n",
       "      <td>  7</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "item       a   b   c\n",
       "date                \n",
       "0.091794   0 NaN NaN\n",
       "0.247175 NaN   3 NaN\n",
       "0.286297 NaN   5 NaN\n",
       "0.383669   8 NaN NaN\n",
       "0.404667   7 NaN NaN"
      ]
     },
     "execution_count": 339,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pivoted = ldata.pivot('date', 'item', 'value')\n",
    "pivoted.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 데이터 변형"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 중복제거"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 343,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>k1</th>\n",
       "      <th>k2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> one</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> one</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> one</td>\n",
       "      <td> 2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td> two</td>\n",
       "      <td> 3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td> two</td>\n",
       "      <td> 3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td> two</td>\n",
       "      <td> 4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td> two</td>\n",
       "      <td> 4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    k1  k2\n",
       "0  one   1\n",
       "1  one   1\n",
       "2  one   2\n",
       "3  two   3\n",
       "4  two   3\n",
       "5  two   4\n",
       "6  two   4"
      ]
     },
     "execution_count": 343,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = DataFrame({'k1': ['one']*3 + ['two']*4,\n",
    "                          'k2': [1, 1, 2, 3, 3, 4, 4]})\n",
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 345,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    False\n",
       "1     True\n",
       "2    False\n",
       "3    False\n",
       "4     True\n",
       "5    False\n",
       "6     True\n",
       "dtype: bool"
      ]
     },
     "execution_count": 345,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 중복찾기\n",
    "data.duplicated()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 347,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>k1</th>\n",
       "      <th>k2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> one</td>\n",
       "      <td> 1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> one</td>\n",
       "      <td> 2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td> two</td>\n",
       "      <td> 3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td> two</td>\n",
       "      <td> 4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    k1  k2\n",
       "0  one   1\n",
       "2  one   2\n",
       "3  two   3\n",
       "5  two   4"
      ]
     },
     "execution_count": 347,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 중복제거\n",
    "data.drop_duplicates()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 함수 매핑"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 348,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>food</th>\n",
       "      <th>ounces</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>       bacon</td>\n",
       "      <td>  4.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> pulled pork</td>\n",
       "      <td>  3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>       bacon</td>\n",
       "      <td> 12.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>    Pastrami</td>\n",
       "      <td>  6.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td> corned beef</td>\n",
       "      <td>  7.5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>       Bacon</td>\n",
       "      <td>  8.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>    pastrami</td>\n",
       "      <td>  3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>   honey ham</td>\n",
       "      <td>  5.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>    nova lox</td>\n",
       "      <td>  6.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          food  ounces\n",
       "0        bacon     4.0\n",
       "1  pulled pork     3.0\n",
       "2        bacon    12.0\n",
       "3     Pastrami     6.0\n",
       "4  corned beef     7.5\n",
       "5        Bacon     8.0\n",
       "6     pastrami     3.0\n",
       "7    honey ham     5.0\n",
       "8     nova lox     6.0"
      ]
     },
     "execution_count": 348,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = DataFrame({'food' : ['bacon', 'pulled pork', 'bacon', 'Pastrami', 'corned beef', 'Bacon', 'pastrami', 'honey ham', 'nova lox'],\n",
    "                          'ounces' : [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})\n",
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 355,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "meat_to_animal = {\n",
    "    'bacon' : 'pig',\n",
    "    'pulled pork' : 'pig',\n",
    "    'pastrami' : 'cow',\n",
    "    'corned beef' : 'cow',\n",
    "    'honey ham' : 'pig',\n",
    "    'nova lox' : 'salmon'\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 356,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>food</th>\n",
       "      <th>ounces</th>\n",
       "      <th>animal</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>       bacon</td>\n",
       "      <td>  4.0</td>\n",
       "      <td>    pig</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td> pulled pork</td>\n",
       "      <td>  3.0</td>\n",
       "      <td>    pig</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>       bacon</td>\n",
       "      <td> 12.0</td>\n",
       "      <td>    pig</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>    Pastrami</td>\n",
       "      <td>  6.0</td>\n",
       "      <td>    cow</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td> corned beef</td>\n",
       "      <td>  7.5</td>\n",
       "      <td>    cow</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>       Bacon</td>\n",
       "      <td>  8.0</td>\n",
       "      <td>    pig</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>    pastrami</td>\n",
       "      <td>  3.0</td>\n",
       "      <td>    cow</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>   honey ham</td>\n",
       "      <td>  5.0</td>\n",
       "      <td>    pig</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>    nova lox</td>\n",
       "      <td>  6.0</td>\n",
       "      <td> salmon</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          food  ounces  animal\n",
       "0        bacon     4.0     pig\n",
       "1  pulled pork     3.0     pig\n",
       "2        bacon    12.0     pig\n",
       "3     Pastrami     6.0     cow\n",
       "4  corned beef     7.5     cow\n",
       "5        Bacon     8.0     pig\n",
       "6     pastrami     3.0     cow\n",
       "7    honey ham     5.0     pig\n",
       "8     nova lox     6.0  salmon"
      ]
     },
     "execution_count": 356,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 육류의 이름을 소문자로 바꾸고, 해당 동물로 매핑\n",
    "data['animal'] = data['food'].map(str.lower).map(meat_to_animal)\n",
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 357,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0       pig\n",
       "1       pig\n",
       "2       pig\n",
       "3       cow\n",
       "4       cow\n",
       "5       pig\n",
       "6       cow\n",
       "7       pig\n",
       "8    salmon\n",
       "Name: food, dtype: object"
      ]
     },
     "execution_count": 357,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 함수 형태로 넘길수도 있다.\n",
    "data['food'].map(lambda x: meat_to_animal[x.lower()])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 값 치환"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 358,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0       1\n",
       "1    -999\n",
       "2       2\n",
       "3    -999\n",
       "4   -1000\n",
       "5       3\n",
       "dtype: float64"
      ]
     },
     "execution_count": 358,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = Series([1., -999., 2., -999., -1000., 3.])\n",
    "data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 359,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0       1\n",
       "1    -999\n",
       "2       2\n",
       "3    -999\n",
       "4   -1000\n",
       "5       3\n",
       "dtype: float64"
      ]
     },
     "execution_count": 359,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.replace(-999, np.nan)\n",
    "data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 특이값 찾아내고 치환하기"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 360,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td> 1000.000000</td>\n",
       "      <td> 1000.000000</td>\n",
       "      <td> 1000.000000</td>\n",
       "      <td> 1000.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>   -0.067684</td>\n",
       "      <td>    0.067924</td>\n",
       "      <td>    0.025598</td>\n",
       "      <td>   -0.002298</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>    0.998035</td>\n",
       "      <td>    0.992106</td>\n",
       "      <td>    1.006835</td>\n",
       "      <td>    0.996794</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>   -3.428254</td>\n",
       "      <td>   -3.548824</td>\n",
       "      <td>   -3.184377</td>\n",
       "      <td>   -3.745356</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>   -0.774890</td>\n",
       "      <td>   -0.591841</td>\n",
       "      <td>   -0.641675</td>\n",
       "      <td>   -0.644144</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>   -0.116401</td>\n",
       "      <td>    0.101143</td>\n",
       "      <td>    0.002073</td>\n",
       "      <td>   -0.013611</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>    0.616366</td>\n",
       "      <td>    0.780282</td>\n",
       "      <td>    0.680391</td>\n",
       "      <td>    0.654328</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>    3.366626</td>\n",
       "      <td>    2.653656</td>\n",
       "      <td>    3.260383</td>\n",
       "      <td>    3.927528</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 0            1            2            3\n",
       "count  1000.000000  1000.000000  1000.000000  1000.000000\n",
       "mean     -0.067684     0.067924     0.025598    -0.002298\n",
       "std       0.998035     0.992106     1.006835     0.996794\n",
       "min      -3.428254    -3.548824    -3.184377    -3.745356\n",
       "25%      -0.774890    -0.591841    -0.641675    -0.644144\n",
       "50%      -0.116401     0.101143     0.002073    -0.013611\n",
       "75%       0.616366     0.780282     0.680391     0.654328\n",
       "max       3.366626     2.653656     3.260383     3.927528"
      ]
     },
     "execution_count": 360,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.random.seed(12345)\n",
    "\n",
    "data = DataFrame(np.random.randn(1000, 4))\n",
    "data.describe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 361,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0   -0.555730\n",
       "1    0.281746\n",
       "2   -1.296221\n",
       "3    0.886429\n",
       "4   -0.438570\n",
       "Name: 3, dtype: float64"
      ]
     },
     "execution_count": 361,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 위 자료에서 한 컬럼에서 절대값이 3을 초과하는 값 찾기\n",
    "col = data[3]\n",
    "\n",
    "col.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 363,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "97     3.927528\n",
       "305   -3.399312\n",
       "400   -3.745356\n",
       "Name: 3, dtype: float64"
      ]
     },
     "execution_count": 363,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "col[np.abs(col) > 3]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 364,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>5  </th>\n",
       "      <td>-0.539741</td>\n",
       "      <td> 0.476985</td>\n",
       "      <td> 3.248944</td>\n",
       "      <td>-1.021228</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>97 </th>\n",
       "      <td>-0.774363</td>\n",
       "      <td> 0.552936</td>\n",
       "      <td> 0.106061</td>\n",
       "      <td> 3.927528</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>102</th>\n",
       "      <td>-0.655054</td>\n",
       "      <td>-0.565230</td>\n",
       "      <td> 3.176873</td>\n",
       "      <td> 0.959533</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>305</th>\n",
       "      <td>-2.315555</td>\n",
       "      <td> 0.457246</td>\n",
       "      <td>-0.025907</td>\n",
       "      <td>-3.399312</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>324</th>\n",
       "      <td> 0.050188</td>\n",
       "      <td> 1.951312</td>\n",
       "      <td> 3.260383</td>\n",
       "      <td> 0.963301</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>400</th>\n",
       "      <td> 0.146326</td>\n",
       "      <td> 0.508391</td>\n",
       "      <td>-0.196713</td>\n",
       "      <td>-3.745356</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>499</th>\n",
       "      <td>-0.293333</td>\n",
       "      <td>-0.242459</td>\n",
       "      <td>-3.056990</td>\n",
       "      <td> 1.918403</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>523</th>\n",
       "      <td>-3.428254</td>\n",
       "      <td>-0.296336</td>\n",
       "      <td>-0.439938</td>\n",
       "      <td>-0.867165</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>586</th>\n",
       "      <td> 0.275144</td>\n",
       "      <td> 1.179227</td>\n",
       "      <td>-3.184377</td>\n",
       "      <td> 1.369891</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>808</th>\n",
       "      <td>-0.362528</td>\n",
       "      <td>-3.548824</td>\n",
       "      <td> 1.553205</td>\n",
       "      <td>-2.186301</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>900</th>\n",
       "      <td> 3.366626</td>\n",
       "      <td>-2.372214</td>\n",
       "      <td> 0.851010</td>\n",
       "      <td> 1.332846</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            0         1         2         3\n",
       "5   -0.539741  0.476985  3.248944 -1.021228\n",
       "97  -0.774363  0.552936  0.106061  3.927528\n",
       "102 -0.655054 -0.565230  3.176873  0.959533\n",
       "305 -2.315555  0.457246 -0.025907 -3.399312\n",
       "324  0.050188  1.951312  3.260383  0.963301\n",
       "400  0.146326  0.508391 -0.196713 -3.745356\n",
       "499 -0.293333 -0.242459 -3.056990  1.918403\n",
       "523 -3.428254 -0.296336 -0.439938 -0.867165\n",
       "586  0.275144  1.179227 -3.184377  1.369891\n",
       "808 -0.362528 -3.548824  1.553205 -2.186301\n",
       "900  3.366626 -2.372214  0.851010  1.332846"
      ]
     },
     "execution_count": 364,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[(np.abs(data)>3).any(1)]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 데이터 그룹 연산"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "그룹 연산 = 분리-적용-결합"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src='./fig.9.1.png'>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 365,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>data1</th>\n",
       "      <th>data2</th>\n",
       "      <th>key1</th>\n",
       "      <th>key2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td> 1.150765</td>\n",
       "      <td> 1.199915</td>\n",
       "      <td> a</td>\n",
       "      <td> one</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>-0.997174</td>\n",
       "      <td>-0.451814</td>\n",
       "      <td> a</td>\n",
       "      <td> two</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td> 0.046486</td>\n",
       "      <td>-0.155385</td>\n",
       "      <td> b</td>\n",
       "      <td> one</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>-0.610441</td>\n",
       "      <td>-0.153514</td>\n",
       "      <td> b</td>\n",
       "      <td> two</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>-0.394982</td>\n",
       "      <td> 0.011194</td>\n",
       "      <td> a</td>\n",
       "      <td> one</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      data1     data2 key1 key2\n",
       "0  1.150765  1.199915    a  one\n",
       "1 -0.997174 -0.451814    a  two\n",
       "2  0.046486 -0.155385    b  one\n",
       "3 -0.610441 -0.153514    b  two\n",
       "4 -0.394982  0.011194    a  one"
      ]
     },
     "execution_count": 365,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = DataFrame({'key1' : ['a', 'a', 'b', 'b', 'a'],\n",
    "                'key2' : ['one', 'two', 'one', 'two', 'one'],\n",
    "                'data1' : np.random.randn(5),\n",
    "                'data2' : np.random.randn(5)})\n",
    "\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 366,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pandas.core.groupby.SeriesGroupBy object at 0x5c66d10>"
      ]
     },
     "execution_count": 366,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 데이터를 key1으로 묶고 각 그룹에서 data1의 평균을 구하는 방법\n",
    "grouped = df['data1'].groupby(df['key1'])\n",
    "\n",
    "# grouped 변수는 Groupby 객체\n",
    "grouped"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 367,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "key1\n",
       "a      -0.080463\n",
       "b      -0.281977\n",
       "Name: data1, dtype: float64"
      ]
     },
     "execution_count": 367,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 평균 구하기\n",
    "grouped.mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 368,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "key1  key2\n",
       "a     one     0.377892\n",
       "      two    -0.997174\n",
       "b     one     0.046486\n",
       "      two    -0.610441\n",
       "Name: data1, dtype: float64"
      ]
     },
     "execution_count": 368,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 색인 여러개 넘기기\n",
    "means = df['data1'].groupby([df['key1'],df['key2']]).mean()\n",
    "\n",
    "means"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 참고자료 \n",
    "\n",
    " * [1] 파이썬 라이브러리를 활용한 데이터 분석 - http://www.hanbit.co.kr/book/look.html?isbn=978-89-6848-047-8\n",
    " * [2] 버거지수 - http://nbviewer.ipython.org/gist/hyeshik/cf9f3d7686e07eedbfda?revision=6\n",
    " * [3] 데이터/수치 분석을 위한 파이썬 라이브러리 SciPy와 NumPy -  http://www.hanbit.co.kr/ebook/look.html?isbn=9788968486135"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}