{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "EMlGf1UM82fB"
      },
      "source": [
        "# 머신 러닝 교과서 (Machine Learning Textbook), 3rd Edition"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "<table align=\"left\">\n",
        "  <td>\n",
        "    <a href=\"https://colab.research.google.com/github/rickiepark/python-machine-learning-book-3rd-edition/blob/master/ch01/ch01.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
        "  </td>\n",
        "</table>"
      ],
      "metadata": {
        "id": "Af6dqvBb835-"
      }
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "4OEjozLi82fD"
      },
      "source": [
        "# Chapter 1 - Computers Learn from Data"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Cls5awfS82fD"
      },
      "source": [
        "### Table of Contents"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8h1lJUnE82fE"
      },
      "source": [
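        "- Building intelligent systems that transform data into knowledge\n",
        "- The three types of machine learning\n",
        "    - Making predictions about the future with supervised learning\n",
        "    - Solving interactive problems with reinforcement learning\n",
        "    - Discovering hidden structures with unsupervised learning\n",
        "- Introduction to basic terminology and notation\n",
        "    - Notation and conventions used in this book\n",
        "    - Machine learning terminology\n",
        "- A roadmap for building machine learning systems\n",
        "    - Preprocessing: getting data into shape\n",
        "    - Training and selecting a predictive model\n",
        "    - Evaluating models and predicting unseen data instances"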
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2021-10-23T05:49:12.483577Z",
          "iopub.status.busy": "2021-10-23T05:49:12.481184Z",
          "iopub.status.idle": "2021-10-23T05:49:12.487647Z",
          "shell.execute_reply": "2021-10-23T05:49:12.488360Z"
        },
        "id": "X6z9vtFh82fE"
      },
      "outputs": [],
      "source": [
        "from IPython.display import Image"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "KOoufhrb82fF"
      },
      "source": [
        "<br>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "tfuvDeNg82fF"
      },
      "source": [
        "# 1.1 Building intelligent systems that transform data into knowledge"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "kya5EzwF82fG"
      },
      "source": [
        "* Machine learning is a subfield of artificial intelligence\n",
        "* Hand-crafted rules --> rules (knowledge) derived from data"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ev7gdi0582fG"
      },
      "source": [
        "<br>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "eU6i1BOw82fG"
      },
      "source": [
        "# 1.2 The three types of machine learning"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2021-10-23T05:49:12.500280Z",
          "iopub.status.busy": "2021-10-23T05:49:12.497846Z",
          "iopub.status.idle": "2021-10-23T05:49:12.510764Z",
          "shell.execute_reply": "2021-10-23T05:49:12.509927Z"
        },
        "id": "7hBbgddM82fH",
        "outputId": "09b882fd-2d6c-4300-811e-93da5e1ee37e",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 347
        }
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<img src=\"https://git.io/JtIQ1\" width=\"500\"/>"
            ],
            "text/plain": [
              "<IPython.core.display.Image object>"
            ]
          },
          "metadata": {},
          "execution_count": 2
        }
      ],
      "source": [
        "Image(url='https://git.io/JtIQ1', width=500)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DmunVnyO82fH"
      },
      "source": [
        "<br>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ZOICtvgj82fH"
      },
      "source": [
        "## 1.2.1 Making predictions about the future with supervised learning"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2021-10-23T05:49:12.518182Z",
          "iopub.status.busy": "2021-10-23T05:49:12.517089Z",
          "iopub.status.idle": "2021-10-23T05:49:12.522283Z",
          "shell.execute_reply": "2021-10-23T05:49:12.521540Z"
        },
        "id": "R2kHImjh82fH",
        "outputId": "525946e8-8288-495b-f6e9-253b6ac221bb",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 311
        }
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<img src=\"https://git.io/JtIQH\" width=\"500\"/>"
            ],
            "text/plain": [
              "<IPython.core.display.Image object>"
            ]
          },
          "metadata": {},
          "execution_count": 3
        }
      ],
      "source": [
        "Image(url='https://git.io/JtIQH', width=500)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "bGi4oI4082fI"
      },
      "source": [
        "### Classification: predicting class labels"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2021-10-23T05:49:12.529674Z",
          "iopub.status.busy": "2021-10-23T05:49:12.528114Z",
          "iopub.status.idle": "2021-10-23T05:49:12.532739Z",
          "shell.execute_reply": "2021-10-23T05:49:12.532139Z"
        },
        "id": "fKuX95HV82fI",
        "outputId": "c68d5a24-d4ca-41d4-db68-acf0452f1931",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 304
        }
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<img src=\"https://git.io/JtIQ5\" width=\"300\"/>"
            ],
            "text/plain": [
              "<IPython.core.display.Image object>"
            ]
          },
          "metadata": {},
          "execution_count": 4
        }
      ],
      "source": [
        "Image(url='https://git.io/JtIQ5', width=300)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cbd8gBUR82fI"
      },
      "source": [
        "- Binary classification vs. multiclass classification\n",
        "- Positive class / negative class\n",
        "- Decision boundary"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "F8RRq7w282fI"
      },
      "source": [
        "### Regression: predicting continuous output values"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2021-10-23T05:49:12.539177Z",
          "iopub.status.busy": "2021-10-23T05:49:12.538481Z",
          "iopub.status.idle": "2021-10-23T05:49:12.542603Z",
          "shell.execute_reply": "2021-10-23T05:49:12.543071Z"
        },
        "id": "cW6G8hot82fI",
        "outputId": "813ee560-9a37-4590-9a52-552c13418d91",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 310
        }
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<img src=\"https://git.io/JtIQd\" width=\"300\"/>"
            ],
            "text/plain": [
              "<IPython.core.display.Image object>"
            ]
          },
          "metadata": {},
          "execution_count": 5
        }
      ],
      "source": [
        "Image(url='https://git.io/JtIQd', width=300)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qNiT2wm-82fI"
      },
      "source": [
        "- Predictor variables (features) vs. response variable (target)\n",
        "- Linear regression"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jIppNyl-82fI"
      },
      "source": [
        "<br>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "pi_74fXz82fI"
      },
      "source": [
        "## 1.2.2 Solving interactive problems with reinforcement learning"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2021-10-23T05:49:12.549671Z",
          "iopub.status.busy": "2021-10-23T05:49:12.548963Z",
          "iopub.status.idle": "2021-10-23T05:49:12.552796Z",
          "shell.execute_reply": "2021-10-23T05:49:12.553275Z"
        },
        "id": "naGjad3M82fI",
        "outputId": "b409595e-2937-4b1f-fb94-ac31999a64cf",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 173
        }
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<img src=\"https://git.io/JtIQN\" width=\"300\"/>"
            ],
            "text/plain": [
              "<IPython.core.display.Image object>"
            ]
          },
          "metadata": {},
          "execution_count": 6
        }
      ],
      "source": [
        "Image(url='https://git.io/JtIQN', width=300)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "9TIAZdKe82fJ"
      },
      "source": [
        "<br>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Xz25GYqR82fJ"
      },
      "source": [
        "## 1.2.3 Discovering hidden structures with unsupervised learning"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "kyw0O8JA82fJ"
      },
      "source": [
        "### Clustering: finding subgroups"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2021-10-23T05:49:12.560557Z",
          "iopub.status.busy": "2021-10-23T05:49:12.558980Z",
          "iopub.status.idle": "2021-10-23T05:49:12.563643Z",
          "shell.execute_reply": "2021-10-23T05:49:12.563004Z"
        },
        "id": "HRplMAUh82fJ",
        "outputId": "f30179c1-65f3-4d33-aa4f-09277b98164a",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 305
        }
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<img src=\"https://git.io/JtIQx\" width=\"300\"/>"
            ],
            "text/plain": [
              "<IPython.core.display.Image object>"
            ]
          },
          "metadata": {},
          "execution_count": 7
        }
      ],
      "source": [
        "Image(url='https://git.io/JtIQx', width=300)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1zxDdbU782fJ"
      },
      "source": [
        "- Cluster"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_Q7O2OXE82fJ"
      },
      "source": [
        "### Dimensionality reduction: compressing data"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2021-10-23T05:49:12.570059Z",
          "iopub.status.busy": "2021-10-23T05:49:12.568804Z",
          "iopub.status.idle": "2021-10-23T05:49:12.572603Z",
          "shell.execute_reply": "2021-10-23T05:49:12.573073Z"
        },
        "id": "1jpU8COr82fJ",
        "outputId": "524f0ca8-3fbd-41f1-ae46-1f5d0a06d2ea",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 217
        }
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<img src=\"https://git.io/JtIQp\" width=\"500\"/>"
            ],
            "text/plain": [
              "<IPython.core.display.Image object>"
            ]
          },
          "metadata": {},
          "execution_count": 8
        }
      ],
      "source": [
        "Image(url='https://git.io/JtIQp', width=500)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "NT-lLdAX82fJ"
      },
      "source": [
        "<br>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6g11F3s082fJ"
      },
      "source": [
        "# 1.3 Introduction to basic terminology and notation"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XWdfJKUb82fK"
      },
      "source": [
        "## 1.3.1 Notation and conventions used in this book"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2021-10-23T05:49:12.579115Z",
          "iopub.status.busy": "2021-10-23T05:49:12.578448Z",
          "iopub.status.idle": "2021-10-23T05:49:12.582897Z",
          "shell.execute_reply": "2021-10-23T05:49:12.583566Z"
        },
        "id": "45MscfUo82fK",
        "outputId": "4175ee63-7c9b-47f8-8d1d-9ee17c481988",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 405
        }
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<img src=\"https://git.io/JtI7e\" width=\"500\"/>"
            ],
            "text/plain": [
              "<IPython.core.display.Image object>"
            ]
          },
          "metadata": {},
          "execution_count": 9
        }
      ],
      "source": [
        "Image(url='https://git.io/JtI7e', width=500)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "IvTS37-y82fK"
      },
      "source": [
        "$\\boldsymbol{X} \\in \\mathbb{R}^{150\\times4}$\n",
        "\n",
        "$\\begin{bmatrix}\n",
        "x_1^{(1)} & x_2^{(1)} & x_3^{(1)} & x_4^{(1)} \\\\\n",
        "x_1^{(2)} & x_2^{(2)} & x_3^{(2)} & x_4^{(2)} \\\\\n",
        "\\vdots & \\vdots & \\vdots & \\vdots \\\\\n",
        "x_1^{(150)} & x_2^{(150)} & x_3^{(150)} & x_4^{(150)} \\\\\n",
        "\\end{bmatrix}$\n",
        "\n",
        "Vector: $\\boldsymbol{x}\\in\\mathbb{R}^{n\\times1}$\n",
        "\n",
        "Matrix: $\\boldsymbol{X}\\in\\mathbb{R}^{n\\times m}$\n",
        "\n",
        "Sample (row vector): $\\boldsymbol{x}^{(i)}=\\begin{bmatrix} x_1^{(i)} & x_2^{(i)} & x_3^{(i)} & x_4^{(i)}\\end{bmatrix}$\n",
        "\n",
        "Feature (column vector): $\\boldsymbol{x}_j=\\begin{bmatrix} x_j^{(1)} \\\\ x_j^{(2)} \\\\ \\vdots \\\\ x_j^{(150)} \\end{bmatrix}$\n",
        "\n",
        "Target (column vector): $\\boldsymbol{y}=\\begin{bmatrix} y^{(1)} \\\\ y^{(2)} \\\\ \\vdots \\\\ y^{(150)} \\end{bmatrix}$ $(y^{(i)} \\in \\{\\text{Setosa, Versicolor, Virginica}\\})$"
      ]
    },
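    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The notation above can be checked against the actual Iris data. The next cell is a minimal sketch that is not part of the original notebook: it assumes scikit-learn is installed and uses its `load_iris` loader, and the variable names (`X`, `y`, `x_i`, `x_j`) are chosen here only to mirror the symbols above. Note that NumPy indices start at 0, while the notation starts at 1."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "from sklearn.datasets import load_iris\n",
        "\n",
        "# Assumption: scikit-learn is available; load_iris returns the 150x4 Iris data.\n",
        "iris = load_iris()\n",
        "X, y = iris.data, iris.target\n",
        "\n",
        "print(X.shape)    # feature matrix: (150, 4)\n",
        "print(y.shape)    # target vector: (150,)\n",
        "\n",
        "# A sample is a row of X: x^(1) is X[0] because NumPy indexing is 0-based.\n",
        "x_i = X[0]\n",
        "print(x_i.shape)  # (4,)\n",
        "\n",
        "# A feature is a column of X: x_1 is X[:, 0].\n",
        "x_j = X[:, 0]\n",
        "print(x_j.shape)  # (150,)\n",
        "\n",
        "# The integer class labels in y index into the species names.\n",
        "print(iris.target_names)"
      ]
    },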
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DAtZHNdW82fK"
      },
      "source": [
        "## 1.3.2 Machine learning terminology"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JrHEeFbQ82fK"
      },
      "source": [
        "- Training sample\n",
        "- Training\n",
        "- Feature (variable, input)\n",
        "- Target (output, response variable, ground-truth label)\n",
        "- Loss function (cost function)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qE4rIlmi82fK"
      },
      "source": [
        "<br>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "nlDhJ0d982fK"
      },
      "source": [
        "# 1.4 A roadmap for building machine learning systems"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2021-10-23T05:49:12.587963Z",
          "iopub.status.busy": "2021-10-23T05:49:12.587146Z",
          "iopub.status.idle": "2021-10-23T05:49:12.591308Z",
          "shell.execute_reply": "2021-10-23T05:49:12.591794Z"
        },
        "id": "lLtWLYqn82fO",
        "outputId": "553d615d-c2de-46a8-e846-c29b0124c087",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 531
        }
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<img src=\"https://git.io/JtI7J\" width=\"700\"/>"
            ],
            "text/plain": [
              "<IPython.core.display.Image object>"
            ]
          },
          "metadata": {},
          "execution_count": 10
        }
      ],
      "source": [
        "Image(url='https://git.io/JtI7J', width=700)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "GWB-t8BP82fO"
      },
      "source": [
        "## 1.4.1 Preprocessing: getting data into shape"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "gqyQuDH182fO"
      },
      "source": [
        "- Feature scaling\n",
        "- Dimensionality reduction\n",
        "- Training dataset and test dataset"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8OycgcQZ82fP"
      },
      "source": [
        "## 1.4.2 Training and selecting a predictive model"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JY6euFuj82fP"
      },
      "source": [
        "- No free lunch\n",
        "- A common classification metric: accuracy\n",
        "- Hyperparameters\n",
        "- Cross-validation"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7S3hsbdU82fP"
      },
      "source": [
        "## 1.4.3 Evaluating models and predicting unseen data instances"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1VoaMOjs82fP"
      },
      "source": [
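        "* Generalization error (generalization performance)\n",
        "* Transform the test dataset and real-world data with the preprocessing parameters used on the training dataset"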
      ]
    },
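    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The last bullet above can be sketched in a few lines. The next cell is an illustration under stated assumptions, not code from the book: the data is a made-up random array, and `mu`, `sigma`, `X_train_std`, and `X_test_std` are hypothetical names. It standardizes features with plain NumPy; scikit-learn's `StandardScaler` follows the same pattern of fitting on the training set and then transforming both sets."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "\n",
        "# Hypothetical toy data: 8 samples with 2 features on very different scales.\n",
        "rng = np.random.RandomState(1)\n",
        "X = rng.normal(loc=[10.0, 200.0], scale=[2.0, 50.0], size=(8, 2))\n",
        "\n",
        "# Hold out the last 2 samples as a stand-in test set.\n",
        "X_train, X_test = X[:6], X[6:]\n",
        "\n",
        "# Estimate the preprocessing parameters from the training set only.\n",
        "mu = X_train.mean(axis=0)\n",
        "sigma = X_train.std(axis=0)\n",
        "\n",
        "# Reuse the same parameters to transform both sets.\n",
        "X_train_std = (X_train - mu) / sigma\n",
        "X_test_std = (X_test - mu) / sigma\n",
        "\n",
        "print(X_train_std.mean(axis=0))  # approximately 0 for each feature\n",
        "print(X_test_std.mean(axis=0))   # generally not 0: the test set was not used to estimate mu"
      ]
    },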
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QYOQE5Cg82fP"
      },
      "source": [
        "<br>"
      ]
    }
  ],
  "metadata": {
    "anaconda-cloud": {},
    "kernelspec": {
      "display_name": "Python 3 (ipykernel)",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.3"
    },
    "colab": {
      "provenance": []
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}