{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_PLo5qcAawlk"
      },
      "source": [
        "# Machine Learning Textbook, 3rd Edition"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jlmrX27Yawlo"
      },
      "source": [
        "# Chapter 9 - Embedding a Machine Learning Model into a Web Application"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0NSuaIVEawlp"
      },
      "source": [
        "**Use the links below to view this notebook in the Jupyter notebook viewer (nbviewer.jupyter.org) or to run it in Google Colab (colab.research.google.com).**\n",
        "\n",
        "<table class=\"tfo-notebook-buttons\" align=\"left\">\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://nbviewer.org/github/rickiepark/python-machine-learning-book-3rd-edition/blob/master/ch09/ch09.ipynb\"><img src=\"https://jupyter.org/assets/share.png\" width=\"60\" />View in Jupyter notebook viewer</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/rickiepark/python-machine-learning-book-3rd-edition/blob/master/ch09/ch09.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
        "  </td>\n",
        "</table>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "vTRiUUjbawlp"
      },
      "source": [
        "### Table of Contents"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "D29IAkPMawlp"
      },
      "source": [
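        "- Chapter 8 recap - Training a model for movie review classification\n",
        "- Serializing fitted scikit-learn estimators\n",
        "- Setting up an SQLite database for data storage\n",
        "- Developing a web application with Flask\n",
        "    - Our first Flask web application\n",
        "    - Form validation and rendering\n",
        "- Turning the movie review classifier into a web application\n",
        "- Deploying the web application to a public server\n",
        "    - Updating the movie classifier\n",
        "- Summary"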
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2021-10-23T10:41:51.447133Z",
          "iopub.status.busy": "2021-10-23T10:41:51.446490Z",
          "iopub.status.idle": "2021-10-23T10:41:51.449037Z",
          "shell.execute_reply": "2021-10-23T10:41:51.449453Z"
        },
        "id": "DXLkk_PSawlq"
      },
      "outputs": [],
      "source": [
        "from IPython.display import Image"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "rq-n6UClawlq"
      },
      "source": [
        "The Flask web application code is located in the following directories:\n",
        "\n",
        "- `1st_flask_app_1/`: a simple Flask web application\n",
        "- `1st_flask_app_2/`: `1st_flask_app_1` extended with form validation and rendering\n",
        "- `movieclassifier/`: the movie review classifier embedded in a web application\n",
        "- `movieclassifier_with_update/`: the same as `movieclassifier`, but uses an SQLite database for initialization."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "y7BU7MFwawlq"
      },
      "source": [
        "To run a web application locally, `cd` into its directory (listed above) and run the main application script:\n",
        "\n",
        "    cd ./1st_flask_app_1\n",
        "    python app.py\n",
        "\n",
        "The terminal should then print something like the following:\n",
        "\n",
        "     * Running on http://127.0.0.1:5000/\n",
        "     * Restarting with reloader\n",
        "\n",
        "Open a web browser and enter the address printed in the terminal (usually http://127.0.0.1:5000/) to access the web application."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qtmgp48bawlr"
      },
      "source": [
        "**A demo of the example application built in this tutorial is available at http://haesun.pythonanywhere.com/**."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "K3cBNe2Nawlr"
      },
      "source": [
        "<br>\n",
        "<br>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jxZ6m_xNawlr"
      },
      "source": [
        "# Chapter 8 recap - Training a model for movie review classification"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5MFI3KC4awlr"
      },
      "source": [
        "This section reuses the logistic regression model trained in the last section of Chapter 8. Run the following code cells to train the model that we will use in the next sections."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "AJKe25Wzawlr"
      },
      "source": [
        "**Note**\n",
        "\n",
        "The following code uses the `movie_data.csv` dataset created in Chapter 8.\n",
        "\n",
        "**If you are using Colab, run the following cell.**"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "execution": {
          "iopub.execute_input": "2021-10-23T10:41:51.456769Z",
          "iopub.status.busy": "2021-10-23T10:41:51.454589Z",
          "iopub.status.idle": "2021-10-23T10:41:54.664242Z",
          "shell.execute_reply": "2021-10-23T10:41:54.663029Z"
        },
        "id": "rgIZKmUhawlr",
        "outputId": "408f2841-d54c-49b7-fd3d-1f2714cd5959"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "--2023-11-10 05:06:49--  https://github.com/rickiepark/python-machine-learning-book-3rd-edition/raw/master/ch09/movie_data.csv.gz\n",
            "Resolving github.com (github.com)... 140.82.113.3\n",
            "Connecting to github.com (github.com)|140.82.113.3|:443... connected.\n",
            "HTTP request sent, awaiting response... 302 Found\n",
            "Location: https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch09/movie_data.csv.gz [following]\n",
            "--2023-11-10 05:06:50--  https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch09/movie_data.csv.gz\n",
            "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.111.133, ...\n",
            "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.\n",
            "HTTP request sent, awaiting response... 200 OK\n",
            "Length: 26521894 (25M) [application/octet-stream]\n",
            "Saving to: ‘movie_data.csv.gz’\n",
            "\n",
            "movie_data.csv.gz   100%[===================>]  25.29M   153MB/s    in 0.2s    \n",
            "\n",
            "2023-11-10 05:06:50 (153 MB/s) - ‘movie_data.csv.gz’ saved [26521894/26521894]\n",
            "\n"
          ]
        }
      ],
      "source": [
        "!wget https://github.com/rickiepark/python-machine-learning-book-3rd-edition/raw/master/ch09/movie_data.csv.gz"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2021-10-23T10:41:54.670293Z",
          "iopub.status.busy": "2021-10-23T10:41:54.669447Z",
          "iopub.status.idle": "2021-10-23T10:41:55.469864Z",
          "shell.execute_reply": "2021-10-23T10:41:55.470651Z"
        },
        "id": "1M6joYaSawls"
      },
      "outputs": [],
      "source": [
        "import gzip\n",
        "\n",
        "\n",
        "with gzip.open('movie_data.csv.gz') as f_in, open('movie_data.csv', 'wb') as f_out:\n",
        "    f_out.writelines(f_in)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "execution": {
          "iopub.execute_input": "2021-10-23T10:41:55.475329Z",
          "iopub.status.busy": "2021-10-23T10:41:55.474202Z",
          "iopub.status.idle": "2021-10-23T10:41:56.383788Z",
          "shell.execute_reply": "2021-10-23T10:41:56.384255Z"
        },
        "id": "BX9nOi6iawls",
        "outputId": "523f7d83-7a0b-4f04-a5d1-6fac24954204"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "[nltk_data] Downloading package stopwords to /root/nltk_data...\n",
            "[nltk_data]   Unzipping corpora/stopwords.zip.\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "True"
            ]
          },
          "metadata": {},
          "execution_count": 4
        }
      ],
      "source": [
        "import nltk\n",
        "nltk.download('stopwords')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2021-10-23T10:41:56.392581Z",
          "iopub.status.busy": "2021-10-23T10:41:56.391621Z",
          "iopub.status.idle": "2021-10-23T10:41:56.396076Z",
          "shell.execute_reply": "2021-10-23T10:41:56.394915Z"
        },
        "id": "pY02dzhLawls"
      },
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "import re\n",
        "from nltk.corpus import stopwords\n",
        "from nltk.stem import PorterStemmer\n",
        "\n",
        "stop = stopwords.words('english')\n",
        "porter = PorterStemmer()\n",
        "\n",
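        "def tokenizer(text):\n",
        "    # remove HTML tags such as <br />\n",
        "    text = re.sub('<[^>]*>', '', text)\n",
        "    # collect emoticons such as :) or ;-( before non-word characters are removed\n",
        "    # (raw strings avoid invalid-escape warnings in newer Python versions)\n",
        "    emoticons = re.findall(r'(?::|;|=)(?:-)?(?:\\)|\\(|D|P)', text.lower())\n",
        "    # lowercase, replace non-word characters with spaces, append emoticons without '-'\n",
        "    text = re.sub(r'[\\W]+', ' ', text.lower()) + ' '.join(emoticons).replace('-', '')\n",
        "    # drop stop words\n",
        "    tokenized = [w for w in text.split() if w not in stop]\n",
        "    return tokenized\n",
        "\n",
        "def stream_docs(path):\n",
        "    with open(path, 'r', encoding='utf-8') as csv:\n",
        "        next(csv) # skip header\n",
        "        for line in csv:\n",
        "            # each line ends with ',<label>\\n': the text comes before it, the label is line[-2]\n",
        "            text, label = line[:-3], int(line[-2])\n",
        "            yield text, label"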
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "execution": {
          "iopub.execute_input": "2021-10-23T10:41:56.401846Z",
          "iopub.status.busy": "2021-10-23T10:41:56.400926Z",
          "iopub.status.idle": "2021-10-23T10:41:56.404847Z",
          "shell.execute_reply": "2021-10-23T10:41:56.405486Z"
        },
        "id": "QdRlmGbYawlt",
        "outputId": "51360a90-6a5f-43bd-ae06-677b7b9a2dc7"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "('\"In 1974, the teenager Martha Moxley (Maggie Grace) moves to the high-class area of Belle Haven, Greenwich, Connecticut. On the Mischief Night, eve of Halloween, she was murdered in the backyard of her house and her murder remained unsolved. Twenty-two years later, the writer Mark Fuhrman (Christopher Meloni), who is a former LA detective that has fallen in disgrace for perjury in O.J. Simpson trial and moved to Idaho, decides to investigate the case with his partner Stephen Weeks (Andrew Mitchell) with the purpose of writing a book. The locals squirm and do not welcome them, but with the support of the retired detective Steve Carroll (Robert Forster) that was in charge of the investigation in the 70\\'s, they discover the criminal and a net of power and money to cover the murder.<br /><br />\"\"Murder in Greenwich\"\" is a good TV movie, with the true story of a murder of a fifteen years old girl that was committed by a wealthy teenager whose mother was a Kennedy. The powerful and rich family used their influence to cover the murder for more than twenty years. However, a snoopy detective and convicted perjurer in disgrace was able to disclose how the hideous crime was committed. The screenplay shows the investigation of Mark and the last days of Martha in parallel, but there is a lack of the emotion in the dramatization. My vote is seven.<br /><br />Title (Brazil): Not Available\"',\n",
              " 1)"
            ]
          },
          "metadata": {},
          "execution_count": 6
        }
      ],
      "source": [
        "next(stream_docs(path='movie_data.csv'))"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2021-10-23T10:41:56.412040Z",
          "iopub.status.busy": "2021-10-23T10:41:56.410730Z",
          "iopub.status.idle": "2021-10-23T10:41:56.413967Z",
          "shell.execute_reply": "2021-10-23T10:41:56.413419Z"
        },
        "id": "5rS-FN4qawlt"
      },
      "outputs": [],
      "source": [
        "def get_minibatch(doc_stream, size):\n",
        "    docs, y = [], []\n",
        "    try:\n",
        "        for _ in range(size):\n",
        "            text, label = next(doc_stream)\n",
        "            docs.append(text)\n",
        "            y.append(label)\n",
        "    except StopIteration:\n",
        "        return None, None\n",
        "    return docs, y"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2021-10-23T10:41:56.419684Z",
          "iopub.status.busy": "2021-10-23T10:41:56.418851Z",
          "iopub.status.idle": "2021-10-23T10:41:56.422238Z",
          "shell.execute_reply": "2021-10-23T10:41:56.422656Z"
        },
        "id": "YtYRd4hOawlt"
      },
      "outputs": [],
      "source": [
        "from sklearn.feature_extraction.text import HashingVectorizer\n",
        "from sklearn.linear_model import SGDClassifier\n",
        "\n",
        "vect = HashingVectorizer(decode_error='ignore',\n",
        "                         n_features=2**21,\n",
        "                         preprocessor=None,\n",
        "                         tokenizer=tokenizer)\n",
        "\n",
        "clf = SGDClassifier(loss='log_loss', random_state=1, max_iter=1)\n",
        "doc_stream = stream_docs(path='movie_data.csv')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "HjosjLbrawlt"
      },
      "source": [
        "`pyprind` is a utility for displaying progress bars in Jupyter notebooks. Run the following cell to install the `pyprind` package."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "execution": {
          "iopub.execute_input": "2021-10-23T10:41:56.431143Z",
          "iopub.status.busy": "2021-10-23T10:41:56.430400Z",
          "iopub.status.idle": "2021-10-23T10:41:57.440255Z",
          "shell.execute_reply": "2021-10-23T10:41:57.438985Z"
        },
        "id": "M_cxkFqsawlt",
        "outputId": "d48131dd-3587-44a6-ec8c-adcb72983afc"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Collecting pyprind\n",
            "  Downloading PyPrind-2.11.3-py2.py3-none-any.whl (8.4 kB)\n",
            "Installing collected packages: pyprind\n",
            "Successfully installed pyprind-2.11.3\n"
          ]
        }
      ],
      "source": [
        "!pip install pyprind"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "execution": {
          "iopub.execute_input": "2021-10-23T10:41:57.448873Z",
          "iopub.status.busy": "2021-10-23T10:41:57.447307Z",
          "iopub.status.idle": "2021-10-23T10:42:23.842836Z",
          "shell.execute_reply": "2021-10-23T10:42:23.843698Z"
        },
        "id": "tPgFFoRZawlt",
        "outputId": "f23ba907-a132-4c77-9bc5-7f47bf748fe5"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "0% [##############################] 100% | ETA: 00:00:00\n",
            "Total time elapsed: 00:00:52\n"
          ]
        }
      ],
      "source": [
        "import pyprind\n",
        "pbar = pyprind.ProgBar(45)\n",
        "\n",
        "classes = np.array([0, 1])\n",
        "for _ in range(45):\n",
        "    X_train, y_train = get_minibatch(doc_stream, size=1000)\n",
        "    if not X_train:\n",
        "        break\n",
        "    X_train = vect.transform(X_train)\n",
        "    clf.partial_fit(X_train, y_train, classes=classes)\n",
        "    pbar.update()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 11,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "execution": {
          "iopub.execute_input": "2021-10-23T10:42:23.852163Z",
          "iopub.status.busy": "2021-10-23T10:42:23.850938Z",
          "iopub.status.idle": "2021-10-23T10:42:26.459487Z",
          "shell.execute_reply": "2021-10-23T10:42:26.460179Z"
        },
        "id": "Z7atznfEawlu",
        "outputId": "12a47a6d-a0fc-4375-e8c6-054558bff212"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Accuracy: 0.868\n"
          ]
        }
      ],
      "source": [
        "X_test, y_test = get_minibatch(doc_stream, size=5000)\n",
        "X_test = vect.transform(X_test)\n",
        "print('Accuracy: %.3f' % clf.score(X_test, y_test))"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 12,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2021-10-23T10:42:26.465013Z",
          "iopub.status.busy": "2021-10-23T10:42:26.464064Z",
          "iopub.status.idle": "2021-10-23T10:42:26.493153Z",
          "shell.execute_reply": "2021-10-23T10:42:26.494466Z"
        },
        "id": "F-YXSL6lawlu"
      },
      "outputs": [],
      "source": [
        "clf = clf.partial_fit(X_test, y_test)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1V1obTq2awlu"
      },
      "source": [
        "### Note\n",
        "\n",
        "Because creating the pickle files can be a bit tricky, simple test scripts have been added to the `pickle-test-scripts/` directory to check whether your environment is set up correctly. They essentially contain a small subset of the `movie_data` dataset and a condensed version of the relevant code from `ch08`.\n",
        "\n",
        "Running\n",
        "\n",
        "    python pickle-dump-test.py\n",
        "\n",
        "trains a small classification model on `movie_data_small.csv` and creates two pickle files:\n",
        "\n",
        "    stopwords.pkl\n",
        "    classifier.pkl\n",
        "\n",
        "Then running the following command\n",
        "\n",
        "    python pickle-load-test.py\n",
        "\n",
        "should print these two lines:\n",
        "\n",
        "    Prediction: positive\n",
        "    Probability: 85.71%"
      ]
    },
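    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The test scripts above save a fitted classifier with pickle and load it back. As a minimal in-memory sketch of the same round trip (assuming scikit-learn 1.1 or later for `loss='log_loss'`; the toy data and the names `rng_demo`, `X_demo`, `y_demo`, `clf_demo`, and `restored` are hypothetical and not part of the book's code), we serialize a tiny classifier with `pickle.dumps`, restore it with `pickle.loads`, and check that the predicted probabilities are identical. Nothing is written to disk here."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import pickle\n",
        "import numpy as np\n",
        "from sklearn.linear_model import SGDClassifier\n",
        "\n",
        "# hypothetical toy data: 20 samples, 3 features, both classes guaranteed\n",
        "rng_demo = np.random.RandomState(0)\n",
        "y_demo = np.array([0, 1] * 10)\n",
        "X_demo = rng_demo.randn(20, 3) + y_demo[:, None]\n",
        "clf_demo = SGDClassifier(loss='log_loss', random_state=1).fit(X_demo, y_demo)\n",
        "\n",
        "# serialize to bytes and restore in memory instead of writing a .pkl file\n",
        "restored = pickle.loads(pickle.dumps(clf_demo, protocol=4))\n",
        "\n",
        "# the restored model should produce exactly the same probabilities\n",
        "print(np.array_equal(clf_demo.predict_proba(X_demo),\n",
        "                     restored.predict_proba(X_demo)))"
      ]
    },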
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "GSLt4q2iawlu"
      },
      "source": [
        "<br>\n",
        "<br>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3mVIgR16awlu"
      },
      "source": [
        "# Serializing fitted scikit-learn estimators"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ZGgh44F5awlu"
      },
      "source": [
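        "After training the logistic regression model above, we save the classifier and the stop word list to the local disk as serialized objects with `pickle`, so that the trained classifier can be used later in the web application. The `HashingVectorizer` does not need to be fitted, so instead of pickling it we recreate it from a separate Python module below."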
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 13,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2021-10-23T10:42:26.500945Z",
          "iopub.status.busy": "2021-10-23T10:42:26.499786Z",
          "iopub.status.idle": "2021-10-23T10:42:26.804714Z",
          "shell.execute_reply": "2021-10-23T10:42:26.805413Z"
        },
        "id": "MCaQNEgzawlv"
      },
      "outputs": [],
      "source": [
        "import pickle\n",
        "import os\n",
        "\n",
        "dest = os.path.join('movieclassifier', 'pkl_objects')\n",
        "os.makedirs(dest, exist_ok=True)\n",
        "\n",
        "# with blocks make sure the files are flushed and closed after dumping\n",
        "with open(os.path.join(dest, 'stopwords.pkl'), 'wb') as f:\n",
        "    pickle.dump(stop, f, protocol=4)\n",
        "with open(os.path.join(dest, 'classifier.pkl'), 'wb') as f:\n",
        "    pickle.dump(clf, f, protocol=4)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-go6y-Y0awlv"
      },
      "source": [
        "Next, we write the code that builds the `HashingVectorizer` to a separate file so that we can import it later."
      ]
    },
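    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Unlike the classifier, `HashingVectorizer` learns nothing from the data: it maps each token to a column index with a hash function, so a fresh instance created with the same parameters produces the same features. This is why it can be recreated from a module rather than pickled. A minimal check follows; the names `vect_a`, `vect_b`, `docs_demo`, `Xa`, and `Xb` are hypothetical, and `n_features=2**10` is chosen only to keep the example small (the book's vectorizer uses `2**21`)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "from sklearn.feature_extraction.text import HashingVectorizer\n",
        "\n",
        "# two independent instances with identical parameters\n",
        "vect_a = HashingVectorizer(n_features=2**10)\n",
        "vect_b = HashingVectorizer(n_features=2**10)\n",
        "\n",
        "docs_demo = ['I love this movie', 'I disliked this movie']\n",
        "\n",
        "# no fit() call is needed: transform() only hashes the tokens\n",
        "Xa = vect_a.transform(docs_demo)\n",
        "Xb = vect_b.transform(docs_demo)\n",
        "\n",
        "print(Xa.shape)\n",
        "print(np.array_equal(Xa.toarray(), Xb.toarray()))"
      ]
    },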
    {
      "cell_type": "code",
      "execution_count": 14,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "execution": {
          "iopub.execute_input": "2021-10-23T10:42:26.810974Z",
          "iopub.status.busy": "2021-10-23T10:42:26.809774Z",
          "iopub.status.idle": "2021-10-23T10:42:26.944071Z",
          "shell.execute_reply": "2021-10-23T10:42:26.944823Z"
        },
        "id": "rlgFcHlYawlv",
        "outputId": "0cd03f32-d9b3-48df-bd8d-df8503ede7e9"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Writing movieclassifier/vectorizer.py\n"
          ]
        }
      ],
      "source": [
        "%%writefile movieclassifier/vectorizer.py\n",
        "from sklearn.feature_extraction.text import HashingVectorizer\n",
        "import re\n",
        "import os\n",
        "import pickle\n",
        "\n",
        "cur_dir = os.path.dirname(__file__)\n",
        "# load the stop word list saved next to this module\n",
        "with open(os.path.join(cur_dir, 'pkl_objects', 'stopwords.pkl'), 'rb') as f:\n",
        "    stop = pickle.load(f)\n",
        "\n",
        "def tokenizer(text):\n",
        "    text = re.sub('<[^>]*>', '', text)\n",
        "    emoticons = re.findall(r'(?::|;|=)(?:-)?(?:\\)|\\(|D|P)',\n",
        "                           text.lower())\n",
        "    text = re.sub(r'[\\W]+', ' ', text.lower()) \\\n",
        "                   + ' '.join(emoticons).replace('-', '')\n",
        "    tokenized = [w for w in text.split() if w not in stop]\n",
        "    return tokenized\n",
        "\n",
        "vect = HashingVectorizer(decode_error='ignore',\n",
        "                         n_features=2**21,\n",
        "                         preprocessor=None,\n",
        "                         tokenizer=tokenizer)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "KBCFPHclawlv"
      },
      "source": [
        "After running the previous code cell, you can restart the IPython notebook kernel to check that the objects were saved correctly."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "FgwwuZH3awlv"
      },
      "source": [
        "First, change the current Python working directory to `movieclassifier`:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 15,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2021-10-23T10:42:26.949809Z",
          "iopub.status.busy": "2021-10-23T10:42:26.948695Z",
          "iopub.status.idle": "2021-10-23T10:42:26.952041Z",
          "shell.execute_reply": "2021-10-23T10:42:26.952760Z"
        },
        "id": "x2ss0iwKawlv"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "os.chdir('movieclassifier')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 16,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2021-10-23T10:42:26.957406Z",
          "iopub.status.busy": "2021-10-23T10:42:26.955677Z",
          "iopub.status.idle": "2021-10-23T10:42:26.974758Z",
          "shell.execute_reply": "2021-10-23T10:42:26.973983Z"
        },
        "id": "RUjz2v7Gawlv"
      },
      "outputs": [],
      "source": [
        "import pickle\n",
        "import re\n",
        "import os\n",
        "from vectorizer import vect\n",
        "\n",
        "with open(os.path.join('pkl_objects', 'classifier.pkl'), 'rb') as f:\n",
        "    clf = pickle.load(f)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 17,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "execution": {
          "iopub.execute_input": "2021-10-23T10:42:26.981499Z",
          "iopub.status.busy": "2021-10-23T10:42:26.980768Z",
          "iopub.status.idle": "2021-10-23T10:42:26.983474Z",
          "shell.execute_reply": "2021-10-23T10:42:26.983945Z"
        },
        "id": "6UMp7VH1awlw",
        "outputId": "77119718-8d0d-4bad-82b1-034877b21aed"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Prediction: positive\n",
            "Probability: 95.55%\n"
          ]
        }
      ],
      "source": [
        "import numpy as np\n",
        "label = {0: 'negative', 1: 'positive'}\n",
        "\n",
        "example = [\"I love this movie. It's amazing.\"]\n",
        "X = vect.transform(example)\n",
        "print('Prediction: %s\\nProbability: %.2f%%' %\\\n",
        "      (label[clf.predict(X)[0]],\n",
        "       np.max(clf.predict_proba(X))*100))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Yy5BKna7awlw"
      },
      "source": [
        "<br>\n",
        "<br>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "joixYXoXawlw"
      },
      "source": [
        "# Setting up an SQLite database for data storage"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "17ckroSPawlw"
      },
      "source": [
        "Before running this code, make sure that the current directory is `movieclassifier`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 18,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 36
        },
        "execution": {
          "iopub.execute_input": "2021-10-23T10:42:26.989598Z",
          "iopub.status.busy": "2021-10-23T10:42:26.988763Z",
          "iopub.status.idle": "2021-10-23T10:42:26.991829Z",
          "shell.execute_reply": "2021-10-23T10:42:26.992292Z"
        },
        "id": "xlQDh61Gawlw",
        "outputId": "d13fd36e-7d04-45f0-9fe1-36c6a3bfd3ef"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "'/content/movieclassifier'"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            }
          },
          "metadata": {},
          "execution_count": 18
        }
      ],
      "source": [
        "os.getcwd()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 19,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2021-10-23T10:42:26.999325Z",
          "iopub.status.busy": "2021-10-23T10:42:26.997317Z",
          "iopub.status.idle": "2021-10-23T10:42:27.018500Z",
          "shell.execute_reply": "2021-10-23T10:42:27.017642Z"
        },
        "id": "9cZe4L0tawlx"
      },
      "outputs": [],
      "source": [
        "import sqlite3\n",
        "import os\n",
        "\n",
        "conn = sqlite3.connect('reviews.sqlite')\n",
        "c = conn.cursor()\n",
        "\n",
        "c.execute('DROP TABLE IF EXISTS review_db')\n",
        "c.execute('CREATE TABLE review_db (review TEXT, sentiment INTEGER, date TEXT)')\n",
        "\n",
        "example1 = 'I love this movie'\n",
        "c.execute(\"INSERT INTO review_db (review, sentiment, date) VALUES (?, ?, DATETIME('now'))\", (example1, 1))\n",
        "\n",
        "example2 = 'I disliked this movie'\n",
        "c.execute(\"INSERT INTO review_db (review, sentiment, date) VALUES (?, ?, DATETIME('now'))\", (example2, 0))\n",
        "\n",
        "conn.commit()\n",
        "conn.close()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 20,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2021-10-23T10:42:27.023247Z",
          "iopub.status.busy": "2021-10-23T10:42:27.022518Z",
          "iopub.status.idle": "2021-10-23T10:42:27.025468Z",
          "shell.execute_reply": "2021-10-23T10:42:27.025889Z"
        },
        "id": "Mbe3UZgfawlx"
      },
      "outputs": [],
      "source": [
        "conn = sqlite3.connect('reviews.sqlite')\n",
        "c = conn.cursor()\n",
        "\n",
        "c.execute(\"SELECT * FROM review_db WHERE date BETWEEN '2017-01-01 10:10:10' AND DATETIME('now')\")\n",
        "results = c.fetchall()\n",
        "\n",
        "conn.close()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 21,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "execution": {
          "iopub.execute_input": "2021-10-23T10:42:27.030023Z",
          "iopub.status.busy": "2021-10-23T10:42:27.029361Z",
          "iopub.status.idle": "2021-10-23T10:42:27.032770Z",
          "shell.execute_reply": "2021-10-23T10:42:27.032127Z"
        },
        "id": "vwa3g4tQawlx",
        "outputId": "382b2d75-8ac6-485c-9aba-923753729f3c"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "[('I love this movie', 1, '2023-11-10 05:08:08'), ('I disliked this movie', 0, '2023-11-10 05:08:08')]\n"
          ]
        }
      ],
      "source": [
        "print(results)"
      ]
    },
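    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The cells above open, commit, and close the connection explicitly. For quick experiments that should not touch `reviews.sqlite`, a minimal sketch (an alternative that the book's web application does not use; the names `demo_conn` and `demo_rows` are hypothetical) uses an in-memory database and the connection as a context manager. Note that `with demo_conn:` commits the transaction on success and rolls it back on an exception, but it does not close the connection, so `close()` is still called explicitly."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import sqlite3\n",
        "\n",
        "# in-memory database: nothing is written to disk\n",
        "demo_conn = sqlite3.connect(':memory:')\n",
        "demo_conn.execute('CREATE TABLE review_db (review TEXT, sentiment INTEGER, date TEXT)')\n",
        "\n",
        "# commits on success, rolls back on an exception; the connection stays open\n",
        "with demo_conn:\n",
        "    demo_conn.execute(\n",
        "        \"INSERT INTO review_db (review, sentiment, date) VALUES (?, ?, DATETIME('now'))\",\n",
        "        ('I love this movie', 1))\n",
        "\n",
        "demo_rows = demo_conn.execute('SELECT review, sentiment FROM review_db').fetchall()\n",
        "print(demo_rows)\n",
        "\n",
        "demo_conn.close()"
      ]
    },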
    {
      "cell_type": "code",
      "execution_count": 22,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 478
        },
        "execution": {
          "iopub.execute_input": "2021-10-23T10:42:27.037707Z",
          "iopub.status.busy": "2021-10-23T10:42:27.037058Z",
          "iopub.status.idle": "2021-10-23T10:42:27.040119Z",
          "shell.execute_reply": "2021-10-23T10:42:27.040586Z"
        },
        "id": "1Wt6k_PLawlx",
        "outputId": "c4c65349-38ea-4316-ff2e-d2a27e368ba2"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<img src=\"https://git.io/Jts3V\" width=\"700\"/>"
            ],
            "text/plain": [
              "<IPython.core.display.Image object>"
            ]
          },
          "metadata": {},
          "execution_count": 22
        }
      ],
      "source": [
        "Image(url='https://git.io/Jts3V', width=700)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6RKw4dd0awlx"
      },
      "source": [
        "<br>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "W4V6Gx6Tawlx"
      },
      "source": [
        "# Developing a web application with Flask"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "kdv7WCFVawlx"
      },
      "source": [
        "..."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0i4Ktl2Oawly"
      },
      "source": [
        "## Our first Flask web application"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "PbqsOJouawly"
      },
      "source": [
        "..."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 23,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 241
        },
        "execution": {
          "iopub.execute_input": "2021-10-23T10:42:27.046556Z",
          "iopub.status.busy": "2021-10-23T10:42:27.045885Z",
          "iopub.status.idle": "2021-10-23T10:42:27.049598Z",
          "shell.execute_reply": "2021-10-23T10:42:27.048819Z"
        },
        "id": "sqVxyOD_awly",
        "outputId": "9a847611-2267-4209-b63c-5c7803316b0b"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<img src=\"https://git.io/Jts3o\" width=\"700\"/>"
            ],
            "text/plain": [
              "<IPython.core.display.Image object>"
            ]
          },
          "metadata": {},
          "execution_count": 23
        }
      ],
      "source": [
        "Image(url='https://git.io/Jts3o', width=700)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "HL4wMjzUawly"
      },
      "source": [
        "## Form validation and rendering"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 24,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 238
        },
        "execution": {
          "iopub.execute_input": "2021-10-23T10:42:27.054663Z",
          "iopub.status.busy": "2021-10-23T10:42:27.053945Z",
          "iopub.status.idle": "2021-10-23T10:42:27.056803Z",
          "shell.execute_reply": "2021-10-23T10:42:27.057255Z"
        },
        "id": "fYzE9i3-awly",
        "outputId": "3aeeb721-3afd-4430-ba78-05c19396f3bd"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<img src=\"https://git.io/Jts3K\" width=\"400\"/>"
            ],
            "text/plain": [
              "<IPython.core.display.Image object>"
            ]
          },
          "metadata": {},
          "execution_count": 24
        }
      ],
      "source": [
        "Image(url='https://git.io/Jts3K', width=400)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 25,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 102
        },
        "execution": {
          "iopub.execute_input": "2021-10-23T10:42:27.061920Z",
          "iopub.status.busy": "2021-10-23T10:42:27.061284Z",
          "iopub.status.idle": "2021-10-23T10:42:27.064488Z",
          "shell.execute_reply": "2021-10-23T10:42:27.064928Z"
        },
        "id": "aKBsBCFWawly",
        "outputId": "11e376ac-a05a-4e4b-eb1d-59cbe81cc70d"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<img src=\"https://git.io/Jts36\" width=\"400\"/>"
            ],
            "text/plain": [
              "<IPython.core.display.Image object>"
            ]
          },
          "metadata": {},
          "execution_count": 25
        }
      ],
      "source": [
        "Image(url='https://git.io/Jts36', width=400)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "UC5SMipWawly"
      },
      "source": [
        "<br>\n",
        "<br>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QugsT1TGawlz"
      },
      "source": [
        "## Screen summary"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 26,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 336
        },
        "execution": {
          "iopub.execute_input": "2021-10-23T10:42:27.069942Z",
          "iopub.status.busy": "2021-10-23T10:42:27.069287Z",
          "iopub.status.idle": "2021-10-23T10:42:27.072572Z",
          "shell.execute_reply": "2021-10-23T10:42:27.073012Z"
        },
        "id": "B-pYhG_Dawlz",
        "outputId": "259df438-1337-4561-ec17-2f08e479dc6d"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<img src=\"https://git.io/Jts3P\" width=\"800\"/>"
            ],
            "text/plain": [
              "<IPython.core.display.Image object>"
            ]
          },
          "metadata": {},
          "execution_count": 26
        }
      ],
      "source": [
        "Image(url='https://git.io/Jts3P', width=800)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 27,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 545
        },
        "execution": {
          "iopub.execute_input": "2021-10-23T10:42:27.077697Z",
          "iopub.status.busy": "2021-10-23T10:42:27.077068Z",
          "iopub.status.idle": "2021-10-23T10:42:27.081051Z",
          "shell.execute_reply": "2021-10-23T10:42:27.080402Z"
        },
        "id": "LGy4kOWOawlz",
        "outputId": "ca0a43f9-51c9-471b-bab3-c89681c4699b"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<img src=\"https://git.io/Jts3X\" width=\"800\"/>"
            ],
            "text/plain": [
              "<IPython.core.display.Image object>"
            ]
          },
          "metadata": {},
          "execution_count": 27
        }
      ],
      "source": [
        "Image(url='https://git.io/Jts3X', width=800)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 28,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 443
        },
        "execution": {
          "iopub.execute_input": "2021-10-23T10:42:27.085751Z",
          "iopub.status.busy": "2021-10-23T10:42:27.085109Z",
          "iopub.status.idle": "2021-10-23T10:42:27.088219Z",
          "shell.execute_reply": "2021-10-23T10:42:27.088648Z"
        },
        "id": "VySNEDCBawlz",
        "outputId": "eb1966f7-3f6a-44b4-c001-5790f16cea88"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<img src=\"https://git.io/Jts31\" width=\"400\"/>"
            ],
            "text/plain": [
              "<IPython.core.display.Image object>"
            ]
          },
          "metadata": {},
          "execution_count": 28
        }
      ],
      "source": [
        "Image(url='https://git.io/Jts31', width=400)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Sboq5JaDawlz"
      },
      "source": [
        "# Turning the movie review classifier into a web application"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 29,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 249
        },
        "execution": {
          "iopub.execute_input": "2021-10-23T10:42:27.093471Z",
          "iopub.status.busy": "2021-10-23T10:42:27.092785Z",
          "iopub.status.idle": "2021-10-23T10:42:27.096163Z",
          "shell.execute_reply": "2021-10-23T10:42:27.096614Z"
        },
        "id": "OXmQFwofawlz",
        "outputId": "22792cfc-c262-40ad-bb98-66fd1e4b1000"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<img src=\"https://git.io/Jts3M\" width=\"400\"/>"
            ],
            "text/plain": [
              "<IPython.core.display.Image object>"
            ]
          },
          "metadata": {},
          "execution_count": 29
        }
      ],
      "source": [
        "Image(url='https://git.io/Jts3M', width=400)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 30,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 252
        },
        "execution": {
          "iopub.execute_input": "2021-10-23T10:42:27.101553Z",
          "iopub.status.busy": "2021-10-23T10:42:27.100824Z",
          "iopub.status.idle": "2021-10-23T10:42:27.104703Z",
          "shell.execute_reply": "2021-10-23T10:42:27.104095Z"
        },
        "id": "5hlCDl4Qawlz",
        "outputId": "af31f7a1-179b-4d73-f076-f864034541e0"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<img src=\"https://git.io/Jts3D\" width=\"400\"/>"
            ],
            "text/plain": [
              "<IPython.core.display.Image object>"
            ]
          },
          "metadata": {},
          "execution_count": 30
        }
      ],
      "source": [
        "Image(url='https://git.io/Jts3D', width=400)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 31,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 144
        },
        "execution": {
          "iopub.execute_input": "2021-10-23T10:42:27.109273Z",
          "iopub.status.busy": "2021-10-23T10:42:27.108628Z",
          "iopub.status.idle": "2021-10-23T10:42:27.112209Z",
          "shell.execute_reply": "2021-10-23T10:42:27.112635Z"
        },
        "id": "H2IWwFo4awl0",
        "outputId": "22b42cd3-5594-42b5-8102-7698aab18fb0"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<img src=\"https://git.io/Jts3y\" width=\"400\"/>"
            ],
            "text/plain": [
              "<IPython.core.display.Image object>"
            ]
          },
          "metadata": {},
          "execution_count": 31
        }
      ],
      "source": [
        "Image(url='https://git.io/Jts3y', width=400)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 32,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 296
        },
        "execution": {
          "iopub.execute_input": "2021-10-23T10:42:27.117420Z",
          "iopub.status.busy": "2021-10-23T10:42:27.116719Z",
          "iopub.status.idle": "2021-10-23T10:42:27.120071Z",
          "shell.execute_reply": "2021-10-23T10:42:27.120483Z"
        },
        "id": "NDdF8d_Dawl0",
        "outputId": "4a6b79b1-32f1-4b91-a4ba-ff763cac8187"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<img src=\"https://git.io/Jts3S\" width=\"200\"/>"
            ],
            "text/plain": [
              "<IPython.core.display.Image object>"
            ]
          },
          "metadata": {},
          "execution_count": 32
        }
      ],
      "source": [
        "Image(url='https://git.io/Jts3S', width=200)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 33,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 288
        },
        "execution": {
          "iopub.execute_input": "2021-10-23T10:42:27.125664Z",
          "iopub.status.busy": "2021-10-23T10:42:27.124733Z",
          "iopub.status.idle": "2021-10-23T10:42:27.129014Z",
          "shell.execute_reply": "2021-10-23T10:42:27.129667Z"
        },
        "id": "oDMBlVbjawl0",
        "outputId": "b42887f4-7c9a-4b4f-84fe-8f8c1a6774f0"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<img src=\"https://git.io/Jts32\" width=\"400\"/>"
            ],
            "text/plain": [
              "<IPython.core.display.Image object>"
            ]
          },
          "metadata": {},
          "execution_count": 33
        }
      ],
      "source": [
        "Image(url='https://git.io/Jts32', width=400)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QYjcDtqpawl0"
      },
      "source": [
        "<br>\n",
        "<br>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "yDhAf2xQawl0"
      },
      "source": [
        "# Deploying the web application to a public server"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 34,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 476
        },
        "execution": {
          "iopub.execute_input": "2021-10-23T10:42:27.135494Z",
          "iopub.status.busy": "2021-10-23T10:42:27.134595Z",
          "iopub.status.idle": "2021-10-23T10:42:27.138673Z",
          "shell.execute_reply": "2021-10-23T10:42:27.139317Z"
        },
        "id": "mUUr4cloawl1",
        "outputId": "bc440e92-660a-42e5-d730-670e2dc68e0b"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "<img src=\"https://git.io/Jts39\" width=\"600\"/>"
            ],
            "text/plain": [
              "<IPython.core.display.Image object>"
            ]
          },
          "metadata": {},
          "execution_count": 34
        }
      ],
      "source": [
        "Image(url='https://git.io/Jts39', width=600)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Tz5xeyR_awl1"
      },
      "source": [
        "<br>\n",
        "<br>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "bfE3tvN6awl1"
      },
      "source": [
        "## Updating the movie classifier"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YDM3DF_Jawl1"
      },
      "source": [
        "Use the `movieclassifier_with_update` directory from the downloaded GitHub repository (if you don't have it, copy the `movieclassifier` directory and use the copy instead)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RDF4lqx5awl1"
      },
      "source": [
        "**If you are using Colab, run the following cell.**"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 35,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2021-10-23T10:42:27.148624Z",
          "iopub.status.busy": "2021-10-23T10:42:27.144514Z",
          "iopub.status.idle": "2021-10-23T10:42:27.275568Z",
          "shell.execute_reply": "2021-10-23T10:42:27.274671Z"
        },
        "id": "JC0mvtTBawl2"
      },
      "outputs": [],
      "source": [
        "!cp -r ../movieclassifier ../movieclassifier_with_update"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 36,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 36
        },
        "execution": {
          "iopub.execute_input": "2021-10-23T10:42:27.282672Z",
          "iopub.status.busy": "2021-10-23T10:42:27.281951Z",
          "iopub.status.idle": "2021-10-23T10:42:27.327533Z",
          "shell.execute_reply": "2021-10-23T10:42:27.326879Z"
        },
        "id": "yWt0Z10Cawl2",
        "outputId": "f66d8354-6a1e-4577-fcca-6639f03ae6b9"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "'./reviews.sqlite'"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            }
          },
          "metadata": {},
          "execution_count": 36
        }
      ],
      "source": [
        "import os\n",
        "import shutil\n",
        "\n",
        "os.chdir('..')\n",
        "\n",
        "if not os.path.exists('movieclassifier_with_update'):\n",
        "    os.mkdir('movieclassifier_with_update')\n",
        "os.chdir('movieclassifier_with_update')\n",
        "\n",
        "if not os.path.exists('pkl_objects'):\n",
        "    os.mkdir('pkl_objects')\n",
        "\n",
        "shutil.copyfile('../movieclassifier/pkl_objects/classifier.pkl',\n",
        "                './pkl_objects/classifier.pkl')\n",
        "\n",
        "shutil.copyfile('../movieclassifier/reviews.sqlite',\n",
        "                './reviews.sqlite')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "S2yiWBk8awl2"
      },
      "source": [
        "Define a function that updates the classifier with the data stored in the SQLite database:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 37,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2021-10-23T10:42:27.334997Z",
          "iopub.status.busy": "2021-10-23T10:42:27.334266Z",
          "iopub.status.idle": "2021-10-23T10:42:27.336838Z",
          "shell.execute_reply": "2021-10-23T10:42:27.336185Z"
        },
        "id": "cEG3QW8wawl2"
      },
      "outputs": [],
      "source": [
        "import pickle\n",
        "import sqlite3\n",
        "import numpy as np\n",
        "\n",
        "# import the HashingVectorizer from the local directory\n",
        "from vectorizer import vect\n",
        "\n",
        "def update_model(db_path, model, batch_size=10000):\n",
        "\n",
        "    conn = sqlite3.connect(db_path)\n",
        "    c = conn.cursor()\n",
        "    c.execute('SELECT * from review_db')\n",
        "\n",
        "    # read the stored reviews in chunks of batch_size rows\n",
        "    results = c.fetchmany(batch_size)\n",
        "    while results:\n",
        "        data = np.array(results)\n",
        "        X = data[:, 0]  # review text\n",
        "        y = data[:, 1].astype(int)  # class label (0 or 1)\n",
        "\n",
        "        classes = np.array([0, 1])\n",
        "        X_train = vect.transform(X)\n",
        "        # incrementally update the model passed in as an argument\n",
        "        model.partial_fit(X_train, y, classes=classes)\n",
        "        results = c.fetchmany(batch_size)\n",
        "\n",
        "    conn.close()\n",
        "    return model"
      ]
    },
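    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The `fetchmany` loop in `update_model` keeps memory use bounded: it pulls at most `batch_size` rows from the cursor at a time and stops when `fetchmany` returns an empty list. The following stdlib-only sketch runs the same loop against an in-memory SQLite database. The table layout (review, sentiment, date) is assumed to match `review_db`; the 25 rows and the helper `iter_batches` are hypothetical placeholders for illustration."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import sqlite3\n",
        "\n",
        "# hypothetical in-memory table with the same columns as review_db\n",
        "conn = sqlite3.connect(':memory:')\n",
        "c = conn.cursor()\n",
        "c.execute('CREATE TABLE review_db (review TEXT, sentiment INTEGER, date TEXT)')\n",
        "rows = [('review %d' % i, i % 2, '2021-01-01') for i in range(25)]\n",
        "c.executemany('INSERT INTO review_db VALUES (?, ?, ?)', rows)\n",
        "conn.commit()\n",
        "\n",
        "def iter_batches(cursor, batch_size):\n",
        "    # yield lists of at most batch_size rows until the result set is exhausted\n",
        "    results = cursor.fetchmany(batch_size)\n",
        "    while results:\n",
        "        yield results\n",
        "        results = cursor.fetchmany(batch_size)\n",
        "\n",
        "c.execute('SELECT * FROM review_db')\n",
        "batch_sizes = [len(batch) for batch in iter_batches(c, batch_size=10)]\n",
        "conn.close()\n",
        "print(batch_sizes)  # [10, 10, 5]"
      ]
    },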
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_alTEyGaawl2"
      },
      "source": [
        "Update the model:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 38,
      "metadata": {
        "execution": {
          "iopub.execute_input": "2021-10-23T10:42:27.342287Z",
          "iopub.status.busy": "2021-10-23T10:42:27.341581Z",
          "iopub.status.idle": "2021-10-23T10:42:27.392881Z",
          "shell.execute_reply": "2021-10-23T10:42:27.393819Z"
        },
        "id": "0qeEwRm5awl2"
      },
      "outputs": [],
      "source": [
        "cur_dir = '.'\n",
        "\n",
        "# If you embedded this code in app.py, use the following path instead.\n",
        "\n",
        "# import os\n",
        "# cur_dir = os.path.dirname(__file__)\n",
        "\n",
        "# load the pickled classifier\n",
        "with open(os.path.join(cur_dir, 'pkl_objects', 'classifier.pkl'), 'rb') as f:\n",
        "    clf = pickle.load(f)\n",
        "db = os.path.join(cur_dir, 'reviews.sqlite')\n",
        "\n",
        "update_model(db_path=db, model=clf, batch_size=10000)\n",
        "\n",
        "# Uncomment the following lines to save the updated model to classifier.pkl.\n",
        "\n",
        "# with open(os.path.join(cur_dir, 'pkl_objects', 'classifier.pkl'), 'wb') as f:\n",
        "#     pickle.dump(clf, f, protocol=4)"
      ]
    },
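    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The commented-out `pickle.dump(..., protocol=4)` call above writes the updated classifier back to `pkl_objects/classifier.pkl`, the file the web application loads. Below is a minimal sketch of that save and reload round trip. So that it runs without scikit-learn, it assumes a hypothetical stand-in class `CountingModel` whose `partial_fit` only counts samples, and it writes to a temporary directory rather than to `pkl_objects`. Writing to a temporary file first and then swapping it in with `os.replace` is an extra precaution that the original code does not take; on POSIX file systems the swap is atomic, so a reader of `classifier.pkl` never sees a half-written pickle."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import os\n",
        "import pickle\n",
        "import tempfile\n",
        "\n",
        "# hypothetical stand-in for the SGDClassifier: it only counts training samples\n",
        "class CountingModel:\n",
        "    def __init__(self):\n",
        "        self.n_seen = 0\n",
        "\n",
        "    def partial_fit(self, X, y, classes=None):\n",
        "        self.n_seen += len(y)\n",
        "        return self\n",
        "\n",
        "def save_model(model, path):\n",
        "    # write to a temporary file first, then swap it in place of the target\n",
        "    tmp_path = path + '.tmp'\n",
        "    with open(tmp_path, 'wb') as f:\n",
        "        pickle.dump(model, f, protocol=4)\n",
        "    os.replace(tmp_path, path)\n",
        "\n",
        "def load_model(path):\n",
        "    with open(path, 'rb') as f:\n",
        "        return pickle.load(f)\n",
        "\n",
        "model = CountingModel()\n",
        "model.partial_fit(['a', 'b', 'c'], [0, 1, 1])\n",
        "\n",
        "with tempfile.TemporaryDirectory() as tmp_dir:\n",
        "    pkl_path = os.path.join(tmp_dir, 'classifier.pkl')\n",
        "    save_model(model, pkl_path)\n",
        "    restored = load_model(pkl_path)\n",
        "\n",
        "print(restored.n_seen)  # 3"
      ]
    },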
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6B3Gxlfhawl2"
      },
      "source": [
        "<br>\n",
        "<br>"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "name": "ch09.ipynb",
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}