{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "_PLo5qcAawlk" }, "source": [ "# Python Machine Learning, 3rd Edition (머신 러닝 교과서 3판)" ] }, { "cell_type": "markdown", "metadata": { "id": "jlmrX27Yawlo" }, "source": [ "# Chapter 9 - Embedding a Machine Learning Model into a Web Application" ] }, { "cell_type": "markdown", "metadata": { "id": "0NSuaIVEawlp" }, "source": [ "**You can view this notebook in the Jupyter Notebook Viewer (nbviewer.jupyter.org) or run it in Google Colab (colab.research.google.com) using the links below.**\n", "\n", "\n", " \n", " \n", "
\n", " View in Jupyter Notebook Viewer\n", " \n", " Run in Google Colab\n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "vTRiUUjbawlp" }, "source": [ "### 목차" ] }, { "cell_type": "markdown", "metadata": { "id": "D29IAkPMawlp" }, "source": [ "- 8장 정리 - 영화 리뷰 분류를 위한 모델 훈련하기\n", "- 학습된 사이킷런 추정기 저장\n", "- 데이터를 저장하기 위해 SQLite 데이터베이스 설정\n", "- 플라스크 웹 애플리케이션 개발\n", " - 첫 번째 플라스크 애플리케이션\n", " - 폼 검증과 화면 출력\n", "- 영화 리뷰 분류기를 웹 애플리케이션으로 만들기\n", "- 공개 서버에 웹 애플리케이션 배포\n", " - 영화 분류기 업데이트\n", "- 요약" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2021-10-23T10:41:51.447133Z", "iopub.status.busy": "2021-10-23T10:41:51.446490Z", "iopub.status.idle": "2021-10-23T10:41:51.449037Z", "shell.execute_reply": "2021-10-23T10:41:51.449453Z" }, "id": "DXLkk_PSawlq" }, "outputs": [], "source": [ "from IPython.display import Image" ] }, { "cell_type": "markdown", "metadata": { "id": "rq-n6UClawlq" }, "source": [ "플래스크(Flask) 웹 애플리케이션 코드는 다음 디렉토리에 있습니다:\n", " \n", "- `1st_flask_app_1/`: 간단한 플래스크 웹 애플리케이션\n", "- `1st_flask_app_2/`: `1st_flask_app_1`에 폼 검증과 렌더링을 추가하여 확장한 버전\n", "- `movieclassifier/`: 웹 애플리케이션에 내장한 영화 리뷰 분류기\n", "- `movieclassifier_with_update/`: `movieclassifier`와 같지만 초기화를 위해 sqlite 데이터베이스를 사용합니다." ] }, { "cell_type": "markdown", "metadata": { "id": "y7BU7MFwawlq" }, "source": [ "웹 애플리케이션을 로컬에서 실행하려면 `cd`로 (위에 나열된) 각 디렉토리에 들어가서 메인 애플리케이션 스크립트를 실행합니다.\n", "\n", " cd ./1st_flask_app_1\n", " python app.py\n", " \n", "터미널에서 다음같은 내용일 출력됩니다.\n", " \n", " * Running on http://127.0.0.1:5000/\n", " * Restarting with reloader\n", " \n", "웹 브라우저를 열고 터미널에 출력된 주소(일반적으로 http://127.0.0.1:5000/)를 입력하여 웹 애플리케이션에 접속합니다." ] }, { "cell_type": "markdown", "metadata": { "id": "qtmgp48bawlr" }, "source": [ "**이 튜토리얼로 만든 예제 애플리케이션 데모는 다음 주소에서 볼 수 있습니다: http://haesun.pythonanywhere.com/**." ] }, { "cell_type": "markdown", "metadata": { "id": "K3cBNe2Nawlr" }, "source": [ "
\n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "jxZ6m_xNawlr" }, "source": [ "# 8장 정리 - 영화 리뷰 분류를 위한 모델 훈련하기" ] }, { "cell_type": "markdown", "metadata": { "id": "5MFI3KC4awlr" }, "source": [ "이 절은 8장의 마지막 섹션에서 훈련한 로지스틱 회귀 모델을 다시 사용합니다. 이어지는 코드 블럭을 실행하여 다음 절에서 사용할 모델을 훈련시키겠습니다." ] }, { "cell_type": "markdown", "metadata": { "id": "AJKe25Wzawlr" }, "source": [ "**노트**\n", "\n", "다음 코드는 8장에서 만든 `movie_data.csv` 데이터셋을 사용합니다.\n", "\n", "**코랩을 사용할 때는 다음 셀을 실행하세요.**" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-10-23T10:41:51.456769Z", "iopub.status.busy": "2021-10-23T10:41:51.454589Z", "iopub.status.idle": "2021-10-23T10:41:54.664242Z", "shell.execute_reply": "2021-10-23T10:41:54.663029Z" }, "id": "rgIZKmUhawlr", "outputId": "408f2841-d54c-49b7-fd3d-1f2714cd5959" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "--2023-11-10 05:06:49-- https://github.com/rickiepark/python-machine-learning-book-3rd-edition/raw/master/ch09/movie_data.csv.gz\n", "Resolving github.com (github.com)... 140.82.113.3\n", "Connecting to github.com (github.com)|140.82.113.3|:443... connected.\n", "HTTP request sent, awaiting response... 302 Found\n", "Location: https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch09/movie_data.csv.gz [following]\n", "--2023-11-10 05:06:50-- https://raw.githubusercontent.com/rickiepark/python-machine-learning-book-3rd-edition/master/ch09/movie_data.csv.gz\n", "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.111.133, ...\n", "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 26521894 (25M) [application/octet-stream]\n", "Saving to: ‘movie_data.csv.gz’\n", "\n", "movie_data.csv.gz 100%[===================>] 25.29M 153MB/s in 0.2s \n", "\n", "2023-11-10 05:06:50 (153 MB/s) - ‘movie_data.csv.gz’ saved [26521894/26521894]\n", "\n" ] } ], "source": [ "!wget https://github.com/rickiepark/python-machine-learning-book-3rd-edition/raw/master/ch09/movie_data.csv.gz" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2021-10-23T10:41:54.670293Z", "iopub.status.busy": "2021-10-23T10:41:54.669447Z", "iopub.status.idle": "2021-10-23T10:41:55.469864Z", "shell.execute_reply": "2021-10-23T10:41:55.470651Z" }, "id": "1M6joYaSawls" }, "outputs": [], "source": [ "import gzip\n", "\n", "\n", "with gzip.open('movie_data.csv.gz') as f_in, open('movie_data.csv', 'wb') as f_out:\n", " f_out.writelines(f_in)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-10-23T10:41:55.475329Z", "iopub.status.busy": "2021-10-23T10:41:55.474202Z", "iopub.status.idle": "2021-10-23T10:41:56.383788Z", "shell.execute_reply": "2021-10-23T10:41:56.384255Z" }, "id": "BX9nOi6iawls", "outputId": "523f7d83-7a0b-4f04-a5d1-6fac24954204" }, "outputs": [ { "output_type": "stream", "name": "stderr", "text": [ "[nltk_data] Downloading package stopwords to /root/nltk_data...\n", "[nltk_data] Unzipping corpora/stopwords.zip.\n" ] }, { "output_type": "execute_result", "data": { "text/plain": [ "True" ] }, "metadata": {}, "execution_count": 4 } ], "source": [ "import nltk\n", "nltk.download('stopwords')" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2021-10-23T10:41:56.392581Z", "iopub.status.busy": "2021-10-23T10:41:56.391621Z", "iopub.status.idle": "2021-10-23T10:41:56.396076Z", "shell.execute_reply": "2021-10-23T10:41:56.394915Z" }, "id": "pY02dzhLawls" }, 
"outputs": [], "source": [ "import numpy as np\n", "import re\n", "from nltk.corpus import stopwords\n", "from nltk.stem import PorterStemmer\n", "\n", "stop = stopwords.words('english')\n", "porter = PorterStemmer()\n", "\n", "def tokenizer(text):\n", " text = re.sub('<[^>]*>', '', text)\n", " emoticons = re.findall('(?::|;|=)(?:-)?(?:\\)|\\(|D|P)', text.lower())\n", " text = re.sub('[\\W]+', ' ', text.lower()) + ' '.join(emoticons).replace('-', '')\n", " tokenized = [w for w in text.split() if w not in stop]\n", " return tokenized\n", "\n", "def stream_docs(path):\n", " with open(path, 'r', encoding='utf-8') as csv:\n", " next(csv) # skip header\n", " for line in csv:\n", " text, label = line[:-3], int(line[-2])\n", " yield text, label" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-10-23T10:41:56.401846Z", "iopub.status.busy": "2021-10-23T10:41:56.400926Z", "iopub.status.idle": "2021-10-23T10:41:56.404847Z", "shell.execute_reply": "2021-10-23T10:41:56.405486Z" }, "id": "QdRlmGbYawlt", "outputId": "51360a90-6a5f-43bd-ae06-677b7b9a2dc7" }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "('\"In 1974, the teenager Martha Moxley (Maggie Grace) moves to the high-class area of Belle Haven, Greenwich, Connecticut. On the Mischief Night, eve of Halloween, she was murdered in the backyard of her house and her murder remained unsolved. Twenty-two years later, the writer Mark Fuhrman (Christopher Meloni), who is a former LA detective that has fallen in disgrace for perjury in O.J. Simpson trial and moved to Idaho, decides to investigate the case with his partner Stephen Weeks (Andrew Mitchell) with the purpose of writing a book. 
The locals squirm and do not welcome them, but with the support of the retired detective Steve Carroll (Robert Forster) that was in charge of the investigation in the 70\\'s, they discover the criminal and a net of power and money to cover the murder.

\"\"Murder in Greenwich\"\" is a good TV movie, with the true story of a murder of a fifteen years old girl that was committed by a wealthy teenager whose mother was a Kennedy. The powerful and rich family used their influence to cover the murder for more than twenty years. However, a snoopy detective and convicted perjurer in disgrace was able to disclose how the hideous crime was committed. The screenplay shows the investigation of Mark and the last days of Martha in parallel, but there is a lack of the emotion in the dramatization. My vote is seven.

Title (Brazil): Not Available\"',\n", " 1)" ] }, "metadata": {}, "execution_count": 6 } ], "source": [ "next(stream_docs(path='movie_data.csv'))" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2021-10-23T10:41:56.412040Z", "iopub.status.busy": "2021-10-23T10:41:56.410730Z", "iopub.status.idle": "2021-10-23T10:41:56.413967Z", "shell.execute_reply": "2021-10-23T10:41:56.413419Z" }, "id": "5rS-FN4qawlt" }, "outputs": [], "source": [ "def get_minibatch(doc_stream, size):\n", "    docs, y = [], []\n", "    try:\n", "        for _ in range(size):\n", "            text, label = next(doc_stream)\n", "            docs.append(text)\n", "            y.append(label)\n", "    except StopIteration:\n", "        # the stream ran out before a full batch was collected\n", "        return None, None\n", "    return docs, y" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2021-10-23T10:41:56.419684Z", "iopub.status.busy": "2021-10-23T10:41:56.418851Z", "iopub.status.idle": "2021-10-23T10:41:56.422238Z", "shell.execute_reply": "2021-10-23T10:41:56.422656Z" }, "id": "YtYRd4hOawlt" }, "outputs": [], "source": [ "from sklearn.feature_extraction.text import HashingVectorizer\n", "from sklearn.linear_model import SGDClassifier\n", "\n", "vect = HashingVectorizer(decode_error='ignore',\n", "                         n_features=2**21,\n", "                         preprocessor=None,\n", "                         tokenizer=tokenizer)\n", "\n", "clf = SGDClassifier(loss='log_loss', random_state=1, max_iter=1)\n", "doc_stream = stream_docs(path='movie_data.csv')" ] }, { "cell_type": "markdown", "metadata": { "id": "HjosjLbrawlt" }, "source": [ "`pyprind` is a utility for displaying progress bars in Jupyter notebooks. Run the following cell to install the `pyprind` package." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-10-23T10:41:56.431143Z", "iopub.status.busy": "2021-10-23T10:41:56.430400Z", "iopub.status.idle": "2021-10-23T10:41:57.440255Z", "shell.execute_reply": "2021-10-23T10:41:57.438985Z" }, "id": "M_cxkFqsawlt", "outputId": "d48131dd-3587-44a6-ec8c-adcb72983afc" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Collecting pyprind\n", " Downloading PyPrind-2.11.3-py2.py3-none-any.whl (8.4 kB)\n", "Installing collected packages: pyprind\n", "Successfully installed pyprind-2.11.3\n" ] } ], "source": [ "!pip install pyprind" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-10-23T10:41:57.448873Z", "iopub.status.busy": "2021-10-23T10:41:57.447307Z", "iopub.status.idle": "2021-10-23T10:42:23.842836Z", "shell.execute_reply": "2021-10-23T10:42:23.843698Z" }, "id": "tPgFFoRZawlt", "outputId": "f23ba907-a132-4c77-9bc5-7f47bf748fe5" }, "outputs": [ { "output_type": "stream", "name": "stderr", "text": [ "0% [##############################] 100% | ETA: 00:00:00\n", "Total time elapsed: 00:00:52\n" ] } ], "source": [ "import pyprind\n", "pbar = pyprind.ProgBar(45)\n", "\n", "classes = np.array([0, 1])\n", "for _ in range(45):\n", "    X_train, y_train = get_minibatch(doc_stream, size=1000)\n", "    if not X_train:\n", "        break\n", "    X_train = vect.transform(X_train)\n", "    clf.partial_fit(X_train, y_train, classes=classes)\n", "    pbar.update()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-10-23T10:42:23.852163Z", "iopub.status.busy": "2021-10-23T10:42:23.850938Z", "iopub.status.idle": "2021-10-23T10:42:26.459487Z", "shell.execute_reply": "2021-10-23T10:42:26.460179Z" }, "id": 
"Z7atznfEawlu", "outputId": "12a47a6d-a0fc-4375-e8c6-054558bff212" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "정확도: 0.868\n" ] } ], "source": [ "X_test, y_test = get_minibatch(doc_stream, size=5000)\n", "X_test = vect.transform(X_test)\n", "print('정확도: %.3f' % clf.score(X_test, y_test))" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2021-10-23T10:42:26.465013Z", "iopub.status.busy": "2021-10-23T10:42:26.464064Z", "iopub.status.idle": "2021-10-23T10:42:26.493153Z", "shell.execute_reply": "2021-10-23T10:42:26.494466Z" }, "id": "F-YXSL6lawlu" }, "outputs": [], "source": [ "clf = clf.partial_fit(X_test, y_test)" ] }, { "cell_type": "markdown", "metadata": { "id": "1V1obTq2awlu" }, "source": [ "### 노트\n", "\n", "pickle 파일을 만드는 것이 조금 까다로울 수 있기 때문에 `pickle-test-scripts/` 디렉토리에 올바르게 환경이 설정되었는지 확인하는 간단한 테스트 스크립트를 추가했습니다. 기본적으로 `movie_data` 데이터 일부를 포함하고 있고 `ch08`의 관련된 코드를 정리한 버전입니다.\n", "\n", "다음처럼 실행하면\n", "\n", " python pickle-dump-test.py\n", "\n", "`movie_data_small.csv`에서 작은 분류 모델을 훈련하고 2개의 pickle 파일을 만듭니다.\n", "\n", " stopwords.pkl\n", " classifier.pkl\n", "\n", "그다음 아래 명령을 실행하면\n", "\n", " python pickle-load-test.py\n", "\n", "다음 2줄이 출력되어야 합니다:\n", "\n", " Prediction: positive\n", " Probability: 85.71%" ] }, { "cell_type": "markdown", "metadata": { "id": "GSLt4q2iawlu" }, "source": [ "
\n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "3mVIgR16awlu" }, "source": [ "# 학습된 사이킷런 추정기 저장" ] }, { "cell_type": "markdown", "metadata": { "id": "ZGgh44F5awlu" }, "source": [ "앞에서 로지스틱 회귀 모델을 훈련한 후에 분류기, 불용어, 포터 어간 추출기, `HashingVectorizer`를 로컬 디스크에 직렬화된 객체로 저장합니다. 나중에 웹 애플리케이션에서 학습된 분류기를 이용하겠습니다." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2021-10-23T10:42:26.500945Z", "iopub.status.busy": "2021-10-23T10:42:26.499786Z", "iopub.status.idle": "2021-10-23T10:42:26.804714Z", "shell.execute_reply": "2021-10-23T10:42:26.805413Z" }, "id": "MCaQNEgzawlv" }, "outputs": [], "source": [ "import pickle\n", "import os\n", "\n", "dest = os.path.join('movieclassifier', 'pkl_objects')\n", "if not os.path.exists(dest):\n", " os.makedirs(dest)\n", "\n", "pickle.dump(stop, open(os.path.join(dest, 'stopwords.pkl'), 'wb'), protocol=4)\n", "pickle.dump(clf, open(os.path.join(dest, 'classifier.pkl'), 'wb'), protocol=4)" ] }, { "cell_type": "markdown", "metadata": { "id": "-go6y-Y0awlv" }, "source": [ "그다음 나중에 임포트할 수 있도록 별도의 파일에 `HashingVectorizer`를 저장합니다." 
] }, { "cell_type": "code", "execution_count": 14, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-10-23T10:42:26.810974Z", "iopub.status.busy": "2021-10-23T10:42:26.809774Z", "iopub.status.idle": "2021-10-23T10:42:26.944071Z", "shell.execute_reply": "2021-10-23T10:42:26.944823Z" }, "id": "rlgFcHlYawlv", "outputId": "0cd03f32-d9b3-48df-bd8d-df8503ede7e9" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Writing movieclassifier/vectorizer.py\n" ] } ], "source": [ "%%writefile movieclassifier/vectorizer.py\n", "from sklearn.feature_extraction.text import HashingVectorizer\n", "import re\n", "import os\n", "import pickle\n", "\n", "# load the pickled stop words relative to this file\n", "cur_dir = os.path.dirname(__file__)\n", "with open(os.path.join(cur_dir, 'pkl_objects', 'stopwords.pkl'), 'rb') as f:\n", "    stop = pickle.load(f)\n", "\n", "def tokenizer(text):\n", "    text = re.sub(r'<[^>]*>', '', text)\n", "    emoticons = re.findall(r'(?::|;|=)(?:-)?(?:\\)|\\(|D|P)',\n", "                           text.lower())\n", "    text = (re.sub(r'[\\W]+', ' ', text.lower())\n", "            + ' '.join(emoticons).replace('-', ''))\n", "    tokenized = [w for w in text.split() if w not in stop]\n", "    return tokenized\n", "\n", "vect = HashingVectorizer(decode_error='ignore',\n", "                         n_features=2**21,\n", "                         preprocessor=None,\n", "                         tokenizer=tokenizer)" ] }, { "cell_type": "markdown", "metadata": { "id": "KBCFPHclawlv" }, "source": [ "After running the previous code cells, you can restart the IPython notebook kernel to check that the objects were saved correctly." 
] }, { "cell_type": "markdown", "metadata": { "id": "FgwwuZH3awlv" }, "source": [ "First, change the current Python working directory to `movieclassifier`:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2021-10-23T10:42:26.949809Z", "iopub.status.busy": "2021-10-23T10:42:26.948695Z", "iopub.status.idle": "2021-10-23T10:42:26.952041Z", "shell.execute_reply": "2021-10-23T10:42:26.952760Z" }, "id": "x2ss0iwKawlv" }, "outputs": [], "source": [ "import os\n", "os.chdir('movieclassifier')" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "execution": { "iopub.execute_input": "2021-10-23T10:42:26.957406Z", "iopub.status.busy": "2021-10-23T10:42:26.955677Z", "iopub.status.idle": "2021-10-23T10:42:26.974758Z", "shell.execute_reply": "2021-10-23T10:42:26.973983Z" }, "id": "RUjz2v7Gawlv" }, "outputs": [], "source": [ "import pickle\n", "import re\n", "import os\n", "from vectorizer import vect\n", "\n", "with open(os.path.join('pkl_objects', 'classifier.pkl'), 'rb') as f:\n", "    clf = pickle.load(f)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-10-23T10:42:26.981499Z", "iopub.status.busy": "2021-10-23T10:42:26.980768Z", "iopub.status.idle": "2021-10-23T10:42:26.983474Z", "shell.execute_reply": "2021-10-23T10:42:26.983945Z" }, "id": "6UMp7VH1awlw", "outputId": "77119718-8d0d-4bad-82b1-034877b21aed" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Prediction: positive\n", "Probability: 95.55%\n" ] } ], "source": [ "import numpy as np\n", "label = {0: 'negative', 1: 'positive'}\n", "\n", "example = [\"I love this movie. It's amazing.\"]\n", "X = vect.transform(example)\n", "print('Prediction: %s\\nProbability: %.2f%%' %\n", "      (label[clf.predict(X)[0]],\n", "       np.max(clf.predict_proba(X))*100))" ] }, { "cell_type": "markdown", "metadata": { "id": "Yy5BKna7awlw" }, "source": [ "
\n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "joixYXoXawlw" }, "source": [ "# 데이터를 저장하기 위해 SQLite 데이터베이스 설정" ] }, { "cell_type": "markdown", "metadata": { "id": "17ckroSPawlw" }, "source": [ "이 코드를 실행하기 전에 현재 위치가 `movieclassifier` 디렉토리인지 확인합니다." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 36 }, "execution": { "iopub.execute_input": "2021-10-23T10:42:26.989598Z", "iopub.status.busy": "2021-10-23T10:42:26.988763Z", "iopub.status.idle": "2021-10-23T10:42:26.991829Z", "shell.execute_reply": "2021-10-23T10:42:26.992292Z" }, "id": "xlQDh61Gawlw", "outputId": "d13fd36e-7d04-45f0-9fe1-36c6a3bfd3ef" }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "'/content/movieclassifier'" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "string" } }, "metadata": {}, "execution_count": 18 } ], "source": [ "os.getcwd()" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "execution": { "iopub.execute_input": "2021-10-23T10:42:26.999325Z", "iopub.status.busy": "2021-10-23T10:42:26.997317Z", "iopub.status.idle": "2021-10-23T10:42:27.018500Z", "shell.execute_reply": "2021-10-23T10:42:27.017642Z" }, "id": "9cZe4L0tawlx" }, "outputs": [], "source": [ "import sqlite3\n", "import os\n", "\n", "conn = sqlite3.connect('reviews.sqlite')\n", "c = conn.cursor()\n", "\n", "c.execute('DROP TABLE IF EXISTS review_db')\n", "c.execute('CREATE TABLE review_db (review TEXT, sentiment INTEGER, date TEXT)')\n", "\n", "example1 = 'I love this movie'\n", "c.execute(\"INSERT INTO review_db (review, sentiment, date) VALUES (?, ?, DATETIME('now'))\", (example1, 1))\n", "\n", "example2 = 'I disliked this movie'\n", "c.execute(\"INSERT INTO review_db (review, sentiment, date) VALUES (?, ?, DATETIME('now'))\", (example2, 0))\n", "\n", "conn.commit()\n", "conn.close()" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "execution": { "iopub.execute_input": 
"2021-10-23T10:42:27.023247Z", "iopub.status.busy": "2021-10-23T10:42:27.022518Z", "iopub.status.idle": "2021-10-23T10:42:27.025468Z", "shell.execute_reply": "2021-10-23T10:42:27.025889Z" }, "id": "Mbe3UZgfawlx" }, "outputs": [], "source": [ "conn = sqlite3.connect('reviews.sqlite')\n", "c = conn.cursor()\n", "\n", "c.execute(\"SELECT * FROM review_db WHERE date BETWEEN '2017-01-01 10:10:10' AND DATETIME('now')\")\n", "results = c.fetchall()\n", "\n", "conn.close()" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "execution": { "iopub.execute_input": "2021-10-23T10:42:27.030023Z", "iopub.status.busy": "2021-10-23T10:42:27.029361Z", "iopub.status.idle": "2021-10-23T10:42:27.032770Z", "shell.execute_reply": "2021-10-23T10:42:27.032127Z" }, "id": "vwa3g4tQawlx", "outputId": "382b2d75-8ac6-485c-9aba-923753729f3c" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "[('I love this movie', 1, '2023-11-10 05:08:08'), ('I disliked this movie', 0, '2023-11-10 05:08:08')]\n" ] } ], "source": [ "print(results)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 478 }, "execution": { "iopub.execute_input": "2021-10-23T10:42:27.037707Z", "iopub.status.busy": "2021-10-23T10:42:27.037058Z", "iopub.status.idle": "2021-10-23T10:42:27.040119Z", "shell.execute_reply": "2021-10-23T10:42:27.040586Z" }, "id": "1Wt6k_PLawlx", "outputId": "c4c65349-38ea-4316-ff2e-d2a27e368ba2" }, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "execution_count": 22 } ], "source": [ "Image(url='https://git.io/Jts3V', width=700)" ] }, { "cell_type": "markdown", "metadata": { "id": "6RKw4dd0awlx" }, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": { "id": "W4V6Gx6Tawlx" }, "source": [ "# 플라스크 웹 애플리케이션 개발" ] }, { "cell_type": "markdown", "metadata": { "id": "kdv7WCFVawlx" }, "source": [ "..." ] }, { "cell_type": "markdown", "metadata": { "id": "0i4Ktl2Oawly" }, "source": [ "## 첫 번째 플라스크 애플리케이션" ] }, { "cell_type": "markdown", "metadata": { "id": "PbqsOJouawly" }, "source": [ "..." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 241 }, "execution": { "iopub.execute_input": "2021-10-23T10:42:27.046556Z", "iopub.status.busy": "2021-10-23T10:42:27.045885Z", "iopub.status.idle": "2021-10-23T10:42:27.049598Z", "shell.execute_reply": "2021-10-23T10:42:27.048819Z" }, "id": "sqVxyOD_awly", "outputId": "9a847611-2267-4209-b63c-5c7803316b0b" }, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "execution_count": 23 } ], "source": [ "Image(url='https://git.io/Jts3o', width=700)" ] }, { "cell_type": "markdown", "metadata": { "id": "HL4wMjzUawly" }, "source": [ "## 폼 검증과 화면 출력" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 238 }, "execution": { "iopub.execute_input": "2021-10-23T10:42:27.054663Z", "iopub.status.busy": "2021-10-23T10:42:27.053945Z", "iopub.status.idle": "2021-10-23T10:42:27.056803Z", "shell.execute_reply": "2021-10-23T10:42:27.057255Z" }, "id": "fYzE9i3-awly", "outputId": "3aeeb721-3afd-4430-ba78-05c19396f3bd" }, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "execution_count": 24 } ], "source": [ "Image(url='https://git.io/Jts3K', width=400)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 102 }, "execution": { "iopub.execute_input": "2021-10-23T10:42:27.061920Z", "iopub.status.busy": 
"2021-10-23T10:42:27.061284Z", "iopub.status.idle": "2021-10-23T10:42:27.064488Z", "shell.execute_reply": "2021-10-23T10:42:27.064928Z" }, "id": "aKBsBCFWawly", "outputId": "11e376ac-a05a-4e4b-eb1d-59cbe81cc70d" }, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "execution_count": 25 } ], "source": [ "Image(url='https://git.io/Jts36', width=400)" ] }, { "cell_type": "markdown", "metadata": { "id": "UC5SMipWawly" }, "source": [ "
\n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "QugsT1TGawlz" }, "source": [ "## 화면 요약" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 336 }, "execution": { "iopub.execute_input": "2021-10-23T10:42:27.069942Z", "iopub.status.busy": "2021-10-23T10:42:27.069287Z", "iopub.status.idle": "2021-10-23T10:42:27.072572Z", "shell.execute_reply": "2021-10-23T10:42:27.073012Z" }, "id": "B-pYhG_Dawlz", "outputId": "259df438-1337-4561-ec17-2f08e479dc6d" }, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "execution_count": 26 } ], "source": [ "Image(url='https://git.io/Jts3P', width=800)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 545 }, "execution": { "iopub.execute_input": "2021-10-23T10:42:27.077697Z", "iopub.status.busy": "2021-10-23T10:42:27.077068Z", "iopub.status.idle": "2021-10-23T10:42:27.081051Z", "shell.execute_reply": "2021-10-23T10:42:27.080402Z" }, "id": "LGy4kOWOawlz", "outputId": "ca0a43f9-51c9-471b-bab3-c89681c4699b" }, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "execution_count": 27 } ], "source": [ "Image(url='https://git.io/Jts3X', width=800)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 443 }, "execution": { "iopub.execute_input": "2021-10-23T10:42:27.085751Z", "iopub.status.busy": "2021-10-23T10:42:27.085109Z", "iopub.status.idle": "2021-10-23T10:42:27.088219Z", "shell.execute_reply": "2021-10-23T10:42:27.088648Z" }, "id": "VySNEDCBawlz", "outputId": "eb1966f7-3f6a-44b4-c001-5790f16cea88" }, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "execution_count": 28 } ], "source": [ 
"Image(url='https://git.io/Jts31', width=400)" ] }, { "cell_type": "markdown", "metadata": { "id": "Sboq5JaDawlz" }, "source": [ "# 영화 리뷰 분류기를 웹 애플리케이션으로 만들기" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 249 }, "execution": { "iopub.execute_input": "2021-10-23T10:42:27.093471Z", "iopub.status.busy": "2021-10-23T10:42:27.092785Z", "iopub.status.idle": "2021-10-23T10:42:27.096163Z", "shell.execute_reply": "2021-10-23T10:42:27.096614Z" }, "id": "OXmQFwofawlz", "outputId": "22792cfc-c262-40ad-bb98-66fd1e4b1000" }, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "execution_count": 29 } ], "source": [ "Image(url='https://git.io/Jts3M', width=400)" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 252 }, "execution": { "iopub.execute_input": "2021-10-23T10:42:27.101553Z", "iopub.status.busy": "2021-10-23T10:42:27.100824Z", "iopub.status.idle": "2021-10-23T10:42:27.104703Z", "shell.execute_reply": "2021-10-23T10:42:27.104095Z" }, "id": "5hlCDl4Qawlz", "outputId": "af31f7a1-179b-4d73-f076-f864034541e0" }, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "execution_count": 30 } ], "source": [ "Image(url='https://git.io/Jts3D', width=400)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 144 }, "execution": { "iopub.execute_input": "2021-10-23T10:42:27.109273Z", "iopub.status.busy": "2021-10-23T10:42:27.108628Z", "iopub.status.idle": "2021-10-23T10:42:27.112209Z", "shell.execute_reply": "2021-10-23T10:42:27.112635Z" }, "id": "H2IWwFo4awl0", "outputId": "22b42cd3-5594-42b5-8102-7698aab18fb0" }, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, 
"execution_count": 31 } ], "source": [ "Image(url='https://git.io/Jts3y', width=400)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 296 }, "execution": { "iopub.execute_input": "2021-10-23T10:42:27.117420Z", "iopub.status.busy": "2021-10-23T10:42:27.116719Z", "iopub.status.idle": "2021-10-23T10:42:27.120071Z", "shell.execute_reply": "2021-10-23T10:42:27.120483Z" }, "id": "NDdF8d_Dawl0", "outputId": "4a6b79b1-32f1-4b91-a4ba-ff763cac8187" }, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "execution_count": 32 } ], "source": [ "Image(url='https://git.io/Jts3S', width=200)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 288 }, "execution": { "iopub.execute_input": "2021-10-23T10:42:27.125664Z", "iopub.status.busy": "2021-10-23T10:42:27.124733Z", "iopub.status.idle": "2021-10-23T10:42:27.129014Z", "shell.execute_reply": "2021-10-23T10:42:27.129667Z" }, "id": "oDMBlVbjawl0", "outputId": "b42887f4-7c9a-4b4f-84fe-8f8c1a6774f0" }, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "execution_count": 33 } ], "source": [ "Image(url='https://git.io/Jts32', width=400)" ] }, { "cell_type": "markdown", "metadata": { "id": "QYjcDtqpawl0" }, "source": [ "
\n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "yDhAf2xQawl0" }, "source": [ "# 공개 서버에 웹 애플리케이션 배포" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 476 }, "execution": { "iopub.execute_input": "2021-10-23T10:42:27.135494Z", "iopub.status.busy": "2021-10-23T10:42:27.134595Z", "iopub.status.idle": "2021-10-23T10:42:27.138673Z", "shell.execute_reply": "2021-10-23T10:42:27.139317Z" }, "id": "mUUr4cloawl1", "outputId": "bc440e92-660a-42e5-d730-670e2dc68e0b" }, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "execution_count": 34 } ], "source": [ "Image(url='https://git.io/Jts39', width=600)" ] }, { "cell_type": "markdown", "metadata": { "id": "Tz5xeyR_awl1" }, "source": [ "
\n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "bfE3tvN6awl1" }, "source": [ "## 영화 분류기 업데이트" ] }, { "cell_type": "markdown", "metadata": { "id": "YDM3DF_Jawl1" }, "source": [ "다운로드한 깃허브 저장소에 들어있는 movieclassifier_with_update 디렉토리를 사용합니다(그렇지 않으면 `movieclassifier` 디렉토리를 복사해서 사용하세요)." ] }, { "cell_type": "markdown", "metadata": { "id": "RDF4lqx5awl1" }, "source": [ "**코랩을 사용할 때는 다음 셀을 실행하세요.**" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "execution": { "iopub.execute_input": "2021-10-23T10:42:27.148624Z", "iopub.status.busy": "2021-10-23T10:42:27.144514Z", "iopub.status.idle": "2021-10-23T10:42:27.275568Z", "shell.execute_reply": "2021-10-23T10:42:27.274671Z" }, "id": "JC0mvtTBawl2" }, "outputs": [], "source": [ "!cp -r ../movieclassifier ../movieclassifier_with_update" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 36 }, "execution": { "iopub.execute_input": "2021-10-23T10:42:27.282672Z", "iopub.status.busy": "2021-10-23T10:42:27.281951Z", "iopub.status.idle": "2021-10-23T10:42:27.327533Z", "shell.execute_reply": "2021-10-23T10:42:27.326879Z" }, "id": "yWt0Z10Cawl2", "outputId": "f66d8354-6a1e-4577-fcca-6639f03ae6b9" }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "'./reviews.sqlite'" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "string" } }, "metadata": {}, "execution_count": 36 } ], "source": [ "import shutil\n", "\n", "os.chdir('..')\n", "\n", "if not os.path.exists('movieclassifier_with_update'):\n", " os.mkdir('movieclassifier_with_update')\n", "os.chdir('movieclassifier_with_update')\n", "\n", "if not os.path.exists('pkl_objects'):\n", " os.mkdir('pkl_objects')\n", "\n", "shutil.copyfile('../movieclassifier/pkl_objects/classifier.pkl',\n", " './pkl_objects/classifier.pkl')\n", "\n", "shutil.copyfile('../movieclassifier/reviews.sqlite',\n", " './reviews.sqlite')" ] }, { "cell_type": "markdown", "metadata": { 
"id": "S2yiWBk8awl2" }, "source": [ "SQLite 데이터베이스에 저장된 데이터로 분류기를 업데이트하는 함수를 정의합니다:" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "execution": { "iopub.execute_input": "2021-10-23T10:42:27.334997Z", "iopub.status.busy": "2021-10-23T10:42:27.334266Z", "iopub.status.idle": "2021-10-23T10:42:27.336838Z", "shell.execute_reply": "2021-10-23T10:42:27.336185Z" }, "id": "cEG3QW8wawl2" }, "outputs": [], "source": [ "import pickle\n", "import sqlite3\n", "import numpy as np\n", "\n", "# 로컬 디렉토리에서 HashingVectorizer를 임포트합니다\n", "from vectorizer import vect\n", "\n", "def update_model(db_path, model, batch_size=10000):\n", "\n", " conn = sqlite3.connect(db_path)\n", " c = conn.cursor()\n", " c.execute('SELECT * from review_db')\n", "\n", " results = c.fetchmany(batch_size)\n", " while results:\n", " data = np.array(results)\n", " X = data[:, 0]\n", " y = data[:, 1].astype(int)\n", "\n", " classes = np.array([0, 1])\n", " X_train = vect.transform(X)\n", " clf.partial_fit(X_train, y, classes=classes)\n", " results = c.fetchmany(batch_size)\n", "\n", " conn.close()\n", " return None" ] }, { "cell_type": "markdown", "metadata": { "id": "_alTEyGaawl2" }, "source": [ "모델을 업데이트합니다:" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "execution": { "iopub.execute_input": "2021-10-23T10:42:27.342287Z", "iopub.status.busy": "2021-10-23T10:42:27.341581Z", "iopub.status.idle": "2021-10-23T10:42:27.392881Z", "shell.execute_reply": "2021-10-23T10:42:27.393819Z" }, "id": "0qeEwRm5awl2" }, "outputs": [], "source": [ "cur_dir = '.'\n", "\n", "# app.py 파일에 이 코드를 삽입했다면 다음 경로를 사용하세요.\n", "\n", "# import os\n", "# cur_dir = os.path.dirname(__file__)\n", "\n", "clf = pickle.load(open(os.path.join(cur_dir,\n", " 'pkl_objects',\n", " 'classifier.pkl'), 'rb'))\n", "db = os.path.join(cur_dir, 'reviews.sqlite')\n", "\n", "update_model(db_path=db, model=clf, batch_size=10000)\n", "\n", "# classifier.pkl 파일을 업데이트하려면 다음 주석을 해제하세요.\n", "\n", "# pickle.dump(clf, 
open(os.path.join(cur_dir,\n", "# 'pkl_objects', 'classifier.pkl'), 'wb')\n", "# , protocol=4)" ] }, { "cell_type": "markdown", "metadata": { "id": "6B3Gxlfhawl2" }, "source": [ "
\n", "
" ] } ], "metadata": { "colab": { "name": "ch09.ipynb", "provenance": [] }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 0 }