{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "Td_EDk49y8wt" }, "source": [ "# Machine Learning Textbook, 3rd Edition" ] }, { "cell_type": "markdown", "metadata": { "id": "8vj8Oyxiy8w0" }, "source": [ "# Chapter 14 - A Closer Look at the Mechanics of TensorFlow (2/3)" ] }, { "cell_type": "markdown", "metadata": { "id": "9SUdqsLFy8w0" }, "source": [ "**Use the links below to view this notebook in Jupyter Notebook Viewer (nbviewer.jupyter.org) or run it in Google Colab (colab.research.google.com).**\n", "\n", "\n", " \n", " \n", "
\n", " View in Jupyter Notebook Viewer\n", " \n", " Run in Google Colab\n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "sZOZo9xwy8w0" }, "source": [ "### Table of Contents" ] }, { "cell_type": "markdown", "metadata": { "id": "I76d0TZxy8w1" }, "source": [ "- TensorFlow Estimators\n", " - Working with feature columns\n", " - Machine learning with pre-made Estimators" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "CsQvsr58y8w1" }, "outputs": [], "source": [ "import numpy as np\n", "import tensorflow as tf\n", "import pandas as pd\n", "\n", "from IPython.display import Image" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 36 }, "id": "jCgZBvoRzZsB", "outputId": "fb992a11-479a-49be-bace-bfc7c24a49ab" }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "'2.19.0'" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "string" } }, "metadata": {}, "execution_count": 2 } ], "source": [ "tf.__version__" ] }, { "cell_type": "markdown", "metadata": { "id": "kZAANkUJy8w1" }, "source": [ "## TensorFlow Estimators\n", "\n", "##### Steps for using pre-made Estimators\n", "\n", " * **Step 1:** Define an input function for data loading\n", " * **Step 2:** Define feature columns to bridge the estimator and the data\n", " * **Step 3:** Instantiate an estimator or convert a Keras model to an estimator\n", " * **Step 4:** Use the estimator: train(), evaluate(), predict()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "id": "csAr00GCy8w2" }, "outputs": [], "source": [ "tf.random.set_seed(1)\n", "np.random.seed(1)" ] }, { "cell_type": "markdown", "metadata": { "id": "nOFiP8O9y8w2" }, "source": [ "### Working with feature columns\n", "\n", "\n", " * Definition: https://developers.google.com/machine-learning/glossary/#feature_columns\n", " * Documentation: https://www.tensorflow.org/api_docs/python/tf/feature_column" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 398 }, "id": "bJbH-xauy8w2", "outputId": "c8b22158-b08a-4597-b9c4-d5d931523e55" }, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "" ], "text/plain": [ "" ] }, 
"metadata": {}, "execution_count": 4 } ], "source": [ "Image(url='https://git.io/JL56E', width=700)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 271 }, "id": "YRGME3Lqy8w3", "outputId": "09da8c89-969f-44af-c920-3e360fb54ed9" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Downloading data from http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data\n", " 16384/Unknown \u001b[1m0s\u001b[0m 9us/step" ] }, { "output_type": "execute_result", "data": { "text/plain": [ " MPG Cylinders Displacement Horsepower Weight Acceleration \\\n", "393 27.0 4 140.0 86.0 2790.0 15.6 \n", "394 44.0 4 97.0 52.0 2130.0 24.6 \n", "395 32.0 4 135.0 84.0 2295.0 11.6 \n", "396 28.0 4 120.0 79.0 2625.0 18.6 \n", "397 31.0 4 119.0 82.0 2720.0 19.4 \n", "\n", " ModelYear Origin \n", "393 82 1 \n", "394 82 2 \n", "395 82 1 \n", "396 82 1 \n", "397 82 1 " ], "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
      MPG  Cylinders  Displacement  Horsepower  Weight  Acceleration  ModelYear  Origin
393  27.0          4         140.0        86.0  2790.0          15.6         82       1
394  44.0          4          97.0        52.0  2130.0          24.6         82       2
395  32.0          4         135.0        84.0  2295.0          11.6         82       1
396  28.0          4         120.0        79.0  2625.0          18.6         82       1
397  31.0          4         119.0        82.0  2720.0          19.4         82       1
\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", " \n", "\n", "\n", "\n", " \n", "
\n", "\n", "
\n", "
\n" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "dataframe", "summary": "{\n \"name\": \"df\",\n \"rows\": 5,\n \"fields\": [\n {\n \"column\": \"MPG\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 6.8044103344816005,\n \"min\": 27.0,\n \"max\": 44.0,\n \"num_unique_values\": 5,\n \"samples\": [\n 44.0,\n 31.0,\n 32.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Cylinders\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 4,\n \"max\": 4,\n \"num_unique_values\": 1,\n \"samples\": [\n 4\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Displacement\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 16.813684902483452,\n \"min\": 97.0,\n \"max\": 140.0,\n \"num_unique_values\": 5,\n \"samples\": [\n 97.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Horsepower\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 13.992855319769443,\n \"min\": 52.0,\n \"max\": 86.0,\n \"num_unique_values\": 5,\n \"samples\": [\n 52.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Weight\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 285.62650437240586,\n \"min\": 2130.0,\n \"max\": 2790.0,\n \"num_unique_values\": 5,\n \"samples\": [\n 2130.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Acceleration\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4.81123684721507,\n \"min\": 11.6,\n \"max\": 24.6,\n \"num_unique_values\": 5,\n \"samples\": [\n 24.6\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"ModelYear\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 82,\n \"max\": 82,\n \"num_unique_values\": 1,\n \"samples\": [\n 82\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Origin\",\n \"properties\": {\n 
\"dtype\": \"number\",\n \"std\": 0,\n \"min\": 1,\n \"max\": 2,\n \"num_unique_values\": 2,\n \"samples\": [\n 2\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" } }, "metadata": {}, "execution_count": 6 } ], "source": [ "dataset_path = tf.keras.utils.get_file(\"auto-mpg.data\",\n", " (\"http://archive.ics.uci.edu/ml/machine-learning-databases\"\n", " \"/auto-mpg/auto-mpg.data\"))\n", "\n", "column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower',\n", " 'Weight', 'Acceleration', 'ModelYear', 'Origin']\n", "\n", "df = pd.read_csv(dataset_path, names=column_names,\n", " na_values = \"?\", comment='\\t',\n", " sep=\" \", skipinitialspace=True)\n", "\n", "df.tail()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 401 }, "id": "lRR0q4Eky8w3", "outputId": "41e32f7a-6ac2-47a3-fe29-3db331c42c0c" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "MPG 0\n", "Cylinders 0\n", "Displacement 0\n", "Horsepower 6\n", "Weight 0\n", "Acceleration 0\n", "ModelYear 0\n", "Origin 0\n", "dtype: int64\n" ] }, { "output_type": "execute_result", "data": { "text/plain": [ " MPG Cylinders Displacement Horsepower Weight Acceleration \\\n", "387 27.0 4 140.0 86.0 2790.0 15.6 \n", "388 44.0 4 97.0 52.0 2130.0 24.6 \n", "389 32.0 4 135.0 84.0 2295.0 11.6 \n", "390 28.0 4 120.0 79.0 2625.0 18.6 \n", "391 31.0 4 119.0 82.0 2720.0 19.4 \n", "\n", " ModelYear Origin \n", "387 82 1 \n", "388 82 2 \n", "389 82 1 \n", "390 82 1 \n", "391 82 1 " ], "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
      MPG  Cylinders  Displacement  Horsepower  Weight  Acceleration  ModelYear  Origin
387  27.0          4         140.0        86.0  2790.0          15.6         82       1
388  44.0          4          97.0        52.0  2130.0          24.6         82       2
389  32.0          4         135.0        84.0  2295.0          11.6         82       1
390  28.0          4         120.0        79.0  2625.0          18.6         82       1
391  31.0          4         119.0        82.0  2720.0          19.4         82       1
\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", " \n", "\n", "\n", "\n", " \n", "
\n", "\n", "
\n", "
\n" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "dataframe", "summary": "{\n \"name\": \"df\",\n \"rows\": 5,\n \"fields\": [\n {\n \"column\": \"MPG\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 6.8044103344816005,\n \"min\": 27.0,\n \"max\": 44.0,\n \"num_unique_values\": 5,\n \"samples\": [\n 44.0,\n 31.0,\n 32.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Cylinders\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 4,\n \"max\": 4,\n \"num_unique_values\": 1,\n \"samples\": [\n 4\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Displacement\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 16.813684902483452,\n \"min\": 97.0,\n \"max\": 140.0,\n \"num_unique_values\": 5,\n \"samples\": [\n 97.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Horsepower\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 13.992855319769443,\n \"min\": 52.0,\n \"max\": 86.0,\n \"num_unique_values\": 5,\n \"samples\": [\n 52.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Weight\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 285.62650437240586,\n \"min\": 2130.0,\n \"max\": 2790.0,\n \"num_unique_values\": 5,\n \"samples\": [\n 2130.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Acceleration\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4.81123684721507,\n \"min\": 11.6,\n \"max\": 24.6,\n \"num_unique_values\": 5,\n \"samples\": [\n 24.6\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"ModelYear\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 82,\n \"max\": 82,\n \"num_unique_values\": 1,\n \"samples\": [\n 82\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Origin\",\n \"properties\": {\n 
\"dtype\": \"number\",\n \"std\": 0,\n \"min\": 1,\n \"max\": 2,\n \"num_unique_values\": 2,\n \"samples\": [\n 2\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" } }, "metadata": {}, "execution_count": 7 } ], "source": [ "print(df.isna().sum())\n", "\n", "df = df.dropna()\n", "df = df.reset_index(drop=True)\n", "df.tail()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 311 }, "id": "VBr_cK0dy8w4", "outputId": "dc1c88ff-931e-4f84-bbc2-ab2f8f676c33" }, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " count mean std min 25% 50% 75% \\\n", "MPG 313.0 23.404153 7.666909 9.0 17.5 23.0 29.0 \n", "Cylinders 313.0 5.402556 1.701506 3.0 4.0 4.0 8.0 \n", "Displacement 313.0 189.512780 102.675646 68.0 104.0 140.0 260.0 \n", "Horsepower 313.0 102.929712 37.919046 46.0 75.0 92.0 120.0 \n", "Weight 313.0 2961.198083 848.602146 1613.0 2219.0 2755.0 3574.0 \n", "Acceleration 313.0 15.704473 2.725399 8.5 14.0 15.5 17.3 \n", "ModelYear 313.0 75.929712 3.675305 70.0 73.0 76.0 79.0 \n", "Origin 313.0 1.591054 0.807923 1.0 1.0 1.0 2.0 \n", "\n", " max \n", "MPG 46.6 \n", "Cylinders 8.0 \n", "Displacement 455.0 \n", "Horsepower 230.0 \n", "Weight 5140.0 \n", "Acceleration 24.8 \n", "ModelYear 82.0 \n", "Origin 3.0 " ], "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
              count         mean         std     min     25%     50%     75%     max
MPG           313.0    23.404153    7.666909     9.0    17.5    23.0    29.0    46.6
Cylinders     313.0     5.402556    1.701506     3.0     4.0     4.0     8.0     8.0
Displacement  313.0   189.512780  102.675646    68.0   104.0   140.0   260.0   455.0
Horsepower    313.0   102.929712   37.919046    46.0    75.0    92.0   120.0   230.0
Weight        313.0  2961.198083  848.602146  1613.0  2219.0  2755.0  3574.0  5140.0
Acceleration  313.0    15.704473    2.725399     8.5    14.0    15.5    17.3    24.8
ModelYear     313.0    75.929712    3.675305    70.0    73.0    76.0    79.0    82.0
Origin        313.0     1.591054    0.807923     1.0     1.0     1.0     2.0     3.0
\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", " \n", "\n", "\n", "\n", " \n", "
\n", "\n", "
\n", " \n", " \n", " \n", "
\n", "\n", "
\n", "
\n" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "dataframe", "variable_name": "train_stats", "summary": "{\n \"name\": \"train_stats\",\n \"rows\": 8,\n \"fields\": [\n {\n \"column\": \"count\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.0,\n \"min\": 313.0,\n \"max\": 313.0,\n \"num_unique_values\": 1,\n \"samples\": [\n 313.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"mean\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1027.9938512887559,\n \"min\": 1.5910543130990416,\n \"max\": 2961.1980830670927,\n \"num_unique_values\": 8,\n \"samples\": [\n 5.402555910543131\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"std\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 294.16743824006835,\n \"min\": 0.8079234066512674,\n \"max\": 848.6021456860568,\n \"num_unique_values\": 8,\n \"samples\": [\n 1.7015057012867858\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"min\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 560.6379617071456,\n \"min\": 1.0,\n \"max\": 1613.0,\n \"num_unique_values\": 8,\n \"samples\": [\n 3.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"25%\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 770.927615533947,\n \"min\": 1.0,\n \"max\": 2219.0,\n \"num_unique_values\": 8,\n \"samples\": [\n 4.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"50%\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 957.5533158412494,\n \"min\": 1.0,\n \"max\": 2755.0,\n \"num_unique_values\": 8,\n \"samples\": [\n 4.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"75%\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1240.5429962462865,\n \"min\": 2.0,\n \"max\": 3574.0,\n \"num_unique_values\": 8,\n \"samples\": [\n 8.0\n ],\n \"semantic_type\": 
\"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"max\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1781.0508356826074,\n \"min\": 3.0,\n \"max\": 5140.0,\n \"num_unique_values\": 8,\n \"samples\": [\n 8.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" } }, "metadata": {}, "execution_count": 8 } ], "source": [ "import sklearn\n", "import sklearn.model_selection\n", "\n", "\n", "df_train, df_test = sklearn.model_selection.train_test_split(df, train_size=0.8)\n", "train_stats = df_train.describe().transpose()\n", "train_stats" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "0Xit7Hj4y8w4", "outputId": "4de36760-010b-4f0f-bc47-c28efddae4fc" }, "outputs": [ { "output_type": "stream", "name": "stderr", "text": [ "/tmp/ipython-input-487790972.py:8: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '[-0.8243028 0.3511267 -0.8243028 -0.8243028 -0.8243028 1.52655621\n", " 0.3511267 -0.8243028 -0.8243028 -0.8243028 0.3511267 -0.8243028\n", " 0.3511267 1.52655621 1.52655621 1.52655621 0.3511267 1.52655621\n", " -0.8243028 0.3511267 1.52655621 -0.8243028 -0.8243028 0.3511267\n", " -0.8243028 -0.8243028 -0.8243028 0.3511267 -0.8243028 1.52655621\n", " 0.3511267 -0.8243028 0.3511267 -0.8243028 -0.8243028 1.52655621\n", " -0.8243028 1.52655621 1.52655621 -0.8243028 -0.8243028 -0.8243028\n", " -0.8243028 0.3511267 -0.8243028 1.52655621 -0.8243028 -0.8243028\n", " 1.52655621 -0.8243028 -0.8243028 -0.8243028 1.52655621 1.52655621\n", " 0.3511267 0.3511267 1.52655621 -0.8243028 -0.8243028 1.52655621\n", " 1.52655621 -0.8243028 -0.8243028 0.3511267 1.52655621 -0.8243028\n", " 0.3511267 -0.8243028 1.52655621 1.52655621 -0.8243028 -0.8243028\n", " -1.41201755 1.52655621 0.3511267 1.52655621 -0.8243028 -0.8243028\n", " -0.8243028 1.52655621 1.52655621 0.3511267 0.3511267 1.52655621\n", " 
-0.8243028 1.52655621 -0.23658805 -0.8243028 -0.8243028 0.3511267\n", " 0.3511267 -1.41201755 -0.8243028 1.52655621 -0.8243028 1.52655621\n", " -0.8243028 1.52655621 0.3511267 -0.8243028 0.3511267 -1.41201755\n", " 0.3511267 -0.8243028 -0.8243028 -0.8243028 -0.8243028 -0.8243028\n", " -0.8243028 1.52655621 1.52655621 -0.8243028 0.3511267 -0.8243028\n", " -0.8243028 0.3511267 1.52655621 -0.8243028 -0.8243028 0.3511267\n", " 0.3511267 -0.8243028 -0.8243028 -0.23658805 -0.8243028 0.3511267\n", " 0.3511267 0.3511267 -0.8243028 1.52655621 1.52655621 -0.8243028\n", " 0.3511267 1.52655621 1.52655621 -0.8243028 0.3511267 0.3511267\n", " -0.8243028 -0.8243028 -0.8243028 -0.8243028 1.52655621 -0.8243028\n", " -0.8243028 0.3511267 -0.8243028 -0.8243028 1.52655621 -0.8243028\n", " 1.52655621 -0.8243028 -0.8243028 1.52655621 1.52655621 -0.8243028\n", " 0.3511267 -0.8243028 1.52655621 0.3511267 -0.8243028 1.52655621\n", " -0.8243028 -0.8243028 1.52655621 1.52655621 -0.8243028 1.52655621\n", " 0.3511267 -0.8243028 -0.8243028 0.3511267 1.52655621 1.52655621\n", " -0.8243028 -0.8243028 -0.8243028 -0.8243028 -0.8243028 -0.8243028\n", " -0.8243028 1.52655621 -0.8243028 0.3511267 1.52655621 -0.8243028\n", " -0.23658805 -0.8243028 -0.8243028 -0.8243028 -0.8243028 -0.8243028\n", " 0.3511267 -0.8243028 -0.8243028 1.52655621 1.52655621 -0.8243028\n", " 1.52655621 -0.8243028 1.52655621 -0.8243028 -0.8243028 -0.8243028\n", " -0.8243028 0.3511267 -0.8243028 1.52655621 -0.8243028 0.3511267\n", " 1.52655621 -0.8243028 -0.8243028 -0.8243028 0.3511267 -0.8243028\n", " -0.8243028 -0.8243028 -0.8243028 0.3511267 -0.8243028 1.52655621\n", " -0.8243028 -0.8243028 -0.8243028 1.52655621 -0.8243028 1.52655621\n", " -0.8243028 0.3511267 -0.8243028 0.3511267 -0.8243028 -0.8243028\n", " 0.3511267 1.52655621 -0.8243028 0.3511267 1.52655621 0.3511267\n", " -0.8243028 -0.8243028 -0.8243028 1.52655621 1.52655621 -0.8243028\n", " -0.8243028 -0.8243028 1.52655621 0.3511267 0.3511267 -0.8243028\n", " -0.8243028 
-0.8243028 1.52655621 0.3511267 0.3511267 0.3511267\n", " 1.52655621 -0.8243028 1.52655621 0.3511267 -0.8243028 -0.8243028\n", " -0.8243028 -0.8243028 1.52655621 -0.8243028 1.52655621 -0.8243028\n", " 1.52655621 -0.8243028 -0.8243028 -0.8243028 -0.8243028 -0.8243028\n", " 1.52655621 -0.8243028 -0.8243028 -0.8243028 0.3511267 -0.8243028\n", " 1.52655621 -0.8243028 -0.8243028 -0.8243028 -0.8243028 1.52655621\n", " -0.8243028 1.52655621 -0.8243028 -1.41201755 -0.8243028 1.52655621\n", " -0.8243028 1.52655621 0.3511267 0.3511267 0.3511267 -0.8243028\n", " -0.8243028 0.3511267 -0.8243028 1.52655621 -0.8243028 -0.8243028\n", " -0.8243028 0.3511267 -0.8243028 0.3511267 1.52655621 -0.8243028\n", " 1.52655621]' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.\n", " df_train_norm.loc[:, col_name] = (df_train_norm.loc[:, col_name] - mean)/std\n", "/tmp/ipython-input-487790972.py:9: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. 
Value '[-0.8243028 -0.8243028 -0.8243028 -0.8243028 -0.8243028 -0.8243028\n", " -0.8243028 1.52655621 -0.8243028 -0.8243028 0.3511267 -0.8243028\n", " 1.52655621 -0.8243028 -0.8243028 1.52655621 -0.8243028 -0.8243028\n", " 1.52655621 0.3511267 -0.8243028 0.3511267 1.52655621 1.52655621\n", " 1.52655621 1.52655621 -0.8243028 0.3511267 0.3511267 -0.8243028\n", " 1.52655621 -0.8243028 1.52655621 0.3511267 0.3511267 1.52655621\n", " 0.3511267 0.3511267 -0.8243028 -0.8243028 1.52655621 1.52655621\n", " 0.3511267 0.3511267 0.3511267 0.3511267 1.52655621 -0.8243028\n", " 0.3511267 1.52655621 1.52655621 0.3511267 -0.8243028 -0.8243028\n", " -0.8243028 0.3511267 0.3511267 0.3511267 -0.8243028 -0.8243028\n", " 0.3511267 -0.8243028 -0.8243028 1.52655621 0.3511267 0.3511267\n", " -0.8243028 1.52655621 -0.8243028 1.52655621 -0.8243028 1.52655621\n", " 1.52655621 -0.8243028 -0.8243028 -0.8243028 1.52655621 1.52655621\n", " 1.52655621]' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.\n", " df_test_norm.loc[:, col_name] = (df_test_norm.loc[:, col_name] - mean)/std\n" ] }, { "output_type": "execute_result", "data": { "text/plain": [ " MPG Cylinders Displacement Horsepower Weight Acceleration \\\n", "203 28.0 -0.824303 -0.901020 -0.736562 -0.950031 0.255202 \n", "255 19.4 0.351127 0.413800 -0.340982 0.293190 0.548737 \n", "72 13.0 1.526556 1.144256 0.713897 1.339617 -0.625403 \n", "235 30.5 -0.824303 -0.891280 -1.053025 -1.072585 0.475353 \n", "37 14.0 1.526556 1.563051 1.636916 1.470420 -1.359240 \n", "\n", " ModelYear Origin \n", "203 76 3 \n", "255 78 1 \n", "72 72 1 \n", "235 77 1 \n", "37 71 1 " ], "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
      MPG  Cylinders  Displacement  Horsepower     Weight  Acceleration  ModelYear  Origin
203  28.0  -0.824303     -0.901020   -0.736562  -0.950031      0.255202         76       3
255  19.4   0.351127      0.413800   -0.340982   0.293190      0.548737         78       1
72   13.0   1.526556      1.144256    0.713897   1.339617     -0.625403         72       1
235  30.5  -0.824303     -0.891280   -1.053025  -1.072585      0.475353         77       1
37   14.0   1.526556      1.563051    1.636916   1.470420     -1.359240         71       1
\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", " \n", "\n", "\n", "\n", " \n", "
\n", "\n", "
\n", "
\n" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "dataframe", "summary": "{\n \"name\": \"df_train_norm\",\n \"rows\": 5,\n \"fields\": [\n {\n \"column\": \"MPG\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 7.981353268713271,\n \"min\": 13.0,\n \"max\": 30.5,\n \"num_unique_values\": 5,\n \"samples\": [\n 19.4,\n 14.0,\n 13.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Cylinders\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1.175429502520899,\n \"min\": -0.8243027980937295,\n \"max\": 1.5265562069480685,\n \"num_unique_values\": 3,\n \"samples\": [\n -0.8243027980937295,\n 0.3511267044271694,\n 1.5265562069480685\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Displacement\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1.137623714031951,\n \"min\": -0.9010196973264667,\n \"max\": 1.563050504928142,\n \"num_unique_values\": 5,\n \"samples\": [\n 0.41380037107026135,\n 1.563050504928142,\n 1.1442559646239991\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Horsepower\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1.1121032669054438,\n \"min\": -1.0530252500433732,\n \"max\": 1.636915871166799,\n \"num_unique_values\": 5,\n \"samples\": [\n -0.3409820120759746,\n 1.636915871166799,\n 0.713896858986838\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Weight\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1.2105911244818783,\n \"min\": -1.0725851775112334,\n \"max\": 1.4704204122935787,\n \"num_unique_values\": 5,\n \"samples\": [\n 0.29319029912629085,\n 1.4704204122935787,\n 1.3396170663861022\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Acceleration\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.8263413945969514,\n \"min\": -1.359240485750421,\n \"max\": 0.5487369309010532,\n 
\"num_unique_values\": 5,\n \"samples\": [\n 0.5487369309010532,\n -1.359240485750421,\n -0.6254030178075461\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"ModelYear\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 3,\n \"min\": 71,\n \"max\": 78,\n \"num_unique_values\": 5,\n \"samples\": [\n 78,\n 71,\n 72\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Origin\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 1,\n \"max\": 3,\n \"num_unique_values\": 2,\n \"samples\": [\n 1,\n 3\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" } }, "metadata": {}, "execution_count": 9 } ], "source": [ "numeric_column_names = ['Cylinders', 'Displacement', 'Horsepower', 'Weight', 'Acceleration']\n", "\n", "df_train_norm, df_test_norm = df_train.copy(), df_test.copy()\n", "\n", "for col_name in numeric_column_names:\n", " mean = train_stats.loc[col_name, 'mean']\n", " std = train_stats.loc[col_name, 'std']\n", " df_train_norm.loc[:, col_name] = (df_train_norm.loc[:, col_name] - mean)/std\n", " df_test_norm.loc[:, col_name] = (df_test_norm.loc[:, col_name] - mean)/std\n", "\n", "df_train_norm.tail()" ] }, { "cell_type": "markdown", "metadata": { "id": "gsn1sbM5y8w4" }, "source": [ "#### 수치형 열" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "jyFdfmKFy8w5", "outputId": "5ef6376f-cf19-41ac-ac75-516f2449ad68" }, "outputs": [ { "output_type": "stream", "name": "stderr", "text": [ "WARNING:tensorflow:From /tmp/ipython-input-2109796009.py:4: numeric_column (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "Use Keras preprocessing layers instead, either directly or via the `tf.keras.utils.FeatureSpace` utility. 
Each of `tf.feature_column.*` has a functional equivalent in `tf.keras.layers` for feature preprocessing when training a Keras model.\n" ] }, { "output_type": "execute_result", "data": { "text/plain": [ "[NumericColumn(key='Cylinders', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),\n", " NumericColumn(key='Displacement', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),\n", " NumericColumn(key='Horsepower', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),\n", " NumericColumn(key='Weight', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None),\n", " NumericColumn(key='Acceleration', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]" ] }, "metadata": {}, "execution_count": 10 } ], "source": [ "numeric_features = []\n", "\n", "for col_name in numeric_column_names:\n", " numeric_features.append(tf.feature_column.numeric_column(key=col_name))\n", "\n", "numeric_features" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "OtZQ-fNHy8w5", "outputId": "3c2e1104-7181-4664-8620-2e9ae13fc7a7" }, "outputs": [ { "output_type": "stream", "name": "stderr", "text": [ "WARNING:tensorflow:From /tmp/ipython-input-1410326794.py:5: bucketized_column (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "Use Keras preprocessing layers instead, either directly or via the `tf.keras.utils.FeatureSpace` utility. 
Each of `tf.feature_column.*` has a functional equivalent in `tf.keras.layers` for feature preprocessing when training a Keras model.\n" ] }, { "output_type": "stream", "name": "stdout", "text": [ "[BucketizedColumn(source_column=NumericColumn(key='ModelYear', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), boundaries=(73, 76, 79))]\n" ] } ], "source": [ "feature_year = tf.feature_column.numeric_column(key=\"ModelYear\")\n", "\n", "bucketized_features = []\n", "\n", "bucketized_features.append(tf.feature_column.bucketized_column(\n", " source_column=feature_year,\n", " boundaries=[73, 76, 79]))\n", "\n", "print(bucketized_features)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "V1OIxbvmy8w6", "outputId": "2adafd46-52f6-46ac-89eb-29987c3b192d" }, "outputs": [ { "output_type": "stream", "name": "stderr", "text": [ "WARNING:tensorflow:From /tmp/ipython-input-4078329294.py:1: categorical_column_with_vocabulary_list (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "Use Keras preprocessing layers instead, either directly or via the `tf.keras.utils.FeatureSpace` utility. Each of `tf.feature_column.*` has a functional equivalent in `tf.keras.layers` for feature preprocessing when training a Keras model.\n", "WARNING:tensorflow:From /tmp/ipython-input-4078329294.py:6: indicator_column (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.\n", "Instructions for updating:\n", "Use Keras preprocessing layers instead, either directly or via the `tf.keras.utils.FeatureSpace` utility. 
Each of `tf.feature_column.*` has a functional equivalent in `tf.keras.layers` for feature preprocessing when training a Keras model.\n" ] }, { "output_type": "stream", "name": "stdout", "text": [ "[IndicatorColumn(categorical_column=VocabularyListCategoricalColumn(key='Origin', vocabulary_list=(1, 2, 3), dtype=tf.int64, default_value=-1, num_oov_buckets=0))]\n" ] } ], "source": [ "feature_origin = tf.feature_column.categorical_column_with_vocabulary_list(\n", " key='Origin',\n", " vocabulary_list=[1, 2, 3])\n", "\n", "categorical_indicator_features = []\n", "categorical_indicator_features.append(tf.feature_column.indicator_column(feature_origin))\n", "\n", "print(categorical_indicator_features)" ] }, { "cell_type": "markdown", "metadata": { "id": "LM8nvPgJy8w6" }, "source": [ "### 사전에 준비된 추정기로 머신러닝 수행하기" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "e5TtiGv6y8w7", "outputId": "0ab27fe6-499c-4e16-d399-f34c10861e11" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "키: dict_keys(['Cylinders', 'Displacement', 'Horsepower', 'Weight', 'Acceleration', 'ModelYear', 'Origin'])\n", "ModelYear: tf.Tensor([82 78 76 72 78 73 70 78], shape=(8,), dtype=int64)\n" ] } ], "source": [ "def train_input_fn(df_train, batch_size=8):\n", " df = df_train.copy()\n", " train_x, train_y = df, df.pop('MPG')\n", " dataset = tf.data.Dataset.from_tensor_slices((dict(train_x), train_y))\n", "\n", " # 셔플, 반복, 배치\n", " return dataset.shuffle(1000).repeat().batch(batch_size)\n", "\n", "## 조사\n", "ds = train_input_fn(df_train_norm)\n", "batch = next(iter(ds))\n", "print('키:', batch[0].keys())\n", "print('ModelYear:', batch[0]['ModelYear'])" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "E0EbaOx2y8w7", "outputId": "73eba0f4-e189-488f-b42b-a94b5a2ba37c" }, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ 
"[NumericColumn(key='Cylinders', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='Displacement', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='Horsepower', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='Weight', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='Acceleration', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), BucketizedColumn(source_column=NumericColumn(key='ModelYear', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), boundaries=(73, 76, 79)), IndicatorColumn(categorical_column=VocabularyListCategoricalColumn(key='Origin', vocabulary_list=(1, 2, 3), dtype=tf.int64, default_value=-1, num_oov_buckets=0))]\n" ] } ], "source": [ "all_feature_columns = (numeric_features +\n", " bucketized_features +\n", " categorical_indicator_features)\n", "\n", "print(all_feature_columns)" ] }, { "cell_type": "markdown", "source": [ "**최신 텐서플로에서 `tf.estimator`가 삭제되어 더이상 사용할 수 없습니다.**" ], "metadata": { "id": "zAw02RlcYStW" } }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "2tSgfsKHy8w7" }, "outputs": [], "source": [ "# regressor = tf.estimator.DNNRegressor(\n", "# feature_columns=all_feature_columns,\n", "# hidden_units=[32, 10],\n", "# model_dir='models/autompg-dnnregressor/')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "sESlS5e5y8w7", "scrolled": false }, "outputs": [], "source": [ "# EPOCHS = 1000\n", "# BATCH_SIZE = 8\n", "# total_steps = EPOCHS * int(np.ceil(len(df_train) / BATCH_SIZE))\n", "# print('훈련 스텝:', total_steps)\n", "\n", "# regressor.train(\n", "# input_fn=lambda:train_input_fn(df_train_norm, batch_size=BATCH_SIZE),\n", "# steps=total_steps)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "k7-xm6ggy8w8" }, "outputs": [], "source": [ "# reloaded_regressor = 
tf.estimator.DNNRegressor(\n", "#     feature_columns=all_feature_columns,\n", "#     hidden_units=[32, 10],\n", "#     warm_start_from='models/autompg-dnnregressor/',\n", "#     model_dir='models/autompg-dnnregressor/')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "BcSjYtdQy8w8" }, "outputs": [], "source": [ "# def eval_input_fn(df_test, batch_size=8):\n", "#     df = df_test.copy()\n", "#     test_x, test_y = df, df.pop('MPG')\n", "#     dataset = tf.data.Dataset.from_tensor_slices((dict(test_x), test_y))\n", "\n", "#     return dataset.batch(batch_size)\n", "\n", "# eval_results = reloaded_regressor.evaluate(\n", "#     input_fn=lambda:eval_input_fn(df_test_norm, batch_size=8))\n", "\n", "# for key in eval_results:\n", "#     print('{:15s} {}'.format(key, eval_results[key]))\n", "\n", "# print('Average loss {:.4f}'.format(eval_results['average_loss']))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "B8u7CKJ_y8w8" }, "outputs": [], "source": [ "# pred_res = regressor.predict(input_fn=lambda: eval_input_fn(df_test_norm, batch_size=8))\n", "\n", "# print(next(iter(pred_res)))" ] }, { "cell_type": "markdown", "metadata": { "id": "1WMTIRkXy8w9" }, "source": [ "#### Boosted Tree Regressor" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "zcrXe74iy8w9" }, "outputs": [], "source": [ "# boosted_tree = tf.estimator.BoostedTreesRegressor(\n", "#     feature_columns=all_feature_columns,\n", "#     n_batches_per_layer=20,\n", "#     n_trees=200)\n", "\n", "# boosted_tree.train(\n", "#     input_fn=lambda:train_input_fn(df_train_norm, batch_size=BATCH_SIZE))\n", "\n", "# eval_results = boosted_tree.evaluate(\n", "#     input_fn=lambda:eval_input_fn(df_test_norm, batch_size=8))\n", "\n", "# print(eval_results)\n", "\n", "# print('Average loss {:.4f}'.format(eval_results['average_loss']))" ] } ], "metadata": { "accelerator": "GPU", "colab": { "name": "ch14_part2.ipynb", "provenance": [], "gpuType": "A100" }, "kernelspec": { "display_name": "Python 3", "name": "python3" 
}, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 0 }