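{ "cells": [ { "cell_type": "markdown", "id": "a72605dd", "metadata": {}, "source": [ "# ARIM-Academy: Fundamentals, Scikit-learn (Predictive Models)" ] }, { "cell_type": "markdown", "id": "1c2e3870", "metadata": {}, "source": [ "## Goals of This Tutorial\n", "\n", "This exercise uses the **atmospheric corrosion dataset**, built for predicting the atmospheric corrosion of iron from meteorological data, to teach the basics of machine learning for predictive modeling.\n", "\n", "### Topics Covered\n", "* **Prediction algorithms**: The dataset contains corrosion measurements taken monthly at six sites. You will use it to learn prediction algorithms such as linear regression and random forests, and to practice predicting corrosion.\n", "\n", "* **Feature selection and dimensionality reduction**: The dataset uses meteorological factors as features. By analyzing it, selecting appropriate features, and reducing dimensionality, you will learn how to improve model performance.\n", "\n", "* **Model evaluation and performance metrics**: You will evaluate models trained on the dataset and learn to assess prediction accuracy with performance metrics such as R².\n", "\n", "* **Data visualization and interpretation**: You will learn to visualize feature distributions and correlations to understand the characteristics and patterns of the data. This yields insights that help in interpreting models and explaining predictions.\n", "\n", "---\n", "\n", "## Dataset\n", "\n", "The **atmospheric corrosion dataset** combines the monthly corrosion of standard test specimens at six sites in Japan with Japan Meteorological Agency observations from the same period. It is based on part of the data used in Matsunami et al., 『海塩輸送シミュレーションと気象情報を用いた機械学習に基づく大気腐食量評価モデル開発と高精細腐食環境地図の作成』 [1]. Note, however, that this dataset was prepared for Python lectures and differs from the dataset used in the paper.\n", "\n", "[1] 松波 成行, 柳生 進二郎, 篠原 正, 片山 英樹, 須藤 仁, 服部 康男, 平口 博丸, \"海塩輸送シミュレーションと気象情報を用いた機械学習に基づく大気腐食量評価モデル開発と高精細腐食環境地図の作成\", 土木学会論文集A1(構造・地震工学), Vol. 75, pp. 141-160 (2019). https://www.jstage.jst.go.jp/article/jscejseee/75/2/75_141/_article/-char/ja/\n", "\n", "\n", "### Corrosion Measurement\n", "1. **Corrosion**: Corrosion rate (g/m²/y)\n", "\n", "### Meteorological Variables\n", "The meteorological data are based on Japan Meteorological Agency observations and comprise the following 16 variables, which are used as explanatory variables.\n", "\n", "1. **AT**: Mean temperature (°C)  \n", "2. **HAT**: Mean daily maximum temperature (°C)  \n", "3. **LAT**: Mean daily minimum temperature (°C)  \n", "4. **HT**: Maximum temperature (°C)  \n", "5. **LT**: Minimum temperature (°C)  \n", "6. **Rain**: Total precipitation (mm)  \n", "7. **Mrain**: Maximum daily precipitation (mm)  \n", "8. **Sun**: Sunshine duration (hours)  \n", "9. **Msnow**: Maximum snow depth (cm)  \n", "10. **Snow**: Total snowfall (cm)  \n", "11. **AW**: Mean wind speed (m/s)  \n", "12. **MW**: Maximum wind speed (m/s)  \n", "13. **PMW**: Maximum instantaneous wind speed (m/s)  \n", "14. **Vap**: Mean vapor pressure (hPa)  \n", "15. **Hum**: Mean humidity (%)  \n", "16. 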
**Lhum**: Minimum relative humidity (%)\n", "---\n" ] }, { "cell_type": "markdown", "id": "17d62284", "metadata": {}, "source": [ "### Connecting to the Course Materials\n", "Run the following cell only when working online in Google Colab. (It is not needed if you are not using Google Colab.)" ] }, { "cell_type": "code", "execution_count": null, "id": "84bf1a6d", "metadata": {}, "outputs": [], "source": [ "!git clone https://github.com/ARIM-Academy/Advanced_Tutorial_1.git\n", "%cd Advanced_Tutorial_1" ] }, { "cell_type": "markdown", "id": "e0d32787", "metadata": {}, "source": [ "# 1. Loading and Preprocessing the Dataset" ] }, { "cell_type": "markdown", "id": "a0a7516d", "metadata": {}, "source": [ "### Importing Libraries\n", "Import the Python libraries used in this exercise. The scikit-learn modules used for machine learning are imported separately later on." ] }, { "cell_type": "code", "execution_count": 1, "id": "878e2fd9", "metadata": {}, "outputs": [], "source": [ "# Libraries\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "# Suppress warning messages to keep the notebook output readable\n", "import warnings\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "markdown", "id": "7e2bdbea", "metadata": {}, "source": [ "\n", "\n", "### Loading the Sample File\n", "\n", "The `read_csv()` function in the `pandas` library reads a CSV file and converts it into a `pandas` `DataFrame`. In this section, the `corrosin_data.csv` file stored in the [data] folder is read as a `DataFrame`, and the result is stored in a variable named `df`." ] }, { "cell_type": "code", "execution_count": 3, "id": "48ba15f9", "metadata": {}, "outputs": [], "source": [ "# Load the dataset, using the first column (Place) as the index\n", "df = pd.read_csv('data/corrosin_data.csv', index_col=0)" ] }, { "cell_type": "code", "execution_count": 4, "id": "90b74b4e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| Place | Month | AT | HAT | LAT | HT | LT | Rain | Mrain | Sun | Msnow | Snow | AW | MW | PMW | Vap | Hum | Lhum | Corrosion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Naha | 1 | 28.3 | 26.0 | 31.1 | 33.6 | 23.4 | 178.0 | 79.0 | 206.5 | 0 | 0 | 6.0 | 14.5 | 20.9 | 28.5 | 74 | 43 | 674 |
| Naha | 2 | 25.3 | 23.5 | 27.7 | 30.8 | 19.9 | 200.0 | 118.0 | 129.7 | 0 | 0 | 7.3 | 22.0 | 33.6 | 23.4 | 72 | 40 | 2606 |
| Naha | 3 | 21.3 | 19.1 | 23.8 | 28.5 | 13.9 | 121.0 | 79.0 | 120.0 | 0 | 0 | 5.0 | 12.7 | 20.6 | 17.4 | 66 | 34 | 546 |
| Naha | 4 | 17.3 | 15.0 | 19.6 | 23.6 | 10.3 | 130.0 | 52.0 | 89.4 | 0 | 0 | 5.4 | 14.5 | 21.5 | 12.7 | 64 | 37 | 704 |
| Naha | 5 | 16.8 | 14.1 | 19.8 | 23.5 | 10.6 | 66.0 | 36.5 | 145.4 | 0 | 0 | 5.0 | 13.1 | 20.2 | 11.8 | 61 | 32 | 620 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Choshi | 8 | 14.3 | 10.8 | 17.6 | 21.3 | 3.5 | 70.5 | 13.5 | 232.2 | 0 | 0 | 5.4 | 15.6 | 19.2 | 11.0 | 67 | 22 | 567 |
| Choshi | 9 | 18.2 | 15.2 | 21.6 | 26.1 | 12.2 | 151.0 | 56.5 | 257.0 | 0 | 0 | 5.7 | 21.1 | 26.8 | 16.6 | 78 | 30 | 541 |
| Choshi | 10 | 21.1 | 18.9 | 24.0 | 27.3 | 17.1 | 177.5 | 56.0 | 172.3 | 0 | 0 | 4.5 | 14.3 | 19.4 | 22.3 | 89 | 46 | 830 |
| Choshi | 11 | 24.0 | 21.6 | 27.1 | 30.9 | 20.0 | 77.5 | 23.0 | 214.9 | 0 | 0 | 4.5 | 12.8 | 18.6 | 26.5 | 89 | 53 | 676 |
| Choshi | 12 | 24.9 | 22.6 | 28.2 | 33.1 | 18.9 | 113.5 | 36.0 | 212.7 | 0 | 0 | 6.0 | 14.1 | 20.3 | 28.2 | 90 | 55 | 745 |

71 rows × 18 columns
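
The table above shows `Place` as the row index rather than as a data column, which is the effect of `index_col=0` in the `read_csv()` call. A minimal sketch of that behavior follows. It reads a small in-memory CSV built from two rows and a few columns of the table above instead of the actual file; the names `csv_text` and `sample` are illustrative assumptions.

```python
import io

import pandas as pd

# Illustrative in-memory CSV: a few columns from two rows of the table above.
csv_text = """Place,Month,AT,Rain,Corrosion
Naha,1,28.3,178.0,674
Naha,2,25.3,200.0,2606
"""

# index_col=0 turns the first column (Place) into the row index
# instead of keeping it as an ordinary data column.
sample = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(sample.index.name)     # name of the index column
print(list(sample.columns))  # remaining data columns
print(sample.shape)          # (rows, columns), index not counted
```

Because the index is not counted as a column, the reported shape covers only the data columns, which is why the full table reports 18 columns even though 19 are displayed.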
| Place | AT | HAT | LAT | HT | LT | Rain | Mrain | Sun | Msnow | Snow | AW | MW | PMW | Vap | Hum | Lhum | Corrosion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Naha | 28.3 | 26.0 | 31.1 | 33.6 | 23.4 | 178.0 | 79.0 | 206.5 | 0 | 0 | 6.0 | 14.5 | 20.9 | 28.5 | 74 | 43 | 674 |
| Naha | 25.3 | 23.5 | 27.7 | 30.8 | 19.9 | 200.0 | 118.0 | 129.7 | 0 | 0 | 7.3 | 22.0 | 33.6 | 23.4 | 72 | 40 | 2606 |
| Naha | 21.3 | 19.1 | 23.8 | 28.5 | 13.9 | 121.0 | 79.0 | 120.0 | 0 | 0 | 5.0 | 12.7 | 20.6 | 17.4 | 66 | 34 | 546 |
| Naha | 17.3 | 15.0 | 19.6 | 23.6 | 10.3 | 130.0 | 52.0 | 89.4 | 0 | 0 | 5.4 | 14.5 | 21.5 | 12.7 | 64 | 37 | 704 |
| Naha | 16.8 | 14.1 | 19.8 | 23.5 | 10.6 | 66.0 | 36.5 | 145.4 | 0 | 0 | 5.0 | 13.1 | 20.2 | 11.8 | 61 | 32 | 620 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Choshi | 14.3 | 10.8 | 17.6 | 21.3 | 3.5 | 70.5 | 13.5 | 232.2 | 0 | 0 | 5.4 | 15.6 | 19.2 | 11.0 | 67 | 22 | 567 |
| Choshi | 18.2 | 15.2 | 21.6 | 26.1 | 12.2 | 151.0 | 56.5 | 257.0 | 0 | 0 | 5.7 | 21.1 | 26.8 | 16.6 | 78 | 30 | 541 |
| Choshi | 21.1 | 18.9 | 24.0 | 27.3 | 17.1 | 177.5 | 56.0 | 172.3 | 0 | 0 | 4.5 | 14.3 | 19.4 | 22.3 | 89 | 46 | 830 |
| Choshi | 24.0 | 21.6 | 27.1 | 30.9 | 20.0 | 77.5 | 23.0 | 214.9 | 0 | 0 | 4.5 | 12.8 | 18.6 | 26.5 | 89 | 53 | 676 |
| Choshi | 24.9 | 22.6 | 28.2 | 33.1 | 18.9 | 113.5 | 36.0 | 212.7 | 0 | 0 | 6.0 | 14.1 | 20.3 | 28.2 | 90 | 55 | 745 |

71 rows × 17 columns
| Place | AT | HAT | LAT | HT | LT | Rain | Mrain | Sun | Msnow | Snow | AW | MW | PMW | Vap | Hum | Lhum |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Naha | 28.3 | 26.0 | 31.1 | 33.6 | 23.4 | 178.0 | 79.0 | 206.5 | 0 | 0 | 6.0 | 14.5 | 20.9 | 28.5 | 74 | 43 |
| Naha | 25.3 | 23.5 | 27.7 | 30.8 | 19.9 | 200.0 | 118.0 | 129.7 | 0 | 0 | 7.3 | 22.0 | 33.6 | 23.4 | 72 | 40 |
| Naha | 21.3 | 19.1 | 23.8 | 28.5 | 13.9 | 121.0 | 79.0 | 120.0 | 0 | 0 | 5.0 | 12.7 | 20.6 | 17.4 | 66 | 34 |
| Naha | 17.3 | 15.0 | 19.6 | 23.6 | 10.3 | 130.0 | 52.0 | 89.4 | 0 | 0 | 5.4 | 14.5 | 21.5 | 12.7 | 64 | 37 |
| Naha | 16.8 | 14.1 | 19.8 | 23.5 | 10.6 | 66.0 | 36.5 | 145.4 | 0 | 0 | 5.0 | 13.1 | 20.2 | 11.8 | 61 | 32 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Choshi | 14.3 | 10.8 | 17.6 | 21.3 | 3.5 | 70.5 | 13.5 | 232.2 | 0 | 0 | 5.4 | 15.6 | 19.2 | 11.0 | 67 | 22 |
| Choshi | 18.2 | 15.2 | 21.6 | 26.1 | 12.2 | 151.0 | 56.5 | 257.0 | 0 | 0 | 5.7 | 21.1 | 26.8 | 16.6 | 78 | 30 |
| Choshi | 21.1 | 18.9 | 24.0 | 27.3 | 17.1 | 177.5 | 56.0 | 172.3 | 0 | 0 | 4.5 | 14.3 | 19.4 | 22.3 | 89 | 46 |
| Choshi | 24.0 | 21.6 | 27.1 | 30.9 | 20.0 | 77.5 | 23.0 | 214.9 | 0 | 0 | 4.5 | 12.8 | 18.6 | 26.5 | 89 | 53 |
| Choshi | 24.9 | 22.6 | 28.2 | 33.1 | 18.9 | 113.5 | 36.0 | 212.7 | 0 | 0 | 6.0 | 14.1 | 20.3 | 28.2 | 90 | 55 |

71 rows × 16 columns
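
The tables shown so far differ only in their columns: after the full table, `Month` is gone, and in the last one `Corrosion` is gone as well, leaving the 16 meteorological variables. One way to produce such a split is `DataFrame.drop`. The sketch below assumes that approach, since the cells that performed these steps are not reproduced here, and uses a reduced frame with values taken from the first rows of the table. The names `small`, `X`, and `y` are illustrative assumptions.

```python
import pandas as pd

# Reduced, illustrative frame: three rows and a few columns from the table.
small = pd.DataFrame(
    {
        "Month": [1, 2, 3],
        "AT": [28.3, 25.3, 21.3],
        "Rain": [178.0, 200.0, 121.0],
        "Corrosion": [674, 2606, 546],
    },
    index=pd.Index(["Naha", "Naha", "Naha"], name="Place"),
)

# Drop the month counter, which is not one of the meteorological variables.
small = small.drop(columns=["Month"])

# Separate the explanatory variables (X) from the target variable (y).
X = small.drop(columns=["Corrosion"])
y = small["Corrosion"]

print(list(X.columns))
print(y.tolist())
```

`drop` returns a new `DataFrame` by default, so `small` still contains `Corrosion` after `X` is created.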
LinearRegression()
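
The `LinearRegression()` line above is the representation scikit-learn prints for the estimator object, for example when `fit()` returns it at the end of a cell. A minimal sketch of fitting such a model and scoring it with R², the metric named in the goals of this tutorial, is shown below. The synthetic data, the variable names, and the choice to score on the training data are assumptions for illustration, not the notebook's actual procedure.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the feature matrix: 71 samples, 16 features,
# matching the shape of the last table above (the values are random).
X_demo = rng.normal(size=(71, 16))
true_coef = rng.normal(size=16)

# Linear target with an intercept of 5.0 and a small amount of noise.
y_demo = X_demo @ true_coef + 5.0 + rng.normal(scale=0.1, size=71)

# Fit ordinary least squares; fit() returns the estimator itself.
model = LinearRegression()
model.fit(X_demo, y_demo)

# R^2 compares the squared prediction error with the variance of the target.
y_pred = model.predict(X_demo)
r2 = r2_score(y_demo, y_pred)
print(f"R^2 on the training data: {r2:.3f}")
```

Scoring on the training data is optimistic, so a held-out test set gives a more honest estimate of predictive accuracy.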