{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# [Learning Objectives]\n", "- The code below demonstrates how to draw a scatter plot of a specific feature against the target value, so the relationship between the feature and the target is easier to see\n", "- Before plotting, inspect the data, exclude outliers, and convert values to suitable units to aid observation\n", "- A good plot helps you understand the data faster; plotting well is also an art" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# [Key Points of the Example]\n", "- Inspecting the data by listing it directly (In[3], Out[3])\n", "- How to adjust data that contains anomalous values (In[4])\n", "- Scatter plot anomalies and how to adjust for them (Out[5], In[6], Out[6])" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Load the required packages\n", "import os\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "\n", "# Set data_path\n", "dir_data = './data/'" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Path of read in data: ./data/application_train.csv\n" ] }, { "data": { "text/html": [ "
| | SK_ID_CURR | TARGET | NAME_CONTRACT_TYPE | CODE_GENDER | FLAG_OWN_CAR | FLAG_OWN_REALTY | CNT_CHILDREN | AMT_INCOME_TOTAL | AMT_CREDIT | AMT_ANNUITY | ... | FLAG_DOCUMENT_18 | FLAG_DOCUMENT_19 | FLAG_DOCUMENT_20 | FLAG_DOCUMENT_21 | AMT_REQ_CREDIT_BUREAU_HOUR | AMT_REQ_CREDIT_BUREAU_DAY | AMT_REQ_CREDIT_BUREAU_WEEK | AMT_REQ_CREDIT_BUREAU_MON | AMT_REQ_CREDIT_BUREAU_QRT | AMT_REQ_CREDIT_BUREAU_YEAR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100002 | 1 | Cash loans | M | N | Y | 0 | 202500.0 | 406597.5 | 24700.5 | ... | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 1 | 100003 | 0 | Cash loans | F | N | N | 0 | 270000.0 | 1293502.5 | 35698.5 | ... | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 100004 | 0 | Revolving loans | M | Y | Y | 0 | 67500.0 | 135000.0 | 6750.0 | ... | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 100006 | 0 | Cash loans | F | N | Y | 0 | 135000.0 | 312682.5 | 29686.5 | ... | 0 | 0 | 0 | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 100007 | 0 | Cash loans | M | N | Y | 0 | 121500.0 | 513000.0 | 21865.5 | ... | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
5 rows × 122 columns
| | SK_ID_CURR | TARGET | NAME_CONTRACT_TYPE | CODE_GENDER | FLAG_OWN_CAR | FLAG_OWN_REALTY | CNT_CHILDREN | AMT_INCOME_TOTAL | AMT_CREDIT | AMT_ANNUITY | ... | FLAG_DOCUMENT_18 | FLAG_DOCUMENT_19 | FLAG_DOCUMENT_20 | FLAG_DOCUMENT_21 | AMT_REQ_CREDIT_BUREAU_HOUR | AMT_REQ_CREDIT_BUREAU_DAY | AMT_REQ_CREDIT_BUREAU_WEEK | AMT_REQ_CREDIT_BUREAU_MON | AMT_REQ_CREDIT_BUREAU_QRT | AMT_REQ_CREDIT_BUREAU_YEAR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100002 | 1 | 0 | M | 0 | 1 | 0 | 202500.0 | 406597.5 | 24700.5 | ... | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 1 | 100003 | 0 | 0 | F | 0 | 0 | 0 | 270000.0 | 1293502.5 | 35698.5 | ... | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 100004 | 0 | 1 | M | 1 | 1 | 0 | 67500.0 | 135000.0 | 6750.0 | ... | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 100006 | 0 | 0 | F | 0 | 1 | 0 | 135000.0 | 312682.5 | 29686.5 | ... | 0 | 0 | 0 | 0 | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 100007 | 0 | 0 | M | 0 | 1 | 0 | 121500.0 | 513000.0 | 21865.5 | ... | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
5 rows × 122 columns
\n", "