{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 資料視覺化 Data visualization"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- 資料視覺化(Data Visualization),指的是利用圖形化工具從資料庫、數據檔中萃取有用的資料,經過轉化使其成為易於閱讀、理解的資訊。日常生活中,我們時常運用到資料視覺化工具。\n",
"- 做資料分析時,必須記得以\"從資料視覺化開始,從資料視覺化結束\",拿到資料的一開始,為了使我們更理解資料間的關係,我們透過資料視覺化來進行輔助,以提供往後分析時的依據,而在做完資料分析後,必須為結果提供呈現,此時的資料視覺化幫助我們讓分析的結果具有說服力,加深結果的理解。\n",
"- 繪圖製表時,必須想到這張圖表所能提供的資訊量,以及能否使觀看者能夠容易理解。若所繪的圖表能夠傳達的資訊太少,就不需要繪圖的輔助,另一方面,如果繪圖的結果太複雜、牽涉維度太廣,亦會造成觀看者混淆,造成反效果。\n",
"- 本篇主要會介紹在 Python 中主要使用的兩個資料視覺化的函式庫 Matplotlib 以及 Seaborn 兩種工具,並各針對兩者的使用進行簡單介紹"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Matplotlib 官方文件: https://matplotlib.org/
\n",
"Seaborn 官方文件: https://seaborn.pydata.org/
\n",
"以上兩個是在進行資料視覺化時,常使用的兩個套件,Matplotlib的自由度高,Seaborn呈現方式多元成熟,兩者能夠互相搭配使用"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"-------------------------------------------------------------------------------------------------------------------------------"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 一、Matplotlib"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Matplotlib 同時也是經典的 Python 視覺化繪圖庫, Matplotlib 就是 MATLAB + Plot + Library 的簡稱,因為是模仿 MATLAB 建立的繪圖庫,所以繪圖風格會與 MATLAB 有點類似。\n",
"- 能處理幾乎所有二維以及三維的資料視覺化圖形\n",
"- 自定義程度高,能夠自由調整各類參數決定圖形的呈現、標籤"
]
},
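{
"cell_type": "markdown",
"metadata": {},
"source": [
"The customizability described above can be illustrated with a minimal sketch. The data is synthetic (a sine curve, not the steel dataset), and the title, labels, and styling values are arbitrary choices made only for illustration.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Synthetic data for illustration only (not taken from the steel dataset)\n",
"x = np.linspace(0, 2 * np.pi, 100)\n",
"y = np.sin(x)\n",
"\n",
"# Create one figure with a single set of axes\n",
"fig, ax = plt.subplots(figsize=(6, 4))\n",
"\n",
"# Line color, style, and width are all adjustable parameters\n",
"ax.plot(x, y, color='tab:blue', linestyle='--', linewidth=2, label='sin(x)')\n",
"\n",
"# Title, axis labels, legend, and grid are set through separate calls\n",
"ax.set_title('Customizing a basic line plot')\n",
"ax.set_xlabel('x (radians)')\n",
"ax.set_ylabel('sin(x)')\n",
"ax.legend(loc='upper right')\n",
"ax.grid(True, alpha=0.3)\n",
"\n",
"plt.show()\n",
"```\n",
"\n",
"Each visual element (line style, title, axis labels, legend, grid) is set through its own call, which is what makes fine-grained adjustment possible."
]
},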
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 1. 載入套件"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import csv\n",
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 2. 載入資料\n",
"這是份包含不同類別鋼鐵的資料,包含長度、亮度、面積等資訊
\n",
"鋼鐵的類別為: Pastry, Z_Scratch, K_Scatch, Stains, Dirtiness, Bumps, Other_Faults等,我們希望了解各種鋼鐵類別間,是否有因為不同的屬性差異而造成不同的分類結果,或者是屬性間的相關性,因此可以透過資料視覺化來先進行初步的了解"
]
},
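{
"cell_type": "markdown",
"metadata": {},
"source": [
"For classes like those listed above, a single label column is often more convenient for grouping and coloring plots than one indicator column per class. The sketch below assumes the labels are stored as 0/1 indicator columns (an assumption about the file layout) and uses a small hypothetical `toy` DataFrame rather than the real file. The names `fault_cols` and `toy` are made up for this example.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Class names as listed in the text (K_Scatch is spelled as in the data)\n",
"fault_cols = ['Pastry', 'Z_Scratch', 'K_Scatch', 'Stains',\n",
"              'Dirtiness', 'Bumps', 'Other_Faults']\n",
"\n",
"# Hypothetical toy rows: exactly one indicator is 1 in each row\n",
"toy = pd.DataFrame(\n",
"    [[1, 0, 0, 0, 0, 0, 0],\n",
"     [0, 0, 0, 1, 0, 0, 0],\n",
"     [0, 0, 0, 0, 0, 0, 1]],\n",
"    columns=fault_cols,\n",
")\n",
"\n",
"# idxmax(axis=1) returns, per row, the name of the column holding the maximum,\n",
"# which is the column whose indicator equals 1\n",
"toy['class'] = toy[fault_cols].idxmax(axis=1)\n",
"\n",
"print(toy['class'].tolist())\n",
"```\n",
"\n",
"This only works cleanly when every row has exactly one indicator set. Rows with none or several set would need separate handling."
]
},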
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"
\n", " | X_Minimum | \n", "X_Maximum | \n", "Y_Minimum | \n", "Y_Maximum | \n", "Pixels_Areas | \n", "X_Perimeter | \n", "Y_Perimeter | \n", "Sum_of_Luminosity | \n", "Minimum_of_Luminosity | \n", "Maximum_of_Luminosity | \n", "... | \n", "Orientation_Index | \n", "Luminosity_Index | \n", "SigmoidOfAreas | \n", "Pastry | \n", "Z_Scratch | \n", "K_Scatch | \n", "Stains | \n", "Dirtiness | \n", "Bumps | \n", "Other_Faults | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "42 | \n", "50 | \n", "270900 | \n", "270944 | \n", "267 | \n", "17 | \n", "44 | \n", "24220 | \n", "76 | \n", "108 | \n", "... | \n", "0.8182 | \n", "-0.2913 | \n", "0.5822 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
1 | \n", "645 | \n", "651 | \n", "2538079 | \n", "2538108 | \n", "108 | \n", "10 | \n", "30 | \n", "11397 | \n", "84 | \n", "123 | \n", "... | \n", "0.7931 | \n", "-0.1756 | \n", "0.2984 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
2 | \n", "829 | \n", "835 | \n", "1553913 | \n", "1553931 | \n", "71 | \n", "8 | \n", "19 | \n", "7972 | \n", "99 | \n", "125 | \n", "... | \n", "0.6667 | \n", "-0.1228 | \n", "0.2150 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
3 | \n", "853 | \n", "860 | \n", "369370 | \n", "369415 | \n", "176 | \n", "13 | \n", "45 | \n", "18996 | \n", "99 | \n", "126 | \n", "... | \n", "0.8444 | \n", "-0.1568 | \n", "0.5212 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
4 | \n", "1289 | \n", "1306 | \n", "498078 | \n", "498335 | \n", "2409 | \n", "60 | \n", "260 | \n", "246930 | \n", "37 | \n", "126 | \n", "... | \n", "0.9338 | \n", "-0.1992 | \n", "1.0000 | \n", "1 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "0 | \n", "
5 rows × 34 columns
\n", "\n", " | X_Minimum | \n", "X_Maximum | \n", "Y_Minimum | \n", "Y_Maximum | \n", "Pixels_Areas | \n", "X_Perimeter | \n", "Y_Perimeter | \n", "Sum_of_Luminosity | \n", "Minimum_of_Luminosity | \n", "Maximum_of_Luminosity | \n", "... | \n", "Edges_X_Index | \n", "Edges_Y_Index | \n", "Outside_Global_Index | \n", "LogOfAreas | \n", "Log_X_Index | \n", "Log_Y_Index | \n", "Orientation_Index | \n", "Luminosity_Index | \n", "SigmoidOfAreas | \n", "class | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "42 | \n", "50 | \n", "270900 | \n", "270944 | \n", "267 | \n", "17 | \n", "44 | \n", "24220 | \n", "76 | \n", "108 | \n", "... | \n", "0.4706 | \n", "1.0000 | \n", "1.0 | \n", "2.4265 | \n", "0.9031 | \n", "1.6435 | \n", "0.8182 | \n", "-0.2913 | \n", "0.5822 | \n", "Pastry | \n", "
1 | \n", "645 | \n", "651 | \n", "2538079 | \n", "2538108 | \n", "108 | \n", "10 | \n", "30 | \n", "11397 | \n", "84 | \n", "123 | \n", "... | \n", "0.6000 | \n", "0.9667 | \n", "1.0 | \n", "2.0334 | \n", "0.7782 | \n", "1.4624 | \n", "0.7931 | \n", "-0.1756 | \n", "0.2984 | \n", "Pastry | \n", "
2 | \n", "829 | \n", "835 | \n", "1553913 | \n", "1553931 | \n", "71 | \n", "8 | \n", "19 | \n", "7972 | \n", "99 | \n", "125 | \n", "... | \n", "0.7500 | \n", "0.9474 | \n", "1.0 | \n", "1.8513 | \n", "0.7782 | \n", "1.2553 | \n", "0.6667 | \n", "-0.1228 | \n", "0.2150 | \n", "Pastry | \n", "
3 | \n", "853 | \n", "860 | \n", "369370 | \n", "369415 | \n", "176 | \n", "13 | \n", "45 | \n", "18996 | \n", "99 | \n", "126 | \n", "... | \n", "0.5385 | \n", "1.0000 | \n", "1.0 | \n", "2.2455 | \n", "0.8451 | \n", "1.6532 | \n", "0.8444 | \n", "-0.1568 | \n", "0.5212 | \n", "Pastry | \n", "
4 | \n", "1289 | \n", "1306 | \n", "498078 | \n", "498335 | \n", "2409 | \n", "60 | \n", "260 | \n", "246930 | \n", "37 | \n", "126 | \n", "... | \n", "0.2833 | \n", "0.9885 | \n", "1.0 | \n", "3.3818 | \n", "1.2305 | \n", "2.4099 | \n", "0.9338 | \n", "-0.1992 | \n", "1.0000 | \n", "Pastry | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
1936 | \n", "249 | \n", "277 | \n", "325780 | \n", "325796 | \n", "273 | \n", "54 | \n", "22 | \n", "35033 | \n", "119 | \n", "141 | \n", "... | \n", "0.5185 | \n", "0.7273 | \n", "0.0 | \n", "2.4362 | \n", "1.4472 | \n", "1.2041 | \n", "-0.4286 | \n", "0.0026 | \n", "0.7254 | \n", "Other_Faults | \n", "
1937 | \n", "144 | \n", "175 | \n", "340581 | \n", "340598 | \n", "287 | \n", "44 | \n", "24 | \n", "34599 | \n", "112 | \n", "133 | \n", "... | \n", "0.7046 | \n", "0.7083 | \n", "0.0 | \n", "2.4579 | \n", "1.4914 | \n", "1.2305 | \n", "-0.4516 | \n", "-0.0582 | \n", "0.8173 | \n", "Other_Faults | \n", "
1938 | \n", "145 | \n", "174 | \n", "386779 | \n", "386794 | \n", "292 | \n", "40 | \n", "22 | \n", "37572 | \n", "120 | \n", "140 | \n", "... | \n", "0.7250 | \n", "0.6818 | \n", "0.0 | \n", "2.4654 | \n", "1.4624 | \n", "1.1761 | \n", "-0.4828 | \n", "0.0052 | \n", "0.7079 | \n", "Other_Faults | \n", "
1939 | \n", "137 | \n", "170 | \n", "422497 | \n", "422528 | \n", "419 | \n", "97 | \n", "47 | \n", "52715 | \n", "117 | \n", "140 | \n", "... | \n", "0.3402 | \n", "0.6596 | \n", "0.0 | \n", "2.6222 | \n", "1.5185 | \n", "1.4914 | \n", "-0.0606 | \n", "-0.0171 | \n", "0.9919 | \n", "Other_Faults | \n", "
1940 | \n", "1261 | \n", "1281 | \n", "87951 | \n", "87967 | \n", "103 | \n", "26 | \n", "22 | \n", "11682 | \n", "101 | \n", "133 | \n", "... | \n", "0.7692 | \n", "0.7273 | \n", "0.0 | \n", "2.0128 | \n", "1.3010 | \n", "1.2041 | \n", "-0.2000 | \n", "-0.1139 | \n", "0.5296 | \n", "Other_Faults | \n", "
1941 rows × 28 columns
\n", "