{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 如何使用預訓練的模型來分類照片中的物體" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "卷積神經網絡現在能夠在一些電腦視覺任務上勝過人類的肉眼,例如圖像分類。\n", "\n", "也就是說,給出一個物體的照片,我們可以讓電腦來回答這個照片是1000個特定\n", "種類物體中的哪一類這樣的問題。\n", "\n", "在本教程中,我們將使用幾個Keras己經內建好的預訓練模型來進行圖像分類, 其中包括了:\n", "* VGG16\n", "* VGG19\n", "* ResNet50\n", "* InceptionV3\n", "* InceptionResNetV2\n", "* Xception\n", "* MobileNet\n", "\n", "![imagenet](https://kaggle2.blob.core.windows.net/datasets-images/2798/4640/d21dc5df65889e105877476d81240e11/data-original.png)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "scrolled": true }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Using TensorFlow backend.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Platform: Windows-7-6.1.7601-SP1\n", "Tensorflow version: 1.4.0\n", "Keras version: 2.1.1\n" ] } ], "source": [ "# 這個Jupyter Notebook的環境\n", "import platform\n", "import tensorflow\n", "import keras\n", "print(\"Platform: {}\".format(platform.platform()))\n", "print(\"Tensorflow version: {}\".format(tensorflow.__version__))\n", "print(\"Keras version: {}\".format(keras.__version__))\n", "\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import matplotlib.image as mpimg\n", "import numpy as np\n", "from IPython.display import Image" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 開發一個簡單的照片分類器" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### VGG16" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.獲取範例圖像\n", "\n", "\n", "首先,我們需要一個我們可以進行分類的圖像。\n", "\n", "您可以在這裡從Google隨意檢索一些動物照片並下載到這個jupyter notebook所處的目錄。比如說:\n", "我下載了: https://www.elephantvoices.org/images/slider/evimg16tf.jpg\n", "\n", "![evimg16tf.jpg](https://www.elephantvoices.org/images/slider/evimg16tf.jpg)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2.加載VGG模型\n", "\n", "加載在Keras己經預訓練好的VGG-16的權重模型檔。" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true, "scrolled": true }, "outputs": [], "source": [ "from keras.applications.vgg16 import VGG16\n", "\n", "# 載入權重\n", "model_vgg16 = VGG16()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.加載和準備圖像\n", "\n", "接下來,我們可以將圖像加載進來,並轉換成預訓練網絡所要求的張量規格。\n", "\n", "Keras提供了一些工具來幫助完成這一步驟。\n", "\n", "首先,我們可以使用load_img()函數加載圖像,並將其大小調整為224×224像素所需的大小。" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true, "scrolled": true }, "outputs": [], "source": [ "from keras.preprocessing.image import load_img\n", "\n", "# 載入圖像檔\n", "img_file = 'evimg16tf.jpg'\n", "image = load_img(img_file, target_size=(224, 224)) # 因為VGG16的模型的輸入是224x224" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "接下來,我們可以將像素轉換為NumPy數組,以便我們可以在Keras中使用它。我們可以使用這個img_to_array()函數。" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "image.shape: (224, 224, 3)\n" ] } ], "source": [ "from keras.preprocessing.image import img_to_array\n", "\n", "# 將圖像資料轉為numpy array\n", "image = img_to_array(image) # RGB\n", "\n", "print(\"image.shape:\", image.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "VGG16網絡期望單色階(grey)或多色階圖像(rgb)來作為輸入;這意味著輸入數組需要是轉換成四個維度:\n", "\n", "(圖像批次量,圖像高,圖像寬, 圖像色階數) -> (batch_size, img_height, img_width, img_channels)\n", "\n", "我們只有一個樣本(一個圖像)。我們可以通過調用reshape()來重新調整數組的形狀,並添加額外的維度。" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "image.shape: (1, 224, 224, 3)\n" ] } ], "source": [ "# 調整張量的維度\n", "image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))\n", "\n", "print(\"image.shape:\", image.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "接下來,我們需要按照VGG在訓練ImageNet數據一樣的方法來對圖像進行前處理。具體來說,從論文裡談到:\n", "> The only preprocessing we do is subtracting the mean RGB value, computed on the training set, from each pixel.\n", ">\n", "> __Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014.__\n", "\n", "Keras提供了一個稱為preprocess_input()的函數來為VGG網絡準備新的圖像輸入。\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true, "scrolled": true }, "outputs": [], "source": [ "from keras.applications.vgg16 import preprocess_input\n", "\n", "# 準備VGG模型的圖像\n", "image = preprocess_input(image)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.做一個預測\n", "\n", "我們可以調用模型中的predict()函數來預測圖像屬於1000個已知對像類型的機率。" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true, "scrolled": true }, "outputs": [], "source": [ "# 預測所有產出類別的機率\n", "\n", "y_pred = model_vgg16.predict(image)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5.解釋預測\n", "\n", "Keras提供了一個函數來解釋稱為decode_predictions()的概率。\n", "\n", "它可以返回一個類別的列表和每一個類別機率,為了簡單起見, 我們只會秀出第一個機率最高的種類。\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "African_elephant (30.40%)\n" ] } ], "source": [ "from keras.applications.vgg16 import decode_predictions\n", "\n", "# 將機率轉換為類別標籤\n", "label = decode_predictions(y_pred)\n", "\n", "# 檢索最可能的結果,例如最高的概率\n", "label = label[0][0]\n", "\n", "# 打印預測結果\n", "print('%s (%.2f%%)' % (label[1], label[2]*100))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### VGG19" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "image.shape: (224, 224, 3)\n", "image.shape: (1, 224, 224, 3)\n", "tusker (53.61%)\n" ] } ], "source": [ "from keras.applications.vgg19 import VGG19\n", "from keras.preprocessing import image\n", "from keras.applications.vgg19 import preprocess_input\n", "from keras.applications.vgg19 import decode_predictions\n", "\n", "# 載入權重\n", "model_vgg19 = VGG19(weights='imagenet')\n", "# 載入圖像檔\n", "img_file = 'evimg16tf.jpg'\n", "image = load_img(img_file, target_size=(224, 224)) # 因為VGG19的模型的輸入是224x224\n", "\n", "# 將圖像資料轉為numpy array\n", "image = img_to_array(image) # RGB\n", "print(\"image.shape:\", image.shape)\n", "\n", "# 調整張量的維度\n", "image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))\n", "print(\"image.shape:\", image.shape)\n", "\n", "# 準備模型所需要的圖像前處理\n", "image = preprocess_input(image)\n", "\n", "# 預測所有產出類別的機率\n", "y_pred = model_vgg19.predict(image)\n", "\n", "# 將機率轉換為類別標籤\n", "label = decode_predictions(y_pred)\n", "\n", "# 檢索最可能的結果,例如最高的概率\n", "label = label[0][0]\n", "\n", "# 打印預測結果\n", "print('%s (%.2f%%)' % (label[1], label[2]*100))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### ResNet50" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "image.shape: (224, 224, 3)\n", "image.shape: (1, 224, 224, 3)\n", "African_elephant (47.35%)\n" ] } ], "source": [ "from keras.applications.resnet50 import ResNet50\n", "from keras.preprocessing import image\n", "from keras.applications.resnet50 import preprocess_input\n", "from keras.applications.resnet50 import decode_predictions\n", "\n", "# 載入權重\n", "model_resnet50 = ResNet50(weights='imagenet')\n", "\n", "# 載入圖像檔\n", "img_file = 'evimg16tf.jpg'\n", "\n", "# 因為ResNet的模型的輸入是224x224\n", "image = load_img(img_file, target_size=(224, 224)) \n", "\n", "# 將圖像資料轉為numpy array\n", "image = img_to_array(image) # RGB\n", "print(\"image.shape:\", image.shape)\n", "\n", "# 調整張量的維度\n", "image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))\n", "print(\"image.shape:\", image.shape)\n", "\n", "# 準備模型所需要的圖像前處理\n", "image = preprocess_input(image)\n", "\n", "# 預測所有產出類別的機率\n", "y_pred = model_resnet50.predict(image)\n", "\n", "# 將機率轉換為類別標籤\n", "label = decode_predictions(y_pred)\n", "\n", "# 檢索最可能的結果,例如最高的概率\n", "label = label[0][0]\n", "\n", "# 打印預測結果\n", "print('%s (%.2f%%)' % (label[1], label[2]*100))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### InceptionV3" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "image.shape: (299, 299, 3)\n", "image.shape: (1, 299, 299, 3)\n", "giant_panda (94.42%)\n" ] } ], "source": [ "from keras.applications.inception_v3 import InceptionV3\n", "from keras.preprocessing import image\n", "from keras.preprocessing.image import load_img\n", "from keras.applications.inception_v3 import preprocess_input\n", "from keras.applications.inception_v3 import decode_predictions\n", "\n", "# 載入權重\n", "model_inception_v3 = InceptionV3(weights='imagenet')\n", "\n", "# 載入圖像檔\n", "img_file = 'image.jpg'\n", "# InceptionV3的模型的輸入是299x299\n", "img = load_img(img_file, target_size=(299, 299)) \n", "\n", "# 將圖像資料轉為numpy array\n", "image = image.img_to_array(img) # RGB\n", "print(\"image.shape:\", image.shape)\n", "\n", "# 調整張量的維度\n", "image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))\n", "print(\"image.shape:\", image.shape)\n", "\n", "# 準備模型所需要的圖像前處理\n", "image = preprocess_input(image)\n", "\n", "# 預測所有產出類別的機率\n", "y_pred = model_inception_v3.predict(image)\n", "\n", "# 將機率轉換為類別標籤\n", "label = decode_predictions(y_pred)\n", "\n", "# 檢索最可能的結果,例如最高的概率\n", "label = label[0][0]\n", "\n", "# 打印預測結果\n", "print('%s (%.2f%%)' % (label[1], label[2]*100))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### InceptionResNetV2" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "image.shape: (299, 299, 3)\n", "image.shape: (1, 299, 299, 3)\n", "African_elephant (62.94%)\n" ] } ], "source": [ "from keras.applications.inception_resnet_v2 import InceptionResNetV2\n", "from keras.preprocessing import image\n", "from keras.applications.inception_resnet_v2 import preprocess_input\n", "from keras.applications.inception_resnet_v2 import decode_predictions\n", "\n", "# 載入權重\n", "model_inception_resnet_v2 = InceptionResNetV2(weights='imagenet')\n", "\n", "# 載入圖像檔\n", "img_file = 'evimg16tf.jpg'\n", "\n", "# InceptionResNetV2的模型的輸入是299x299\n", "image = load_img(img_file, target_size=(299, 299)) \n", "\n", "# 將圖像資料轉為numpy array\n", "image = img_to_array(image) # RGB\n", "print(\"image.shape:\", image.shape)\n", "\n", "# 調整張量的維度\n", "image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))\n", "print(\"image.shape:\", image.shape)\n", "\n", "# 準備模型所需要的圖像前處理\n", "image = preprocess_input(image)\n", "\n", "# 預測所有產出類別的機率\n", "y_pred = model_inception_resnet_v2.predict(image)\n", "\n", "# 將機率轉換為類別標籤\n", "label = decode_predictions(y_pred)\n", "\n", "# 檢索最可能的結果,例如最高的概率\n", "label = label[0][0]\n", "\n", "# 打印預測結果\n", "print('%s (%.2f%%)' % (label[1], label[2]*100))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### MobileNet" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "image.shape: (224, 224, 3)\n", "image.shape: (1, 224, 224, 3)\n", "African_elephant (90.81%)\n" ] } ], "source": [ "from keras.applications.mobilenet import MobileNet\n", "from keras.preprocessing import image\n", "from keras.applications.mobilenet import preprocess_input\n", "from keras.applications.mobilenet import decode_predictions\n", "\n", "# 載入權重\n", "model_mobilenet = MobileNet(weights='imagenet')\n", "\n", "# 載入圖像檔\n", "img_file = 'evimg16tf.jpg'\n", "\n", "# MobileNet的模型的輸入是224x224\n", "image = load_img(img_file, target_size=(224, 224)) \n", "\n", "# 將圖像資料轉為numpy array\n", "image = img_to_array(image) # RGB\n", "print(\"image.shape:\", image.shape)\n", "\n", "# 調整張量的維度\n", "image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))\n", "print(\"image.shape:\", image.shape)\n", "\n", "# 準備模型所需要的圖像前處理\n", "image = preprocess_input(image)\n", "\n", "# 預測所有產出類別的機率\n", "y_pred = model_mobilenet.predict(image)\n", "\n", "# 將機率轉換為類別標籤\n", "label = decode_predictions(y_pred)\n", "\n", "# 檢索最可能的結果,例如最高的概率\n", "label = label[0][0]\n", "\n", "# 打印預測結果\n", "print('%s (%.2f%%)' % (label[1], label[2]*100))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 總結 (Conclusion)\n", "\n", "在這篇文章中有一些個人學習到的一些有趣的重點:\n", " \n", "* 在Keras中己經預建了許多高級的圖像識別的網絡及預訓練的權重\n", "* 需要了解每種高級圖像識別的網絡的結構與輸入的張量\n", "* 了解不同高級圖像識別的網絡的訓練變數量與預訓練的權重可以有效幫助圖像識別類型的任務" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "參考: \n", "* [How to Use The Pre-Trained VGG Model to Classify Objects in Photographs](https://machinelearningmastery.com/use-pre-trained-vgg-model-classify-objects-photographs/)\n", "* [Keras官網](http://keras.io/)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "scrolled": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.3" } }, "nbformat": 4, "nbformat_minor": 2 }