{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 人臉辨識 - 人臉特徵擷取(FaceNet) & 訓練人臉分類器\n", "\n", "人臉辨識大致可分成四個主要的步驟:\n", "1. 人臉偵測\n", "2. 人臉轉換、對齊與裁剪\n", "3. 人臉特徵擷取\n", "4. 人臉特徵比對\n", "\n", "在[[7.2-face-detect-align-and-crop](https://github.com/erhwenkuo/deep-learning-with-keras-notebooks/blob/master/7.2-face-detect-align-and-crop.ipynb)]己經詳細介紹如何進行\"人臉偵測\"、\"對齊\" & \"裁剪\"了, 因此這篇文章會著重在:\n", "3. 人臉特徴擷取 (使用__FaceNet__的模型與演算法)\n", "4. 人臉特徵比對 (使用__LinearSVC__的分類演算法)\n", "\n", "對應到以下的圖例的話, 就是使用深度學習所學習到的一個網絡來將人臉的圖像轉換成一個\"人臉特徴向量(representation)\", 然後利用這個\"人臉特徴向量(representation)\"來進行人臉的辨識(recognition)、人臉的比對(verification)或是人臉的聚類(cluster)。\n", "\n", "![openface](https://raw.githubusercontent.com/cmusatyalab/openface/master/images/summary.jpg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 理論\n", "\n", "GOOGLE於2015年發表一個網絡結構名為FaceNet。它可以直接將人臉圖像映射至歐式空間,透過比對兩個人臉圖像映射(embedding)的歐幾里德距離能直接反應出人臉間的相似度。一旦生成該人臉圖像映射表示(embedding representation),識別,驗證,聚類等任務都可以用它來輕鬆完成。 FaceNet在LFW上達到了99.63%的準確率,在Youtube Faces DB達到95.12%。\n", "和大部份其它的演算法(先輸出高維度特徵向量,然後用PCA等降維,再用分類器分類)不同之處,FaceNet直接使用基於三元組的LMNN(最大邊界近鄰分類)的損失函數訓練神經網絡,網絡直接輸出為128維度的向量空間。\n", "\n", "![face-net](https://raw.githubusercontent.com/stdcoutzyx/Blogs/master/papers/imgs/n8-1.png)\n", "\n", "詳細: [FaceNet: A unified embedding for face recognition and clustering](https://arxiv.org/abs/1503.03832)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## face-recognition 專案說明\n", "\n", "[face-recognition](https://github.com/erhwenkuo/face-recognition)包含了使用MTCNN與FaceNet來進行人臉辨識。\n", "\n", "### 安裝\n", "\n", "```bash\n", "git clone https://github.com/erhwenkuo/face-recognition.git\n", "cd face-recognition\n", "...\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 資料集說明\n", "\n", "LFW資料集是一個常見的人臉資料集,歷史非常悠久。LFW資料集中收錄了5749位公眾人物的人臉影像,總共有超過一萬三千多張影像檔案。但大部份公眾人物的影像都只有一張,只有1680位有超過一張照片,而極少數有超過10張照片。\n", "\n", "資料集的網站: http://vis-www.cs.umass.edu/lfw" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 專案的檔案路徑佈局\n", "\n", "1. 使用Git從[erhwenkuo/face-recognition]https://github.com/erhwenkuo/face-recognition.git)下載整個專案源碼\n", "2. 在`face-recognition`的目錄裡產生二個子目錄`data`與`model`\n", "3. 從[Labeled Faces in the Wild資料集官網]點撃[All images as gzipped tar file](http://vis-www.cs.umass.edu/lfw/lfw.tgz)來下 載`lfw.tgz`。\n", "4. 解壓縮`lfw.tgz`到`face-recognition/data/`的目錄下\n", "5. 執行`01-face-detect-align-and-crop.ipynb`來進行臉部偵測、對齊 & 裁剪\n", "6. 下載Facenet模型檔[20170511-185253.zip(168M)](https://drive.google.com/file/d/0B5MzpY9kBtDVZ2RpVDYwWmxoSUk)並解壓縮到\"model/facenet\"的目錄下。\n", "7. 在\"model\"的目錄下產生一個子目錄\"svm\"來存放\"人臉分類器\"的模型。\n", "\n", "最後你的目錄結構看起來像這樣: (這裡只列出來在這個範例會用到的相關檔案與目錄)\n", "```\n", "face-recognition/\n", "├── 01-face-detect-align-and-crop.ipynb\n", "├── 02-face-embedding-and-recognition-classifier.ipynb\n", "├── detect_face.py\n", "├── facenet.py\n", "├── model/\n", "│ ├── svm/ <--- 人臉分類器(svm)的模型\n", "│ ├── mtcnn/\n", "│ │ ├── det1.npy\n", "│ │ ├── det2.npy\n", "│ │ └── det3.npy\n", "│ └── facenet/ <--- Facenet的模型\n", "│ └── 20170512-110547/\n", "│ ├── 20170512-110547.pb\n", "│ ├── model-20170512-110547.ckpt-250000.data-00000-of-00001\n", "│ ├── model-20170512-110547.ckpt-250000.index\n", "│ └── model-20170512-110547.meta\n", "└── data/\n", " ├── lfw/\n", " │ ├── Aaron_Eckhart/ \n", " │ │ └── Aaron_Eckhart_0001.jpg\n", " │ ├── ...\n", " │ └── Zydrunas_Ilgauskas/\n", " │ └── Zydrunas_Ilgauskas_0001.jpg\n", " └── lfw_crops/ <--- 經過偵測、對齊 & 裁剪後的人臉圖像\n", " ├── Aaron_Eckhart/ \n", " │ └── Aaron_Eckhart_0001.png\n", " ├── ...\n", " └── Zydrunas_Ilgauskas/\n", " └── Zydrunas_Ilgauskas_0001.png \n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### STEP 1. 載入相關函式庫" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 屏蔽Jupyter的warning訊息\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "\n", "# Utilities相關函式庫\n", "import sys\n", "import os\n", "from tqdm import tqdm\n", "import math\n", "\n", "# 多維向量處理相關函式庫\n", "import numpy as np\n", "\n", "# 圖像處理相關函式庫\n", "import cv2\n", "\n", "# 深度學習相關函式庫\n", "import tensorflow as tf\n", "\n", "# 專案相關函式庫\n", "import facenet\n", "import detect_face\n", "\n", "# 模型序列化函式庫\n", "import pickle\n", "\n", "# 人臉分類器函式庫\n", "from sklearn.svm import SVC\n", "from sklearn.svm import LinearSVC" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### STEP 2. 設定相關設定與參數" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 專案的根目錄路徑\n", "ROOT_DIR = os.getcwd()\n", "\n", "# 訓練/驗證用的資料目錄\n", "DATA_PATH = os.path.join(ROOT_DIR, \"data\")\n", "\n", "# 模型的資料目錄\n", "MODEL_PATH = os.path.join(ROOT_DIR, \"model\")\n", "\n", "# FaceNet的模型\n", "FACENET_MODEL_PATH = os.path.join(MODEL_PATH, \"facenet\",\"20170512-110547\",\"20170512-110547.pb\")\n", "\n", "# Classifier的模型\n", "SVM_MODEL_PATH = os.path.join(MODEL_PATH, \"svm\", \"lfw_svm_classifier.pkl\")\n", "\n", "# 訓練/驗證用的圖像資料目錄\n", "IMG_IN_PATH = os.path.join(DATA_PATH, \"lfw\")\n", "\n", "# 訓練/驗證用的圖像資料目錄\n", "IMG_OUT_PATH = os.path.join(DATA_PATH, \"lfw_crops\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### STEP 3. 轉換每張人臉的圖像成為Facenet的人臉特徵向量(128 bytes)表示\n", "\n", "函式: `facenet.get_dataset`\n", "```\n", "參數:\n", " paths (string): 圖像資料集的檔案路徑\n", " has_class_directories (bool): 是否使用子目錄名作為人臉的identity (預設為True)\n", " path_expanduser (bool): 是否把path中包含的\"~\"和\"~user\"轉換成在作業系統下的用戶根目錄 (預設為False)\n", "回傳:\n", " dataset (list[ImageClass]): 人臉類別(ImageClass)的列表與圖像路徑\n", "```" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Origin: Number of classes: 5750\n", "Origin: Number of images: 13233\n", "Filtered: Number of classes: 423\n", "Filtered: Number of images: 5985\n", "Loading feature extraction model\n", "Model filename: D:\\pythonworks\\01_erhwen\\real-time-deep-face-recognition\\model\\facenet\\20170512-110547\\20170512-110547.pb\n", "Face embedding size: 128\n", "Calculating features for images\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "100%|████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:27<00:00, 4.59s/it]\n" ] } ], "source": [ "# 使用Tensorflow的Facenet模型\n", "with tf.Graph().as_default():\n", " with tf.Session() as sess:\n", " datadir = IMG_OUT_PATH # 經過偵測、對齊 & 裁剪後的人臉圖像目錄\n", " # 取得人臉類別(ImageClass)的列表與圖像路徑\n", " dataset = facenet.get_dataset(datadir) \n", " # 原始: 取得每個人臉圖像的路徑與標籤\n", " paths, labels, labels_dict = facenet.get_image_paths_and_labels(dataset) \n", " print('Origin: Number of classes: %d' % len(labels_dict))\n", " print('Origin: Number of images: %d' % len(paths))\n", " \n", " # 由於lfw的人臉圖像集中有很多的人臉類別只有1張的圖像, 對於訓練來說樣本太少\n", " # 因此我們只挑選圖像樣本張數大於5張的人臉類別\n", " \n", " # 過濾: 取得每個人臉圖像的路徑與標籤 (>=5)\n", " paths, labels, labels_dict = facenet.get_image_paths_and_labels(dataset, enable_filter=True, filter_size=5) \n", " print('Filtered: Number of classes: %d' % len(labels_dict))\n", " print('Filtered: Number of images: %d' % len(paths))\n", " \n", " # 載入Facenet模型\n", " print('Loading feature extraction model')\n", " modeldir = FACENET_MODEL_PATH #'/..Path to Pre-trained model../20170512-110547/20170512-110547.pb'\n", " facenet.load_model(modeldir)\n", "\n", " images_placeholder = tf.get_default_graph().get_tensor_by_name(\"input:0\")\n", " embeddings = tf.get_default_graph().get_tensor_by_name(\"embeddings:0\")\n", " phase_train_placeholder = tf.get_default_graph().get_tensor_by_name(\"phase_train:0\")\n", " embedding_size = embeddings.get_shape()[1]\n", " # 打印\"人臉特徵向量\"的向量大小\n", " print(\"Face embedding size: \", embedding_size)\n", " \n", " # 計算人臉特徵向量 (128 bytes)\n", " print('Calculating features for images')\n", " batch_size = 1000 # 批次量\n", " image_size = 160 # 要做為Facenet的圖像輸入的大小\n", " \n", " nrof_images = len(paths) # 總共要處理的人臉圖像\n", " # 計算總共要跑的批次數\n", " nrof_batches_per_epoch = int(math.ceil(1.0 * nrof_images / batch_size))\n", " # 構建一個變數來保存\"人臉特徵向量\"\n", " emb_array = np.zeros((nrof_images, embedding_size)) # <-- Face Embedding\n", " \n", " for i in tqdm(range(nrof_batches_per_epoch)):\n", " start_index = i * batch_size\n", " end_index = min((i + 1) * batch_size, nrof_images)\n", " paths_batch = paths[start_index:end_index]\n", " images = facenet.load_data(paths_batch, False, False, image_size)\n", " feed_dict = {images_placeholder: images, phase_train_placeholder: False}\n", " emb_array[start_index:end_index, :] = sess.run(embeddings, feed_dict=feed_dict)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "經過過濾之後, 我們從lfw的人臉資料庫中選出**423**個人臉的類別(每個類別都至少有**5**張的圖像以上)。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### STEP 4. 保存人臉Facenet處理過的人臉embedding的資料\n", "\n", "為了能夠重覆地使用己經轉換過的人臉embedding的資料(一般來說可以把這樣的資料保存在資料庫中), 我們把這個資料透過pickle把相關資料保存到檔案中。" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 序列化相關可重覆使用的資料\n", "\n", "# 保存\"人臉embedding\"的資料\n", "emb_features_file = open(os.path.join(DATA_PATH,'lfw_emb_features.pkl'), 'wb')\n", "pickle.dump(emb_array, emb_features_file)\n", "emb_features_file.close()\n", "\n", "# 保存\"人臉embedding\"所對應的標籤(label)的資料\n", "emb_lables_file = open(os.path.join(DATA_PATH,'lfw_emb_labels.pkl'), 'wb')\n", "pickle.dump(labels, emb_lables_file)\n", "emb_lables_file.close()\n", "\n", "# 保存\"標籤(label)對應到人臉名稱的字典的資料\n", "emb_lables_dict_file = open(os.path.join(DATA_PATH,'lfw_emb_labels_dict.pkl'), 'wb')\n", "pickle.dump(labels_dict, emb_lables_dict_file)\n", "emb_lables_dict_file.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### STEP 5. 載入人臉Facenet處理過的相關的人臉embedding資料" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 反序列化相關可重覆使用的資料\n", "\n", "# \"人臉embedding\"的資料\n", "with open(os.path.join(DATA_PATH,'lfw_emb_features.pkl'), 'rb') as emb_features_file:\n", " emb_features =pickle.load(emb_features_file)\n", "\n", "# \"人臉embedding\"所對應的標籤(label)的資料\n", "with open(os.path.join(DATA_PATH,'lfw_emb_labels.pkl'), 'rb') as emb_lables_file:\n", " emb_labels =pickle.load(emb_lables_file)\n", "\n", "# \"標籤(label)對應到人臉名稱的字典的資料\n", "with open(os.path.join(DATA_PATH,'lfw_emb_labels_dict.pkl'), 'rb') as emb_lables_dict_file:\n", " emb_labels_dict =pickle.load(emb_lables_dict_file)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "人臉embedding featues: 5985, shape: (5985, 128), type: \n", "人臉embedding labels: 5985, type: \n", "人臉embedding labels dict: {}, type: {} 423 \n" ] } ], "source": [ "print(\"人臉embedding featues: {}, shape: {}, type: {}\".format(len(emb_features), emb_features.shape, type(emb_features)))\n", "print(\"人臉embedding labels: {}, type: {}\".format(len(emb_labels), type(emb_labels)))\n", "print(\"人臉embedding labels dict: {}, type: {}\", len(emb_labels_dict), type(emb_labels_dict))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### STEP 6. 準備訓練資料集與驗證資料集\n", "由於lfw的人臉資料集裡, 每一個人的人臉圖像並不多。因此我們將對每一個人的人臉圖像抽取一張來作為驗證資料集, 其餘的圖像則做為訓練資料集。" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "X_train: 5562, y_train: 5562\n", "X_test: 423, y_test: 423\n" ] } ], "source": [ "# 準備相關變數\n", "X_train = []; y_train = []\n", "X_test = []; y_test = []\n", "\n", "# 保存己經有處理過的人臉label\n", "processed = set()\n", "\n", "# 分割訓練資料集與驗證資料集\n", "for (emb_feature, emb_label) in zip(emb_features, emb_labels):\n", " if emb_label in processed:\n", " X_train.append(emb_feature)\n", " y_train.append(emb_label)\n", " else:\n", " X_test.append(emb_feature)\n", " y_test.append(emb_label)\n", " processed.add(emb_label)\n", "\n", "# 結果\n", "print('X_train: {}, y_train: {}'.format(len(X_train), len(y_train)))\n", "print('X_test: {}, y_test: {}'.format(len(X_test), len(y_test)))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### STEP 7. 訓練人臉分類器(SVM Classifier)\n", "\n", "使用scikit-learn的SVM分類器來進行訓練。\n", "\n", "在 \"https://github.com/davidsandberg/facenet/issues/134\" 的討論裡有詳算的參數說明與結果的分析!\n", "\n", "![detect-result](https://cloud.githubusercontent.com/assets/2711650/22487753/ec7e0ee8-e80e-11e6-8d69-9aebec5064d0.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 使用linearSvc來訓練" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training classifier\n", "Validation result: 0.978723404255\n" ] } ], "source": [ "# 訓練分類器\n", "print('Training classifier')\n", "linearsvc_classifier = LinearSVC(C=1, multi_class='ovr')\n", "\n", "# 進行訓練\n", "linearsvc_classifier.fit(X_train, y_train)\n", "\n", "# 使用驗證資料集來檢查準確率\n", "score = linearsvc_classifier.score(X_test, y_test)\n", "\n", "# 打印分類器的準確率\n", "print(\"Validation result: \", score)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Saved classifier model to file \"D:\\pythonworks\\01_erhwen\\real-time-deep-face-recognition\\model\\svm\\lfw_svm_classifier.pkl\"\n" ] } ], "source": [ "# 序列化\"人臉辨識模型\"到檔案\n", "classifier_filename = SVM_MODEL_PATH\n", "\n", "# 產生一個人臉的人名列表,以便辨識後來使用\n", "#class_names = [cls.name.replace('_', ' ') for cls in dataset]\n", "\n", "class_names = []\n", "for key in sorted(emb_labels_dict.keys()):\n", " class_names.append(emb_labels_dict[key].replace('_', ' '))\n", "\n", "# 保存人臉分類器到檔案系統\n", "with open(classifier_filename, 'wb') as outfile:\n", " pickle.dump((linearsvc_classifier, class_names), outfile)\n", " \n", "print('Saved classifier model to file \"%s\"' % classifier_filename)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "在簡單地使用Support Vector Machine (SVM)的模型, 在lfw的人臉資料庫裡頭我們選出來的__423__個不同人臉經過Facenet的人臉特徴抽取之後做多類別的分類學習。由以上的簡單驗證來看可以達到__97.87%__的正確人臉辨識率。" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "423" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(class_names)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.4" } }, "nbformat": 4, "nbformat_minor": 2 }