{
 "cells": [
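  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Face Recognition - Face Feature Extraction (FaceNet) & Training a Face Classifier\n",
    "\n",
    "Face recognition can be broadly divided into four main steps:\n",
    "1. Face detection\n",
    "2. Face transformation, alignment and cropping\n",
    "3. Face feature extraction\n",
    "4. Face feature comparison\n",
    "\n",
    "[[7.2-face-detect-align-and-crop](https://github.com/erhwenkuo/deep-learning-with-keras-notebooks/blob/master/7.2-face-detect-align-and-crop.ipynb)] already covers face detection, alignment and cropping in detail, so this article focuses on:\n",
    "3. Face feature extraction (using the __FaceNet__ model and algorithm)\n",
    "4. Face feature comparison (using the __LinearSVC__ classification algorithm)\n",
    "\n",
    "In terms of the diagram below, a network trained with deep learning converts a face image into a \"face feature vector (representation)\", and this representation is then used for face recognition, face verification, or face clustering.\n",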
    "\n",
    "![openface](https://raw.githubusercontent.com/cmusatyalab/openface/master/images/summary.jpg)"
   ]
  },
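  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Theory\n",
    "\n",
    "In 2015 Google published a network architecture named FaceNet. It maps face images directly into a Euclidean space, where the Euclidean distance between the embeddings of two face images directly reflects how similar the faces are. Once the embedding representation has been produced, tasks such as recognition, verification, and clustering can be carried out easily with it. FaceNet reached 99.63% accuracy on LFW and 95.12% on YouTube Faces DB.\n",
    "Unlike most other algorithms (which first output a high-dimensional feature vector, reduce its dimensionality with PCA or a similar method, and then classify it with a classifier), FaceNet trains the neural network directly with a triplet-based loss function derived from LMNN (large margin nearest neighbor), and the network directly outputs a 128-dimensional embedding vector.\n",
    "\n",
    "![face-net](https://raw.githubusercontent.com/stdcoutzyx/Blogs/master/papers/imgs/n8-1.png)\n",
    "\n",
    "Details: [FaceNet: A unified embedding for face recognition and clustering](https://arxiv.org/abs/1503.03832)"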
   ]
  },
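  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the idea concrete, here is a minimal NumPy sketch of the squared Euclidean distance between embeddings and of a triplet loss. It is not FaceNet's training code: the helper names (`l2_normalize`, `squared_distance`, `triplet_loss`), the random stand-in embeddings, and the margin `alpha=0.2` are assumptions chosen for illustration.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def l2_normalize(x, eps=1e-10):\n",
    "    # Scale each row to unit Euclidean length\n",
    "    norm = np.sqrt(np.maximum(np.sum(np.square(x), axis=-1, keepdims=True), eps))\n",
    "    return x / norm\n",
    "\n",
    "def squared_distance(a, b):\n",
    "    # Squared Euclidean distance between matching rows of a and b\n",
    "    return np.sum(np.square(a - b), axis=-1)\n",
    "\n",
    "def triplet_loss(anchor, positive, negative, alpha=0.2):\n",
    "    # Hinge on d(anchor, positive) - d(anchor, negative) + alpha, averaged over the batch\n",
    "    pos_dist = squared_distance(anchor, positive)\n",
    "    neg_dist = squared_distance(anchor, negative)\n",
    "    return np.mean(np.maximum(pos_dist - neg_dist + alpha, 0.0))\n",
    "\n",
    "# Random stand-ins for 128-dimensional embeddings (assumption, not real faces)\n",
    "rng = np.random.RandomState(0)\n",
    "anchor = l2_normalize(rng.randn(4, 128))\n",
    "positive = l2_normalize(anchor + 0.05 * rng.randn(4, 128))  # near the anchor\n",
    "negative = l2_normalize(rng.randn(4, 128))                  # unrelated vectors\n",
    "\n",
    "loss = triplet_loss(anchor, positive, negative)\n",
    "print(squared_distance(anchor, positive))\n",
    "print(squared_distance(anchor, negative))\n",
    "print(loss)\n",
    "```\n",
    "\n",
    "With these stand-in vectors the positive sits much closer to the anchor than the negative, so the hinge is inactive and the loss is zero; swapping the positive and the negative makes the loss positive."
   ]
  },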
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## About the face-recognition project\n",
    "\n",
    "[face-recognition](https://github.com/erhwenkuo/face-recognition) uses MTCNN and FaceNet to perform face recognition.\n",
    "\n",
    "### Installation\n",
    "\n",
    "```bash\n",
    "git clone https://github.com/erhwenkuo/face-recognition.git\n",
    "cd face-recognition\n",
    "...\n",
    "```"
   ]
  },
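  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### About the dataset\n",
    "\n",
    "LFW is a widely used, long-established face dataset. It contains face images of 5749 public figures, more than 13,000 image files in total. Most of these people have only a single image: only 1680 of them have more than one photo, and very few have more than 10.\n",
    "\n",
    "Dataset website: http://vis-www.cs.umass.edu/lfw"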
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Project file layout\n",
    "\n",
    "1. Use Git to download the full project source from [erhwenkuo/face-recognition](https://github.com/erhwenkuo/face-recognition.git)\n",
    "2. Create two subdirectories, `data` and `model`, inside the `face-recognition` directory\n",
    "3. On the [Labeled Faces in the Wild dataset website](http://vis-www.cs.umass.edu/lfw), click [All images as gzipped tar file](http://vis-www.cs.umass.edu/lfw/lfw.tgz) to download `lfw.tgz`.\n",
    "4. Extract `lfw.tgz` into the `face-recognition/data/` directory\n",
    "5. Run `01-face-detect-align-and-crop.ipynb` to perform face detection, alignment & cropping\n",
    "6. Download the FaceNet model file [20170512-110547.zip(168M)](https://drive.google.com/file/d/0B5MzpY9kBtDVZ2RpVDYwWmxoSUk) and extract it into the \"model/facenet\" directory.\n",
    "7. Create a subdirectory \"svm\" under \"model\" to store the face classifier model.\n",
    "\n",
    "In the end your directory structure should look like this (only the files and directories used in this example are listed):\n",
    "```\n",
    "face-recognition/\n",
    "├── 01-face-detect-align-and-crop.ipynb\n",
    "├── 02-face-embedding-and-recognition-classifier.ipynb\n",
    "├── detect_face.py\n",
    "├── facenet.py\n",
    "├── model/\n",
    "│   ├── svm/                                <--- face classifier (SVM) model\n",
    "│   ├── mtcnn/\n",
    "│   │   ├── det1.npy\n",
    "│   │   ├── det2.npy\n",
    "│   │   └── det3.npy\n",
    "│   └── facenet/                            <--- FaceNet model\n",
    "│       └── 20170512-110547/\n",
    "│          ├── 20170512-110547.pb\n",
    "│          ├── model-20170512-110547.ckpt-250000.data-00000-of-00001\n",
    "│          ├── model-20170512-110547.ckpt-250000.index\n",
    "│          └── model-20170512-110547.meta\n",
    "└── data/\n",
    "    ├── lfw/\n",
    "    │   ├── Aaron_Eckhart/     \n",
    "    │   │   └── Aaron_Eckhart_0001.jpg\n",
    "    │   ├── ...\n",
    "    │   └── Zydrunas_Ilgauskas/\n",
    "    │       └── Zydrunas_Ilgauskas_0001.jpg\n",
    "    └── lfw_crops/                          <--- face images after detection, alignment & cropping\n",
    "        ├── Aaron_Eckhart/     \n",
    "        │   └── Aaron_Eckhart_0001.png\n",
    "        ├── ...\n",
    "        └── Zydrunas_Ilgauskas/\n",
    "            └── Zydrunas_Ilgauskas_0001.png    \n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### STEP 1. Load the required libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Suppress Jupyter warning messages\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')\n",
    "\n",
    "# Utility libraries\n",
    "import sys\n",
    "import os\n",
    "from tqdm import tqdm\n",
    "import math\n",
    "\n",
    "# Multi-dimensional array processing\n",
    "import numpy as np\n",
    "\n",
    "# Image processing\n",
    "import cv2\n",
    "\n",
    "# Deep learning\n",
    "import tensorflow as tf\n",
    "\n",
    "# Project modules\n",
    "import facenet\n",
    "import detect_face\n",
    "\n",
    "# Model serialization\n",
    "import pickle\n",
    "\n",
    "# Face classifiers\n",
    "from sklearn.svm import SVC\n",
    "from sklearn.svm import LinearSVC"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### STEP 2. Configure paths and parameters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Root directory of the project\n",
    "ROOT_DIR = os.getcwd()\n",
    "\n",
    "# Directory holding the training/validation data\n",
    "DATA_PATH = os.path.join(ROOT_DIR, \"data\")\n",
    "\n",
    "# Directory holding the models\n",
    "MODEL_PATH = os.path.join(ROOT_DIR, \"model\")\n",
    "\n",
    "# FaceNet model file\n",
    "FACENET_MODEL_PATH = os.path.join(MODEL_PATH, \"facenet\",\"20170512-110547\",\"20170512-110547.pb\")\n",
    "\n",
    "# Classifier model file\n",
    "SVM_MODEL_PATH = os.path.join(MODEL_PATH, \"svm\", \"lfw_svm_classifier.pkl\")\n",
    "\n",
    "# Directory of the original LFW images\n",
    "IMG_IN_PATH = os.path.join(DATA_PATH, \"lfw\")\n",
    "\n",
    "# Directory of the detected, aligned & cropped face images used for training/validation\n",
    "IMG_OUT_PATH = os.path.join(DATA_PATH, \"lfw_crops\")"
   ]
  },
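  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  STEP 3. Convert each face image into a FaceNet face feature vector (128-dimensional)\n",
    "\n",
    "Function: `facenet.get_dataset`\n",
    "```\n",
    "Parameters:\n",
    "    paths (string): file path of the image dataset\n",
    "    has_class_directories (bool): whether to use subdirectory names as the face identity (default True)\n",
    "    path_expanduser (bool): whether to expand \"~\" and \"~user\" in the path to the user's home directory on the operating system (default False)\n",
    "Returns:\n",
    "    dataset (list[ImageClass]): list of face classes (ImageClass) with their image paths\n",
    "```"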
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Origin: Number of classes: 5750\n",
      "Origin: Number of images: 13233\n",
      "Filtered: Number of classes: 423\n",
      "Filtered: Number of images: 5985\n",
      "Loading feature extraction model\n",
      "Model filename: D:\\pythonworks\\01_erhwen\\real-time-deep-face-recognition\\model\\facenet\\20170512-110547\\20170512-110547.pb\n",
      "Face embedding size:  128\n",
      "Calculating features for images\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:27<00:00,  4.59s/it]\n"
     ]
    }
   ],
   "source": [
    "# Use the TensorFlow FaceNet model\n",
    "with tf.Graph().as_default():\n",
    "    with tf.Session() as sess:\n",
    "        datadir = IMG_OUT_PATH # directory of face images after detection, alignment & cropping\n",
    "        # Get the list of face classes (ImageClass) and their image paths\n",
    "        dataset = facenet.get_dataset(datadir)\n",
    "        # Original: get the path and label of every face image\n",
    "        paths, labels, labels_dict = facenet.get_image_paths_and_labels(dataset)\n",
    "        print('Origin: Number of classes: %d' % len(labels_dict))\n",
    "        print('Origin: Number of images: %d' % len(paths))\n",
    "        \n",
    "        # Many identities in LFW have only one image, which is too few samples for training,\n",
    "        # so we keep only the identities that have at least 5 images\n",
    "        \n",
    "        # Filtered: get the path and label of every face image (>=5)\n",
    "        paths, labels, labels_dict = facenet.get_image_paths_and_labels(dataset, enable_filter=True, filter_size=5)\n",
    "        print('Filtered: Number of classes: %d' % len(labels_dict))\n",
    "        print('Filtered: Number of images: %d' % len(paths))\n",
    "            \n",
    "        # Load the FaceNet model\n",
    "        print('Loading feature extraction model')\n",
    "        modeldir =  FACENET_MODEL_PATH #'/..Path to Pre-trained model../20170512-110547/20170512-110547.pb'\n",
    "        facenet.load_model(modeldir)\n",
    "\n",
    "        images_placeholder = tf.get_default_graph().get_tensor_by_name(\"input:0\")\n",
    "        embeddings = tf.get_default_graph().get_tensor_by_name(\"embeddings:0\")\n",
    "        phase_train_placeholder = tf.get_default_graph().get_tensor_by_name(\"phase_train:0\")\n",
    "        embedding_size = embeddings.get_shape()[1]\n",
    "        # Print the size of the face feature vector\n",
    "        print(\"Face embedding size: \", embedding_size)\n",
    "        \n",
    "        # Compute the face feature vectors (128-dimensional)\n",
    "        print('Calculating features for images')\n",
    "        batch_size = 1000 # batch size\n",
    "        image_size = 160  # input image size expected by FaceNet\n",
    "        \n",
    "        nrof_images = len(paths) # total number of face images to process\n",
    "        # Compute the total number of batches to run\n",
    "        nrof_batches_per_epoch = int(math.ceil(1.0 * nrof_images / batch_size))\n",
    "        # Allocate an array to hold the face feature vectors\n",
    "        emb_array = np.zeros((nrof_images, embedding_size)) # <-- Face Embedding\n",
    "        \n",
    "        for i in tqdm(range(nrof_batches_per_epoch)):\n",
    "            start_index = i * batch_size\n",
    "            end_index = min((i + 1) * batch_size, nrof_images)\n",
    "            paths_batch = paths[start_index:end_index]\n",
    "            images = facenet.load_data(paths_batch, False, False, image_size)\n",
    "            feed_dict = {images_placeholder: images, phase_train_placeholder: False}\n",
    "            emb_array[start_index:end_index, :] = sess.run(embeddings, feed_dict=feed_dict)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After filtering, we are left with **423** face classes from the LFW face database, each with at least **5** images."
   ]
  },
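  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The filtering itself is done by this project's `facenet.get_image_paths_and_labels`. As a rough illustration of the idea only, the sketch below keeps the samples whose label appears at least `min_count` times. The helper `filter_by_min_count` and the toy paths are hypothetical and are not part of the project.\n",
    "\n",
    "```python\n",
    "from collections import Counter\n",
    "\n",
    "def filter_by_min_count(paths, labels, min_count=5):\n",
    "    # Count how many images each label has\n",
    "    counts = Counter(labels)\n",
    "    # Keep only the samples whose label is frequent enough\n",
    "    kept = [(p, l) for p, l in zip(paths, labels) if counts[l] >= min_count]\n",
    "    kept_paths = [p for p, _ in kept]\n",
    "    kept_labels = [l for _, l in kept]\n",
    "    return kept_paths, kept_labels\n",
    "\n",
    "# Toy data: label 0 has 6 images, label 1 has 1, label 2 has 2\n",
    "paths = ['a_%d.png' % i for i in range(6)] + ['b_0.png', 'c_0.png', 'c_1.png']\n",
    "labels = [0] * 6 + [1] + [2] * 2\n",
    "\n",
    "kept_paths, kept_labels = filter_by_min_count(paths, labels, min_count=5)\n",
    "print(len(kept_paths), sorted(set(kept_labels)))\n",
    "```\n",
    "\n",
    "Only label 0 survives in this toy example, because it is the only one with five or more images."
   ]
  },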
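  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### STEP 4. Save the face embeddings produced by FaceNet\n",
    "\n",
    "So that the converted face embeddings can be reused (in practice this kind of data would usually be stored in a database), we use pickle to save the relevant data to files."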
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Serialize the data that can be reused later\n",
    "\n",
    "# Save the face embeddings\n",
    "with open(os.path.join(DATA_PATH, 'lfw_emb_features.pkl'), 'wb') as emb_features_file:\n",
    "    pickle.dump(emb_array, emb_features_file)\n",
    "\n",
    "# Save the labels that correspond to the face embeddings\n",
    "with open(os.path.join(DATA_PATH, 'lfw_emb_labels.pkl'), 'wb') as emb_labels_file:\n",
    "    pickle.dump(labels, emb_labels_file)\n",
    "\n",
    "# Save the dictionary that maps each label to a person's name\n",
    "with open(os.path.join(DATA_PATH, 'lfw_emb_labels_dict.pkl'), 'wb') as emb_labels_dict_file:\n",
    "    pickle.dump(labels_dict, emb_labels_dict_file)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### STEP 5. Load the face embedding data produced by FaceNet"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Deserialize the data that can be reused\n",
    "\n",
    "# Face embeddings\n",
    "with open(os.path.join(DATA_PATH, 'lfw_emb_features.pkl'), 'rb') as emb_features_file:\n",
    "    emb_features = pickle.load(emb_features_file)\n",
    "\n",
    "# Labels that correspond to the face embeddings\n",
    "with open(os.path.join(DATA_PATH, 'lfw_emb_labels.pkl'), 'rb') as emb_labels_file:\n",
    "    emb_labels = pickle.load(emb_labels_file)\n",
    "\n",
    "# Dictionary that maps each label to a person's name\n",
    "with open(os.path.join(DATA_PATH, 'lfw_emb_labels_dict.pkl'), 'rb') as emb_labels_dict_file:\n",
    "    emb_labels_dict = pickle.load(emb_labels_dict_file)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Face embedding features: 5985, shape: (5985, 128), type: <class 'numpy.ndarray'>\n",
      "Face embedding labels: 5985, type: <class 'list'>\n",
      "Face embedding labels dict: 423, type: <class 'dict'>\n"
     ]
    }
   ],
   "source": [
    "print(\"Face embedding features: {}, shape: {}, type: {}\".format(len(emb_features), emb_features.shape, type(emb_features)))\n",
    "print(\"Face embedding labels: {}, type: {}\".format(len(emb_labels), type(emb_labels)))\n",
    "print(\"Face embedding labels dict: {}, type: {}\".format(len(emb_labels_dict), type(emb_labels_dict)))"
   ]
  },
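  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### STEP 6. Prepare the training and validation datasets\n",
    "Because each person in the LFW face dataset has only a few face images, we hold out one image per person as the validation set and use the remaining images as the training set."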
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "X_train: 5562, y_train: 5562\n",
      "X_test: 423, y_test: 423\n"
     ]
    }
   ],
   "source": [
    "# Prepare the variables\n",
    "X_train = []; y_train = []\n",
    "X_test = []; y_test = []\n",
    "\n",
    "# Labels already seen (the first image of each label goes to the validation set)\n",
    "processed = set()\n",
    "\n",
    "# Split into training and validation datasets\n",
    "for (emb_feature, emb_label) in zip(emb_features, emb_labels):\n",
    "    if emb_label in processed:\n",
    "        X_train.append(emb_feature)\n",
    "        y_train.append(emb_label)\n",
    "    else:\n",
    "        X_test.append(emb_feature)\n",
    "        y_test.append(emb_label)\n",
    "        processed.add(emb_label)\n",
    "\n",
    "# Result\n",
    "print('X_train: {}, y_train: {}'.format(len(X_train), len(y_train)))\n",
    "print('X_test: {}, y_test: {}'.format(len(X_test), len(y_test)))\n"
   ]
  },
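  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### STEP 7. Train the face classifier (SVM Classifier)\n",
    "\n",
    "We use scikit-learn's SVM classifier for training.\n",
    "\n",
    "The discussion at \"https://github.com/davidsandberg/facenet/issues/134\" has a detailed explanation of the parameters and an analysis of the results!\n",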
    "\n",
    "![detect-result](https://cloud.githubusercontent.com/assets/2711650/22487753/ec7e0ee8-e80e-11e6-8d69-9aebec5064d0.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Training with LinearSVC"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training classifier\n",
      "Validation result:  0.978723404255\n"
     ]
    }
   ],
   "source": [
    "# Train the classifier\n",
    "print('Training classifier')\n",
    "linearsvc_classifier = LinearSVC(C=1, multi_class='ovr')\n",
    "\n",
    "# Fit on the training set\n",
    "linearsvc_classifier.fit(X_train, y_train)\n",
    "\n",
    "# Check the accuracy on the validation set\n",
    "score = linearsvc_classifier.score(X_test, y_test)\n",
    "\n",
    "# Print the classifier's accuracy\n",
    "print(\"Validation result: \", score)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Saved classifier model to file \"D:\\pythonworks\\01_erhwen\\real-time-deep-face-recognition\\model\\svm\\lfw_svm_classifier.pkl\"\n"
     ]
    }
   ],
   "source": [
    "# Serialize the face recognition model to a file\n",
    "classifier_filename = SVM_MODEL_PATH\n",
    "\n",
    "# Build a list of person names for use after recognition\n",
    "#class_names = [cls.name.replace('_', ' ') for cls in dataset]\n",
    "\n",
    "class_names = []\n",
    "for key in sorted(emb_labels_dict.keys()):\n",
    "    class_names.append(emb_labels_dict[key].replace('_', ' '))\n",
    "\n",
    "# Save the face classifier to the file system\n",
    "with open(classifier_filename, 'wb') as outfile:\n",
    "    pickle.dump((linearsvc_classifier, class_names), outfile)\n",
    "    \n",
    "print('Saved classifier model to file \"%s\"' % classifier_filename)"
   ]
  },
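  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using a simple Support Vector Machine (SVM) model, we trained a multi-class classifier on the FaceNet features extracted from the __423__ identities selected from the LFW face database. In the simple validation above, it reaches a face recognition accuracy of __97.87%__."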
   ]
  },
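  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As an illustration of how a trained classifier could be used, the sketch below fits a `LinearSVC` on synthetic 128-dimensional vectors and maps the predicted label back to a name. The cluster centers, the names in `labels_dict`, and the query vector are made-up stand-ins, not LFW data or real FaceNet embeddings.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.svm import LinearSVC\n",
    "\n",
    "rng = np.random.RandomState(0)\n",
    "\n",
    "# Synthetic stand-ins for embeddings: three well-separated clusters\n",
    "centers = rng.randn(3, 128)\n",
    "labels_dict = {0: 'Person_A', 1: 'Person_B', 2: 'Person_C'}  # hypothetical names\n",
    "X = np.vstack([c + 0.05 * rng.randn(10, 128) for c in centers])\n",
    "y = np.repeat([0, 1, 2], 10)\n",
    "\n",
    "# Same classifier type and C value as in the training cell above\n",
    "clf = LinearSVC(C=1)\n",
    "clf.fit(X, y)\n",
    "\n",
    "# Classify a new vector drawn near the second cluster\n",
    "query = (centers[1] + 0.05 * rng.randn(128)).reshape(1, -1)\n",
    "pred = clf.predict(query)[0]\n",
    "scores = clf.decision_function(query)[0]  # one score per class\n",
    "\n",
    "# Map the predicted label back to a readable name\n",
    "name = labels_dict[int(pred)].replace('_', ' ')\n",
    "print(name, scores)\n",
    "```\n",
    "\n",
    "Looking the name up through the label dictionary avoids assuming that the labels are contiguous integers starting at 0."
   ]
  },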
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "423"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(class_names)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}