{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 人臉辨識 - 臉部偵測、對齊 & 裁剪\n", "人臉辨識大致可分成四個主要的步驟:\n", "1. 人臉偵測\n", "2. 人臉轉換、對齊與裁剪\n", "3. 人臉特徵擷取\n", "4. 人臉特徵比對\n", "\n", "在這篇文章中主要是介紹:\n", "1. 使用MTCNN來進行人臉偵測 -> Detect\n", "2. 然後進行簡單的裁剪 -> Tranform & Crop\n", "\n", "![openface](https://raw.githubusercontent.com/cmusatyalab/openface/master/images/summary.jpg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## face-recognition 專案說明\n", "\n", "[face-recognition](https://github.com/erhwenkuo/face-recognition)包含了使用MTCNN與FaceNet來進行人臉辨識。\n", "\n", "### 安裝\n", "\n", "```bash\n", "git clone https://github.com/erhwenkuo/face-recognition.git\n", "cd face-recognition\n", "...\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 資料集說明\n", "\n", "LFW資料集是一個常見的人臉資料集,歷史非常悠久。LFW資料集中收錄了5749位公眾人物的人臉影像,總共有超過一萬三千多張影像檔案。但大部份公眾人物的影像都只有一張,只有1680位有超過一張照片,而極少數有超過10張照片。\n", "\n", "資料集的網站: http://vis-www.cs.umass.edu/lfw" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 專案的檔案路徑佈局\n", "\n", "1. 使用Git從[erhwenkuo/face-recognition]https://github.com/erhwenkuo/face-recognition.git)下載整個專案源碼\n", "2. 在`face-recognition`的目錄裡產生二個子目錄`data`與`model`\n", "3. 從[Labeled Faces in the Wild資料集官網]點撃[All images as gzipped tar file](http://vis-www.cs.umass.edu/lfw/lfw.tgz)來下 載`lfw.tgz`。\n", "4. 解壓縮`lfw.tgz`到`face-recognition/data/`的目錄下\n", "\n", "最後你的目錄結構看起來像這樣: (這裡只列出來在這個範例會用到的相關檔案與目錄)\n", "```\n", "face-recognition/\n", "├── xxxx.ipynb\n", "├── detect_face.py\n", "├── facenet.py\n", "├── model/\n", "│ └── mtcnn/\n", "│ ├── det1.npy\n", "│ ├── det2.npy\n", "│ └── det3.npy\n", "└── data/\n", " └── lfw/\n", " ├── Aaron_Eckhart/ \n", " │ └── Aaron_Eckhart_0001.jpg\n", " ├── ...\n", " └── Zydrunas_Ilgauskas/\n", " └── Zydrunas_Ilgauskas_0001.jpg\n", " \n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### STEP 1. 載入相關函式庫" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 屏蔽Jupyter的warning訊息\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "\n", "# Utilities相關函式庫\n", "from scipy import misc\n", "import sys\n", "import os\n", "import random\n", "from tqdm import tqdm\n", "\n", "# 多維向量處理相關函式庫\n", "import numpy as np\n", "\n", "# 圖像處理相關函式庫\n", "import cv2\n", "\n", "# 深度學習相關函式庫\n", "import tensorflow as tf\n", "\n", "# 專案相關函式庫\n", "import facenet\n", "import detect_face" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### STEP 2. 設定相關設定與參數" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 專案的根目錄路徑\n", "ROOT_DIR = os.getcwd()\n", "\n", "# 訓練/驗證用的資料目錄\n", "DATA_PATH = os.path.join(ROOT_DIR, \"data\")\n", "\n", "# 模型的資料目錄\n", "MODEL_PATH = os.path.join(ROOT_DIR, \"model\")\n", "\n", "# MTCNN的模型\n", "MTCNN_MODEL_PATH = os.path.join(MODEL_PATH, \"mtcnn\")\n", "\n", "# 訓練/驗證用的圖像資料目錄\n", "IMG_IN_PATH = os.path.join(DATA_PATH, \"lfw\")\n", "\n", "# 訓練/驗證用的圖像資料目錄\n", "IMG_OUT_PATH = os.path.join(DATA_PATH, \"lfw_crops\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### STEP 3. 取得儲存在檔案系統裡的人臉圖像資料集的圖像路徑與人臉的identity\n", "```\n", "參數:\n", " paths (string): 圖像資料集的檔案路徑\n", " has_class_directories (bool): 是否使用子目錄名作為人臉的identity (預設為True)\n", " path_expanduser (bool): 是否把path中包含的\"~\"和\"~user\"轉換成在作業系統下的用戶根目錄 (預設為False)\n", "回傳:\n", " dataset (list[ImageClass]): 人臉類別(ImageClass)的列表與圖像路徑\n", "```" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 檢查要儲放裁剪結果的目錄\n", "if not os.path.exists(IMG_OUT_PATH):\n", " os.makedirs(IMG_OUT_PATH)\n", "\n", "\n", "# 臉類別(ImageClass)的列表與圖像路徑\n", "dataset = facenet.get_dataset(IMG_IN_PATH)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total face identities: 5749\n" ] } ], "source": [ "# 打印看有多少人臉的身份\n", "print(\"Total face identities: \", len(dataset))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### STEP 4. 構建MTCNN的模型來偵測人臉位置\n", "詳細說明: [人臉偵測 - MTCNN (Multi-task Cascaded Convolutional Networks)](https://github.com/erhwenkuo/deep-learning-with-keras-notebooks/blob/master/7.1-mtcnn-face-detection.ipynb)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Creating networks and loading parameters\n" ] } ], "source": [ "print('Creating networks and loading parameters')\n", "with tf.Graph().as_default():\n", " gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)\n", " sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options, log_device_placement=False))\n", " with sess.as_default():\n", " pnet, rnet, onet = detect_face.create_mtcnn(sess, MTCNN_MODEL_PATH)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 設定人臉偵測模型所需的相關參數" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "minsize = 20 # 最小的臉部的大小\n", "threshold = [0.6, 0.7, 0.7] # 三個網絡(P-Net, R-Net, O-Net)的閥值\n", "factor = 0.709 # scale factor\n", "\n", "margin = 44 # 在裁剪人臉時的邊框margin\n", "image_size = 182 # 160 + 22" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# 將一個隨機key添加到圖像檔名以允許使用多個進程進行人臉對齊\n", "random_key = np.random.randint(0, high=99999)\n", "bounding_boxes_filename = os.path.join(IMG_OUT_PATH, 'bounding_boxes_%05d.txt' % random_key)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### STEP 5. 人臉圖像處理\n", "\n", "對每一張人臉圖像進行偵測並找出靠近圖像中心點的人臉邊界框來進行裁剪及調整大小。\n", "\n", "ImageClass類別\n", "```\n", "儲存某特定人臉(Identity)的名稱與相關的人臉圖像的路徑列表的類別\n", " \n", " 屬性:\n", " name (string): 人臉的Identity\n", " image_paths (List): 人臉圖像路徑的列表\n", " \n", " 方法:\n", " str(ImageClass): 取得人臉的Identity\n", " len(ImageClass): 取得人臉的Identity有多少圖像 \n", "```" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████████████████████████████████| 5749/5749 [39:27<00:00, 2.43it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Total number of images: 13233\n", "Number of successfully aligned images: 13233\n" ] } ], "source": [ "# 使用Tensorflow來運行MTCNN\n", "with open(bounding_boxes_filename, \"w\") as text_file:\n", " nrof_images_total = 0 # 處理過的圖像總數\n", " nrof_successfully_aligned = 0 # 人臉圖像align的總數\n", " \n", " # 迭代每一個人臉身份(ImageClass)\n", " for cls in tqdm(dataset):\n", " output_class_dir = os.path.join(IMG_OUT_PATH, cls.name) # 裁剪後的圖像目錄\n", " if not os.path.exists(output_class_dir):\n", " os.makedirs(output_class_dir)\n", " \n", " # 迭代每一個人臉身份的圖像的路徑 (ImageClass.image_paths)\n", " for image_path in cls.image_paths:\n", " nrof_images_total += 1\n", " filename = os.path.splitext(os.path.split(image_path)[1])[0] # 取得圖像檔名\n", " output_filename = os.path.join(output_class_dir, filename + '.png') # 設定輸出的圖像檔名 \n", " # print(image_path)\n", " \n", " if not os.path.exists(output_filename):\n", " try:\n", " img = misc.imread(image_path) # 讀進圖檔\n", " # print('read data dimension: ', img.ndim) \n", " except (IOError, ValueError, IndexError) as e:\n", " errorMessage = '{}: {}'.format(image_path, e)\n", " # print(errorMessage)\n", " else:\n", " # 將圖檔轉換成numpy array (height, widith, color_channels)\n", " if img.ndim < 2:\n", " print('Unable to align \"%s\"' % image_path)\n", " text_file.write('%s\\n' % (output_filename))\n", " continue\n", " if img.ndim == 2:\n", " img = facenet.to_rgb(img)\n", " print('to_rgb data dimension: ', img.ndim)\n", " img = img[:, :, 0:3]\n", " # print('after data dimension: ', img.ndim)\n", " \n", " # 使用MTCNN來偵測人臉在圖像中的位置\n", " bounding_boxes, _ = detect_face.detect_face(img, minsize, pnet, rnet, onet, threshold, factor)\n", " nrof_faces = bounding_boxes.shape[0] # 偵測到的人臉總數\n", " # print('detected_face: %d' % nrof_faces)\n", " if nrof_faces > 0:\n", " # 當有偵測到多個人臉的時候, 我們希望從中找到主要置中位置的人臉\n", " det = bounding_boxes[:, 0:4]\n", " img_size = np.asarray(img.shape)[0:2]\n", " if nrof_faces > 1:\n", " bounding_box_size = (det[:, 2] - det[:, 0]) * (det[:, 3] - det[:, 1])\n", " img_center = img_size / 2\n", " offsets = np.vstack([(det[:, 0] + det[:, 2]) / 2 - img_center[1],\n", " (det[:, 1] + det[:, 3]) / 2 - img_center[0]])\n", " offset_dist_squared = np.sum(np.power(offsets, 2.0), 0)\n", " index = np.argmax(bounding_box_size - offset_dist_squared * 2.0) # some extra weight on the centering\n", " det = det[index, :]\n", " det = np.squeeze(det)\n", " bb_temp = np.zeros(4, dtype=np.int32)\n", " # 取得人臉的左上角與右下角座標\n", " bb_temp[0] = det[0]\n", " bb_temp[1] = det[1]\n", " bb_temp[2] = det[2]\n", " bb_temp[3] = det[3]\n", " \n", " # 進行裁剪以及大小的轉換\n", " cropped_temp = img[bb_temp[1]:bb_temp[3], bb_temp[0]:bb_temp[2], :]\n", " scaled_temp = misc.imresize(cropped_temp, (image_size, image_size), interp='bilinear')\n", "\n", " nrof_successfully_aligned += 1\n", " misc.imsave(output_filename, scaled_temp) # 儲存處理過的圖像\n", " text_file.write('%s %d %d %d %d\\n' % (output_filename, bb_temp[0], bb_temp[1], bb_temp[2], bb_temp[3]))\n", " else:\n", " # print('Unable to align \"%s\"' % image_path)\n", " text_file.write('%s\\n' % (output_filename))\n", "\n", "print('Total number of images: %d' % nrof_images_total)\n", "print('Number of successfully aligned images: %d' % nrof_successfully_aligned)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "### STEP 6. 檢查人臉圖像處理結果" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "\n", "face_identity = 'Bill_Clinton'\n", "origin_face_image = os.path.join(IMG_IN_PATH, face_identity, 'Bill_Clinton_0009.jpg')\n", "aligned_face_image = os.path.join(IMG_OUT_PATH, face_identity, 'Bill_Clinton_0009.png')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# 使用OpenCV讀入測試圖像\n", "# 注意: OpenCV讀進來的圖像所產生的Numpy Ndaary格式是BGR (B:Blue, G: Green, R: Red) \n", "# 跟使用PIL或skimage的格式RGB (R: Red, G: Green, B:Blue)在色階(channel)的順序上有所不同\n", "bgr_image = cv2.imread(origin_face_image)\n", "rgb_image = bgr_image[:,:,::-1] # 把BGR轉換成RGB\n", "\n", "# 偵測人臉\n", "bounding_boxes, _ = detect_face.detect_face(rgb_image, minsize, pnet, rnet, onet, threshold, factor)\n", "\n", "# 複製原圖像\n", "draw = bgr_image.copy()\n", "\n", "# 被偵測到的臉部總數\n", "faces_detected = len(bounding_boxes)\n", "\n", "print('Total faces detected :{}'.format(faces_detected))\n", "\n", "# 每一個 bounding_box包括了(x1,y1,x2,y2,confidence score):\n", "#   左上角座標 (x1,y1)\n", "# 右下角座標 (x2,y2)\n", "# 信心分數 confidence score\n", "\n", "# 迭代每一個偵測出來的邊界框\n", "for face_position in bounding_boxes:\n", " # 把資料由float轉成int\n", " face_position=face_position.astype(int)\n", " \n", " # 取出左上角座標 (x1,y1)與右下角座標 (x2,y2)\n", " # 由於有可能預測出來的臉在圖像的圖邊而導致座標值為負值\n", " # 因此進行的負值的偵測與修正\n", " x1 = face_position[0] if face_position[0] > 0 else 0\n", " y1 = face_position[1] if face_position[1] > 0 else 0\n", " x2 = face_position[2] if face_position[2] > 0 else 0\n", " y2 = face_position[3] if face_position[3] > 0 else 0\n", " \n", " # 在原圖像上畫上這些邊界框 \n", " cv2.rectangle(draw, (x1, y1), (x2, y2), (0, 255, 0), 2)\n", "\n", "# 設定展示的大小\n", "plt.figure(figsize=(10,5))\n", "\n", "# 展示偵測出來的結果\n", "plt.imshow(draw[:,:,::-1]) # 轉換成RGB來給matplotlib展示\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "以上的人臉圖像裡頭有兩個人臉被找出來,因此在圖像處理中要找到離圖像中心點比較接近的那一個來當成主要的人臉圖像並進行其它的前處理。" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bgr_image = cv2.imread(aligned_face_image)\n", "rgb_image = bgr_image[:,:,::-1] # 把BGR轉換成RGB\n", "\n", "# 設定展示的大小\n", "plt.figure(figsize=(8,3))\n", "\n", "# 展示偵測出來的結果\n", "plt.imshow(rgb_image) # 轉換成RGB來給matplotlib展示\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "### 參考\n", "* [Multi-task Cascaded Convolutional Networks論文](https://arxiv.org/abs/1604.02878v1)\n", "* [OpenFace](https://github.com/cmusatyalab/openface)\n", "* Facenet 專案 [davidsandberg](https://github.com/davidsandberg/facenet)\n", "* [bearsprogrammer](https://github.com/bearsprogrammer/real-time-deep-face-recognition)\n", "* [shanren7](https://github.com/shanren7/real_time_face_recognition)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.3" } }, "nbformat": 4, "nbformat_minor": 2 }