{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Finding faces in the Tribune collection\n",
    "\n",
    "A simple demonstration of facial detection using images from the State Library of NSW's Tribune collection."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-block alert-warning\">\n",
    "    <p>If you haven't used one of these notebooks before, they're basically web pages in which you can write, edit, and run live code. They're meant to encourage experimentation, so don't feel nervous. Just try running a few cells and see what happens!.</p>\n",
    "    <p>\n",
    "        Some tips:\n",
    "        <ul>\n",
    "            <li>Code cells have boxes around them. When you hover over them a <i class=\"fa-step-forward fa\"></i> icon appears.</li>\n",
    "            <li>To run a code cell either click the <i class=\"fa-step-forward fa\"></i> icon, or click on the cell and then hit <b>Shift+Enter</b>. The <b>Shift+Enter</b> combo will also move you to the next cell, so it's a quick way to work through the notebook.</li>\n",
    "            <li>While a cell is running a <b>*</b> appears in the square brackets next to the cell. Once the cell has finished running the asterix will be replaced with a number.</li>\n",
    "            <li>In most cases you'll want to start from the top of notebook and work your way down running each cell in turn. Later cells might depend on the results of earlier ones.</li>\n",
    "            <li>To edit a code cell, just click on it and type stuff. Remember to run the cell once you've finished editing.</li>\n",
    "        </ul>\n",
    "    </p>\n",
    "    </div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import cv2\n",
    "import pandas as pd\n",
    "import os\n",
    "from urllib.parse import urlparse\n",
    "import requests\n",
    "from IPython.display import display, HTML\n",
    "import copy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load Tribune images data\n",
    "df = pd.read_csv('https://raw.githubusercontent.com/GLAM-Workbench/ozglam-data-records-of-resistance/master/data/images.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Link to the facial detection data file\n",
    "face_cl = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')\n",
    "\n",
    "def select_images(sample):\n",
    "    '''\n",
    "    Get a random sample of images.\n",
    "    '''\n",
    "    images = []\n",
    "    rows = df.sample(sample)\n",
    "    for img_id in list(rows['images']):\n",
    "        img_url = 'https://s3-ap-southeast-2.amazonaws.com/wraggetribune/images/{0}.jpg'.format(img_id)\n",
    "        images.append((img_id, img_url))\n",
    "    return images\n",
    "\n",
    "\n",
    "def download_image(img_url):\n",
    "    '''\n",
    "    Download and save the specified image.\n",
    "    '''\n",
    "    current_dir = os.getcwd()\n",
    "    parsed = urlparse(img_url)\n",
    "    filename = os.path.join(current_dir, os.path.basename(parsed.path))\n",
    "    response = requests.get(img_url, stream=True)\n",
    "    with open(filename, 'wb') as fd:\n",
    "        for chunk in response.iter_content(chunk_size=128):\n",
    "            fd.write(chunk)\n",
    "    return filename \n",
    "\n",
    "\n",
    "def detect_faces(img_file):\n",
    "    '''\n",
    "    Use OpenCV to find faces.\n",
    "    '''\n",
    "    faces = []\n",
    "    f = 1\n",
    "    print('Processing {}'.format(img_file))\n",
    "    try:\n",
    "        image = cv2.imread(img_file)\n",
    "        # Create a copy to annotate\n",
    "        results = image.copy()\n",
    "        # Create a greyscale copy for face detection\n",
    "        grey = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)\n",
    "        # Find faces!\n",
    "        # Try adjusting scaleFactor and minNeighbors if results aren't what you expect.\n",
    "        faces = face_cl.detectMultiScale(grey, scaleFactor=1.3, minNeighbors=4, minSize=(50, 50))\n",
    "    except cv2.error:\n",
    "        raise\n",
    "    else:\n",
    "        for (x, y, w, h) in faces:\n",
    "            # Save a cropped version of the detected face\n",
    "            face = image[y: y + h, x: x + w]\n",
    "            cv2.imwrite('{}-{}.jpg'.format(os.path.splitext(os.path.basename(img_file))[0], f), face)\n",
    "            # Draw a green box on the complete image\n",
    "            cv2.rectangle(results, (x, y), (x + w, y + h), (0, 255, 0), 2)\n",
    "            f += 1\n",
    "        # Save the annotated image\n",
    "        cv2.imwrite(img_file, results)\n",
    "    return faces\n",
    "\n",
    "\n",
    "def process_images(images):\n",
    "    '''\n",
    "    Find faces in a list of images.\n",
    "    Displays the results\n",
    "    '''\n",
    "    for img_id, img_url in images:\n",
    "        filename = download_image(img_url)\n",
    "        faces = detect_faces(filename)\n",
    "        html = '<image src=\"{}\"><br><a target=\"_blank\" href=\"http://digital.sl.nsw.gov.au/delivery/DeliveryManagerServlet?dps_pid={}&embedded=true&toolbar=false\">More details at SLNSW</a><br>'.format(os.path.basename(filename), img_id)\n",
    "        print('I found {} faces...'.format(len(faces)))\n",
    "        for i, face in enumerate(faces, 1):\n",
    "            html += '<a target=\"_blank\" href=\"{0}-{1}.jpg\"><image style=\"width: 100px; height: 100px; float: left; margin: 10px; object-fit: scale-down;\" src=\"{0}-{1}.jpg\"></a>'.format(img_id, i)\n",
    "    display(HTML(html))\n",
    "    \n",
    "    \n",
    "def get_random_image(sample=1):\n",
    "    '''\n",
    "    Process a random sample of images.\n",
    "    '''\n",
    "    images = select_images(sample)\n",
    "    process_images(images)\n",
    "    \n",
    "    \n",
    "def get_image_by_id(img_id):\n",
    "    '''\n",
    "    Process a specific image.\n",
    "    '''\n",
    "    images = [(img_id, 'https://s3-ap-southeast-2.amazonaws.com/wraggetribune/images/{0}.jpg'.format(img_id))]\n",
    "    process_images(images)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Get a random image"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Provide a number as a parameter for more than 1 images\n",
    "get_random_image()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Get a particular image by it's identifier"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "get_image_by_id('FL4578163')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}