{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Pedestrian and Face Detection on Simple Azure" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pedestrian and Face Detection uses OpenCV to identify people standing in a picture or a video and NIST use case in this document is built with Apache Spark and Mesos clusters on multiple compute nodes.\n", "\n", "Simple Azure supports deploying software stacks for the NIST Pedestrian and Face Detection use case on top of Azure compute resources with the templates." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Original | Pedestrian Detected\n", ":-----------------------------------:|:------------------------------------------------------:\n", "![alt baby](https://raw.githubusercontent.com/lee212/simpleazure/master/ipynb/files/image03.png 'baby')|![alt baby-detected](https://raw.githubusercontent.com/lee212/simpleazure/master/ipynb/files/image05.png 'baby-detected')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Original | Pedestrian and Face Detected\n", ":---------------------------------------:|:----------------------------------------------------------:\n", "![alt person](https://raw.githubusercontent.com/lee212/simpleazure/master/ipynb/files/image06.png 'person')|![alt person-detected](https://raw.githubusercontent.com/lee212/simpleazure/master/ipynb/files/image04.png 'person-detected')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Human (pedestrian) detection and face detection have been studied during the last several years and models for them have improved along with Histograms of Oriented Gradients (HOG) for Human Detection [1]. OpenCV is a Computer Vision library including the SVM classifier and the HOG object detector for pedestrian detection and INRIA Person Dataset [2] is one of popular samples for both training and testing purposes. In this document, we deploy Apache Spark on Mesos clusters to train and apply detection models from OpenCV using Python API." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Ansible Automation Tool" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ansible is a python tool to install/configure/manage software on multiple machines with JSON files where system descriptions are defined. There are reasons why we use Ansible:\n", "\n", "- Expandable: Leverages Python (default) but modules can be written in any language\n", "- Agentless: no setup required on managed node\n", "- Security: Allows deployment from user space; uses ssh for authentication\n", "- Flexibility: only requires ssh access to privileged user\n", "- Transparency: YAML Based script files express the steps of installing and configuring software\n", "- Modularity: Single Ansible Role (should) contain all required commands and variables to deploy software package independently\n", "- Sharing and portability: roles are available from source (github, bitbucket, gitlab, etc) or the Ansible Galaxy portal\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### INRIA Person Dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This dataset contains positive and negative images for training and test purposes with annotation files for upright persons in each image. 288 positive test images, 453 negative test images, 614 positive training images and 1218 negative training images are included along with normalized 64x128 pixel formats. 970MB dataset is available to download [3]." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### HOG with SVM model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Histogram of Oriented Gradient (HOG) and Support Vector Machine (SVM) are used as object detectors and classifiers and built-in python libraries from OpenCV provide these models for human detection." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Deployment by Ansible" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When it comes to deploy applications and build clusters for batch-processing large datasets, Ansible scripts play a big role such as installation and configuration towards available machines. Ansible provides abstractions by Playbook Roles and reusability by Include statements. We define X application in X Ansible Role, for example, and use include statements to combine with other applications e.g. Y or Z. Five Ansible roles are used in this use case to build clusters for Human and Face Detection with INRIA dataset. The main Ansible playbook runs Ansible roles in order which looks like:\n", "\n", "```\n", "---\n", "include: sched/00-mesos.yml\n", "include: proc/01-spark.yml\n", "include: apps/02-opencv.yml\n", "include: data/03-inria-dataset.yml\n", "Include: anlys/04-human-face-detection.yml\n", "```\n", "\n", "Directory names e.g. sched, proc, data, or anlys indicate BDSS layers like:\n", "- sched: scheduler layer\n", "- proc: data processing layer\n", "- apps: application layer\n", "- data: dataset layer\n", "- anlys: analytics layer\n", "and two digits in the filename indicate an order of roles to be run. \n", "\n", "\n", "It is assumed that virtual machines are created by virtual-cluster-libs, the command line tool to start VM instances. For example on OpenStack, `vcl boot -p openstack -P $USER-` command starts a set of virtual machine instances with a cluster definition file `.cluster.py`. The number of machines and groups for clusters e.g. namenodes and datanodes are specified in the file and Ansible inventory file, a list of target machines with groups, is generated once machines are ready to use. Ansible roles run to install applications on virtual clusters.\n", "\n", "\n", "Mesos role is installed first with Ansible inventory groups for masters and slaves in which mesos-master runs on the masters group and mesos-slave runs on the slaves group. Apache Zookeeper is included in the mesos role so that mesos slaves find an elected mesos leader from the zookeeper. Spark, as a data processing layer, provides two options for distributed job processing, batch job processing via a cluster mode and real-time processing via a client mode. The Mesos dispatcher runs on a masters group to accept a batch job submission and Spark interactive shell, which is the client mode, provides real-time processing on any node in the cluster. Either way, Spark is installed after a scheduler layer i.e. mesos to identify a master host for a job submission. Installation of OpenCV, INRIA Person Dataset and Human and Face Detection Python applications are followed." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Software Stacks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following software are expected in the stacks according to the [github](https://github.com/futuresystems/pedestrian-and-face-detection):\n", "\n", "- mesos cluster (master, worker)\n", "- spark (with dispatcher for mesos cluster mode)\n", "- openCV\n", "- zookeeper\n", "- INRIA Person Dataset\n", "- Detection Analytics in Python" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- [1] Dalal, Navneet, and Bill Triggs. \"Histograms of oriented gradients for human detection.\" 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). Vol. 1. IEEE, 2005. [pdf]\n", "- [2] http://pascal.inrialpes.fr/data/human/\n", "- [3] ftp://ftp.inrialpes.fr/pub/lear/douze/data/INRIAPerson.tar\n", "- [4] https://docs.python.org/2/library/configparser.html" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Simple Azure with Ansible" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Simple Azure supports Ansible to import and run Ansible scripts towards target machines i.e. Azure virtual machines. In the previous tutorial, we've learned how to deploy 3 VMs from the 101-vm-sshkey template and we are going to use the three virtual machines in this example." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Server groups (inventory)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We may separate compute nodes in two groups: masters and workers therefore Mesos masters and zookeeper quorums manage job requests and leaders and workers run actual tasks. Ansible needs group definitions in their inventory therefore software installation associated with a proper part is completed. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Quick Instructions (under development)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load SimpleAzure" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from simpleazure import SimpleAzure\n", "saz = SimpleAzure()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### IP Addresses of Compute Nodes" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "ips = saz.arm.view_info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load Ansible API with IPs" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from simpleazure.ansible_api import AnsibleAPI\n", "ansible_client = AnsibleAPI(ips)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download Ansible Playbooks from Github" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The ansible scripts for Pedestrian and Face Detection is here: https://github.com/futuresystems/pedestrian-and-face-detection.\n", "We clone the repository using Github command line tools." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from simpleazure.github_cli import GithubCLI\n", "git_client = GithubCLI()\n", "git_client.set_repo('https://github.com/futuresystems/pedestrian-and-face-detection')\n", "git_client.clone()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Install Software Stacks to Targeted VMs" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "ansible_client.playbook(git_client.path + \"/site.yml\")\n", "ansible_client.run()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 1 }