{
 "metadata": {
  "name": ""
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## SPAM Classifier\n",
      "<p>Steps:</p>\n",
      "<ol>\n",
      "    <li>Read in the data</li>\n",
      "    <li>Feature engineering\n",
      "        <ul>\n",
      "            <li>Simple bins</li>\n",
      "            <li>TF-IDF</li>\n",
      "            <li>NLP</li>\n",
      "        </ul>\n",
      "    </li>\n",
      "    <li>Sparse representation</li>\n",
      "    <li>Training\n",
      "        <ul>\n",
      "            <li>Naive Bayes</li>\n",
      "            <li>SGD</li>\n",
      "        </ul>\n",
      "    </li>\n",
      "</ol>"
     ]
    },
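    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The Naive Bayes step can be previewed with a minimal, standard-library-only sketch before any real data is loaded. It assumes a crude lowercase regex tokenizer, uses one `Counter` per class as a sparse bag-of-words representation, and applies multinomial Naive Bayes with Laplace (add-one) smoothing. The names `tokenize`, `train_nb`, and `predict_nb` and the toy messages are hypothetical illustrations, not the implementation used later in this notebook; the TF-IDF, NLP, and SGD steps are not covered here.\n",
      "\n",
      "```python\n",
      "import math\n",
      "import re\n",
      "from collections import Counter\n",
      "\n",
      "\n",
      "def tokenize(text):\n",
      "    # Hypothetical tokenizer: lowercase, keep runs of letters and digits\n",
      "    return re.findall(r'[a-z0-9]+', text.lower())\n",
      "\n",
      "\n",
      "def train_nb(docs, labels, alpha=1.0):\n",
      "    # Per-class token counts; each Counter is a sparse bag-of-words vector\n",
      "    class_counts = Counter(labels)\n",
      "    token_counts = {label: Counter() for label in class_counts}\n",
      "    for doc, label in zip(docs, labels):\n",
      "        token_counts[label].update(tokenize(doc))\n",
      "    vocab = set()\n",
      "    for counts in token_counts.values():\n",
      "        vocab.update(counts)\n",
      "    return class_counts, token_counts, vocab, alpha\n",
      "\n",
      "\n",
      "def predict_nb(model, doc):\n",
      "    class_counts, token_counts, vocab, alpha = model\n",
      "    n_docs = sum(class_counts.values())\n",
      "    best_label, best_score = None, float('-inf')\n",
      "    for label, n in class_counts.items():\n",
      "        total = sum(token_counts[label].values())\n",
      "        # Log prior plus add-alpha smoothed log likelihood of each known token\n",
      "        score = math.log(float(n) / n_docs)\n",
      "        for tok in tokenize(doc):\n",
      "            if tok in vocab:  # tokens never seen in training are ignored\n",
      "                score += math.log((token_counts[label][tok] + alpha)\n",
      "                                  / (total + alpha * len(vocab)))\n",
      "        if score > best_score:\n",
      "            best_label, best_score = label, score\n",
      "    return best_label\n",
      "\n",
      "\n",
      "# Toy messages invented for illustration only\n",
      "docs = ['win cash now', 'cheap pills win prize',\n",
      "        'meeting at noon', 'lunch with the team tomorrow']\n",
      "labels = ['spam', 'spam', 'ham', 'ham']\n",
      "model = train_nb(docs, labels)\n",
      "print(predict_nb(model, 'win a cash prize'))  # spam\n",
      "```\n",
      "\n",
      "Summing log probabilities instead of multiplying raw probabilities avoids floating-point underflow on long messages; scikit-learn's `MultinomialNB` implements the same model if you prefer a library version."
     ]
    },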
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### Reading and Preprocessing the Data\n",
      "<p>The data used in this module comes from the <a href=\"http://csmining.org/index.php/spam-email-datasets-.html\">CSDMC2010 SPAM corpus</a>. If you want to follow along with your own data, or modify the examples or the data, first do the following in a Python environment:<br>\n",
      "<ul>\n",
      "    <li>Download and unzip the <a href=\"http://csmining.org/index.php/spam-email-datasets-.html?file=tl_files/Project_Datasets/task2/CSDMC2010_SPAM.zip\">data</a>.</li>\n",
      "    <li>Run the <code>ExtractContent.py</code> script to extract the subject and body from each email file. Note: to make your SPAM classifier even better, you can modify this script to extract more than just the subject and body.</li>\n",
      "</ul>\n",
      "\n",
      "\n",
      "</p>\n"
     ]
    },
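    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "If you would rather stay inside Python than run a separate script, the extraction step can be approximated with the standard-library `email` package. This is a minimal sketch, not the contents of `ExtractContent.py`: the helper names `extract_subject_and_body` and `load_emails` are hypothetical, it assumes each file in the directory holds one raw RFC 822 message, and it prefers a `text/plain` body, falling back to `text/html`.\n",
      "\n",
      "```python\n",
      "import email\n",
      "import os\n",
      "from email import policy\n",
      "\n",
      "\n",
      "def extract_subject_and_body(raw_bytes):\n",
      "    # policy.default makes the parser return an EmailMessage with get_body()\n",
      "    msg = email.message_from_bytes(raw_bytes, policy=policy.default)\n",
      "    subject = str(msg['subject'] or '')\n",
      "    # Pick the plain-text part if there is one, otherwise the HTML part\n",
      "    part = msg.get_body(preferencelist=('plain', 'html'))\n",
      "    body = part.get_content() if part is not None else ''\n",
      "    return subject, body\n",
      "\n",
      "\n",
      "def load_emails(directory):\n",
      "    # Yield (filename, subject, body) for every file in a directory of raw emails\n",
      "    for name in sorted(os.listdir(directory)):\n",
      "        with open(os.path.join(directory, name), 'rb') as f:\n",
      "            subject, body = extract_subject_and_body(f.read())\n",
      "        yield name, subject, body\n",
      "\n",
      "\n",
      "raw = b'Subject: Win a prize\\r\\nContent-Type: text/plain\\r\\n\\r\\nClick here now.\\r\\n'\n",
      "subject, body = extract_subject_and_body(raw)\n",
      "print(subject)  # Win a prize\n",
      "```\n",
      "\n",
      "This sketch requires Python 3.6 or later, where `message_from_bytes` with `policy.default` returns an `EmailMessage` that provides `get_body()`."
     ]
    },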
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [],
     "language": "python",
     "metadata": {},
     "outputs": []
    }
   ],
   "metadata": {}
  }
 ]
}