{
 "metadata": {
  "name": "",
  "signature": "sha256:cdc350eb32b77a3e1a97d483c035094fc3c440161bc7d2e9c53af0c1d3781e90"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now, it's time to do some deep learning. Recall that a RBM (Restricted Boltzmann Machine) takes in a binary vector, and you can use it in the SKLearn Pipeline with a classifier. You'll want to read in Oscar's original data as an array of bitwise note vectors, and from there build the RBM to predict chords (the y's, perhaps build the chord bank and assign a unique number to each). After that, given a note vector (maybe plural?), you should be able to predict the chords for a note (notes?).\n",
      "\n",
      "This is for Oscar's musical data. The next step is to do the classification for your n-gram model."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from collections import defaultdict\n",
      "from sklearn.neural_network import BernoulliRBM\n",
      "import pandas as pd\n",
      "import numpy as np"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 28
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Extract chords into unique ids, e.g. 1, 2, 3, 4, 5\n",
      "allchords = defaultdict()\n",
      "with open(\"oscar2chords_extract.txt\", 'rb') as f:\n",
      "    assert len(allchords) == len(set(allchords)) # ensure no duplicate chords\n",
      "    for ix, line in enumerate(f):\n",
      "        allchords[ix] = line.rstrip()\n",
      "print len(allchords) # oscar uses this number of unique chords"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "40\n"
       ]
      }
     ],
     "prompt_number": 54
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Read in Oscar's data.\n",
      "vectors = []\n",
      "notedata = pd.read_csv(open(\"oscar2notes.txt\", 'rb'), skiprows=2)\n",
      "allnotes = []\n",
      "for note, octave in zip(notedata[\"Note/Rest\"], notedata[\"Octave\"]):\n",
      "    allnotes.append(\"%s%s\" % (note, octave))\n",
      "\n",
      "print \"Number of notes (# of samples for RBM): \", len(notedata)\n",
      "notedata.head()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Number of notes (# of samples for RBM):  1344\n"
       ]
      },
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>Note/Rest</th>\n",
        "      <th>Octave</th>\n",
        "      <th>Len</th>\n",
        "      <th>Offset</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td> B</td>\n",
        "      <td> 3</td>\n",
        "      <td> 0.500000</td>\n",
        "      <td> 12.625</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td> A</td>\n",
        "      <td> 5</td>\n",
        "      <td> 0.250000</td>\n",
        "      <td> 15.000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td> F</td>\n",
        "      <td> 4</td>\n",
        "      <td> 3.125000</td>\n",
        "      <td> 16.000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>3</th>\n",
        "      <td> G</td>\n",
        "      <td> 4</td>\n",
        "      <td> 0.666667</td>\n",
        "      <td> 20.625</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4</th>\n",
        "      <td> F</td>\n",
        "      <td> 4</td>\n",
        "      <td> 1.250000</td>\n",
        "      <td> 23.875</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "<p>5 rows \u00d7 4 columns</p>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 52,
       "text": [
        "  Note/Rest  Octave       Len  Offset\n",
        "0         B       3  0.500000  12.625\n",
        "1         A       5  0.250000  15.000\n",
        "2         F       4  3.125000  16.000\n",
        "3         G       4  0.666667  20.625\n",
        "4         F       4  1.250000  23.875\n",
        "\n",
        "[5 rows x 4 columns]"
       ]
      }
     ],
     "prompt_number": 52
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Given a MUSIC21 note, such as C5 or D#7, convert it\n",
      "# into a note on the keyboard between 0 and 87 inclusive.\n",
      "# Don't convert it for mingus; try to use music21 note style\n",
      "# as much as possible for all this stuff.\n",
      "def quantify(note):\n",
      "    notevals = {\n",
      "        'C' : 0,\n",
      "        'D' : 2,\n",
      "        'E' : 4,\n",
      "        'F' : 5,\n",
      "        'G' : 7,\n",
      "        'A' : 9,\n",
      "        'B' : 11\n",
      "    }\n",
      "    quantized = 0\n",
      "    octave = int(note[-1]) - 1\n",
      "    for i in note[:-1]:\n",
      "        if i in notevals: quantized += notevals[i]\n",
      "        if i == '-': quantized -= 1\n",
      "        if i == '#': quantized += 1\n",
      "    quantized += 12 * octave\n",
      "    return quantized\n",
      "\n",
      "# Create bitwise note vectors for use with Restricted Boltzmann Machine.\n",
      "vectors = np.zeros((1, 88))\n",
      "for ix, note in enumerate(allnotes):\n",
      "    vect = np.zeros((1, 88))\n",
      "    vect[0, quantify(note)] = 1\n",
      "    if ix == 0:\n",
      "        vectors = vect\n",
      "    else:\n",
      "        vectors = np.vstack((vectors, vect))\n",
      "print vectors.shape"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "(1344, 88)\n"
       ]
      }
     ],
     "prompt_number": 48
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "See notes on what you should actually do.\n",
      "\n",
      "1. Annotate Oscar's chord data so you have a notes vector for each chord listing the notes that go well with it.\n",
      "2. Move onto each cluster; for each cluster, build a vector covering all of its notes.\n",
      "3. You need a training and a test set, so create those somehow.\n",
      "4. Stack RBM with Pipeline to build a deep belief network, ending with Logistic Regression. Use this to classify and train accordingly."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [],
     "language": "python",
     "metadata": {},
     "outputs": []
    }
   ],
   "metadata": {}
  }
 ]
}