{ "metadata": { "name": "", "signature": "sha256:ce31f25fcd873d7b36f72cb59a4144a14c92dcaee1a11479b4f8a643bf6ae913" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Now that you've done a decent job of getting the basic n-gram model down, you'll want to work on the offsets so the output actually sounds thoughtful, and not just like a long stream of notes. To do this, import the data (experiment first with a small number of notes, say 100) and then run k-means clustering to find patterns. Actually, it might be useful to plot the data first; that way you get an idea of how far apart the notes are from one another.\n", "\n", "For the offsets, you'll assume that notes only come at 0.25, 0.50, etc. instead of weird numbers like 0.682 or 1.537. This follows from the rounding done in the N-Gram notebook." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%matplotlib inline\n", "\n", "from collections import Counter, defaultdict\n", "from sklearn.cluster import KMeans\n", "from itertools import izip\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import numpy as np\n", "import sys, copy, random" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "# Read in the data and generate the offsets.\n", "# It's okay to generate offsets here since these are trigram notes.\n", "# Recall: len = how long the note lasts, offset = when it's hit (e.g. 
stopwatch).\n", "numberofitems = 12345678910112\n", "data = pd.read_csv('./oscar2trigrams.txt', names=['Note','Len'])\n", "\n", "# Toggle if you want to limit the data to the first k items.\n", "numberofitems = 50\n", "data = data[:numberofitems]\n", "\n", "# Re-add offsets: each note starts where the previous one ended.\n", "totaloffset = 0\n", "offsets = []\n", "for i in data['Len']:\n", "    offsets.append(totaloffset)\n", "    totaloffset += i\n", "data[\"Offset\"] = pd.Series(offsets)\n", "print \"Shape: (%s,%s)\" % (data.shape)\n", "data.head()" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Shape: (50,3)\n" ] }, { "html": [ "
\n", " | Note | Len | Offset\n",
"---|---|---|---\n",
"0 | D5 | 0.25 | 0.00\n",
"1 | E4 | 0.25 | 0.25\n",
"2 | C#5 | 0.50 | 0.50\n",
"3 | A5 | 0.50 | 1.00\n",
"4 | F4 | 0.50 | 1.50\n",
"\n",
"5 rows \u00d7 3 columns\n", "