{ "metadata": { "name": "", "signature": "sha256:c53c2421915fb11982095f0ea8770699016407d3a8c5152fb524119381575b0e" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "This is a continuation of The N-Gram Pipeline, Part I, in which you simply read in Oscar's playing, generated the N-grams, and wrote them out to a file. Here, you read them back in (the pipeline is split across multiple notebooks because of memory issues) and run clustering algorithms for full playback." ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "I. General Clustering with K-Means" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Corresponds to \"3. Offset Clustering.\" Start by reading in the data and plotting it with Matplotlib to show length vs. offset. Then plot it again, this time running the K-Means clustering algorithm and color-coding each point by its cluster to show the different clusters in the data." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%matplotlib inline\n", "\n", "from collections import Counter, defaultdict\n", "from sklearn.cluster import KMeans, Ward\n", "from itertools import izip, groupby\n", "from mingus.midi import fluidsynth\n", "from mingus.containers.Bar import Bar\n", "import mingus.core.value as value\n", "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import numpy as np\n", "import sys, copy, random, re, cPickle\n", "\n", "# Initialize FluidSynth with the FluidR3 General MIDI soundfont and the ALSA audio driver\n", "fluidsynth.init('/usr/share/sounds/sf2/FluidR3_GM.sf2', \"alsa\")" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 22, "text": [ "True" ] } ], "prompt_number": 22 }, { "cell_type": "code", "collapsed": false, "input": [ "# Read in the data and generate the offsets.\n", "# It's okay to generate offsets here since these are trigram notes.\n", "# Recall: Len = how long the note lasts; Offset = when it's hit (like a stopwatch reading).\n",
 "numberofitems = 12345678910112 # some huge placeholder\n", "data = pd.read_csv('./oscar2ngrams.txt', names=['Note', 'Len'])\n", "\n", "# Uncomment to limit to the first k items\n", "# (the original MIDI file has 1078 notes)\n", "# numberofitems = 1078\n", "# data = data[:numberofitems]\n", "\n", "# Re-add offsets: each note's offset is the running total\n", "# of the lengths of all the notes before it\n", "totaloffset = 0\n", "offsets = []\n", "for length in data['Len']:\n", "    offsets.append(totaloffset)\n", "    totaloffset += length\n", "data[\"Offset\"] = pd.Series(offsets)\n", "data.head()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
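\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr>\n",
"      <th></th>\n",
"      <th>Note</th>\n",
"      <th>Len</th>\n",
"      <th>Offset</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>D5</td>\n",
"      <td>0.75</td>\n",
"      <td>0.00</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>C#5</td>\n",
"      <td>0.25</td>\n",
"      <td>0.75</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>2</th>\n",
"      <td>A5</td>\n",
"      <td>0.50</td>\n",
"      <td>1.00</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>3</th>\n",
"      <td>D5</td>\n",
"      <td>0.75</td>\n",
"      <td>1.50</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>4</th>\n",
"      <td>B-4</td>\n",
"      <td>0.25</td>\n",
"      <td>2.25</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>5 rows \u00d7 3 columns</p>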
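\n", "\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr>\n",
"      <th></th>\n",
"      <th>FullName</th>\n",
"      <th>CommonName</th>\n",
"      <th>Len</th>\n",
"      <th>Offset</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>Chord {D in octave 5 | C in octave 4 | E in oc...</td>\n",
"      <td>A6-perfect-fourth minor tetrachord</td>\n",
"      <td>1.125000</td>\n",
"      <td>8.000</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>2</th>\n",
"      <td>Chord {A in octave 3 | G in octave 3 | E in oc...</td>\n",
"      <td>A3-incomplete dominant-seventh chord</td>\n",
"      <td>1.250000</td>\n",
"      <td>8.000</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>3</th>\n",
"      <td>Chord {E in octave 6 | E in octave 4 | D in oc...</td>\n",
"      <td>D6-quartal trichord</td>\n",
"      <td>1.375000</td>\n",
"      <td>9.625</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>4</th>\n",
"      <td>Chord {C in octave 4 | A in octave 5} Dotted Q...</td>\n",
"      <td>A5-interval class 3</td>\n",
"      <td>1.500000</td>\n",
"      <td>9.625</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>5</th>\n",
"      <td>Chord {G in octave 3 | A in octave 3} Quarter ...</td>\n",
"      <td>A3-interval class 2</td>\n",
"      <td>1.666667</td>\n",
"      <td>9.625</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>5 rows \u00d7 4 columns</p>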
\n", "