{ "metadata": { "name": "", "signature": "sha256:783e03c8dccbe88c733ae4e2e9f5599a0fc64088161f4e1749f2001da438a6ad" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "K-Means Clustering" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sometimes, an unsupervised learning technique is preferred. Perhaps you do not have access to adequate training data, or perhaps the training data's labels are not completely clear. Maybe you just want to quickly sort real-world, unseen, data into groups based on its feature similarity.\n", "\n", "In such cases, clustering is a great option!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's try clustering with a familiar bunch of audio files and code.\n", "\n", "Download `simpleLoop.wav`, and play it." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from urllib import urlretrieve\n", "urlretrieve('https://ccrma.stanford.edu/workshops/mir2014/audio/simpleLoop.wav', filename='simpleLoop.wav')\n", "\n", "from IPython.display import Audio\n", "Audio('simpleLoop.wav')" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "\n", " \n", " " ], "metadata": {}, "output_type": "pyout", "prompt_number": 2, "text": [ "" ] } ], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load the audio file into an array." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from essentia.standard import MonoLoader\n", "simple_loop = MonoLoader(filename='simpleLoop.wav')()" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Scale the features (using the scale function) from -1 to 1. 
(See Lab 2 if you need a reminder.) The cell below first computes frame-wise MFCCs so that there is a feature matrix to scale; the frame and hop sizes are assumed values, not taken from the original lab." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import numpy\n", "from essentia.standard import Windowing, Spectrum, MFCC, FrameGenerator\n", "\n", "# One 13-coefficient MFCC vector per frame (frameSize/hopSize are assumed values).\n", "w, spectrum, mfcc = Windowing(type='hann'), Spectrum(), MFCC()\n", "mfccs = numpy.array([mfcc(spectrum(w(frame)))[1]\n", "                     for frame in FrameGenerator(simple_loop, frameSize=1024, hopSize=512)])\n", "\n", "from sklearn import preprocessing\n", "min_max_scaler = preprocessing.MinMaxScaler(feature_range=(-1, 1))\n", "mfccs_scaled = min_max_scaler.fit_transform(mfccs)\n", "print mfccs.shape\n", "print mfccs_scaled.shape" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "(260, 13)\n", "(260, 13)\n" ] } ], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's cluster time! We're using scikit-learn's implementation of the k-means algorithm, `sklearn.cluster.KMeans`.\n", "\n", "Use the k-means algorithm to create clusters of your features. k-means will output 2 things of interest to you:\n", "\n", "(1) The center points of the clusters (`kmeans.cluster_centers_`). You can use the coordinates of a cluster's center to measure the distance of any point from that center. This not only provides you with a distance metric of how \"good\" a fit a point is for a given cluster, but it also allows you to sort the points in a cluster by how close they are to the center! Quite useful.\n", "\n", "(2) Each point will be assigned a label, or cluster number (`kmeans.labels_`, also returned by `fit_predict`). You can then use these labels to produce a transcription, do creative stuff, or further train another downstream classifier.\n", "\n", "`KMeans` runs batch k-means: each row of the data matrix is one point, and the algorithm moves the centers to minimize the sum of squared distances from each point to its assigned center. The parameters you are most likely to touch (see `help(KMeans)` for the rest):\n", "\n", "- `n_clusters`: the number of clusters (and therefore centers) to find.\n", "- `n_init`: how many times to rerun the algorithm from different random initial centers, keeping the best result.\n", "- `max_iter`: the maximum number of iterations per run.\n", "- `tol`: the convergence tolerance; a run stops early once the centers move less than this between successive iterations.\n", "\n", "Now, simply put, here is how you use it:\n", "\n", "    # choose the number of clusters k that you want to find\n", "    from sklearn.cluster import KMeans\n", "    kmeans = KMeans(n_clusters=2)\n", "    labels = kmeans.fit_predict(your_feature_data_matrix)\n", "\n", "    # Output:\n", "    # kmeans.cluster_centers_ contains the center coordinates of the clusters - we can\n", "    # use these to calculate the distance from each point to its cluster center.\n", "    # labels contains the assigned cluster number for each point in your feature\n", "    # matrix (from 0 to k-1)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from sklearn.cluster import KMeans\n", "kmeans = KMeans(n_clusters=2)\n", "labels = kmeans.fit_predict(mfccs_scaled)\n", "print labels" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n", " 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1\n", " 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n", " 0]\n" ] } ], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Write a script to list which audio slices (or audio files) were categorized as Cluster #1. Do the same for Cluster #2. Do the clusters make sense? Now, modify the script to play the audio slices that are in each cluster - listening to the clusters will help us build intuition about what's in each cluster.\n", "\n", "Repeat this clustering (steps 3-7), and listen to the contents of the clusters, with CongaGroove-mono.wav.\n", "\n", "Repeat this clustering (steps 3-7) using the CongaGroove and 3 clusters. Listen to the results. Try again with 4 clusters. Listen to the results. (etc, etc\u2026)\n", "\n", "Once you complete this, try out some of the many, many other audio loops in our collection. (Located in audio\\Miscellaneous Loops Samples and SFX)" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "MFCCs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's add MFCCs to the mix. 
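\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a sketch of the 14-dimensional per-onset feature vector described next - a hypothetical helper, not code from the lab; the per-frame MFCCs, zero-crossing rate, and spectral centroid are assumed to be computed elsewhere (e.g. with essentia):" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import numpy\n", "\n", "def combine_features(mfcc_frames, zcr_value, centroid_value):\n", "    # mfcc_frames: (n_frames, 13) array of per-frame MFCCs for one onset slice.\n", "    # Mean over frames, drop the 0th coefficient, then append ZCR and centroid.\n", "    mfcc_means = mfcc_frames.mean(axis=0)[1:13]\n", "    return numpy.concatenate([mfcc_means, [zcr_value], [centroid_value]])\n", "\n", "print combine_features(numpy.ones((4, 13)), 0.1, 500.0).shape" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "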
Extract the mean of the 12 MFCCs (coefficients 1-12; do not use the \"0th\" coefficient) for each onset using the code that you wrote. Add those to the feature vectors, along with zero crossing and centroid. We should now have 14 features being extracted - this is starting to get \"real world\"! With this simple example (and limited collection of audio slices), you probably won't notice a difference - but at least it didn't break, right? Let's try it with some other audio to truly appreciate the power of timbral clustering." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "BONUS (ONLY IF YOU HAVE EXTRA TIME\u2026)\n", "Now that we can take ANY LOOP, onset detect, feature extract, and cluster it, let's have some fun.\n", "Choose any audio file from our collection and use the above techniques to break it up into clusters.\n", "Listen to those clusters.\n", "\n", "Some rules of thumb: since you need to pick the number of clusters ahead of time, listen to your audio files first.\n", "A drum kit or percussion loop will usually segment well with 3-6 clusters. More is OK too.\n", "Musical loops: 3-6 clusters should work nicely.\n", "Songs: they need lots of clusters to segment well. Try 'em out!\n", "\n", "BONUS (ONLY IF YOU REALLY HAVE EXTRA TIME\u2026)\n", "Review your script that PLAYs all of the audio files that were categorized as Cluster #1 or Cluster #2.\n", "Now, modify your script to play and plot the audio files which are closest to the centers of your clusters.\n", "\n", "This hopefully shows you which files are most representative of each cluster." ] } ], "metadata": {} } ] }