{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Blind Source Separation with the Shogun Machine Learning Toolbox" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### By Kevin Hughes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook illustrates Blind Source Seperation(BSS) on audio signals using Independent Component Analysis (ICA) in Shogun. We generate a mixed signal and try to seperate it out using Shogun's implementation of ICA & BSS called JADE." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "My favorite example of this problem is known as the *cocktail party problem* where a number of people are talking simultaneously and we want to separate each persons speech so we can listen to it separately. Now the caveat with this type of approach is that we need as many mixtures as we have source signals or in terms of the cocktail party problem we need as many microphones as people talking in the room." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's get started, this example is going to be in python and the first thing we are going to need to do is load some audio files. To make things a bit easier further on in this example I'm going to wrap the basic scipy wav file reader and add some additional functionality. First I added a case to handle converting stereo wav files back into mono wav files and secondly this loader takes a desired sample rate and resamples the input to match. This is important because when we mix the two audio signals they need to have the same sample rate. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import os\n", "SHOGUN_DATA_DIR=os.getenv('SHOGUN_DATA_DIR', '../../../data')\n", "from scipy.io import wavfile\n", "from scipy.signal import resample\n", "import shogun as sg\n", "\n", "def load_wav(filename,samplerate=44100):\n", " \n", " # load file\n", " rate, data = wavfile.read(filename)\n", "\n", " # convert stereo to mono\n", " if len(data.shape) > 1:\n", " data = data[:,0]/2 + data[:,1]/2\n", "\n", " # re-interpolate samplerate \n", " ratio = float(samplerate) / float(rate)\n", " data = resample(data, int(len(data) * ratio))\n", " \n", " return samplerate, data.astype(np.int16)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next we're going to need a way to play the audio files we're working with (otherwise this wouldn't be very exciting at all would it?). In the next bit of code I've defined a wavPlayer class that takes the signal and the sample rate and then creates a nice HTML5 webplayer right inline with the notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from IPython.display import Audio\n", "from IPython.display import display\n", "def wavPlayer(data, rate):\n", " display(Audio(data, rate=rate))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we can load and play wav files we actually need some wav files! I found the sounds from Starcraft to be a great source of wav files because they're short, interesting and remind me of my childhood. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "Now that we can load and play wav files, we actually need some wav files! I found the sounds from Starcraft to be a great source of wav files because they're short, interesting, and remind me of my childhood. You can download Starcraft wav files here: http://wavs.unclebubby.com/computer/starcraft/ (among other places on the web), or grab them from your Starcraft install directory (come on, I know it's still there).\n", "\n", "Another good source of data (although, let's be honest, a less cool one) is ICA Central and various other more academic data sets: http://perso.telecom-paristech.fr/~cardoso/icacentral/base_multi.html. Note that for many of these data sets the data is already mixed, so you'll be able to skip the next few steps." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Okay, let's load up an audio file. I chose the Terran Battlecruiser saying \"Good day, commander.\" In addition to creating a wavPlayer, I also plotted the data using Matplotlib (and tried my best to have the graph length match the HTML player length). Have a listen!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# change to the shogun-data directory\n", "import os\n", "os.chdir(os.path.join(SHOGUN_DATA_DIR, 'ica'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "\n", "# load\n", "fs1,s1 = load_wav('tbawht02.wav') # Terran Battlecruiser - \"Good day, commander.\"\n", "\n", "# plot\n", "plt.figure(figsize=(6.75,2))\n", "plt.plot(s1)\n", "plt.title('Signal 1')\n", "plt.show()\n", "\n", "# player\n", "wavPlayer(s1, fs1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's load a second audio clip:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# load\n", "fs2,s2 = load_wav('TMaRdy00.wav') # Terran Marine - \"You want a piece of me, boy?\"\n", "\n", "# plot\n", "plt.figure(figsize=(6.75,2))\n", "plt.plot(s2)\n", "plt.title('Signal 2')\n", "plt.show()\n", "\n", "# player\n", "wavPlayer(s2, fs2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and a third audio clip:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# load\n", "fs3,s3 = load_wav('PZeRdy00.wav') # Protoss Zealot - \"My life for Aiur!\"\n", "\n", "# plot\n", "plt.figure(figsize=(6.75,2))\n", "plt.plot(s3)\n", "plt.title('Signal 3')\n", "plt.show()\n", "\n", "# player\n", "wavPlayer(s3, fs3)" ] },
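{ "cell_type": "markdown", "metadata": {}, "source": [ "A quick, optional check before we mix anything: load_wav resampled every clip to a common rate, so the three sample rates should agree, while the clip lengths can still differ; we deal with the differing lengths next." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# optional: verify the three clips share a sample rate (their lengths may differ)\n", "print('sample rates:', fs1, fs2, fs3)\n", "print('clip lengths:', len(s1), len(s2), len(s3))\n", "assert fs1 == fs2 == fs3" ] },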
{ "cell_type": "markdown", "metadata": {}, "source": [ "Now we've got our audio files loaded into our example program. The next thing we need to do is mix them together!\n", "\n", "First, another nuance: what if the audio clips aren't the same length? The solution I came up with is to simply pad them all to the length of the longest signal; the extra samples are just zeros, so they won't affect the sound.\n", "\n", "The signals are mixed by creating a mixing matrix $A$ and taking the dot product of $A$ with the source signals $S$.\n", "\n", "Afterwards I plot the mixed signals and create the wavPlayers; have a listen!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Adjust for different clip lengths\n", "fs = fs1\n", "length = max([len(s1), len(s2), len(s3)])\n", "\n", "# pad the shorter clips with zeros so all three have the same length\n", "s1 = np.pad(s1, (0, length - len(s1)), mode='constant')\n", "s2 = np.pad(s2, (0, length - len(s2)), mode='constant')\n", "s3 = np.pad(s3, (0, length - len(s3)), mode='constant')\n", "\n", "# stack the sources into a (3 x num_samples) matrix\n", "S = (np.c_[s1, s2, s3]).T\n", "\n", "# Mixing Matrix\n", "#A = np.random.uniform(size=(3,3))\n", "#A = A / A.sum(axis=0)\n", "A = np.array([[1, 0.5, 0.5],\n", "              [0.5, 1, 0.5],\n", "              [0.5, 0.5, 1]])\n", "print('Mixing Matrix:')\n", "print(A.round(2))\n", "\n", "# Mix Signals\n", "X = np.dot(A,S)\n", "\n", "# plot and play each mixed signal\n", "for i in range(X.shape[0]):\n", "    plt.figure(figsize=(6.75,2))\n", "    plt.plot((X[i]).astype(np.int16))\n", "    plt.title('Mixed Signal %d' % (i+1))\n", "    plt.show()\n", "    wavPlayer((X[i]).astype(np.int16), fs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, before we can work on separating these signals, we need to get the data ready for Shogun. Thankfully this is pretty easy!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Convert the mixed signals to Shogun features\n", "mixed_signals = sg.create_features((X).astype(np.float64))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's unmix those signals!\n", "\n", "In this example I'm going to use an Independent Component Analysis (ICA) algorithm called JADE. JADE is one of the ICA algorithms available in Shogun and it works by performing Approximate Joint Diagonalization (AJD) on a 4th-order cumulant tensor. I'm not going to go into a lot of detail on how JADE works behind the scenes, but here is the reference for the original paper:\n", "\n", "Cardoso, J. F., & Souloumiac, A. (1993). Blind beamforming for non-Gaussian signals. In IEE Proceedings F (Radar and Signal Processing) (Vol. 140, No. 6, pp. 362-370). IET Digital Library.\n", "\n", "Shogun also has several other ICA algorithms, including the Second Order Blind Identification (SOBI) algorithm, FFSep, JediSep, UWedgeSep and FastICA. All of the algorithms inherit from the ICAConverter base class and share some common methods for setting an initial guess for the mixing matrix, retrieving the final mixing matrix, and getting/setting the number of iterations to run and the desired convergence tolerance. Some of the algorithms have additional getters for intermediate calculations; for example, Jade has a method for returning the 4th-order cumulant tensor, while the \"Sep\" algorithms have a getter for the time-lagged covariance matrices. Check out the source code on GitHub (https://github.com/shogun-toolbox/shogun) or the Shogun docs (http://www.shogun-toolbox.org/doc/en/latest/annotated.html) for more details!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Separating with JADE\n", "jade = sg.create_transformer('Jade')\n", "jade.fit(mixed_signals)\n", "signals = jade.transform(mixed_signals)\n", "\n", "S_ = signals.get('feature_matrix')\n", "\n", "# normalize the columns of the estimated mixing matrix for comparison with A\n", "# (ICA recovers the mixing matrix only up to permutation and scaling)\n", "A_ = jade.get('mixing_matrix')\n", "A_ = A_ / A_.sum(axis=0)\n", "print('Estimated Mixing Matrix:')\n", "print(A_)" ] },
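{ "cell_type": "markdown", "metadata": {}, "source": [ "Keep in mind that ICA recovers the sources only up to permutation and scaling, so the i-th separated signal doesn't have to correspond to the i-th original clip. As an optional rough check, we can match each separated signal to the original source it is most strongly correlated with." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# optional rough check: match each separated signal to the original source\n", "# with the highest absolute correlation\n", "for i in range(S_.shape[0]):\n", "    corrs = [abs(np.corrcoef(S_[i], S[j].astype(np.float64))[0, 1]) for j in range(S.shape[0])]\n", "    best = int(np.argmax(corrs))\n", "    print('Separated Signal %d best matches original Signal %d (|corr| = %.2f)' % (i+1, best+1, corrs[best]))" ] },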
{ "cell_type": "markdown", "metadata": {}, "source": [ "That's all there is to it! Check out how nicely those signals have been separated, and have a listen!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Show separation results\n", "\n", "# the separated signals come back at an arbitrary scale, so apply a gain\n", "# to bring them into an audible int16 range before plotting and playing\n", "gain = 4000\n", "for i in range(S_.shape[0]):\n", "    plt.figure(figsize=(6.75,2))\n", "    plt.plot((gain*S_[i]).astype(np.int16))\n", "    plt.title('Separated Signal %d' % (i+1))\n", "    plt.show()\n", "    wavPlayer((gain*S_[i]).astype(np.int16), fs)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "BSS isn't only useful for working with audio; it is also useful for image processing and for pre-processing other forms of high-dimensional data. Have a google for ICA and machine learning if you want to learn more!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 1 }