{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Source Separation with Sparsity\n", "===============================\n", "\n", "*Important:* Please read the [installation page](http://gpeyre.github.io/numerical-tours/installation_python/) for details about how to install the toolboxes.\n", "$\\newcommand{\\dotp}[2]{\\langle #1, #2 \\rangle}$\n", "$\\newcommand{\\enscond}[2]{\\lbrace #1, #2 \\rbrace}$\n", "$\\newcommand{\\pd}[2]{ \\frac{ \\partial #1}{\\partial #2} }$\n", "$\\newcommand{\\umin}[1]{\\underset{#1}{\\min}\\;}$\n", "$\\newcommand{\\umax}[1]{\\underset{#1}{\\max}\\;}$\n", "$\\newcommand{\\umin}[1]{\\underset{#1}{\\min}\\;}$\n", "$\\newcommand{\\uargmin}[1]{\\underset{#1}{argmin}\\;}$\n", "$\\newcommand{\\norm}[1]{\\|#1\\|}$\n", "$\\newcommand{\\abs}[1]{\\left|#1\\right|}$\n", "$\\newcommand{\\choice}[1]{ \\left\\{ \\begin{array}{l} #1 \\end{array} \\right. }$\n", "$\\newcommand{\\pa}[1]{\\left(#1\\right)}$\n", "$\\newcommand{\\diag}[1]{{diag}\\left( #1 \\right)}$\n", "$\\newcommand{\\qandq}{\\quad\\text{and}\\quad}$\n", "$\\newcommand{\\qwhereq}{\\quad\\text{where}\\quad}$\n", "$\\newcommand{\\qifq}{ \\quad \\text{if} \\quad }$\n", "$\\newcommand{\\qarrq}{ \\quad \\Longrightarrow \\quad }$\n", "$\\newcommand{\\ZZ}{\\mathbb{Z}}$\n", "$\\newcommand{\\CC}{\\mathbb{C}}$\n", "$\\newcommand{\\RR}{\\mathbb{R}}$\n", "$\\newcommand{\\EE}{\\mathbb{E}}$\n", "$\\newcommand{\\Zz}{\\mathcal{Z}}$\n", "$\\newcommand{\\Ww}{\\mathcal{W}}$\n", "$\\newcommand{\\Vv}{\\mathcal{V}}$\n", "$\\newcommand{\\Nn}{\\mathcal{N}}$\n", "$\\newcommand{\\NN}{\\mathcal{N}}$\n", "$\\newcommand{\\Hh}{\\mathcal{H}}$\n", "$\\newcommand{\\Bb}{\\mathcal{B}}$\n", "$\\newcommand{\\Ee}{\\mathcal{E}}$\n", "$\\newcommand{\\Cc}{\\mathcal{C}}$\n", "$\\newcommand{\\Gg}{\\mathcal{G}}$\n", "$\\newcommand{\\Ss}{\\mathcal{S}}$\n", "$\\newcommand{\\Pp}{\\mathcal{P}}$\n", "$\\newcommand{\\Ff}{\\mathcal{F}}$\n", "$\\newcommand{\\Xx}{\\mathcal{X}}$\n", "$\\newcommand{\\Mm}{\\mathcal{M}}$\n", "$\\newcommand{\\Ii}{\\mathcal{I}}$\n", "$\\newcommand{\\Dd}{\\mathcal{D}}$\n", "$\\newcommand{\\Ll}{\\mathcal{L}}$\n", "$\\newcommand{\\Tt}{\\mathcal{T}}$\n", "$\\newcommand{\\si}{\\sigma}$\n", "$\\newcommand{\\al}{\\alpha}$\n", "$\\newcommand{\\la}{\\lambda}$\n", "$\\newcommand{\\ga}{\\gamma}$\n", "$\\newcommand{\\Ga}{\\Gamma}$\n", "$\\newcommand{\\La}{\\Lambda}$\n", "$\\newcommand{\\si}{\\sigma}$\n", "$\\newcommand{\\Si}{\\Sigma}$\n", "$\\newcommand{\\be}{\\beta}$\n", "$\\newcommand{\\de}{\\delta}$\n", "$\\newcommand{\\De}{\\Delta}$\n", "$\\newcommand{\\phi}{\\varphi}$\n", "$\\newcommand{\\th}{\\theta}$\n", "$\\newcommand{\\om}{\\omega}$\n", "$\\newcommand{\\Om}{\\Omega}$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This numerical tour explore local Fourier analysis of sounds, and its\n", "application to source separation from stereo measurements." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "using PyPlot\n", "using NtToolBox\n", "using Autoreload\n", "using WAV\n", "reload(\"NtToolBox\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sound Mixing\n", "------------\n", "We load 3 sounds and simulate a stero recording by performing a linear\n", "blending of the sounds.\n", "\n", "Sound loading." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n = 1024*16\n", "s = 3 #number of sounds\n", "p = 2 #number of micros\n", "\n", "x = zeros(n,3)\n", "x[:,1] = load_sound(\"NtToolbox/src/data/bird.wav\",n)\n", "x[:,2] = load_sound(\"NtToolbox/src/data/female.wav\",n)\n", "x[:,3] = load_sound(\"NtToolbox/src/data/male.wav\",n);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Normalize the energy of the signals." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "scrolled": true }, "outputs": [], "source": [ "x = x./repeat(std(x,1), outer=(n,1));" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We mix the sound using a $2\\mathrm{x}3$ transformation matrix.\n", "Here the direction are well-spaced, but you can try with more complicated\n", "mixing matrices.\n", "\n", "Compute the mixing matrix" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "theta = Array(linspace(0, pi, s + 1)); theta = theta[1:3]\n", "theta[1] = 0.2\n", "M = vcat(cos(theta)', sin(theta)');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compute the mixed sources." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "y = x*M';" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Display of the sounds and their mix." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "figure(figsize = (10,10))\n", "\n", "for i in 1:s\n", " subplot(s, 1, i)\n", " plot(x[:, i])\n", " xlim(0,n)\n", " title(\"Source #$i\")\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Display of the micro output." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "figure(figsize = (10,7))\n", " \n", "for i in 1:p\n", " subplot(p, 1, i)\n", " plot(y[:, i])\n", " xlim(0,n)\n", " title(\"Micro #$i\")\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Local Fourier analysis of sound.\n", "--------------------------------\n", "In order to perform the separation, one performs a local Fourier analysis\n", "of the sound. The hope is that the sources will be well-separated over\n", "the Fourier domain because the sources are sparse after a STFT.\n", "\n", "\n", "\n", "\n", "First set up parameters for the STFT." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "scrolled": true }, "outputs": [], "source": [ "w = 128 #size of the window\n", "q = Base.div(w,4); #overlap of the window" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compute the STFT of the sources." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X = complex(zeros(w,4*w+1,s))\n", "Y = complex(zeros(w,4*w+1,p))\n", "\n", "for i in 1:s\n", " X[:,:,i] = perform_stft(x[:,i],w,q,n)\n", " figure(figsize = (15,10))\n", " plot_spectrogram(X[:,:,i],\"Source #$i\")\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__Exercise 1__\n", "\n", "Compute the STFT of the micros, and store them into a matrix |Y|." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#run -i nt_solutions/audio_2_separation/exo1\n", "include(\"NtSolutions/audio_2_separation/exo1.jl\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Insert your code here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Estimation of Mixing Direction by Clustering\n", "--------------------------------------------\n", "Since the sources are quite sparse over the Fourier plane, the directions\n", "are well estimated by looking as the direction emerging from a point\n", "clouds of the transformed coefficients.\n", "\n", "\n", "First we compute the position of the point cloud." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "mf = size(Y)[1]\n", "mt = size(Y)[2]\n", "P = reshape(Y, (mt*mf,p))\n", "P = vcat(real(P), imag(P));" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we keep only the 5% points with largest energy.\n", "\n", "\n", "Display some points in the original (spacial) domain.\n", "\n", "Number of displayed points." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "npts = 6000;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Display the original points." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sel = randperm(n)\n", "\n", "sel = sel[1:npts]\n", "\n", "figure(figsize = (7,5))\n", "plot(y[sel,1], y[sel,2], \".\", ms = 3)\n", "xlim(-5,5)\n", "ylim(-5,5)\n", "title(\"Time domain\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__Exercise 2__\n", "\n", "Display some points of $P$ in the transformed (time/frequency) domain." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "include(\"NtSolutions/audio_2_separation/exo2.jl\");" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Insert your code here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We compute the angle associated to each point over the transformed\n", "domain. The histogram shows the main direction of mixing." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "nrow = size(P)[1]\n", "Theta = zeros(nrow)\n", "for i in 1:nrow\n", " Theta[i] = mod(atan2(P[i,2],P[i,1]),pi)\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Display histogram." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nbins = 100\n", "h,t = plt[:hist](Theta,nbins)\n", "h=h/sum(h)\n", "clf()\n", "bar(t[1:end-1], h, width = pi/nbins)\n", "xlim(0,pi);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__Exercise 3__\n", "\n", "The histogram computed from the whole set of points are not peacked\n", "enough. To stabilize the detection of mixing direction, compute an\n", "histogram from a reduced set of point that have the largest amplitude.\n", "Compute the energy of each point. Extract only a small sub-set." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "include(\"NtSolutions/audio_2_separation/exo3.jl\");" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Insert your code here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__Exercise 4__\n", "\n", "Detect the direction $M_1$ approximating the true direction $M$ by\n", "looking at the local maxima of the histogram. First detect the set of\n", "local maxima, and then keep only the three largest. Sort in descending order." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "include(\"NtSolutions/audio_2_separation/exo4.jl\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Insert your code here." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Separation of the Sources using Clustering\n", "------------------------------------------\n", "Once the mixing direction are known, one can project the sources on the\n", "direction.\n", "\n", "\n", "We compute the projection of the coefficients Y on each estimated\n", "direction." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "A = reshape(Y, (mt*mf,p));" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compute the projection of the coefficients on the directions." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "C = abs(M1'*A');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At each point $x$, the index $I(x)$ is the direction which creates the\n", "largest projection.\n", "\n", "$I$ is the index of the closest source." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tmp, I = compute_max(C,1)\n", "I = reshape(I, (mf,mt));" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An additional denoising is achieved by removing small coefficients." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "T = .05\n", "D = sqrt(sum(abs(Y).^2, 3))[:,:,1]\n", "I = I.*(D .> T);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can display the segmentation of the time frequency plane." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "figure(figsize = (15,10))\n", "imageplot(I[1:Base.div(mf,2),:])\n", "imshow(I[1:Base.div(mf,2),:], cmap = get_cmap(\"jet\"), interpolation = \"nearest\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The recovered coefficients are obtained by projection." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Proj = M1'*A'\n", "Xr = complex(zeros(w,4*w+1,s))\n", "for i in 1:s\n", " Xr[:,:,i] = reshape(Proj[i,:], (mf,mt)).*(I .== i)\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The estimated signals are obtained by inverting the STFT." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "xr = zeros(n,s)\n", "for i in 1:s\n", " xr[:,i] = perform_stft(Xr[:,:,i], w, q, n)\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One can display the recovered signals." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "figure(figsize = (10,10))\n", "\n", "for i in 1:s\n", " subplot(s,1,i)\n", " plot(xr[:,i])\n", " xlim(0,n)\n", " title(\"Estimated source #$i\")\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One can listen to the recovered sources." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "i = 1\n", "WAV.wavplay(x[:,i], 15000) # Supported back-ends : AudioQueue (MacOSX) and Pulse Audio (Linux, libpulse-simple). \n", "#There is not a native backend for Windows yet." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "WAV.wavplay(xr[:,i], 15000)" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Julia 0.5.0", "language": "julia", "name": "julia-0.5" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "0.5.0" } }, "nbformat": 4, "nbformat_minor": 1 }