{"worksheets": [{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["Source Separation with Sparsity", "===============================", "", "*Important:* Please read the [installation page](http://gpeyre.github.io/numerical-tours/installation_python/) for details about how to install the toolboxes.", "$\\newcommand{\\dotp}[2]{\\langle #1, #2 \\rangle}$", "$\\newcommand{\\enscond}[2]{\\lbrace #1, #2 \\rbrace}$", "$\\newcommand{\\pd}[2]{ \\frac{ \\partial #1}{\\partial #2} }$", "$\\newcommand{\\umin}[1]{\\underset{#1}{\\min}\\;}$", "$\\newcommand{\\umax}[1]{\\underset{#1}{\\max}\\;}$", "$\\newcommand{\\umin}[1]{\\underset{#1}{\\min}\\;}$", "$\\newcommand{\\uargmin}[1]{\\underset{#1}{argmin}\\;}$", "$\\newcommand{\\norm}[1]{\\|#1\\|}$", "$\\newcommand{\\abs}[1]{\\left|#1\\right|}$", "$\\newcommand{\\choice}[1]{ \\left\\{ \\begin{array}{l} #1 \\end{array} \\right. }$", "$\\newcommand{\\pa}[1]{\\left(#1\\right)}$", "$\\newcommand{\\diag}[1]{{diag}\\left( #1 \\right)}$", "$\\newcommand{\\qandq}{\\quad\\text{and}\\quad}$", "$\\newcommand{\\qwhereq}{\\quad\\text{where}\\quad}$", "$\\newcommand{\\qifq}{ \\quad \\text{if} \\quad }$", "$\\newcommand{\\qarrq}{ \\quad \\Longrightarrow \\quad }$", "$\\newcommand{\\ZZ}{\\mathbb{Z}}$", "$\\newcommand{\\CC}{\\mathbb{C}}$", "$\\newcommand{\\RR}{\\mathbb{R}}$", "$\\newcommand{\\EE}{\\mathbb{E}}$", "$\\newcommand{\\Zz}{\\mathcal{Z}}$", "$\\newcommand{\\Ww}{\\mathcal{W}}$", "$\\newcommand{\\Vv}{\\mathcal{V}}$", "$\\newcommand{\\Nn}{\\mathcal{N}}$", "$\\newcommand{\\NN}{\\mathcal{N}}$", "$\\newcommand{\\Hh}{\\mathcal{H}}$", "$\\newcommand{\\Bb}{\\mathcal{B}}$", "$\\newcommand{\\Ee}{\\mathcal{E}}$", "$\\newcommand{\\Cc}{\\mathcal{C}}$", "$\\newcommand{\\Gg}{\\mathcal{G}}$", "$\\newcommand{\\Ss}{\\mathcal{S}}$", "$\\newcommand{\\Pp}{\\mathcal{P}}$", "$\\newcommand{\\Ff}{\\mathcal{F}}$", "$\\newcommand{\\Xx}{\\mathcal{X}}$", "$\\newcommand{\\Mm}{\\mathcal{M}}$", "$\\newcommand{\\Ii}{\\mathcal{I}}$", "$\\newcommand{\\Dd}{\\mathcal{D}}$", "$\\newcommand{\\Ll}{\\mathcal{L}}$", "$\\newcommand{\\Tt}{\\mathcal{T}}$", "$\\newcommand{\\si}{\\sigma}$", "$\\newcommand{\\al}{\\alpha}$", "$\\newcommand{\\la}{\\lambda}$", "$\\newcommand{\\ga}{\\gamma}$", "$\\newcommand{\\Ga}{\\Gamma}$", "$\\newcommand{\\La}{\\Lambda}$", "$\\newcommand{\\si}{\\sigma}$", "$\\newcommand{\\Si}{\\Sigma}$", "$\\newcommand{\\be}{\\beta}$", "$\\newcommand{\\de}{\\delta}$", "$\\newcommand{\\De}{\\Delta}$", "$\\newcommand{\\phi}{\\varphi}$", "$\\newcommand{\\th}{\\theta}$", "$\\newcommand{\\om}{\\omega}$", "$\\newcommand{\\Om}{\\Omega}$"]}, {"cell_type": "markdown", "metadata": {}, "source": ["This numerical tour explore local Fourier analysis of sounds, and its", "application to source separation from stereo measurements."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["from __future__ import division", "import nt_toolbox as nt", "from nt_solutions import audio_2_separation as solutions", "%matplotlib inline", "%load_ext autoreload", "%autoreload 2"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Sound Mixing", "------------", "We load 3 sounds and simulate a stero recording by performing a linear", "blending of the sounds.", "", "", "For Scilab users, it is safer to extend the stack size.", "For Matlab users this does nothing."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["extend_stack_size(4)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Sound loading."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["n = 1024*16", "s = 3; % number of sound", "p = 2; % number of micros", "options.subsampling = 1", "x = zeros(n, 3)", "[x(: , 1), fs] = load_sound('bird', n, options)", "[x(: , 2), fs] = load_sound('female', n, options)", "[x(: , 3), fs] = load_sound('male', n, options)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["normalize the energy of the signals"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["x = x./ repmat(std(x, 1), [n 1])"]}, {"cell_type": "markdown", "metadata": {}, "source": ["We mix the sound using a |2x3| transformation matrix.", "Here the direction are well-spaced, but you can try with more complicated", "mixing matrices.", "", "compute the mixing matrix"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["theta = linspace(0, pi(), s + 1); theta(s + 1) = []", "theta(1) = .2", "M = [cos(theta); sin(theta)]"]}, {"cell_type": "markdown", "metadata": {}, "source": ["compute the mixed sources"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["y = x*M'"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Display of the sounds and their mix."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["for i in 1: s:", " subplot(s, 1, i)", " plot(x(: , i)); axis('tight')", " set_graphic_sizes([], 20)", " title(strcat('Source #', num2str(i)))"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Display of the micro output."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["for i in 1: p:", " subplot(p, 1, i)", " plot(y(: , i)); axis('tight')", " set_graphic_sizes([], 20)", " title(strcat('Micro #', num2str(i)))"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Local Fourier analysis of sound.", "--------------------------------", "In order to perform the separation, one performs a local Fourier analysis", "of the sound. The hope is that the sources will be well-separated over", "the Fourier domain because the sources are sparse after a STFT.", "", "", "", "", "First set up parameters for the STFT."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["options.n = n", "w = 128; % size of the window", "q = w/ 4; % overlap of the window"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Compute the STFT of the sources."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["clf; X = []; Y = []", "for i in 1: s:", " X(: , : , i) = perform_stft(x(: , i), w, q, options)", " subplot(s, 1, i)", " plot_spectrogram(X(: , : , i))", " set_graphic_sizes([], 20)", " title(strcat('Source #', num2str(i)))"]}, {"cell_type": "markdown", "metadata": {}, "source": ["__Exercise 1__", "", "Compute the STFT of the micros, and store them into a matrix |Y|."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["solutions.exo1()"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["## Insert your code here."]}, {"cell_type": "markdown", "metadata": {}, "source": ["Estimation of Mixing Direction by Clustering", "--------------------------------------------", "Since the sources are quite sparse over the Fourier plane, the directions", "are well estimated by looking as the direction emerging from a point", "clouds of the transformed coefficients.", "", "", "First we compute the position of the point cloud."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["mf = size(Y, 1)", "mt = size(Y, 2)", "P = reshape(Y, [mt*mf p])", "P = [real(P); imag(P)]"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Then we keep only the 5% of points with largest energy.", "", "", "Display some points in the original (spacial) domain.", "", "number of displayed points"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["npts = 6000"]}, {"cell_type": "markdown", "metadata": {}, "source": ["display original points"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["sel = randperm(n); sel = sel(1: npts)", "", "plot(y(sel, 1), y(sel, 2), '.')", "axis([-1 1 -1 1]*5)", "set_graphic_sizes([], 20)", "title('Time domain')"]}, {"cell_type": "markdown", "metadata": {}, "source": ["__Exercise 2__", "", "Display some points of |P| in the transformed (time/frequency) domain."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["solutions.exo2()"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["## Insert your code here."]}, {"cell_type": "markdown", "metadata": {}, "source": ["We compute the angle associated to each point over the transformed", "domain. The histograms shows the main direction of mixing."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["Theta = mod(atan2(P(: , 2), P(: , 1)), pi())"]}, {"cell_type": "markdown", "metadata": {}, "source": ["display histograms"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["nbins = 100", "[h, t] = hist(Theta, nbins)", "h = h/ sum(h)", "", "bar(t, h); axis('tight')"]}, {"cell_type": "markdown", "metadata": {}, "source": ["__Exercise 3__", "", "The histogram computed from the whole set of points are not peacked", "enough. To stabilize the detection of mixing direction, compute an", "histogram from a reduced set of point that have the largest amplitude.", "ompute the energy of each point", "xtract only a small sub-set", "ompute histogram", "isplay histograms"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["solutions.exo3()"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["## Insert your code here."]}, {"cell_type": "markdown", "metadata": {}, "source": ["__Exercise 4__", "", "Detect the direction |M1| approximating the true direction |M| by", "looking at the local maxima of the histogram. First detect the set of", "local maxima, and then keep only the three largest.", "ort in descending order"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["solutions.exo4()"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["## Insert your code here."]}, {"cell_type": "markdown", "metadata": {}, "source": ["Separation of the Sources using Clustering", "------------------------------------------", "Once the mixing direction are known, one can project the sources on the", "direction.", "", "", "We compute the projection of the coefficients Y on each estimated", "direction."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["A = reshape(Y, [mt*mf p])"]}, {"cell_type": "markdown", "metadata": {}, "source": ["compute the projection of the coefficients on the directions"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["C = abs(M1'*A')"]}, {"cell_type": "markdown", "metadata": {}, "source": ["At each point |x|, the index |I(x)| is the direction which creates the", "largest projection.", "", "I is the index of the closest source"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["[tmp, I] = compute_max(C, 1)", "I = reshape(I, [mf mt])"]}, {"cell_type": "markdown", "metadata": {}, "source": ["An additional denoising is achieved by removing small coefficients.", "", "do not take into account too small coefficients"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["T = .05", "D = sqrt(sum(abs(Y).^2, 3))", "I = I .* (D >T)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["We can display the segmentation of the time frequency plane."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["imageplot(I(1: mf/ 2, : ))", "axis('normal')", "set_colormap('jet')"]}, {"cell_type": "markdown", "metadata": {}, "source": ["The recovered coefficients are obtained by projection."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["Proj = M1'*A'", "Xr = []", "for i in 1: s:", " Xr(: , : , i) = reshape(Proj(i, : ), [mf mt]) .* (I = =i)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["The estimated signals are obtained by inverting the STFT."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["for i in 1: s:", " xr(: , i) = perform_stft(Xr(: , : , i), w, q, options)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["One can display the recovered signals."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["for i in 1: s:", " subplot(s, 1, i)", " plot(xr(: , i)); axis('tight')", " set_graphic_sizes([], 20)", " title(strcat('Estimated source #', num2str(i)))"]}, {"cell_type": "markdown", "metadata": {}, "source": ["One can listen to the recovered sources."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["i = 1", "sound(x(: , i), fs)", "sound(xr(: , i), fs)"]}]}], "nbformat": 3, "metadata": {"name": ""}, "nbformat_minor": 0}