{"worksheets": [{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["Sparse Representation in a Gabor Dictionary", "===========================================", "", "*Important:* Please read the [installation page](http://gpeyre.github.io/numerical-tours/installation_python/) for details about how to install the toolboxes.", "$\\newcommand{\\dotp}[2]{\\langle #1, #2 \\rangle}$", "$\\newcommand{\\enscond}[2]{\\lbrace #1, #2 \\rbrace}$", "$\\newcommand{\\pd}[2]{ \\frac{ \\partial #1}{\\partial #2} }$", "$\\newcommand{\\umin}[1]{\\underset{#1}{\\min}\\;}$", "$\\newcommand{\\umax}[1]{\\underset{#1}{\\max}\\;}$", "$\\newcommand{\\umin}[1]{\\underset{#1}{\\min}\\;}$", "$\\newcommand{\\uargmin}[1]{\\underset{#1}{argmin}\\;}$", "$\\newcommand{\\norm}[1]{\\|#1\\|}$", "$\\newcommand{\\abs}[1]{\\left|#1\\right|}$", "$\\newcommand{\\choice}[1]{ \\left\\{  \\begin{array}{l} #1 \\end{array} \\right. }$", "$\\newcommand{\\pa}[1]{\\left(#1\\right)}$", "$\\newcommand{\\diag}[1]{{diag}\\left( #1 \\right)}$", "$\\newcommand{\\qandq}{\\quad\\text{and}\\quad}$", "$\\newcommand{\\qwhereq}{\\quad\\text{where}\\quad}$", "$\\newcommand{\\qifq}{ \\quad \\text{if} \\quad }$", "$\\newcommand{\\qarrq}{ \\quad \\Longrightarrow \\quad }$", "$\\newcommand{\\ZZ}{\\mathbb{Z}}$", "$\\newcommand{\\CC}{\\mathbb{C}}$", "$\\newcommand{\\RR}{\\mathbb{R}}$", "$\\newcommand{\\EE}{\\mathbb{E}}$", "$\\newcommand{\\Zz}{\\mathcal{Z}}$", "$\\newcommand{\\Ww}{\\mathcal{W}}$", "$\\newcommand{\\Vv}{\\mathcal{V}}$", "$\\newcommand{\\Nn}{\\mathcal{N}}$", "$\\newcommand{\\NN}{\\mathcal{N}}$", "$\\newcommand{\\Hh}{\\mathcal{H}}$", "$\\newcommand{\\Bb}{\\mathcal{B}}$", "$\\newcommand{\\Ee}{\\mathcal{E}}$", "$\\newcommand{\\Cc}{\\mathcal{C}}$", "$\\newcommand{\\Gg}{\\mathcal{G}}$", "$\\newcommand{\\Ss}{\\mathcal{S}}$", "$\\newcommand{\\Pp}{\\mathcal{P}}$", "$\\newcommand{\\Ff}{\\mathcal{F}}$", "$\\newcommand{\\Xx}{\\mathcal{X}}$", "$\\newcommand{\\Mm}{\\mathcal{M}}$", "$\\newcommand{\\Ii}{\\mathcal{I}}$", "$\\newcommand{\\Dd}{\\mathcal{D}}$", "$\\newcommand{\\Ll}{\\mathcal{L}}$", "$\\newcommand{\\Tt}{\\mathcal{T}}$", "$\\newcommand{\\si}{\\sigma}$", "$\\newcommand{\\al}{\\alpha}$", "$\\newcommand{\\la}{\\lambda}$", "$\\newcommand{\\ga}{\\gamma}$", "$\\newcommand{\\Ga}{\\Gamma}$", "$\\newcommand{\\La}{\\Lambda}$", "$\\newcommand{\\si}{\\sigma}$", "$\\newcommand{\\Si}{\\Sigma}$", "$\\newcommand{\\be}{\\beta}$", "$\\newcommand{\\de}{\\delta}$", "$\\newcommand{\\De}{\\Delta}$", "$\\newcommand{\\phi}{\\varphi}$", "$\\newcommand{\\th}{\\theta}$", "$\\newcommand{\\om}{\\omega}$", "$\\newcommand{\\Om}{\\Omega}$"]}, {"cell_type": "markdown", "metadata": {}, "source": ["This numerical tour explores the use of L1 optimization to find sparse representation in a redundant Gabor dictionary.", "It shows application to denoising and stereo separation."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["from __future__ import division", "import nt_toolbox as nt", "from nt_solutions import audio_3_gabor as solutions", "%matplotlib inline", "%load_ext autoreload", "%autoreload 2"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Gabor Tight Frame Transform", "---------------------------", "The Gabor transform is a collection of short time Fourier transforms", "(STFT) computed with several windows. The redundancy |K*L| of the transform", "depends on the number |L| of windows used and of the overlapping factor |K|", "of each STFT.", "", "", "We decide to use a collection of windows with dyadic sizes.", "", "", "Sizes of the windows."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["wlist = 32*[4 8 16 32]", "L = length(wlist)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Overlap of the window, so that |K=2|."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["K = 2", "qlist = wlist/ K"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Overall redundancy."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["disp(strcat(['Approximate redundancy of the dictionary = ' num2str(K*L) '.']))"]}, {"cell_type": "markdown", "metadata": {}, "source": ["We load a sound."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["n = 1024*32", "options.n = n", "[x0, fs] = load_sound('glockenspiel', n)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Compute its short time Fourier transform with a collection of windows."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["options.multichannel = 0", "S = perform_stft(x0, wlist, qlist, options)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["__Exercise 1__", "", "Compute the true redundancy of the transform. Check that the transform", "is a tight frame (energy conservation)."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["solutions.exo1()"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["## Insert your code here."]}, {"cell_type": "markdown", "metadata": {}, "source": ["Display the coefficients."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["plot_spectrogram(S, x0)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Reconstructs the signal using the inverse Gabor transform."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["x1 = perform_stft(S, wlist, qlist, options)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Check for reconstruction error."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["e = norm(x0-x1)/ norm(x0)", "disp(strcat(['Reconstruction error (should be 0) = ' num2str(e, 3)]))"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Gabor Tight Frame Denoising", "---------------------------", "We can perform denoising by thresholding the Gabor representation.", "", "", "We add noise to the signal."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["sigma = .05", "x = x0 + sigma*randn(size(x0))"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Denoising with soft thresholding.", "Setting correctly the threshold is quite difficult because of the", "redundancy of the representation.", "", "transform"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["S = perform_stft(x, wlist, qlist, options)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["threshold"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["T = sigma", "ST = perform_thresholding(S, T, 'soft')"]}, {"cell_type": "markdown", "metadata": {}, "source": ["reconstruct"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["xT = perform_stft(ST, wlist, qlist, options)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Display the result."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["err = snr(x0, xT)", "", "plot_spectrogram(ST, xT)", "subplot(length(ST) + 1, 1, 1)", "title(strcat(['Denoised, SNR = ' num2str(err, 3), 'dB']))"]}, {"cell_type": "markdown", "metadata": {}, "source": ["__Exercise 2__", "", "Find the best threshold, that gives the smallest error."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["solutions.exo2()"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["## Insert your code here."]}, {"cell_type": "markdown", "metadata": {}, "source": ["Basis Pursuit in the Gabor Frame", "--------------------------------", "Since the representation is highly redundant, it is possible to improve", "the quality of the representation using a basis pursuit denoising that", "optimize the L1 norm of the coefficients.", "", "", "The basis pursuit finds a set of coefficients |S1| by minimizing", "", "", "|min_{S1} 1/2*norm(x-x1)^2 + lambda*norm(S1,1)   (*)|", "", "", "Where |x1| is the signal reconstructed from the Gabor coefficients |S1|.", "", "", "", "The parameter |lambda| should be optimized to match the noise level.", "Increasing |lambda| increases the sparsity of the solution, but then the", "approximation |x1| deviates from the noisy observations |x1|.", "", "", "Basis pursuit denoising |(*)| is solved by iterative thresholding, which", "iterates between a step of gradient descent, and a step of thresholding.", "", "", "Initialization of |x1| and |S1|."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["lambda = .1", "x1 = x", "S1 = perform_stft(x1, wlist, qlist, options)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Step 1: gradient descent of |norm(x-x1)^2|.", "", "residual"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["r = x - x1", "Sr = perform_stft(r, wlist, qlist, options)", "S1 = cell_add(S1, Sr)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Step 2: thresholding and update of |x1|.", "", "", "threshold"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["S1 = perform_thresholding(S1, lambda, 'soft')"]}, {"cell_type": "markdown", "metadata": {}, "source": ["update the denoised signal"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["x1 = perform_stft(S1, wlist, qlist, options)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["The difficulty is to set the value of |lambda|.", "If the basis were orthogonal, it should be set to", "approximately 3/2*sigma (soft thresholding). Because of the redundancy of", "the representation in Gabor frame, it should be set to a slightly larger", "value."]}, {"cell_type": "markdown", "metadata": {}, "source": ["__Exercise 3__", "", "Perform the iterative thresholding by progressively decaying the value", "of |lambda| during the iterations, starting from |lambda=1.5*sigma| until", "|lambda=.5*sigma|. Retain the solution |xbp| together with the coefficients |Sbp|", "that provides the smallest", "error."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["solutions.exo3()"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["## Insert your code here."]}, {"cell_type": "markdown", "metadata": {}, "source": ["Display the solution computed by basis pursuit."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["e = snr(x0, xbp)", "", "plot_spectrogram(Sbp, xbp)", "subplot(length(Sbp) + 1, 1, 1)", "title(strcat(['Denoised, SNR = ' num2str(e, 3), 'dB']))"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Sparsity to Improve Audio Separation", "------------------------------------", "The increase of sparsity produced by L1 minimization is helpful to", "improve audio stereo separation.", "", "", "Load 3 sounds."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["n = 1024*32", "options.n = n", "s = 3; % number of sound", "p = 2; % number of micros", "options.subsampling = 1", "x = zeros(n, 3)", "[x(: , 1), fs] = load_sound('bird', n, options)", "[x(: , 2), fs] = load_sound('male', n, options)", "[x(: , 3), fs] = load_sound('glockenspiel', n, options)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["normalize the energy of the signals"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["x = x./ repmat(std(x, 1), [n 1])"]}, {"cell_type": "markdown", "metadata": {}, "source": ["We mix the sound using a |2x3| transformation matrix.", "Here the direction are well-spaced, but you can try with more complicated", "mixing matrices.", "", "compute the mixing matrix"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["theta = linspace(0, pi(), s + 1); theta(s + 1) = []", "theta(1) = .2", "M = [cos(theta); sin(theta)]"]}, {"cell_type": "markdown", "metadata": {}, "source": ["compute the mixed sources"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["y = x*M'"]}, {"cell_type": "markdown", "metadata": {}, "source": ["We transform the stero pair using the multi-channel STFT (each channel is transformed independantly."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["options.multichannel = 1", "S = perform_stft(y, wlist, qlist, options)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["check for reconstruction"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["y1 = perform_stft(S, wlist, qlist, options)", "disp(strcat(['Reconstruction error (should be 0) = ' num2str(norm(y-y1, 'fro')/ norm(y, 'fro')) '.']))"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Now we perform a multi-channel basis pursuit to find a sparse approximation of the", "coefficients.", "", "regularization parameter"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["lambda = .2"]}, {"cell_type": "markdown", "metadata": {}, "source": ["initialization"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["y1 = y", "S1 = S", "niter = 100", "err = []"]}, {"cell_type": "markdown", "metadata": {}, "source": ["iterations"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["for i in 1: niter:", "    % progressbar(i, niter)", "    % gradient", "    r = y - y1", "    Sr = perform_stft(r, wlist, qlist, options)", "    S1 = cell_add(S1, Sr)", "    % multi-channel thresholding", "    S1 = perform_thresholding(S1, lambda, 'soft-multichannel')", "    % update the value of lambda to match noise", "    y1 = perform_stft(S1, wlist, qlist, options)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Create the point cloud of both the tight frame and the sparse BP coefficients."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["P1 = []; P = []", "for i in 1: length(S):", "    Si = reshape(S1{i}, [size(S1{i}, 1)*size(S1{i}, 2) 2])", "    P1 = cat(1, P1,  Si)", "    Si = reshape(S{i}, [size(S{i}, 1)*size(S{i}, 2) 2])", "    P = cat(1, P,  Si)", "", "P = [real(P); imag(P)]", "P1 = [real(P1); imag(P1)]"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Display the two point clouds."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["p = size(P, 1)", "m = 10000", "sel = randperm(p); sel = sel(1: m)", "", "subplot(1, 2, 1)", "plot(P(sel, 1), P(sel, 2), '.')", "title('Tight frame coefficients')", "axis([-10 10 -10 10])", "subplot(1, 2, 2)", "plot(P1(sel, 1), P1(sel, 2), '.')", "title('Basis Pursuit coefficients')", "axis([-10 10 -10 10])"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Compute the angles of the points with largest energy."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["d  = sqrt(sum(P.^2, 2))", "d1 = sqrt(sum(P1.^2, 2))", "I = find(d >.2)", "I1 = find(d1 >.2)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["compute angles"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["Theta  = mod(atan2(P(I, 2), P(I, 1)), pi())", "Theta1 = mod(atan2(P1(I1, 2), P1(I1, 1)), pi())"]}, {"cell_type": "markdown", "metadata": {}, "source": ["Compute and display the histogram of angles.", "We reaint only a small sub-set of most active coefficients.", "", "compute histograms"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["nbins = 150", "[h, t] = hist(Theta, nbins)", "h = h/ sum(h)", "[h1, t1] = hist(Theta1, nbins)", "h1 = h1/ sum(h1)"]}, {"cell_type": "markdown", "metadata": {}, "source": ["display histograms"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["subplot(2, 1, 1)", "bar(t, h); axis('tight')", "set_graphic_sizes([], 20)", "title('Tight frame coefficients')", "subplot(2, 1, 2)", "bar(t1, h1); axis('tight')", "set_graphic_sizes([], 20)", "title('Sparse coefficients')"]}, {"cell_type": "markdown", "metadata": {}, "source": ["__Exercise 4__", "", "Compare the source separation obtained by masking with a tight frame Gabor", "transform and with the coefficients computed by a basis pursuit", "sparsification process."]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["solutions.exo4()"]}, {"cell_type": "code", "language": "python", "metadata": {}, "outputs": [], "collapsed": false, "input": ["## Insert your code here."]}]}], "nbformat": 3, "metadata": {"name": ""}, "nbformat_minor": 0}