{ "metadata": { "name": "", "signature": "sha256:712f6f475fb71b35f0e0cf5bab903c26339b0cd2a601eb009e7bcd79445e5af4" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Predominant Mask - MusicBricks Tutorial" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This tutorial will guide you through some tools for performing spectral analysis and synthesis using the Essentia library (http://www.essentia.upf.edu). In this case, we use an STFT analysis/synthesis workflow together with predominant pitch estimation, with the goal of removing or soloing the predominant source. \n", "This algorithm uses a binary masking technique, modifying the magnitude values at the frequency bins in the spectrum that correspond to the harmonic series of the predominant pitch. It can be seen as a very primitive approach to 'source separation'." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You should first install the Essentia library with Python bindings. Installation instructions are detailed here: http://essentia.upf.edu/documentation/installing.html.
\n" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Processing steps" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# import essentia in standard mode\n", "import essentia\n", "import essentia.standard\n", "from essentia.standard import *" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "After importing the Essentia library, let's import the numerical and plotting tools" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# import matplotlib for plotting\n", "import matplotlib.pyplot as plt\n", "import numpy as np" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define the parameters of the STFT workflow" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# algorithm parameters\n", "framesize = 2048\n", "hopsize = 128 # PredominantPitchMelodia requires a hopsize of 128\n", "samplerate = 44100.0\n", "attenuation_dB = 100\n", "maskbinwidth = 2" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Specify the input and output audio filenames" ] }, { "cell_type": "code", "collapsed": false, "input": [ "inputFilename = 'predom.wav'\n", "outputFilename = 'predom_stft.wav'" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [ "# create an audio loader and load the audio file\n", "loader = essentia.standard.MonoLoader(filename = inputFilename, sampleRate = samplerate)\n", "audio = loader()\n", "print(\"Duration of the audio sample [sec]:\")\n", "print(len(audio)/samplerate)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Duration of the audio sample [sec]:\n", "14.2285941043\n" ] } ], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": 
{}, "source": [ "Define the algorithm chain for the frame-by-frame process: \n", "FrameCutter -> Windowing -> FFT -> HarmonicMask -> IFFT -> OverlapAdd -> AudioWriter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Predominant pitch extraction" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# extract the predominant pitch\n", "# PredominantPitchMelodia takes the entire audio signal as input - no frame-wise processing is required here.\n", "pExt = PredominantPitchMelodia(frameSize = framesize, hopSize = hopsize, sampleRate = samplerate)\n", "pitch, pitchConf = pExt(audio)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [ "# algorithm workflow for the harmonic mask using the STFT frame-by-frame\n", "fcut = FrameCutter(frameSize = framesize, hopSize = hopsize)\n", "w = Windowing(type = \"hann\")\n", "fft = FFT(size = framesize)\n", "hmask = HarmonicMask(sampleRate = samplerate, binWidth = maskbinwidth, attenuation = attenuation_dB)\n", "ifft = IFFT(size = framesize)\n", "overl = OverlapAdd(frameSize = framesize, hopSize = hopsize)\n", "awrite = MonoWriter(filename = outputFilename, sampleRate = 44100)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we loop over all audio frames and store the processed audio samples in the output array" ] }, { "cell_type": "code", "collapsed": false, "input": [ "audioout = np.array([], dtype=np.float32) # initialize an empty output array\n", "\n", "for idx, frame in enumerate(FrameGenerator(audio, frameSize = framesize, hopSize = hopsize)):\n", " # STFT analysis\n", " infft = fft(w(frame))\n", " # get the pitch of the current frame\n", " curpitch = pitch[idx]\n", "\n", " # apply the harmonic mask spectral transformation\n", " outfft = hmask(infft, curpitch)\n", "\n", " # STFT synthesis\n", " out = overl(ifft(outfft))\n", " audioout = np.append(audioout, out)" ], "language": "python", "metadata": 
{}, "outputs": [], "prompt_number": 13 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we write the processed audio array to a WAV file" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# write the audio output\n", "awrite(audioout.astype(np.float32))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 21 } ], "metadata": {} } ] }