{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Predominant Mask - MusicBricks Tutorial"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introduction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This tutorial will guide you through some tools for performing spectral analysis and synthesis using the Essentia library (http://www.essentia.upf.edu). In this case we use a STFT analysis/synthesis workflow together with predominant pitch estimation with the goal to remove or soloing the predominant source. \n",
    "This algorithm uses a binary masking technique, modifying the magnitude values at the frequency bins in the spectrum that correspond to the harmonic series of the predominant pitch. It can be seen as a very primitive approach to 'source separation'."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You should first install the Essentia library with Python bindings. Installation instructions are detailed here: http://essentia.upf.edu/documentation/installing.html . \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Processing steps"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import essentia in standard mode\n",
    "import essentia\n",
    "import essentia.standard\n",
    "from essentia.standard import *"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After importing Essentia library, let's import other numerical and plotting tools"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import matplotlib for plotting\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Define the parameters of the STFT workflow"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# algorithm parameters\n",
    "framesize = 2048\n",
    "hopsize = 128 #  PredominantPitchMelodia requires a hopsize of 128\n",
    "samplerate = 44100.0\n",
    "attenuation_dB = 100\n",
    "maskbinwidth = 2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Specify input and output audio filenames"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "inputFilename = 'flamenco.wav'\n",
    "outputFilename = 'flamenco_stft.wav'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Duration of the audio sample [sec]:\n",
      "14.22859410430839\n"
     ]
    }
   ],
   "source": [
    "# create an audio loader and import audio file\n",
    "loader = essentia.standard.MonoLoader(filename = inputFilename, sampleRate = 44100)\n",
    "audio = loader()\n",
    "print(\"Duration of the audio sample [sec]:\")\n",
    "print(len(audio)/44100.0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Define algorithm chain for frame-by-frame process: \n",
    "FrameCutter -> Windowing -> FFT -> IFFT OverlapAdd -> AudioWriter"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Predominant pitch extraction"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "#extract predominant pitch\n",
    "# PitchMelodia takes the entire audio signal as input - no frame-wise processing is required here.\n",
    "pExt = PredominantPitchMelodia(frameSize = framesize, hopSize = hopsize, sampleRate = samplerate)\n",
    "pitch, pitchConf = pExt(audio)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# algorithm workflow for harmonic mask using the STFT frame-by-frame\n",
    "fcut = FrameCutter(frameSize = framesize, hopSize = hopsize);\n",
    "w = Windowing(type = \"hann\");\n",
    "fft = FFT(size = framesize);\n",
    "hmask = HarmonicMask( sampleRate = samplerate, binWidth = maskbinwidth, attenuation = attenuation_dB);\n",
    "ifft = IFFT(size = framesize);\n",
    "overl = OverlapAdd (frameSize = framesize, hopSize = hopsize);\n",
    "awrite = MonoWriter (filename = outputFilename, sampleRate = 44100);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we loop over all audio frames and store the processed audio sampels in the output array"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "audioout = np.array(0) # initialize output array\n",
    "\n",
    "for idx, frame in enumerate(FrameGenerator(audio, frameSize = framesize, hopSize = hopsize)):\n",
    "     # STFT analysis\n",
    "    infft = fft(w(frame))\n",
    "    # get pitch of current frame\n",
    "    curpitch = pitch[idx]\n",
    "\n",
    "    # here we  apply the harmonic mask spectral transformations\n",
    "    outfft = hmask(infft, pitch[idx]);\n",
    "\n",
    "    # STFT synthesis\n",
    "    out = overl(ifft(outfft))\n",
    "    audioout = np.append(audioout, out)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally we write the processed audio array as a WAV file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "# write audio output\n",
    "awrite(audioout.astype(np.float32))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}