{ "metadata": { "name": "", "signature": "sha256:712f6f475fb71b35f0e0cf5bab903c26339b0cd2a601eb009e7bcd79445e5af4" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Predominant Mask - MusicBricks Tutorial" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This tutorial will guide you through some tools for performing spectral analysis and synthesis using the Essentia library (http://www.essentia.upf.edu). In this case, we use an STFT analysis/synthesis workflow together with predominant pitch estimation, with the goal of removing or soloing the predominant source. \n", "This algorithm uses a binary masking technique, modifying the magnitude values at the frequency bins in the spectrum that correspond to the harmonic series of the predominant pitch. It can be seen as a very primitive approach to 'source separation'." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You should first install the Essentia library with Python bindings. Installation instructions are detailed here: http://essentia.upf.edu/documentation/installing.html.
\n" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Processing steps" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# import essentia in standard mode\n", "import essentia\n", "import essentia.standard\n", "from essentia.standard import *" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "After importing the Essentia library, let's import the numerical and plotting tools" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# import matplotlib for plotting\n", "import matplotlib.pyplot as plt\n", "import numpy as np" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define the parameters of the STFT workflow" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# algorithm parameters\n", "framesize = 2048\n", "hopsize = 128 # PredominantPitchMelodia requires a hopsize of 128\n", "samplerate = 44100.0\n", "attenuation_dB = 100\n", "maskbinwidth = 2" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Specify the input and output audio filenames" ] }, { "cell_type": "code", "collapsed": false, "input": [ "inputFilename = 'predom.wav'\n", "outputFilename = 'predom_stft.wav'" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [ "# create an audio loader and load the audio file\n", "loader = essentia.standard.MonoLoader(filename = inputFilename, sampleRate = samplerate)\n", "audio = loader()\n", "print(\"Duration of the audio sample [sec]:\")\n", "print(len(audio)/samplerate)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Duration of the audio sample [sec]:\n", "14.2285941043\n" ] } ], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": 
{}, "source": [ "Define the algorithm chain for the frame-by-frame process: \n", "FrameCutter -> Windowing -> FFT -> HarmonicMask -> IFFT -> OverlapAdd -> AudioWriter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Predominant pitch extraction" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# extract the predominant pitch\n", "# PredominantPitchMelodia takes the entire audio signal as input - no frame-wise processing is required here.\n", "pExt = PredominantPitchMelodia(frameSize = framesize, hopSize = hopsize, sampleRate = samplerate)\n", "pitch, pitchConf = pExt(audio)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [ "# algorithm workflow for the harmonic mask using the STFT frame-by-frame\n", "fcut = FrameCutter(frameSize = framesize, hopSize = hopsize)\n", "w = Windowing(type = \"hann\")\n", "fft = FFT(size = framesize)\n", "hmask = HarmonicMask(sampleRate = samplerate, binWidth = maskbinwidth, attenuation = attenuation_dB)\n", "ifft = IFFT(size = framesize)\n", "overl = OverlapAdd(frameSize = framesize, hopSize = hopsize)\n", "awrite = MonoWriter(filename = outputFilename, sampleRate = 44100)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we loop over all audio frames and store the processed audio samples in the output array" ] }, { "cell_type": "code", "collapsed": false, "input": [ "audioout = np.array([], dtype=np.float32) # initialize an empty output array\n", "\n", "for idx, frame in enumerate(FrameGenerator(audio, frameSize = framesize, hopSize = hopsize)):\n", " # STFT analysis\n", " infft = fft(w(frame))\n", " # get the pitch of the current frame\n", " curpitch = pitch[idx]\n", "\n", " # apply the harmonic mask spectral transformation\n", " outfft = hmask(infft, curpitch)\n", "\n", " # STFT synthesis\n", " out = overl(ifft(outfft))\n", " audioout = np.append(audioout, out)" ], "language": "python", "metadata": 
{}, "outputs": [], "prompt_number": 13 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we write the processed audio array to a WAV file" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# write the audio output\n", "awrite(audioout.astype(np.float32))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 21 } ], "metadata": {} } ] }