{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Predominant Mask - MusicBricks Tutorial" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This tutorial will guide you through some tools for performing spectral analysis and synthesis using the Essentia library (http://www.essentia.upf.edu). In this case we use a STFT analysis/synthesis workflow together with predominant pitch estimation with the goal to remove or soloing the predominant source. \n", "This algorithm uses a binary masking technique, modifying the magnitude values at the frequency bins in the spectrum that correspond to the harmonic series of the predominant pitch. It can be seen as a very primitive approach to 'source separation'." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You should first install the Essentia library with Python bindings. Installation instructions are detailed here: http://essentia.upf.edu/documentation/installing.html . \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Processing steps" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# import essentia in standard mode\n", "import essentia\n", "import essentia.standard\n", "from essentia.standard import *" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After importing Essentia library, let's import other numerical and plotting tools" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# import matplotlib for plotting\n", "import matplotlib.pyplot as plt\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define the parameters of the STFT workflow" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# algorithm parameters\n", "framesize = 2048\n", "hopsize = 128 # PredominantPitchMelodia requires a hopsize of 128\n", "samplerate = 44100.0\n", "attenuation_dB = 100\n", "maskbinwidth = 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Specify input and output audio filenames" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "inputFilename = 'flamenco.wav'\n", "outputFilename = 'flamenco_stft.wav'" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Duration of the audio sample [sec]:\n", "14.22859410430839\n" ] } ], "source": [ "# create an audio loader and import audio file\n", "loader = essentia.standard.MonoLoader(filename = inputFilename, sampleRate = 44100)\n", "audio = loader()\n", "print(\"Duration of the audio sample [sec]:\")\n", "print(len(audio)/44100.0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define algorithm chain for frame-by-frame process: \n", "FrameCutter -> Windowing -> FFT -> IFFT OverlapAdd -> AudioWriter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Predominant pitch extraction" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "#extract predominant pitch\n", "# PitchMelodia takes the entire audio signal as input - no frame-wise processing is required here.\n", "pExt = PredominantPitchMelodia(frameSize = framesize, hopSize = hopsize, sampleRate = samplerate)\n", "pitch, pitchConf = pExt(audio)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# algorithm workflow for harmonic mask using the STFT frame-by-frame\n", "fcut = FrameCutter(frameSize = framesize, hopSize = hopsize);\n", "w = Windowing(type = \"hann\");\n", "fft = FFT(size = framesize);\n", "hmask = HarmonicMask( sampleRate = samplerate, binWidth = maskbinwidth, attenuation = attenuation_dB);\n", "ifft = IFFT(size = framesize);\n", "overl = OverlapAdd (frameSize = framesize, hopSize = hopsize);\n", "awrite = MonoWriter (filename = outputFilename, sampleRate = 44100);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we loop over all audio frames and store the processed audio sampels in the output array" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "audioout = np.array(0) # initialize output array\n", "\n", "for idx, frame in enumerate(FrameGenerator(audio, frameSize = framesize, hopSize = hopsize)):\n", " # STFT analysis\n", " infft = fft(w(frame))\n", " # get pitch of current frame\n", " curpitch = pitch[idx]\n", "\n", " # here we apply the harmonic mask spectral transformations\n", " outfft = hmask(infft, pitch[idx]);\n", "\n", " # STFT synthesis\n", " out = overl(ifft(outfft))\n", " audioout = np.append(audioout, out)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally we write the processed audio array as a WAV file" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# write audio output\n", "awrite(audioout.astype(np.float32))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 1 }