{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# GTZAN: convert to HDF5 format" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The GTZAN dataset comes in a ZIP archive which contains one folder per musical genre. Each folder contains all the clips that belong to that genre. While easily browsable, this format is not appropriate for data analysis. We thus first convert the dataset to HDF5 before any pre-processing step." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dataset definition" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* 10 genres: blues, classical, country, disco, hiphop, jazz, metal, pop, reggae and rock\n", "* 100 clips per musical genre\n", "* each clip lasts about 30 seconds\n", "* audio sampled at 22050 Hz" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "class gtzan:\n", "    Ngenres = 10\n", "    Nclips = 100\n", "    sr = 22050\n", "    # Size ranges from 660000 to 675808 samples per clip.\n", "    # Truncate all clips to the shortest length so that clip duration cannot bias the classifier.\n", "    Nsamples = 660000\n", "\n", "print('Clip duration: {0:.2f} seconds ({1} samples at {2} Hz)'.format(\n", "    float(gtzan.Nsamples)/gtzan.sr, gtzan.Nsamples, gtzan.sr))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import os, time\n", "import numpy as np\n", "import librosa\n", "import h5py\n", "import IPython.display\n", "\n", "print('Software versions:')\n", "for pkg in [np, librosa]:\n", "    print(' {}: {}'.format(pkg.__name__, pkg.__version__))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data loading" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "folder = os.path.join('data', 'genres')\n", "genres = sorted(os.listdir(folder))  # Sorted for a deterministic genre order.\n", "assert 
len(genres) == gtzan.Ngenres" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define some helper functions." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def genpath(genre, clip):\n", "    \"\"\"Usage: genpath('rock', 0)\"\"\"\n", "    return os.path.join(folder, genre, '{0}.{1:0>5d}.au'.format(genre, clip))\n", "\n", "def read(genre, clip):\n", "    \"\"\"Usage: read('rock', 0)\"\"\"\n", "\n", "    # Load audio file.\n", "    path = genpath(genre, clip)\n", "    y, sr = librosa.load(path, sr=None, mono=False)  # No resampling.\n", "\n", "    # Sanity checks.\n", "    if sr != gtzan.sr:\n", "        raise ValueError('{}: wrong sampling rate of {}'.format(path, sr))\n", "    if y.size < gtzan.Nsamples:\n", "        raise ValueError('{}: too short, {} samples'.format(path, y.size))\n", "\n", "    # Truncate.\n", "    return y[:gtzan.Nsamples]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "HDF5 data store:\n", "* One dataset per musical genre.\n", "* Each dataset contains the whole audio data of the genre.\n", "* Each column of a dataset is an entire clip.\n", "* There are as many rows as samples per clip." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "filename = os.path.join('data', 'gtzan.hdf5')\n", "\n", "# Remove any existing HDF5 file; ignore the error if it does not exist.\n", "try:\n", "    os.remove(filename)\n", "except OSError:\n", "    pass\n", "\n", "# Create HDF5 file and datasets.\n", "f = h5py.File(filename, 'w')\n", "dsize = (gtzan.Nsamples, gtzan.Nclips)\n", "for genre in genres:\n", "    _ = f.create_dataset(genre, dsize, dtype='float32')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fill the datasets with actual audio data." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "tstart = time.time()\n", "for genre in genres:\n", "    for clip in range(gtzan.Nclips):\n", "        f[genre][:,clip] = read(genre, clip)\n", "print('Elapsed time: {:.0f} seconds'.format(time.time() - tstart))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Store the GTZAN metadata along with the audio data." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "for var in vars(gtzan):\n", "    if not var.startswith(\"__\"):\n", "        f.attrs[var] = vars(gtzan)[var]\n", "\n", "# Display HDF5 attributes.\n", "print('Attributes:')\n", "for attr in f.attrs:\n", "    print(' {} = {}'.format(attr, f.attrs[attr]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sanity check" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Check that the content stored in HDF5 is identical to the audio file.\n", "* Test on 10 randomly chosen clips." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "for _ in range(10):\n", "    genre = int(np.floor(gtzan.Ngenres * np.random.uniform()))\n", "    genre = genres[genre]\n", "    clip = int(np.floor(gtzan.Nclips * np.random.uniform()))\n", "    # np.all replaces np.alltrue, which is deprecated and removed in NumPy 2.0.\n", "    assert np.all(f[genre][:,clip] == read(genre, clip))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Listen to a clip" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From the HDF5 data store." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "genre, clip = 'pop', 20\n", "#IPython.display.Audio(f[genre][:,clip], rate=gtzan.sr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Via librosa." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "IPython.display.Audio(read(genre, clip), rate=gtzan.sr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Directly from the file. Unfortunately, this approach does not support .au audio files." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#IPython.display.Audio(os.path.abspath(genpath(genre, clip)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Close HDF5 data store" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "f.close()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.3" } }, "nbformat": 4, "nbformat_minor": 0 }