{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# FMA: A Dataset For Music Analysis\n", "\n", "Michaƫl Defferrard, Kirell Benzi, Pierre Vandergheynst, Xavier Bresson, EPFL LTS2.\n", "\n", "## Usage\n", "\n", "1. Go through the [paper] to understand what the data is about.\n", "1. Download some datasets from .\n", "1. Uncompress the archives, e.g. with `unzip fma_small.zip`.\n", "1. Load and play with the data in this notebook.\n", "\n", "[paper]: https://arxiv.org/abs/1612.01840" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "import os\n", "\n", "import IPython.display as ipd\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "import sklearn as skl\n", "import sklearn.utils, sklearn.preprocessing, sklearn.decomposition, sklearn.svm\n", "import librosa\n", "import librosa.display\n", "\n", "import utils\n", "\n", "plt.rcParams['figure.figsize'] = (17, 5)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Directory where mp3 are stored.\n", "AUDIO_DIR = os.environ.get('AUDIO_DIR')\n", "\n", "# Load metadata and features.\n", "tracks = utils.load('tracks.csv')\n", "genres = utils.load('genres.csv')\n", "features = utils.load('features.csv')\n", "echonest = utils.load('echonest.csv')\n", "\n", "np.testing.assert_array_equal(features.index, tracks.index)\n", "assert echonest.index.isin(tracks.index).all()\n", "\n", "tracks.shape, genres.shape, features.shape, echonest.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1 Metadata\n", "\n", "The metadata table, a CSV file in the `fma_metadata.zip` archive, is composed of many colums:\n", "1. The index is the ID of the song, taken from the website, used as the name of the audio file.\n", "2. Per-track, per-album and per-artist metadata from the Free Music Archive website.\n", "3. Two columns to indicate the subset (small, medium, large) and the split (training, validation, test)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ipd.display(tracks['track'].head())\n", "ipd.display(tracks['album'].head())\n", "ipd.display(tracks['artist'].head())\n", "ipd.display(tracks['set'].head())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2 Genres\n", "\n", "The genre hierarchy is stored in `genres.csv` and distributed in `fma_metadata.zip`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('{} top-level genres'.format(len(genres['top_level'].unique())))\n", "genres.loc[genres['top_level'].unique()].sort_values('#tracks', ascending=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "genres.sort_values('#tracks').head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3 Features\n", "\n", "1. Features extracted from the audio for all tracks.\n", "2. For some tracks, data colected from the [Echonest](http://the.echonest.com/) API." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('{1} features for {0} tracks'.format(*features.shape))\n", "columns = ['mfcc', 'chroma_cens', 'tonnetz', 'spectral_contrast']\n", "columns.append(['spectral_centroid', 'spectral_bandwidth', 'spectral_rolloff'])\n", "columns.append(['rmse', 'zcr'])\n", "for column in columns:\n", " ipd.display(features[column].head().style.format('{:.2f}'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.1 Echonest features" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print('{1} features for {0} tracks'.format(*echonest.shape))\n", "ipd.display(echonest['echonest', 'metadata'].head())\n", "ipd.display(echonest['echonest', 'audio_features'].head())\n", "ipd.display(echonest['echonest', 'social_features'].head())\n", "ipd.display(echonest['echonest', 'ranks'].head())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ipd.display(echonest['echonest', 'temporal_features'].head())\n", "x = echonest.loc[2, ('echonest', 'temporal_features')]\n", "plt.plot(x);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2 Features like MFCCs are discriminant" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "small = tracks['set', 'subset'] <= 'small'\n", "genre1 = tracks['track', 'genre_top'] == 'Instrumental'\n", "genre2 = tracks['track', 'genre_top'] == 'Hip-Hop'\n", "\n", "X = features.loc[small & (genre1 | genre2), 'mfcc']\n", "X = skl.decomposition.PCA(n_components=2).fit_transform(X)\n", "\n", "y = tracks.loc[small & (genre1 | genre2), ('track', 'genre_top')]\n", "y = skl.preprocessing.LabelEncoder().fit_transform(y)\n", "\n", "plt.scatter(X[:,0], X[:,1], c=y, cmap='RdBu', alpha=0.5)\n", "X.shape, y.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4 Audio\n", "\n", "You can load the waveform and listen to audio in the notebook itself." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "filename = utils.get_audio_path(AUDIO_DIR, 2)\n", "print('File: {}'.format(filename))\n", "\n", "x, sr = librosa.load(filename, sr=None, mono=True)\n", "print('Duration: {:.2f}s, {} samples'.format(x.shape[-1] / sr, x.size))\n", "\n", "start, end = 7, 17\n", "ipd.Audio(data=x[start*sr:end*sr], rate=sr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And use [librosa](https://github.com/librosa/librosa) to compute spectrograms and audio features." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "librosa.display.waveplot(x, sr, alpha=0.5);\n", "plt.vlines([start, end], -1, 1)\n", "\n", "start = len(x) // 2\n", "plt.figure()\n", "plt.plot(x[start:start+2000])\n", "plt.ylim((-1, 1));" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "stft = np.abs(librosa.stft(x, n_fft=2048, hop_length=512))\n", "mel = librosa.feature.melspectrogram(sr=sr, S=stft**2)\n", "log_mel = librosa.logamplitude(mel)\n", "\n", "librosa.display.specshow(log_mel, sr=sr, hop_length=512, x_axis='time', y_axis='mel');" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mfcc = librosa.feature.mfcc(S=librosa.power_to_db(mel), n_mfcc=20)\n", "mfcc = skl.preprocessing.StandardScaler().fit_transform(mfcc)\n", "librosa.display.specshow(mfcc, sr=sr, x_axis='time');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5 Genre classification" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5.1 From features" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "small = tracks['set', 'subset'] <= 'small'\n", "\n", "train = tracks['set', 'split'] == 'training'\n", "val = tracks['set', 'split'] == 'validation'\n", "test = tracks['set', 'split'] == 'test'\n", "\n", "y_train = tracks.loc[small & train, ('track', 'genre_top')]\n", "y_test = tracks.loc[small & test, ('track', 'genre_top')]\n", "X_train = features.loc[small & train, 'mfcc']\n", "X_test = features.loc[small & test, 'mfcc']\n", "\n", "print('{} training examples, {} testing examples'.format(y_train.size, y_test.size))\n", "print('{} features, {} classes'.format(X_train.shape[1], np.unique(y_train).size))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Be sure training samples are shuffled.\n", "X_train, y_train = skl.utils.shuffle(X_train, y_train, random_state=42)\n", "\n", "# Standardize features by removing the mean and scaling to unit variance.\n", "scaler = skl.preprocessing.StandardScaler(copy=False)\n", "scaler.fit_transform(X_train)\n", "scaler.transform(X_test)\n", "\n", "# Support vector classification.\n", "clf = skl.svm.SVC()\n", "clf.fit(X_train, y_train)\n", "score = clf.score(X_test, y_test)\n", "print('Accuracy: {:.2%}'.format(score))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5.2 From audio" ] } ], "metadata": {}, "nbformat": 4, "nbformat_minor": 2 }