{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Finding Similar Songs - Part 2: Siamese Networks\n", "\n", "The first part of this tutorial introduced the traditional distance-based approach to similarity estimation. The main idea is that features are extracted from the audio content. These features are numeric descriptions of semantically relevant information. An example of a high-level feature is the number of beats per minute, which describes the tempo of a song. Music feature sets are more abstract and describe the spectral or rhythmical distribution of energy. They are not single numbers but vectors of numbers. A song is thus semantically described by such a vector, and if the set of extracted features spans various music characteristics such as rhythm, timbre, harmonics, complexity, etc., then the numerical distance between two vectors is considered an approximation of music similarity: the lower the distance between two vectors, the higher their acoustic similarity. For this reason these approaches are known as *distance-based* methods. They mainly depend on the selected sets of features and on the similarity metric chosen to compare their values.\n", "\n", "The second part of this tutorial focuses on an approach in which both the feature representation and the similarity function are learned from the underlying dataset.\n", "\n", "\n", "## Tutorial Overview\n", "\n", "1. Load data\n", "2. Preprocess data\n", "3. Define model\n", "4. Fit model\n", "5. Evaluate model\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Requirements\n", "\n", "The requirements are the same as for the first part of the tutorial. Please follow the instructions of part one if you have trouble running this tutorial." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import os\n", "os.environ[\"CUDA_VISIBLE_DEVICES\"] = \"0\"\n", "\n", "import tensorflow as tf" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "scrolled": false }, "outputs": [], "source": [ "# visualization\n", "%matplotlib inline\n", "\n", "# numeric and scientific processing\n", "import numpy as np\n", "import pandas as pd\n", "\n", "# misc\n", "import progressbar\n", "\n", "from IPython.display import IFrame\n", "from IPython.display import HTML, display\n", "\n", "# None disables column truncation (pandas >= 1.0; older versions used -1)\n", "pd.set_option('display.max_colwidth', None)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Loading Data\n", "\n", "Before we can train our models we first have to get some data." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "DATASET_PATH = \"D:/Research/Data/MIR/MagnaTagATune/ISMIR2018\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Feature Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load the feature data from the NumPy `.npz` archive." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(6380, 80, 80)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with np.load(\"%s/ISMIR2018_tut_Magnagtagatune_spectrograms.npz\" % DATASET_PATH) as npz:\n", "    melspecs = npz[\"features\"]\n", "    clip_id = npz[\"clip_id\"]\n", "\n", "melspecs.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Prepare the feature metadata for alignment with the dataset metadata." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "feature_metadata = pd.DataFrame({\"featurespace_id\": np.arange(melspecs.shape[0]),\n", "                                 \"clip_id\": clip_id})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load Metadata\n", "\n", "Load the metadata from the CSV file." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2617, 10)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "metadata = pd.read_csv(\"./metadata/ismir2018_tut_part_2_genre_metadata.csv\", index_col=0)\n", "metadata.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Align the feature data with the metadata." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# keep the original index as a column so it survives the merge\n", "metadata = metadata.reset_index()\n", "# join on clip_id only; recent pandas versions reject left_on combined with left_index\n", "metadata = metadata.merge(feature_metadata, on=\"clip_id\", how=\"inner\")\n", "metadata = metadata.set_index(\"index\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Add an HTML5 audio player component for listening to the similarity retrieval results." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "tmp = metadata.mp3_path.str.split(\"/\", expand=True)\n", "metadata[\"player\"] = '