{ "cells": [ { "cell_type": "markdown", "id": "abae8153", "metadata": {}, "source": [ "Sascha Spors,\n", "Professorship Signal Theory and Digital Signal Processing,\n", "Institute of Communications Engineering (INT),\n", "Faculty of Computer Science and Electrical Engineering (IEF),\n", "University of Rostock,\n", "Germany\n", "\n", "# Data Driven Audio Signal Processing - A Tutorial with Computational Examples\n", "\n", "Master Course #24512\n", "\n", "- lecture: https://github.com/spatialaudio/data-driven-audio-signal-processing-lecture\n", "- tutorial: https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise\n", "\n", "Feel free to contact lecturer frank.schultz@uni-rostock.de" ] }, { "cell_type": "markdown", "id": "11c6393a", "metadata": {}, "source": [ "# Principal Component Analysis (PCA) of Simple Audio Features\n", "\n", "- real world data from [exercise12_MusicGenreClassification.ipynb](exercise12_MusicGenreClassification.ipynb)\n", "- we take an PCA of an audio feature matrix\n", "- to make it still easy to visualize, we choose three original features, which allow a comparable simple interpretation of the PC scores and PC loadings" ] }, { "cell_type": "code", "execution_count": null, "id": "2fe4b1ee", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from scipy.linalg import svd, diagsvd\n", "from statsmodels.multivariate.pca import PCA" ] }, { "cell_type": "code", "execution_count": null, "id": "d27286a2", "metadata": {}, "outputs": [], "source": [ "matplotlib_widget_flag = True" ] }, { "cell_type": "code", "execution_count": null, "id": "1adbe598", "metadata": {}, "outputs": [], "source": [ "if matplotlib_widget_flag:\n", " %matplotlib widget" ] }, { "cell_type": "code", "execution_count": null, "id": "d0cea135", "metadata": {}, "outputs": [], "source": [ "audiofolder = \"./audio_ex12/\"\n", "with np.load(audiofolder + \"/_raw_data_large.npz\") as data:\n", " X = data[\"Xdata\"]\n", " Y = data[\"Ydata\"]\n", "# 0...true_peak_lin\n", "# 1...true_peak_lin2\n", "# 2...true_peak_db\n", "# 3...rms_lin2\n", "# 4...rms_lin\n", "# 5...rms_db\n", "# 6...lufs_lin\n", "# 7...lufs_lin2\n", "# 8...lufs_db\n", "# 9...crest_lin\n", "# 10...crest_db\n", "# 11...low_high_ratio\n", "X = np.squeeze([X[:,0], X[:,5], X[:,8]]).T\n", "# feel free to play around with other feature matrices\n", "# a PCA on the whole X matrix yields very high variance explanation using\n", "# just PC 1 score, so the chosen, simple features 0...11 seem to be redundant\n", "# somehow \n", "\n", "# the PCA assumes mean-free columns, so we explicitly design X this way\n", "X = X - np.mean(X, axis=0)\n", "# we also normalize by standard deviation,\n", "# then each column of X has unit variance and total variance of X is 3: \n", "if True:\n", " X = X / np.std(X, axis=0, ddof=1)\n", " print(np.var(X, axis=0, ddof=1), np.sum(np.var(X, axis=0, ddof=1)) )\n", "\n", "N, F = X.shape[0], X.shape[1]\n", "N, F" ] }, { "cell_type": "code", "execution_count": null, "id": "e35460a6", "metadata": {}, "outputs": [], "source": [ "[U, s, Vh] = svd(X, full_matrices=False)\n", "S = diagsvd(s, F, F)\n", "V = Vh.conj().T\n", "pcs = U @ S # PC scores\n", "pcl = V # PC loadings" ] }, { "cell_type": "code", "execution_count": null, "id": "06d2173f", "metadata": {}, "outputs": [], "source": [ "var_pcs = np.var(pcs, axis=0, ddof=1)\n", "std_pcs = np.std(pcs, axis=0, ddof=1)\n", "print(var_pcs / np.sum(var_pcs) * 100) # PC1 explains 95% variance, so this example is pretty straightforward\n", "print(np.cumsum(var_pcs) / np.sum(var_pcs) * 100)" ] }, { "cell_type": "markdown", "id": "04eb7f9e", "metadata": {}, "source": [ "## Plot Directions of the PCs and Original Data" ] }, { "cell_type": "code", "execution_count": null, "id": "3d0d6363", "metadata": {}, "outputs": [], "source": [ "fig = plt.figure()\n", "fig.set_size_inches(9,3)\n", "\n", "ax = fig.add_subplot(131)\n", "ax.plot(X[:, 0], X[:, 1], \".\", color=\"gray\", ms=1)\n", "ax.axis(\"square\")\n", "ax.set_xlabel(\"original feature 1: true_peak_lin\")\n", "ax.set_ylabel(\"original feature 2: rms_db\")\n", "ax.grid(True)\n", "\n", "ax = fig.add_subplot(132)\n", "ax.plot(X[:, 0], X[:, 2], \".\", color=\"gray\", ms=1)\n", "ax.axis(\"square\")\n", "ax.set_xlabel(\"original feature 1: true_peak_lin\")\n", "ax.set_ylabel(\"original feature 3: lufs_db\")\n", "ax.grid(True)\n", "\n", "ax = fig.add_subplot(133)\n", "ax.plot(X[:, 1], X[:, 2], \".\", color=\"gray\", ms=1)\n", "ax.axis(\"square\")\n", "ax.set_xlabel(\"original feature 2: rms_db\")\n", "ax.set_ylabel(\"original feature 3: lufs_db\")\n", "ax.grid(True)\n", "\n", "plt.tight_layout()" ] }, { "cell_type": "code", "execution_count": null, "id": "c5dd0957", "metadata": {}, "outputs": [], "source": [ "fig = plt.figure()\n", "ax = fig.add_subplot(111, projection=\"3d\")\n", "ax.plot(X[:, 0], X[:, 1], X[:, 2], \".\", color=\"gray\", ms=1)\n", "ax.axis(\"square\")\n", "ax.set_xlabel(\"original feature 1: true_peak_lin\")\n", "ax.set_ylabel(\"original feature 2: rms_db\")\n", "ax.set_zlabel(\"original feature 3: lufs_db\")\n", "ax.set_title(\"data cloud in original coordinate system\")\n", "ax.grid(True)\n", "ax.azim = -44\n", "ax.elev = 28" ] }, { "cell_type": "code", "execution_count": null, "id": "48000c5b", "metadata": {}, "outputs": [], "source": [ "fig = plt.figure()\n", "ax = fig.add_subplot(111, projection=\"3d\")\n", "\n", "ax.plot(X[:, 0], X[:, 1], X[:, 2], \".\", color=\"gray\", ms=1)\n", "\n", "# draw directions of PC axis\n", "# length follows 3*sigma rule for normal distribution\n", "# note that V not necessarily spans a right-hand system\n", "col = ['C3', 'C2', 'C0']\n", "for i in range(F):\n", " ax.plot(\n", " [0, 3*std_pcs[i] * V[0, i]],\n", " [0, 3*std_pcs[i] * V[1, i]],\n", " [0, 3*std_pcs[i] * V[2, i]],\n", " col[i], lw=3,\n", " label=f\"{'direction of PC '}{i+1}\" f\"{' score, col '}{i+1}\" f\"{' in V, right sing vec '}{i+1}\")\n", "\n", "ax.axis(\"square\")\n", "ax.set_xlabel(\"original feature 1: true_peak_lin\")\n", "ax.set_ylabel(\"original feature 2: rms_db\")\n", "ax.set_zlabel(\"original feature 3: lufs_db\")\n", "ax.set_title(\"data cloud in original coordinate system\")\n", "ax.legend()\n", "ax.grid(True)\n", "ax.azim = -44\n", "ax.elev = 28" ] }, { "cell_type": "markdown", "id": "a392e7c1", "metadata": {}, "source": [ "## Plot Data in PC Coordinate System" ] }, { "cell_type": "code", "execution_count": null, "id": "c1037277", "metadata": {}, "outputs": [], "source": [ "[U_pcs, s_pcs, Vh_pcs] = svd(pcs, full_matrices=False)\n", "S_pcs = diagsvd(s_pcs, F, F)\n", "V_pcs = Vh_pcs.conj().T\n", "\n", "# make sure that correct U/V pair for pcs holds by\n", "# introducing corresponding reflections\n", "for i in range(F):\n", " if not np.allclose((U_pcs @ S_pcs)[:, i], pcs[:, i]):\n", " U_pcs[:,i] *= -1\n", " V_pcs[:,i] *= -1\n", "# then V_pcs indicates the correct directions in the PC coordinate system\n", "print(np.allclose(U_pcs @ S_pcs, pcs))" ] }, { "cell_type": "code", "execution_count": null, "id": "5ab9569b", "metadata": {}, "outputs": [], "source": [ "fig = plt.figure()\n", "ax = fig.add_subplot(111, projection=\"3d\")\n", "\n", "ax.plot(pcs[:, 0], pcs[:, 1], pcs[:, 2], \"x\", color=\"gray\", ms=1)\n", "\n", "# draw directions of PC axis\n", "# length follows 3*sigma rule for normal distribution\n", "col = ['C3', 'C2', 'C0']\n", "for i in range(F):\n", " ax.plot(\n", " [0, 3*std_pcs[i] * V_pcs[0, i]],\n", " [0, 3*std_pcs[i] * V_pcs[1, i]],\n", " [0, 3*std_pcs[i] * V_pcs[2, i]],\n", " col[i], lw=3,\n", " label=f\"{'direction of PC '}{i+1}\" f\"{', length = 3 std(PC '}{i+1}\" f\"{')'}\")\n", "\n", "ax.axis(\"square\")\n", "ax.set_xlabel(\"PC 1\")\n", "ax.set_ylabel(\"PC 2\")\n", "ax.set_zlabel(\"PC 3\")\n", "ax.set_title(\"data cloud in PC coordinate system\")\n", "ax.legend()\n", "ax.grid(True)\n", "ax.azim = -37\n", "ax.elev = 28" ] }, { "cell_type": "markdown", "id": "175d6f16", "metadata": {}, "source": [ "## Truncated SVD / Low-Rank Approximation" ] }, { "cell_type": "code", "execution_count": null, "id": "8a2b076a", "metadata": {}, "outputs": [], "source": [ "r_des = 2 # 1 or 2 \n", "\n", "# SVD mindset\n", "# sum of outer products, i.e. sum of rank-1 matrices\n", "X_rank_red = np.zeros((N, F))\n", "for i in range(r_des):\n", " X_rank_red += s[i] * np.outer(U[:, i], V[:, i])\n", "\n", "# PCA mindset:\n", "# we might also use the PC signals and set intended PC loadings to zero\n", "X_rank_red2 = np.zeros((N, F))\n", "pcl_rank_red = np.copy(pcl)\n", "pcl_rank_red[:, r_des:] = 0\n", "X_rank_red2 = pcs @ pcl_rank_red.conj().T\n", "print(np.allclose(X_rank_red, X_rank_red2))\n", "pcl_rank_red" ] }, { "cell_type": "code", "execution_count": null, "id": "ed213620", "metadata": {}, "outputs": [], "source": [ "fig = plt.figure()\n", "ax = fig.add_subplot(111, projection=\"3d\")\n", "\n", "ax.plot(X_rank_red2[:, 0],\n", " X_rank_red2[:, 1],\n", " X_rank_red2[:, 2], \"x\", color=\"gray\", ms=1)\n", "\n", "# draw directions of PC axis\n", "# length follows 3*sigma rule for normal distribution\n", "# note that V not necessarily spans a right-hand system\n", "col = ['C3', 'C2', 'C0']\n", "for i in range(F):\n", " ax.plot(\n", " [0, 3*std_pcs[i] * V[0, i]],\n", " [0, 3*std_pcs[i] * V[1, i]],\n", " [0, 3*std_pcs[i] * V[2, i]],\n", " col[i], lw=1,\n", " label=f\"{'direction of PC '}{i+1}\" f\"{' score, col '}{i+1}\" f\"{' in V, right sing vec '}{i+1}\")\n", "\n", "ax.axis(\"square\")\n", "ax.set_xlim(-6, 6)\n", "ax.set_ylim(-6, 6)\n", "ax.set_zlim(-6, 6)\n", "ax.set_xlabel(\"new feature 1 after rank reduction\")\n", "ax.set_ylabel(\"new feature 2\")\n", "ax.set_zlabel(\"new feature 3\")\n", "ax.set_title(\"rank-{0:d} approximation of data\".format(r_des))\n", "ax.legend()\n", "ax.grid(True)\n", "ax.azim = -44\n", "ax.elev = 28" ] }, { "cell_type": "markdown", "id": "245a01b2", "metadata": {}, "source": [ "## Dimensionality Reduction after PCA" ] }, { "cell_type": "code", "execution_count": null, "id": "120d1942", "metadata": {}, "outputs": [], "source": [ "dim_des = 2 # 1 or 2\n", "# PCA mindset\n", "X_dim_red = np.zeros((N, dim_des))\n", "X_dim_red = pcs[:, :dim_des]\n", "print('original matrix shape ', X.shape)\n", "print('matrix shape after dim red', X_dim_red.shape)\n", "\n", "X_dim_red_plot = np.zeros((N, F))\n", "X_dim_red_plot[:, :dim_des] = pcs[:, :dim_des]\n", "\n", "# check with SVD mindset\n", "print(np.allclose((U @ S)[:, :dim_des], X_dim_red))\n", "print(np.allclose(X @ V[:, :dim_des], X_dim_red))\n", "V[:, :dim_des]" ] }, { "cell_type": "code", "execution_count": null, "id": "bb5de466", "metadata": {}, "outputs": [], "source": [ "# the dimensionality reduction actually yields a matrix with smaller\n", "# dimension, cf. shape of X_dim_red\n", "# for convenience we plot data here in 3D plot\n", "fig = plt.figure()\n", "ax = fig.add_subplot(111, projection=\"3d\")\n", "\n", "ax.plot(\n", " X_dim_red_plot[:, 0],\n", " X_dim_red_plot[:, 1],\n", " X_dim_red_plot[:, 2],\n", " \"x\", color=\"gray\", ms=1)\n", "\n", "# draw directions of PC axis\n", "# length follows 3*sigma rule for normal distribution\n", "col = ['C3', 'C2', 'C0']\n", "for i in range(F):\n", " ax.plot(\n", " [0, 3*std_pcs[i] * V_pcs[0, i]],\n", " [0, 3*std_pcs[i] * V_pcs[1, i]],\n", " [0, 3*std_pcs[i] * V_pcs[2, i]],\n", " col[i], lw=1,\n", " label=f\"{'direction of PC '}{i+1}\" f\"{', length = 3 std(PC '}{i+1}\" f\"{')'}\")\n", "\n", "ax.axis(\"square\")\n", "#ax.set_xlim(-6, 6)\n", "#ax.set_ylim(-6, 6)\n", "#ax.set_zlim(-6, 6)\n", "ax.set_xlabel(\"PC 1\")\n", "ax.set_ylabel(\"PC 2\")\n", "ax.set_zlabel(\"PC 3\")\n", "plt.title(\"PCA data dimensionality reduction to {0:d}D\".format(dim_des))\n", "ax.legend()\n", "ax.grid(True)\n", "ax.azim = -37\n", "ax.elev = 28\n", "\n", "print(\"reduced to dimension\", X_dim_red.shape)" ] }, { "cell_type": "markdown", "id": "ce117847", "metadata": {}, "source": [ "## Check / Interpret PCA Loadings\n", "\n", "- we interprete on original $\\mathbf{V}$ = `pcl` matrix, not to `V_pcs`\n", "- however, for simple linear combination mindset, the column vectors of $\\mathbf{V}^T$ should be interpreted" ] }, { "cell_type": "code", "execution_count": null, "id": "9016d000", "metadata": {}, "outputs": [], "source": [ "# 0...true_peak_lin\n", "# 1...true_peak_lin2\n", "# 2...true_peak_db\n", "# 3...rms_lin2\n", "# 4...rms_lin\n", "# 5...rms_db\n", "# 6...lufs_lin\n", "# 7...lufs_lin2\n", "# 8...lufs_db\n", "# 9...crest_lin\n", "# 10...crest_db\n", "# 11...low_high_ratio\n", "# X = np.squeeze([X[:,0], X[:,5], X[:,8]]).T" ] }, { "cell_type": "code", "execution_count": null, "id": "4668c01c", "metadata": {}, "outputs": [], "source": [ "tmp = pcl.conj().T\n", "tmp\n", "# PC1 feature could be something like 'inverted technical loudness', the hotter the music the higher all original features\n", "# PC2 feature could explain relation between dB RMS/dB Dynamics vs. linear Peak or just dB vs. linear\n", "# PC3 feature has almost no impact onto the data, interpretation on the meaning is not straightforward" ] }, { "cell_type": "code", "execution_count": null, "id": "84cb9bad", "metadata": {}, "outputs": [], "source": [ "# explain the true_peak_lin signal:\n", "tmp[:,0][:, None], np.allclose(pcs @ tmp[:,0], X[:,0])" ] }, { "cell_type": "code", "execution_count": null, "id": "dd6a57b5", "metadata": {}, "outputs": [], "source": [ "# explain the rms_db signal:\n", "tmp[:,1][:, None], np.allclose(pcs @ tmp[:,1], X[:,1])" ] }, { "cell_type": "code", "execution_count": null, "id": "29d90b7d", "metadata": {}, "outputs": [], "source": [ "# explain the lufs_db signal:\n", "tmp[:,2][:, None], np.allclose(pcs @ tmp[:,2], X[:,2])" ] }, { "cell_type": "markdown", "id": "28fb69ab", "metadata": {}, "source": [ "## Copyright\n", "\n", "- the notebooks are provided as [Open Educational Resources](https://en.wikipedia.org/wiki/Open_educational_resources)\n", "- feel free to use the notebooks for your own purposes\n", "- the text is licensed under [Creative Commons Attribution 4.0](https://creativecommons.org/licenses/by/4.0/)\n", "- the code of the IPython examples is licensed under the [MIT license](https://opensource.org/licenses/MIT)\n", "- please attribute the work as follows: *Frank Schultz, Data Driven Audio Signal Processing - A Tutorial Featuring Computational Examples, University of Rostock* ideally with relevant file(s), github URL https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise, commit number and/or version tag, year." ] } ], "metadata": { "kernelspec": { "display_name": "myddasp", "language": "python", "name": "myddasp" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" } }, "nbformat": 4, "nbformat_minor": 5 }