{ "cells": [ { "cell_type": "markdown", "id": "abae8153", "metadata": {}, "source": [ "Sascha Spors,\n", "Professorship Signal Theory and Digital Signal Processing,\n", "Institute of Communications Engineering (INT),\n", "Faculty of Computer Science and Electrical Engineering (IEF),\n", "University of Rostock,\n", "Germany\n", "\n", "# Data Driven Audio Signal Processing - A Tutorial with Computational Examples\n", "\n", "Master Course #24512\n", "\n", "- lecture: https://github.com/spatialaudio/data-driven-audio-signal-processing-lecture\n", "- tutorial: https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise\n", "\n", "Feel free to contact lecturer frank.schultz@uni-rostock.de" ] }, { "cell_type": "markdown", "id": "11c6393a", "metadata": {}, "source": [ "# Principal Component Analysis (PCA) of Simple Audio Features\n", "\n", "- real world data from [exercise12_MusicGenreClassification.ipynb](exercise12_MusicGenreClassification.ipynb)\n", "- we calculate a PCA of an audio feature matrix\n", "- Part I: to make it still easy to visualize, we choose only three original features to indicate the PC directions in a graphical approach \n", "- Part II: the full data matrix is put into a **correlation matrix PCA** to inspect and interprete the PC loading matrix as correlations between original features and PC features" ] }, { "cell_type": "code", "execution_count": null, "id": "2fe4b1ee", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from scipy.linalg import svd, diagsvd\n", "# from statsmodels.multivariate.pca import PCA" ] }, { "cell_type": "code", "execution_count": null, "id": "d27286a2", "metadata": {}, "outputs": [], "source": [ "matplotlib_widget_flag = True" ] }, { "cell_type": "code", "execution_count": null, "id": "1adbe598", "metadata": {}, "outputs": [], "source": [ "if matplotlib_widget_flag:\n", " %matplotlib widget" ] }, { "cell_type": "markdown", "id": "c9d077aa", "metadata": {}, "source": [ "## Part I" ] }, { "cell_type": "code", "execution_count": null, "id": "d0cea135", "metadata": {}, "outputs": [], "source": [ "audiofolder = \"./audio_ex12/\"\n", "with np.load(audiofolder + \"/_raw_data_large.npz\") as data:\n", " X = data[\"Xdata\"]\n", " Y = data[\"Ydata\"]\n", "# 0...true_peak_lin\n", "# 1...true_peak_lin2\n", "# 2...true_peak_db\n", "# 3...rms_lin2\n", "# 4...rms_lin\n", "# 5...rms_db\n", "# 6...lufs_lin\n", "# 7...lufs_lin2\n", "# 8...lufs_db\n", "# 9...crest_lin\n", "# 10...crest_db\n", "# 11...low_high_ratio\n", "X = np.squeeze([X[:,0], X[:,5], X[:,8]]).T\n", "# feel free to play around with other feature matrices\n", "# a PCA on the whole X matrix yields very high variance explanation using\n", "# just PC 1 score, so the chosen, simple features 0...11 seem to be redundant\n", "# somehow?!, see the interpretation of the loadings at the end of the notebook\n", "\n", "# the PCA assumes mean-free columns, so we explicitly design X this way\n", "X = X - np.mean(X, axis=0)\n", "# we also normalize by standard deviation,\n", "# then each column of X has unit variance and total variance of X is 3: \n", "if True:\n", " X = X / np.std(X, axis=0, ddof=1)\n", " print(np.var(X, axis=0, ddof=1), np.sum(np.var(X, axis=0, ddof=1)) )\n", "\n", "N, F = X.shape[0], X.shape[1]\n", "N, F" ] }, { "cell_type": "code", "execution_count": null, "id": "e35460a6", "metadata": {}, "outputs": [], "source": [ "[U, s, Vh] = svd(X, full_matrices=False)\n", "S = diagsvd(s, F, F)\n", "V = Vh.conj().T\n", "\n", "pcf = U @ S # known as PC features, PC scores \n", "pcl = V # known as PC loadings\n", "# the normalisation scheme here uses V.T@V=I for the PC loadings\n", "# cf. the discussion for eq. (2.8) in\n", "# https://doi.org/10.1098/rsta.2015.0202\n", "# note that the wording in PCA world is rather inconsistent and misleading\n", "# so we make sure that our math and our understanding is consistent" ] }, { "cell_type": "code", "execution_count": null, "id": "06d2173f", "metadata": {}, "outputs": [], "source": [ "var_pcf = np.var(pcf, axis=0, ddof=1)\n", "std_pcf = np.std(pcf, axis=0, ddof=1)\n", "print(var_pcf / np.sum(var_pcf) * 100) # PCF 1 explains already 95% of data's variance\n", "print(np.cumsum(var_pcf) / np.sum(var_pcf) * 100)" ] }, { "cell_type": "markdown", "id": "04eb7f9e", "metadata": {}, "source": [ "## Plot Directions of the PCs and Original Data" ] }, { "cell_type": "code", "execution_count": null, "id": "3d0d6363", "metadata": {}, "outputs": [], "source": [ "fig = plt.figure()\n", "fig.set_size_inches(9,3)\n", "\n", "ax = fig.add_subplot(131)\n", "ax.plot(X[:, 0], X[:, 1], \".\", color=\"gray\", ms=1)\n", "ax.axis(\"square\")\n", "ax.set_xlabel(\"original feature 1: true_peak_lin\")\n", "ax.set_ylabel(\"original feature 2: rms_db\")\n", "ax.grid(True)\n", "\n", "ax = fig.add_subplot(132)\n", "ax.plot(X[:, 0], X[:, 2], \".\", color=\"gray\", ms=1)\n", "ax.axis(\"square\")\n", "ax.set_xlabel(\"original feature 1: true_peak_lin\")\n", "ax.set_ylabel(\"original feature 3: lufs_db\")\n", "ax.grid(True)\n", "\n", "ax = fig.add_subplot(133)\n", "ax.plot(X[:, 1], X[:, 2], \".\", color=\"gray\", ms=1)\n", "ax.axis(\"square\")\n", "ax.set_xlabel(\"original feature 2: rms_db\")\n", "ax.set_ylabel(\"original feature 3: lufs_db\")\n", "ax.grid(True)\n", "\n", "plt.tight_layout()" ] }, { "cell_type": "code", "execution_count": null, "id": "c5dd0957", "metadata": {}, "outputs": [], "source": [ "fig = plt.figure()\n", "ax = fig.add_subplot(111, projection=\"3d\")\n", "ax.plot(X[:, 0], X[:, 1], X[:, 2], \".\", color=\"gray\", ms=1)\n", "ax.axis(\"square\")\n", "ax.set_xlabel(\"original feature 1: true_peak_lin\")\n", "ax.set_ylabel(\"original feature 2: rms_db\")\n", "ax.set_zlabel(\"original feature 3: lufs_db\")\n", "ax.set_title(\"data cloud in original coordinate system\")\n", "ax.grid(True)\n", "ax.azim = -44\n", "ax.elev = 28" ] }, { "cell_type": "code", "execution_count": null, "id": "48000c5b", "metadata": {}, "outputs": [], "source": [ "fig = plt.figure()\n", "ax = fig.add_subplot(111, projection=\"3d\")\n", "\n", "ax.plot(X[:, 0], X[:, 1], X[:, 2], \".\", color=\"gray\", ms=1)\n", "\n", "# draw directions of PC axis\n", "# length is chosen to be the 3*sqrt(variance)=3*std of the PC features\n", "# note that V not necessarily spans a right-hand system\n", "col = ['C3', 'C2', 'C0']\n", "for i in range(F):\n", " ax.plot(\n", " [0, 3*std_pcf[i] * V[0, i]],\n", " [0, 3*std_pcf[i] * V[1, i]],\n", " [0, 3*std_pcf[i] * V[2, i]],\n", " col[i], lw=3,\n", " label=f\"{'3-std-weighted PC '}{i+1}\" f\"{'=col '}{i+1} in V\")\n", "\n", "ax.axis(\"square\")\n", "ax.set_xlabel(\"original feature 1: true_peak_lin\")\n", "ax.set_ylabel(\"original feature 2: rms_db\")\n", "ax.set_zlabel(\"original feature 3: lufs_db\")\n", "ax.set_title(\"data cloud in original coordinate system\")\n", "ax.legend()\n", "ax.grid(True)\n", "ax.azim = -44\n", "ax.elev = 28" ] }, { "cell_type": "markdown", "id": "a392e7c1", "metadata": {}, "source": [ "## Plot Data in PC Coordinate System" ] }, { "cell_type": "code", "execution_count": null, "id": "c1037277", "metadata": {}, "outputs": [], "source": [ "# rather than rotating/flipping of original data into a new coordinate system,\n", "# we can do a dedicated SVD on pcs\n", "# because V_pcs is a diagonal matrix with +-1,\n", "# i.e. indicating +-x, +-y and +-z axis for a 3D plot\n", "[U_pcf, s_pcf, Vh_pcf] = svd(pcf, full_matrices=False)\n", "S_pcf = diagsvd(s_pcf, F, F)\n", "V_pcf = Vh_pcf.T\n", "\n", "# make sure that correct U/V pair for pcf holds by\n", "# introducing corresponding reflections\n", "for i in range(F):\n", " if not np.allclose((U_pcf @ S_pcf)[:, i], pcf[:, i]):\n", " U_pcf[:,i] *= -1\n", " V_pcf[:,i] *= -1\n", "# then V_pcs indicates the correct directions in the PC coordinate system\n", "print(np.allclose(U_pcf @ S_pcf, pcf))" ] }, { "cell_type": "code", "execution_count": null, "id": "5ab9569b", "metadata": {}, "outputs": [], "source": [ "fig = plt.figure()\n", "ax = fig.add_subplot(111, projection=\"3d\")\n", "\n", "ax.plot(pcf[:, 0], pcf[:, 1], pcf[:, 2], \"x\", color=\"gray\", ms=1)\n", "\n", "# draw directions of PC axis\n", "# length is chosen to be the 3*sqrt(variance)=3*std of the PC features\n", "# note that V not necessarily spans a right-hand system\n", "col = ['C3', 'C2', 'C0']\n", "for i in range(F):\n", " ax.plot(\n", " [0, 3*std_pcf[i] * V_pcf[0, i]],\n", " [0, 3*std_pcf[i] * V_pcf[1, i]],\n", " [0, 3*std_pcf[i] * V_pcf[2, i]],\n", " col[i], lw=3,\n", " label=f\"{'3-std-weighted PC '}{i+1}\")\n", "\n", "ax.axis(\"square\")\n", "ax.set_xlabel(\"PC 1 direction\")\n", "ax.set_ylabel(\"PC 2 direction\")\n", "ax.set_zlabel(\"PC 3 direction\")\n", "ax.set_title(\"data in PC coordinate system\")\n", "ax.legend()\n", "ax.grid(True)\n", "ax.azim = -37\n", "ax.elev = 28" ] }, { "cell_type": "markdown", "id": "175d6f16", "metadata": {}, "source": [ "## Truncated SVD / Low-Rank Approximation" ] }, { "cell_type": "code", "execution_count": null, "id": "8a2b076a", "metadata": {}, "outputs": [], "source": [ "r_des = 2 # 1 or 2 \n", "\n", "# SVD mindset\n", "# sum of outer products, i.e. sum of rank-1 matrices\n", "X_rank_red = np.zeros((N, F))\n", "for i in range(r_des):\n", " X_rank_red += s[i] * np.outer(U[:, i], V[:, i])\n", "\n", "# PCA mindset:\n", "# we might also use the PC features and\n", "# set the intended PC loadings to zero\n", "X_rank_red2 = np.zeros((N, F))\n", "pcl_rank_red = np.copy(pcl)\n", "pcl_rank_red[:, r_des:] = 0\n", "X_rank_red2 = pcf @ pcl_rank_red.T\n", "print(np.allclose(X_rank_red, X_rank_red2))\n", "pcl_rank_red" ] }, { "cell_type": "code", "execution_count": null, "id": "ed213620", "metadata": {}, "outputs": [], "source": [ "fig = plt.figure()\n", "ax = fig.add_subplot(111, projection=\"3d\")\n", "\n", "ax.plot(X_rank_red2[:, 0],\n", " X_rank_red2[:, 1],\n", " X_rank_red2[:, 2], \"x\", color=\"gray\", ms=1)\n", "\n", "# draw directions of PC axis\n", "# length is chosen to be the 3*sqrt(variance)=3*std of the PC features\n", "# note that V not necessarily spans a right-hand system\n", "col = ['C3', 'C2', 'C0']\n", "for i in range(F):\n", " ax.plot(\n", " [0, 3*std_pcf[i] * V[0, i]],\n", " [0, 3*std_pcf[i] * V[1, i]],\n", " [0, 3*std_pcf[i] * V[2, i]],\n", " col[i], lw=1,\n", " label=f\"{'3-std-weighted PC '}{i+1}\" f\"{' score=col '}{i+1} in V\")\n", "\n", "ax.axis(\"square\")\n", "ax.set_xlim(-6, 6)\n", "ax.set_ylim(-6, 6)\n", "ax.set_zlim(-6, 6)\n", "ax.set_xlabel(\"new feature 1 after rank reduction\")\n", "ax.set_ylabel(\"new feature 2 after...\")\n", "ax.set_zlabel(\"new feature 3 after...\")\n", "if r_des==1:\n", " ax.set_title(\"rank-{0:d} approximation of data (line in 3D)\".format(r_des)) \n", "elif r_des==2:\n", " ax.set_title(\"rank-{0:d} approximation of data (plane in 3D)\".format(r_des))\n", "ax.legend()\n", "ax.grid(True)\n", "ax.azim = -44\n", "ax.elev = 28" ] }, { "cell_type": "markdown", "id": "245a01b2", "metadata": {}, "source": [ "## Dimensionality Reduction after PCA" ] }, { "cell_type": "code", "execution_count": null, "id": "120d1942", "metadata": {}, "outputs": [], "source": [ "dim_des = 2 # 1 or 2\n", "# PCA mindset\n", "X_dim_red = np.zeros((N, dim_des))\n", "X_dim_red = pcf[:, :dim_des]\n", "print('original matrix shape ', X.shape)\n", "print('matrix shape after dim red', X_dim_red.shape)\n", "\n", "X_dim_red_plot = np.zeros((N, F))\n", "X_dim_red_plot[:, :dim_des] = pcf[:, :dim_des]\n", "\n", "# check with SVD mindset\n", "print(np.allclose((U @ S)[:, :dim_des], X_dim_red))\n", "print(np.allclose(X @ V[:, :dim_des], X_dim_red))\n", "V[:, :dim_des]" ] }, { "cell_type": "code", "execution_count": null, "id": "bb5de466", "metadata": {}, "outputs": [], "source": [ "# the dimensionality reduction actually yields a matrix with smaller\n", "# dimension, cf. shape of X_dim_red\n", "# for convenience we plot data here in 3D plot\n", "fig = plt.figure()\n", "ax = fig.add_subplot(111, projection=\"3d\")\n", "\n", "ax.plot(\n", " X_dim_red_plot[:, 0],\n", " X_dim_red_plot[:, 1],\n", " X_dim_red_plot[:, 2],\n", " \"x\", color=\"gray\", ms=1)\n", "\n", "# draw directions of PC axis\n", "# length is chosen to be the 3*sqrt(variance)=3*std of the PC features\n", "# note that V not necessarily spans a right-hand system\n", "col = ['C3', 'C2', 'C0']\n", "for i in range(F):\n", " ax.plot(\n", " [0, 3*std_pcf[i] * V_pcf[0, i]],\n", " [0, 3*std_pcf[i] * V_pcf[1, i]],\n", " [0, 3*std_pcf[i] * V_pcf[2, i]],\n", " col[i], lw=1,\n", " label=f\"{'3-std-weighted PC '}{i+1}\")\n", "\n", "ax.axis(\"square\")\n", "#ax.set_xlim(-6, 6)\n", "#ax.set_ylim(-6, 6)\n", "#ax.set_zlim(-6, 6)\n", "ax.set_xlabel(\"PC 1 direction\")\n", "ax.set_ylabel(\"PC 2 direction\")\n", "ax.set_zlabel(\"PC 3 direction\")\n", "if dim_des==1:\n", " plt.title(\"PCA data dimensionality reduction to {0:d}D (line in 1D)\".format(dim_des))\n", "elif dim_des==2:\n", " plt.title(\"PCA data dimensionality reduction to {0:d}D (plane in 2D)\".format(dim_des))\n", "ax.legend()\n", "ax.grid(True)\n", "ax.azim = -37\n", "ax.elev = 28\n", "\n", "print(\"reduced to dimension\", X_dim_red.shape)" ] }, { "cell_type": "markdown", "id": "ce117847", "metadata": {}, "source": [ "## Part II: Check / Interpret PCA Loadings for Full Data Matrix\n", "\n", "We use a correlation matrix PCA here." ] }, { "cell_type": "code", "execution_count": null, "id": "a1b34988", "metadata": {}, "outputs": [], "source": [ "with np.load(audiofolder + \"/_raw_data_large.npz\") as data:\n", " X = data[\"Xdata\"]\n", " Y = data[\"Ydata\"]\n", "N, F = X.shape[0], X.shape[1]\n", "N, F" ] }, { "cell_type": "code", "execution_count": null, "id": "768e26b4", "metadata": {}, "outputs": [], "source": [ "X = X - np.mean(X, axis=0)\n", "X = X / np.std(X, axis=0, ddof=1)\n", "np.sum(np.mean(X, axis=0)), np.std(X, axis=0, ddof=1)" ] }, { "cell_type": "code", "execution_count": null, "id": "795b30f6", "metadata": {}, "outputs": [], "source": [ "# get PCA\n", "# the normalisation scheme for part II is known as 'correlation matrix PCA'\n", "# cf. the discussion for eq. (2.8) in\n", "# https://doi.org/10.1098/rsta.2015.0202\n", "# using this normalisation scheme, the loading matrix is a correlation\n", "# matrix to show correlation between the original features and the PC features\n", "# this is a nice way to interprete the meaning of the linear combinations\n", "# and potential meanings of the PC features\n", "Z = X / np.sqrt(N-1)\n", "R = Z.T @ Z\n", "[U, s, Vh] = svd(Z, full_matrices=False)\n", "V, S = Vh.T, diagsvd(s, F, F)\n", "# adapt polarity for simpler interpretation\n", "V[:,0] *= -1\n", "U[:,0] *= -1\n", "V[:,1] *= -1\n", "U[:,1] *= -1\n", "\n", "PC_Features = U # principal component features (PCF), aka principal component scores (PCS) \n", "PC_Loadings = V @ S # principal component loadings\n", "# cf. due to Z = U S V.T -> (S @ V.T).T\n", "np.allclose(Z, PC_Features @ PC_Loadings.T) # check correct SVD matrix factorisation\n", "# note that other normalisation schemes exist for the PCA\n", "# e.g. in data science / machine learning we typically see this:\n", "# PC_Features = U @ S\n", "# PC_Loadings = V\n", "# then other post-processing / interpretation is needed\n", "# this is precisely the scheme that was used in Part I" ] }, { "cell_type": "code", "execution_count": null, "id": "2cd3878d", "metadata": {}, "outputs": [], "source": [ "variance = np.diag(S.T@S)\n", "total_variance = np.linalg.trace(S.T@S)\n", "total_variance, np.sum(variance)" ] }, { "cell_type": "code", "execution_count": null, "id": "1caab578", "metadata": {}, "outputs": [], "source": [ "# variance / cumsum variance as percentage\n", "variance/total_variance*100, np.cumsum(variance)/total_variance*100\n", "# the first four PC features explain 96.96% of data's variance" ] }, { "cell_type": "code", "execution_count": null, "id": "16a015c4", "metadata": {}, "outputs": [], "source": [ "plt.figure()\n", "plt.subplot(2,1,1)\n", "plt.plot(np.arange(F)+1, variance, 'ok--')\n", "plt.ylabel('eigenwert of R = variance')\n", "plt.title('Scree Plot')\n", "plt.subplot(2,1,2)\n", "plt.plot(np.arange(F)+1, np.cumsum(variance), 'ok--')\n", "plt.xlabel('eigenwert index')\n", "plt.ylabel('cumsum variance')\n", "plt.grid(True)\n", "variance, np.cumsum(variance)" ] }, { "cell_type": "code", "execution_count": null, "id": "bd01a8da", "metadata": {}, "outputs": [], "source": [ "# original features\n", "# 0...true_peak_lin\n", "# 1...true_peak_lin2\n", "# 2...true_peak_db\n", "# 3...rms_lin2\n", "# 4...rms_lin\n", "# 5...rms_db\n", "# 6...lufs_lin\n", "# 7...lufs_lin2\n", "# 8...lufs_db\n", "# 9...crest_lin\n", "# 10...crest_db\n", "# 11...low_high_ratio\n", "PC_Loadings[:,0:4]\n", "# original features 0...8 load very high and almost equally onto the first\n", "# PC feature (which explains ca. 70% variance), so this PC feature is about\n", "# peak/rms power either linear or dB\n", "#\n", "# the crest factor related features 9 and 10 load very high onto the second\n", "# PC feature (which explains ca. 13% variance), so this PC feature is crest\n", "# factor specific\n", "#\n", "# the original feature 11...low_high_ratio almost exclusively loads onto the\n", "# third PC feature (which explains ca. 8.7% variance)\n", "#\n", "# the fourth PC feature (which explains ca. 4.6% variance) is about separating\n", "# linear (-) and dB (+) related features due to polarity changes in the correlation,\n", "# this holds especially for the peak/rms power related original features" ] }, { "cell_type": "code", "execution_count": null, "id": "b619e92b", "metadata": {}, "outputs": [], "source": [ "pcf_label = ['PCF1', 'PCF2', 'PCF3', 'PCF4', 'PCF5', 'PCF6', 'PCF7', 'PCF8', 'PCF9', 'PCF10', 'PCF11', 'PCF12']\n", "task_label = ['true peak lin', 'true peak lin2', 'true peak db',\n", " 'rms lin2', 'rms lin', 'rms db', 'lufs lin',\n", " 'lufs lin2', 'lufs db', 'crest lin', 'crest db',\n", " 'low high ratio']\n", "\n", "cmap = plt.get_cmap('Spectral_r', 8)\n", "fig = plt.figure(figsize=(9,6))\n", "ax = fig.add_subplot(111)\n", "cax = ax.matshow(PC_Loadings, cmap=cmap, vmin=-1, vmax=+1)\n", "fig.colorbar(cax)\n", "ax.set_xticks(np.arange(len(pcf_label)))\n", "ax.set_yticks(np.arange(len(task_label)))\n", "ax.set_xticklabels(pcf_label)\n", "ax.set_yticklabels(task_label)\n", "ax.set_title('Loading Matrix = Audio Feature x loads to PC-Feature y')\n", "plt.tight_layout()" ] }, { "cell_type": "markdown", "id": "28fb69ab", "metadata": {}, "source": [ "## Copyright\n", "\n", "- the notebooks are provided as [Open Educational Resources](https://en.wikipedia.org/wiki/Open_educational_resources)\n", "- feel free to use the notebooks for your own purposes\n", "- the text is licensed under [Creative Commons Attribution 4.0](https://creativecommons.org/licenses/by/4.0/)\n", "- the code of the IPython examples is licensed under the [MIT license](https://opensource.org/licenses/MIT)\n", "- please attribute the work as follows: *Frank Schultz, Data Driven Audio Signal Processing - A Tutorial Featuring Computational Examples, University of Rostock* ideally with relevant file(s), github URL https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise, commit number and/or version tag, year." ] } ], "metadata": { "kernelspec": { "display_name": "data-driven-audio-signal-processing-exercise", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.11" } }, "nbformat": 4, "nbformat_minor": 5 }