{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Applying Rocket to Raw Audio\n", "\n", "Note: this notebook is extremely messy as a result of me trying to rapidly prototype Rocket for raw audio and not following best practices. It also uses the beta version of fastai v2 for removing silence and preprocessing. If you are interested in experimenting yourself and can't make sense of something here, please reach out to [me](https://forums.fast.ai/u/madeupmasters/summary) by PM, or in the [Deep Learning with Audio](https://forums.fast.ai/t/deep-learning-with-audio-thread) or [Time Series](https://forums.fast.ai/t/time-series-sequential-data-study-group) threads.\n", "\n", "This notebook applies the findings of the recent [Rocket Paper](https://arxiv.org/abs/1910.13051) by Angus Dempster, François Petitjean, and Geoffrey I. Webb to 1D raw audio signals for the task of voice recognition. Some of this code is also adapted from [Ignacio Oguiza](https://forums.fast.ai/u/oguiza) and his [Time Series Module for FastAI v1](https://github.com/timeseriesAI/timeseriesAI).\n", "\n", "Initially the signals were too long and training was too slow at a sample rate of 16000 (it was going to take ~30-40 minutes for 1s clips). The dataset has about 3800 audio clips in 10 classes (a small problem that trains to 99%+ accuracy in 2 minutes using the typical audio pipeline of spectrogram + CNN). To speed things up I added a stride, which cut the run time without a drop in accuracy, but still only led to 85% accuracy after 4 minutes of training. Removing silence and doubling the clip length to 2s gave great results (95% accuracy in 6s, 98.6% in 20 seconds, 99.2% in 1 min 20 sec).\n", "\n", "Unfortunately, so far I have not been able to scale the results to harder problems, such as a 250 speaker dataset. 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary of Findings\n", "\n", "To spare you having to go through this notebook, here's a summary of the interesting results:\n", "\n", " - 95% accuracy in 6s, 98.6% in 20 seconds, 99.2% in 1 min 20 sec on a 10 class problem using raw audio and no augmentation\n", " - A stride of 5-7 seems to be a good balance between computational cost and accuracy\n", " - Bigger filter sizes are better only up to a size of 7 or 9, and then they actually decrease accuracy while increasing cost\n", " - A mix of filter sizes {7,9,11} doesn't seem to beat a single filter size of 7, 9, or 11, but more testing is needed with a harder dataset\n", " - Testing tons of individual kernels on random subsets of the data and then selecting the best ones actually results in lower accuracy than the same number of random kernels (this is likely due to increased correlation between the good random kernels, making them worse in an ensemble)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%reload_ext autoreload\n", "%autoreload 2\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from local.torch_basics import *\n", "from local.test import *\n", "from local.basics import *\n", "from local.data.all import *\n", "from local.vision.core import *\n", "from local.notebook.showdoc import show_doc\n", "from local.audio.core import *\n", "from local.audio.augment import *\n", "from local.vision.learner import *\n", "from local.vision.models.xresnet import *\n", "from local.metrics import *\n", "from local.callback.schedule import *\n", "import torchaudio\n", "from fastprogress import progress_bar as pb\n", "import time\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.linear_model import RidgeClassifierCV\n", "from rocket import generate_kernels, apply_kernel, apply_kernels" ] }, { 
"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "PosixPath('/home/jupyter/.fastai/data/ST-AEDS-20180100_1-OS/ST-AEDS-20180100_1-OS')" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p10speakers = Config()['data_path'] / 'ST-AEDS-20180100_1-OS'\n", "untar_data(URLs.SPEAKERS10, fname=str(p10speakers)+'.tar', dest=p10speakers)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "get_audio = AudioGetter(\"\", recurse=True, folders=None)\n", "files_10 = get_audio(p10speakers)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(#3842) [/home/jupyter/.fastai/data/ST-AEDS-20180100_1-OS/f0004_us_f0004_00446.wav,/home/jupyter/.fastai/data/ST-AEDS-20180100_1-OS/m0002_us_m0002_00128.wav,/home/jupyter/.fastai/data/ST-AEDS-20180100_1-OS/f0003_us_f0003_00279.wav,/home/jupyter/.fastai/data/ST-AEDS-20180100_1-OS/f0001_us_f0001_00168.wav,/home/jupyter/.fastai/data/ST-AEDS-20180100_1-OS/f0005_us_f0005_00286.wav,/home/jupyter/.fastai/data/ST-AEDS-20180100_1-OS/m0005_us_m0005_00282.wav,/home/jupyter/.fastai/data/ST-AEDS-20180100_1-OS/f0005_us_f0005_00432.wav,/home/jupyter/.fastai/data/ST-AEDS-20180100_1-OS/f0005_us_f0005_00054.wav,/home/jupyter/.fastai/data/ST-AEDS-20180100_1-OS/m0004_us_m0004_00110.wav,/home/jupyter/.fastai/data/ST-AEDS-20180100_1-OS/m0003_us_m0003_00180.wav...]" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "files_10" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "audio_opener = OpenAudio(files_10)\n", "p10_labeler = lambda x: str(x).split('/')[-1][:5] #grab the label from each file" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "CLIP_LENGTH = 2000" ] }, { "cell_type": "code", "execution_count": null, "metadata": 
{}, "outputs": [ { "data": { "text/html": [ "\n", "
\n", " \n", " \n", " 100.00% [3842/3842 00:12<00:00]\n", "
\n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sigs, labels = [],[]\n", "cropper = CropSignal(CLIP_LENGTH, pad_mode='repeat')\n", "remove_silence = RemoveSilence()\n", "for i in pb(range(len(files_10))):\n", " sigs.append(cropper(remove_silence(audio_opener(i))).sig)\n", " labels.append(p10_labeler(files_10[i]))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(3842, 3842)" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(sigs), len(labels)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "total_size = len(sigs)\n", "train_size = int(total_size*.8)\n", "train_idxs = torch.randperm(total_size)[:train_size]\n", "valid_idxs = [i for i in range(total_size) if i not in train_idxs]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "assert len(train_idxs) + len(valid_idxs) == len(sigs)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x_train = [sigs[idx].squeeze(0).numpy() for idx in train_idxs]\n", "y_train = [labels[idx] for idx in train_idxs]\n", "x_valid = [sigs[idx].squeeze(0).numpy() for idx in valid_idxs]\n", "y_valid = [labels[idx] for idx in valid_idxs]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[3073, 3073, 769, 769]" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(map(len, (x_train, y_train, x_valid, y_valid)))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((3073, 32000), (769, 32000))" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np_x_train = np.stack(x_train).astype(np.float64)\n", "np_x_valid = 
np.stack(x_valid).astype(np.float64)\n", "np_x_train.shape, np_x_valid.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "o2i_f = lambda x: 5*(x[0]=='m') + int(x[-1]) - 1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np_y_train = np.array(list(map(o2i_f, y_train)))\n", "np_y_valid = np.array(list(map(o2i_f, y_valid)))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0, 7, 2, ..., 0, 7, 1])" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np_y_train" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "((3073, 32000), (3073,), (769, 32000), (769,))" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np_x_train.shape, np_y_train.shape, np_x_valid.shape, np_y_valid.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-4.49039649777175e-05" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np_x_train.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Normalize the training data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np_x_train = (np_x_train - np_x_train.mean(axis = 1, keepdims = True)) / (np_x_train.std(axis = 1, keepdims = True) + 1e-8)\n", "np_x_valid = (np_x_valid - np_x_valid.mean(axis = 1, keepdims = True)) / (np_x_valid.std(axis = 1, keepdims = True) + 1e-8)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(-8.10809639770585e-20, 0.9999995545301024)" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np_x_train.mean(), np_x_train.std()" ] }, { 
"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('float64')" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np_x_train.dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Start Here" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def timing_test(runs, candidate_lengths, stride, num_kernels, seq_length, show_progress=True):\n", "    times, scores = [],[]\n", "    for i in range(runs):\n", "        kernels = generate_kernels(seq_length, num_kernels, candidate_lengths, stride)\n", "        start = time.time()\n", "        x_train_tfm = apply_kernels(np_x_train, kernels)\n", "        x_valid_tfm = apply_kernels(np_x_valid, kernels)\n", "        classifier = RidgeClassifierCV(alphas=np.logspace(-3, 3, 7), normalize=True)\n", "        classifier.fit(x_train_tfm, np_y_train)\n", "        score = classifier.score(x_valid_tfm, np_y_valid)\n", "        t = time.time()-start\n", "        scores.append(score)\n", "        times.append(t)\n", "        if(show_progress): print(\"Finished Run\", i+1, \"Score:\", round(score, 3), \"Time:\", round(t,3))\n", "    return times, scores" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Initial attempts" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "timing_test(5, np.array((7,9,11)), stride=5, num_kernels=200, seq_length=16000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| Kernel Sizes | Strides | Results | Time |\n", "| :--------------: | :-----: | :-----: | :-----:| \n", "| {7,9,11} | 7 | .85 | 4:02 |\n", "| {7,9,11} | 5 | .899 | 5:20 |\n", "| {7,9,11} | 3 | .903 | 8:15 |\n", "|{800,1000,1200} | 400 | .46 | 3:43 |\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Silence removed results 10000 kernels**\n", "\n", "| Kernel Sizes | Strides | Results | Time |\n", "| :--------------: | :-----: | :-----: | :-----:| \n", "| {7,9,11} | 5 | .979 | 5:08 
|\n", "| {7,9,11} | 5 | .976 | 5:10 |\n", "| {7,9,11} | 5 | .980 | 5:26 |\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Silence removed results 2000 kernels**\n", "\n", "| Kernel Sizes | Strides | Results | Time |\n", "| :--------------: | :-----: | :-----: | :-----:| \n", "| {7,9,11} | 5 | .972 | 1:10 |\n", "| {7,9,11} | 5 | .979 | 1:03 |\n", "| {7,9,11} | 5 | .976 | 1:01 |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Silence removed results 1000 kernels**\n", "\n", "| Kernel Sizes | Strides | Results | Time |\n", "| :--------------: | :-----: | :-----: | :-----:| \n", "| {7,9,11} | 5 | .974 | 0:31 |\n", "| {7,9,11} | 5 | .974 | 0:31 |\n", "| {7,9,11} | 5 | .966 | 0:31 |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Silence removed results 200 kernels**\n", "\n", "| Kernel Sizes | Strides | Results | Time |\n", "| :--------------: | :-----: | :-----: | :-----:| \n", "| {7,9,11} | 5 | .949 | 0:06 |\n", "| {7,9,11} | 5 | .950 | 0:06 |\n", "| {7,9,11} | 5 | .934 | 0:06|" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1s Audio, silence removed, testing strides" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Silence removed results 200 kernels**\n", "\n", "| Kernel Sizes | Strides | Results | Time |\n", "| :--------------: | :-----: | :-----: | :-----:| \n", "| {7,9,11} | 1 | .942 | 0:30 |\n", "| {7,9,11} | 1 | .954 | 0:28 |\n", "| {7,9,11} | 1 | .950 | 0:28|" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Silence removed results 200 kernels**\n", "\n", "| Kernel Sizes | Strides | Results | Time |\n", "| :--------------: | :-----: | :-----: | :-----:| \n", "| {7,9,11} | 3 | .960 | 0:10 |\n", "| {7,9,11} | 3 | .948 | 0:10 |\n", "| {7,9,11} | 3 | .954 | 0:10 |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Up clip length to 2 seconds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Silence removed results 200 kernels**\n", "\n", "| 
Kernel Sizes | Strides | Results | Time |\n", "| :--------------: | :-----: | :-----: | :-----:| \n", "| {7,9,11} | 5 | .987 | 0:15 |\n", "| {7,9,11} | 5 | .980 | 0:13 |\n", "| {7,9,11} | 5 | .986 | 0:13 |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Silence removed results 200 kernels**\n", "\n", "| Kernel Sizes | Strides | Results | Time |\n", "| :--------------: | :-----: | :-----: | :-----:| \n", "| {7,9,11} | 3 | .986 | 0:20 |\n", "| {7,9,11} | 3 | .986 | 0:20 |\n", "| {7,9,11} | 3 | .990 | 0:20 |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Silence removed results 1000 kernels**\n", "\n", "| Kernel Sizes | Strides | Results | Time |\n", "| :--------------: | :-----: | :-----: | :-----:| \n", "| {7,9,11} | 3 | .992 | 1:40 |\n", "| {7,9,11} | 3 | .992 | 1:40 |\n", "| {7,9,11} | 3 | .992 | 1:40 |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Check impact of the kernel size options, do we really need to choose from 3?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "times_ks2, scores_ks2 = timing_test(10, np.array((2,)), stride=1, num_kernels=100, seq_length=32000, show_progress=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "times_ks3, scores_ks3 = timing_test(10, np.array((3,)), stride=1, num_kernels=100, seq_length=32000, show_progress=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "times_ks5, scores_ks5 = timing_test(10, np.array((5,)), stride=1, num_kernels=100, seq_length=32000, show_progress=False)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "times_ks7, scores_ks7 = timing_test(10, np.array((7,)), stride=1, num_kernels=100, seq_length=32000, show_progress=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "times_ks9, scores_ks9 = 
timing_test(10, np.array((9,)), stride=1, num_kernels=100, seq_length=32000, show_progress=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "times_ks11, scores_ks11 = timing_test(10, np.array((11,)), stride=1, num_kernels=100, seq_length=32000, show_progress=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "times_ksorig, scores_ksorig = timing_test(10, np.array((7,9,11,)), stride=1, num_kernels=100, seq_length=32000, show_progress=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def mn(x): return round(sum(x)/len(x), 3)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "all_scores = [scores_ks2, scores_ks3, scores_ks5, scores_ks7, scores_ks9, scores_ks11, scores_ksorig,]\n", "all_times = [times_ks2, times_ks3, times_ks5, times_ks7, times_ks9, times_ks11, times_ksorig,]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mean_times = list(map(mn, all_times))\n", "mean_scores = list(map(mn, all_scores))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot([2,3,5,7,9,11], mean_scores[:6])\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot([2,3,5,7,9,11], mean_times[:6])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[0.924, 0.956, 0.967, 0.968, 0.969, 0.962, 0.967]" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mean_scores" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Conclusion: It appears 7,9,11 do well, it's hard to tell if the variety helps without testing a higher number of kernels\n", "\n", "Let's rerun the last experiment, this time with longer strides" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "times_ks2, scores_ks2 = timing_test(10, np.array((2,)), stride=7, num_kernels=100, seq_length=32000, show_progress=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "times_ks3, scores_ks3 = timing_test(10, np.array((3,)), stride=7, num_kernels=100, seq_length=32000, show_progress=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "times_ks5, scores_ks5 = timing_test(10, np.array((5,)), stride=7, num_kernels=100, seq_length=32000, show_progress=False)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "times_ks7, scores_ks7 = timing_test(10, np.array((7,)), stride=7, num_kernels=100, seq_length=32000, show_progress=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "times_ks9, scores_ks9 = timing_test(10, np.array((9,)), stride=7, num_kernels=100, seq_length=32000, show_progress=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "times_ks11, scores_ks11 = timing_test(10, np.array((11,)), stride=7, num_kernels=100, seq_length=32000, show_progress=False)" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "times_ksorig, scores_ksorig = timing_test(10, np.array((7,9,11,)), stride=7, num_kernels=100, seq_length=32000, show_progress=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "all_scores_s7 = [scores_ks2, scores_ks3, scores_ks5, scores_ks7, scores_ks9, scores_ks11, scores_ksorig,]\n", "all_times_s7 = [times_ks2, times_ks3, times_ks5, times_ks7, times_ks9, times_ks11, times_ksorig,]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mean_times_s7 = list(map(mn, all_times_s7))\n", "mean_scores_s7 = list(map(mn, all_scores_s7))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[0.905, 0.937, 0.949, 0.95, 0.958, 0.951, 0.957]" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mean_scores_s7" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot([2,3,5,7,9,11], mean_scores_s7[:6])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot([2,3,5,7,9,11], mean_times_s7[:6])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def mn(x): return round(sum(x)/len(x),3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Test 100 kernels each of larger kernels" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Kernel Size 7: Score: 0.952 Time: 5.322s\n", "Kernel Size 11: Score: 0.962 Time: 6.52s\n", "Kernel Size 15: Score: 0.956 Time: 7.862s\n", "Kernel Size 19: Score: 0.965 Time: 8.23s\n", "Kernel Size 23: Score: 0.943 Time: 9.699s\n", "Kernel Size 27: Score: 0.953 Time: 11.522s\n", "Kernel Size 31: Score: 0.941 Time: 12.626s\n", "Kernel Size 35: Score: 0.948 Time: 12.208s\n", "Kernel Size 39: Score: 0.947 Time: 11.937s\n", "Kernel Size 43: Score: 0.931 Time: 12.709s\n", "Kernel Size 47: Score: 0.936 Time: 14.581s\n", "Kernel Size 51: Score: 0.926 Time: 15.842s\n", "Kernel Size 55: Score: 0.932 Time: 17.233s\n", "Kernel Size 59: Score: 0.926 Time: 18.059s\n", "Kernel Size 63: Score: 0.932 Time: 20.527s\n", "Kernel Size 67: Score: 0.925 Time: 20.773s\n", "Kernel Size 71: Score: 0.935 Time: 18.156s\n", "Kernel Size 75: Score: 0.915 Time: 19.595s\n", "Kernel Size 79: Score: 0.912 Time: 21.77s\n", "Kernel Size 83: Score: 0.926 Time: 23.429s\n", "Kernel Size 87: Score: 0.923 Time: 25.025s\n", "Kernel Size 91: Score: 0.923 Time: 26.44s\n", "Kernel Size 95: Score: 0.902 Time: 26.552s\n", "Kernel Size 99: Score: 0.928 Time: 29.478s\n" ] } ], "source": [ "scores_ks, times_ks = [],[]\n", "for kernel_size in range(7,100,4):\n", " times, scores = timing_test(1, np.array((kernel_size,)), stride=7, num_kernels=100, seq_length=32000, show_progress=False)\n", " print(f\"Kernel Size {kernel_size}: Score: {mn(scores)} Time: {mn(times)}s\")\n", " 
scores_ks.append(mn(scores))\n", " times_ks.append(mn(times))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.xlabel(\"Kernel Size\")\n", "plt.ylabel(\"Accuracy\")\n", "plt.plot(np.arange(7,100,4), scores_ks);" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.xlabel(\"Kernel Size\")\n", "plt.ylabel(\"Time for 100 kernels\")\n", "plt.plot(np.arange(7,100,4), times_ks);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3x3 kernel is twice as fast as 9x9, that means we can double the kernels and run in same time, but it still has worse results" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "times_ks3, scores_ks3 = timing_test(10, np.array((2,)), stride=7, num_kernels=200, seq_length=32000, show_progress=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.935" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mn(scores_ks3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Drop down to stride 1 conv, run 2500 kernels for filter size 7,9,11 and {7,9,11} to see if variety helps" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "times_7, scores_7 = timing_test(1, np.array((7,)), stride=1, num_kernels=2500, seq_length=32000, show_progress=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "times_9, scores_9 = timing_test(1, np.array((9,)), stride=1, num_kernels=2500, seq_length=32000, show_progress=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "times_11, scores_11 = timing_test(1, np.array((11,)), stride=1, num_kernels=2500, seq_length=32000, show_progress=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "times_orig, scores_orig = timing_test(1, np.array((7,9,11,)), stride=1, num_kernels=2500, seq_length=32000, show_progress=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { 
"text/plain": [ "(0.996, 0.996, 1.0, 0.995)" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "mn(scores_7), mn(scores_9), mn(scores_11), mn(scores_orig)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Conclusion: Variety doesn't appear to help, but this needs to be retested on a tougher dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "([600.959691286087],\n", " [723.5468919277191],\n", " [877.7471699714661],\n", " [831.6202943325043])" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "times_7, times_9, times_11, times_orig" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What is the predictive power of a single kernel?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Finished Run 1 Score: 0.243 Time: 0.101\n", "Finished Run 2 Score: 0.212 Time: 0.141\n", "Finished Run 3 Score: 0.226 Time: 0.161\n", "Finished Run 4 Score: 0.224 Time: 0.194\n", "Finished Run 5 Score: 0.211 Time: 0.107\n", "Finished Run 6 Score: 0.182 Time: 0.122\n", "Finished Run 7 Score: 0.257 Time: 0.189\n", "Finished Run 8 Score: 0.28 Time: 0.167\n", "Finished Run 9 Score: 0.19 Time: 0.146\n", "Finished Run 10 Score: 0.164 Time: 0.122\n", "Finished Run 11 Score: 0.186 Time: 0.172\n", "Finished Run 12 Score: 0.289 Time: 0.225\n", "Finished Run 13 Score: 0.251 Time: 0.226\n", "Finished Run 14 Score: 0.139 Time: 0.057\n", "Finished Run 15 Score: 0.274 Time: 0.157\n", "Finished Run 16 Score: 0.211 Time: 0.157\n", "Finished Run 17 Score: 0.213 Time: 0.21\n", "Finished Run 18 Score: 0.231 Time: 0.197\n", "Finished Run 19 Score: 0.226 Time: 0.108\n", "Finished Run 20 Score: 0.226 Time: 0.233\n" ] } ], "source": [ "times, scores = 
timing_test(20, np.array((7,9,11)), stride=5, num_kernels=1, seq_length=32000)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.22178153446033813" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sum(scores)/len(scores)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Answer: about 22% on a 10-class problem" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Can we apply kernels to a small subset of our data and get enough information to see which ones will be predictive? If so, we could generate 100x as many random kernels for a large dataset, select the best 1%, and use those on the full data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def get_good_kernels(runs, candidate_lengths, stride, num_kernels, seq_length, thresh, subset_size=300, show_progress=True):\n", " good_kernels, scores = [], []\n", " for i in range(runs):\n", " candidate_lengths = np.array((7,))\n", " kernels = generate_kernels(seq_length, num_kernels, candidate_lengths, stride)\n", " idxs = torch.randperm(len(np_x_train))[:subset_size]\n", " #print(idxs)\n", " np_x_train_subset = np_x_train[idxs]\n", " np_y_train_subset = np_y_train[idxs]\n", " x_train_tfm = apply_kernels(np_x_train_subset, kernels)\n", " x_valid_tfm = apply_kernels(np_x_valid, kernels)\n", " classifier = RidgeClassifierCV(alphas=np.logspace(-3, 3, 7), normalize=True)\n", " classifier.fit(x_train_tfm, np_y_train_subset)\n", " score = classifier.score(x_valid_tfm, np_y_valid)\n", " if score > thresh:\n", " good_kernels.append(kernels)\n", " scores.append(score)\n", " if(show_progress): print(\"Finished Run\", i+1, \"Score:\", round(score, 3))\n", " return good_kernels, scores" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "k, s = get_good_kernels(2000, np.array((7,)), stride=5, 
num_kernels=1,seq_length=32000, thresh=0.275, show_progress=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def merge_kernels(k):\n", "    num_kernels = len(k)\n", "    strides = np.zeros(num_kernels, dtype = np.int32)\n", "    weights = np.zeros((num_kernels, 7)) # see note\n", "    lengths = np.zeros(num_kernels, dtype = np.int32) # see note\n", "    biases = np.zeros(num_kernels)\n", "    dilations = np.zeros(num_kernels, dtype = np.int32)\n", "    paddings = np.zeros(num_kernels, dtype = np.int32)\n", "    for i in range(num_kernels):\n", "        #weights, lengths, biases, dilations, paddings, strides\n", "        weights[i], lengths[i], biases[i], dilations[i], paddings[i], strides[i] = k[i]\n", "    return weights, lengths, biases, dilations, paddings, strides" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "165" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(k)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "kernels = merge_kernels(k)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Finished Run 1 Score: 0.974 Time: 9.498\n", "Finished Run 2 Score: 0.978 Time: 9.682\n", "Finished Run 3 Score: 0.974 Time: 10.002\n", "Finished Run 4 Score: 0.979 Time: 9.684\n", "Finished Run 5 Score: 0.98 Time: 9.885\n", "Finished Run 6 Score: 0.979 Time: 9.818\n", "Finished Run 7 Score: 0.979 Time: 9.717\n", "Finished Run 8 Score: 0.978 Time: 9.658\n", "Finished Run 9 Score: 0.977 Time: 9.609\n", "Finished Run 10 Score: 0.978 Time: 9.866\n" ] } ], "source": [ "times, scores = timing_test(10, np.array((7,)), stride=5, num_kernels=168, seq_length=32000, show_progress=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def score_kernels(k):\n", " x_train_tfm = 
apply_kernels(np_x_train, k)\n", " x_valid_tfm = apply_kernels(np_x_valid, k)\n", " classifier = RidgeClassifierCV(alphas=np.logspace(-3, 3, 7), normalize=True)\n", " classifier.fit(x_train_tfm, np_y_train)\n", " score = classifier.score(x_valid_tfm, np_y_valid)\n", " return(score)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.9687906371911573" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "score_kernels(kernels)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Result: 168 \"high accuracy\" kernels drawn from 2000 random kernels actually perform worse (about 96.9%) than just 168 random kernels (97.4-98.0% across 10 runs)! The correlation between the high accuracy kernels is likely higher than among the random ones, so they pick out the same features more frequently" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(0.4967490247074122, 0.5357607282184655, 0.52)" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "min(s), max(s), mn(s)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 2 }