{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Creating Audio Trigger Poison Samples with ART" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook shows how to create poison samples with audio backdoor triggers in ART." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import librosa\n", "import tensorflow as tf\n", "import matplotlib.pyplot as plt\n", "import IPython\n", "from IPython import display\n", "import os, sys\n", "import pathlib\n", "%matplotlib inline\n", "\n", "module_path = os.path.abspath(os.path.join('..'))\n", "if module_path not in sys.path:\n", " sys.path.append(module_path)\n", "\n", "from art import config\n", "from art.estimators.classification import TensorFlowV2Classifier\n", "from art.attacks.poisoning import PoisoningAttackBackdoor\n", "from art.attacks.poisoning.perturbations.audio_perturbations \\\n", " import CacheToneTrigger, CacheAudioTrigger\n", "\n", "AUDIO_DATA_PATH = os.path.join(config.ART_DATA_PATH, \"mini_speech_commands/\")\n", "\n", "# Set the seed value for experiment reproducibility.\n", "seed = 47\n", "tf.random.set_seed(seed)\n", "np.random.seed(seed)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Mini-Speech Commands Dataset\n", "We will use (a mini version of) the [speech commands dataset](https://www.tensorflow.org/datasets/catalog/speech_commands) ([Warden, 2018](https://arxiv.org/abs/1804.03209)). This dataset contains audio clips of several spoken *commands*, e.g., 'left', 'right', 'stop'."
] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "2-rayb7-3Y0I" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/swanandkadhe/.art/data/mini_speech_commands\n" ] } ], "source": [ "data_dir = pathlib.Path(AUDIO_DATA_PATH)\n", "print(data_dir)\n", "if not data_dir.exists():\n", " tf.keras.utils.get_file(\n", " str(data_dir)+'/mini_speech_commands.zip',\n", " origin=\"http://storage.googleapis.com/download.tensorflow.org/data/mini_speech_commands.zip\",\n", " extract=True,\n", " cache_subdir=str(data_dir)\n", " )" ] }, { "cell_type": "markdown", "metadata": { "id": "BgvFq3uYiS5G" }, "source": [ "The dataset's audio clips are stored in eight folders corresponding to each speech command: `no`, `yes`, `down`, `go`, `left`, `up`, `right`, and `stop`:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "id": "70IBxSKxA1N9" }, "outputs": [], "source": [ "commands = np.array(['right', 'go', 'no', 'left', 'stop', 'up', 'down', 'yes'])" ] }, { "cell_type": "markdown", "metadata": { "id": "aMvdU9SY8WXN" }, "source": [ "Extract the audio clips into a list called `filenames`, shuffle it, and take 20 files to add poison to:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "id": "hlX685l1wD9k" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "2023-03-29 11:29:15.788159: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA\n", "To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n" ] } ], "source": [ "filenames = tf.io.gfile.glob(str(data_dir)+'/mini_speech_commands' + '/*/*')\n", "filenames = tf.random.shuffle(filenames).numpy()\n", "example_files = filenames[:20]" ] }, { "cell_type": "markdown", "metadata": { "id": "e6bb8defd2ef" }, "source": [ "Now, let's 
define a function that preprocesses the dataset's raw WAV audio files into audio arrays. The audio clips are sampled at 16 kHz and are at most 1 second long; if a clip is shorter than 1 second, we zero-pad it to the full length." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "id": "9PjJ2iXYwftD" }, "outputs": [], "source": [ "def get_audio_clips_and_labels(file_paths):\n", " audio_samples = []\n", " audio_labels = []\n", " for file_path in file_paths:\n", " audio, _ = librosa.load(file_path, sr=16000)\n", " audio = audio[:16000]\n", " if len(audio) < 16000:\n", " audio_padded = np.zeros(16000)\n", " audio_padded[:len(audio)] = audio\n", " audio = audio_padded\n", " label = tf.strings.split(\n", " input=file_path,\n", " sep=os.path.sep)[-2]\n", " \n", " audio_samples.append(audio)\n", " audio_labels.append(label.numpy().decode(\"utf-8\"))\n", " return np.stack(audio_samples), np.stack(audio_labels)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's use the above function to convert the audio clips to NumPy arrays and *display* a few of them. 
" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "id": "HNv4xwYkB2P6" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Label: left\n" ] }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Label: up\n" ] }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Label: yes\n" ] }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "x_audio, y_audio = get_audio_clips_and_labels(example_files)\n", "for i in range(3):\n", " print('Label:', y_audio[i])\n", " display.display(display.Audio(x_audio[i], rate=16000))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Insert Backdoors" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tone signal as trigger" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "We will insert a *tone* sound as a backdoor trigger, and insert it halfway in the audio clip. Let's use `down` as a target label.\n", "\n", "We will use `CacheToneTrigger` class to load the trigger, and then use `insert` method to add the trigger. 
The `CacheToneTrigger` class has several parameters that affect trigger generation:\n", "- `sampling_rate`: the sampling rate of the audio clip(s) into which the trigger will be inserted\n", "- `frequency`: the frequency of the *tone* that is inserted as the trigger\n", "- `duration`: the duration of the trigger signal (in seconds)\n", "- `random`: if this flag is set to `True`, the trigger is inserted at a random position in each audio clip\n", "- `shift`: the offset (in number of samples) at which the trigger is inserted\n", "- `scale`: the scaling factor applied when adding the trigger signal\n", "\n", "By default, this class loads a tone of frequency 440 Hz with 0.1 second duration and 0.1 scale." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def poison_loader_tone():\n", " trigger = CacheToneTrigger(\n", " sampling_rate=16000,\n", " frequency=440,\n", " duration=0.1,\n", " shift=8000,\n", " scale=0.25\n", " )\n", "\n", " def poison_func(x_audio):\n", " return trigger.insert(x_audio)\n", "\n", " return PoisoningAttackBackdoor(poison_func)\n", "\n", "backdoor_attack = poison_loader_tone()\n", "target_label = np.array('down')\n", "target_label = np.expand_dims(target_label, axis=0)\n", "poisoned_x, poisoned_y = backdoor_attack.poison(x_audio, target_label, broadcast=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's hear how a triggered audio clip sounds."
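, "\n", "\n", "Conceptually, the tone trigger just mixes a short sine wave into the clip at a sample offset. A minimal standalone sketch of that idea (illustrative only, not ART's implementation; the function name and defaults here are assumptions):\n", "\n", "```python\n", "import numpy as np\n", "\n", "def add_tone_trigger(clip, sr=16000, freq=440, duration=0.1, shift=8000, scale=0.25):\n", "    # Synthesize a short sine tone and add it to the clip at `shift` samples.\n", "    t = np.arange(int(sr * duration)) / sr\n", "    tone = np.sin(2 * np.pi * freq * t)\n", "    out = np.array(clip, dtype=np.float32)\n", "    out[shift:shift + len(tone)] += scale * tone\n", "    return out\n", "```"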
] }, { "cell_type": "code", "execution_count": 10, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Clean Audio Clip:\n" ] }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Clean Label: left\n", "Backdoor Audio Clip:\n" ] }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Backdoor Label: ['down']\n", "-------------\n", "\n" ] } ], "source": [ "for i in range(1):\n", " print('Clean Audio Clip:')\n", " display.display(display.Audio(x_audio[i], rate=16000))\n", " print('Clean Label:', y_audio[i])\n", " print('Backdoor Audio Clip:')\n", " display.display(display.Audio(poisoned_x[i], rate=16000))\n", " print('Backdoor Label:', poisoned_y[i])\n", " print('-------------\\n')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Cough sound as trigger" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "We will insert a *cough* sound as a backdoor trigger. Let's use `stop` as the target label.\n", "\n", "We will use the `CacheAudioTrigger` class to load the trigger, and then use its `insert` method to add it. 
The `CacheAudioTrigger` class has several parameters that affect trigger generation:\n", "- `sampling_rate`: the sampling rate of the audio clip(s) into which the trigger will be inserted\n", "- `backdoor_path`: the path to the audio clip that will be inserted as the trigger\n", "- `duration`: the duration of the trigger signal in seconds (if `None`, the full clip is inserted)\n", "- `random`: if this flag is set to `True`, the trigger is inserted at a random position in each audio clip\n", "- `shift`: the offset (in number of samples) at which the trigger is inserted\n", "- `scale`: the scaling factor applied when adding the trigger signal\n", "\n", "By default, this class inserts a cough sound of 1 second duration without any offset/shift." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "def poison_loader_audio():\n", " trigger = CacheAudioTrigger(\n", " sampling_rate=16000,\n", " backdoor_path='../utils/data/backdoors/cough_trigger.wav',\n", " scale=0.1\n", " )\n", "\n", " def poison_func(x_audio):\n", " return trigger.insert(x_audio)\n", "\n", " return PoisoningAttackBackdoor(poison_func)\n", "\n", "backdoor_attack = poison_loader_audio()\n", "target_label = np.array('stop')\n", "target_label = np.expand_dims(target_label, axis=0)\n", "poisoned_x, poisoned_y = backdoor_attack.poison(x_audio, target_label, broadcast=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's hear how a few of the triggered audio clips sound."
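, "\n", "\n", "For an arbitrary trigger waveform, the insertion reduces to scaling the trigger samples and adding them at an offset. A rough standalone sketch (illustrative only; not the ART implementation, and the helper name is an assumption):\n", "\n", "```python\n", "import numpy as np\n", "\n", "def add_waveform_trigger(clips, trigger, shift=0, scale=0.1):\n", "    # clips: (n, num_samples) array; trigger: 1-D waveform no longer than a clip.\n", "    out = np.array(clips, dtype=np.float32)\n", "    out[:, shift:shift + len(trigger)] += scale * np.asarray(trigger, dtype=np.float32)\n", "    return out\n", "```"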
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Clean Audio Clip:\n" ] }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Clean Label: left\n", "Backdoor Audio Clip:\n" ] }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Backdoor Label: ['stop']\n", "-------------\n", "\n", "Clean Audio Clip:\n" ] }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Clean Label: up\n", "Backdoor Audio Clip:\n" ] }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Backdoor Label: ['stop']\n", "-------------\n", "\n", "Clean Audio Clip:\n" ] }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Clean Label: yes\n", "Backdoor Audio Clip:\n" ] }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Backdoor Label: ['stop']\n", "-------------\n", "\n" ] } ], "source": [ "for i in range(3):\n", " print('Clean Audio Clip:')\n", " display.display(display.Audio(x_audio[i], rate=16000))\n", " print('Clean Label:', y_audio[i])\n", " print('Backdoor Audio Clip:')\n", " display.display(display.Audio(poisoned_x[i], rate=16000))\n", " print('Backdoor Label:', poisoned_y[i])\n", " print('-------------\\n')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ 
"## Poison a model with backdoor triggers\n", "Now, let's train a model on backdoor data. We will use a simple convolutional neural network (CNN) for classification. We will convert the audio clips, which are time-domain *waveforms*, into time-frequency domain *spectrograms*. The spectrograms can be represented as 2-dimensional images that show frequency changes over time. We will use the spectrogram images to train a CNN. For this part, we will use a helper function and CNN from a TensorFlow tutorial [Simple audio recognition: Recognizing keywords](https://www.tensorflow.org/tutorials/audio/simple_audio)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Helper functions to convert waveforms into spectrograms." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "def get_spectrogram(audio):\n", " waveform = tf.convert_to_tensor(audio, dtype=tf.float32)\n", " spectrogram = tf.signal.stft(\n", " waveform, frame_length=255, frame_step=128)\n", " spectrogram = tf.abs(spectrogram)\n", " # Add a `channels` dimension, so that the spectrogram can be used\n", " # as image-like input data with convolution layers (which expect\n", " # shape (`batch_size`, `height`, `width`, `channels`)).\n", " spectrogram = spectrogram[..., tf.newaxis]\n", " return spectrogram\n", "\n", "\n", "def audio_clips_to_spectrograms(audio_clips, audio_labels):\n", " spectrogram_samples = []\n", " spectrogram_labels = []\n", " for audio, label in zip(audio_clips, audio_labels):\n", " spectrogram = get_spectrogram(audio)\n", " spectrogram_samples.append(spectrogram)\n", " label_id = np.argmax(label == commands)\n", " spectrogram_labels.append(label_id)\n", " return np.stack(spectrogram_samples), np.stack(spectrogram_labels)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Build Train and Test Datasets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Split the data into training and test sets using an 
80:20 ratio, respectively." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training set size 6400\n", "Test set size 1600\n" ] } ], "source": [ "train_files = filenames[:6400]\n", "test_files = filenames[-1600:]\n", "\n", "print('Training set size', len(train_files))\n", "print('Test set size', len(test_files))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get audio clips and labels from filenames." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "x_train_audio, y_train_audio = get_audio_clips_and_labels(train_files)\n", "x_test_audio, y_test_audio = get_audio_clips_and_labels(test_files)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Generate spectrogram images and label ids for training and test sets." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "x_train, y_train = audio_clips_to_spectrograms(x_train_audio, y_train_audio)\n", "x_test, y_test = audio_clips_to_spectrograms(x_test_audio, y_test_audio)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train a Convolutional Neural Network" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define model architecture" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model: \"sequential\"\n", "_________________________________________________________________\n", " Layer (type) Output Shape Param # \n", "=================================================================\n", " resizing (Resizing) (None, 32, 32, 1) 0 \n", " \n", " normalization (Normalizatio (None, 32, 32, 1) 3 \n", " n) \n", " \n", " conv2d (Conv2D) (None, 30, 30, 32) 320 \n", " \n", " conv2d_1 (Conv2D) (None, 28, 28, 64) 18496 \n", " \n", " max_pooling2d (MaxPooling2D (None, 14, 14, 64) 0 \n", " ) \n", " \n", " dropout (Dropout) 
(None, 14, 14, 64) 0 \n", " \n", " flatten (Flatten) (None, 12544) 0 \n", " \n", " dense (Dense) (None, 128) 1605760 \n", " \n", " dropout_1 (Dropout) (None, 128) 0 \n", " \n", " dense_1 (Dense) (None, 8) 1032 \n", " \n", "=================================================================\n", "Total params: 1,625,611\n", "Trainable params: 1,625,608\n", "Non-trainable params: 3\n", "_________________________________________________________________\n" ] } ], "source": [ "from tensorflow.keras import layers\n", "from tensorflow.keras import models\n", "\n", "norm_layer = layers.Normalization()\n", "input_shape = (124, 129, 1)\n", "num_labels = 8\n", "model = models.Sequential([\n", " layers.Input(shape=input_shape),\n", " # Downsample the input.\n", " layers.Resizing(32, 32),\n", " # Normalize.\n", " norm_layer,\n", " layers.Conv2D(32, 3, activation='relu'),\n", " layers.Conv2D(64, 3, activation='relu'),\n", " layers.MaxPooling2D(),\n", " layers.Dropout(0.25),\n", " layers.Flatten(),\n", " layers.Dense(128, activation='relu'),\n", " layers.Dropout(0.5),\n", " layers.Dense(num_labels),\n", "])\n", "\n", "model.summary()\n", "\n", "loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)\n", "optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)\n", "\n", "classifier = TensorFlowV2Classifier(model=model,\n", " loss_object=loss_object,\n", " optimizer=optimizer,\n", " input_shape=(124, 129, 1),\n", " nb_classes=8)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Train the classifier using the `fit` method." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "classifier.fit(x=x_train, y=y_train, batch_size=64, nb_epochs=15)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compute test accuracy." 
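, "\n", "\n", "Beyond the overall accuracy computed below, a per-class breakdown can show which commands suffer most. A small illustrative helper (assumed, not part of ART; expects integer class ids):\n", "\n", "```python\n", "import numpy as np\n", "\n", "def per_class_accuracy(preds, labels, num_classes=8):\n", "    # Fraction of correct predictions within each true class.\n", "    preds, labels = np.asarray(preds), np.asarray(labels)\n", "    return [float(np.mean(preds[labels == c] == c)) if np.any(labels == c) else 0.0\n", "            for c in range(num_classes)]\n", "```"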
] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy on benign test examples: 86.1875%\n" ] } ], "source": [ "predictions = np.argmax(classifier.predict(x_test), axis=1)\n", "accuracy = np.sum(predictions == y_test) / len(y_test)\n", "print(\"Accuracy on benign test examples: {}%\".format(accuracy * 100))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Train a CNN on backdoor data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Insert the backdoor trigger into 25% of the examples. First, initialize the backdoor attack class." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "def poison_loader_audio():\n", " trigger = CacheAudioTrigger(\n", " sampling_rate=16000,\n", " backdoor_path='../utils/data/backdoors/cough_trigger.wav',\n", " scale=0.5\n", " )\n", "\n", " def poison_func(x_audio):\n", " return trigger.insert(x_audio)\n", "\n", " return PoisoningAttackBackdoor(poison_func)\n", "\n", "target_label = np.array('stop')\n", "target_label = np.expand_dims(target_label, axis=0)\n", "bd_attack = poison_loader_audio()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Poison 25% of the samples in the training and test sets." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "x_train_audio_bd, y_train_audio_bd = bd_attack.poison(x_train_audio[:1600], target_label, broadcast=True)\n", "x_train_bd, y_train_bd = audio_clips_to_spectrograms(x_train_audio_bd, y_train_audio_bd)\n", "\n", "x_test_audio_bd, y_test_audio_bd = bd_attack.poison(x_test_audio[:400], target_label, broadcast=True)\n", "x_test_bd, y_test_bd = audio_clips_to_spectrograms(x_test_audio_bd, y_test_audio_bd)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Concatenate the backdoored samples with the clean samples to obtain the mixed train and test sets."
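, "\n", "\n", "When evaluating the poisoned model, the accuracy on triggered samples is often reported as the *attack success rate*: the fraction of triggered inputs classified as the attacker's target label. A minimal illustrative helper (assumed, not an ART API):\n", "\n", "```python\n", "import numpy as np\n", "\n", "def attack_success_rate(preds, target_id):\n", "    # Fraction of triggered samples predicted as the target class.\n", "    return float(np.mean(np.asarray(preds) == target_id))\n", "```"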
] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "x_train (6400, 124, 129, 1)\n", "y_train (6400,)\n", "x_test (1600, 124, 129, 1)\n", "y_test (1600,)\n" ] } ], "source": [ "x_train_mix = np.concatenate([x_train_bd, x_train[1600:]])\n", "y_train_mix = np.concatenate([y_train_bd, y_train[1600:]])\n", "print('x_train', x_train_mix.shape)\n", "print('y_train', y_train_mix.shape)\n", "\n", "x_test_mix = np.concatenate([x_test_bd, x_test[400:]])\n", "y_test_mix = np.concatenate([y_test_bd, y_test[400:]])\n", "print('x_test', x_test_mix.shape)\n", "print('y_test', y_test_mix.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Train the classifier on the poisoned data and compute its accuracy." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "model_bd = tf.keras.models.clone_model(model)\n", "\n", "model_bd.compile(\n", " optimizer=tf.keras.optimizers.Adam(),\n", " loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n", " metrics=['accuracy'],\n", ")\n", "\n", "# Use a fresh optimizer here rather than reusing the one that already trained `classifier`.\n", "classifier_bd = TensorFlowV2Classifier(model=model_bd,\n", " loss_object=loss_object,\n", " optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),\n", " input_shape=(124, 129, 1),\n", " nb_classes=8)\n", "\n", "classifier_bd.fit(x=x_train_mix, y=y_train_mix, batch_size=64, nb_epochs=15)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy on poisoned test examples: 99.0%\n" ] } ], "source": [ "predictions = np.argmax(classifier_bd.predict(x_test_bd), axis=1)\n", "accuracy = np.sum(predictions == y_test_bd) / len(y_test_bd)\n", "print(\"Accuracy on poisoned test examples: {}%\".format(accuracy * 100))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Play a few backdoor samples and check the poisoned model's predictions" ] }, { "cell_type": "code", "execution_count": 27, 
"metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Clean Audio Sample\n" ] }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Prediction on clean sample: down\n", "Triggered Audio Sample\n" ] }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Prediction on trigger sample: stop\n", "Clean Audio Sample\n" ] }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Prediction on clean sample: up\n", "Triggered Audio Sample\n" ] }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Prediction on trigger sample: stop\n", "Clean Audio Sample\n" ] }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Prediction on clean sample: yes\n", "Triggered Audio Sample\n" ] }, { "data": { "text/html": [ "\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Prediction on trigger sample: stop\n" ] } ], "source": [ "for i in range(3):\n", " print('Clean Audio Sample')\n", " display.display(display.Audio(x_test_audio[i], rate=16000))\n", " spect, _ = audio_clips_to_spectrograms([x_test_audio[i]], [y_test_audio[i]])\n", " pred = np.argmax(classifier_bd.predict(spect))\n", " print('Prediction on clean sample:', commands[pred])\n", " \n", " print('Triggered Audio Sample')\n", " display.display(display.Audio(x_test_audio_bd[i], rate=16000))\n", " 
spect_bd, _ = audio_clips_to_spectrograms([x_test_audio_bd[i]], [y_test_audio_bd[i]])\n", " pred_bd = np.argmax(classifier_bd.predict(spect_bd))\n", " print('Prediction on trigger sample:',commands[pred_bd])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "ASR-Poison1", "language": "python", "name": "asr-poison1" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" }, "vscode": { "interpreter": { "hash": "f861968b624c68d179189369f7a9f28e94c4ecf44a8bc1b854818511cb02e777" } } }, "nbformat": 4, "nbformat_minor": 2 }