{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Creating Audio Trigger Poison Samples with ART"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook shows how to create audio triggers in ART."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import librosa\n",
"# import seaborn as sns\n",
"import tensorflow as tf\n",
"import matplotlib.pyplot as plt\n",
"import IPython\n",
"from IPython import display\n",
"import os, sys\n",
"import pathlib\n",
"%matplotlib inline\n",
"\n",
"module_path = os.path.abspath(os.path.join('..'))\n",
"if module_path not in sys.path:\n",
" sys.path.append(module_path)\n",
"\n",
"from art import config\n",
"from art.estimators.classification import TensorFlowV2Classifier\n",
"from art.attacks.poisoning import PoisoningAttackBackdoor\n",
"from art.attacks.poisoning.perturbations.audio_perturbations \\\n",
" import CacheToneTrigger, CacheAudioTrigger\n",
"\n",
"AUDIO_DATA_PATH = os.path.join(config.ART_DATA_PATH, \"mini_speech_commands/\")\n",
"\n",
"# Set the seed value for experiment reproducibility.\n",
"seed = 47\n",
"tf.random.set_seed(seed)\n",
"np.random.seed(seed)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Mini-Speech Commands Dataset\n",
"We will use (a mini version of) the [speech commands dataset](https://www.tensorflow.org/datasets/catalog/speech_commands) ([Warden, 2018](https://arxiv.org/abs/1804.03209)). This dataset contains audio clips of several *commands*, e.g., 'left', 'right', 'stop'."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"id": "2-rayb7-3Y0I"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/Users/swanandkadhe/.art/data/mini_speech_commands\n"
]
}
],
"source": [
"data_dir = pathlib.Path(AUDIO_DATA_PATH)\n",
"print(data_dir)\n",
"if not data_dir.exists():\n",
" tf.keras.utils.get_file(\n",
" str(data_dir)+'/mini_speech_commands.zip',\n",
" origin=\"http://storage.googleapis.com/download.tensorflow.org/data/mini_speech_commands.zip\",\n",
" extract=True,\n",
"# cache_dir='.', \n",
" cache_subdir=str(data_dir)\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "BgvFq3uYiS5G"
},
"source": [
"The dataset's audio clips are stored in eight folders corresponding to each speech command: `no`, `yes`, `down`, `go`, `left`, `up`, `right`, and `stop`:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "70IBxSKxA1N9"
},
"outputs": [],
"source": [
"commands = np.array(['right', 'go', 'no', 'left', 'stop', 'up', 'down', 'yes'])"
]
},
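{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, we can count the clips per command. This is a minimal sketch, assuming each command has its own subfolder of `.wav` files under the extracted dataset directory:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: count the audio clips in each command folder.\n",
"for command in commands:\n",
"    n_clips = len(tf.io.gfile.glob(str(data_dir) + '/mini_speech_commands/' + command + '/*.wav'))\n",
"    print(command, n_clips)"
]
},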
{
"cell_type": "markdown",
"metadata": {
"id": "aMvdU9SY8WXN"
},
"source": [
"Extract the audio clips into a list called `filenames`, shuffle it, and take 10 files to add poison:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"id": "hlX685l1wD9k"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2023-03-29 11:29:15.788159: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA\n",
"To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n"
]
}
],
"source": [
"filenames = tf.io.gfile.glob(str(data_dir)+'/mini_speech_commands' + '/*/*')\n",
"filenames = tf.random.shuffle(filenames).numpy()\n",
"example_files = filenames[:20] "
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "e6bb8defd2ef"
},
"source": [
"Now, let's define a function that preprocesses the dataset's raw WAV audio files into audio tensors. Audio clips are sampled at 16kHz, and are less than or equal to 1 second. If an audio clip is smaller than 1 second, then we zero pad the data."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"id": "9PjJ2iXYwftD"
},
"outputs": [],
"source": [
"def get_audio_clips_and_labels(file_paths):\n",
" audio_samples = []\n",
" audio_labels = []\n",
" for file_path in file_paths: \n",
" audio, _ = librosa.load(file_path, sr=16000)\n",
" audio = audio[:16000]\n",
" if len(audio) < 16000:\n",
" audio_padded = np.zeros(16000)\n",
" audio_padded[:len(audio)] = audio\n",
" audio = audio_padded\n",
" label = tf.strings.split(\n",
" input=file_path,\n",
" sep=os.path.sep)[-2]\n",
" \n",
" audio_samples.append(audio)\n",
" audio_labels.append(label.numpy().decode(\"utf-8\") )\n",
" return np.stack(audio_samples), np.stack(audio_labels)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's use the above function to convert audio clips to numpy arrays, and *display* a few of them. "
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"id": "HNv4xwYkB2P6"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Label: left\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Label: up\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Label: yes\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"x_audio, y_audio = get_audio_clips_and_labels(example_files)\n",
"for i in range(3):\n",
" print('Label:', y_audio[i])\n",
" display.display(display.Audio(x_audio[i], rate=16000))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Insert Backdoors"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tone signal as trigger"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"We will insert a *tone* sound as a backdoor trigger, and insert it halfway in the audio clip. Let's use `down` as a target label.\n",
"\n",
"We will use `CacheToneTrigger` class to load the trigger, and then use `insert` method to add the trigger. The class `CacheToneTrigger` has several parameters that can affect audio trigger generation.\n",
"- `sampling_rate`: This is the sampling rate of the audio clip(s) in which trigger will be inserted\n",
"- `freqency`: determines the frequecy of the *tone* that is inserted as trigger\n",
"- `duration`: determines the duration of the trigger signal (in seconds)\n",
"- `random`: if this frag is set to `True`, then the trigger will be inserted in a random position for each audio clip\n",
"- `shift`: determines the offset (in number of samples) at which trigger is inserted\n",
"- `scale`: is the scaling factor when adding the trigger signal\n",
"By default, this class loads a tone of fequency 440Hz with 0.1 second duration with 0.1 scale."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"def poison_loader_tone():\n",
" trigger = CacheToneTrigger(\n",
" sampling_rate=16000,\n",
" frequency=440,\n",
" duration=0.1,\n",
" shift = 8000,\n",
" scale = 0.25\n",
" )\n",
"\n",
" def poison_func(x_audio):\n",
" return trigger.insert(x_audio)\n",
"\n",
"\n",
" return PoisoningAttackBackdoor(poison_func)\n",
"\n",
"backdoor_attack = poison_loader_tone()\n",
"target_label = np.array('down')\n",
"target_label = np.expand_dims(target_label, axis=0)\n",
"poisoned_x, poisoned_y = backdoor_attack.poison(x_audio, target_label, broadcast=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's hear how a few of the triggered audio clips sound."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Clean Audio Clip:\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Clean Label: left\n",
"Backdoor Audio Clip:\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Backdoor Label: ['down']\n",
"-------------\n",
"\n"
]
}
],
"source": [
"for i in range(1):\n",
" print('Clean Audio Clip:')\n",
" display.display(display.Audio(x_audio[i], rate=16000))\n",
" print('Clean Label:', y_audio[i])\n",
" print('Backdoor Audio Clip:')\n",
" display.display(display.Audio(poisoned_x[i], rate=16000))\n",
" print('Backdoor Label:', poisoned_y[i])\n",
" print('-------------\\n')"
]
},
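{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also inspect the waveforms directly: with `shift=8000` at a 16 kHz sampling rate, the 0.1-second tone should appear at the 0.5-second mark. A minimal plotting sketch:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: plot the clean and poisoned waveforms to see where the tone lands.\n",
"fig, axes = plt.subplots(2, 1, figsize=(10, 4), sharex=True)\n",
"axes[0].plot(x_audio[0])\n",
"axes[0].set_title('Clean waveform')\n",
"axes[1].plot(poisoned_x[0])\n",
"axes[1].set_title('Poisoned waveform (tone at sample offset 8000)')\n",
"plt.tight_layout()\n",
"plt.show()"
]
},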
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Cough sound as trigger"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"We will insert *cough* sound as a backdoor trigger. Let's use `stop` as a target label.\n",
"\n",
"We will use `CacheAudioTrigger` classclass to load the trigger, and then use `insert` method to add the trigger. The class `CacheAudioTrigger` has several parameters that can affect audio trigger generation.\n",
"- `sampling_rate`: this is the sampling rate of the audio clip(s) in which trigger will be inserted\n",
"- `backdoor_path`: is the path to the audio clip that will be inserted as a trigger\n",
"- `duration`: determines the duration of the trigger signal in seconds (if `None`, then full clip will be inserted)\n",
"- `random`: if this frag is set to `True`, then the trigger will be inserted in a random position for each audio clip\n",
"- `shift`: determines the offset (in number of samples) at which trigger is inserted\n",
"- `scale`: is the scaling factor when adding the trigger signal\n",
"By default, this function adds a cough sound with 1 second duration without any offset/shift."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"def poison_loader_audio():\n",
" trigger = CacheAudioTrigger(\n",
" sampling_rate=16000,\n",
" backdoor_path = '../utils/data/backdoors/cough_trigger.wav',\n",
" scale = 0.1\n",
" )\n",
"\n",
" def poison_func(x_audio):\n",
" return trigger.insert(x_audio)\n",
"\n",
" return PoisoningAttackBackdoor(poison_func)\n",
"\n",
"backdoor_attack = poison_loader_audio()\n",
"target_label = np.array('stop')\n",
"target_label = np.expand_dims(target_label, axis=0)\n",
"poisoned_x, poisoned_y = backdoor_attack.poison(x_audio, target_label, broadcast=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's hear how a few of the triggered audio clips sound."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Clean Audio Clip:\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Clean Label: left\n",
"Backdoor Audio Clip:\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Backdoor Label: ['stop']\n",
"-------------\n",
"\n",
"Clean Audio Clip:\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Clean Label: up\n",
"Backdoor Audio Clip:\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Backdoor Label: ['stop']\n",
"-------------\n",
"\n",
"Clean Audio Clip:\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Clean Label: yes\n",
"Backdoor Audio Clip:\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Backdoor Label: ['stop']\n",
"-------------\n",
"\n"
]
}
],
"source": [
"for i in range(3):\n",
" print('Clean Audio Clip:')\n",
" display.display(display.Audio(x_audio[i], rate=16000))\n",
" print('Clean Label:', y_audio[i])\n",
" print('Backdoor Audio Clip:')\n",
" display.display(display.Audio(poisoned_x[i], rate=16000))\n",
" print('Backdoor Label:', poisoned_y[i])\n",
" print('-------------\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Poison a model with backdoor triggers\n",
"Now, let's train a model on backdoor data. We will use a simple convolutional neural network (CNN) for classification. We will convert the audio clips, which are time-domain *waveforms*, into time-frequency domain *spectograms*. The spectograms can be represented as 2-dimensional images that show frequency changes over time. We will use the spectrogram images to train a CNN. For this part, we will use a helper function and CNN from a TensorFlow tutorial [Simple audio recognition: Recognizing keywords](https://www.tensorflow.org/tutorials/audio/simple_audio)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Helper function to convert waveforms into spectograms."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"def get_spectrogram(audio):\n",
" waveform = tf.convert_to_tensor(audio, dtype=tf.float32)\n",
" spectrogram = tf.signal.stft(\n",
" waveform, frame_length=255, frame_step=128)\n",
" spectrogram = tf.abs(spectrogram)\n",
" # Add a `channels` dimension, so that the spectrogram can be used\n",
" # as image-like input data with convolution layers (which expect\n",
" # shape (`batch_size`, `height`, `width`, `channels`).\n",
" spectrogram = spectrogram[..., tf.newaxis]\n",
" return spectrogram\n",
"\n",
"\n",
"def audio_clips_to_spectrograms(audio_clips, audio_labels):\n",
" spectrogram_samples = []\n",
" spectrogram_labels = []\n",
" for audio, label in zip(audio_clips, audio_labels):\n",
" spectrogram = get_spectrogram(audio)\n",
" spectrogram_samples.append(spectrogram)\n",
"# print(label.shape)\n",
" label_id = np.argmax(label == commands)\n",
" spectrogram_labels.append(label_id)\n",
" return np.stack(spectrogram_samples), np.stack(spectrogram_labels)"
]
},
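{
"cell_type": "markdown",
"metadata": {},
"source": [
"To verify the conversion, we can visualize one clip's log-scaled spectrogram; a minimal sketch using the helper above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: visualize the log-scaled spectrogram of one example clip.\n",
"spec = get_spectrogram(x_audio[0]).numpy().squeeze()\n",
"log_spec = np.log(spec.T + np.finfo(float).eps)  # transpose puts time on the x-axis\n",
"plt.figure(figsize=(8, 4))\n",
"plt.imshow(log_spec, aspect='auto', origin='lower')\n",
"plt.title('Label: ' + y_audio[0])\n",
"plt.xlabel('Time frame')\n",
"plt.ylabel('Frequency bin')\n",
"plt.show()"
]
},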
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Build Train and Test Datasets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Split data into training and test sets using a 80:20 ratio, respectively."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training set size 6400\n",
"Test set size 1600\n"
]
}
],
"source": [
"train_files = filenames[:6400]\n",
"test_files = filenames[-1600:]\n",
"\n",
"print('Training set size', len(train_files))\n",
"print('Test set size', len(test_files))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Get audio clips and labels from filenames."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"x_train_audio, y_train_audio = get_audio_clips_and_labels(train_files)\n",
"x_test_audio, y_test_audio = get_audio_clips_and_labels(test_files)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Generate spectrogram images and label ids for training and test sets."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"x_train, y_train = audio_clips_to_spectrograms(x_train_audio, y_train_audio)\n",
"x_test, y_test = audio_clips_to_spectrograms(x_test_audio, y_test_audio)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train a Convolutional Neural Network"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Define model architecture"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Model: \"sequential\"\n",
"_________________________________________________________________\n",
" Layer (type) Output Shape Param # \n",
"=================================================================\n",
" resizing (Resizing) (None, 32, 32, 1) 0 \n",
" \n",
" normalization (Normalizatio (None, 32, 32, 1) 3 \n",
" n) \n",
" \n",
" conv2d (Conv2D) (None, 30, 30, 32) 320 \n",
" \n",
" conv2d_1 (Conv2D) (None, 28, 28, 64) 18496 \n",
" \n",
" max_pooling2d (MaxPooling2D (None, 14, 14, 64) 0 \n",
" ) \n",
" \n",
" dropout (Dropout) (None, 14, 14, 64) 0 \n",
" \n",
" flatten (Flatten) (None, 12544) 0 \n",
" \n",
" dense (Dense) (None, 128) 1605760 \n",
" \n",
" dropout_1 (Dropout) (None, 128) 0 \n",
" \n",
" dense_1 (Dense) (None, 8) 1032 \n",
" \n",
"=================================================================\n",
"Total params: 1,625,611\n",
"Trainable params: 1,625,608\n",
"Non-trainable params: 3\n",
"_________________________________________________________________\n"
]
}
],
"source": [
"from tensorflow.keras import layers\n",
"from tensorflow.keras import models\n",
"\n",
"norm_layer = layers.Normalization()\n",
"input_shape = (124, 129, 1)\n",
"num_labels = 8\n",
"model = models.Sequential([\n",
" layers.Input(shape=input_shape),\n",
" # Downsample the input.\n",
" layers.Resizing(32, 32),\n",
" # Normalize.\n",
" norm_layer,\n",
" layers.Conv2D(32, 3, activation='relu'),\n",
" layers.Conv2D(64, 3, activation='relu'),\n",
" layers.MaxPooling2D(),\n",
" layers.Dropout(0.25),\n",
" layers.Flatten(),\n",
" layers.Dense(128, activation='relu'),\n",
" layers.Dropout(0.5),\n",
" layers.Dense(num_labels),\n",
"])\n",
"\n",
"model.summary()\n",
"\n",
"loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)\n",
"optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)\n",
"\n",
"classifier = TensorFlowV2Classifier(model=model,\n",
" loss_object=loss_object,\n",
" optimizer=optimizer,\n",
" input_shape=(124, 129, 1),\n",
" nb_classes=8)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Train the classifier using the `fit` method."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"classifier.fit(x=x_train, y=y_train, batch_size=64, nb_epochs=15)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Compute test accuracy."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy on benign test examples: 86.1875%\n"
]
}
],
"source": [
"predictions = np.argmax(classifier.predict(x_test), axis=1)\n",
"accuracy = np.sum(predictions == y_test) / len(y_test)\n",
"print(\"Accuracy on benign test examples: {}%\".format(accuracy * 100))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Train a CNN on backdoored data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Insert backdoor trigger in 25% examples. First, initialize the backdoor attack class."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"def poison_loader_audio():\n",
" trigger = CacheAudioTrigger(\n",
" sampling_rate=16000,\n",
" backdoor_path = '../utils/data/backdoors/cough_trigger.wav',\n",
" scale = 0.5\n",
" )\n",
"\n",
" def poison_func(x_audio):\n",
" return trigger.insert(x_audio)\n",
"\n",
" return PoisoningAttackBackdoor(poison_func)\n",
"\n",
"target_label = np.array('stop')\n",
"target_label = np.expand_dims(target_label, axis=0)\n",
"bd_attack = poison_loader_audio()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Poison 25% of samples in training and test sets"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"x_train_audio_bd, y_train_audio_bd = bd_attack.poison(x_train_audio[:1600], target_label, broadcast=True)\n",
"x_train_bd, y_train_bd = audio_clips_to_spectrograms(x_train_audio_bd, y_train_audio_bd)\n",
"\n",
"x_test_audio_bd, y_test_audio_bd = bd_attack.poison(x_test_audio[:400], target_label, broadcast=True)\n",
"x_test_bd, y_test_bd = audio_clips_to_spectrograms(x_test_audio_bd, y_test_audio_bd)"
]
},
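{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before mixing, it is worth checking that the poisoned labels all map to the target class; a quick sketch:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: every poisoned label id should equal the 'stop' class id.\n",
"stop_id = np.argmax(commands == 'stop')\n",
"print('Target class id:', stop_id)\n",
"print('All poisoned training labels are the target:', np.all(y_train_bd == stop_id))\n",
"print('All poisoned test labels are the target:', np.all(y_test_bd == stop_id))"
]
},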
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Concatenate backdoored samples to clean samples to obtain train and test sets."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"x_train (6400, 124, 129, 1)\n",
"y_train (6400,)\n",
"x_test (1600, 124, 129, 1)\n",
"y_test (1600,)\n"
]
}
],
"source": [
"x_train_mix = np.concatenate([x_train_bd, x_train[1600:]])\n",
"y_train_mix = np.concatenate([y_train_bd, y_train[1600:]])\n",
"print('x_train', x_train_mix.shape)\n",
"print('y_train', y_train_mix.shape)\n",
"\n",
"x_test_mix = np.concatenate([x_test_bd, x_test[400:]])\n",
"y_test_mix = np.concatenate([y_test_bd, y_test[400:]])\n",
"print('x_test', x_test_mix.shape)\n",
"print('y_test', y_test_mix.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Train the classifier on poisoned data, and compute the accuracy."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"model_bd = tf.keras.models.clone_model(model)\n",
"\n",
"model_bd.compile(\n",
" optimizer=tf.keras.optimizers.Adam(),\n",
" loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n",
" metrics=['accuracy'],\n",
")\n",
"\n",
"classifier_bd = TensorFlowV2Classifier(model=model_bd, \n",
" loss_object=loss_object, \n",
" optimizer=optimizer,\n",
" input_shape=(124, 129, 1), \n",
" nb_classes=8)\n",
"\n",
"classifier_bd.fit(x=x_train_mix, y=y_train_mix, batch_size=64, nb_epochs=15)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy on poisoned test examples: 99.0%\n"
]
}
],
"source": [
"predictions = np.argmax(classifier_bd.predict(x_test_bd), axis=1)\n",
"accuracy = np.sum(predictions == y_test_bd) / len(y_test_bd)\n",
"print(\"Accuracy on poisoned test examples: {}%\".format(accuracy * 100))"
]
},
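{
"cell_type": "markdown",
"metadata": {},
"source": [
"A successful backdoor should leave behavior on clean inputs largely intact. As a sanity check, we can evaluate the backdoored model on the clean portion of the test set; a minimal sketch:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: evaluate the backdoored model on the clean test examples.\n",
"predictions_clean = np.argmax(classifier_bd.predict(x_test[400:]), axis=1)\n",
"accuracy_clean = np.sum(predictions_clean == y_test[400:]) / len(y_test[400:])\n",
"print(\"Accuracy of backdoored model on clean test examples: {}%\".format(accuracy_clean * 100))"
]
},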
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Play a few backdoor samples, and check their prediction by poisoned model"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Clean Audio Sample\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Prediction on clean sample: down\n",
"Triggered Audio Sample\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Prediction on trigger sample: stop\n",
"Clean Audio Sample\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Prediction on clean sample: up\n",
"Triggered Audio Sample\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Prediction on trigger sample: stop\n",
"Clean Audio Sample\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Prediction on clean sample: yes\n",
"Triggered Audio Sample\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Prediction on trigger sample: stop\n"
]
}
],
"source": [
"for i in range(3):\n",
" print('Clean Audio Sample')\n",
" display.display(display.Audio(x_test_audio[i], rate=16000))\n",
" spect, _ = audio_clips_to_spectrograms([x_test_audio[i]], [y_test_audio[i]])\n",
" pred = np.argmax(classifier_bd.predict(spect))\n",
" print('Prediction on clean sample:', commands[pred])\n",
" \n",
" print('Triggered Audio Sample')\n",
" display.display(display.Audio(x_test_audio_bd[i], rate=16000))\n",
" spect_bd, _ = audio_clips_to_spectrograms([x_test_audio_bd[i]], [y_test_audio_bd[i]])\n",
" pred_bd = np.argmax(classifier_bd.predict(spect_bd))\n",
" print('Prediction on trigger sample:',commands[pred_bd])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "ASR-Poison1",
"language": "python",
"name": "asr-poison1"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
},
"vscode": {
"interpreter": {
"hash": "f861968b624c68d179189369f7a9f28e94c4ecf44a8bc1b854818511cb02e777"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}