# Layerwise learning for quantum neural networks

Notebook created by Felipe Oyarce, felipe.oyarce94@gmail.com

In this project we implement a strategy presented by [Skolik et al., 2020](https://arxiv.org/abs/2006.14904) (see the [implementation](https://github.com/tensorflow/quantum/blob/research/layerwise_learning/layerwise_learning.ipynb) in TensorFlow Quantum) for effectively training quantum neural networks. In layerwise learning, the number of parameters is increased gradually: a few layers are added and trained while the parameters of the previously trained layers are kept frozen.
An easy way to understand this technique is to think of it as dividing the problem into smaller circuits, which helps to avoid falling into [barren plateaus](https://arxiv.org/abs/1803.11173). Here, we provide a proof-of-concept implementation of this technique using PennyLane's PyTorch interface.
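The schedule described above can be sketched in plain Python. This is only an illustration of which layers are frozen and which are trainable at each step, not the training code of this notebook. `layerwise_schedule` is a hypothetical helper, and its default values mirror the `n_layer_steps` and `n_layers_to_add` parameters defined in the Parameters cell.

```python
def layerwise_schedule(n_layer_steps=3, n_layers_to_add=2):
    """Return, for each step, the (frozen, trainable) layer indices."""
    schedule = []
    for step in range(n_layer_steps):
        first_new = step * n_layers_to_add
        # Layers trained in earlier steps are kept fixed.
        frozen = list(range(first_new))
        # Only the newly added layers receive gradient updates.
        trainable = list(range(first_new, first_new + n_layers_to_add))
        schedule.append((frozen, trainable))
    return schedule


for step, (frozen, trainable) in enumerate(layerwise_schedule(), start=1):
    print(f"step {step}: frozen={frozen}, trainable={trainable}")
```

With the default values, the last step trains layers 4 and 5 while layers 0 to 3 stay frozen.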

The task selected for this _proof-of-concept_ is the same one used in the original paper: binary classification between the handwritten digits _3_ and _6_ in the MNIST dataset.

## PennyLane-PyTorch implementation on the MNIST dataset

In [1]:
import random
import matplotlib.pyplot as plt

# Pennylane
import pennylane as qml
from pennylane import numpy as np

# Pytorch
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data.sampler import SubsetRandomSampler

### Parameters

In [2]:
n_qubits = 9
n_layer_steps = 3
n_layers_to_add = 2
batch_size = 128
epochs = 5

We configure PyTorch to use CUDA only if available. Otherwise the CPU is used.

In [3]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")



We initialize a PennyLane device with a `lightning.qubit` [backend](https://pennylane-lightning.readthedocs.io/en/latest/devices.html). 

In [4]:
dev = qml.device("lightning.qubit", wires=n_qubits)

### Data pre-processing

In `data_transforms`, we compose several transformations that shrink the images and flatten them into a vector, while keeping enough information for a quantum neural network to "learn" the difference between the digits. Feel free to explore other representations of the data, such as learned embeddings or dimensionality reduction approaches.

#### Transformations
- [CenterCrop](https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.CenterCrop): Crops the given image at the center.
- [Resize](https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.Resize): Resize the input image to the given size.
- [ToTensor](https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.ToTensor): Converts the images with values in the range [0, 255] to tensors with values in the range [0,1].
- Flatten: [Lambda](https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.Lambda) that applies a lambda function to flatten the image into a vector.
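To make the shape bookkeeping concrete, here is a minimal pure-Python sketch of the crop and flatten steps on a nested list. `center_crop` and `flatten` are hypothetical stand-ins written for illustration, not the torchvision implementations, and the interpolating `Resize` step is deliberately not reproduced.

```python
def center_crop(image, size):
    """Crop a size x size window from the center of a 2D nested list."""
    height, width = len(image), len(image[0])
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def flatten(image):
    """Flatten a 2D nested list into a 1D list, row by row."""
    return [pixel for row in image for pixel in row]


# A dummy 28x28 "image" whose pixel value encodes its position.
image = [[r * 28 + c for c in range(28)] for r in range(28)]

cropped = center_crop(image, 18)
print(len(cropped), len(cropped[0]))  # 18 18

# After resizing to 3x3, flattening yields 9 features, one per qubit.
resized_shape = (3, 3)
n_features = resized_shape[0] * resized_shape[1]
print(n_features)  # 9
```

The 9 features produced by the pipeline match `n_qubits = 9`, so each qubit encodes one pixel of the downsampled image.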

In [5]:
data_transforms = transforms.Compose([
    transforms.CenterCrop(18),                      # crop the image to an 18x18 image
    transforms.Resize(3),                           # resize to a 3x3 image
    transforms.ToTensor(),                          # convert to tensor
    transforms.Lambda(lambda x: torch.flatten(x)),  # flatten the image into a vector
])

In [6]:
# Download the MNIST dataset and apply the composition of transformations.
train_set = datasets.MNIST(root='./data', train=True, download=True, transform=data_transforms)
test_set = datasets.MNIST(root='./data', train=False, download=True, transform=data_transforms)

# Change labels of digits '3' and '6' to be 0 and 1, respectively.
# Note that first we must change the labels of the digits '0' and '1'
train_set.targets[train_set.targets == 1] = 10
train_set.targets[train_set.targets == 0] = 10
train_set.targets[train_set.targets == 3] = 0
train_set.targets[train_set.targets == 6] = 1

test_set.targets[test_set.targets == 1] = 10
test_set.targets[test_set.targets == 0] = 10
test_set.targets[test_set.targets == 3] = 0
test_set.targets[test_set.targets == 6] = 1

# Filter to just images of '3's and '6's
subset_indices_train = ((train_set.targets == 0) + (train_set.targets == 1)).nonzero().view(-1)
subset_indices_test = ((test_set.targets == 0) + (test_set.targets == 1)).nonzero().view(-1)

print(len(subset_indices_test))

# Select just a subset of the training set. 
# Increase the number of examples for more accurate results
NUM_EXAMPLES = 1000
subset_indices_train = subset_indices_train[:NUM_EXAMPLES]

# DataLoaders
train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=False,
 sampler=SubsetRandomSampler(subset_indices_train))
test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size, shuffle=False,
 sampler=SubsetRandomSampler(subset_indices_test))

1968
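The order of the label replacements in the cell above matters. The sketch below illustrates why on a plain Python list. `remap` is a hypothetical helper that mimics the sequence of masked assignments and is not part of the notebook.

```python
def remap(labels, replacements):
    """Apply (old, new) replacements one after another."""
    labels = list(labels)
    for old, new in replacements:
        labels = [new if label == old else label for label in labels]
    return labels


digits = [0, 1, 3, 6, 7]

# Order used in the notebook: move the original 0s and 1s out of the way first.
safe = remap(digits, [(1, 10), (0, 10), (3, 0), (6, 1)])
print(safe)   # [10, 10, 0, 1, 7]

# Without that first step, original 0s and 1s collide with the new labels.
naive = remap(digits, [(3, 0), (6, 1)])
print(naive)  # [0, 1, 0, 1, 7]
```

In the naive version, images of the digits 0 and 1 would pass the filter that selects labels 0 and 1 and contaminate the dataset of 3s and 6s.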




### Data distribution

In [23]:
k = 0
for x, y in train_loader:
    for i in range(y.shape[0]):
        if y[i].item() == 0:
            k += 1
print(f"{k} images of digit '3'.")
print(f"{NUM_EXAMPLES - k} images of digit '6'.")

495 images of digit '3'.
505 images of digit '6'.


### Utility functions

In [8]:
def set_random_gates(n_qubits):
    """Utility function for creating a list
    of random gates chosen from gate_set.

    The returned list has a length of n_qubits.

    Arguments:
        n_qubits (int): Integer number indicating
            the number of qubits of the quantum
            circuit.

    Returns:
        chosen_gates (list): List of length equal
            to n_qubits containing RX, RY and RZ
            rotations randomly chosen.
    """

    gate_set = [qml.RX, qml.RY, qml.RZ]
    chosen_gates = []
    for i in range(n_qubits):
        chosen_gate = random.choice(gate_set)
        chosen_gates.append(chosen_gate)
    return chosen_gates

def total_elements(array_list):
    """Utility function that returns the total number
    of elements in a list of lists.

    Arguments:
        array_list (list[list]): List of lists.

    Returns:
        (int): Total number of elements in array_list.
    """

    flattened = [val for sublist in array_list for val in sublist]
    return len(flattened)

### Define lists to update new gates and trained weights

In [9]:
# Lists to update the new gates and trained weights.
layer_gates = []
layer_weights = []

## Phase I: Increasing the circuit depth

In [10]:
def apply_layer(gates, weights):
    """Function to apply a layer composed of
    RX, RY and RZ rotations to each qubit in the circuit
    (just one gate per qubit, randomly chosen) with
    their respective parameters. Then, apply CZ gates
    in a ladder structure.

    Arguments:
        gates: List of single qubit gates to apply in
            the circuit. Length equal to the number
            of qubits of the circuit.

        weights: List of parameters to apply in each
            gate from gates. Length equal to the
            number of qubits of the circuit.

    Returns:
        None
    """

    # Apply single qubit gates with their weights.
    for i in range(n_qubits):
        gates[i](weights[i], wires = i)

    # Apply CZ gates to each pair of qubits in ladder structure.
    for i in range(n_qubits-1):
        qml.CZ(wires=[i, i+1])

# Function for the non-trainable part of the quantum circuit
def apply_frozen_layers(frozen_layer_gates, frozen_layer_weights):
    """Function that applies multiple layers to the quantum
    circuit. The main purpose of this function is to apply
    the layers already trained during Phase I of
    layerwise learning.

    Arguments:
        frozen_layer_gates: List of lists containing the qubit
            rotations per layer to apply to the circuit.
            List of "shape" (number layers, number qubits).

        frozen_layer_weights: List of lists containing the
            parameters (angles) of each rotation in
            frozen_layer_gates. List of "shape" (number layers, number qubits).

    Returns:
        None
    """

    for i in range(len(frozen_layer_gates)):
        apply_layer(frozen_layer_gates[i], frozen_layer_weights[i])

@qml.qnode(dev, interface="torch")
def quantum_net(inputs, new_weights):
    """Quantum network to train during Phase I of
    layerwise learning. The data inputs are encoded
    using an Angle Embedding with X rotations. Then,
    we apply the non-trainable (frozen) layers
    using the two lists called layer_gates and layer_weights,
    which store the randomly selected single qubit rotations
    and their weights trained in previous steps of layerwise
    learning. Finally, n_layers_to_add is an integer that
    indicates the number of trainable layers to add in
    each step of Phase I.

    Arguments:
        inputs: Tensor data.
        new_weights: New parameters to be trained, of shape
            (n_layers_to_add, n_qubits).

    Returns:
        (float): Expectation value of a Z measurement on the
            last qubit of the circuit.
    """

    # Encode the data with Angle Embedding
    qml.templates.AngleEmbedding(inputs, wires=range(n_qubits))

    # Apply frozen layers
    apply_frozen_layers(layer_gates, layer_weights)

    # Apply layers with trainable parameters
    for i in range(n_layers_to_add):
        apply_layer(new_gates[i], new_weights[i])

    # Expectation value of the last qubit
    return qml.expval(qml.PauliZ(n_qubits-1))
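To build intuition for the readout, the following single-qubit toy calculation shows how an encoded feature and a trainable angle combine in the measured expectation value. It is a simplified sketch in plain Python, assuming one qubit, an RX encoding followed by a single trainable RX gate, and no entangling gates. The actual circuit above uses 9 qubits, randomly chosen RX/RY/RZ rotations and CZ ladders. The helpers `rx`, `apply_gate` and `expval_z` are hypothetical and written only for this illustration.

```python
import math


def rx(theta):
    """2x2 matrix of a rotation by theta around the X axis."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]


def apply_gate(gate, state):
    """Multiply a 2x2 gate with a single-qubit state vector."""
    return [
        gate[0][0] * state[0] + gate[0][1] * state[1],
        gate[1][0] * state[0] + gate[1][1] * state[1],
    ]


def expval_z(state):
    """Expectation value of Pauli-Z: P(0) - P(1)."""
    return abs(state[0]) ** 2 - abs(state[1]) ** 2


x, theta = 0.3, 0.5              # feature value and trainable angle
state = [1 + 0j, 0j]             # start in |0>
state = apply_gate(rx(x), state)      # data encoding
state = apply_gate(rx(theta), state)  # trainable rotation

# For this toy circuit the readout is cos(x + theta).
print(expval_z(state), math.cos(x + theta))
```

Both printed numbers agree, and the readout always lies in the interval [-1, 1].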

In [11]:
# Sigmoid function and Binary Cross Entropy loss
sigmoid = nn.Sigmoid()
loss = nn.BCELoss()

for step in range(n_layer_steps):

    print(f"Phase I step: {step+1}")

    # Obtain random gates for each new layer.
    new_gates = [set_random_gates(n_qubits) for i in range(n_layers_to_add)]

    # Define shape of the weights
    weight_shapes = {"new_weights": (n_layers_to_add, n_qubits)}

    # Quantum net as a TorchLayer
    qlayer = qml.qnn.TorchLayer(quantum_net, weight_shapes, init_method = nn.init.zeros_)

    # Create Sequential Model
    model = torch.nn.Sequential(qlayer, sigmoid)

    # Optimizer
    opt = optim.Adam(model.parameters(), lr=0.01)

    batches = NUM_EXAMPLES // batch_size
    for epoch in range(epochs):
        running_loss = 0
        for x, y in train_loader:
            opt.zero_grad()
            y = y.to(torch.float32)
            loss_evaluated = loss(model(x), y)
            loss_evaluated.backward()
            running_loss += loss_evaluated

            opt.step()
        avg_loss = running_loss / batches
        print("Average loss over epoch {}: {:.4f}".format(epoch + 1, avg_loss))

    # Extract the weights after optimization to be saved in layer_weights
    for param in model.parameters():
        new_weights = param.data
    new_weights = new_weights.tolist()
    print(f"Trained parameters: {total_elements(new_weights)}")

    layer_gates += new_gates
    layer_weights += new_weights
    print(f"Layer weights: {total_elements(layer_weights)}")
    print(f"Number of layers: {len(layer_gates)}")
    print("")

Phase I step: 1
Average loss over epoch 1: 0.9071
Average loss over epoch 2: 0.9058
Average loss over epoch 3: 0.9077
Average loss over epoch 4: 0.9057
Average loss over epoch 5: 0.9094
Trained parameters: 18
Layer weights: 18
Number of layers: 2

Phase I step: 2
Average loss over epoch 1: 0.9037
Average loss over epoch 2: 0.8973
Average loss over epoch 3: 0.8875
Average loss over epoch 4: 0.8786
Average loss over epoch 5: 0.8689
Trained parameters: 18
Layer weights: 36
Number of layers: 4

Phase I step: 3
Average loss over epoch 1: 0.8596
Average loss over epoch 2: 0.8481
Average loss over epoch 3: 0.8380
Average loss over epoch 4: 0.8293
Average loss over epoch 5: 0.8183
Trained parameters: 18
Layer weights: 54
Number of layers: 6
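When reading the averages printed above, note that the training loop divides the summed loss by `NUM_EXAMPLES // batch_size`. The short check below compares that divisor with the number of batches the loader yields, assuming the `DataLoader` default `drop_last=False` (the loaders in this notebook do not set it) and a sampler holding exactly `NUM_EXAMPLES` indices.

```python
import math

NUM_EXAMPLES = 1000
batch_size = 128

# Divisor used in the training loop.
batches_floor = NUM_EXAMPLES // batch_size

# Batches yielded per epoch when the last, partial batch is kept.
batches_seen = math.ceil(NUM_EXAMPLES / batch_size)

# Size of that last batch.
last_batch = NUM_EXAMPLES - (batches_seen - 1) * batch_size

print(batches_floor, batches_seen, last_batch)  # 7 8 104
```

Under these assumptions, the loss of 8 batches is divided by 7, so each reported value is the sum of the batch losses scaled by 1/7 rather than a true mean. The downward trend is unaffected. Dividing by `len(train_loader)` would give the mean batch loss.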



## Phase II: Split the circuit into pieces

In [12]:
# Define partition of the circuit to train in each step.
# Here we train the circuit by halves.
partition_percentage = 0.5
partition_size = int(n_layer_steps*n_layers_to_add*partition_percentage)
n_partition_weights = partition_size*n_qubits
n_sweeps = 2
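The arithmetic behind these parameters can be checked with a few lines of plain Python. The sketch repeats the values from the Parameters cell and uses layer indices as stand-ins for the actual gates and weights.

```python
n_qubits = 9
n_layer_steps = 3
n_layers_to_add = 2
partition_percentage = 0.5

# Total number of layers built during Phase I.
n_layers = n_layer_steps * n_layers_to_add

# Index of the layer at which the circuit is split.
partition_size = int(n_layers * partition_percentage)

layers = list(range(n_layers))
first, second = layers[:partition_size], layers[partition_size:]

print(first, second)          # [0, 1, 2] [3, 4, 5]
print(len(first) * n_qubits)  # 27
```

Each partition therefore holds 3 layers of 9 rotations, i.e. 27 trainable parameters per Phase II training run.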

In [None]:
def edit_model_parameters(model, new_parameters):
    """Function for editing the initial parameters
    of a Sequential model in PyTorch so that a given
    tensor becomes the initial parameters of the model.
    This function is useful for Phase II because the
    initial parameters in this phase are the trained
    weights from Phase I.

    Arguments:
        model (torch.nn.Sequential): In this case the Sequential
            model in PyTorch with a TorchLayer from PennyLane.
            Our quantum neural network.

        new_parameters (torch.nn.Parameter): The new parameters
            that we want in the model as initial weights.

    Returns:
        model (torch.nn.Sequential): The model with the new
            model.parameters().
    """

    old_params = {}
    for name, params in model.named_parameters():
        old_params[name] = params.clone()

    old_params["0.partition_weights"] = new_parameters

    for name, params in model.named_parameters():
        params.data.copy_(old_params[name])

    return model

def get_partition(layer_weights, partition, partition_size):
    """Function to get the first or second partition of an
    array given a partition size. This function is useful
    to avoid repeating our code in Phase II.

    Arguments:
        layer_weights: List of lists containing the
            parameters (angles) of each rotation in
            layer_gates. List of "shape" (number layers, number qubits).

        partition (int): In this example it can be 1 or 2 to indicate
            the partition.

        partition_size (int): Integer that tells you the layer at which
            the partition is made.

    Returns:
        Partition of layer_weights, first or second partition.
    """

    if partition == 1:
        return layer_weights[:partition_size]
    if partition == 2:
        return layer_weights[partition_size:]

def save_trained_partition(layer_weights, trained_weights, partition, partition_size):
    """Function to update layer weights after training a partition.

    Arguments:
        layer_weights: List of lists containing the
            parameters (angles) of each rotation in
            layer_gates. List of "shape" (number layers, number qubits).

        trained_weights: Trained weights after training a partition
            of the circuit, could be first or second partition.

        partition (int): In this example it can be 1 or 2 to indicate
            the partition.

        partition_size (int): Integer that tells you the layer at which
            the partition is made.

    Returns:
        None
    """

    if partition == 1:
        layer_weights[:partition_size] = trained_weights
    if partition == 2:
        layer_weights[partition_size:] = trained_weights

In [13]:
@qml.qnode(dev, interface="torch")
def train_partition(inputs, partition_weights):
    """QNode defined to train just one partition of
    the quantum circuit after Phase I. This function
    only supports splitting the circuit into two
    pieces. If partition == 1, the first portion of the
    circuit is treated as trainable; if partition == 2,
    the second portion is trainable.

    Arguments:
        inputs: Tensor data.
        partition_weights: Partition of the weights to be
            trained. Shape (len(partition_weights), n_qubits).

    Returns:
        (float): Expectation value of a Z measurement on the
            last qubit of the circuit.
    """

    # Encode the data with Angle Embedding
    qml.templates.AngleEmbedding(inputs, wires=range(n_qubits))

    if partition == 1:
        # Apply trainable partition first
        for i in range(len(layer_gates[:partition_size])):
            apply_layer(layer_gates[:partition_size][i], partition_weights[i])

        # Apply non-trainable partition
        for i in range(len(layer_gates[partition_size:])):
            apply_layer(layer_gates[partition_size:][i], layer_weights[partition_size:][i])

    elif partition == 2:
        # Apply non-trainable partition first
        for i in range(len(layer_gates[:partition_size])):
            apply_layer(layer_gates[:partition_size][i], layer_weights[:partition_size][i])

        # Apply trainable partition
        for i in range(len(layer_gates[partition_size:])):
            apply_layer(layer_gates[partition_size:][i], partition_weights[i])

    # Expectation value of the last qubit
    return qml.expval(qml.PauliZ(n_qubits-1))

In [14]:
for sweep in range(n_sweeps):

    for partition in [1,2]:
        print(f"Sweep: {sweep+1}, partition: {partition}")
        # Get partition
        trainable_weights = get_partition(layer_weights, partition, partition_size)

        # Define shape of the weights
        weight_shapes = {"partition_weights": (len(trainable_weights), n_qubits)}

        # Quantum net as a TorchLayer
        qlayer = qml.qnn.TorchLayer(train_partition, weight_shapes, init_method = nn.init.zeros_)

        init_weights = nn.Parameter(torch.tensor(trainable_weights))

        # Create Sequential Model
        model = torch.nn.Sequential(qlayer, sigmoid)

        # Edit model initial parameters to be init_weights
        model = edit_model_parameters(model, init_weights)

        # Optimizer
        opt = optim.Adam(model.parameters(), lr=0.01)

        batches = NUM_EXAMPLES // batch_size
        for epoch in range(epochs):
            running_loss = 0
            for x, y in train_loader:
                opt.zero_grad()
                y = y.to(torch.float32)
                loss_evaluated = loss(model(x), y)
                loss_evaluated.backward()
                running_loss += loss_evaluated

                opt.step()
            avg_loss = running_loss / batches
            print("Average loss over epoch {}: {:.4f}".format(epoch + 1, avg_loss))

        for param in model.parameters():
            trained_weights = param.data
        trained_weights = trained_weights.tolist()
        print(f"Trained parameters: {total_elements(trained_weights)}")

        save_trained_partition(layer_weights, trained_weights, partition, partition_size)

Sweep: 1, partition: 1
Average loss over epoch 1: 0.8106
Average loss over epoch 2: 0.8023
Average loss over epoch 3: 0.7974
Average loss over epoch 4: 0.7922
Average loss over epoch 5: 0.7887
Trained parameters: 27
Sweep: 1, partition: 2
Average loss over epoch 1: 0.7857
Average loss over epoch 2: 0.7806
Average loss over epoch 3: 0.7768
Average loss over epoch 4: 0.7734
Average loss over epoch 5: 0.7697
Trained parameters: 27
Sweep: 2, partition: 1
Average loss over epoch 1: 0.7668
Average loss over epoch 2: 0.7653
Average loss over epoch 3: 0.7639
Average loss over epoch 4: 0.7626
Average loss over epoch 5: 0.7618
Trained parameters: 27
Sweep: 2, partition: 2
Average loss over epoch 1: 0.7607
Average loss over epoch 2: 0.7572
Average loss over epoch 3: 0.7538
Average loss over epoch 4: 0.7509
Average loss over epoch 5: 0.7483
Trained parameters: 27


## Results

In [19]:
train_accuracy = 0
for x, y in train_loader:
    probs = model(x)
    preds = (probs>0.5).float()
    train_accuracy += torch.sum(preds == y).item()/preds.shape[0]
print(f"Train accuracy: {train_accuracy/len(train_loader)}")

Train accuracy: 0.7672776442307693
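The thresholding used for the predictions can be related back to the circuit readout. Since the model applies a sigmoid to an expectation value in [-1, 1], its outputs are confined to a limited range. The sketch below works this out in plain Python. `sigmoid` here is a local helper, not the `nn.Sigmoid` module used by the model.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


# The circuit readout <Z> lies in [-1, 1], so the model output lies in:
low, high = sigmoid(-1.0), sigmoid(1.0)
print(f"output range: [{low:.4f}, {high:.4f}]")

# Smallest binary cross entropy reachable for a single example.
min_bce = -math.log(high)
print(f"minimum per-example loss: {min_bce:.4f}")

# Thresholding the output at 0.5 is the same as checking the sign of <Z>.
for expval in (-0.8, -0.1, 0.1, 0.8):
    assert (sigmoid(expval) > 0.5) == (expval > 0)
```

The outputs stay roughly between 0.27 and 0.73, so the binary cross entropy cannot approach zero with this readout even for a perfect classifier, while the accuracy depends only on the sign of the expectation value.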


In [20]:
test_accuracy = 0
for x, y in test_loader:
    probs = model(x)
    preds = (probs>0.5).float()
    test_accuracy += torch.sum(preds == y).item()/preds.shape[0]
print(f"Test accuracy: {test_accuracy/len(test_loader)}")

Test accuracy: 0.7674153645833334


## References

[[1]](https://arxiv.org/abs/1803.11173) McClean et al., 2018. Barren plateaus in quantum neural network training landscapes.

[[2]](https://arxiv.org/abs/2006.14904) Skolik et al., 2020. Layerwise learning for quantum neural networks.

[[3]](https://github.com/tensorflow/quantum/blob/research/layerwise_learning/layerwise_learning.ipynb) Notebook with the implementation in TensorFlow Quantum by the paper's author.

[[4]](https://blog.tensorflow.org/2020/08/layerwise-learning-for-quantum-neural-networks.html) TensorFlow Quantum blog post about layerwise learning.

[[5]](https://www.youtube.com/watch?v=lz8BOz5KPZg) TensorFlow Quantum YouTube video about layerwise learning.