# Part 4: More advanced networks

__Before starting, we recommend you enable GPU acceleration if you're running on Colab.__

```python
# Execute this code block to install dependencies when running on Colab
try:
    import torch
except ImportError:
    # Colab normally ships with PyTorch preinstalled; this is only a fallback
    !pip install -q torch torchvision

try:
    import torchbearer
except ImportError:
    !pip install -q torchbearer
```

Recent network models, such as the deep residual network (ResNet) and GoogLeNet architectures, do not follow a straight path from input to output. Instead, these models incorporate branches and merges to create a computation graph. Branching and merging are easy to implement in PyTorch, as shown in the following code snippet:

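```python
import torch
import torch.nn.functional as F
from torch import nn

class BranchModel(nn.Module):
    def __init__(self):
        super(BranchModel, self).__init__()
        # Two parallel branches applied to the same single-channel input
        self.left = nn.Conv2d(1, 16, (1, 1), padding=0)   # 1x1 convolution
        self.right = nn.Conv2d(1, 16, (5, 5), padding=2)  # 5x5 convolution; padding=2 keeps the spatial size
        self.fc1 = nn.Linear(16*14*14, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        # Left branch
        out_l = self.left(x)
        out_l = F.relu(out_l)

        # Right branch
        out_r = self.right(x)
        out_r = F.relu(out_r)

        # Merge the two branches by element-wise summation
        out = out_l + out_r

        out = F.max_pool2d(out, (2,2))
        # training=self.training disables dropout when the model is in eval mode
        out = F.dropout(out, 0.2, training=self.training)
        out = out.view(out.shape[0], -1)
        out = self.fc1(out)
        out = F.relu(out)
        out = self.fc2(out)

        return out
```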

This defines a variant of our initial simple CNN model in which the input is split into two paths and then merged again: the left-hand path consists of a 1x1 convolutional layer, whilst the right-hand path has a 5x5 convolutional layer. The 1x1 convolution increases the number of bands (channels) from 1 to 16; because there is only one input band, each output band is a (potentially different) scalar multiple of the input plus a learned bias. Padding of 2 pixels on the 5x5 convolution ensures the feature maps have the same shape on the left and right branches. In this case the two branches are merged by summing their ReLU outputs element-wise, band by band.
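The shape bookkeeping and the "scalar multiple plus bias" claim can be checked directly. The sketch below assumes MNIST-sized 28x28 single-channel inputs (a random stand-in batch, not real data) and rebuilds the two branch layers outside the model. It also contrasts summation with a concatenation merge, the style GoogLeNet's Inception modules use, which doubles the number of bands instead of keeping 16.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
# Hypothetical stand-in batch: four single-channel 28x28 images
x = torch.randn(4, 1, 28, 28)

left = nn.Conv2d(1, 16, (1, 1), padding=0)   # 1x1 branch
right = nn.Conv2d(1, 16, (5, 5), padding=2)  # 5x5 branch; padding=2 keeps 28x28

out_l = left(x)
out_r = right(x)
print(out_l.shape, out_r.shape)  # torch.Size([4, 16, 28, 28]) torch.Size([4, 16, 28, 28])

# With one input band, band k of the 1x1 output equals weight[k] * x + bias[k]
manual_l = left.weight.view(1, 16, 1, 1) * x + left.bias.view(1, 16, 1, 1)
print(torch.allclose(out_l, manual_l, atol=1e-5))  # True

# Summation keeps 16 bands; after 2x2 max pooling each example has 16*14*14 values,
# which is exactly the input size of fc1
summed = F.relu(out_l) + F.relu(out_r)
pooled = F.max_pool2d(summed, (2, 2))
print(pooled[0].numel())  # 3136

# A concatenation merge stacks the branches along the band dimension instead,
# so fc1 would need 32*14*14 inputs
concat = torch.cat([out_l, out_r], dim=1)
print(concat.shape)  # torch.Size([4, 32, 28, 28])
```

Summation requires both branches to produce identical shapes, which is why the padding matters; concatenation only requires matching spatial sizes but changes the number of bands downstream layers must accept.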

__Use the code block below to train and evaluate the above model.__

```python
# YOUR CODE HERE
raise NotImplementedError()
```

## Going further

None of the network topologies we have experimented with so far have been optimised, nor are they reproductions of network topologies from recent papers.

__There is a lot of opportunity for you to tune and improve upon these models. What is the lowest error rate you can achieve?__
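To compare models on a common footing, it helps to compute the error rate (1 minus accuracy) the same way each time. The helper below is a minimal sketch in plain PyTorch; `error_rate` is a hypothetical name, not part of torchbearer, and the usage lines run it on random stand-in data and a toy linear model purely to show the call, so the number they print is meaningless. In practice you would pass your trained model and the real test-set `DataLoader`.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def error_rate(model: nn.Module, loader: DataLoader, device: str = "cpu") -> float:
    """Hypothetical helper: fraction of misclassified examples over every batch in `loader`."""
    model.eval()  # switches off dropout, provided the model respects self.training
    wrong, total = 0, 0
    with torch.no_grad():
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            predictions = model(inputs).argmax(dim=1)  # highest-scoring class per example
            wrong += (predictions != targets).sum().item()
            total += targets.numel()
    return wrong / total


# Usage with random stand-in data (substitute the real test loader and trained model)
torch.manual_seed(0)
fake_data = TensorDataset(torch.randn(64, 1, 28, 28), torch.randint(0, 10, (64,)))
fake_loader = DataLoader(fake_data, batch_size=16)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
print(f"error rate: {error_rate(toy_model, fake_loader):.2%}")
```

Calling `model.eval()` inside the helper matters for models such as `BranchModel` that use dropout; remember to call `model.train()` again before resuming training.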
