# TensorFlow-Slim

[TensorFlow-Slim](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/slim) is a high-level API for building TensorFlow models. TF-Slim makes defining models in TensorFlow easier, cutting down on the number of lines required to define models and reducing overall clutter. In particular, TF-Slim shines in image domain problems, and weights pre-trained on the [ImageNet dataset](http://www.image-net.org/) for many famous CNN architectures are provided for [download](https://github.com/tensorflow/models/tree/master/research/slim#pre-trained-models).

*Note: Unlike previous notebooks, not every cell here is necessarily meant to run. Some are just for illustration.*

## VGG-16

To show these benefits, this tutorial will focus on [VGG-16](https://arxiv.org/abs/1409.1556). This style of architecture placed 2nd in the classification task of the 2014 ImageNet Large Scale Visual Recognition Challenge and is famous for its simplicity and depth. The model looks like this:

![vgg16](Figures/vgg16.png)

The architecture is pretty straightforward: simply stack multiple 3x3 convolutional filters one after another, interleave with 2x2 maxpools, double the number of convolutional filters after each maxpool (until it reaches 512), flatten, and finish with fully connected layers. A couple of ideas behind this model:

- Instead of using larger filters, VGG notes that the receptive field of two stacked layers of 3x3 filters is 5x5, and with 3 layers, 7x7. Using 3x3's allows VGG to insert additional non-linearities and requires fewer weight parameters to learn.

- Doubling the width of the network every time the features are spatially downsampled (maxpooled) gives the model more representational capacity while achieving spatial compression.
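
To make the arithmetic behind the first point concrete, here is a small pure-Python sketch (no TensorFlow needed). It assumes stride-1 convolutions, ignores biases, and uses a made-up width of 256 channels in and out of every layer; the helper names `stacked_receptive_field` and `conv_weight_count` are just for illustration.

```python
def stacked_receptive_field(num_layers, kernel_size=3):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    field = 1
    for _ in range(num_layers):
        # Each extra layer widens the field by (kernel_size - 1) pixels.
        field += kernel_size - 1
    return field


def conv_weight_count(kernel_size, channels, num_layers=1):
    """Weights (biases ignored) for stacked convs with `channels` in and out."""
    return num_layers * kernel_size * kernel_size * channels * channels


channels = 256  # assumed width, chosen only for illustration

for layers in (2, 3):
    field = stacked_receptive_field(layers)
    stacked = conv_weight_count(3, channels, num_layers=layers)
    single = conv_weight_count(field, channels)
    print(f"{layers} x (3x3): receptive field {field}x{field}, "
          f"{stacked:,} weights vs {single:,} for one {field}x{field} layer")
```

For any channel count C, three 3x3 layers cost 27·C² weights against 49·C² for a single 7x7 layer, which is where the savings come from.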

### TensorFlow Core

In code, setting up the computation graph for prediction with just TensorFlow Core API is kind of a lot:

In [None]:
import tensorflow as tf

# Set up the data loading:
images, labels = ...

# Define the model
with tf.name_scope('conv1_1') as scope:
 kernel = tf.Variable(tf.truncated_normal([3, 3, 3, 64], dtype=tf.float32, stddev=1e-1), name='weights')
 conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')
 biases = tf.Variable(tf.constant(0.0, shape=[64], dtype=tf.float32), trainable=True, name='biases')
 bias = tf.nn.bias_add(conv, biases)
 conv1 = tf.nn.relu(bias, name=scope)
 
with tf.name_scope('conv1_2') as scope:
 kernel = tf.Variable(tf.truncated_normal([3, 3, 64, 64], dtype=tf.float32, stddev=1e-1), name='weights')
 conv = tf.nn.conv2d(conv1, kernel, [1, 1, 1, 1], padding='SAME')
 biases = tf.Variable(tf.constant(0.0, shape=[64], dtype=tf.float32), trainable=True, name='biases')
 bias = tf.nn.bias_add(conv, biases)
 conv1 = tf.nn.relu(bias, name=scope)
 
pool1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name='pool1')
 
with tf.name_scope('conv2_1') as scope:
 kernel = tf.Variable(tf.truncated_normal([3, 3, 64, 128], dtype=tf.float32, stddev=1e-1), name='weights')
 conv = tf.nn.conv2d(pool1, kernel, [1, 1, 1, 1], padding='SAME')
 biases = tf.Variable(tf.constant(0.0, shape=[128], dtype=tf.float32), trainable=True, name='biases')
 bias = tf.nn.bias_add(conv, biases)
 conv2 = tf.nn.relu(bias, name=scope)
 
with tf.name_scope('conv2_2') as scope:
 kernel = tf.Variable(tf.truncated_normal([3, 3, 128, 128], dtype=tf.float32, stddev=1e-1), name='weights')
 conv = tf.nn.conv2d(conv2, kernel, [1, 1, 1, 1], padding='SAME')
 biases = tf.Variable(tf.constant(0.0, shape=[128], dtype=tf.float32), trainable=True, name='biases')
 bias = tf.nn.bias_add(conv, biases)
 conv2 = tf.nn.relu(bias, name=scope)
 
pool2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name='pool2')
 
with tf.name_scope('conv3_1') as scope:
 kernel = tf.Variable(tf.truncated_normal([3, 3, 128, 256], dtype=tf.float32, stddev=1e-1), name='weights')
 conv = tf.nn.conv2d(pool2, kernel, [1, 1, 1, 1], padding='SAME')
 biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32), trainable=True, name='biases')
 bias = tf.nn.bias_add(conv, biases)
 conv3 = tf.nn.relu(bias, name=scope)
 
with tf.name_scope('conv3_2') as scope:
 kernel = tf.Variable(tf.truncated_normal([3, 3, 256, 256], dtype=tf.float32, stddev=1e-1), name='weights')
 conv = tf.nn.conv2d(conv3, kernel, [1, 1, 1, 1], padding='SAME')
 biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32), trainable=True, name='biases')
 bias = tf.nn.bias_add(conv, biases)
 conv3 = tf.nn.relu(bias, name=scope)
 
with tf.name_scope('conv3_3') as scope:
 kernel = tf.Variable(tf.truncated_normal([3, 3, 256, 256], dtype=tf.float32, stddev=1e-1), name='weights')
 conv = tf.nn.conv2d(conv3, kernel, [1, 1, 1, 1], padding='SAME')
 biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32), trainable=True, name='biases')
 bias = tf.nn.bias_add(conv, biases)
 conv3 = tf.nn.relu(bias, name=scope)
 
pool3 = tf.nn.max_pool(conv3, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name='pool3')
 
with tf.name_scope('conv4_1') as scope:
 kernel = tf.Variable(tf.truncated_normal([3, 3, 256, 512], dtype=tf.float32, stddev=1e-1), name='weights')
 conv = tf.nn.conv2d(pool3, kernel, [1, 1, 1, 1], padding='SAME')
 biases = tf.Variable(tf.constant(0.0, shape=[512], dtype=tf.float32), trainable=True, name='biases')
 bias = tf.nn.bias_add(conv, biases)
 conv4 = tf.nn.relu(bias, name=scope)
 
with tf.name_scope('conv4_2') as scope:
 kernel = tf.Variable(tf.truncated_normal([3, 3, 512, 512], dtype=tf.float32, stddev=1e-1), name='weights')
 conv = tf.nn.conv2d(conv4, kernel, [1, 1, 1, 1], padding='SAME')
 biases = tf.Variable(tf.constant(0.0, shape=[512], dtype=tf.float32), trainable=True, name='biases')
 bias = tf.nn.bias_add(conv, biases)
 conv4 = tf.nn.relu(bias, name=scope)
 
with tf.name_scope('conv4_3') as scope:
 kernel = tf.Variable(tf.truncated_normal([3, 3, 512, 512], dtype=tf.float32, stddev=1e-1), name='weights')
 conv = tf.nn.conv2d(conv4, kernel, [1, 1, 1, 1], padding='SAME')
 biases = tf.Variable(tf.constant(0.0, shape=[512], dtype=tf.float32), trainable=True, name='biases')
 bias = tf.nn.bias_add(conv, biases)
 conv4 = tf.nn.relu(bias, name=scope)
 
pool4 = tf.nn.max_pool(conv4, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name='pool4')

with tf.name_scope('conv5_1') as scope:
 kernel = tf.Variable(tf.truncated_normal([3, 3, 512, 512], dtype=tf.float32, stddev=1e-1), name='weights')
 conv = tf.nn.conv2d(pool4, kernel, [1, 1, 1, 1], padding='SAME')
 biases = tf.Variable(tf.constant(0.0, shape=[512], dtype=tf.float32), trainable=True, name='biases')
 bias = tf.nn.bias_add(conv, biases)
 conv5 = tf.nn.relu(bias, name=scope)
 
with tf.name_scope('conv5_2') as scope:
 kernel = tf.Variable(tf.truncated_normal([3, 3, 512, 512], dtype=tf.float32, stddev=1e-1), name='weights')
 conv = tf.nn.conv2d(conv5, kernel, [1, 1, 1, 1], padding='SAME')
 biases = tf.Variable(tf.constant(0.0, shape=[512], dtype=tf.float32), trainable=True, name='biases')
 bias = tf.nn.bias_add(conv, biases)
 conv5 = tf.nn.relu(bias, name=scope)
 
with tf.name_scope('conv5_3') as scope:
 kernel = tf.Variable(tf.truncated_normal([3, 3, 512, 512], dtype=tf.float32, stddev=1e-1), name='weights')
 conv = tf.nn.conv2d(conv5, kernel, [1, 1, 1, 1], padding='SAME')
 biases = tf.Variable(tf.constant(0.0, shape=[512], dtype=tf.float32), trainable=True, name='biases')
 bias = tf.nn.bias_add(conv, biases)
 conv5 = tf.nn.relu(bias, name=scope)
 
pool5 = tf.nn.max_pool(conv5, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name='pool5')
 
with tf.name_scope('fc_6') as scope:
 flat = tf.reshape(pool5, [-1, 7*7*512])
 weights = tf.Variable(tf.truncated_normal([7*7*512, 4096], dtype=tf.float32, stddev=1e-1), name='weights')
 mat = tf.matmul(flat, weights)
 biases = tf.Variable(tf.constant(0.0, shape=[4096], dtype=tf.float32), trainable=True, name='biases')
 bias = tf.nn.bias_add(mat, biases)
 fc6 = tf.nn.relu(bias, name=scope)
 fc6_drop = tf.nn.dropout(fc6, keep_prob=0.5, name='dropout')

with tf.name_scope('fc_7') as scope:
 weights = tf.Variable(tf.truncated_normal([4096, 4096], dtype=tf.float32, stddev=1e-1), name='weights')
 mat = tf.matmul(fc6_drop, weights)  # use the dropout output of fc_6
 biases = tf.Variable(tf.constant(0.0, shape=[4096], dtype=tf.float32), trainable=True, name='biases')
 bias = tf.nn.bias_add(mat, biases)
 fc7 = tf.nn.relu(bias, name=scope)
 fc7_drop = tf.nn.dropout(fc7, keep_prob=0.5, name='dropout')
 
with tf.name_scope('fc_8') as scope:
 weights = tf.Variable(tf.truncated_normal([4096, 1000], dtype=tf.float32, stddev=1e-1), name='weights')
 mat = tf.matmul(fc7_drop, weights)  # use the dropout output of fc_7
 biases = tf.Variable(tf.constant(0.0, shape=[1000], dtype=tf.float32), trainable=True, name='biases')
 bias = tf.nn.bias_add(mat, biases)

predictions = bias

Understanding every line of this model isn't important. The main point is how much space it takes up. Several of the steps above (conv2d, bias_add, relu, max_pool) could be combined to shorten the code a bit, and you could compress it further with some clever `for` looping, but only at the cost of readability. With this much code, there is high potential for bugs or typos (to be honest, there are probably a few in the listing above), and modifying or refactoring the code becomes a huge pain.
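
One low-risk use of looping is to describe the network as data and check the arithmetic before building any graph. The sketch below is plain Python, not TensorFlow. It assumes 224x224 RGB input (which the `7*7*512` reshape in `fc_6` implies), and the names `VGG16_BLOCKS`, `FC_UNITS`, and `trace_vgg16` are invented for this example.

```python
# VGG-16 as data: (block name, number of 3x3 conv layers, output channels).
VGG16_BLOCKS = [
    ("conv1", 2, 64),
    ("conv2", 2, 128),
    ("conv3", 3, 256),
    ("conv4", 3, 512),
    ("conv5", 3, 512),
]
FC_UNITS = [4096, 4096, 1000]


def trace_vgg16(height=224, width=224, in_channels=3):
    """Return (flattened feature size, total parameter count)."""
    params = 0
    channels = in_channels
    for _name, num_convs, out_channels in VGG16_BLOCKS:
        for _ in range(num_convs):
            # 3x3 kernel weights plus one bias per output channel;
            # 'SAME' padding with stride 1 keeps height and width unchanged.
            params += 3 * 3 * channels * out_channels + out_channels
            channels = out_channels
        # A 2x2 max pool with stride 2 halves each spatial dimension
        # (rounding up, as 'SAME' padding does).
        height, width = -(-height // 2), -(-width // 2)
    flat = height * width * channels
    fan_in = flat
    for units in FC_UNITS:
        # Fully connected layer: weight matrix plus one bias per unit.
        params += fan_in * units + units
        fan_in = units
    return flat, params


flat, params = trace_vgg16()
print("flattened size:", flat)
print("parameters:", params)
```

The flattened size comes out to 7*7*512 = 25088, matching the reshape in `fc_6`, and the first fully connected layer alone accounts for most of the parameters.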

By the way, although VGG-16's paper was titled "Very Deep Convolutional Networks for Large-Scale Image Recognition", it isn't even considered a particularly deep network by today's standards. [Residual Networks](https://arxiv.org/abs/1512.03385) (2015) started beating state-of-the-art results with 50, 101, and 152 layers in their first incarnation, before really going off the deep end and getting up to 1001 layers and beyond. I'll spare you the uncompressed TensorFlow Core code for that.

### TF-Slim

Enter TF-Slim. The same VGG-16 model can be expressed as follows:

In [None]:
import tensorflow as tf

slim = tf.contrib.slim

# Set up the data loading:
images, labels = ...

# Define the model:
with slim.arg_scope([slim.conv2d, slim.fully_connected],
 activation_fn=tf.nn.relu,
 weights_initializer=tf.truncated_normal_initializer(0.0, 0.01),
 weights_regularizer=slim.l2_regularizer(0.0005)):
 net = slim.repeat(images, 2, slim.conv2d, 64, [3, 3], scope='conv1')
 net = slim.max_pool2d(net, [2, 2], scope='pool1')
 net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')
 net = slim.max_pool2d(net, [2, 2], scope='pool2')
 net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')
 net = slim.max_pool2d(net, [2, 2], scope='pool3')
 net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv4')
 net = slim.max_pool2d(net, [2, 2], scope='pool4')
 net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv5')
 net = slim.max_pool2d(net, [2, 2], scope='pool5')
 net = slim.flatten(net, scope='flatten5')  # e.g. [batch, 7, 7, 512] -> [batch, 7*7*512]
 net = slim.fully_connected(net, 4096, scope='fc6')
 net = slim.dropout(net, 0.5, scope='dropout6')
 net = slim.fully_connected(net, 4096, scope='fc7')
 net = slim.dropout(net, 0.5, scope='dropout7')
 net = slim.fully_connected(net, 1000, activation_fn=None, scope='fc8')

predictions = net

Much cleaner. For the TF-Slim version, it's much more obvious what the network is doing, writing it is faster, and typos and bugs are much less likely.

Things to notice:

- Weight and bias variables for every layer are automatically generated and tracked. Also, the "in_channel" parameter for determining weight dimension is automatically inferred from the input. This allows you to focus on what layers you want to add to the model, without worrying as much about boilerplate code. 

- The `repeat()` function allows you to add the same layer multiple times. In terms of variable scoping, `repeat()` appends "_#" to the scope name and nests each layer under the given scope, so the layers end up with scopes `conv1/conv1_1`, `conv1/conv1_2`, `conv2/conv2_1`, etc.

- The non-linear activation function (here: ReLU) is wrapped directly into the layer. In more advanced architectures that use batch normalization, it can be wrapped into the layer as well (through the layer's `normalizer_fn` argument).

- With `slim.arg_scope()`, we're able to specify defaults for common arguments, such as the activation function or `weights_initializer`. Of course, these defaults can still be overridden in any individual layer, as demonstrated in the final fully connected layer (fc8).
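
If the scoping behavior feels like magic, a toy model may help. The sketch below is plain Python and is *not* how TF-Slim is implemented; it only mimics two ideas from the list above: scoped keyword defaults that an explicit argument can override, and `repeat()` numbering the scopes of the layers it creates. The "layers" here just record their settings in a list, and every name in the block is a stand-in written for this example.

```python
import contextlib
import functools

_scope_defaults = {}  # layer name -> keyword defaults currently in effect


def add_arg_scope(fn):
    """Make `fn` pick up defaults set by an enclosing arg_scope."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        merged = dict(_scope_defaults.get(fn.__name__, {}))
        merged.update(kwargs)  # explicit arguments win over scope defaults
        return fn(*args, **merged)
    return wrapper


@contextlib.contextmanager
def arg_scope(layer_fns, **defaults):
    """Toy stand-in for slim.arg_scope: set defaults for the given layers."""
    saved = {fn.__name__: _scope_defaults.get(fn.__name__, {}) for fn in layer_fns}
    for fn in layer_fns:
        _scope_defaults[fn.__name__] = {**saved[fn.__name__], **defaults}
    try:
        yield
    finally:
        _scope_defaults.update(saved)  # restore whatever was set before


@add_arg_scope
def conv2d(inputs, num_outputs, kernel_size, activation_fn=None, scope=None):
    """Toy layer: record its settings instead of building a graph op."""
    return inputs + [(scope, num_outputs, tuple(kernel_size), activation_fn)]


def repeat(inputs, repetitions, layer, *args, scope=None, **kwargs):
    """Toy stand-in for slim.repeat: apply `layer` several times."""
    net = inputs
    for i in range(repetitions):
        # Each copy is nested under `scope` and numbered from 1.
        net = layer(net, *args, scope=f"{scope}/{scope}_{i + 1}", **kwargs)
    return net


with arg_scope([conv2d], activation_fn="relu"):
    net = repeat([], 2, conv2d, 64, [3, 3], scope="conv1")
    # An explicit argument overrides the scope default for this layer only.
    net = conv2d(net, 1000, [1, 1], activation_fn=None, scope="logits")

for layer in net:
    print(layer)
```

The two repeated layers pick up the `"relu"` default and get numbered scopes, while the last layer keeps its explicit `activation_fn=None`, mirroring what happens with `fc8` in the TF-Slim listing.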

If you're reusing one of the famous architectures (like VGG-16), TF-Slim already has them defined, so it becomes even easier:

In [None]:
import tensorflow as tf

slim = tf.contrib.slim
vgg = tf.contrib.slim.nets.vgg

# Set up the data loading:
images, labels = ...

# Define the model:
predictions, _ = vgg.vgg_16(images)  # returns (logits, end_points)

## Pre-Trained Weights

TF-Slim provides weights pre-trained on the ImageNet dataset, available for [download](https://github.com/tensorflow/models/tree/master/research/slim#pre-trained-models). First, a quick tutorial on saving and restoring models:

### Saving and Restoring

One of the nice features of modern machine learning frameworks is the ability to save model parameters in a clean way. While this may not have been a big deal for the MNIST logistic regression model because training only took a few seconds, it's easy to see why you wouldn't want to have to re-train a model from scratch every time you wanted to do inference or make a small change if training takes days or weeks.

TensorFlow provides this functionality with its [Saver()](https://www.tensorflow.org/programmers_guide/variables#saving_and_restoring) class. Saving the weights of the MNIST logistic regression model isn't really necessary, since it trains so quickly, but let's do it anyway for illustrative purposes:

In [1]:
import tensorflow as tf
from tqdm import trange
from tensorflow.examples.tutorials.mnist import input_data

# Import data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Create the model
x = tf.placeholder(tf.float32, [None, 784], name='x')
W = tf.Variable(tf.zeros([784, 10]), name='W')
b = tf.Variable(tf.zeros([10]), name='b')
y = tf.nn.bias_add(tf.matmul(x, W), b, name='y')

# Define loss and optimizer
y_ = tf.placeholder(tf.float32, [None, 10], name='y_')
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

# Variable Initializer
init_op = tf.global_variables_initializer()

# Create a Saver object for saving weights
saver = tf.train.Saver()

# Create a Session object, initialize all variables
sess = tf.Session()
sess.run(init_op)

# Train
for _ in trange(1000):
 batch_xs, batch_ys = mnist.train.next_batch(100)
 sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
 
# Save model
save_path = saver.save(sess, "./log_reg_model.ckpt")
print("Model saved in file: %s" % save_path)
 
# Test trained model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print('Test accuracy: {0}'.format(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})))

sess.close()

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


100%|█████████████████████████████████████| 1000/1000 [00:02<00:00, 393.89it/s]


Model saved in file: ./log_reg_model.ckpt
Test accuracy: 0.916700005531311


Note the differences from what we worked with yesterday:

- In lines 9-12 and 15, there are now `name` arguments attached to certain ops and variables of the graph. There are many reasons to do this, but here, it will help us identify which variables are which when restoring.
- In line 23, we create a `Saver()` object, and in line 35, we save the variables of the model to a checkpoint file. This will create a series of files containing our saved model.

Otherwise, the code is more or less the same.
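
To see which files were actually written, you can list everything that shares the checkpoint prefix. This is a small standard-library sketch; `list_checkpoint_files` is a helper invented here, not part of TensorFlow. With the default checkpoint format in TensorFlow 1.x you should typically see `.index`, `.data-*`, and `.meta` files plus a small text file named `checkpoint`, though the exact set can vary between versions.

```python
import glob
import os


def list_checkpoint_files(save_path):
    """Return the files on disk that belong to the checkpoint `save_path`."""
    # Everything the Saver writes for this checkpoint starts with the prefix.
    files = sorted(glob.glob(glob.escape(save_path) + "*"))
    # The Saver also keeps a text file named "checkpoint" in the same
    # directory, recording the most recent checkpoint prefix.
    state_file = os.path.join(os.path.dirname(save_path) or ".", "checkpoint")
    if os.path.exists(state_file):
        files.append(state_file)
    return files


for path in list_checkpoint_files("./log_reg_model.ckpt"):
    print(path)
```

The `.meta` file holds the graph definition, which is what makes it possible to restore the model without re-declaring it in code.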

To restore the model:

In [2]:
import tensorflow as tf
from tqdm import trange
from tensorflow.examples.tutorials.mnist import input_data

# Import data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Create a Session object (variables are restored from the checkpoint below)
sess = tf.Session()

# Restore weights
saver = tf.train.import_meta_graph('./log_reg_model.ckpt.meta')
saver.restore(sess, tf.train.latest_checkpoint('./'))
print("Model restored.")

graph = tf.get_default_graph()
x = graph.get_tensor_by_name("x:0")
y = graph.get_tensor_by_name("y:0")
y_ = graph.get_tensor_by_name("y_:0")
 
# Test trained model
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
print('Test accuracy: {0}'.format(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})))

sess.close()

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
INFO:tensorflow:Restoring parameters from ./log_reg_model.ckpt
Model restored.
Test accuracy: 0.916700005531311


Importantly, notice that we didn't have to retrain the model. Instead, the graph and all variable values were loaded directly from our checkpoint files. For a model this small, restoring probably takes about as long as retraining, but for more complex models, the utility of saving/restoring is immense.

### TF-Slim Model Zoo

One of the biggest and most surprising unintended benefits of the ImageNet competition was deep networks' transfer learning properties: CNNs trained on ImageNet classification could be re-used as general purpose feature extractors for other tasks, such as object detection. Training on ImageNet is very intensive and expensive in both time and computation, and requires a good deal of set-up. As such, the availability of weights already pre-trained on ImageNet has significantly accelerated and democratized deep learning research.

Pre-trained models of several famous architectures are listed in the TF-Slim portion of the [TensorFlow models repository](https://github.com/tensorflow/models/tree/master/research/slim#pre-trained-models). Also included are the papers that proposed them and their respective performances on ImageNet. Side note: accuracy is not the only consideration when picking a network; memory and speed are important to keep in mind as well.

Each entry has a link to download the checkpoint file of the pre-trained network. Alternatively, you can download the weights as part of your program. A tutorial can be found [here](https://github.com/tensorflow/models/blob/master/research/slim/slim_walkthrough.ipynb), but the general idea is:

In [None]:
from datasets import dataset_utils
import tensorflow as tf

url = "http://download.tensorflow.org/models/vgg_16_2016_08_28.tar.gz"
checkpoints_dir = './checkpoints'

if not tf.gfile.Exists(checkpoints_dir):
 tf.gfile.MakeDirs(checkpoints_dir)

dataset_utils.download_and_uncompress_tarball(url, checkpoints_dir)
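
The `datasets` package imported above lives in the TF-Slim part of the models repository, so that cell only works when that directory is on your Python path. If it isn't, the same download-and-unpack step can be sketched with the standard library alone. `download_and_extract` below is a hypothetical helper written for this tutorial, not a TF-Slim function, and it should only be pointed at archives you trust.

```python
import os
import tarfile
import urllib.request


def download_and_extract(url, target_dir):
    """Download a .tar.gz archive from `url` and unpack it into `target_dir`."""
    os.makedirs(target_dir, exist_ok=True)
    # Store the archive next to the extracted files, named after the URL.
    archive_path = os.path.join(target_dir, os.path.basename(url))
    urllib.request.urlretrieve(url, archive_path)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(target_dir)
    return archive_path


# Example call (a large download, so it is left commented out):
# download_and_extract(
#     "http://download.tensorflow.org/models/vgg_16_2016_08_28.tar.gz",
#     "./checkpoints")
```

After extraction, the checkpoint file (`vgg_16.ckpt` for the archive above) should sit directly inside the target directory.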

In [None]:
import os
import tensorflow as tf
from nets import vgg

slim = tf.contrib.slim

# Load images
images = ...

# Pre-process
processed_images = ...

# Create the model; the default arg scope sets the layer defaults (e.g. weight decay).
with slim.arg_scope(vgg.vgg_arg_scope()):
 logits, _ = vgg.vgg_16(processed_images, num_classes=1000, is_training=False)
 
probabilities = tf.nn.softmax(logits)

# Load checkpoint values
init_fn = slim.assign_from_checkpoint_fn(
 os.path.join(checkpoints_dir, 'vgg_16.ckpt'),
 slim.get_model_variables('vgg_16'))