{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "hide-cell" ] }, "outputs": [], "source": [ "# Install the necessary dependencies\n", "\n", "import os\n", "import sys \n", "!{sys.executable} -m pip install --quiet pandas scikit-learn numpy matplotlib jupyterlab_myst ipython imageio scikit-image requests\n", "# Convolutional Neural Networks" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "remove-cell" ] }, "source": [ "---\n", "license:\n", " code: MIT\n", " content: CC-BY-4.0\n", "github: https://github.com/ocademy-ai/machine-learning\n", "venue: By Ocademy\n", "open_access: true\n", "bibliography:\n", " - https://raw.githubusercontent.com/ocademy-ai/machine-learning/main/open-machine-learning-jupyter-book/references.bib\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Stylenet / Neural-Style\n", "\n", "The purpose of this script is to illustrate how to do stylenet in TensorFlow. We reference the following [paper](https://arxiv.org/abs/1508.06576) for this algorithm.\n", "\n", "But there is some prerequisites,\n", "\n", "- Download the `VGG-verydeep-19.mat` file.\n", "- You must download two images, a style image and a content image for the algorithm to blend.\n", "\n", "The style image is\n", "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/CNN/starry_night.jpg\n", "name: Style image starry night\n", ":::\n", "\n", "The context image is below.\n", "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/CNN/book_cover.jpg\n", "name: Content image book cover\n", ":::\n", "\n", "The final result looks like\n", "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/CNN/05_stylenet_ex.png\n", "name: stylenet final result\n", ":::\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use two images, an original image and a style image and try to make the original image in the style of the style image.\n", "\n", "Reference paper:https://arxiv.org/abs/1508.06576\n", "\n", "Need to download the model 'imagenet-vgg-verydee-19.mat' from: http://www.vlfeat.org/matconvnet/models/beta16/imagenet-vgg-verydeep-19.mat\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Code" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import os\n", "import scipy.io\n", "import scipy.misc\n", "import imageio\n", "from skimage.transform import resize\n", "from operator import mul\n", "from functools import reduce\n", "from PIL import Image\n", "import numpy as np\n", "import requests\n", "import tensorflow.compat.v1 as tf\n", "tf.disable_eager_execution() #This is tensorflow 1.x version code. Some of them are not fit tensorflow 2.x.\n", "from tensorflow.python.framework import ops\n", "ops.reset_default_graph()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download data" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# URLs\n", "original_image_url = 'https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/CNN/book_cover.jpg'\n", "style_image_url = 'https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/deep-learning/CNN/starry_night.jpg'\n", "vgg_url = 'https://static-1300131294.cos.ap-shanghai.myqcloud.com/data/deep-learning/cnn/imagenet-vgg-verydeep-19.mat'\n", "\n", "# Local directories\n", "data_dir = 'temp'\n", "vgg_dir = os.path.join(data_dir, 'VGG')\n", "if not os.path.exists(vgg_dir):\n", " os.makedirs(vgg_dir)\n", "\n", "# Function to download and save a file\n", "def download_file(url, directory):\n", " response = requests.get(url)\n", " filename = url.split('/')[-1]\n", " filepath = os.path.join(directory, filename)\n", " with open(filepath, 'wb') as f:\n", " f.write(response.content)\n", " return filepath\n", "\n", "# Download images and VGG Network\n", "original_image_path = download_file(original_image_url, data_dir)\n", "style_image_path = download_file(style_image_url, data_dir)\n", "vgg_path = download_file(vgg_url, vgg_dir)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load data and set default arguments" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Load images using PIL and convert to NumPy arrays\n", "original_image = Image.open(original_image_path)\n", "style_image = Image.open(style_image_path)\n", "original_image = np.array(original_image)\n", "style_image = np.array(style_image)\n", "\n", "# Default Arguments\n", "original_image_weight = 5.0\n", "style_image_weight = 500.0\n", "regularization_weight = 100\n", "learning_rate = 10\n", "generations = 100\n", "output_generations = 25\n", "beta1 = 0.9\n", "beta2 = 0.999" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Style Transfer Implementation using VGG-19 Network\n", "\n", "Now setting VGG-19." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Get shape of target and make the style image the same\n", "target_shape = original_image.shape\n", "style_image = resize(style_image, target_shape)\n", "\n", "# VGG-19 Layer Setup\n", "# From paper\n", "vgg_layers = ['conv1_1', 'relu1_1',\n", " 'conv1_2', 'relu1_2', 'pool1',\n", " 'conv2_1', 'relu2_1',\n", " 'conv2_2', 'relu2_2', 'pool2',\n", " 'conv3_1', 'relu3_1',\n", " 'conv3_2', 'relu3_2',\n", " 'conv3_3', 'relu3_3',\n", " 'conv3_4', 'relu3_4', 'pool3',\n", " 'conv4_1', 'relu4_1',\n", " 'conv4_2', 'relu4_2',\n", " 'conv4_3', 'relu4_3',\n", " 'conv4_4', 'relu4_4', 'pool4',\n", " 'conv5_1', 'relu5_1',\n", " 'conv5_2', 'relu5_2',\n", " 'conv5_3', 'relu5_3',\n", " 'conv5_4', 'relu5_4']\n", "\n", "\n", "# Extract weights and matrix means\n", "def extract_net_info(path_to_params):\n", " vgg_data = scipy.io.loadmat(path_to_params)\n", " normalization_matrix = vgg_data['normalization'][0][0][0]\n", " mat_mean = np.mean(normalization_matrix, axis=(0, 1))\n", " network_weights = vgg_data['layers'][0]\n", " return mat_mean, network_weights\n", " \n", "\n", "# Create the VGG-19 Network\n", "def vgg_network(network_weights, init_image):\n", " network = {}\n", " image = init_image\n", "\n", " for i, layer in enumerate(vgg_layers):\n", " if layer[0] == 'c':\n", " weights, bias = network_weights[i][0][0][0][0]\n", " weights = np.transpose(weights, (1, 0, 2, 3))\n", " bias = bias.reshape(-1)\n", " conv_layer = tf.nn.conv2d(image, tf.constant(weights), (1, 1, 1, 1), 'SAME')\n", " image = tf.nn.bias_add(conv_layer, bias)\n", " elif layer[0] == 'r':\n", " image = tf.nn.relu(image)\n", " else: # pooling\n", " image = tf.nn.max_pool(image, (1, 2, 2, 1), (1, 2, 2, 1), 'SAME')\n", " network[layer] = image\n", " return network\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we define which layers apply to the original or style image and get network parameters." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "\n", "original_layers = ['relu4_2', 'relu5_2']\n", "style_layers = ['relu1_1', 'relu2_1', 'relu3_1', 'relu4_1', 'relu5_1']\n", "\n", "normalization_mean, network_weights = extract_net_info(vgg_path)\n", "\n", "shape = (1,) + original_image.shape\n", "style_shape = (1,) + style_image.shape\n", "original_features = {}\n", "style_features = {}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set style weights." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "style_weights = {l: 1./(len(style_layers)) for l in style_layers}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Computer feature layers with original image." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "g_original = tf.Graph()\n", "with g_original.as_default(), tf.Session() as sess1:\n", " image = tf.placeholder('float', shape=shape)\n", " vgg_net = vgg_network(network_weights, image)\n", " original_minus_mean = original_image - normalization_mean\n", " original_norm = np.array([original_minus_mean])\n", " for layer in original_layers:\n", " original_features[layer] = vgg_net[layer].eval(feed_dict={image: original_norm})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get style image network." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "g_style = tf.Graph()\n", "with g_style.as_default(), tf.Session() as sess2:\n", " image = tf.placeholder('float', shape=style_shape)\n", " vgg_net = vgg_network(network_weights, image)\n", " style_minus_mean = style_image - normalization_mean\n", " style_norm = np.array([style_minus_mean])\n", " for layer in style_layers:\n", " features = vgg_net[layer].eval(feed_dict={image: style_norm})\n", " features = np.reshape(features, (-1, features.shape[3]))\n", " gram = np.matmul(features.T, features) / features.size\n", " style_features[layer] = gram\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Make Combined Image via loss function." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Generation 25 out of 100, loss: 10359594.0\n", "Generation 50 out of 100, loss: 7851741.5\n", "Generation 75 out of 100, loss: 6664776.0\n", "Generation 100 out of 100, loss: 6305506.0\n" ] } ], "source": [ "with tf.Graph().as_default():\n", " # Get network parameters\n", " initial = tf.random_normal(shape) * 0.256\n", " init_image = tf.Variable(initial)\n", " vgg_net = vgg_network(network_weights, init_image)\n", "\n", " # Loss from Original Image\n", " original_layers_w = {'relu4_2': 0.5, 'relu5_2': 0.5}\n", " original_loss = 0\n", " for o_layer in original_layers:\n", " temp_original_loss = original_layers_w[o_layer] * original_image_weight *\\\n", " (2 * tf.nn.l2_loss(vgg_net[o_layer] - original_features[o_layer]))\n", " original_loss += (temp_original_loss / original_features[o_layer].size)\n", "\n", " # Loss from Style Image\n", " style_loss = 0\n", " style_losses = []\n", " for style_layer in style_layers:\n", " layer = vgg_net[style_layer]\n", " feats, height, width, channels = [dim if isinstance(dim, int) else dim.value for dim in layer.get_shape()]\n", " size = height * width * channels\n", " features = tf.reshape(layer, (-1, channels))\n", " style_gram_matrix = tf.matmul(tf.transpose(features), features) / size\n", " style_expected = style_features[style_layer]\n", " style_losses.append(style_weights[style_layer] * 2 *\n", " tf.nn.l2_loss(style_gram_matrix - style_expected) /\n", " style_expected.size)\n", " style_loss += style_image_weight * tf.reduce_sum(style_losses)\n", "\n", " # To Smooth the results, we add in total variation loss\n", " total_var_x = reduce(mul, init_image[:, 1:, :, :].get_shape().as_list(), 1)\n", " total_var_y = reduce(mul, init_image[:, :, 1:, :].get_shape().as_list(), 1)\n", " first_term = regularization_weight * 2\n", " second_term_numerator = tf.nn.l2_loss(init_image[:, 1:, :, :] - init_image[:, :shape[1]-1, :, :])\n", " second_term = second_term_numerator / total_var_y\n", " third_term = (tf.nn.l2_loss(init_image[:, :, 1:, :] - init_image[:, :, :shape[2]-1, :]) / total_var_x)\n", " total_variation_loss = first_term * (second_term + third_term)\n", "\n", " # Combined Loss\n", " loss = original_loss + style_loss + total_variation_loss\n", "\n", " # Declare Optimization Algorithm\n", " optimizer = tf.train.AdamOptimizer(learning_rate, beta1, beta2)\n", " train_step = optimizer.minimize(loss)\n", "\n", " # Initialize variables and start training\n", " with tf.Session() as sess:\n", " tf.global_variables_initializer().run()\n", " for i in range(generations):\n", "\n", " train_step.run()\n", "\n", " # Print update and save temporary output\n", " if (i+1) % output_generations == 0:\n", " print('Generation {} out of {}, loss: {}'.format(i + 1, generations,sess.run(loss)))\n", "\n", " image_eval = init_image.eval(session=sess)\n", " image_eval = image_eval.reshape(shape[1:]) # Make sure form is right\n", " image_eval += normalization_mean # Add avg\n", " image_eval = np.clip(image_eval, 0, 255) # Make sure value between 0 and 255\n", " image_eval = image_eval.astype(np.uint8) # Transform to uint8\n", "\n", "# Save image\n", "output_file = 'temp_output_{}.jpg'.format(i)\n", "imageio.imwrite(output_file, image_eval)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Your turn! 🚀\n", "You can practice your cnn skills by following the assignment [object recognition in images using cnn](../../assignments/deep-learning/cnn/object-recognition-in-images-using-cnn.ipynb)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Self study\n", "\n", "You can refer to those YouTube videos for further study:\n", "\n", "- [Convolutional Neural Networks (CNNs) explained, by deeplizard](https://www.youtube.com/watch?v=YRhxdVk_sIs)\n", "- [Convolutional Neural Networks Explained (CNN Visualized), by Futurology](https://www.youtube.com/watch?v=pj9-rr1wDhM)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Acknowledgments\n", "\n", "Thanks to [Nick](https://github.com/nfmcclure) for creating the open-source course [tensorflow_cookbook](https://github.com/nfmcclure/tensorflow_cookbook). It inspires the majority of the content in this chapter.\n" ] } ], "metadata": { "kernelspec": { "display_name": "open-machine-learning-jupyter-book", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16" } }, "nbformat": 4, "nbformat_minor": 2 }