{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "Title: LSTM neural network for sequence learning\n", "Date: 2017-11-19 22:00\n", "Tags: LSTM, artificial intelligence, jupyter, tensorflow\n", "Slug: My-first-LSTM\n", "Authors: Dinne Bosman\n", "Lang:en\n", "Summary: My first attempt at an LSTM for sequence prediction\n", "---\n", "\n", "In 1996, during my last year in high school, I borrowed a book about neural networks from a friend. It explained how a two-layer perceptron network could learn the XOR function. Back then I tried implementing the formulas and was able to do the feed-forward calculations. The training algorithm, however, still eluded me. Being able to perform forward calculations was already very exciting. I created a Windows 95 screen saver which would fill the screen with the output of a randomized neural network. The output images were very interesting, especially when replacing the activation functions of the network with exotic ones such as sin(x), abs(x) etc. (Although I lost the source code, you can still download it [here](http://www.free-downloads-center.com/download/neural-screen-saver-v1-0-11252.html))\n", "\n", "At the time it seemed that neural networks were just another statistical method to interpolate data. Furthermore, limited training data and the problem of vanishing gradients limited their usefulness. Fast forward to 2017. Massive amounts of training data and computing power are available. A number of relatively small improvements in the basic neural network algorithms have made it possible to train networks consisting of many more layers. These so-called deep neural networks have fueled progress and interest in Artificial Intelligence development.\n", "\n", "One particular innovation that caught my attention is the LSTM neural network architecture. This architecture addresses the issue of vanishing gradients in Recurrent Neural Networks (RNNs). 
LSTM networks are especially suited to the analysis of sequences and time series. Some interesting links:\n", "\n", " * [article about text generation kernel code](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)\n", " * [fake news generator](https://larseidnes.com/2015/10/13/auto-generating-clickbait-with-recurrent-neural-networks/)\n", " * [LSTM architecture](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)\n", " * [LSTM explanation](https://arxiv.org/pdf/1506.00019.pdf)\n", " * [Modeling attention](https://distill.pub/2016/augmented-rnns/)\n", " * [Convolutional network for speech synthesis](https://deepmind.com/blog/wavenet-generative-model-raw-audio/)\n", "\n", "In this first test I wanted to try implementing a sine wave predictor using TensorFlow. It's a toy example. Because the sine wave is periodic, the train, dev and test sets overlap. This limits how well I can check whether the network generalizes. " ] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/vnd.plotly.v1+html": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import plotly\n", "from plotly.graph_objs import Scatter, Layout\n", "import numpy as np\n", "import tensorflow as tf\n", "import sys\n", "plotly.offline.init_notebook_mode(connected=True)\n", "import IPython.display" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training data\n", "The following cell generates the training data. I decided to add some noise to the sine wave, which acts as a mild form of regularization. 
" ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "outputs": [], "source": [ "sample_length = 50001\n", "time_per_sample = 0.01\n", "signal_time = np.linspace(num=sample_length,start = 0, stop = sample_length * time_per_sample )\n", "signal_amp = np.sin(signal_time*2*np.pi) + np.random.normal(size=sample_length)*0.02\n", " #np.sin(2+signal_time*1.7*np.pi)*0.5 + \\\n", " #np.sin(1+signal_time*2.2*np.pi) + \\\n", " " ] }, { "cell_type": "code", "execution_count": 93, "metadata": { "scrolled": false }, "outputs": [ { "data": { "application/vnd.plotly.v1+json": { "data": [ { "type": "scatter", "x": [ 0, 0.010000199999999999, 0.020000399999999998, 0.030000599999999995, 0.040000799999999996, 0.050001, 0.06000119999999999, 0.07000139999999999, 0.08000159999999999, 0.09000179999999999, 0.100002, 0.1100022, 0.12000239999999998, 0.1300026, 0.14000279999999998, 0.150003, 0.16000319999999998, 0.17000339999999997, 0.18000359999999999, 0.19000379999999997, 0.200004, 0.21000419999999997, 0.2200044, 0.23000459999999998, 0.24000479999999996, 0.250005, 0.2600052, 0.27000539999999995, 0.28000559999999997, 0.2900058, 0.300006, 0.31000619999999995, 0.32000639999999997, 0.3300066, 0.34000679999999994, 0.35000699999999996, 0.36000719999999997, 0.3700074, 0.38000759999999995, 0.39000779999999996, 0.400008, 0.41000819999999993, 0.42000839999999995, 0.43000859999999996, 0.4400088, 0.45000899999999994, 0.46000919999999995, 0.47000939999999997, 0.4800095999999999, 0.49000979999999994, 0.50001, 0.5100102, 0.5200104, 0.5300106, 0.5400107999999999, 0.5500109999999999, 0.5600111999999999, 0.5700114, 0.5800116, 0.5900118, 0.600012, 0.6100121999999999, 0.6200123999999999, 0.6300125999999999, 0.6400127999999999, 0.650013, 0.6600132, 0.6700134, 0.6800135999999999, 0.6900137999999999, 0.7000139999999999, 0.7100141999999999, 0.7200143999999999, 0.7300146, 0.7400148, 0.7500149999999999, 0.7600151999999999, 0.7700153999999999, 0.7800155999999999, 0.7900157999999999, 0.800016, 
0.8100162, 0.8200163999999999, 0.8300165999999999, 0.8400167999999999, 0.8500169999999999, 0.8600171999999999, 0.8700173999999999, 0.8800176, 0.8900177999999999, 0.9000179999999999, 0.9100181999999999, 0.9200183999999999, 0.9300185999999999, 0.9400187999999999, 0.950019, 0.9600191999999999, 0.9700193999999999, 0.9800195999999999, 0.9900197999999999 ], "y": [ -0.007742519419626799, 0.05552410434127882, 0.09657447353891692, 0.16149490892202678, 0.25894014325256925, 0.31095106144413487, 0.35455573839774224, 0.4404990388597043, 0.45054675231412444, 0.5232084078263408, 0.5390653069995867, 0.6079613966740379, 0.6704059866563677, 0.743044138392089, 0.772560268827836, 0.802823850293995, 0.8565159172357552, 0.8878606249929556, 0.9133591748278074, 0.9533599089496061, 0.9404384555347293, 0.9901496749713646, 1.0002515684031061, 1.0049465701838294, 0.9765440794487468, 0.9867498469242078, 1.010286279431882, 1.0105167333577543, 0.9541705855154223, 0.9359836200439761, 0.9222381886957742, 0.9640576992221872, 0.8659641198547392, 0.8906750467489792, 0.7987724768877424, 0.8023777742123938, 0.8079997439210067, 0.7291570372013801, 0.6672021833734605, 0.6218339653886171, 0.5741112957714437, 0.5422287059727413, 0.49984734995619673, 0.4502051152351723, 0.35203306609488894, 0.33973891150531293, 0.24291887581697474, 0.1917956383143584, 0.14071433707652214, 0.08760452871523006, 0.0022570915967879053, -0.0414491751925725, -0.13846834476257303, -0.18354148042591262, -0.2829760572515714, -0.3450486351859782, -0.364152761591345, -0.43326528505774203, -0.4816312623009408, -0.5287251681913476, -0.5731476135798533, -0.6098468027070669, -0.6378079339590172, -0.7428949888003332, -0.7813546131349702, -0.8175040473564884, -0.84050537961511, -0.8907141119045157, -0.8986130977661669, -0.903432110375502, -0.9572910873597582, -0.9527043192048417, -0.9909573442878435, -1.0003289460978368, -0.9964319840699317, -1.017201469562415, -1.0056403802213405, -0.9817033838155825, -0.9915736778734696, 
-0.9594098380314021, -0.9612604987848942, -0.9101318024336031, -0.9214560224214484, -0.8773261576455951, -0.8600364305225254, -0.7876745287485466, -0.7515577772861848, -0.7160332778372979, -0.6756964964770819, -0.6246303801803553, -0.5894618329585847, -0.550563365895335, -0.49557529140445317, -0.44386901689660857, -0.3792056750095414, -0.3171750111618027, -0.25556683779666184, -0.1919405227220717, -0.11186443337210843, -0.06202805070035367 ] } ], "layout": { "title": "" } }, "text/html": [ "
" ], "text/vnd.plotly.v1+html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "#plot part of the signal, just to see what's in there\n", "s_i = 0\n", "e_i = s_i + 100\n", "x = plotly.offline.iplot({\n", "    \"data\": [Scatter(x=signal_time[s_i:e_i],y=signal_amp[s_i:e_i])],\n", "    \"layout\": Layout(title=\"\")\n", "    \n", "})" ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, "outputs": [], "source": [ "#Set up general hyper parameters\n", "\n", "#Unroll the RNN to sequence_length timesteps\n", "sequence_length = 100\n", "#The number of timesteps to predict\n", "prediction_length = 1\n", "#The number of features per input time step\n", "input_feature_count = 1\n", "#The number of features per prediction\n", "output_feature_count = 1\n", "\n", "#the number of LSTM nodes per layer of the network\n", "hidden_count_per_layer = [16,16]\n", "\n", "tf.reset_default_graph()\n", "\n", "#inputs is a tensor of shape (batch_size, sequence_length, feature_count)\n", "inputs = tf.placeholder(tf.float32, \n", "                        [None, sequence_length, input_feature_count], \n", "                        name = 'inputs')\n", "\n", "single_input = tf.placeholder(tf.float32, \n", "                              [None, input_feature_count], \n", "                              name = 'single_input')\n", "\n", "#targets holds the value to train towards. \n", "#It will be filled with the value of the next time step. \n", "#Size (batch_size, feature count)\n", "targets = tf.placeholder(tf.float32, \n", "                         [None, output_feature_count], \n", "                         name = 'targets')\n", "#Apply dropout regularization: keep_prob is the probability \n", "#of keeping a connection \n", "keep_prob = tf.placeholder(tf.float32, name = 'keep')\n", "#Used as the learning rate for AdamOptimizer\n", "learning_rate = tf.placeholder(tf.float32, name = 'learning_rate')\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Defining the LSTM multi-layer network\n", "\n", "Define a network by creating a number of layers. In most examples I found, all layers used equal node counts. 
In this example you can specify the number of neurons per layer through the 'hidden_count_per_layer' list." ] }, { "cell_type": "code", "execution_count": 95, "metadata": {}, "outputs": [], "source": [ "layers = []\n", "\n", "\n", "for hidden_count in hidden_count_per_layer:\n", "    layer = tf.nn.rnn_cell.LSTMCell(hidden_count, state_is_tuple = True)\n", "    layer_with_dropout = tf.nn.rnn_cell.DropoutWrapper(layer,\n", "                                                       input_keep_prob=keep_prob,\n", "                                                       output_keep_prob=1.0)\n", "    #Note: the plain layer is appended here, so the dropout wrapper above is not applied.\n", "    #Appending layer_with_dropout instead would require feeding keep_prob in every sess.run call.\n", "    layers.append(layer)\n", "hidden_network = tf.nn.rnn_cell.MultiRNNCell(layers, state_is_tuple = True) \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Packing/Unpacking the LSTM network state\n", "\n", "'state_is_tuple = True' means that the LSTM state data structure will be a tuple. Although inconvenient to work with, this seems to be the future default. I will introduce some functions which make it easier to work with these state tuples. \n", "\n", "In order to use the LSTM network to generate a predicted sequence of arbitrary length you need to store the state of the network. The output state after predicting a sample should be fed back into the network when predicting the next sample.\n", "\n", "The LSTM implementation in TensorFlow uses an LSTMStateTuple(c,h) data structure per layer. The idea is to pack these LSTMStateTuple(c,h) structures into one 2D tensor of size (batch_size, states). \n", "\n", "There were some challenges implementing these packing/unpacking functions. In particular, you want to avoid making them dependent on a specific batch_size. While building the computation graph the batch_size should be None.\n", "\n", "There is a pointer on how to use dynamic batch_sizes and packing/unpacking states [here](https://stackoverflow.com/questions/40438107/tensorflow-changing-batch-size-for-rnn-during-text-generation). I made some changes to clarify these functions. 
" ] }, { "cell_type": "code", "execution_count": 96, "metadata": {}, "outputs": [], "source": [ "def get_network_state_size(network):\n", "    \"\"\"Returns the number of state variables in the network\"\"\"\n", "    states = 0\n", "    for layer_size in network.state_size:\n", "        states += layer_size[0] # LSTMStateTuple element c\n", "        states += layer_size[1] # LSTMStateTuple element h\n", "    return states\n" ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [], "source": [ "def pack_state_tuple(state_tuple, indent=0):\n", "    \"\"\"Returns a (batch_size,network_state_size) matrix of the states in the network\n", "       state_tuple = the states obtained from _ , state = tf.nn.dynamic_rnn(...)\n", "    \"\"\"\n", "    if isinstance(state_tuple, tf.Tensor) or not hasattr(state_tuple, '__iter__'):\n", "        #Base case: a single Tensor (the c or h element of an LSTMStateTuple)\n", "        return state_tuple\n", "    else:\n", "        l = []\n", "        #an unpacked LSTM network state is a tuple with one element per layer, each element is an LSTMStateTuple\n", "        #state_tuple is either the tuple of LSTMStateTuples or it is an LSTMStateTuple (via recursive call)\n", "        for item in state_tuple:\n", "            # item is either an LSTMStateTuple (top level call)\n", "            # or it is an element of the LSTMStateTuple (first recursive call)\n", "            i = pack_state_tuple(item, indent+2)\n", "            l.append(i)\n", "        \n", "        #convert the list of [Tensor(bsz,a), Tensor(bsz,b), ...] into one long Tensor (bsz, a+b+c+...)\n", "        return tf.concat(l,1)" ] }, { "cell_type": "code", "execution_count": 98, "metadata": {}, "outputs": [], "source": [ "def unpack_state_tuple(state_tensor, sizes):\n", "    \"\"\"The inverse of pack, given a packed_states vector of (batch_size,x) return the LSTMStateTuple \n", "    data structure that can be used as initial state for tf.nn.dynamic_rnn(...) 
\n", "    sizes is the network state size list (cell.state_size)\n", "    \"\"\"\n", "\n", "    def _unpack_state_tuple( sizes_, offset_, indent):\n", "        if isinstance(sizes_, tf.Tensor) or not hasattr(sizes_, '__iter__'): \n", "            #get a small part (batch size, c or h size of LSTMStateTuple) of the packed state vector of shape (batch size, network states)\n", "            v = tf.reshape(state_tensor[:, offset_ : (offset_ + sizes_) ], (-1, sizes_)), offset_ + sizes_\n", "            return v\n", "        else:\n", "            result = []\n", "            #Top level: sizes_ is a tuple with one element per network layer, each element is an LSTMStateTuple(c size, h size)\n", "            #Recursive call: sizes_ is an LSTMStateTuple\n", "            for size in sizes_:\n", "                #size is an LSTMStateTuple (top level)\n", "                #or size is c size or h size (recursive call)\n", "                s, offset_ = _unpack_state_tuple( size, offset_, indent+2)\n", "                result.append(s)\n", "            \n", "            if isinstance(sizes_, tf.nn.rnn_cell.LSTMStateTuple):\n", "                #end of recursive call\n", "                #Build an LSTMStateTuple using the c and h tensors in the result list\n", "                return tf.nn.rnn_cell.LSTMStateTuple(*result), offset_\n", "            else:\n", "                # end of top level call\n", "                # create a tuple with one element per network layer. result is a list of LSTMStateTuples\n", "                return tuple(result), offset_\n", "    \n", "    return _unpack_state_tuple( sizes, 0,0)[0]\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "scrolled": true }, "source": [ "### Testing the packing/unpacking functions\n", "Next I wrote a check to see if the pack and unpack functions are indeed each other's inverse. The vectors should be packed/unpacked in the correct order. The idea is to create a 'packed' vector containing the values 0..n, then unpack and repack it. The output value should be equal to the original vector. 
" ] }, { "cell_type": "code", "execution_count": 99, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(LSTMStateTuple(c=16, h=16), LSTMStateTuple(c=16, h=16))\n", "diff 0.0\n" ] } ], "source": [ "\n", "\n", "#Test pack and unpack\n", "\n", "#create a placeholder in which we can feed packed states (vector of (batch_size, states)) as initial_state\n", "state_packed_in = tf.placeholder(\n", "    tf.float32, \n", "    (None,get_network_state_size(hidden_network)), \n", "    name=\"state_packed_1\")\n", "\n", "print(hidden_network.state_size)\n", "#Unpack the packed states\n", "state_unpacked_out = unpack_state_tuple(state_packed_in,hidden_network.state_size)\n", "#Repack the unpacked states\n", "state_packed_out = pack_state_tuple(state_unpacked_out)\n", "\n", "\n", "inputs_batch_size = 1\n", "a_batch_of_inputs = np.zeros((inputs_batch_size, sequence_length, input_feature_count))\n", "\n", "#create an initial state vector and fill it with test data\n", "an_initial_state = np.zeros((inputs_batch_size*get_network_state_size(hidden_network),1))\n", "an_initial_state[:,0] = np.linspace(start=0,stop=an_initial_state.shape[0]-1,num=an_initial_state.shape[0])\n", "#reshape it as a packed state \n", "an_initial_state_packed = np.reshape(an_initial_state, (inputs_batch_size,get_network_state_size(hidden_network)))\n", "\n", "\n", "init=tf.global_variables_initializer()\n", "\n", "#run this check on the CPU only\n", "config = tf.ConfigProto(\n", "        device_count = {'GPU': 0}\n", "    )\n", "\n", "with tf.Session(config=config) as sess:\n", "    sess.run(init)\n", "    up,p = sess.run([state_unpacked_out, state_packed_out], feed_dict={state_packed_in: an_initial_state_packed})\n", "    \n", "    # compare the original packed states with the ones that were unpacked and then repacked\n", "    diff = an_initial_state_packed - p\n", "    # should be 0\n", "    total_diff = np.sum(np.abs(diff))\n", "    print(\"diff\",total_diff)\n", "    assert total_diff == 0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initial state\n", 
"\n", "Create a placeholder for the initial packed states. This makes it possible to supply the initial states to the LSTM network as a simple vector. Then add an unpack operation to the computation graph. This outputs the initial state as a tuple of LSTMStateTuples which can be used by the dynamic RNN function later on." ] }, { "cell_type": "code", "execution_count": 100, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "states in network 64\n" ] } ], "source": [ "sz = get_network_state_size(hidden_network)\n", "print(\"states in network\", sz)\n", "\n", "\n", "initial_state_packed = tf.placeholder(\n", "    tf.float32, \n", "    (None,sz), \n", "    name=\"initial_state\")\n", "\n", "state_unpacked = unpack_state_tuple(initial_state_packed,hidden_network.state_size)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Forward propagation\n", "\n", "Define the forward calculations by using the dynamic_rnn function. This function takes and returns the network states in unpacked format. 
" ] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "inputs (?, 100, 1)\n", "packed state (?, 64)\n", "outputs before transpose (?, 100, 16)\n", "outputs after transpose (100, ?, 16)\n", "last output (?, 16)\n", "prediction (?, 1)\n", "targets (?, 1)\n" ] } ], "source": [ "#out_weights=tf.Variable(tf.random_normal([hidden_count_per_layer[-1],output_feature_count]))\n", "#out_bias=tf.Variable(tf.random_normal([output_feature_count]))\n", "print(\"inputs \",inputs.shape)\n", "outputs, state_unpacked_network_out = tf.nn.dynamic_rnn(hidden_network, inputs, initial_state = state_unpacked, dtype=tf.float32)\n", "\n", "#pack the final state so it can be fetched and fed back as a simple matrix\n", "state_packed_network_out = pack_state_tuple(state_unpacked_network_out)\n", "print(\"packed state\", state_packed_network_out.shape)\n", "print(\"outputs before transpose\", outputs.shape)\n", "#make the time axis the first axis: (sequence_length, batch_size, hidden_count)\n", "outputs = tf.transpose(outputs, [1, 0, 2])\n", "print(\"outputs after transpose\", outputs.shape)\n", "#last_output = tf.gather(outputs, int(outputs.get_shape()[0]) - 1)\n", "#keep only the output of the last timestep\n", "last_output = outputs[outputs.shape[0]-1,:,:]\n", "print(\"last output\", last_output.shape)\n", " \n", "#out_size = target.get_shape()[2].value\n", "#a linear layer maps the last LSTM output to the prediction\n", "predictions = tf.contrib.layers.fully_connected(last_output, output_feature_count, activation_fn=None)\n", "print(\"prediction\", predictions.shape)\n", "print(\"targets\", targets.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Backward pass, training\n", "\n", "Define the loss as the sum of the squared differences between the last output (prediction) and the target. 
" ] }, { "cell_type": "code", "execution_count": 102, "metadata": {}, "outputs": [], "source": [ "loss = tf.reduce_sum(tf.squared_difference(predictions, targets))" ] }, { "cell_type": "code", "execution_count": 103, "metadata": {}, "outputs": [], "source": [ "opt=tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Defining the train, dev and test set\n", "\n", "Generally you would define 3 sets:\n", "* A set to train on: Train set\n", "* A set to tune the hyper parameters on: Dev set\n", "* A set to test the generalization performance of the network: Test set\n", "\n", "In the case of a sine wave this is of limited use. The train, dev and test sets overlap because of the periodic nature of the sine wave. I added noise to the source signal to make the train, dev and test set at least partly independent." ] }, { "cell_type": "code", "execution_count": 104, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dataset size 49899\n", "29941 Examples (233 batches) in train set\n", "9979 Examples (77 batches) in dev set\n", "9979 Examples (77 batches) in test set\n" ] } ], "source": [ "\n", "#every example is identified by the index in the signal at which its input sequence starts\n", "start_indices = np.linspace(\n", "    0,\n", "    sample_length-sequence_length-prediction_length-1,\n", "    sample_length-sequence_length-prediction_length-1, dtype= np.int32)\n", "\n", "#When you have many examples then you can get away with tiny sizes for the dev and test set.\n", "dev_size_perc = 0.20\n", "test_size_perc = 0.20\n", "batch_size = 128 #512 \n", "\n", "dev_size = int(np.floor(start_indices.shape[0] * dev_size_perc))\n", "test_size = int(np.floor(start_indices.shape[0] * test_size_perc))\n", "train_size = start_indices.shape[0] - test_size - dev_size\n", "train_batch_count = int(np.floor(train_size / batch_size))\n", "dev_batch_count = int(np.floor(dev_size / batch_size))\n", "test_batch_count = int(np.floor(test_size / batch_size))\n", "\n", 
"print(\"dataset size %d\" %(start_indices.shape[0]))\n", "print(\"%d Examples (%d batches) in train set\" %(train_size, train_batch_count))\n", "print(\"%d Examples (%d batches) in dev set\" %(dev_size,dev_batch_count))\n", "print(\"%d Examples (%d batches) in test set\" %(test_size,test_batch_count))\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating batches\n", "\n", "The network will be trained using mini batches. This speeds up training because a network training step is performed after each mini batch in contrast to updating after presenting the complete training set." ] }, { "cell_type": "code", "execution_count": 105, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(128, 100, 1) (128, 1)\n", "(100, 1)\n" ] }, { "data": { "application/vnd.plotly.v1+json": { "data": [ { "type": "scatter", "y": [ -0.9560117688132573, -0.9421164347887656, -0.8988136126338201, -0.8513349104246972, -0.8544198569929298, -0.7803452354483316, -0.7256461170430385, -0.7191465268955911, -0.6951249029053177, -0.6751107526059829, -0.5881583107538348, -0.5407795498768573, -0.46528526732238595, -0.43820244827846117, -0.3725832359386644, -0.29943478333320545, -0.2324034571694221, -0.1935671937642121, -0.1372325153626906, -0.05447924767110787, -0.02169876490393159, 0.06899672631162272, 0.1218714553921927, 0.1912800723180364, 0.24856881667118128, 0.31631440502224717, 0.3190319910226475, 0.394691935411884, 0.5004376026770484, 0.5392706564081142, 0.5520864055982044, 0.6595065516668368, 0.6654014034551473, 0.7514948870742842, 0.762440078984419, 0.7768079090454695, 0.8544751854343734, 0.8739717861970769, 0.8723754422138353, 0.9291585649711269, 0.9236801473738719, 0.9819534979270433, 0.9705865003537004, 0.9997151801247852, 0.9887943503838635, 1.004829944534516, 0.9817294014928928, 0.9887546866849655, 0.9917311398109957, 0.9541108562055043, 0.9614451196842572, 0.9405526447882707, 0.897278853623295, 
0.8981919101304301, 0.8219912193573907, 0.7789378959051533, 0.7713641563883145, 0.7040863049793751, 0.6938597287776168, 0.6694571204170334, 0.5725560824581446, 0.5616012747628782, 0.48912549447176074, 0.4088948850824607, 0.35189312499684133, 0.3012681209982643, 0.20486597673790521, 0.1928923778973501, 0.11533738242124261, 0.10188681403635758, -0.02908506619812099, -0.030810102915796483, -0.09632828129682025, -0.2157235535523599, -0.25508185870569755, -0.29354976664056737, -0.3708632069794366, -0.4623529440038048, -0.4862112229529517, -0.5240350726967838, -0.5557277534814314, -0.6274993240254559, -0.7023686143845357, -0.7578749758097933, -0.7869289207363924, -0.811099485940562, -0.8271280504774299, -0.8591871175606695, -0.9097188587851819, -0.9540627125861534, -0.9537830389168436, -0.9383518240844296, -0.9630266126166703, -1.054923959424271, -0.9720920534280412, -1.0172176184142414, -1.0033386261341144, -0.9876709286141822, -0.9918137646673671, -0.9594167158536987 ] } ], "layout": { "title": "" } }, "text/html": [ "
" ], "text/vnd.plotly.v1+html": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "#A batch of examples can start at an arbitrary index in the source signal. \n", "# Shuffle the indices to fill the train, dev and test set with different sequences\n", "\n", "np.random.shuffle (start_indices)\n", "train_indices = start_indices[0:int(train_size)]\n", "dev_indices= start_indices[int(train_size):int(train_size+dev_size)]\n", "test_indices = start_indices[int(train_size+dev_size):int(train_size+dev_size+test_size)]\n", "\n", "def get_batch(batch_index, indexes, size=batch_size):\n", "    \"\"\"Returns the inputs and targets of batch number batch_index from the given start indexes\"\"\"\n", "    batch_start_indexes = indexes[batch_index*size:batch_index*size+size]\n", "    batch_inputs = np.zeros((size,sequence_length, input_feature_count))\n", "    batch_targets = np.zeros((size,prediction_length))\n", "    for i in range(size):\n", "        se = batch_start_indexes[i]\n", "        part = signal_amp[se:se+sequence_length]\n", "        batch_inputs[i,0:sequence_length,0] = part\n", "        #the target is the sample that directly follows the input sequence\n", "        batch_targets[i,0] = signal_amp[se+sequence_length]\n", "\n", "    return batch_inputs,batch_targets\n", "\n", "batch_inputs,batch_targets = get_batch(train_batch_count-1,train_indices)\n", "print(batch_inputs.shape,batch_targets.shape)\n", "\n", "example_inputs = batch_inputs[0,:,:]\n", "example_targets = batch_targets[0,:]\n", "print(example_inputs.shape)\n", "\n", "#plot a single example\n", "b_i = 1\n", "b_s = batch_inputs[b_i,0:sequence_length,0]\n", "plotly.offline.iplot({\n", "    \"data\": [Scatter(y=b_s)],\n", "    \"layout\": Layout(title=\"\")\n", "})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Test training using a single batch\n", "In the next cell I check whether I can train the network on one single batch, just to verify that the optimizer is indeed able to train the network. Successful training should decrease the loss. 
In the output you will see the loss decreasing (first column)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "np.random.shuffle (start_indices)\n", "train_indices = start_indices[0:int(train_size)]\n", "dev_indices= start_indices[int(train_size):int(train_size+dev_size)]\n", "test_indices = start_indices[int(train_size+dev_size):int(train_size+dev_size+test_size)]\n", "\n", "zero_state_packed = np.zeros((batch_size, get_network_state_size(hidden_network)))\n", "\n", "\n", "init=tf.global_variables_initializer()\n", "with tf.Session() as sess:\n", "    sess.run(init)\n", "\n", "    np.random.shuffle (train_indices)\n", "    \n", "    batch_inputs,batch_targets = get_batch(0, train_indices)\n", "    print(\"batch input shape\", batch_inputs.shape)\n", "    #v_outputs, v_state = sess.run([outputs,state], feed_dict={inputs: batch_inputs, targets: batch_targets})\n", "    v_predictions, v_state_unpacked = sess.run([predictions, state_unpacked_network_out], \n", "                                feed_dict={\n", "                                    inputs: batch_inputs, \n", "                                    targets: batch_targets,\n", "                                    initial_state_packed: zero_state_packed\n", "                                })\n", "    print(v_predictions.shape)\n", "    print(v_predictions[0],batch_targets[0])\n", "    for i in range(0,120):\n", "        v_predictions, v_outputs, v_state_unpacked, v_loss, v_opt = sess.run(\n", "            [predictions, outputs, state_unpacked_network_out, loss, opt], \n", "            feed_dict={\n", "                learning_rate: 0.02, \n", "                inputs: batch_inputs, \n", "                targets: batch_targets,\n", "                state_unpacked: v_state_unpacked\n", "            })\n", "        if i % 10 == 0:\n", "            print(v_loss,v_predictions[0],batch_targets[0])\n", "    \n", "\n", "    \n", "    " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training and Testing\n", "\n", "Finally we can train and test the network. The training consists of 'epochs' during which all training batches are presented. After presenting a single training batch the network is immediately optimized. 
After an epoch the loss is calculated over the dev set and printed. \n", "\n", "Next a graph is plotted which shows an example of the network predicting a sine wave. The prediction is based on first 'priming' the network by presenting part of a sine. \n", "\n", "After training for a number of epochs, a final prediction is generated over a longer time period." ] }, { "cell_type": "code", "execution_count": 107, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "(?, 1)\n", "Tensor(\"rnn/while/rnn/multi_rnn_cell/cell_1/cell_1/lstm_cell/mul_8:0\", shape=(?, 16), dtype=float32)\n", "(LSTMStateTuple(c=, h=), LSTMStateTuple(c=, h=))\n" ] } ], "source": [ "init=tf.global_variables_initializer()\n", "print(hidden_network)\n", "with tf.Session() as sess:\n", "    sess.run(init)\n", "    \n", "    seq_state_packed = np.zeros((1, get_network_state_size(hidden_network)))\n", "\n", "    print(single_input.shape)\n", "    outp, sp = hidden_network(inputs=single_input, state=state_unpacked)\n", "    print(outp)\n", "    print(sp)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def generate_graph(graph_size=200):\n", "    \"\"\"Use the network to generate a graph\"\"\"\n", "    \n", "    #The network will be primed using prime_size samples of the original signal\n", "    prime_size = 50\n", "    prime_signal_start_i = 0\n", "    \n", "    #put prime_size samples of the original signal in tmp_signal\n", "    orig_signal = np.zeros((graph_size,1))\n", "    tmp_signal = np.zeros((graph_size,1))\n", "    tmp_signal[0:prime_size,0] = signal_amp[prime_signal_start_i:(prime_signal_start_i+prime_size)]\n", "    orig_signal[0:graph_size,0] = signal_amp[prime_signal_start_i:(prime_signal_start_i+graph_size)]\n", "    \n", "    #create a sequence for a batch_size of 1\n", "    seq = np.zeros((1,sequence_length,1))\n", "    seq_state_packed = np.zeros((1, get_network_state_size(hidden_network)))\n", "    \n", "    _state_unpacked = None\n", "    #generate the 
graph\n", "    for end in range(prime_size, graph_size):\n", "        #get a sequence to present to the network\n", "        seq[0,:,0] = tmp_signal.take(range((end-sequence_length),end), mode='wrap')\n", "        \n", "        #get a prediction\n", "        seq_state_packed , _prediction = sess.run(\n", "            [state_packed_network_out, predictions[0,0]], \n", "            feed_dict={\n", "                initial_state_packed: seq_state_packed,\n", "                inputs: seq})\n", "        #put the prediction in the graph\n", "        tmp_signal[end,0] = _prediction\n", "        sys.stdout.write('.')\n", "        sys.stdout.flush()\n", "    print(\"\")\n", "    plotly.offline.iplot({\n", "        \"data\": [Scatter(name=\"predicted\",y=tmp_signal[:,0]),Scatter(name=\"original\",y=orig_signal[:,0])],\n", "        \"layout\": Layout(title=\"\")})\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "np.random.shuffle (start_indices)\n", "\n", "#create a randomized train, dev and test set\n", "train_indices = start_indices[0:int(train_size)]\n", "dev_indices= start_indices[int(train_size):int(train_size+dev_size)]\n", "test_indices = start_indices[int(train_size+dev_size):int(train_size+dev_size+test_size)]\n", "\n", "\n", "#initialization of the network states for a single mini batch, by setting them to zero\n", "#You could also initialize by using random states.\n", "batch_zero_state_packed = np.zeros((batch_size, get_network_state_size(hidden_network)))\n", "\n", "\n", "epoch_count = 10\n", "\n", "#Store the performance over the train set (column 0) and dev set (column 1) in loss_results\n", "loss_results = np.zeros((epoch_count,2))\n", "\n", "def get_loss(set_name, bsz, example_set_indices):\n", "    \"\"\"Calculate a score over all batches in a set\"\"\"\n", "    epoch_loss = 0.0\n", "    for example_index in range(bsz):\n", "        batch_inputs,batch_targets = get_batch(example_index, example_set_indices)\n", "\n", "        batch_loss = sess.run(loss,feed_dict={\n", "            inputs:batch_inputs,\n", "            targets:batch_targets,\n", "            initial_state_packed: batch_zero_state_packed\n", "        })\n", 
" if example_index % 20 == 0:\n", " print(\" %s results batch %d, loss %s\" %( set_name, example_index, str(batch_loss))) \n", "\n", " epoch_loss += batch_loss\n", " return epoch_loss / len(example_set_indices)\n", "\n", "\n", "\n", "init=tf.global_variables_initializer()\n", "with tf.Session() as sess:\n", " sess.run(init)\n", "\n", "\n", "\n", " for epoch in range(0,epoch_count):\n", " print(\"Epoch %d\" %(epoch))\n", " #in every epoch go through the training set in a different order\n", " np.random.shuffle (train_indices)\n", " print(\"Train\")\n", " for ti in range(train_batch_count):\n", " batch_inputs,batch_targets = get_batch(ti, train_indices)\n", "\n", " #train the network\n", " #I reset the state to zero for each batch. \n", " batch_train_loss, _ = sess.run([loss, opt], \n", " feed_dict={\n", " learning_rate: 0.00005, \n", " inputs: batch_inputs, \n", " targets: batch_targets,\n", " initial_state_packed: batch_zero_state_packed\n", " })\n", " sys.stdout.write('.')\n", " sys.stdout.flush()\n", " print(\"\")\n", " epoch_train_loss = get_loss(\"Train\", train_batch_count, train_indices)\n", " print(\"Training results epoch %d, loss %s\" %( epoch, str(epoch_train_loss)))\n", " epoch_dev_loss = get_loss(\"Dev\", dev_batch_count, dev_indices) \n", " print(\"Dev results epoch %d, loss %s\" %( epoch, str(epoch_dev_loss))) \n", " loss_results[epoch,0] = epoch_train_loss\n", " loss_results[epoch,1] = epoch_dev_loss\n", " ti += 1\n", " generate_graph()\n", " \n", " #generate a last long graph\n", " generate_graph(graph_size=1000)\n", " \n", " plotly.offline.iplot({\n", " \"data\": [Scatter(name=\"loss train\",y=loss_results[:,0]),Scatter(name=\"loss dev\",y=loss_results[:,1])],\n", " \"layout\": Layout(title=\"\")})\n", "\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion, next steps\n", "\n", "When playing around with the parameters I found out that often the network performed better on the dev set than the train set! 
Clearly, overlapping dev and train sets make such a comparison meaningless. Also, the network seems to optimize for predicting the amplitude but not the frequency. The frequency can probably be predicted better by training on more than one value (last_output).\n", "\n", "This example gave me a good overview of some TensorFlow features. More generally, there are many hyperparameters to choose when building a network architecture:\n", " * basic parameters: network size, learning rate, dropout, optimization method\n", " * how to choose the initial state\n", " * predict one sample, or multiple samples\n", " * loss function\n", "\n", "It would be interesting to tune the hyperparameters automatically as well. Maybe using genetic algorithms?\n", "\n", "I am planning to use the approach in this article to process sampled sound waves. Things that cross my mind:\n", "\n", " * Apply on raw audio\n", " * Sample the microphone via Web Audio, send the samples to the notebook via WebSocket, analyze them and feed the result back\n", " * Implement a phase vocoder and input the frequency features instead of raw audio\n", " * Achieve something like [this](https://deepmind.com/blog/wavenet-generative-model-raw-audio/)\n", " * Process a MIDI file\n", " * Generate text\n", " * Train on a multi-feature sequence (e.g. audio and corresponding text)\n", "\n", "Stay tuned..." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 2 }