{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 6.1 Sequence Models" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- People’s opinions on movies can change quite significantly over time. In fact, psychologists even have names for some of the effects:\n", " - ***Anchoring*** based on someone else’s opinion.\n", " - For instance after the Oscara wards, ratings for the corresponding movie go up, even though it’s still the same movie. This effect persists for a few months until the award is forgotten. \n", " - ***Hedonic adaptation*** where humans quickly adapt to accept an improved (or a bad) situation as the new normal. - For instance, after watching many good movies, the expectations that the next movie be equally good or better are high, and even an average movie might be considered a bad movie after many great ones.\n", " - ***Seasonality***\n", " - Very few viewers like to watch a Santa Claus movie in August.\n", " - In some cases movies become unpopular due to the misbehaviors of directors or actors in the production.\n", " - Some movies become cult movies, because they were almost comically bad. \n", "- Other examples\n", " - Many users have highly particular behavior when it comes to the time when they open apps. \n", " - For instance, social media apps are much more popular after school with students. \n", " - Stock market trading apps are more commonly used when the markets are open.\n", " - It is much harder to predict tomorrow's stock prices than to fill in the blanks for a stock price we missed yesterday\n", " - In statistics the former is called prediction whereas the latter is called filtering.\n", " - After all, hindsight is so much easier than foresight. \n", " - Music, speech, text, movies, steps, etc. are all sequential in nature. \n", " - If we were to permute them they would make little sense. \n", " - The headline dog bites man is much less surprising than man bites dog, even though the words are identical.\n", " - Earthquakes are strongly correlated, i.e. after a massive earthquake there are very likely several smaller aftershocks, much more so than without the strong quake. \n", " - In fact, earthquakes are spatiotemporally correlated, i.e. the aftershocks typically occur within a short time span and in close proximity.\n", " - Humans interact with each other in a sequential nature, as can be seen in Twitter fights, dance patterns and debates.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6.1.1 Statistical Tools\n", "![](https://github.com/d2l-ai/d2l-en/raw/master/img/ftse100.png)\n", "\n", "- Let's denote the prices by $x_t \\geq 0$, i.e. at time $t \\in \\mathbb{N}$ we observe some price $x_t$. \n", "- For a trader to do well in the stock market on day $t$ he should want to predict $x_t$ via $$x_t \\sim p(x_t|x_{t-1}, \\ldots x_1).$$\n", "- Autoregressive Models\n", " - In order to achieve this, our trader could use a regressor.\n", " - There's just a major problem - the number of inputs, $x_{t-1}, \\ldots x_1$ varies, depending on $t$. \n", " - That is, the number increases with the amount of data that we encounter\n", " - We need an approximation to make this computationally tractable. Two strategies:\n", " - 1) Assume that the potentially rather long sequence $x_{t-1}, \\ldots x_1$ isn't really necessary. \n", " - In this case we might content ourselves with some timespan $\\tau$ and only use $x_{t-1}, \\ldots x_{t-\\tau}$ observations. 
\n", " - The number of arguments is always the same, at least for $t > \\tau$. \n", " - Such models will be called ***autoregressive models***, as they quite literally perform regression on themselves.\n", " - 2) Try and keep some summary $h_t$ of the past observations around and update that in addition to the actual prediction. \n", " - This leads to models that estimate $x_t|x_{t-1}, h_{t-1}$ and moreover updates of the form $h_t = g(h_t, x_t)$. \n", " - Since $h_t$ is never observed, these models are also called ***latent autoregressive models***. \n", " - LSTMs and GRUs are exampes of this.\n", " - Both cases raise the obvious question how to generate training data. \n", " - One typically uses historical observations to predict the next observation given the ones up to right now.\n", " - However, a common assumption is that while the specific values of $x_t$ might change, at least the dynamics of the time series itself won't. \n", " - This is reasonable, since novel dynamics are not predictable using data we have so far. \n", " - Statisticians call dynamics that don't change ***stationary***. \n", " - Regardless of what we do, we will thus get an estimate of the entire time series via $$p(x_1, \\ldots x_T) = \\prod_{t=1}^T p(x_t|x_{t-1}, \\ldots x_1).$$\n", " - Note that the above considerations still hold even if we deal with discrete objects, such as words, rather than numbers. \n", " - The only difference is that in such a situation we need to use a classifier rather than a regressor to estimate $p(x_t| x_{t-1}, \\ldots x_1)$.\n", " \n", "- Markov Model\n", " - In an autoregressive model, we use only $(x_{t-1}, \\ldots x_{t-\\tau})$ instead of $(x_{t-1}, \\ldots x_1)$ to estimate $x_t$.\n", " - Whenever this approximation is accurate we say that the sequence satisfies a ***Markov condition***. \n", " - In particular, if $\\tau = 1$, we have a first order Markov model and $p(x)$ is given by $$p(x_1, \\ldots x_T) = \\prod_{t=1}^T p(x_t|x_{t-1}).$$\n", " - Such models are particularly nice whenever $x_t$ assumes only discrete values, since in this case dynamic programming can be used to compute values along the chain exactly. \n", " - For instance, we can compute $x_{t+1}|x_{t-1}$ efficiently using the fact that we only need to take into account a very short history of past observations. $$p(x_{t+1}|x_{t-1}) = \\sum_{x_t} p(x_{t+1}|x_t) p(x_t|x_{t-1})$$\n", "\n", "- Causality\n", " - In principle, there's nothing wrong with unfolding $p(x_1, \\ldots x_T)$ in reverse order. $$p(x_1, \\ldots x_T) = \\prod_{t=T}^1 p(x_t|x_{t+1}, \\ldots x_T).$$\n", "\n", " - In fact, if we have a Markov model we can obtain a reverse conditional probability distribution, too. \n", " - In many cases, however, there exists a natural direction for the data, namely going forward in time. \n", " - It is clear that future events cannot influence the past. \n", " - If we change $x_t$, we may be able to influence what happens for $x_{t+1}$ going forward but not the converse. \n", " - If we change $x_t$, the distribution over past events will not change. \n", " - It ought to be easier to explain $x_{t+1}|x_t$ rather than $x_t|x_{t+1}$. \n", " - For instance, Hoyer et al., 2008 show that in some cases we can find $x_{t+1} = f(x_t) + \\epsilon$ for some additive noise, whereas the converse is not true. \n", " - This is great news, since it is typically the forward direction that we're interested in estimating. 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6.1.2 Toy Example\n", "- Let’s begin by generating ‘time series’ data by using a sine function with some additive noise." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n" ], "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from mxnet import autograd, nd, gluon, init\n", "import gluonbook as gb\n", "# display routines\n", "%matplotlib inline\n", "from matplotlib import pyplot as plt\n", "from IPython import display\n", "display.set_matplotlib_formats('svg')\n", "\n", "embedding = 4 # embedding dimension for autoregressive model\n", "\n", "T = 1000 # generate a total of 1000 points \n", "time = nd.arange(0,T)\n", "x = nd.sin(0.01 * time) + 0.2 * nd.random.normal(shape=(T))\n", "\n", "plt.plot(time.asnumpy(), x.asnumpy());" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Next we need to turn this 'time series' into data the network can train on. \n", "- Based on the embedding dimension $\\tau$ we map the data into pairs $y_t = x_t$ and $\\mathbf{z}_t = (x_{t-1}, \\ldots x_{t-\\tau})$. \n", "- The astute reader might have noticed that this gives us $\\tau$ fewer datapoints, since we don't have sufficient history for the first $\\tau$ of them. \n", " - A simple fix is to discard those few terms. \n", " - Alternatively we could pad the time series with zeros. " ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "features = nd.zeros((T-embedding, embedding)) # (1000 - 4, 4) = (996, 4)\n", "\n", "# features[:, 0] = x[0:996]\n", "# features[:, 1] = x[1:997]\n", "# features[:, 2] = x[2:998]\n", "# features[:, 3] = x[3:999]\n", "for i in range(embedding):\n", " features[:, i] = x[i:T - embedding + i]\n", "\n", "# labels = x[4:]\n", "labels = x[embedding:]\n", "\n", "ntrain = 600\n", "train_data = gluon.data.ArrayDataset(features[:ntrain,:], labels[:ntrain])\n", "test_data = gluon.data.ArrayDataset(features[ntrain:,:], labels[ntrain:])\n", "\n", "# vanilla MLP architecture\n", "def get_net():\n", " net = gluon.nn.Sequential()\n", " net.add(gluon.nn.Dense(10, activation='relu'))\n", " net.add(gluon.nn.Dense(10, activation='relu'))\n", " net.add(gluon.nn.Dense(1))\n", " net.initialize(init=init.Xavier(), force_reinit=True)\n", " return net\n", "\n", "# least mean squares loss\n", "loss = gluon.loss.L2Loss()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "epoch 1, loss: 0.026710\n", "epoch 2, loss: 0.025081\n", "epoch 3, loss: 0.025592\n", "epoch 4, loss: 0.026057\n", "epoch 5, loss: 0.027615\n", "epoch 6, loss: 0.024617\n", "epoch 7, loss: 0.023896\n", "epoch 8, loss: 0.024280\n", "epoch 9, loss: 0.024480\n", "epoch 10, loss: 0.026319\n", "test loss: 0.028010\n" ] } ], "source": [ "# simple optimizer using adam, random shuffle and minibatch size 16\n", "def train_net(net, data, loss, epochs, learning_rate):\n", " batch_size = 16\n", " trainer = gluon.Trainer(net.collect_params(), 'adam', {'learning_rate': learning_rate})\n", " data_iter = gluon.data.DataLoader(data, batch_size, shuffle=True)\n", "\n", " for epoch in range(1, epochs + 1):\n", " for X, y in data_iter:\n", " with autograd.record():\n", " l = loss(net(X), y)\n", " l.backward()\n", " trainer.step(batch_size)\n", " l = loss(net(data[:][0]), nd.array(data[:][1]))\n", " print('epoch %d, loss: %f' % (epoch, l.mean().asnumpy()))\n", " return net\n", "\n", "net = get_net()\n", "net = train_net(\n", " net=net, \n", " data=train_data, \n", " loss=loss, \n", " epochs=10, \n", " learning_rate=0.01\n", ")\n", "\n", "l = loss(net(test_data[:][0]), nd.array(test_data[:][1]))\n", "print('test loss: %f' % l.mean().asnumpy())" ] }, { 
"cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n" ], "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "estimates = net(features)\n", "plt.plot(time.asnumpy(), x.asnumpy(), label='data');\n", "plt.plot(time[embedding:].asnumpy(), estimates.asnumpy(), label='estimate');\n", "plt.legend();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6.1.3 Predictions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- If we observe data only until time step 600, we cannot hope to receive the ground truth for all future predictions. \n", "- Instead, we need to work our way forward one step at a time:\n", "\n", "$$\\begin{aligned} x_{601} & = f(x_{600}, \\ldots, x_{597}) \\\\ x_{602} & = f(x_{601}, \\ldots, x_{598}) \\\\ x_{603} & = f(x_{602}, \\ldots, x_{599}) \\end{aligned}$$\n", "\n", "- In other words, very quickly will we have to use our own predictions to make future predictions." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(996, 1)" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "estimates.shape" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", 
" \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n" ], "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "predictions = nd.zeros_like(estimates)\n", "\n", "# ntrain - embedding = 600 - 4 = 596\n", "predictions[:(ntrain-embedding)] = estimates[:(ntrain-embedding)]\n", "\n", "# T - embedding = 996\n", "for i in range(ntrain-embedding, T-embedding):\n", " predictions[i] = net(predictions[(i-embedding):i].reshape(1,-1)).reshape(1)\n", " \n", "plt.plot(time.asnumpy(), x.asnumpy(), label='data');\n", "plt.plot(time[embedding:].asnumpy(), estimates.asnumpy(), label='estimate');\n", "plt.plot(time[embedding:].asnumpy(), predictions.asnumpy(), label='multistep');\n", "plt.legend();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- This is ultimately due to the fact that errors build up. \n", "- Let's say that after step 1 we have some error $\\epsilon_1 = \\bar\\epsilon$. \n", "- Now the input for step 2 is perturbed by $\\epsilon_1$, hence we suffer some error in the order of $\\epsilon_2 = \\bar\\epsilon + L \\epsilon_1$, and so on. \n", "- The error can diverge rather rapidly from the true observations. \n", "- This is a common phenomenon - for instance weather forecasts for the next 24 hours tend to be pretty accurate but beyond that their accuracy declines rapidly.\n", "- Let’s verify this observation by computing the k-step predictions on the entire sequence." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "\n" ], "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "k = 33 # look up to k - embedding steps ahead\n", "\n", "# T-k = 1000-33 = 967\n", "features = nd.zeros((T-k, k))\n", "\n", "# features[:, 0] = x[0:967]\n", "# features[:, 1] = x[1:968]\n", "# features[:, 2] = x[2:969]\n", "# features[:, 3] = x[3:970]\n", "for i in range(embedding):\n", " features[:,i] = x[i:T-k+i]\n", "\n", "# features[:, 4] = net(features[:, 0:4]).reshape((-1))\n", "# features[:, 5] = net(features[:, 1:5]).reshape((-1))\n", "# ...\n", "# features[:, 32] = net(features[:, 28:32]).reshape((-1))\n", "for i in range(embedding, k):\n", " features[:,i] = net(features[:,(i-embedding):i]).reshape((-1))\n", " \n", "for i in (4, 8, 16, 32): \n", " plt.plot(time[i:T-k+i].asnumpy(), features[:,i].asnumpy(), label=('step ' + str(i)))\n", "plt.legend();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "- Sequence models require specialized statistical tools for estimation. \n", " - Two popular choices are autoregressive models and latent-variable autoregressive models.\n", "- As we predict further in time, the errors accumulate and the quality of the estimates degrades, often dramatically.\n", "- There’s quite a difference in difficulty between filling in the blanks in a sequence (smoothing) and forecasting. \n", " - Consequently, if you have a time series, always respect the temporal order of the data when training, i.e. never train on future data.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 6.2 Language Models" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" } }, "nbformat": 4, "nbformat_minor": 2 }