{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "# TextCNN on NLP\n", "\n", "Traditionally, for Natural Language Processing (NLP), we used recurrent neural networks (RNN) to process the text data. In fact, we can also treat text as a one-dimensional image, so that we can use one-dimensional convolutional neural networks (CNN) to capture associations between adjacent words. \n", "\n", "![A one-dimensional word vectors.](https://nbviewer.jupyter.org/format/slides/github/goldmermaid/gtc2020/blob/master/Notebooks/1d-vector.svg)\n", "\n", "This notebook describes a groundbreaking approach to applying convolutional neural networks to text analysis: textCNN [by Kim et al.](https://arxiv.org/abs/1408.5882). " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "First, import the environment packages and modules." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2019-07-03T22:12:43.185492Z", "start_time": "2019-07-03T22:12:41.569269Z" }, "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "import d2l\n", "from mxnet import gluon, init, np, npx\n", "from mxnet.contrib import text\n", "from mxnet.gluon import nn\n", "npx.set_np()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## The Dataset\n", "\n", "Here we use Stanford’s [Large Movie Review Dataset](https://ai.stanford.edu/~amaas/data/sentiment/) as the dataset for sentiment analysis. \n", "\n", "The training and testing dataset each contains 25,000 movie reviews downloaded from IMDb, respectively. In addition, the number of comments labeled as “positive” and “negative” is equal in each dataset.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "\n", "For the purpose of simplicity, we are using a built-in function `load_data_imdb` in the d2l package to load the dataset. If you are interested in the preprocessing of the full dataset, please check [more detail](https://d2l.ai/chapter_natural-language-processing/sentiment-analysis.html) at D2L.ai." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "scrolled": true, "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " X_batch has shape (64, 500), and y_batch has shape (64,)\n" ] } ], "source": [ "batch_size = 64\n", "train_iter, test_iter, vocab = d2l.load_data_imdb(batch_size)\n", "\n", "for X_batch, y_batch in train_iter:\n", " print(\"\\n X_batch has shape {}, and y_batch has shape {}\".format(X_batch.shape, y_batch.shape))\n", " break" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## The TextCNN Model's Skeleton\n", "\n", "![An example to illustrate the textCNN.](https://d2l.ai/_images/textcnn.svg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "TextCNN involves the following steps:\n", "\n", "1. Performing multiple **one-dimensional convolution** kernels on the input text sequences;\n", "2. Applying **max-over-time pooling** on the previous output channels, and then concatenate to one vector;\n", "3. Using the **fully connected layer** (aka. dense layer) and **dropout** on the previous outputs.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "### 1. 
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "### 1. One-Dimensional Convolutional Layer\n", "\n", "Like a two-dimensional convolutional layer, a one-dimensional convolutional layer uses a one-dimensional cross-correlation operation. \n", "\n", "In the one-dimensional cross-correlation operation, the convolution window slides across the input array from left to right. \n", "\n", "At each position, the input subarray in the convolution window is multiplied elementwise with the kernel array and summed to obtain one element of the output array. \n", "\n", "![One-dimensional cross-correlation operation.](https://raw.githubusercontent.com/d2l-ai/d2l-en/master/img/conv1d.svg?sanitize=true)" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "Now let's implement one-dimensional cross-correlation in the `corr1d` function. It accepts the input array `X` and the kernel array `K`, and outputs the array `Y`." ] },
{ "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2019-07-03T22:12:43.320777Z", "start_time": "2019-07-03T22:12:43.314285Z" }, "attributes": { "classes": [], "id": "", "n": "70" }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "def corr1d(X, K):\n", "    w = K.shape[0]\n", "    Y = np.zeros((X.shape[0] - w + 1))\n", "    for i in range(Y.shape[0]):\n", "        Y[i] = (X[i: i + w] * K).sum()\n", "    return Y" ] },
{ "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2019-07-03T22:12:43.312247Z", "start_time": "2019-07-03T22:12:43.188173Z" }, "slideshow": { "slide_type": "skip" } }, "source": [ "As shown in the figure below, the input is a one-dimensional array with a width of 7, and the width of the kernel array is 2. The output width is $7-2+1=6$, and the first element is obtained by elementwise multiplication of the leftmost input subarray of width 2 with the kernel array, then summing the results: 0×1 + 1×2 = 2.\n", "\n", "![One-dimensional cross-correlation operation.](https://raw.githubusercontent.com/d2l-ai/d2l-en/master/img/conv1d.svg?sanitize=true)" ] },
{ "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "array([ 2.,  5.,  8., 11., 14., 17.])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X, K = np.array([0, 1, 2, 3, 4, 5, 6]), np.array([1, 2])\n", "corr1d(X, K)" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We can use the [gluon built-in class `Conv1D()`](https://beta.mxnet.io/api/gluon/_autogen/mxnet.gluon.nn.Conv1D.html) to perform the 1D convolution. \n", "\n", "To use `Conv1D()`, we first define a `Sequential()` container, `convs`, which stacks neural network layers sequentially. " ] },
{ "cell_type": "code", "execution_count": 5, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "num_channels, kernel_sizes = 2, 4\n", "\n", "convs = nn.Sequential()\n", "convs.add(nn.Conv1D(num_channels, kernel_sizes, activation='relu'))" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Then, we randomly initialize its weights from a normal distribution (zero mean, 0.01 standard deviation) using the `initialize()` function; a quick shape check on a dummy input follows." ] },
{ "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Sequential(\n", "  (0): Conv1D(-1 -> 2, kernel_size=(4,), stride=(1,), Activation(relu))\n", ")" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "convs.initialize(init.Normal(sigma=0.01))\n", "convs" ] },
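{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "As a quick sanity check (a minimal sketch; the dummy batch and its sizes are our assumptions, not from the original), we can pass a random 3D input through `convs` and verify the output width:\n", "\n", "```python\n", "X = np.random.uniform(size=(4, 8, 20))  # (batch_size, in_channels, width)\n", "convs(X).shape  # (4, 2, 17): 2 output channels, width 20 - 4 + 1 = 17\n", "```" ] },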
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Note that `Conv1D()` requires a 3D input tensor with shape `(batch_size, in_channels, width)`. In the context of NLP, this shape can be interpreted as `(batch_size, word_vector_dimension, number_of_words)`." ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### 2. Max-Over-Time Pooling Layer\n", "\n", "In textCNN, the max-over-time pooling layer is equivalent to a one-dimensional global max pooling layer. \n", "\n", "We can use the [gluon built-in class `GlobalMaxPool1D()`](https://beta.mxnet.io/api/gluon/_autogen/mxnet.gluon.nn.GlobalMaxPool1D.html) as below:\n", "\n", "```python\n", "max_over_time_pooling = nn.GlobalMaxPool1D()\n", "```\n", "\n", "![The max-over-time pooling layer.](https://nbviewer.jupyter.org/format/slides/github/goldmermaid/gtc2020/blob/master/Notebooks/maxpooling.svg)" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### 3. Fully Connected Layer and Dropout \n", "\n", "The fully connected layer is referred to as the `Dense()` layer in Gluon ([more detail](https://d2l.ai/chapter_multilayer-perceptrons/mlp-gluon.html) in D2L). \n", "\n", "Besides, a dropout layer `Dropout()` can be combined with the fully connected layer to mitigate overfitting; in textCNN, dropout is applied to the input of the dense layer.\n", "\n", "![The fully connected layer.](https://nbviewer.jupyter.org/format/slides/github/goldmermaid/gtc2020/blob/master/Notebooks/ff.svg)" ] },
{ "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2019-07-03T22:12:43.343747Z", "start_time": "2019-07-03T22:12:43.325742Z" }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dense layer   :  Dense(-1 -> 2, linear)\n", "Dropout layer :  Dropout(p = 0.4, axes=())\n" ] } ], "source": [ "decoder = nn.Dense(2)  # 2 outputs\n", "print(\"Dense layer   : \", decoder)\n", "\n", "dropout = nn.Dropout(0.4)  # randomly zero 40% of the units during training\n", "print(\"Dropout layer : \", dropout)" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## The TextCNN Model \n", "\n", "Now let's put everything together!" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "Suppose that:\n", "\n", "- the input text sequence consists of $n$ words;\n", "- each word is represented by a $d$-dimensional word vector.\n", "\n", "Then the input example has a width of $n$, a height of 1, and $d$ input channels. (A single-branch shape sketch follows below.)" ] },
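{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "To make these shapes concrete, here is a sketch of a single convolution-plus-pooling branch on a dummy batch (the sizes are assumptions for illustration only):\n", "\n", "```python\n", "branch = nn.Sequential()\n", "branch.add(nn.Conv1D(100, 3, activation='relu'))  # 100 channels, kernel width 3\n", "branch.add(nn.GlobalMaxPool1D())                  # max over the time dimension\n", "branch.initialize()\n", "emb = np.random.uniform(size=(2, 50, 9))  # (batch, d, n): d=50, n=9 words\n", "branch(emb).shape  # (2, 100, 1): one max per channel over 9 - 3 + 1 windows\n", "```" ] },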
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "### Model Initialization\n", "\n", "We first initialize the layers of our `textCNN` class.\n", "\n", "```python\n", "class TextCNN(nn.Block):\n", " def __init__(self, vocab_size, embed_size, kernel_sizes, num_channels,\n", " **kwargs):\n", " super(TextCNN, self).__init__(**kwargs)\n", " self.embedding = nn.Embedding(vocab_size, embed_size)\n", " # The constant embedding layer does not participate in training\n", " self.constant_embedding = nn.Embedding(vocab_size, embed_size)\n", " self.dropout = nn.Dropout(0.5)\n", " self.decoder = nn.Dense(2)\n", " # The max-over-time pooling layer has no weight, so it can share an\n", " # instance\n", " self.pool = nn.GlobalMaxPool1D()\n", " # Create multiple one-dimensional convolutional layers\n", " self.convs = nn.Sequential()\n", " for c, k in zip(num_channels, kernel_sizes):\n", " self.convs.add(nn.Conv1D(c, k, activation='relu'))\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### The `forward` Function\n", "Now let's write the `forward` function of our `textCNN` class.\n", "\n", "```python\n", "def forward(self, inputs):\n", " embeddings = np.concatenate((\n", " self.embedding(inputs), self.constant_embedding(inputs)), \n", " axis=2)\n", " embeddings = embeddings.transpose(0, 2, 1)\n", " encoding = np.concatenate([\n", " np.squeeze(self.pool(conv(embeddings)), axis=-1)\n", " for conv in self.convs], axis=1)\n", " outputs = self.decoder(self.dropout(encoding))\n", " return outputs\n", "```\n", "\n", "It looks a bit complicated, but we can decompose it to 4 steps." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Concatenation\n", "\n", "First, we concatenate the output of two embedding layers with shape of `(batch_size, number_of_words, word_vector_dimension)` by the last dimension as below:\n", "\n", "```python\n", " embeddings = np.concatenate((\n", " self.embedding(inputs), self.constant_embedding(inputs)), axis=2)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "#### Transposing\n", "\n", "Second, recall that the required inputs of `Conv1D()` is an 3D input tensor with shape `(batch_size, word_vector_dimension, number_of_words)`, while our current embeddings is of shape `(batch_size, number_of_words, word_vector_dimension)`. Hence, we need to transpose the last two dimensions as below:\n", "\n", "```python\n", " embeddings = embeddings.transpose(0, 2, 1)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Encoding\n", "\n", "Third, we compute `encoding` as below:\n", "\n", "```python\n", " encoding = np.concatenate([np.squeeze(self.pool(conv(embeddings)), axis=-1)\n", " for conv in self.convs], axis=1)\n", "```\n", "\n", "1. For each one-dimensional convolutional layer, we apply a max-over-time pooling, i.e., `self.pool()`. \n", "2. Since the max-over-time pooling is applied at the last dimension of convolution's outputs, the last dimension (axis = -1) will be 1. We use the flatten function `squeeze()` to remove it. \n", "3. We concatenate the results from varied convolution kernels (axis = 1) by `concatenate()`." 
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "#### Decoding\n", "\n", "Last, we apply dropout to randomly drop out some units of the encoding to avoid overfitting (i.e., so the model does not rely too heavily on any single unit of the encoding). Then we apply a fully connected layer as a decoder to obtain the outputs.\n", "\n", "```python\n", "    outputs = self.decoder(self.dropout(encoding))\n", "```" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "To sum up, here is the full `TextCNN` class:" ] },
{ "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "class TextCNN(nn.Block):\n", "    def __init__(self, vocab_size, embed_size, kernel_sizes, num_channels,\n", "                 **kwargs):\n", "        super(TextCNN, self).__init__(**kwargs)\n", "        self.embedding = nn.Embedding(vocab_size, embed_size)\n", "        # The constant embedding layer does not participate in training\n", "        self.constant_embedding = nn.Embedding(vocab_size, embed_size)\n", "        self.dropout = nn.Dropout(0.5)\n", "        self.decoder = nn.Dense(2)\n", "        # The max-over-time pooling layer has no weight, so it can share\n", "        # an instance\n", "        self.pool = nn.GlobalMaxPool1D()\n", "        # Create multiple one-dimensional convolutional layers with\n", "        # different kernel sizes and numbers of channels\n", "        self.convs = nn.Sequential()\n", "        for c, k in zip(num_channels, kernel_sizes):\n", "            self.convs.add(nn.Conv1D(c, k, activation='relu'))\n", "\n", "    def forward(self, inputs):\n", "        embeddings = np.concatenate((\n", "            self.embedding(inputs), self.constant_embedding(inputs)), axis=2)\n", "        embeddings = embeddings.transpose(0, 2, 1)\n", "        encoding = np.concatenate([np.squeeze(self.pool(conv(embeddings)),\n", "                                              axis=-1)\n", "                                   for conv in self.convs], axis=1)\n", "        outputs = self.decoder(self.dropout(encoding))\n", "        return outputs" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Load Pre-trained Embedding\n", "\n", "Rather than training word vectors from scratch, we load pre-trained 100-dimensional [GloVe word vectors](https://d2l.ai/chapter_natural-language-processing/glove.html). This step may take several minutes." ] },
{ "cell_type": "code", "execution_count": 9, "metadata": { "ExecuteTime": { "end_time": "2019-07-03T22:12:43.535573Z", "start_time": "2019-07-03T22:12:43.387749Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "embeds.shape :  (49339, 100)\n" ] } ], "source": [ "# Load pre-trained word vectors and query those for the tokens in our vocabulary\n", "glove_embedding = text.embedding.create(\n", "    'glove', pretrained_file_name='glove.6B.100d.txt')\n", "embeds = glove_embedding.get_vecs_by_tokens(vocab.idx_to_token)\n", "print(\"embeds.shape : \", embeds.shape)" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Training\n", "\n", "Now let's create a `TextCNN` model. Since kernel filters with varying window sizes capture different features, the original TextCNN model applies 3 convolutional layers with kernel widths of 3, 4, and 5, respectively. \n", "\n", "Each of these filter window sizes has 100 output channels (a bookkeeping sketch follows below)." ] },
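{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "A quick bit of bookkeeping before instantiating the model (a sketch; the numbers simply restate the settings above):\n", "\n", "```python\n", "nums_channels = [100, 100, 100]    # one entry per kernel size (3, 4, 5)\n", "encoding_dim = sum(nums_channels)  # 3 * 100 = 300 pooled features per review\n", "# The Dense(2) decoder maps these 300 features to 2 sentiment logits.\n", "```" ] },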
{ "cell_type": "code", "execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2019-07-03T22:12:43.382376Z", "start_time": "2019-07-03T22:12:43.368194Z" } }, "outputs": [ { "data": { "text/plain": [ "TextCNN(\n", "  (embedding): Embedding(49339 -> 100, float32)\n", "  (constant_embedding): Embedding(49339 -> 100, float32)\n", "  (dropout): Dropout(p = 0.5, axes=())\n", "  (decoder): Dense(-1 -> 2, linear)\n", "  (pool): GlobalMaxPool1D(size=(1,), stride=(1,), padding=(0,), ceil_mode=True, global_pool=True, pool_type=max, layout=NCW)\n", "  (convs): Sequential(\n", "    (0): Conv1D(-1 -> 100, kernel_size=(3,), stride=(1,), Activation(relu))\n", "    (1): Conv1D(-1 -> 100, kernel_size=(4,), stride=(1,), Activation(relu))\n", "    (2): Conv1D(-1 -> 100, kernel_size=(5,), stride=(1,), Activation(relu))\n", "  )\n", ")" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "embed_size, kernel_sizes, nums_channels = 100, [3, 4, 5], [100, 100, 100]\n", "ctx = d2l.try_all_gpus()  # get all available GPUs\n", "net = TextCNN(vocab_size=len(vocab), embed_size=embed_size,\n", "              kernel_sizes=kernel_sizes, num_channels=nums_channels)\n", "net.initialize(init.Xavier(), ctx=ctx)\n", "net" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Then we initialize the embedding layers `embedding` and `constant_embedding` with the GloVe embeddings. The former participates in training, while the latter's weights are fixed." ] },
{ "cell_type": "code", "execution_count": 11, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "net.embedding.weight.set_data(embeds)\n", "net.constant_embedding.weight.set_data(embeds)\n", "net.constant_embedding.collect_params().setattr('grad_req', 'null')" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "To train our `TextCNN` model, we also need to define:\n", "1. the learning rate `lr`;\n", "2. the number of epochs `num_epochs`;\n", "3. the optimizer `adam`;\n", "4. the loss function `SoftmaxCrossEntropyLoss()`.\n", "\n", "\n", "For simplicity, we call the built-in function `train_ch13` ([more detail in D2L](https://d2l.ai/chapter_computer-vision/image-augmentation.html?highlight=train_ch13#using-an-image-augmentation-training-model)) to train; a sketch of its inner loop follows below." ] },
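{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "For orientation, the loop inside `train_ch13` looks roughly like the sketch below (a simplification, not the actual d2l source): each batch is split across the devices in `ctx`, forward and backward passes run on each shard, and the optimizer steps once per batch.\n", "\n", "```python\n", "# for each epoch, for each (X, y) batch:\n", "#     X_shards, y_shards = split_batch(X, y, ctx)\n", "#     with autograd.record():\n", "#         losses = [loss(net(Xs), ys) for Xs, ys in zip(X_shards, y_shards)]\n", "#     for l in losses:\n", "#         l.backward()\n", "#     trainer.step(X.shape[0])\n", "```" ] },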
{ "cell_type": "code", "execution_count": 12, "metadata": { "ExecuteTime": { "end_time": "2019-07-03T22:12:43.576040Z", "start_time": "2019-07-03T22:12:43.538210Z" }, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "loss 0.093, train acc 0.968, test acc 0.868\n", "2887.2 examples/sec on [gpu(0), gpu(1), gpu(2), gpu(3)]\n" ] }, { "data": { "text/plain": [ "<Figure: training loss and accuracy curves (SVG omitted)>" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "lr, num_epochs = 0.001, 5\n", "trainer = gluon.Trainer(net.collect_params(), 'adam', {'learning_rate': lr})\n", "loss = gluon.loss.SoftmaxCELoss()\n", "d2l.train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs, ctx)" ] },
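{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "Next we will classify new sentences with `d2l.predict_sentiment`. For reference, it works approximately as sketched below (paraphrased from the D2L book; treat the exact code as an assumption): it looks up the token indices of the sentence, runs the net on a batch of one, and takes the argmax over the two logits.\n", "\n", "```python\n", "def predict_sentiment(net, vocab, sentence):\n", "    tokens = np.array(vocab[sentence.split()], ctx=d2l.try_gpu())\n", "    label = np.argmax(net(tokens.reshape(1, -1)), axis=1)\n", "    return 'positive' if label == 1 else 'negative'\n", "```" ] },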
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "lr, num_epochs = 0.001, 5\n", "trainer = gluon.Trainer(net.collect_params(), 'adam', {'learning_rate': lr})\n", "loss = gluon.loss.SoftmaxCELoss()\n", "d2l.train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs, ctx)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Now is the time to use our trained model to classify sentiments of two simple sentences." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'positive'" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "d2l.predict_sentiment(net, vocab, 'this movie is so amazing!')" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" }, "latex_envs": { "LaTeX_envs_menu_present": true, "autoclose": false, "autocomplete": true, "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 1, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false, "latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "165px" }, "toc_section_display": true, "toc_window_display": true }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }