{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "58c8d368-ba90-48df-8025-72393efc74e1",
   "metadata": {
    "tags": []
   },
   "source": [
    "# 1. If the different directions use a different number of hidden units, how will the shape of $H_t$ change?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "41779fd3-5f17-4ba8-93de-2f11fc355ca1",
   "metadata": {},
   "source": [
    "$H_t.shape[-1] = \\overrightarrow{H_t}.shape[-1] + \\overleftarrow{H_t}.shape[-1]$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "22e3bde3-8077-4bb3-a705-279aade7a4ec",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/jovyan/work/d2l_solutions/notebooks/exercises/d2l_utils/d2l.py:129: SyntaxWarning: assertion is always true, perhaps remove parentheses?\n",
      "  assert(self, 'net'), 'Neural network is defined'\n",
      "/home/jovyan/work/d2l_solutions/notebooks/exercises/d2l_utils/d2l.py:133: SyntaxWarning: assertion is always true, perhaps remove parentheses?\n",
      "  assert(self, 'trainer'), 'trainer is not inited'\n"
     ]
    }
   ],
   "source": [
    "import sys\n",
    "import torch.nn as nn\n",
    "import torch\n",
    "import warnings\n",
    "sys.path.append('/home/jovyan/work/d2l_solutions/notebooks/exercises/d2l_utils/')\n",
    "import d2l\n",
    "from torchsummary import summary\n",
    "warnings.filterwarnings(\"ignore\")\n",
    "from sklearn.model_selection import ParameterGrid\n",
    "\n",
    "class BiRNNScratch(d2l.Module):\n",
    "    def __init__(self, num_inputs, num_hiddens, sigma=0.01):\n",
    "        super().__init__()\n",
    "        self.save_hyperparameters()\n",
    "        self.f_rnn = d2l.RNNScratch(num_inputs, num_hiddens[0], sigma)\n",
    "        self.b_rnn = d2l.RNNScratch(num_inputs, num_hiddens[1], sigma)\n",
    "        self.num_hiddens = sum(num_hiddens)  # The output dimension will be doubled\n",
    "        \n",
    "    def forward(self, inputs, Hs=None):\n",
    "        f_H, b_H = Hs if Hs is not None else (None, None)\n",
    "        f_outputs, f_H = self.f_rnn(inputs, f_H)\n",
    "        b_outputs, b_H = self.b_rnn(reversed(inputs), b_H)\n",
    "        outputs = [torch.cat((f, b), -1) for f, b in zip(\n",
    "            f_outputs, reversed(b_outputs))]\n",
    "        return outputs, (f_H, b_H)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "69b2eb54-e50e-49e8-819f-f15b990c4df3",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "torch.Size([4, 24])"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bi_rnn = BiRNNScratch(num_inputs=4, num_hiddens=[8,16])\n",
    "X = torch.randn(1,4)\n",
    "outputs, (f_H, b_H) = bi_rnn(X)\n",
    "outputs[0].shape"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "025d46ab-5048-4795-b8f9-d7fdcde83935",
   "metadata": {
    "tags": []
   },
   "source": [
    "# 2. Design a bidirectional RNN with multiple hidden layers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "c302702d-71f9-4b91-886c-fde84fa80ceb",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "class BiRNN(d2l.RNN):\n",
    "    def __init__(self, num_inputs, num_hiddens, num_layers):\n",
    "        d2l.Module.__init__(self)\n",
    "        self.save_hyperparameters()\n",
    "        self.rnn = nn.RNN(num_inputs, num_hiddens, num_layers=num_layers, bidirectional=True)\n",
    "        self.num_hiddens *= 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "754be800-a22f-40b5-b9fa-27cabc73e77e",
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "BiRNN(\n",
       "  (rnn): RNN(4, 8, num_layers=2, bidirectional=True)\n",
       ")"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bi_rnn = BiRNN(num_inputs=4, num_hiddens=8,num_layers=2)\n",
    "bi_rnn"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3fe2147b-6bf3-4d56-801c-0dea21e48e23",
   "metadata": {},
   "source": [
    "# 3. Polysemy is common in natural languages. For example, the word “bank” has different meanings in contexts “i went to the bank to deposit cash” and “i went to the bank to sit down”. How can we design a neural network model such that given a context sequence and a word, a vector representation of the word in the correct context will be returned? What type of neural architectures is preferred for handling polysemy?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4ac75f33-bdf9-4ce3-879f-f868f86d6a85",
   "metadata": {},
   "source": [
    "One possible way to design a neural network model for word sense disambiguation is to use **contextualized word embeddings** (CWEs). CWEs are vector representations of words that are sensitive to the surrounding context, meaning that the same word can have different embeddings depending on how it is used in a sentence. For example, the word \"bank\" would have different embeddings in the contexts \"I went to the bank to deposit cash\" and \"I went to the bank to sit down\". CWEs can be obtained by using neural language models that are trained on large corpora of text, such as BERT, ELMo, or XLNet¹.\n",
    "\n",
    "To use CWEs for word sense disambiguation, we can follow a simple but effective approach based on nearest neighbor classification². The idea is to pre-compute the CWEs for each word sense in a given lexical resource, such as WordNet, using gloss definitions and example sentences. Then, given a context sequence and a word, we can compute the CWE for the word in that context using the neural language model. Finally, we can compare the CWE of the word with the pre-computed CWEs of its possible senses and select the nearest one as the predicted sense. This method can achieve state-of-the-art results on several WSD benchmarks².\n",
    "\n",
    "The type of neural architectures that is preferred for handling polysemy is usually based on **recurrent neural networks** (RNNs), especially those with **long short-term memory** (LSTM) cells. RNNs are able to capture the sequential nature of language and model long-distance dependencies between words. LSTM cells are a special type of RNN units that can learn to store and forget information over time, which is useful for dealing with complex and ambiguous contexts. RNNs with LSTM cells can be used as the backbone of neural language models that produce CWEs, such as ELMo³ or XLNet¹.\n",
    "\n",
    "- (1) [2106.07967] Incorporating Word Sense Disambiguation in .... https://arxiv.org/abs/2106.07967.\n",
    "- (2) Neural Network Models for Word Sense Disambiguation: An .... https://www.researchgate.net/publication/324014399_Neural_Network_Models_for_Word_Sense_Disambiguation_An_Overview/fulltext/5b3c024b4585150d23f66a4a/Neural-Network-Models-for-Word-Sense-Disambiguation-An-Overview.pdf.\n",
    "- (3) arXiv:1909.10430v2 [cs.CL] 1 Oct 2019. https://arxiv.org/pdf/1909.10430.pdf.\n",
    "- (4) undefined. https://doi.org/10.48550/arXiv.2106.07967."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:d2l]",
   "language": "python",
   "name": "conda-env-d2l-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}