{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Assignment 1.3: Naive word2vec (40 points)\n", "\n", "This task can be formulated very simply. Follow this [paper](https://arxiv.org/pdf/1411.2738.pdf) and implement word2vec like a two-layer neural network with matrices $W$ and $W'$. One matrix projects words to low-dimensional 'hidden' space and the other - back to high-dimensional vocabulary space.\n", "\n", "![word2vec](https://i.stack.imgur.com/6eVXZ.jpg)\n", "\n", "You can use TensorFlow/PyTorch and code from your previous task.\n", "\n", "## Results of this task: (30 points)\n", " * trained word vectors (mention somewhere, how long it took to train)\n", " * plotted loss (so we can see that it has converged)\n", " * function to map token to corresponding word vector\n", " * beautiful visualizations (PCE, T-SNE), you can use TensorBoard and play with your vectors in 3D (don't forget to add screenshots to the task)\n", "\n", "## Extra questions: (10 points)\n", " * Intrinsic evaluation: you can find datasets [here](http://download.tensorflow.org/data/questions-words.txt)\n", " * Extrinsic evaluation: you can use [these](https://medium.com/@dataturks/rare-text-classification-open-datasets-9d340c8c508e)\n", "\n", "Also, you can find any other datasets for quantitative evaluation.\n", "\n", "Again. 
It is **highly recommended** to read this [paper](https://arxiv.org/pdf/1411.2738.pdf).\n", "\n", "Example of visualization in TensorBoard:\n", "https://projector.tensorflow.org\n", "\n", "Example of a 2D visualization:\n", "\n", "![2dword2vec](https://www.tensorflow.org/images/tsne.png)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }