{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Assignment 1.4: Negative sampling (15 points)\n", "\n", "You may have noticed that word2vec is really slow to train, especially with large (> 50,000 words) vocabularies. Negative sampling is the solution.\n", "\n", "Your task is to implement word2vec with negative sampling.\n", "\n", "This is the technique discussed in the Stanford lecture. The main idea is captured by the objective:\n", "\n", "$$ L = \\log\\sigma(u^T_o \\cdot u_c) + \\sum^k_{i=1} \\mathbb{E}_{j \\sim P(w)}[\\log\\sigma(-u^T_j \\cdot u_c)]$$\n", "\n", "where $\\sigma$ is the sigmoid function, $u_c$ is the central word vector, $u_o$ is the context (\"outside\") word vector, and $u_j$ is the vector of the word with index $j$.\n", "\n", "The first term measures the similarity of a positive pair: the central word and a context word from the same window.\n", "\n", "The second term is responsible for the negative samples. $k$ is a hyperparameter: the number of negatives to sample. $\\mathbb{E}_{j \\sim P(w)}$ means that $j$ is drawn from the unigram distribution (in the original paper, the unigram distribution raised to the power $3/4$).\n", "\n", "Thus, you only need to compute similarities for the positive pair and a few sampled negatives, rather than across the whole vocabulary.\n", "\n", "Useful links:\n", "1. [Efficient Estimation of Word Representations in Vector Space](https://arxiv.org/pdf/1301.3781.pdf)\n", "1. [Distributed Representations of Words and Phrases and their Compositionality](http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }