{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Vector-space models: Retrofitting" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "__author__ = \"Christopher Potts\"\n", "__version__ = \"CS224u, Stanford, Spring 2019\"" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Contents\n", "\n", "1. [Overview](#Overview)\n", "1. [Set-up](#Set-up)\n", "1. [The retrofitting model](#The-retrofitting-model)\n", "1. [Examples](#Examples)\n", " 1. [Only node 0 has outgoing edges](#Only-node-0-has-outgoing-edges)\n", " 1. [All nodes connected to all others](#All-nodes-connected-to-all-others)\n", " 1. [As before, but now 2 has no outgoing edges](#As-before,-but-now-2-has-no-outgoing-edges)\n", " 1. [All nodes connected to all others, but $\\alpha = 0$](#All-nodes-connected-to-all-others,-but-$\\alpha-=-0$)\n", "1. [WordNet](#WordNet)\n", " 1. [Background on WordNet](#Background-on-WordNet)\n", " 1. [WordNet and VSMs](#WordNet-and-VSMs)\n", " 1. [Reproducing the WordNet synonym graph experiment](#Reproducing-the-WordNet-synonym-graph-experiment)\n", "1. [Other retrofitting models and ideas](#Other-retrofitting-models-and-ideas)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Overview\n", "\n", "* Thus far, all of the information in our word vectors has come solely from co-occurrences patterns in text. This information is often very easy to obtain – though one does need a __lot__ of text – and it is striking how rich the resulting representations can be.\n", "\n", "* Nonetheless, it seems clear that there is important information that we will miss this way – relationships that just aren't encoded at all in co-occurrences or that get distorted by such patterns. \n", "\n", "* For example, it is probably straightforward to learn representations that will support the inference that all puppies are dogs (_puppy_ entails _dog_), but it might be difficult to learn that _dog_ entails _mammal_ because of the unusual way that very broad taxonomic terms like _mammal_ are used in text.\n", "\n", "* The question then arises: how can we bring structured information – labels – into our representations? If we can do that, then we might get the best of both worlds: the ease of using co-occurrence data and the refinement that comes from using labeled data.\n", "\n", "* In this notebook, we look at one powerful method for doing this: the __retrofitting__ model of [Faruqui et al. 2016](http://www.aclweb.org/anthology/N15-1184). In this model, one learns (or just downloads) distributed representations for nodes in a knowledge graph and then updates those representations to bring connected nodes closer to each other.\n", "\n", "* This is an incredibly fertile idea; the final section of the notebook reviews some recent extensions, and new ones are likely appearing all the time." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Set-up" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "from collections import defaultdict\n", "from nltk.corpus import wordnet as wn\n", "import numpy as np\n", "import os\n", "import pandas as pd\n", "import retrofitting\n", "from retrofitting import Retrofitter\n", "import utils" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "data_home = 'data'" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## The retrofitting model\n", "\n", "For an __an existing VSM__ $\\widehat{Q}$ of dimension $m \\times n$, and a set of __edges__ $E$ (pairs of indices into rows in $\\widehat{Q}$), the retrofitting objective is to obtain a new VSM $Q$ (also dimension $m \\times n$) according to the following objective:\n", "\n", "$$\\sum_{i=1}^{m} \\left[ \n", "\\alpha_{i}\\|q_{i} - \\widehat{q}_{i}\\|_{2}^{2}\n", "+\n", "\\sum_{j : (i,j) \\in E}\\beta_{ij}\\|q_{i} - q_{j}\\|_{2}^{2}\n", "\\right]$$\n", "\n", "The left term encodes a pressure to stay like the original vector. The right term encodes a pressure to be more like one's neighbors. In minimizing this objective, we should be able to strike a balance between old and new, VSM and graph.\n", "\n", "Definitions:\n", "\n", "1. $\\|u - v\\|_{2}^{2}$ gives the __squared euclidean distance__ from $u$ to $v$.\n", "\n", "1. $\\alpha$ and $\\beta$ are weights we set by hand, controlling the relative strength of the two pressures. In the paper, they use $\\alpha=1$ and $\\beta = \\frac{1}{\\{j : (i, j) \\in E\\}}$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Examples\n", "\n", "To get a feel for what's happening, it's helpful to visualize the changes that occur in small, easily understood VSMs and graphs. The function `retrofitting.plot_retro_path` helps with this." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | x | \n", "y | \n", "
---|---|---|
0 | \n", "0.0 | \n", "0.0 | \n", "
1 | \n", "0.0 | \n", "0.5 | \n", "
2 | \n", "0.5 | \n", "0.0 | \n", "