{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "5bgPpghocFIa" }, "source": [ "# Emojify! \n", "\n", "Welcome to the second assignment of Week 2! You're going to use word vector representations to build an Emojifier. \n", "🤩 💫 🔥\n", "\n", "Have you ever wanted to make your text messages more expressive? Your emojifier app will help you do that. \n", "Rather than writing:\n", ">\"Congratulations on the promotion! Let's get coffee and talk. Love you!\" \n", "\n", "The emojifier can automatically turn this into:\n", ">\"Congratulations on the promotion! 👍 Let's get coffee and talk. ☕️ Love you! ❤️\"\n", "\n", "You'll implement a model which inputs a sentence (such as \"Let's go see the baseball game tonight!\") and finds the most appropriate emoji to be used with this sentence (⚾️).\n", "\n", "### Using Word Vectors to Improve Emoji Lookups\n", "* In many emoji interfaces, you need to remember that ❤️ is the \"heart\" symbol rather than the \"love\" symbol. \n", "    * In other words, you'll have to remember to type \"heart\" to find the desired emoji, and typing \"love\" won't bring up that symbol.\n", "* You can make a more flexible emoji interface by using word vectors!\n", "* When using word vectors, you'll see that even if your training set explicitly relates only a few words to a particular emoji, your algorithm will be able to generalize and associate additional words in the test set to the same emoji.\n", "    * This works even if those additional words don't appear in the training set. \n", "    * This allows you to build an accurate classifier mapping from sentences to emojis, even using a small training set. \n", "\n", "### What you'll build:\n", "1. In this exercise, you'll start with a baseline model (Emojifier-V1) using word embeddings.\n", "2. Then you will build a more sophisticated model (Emojifier-V2) that further incorporates an LSTM. 
\n", "\n", "By the end of this notebook, you'll be able to:\n", "\n", "* Create an embedding layer in Keras with pre-trained word vectors\n", "* Explain the advantages and disadvantages of the GloVe algorithm\n", "* Describe how negative sampling learns word vectors more efficiently than other methods\n", "* Build a sentiment classifier using word embeddings\n", "* Build and train a more sophisticated classifier using an LSTM\n", "\n", "π π\n", "\n", "π π\n", "\n", "(^^^ Emoji for \"skills\") " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Table of Contents\n", "\n", "- [Packages](#0)\n", "- [1 - Baseline Model: Emojifier-V1](#1)\n", "    - [1.1 - Dataset EMOJISET](#1-1)\n", "    - [1.2 - Overview of the Emojifier-V1](#1-2)\n", "    - [1.3 - Implementing Emojifier-V1](#1-3)\n", "        - [Exercise 1 - sentence_to_avg](#ex-1)\n", "    - [1.4 - Implement the Model](#1-4)\n", "        - [Exercise 2 - model](#ex-2)\n", "    - [1.5 - Examining Test Set Performance](#1-5)\n", "- [2 - Emojifier-V2: Using LSTMs in Keras](#2)\n", "    - [2.1 - Model Overview](#2-1)\n", "    - [2.2 - Keras and Mini-batching](#2-2)\n", "    - [2.3 - The Embedding Layer](#2-3)\n", "        - [Exercise 3 - sentences_to_indices](#ex-3)\n", "        - [Exercise 4 - pretrained_embedding_layer](#ex-4)\n", "    - [2.4 - Building the Emojifier-V2](#2-4)\n", "        - [Exercise 5 - Emojify_V2](#ex-5)\n", "    - [2.5 - Train the Model](#2-5)\n", "- [3 - Acknowledgments](#3)" ] }, { "cell_type": "markdown", "metadata": { "id": "HsztVBA8cFIg" }, "source": [ "\n", "## Packages\n", "\n", "Let's get started! Run the following cell to load the packages you're going to use. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "lMZ9xg8MFHZU" }, "outputs": [], "source": [ "import numpy as np\n", "from emo_utils import *\n", "import emoji\n", "import matplotlib.pyplot as plt\n", "from test_utils import *\n", "\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": { "id": "Av0PwZYscFIh" }, "source": [ "\n", "## 1 - Baseline Model: Emojifier-V1\n", "\n", "\n", "### 1.1 - Dataset EMOJISET\n", "\n", "Let's start by building a simple baseline classifier. \n", "\n", "You have a tiny dataset (X, Y) where:\n", "- X contains 127 sentences (strings).\n", "- Y contains an integer label between 0 and 4 corresponding to an emoji for each sentence.\n", "\n", "\n", "