{ "cells": [ { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Homework part I: Prohibited Comment Classification (3 points)\n", "\n", "\n", "\n", "__In this notebook__ you will build an algorithm that classifies social media comments as normal or toxic.\n", "As in many real-world cases, you only have a small dataset (on the order of 10^3 hand-labeled examples) to work with. We'll tackle this problem using both classical NLP methods and an embedding-based approach." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| \n", " | should_ban | \n", "comment_text | \n", "
|---|---|---|
| 50 | \n", "0 | \n", "\"Those who're in advantageous positions are th... | \n", "
| 250 | \n", "1 | \n", "Fartsalot56 says f**k you motherclucker!! | \n", "
| 450 | \n", "1 | \n", "Are you a fool? \\n\\nI am sorry, but you seem t... | \n", "
| 650 | \n", "1 | \n", "I AM NOT A VANDAL!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! | \n", "
| 850 | \n", "0 | \n", "Citing sources\\n\\nCheck out the Wikipedia:Citi... | \n", "