{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Classifying News Headlines and Explaining the Result" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Data is from Kaggle's [News Aggregator Dataset](https://www.kaggle.com/uciml/news-aggregator-dataset)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I sampled 10% of the data to speed up the analysis." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "news = pd.read_csv('data/uci-news-aggregator.csv').sample(frac=0.1)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "42242" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(news)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false, "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
 | ID | TITLE | URL | PUBLISHER | CATEGORY | STORY | HOSTNAME | TIMESTAMP |
---|---|---|---|---|---|---|---|---|
58434 | 58435 | Russell Crowe Sings Johnny Cash on 'The Tonigh... | http://screencrush.com/russell-crowe-johnny-cash/ | ScreenCrush | e | dxzxHQTC1v6cP7MdjlKbJkMlfYwLM | screencrush.com | 1396019111324 |
244967 | 245413 | HP cuts more jobs than expected | http://www.digitaljournal.com/business/busines... | DigitalJournal.com | b | de8PjvC03vbwIdMC0hkfXZTLVY0sM | www.digitaljournal.com | 1400928726875 |
314969 | 315429 | NTSB faults pilots in last year's Asiana flight | http://ktar.com/23/1744462/NTSB-faults-pilots-... | KTAR.com | b | deigsQuEj4RZW3M_TqkzwLBT_oUTM | ktar.com | 1403705331596 |