{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lesson 12 - Introduction to NLP\n", "\n", "> Introduction to Natural Language Processing (NLP)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/lewtun/dslectures/master?urlpath=lab/tree/notebooks%2Flesson12_nlp-intro.ipynb) [![slides](https://img.shields.io/static/v1?label=slides&message=lesson12_nlp-intro.pdf&color=blue&logo=Google-drive)](https://drive.google.com/open?id=11m5iXGNJEUlvjSMLdQAztJyVZ2LLj4oz)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Learning objectives\n", "In this lecture we cover the basics of NLP needed to build a sentiment classifier with scikit-learn. The learning goals are:\n", "* Know the basics of string processing in Python\n", "* Know the common preprocessing steps in NLP\n", "* Understand count and TF-IDF encodings\n", "* Understand the Naïve Bayes classifier\n", "\n", "## References\n", "* Chapter 10: Representing and Mining Text in _Data Science for Business_ by F. Provost and T. Fawcett\n", "\n", "## Homework\n", "As homework, read the references, work carefully through the notebook, and solve the exercises.\n", "\n", "## Introduction to NLP\n", "\n", "
\n", "| | filename | text | sentiment | train_label |
|---|---|---|---|---|
| 0 | 4715_9 | For a movie that gets no respect there sure ar... | pos | train |
| 1 | 12390_8 | Bizarre horror movie filled with famous faces ... | pos | train |
| 2 | 8329_7 | A solid, if unremarkable film. Matthau, as Ein... | pos | train |
| 3 | 9063_8 | It's a strange feeling to sit alone in a theat... | pos | train |
| 4 | 3092_10 | You probably all already know this by now, but... | pos | train |\n", "