{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lesson 12 - NLP Introduction\n", "\n", "> Introduction to Natural Language Processing (NLP)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Binder](https://mybinder.org/v2/gh/lvwerra/dslectures/master?urlpath=lab/tree/notebooks%2Flesson12_nlp-intro.ipynb) [slides](https://drive.google.com/open?id=1OjvbR-vuQSUK3X8F0fRzvT0_IbBVrIOY)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Learning objectives\n", "In this lecture we cover the basics of NLP needed to build a sentiment classifier in scikit-learn. The learning goals are:\n", "* Basics of string processing in Python\n", "* Preprocessing steps in NLP\n", "* Count and TF-IDF encodings\n", "* Naïve Bayes classifier\n", "\n", "## References\n", "* Chapter 10: Representing and Mining Text in _Data Science for Business_ by F. Provost and T. Fawcett\n", "\n", "## Homework\n", "As homework, read the references, work carefully through the notebook, and solve the exercises.\n", "\n", "## Introduction to NLP\n", "\n", "
\n", "| | filename | text | sentiment | train_label |\n",
"|---|---|---|---|---|\n",
"| 0 | 4715_9 | For a movie that gets no respect there sure ar... | pos | train |\n",
"| 1 | 12390_8 | Bizarre horror movie filled with famous faces ... | pos | train |\n",
"| 2 | 8329_7 | A solid, if unremarkable film. Matthau, as Ein... | pos | train |\n",
"| 3 | 9063_8 | It's a strange feeling to sit alone in a theat... | pos | train |\n",
"| 4 | 3092_10 | You probably all already know this by now, but... | pos | train |\n", "