{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Visualizing Naïve Bayes\n", "\n", "In this lab, we cover an essential part of data analysis that the lecture videos did not include: data visualization. As stated in the previous module, visualizing the data gives insight into how well a model can be expected to perform. \n", "\n", "In the following exercise, you will visually inspect the tweets dataset using the Naïve Bayes features. We will see how the log-likelihood ratio explained in the videos can be understood as a pair of numerical features that can be fed into a machine learning algorithm. \n", "\n", "At the end of this lab, we will introduce the concept of a __confidence ellipse__ as a tool for representing the Naïve Bayes model visually." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np # Library for linear algebra and math utils\n", "import pandas as pd # Dataframe library\n", "\n", "import matplotlib.pyplot as plt # Library for plots\n", "from utils import confidence_ellipse # Function to add confidence ellipses to charts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculate the likelihoods for each tweet\n", "\n", "For each tweet, we have calculated the log-likelihood of the tweet given the positive class and given the negative class. These two values are stored in separate columns and are the logarithms of the numerator and denominator of the likelihood ratio introduced previously. \n", "\n", "$$\\log \\frac{P(tweet|pos)}{P(tweet|neg)} = \\log(P(tweet|pos)) - \\log(P(tweet|neg)) $$\n", "$$positive = \\log(P(tweet|pos)) = \\sum_{i=0}^{n}{\\log P(W_i|pos)}$$\n", "$$negative = \\log(P(tweet|neg)) = \\sum_{i=0}^{n}{\\log P(W_i|neg)}$$\n", "\n", "We do not include the code here because it is part of this week's assignment. The __'bayes_features.csv'__ file contains the final result of this process. \n", "\n", "The cell below loads the table into a dataframe. 
Dataframes are data structures that simplify the manipulation of data, allowing filtering, slicing, joining, and summarization." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "<table>\n", "  <thead>\n", "    <tr><th></th><th>positive</th><th>negative</th><th>sentiment</th></tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr><th>0</th><td>-45.763393</td><td>-63.351354</td><td>1.0</td></tr>\n",
"    <tr><th>1</th><td>-105.491568</td><td>-114.204862</td><td>1.0</td></tr>\n",
"    <tr><th>2</th><td>-57.028078</td><td>-67.216467</td><td>1.0</td></tr>\n",
"    <tr><th>3</th><td>-10.055885</td><td>-18.589057</td><td>1.0</td></tr>\n",
"    <tr><th>4</th><td>-125.749270</td><td>-138.334845</td><td>1.0</td></tr>\n",
"  </tbody>\n", "</table>\n", "