{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%reload_ext autoreload\n", "%autoreload 2\n", "%matplotlib inline\n", "import os\n", "# order GPUs by PCI bus ID (matching nvidia-smi) and expose only GPU 0\n", "os.environ[\"CUDA_DEVICE_ORDER\"] = \"PCI_BUS_ID\"\n", "os.environ[\"CUDA_VISIBLE_DEVICES\"] = \"0\"" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "using Keras version: 2.2.4-tf\n" ] } ], "source": [ "import ktrain\n", "from ktrain import text" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Predicting Wine Prices from Textual Descriptions\n", "\n", "This notebook shows an example of **text regression** in *ktrain*: given a textual description of a wine, we will attempt to predict its price. The data is available from [FloydHub](https://www.floydhub.com/floydhub/datasets/wine-reviews/1/wine_data.csv).\n", "\n", "## Clean and Prepare the Data\n", "\n", "We perform the same data preparation as the [original FloydHub example notebook](https://github.com/floydhub/regression-template) that inspired this example." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | Unnamed: 0 | country | description | designation | points | price | province | region_1 | region_2 | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 8486 | 8486 | Italy | Made entirely from Nero d'Avola, this opens wi... | Violino | 89 | 20.0 | Sicily & Sardinia | Vittoria | NaN | Nero d'Avola | Paolo Calì |
| 148584 | 148585 | Portugal | Warre's seems to have found just the right for... | Otima 20-year old tawny | 90 | 42.0 | Port | NaN | NaN | Port | Warre's |
| 18353 | 18353 | Italy | A more evolved and sophisticated expression of... | Campogrande | 87 | 23.0 | Veneto | Soave Superiore | NaN | Garganega | Sandro de Bruno |
| 5281 | 5281 | Spain | Red-fruit and citrus aromas create an astringe... | NaN | 84 | 12.0 | Northern Spain | Ribera del Duero | NaN | Tempranillo | Condado de Oriza |
| 87768 | 87768 | US | Lightly funky and showing definite signs of ea... | Lia's Vineyard | 89 | 35.0 | Oregon | Chehalem Mountains | Willamette Valley | Pinot Noir | Seven of Hearts |