{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%reload_ext autoreload\n", "%autoreload 2\n", "%matplotlib inline\n", "import os\n", "os.environ[\"CUDA_DEVICE_ORDER\"] = \"PCI_BUS_ID\"\n", "os.environ[\"CUDA_VISIBLE_DEVICES\"] = \"0\"" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Using TensorFlow backend.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "using Keras version: 2.2.4\n" ] } ], "source": [ "import ktrain\n", "from ktrain import text" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Building an Arabic Sentiment Analyzer\n", "\n", "In this notebook, we build a simple, fast, and accurate Arabic-language text classification model in four steps. More specifically, we build a model that classifies Arabic hotel reviews as either positive or negative.\n", "\n", "The dataset can be downloaded from [Ashraf Elnagar's GitHub repository](https://github.com/elnagara/HARD-Arabic-Dataset).\n", "\n", "Each entry in the dataset includes a review in Arabic and a rating between 1 and 5. We convert this to a binary classification dataset by labeling reviews rated above 3 as positive (1) and reviews rated below 3 as negative (0).\n", "\n", "(**Disclaimer:** I don't speak Arabic. Please forgive any mistakes.)\n", "\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
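<table>\n", "  <thead>\n", "    <tr>\n", "      <th></th>\n", "      <th>text</th>\n", "      <th>neg</th>\n", "      <th>pos</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>“ممتاز”. النظافة والطاقم متعاون.</td>\n", "      <td>1</td>\n", "      <td>0</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>استثنائي. سهولة إنهاء المعاملة في الاستقبال. ل...</td>\n", "      <td>0</td>\n", "      <td>1</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>استثنائي. انصح بأختيار الاسويت و بالاخص غرفه ر...</td>\n", "      <td>0</td>\n", "      <td>1</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>“استغرب تقييم الفندق كخمس نجوم”. لا شي. يستحق ...</td>\n", "      <td>1</td>\n", "      <td>0</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>جيد. المكان جميل وهاديء. كل شي جيد ونظيف بس كا...</td>\n", "      <td>0</td>\n", "      <td>1</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n", "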