{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"1_NLU_base_features_on_dataset_with_YAKE_Lemma_Stemm_classifiers_NER_.ipynb","provenance":[],"collapsed_sections":[],"toc_visible":true},"kernelspec":{"name":"python3","display_name":"Python 3"}},"cells":[{"cell_type":"markdown","metadata":{"id":"EhycgLa_1gjj"},"source":["![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n","\n","[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/webinars_conferences_etc/NYC_DC_NLP_MEETUP/1_NLU_base_features_on_dataset_with_YAKE_Lemma_Stemm_classifiers_NER_.ipynb)"]},{"cell_type":"markdown","metadata":{"id":"7cZNelCJGTgJ"},"source":["# 1. Install NLU "]},{"cell_type":"code","metadata":{"id":"6GoxQmPuGNee","colab":{"base_uri":"https://localhost:8080/"},"executionInfo":{"status":"ok","timestamp":1650022887191,"user_tz":-300,"elapsed":106250,"user":{"displayName":"ahmed lone","userId":"02458088882398909889"}},"outputId":"0100c002-0a80-4b06-a666-22dbdb929711"},"source":["!wget https://setup.johnsnowlabs.com/nlu/colab.sh -O - | bash\n","import nlu"],"execution_count":1,"outputs":[{"output_type":"stream","name":"stdout","text":["--2022-04-15 11:39:40-- https://setup.johnsnowlabs.com/nlu/colab.sh\n","Resolving setup.johnsnowlabs.com (setup.johnsnowlabs.com)... 51.158.130.125\n","Connecting to setup.johnsnowlabs.com (setup.johnsnowlabs.com)|51.158.130.125|:443... connected.\n","HTTP request sent, awaiting response... 302 Moved Temporarily\n","Location: https://raw.githubusercontent.com/JohnSnowLabs/nlu/master/scripts/colab_setup.sh [following]\n","--2022-04-15 11:39:40-- https://raw.githubusercontent.com/JohnSnowLabs/nlu/master/scripts/colab_setup.sh\n","Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 
185.199.108.133, 185.199.109.133, 185.199.110.133, ...\n","Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n","HTTP request sent, awaiting response... 200 OK\n","Length: 1665 (1.6K) [text/plain]\n","Saving to: ‘STDOUT’\n","\n","- 0%[ ] 0 --.-KB/s Installing NLU 3.4.3rc2 with PySpark 3.0.3 and Spark NLP 3.4.2 for Google Colab ...\n","- 100%[===================>] 1.63K --.-KB/s in 0.001s \n","\n","2022-04-15 11:39:41 (1.67 MB/s) - written to stdout [1665/1665]\n","\n","Get:1 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/ InRelease [3,626 B]\n","Ign:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease\n","Get:3 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]\n","Hit:4 http://archive.ubuntu.com/ubuntu bionic InRelease\n","Get:5 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu bionic InRelease [15.9 kB]\n","Ign:6 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 InRelease\n","Get:7 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 Release [696 B]\n","Hit:8 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Release\n","Get:9 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 Release.gpg [836 B]\n","Get:10 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]\n","Hit:11 http://ppa.launchpad.net/cran/libgit2/ubuntu bionic InRelease\n","Get:12 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]\n","Get:13 http://ppa.launchpad.net/deadsnakes/ppa/ubuntu bionic InRelease [15.9 kB]\n","Hit:15 http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu bionic InRelease\n","Get:16 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 Packages [953 kB]\n","Get:17 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu bionic/main Sources [1,947 kB]\n","Get:18 http://security.ubuntu.com/ubuntu 
bionic-security/universe amd64 Packages [1,490 kB]\n","Get:19 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages [3,134 kB]\n","Get:20 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages [2,695 kB]\n","Get:21 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu bionic/main amd64 Packages [996 kB]\n","Get:22 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages [2,268 kB]\n","Get:23 http://ppa.launchpad.net/deadsnakes/ppa/ubuntu bionic/main amd64 Packages [45.3 kB]\n","Fetched 13.8 MB in 4s (3,847 kB/s)\n","Reading package lists... Done\n","tar: spark-3.0.2-bin-hadoop2.7.tgz: Cannot open: No such file or directory\n","tar: Error is not recoverable: exiting now\n","\u001b[K |████████████████████████████████| 209.1 MB 60 kB/s \n","\u001b[K |████████████████████████████████| 142 kB 53.1 MB/s \n","\u001b[K |████████████████████████████████| 505 kB 58.6 MB/s \n","\u001b[K |████████████████████████████████| 198 kB 55.2 MB/s \n","\u001b[?25h Building wheel for pyspark (setup.py) ... 
\u001b[?25l\u001b[?25hdone\n","Collecting nlu_tmp==3.4.3rc10\n"," Downloading nlu_tmp-3.4.3rc10-py3-none-any.whl (510 kB)\n","\u001b[K |████████████████████████████████| 510 kB 5.1 MB/s \n","\u001b[?25hRequirement already satisfied: spark-nlp<3.5.0,>=3.4.2 in /usr/local/lib/python3.7/dist-packages (from nlu_tmp==3.4.3rc10) (3.4.2)\n","Requirement already satisfied: pandas>=1.3.5 in /usr/local/lib/python3.7/dist-packages (from nlu_tmp==3.4.3rc10) (1.3.5)\n","Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from nlu_tmp==3.4.3rc10) (1.21.5)\n","Requirement already satisfied: pyarrow>=0.16.0 in /usr/local/lib/python3.7/dist-packages (from nlu_tmp==3.4.3rc10) (6.0.1)\n","Requirement already satisfied: dataclasses in /usr/local/lib/python3.7/dist-packages (from nlu_tmp==3.4.3rc10) (0.6)\n","Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/dist-packages (from pandas>=1.3.5->nlu_tmp==3.4.3rc10) (2.8.2)\n","Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.7/dist-packages (from pandas>=1.3.5->nlu_tmp==3.4.3rc10) (2018.9)\n","Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.7.3->pandas>=1.3.5->nlu_tmp==3.4.3rc10) (1.15.0)\n","Installing collected packages: nlu-tmp\n","Successfully installed nlu-tmp-3.4.3rc10\n"]}]},{"cell_type":"markdown","metadata":{"id":"1quiM1WB6zad"},"source":["# Download a dataset of major news about cryptocurrencies\n","## We will use the 'title' column for our examples\n","https://www.kaggle.com/kashnitsky/news-about-major-cryptocurrencies-20132018-40k\n","\n","![Crypto](http://ckl-it.de/wp-content/uploads/2021/02/crypto.jpeg)"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"epDSiw1gIKG6","outputId":"51b41abc-4406-48a2-ee9b-110c8dc031c9","executionInfo":{"status":"ok","timestamp":1650022889136,"user_tz":-300,"elapsed":1961,"user":{"displayName":"ahmed 
lone","userId":"02458088882398909889"}}},"source":["import pandas as pd \n","import nlu\n","!wget http://ckl-it.de/wp-content/uploads/2020/12/small_btc.csv \n","df = pd.read_csv('/content/small_btc.csv').title\n","df"],"execution_count":2,"outputs":[{"output_type":"stream","name":"stdout","text":["--2022-04-15 11:41:26-- http://ckl-it.de/wp-content/uploads/2020/12/small_btc.csv\n","Resolving ckl-it.de (ckl-it.de)... 217.160.0.108, 2001:8d8:100f:f000::209\n","Connecting to ckl-it.de (ckl-it.de)|217.160.0.108|:80... connected.\n","HTTP request sent, awaiting response... 200 OK\n","Length: 22244914 (21M) [text/csv]\n","Saving to: ‘small_btc.csv’\n","\n","small_btc.csv 100%[===================>] 21.21M 14.6MB/s in 1.4s \n","\n","2022-04-15 11:41:28 (14.6 MB/s) - ‘small_btc.csv’ saved [22244914/22244914]\n","\n"]},{"output_type":"execute_result","data":{"text/plain":["0 Bitcoin Price Update: Will China Lead us Down?\n","1 Key Bitcoin Price Levels for Week 51 (15 – 22 ...\n","2 National Australia Bank, Citing Highly Flawed ...\n","3 Chinese Bitcoin Ban Driven by Chinese Banking...\n","4 Bitcoin Trade Update: Opened Position\n"," ... \n","1995 Bitcoin Bill Pay Company Living Room of Satosh...\n","1996 NYDFS Extends BitLicense Bitcoin Regulation Co...\n","1997 Bitfinex Passes Stefan Thomas’s Proof Of Solve...\n","1998 Cryptocurrency Exchange Platform AlphaPoint Pa...\n","1999 Want to Buy And Sell Bitcoin Fast and Secure? 
...\n","Name: title, Length: 2000, dtype: object"]},"metadata":{},"execution_count":2}]},{"cell_type":"markdown","metadata":{"id":"3piXOfyb7HOD"},"source":["# Predict the emotion of news article titles"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":580},"id":"Fdxs3wO4ISAW","outputId":"166c7e16-1cfe-430f-e1f3-67c6b68ec780","executionInfo":{"status":"ok","timestamp":1650023008208,"user_tz":-300,"elapsed":119081,"user":{"displayName":"ahmed lone","userId":"02458088882398909889"}}},"source":["import nlu\n","# Predict the emotion (e.g. fear, joy) of each title with the NLU 'emotion' classifier\n","sentiment_df = nlu.load('emotion').predict(df)\n","sentiment_df"],"execution_count":3,"outputs":[{"output_type":"stream","name":"stdout","text":["classifierdl_use_emotion download started this may take some time.\n","Approximate size to download 21.3 MB\n","[OK!]\n","tfhub_use download started this may take some time.\n","Approximate size to download 923.7 MB\n","[OK!]\n","sentence_detector_dl download started this may take some time.\n","Approximate size to download 354.6 KB\n","[OK!]\n"]},{"output_type":"execute_result","data":{"text/plain":[" emotion emotion_confidence_confidence \\\n","0 fear 0.998173 \n","1 joy 0.997696 \n","2 fear 0.999997 \n","3 fear 0.999135 \n","4 joy 0.998864 \n","... ... ... \n","1996 fear 0.998281 \n","1997 fear 0.772052 \n","1998 joy 0.999348 \n","1999 fear 0.998905 \n","1999 fear 0.998905 \n","\n"," sentence \\\n","0 Bitcoin Price Update: Will China Lead us Down? \n","1 Key Bitcoin Price Levels for Week 51 (15 – 22 ... \n","2 National Australia Bank, Citing Highly Flawed ... \n","3 Chinese Bitcoin Ban Driven by Chinese Banking ... \n","4 Bitcoin Trade Update: Opened Position \n","... ... \n","1996 NYDFS Extends BitLicense Bitcoin Regulation Co... \n","1997 Bitfinex Passes Stefan Thomas’s Proof Of Solve... \n","1998 Cryptocurrency Exchange Platform AlphaPoint Pa... \n","1999 Want to Buy And Sell Bitcoin Fast and Secure? 
\n","1999 Try CoinRNR \n","\n"," sentence_embedding_use \n","0 [0.05829371139407158, -0.036904484033584595, -... \n","1 [0.038088250905275345, -0.04514157399535179, -... \n","2 [0.05034318566322327, -0.01303655095398426, -0... \n","3 [0.055152829736471176, -0.05237917602062225, -... \n","4 [0.05926975607872009, -0.056463420391082764, -... \n","... ... \n","1996 [0.0639236643910408, -0.05505230277776718, -0.... \n","1997 [0.059178080409765244, -0.041498005390167236, ... \n","1998 [0.05369672179222107, -0.023480931296944618, -... \n","1999 [0.0626637190580368, -0.05945301055908203, -0.... \n","1999 [0.02854502573609352, 0.05557611957192421, 0.0... \n","\n","[2160 rows x 4 columns]"],"text/html":["\n","
| | emotion | emotion_confidence_confidence | sentence | sentence_embedding_use |
|---|---|---|---|---|
| 0 | fear | 0.998173 | Bitcoin Price Update: Will China Lead us Down? | [0.05829371139407158, -0.036904484033584595, -... |
| 1 | joy | 0.997696 | Key Bitcoin Price Levels for Week 51 (15 – 22 ... | [0.038088250905275345, -0.04514157399535179, -... |
| 2 | fear | 0.999997 | National Australia Bank, Citing Highly Flawed ... | [0.05034318566322327, -0.01303655095398426, -0... |
| 3 | fear | 0.999135 | Chinese Bitcoin Ban Driven by Chinese Banking ... | [0.055152829736471176, -0.05237917602062225, -... |
| 4 | joy | 0.998864 | Bitcoin Trade Update: Opened Position | [0.05926975607872009, -0.056463420391082764, -... |
| ... | ... | ... | ... | ... |
| 1996 | fear | 0.998281 | NYDFS Extends BitLicense Bitcoin Regulation Co... | [0.0639236643910408, -0.05505230277776718, -0.... |
| 1997 | fear | 0.772052 | Bitfinex Passes Stefan Thomas’s Proof Of Solve... | [0.059178080409765244, -0.041498005390167236, ... |
| 1998 | joy | 0.999348 | Cryptocurrency Exchange Platform AlphaPoint Pa... | [0.05369672179222107, -0.023480931296944618, -... |
| 1999 | fear | 0.998905 | Want to Buy And Sell Bitcoin Fast and Secure? | [0.0626637190580368, -0.05945301055908203, -0.... |
| 1999 | fear | 0.998905 | Try CoinRNR | [0.02854502573609352, 0.05557611957192421, 0.0... |

2160 rows × 4 columns
| | document | keywords | keywords_confidence |
|---|---|---|---|
| 0 | Bitcoin Price Update: Will China Lead us Down? | update | 0.5798862558280943 |
| 0 | Bitcoin Price Update: Will China Lead us Down? | china | 0.5798862558280943 |
| 0 | Bitcoin Price Update: Will China Lead us Down? | china lead | 0.5066323531331214 |
| 1 | Key Bitcoin Price Levels for Week 51 (15 – 22 ... | price | 0.5798862558280943 |
| 1 | Key Bitcoin Price Levels for Week 51 (15 – 22 ... | levels | 0.5798862558280943 |
| ... | ... | ... | ... |
| 1998 | Cryptocurrency Exchange Platform AlphaPoint Pa... | growth | 0.26804494089513314 |
| 1998 | Cryptocurrency Exchange Platform AlphaPoint Pa... | support growth | 0.1840422979793308 |
| 1999 | Want to Buy And Sell Bitcoin Fast and Secure? ... | bitcoin fast | 0.3579604335906263 |
| 1999 | Want to Buy And Sell Bitcoin Fast and Secure? ... | try coinrnr | 0.2564243599387429 |
| 1999 | Want to Buy And Sell Bitcoin Fast and Secure? ... | sell bitcoin fast | 0.28203029979078753 |

6085 rows × 3 columns
| | document | stem | stem_string |
|---|---|---|---|
| 0 | Bitcoin Price Update: Will China Lead us Down? | [bitcoin, price, updat, :, will, china, lead, ... | bitcoin price updat : will china lead u down ? |
| 1 | Key Bitcoin Price Levels for Week 51 (15 – 22 ... | [kei, bitcoin, price, level, for, week, 51, (,... | kei bitcoin price level for week 51 ( 15 – 22 ... |
| 2 | National Australia Bank, Citing Highly Flawed ... | [nation, australia, bank, ,, cite, highli, fla... | nation australia bank , cite highli flawe data... |
| 3 | Chinese Bitcoin Ban Driven by Chinese Banking ... | [chines, bitcoin, ban, driven, by, chines, ban... | chines bitcoin ban driven by chines bank crisi ? |
| 4 | Bitcoin Trade Update: Opened Position | [bitcoin, trade, updat, :, open, posit] | bitcoin trade updat : open posit |
| ... | ... | ... | ... |
| 1995 | Bitcoin Bill Pay Company Living Room of Satosh... | [bitcoin, bill, pai, compani, live, room, of, ... | bitcoin bill pai compani live room of satoshi ... |
| 1996 | NYDFS Extends BitLicense Bitcoin Regulation Co... | [nydf, extend, bitlicens, bitcoin, regul, comm... | nydf extend bitlicens bitcoin regul comment pe... |
| 1997 | Bitfinex Passes Stefan Thomas’s Proof Of Solve... | [bitfinex, pass, stefan, thomas’, proof, of, s... | bitfinex pass stefan thomas’ proof of solvenc ... |
| 1998 | Cryptocurrency Exchange Platform AlphaPoint Pa... | [cryptocurr, exchang, platform, alphapoint, pa... | cryptocurr exchang platform alphapoint partner... |
| 1999 | Want to Buy And Sell Bitcoin Fast and Secure? ... | [want, to, bui, and, sell, bitcoin, fast, and,... | want to bui and sell bitcoin fast and secur ? ... |

2000 rows × 3 columns
| | document | keywords | keywords_confidence |
|---|---|---|---|
| 0 | Bitcoin Price Update: Will China Lead us Down? | update | 0.5798862558280943 |
| 0 | Bitcoin Price Update: Will China Lead us Down? | china | 0.5798862558280943 |
| 0 | Bitcoin Price Update: Will China Lead us Down? | lead | 0.5798862558280943 |
| 0 | Bitcoin Price Update: Will China Lead us Down? | china lead | 0.5066323531331214 |
| 1 | Key Bitcoin Price Levels for Week 51 (15 – 22 ... | price | 0.5798862558280943 |
| ... | ... | ... | ... |
| 1998 | Cryptocurrency Exchange Platform AlphaPoint Pa... | support growth | 0.1840422979793308 |
| 1999 | Want to Buy And Sell Bitcoin Fast and Secure? ... | sell bitcoin | 0.3579604335906263 |
| 1999 | Want to Buy And Sell Bitcoin Fast and Secure? ... | bitcoin fast | 0.3579604335906263 |
| 1999 | Want to Buy And Sell Bitcoin Fast and Secure? ... | try coinrnr | 0.2564243599387429 |
| 1999 | Want to Buy And Sell Bitcoin Fast and Secure? ... | sell bitcoin fast | 0.28203029979078753 |

8070 rows × 3 columns
| | document | keywords | keywords_confidence |
|---|---|---|---|
| 0 | Bitcoin Price Update: Will China Lead us Down? | bitcoin price | 0.7475647452220192 |
| 0 | Bitcoin Price Update: Will China Lead us Down? | china lead | 0.3774989624964526 |
| 0 | Bitcoin Price Update: Will China Lead us Down? | lead us | 0.5619156399368569 |
| 0 | Bitcoin Price Update: Will China Lead us Down? | china lead us | 0.49160495247060043 |
| 1 | Key Bitcoin Price Levels for Week 51 (15 – 22 ... | key bitcoin | 0.7475647452220192 |
| ... | ... | ... | ... |
| 1998 | Cryptocurrency Exchange Platform AlphaPoint Pa... | bitfinex to support growth | 0.3685173882155852 |
| 1999 | Want to Buy And Sell Bitcoin Fast and Secure? ... | sell bitcoin | 0.2923195563311814 |
| 1999 | Want to Buy And Sell Bitcoin Fast and Secure? ... | bitcoin fast | 0.2923195563311814 |
| 1999 | Want to Buy And Sell Bitcoin Fast and Secure? ... | try coinrnr | 0.15815767906792633 |
| 1999 | Want to Buy And Sell Bitcoin Fast and Secure? ... | sell bitcoin fast | 0.20049687371139055 |

7365 rows × 3 columns
\n","