{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "0_liners_intro.ipynb", "provenance": [], "collapsed_sections": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "BPl-mfyt3c75" }, "source": [ "![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n", "\n", "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/webinars_conferences_etc/NYC_DC_NLP_MEETUP/0_liners_intro.ipynb)" ] }, { "cell_type": "markdown", "metadata": { "id": "QZ64jJ74kQYl" }, "source": [ "# Setup Dependencies\n", "You need **Java 8**, Spark NLP, and PySpark installed in your environment." ] }, { "cell_type": "code", "metadata": { "id": "D6irEUobtZHu" }, "source": [ "import os\n", "! apt-get update -qq > /dev/null \n", "# Install Java 8\n", "! apt-get install -y openjdk-8-jdk-headless -qq > /dev/null\n", "os.environ[\"JAVA_HOME\"] = \"/usr/lib/jvm/java-8-openjdk-amd64\"\n", "os.environ[\"PATH\"] = os.environ[\"JAVA_HOME\"] + \"/bin:\" + os.environ[\"PATH\"]\n", "! 
pip install nlu pyspark==2.4.7 > /dev/null \n", "import nlu " ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "HlQ0stnZkOVa" }, "source": [ "# Quick overview of easy 1-liners with NLU\n", "## Spellchecking, Sentiment Classification, Part of Speech, Named Entity Recognition, and Other Classifiers\n", "\n", "![Things NLU can do](http://ckl-it.de/wp-content/uploads/2021/02/2021-02-11_15-51.png)" ] }, { "cell_type": "markdown", "metadata": { "id": "iMPy_HPLtn6Q" }, "source": [ "# Binary Sentiment Classification in 1 Line\n", "![Binary Sentiment](https://cdn.pixabay.com/photo/2015/11/13/10/07/smiley-1041796_960_720.jpg)\n" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 160 }, "id": "8Dg1vs_7tnM0", "outputId": "6b19c05a-a64c-4974-9418-2126b5aec3a1" }, "source": [ "import nlu\n", "nlu.load('sentiment').predict('I love NLU and rainy days!')" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "analyze_sentiment download started this may take some time.\n", "Approx size to download 4.9 MB\n", "[OK!]\n" ], "name": "stdout" }, { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr><th></th><th>sentence</th><th>sentiment_confidence</th><th>sentiment</th><th>checked</th></tr>\n",
"<tr><th>origin_index</th><th></th><th></th><th></th><th></th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>I love NLU and rainy days!</td><td>0.688000</td><td>positive</td><td>[I, love, NLU, and, rainy, days, !]</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>\n", "
" ], "text/plain": [ " sentence ... checked\n", "origin_index ... \n", "0 I love NLU and rainy days! ... [I, love, NLU, and, rainy, days, !]\n", "\n", "[1 rows x 4 columns]" ] }, "metadata": { "tags": [] }, "execution_count": 3 } ] }, { "cell_type": "markdown", "metadata": { "id": "2zcqNMaytslF" }, "source": [ "# Part of Speech (POS) in 1 line\n", "![Parts of Speech](https://image.shutterstock.com/image-photo/blackboard-background-written-colorful-chalk-600w-1166166529.jpg)" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 430 }, "id": "gcHsbBVVtmW-", "outputId": "94ba6e3e-98bd-4ca8-f287-cdfef915d1fd" }, "source": [ "nlu.load('pos').predict('POS assigns each token in a sentence a grammatical label')" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "pos_anc download started this may take some time.\n", "Approximate size to download 4.3 MB\n", "[OK!]\n" ], "name": "stdout" }, { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr><th></th><th>pos</th><th>token</th></tr>\n",
"<tr><th>origin_index</th><th></th><th></th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>NNP</td><td>POS</td></tr>\n",
"<tr><th>0</th><td>NNS</td><td>assigns</td></tr>\n",
"<tr><th>0</th><td>DT</td><td>each</td></tr>\n",
"<tr><th>0</th><td>NN</td><td>token</td></tr>\n",
"<tr><th>0</th><td>IN</td><td>in</td></tr>\n",
"<tr><th>0</th><td>DT</td><td>a</td></tr>\n",
"<tr><th>0</th><td>NN</td><td>sentence</td></tr>\n",
"<tr><th>0</th><td>DT</td><td>a</td></tr>\n",
"<tr><th>0</th><td>JJ</td><td>grammatical</td></tr>\n",
"<tr><th>0</th><td>NN</td><td>label</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>\n", "
" ], "text/plain": [ " pos token\n", "origin_index \n", "0 NNP POS\n", "0 NNS assigns\n", "0 DT each\n", "0 NN token\n", "0 IN in\n", "0 DT a\n", "0 NN sentence\n", "0 DT a\n", "0 JJ grammatical\n", "0 NN label" ] }, "metadata": { "tags": [] }, "execution_count": 4 } ] }, { "cell_type": "markdown", "metadata": { "id": "gsG_qnTPOt6d" }, "source": [ "# Named Entity Recognition (NER) in 1 line\n", "\n", "![NER](http://ckl-it.de/wp-content/uploads/2021/02/ner-1.png)" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 250 }, "id": "zP9iN837tibr", "outputId": "f3f495b6-0e4a-4f96-b1f9-02b8e3851fd1" }, "source": [ "nlu.load('ner').predict(\"John Snow Labs congratulates the Amarican John Biden to winning the American election!\", output_level='chunk')" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "onto_recognize_entities_sm download started this may take some time.\n", "Approx size to download 159 MB\n", "[OK!]\n" ], "name": "stdout" }, { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr><th></th><th>entities_confidence</th><th>embeddings</th><th>entities</th></tr>\n",
"<tr><th>origin_index</th><th></th><th></th><th></th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>PERSON</td><td>[[-0.2747400104999542, 0.48680999875068665, -0...</td><td>John Snow Labs</td></tr>\n",
"<tr><th>0</th><td>PERSON</td><td>[[-0.2747400104999542, 0.48680999875068665, -0...</td><td>the Amarican</td></tr>\n",
"<tr><th>0</th><td>PERSON</td><td>[[-0.2747400104999542, 0.48680999875068665, -0...</td><td>John Biden</td></tr>\n",
"<tr><th>0</th><td>NORP</td><td>[[-0.2747400104999542, 0.48680999875068665, -0...</td><td>American</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>\n", "
" ], "text/plain": [ " entities_confidence ... entities\n", "origin_index ... \n", "0 PERSON ... John Snow Labs\n", "0 PERSON ... the Amarican\n", "0 PERSON ... John Biden\n", "0 NORP ... American\n", "\n", "[4 rows x 3 columns]" ] }, "metadata": { "tags": [] }, "execution_count": 5 } ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 160 }, "id": "gznL47ZHQDyA", "outputId": "b73eddec-79b2-4112-ed21-1c14735838c0" }, "source": [ "nlu.load('ner').predict(\"John Snow Labs congratiulates John Biden to winning the American election!\", output_level = 'document')" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "onto_recognize_entities_sm download started this may take some time.\n", "Approx size to download 159 MB\n", "[OK!]\n" ], "name": "stdout" }, { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr><th></th><th>entities_confidence</th><th>embeddings</th><th>entities</th><th>document</th></tr>\n",
"<tr><th>origin_index</th><th></th><th></th><th></th><th></th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>[PERSON, PERSON, NORP]</td><td>[[-0.2747400104999542, 0.48680999875068665, -0...</td><td>[John Snow Labs, John Biden, American]</td><td>John Snow Labs congratiulates John Biden to wi...</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>\n", "
" ], "text/plain": [ " entities_confidence ... document\n", "origin_index ... \n", "0 [PERSON, PERSON, NORP] ... John Snow Labs congratiulates John Biden to wi...\n", "\n", "[1 rows x 4 columns]" ] }, "metadata": { "tags": [] }, "execution_count": 25 } ] }, { "cell_type": "markdown", "metadata": { "id": "5Q2BmQE8mcQ1" }, "source": [ "## Check out other NER models" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "yvhrV9fvjjkc", "outputId": "fdb6838a-ff12-4fff-9bca-4cc1d626e5d4" }, "source": [ "nlu.print_components(action='ner')" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "For language NLU provides the following Models : \n", "nlu.load('nl.ner') returns Spark NLP model wikiner_6B_100\n", "nlu.load('nl.ner.wikiner') returns Spark NLP model wikiner_6B_100\n", "nlu.load('nl.ner.wikiner.glove.6B_100') returns Spark NLP model wikiner_6B_100\n", "nlu.load('nl.ner.wikiner.glove.6B_300') returns Spark NLP model wikiner_6B_300\n", "nlu.load('nl.ner.wikiner.glove.840B_300') returns Spark NLP model wikiner_840B_300\n", "For language NLU provides the following Models : \n", "nlu.load('en.ner') returns Spark NLP model ner_dl\n", "nlu.load('en.ner.dl') returns Spark NLP model ner_dl\n", "nlu.load('en.ner.dl.glove.6B_100d') returns Spark NLP model ner_dl\n", "nlu.load('en.ner.dl.bert') returns Spark NLP model ner_dl_bert\n", "nlu.load('en.ner.onto') returns Spark NLP model onto_100\n", "nlu.load('en.ner.onto.glove.6B_100d') returns Spark NLP model onto_100\n", "nlu.load('en.ner.onto.glove.840B_300d') returns Spark NLP model onto_300\n", "nlu.load('en.ner.onto.bert.cased_base') returns Spark NLP model onto_bert_base_cased\n", "nlu.load('en.ner.onto.bert.cased_large') returns Spark NLP model onto_bert_large_cased\n", "nlu.load('en.ner.onto.electra.uncased_large') returns Spark NLP model onto_electra_large_uncased\n", "nlu.load('en.ner.onto.bert.small_l2_128') returns Spark NLP model 
onto_small_bert_L2_128\n", "nlu.load('en.ner.onto.bert.small_l4_256') returns Spark NLP model onto_small_bert_L4_256\n", "nlu.load('en.ner.onto.bert.small_l4_512') returns Spark NLP model onto_small_bert_L4_512\n", "nlu.load('en.ner.onto.bert.small_l8_512') returns Spark NLP model onto_small_bert_L8_512\n", "nlu.load('en.ner.onto.electra.uncased_small') returns Spark NLP model onto_electra_small_uncased\n", "nlu.load('en.ner.onto.electra.uncased_base') returns Spark NLP model onto_electra_base_uncased\n", "nlu.load('en.ner.bert_base_cased') returns Spark NLP model ner_dl_bert_base_cased\n", "nlu.load('en.ner.ade') returns Spark NLP model ade_ner_100d\n", "nlu.load('en.ner.aspect_sentiment') returns Spark NLP model ner_aspect_based_sentiment\n", "nlu.load('en.ner.glove.100d') returns Spark NLP model ner_dl_sentence\n", "nlu.load('en.ner.atis') returns Spark NLP model nerdl_atis_840b_300d\n", "nlu.load('en.ner.airline') returns Spark NLP model nerdl_atis_840b_300d\n", "nlu.load('en.ner.aspect.airline') returns Spark NLP model nerdl_atis_840b_300d\n", "nlu.load('en.ner.aspect.atis') returns Spark NLP model nerdl_atis_840b_300d\n", "For language NLU provides the following Models : \n", "nlu.load('fr.ner') returns Spark NLP model wikiner_840B_300\n", "nlu.load('fr.ner.wikiner') returns Spark NLP model wikiner_840B_300\n", "nlu.load('fr.ner.wikiner.glove.840B_300') returns Spark NLP model wikiner_840B_300\n", "nlu.load('fr.ner.wikiner.glove.6B_300') returns Spark NLP model wikiner_6B_300\n", "For language NLU provides the following Models : \n", "nlu.load('de.ner') returns Spark NLP model wikiner_840B_300\n", "nlu.load('de.ner.wikiner') returns Spark NLP model wikiner_840B_300\n", "nlu.load('de.ner.wikiner.glove.840B_300') returns Spark NLP model wikiner_840B_300\n", "nlu.load('de.ner.wikiner.glove.6B_300') returns Spark NLP model wikiner_6B_300\n", "For language NLU provides the following Models : \n", "nlu.load('it.ner') returns Spark NLP model wikiner_840B_300\n", 
"nlu.load('it.ner.wikiner.glove.6B_300') returns Spark NLP model wikiner_6B_300\n", "For language NLU provides the following Models : \n", "nlu.load('no.ner') returns Spark NLP model norne_6B_100\n", "nlu.load('no.ner.norne') returns Spark NLP model norne_6B_100\n", "nlu.load('no.ner.norne.glove.6B_100') returns Spark NLP model norne_6B_100\n", "nlu.load('no.ner.norne.glove.6B_300') returns Spark NLP model norne_6B_300\n", "nlu.load('no.ner.norne.glove.840B_300') returns Spark NLP model norne_840B_300\n", "For language NLU provides the following Models : \n", "nlu.load('pl.ner') returns Spark NLP model wikiner_6B_100\n", "nlu.load('pl.ner.wikiner') returns Spark NLP model wikiner_6B_100\n", "nlu.load('pl.ner.wikiner.glove.6B_100') returns Spark NLP model wikiner_6B_100\n", "nlu.load('pl.ner.wikiner.glove.6B_300') returns Spark NLP model wikiner_6B_300\n", "nlu.load('pl.ner.wikiner.glove.840B_300') returns Spark NLP model wikiner_840B_300\n", "For language NLU provides the following Models : \n", "nlu.load('pt.ner') returns Spark NLP model wikiner_6B_100\n", "nlu.load('pt.ner.wikiner.glove.6B_100') returns Spark NLP model wikiner_6B_100\n", "nlu.load('pt.ner.wikiner.glove.6B_300') returns Spark NLP model wikiner_6B_300\n", "nlu.load('pt.ner.wikiner.glove.840B_300') returns Spark NLP model wikiner_840B_300\n", "For language NLU provides the following Models : \n", "nlu.load('ru.ner') returns Spark NLP model wikiner_6B_100\n", "nlu.load('ru.ner.wikiner') returns Spark NLP model wikiner_6B_100\n", "nlu.load('ru.ner.wikiner.glove.6B_100') returns Spark NLP model wikiner_6B_100\n", "nlu.load('ru.ner.wikiner.glove.6B_300') returns Spark NLP model wikiner_6B_300\n", "nlu.load('ru.ner.wikiner.glove.840B_300') returns Spark NLP model wikiner_840B_300\n", "For language NLU provides the following Models : \n", "nlu.load('es.ner') returns Spark NLP model wikiner_6B_100\n", "nlu.load('es.ner.wikiner') returns Spark NLP model wikiner_6B_100\n", 
"nlu.load('es.ner.wikiner.glove.6B_100') returns Spark NLP model wikiner_6B_100\n", "nlu.load('es.ner.wikiner.glove.6B_300') returns Spark NLP model wikiner_6B_300\n", "nlu.load('es.ner.wikiner.glove.840B_300') returns Spark NLP model wikiner_840B_300\n", "For language NLU provides the following Models : \n", "nlu.load('ar.ner') returns Spark NLP model aner_cc_300d\n", "nlu.load('ar.ner.aner') returns Spark NLP model aner_cc_300d\n", "For language NLU provides the following Models : \n", "nlu.load('fi.ner') returns Spark NLP model wikiner_6B_100\n", "nlu.load('fi.ner.6B_100') returns Spark NLP model wikiner_6B_100\n", "nlu.load('fi.ner.6B_300') returns Spark NLP model wikiner_6B_300\n", "nlu.load('fi.ner.840B_300') returns Spark NLP model wikiner_840B_300\n", "nlu.load('fi.ner.6B_100d') returns Spark NLP model finnish_ner_6B_100\n", "nlu.load('fi.ner.6B_300d') returns Spark NLP model finnish_ner_6B_300\n", "nlu.load('fi.ner.840B_300d') returns Spark NLP model finnish_ner_840B_300\n", "For language NLU provides the following Models : \n", "nlu.load('he.ner') returns Spark NLP model hebrewner_cc_300d\n", "nlu.load('he.ner.cc_300d') returns Spark NLP model hebrewner_cc_300d\n", "For language NLU provides the following Models : \n", "nlu.load('da.ner') returns Spark NLP model dane_ner_6B_100\n", "nlu.load('da.ner.6B_100D') returns Spark NLP model dane_ner_6B_100\n", "nlu.load('da.ner.6B_300D') returns Spark NLP model dane_ner_6B_300\n", "nlu.load('da.ner.840B_300D') returns Spark NLP model dane_ner_840B_300\n", "For language NLU provides the following Models : \n", "nlu.load('ja.ner') returns Spark NLP model ner_ud_gsd_glove_840B_300d\n", "nlu.load('ja.ner.ud_gsd') returns Spark NLP model ner_ud_gsd_glove_840B_300d\n", "nlu.load('ja.ner.ud_gsd.glove_840B_300D') returns Spark NLP model ner_ud_gsd_glove_840B_300d\n", "For language NLU provides the following Models : \n", "nlu.load('fa.ner') returns Spark NLP model personer_cc_300d\n", "nlu.load('fa.ner.person') returns 
Spark NLP model personer_cc_300d\n", "nlu.load('fa.ner.person.cc_300d') returns Spark NLP model personer_cc_300d\n", "For language NLU provides the following Models : \n", "nlu.load('sv.ner') returns Spark NLP model swedish_ner_6B_100\n", "nlu.load('sv.ner.6B_100') returns Spark NLP model swedish_ner_6B_100\n", "nlu.load('sv.ner.6B_300') returns Spark NLP model swedish_ner_6B_300\n", "nlu.load('sv.ner.840B_300') returns Spark NLP model swedish_ner_840B_300\n", "For language NLU provides the following Models : \n", "nlu.load('th.ner.lst20.glove_840B_300D') returns Spark NLP model ner_lst20_glove_840B_300d\n", "For language NLU provides the following Models : \n", "nlu.load('tr.ner') returns Spark NLP model turkish_ner_840B_300\n", "nlu.load('tr.ner.bert') returns Spark NLP model turkish_ner_bert\n", "For language NLU provides the following Models : \n", "nlu.load('zh.ner') returns Spark NLP model ner_msra_bert_768d\n", "nlu.load('zh.ner.bert') returns Spark NLP model ner_msra_bert_768d\n", "nlu.load('zh.ner.msra.bert_768D') returns Spark NLP model ner_msra_bert_768d\n", "nlu.load('zh.ner.weibo.bert_768d') returns Spark NLP model ner_weibo_bert_768d\n", "For language NLU provides the following Models : \n", "nlu.load('ur.ner') returns Spark NLP model uner_mk_140M_300d\n", "nlu.load('ur.ner.mk_140M_300d') returns Spark NLP model uner_mk_140M_300d\n", "For language NLU provides the following Models : \n", "nlu.load('ko.ner') returns Spark NLP model ner_kmou_glove_840B_300d\n", "nlu.load('ko.ner.kmou') returns Spark NLP model ner_kmou_glove_840B_300d\n", "nlu.load('ko.ner.kmou.glove_840B_300d') returns Spark NLP model ner_kmou_glove_840B_300d\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "ZkqBvE6BDcs5" }, "source": [ "![Bert and Elmo](https://i.guim.co.uk/img/media/c93beb4de9841200afa6ccf7ace3bd83aa65fe89/0_122_2608_1564/master/2608.jpg?width=1200&height=1200&quality=85&auto=format&fit=crop&s=1566462470b82e97e3aa290d401ebba4)\n", "\n", "# 
Bertology Embeddings for Sentences and Tokens\n" ] }, { "cell_type": "code", "metadata": { "id": "Z5-AbiLbUIRU", "colab": { "base_uri": "https://localhost:8080/", "height": 340 }, "outputId": "ade6b16c-f217-4cbc-9011-073baa19bb47" }, "source": [ "nlu.load('bert').predict(\"Albert and Elmo are pretty good freidns\")" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "small_bert_L2_128 download started this may take some time.\n", "Approximate size to download 16.1 MB\n", "[OK!]\n" ], "name": "stdout" }, { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr><th></th><th>bert_embeddings</th><th>token</th></tr>\n",
"<tr><th>origin_index</th><th></th><th></th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>[-1.2644212245941162, 1.0388842821121216, 0.42...</td><td>Albert</td></tr>\n",
"<tr><th>0</th><td>[-1.0341346263885498, 0.35990777611732483, 0.2...</td><td>and</td></tr>\n",
"<tr><th>0</th><td>[-1.5926620960235596, -0.32061171531677246, -0...</td><td>Elmo</td></tr>\n",
"<tr><th>0</th><td>[-0.3129887580871582, 0.2978755831718445, 0.10...</td><td>are</td></tr>\n",
"<tr><th>0</th><td>[0.5073671936988831, -0.35482677817344666, 0.0...</td><td>pretty</td></tr>\n",
"<tr><th>0</th><td>[-0.6654903888702393, 0.050630949437618256, -0...</td><td>good</td></tr>\n",
"<tr><th>0</th><td>[-2.3138480186462402, 0.690037727355957, -0.05...</td><td>freidns</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>\n", "
" ], "text/plain": [ " bert_embeddings token\n", "origin_index \n", "0 [-1.2644212245941162, 1.0388842821121216, 0.42... Albert\n", "0 [-1.0341346263885498, 0.35990777611732483, 0.2... and\n", "0 [-1.5926620960235596, -0.32061171531677246, -0... Elmo\n", "0 [-0.3129887580871582, 0.2978755831718445, 0.10... are\n", "0 [0.5073671936988831, -0.35482677817344666, 0.0... pretty\n", "0 [-0.6654903888702393, 0.050630949437618256, -0... good\n", "0 [-2.3138480186462402, 0.690037727355957, -0.05... freidns" ] }, "metadata": { "tags": [] }, "execution_count": 8 } ] }, { "cell_type": "code", "metadata": { "id": "RO70qrh0Dnax", "colab": { "base_uri": "https://localhost:8080/", "height": 340 }, "outputId": "18153c0c-ec15-43ef-da11-244428bf3731" }, "source": [ "nlu.load('elmo').predict(\"Albert and Elmo are pretty good freidns\")" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "elmo download started this may take some time.\n", "Approximate size to download 334.1 MB\n", "[OK!]\n" ], "name": "stdout" }, { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr><th></th><th>elmo_embeddings</th><th>token</th></tr>\n",
"<tr><th>origin_index</th><th></th><th></th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>[-0.9555240273475647, -1.0100127458572388, 0.7...</td><td>Albert</td></tr>\n",
"<tr><th>0</th><td>[-0.02477884292602539, -0.20155462622642517, -...</td><td>and</td></tr>\n",
"<tr><th>0</th><td>[0.6083736419677734, 0.20088991522789001, 0.42...</td><td>Elmo</td></tr>\n",
"<tr><th>0</th><td>[-0.031240105628967285, 0.08035830408334732, -...</td><td>are</td></tr>\n",
"<tr><th>0</th><td>[0.3517477512359619, -0.24238181114196777, -0....</td><td>pretty</td></tr>\n",
"<tr><th>0</th><td>[0.5430472493171692, -0.19053488969802856, -0....</td><td>good</td></tr>\n",
"<tr><th>0</th><td>[-0.6736612319946289, -0.15871864557266235, 0....</td><td>freidns</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>\n", "
" ], "text/plain": [ " elmo_embeddings token\n", "origin_index \n", "0 [-0.9555240273475647, -1.0100127458572388, 0.7... Albert\n", "0 [-0.02477884292602539, -0.20155462622642517, -... and\n", "0 [0.6083736419677734, 0.20088991522789001, 0.42... Elmo\n", "0 [-0.031240105628967285, 0.08035830408334732, -... are\n", "0 [0.3517477512359619, -0.24238181114196777, -0.... pretty\n", "0 [0.5430472493171692, -0.19053488969802856, -0.... good\n", "0 [-0.6736612319946289, -0.15871864557266235, 0.... freidns" ] }, "metadata": { "tags": [] }, "execution_count": 9 } ] }, { "cell_type": "code", "metadata": { "id": "rbqjk1DlZ3OK", "colab": { "base_uri": "https://localhost:8080/", "height": 160 }, "outputId": "16a8dc8c-a75c-473f-d0ca-0f2b6748ccb0" }, "source": [ "nlu.load('embed_sentence.bert').predict(\"get me sum embeddings for these tokens\")\n" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "sent_small_bert_L2_128 download started this may take some time.\n", "Approximate size to download 16.1 MB\n", "[OK!]\n" ], "name": "stdout" }, { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr><th></th><th>document</th><th>embed_sentence_bert_embeddings</th></tr>\n",
"<tr><th>origin_index</th><th></th><th></th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>get me sum embeddings for these tokens</td><td>[-0.8406468629837036, 0.3447624742984772, -0.0...</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>\n", "
" ], "text/plain": [ " document embed_sentence_bert_embeddings\n", "origin_index \n", "0 get me sum embeddings for these tokens [-0.8406468629837036, 0.3447624742984772, -0.0..." ] }, "metadata": { "tags": [] }, "execution_count": 10 } ] }, { "cell_type": "markdown", "metadata": { "id": "_Ed-mFpfmXLc" }, "source": [ "# Check out other Embedding Models" ] }, { "cell_type": "code", "metadata": { "id": "KcfP2P_tDUMR", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "8c1f0b4f-bee2-4a67-fc04-ca23275cf848" }, "source": [ "nlu.print_components(action='embed')" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "For language NLU provides the following Models : \n", "nlu.load('en.embed') returns Spark NLP model glove_100d\n", "nlu.load('en.embed.glove') returns Spark NLP model glove_100d\n", "nlu.load('en.embed.glove.100d') returns Spark NLP model glove_100d\n", "nlu.load('en.embed.bert') returns Spark NLP model bert_base_uncased\n", "nlu.load('en.embed.bert.base_uncased') returns Spark NLP model bert_base_uncased\n", "nlu.load('en.embed.bert.base_cased') returns Spark NLP model bert_base_cased\n", "nlu.load('en.embed.bert.large_uncased') returns Spark NLP model bert_large_uncased\n", "nlu.load('en.embed.bert.large_cased') returns Spark NLP model bert_large_cased\n", "nlu.load('en.embed.biobert') returns Spark NLP model biobert_pubmed_base_cased\n", "nlu.load('en.embed.biobert.pubmed_base_cased') returns Spark NLP model biobert_pubmed_base_cased\n", "nlu.load('en.embed.biobert.pubmed_large_cased') returns Spark NLP model biobert_pubmed_large_cased\n", "nlu.load('en.embed.biobert.pmc_base_cased') returns Spark NLP model biobert_pmc_base_cased\n", "nlu.load('en.embed.biobert.pubmed_pmc_base_cased') returns Spark NLP model biobert_pubmed_pmc_base_cased\n", "nlu.load('en.embed.biobert.clinical_base_cased') returns Spark NLP model biobert_clinical_base_cased\n", "nlu.load('en.embed.biobert.discharge_base_cased') returns Spark NLP model 
biobert_discharge_base_cased\n", "nlu.load('en.embed.elmo') returns Spark NLP model elmo\n", "nlu.load('en.embed.use') returns Spark NLP model tfhub_use\n", "nlu.load('en.embed.albert') returns Spark NLP model albert_base_uncased\n", "nlu.load('en.embed.albert.base_uncased') returns Spark NLP model albert_base_uncased\n", "nlu.load('en.embed.albert.large_uncased') returns Spark NLP model albert_large_uncased\n", "nlu.load('en.embed.albert.xlarge_uncased') returns Spark NLP model albert_xlarge_uncased\n", "nlu.load('en.embed.albert.xxlarge_uncased') returns Spark NLP model albert_xxlarge_uncased\n", "nlu.load('en.embed.xlnet') returns Spark NLP model xlnet_base_cased\n", "nlu.load('en.embed.xlnet_base_cased') returns Spark NLP model xlnet_base_cased\n", "nlu.load('en.embed.xlnet_large_cased') returns Spark NLP model xlnet_large_cased\n", "nlu.load('en.embed.electra') returns Spark NLP model electra_small_uncased\n", "nlu.load('en.embed.electra.small_uncased') returns Spark NLP model electra_small_uncased\n", "nlu.load('en.embed.electra.base_uncased') returns Spark NLP model electra_base_uncased\n", "nlu.load('en.embed.electra.large_uncased') returns Spark NLP model electra_large_uncased\n", "nlu.load('en.embed.covidbert') returns Spark NLP model covidbert_large_uncased\n", "nlu.load('en.embed.covidbert.large_uncased') returns Spark NLP model covidbert_large_uncased\n", "nlu.load('en.embed.bert.small_L2_128') returns Spark NLP model small_bert_L2_128\n", "nlu.load('en.embed.bert.small_L4_128') returns Spark NLP model small_bert_L4_128\n", "nlu.load('en.embed.bert.small_L6_128') returns Spark NLP model small_bert_L6_128\n", "nlu.load('en.embed.bert.small_L8_128') returns Spark NLP model small_bert_L8_128\n", "nlu.load('en.embed.bert.small_L10_128') returns Spark NLP model small_bert_L10_128\n", "nlu.load('en.embed.bert.small_L12_128') returns Spark NLP model small_bert_L12_128\n", "nlu.load('en.embed.bert.small_L2_256') returns Spark NLP model small_bert_L2_256\n", 
"nlu.load('en.embed.bert.small_L4_256') returns Spark NLP model small_bert_L4_256\n", "nlu.load('en.embed.bert.small_L6_256') returns Spark NLP model small_bert_L6_256\n", "nlu.load('en.embed.bert.small_L8_256') returns Spark NLP model small_bert_L8_256\n", "nlu.load('en.embed.bert.small_L10_256') returns Spark NLP model small_bert_L10_256\n", "nlu.load('en.embed.bert.small_L12_256') returns Spark NLP model small_bert_L12_256\n", "nlu.load('en.embed.bert.small_L2_512') returns Spark NLP model small_bert_L2_512\n", "nlu.load('en.embed.bert.small_L4_512') returns Spark NLP model small_bert_L4_512\n", "nlu.load('en.embed.bert.small_L6_512') returns Spark NLP model small_bert_L6_512\n", "nlu.load('en.embed.bert.small_L8_512') returns Spark NLP model small_bert_L8_512\n", "nlu.load('en.embed.bert.small_L10_512') returns Spark NLP model small_bert_L10_512\n", "nlu.load('en.embed.bert.small_L12_512') returns Spark NLP model small_bert_L12_512\n", "nlu.load('en.embed.bert.small_L2_768') returns Spark NLP model small_bert_L2_768\n", "nlu.load('en.embed.bert.small_L4_768') returns Spark NLP model small_bert_L4_768\n", "nlu.load('en.embed.bert.small_L6_768') returns Spark NLP model small_bert_L6_768\n", "nlu.load('en.embed.bert.small_L8_768') returns Spark NLP model small_bert_L8_768\n", "nlu.load('en.embed.bert.small_L10_768') returns Spark NLP model small_bert_L10_768\n", "nlu.load('en.embed.bert.small_L12_768') returns Spark NLP model small_bert_L12_768\n", "For language NLU provides the following Models : \n", "nlu.load('ar.embed') returns Spark NLP model arabic_w2v_cc_300d\n", "nlu.load('ar.embed.cbow') returns Spark NLP model arabic_w2v_cc_300d\n", "nlu.load('ar.embed.cbow.300d') returns Spark NLP model arabic_w2v_cc_300d\n", "nlu.load('ar.embed.aner') returns Spark NLP model arabic_w2v_cc_300d\n", "nlu.load('ar.embed.aner.300d') returns Spark NLP model arabic_w2v_cc_300d\n", "nlu.load('ar.embed.glove') returns Spark NLP model arabic_w2v_cc_300d\n", "For language NLU 
provides the following Models : \n", "nlu.load('fi.embed.bert.') returns Spark NLP model bert_finnish_cased\n", "nlu.load('fi.embed.bert.cased.') returns Spark NLP model bert_finnish_cased\n", "nlu.load('fi.embed.bert.uncased.') returns Spark NLP model bert_finnish_uncased\n", "For language NLU provides the following Models : \n", "nlu.load('he.embed') returns Spark NLP model hebrew_cc_300d\n", "nlu.load('he.embed.glove') returns Spark NLP model hebrew_cc_300d\n", "nlu.load('he.embed.cbow_300d') returns Spark NLP model hebrew_cc_300d\n", "For language NLU provides the following Models : \n", "nlu.load('fa.embed') returns Spark NLP model persian_w2v_cc_300d\n", "nlu.load('fa.embed.word2vec') returns Spark NLP model persian_w2v_cc_300d\n", "nlu.load('fa.embed.word2vec.300d') returns Spark NLP model persian_w2v_cc_300d\n", "For language NLU provides the following Models : \n", "nlu.load('zh.embed') returns Spark NLP model bert_base_chinese\n", "nlu.load('zh.embed.bert') returns Spark NLP model bert_base_chinese\n", "For language NLU provides the following Models : \n", "nlu.load('ur.embed') returns Spark NLP model urduvec_140M_300d\n", "nlu.load('ur.embed.urdu_vec_140M_300d') returns Spark NLP model urduvec_140M_300d\n", "For language NLU provides the following Models : \n", "nlu.load('xx.embed') returns Spark NLP model glove_840B_300\n", "nlu.load('xx.embed.glove.840B_300') returns Spark NLP model glove_840B_300\n", "nlu.load('xx.embed.glove.6B_300') returns Spark NLP model glove_6B_300\n", "nlu.load('xx.embed.bert_multi_cased') returns Spark NLP model bert_multi_cased\n", "nlu.load('xx.embed.bert') returns Spark NLP model bert_multi_cased\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "k_2imuNXnzIJ" }, "source": [ "" ], "execution_count": null, "outputs": [] } ] }