{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "0_liners_intro.ipynb", "provenance": [], "collapsed_sections": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "BPl-mfyt3c75" }, "source": [ "![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)\n", "\n", "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/nlu/blob/master/examples/webinars_conferences_etc/NYC_DC_NLP_MEETUP/0_liners_intro.ipynb)" ] }, { "cell_type": "markdown", "metadata": { "id": "QZ64jJ74kQYl" }, "source": [ "# Setup Dependencies\n", "You need **Java 8**, Spark NLP, and PySpark installed in your environment." ] }, { "cell_type": "code", "metadata": { "id": "D6irEUobtZHu" }, "source": [ "import os\n", "! apt-get update -qq > /dev/null \n", "# Install Java 8\n", "! apt-get install -y openjdk-8-jdk-headless -qq > /dev/null\n", "os.environ[\"JAVA_HOME\"] = \"/usr/lib/jvm/java-8-openjdk-amd64\"\n", "os.environ[\"PATH\"] = os.environ[\"JAVA_HOME\"] + \"/bin:\" + os.environ[\"PATH\"]\n", "! 
pip install nlu pyspark==2.4.7 > /dev/null \n", "import nlu " ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "HlQ0stnZkOVa" }, "source": [ "# Quick overview of easy 1-liners with NLU\n", "## Spellchecking, Sentiment Classification, Part of Speech, Named Entity Recognition, and Other Classifiers\n", "\n", "![Things NLU can do](http://ckl-it.de/wp-content/uploads/2021/02/2021-02-11_15-51.png)" ] }, { "cell_type": "markdown", "metadata": { "id": "iMPy_HPLtn6Q" }, "source": [ "# Binary Sentiment Classification in 1 Line\n", "![Binary Sentiment](https://cdn.pixabay.com/photo/2015/11/13/10/07/smiley-1041796_960_720.jpg)\n" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 160 }, "id": "8Dg1vs_7tnM0", "outputId": "6b19c05a-a64c-4974-9418-2126b5aec3a1" }, "source": [ "import nlu\n", "nlu.load('sentiment').predict('I love NLU and rainy days!')" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "analyze_sentiment download started this may take some time.\n", "Approx size to download 4.9 MB\n", "[OK!]\n" ], "name": "stdout" }, { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr><th></th><th>sentence</th><th>sentiment_confidence</th><th>sentiment</th><th>checked</th></tr>\n",
"<tr><th>origin_index</th><th></th><th></th><th></th><th></th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>I love NLU and rainy days!</td><td>0.688000</td><td>positive</td><td>[I, love, NLU, and, rainy, days, !]</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>\n", "
" ], "text/plain": [ " sentence ... checked\n", "origin_index ... \n", "0 I love NLU and rainy days! ... [I, love, NLU, and, rainy, days, !]\n", "\n", "[1 rows x 4 columns]" ] }, "metadata": { "tags": [] }, "execution_count": 3 } ] }, { "cell_type": "markdown", "metadata": { "id": "2zcqNMaytslF" }, "source": [ "# Part of Speech (POS) in 1 line\n", "![Parts of Speech](https://image.shutterstock.com/image-photo/blackboard-background-written-colorful-chalk-600w-1166166529.jpg)" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 430 }, "id": "gcHsbBVVtmW-", "outputId": "94ba6e3e-98bd-4ca8-f287-cdfef915d1fd" }, "source": [ "nlu.load('pos').predict('POS assigns each token in a sentence a grammatical label')" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "pos_anc download started this may take some time.\n", "Approximate size to download 4.3 MB\n", "[OK!]\n" ], "name": "stdout" }, { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr><th></th><th>pos</th><th>token</th></tr>\n",
"<tr><th>origin_index</th><th></th><th></th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>NNP</td><td>POS</td></tr>\n",
"<tr><th>0</th><td>NNS</td><td>assigns</td></tr>\n",
"<tr><th>0</th><td>DT</td><td>each</td></tr>\n",
"<tr><th>0</th><td>NN</td><td>token</td></tr>\n",
"<tr><th>0</th><td>IN</td><td>in</td></tr>\n",
"<tr><th>0</th><td>DT</td><td>a</td></tr>\n",
"<tr><th>0</th><td>NN</td><td>sentence</td></tr>\n",
"<tr><th>0</th><td>DT</td><td>a</td></tr>\n",
"<tr><th>0</th><td>JJ</td><td>grammatical</td></tr>\n",
"<tr><th>0</th><td>NN</td><td>label</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>\n", "
" ], "text/plain": [ " pos token\n", "origin_index \n", "0 NNP POS\n", "0 NNS assigns\n", "0 DT each\n", "0 NN token\n", "0 IN in\n", "0 DT a\n", "0 NN sentence\n", "0 DT a\n", "0 JJ grammatical\n", "0 NN label" ] }, "metadata": { "tags": [] }, "execution_count": 4 } ] }, { "cell_type": "markdown", "metadata": { "id": "gsG_qnTPOt6d" }, "source": [ "# Named Entity Recognition (NER) in 1 line\n", "\n", "![NER](http://ckl-it.de/wp-content/uploads/2021/02/ner-1.png)" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 250 }, "id": "zP9iN837tibr", "outputId": "f3f495b6-0e4a-4f96-b1f9-02b8e3851fd1" }, "source": [ "nlu.load('ner').predict(\"John Snow Labs congratulates the Amarican John Biden to winning the American election!\", output_level='chunk')" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "onto_recognize_entities_sm download started this may take some time.\n", "Approx size to download 159 MB\n", "[OK!]\n" ], "name": "stdout" }, { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr><th></th><th>entities_confidence</th><th>embeddings</th><th>entities</th></tr>\n",
"<tr><th>origin_index</th><th></th><th></th><th></th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>PERSON</td><td>[[-0.2747400104999542, 0.48680999875068665, -0...</td><td>John Snow Labs</td></tr>\n",
"<tr><th>0</th><td>PERSON</td><td>[[-0.2747400104999542, 0.48680999875068665, -0...</td><td>the Amarican</td></tr>\n",
"<tr><th>0</th><td>PERSON</td><td>[[-0.2747400104999542, 0.48680999875068665, -0...</td><td>John Biden</td></tr>\n",
"<tr><th>0</th><td>NORP</td><td>[[-0.2747400104999542, 0.48680999875068665, -0...</td><td>American</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>\n", "
" ], "text/plain": [ " entities_confidence ... entities\n", "origin_index ... \n", "0 PERSON ... John Snow Labs\n", "0 PERSON ... the Amarican\n", "0 PERSON ... John Biden\n", "0 NORP ... American\n", "\n", "[4 rows x 3 columns]" ] }, "metadata": { "tags": [] }, "execution_count": 5 } ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 160 }, "id": "gznL47ZHQDyA", "outputId": "b73eddec-79b2-4112-ed21-1c14735838c0" }, "source": [ "nlu.load('ner').predict(\"John Snow Labs congratiulates John Biden to winning the American election!\", output_level = 'document')" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "onto_recognize_entities_sm download started this may take some time.\n", "Approx size to download 159 MB\n", "[OK!]\n" ], "name": "stdout" }, { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr><th></th><th>entities_confidence</th><th>embeddings</th><th>entities</th><th>document</th></tr>\n",
"<tr><th>origin_index</th><th></th><th></th><th></th><th></th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>[PERSON, PERSON, NORP]</td><td>[[-0.2747400104999542, 0.48680999875068665, -0...</td><td>[John Snow Labs, John Biden, American]</td><td>John Snow Labs congratiulates John Biden to wi...</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>\n", "
" ], "text/plain": [ " entities_confidence ... document\n", "origin_index ... \n", "0 [PERSON, PERSON, NORP] ... John Snow Labs congratiulates John Biden to wi...\n", "\n", "[1 rows x 4 columns]" ] }, "metadata": { "tags": [] }, "execution_count": 25 } ] }, { "cell_type": "markdown", "metadata": { "id": "5Q2BmQE8mcQ1" }, "source": [ "## Check out other NER models" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "yvhrV9fvjjkc", "outputId": "fdb6838a-ff12-4fff-9bca-4cc1d626e5d4" }, "source": [ "nlu.print_components(action='ner')" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "For language NLU provides the following Models : \n", "nlu.load('nl.ner') returns Spark NLP model wikiner_6B_100\n", "nlu.load('nl.ner.wikiner') returns Spark NLP model wikiner_6B_100\n", "nlu.load('nl.ner.wikiner.glove.6B_100') returns Spark NLP model wikiner_6B_100\n", "nlu.load('nl.ner.wikiner.glove.6B_300') returns Spark NLP model wikiner_6B_300\n", "nlu.load('nl.ner.wikiner.glove.840B_300') returns Spark NLP model wikiner_840B_300\n", "For language NLU provides the following Models : \n", "nlu.load('en.ner') returns Spark NLP model ner_dl\n", "nlu.load('en.ner.dl') returns Spark NLP model ner_dl\n", "nlu.load('en.ner.dl.glove.6B_100d') returns Spark NLP model ner_dl\n", "nlu.load('en.ner.dl.bert') returns Spark NLP model ner_dl_bert\n", "nlu.load('en.ner.onto') returns Spark NLP model onto_100\n", "nlu.load('en.ner.onto.glove.6B_100d') returns Spark NLP model onto_100\n", "nlu.load('en.ner.onto.glove.840B_300d') returns Spark NLP model onto_300\n", "nlu.load('en.ner.onto.bert.cased_base') returns Spark NLP model onto_bert_base_cased\n", "nlu.load('en.ner.onto.bert.cased_large') returns Spark NLP model onto_bert_large_cased\n", "nlu.load('en.ner.onto.electra.uncased_large') returns Spark NLP model onto_electra_large_uncased\n", "nlu.load('en.ner.onto.bert.small_l2_128') returns Spark NLP model 
onto_small_bert_L2_128\n", "nlu.load('en.ner.onto.bert.small_l4_256') returns Spark NLP model onto_small_bert_L4_256\n", "nlu.load('en.ner.onto.bert.small_l4_512') returns Spark NLP model onto_small_bert_L4_512\n", "nlu.load('en.ner.onto.bert.small_l8_512') returns Spark NLP model onto_small_bert_L8_512\n", "nlu.load('en.ner.onto.electra.uncased_small') returns Spark NLP model onto_electra_small_uncased\n", "nlu.load('en.ner.onto.electra.uncased_base') returns Spark NLP model onto_electra_base_uncased\n", "nlu.load('en.ner.bert_base_cased') returns Spark NLP model ner_dl_bert_base_cased\n", "nlu.load('en.ner.ade') returns Spark NLP model ade_ner_100d\n", "nlu.load('en.ner.aspect_sentiment') returns Spark NLP model ner_aspect_based_sentiment\n", "nlu.load('en.ner.glove.100d') returns Spark NLP model ner_dl_sentence\n", "nlu.load('en.ner.atis') returns Spark NLP model nerdl_atis_840b_300d\n", "nlu.load('en.ner.airline') returns Spark NLP model nerdl_atis_840b_300d\n", "nlu.load('en.ner.aspect.airline') returns Spark NLP model nerdl_atis_840b_300d\n", "nlu.load('en.ner.aspect.atis') returns Spark NLP model nerdl_atis_840b_300d\n", "For language NLU provides the following Models : \n", "nlu.load('fr.ner') returns Spark NLP model wikiner_840B_300\n", "nlu.load('fr.ner.wikiner') returns Spark NLP model wikiner_840B_300\n", "nlu.load('fr.ner.wikiner.glove.840B_300') returns Spark NLP model wikiner_840B_300\n", "nlu.load('fr.ner.wikiner.glove.6B_300') returns Spark NLP model wikiner_6B_300\n", "For language NLU provides the following Models : \n", "nlu.load('de.ner') returns Spark NLP model wikiner_840B_300\n", "nlu.load('de.ner.wikiner') returns Spark NLP model wikiner_840B_300\n", "nlu.load('de.ner.wikiner.glove.840B_300') returns Spark NLP model wikiner_840B_300\n", "nlu.load('de.ner.wikiner.glove.6B_300') returns Spark NLP model wikiner_6B_300\n", "For language NLU provides the following Models : \n", "nlu.load('it.ner') returns Spark NLP model wikiner_840B_300\n", 
"nlu.load('it.ner.wikiner.glove.6B_300') returns Spark NLP model wikiner_6B_300\n", "For language NLU provides the following Models : \n", "nlu.load('no.ner') returns Spark NLP model norne_6B_100\n", "nlu.load('no.ner.norne') returns Spark NLP model norne_6B_100\n", "nlu.load('no.ner.norne.glove.6B_100') returns Spark NLP model norne_6B_100\n", "nlu.load('no.ner.norne.glove.6B_300') returns Spark NLP model norne_6B_300\n", "nlu.load('no.ner.norne.glove.840B_300') returns Spark NLP model norne_840B_300\n", "For language NLU provides the following Models : \n", "nlu.load('pl.ner') returns Spark NLP model wikiner_6B_100\n", "nlu.load('pl.ner.wikiner') returns Spark NLP model wikiner_6B_100\n", "nlu.load('pl.ner.wikiner.glove.6B_100') returns Spark NLP model wikiner_6B_100\n", "nlu.load('pl.ner.wikiner.glove.6B_300') returns Spark NLP model wikiner_6B_300\n", "nlu.load('pl.ner.wikiner.glove.840B_300') returns Spark NLP model wikiner_840B_300\n", "For language NLU provides the following Models : \n", "nlu.load('pt.ner') returns Spark NLP model wikiner_6B_100\n", "nlu.load('pt.ner.wikiner.glove.6B_100') returns Spark NLP model wikiner_6B_100\n", "nlu.load('pt.ner.wikiner.glove.6B_300') returns Spark NLP model wikiner_6B_300\n", "nlu.load('pt.ner.wikiner.glove.840B_300') returns Spark NLP model wikiner_840B_300\n", "For language NLU provides the following Models : \n", "nlu.load('ru.ner') returns Spark NLP model wikiner_6B_100\n", "nlu.load('ru.ner.wikiner') returns Spark NLP model wikiner_6B_100\n", "nlu.load('ru.ner.wikiner.glove.6B_100') returns Spark NLP model wikiner_6B_100\n", "nlu.load('ru.ner.wikiner.glove.6B_300') returns Spark NLP model wikiner_6B_300\n", "nlu.load('ru.ner.wikiner.glove.840B_300') returns Spark NLP model wikiner_840B_300\n", "For language NLU provides the following Models : \n", "nlu.load('es.ner') returns Spark NLP model wikiner_6B_100\n", "nlu.load('es.ner.wikiner') returns Spark NLP model wikiner_6B_100\n", 
"nlu.load('es.ner.wikiner.glove.6B_100') returns Spark NLP model wikiner_6B_100\n", "nlu.load('es.ner.wikiner.glove.6B_300') returns Spark NLP model wikiner_6B_300\n", "nlu.load('es.ner.wikiner.glove.840B_300') returns Spark NLP model wikiner_840B_300\n", "For language NLU provides the following Models : \n", "nlu.load('ar.ner') returns Spark NLP model aner_cc_300d\n", "nlu.load('ar.ner.aner') returns Spark NLP model aner_cc_300d\n", "For language NLU provides the following Models : \n", "nlu.load('fi.ner') returns Spark NLP model wikiner_6B_100\n", "nlu.load('fi.ner.6B_100') returns Spark NLP model wikiner_6B_100\n", "nlu.load('fi.ner.6B_300') returns Spark NLP model wikiner_6B_300\n", "nlu.load('fi.ner.840B_300') returns Spark NLP model wikiner_840B_300\n", "nlu.load('fi.ner.6B_100d') returns Spark NLP model finnish_ner_6B_100\n", "nlu.load('fi.ner.6B_300d') returns Spark NLP model finnish_ner_6B_300\n", "nlu.load('fi.ner.840B_300d') returns Spark NLP model finnish_ner_840B_300\n", "For language NLU provides the following Models : \n", "nlu.load('he.ner') returns Spark NLP model hebrewner_cc_300d\n", "nlu.load('he.ner.cc_300d') returns Spark NLP model hebrewner_cc_300d\n", "For language NLU provides the following Models : \n", "nlu.load('da.ner') returns Spark NLP model dane_ner_6B_100\n", "nlu.load('da.ner.6B_100D') returns Spark NLP model dane_ner_6B_100\n", "nlu.load('da.ner.6B_300D') returns Spark NLP model dane_ner_6B_300\n", "nlu.load('da.ner.840B_300D') returns Spark NLP model dane_ner_840B_300\n", "For language NLU provides the following Models : \n", "nlu.load('ja.ner') returns Spark NLP model ner_ud_gsd_glove_840B_300d\n", "nlu.load('ja.ner.ud_gsd') returns Spark NLP model ner_ud_gsd_glove_840B_300d\n", "nlu.load('ja.ner.ud_gsd.glove_840B_300D') returns Spark NLP model ner_ud_gsd_glove_840B_300d\n", "For language NLU provides the following Models : \n", "nlu.load('fa.ner') returns Spark NLP model personer_cc_300d\n", "nlu.load('fa.ner.person') returns 
Spark NLP model personer_cc_300d\n", "nlu.load('fa.ner.person.cc_300d') returns Spark NLP model personer_cc_300d\n", "For language NLU provides the following Models : \n", "nlu.load('sv.ner') returns Spark NLP model swedish_ner_6B_100\n", "nlu.load('sv.ner.6B_100') returns Spark NLP model swedish_ner_6B_100\n", "nlu.load('sv.ner.6B_300') returns Spark NLP model swedish_ner_6B_300\n", "nlu.load('sv.ner.840B_300') returns Spark NLP model swedish_ner_840B_300\n", "For language NLU provides the following Models : \n", "nlu.load('th.ner.lst20.glove_840B_300D') returns Spark NLP model ner_lst20_glove_840B_300d\n", "For language NLU provides the following Models : \n", "nlu.load('tr.ner') returns Spark NLP model turkish_ner_840B_300\n", "nlu.load('tr.ner.bert') returns Spark NLP model turkish_ner_bert\n", "For language NLU provides the following Models : \n", "nlu.load('zh.ner') returns Spark NLP model ner_msra_bert_768d\n", "nlu.load('zh.ner.bert') returns Spark NLP model ner_msra_bert_768d\n", "nlu.load('zh.ner.msra.bert_768D') returns Spark NLP model ner_msra_bert_768d\n", "nlu.load('zh.ner.weibo.bert_768d') returns Spark NLP model ner_weibo_bert_768d\n", "For language NLU provides the following Models : \n", "nlu.load('ur.ner') returns Spark NLP model uner_mk_140M_300d\n", "nlu.load('ur.ner.mk_140M_300d') returns Spark NLP model uner_mk_140M_300d\n", "For language NLU provides the following Models : \n", "nlu.load('ko.ner') returns Spark NLP model ner_kmou_glove_840B_300d\n", "nlu.load('ko.ner.kmou') returns Spark NLP model ner_kmou_glove_840B_300d\n", "nlu.load('ko.ner.kmou.glove_840B_300d') returns Spark NLP model ner_kmou_glove_840B_300d\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "ZkqBvE6BDcs5" }, "source": [ "![Bert and Elmo](https://i.guim.co.uk/img/media/c93beb4de9841200afa6ccf7ace3bd83aa65fe89/0_122_2608_1564/master/2608.jpg?width=1200&height=1200&quality=85&auto=format&fit=crop&s=1566462470b82e97e3aa290d401ebba4)\n", "\n", "# 
Bertology Embeddings for Sentences and Tokens\n" ] }, { "cell_type": "code", "metadata": { "id": "Z5-AbiLbUIRU", "colab": { "base_uri": "https://localhost:8080/", "height": 340 }, "outputId": "ade6b16c-f217-4cbc-9011-073baa19bb47" }, "source": [ "nlu.load('bert').predict(\"Albert and Elmo are pretty good freidns\")" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "small_bert_L2_128 download started this may take some time.\n", "Approximate size to download 16.1 MB\n", "[OK!]\n" ], "name": "stdout" }, { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr><th></th><th>bert_embeddings</th><th>token</th></tr>\n",
"<tr><th>origin_index</th><th></th><th></th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>[-1.2644212245941162, 1.0388842821121216, 0.42...</td><td>Albert</td></tr>\n",
"<tr><th>0</th><td>[-1.0341346263885498, 0.35990777611732483, 0.2...</td><td>and</td></tr>\n",
"<tr><th>0</th><td>[-1.5926620960235596, -0.32061171531677246, -0...</td><td>Elmo</td></tr>\n",
"<tr><th>0</th><td>[-0.3129887580871582, 0.2978755831718445, 0.10...</td><td>are</td></tr>\n",
"<tr><th>0</th><td>[0.5073671936988831, -0.35482677817344666, 0.0...</td><td>pretty</td></tr>\n",
"<tr><th>0</th><td>[-0.6654903888702393, 0.050630949437618256, -0...</td><td>good</td></tr>\n",
"<tr><th>0</th><td>[-2.3138480186462402, 0.690037727355957, -0.05...</td><td>freidns</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>\n", "
" ], "text/plain": [ " bert_embeddings token\n", "origin_index \n", "0 [-1.2644212245941162, 1.0388842821121216, 0.42... Albert\n", "0 [-1.0341346263885498, 0.35990777611732483, 0.2... and\n", "0 [-1.5926620960235596, -0.32061171531677246, -0... Elmo\n", "0 [-0.3129887580871582, 0.2978755831718445, 0.10... are\n", "0 [0.5073671936988831, -0.35482677817344666, 0.0... pretty\n", "0 [-0.6654903888702393, 0.050630949437618256, -0... good\n", "0 [-2.3138480186462402, 0.690037727355957, -0.05... freidns" ] }, "metadata": { "tags": [] }, "execution_count": 8 } ] }, { "cell_type": "code", "metadata": { "id": "RO70qrh0Dnax", "colab": { "base_uri": "https://localhost:8080/", "height": 340 }, "outputId": "18153c0c-ec15-43ef-da11-244428bf3731" }, "source": [ "nlu.load('elmo').predict(\"Albert and Elmo are pretty good freidns\")" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "elmo download started this may take some time.\n", "Approximate size to download 334.1 MB\n", "[OK!]\n" ], "name": "stdout" }, { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr><th></th><th>elmo_embeddings</th><th>token</th></tr>\n",
"<tr><th>origin_index</th><th></th><th></th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>[-0.9555240273475647, -1.0100127458572388, 0.7...</td><td>Albert</td></tr>\n",
"<tr><th>0</th><td>[-0.02477884292602539, -0.20155462622642517, -...</td><td>and</td></tr>\n",
"<tr><th>0</th><td>[0.6083736419677734, 0.20088991522789001, 0.42...</td><td>Elmo</td></tr>\n",
"<tr><th>0</th><td>[-0.031240105628967285, 0.08035830408334732, -...</td><td>are</td></tr>\n",
"<tr><th>0</th><td>[0.3517477512359619, -0.24238181114196777, -0....</td><td>pretty</td></tr>\n",
"<tr><th>0</th><td>[0.5430472493171692, -0.19053488969802856, -0....</td><td>good</td></tr>\n",
"<tr><th>0</th><td>[-0.6736612319946289, -0.15871864557266235, 0....</td><td>freidns</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>\n", "
" ], "text/plain": [ " elmo_embeddings token\n", "origin_index \n", "0 [-0.9555240273475647, -1.0100127458572388, 0.7... Albert\n", "0 [-0.02477884292602539, -0.20155462622642517, -... and\n", "0 [0.6083736419677734, 0.20088991522789001, 0.42... Elmo\n", "0 [-0.031240105628967285, 0.08035830408334732, -... are\n", "0 [0.3517477512359619, -0.24238181114196777, -0.... pretty\n", "0 [0.5430472493171692, -0.19053488969802856, -0.... good\n", "0 [-0.6736612319946289, -0.15871864557266235, 0.... freidns" ] }, "metadata": { "tags": [] }, "execution_count": 9 } ] }, { "cell_type": "code", "metadata": { "id": "rbqjk1DlZ3OK", "colab": { "base_uri": "https://localhost:8080/", "height": 160 }, "outputId": "16a8dc8c-a75c-473f-d0ca-0f2b6748ccb0" }, "source": [ "nlu.load('embed_sentence.bert').predict(\"get me sum embeddings for these tokens\")\n" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "sent_small_bert_L2_128 download started this may take some time.\n", "Approximate size to download 16.1 MB\n", "[OK!]\n" ], "name": "stdout" }, { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n",
"<tr><th></th><th>document</th><th>embed_sentence_bert_embeddings</th></tr>\n",
"<tr><th>origin_index</th><th></th><th></th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>get me sum embeddings for these tokens</td><td>[-0.8406468629837036, 0.3447624742984772, -0.0...</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>\n", "
" ], "text/plain": [ " document embed_sentence_bert_embeddings\n", "origin_index \n", "0 get me sum embeddings for these tokens [-0.8406468629837036, 0.3447624742984772, -0.0..." ] }, "metadata": { "tags": [] }, "execution_count": 10 } ] }, { "cell_type": "markdown", "metadata": { "id": "_Ed-mFpfmXLc" }, "source": [ "# Check out other Embedding Models" ] }, { "cell_type": "code", "metadata": { "id": "KcfP2P_tDUMR", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "8c1f0b4f-bee2-4a67-fc04-ca23275cf848" }, "source": [ "nlu.print_components(action='embed')" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "For language NLU provides the following Models : \n", "nlu.load('en.embed') returns Spark NLP model glove_100d\n", "nlu.load('en.embed.glove') returns Spark NLP model glove_100d\n", "nlu.load('en.embed.glove.100d') returns Spark NLP model glove_100d\n", "nlu.load('en.embed.bert') returns Spark NLP model bert_base_uncased\n", "nlu.load('en.embed.bert.base_uncased') returns Spark NLP model bert_base_uncased\n", "nlu.load('en.embed.bert.base_cased') returns Spark NLP model bert_base_cased\n", "nlu.load('en.embed.bert.large_uncased') returns Spark NLP model bert_large_uncased\n", "nlu.load('en.embed.bert.large_cased') returns Spark NLP model bert_large_cased\n", "nlu.load('en.embed.biobert') returns Spark NLP model biobert_pubmed_base_cased\n", "nlu.load('en.embed.biobert.pubmed_base_cased') returns Spark NLP model biobert_pubmed_base_cased\n", "nlu.load('en.embed.biobert.pubmed_large_cased') returns Spark NLP model biobert_pubmed_large_cased\n", "nlu.load('en.embed.biobert.pmc_base_cased') returns Spark NLP model biobert_pmc_base_cased\n", "nlu.load('en.embed.biobert.pubmed_pmc_base_cased') returns Spark NLP model biobert_pubmed_pmc_base_cased\n", "nlu.load('en.embed.biobert.clinical_base_cased') returns Spark NLP model biobert_clinical_base_cased\n", "nlu.load('en.embed.biobert.discharge_base_cased') returns Spark NLP model 
biobert_discharge_base_cased\n", "nlu.load('en.embed.elmo') returns Spark NLP model elmo\n", "nlu.load('en.embed.use') returns Spark NLP model tfhub_use\n", "nlu.load('en.embed.albert') returns Spark NLP model albert_base_uncased\n", "nlu.load('en.embed.albert.base_uncased') returns Spark NLP model albert_base_uncased\n", "nlu.load('en.embed.albert.large_uncased') returns Spark NLP model albert_large_uncased\n", "nlu.load('en.embed.albert.xlarge_uncased') returns Spark NLP model albert_xlarge_uncased\n", "nlu.load('en.embed.albert.xxlarge_uncased') returns Spark NLP model albert_xxlarge_uncased\n", "nlu.load('en.embed.xlnet') returns Spark NLP model xlnet_base_cased\n", "nlu.load('en.embed.xlnet_base_cased') returns Spark NLP model xlnet_base_cased\n", "nlu.load('en.embed.xlnet_large_cased') returns Spark NLP model xlnet_large_cased\n", "nlu.load('en.embed.electra') returns Spark NLP model electra_small_uncased\n", "nlu.load('en.embed.electra.small_uncased') returns Spark NLP model electra_small_uncased\n", "nlu.load('en.embed.electra.base_uncased') returns Spark NLP model electra_base_uncased\n", "nlu.load('en.embed.electra.large_uncased') returns Spark NLP model electra_large_uncased\n", "nlu.load('en.embed.covidbert') returns Spark NLP model covidbert_large_uncased\n", "nlu.load('en.embed.covidbert.large_uncased') returns Spark NLP model covidbert_large_uncased\n", "nlu.load('en.embed.bert.small_L2_128') returns Spark NLP model small_bert_L2_128\n", "nlu.load('en.embed.bert.small_L4_128') returns Spark NLP model small_bert_L4_128\n", "nlu.load('en.embed.bert.small_L6_128') returns Spark NLP model small_bert_L6_128\n", "nlu.load('en.embed.bert.small_L8_128') returns Spark NLP model small_bert_L8_128\n", "nlu.load('en.embed.bert.small_L10_128') returns Spark NLP model small_bert_L10_128\n", "nlu.load('en.embed.bert.small_L12_128') returns Spark NLP model small_bert_L12_128\n", "nlu.load('en.embed.bert.small_L2_256') returns Spark NLP model small_bert_L2_256\n", 
"nlu.load('en.embed.bert.small_L4_256') returns Spark NLP model small_bert_L4_256\n", "nlu.load('en.embed.bert.small_L6_256') returns Spark NLP model small_bert_L6_256\n", "nlu.load('en.embed.bert.small_L8_256') returns Spark NLP model small_bert_L8_256\n", "nlu.load('en.embed.bert.small_L10_256') returns Spark NLP model small_bert_L10_256\n", "nlu.load('en.embed.bert.small_L12_256') returns Spark NLP model small_bert_L12_256\n", "nlu.load('en.embed.bert.small_L2_512') returns Spark NLP model small_bert_L2_512\n", "nlu.load('en.embed.bert.small_L4_512') returns Spark NLP model small_bert_L4_512\n", "nlu.load('en.embed.bert.small_L6_512') returns Spark NLP model small_bert_L6_512\n", "nlu.load('en.embed.bert.small_L8_512') returns Spark NLP model small_bert_L8_512\n", "nlu.load('en.embed.bert.small_L10_512') returns Spark NLP model small_bert_L10_512\n", "nlu.load('en.embed.bert.small_L12_512') returns Spark NLP model small_bert_L12_512\n", "nlu.load('en.embed.bert.small_L2_768') returns Spark NLP model small_bert_L2_768\n", "nlu.load('en.embed.bert.small_L4_768') returns Spark NLP model small_bert_L4_768\n", "nlu.load('en.embed.bert.small_L6_768') returns Spark NLP model small_bert_L6_768\n", "nlu.load('en.embed.bert.small_L8_768') returns Spark NLP model small_bert_L8_768\n", "nlu.load('en.embed.bert.small_L10_768') returns Spark NLP model small_bert_L10_768\n", "nlu.load('en.embed.bert.small_L12_768') returns Spark NLP model small_bert_L12_768\n", "For language NLU provides the following Models : \n", "nlu.load('ar.embed') returns Spark NLP model arabic_w2v_cc_300d\n", "nlu.load('ar.embed.cbow') returns Spark NLP model arabic_w2v_cc_300d\n", "nlu.load('ar.embed.cbow.300d') returns Spark NLP model arabic_w2v_cc_300d\n", "nlu.load('ar.embed.aner') returns Spark NLP model arabic_w2v_cc_300d\n", "nlu.load('ar.embed.aner.300d') returns Spark NLP model arabic_w2v_cc_300d\n", "nlu.load('ar.embed.glove') returns Spark NLP model arabic_w2v_cc_300d\n", "For language NLU 
provides the following Models : \n", "nlu.load('fi.embed.bert.') returns Spark NLP model bert_finnish_cased\n", "nlu.load('fi.embed.bert.cased.') returns Spark NLP model bert_finnish_cased\n", "nlu.load('fi.embed.bert.uncased.') returns Spark NLP model bert_finnish_uncased\n", "For language NLU provides the following Models : \n", "nlu.load('he.embed') returns Spark NLP model hebrew_cc_300d\n", "nlu.load('he.embed.glove') returns Spark NLP model hebrew_cc_300d\n", "nlu.load('he.embed.cbow_300d') returns Spark NLP model hebrew_cc_300d\n", "For language NLU provides the following Models : \n", "nlu.load('fa.embed') returns Spark NLP model persian_w2v_cc_300d\n", "nlu.load('fa.embed.word2vec') returns Spark NLP model persian_w2v_cc_300d\n", "nlu.load('fa.embed.word2vec.300d') returns Spark NLP model persian_w2v_cc_300d\n", "For language NLU provides the following Models : \n", "nlu.load('zh.embed') returns Spark NLP model bert_base_chinese\n", "nlu.load('zh.embed.bert') returns Spark NLP model bert_base_chinese\n", "For language NLU provides the following Models : \n", "nlu.load('ur.embed') returns Spark NLP model urduvec_140M_300d\n", "nlu.load('ur.embed.urdu_vec_140M_300d') returns Spark NLP model urduvec_140M_300d\n", "For language NLU provides the following Models : \n", "nlu.load('xx.embed') returns Spark NLP model glove_840B_300\n", "nlu.load('xx.embed.glove.840B_300') returns Spark NLP model glove_840B_300\n", "nlu.load('xx.embed.glove.6B_300') returns Spark NLP model glove_6B_300\n", "nlu.load('xx.embed.bert_multi_cased') returns Spark NLP model bert_multi_cased\n", "nlu.load('xx.embed.bert') returns Spark NLP model bert_multi_cased\n" ], "name": "stdout" } ] }, { "cell_type": "code", "metadata": { "id": "k_2imuNXnzIJ" }, "source": [ "" ], "execution_count": null, "outputs": [] } ] }