{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## NLP Interpret" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [], "source": [ "from fastai.gen_doc.nbdoc import *\n", "from fastai.text import *\n", "from fastai.text.interpret import *" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[`text.interpret`](/text.interpret.html#text.interpret) is the module that implements custom subclasses of [`Interpretation`](/train.html#Interpretation) for different NLP tasks." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [], "source": [ "from fastai.gen_doc.nbdoc import *\n", "from fastai.vision import *" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "

class TextClassificationInterpretation

\n", "\n", "> TextClassificationInterpretation(**`learn`**:[`Learner`](/basic_train.html#Learner), **`preds`**:`Tensor`, **`y_true`**:`Tensor`, **`losses`**:`Tensor`, **`ds_type`**:[`DatasetType`](/basic_data.html#DatasetType)=***``***) :: [`ClassificationInterpretation`](/train.html#ClassificationInterpretation)\n", "\n", "
\n", "\n", "Provides an interpretation of classification based on input sensitivity. It is currently designed for AWD-LSTM models only, since Transformer models already have their own attention mechanism. " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(TextClassificationInterpretation)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "

intrinsic_attention

\n", "\n", "> intrinsic_attention(**`text`**:`str`, **`class_id`**:`int`=***`None`***)\n", "\n", "
\n", "\n", "Calculate the intrinsic attention of the input with respect to an output `class_id`, or to the class predicted by the model if `class_id` is `None`. For reference, see the Sequential Jacobian section of https://www.cs.toronto.edu/~graves/preprint.pdf " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(TextClassificationInterpretation.intrinsic_attention)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "
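The sensitivity idea behind `intrinsic_attention` can be illustrated without fastai. The following is a minimal NumPy sketch under stated assumptions, not fastai's implementation: a hypothetical toy tanh RNN (`toy_rnn_probs`) stands in for AWD-LSTM, and the gradient of the chosen class probability with respect to each token embedding is approximated by central finite differences rather than autograd. Each token's score sums the absolute gradient entries over the embedding dimension and is rescaled so the most influential token scores 1; treat that reduction and normalization as assumptions too.

```python
import numpy as np

def toy_rnn_probs(emb, W, U, V):
    # Hypothetical stand-in for AWD-LSTM: a plain tanh RNN over the token
    # embeddings, then a linear layer and a softmax over classes.
    h = np.zeros(W.shape[0])
    for e in emb:  # emb has shape (seq_len, emb_dim)
        h = np.tanh(W @ h + U @ e)
    logits = V @ h
    z = np.exp(logits - logits.max())  # numerically stable softmax
    return z / z.sum()

def intrinsic_attention_sketch(emb, W, U, V, class_id=None, eps=1e-5):
    probs = toy_rnn_probs(emb, W, U, V)
    if class_id is None:  # fall back to the class the model predicts
        class_id = int(probs.argmax())
    grad = np.zeros_like(emb)
    # Central finite differences approximate d probs[class_id] / d emb[t, j]
    for t in range(emb.shape[0]):
        for j in range(emb.shape[1]):
            plus, minus = emb.copy(), emb.copy()
            plus[t, j] += eps
            minus[t, j] -= eps
            diff = (toy_rnn_probs(plus, W, U, V)[class_id]
                    - toy_rnn_probs(minus, W, U, V)[class_id])
            grad[t, j] = diff / (2 * eps)
    attn = np.abs(grad).sum(axis=1)  # one sensitivity score per token
    return attn / attn.max()         # rescale so the top token scores 1

# Usage with random weights (illustrative only, no trained model involved)
rng = np.random.default_rng(0)
seq_len, emb_dim, hidden, n_classes = 6, 4, 5, 2
emb = rng.normal(size=(seq_len, emb_dim))
W = rng.normal(scale=0.5, size=(hidden, hidden))
U = rng.normal(scale=0.5, size=(hidden, emb_dim))
V = rng.normal(size=(n_classes, hidden))
attn = intrinsic_attention_sketch(emb, W, U, V)
print(attn.round(3))
```

Finite differences need two forward passes per embedding entry, so they only suit toy sizes; a framework with automatic differentiation (PyTorch, for instance) gets the whole gradient from one backward pass. With two classes the probabilities sum to 1, so the scores for class 0 and class 1 coincide.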

html_intrinsic_attention

\n", "\n", "> html_intrinsic_attention(**`text`**:`str`, **`class_id`**:`int`=***`None`***, **\\*\\*`kwargs`**) → `str`\n", "\n", "

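`html_intrinsic_attention` returns a `str`, presumably (as its name suggests) an HTML rendering of the intrinsic attention scores. The following is a hedged sketch of one way to build such a string, not fastai's implementation: `attention_to_html` is a hypothetical helper, the red highlight is an arbitrary color choice, and it assumes one score per token, already normalized so the largest is 1.

```python
from html import escape

def attention_to_html(tokens, attn):
    # Hypothetical helper: wrap each token in a span whose red background
    # opacity equals its attention score (assumed normalized to [0, 1]).
    spans = []
    for tok, a in zip(tokens, attn):
        spans.append(f'''<span style='background-color:rgba(255,0,0,{float(a):.2f})'>{escape(tok)}</span>''')
    return ' '.join(spans)

# Arbitrary illustrative scores, not the output of a trained model
tokens = ['xxbos', 'i', 'really', 'like', 'this', 'movie']
scores = [0.1, 0.2, 0.6, 1.0, 0.3, 0.5]
html_str = attention_to_html(tokens, scores)
print(html_str)
```

In a Jupyter notebook, such a string can be rendered with `IPython.display.display(IPython.display.HTML(html_str))`. Escaping each token with `html.escape` keeps characters such as `<` in the text from being parsed as markup.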
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(TextClassificationInterpretation.html_intrinsic_attention)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "

show_intrinsic_attention

\n", "\n", "> show_intrinsic_attention(**`text`**:`str`, **`class_id`**:`int`=***`None`***, **\\*\\*`kwargs`**)\n", "\n", "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(TextClassificationInterpretation.show_intrinsic_attention)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/markdown": [ "

show_top_losses

\n", "\n", "> show_top_losses(**`k`**:`int`, **`max_len`**:`int`=***`70`***)\n", "\n", "
\n", "\n", "Create a tabulation showing the first `k` texts in top_losses along with their prediction, actual,loss, and probability of actual class. `max_len` is the maximum number of tokens displayed. " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "show_doc(TextClassificationInterpretation.show_top_losses)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's show how [`TextClassificationInterpretation`](/text.interpret.html#TextClassificationInterpretation) can be used once we train a text classification model." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### train " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "imdb = untar_data(URLs.IMDB_SAMPLE)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_lm = (TextList.from_csv(imdb, 'texts.csv', cols='text')\n", " .split_by_rand_pct()\n", " .label_for_lm()\n", " .databunch())\n", "data_lm.save()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idx  text
0    ! ! ! xxmaj finally this was directed by the guy who did xxmaj big xxmaj xxunk ? xxmaj must be a replay of xxmaj jonestown - hollywood style . xxmaj xxunk ! xxbos xxmaj this is a extremely well - made film . xxmaj the acting , script and camera - work are all first - rate . xxmaj the music is good , too , though it is
1    ) . xxmaj all in all , we were very disappointed at this xxmaj spike xxmaj lee effort ! ! xxbos a really great movie and true story . xxmaj dan xxmaj jansen the xxmaj greatest xxunk ever . a touching and beautiful movie the whole family can enjoy . xxmaj the story of xxmaj jane xxmaj xxunk battle with cancer and xxmaj dan xxmaj jansen love for his sister
2    just typical folks ) in everyday settings in order to create xxunk involving and realistic films . \\n \\n xxmaj in this case , the film is about xxmaj french and xxmaj german coal miners , so appropriately , the people in the roles seem like miners -- not actors . xxmaj the central conflict as the film begins is that there is a huge mine xxunk on the
3    here that xxunk banning ... which is a shame because i never would have sat through it where it not for the fact that it 's on ' the xxunk list ' . xxmaj the plot actually gives the film a decent base - or at least more of a decent base than most xxunk films - and it follows an actress who is kidnapped and dragged off into the
4    xxmaj at the same time , the xxmaj john xxmaj holmes character shows a very clever hustler who is able to pass through the xxunk and xxunk situations almost xxunk . xxmaj the movie deserves being watched more than once . xxmaj the seventies ambiance xxunk and full of drugs is amazing . xxbos xxmaj if you loved xxmaj long xxmaj way xxmaj round you will enjoy this nearly as
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "data_lm.show_batch()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
epoch  train_loss  valid_loss  accuracy  time
0      4.650112    3.822781    0.290729  00:21
1      4.378561    3.766616    0.295357  00:21
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "learn = language_model_learner(data_lm, AWD_LSTM)\n", "learn.fit_one_cycle(2, 1e-2)\n", "learn.save('mini_train_lm')\n", "learn.save_encoder('mini_train_encoder')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data_clas = (TextList.from_csv(imdb, 'texts.csv', cols='text', vocab=data_lm.vocab)\n", " .split_from_df(col='is_valid')\n", " .label_from_df(cols='label')\n", " .databunch(bs=42))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
epoch  train_loss  valid_loss  accuracy  time
0      0.666474    0.666000    0.605000  00:16
1      0.666053    0.646565    0.615000  00:18
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "learn = text_classifier_learner(data_clas, AWD_LSTM)\n", "learn.load_encoder('mini_train_encoder')\n", "learn.fit_one_cycle(2, slice(1e-3,1e-2))\n", "learn.save('mini_train_clas')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### interpret" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "interp = TextClassificationInterpretation.from_learner(learn)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "xxbos i really like this movie , it is amazing !" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "interp.show_intrinsic_attention(\"I really like this movie, it is amazing!\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Undocumented Methods - Methods moved below this line will intentionally be hidden" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## New Methods - Please document or move to the undocumented section" ] } ], "metadata": { "jekyll": { "keywords": "fastai", "summary": "Interpretation classes for NLP tasks", "title": "text.interpret" }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 2 }