{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## NLP datasets" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [], "source": [ "from fastai.gen_doc.nbdoc import *\n", "from fastai.text import * \n", "from fastai.gen_doc.nbdoc import *\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This module contains the [`TextDataset`](/text.data.html#TextDataset) class, which is the main dataset you should use for your NLP tasks. It automatically does the preprocessing steps described in [`text.transform`](/text.transform.html#text.transform). It also contains all the functions to quickly get a [`TextDataBunch`](/text.data.html#TextDataBunch) ready." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Quickly assemble your data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You should get your data in one of the following formats to make the most of the fastai library and use one of the factory methods of one of the [`TextDataBunch`](/text.data.html#TextDataBunch) classes:\n", "- raw text files in folders train, valid, test in an ImageNet style,\n", "- a csv where some column(s) gives the label(s) and the following one the associated text,\n", "- a dataframe structured the same way,\n", "- tokens and labels arrays,\n", "- ids, vocabulary (correspondence id to word) and labels.\n", "\n", "If you are assembling the data for a language model, you should define your labels as always 0 to respect those formats. The first time you create a [`DataBunch`](/basic_data.html#DataBunch) with one of those functions, your data will be preprocessed automatically. You can save it, so that the next time you call it is almost instantaneous. \n", "\n", "Below are the classes that help assembling the raw data in a [`DataBunch`](/basic_data.html#DataBunch) suitable for NLP." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "
class TextLMDataBunch[source][test]TextLMDataBunch(**`train_dl`**:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), **`valid_dl`**:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), **`fix_dl`**:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)=***`None`***, **`test_dl`**:`Optional`\\[[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)\\]=***`None`***, **`device`**:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=***`None`***, **`dl_tfms`**:`Optional`\\[`Collection`\\[`Callable`\\]\\]=***`None`***, **`path`**:`PathOrStr`=***`'.'`***, **`collate_fn`**:`Callable`=***`'data_collate'`***, **`no_check`**:`bool`=***`False`***) :: [`TextDataBunch`](/text.data.html#TextDataBunch)\n",
"\n",
"Tests found for TextLMDataBunch:
Some other tests where TextLMDataBunch is used:
pytest -sv tests/test_text_data.py::test_from_csv_and_from_df [source]pytest -sv tests/test_text_data.py::test_should_load_backwards_lm_1 [source]pytest -sv tests/test_text_data.py::test_should_load_backwards_lm_2 [source]To run tests please refer to this guide.
create[source][test]create(**`train_ds`**, **`valid_ds`**, **`test_ds`**=***`None`***, **`path`**:`PathOrStr`=***`'.'`***, **`no_check`**:`bool`=***`False`***, **`bs`**=***`64`***, **`val_bs`**:`int`=***`None`***, **`num_workers`**:`int`=***`0`***, **`device`**:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=***`None`***, **`collate_fn`**:`Callable`=***`'data_collate'`***, **`dl_tfms`**:`Optional`\\[`Collection`\\[`Callable`\\]\\]=***`None`***, **`bptt`**:`int`=***`70`***, **`backwards`**:`bool`=***`False`***, **\\*\\*`dl_kwargs`**) → [`DataBunch`](/basic_data.html#DataBunch)\n",
"\n",
"No tests found for create. To contribute a test please refer to this guide and this discussion.
class TextClasDataBunch[source][test]TextClasDataBunch(**`train_dl`**:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), **`valid_dl`**:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), **`fix_dl`**:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)=***`None`***, **`test_dl`**:`Optional`\\[[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)\\]=***`None`***, **`device`**:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=***`None`***, **`dl_tfms`**:`Optional`\\[`Collection`\\[`Callable`\\]\\]=***`None`***, **`path`**:`PathOrStr`=***`'.'`***, **`collate_fn`**:`Callable`=***`'data_collate'`***, **`no_check`**:`bool`=***`False`***) :: [`TextDataBunch`](/text.data.html#TextDataBunch)\n",
"\n",
"Tests found for TextClasDataBunch:
Some other tests where TextClasDataBunch is used:
pytest -sv tests/test_text_data.py::test_backwards_cls_databunch [source]pytest -sv tests/test_text_data.py::test_from_csv_and_from_df [source]pytest -sv tests/test_text_data.py::test_from_ids_exports_classes [source]pytest -sv tests/test_text_data.py::test_from_ids_works_for_equally_length_sentences [source]pytest -sv tests/test_text_data.py::test_from_ids_works_for_variable_length_sentences [source]pytest -sv tests/test_text_data.py::test_load_and_save_test [source]To run tests please refer to this guide.
create[source][test]create(**`train_ds`**, **`valid_ds`**, **`test_ds`**=***`None`***, **`path`**:`PathOrStr`=***`'.'`***, **`bs`**:`int`=***`32`***, **`val_bs`**:`int`=***`None`***, **`pad_idx`**=***`1`***, **`pad_first`**=***`True`***, **`device`**:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=***`None`***, **`no_check`**:`bool`=***`False`***, **`backwards`**:`bool`=***`False`***, **`dl_tfms`**:`Optional`\\[`Collection`\\[`Callable`\\]\\]=***`None`***, **\\*\\*`dl_kwargs`**) → [`DataBunch`](/basic_data.html#DataBunch)\n",
"\n",
"No tests found for create. To contribute a test please refer to this guide and this discussion.
class TextDataBunch[source][test]TextDataBunch(**`train_dl`**:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), **`valid_dl`**:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), **`fix_dl`**:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)=***`None`***, **`test_dl`**:`Optional`\\[[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)\\]=***`None`***, **`device`**:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=***`None`***, **`dl_tfms`**:`Optional`\\[`Collection`\\[`Callable`\\]\\]=***`None`***, **`path`**:`PathOrStr`=***`'.'`***, **`collate_fn`**:`Callable`=***`'data_collate'`***, **`no_check`**:`bool`=***`False`***) :: [`DataBunch`](/basic_data.html#DataBunch)\n",
"\n",
"No tests found for TextDataBunch. To contribute a test please refer to this guide and this discussion.
from_folder[source][test]from_folder(**`path`**:`PathOrStr`, **`train`**:`str`=***`'train'`***, **`valid`**:`str`=***`'valid'`***, **`test`**:`Optional`\\[`str`\\]=***`None`***, **`classes`**:`ArgStar`=***`None`***, **`tokenizer`**:[`Tokenizer`](/text.transform.html#Tokenizer)=***`None`***, **`vocab`**:[`Vocab`](/text.transform.html#Vocab)=***`None`***, **`chunksize`**:`int`=***`10000`***, **`max_vocab`**:`int`=***`60000`***, **`min_freq`**:`int`=***`2`***, **`mark_fields`**:`bool`=***`False`***, **`include_bos`**:`bool`=***`True`***, **`include_eos`**:`bool`=***`False`***, **\\*\\*`kwargs`**)\n",
"\n",
"train, `valid` and maybe `test` folders. Text files in the train and `valid` folders should be placed in subdirectories according to their classes (not applicable for a language model). `tokenizer` will be used to parse those texts into tokens.\n",
"\n",
"You can pass a specific `vocab` for the numericalization step (if you are building a classifier from a language model you fine-tuned for instance). kwargs will be split between the [`TextDataset`](/text.data.html#TextDataset) function and to the class initialization, you can precise there parameters such as `max_vocab`, `chunksize`, `min_freq`, `n_labels` (see the [`TextDataset`](/text.data.html#TextDataset) documentation) or `bs`, `bptt` and `pad_idx` (see the sections LM data and classifier data)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide_input": true
},
"outputs": [
{
"data": {
"text/markdown": [
"from_csv[source][test]from_csv(**`path`**:`PathOrStr`, **`csv_name`**, **`valid_pct`**:`float`=***`0.2`***, **`test`**:`Optional`\\[`str`\\]=***`None`***, **`tokenizer`**:[`Tokenizer`](/text.transform.html#Tokenizer)=***`None`***, **`vocab`**:[`Vocab`](/text.transform.html#Vocab)=***`None`***, **`classes`**:`StrList`=***`None`***, **`delimiter`**:`str`=***`None`***, **`header`**=***`'infer'`***, **`text_cols`**:`IntsOrStrs`=***`1`***, **`label_cols`**:`IntsOrStrs`=***`0`***, **`label_delim`**:`str`=***`None`***, **`chunksize`**:`int`=***`10000`***, **`max_vocab`**:`int`=***`60000`***, **`min_freq`**:`int`=***`2`***, **`mark_fields`**:`bool`=***`False`***, **`include_bos`**:`bool`=***`True`***, **`include_eos`**:`bool`=***`False`***, **\\*\\*`kwargs`**) → [`DataBunch`](/basic_data.html#DataBunch)\n",
"\n",
"from_df[source][test]from_df(**`path`**:`PathOrStr`, **`train_df`**:`DataFrame`, **`valid_df`**:`DataFrame`, **`test_df`**:`OptDataFrame`=***`None`***, **`tokenizer`**:[`Tokenizer`](/text.transform.html#Tokenizer)=***`None`***, **`vocab`**:[`Vocab`](/text.transform.html#Vocab)=***`None`***, **`classes`**:`StrList`=***`None`***, **`text_cols`**:`IntsOrStrs`=***`1`***, **`label_cols`**:`IntsOrStrs`=***`0`***, **`label_delim`**:`str`=***`None`***, **`chunksize`**:`int`=***`10000`***, **`max_vocab`**:`int`=***`60000`***, **`min_freq`**:`int`=***`2`***, **`mark_fields`**:`bool`=***`False`***, **`include_bos`**:`bool`=***`True`***, **`include_eos`**:`bool`=***`False`***, **\\*\\*`kwargs`**) → [`DataBunch`](/basic_data.html#DataBunch)\n",
"\n",
"Tests found for from_df:
pytest -sv tests/test_text_data.py::test_from_csv_and_from_df [source]Some other tests where from_df is used:
pytest -sv tests/test_text_data.py::test_backwards_cls_databunch [source]pytest -sv tests/test_text_data.py::test_load_and_save_test [source]pytest -sv tests/test_text_data.py::test_regression [source]pytest -sv tests/test_text_data.py::test_should_load_backwards_lm_1 [source]pytest -sv tests/test_text_data.py::test_should_load_backwards_lm_2 [source]To run tests please refer to this guide.
from_tokens[source][test]from_tokens(**`path`**:`PathOrStr`, **`trn_tok`**:`Tokens`, **`trn_lbls`**:`Collection`\\[`Union`\\[`int`, `float`\\]\\], **`val_tok`**:`Tokens`, **`val_lbls`**:`Collection`\\[`Union`\\[`int`, `float`\\]\\], **`vocab`**:[`Vocab`](/text.transform.html#Vocab)=***`None`***, **`tst_tok`**:`Tokens`=***`None`***, **`classes`**:`ArgStar`=***`None`***, **`max_vocab`**:`int`=***`60000`***, **`min_freq`**:`int`=***`3`***, **\\*\\*`kwargs`**) → [`DataBunch`](/basic_data.html#DataBunch)\n",
"\n",
"No tests found for from_tokens. To contribute a test please refer to this guide and this discussion.
from_ids[source][test]from_ids(**`path`**:`PathOrStr`, **`vocab`**:[`Vocab`](/text.transform.html#Vocab), **`train_ids`**:`Collection`\\[`Collection`\\[`int`\\]\\], **`valid_ids`**:`Collection`\\[`Collection`\\[`int`\\]\\], **`test_ids`**:`Collection`\\[`Collection`\\[`int`\\]\\]=***`None`***, **`train_lbls`**:`Collection`\\[`Union`\\[`int`, `float`\\]\\]=***`None`***, **`valid_lbls`**:`Collection`\\[`Union`\\[`int`, `float`\\]\\]=***`None`***, **`classes`**:`ArgStar`=***`None`***, **`processor`**:[`PreProcessor`](/data_block.html#PreProcessor)=***`None`***, **\\*\\*`kwargs`**) → [`DataBunch`](/basic_data.html#DataBunch)\n",
"\n",
"Tests found for from_ids:
pytest -sv tests/test_text_data.py::test_from_ids_exports_classes [source]pytest -sv tests/test_text_data.py::test_from_ids_works_for_equally_length_sentences [source]pytest -sv tests/test_text_data.py::test_from_ids_works_for_variable_length_sentences [source]To run tests please refer to this guide.
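`from_ids` skips tokenization and numericalization entirely: you hand it ids plus the vocabulary that maps them back to words. A minimal pure-Python sketch of that mapping (`MiniVocab` is a stand-in for illustration, not the real `Vocab`, though `itos`/`stoi` follow the fastai naming convention):

```python
# itos maps id -> token, stoi maps token -> id.
class MiniVocab:
    def __init__(self, itos):
        self.itos = itos
        self.stoi = {tok: i for i, tok in enumerate(itos)}

    def numericalize(self, tokens, unk=0):
        # Unknown tokens fall back to the xxunk id.
        return [self.stoi.get(t, unk) for t in tokens]

    def textify(self, ids):
        return " ".join(self.itos[i] for i in ids)

vocab = MiniVocab(["xxunk", "xxbos", "the", "movie", "was", "great"])
ids = vocab.numericalize(["xxbos", "the", "movie", "was", "great"])
```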
load[source][test]load(**`path`**:`PathOrStr`, **`cache_name`**:`PathOrStr`=***`'tmp'`***, **`processor`**:[`PreProcessor`](/data_block.html#PreProcessor)=***`None`***, **\\*\\*`kwargs`**)\n",
"\n",
"No tests found for load. To contribute a test please refer to this guide and this discussion.
| | label | text | is_valid |
|---|---|---|---|
| 0 | negative | Un-bleeping-believable! Meg Ryan doesn't even ... | False |
| 1 | positive | This is a extremely well-made film. The acting... | False |
| 2 | negative | Every once in a long while a movie will come a... | False |
| 3 | positive | Name just says it all. I watched this movie wi... | False |
| 4 | negative | This movie succeeds at being one of the most u... | False |
class Text[source][test]Text(**`ids`**, **`text`**) :: [`ItemBase`](/core.html#ItemBase)\n",
"\n",
"No tests found for Text. To contribute a test please refer to this guide and this discussion.
Basic item for text data in numericalized `ids`. "
],
"text/plain": [
"class TextList[source][test]TextList(**`items`**:`Iterator`\\[`T_co`\\], **`vocab`**:[`Vocab`](/text.transform.html#Vocab)=***`None`***, **`pad_idx`**:`int`=***`1`***, **`sep`**=***`' '`***, **\\*\\*`kwargs`**) :: [`ItemList`](/data_block.html#ItemList)\n",
"\n",
"label_for_lm[source][test]label_for_lm(**\\*\\*`kwargs`**)\n",
"\n",
"No tests found for label_for_lm. To contribute a test please refer to this guide and this discussion.
from_folder[source][test]from_folder(**`path`**:`PathOrStr`=***`'.'`***, **`extensions`**:`StrList`=***`{'.txt'}`***, **`vocab`**:[`Vocab`](/text.transform.html#Vocab)=***`None`***, **`processor`**:[`PreProcessor`](/data_block.html#PreProcessor)=***`None`***, **\\*\\*`kwargs`**) → `TextList`\n",
"\n",
"show_xys[source][test]show_xys(**`xs`**, **`ys`**, **`max_len`**:`int`=***`70`***)\n",
"\n",
"No tests found for show_xys. To contribute a test please refer to this guide and this discussion.
show_xyzs[source][test]show_xyzs(**`xs`**, **`ys`**, **`zs`**, **`max_len`**:`int`=***`70`***)\n",
"\n",
"No tests found for show_xyzs. To contribute a test please refer to this guide and this discussion.
class OpenFileProcessor[source][test]OpenFileProcessor(**`ds`**:`Collection`\\[`T_co`\\]=***`None`***) :: [`PreProcessor`](/data_block.html#PreProcessor)\n",
"\n",
"No tests found for OpenFileProcessor. To contribute a test please refer to this guide and this discussion.
open_text[source][test]open_text(**`fn`**:`PathOrStr`, **`enc`**=***`'utf-8'`***)\n",
"\n",
"No tests found for open_text. To contribute a test please refer to this guide and this discussion.
class TokenizeProcessor[source][test]TokenizeProcessor(**`ds`**:[`ItemList`](/data_block.html#ItemList)=***`None`***, **`tokenizer`**:[`Tokenizer`](/text.transform.html#Tokenizer)=***`None`***, **`chunksize`**:`int`=***`10000`***, **`mark_fields`**:`bool`=***`False`***, **`include_bos`**:`bool`=***`True`***, **`include_eos`**:`bool`=***`False`***) :: [`PreProcessor`](/data_block.html#PreProcessor)\n",
"\n",
"No tests found for TokenizeProcessor. To contribute a test please refer to this guide and this discussion.
class NumericalizeProcessor[source][test]NumericalizeProcessor(**`ds`**:[`ItemList`](/data_block.html#ItemList)=***`None`***, **`vocab`**:[`Vocab`](/text.transform.html#Vocab)=***`None`***, **`max_vocab`**:`int`=***`60000`***, **`min_freq`**:`int`=***`3`***) :: [`PreProcessor`](/data_block.html#PreProcessor)\n",
"\n",
"No tests found for NumericalizeProcessor. To contribute a test please refer to this guide and this discussion.
class SPProcessor[source][test]SPProcessor(**`ds`**:[`ItemList`](/data_block.html#ItemList)=***`None`***, **`pre_rules`**:`ListRules`=***`None`***, **`post_rules`**:`ListRules`=***`None`***, **`vocab_sz`**:`int`=***`None`***, **`max_vocab_sz`**:`int`=***`30000`***, **`model_type`**:`str`=***`'unigram'`***, **`max_sentence_len`**:`int`=***`20480`***, **`lang`**=***`'en'`***, **`char_coverage`**=***`None`***, **`tmp_dir`**=***`'tmp'`***, **`mark_fields`**:`bool`=***`False`***, **`include_bos`**:`bool`=***`True`***, **`include_eos`**:`bool`=***`False`***, **`sp_model`**=***`None`***, **`sp_vocab`**=***`None`***, **`n_cpus`**:`int`=***`None`***, **`enc`**=***`'utf8'`***) :: [`PreProcessor`](/data_block.html#PreProcessor)\n",
"\n",
"No tests found for SPProcessor. To contribute a test please refer to this guide and this discussion.
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | crew | that | he | can | trust | to | help | him | pull | it | off | and | get | his | xxunk | None | None |
| 1 | want | a | good | family | movie | , | this | might | do | . | xxmaj | it | is | clean | . | None | None |
| 2 | director | of | many | bad | xxunk | ) | tries | to | cover | the | info | up | , | but | goo | None | None |
| 3 | film | , | and | the | xxunk | xxunk | of | the | villain | , | humorous | or | not | , | are | None | None |
| 4 | cole | in | the | beginning | are | meant | to | draw | comparisons | which | leave | the | audience | xxunk | . | None | None |
| 5 | witness | xxmaj | brian | dealing | with | his | situation | through | first | , | primitive | means | , | and | then | None | None |
| 6 | film | , | or | not | . | \n | \n | | xxmaj | this | film | . | xxmaj | film | ? | xxmaj | this |
| 7 | xxunk | sitting | through | this | bomb | . | xxmaj | the | crew | member | who | was | in | charge | of | None | None |
| 8 | this | film | is | viewed | as | non | xxup | xxunk | but | there | is | a | speech | by | xxmaj | None | None |
| 9 | mention | the | pace | of | the | movie | . | xxmaj | to | my | mind | , | this | new | version | None | None |
| 10 | of | yours | ! | ' | \n | \n | | xxmaj | director | xxmaj | xxunk | xxmaj | xxunk | , | who | is | xxunk |
| 11 | pair | , | xxmaj | harry | xxmaj | michell | as | xxmaj | harry | , | xxmaj | rosie | xxmaj | michell | as | None | None |
| 12 | cares | who | lives | and | who | dies | , | i | 'll | be | shocked | . | xxmaj | the | same | None | None |
| 13 | is | incredibly | stupid | , | with | a | detective | trying | to | track | down | a | suspected | serial | killer | None | None |
| 14 | independent | film | was | one | of | the | best | films | at | the | tall | grass | film | festival | that | None | None |
class LanguageModelPreLoader[source][test]LanguageModelPreLoader(**`dataset`**:[`LabelList`](/data_block.html#LabelList), **`lengths`**:`Collection`\\[`int`\\]=***`None`***, **`bs`**:`int`=***`32`***, **`bptt`**:`int`=***`70`***, **`backwards`**:`bool`=***`False`***, **`shuffle`**:`bool`=***`False`***) :: [`Callback`](/callback.html#Callback)\n",
"\n",
"No tests found for LanguageModelPreLoader. To contribute a test please refer to this guide and this discussion.
class SortSampler[source][test]SortSampler(**`data_source`**:`NPArrayList`, **`key`**:`KeyFunc`) :: [`Sampler`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Sampler)\n",
"\n",
"class SortishSampler[source][test]SortishSampler(**`data_source`**:`NPArrayList`, **`key`**:`KeyFunc`, **`bs`**:`int`) :: [`Sampler`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Sampler)\n",
"\n",
"pad_collate[source][test]pad_collate(**`samples`**:`BatchSamples`, **`pad_idx`**:`int`=***`1`***, **`pad_first`**:`bool`=***`True`***, **`backwards`**:`bool`=***`False`***) → `Tuple`\\[`LongTensor`, `LongTensor`\\]\n",
"\n",
"No tests found for pad_collate. To contribute a test please refer to this guide and this discussion.
new[source][test]new(**`items`**:`Iterator`\\[`T_co`\\], **`processor`**:`Union`\\[[`PreProcessor`](/data_block.html#PreProcessor), `Collection`\\[[`PreProcessor`](/data_block.html#PreProcessor)\\]\\]=***`None`***, **\\*\\*`kwargs`**) → `ItemList`\n",
"\n",
"No tests found for new. To contribute a test please refer to this guide and this discussion.
get[source][test]get(**`i`**)\n",
"\n",
"No tests found for get. To contribute a test please refer to this guide and this discussion.
process_one[source][test]process_one(**`item`**)\n",
"\n",
"No tests found for process_one. To contribute a test please refer to this guide and this discussion.
process[source][test]process(**`ds`**)\n",
"\n",
"No tests found for process. To contribute a test please refer to this guide and this discussion.
process_one[source][test]process_one(**`item`**)\n",
"\n",
"No tests found for process_one. To contribute a test please refer to this guide and this discussion.
process[source][test]process(**`ds`**)\n",
"\n",
"No tests found for process. To contribute a test please refer to this guide and this discussion.
process_one[source][test]process_one(**`item`**)\n",
"\n",
"No tests found for process_one. To contribute a test please refer to this guide and this discussion.
reconstruct[source][test]reconstruct(**`t`**:`Tensor`)\n",
"\n",
"No tests found for reconstruct. To contribute a test please refer to this guide and this discussion.
on_epoch_begin[source][test]on_epoch_begin(**\\*\\*`kwargs`**)\n",
"\n",
"No tests found for on_epoch_begin. To contribute a test please refer to this guide and this discussion.
on_epoch_end[source][test]on_epoch_end(**\\*\\*`kwargs`**)\n",
"\n",
"No tests found for on_epoch_end. To contribute a test please refer to this guide and this discussion.
class LMLabelList[source][test]LMLabelList(**`items`**:`Iterator`\\[`T_co`\\], **\\*\\*`kwargs`**) :: [`EmptyLabelList`](/data_block.html#EmptyLabelList)\n",
"\n",
"No tests found for LMLabelList. To contribute a test please refer to this guide and this discussion.
allocate_buffers[source][test]allocate_buffers()\n",
"\n",
"No tests found for allocate_buffers. To contribute a test please refer to this guide and this discussion.
shuffle[source][test]shuffle()\n",
"\n",
"No tests found for shuffle. To contribute a test please refer to this guide and this discussion.
fill_row[source][test]fill_row(**`forward`**, **`items`**, **`idx`**, **`row`**, **`ro`**, **`ri`**, **`overlap`**, **`lengths`**)\n",
"\n",
"No tests found for fill_row. To contribute a test please refer to this guide and this discussion.