{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## NLP datasets" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [], "source": [ "from fastai.gen_doc.nbdoc import *\n", "from fastai.text import * \n", "from fastai.gen_doc.nbdoc import *\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This module contains the [`TextDataset`](/text.data.html#TextDataset) class, which is the main dataset you should use for your NLP tasks. It automatically does the preprocessing steps described in [`text.transform`](/text.transform.html#text.transform). It also contains all the functions to quickly get a [`TextDataBunch`](/text.data.html#TextDataBunch) ready." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Quickly assemble your data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You should get your data in one of the following formats to make the most of the fastai library and use one of the factory methods of one of the [`TextDataBunch`](/text.data.html#TextDataBunch) classes:\n", "- raw text files in folders train, valid, test in an ImageNet style,\n", "- a csv where some column(s) gives the label(s) and the following one the associated text,\n", "- a dataframe structured the same way,\n", "- tokens and labels arrays,\n", "- ids, vocabulary (correspondence id to word) and labels.\n", "\n", "If you are assembling the data for a language model, you should define your labels as always 0 to respect those formats. The first time you create a [`DataBunch`](/basic_data.html#DataBunch) with one of those functions, your data will be preprocessed automatically. You can save it, so that the next time you call it is almost instantaneous. \n", "\n", "Below are the classes that help assembling the raw data in a [`DataBunch`](/basic_data.html#DataBunch) suitable for NLP." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "
### class TextLMDataBunch

(**`train_dl`**:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), **`valid_dl`**:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), **`fix_dl`**:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)=***`None`***, **`test_dl`**:`Optional`\[[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)\]=***`None`***, **`device`**:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=***`None`***, **`dl_tfms`**:`Optional`\[`Collection`\[`Callable`\]\]=***`None`***, **`path`**:`PathOrStr`=***`'.'`***, **`collate_fn`**:`Callable`=***`'data_collate'`***, **`no_check`**:`bool`=***`False`***) :: [`TextDataBunch`](/text.data.html#TextDataBunch)

Tests where `TextLMDataBunch` is used:

- `pytest -sv tests/test_text_data.py::test_from_csv_and_from_df`
- `pytest -sv tests/test_text_data.py::test_should_load_backwards_lm_1`
- `pytest -sv tests/test_text_data.py::test_should_load_backwards_lm_2`
### create

(**`train_ds`**, **`valid_ds`**, **`test_ds`**=***`None`***, **`path`**:`PathOrStr`=***`'.'`***, **`no_check`**:`bool`=***`False`***, **`bs`**=***`64`***, **`val_bs`**:`int`=***`None`***, **`num_workers`**:`int`=***`0`***, **`device`**:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=***`None`***, **`collate_fn`**:`Callable`=***`'data_collate'`***, **`dl_tfms`**:`Optional`\[`Collection`\[`Callable`\]\]=***`None`***, **`bptt`**:`int`=***`70`***, **`backwards`**:`bool`=***`False`***, **\*\*`dl_kwargs`**) → [`DataBunch`](/basic_data.html#DataBunch)
### class TextClasDataBunch

(**`train_dl`**:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), **`valid_dl`**:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), **`fix_dl`**:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)=***`None`***, **`test_dl`**:`Optional`\[[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)\]=***`None`***, **`device`**:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=***`None`***, **`dl_tfms`**:`Optional`\[`Collection`\[`Callable`\]\]=***`None`***, **`path`**:`PathOrStr`=***`'.'`***, **`collate_fn`**:`Callable`=***`'data_collate'`***, **`no_check`**:`bool`=***`False`***) :: [`TextDataBunch`](/text.data.html#TextDataBunch)

Tests where `TextClasDataBunch` is used:

- `pytest -sv tests/test_text_data.py::test_backwards_cls_databunch`
- `pytest -sv tests/test_text_data.py::test_from_csv_and_from_df`
- `pytest -sv tests/test_text_data.py::test_from_ids_works_for_equally_length_sentences`
- `pytest -sv tests/test_text_data.py::test_from_ids_works_for_variable_length_sentences`
- `pytest -sv tests/test_text_data.py::test_load_and_save_test`
### create

(**`train_ds`**, **`valid_ds`**, **`test_ds`**=***`None`***, **`path`**:`PathOrStr`=***`'.'`***, **`bs`**:`int`=***`32`***, **`val_bs`**:`int`=***`None`***, **`pad_idx`**=***`1`***, **`pad_first`**=***`True`***, **`device`**:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=***`None`***, **`no_check`**:`bool`=***`False`***, **`backwards`**:`bool`=***`False`***, **`dl_tfms`**:`Optional`\[`Collection`\[`Callable`\]\]=***`None`***, **\*\*`dl_kwargs`**) → [`DataBunch`](/basic_data.html#DataBunch)
### class TextDataBunch

(**`train_dl`**:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), **`valid_dl`**:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), **`fix_dl`**:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)=***`None`***, **`test_dl`**:`Optional`\[[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)\]=***`None`***, **`device`**:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=***`None`***, **`dl_tfms`**:`Optional`\[`Collection`\[`Callable`\]\]=***`None`***, **`path`**:`PathOrStr`=***`'.'`***, **`collate_fn`**:`Callable`=***`'data_collate'`***, **`no_check`**:`bool`=***`False`***) :: [`DataBunch`](/basic_data.html#DataBunch)
### from_folder

(**`path`**:`PathOrStr`, **`train`**:`str`=***`'train'`***, **`valid`**:`str`=***`'valid'`***, **`test`**:`Optional`\[`str`\]=***`None`***, **`classes`**:`ArgStar`=***`None`***, **`tokenizer`**:[`Tokenizer`](/text.transform.html#Tokenizer)=***`None`***, **`vocab`**:[`Vocab`](/text.transform.html#Vocab)=***`None`***, **`chunksize`**:`int`=***`10000`***, **`max_vocab`**:`int`=***`60000`***, **`min_freq`**:`int`=***`2`***, **`mark_fields`**:`bool`=***`False`***, **`include_bos`**:`bool`=***`True`***, **`include_eos`**:`bool`=***`False`***, **\*\*`kwargs`**)

This method expects texts in `train`, `valid` and maybe `test` folders. Text files in the `train` and `valid` folders should be placed in subdirectories according to their classes (not applicable for a language model). `tokenizer` will be used to parse those texts into tokens.

You can pass a specific `vocab` for the numericalization step (for instance if you are building a classifier from a fine-tuned language model). `kwargs` will be split between the [`TextDataset`](/text.data.html#TextDataset) function and the class initialization; there you can specify parameters such as `max_vocab`, `chunksize`, `min_freq`, `n_labels` (see the [`TextDataset`](/text.data.html#TextDataset) documentation) or `bs`, `bptt` and `pad_idx` (see the sections on LM data and classifier data).
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"hide_input": true
},
"outputs": [
{
"data": {
"text/markdown": [
"from_csv
[source][test]from_csv
(**`path`**:`PathOrStr`, **`csv_name`**, **`valid_pct`**:`float`=***`0.2`***, **`test`**:`Optional`\\[`str`\\]=***`None`***, **`tokenizer`**:[`Tokenizer`](/text.transform.html#Tokenizer)=***`None`***, **`vocab`**:[`Vocab`](/text.transform.html#Vocab)=***`None`***, **`classes`**:`StrList`=***`None`***, **`delimiter`**:`str`=***`None`***, **`header`**=***`'infer'`***, **`text_cols`**:`IntsOrStrs`=***`1`***, **`label_cols`**:`IntsOrStrs`=***`0`***, **`label_delim`**:`str`=***`None`***, **`chunksize`**:`int`=***`10000`***, **`max_vocab`**:`int`=***`60000`***, **`min_freq`**:`int`=***`2`***, **`mark_fields`**:`bool`=***`False`***, **`include_bos`**:`bool`=***`True`***, **`include_eos`**:`bool`=***`False`***, **\\*\\*`kwargs`**) → [`DataBunch`](/basic_data.html#DataBunch)\n",
"\n",
"from_df
[source][test]from_df
(**`path`**:`PathOrStr`, **`train_df`**:`DataFrame`, **`valid_df`**:`DataFrame`, **`test_df`**:`OptDataFrame`=***`None`***, **`tokenizer`**:[`Tokenizer`](/text.transform.html#Tokenizer)=***`None`***, **`vocab`**:[`Vocab`](/text.transform.html#Vocab)=***`None`***, **`classes`**:`StrList`=***`None`***, **`text_cols`**:`IntsOrStrs`=***`1`***, **`label_cols`**:`IntsOrStrs`=***`0`***, **`label_delim`**:`str`=***`None`***, **`chunksize`**:`int`=***`10000`***, **`max_vocab`**:`int`=***`60000`***, **`min_freq`**:`int`=***`2`***, **`mark_fields`**:`bool`=***`False`***, **`include_bos`**:`bool`=***`True`***, **`include_eos`**:`bool`=***`False`***, **\\*\\*`kwargs`**) → [`DataBunch`](/basic_data.html#DataBunch)\n",
"\n",
"Tests found for from_df
:
pytest -sv tests/test_text_data.py::test_from_csv_and_from_df
[source]Some other tests where from_df
is used:
pytest -sv tests/test_text_data.py::test_backwards_cls_databunch
[source]pytest -sv tests/test_text_data.py::test_load_and_save_test
[source]pytest -sv tests/test_text_data.py::test_regression
[source]pytest -sv tests/test_text_data.py::test_should_load_backwards_lm_1
[source]pytest -sv tests/test_text_data.py::test_should_load_backwards_lm_2
[source]To run tests please refer to this guide.
### from_tokens

(**`path`**:`PathOrStr`, **`trn_tok`**:`Tokens`, **`trn_lbls`**:`Collection`\[`Union`\[`int`, `float`\]\], **`val_tok`**:`Tokens`, **`val_lbls`**:`Collection`\[`Union`\[`int`, `float`\]\], **`vocab`**:[`Vocab`](/text.transform.html#Vocab)=***`None`***, **`tst_tok`**:`Tokens`=***`None`***, **`classes`**:`ArgStar`=***`None`***, **`max_vocab`**:`int`=***`60000`***, **`min_freq`**:`int`=***`3`***, **\*\*`kwargs`**) → [`DataBunch`](/basic_data.html#DataBunch)
### from_ids

(**`path`**:`PathOrStr`, **`vocab`**:[`Vocab`](/text.transform.html#Vocab), **`train_ids`**:`Collection`\[`Collection`\[`int`\]\], **`valid_ids`**:`Collection`\[`Collection`\[`int`\]\], **`test_ids`**:`Collection`\[`Collection`\[`int`\]\]=***`None`***, **`train_lbls`**:`Collection`\[`Union`\[`int`, `float`\]\]=***`None`***, **`valid_lbls`**:`Collection`\[`Union`\[`int`, `float`\]\]=***`None`***, **`classes`**:`ArgStar`=***`None`***, **`processor`**:[`PreProcessor`](/data_block.html#PreProcessor)=***`None`***, **\*\*`kwargs`**) → [`DataBunch`](/basic_data.html#DataBunch)
"load
[source][test]load
(**`path`**:`PathOrStr`, **`cache_name`**:`PathOrStr`=***`'tmp'`***, **`processor`**:[`PreProcessor`](/data_block.html#PreProcessor)=***`None`***, **\\*\\*`kwargs`**)\n",
"\n",
"No tests found for load
. To contribute a test please refer to this guide and this discussion.
\n", " | label | \n", "text | \n", "is_valid | \n", "
---|---|---|---|
0 | \n", "negative | \n", "Un-bleeping-believable! Meg Ryan doesn't even ... | \n", "False | \n", "
1 | \n", "positive | \n", "This is a extremely well-made film. The acting... | \n", "False | \n", "
2 | \n", "negative | \n", "Every once in a long while a movie will come a... | \n", "False | \n", "
3 | \n", "positive | \n", "Name just says it all. I watched this movie wi... | \n", "False | \n", "
4 | \n", "negative | \n", "This movie succeeds at being one of the most u... | \n", "False | \n", "
### class Text

(**`ids`**, **`text`**) :: [`ItemBase`](/core.html#ItemBase)

Basic item for `text` data in numericalized `ids`.
],
"text/plain": [
"class
TextList
[source][test]TextList
(**`items`**:`Iterator`\\[`T_co`\\], **`vocab`**:[`Vocab`](/text.transform.html#Vocab)=***`None`***, **`pad_idx`**:`int`=***`1`***, **\\*\\*`kwargs`**) :: [`ItemList`](/data_block.html#ItemList)\n",
"\n",
"label_for_lm
[source][test]label_for_lm
(**\\*\\*`kwargs`**)\n",
"\n",
"No tests found for label_for_lm
. To contribute a test please refer to this guide and this discussion.
### from_folder

(**`path`**:`PathOrStr`=***`'.'`***, **`extensions`**:`StrList`=***`{'.txt'}`***, **`vocab`**:[`Vocab`](/text.transform.html#Vocab)=***`None`***, **`processor`**:[`PreProcessor`](/data_block.html#PreProcessor)=***`None`***, **\*\*`kwargs`**) → `TextList`

### show_xys

(**`xs`**, **`ys`**, **`max_len`**:`int`=***`70`***)

### show_xyzs

(**`xs`**, **`ys`**, **`zs`**, **`max_len`**:`int`=***`70`***)
### class OpenFileProcessor

(**`ds`**:`Collection`\[`T_co`\]=***`None`***) :: [`PreProcessor`](/data_block.html#PreProcessor)

### open_text

(**`fn`**:`PathOrStr`, **`enc`**=***`'utf-8'`***)
### class TokenizeProcessor

(**`ds`**:[`ItemList`](/data_block.html#ItemList)=***`None`***, **`tokenizer`**:[`Tokenizer`](/text.transform.html#Tokenizer)=***`None`***, **`chunksize`**:`int`=***`10000`***, **`mark_fields`**:`bool`=***`False`***, **`include_bos`**:`bool`=***`True`***, **`include_eos`**:`bool`=***`False`***) :: [`PreProcessor`](/data_block.html#PreProcessor)
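The effect of `include_bos` and `include_eos` can be sketched in plain Python; this is not the fastai implementation, just the behaviour of the two flags, which add the special `xxbos`/`xxeos` markers around each text:

```python
# Sketch: wrap a tokenized text with beginning/end-of-stream markers,
# as the include_bos / include_eos flags request.
def add_special(tokens, include_bos=True, include_eos=False):
    out = list(tokens)
    if include_bos:
        out = ["xxbos"] + out
    if include_eos:
        out = out + ["xxeos"]
    return out

print(add_special(["great", "movie"]))                    # ['xxbos', 'great', 'movie']
print(add_special(["great", "movie"], include_eos=True))  # ['xxbos', 'great', 'movie', 'xxeos']
```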
### class NumericalizeProcessor

(**`ds`**:[`ItemList`](/data_block.html#ItemList)=***`None`***, **`vocab`**:[`Vocab`](/text.transform.html#Vocab)=***`None`***, **`max_vocab`**:`int`=***`60000`***, **`min_freq`**:`int`=***`3`***) :: [`PreProcessor`](/data_block.html#PreProcessor)
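The vocabulary-building rule behind `max_vocab` and `min_freq` can be sketched as follows (a sketch of the idea, not the actual `NumericalizeProcessor` code): keep at most `max_vocab` tokens, each of which appears at least `min_freq` times.

```python
from collections import Counter

# Sketch: build a vocabulary from tokenized texts, capped at max_vocab
# entries and dropping tokens rarer than min_freq.
def build_vocab(texts, max_vocab=60000, min_freq=3):
    counts = Counter(tok for t in texts for tok in t)
    return [tok for tok, c in counts.most_common(max_vocab) if c >= min_freq]

toks = [["a", "b", "a", "c"], ["a", "b", "c"], ["a", "d"]]
print(build_vocab(toks, min_freq=2))  # ['a', 'b', 'c']  ('d' appears only once)
```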
### class SPProcessor

(**`ds`**:[`ItemList`](/data_block.html#ItemList)=***`None`***, **`pre_rules`**:`ListRules`=***`None`***, **`post_rules`**:`ListRules`=***`None`***, **`vocab_sz`**:`int`=***`None`***, **`max_vocab_sz`**:`int`=***`30000`***, **`model_type`**:`str`=***`'unigram'`***, **`max_sentence_len`**:`int`=***`20480`***, **`lang`**=***`'en'`***, **`char_coverage`**=***`None`***, **`tmp_dir`**=***`'tmp'`***, **`mark_fields`**:`bool`=***`False`***, **`include_bos`**:`bool`=***`True`***, **`include_eos`**:`bool`=***`False`***, **`sp_model`**=***`None`***, **`sp_vocab`**=***`None`***) :: [`PreProcessor`](/data_block.html#PreProcessor)
\n", " | 0 | \n", "1 | \n", "2 | \n", "3 | \n", "4 | \n", "5 | \n", "6 | \n", "7 | \n", "8 | \n", "9 | \n", "10 | \n", "11 | \n", "12 | \n", "13 | \n", "14 | \n", "15 | \n", "16 | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "crew | \n", "that | \n", "he | \n", "can | \n", "trust | \n", "to | \n", "help | \n", "him | \n", "pull | \n", "it | \n", "off | \n", "and | \n", "get | \n", "his | \n", "xxunk | \n", "None | \n", "None | \n", "
1 | \n", "want | \n", "a | \n", "good | \n", "family | \n", "movie | \n", ", | \n", "this | \n", "might | \n", "do | \n", ". | \n", "xxmaj | \n", "it | \n", "is | \n", "clean | \n", ". | \n", "None | \n", "None | \n", "
2 | \n", "director | \n", "of | \n", "many | \n", "bad | \n", "xxunk | \n", ") | \n", "tries | \n", "to | \n", "cover | \n", "the | \n", "info | \n", "up | \n", ", | \n", "but | \n", "goo | \n", "None | \n", "None | \n", "
3 | \n", "film | \n", ", | \n", "and | \n", "the | \n", "xxunk | \n", "xxunk | \n", "of | \n", "the | \n", "villain | \n", ", | \n", "humorous | \n", "or | \n", "not | \n", ", | \n", "are | \n", "None | \n", "None | \n", "
4 | \n", "cole | \n", "in | \n", "the | \n", "beginning | \n", "are | \n", "meant | \n", "to | \n", "draw | \n", "comparisons | \n", "which | \n", "leave | \n", "the | \n", "audience | \n", "xxunk | \n", ". | \n", "None | \n", "None | \n", "
5 | \n", "witness | \n", "xxmaj | \n", "brian | \n", "dealing | \n", "with | \n", "his | \n", "situation | \n", "through | \n", "first | \n", ", | \n", "primitive | \n", "means | \n", ", | \n", "and | \n", "then | \n", "None | \n", "None | \n", "
6 | \n", "film | \n", ", | \n", "or | \n", "not | \n", ". | \n", "\\n | \n", "\\n | \n", "\n", " | xxmaj | \n", "this | \n", "film | \n", ". | \n", "xxmaj | \n", "film | \n", "? | \n", "xxmaj | \n", "this | \n", "
7 | \n", "xxunk | \n", "sitting | \n", "through | \n", "this | \n", "bomb | \n", ". | \n", "xxmaj | \n", "the | \n", "crew | \n", "member | \n", "who | \n", "was | \n", "in | \n", "charge | \n", "of | \n", "None | \n", "None | \n", "
8 | \n", "this | \n", "film | \n", "is | \n", "viewed | \n", "as | \n", "non | \n", "xxup | \n", "xxunk | \n", "but | \n", "there | \n", "is | \n", "a | \n", "speech | \n", "by | \n", "xxmaj | \n", "None | \n", "None | \n", "
9 | \n", "mention | \n", "the | \n", "pace | \n", "of | \n", "the | \n", "movie | \n", ". | \n", "xxmaj | \n", "to | \n", "my | \n", "mind | \n", ", | \n", "this | \n", "new | \n", "version | \n", "None | \n", "None | \n", "
10 | \n", "of | \n", "yours | \n", "! | \n", "' | \n", "\\n | \n", "\\n | \n", "\n", " | xxmaj | \n", "director | \n", "xxmaj | \n", "xxunk | \n", "xxmaj | \n", "xxunk | \n", ", | \n", "who | \n", "is | \n", "xxunk | \n", "
11 | \n", "pair | \n", ", | \n", "xxmaj | \n", "harry | \n", "xxmaj | \n", "michell | \n", "as | \n", "xxmaj | \n", "harry | \n", ", | \n", "xxmaj | \n", "rosie | \n", "xxmaj | \n", "michell | \n", "as | \n", "None | \n", "None | \n", "
12 | \n", "cares | \n", "who | \n", "lives | \n", "and | \n", "who | \n", "dies | \n", ", | \n", "i | \n", "'ll | \n", "be | \n", "shocked | \n", ". | \n", "xxmaj | \n", "the | \n", "same | \n", "None | \n", "None | \n", "
13 | \n", "is | \n", "incredibly | \n", "stupid | \n", ", | \n", "with | \n", "a | \n", "detective | \n", "trying | \n", "to | \n", "track | \n", "down | \n", "a | \n", "suspected | \n", "serial | \n", "killer | \n", "None | \n", "None | \n", "
14 | \n", "independent | \n", "film | \n", "was | \n", "one | \n", "of | \n", "the | \n", "best | \n", "films | \n", "at | \n", "the | \n", "tall | \n", "grass | \n", "film | \n", "festival | \n", "that | \n", "None | \n", "None | \n", "
### class LanguageModelPreLoader

(**`dataset`**:[`LabelList`](/data_block.html#LabelList), **`lengths`**:`Collection`\[`int`\]=***`None`***, **`bs`**:`int`=***`32`***, **`bptt`**:`int`=***`70`***, **`backwards`**:`bool`=***`False`***, **`shuffle`**:`bool`=***`False`***) :: [`Callback`](/callback.html#Callback)
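The batch layout a language-model loader produces can be sketched as follows (a sketch of the idea, not the `LanguageModelPreLoader` implementation, which also handles shuffling and the `backwards` option): all texts are concatenated into one stream of ids, the stream is cut into `bs` rows, and each mini-batch reads `bptt` tokens per row, with the target shifted by one token.

```python
# A toy stream of 20 token ids laid out for bs=2 rows, read bptt=4 at a time.
stream = list(range(20))
bs, bptt = 2, 4

n = len(stream) // bs                       # tokens per row
rows = [stream[i * n:(i + 1) * n] for i in range(bs)]

x = [row[0:bptt] for row in rows]           # first mini-batch of inputs
y = [row[1:bptt + 1] for row in rows]       # targets: inputs shifted by one
print(x)  # [[0, 1, 2, 3], [10, 11, 12, 13]]
print(y)  # [[1, 2, 3, 4], [11, 12, 13, 14]]
```

The next mini-batch continues where this one stopped, so the hidden state of the model can be carried across batches.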
### class SortSampler

(**`data_source`**:`NPArrayList`, **`key`**:`KeyFunc`) :: [`Sampler`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Sampler)

### class SortishSampler

(**`data_source`**:`NPArrayList`, **`key`**:`KeyFunc`, **`bs`**:`int`) :: [`Sampler`](https://pytorch.org/docs/stable/data.html#torch.utils.data.Sampler)
### pad_collate

(**`samples`**:`BatchSamples`, **`pad_idx`**:`int`=***`1`***, **`pad_first`**:`bool`=***`True`***, **`backwards`**:`bool`=***`False`***) → `Tuple`\[`LongTensor`, `LongTensor`\]
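A pure-Python sketch of what `pad_collate` does to a batch (the real function works on samples with labels and returns `LongTensor`s): every sequence is padded to the length of the longest one with `pad_idx`, either at the start (`pad_first=True`) or at the end.

```python
# Sketch: pad all sequences in a batch to equal length with pad_idx.
def pad_batch(samples, pad_idx=1, pad_first=True):
    max_len = max(len(s) for s in samples)
    out = []
    for s in samples:
        pad = [pad_idx] * (max_len - len(s))
        out.append(pad + list(s) if pad_first else list(s) + pad)
    return out

print(pad_batch([[2, 3, 4], [5]]))                   # [[2, 3, 4], [1, 1, 5]]
print(pad_batch([[2, 3, 4], [5]], pad_first=False))  # [[2, 3, 4], [5, 1, 1]]
```

Padding first is the default for classifiers so that the last tokens fed to the model are real text rather than padding.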
### new

(**`items`**:`Iterator`\[`T_co`\], **`processor`**:`Union`\[[`PreProcessor`](/data_block.html#PreProcessor), `Collection`\[[`PreProcessor`](/data_block.html#PreProcessor)\]\]=***`None`***, **\*\*`kwargs`**) → `ItemList`

### get

(**`i`**)

### process_one

(**`item`**)

### process

(**`ds`**)

### reconstruct

(**`t`**:`Tensor`)

### on_epoch_begin

(**\*\*`kwargs`**)

### on_epoch_end

(**\*\*`kwargs`**)

### class LMLabelList

(**`items`**:`Iterator`\[`T_co`\], **\*\*`kwargs`**) :: [`EmptyLabelList`](/data_block.html#EmptyLabelList)

### allocate_buffers

()

### shuffle

()

### fill_row

(**`forward`**, **`items`**, **`idx`**, **`row`**, **`ro`**, **`ri`**, **`overlap`**, **`lengths`**)