{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Tabular data handling" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This module defines the main class to handle tabular data in the fastai library: [`TabularDataBunch`](/tabular.data.html#TabularDataBunch). As always, there is also a helper function to quickly get your data.\n", "\n", "To allow you to easily create a [`Learner`](/basic_train.html#Learner) for your data, it provides [`tabular_learner`](/tabular.data.html#tabular_learner)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [], "source": [ "from fastai.gen_doc.nbdoc import *\n", "from fastai.tabular import * \n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "
class TabularDataBunch[source]TabularDataBunch(**`train_dl`**:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), **`valid_dl`**:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), **`fix_dl`**:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)=***`None`***, **`test_dl`**:`Optional`\\[[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)\\]=***`None`***, **`device`**:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=***`None`***, **`dl_tfms`**:`Optional`\\[`Collection`\\[`Callable`\\]\\]=***`None`***, **`path`**:`PathOrStr`=***`'.'`***, **`collate_fn`**:`Callable`=***`'data_collate'`***, **`no_check`**:`bool`=***`False`***) :: [`DataBunch`](/basic_data.html#DataBunch)\n",
"\n",
"Create a [`DataBunch`](/basic_data.html#DataBunch) suitable for tabular data. "
],
"text/plain": [
"| \n", " | age | \n", "workclass | \n", "fnlwgt | \n", "education | \n", "education-num | \n", "marital-status | \n", "occupation | \n", "relationship | \n", "race | \n", "sex | \n", "capital-gain | \n", "capital-loss | \n", "hours-per-week | \n", "native-country | \n", "salary | \n", "
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | \n", "49 | \n", "Private | \n", "101320 | \n", "Assoc-acdm | \n", "12.0 | \n", "Married-civ-spouse | \n", "NaN | \n", "Wife | \n", "White | \n", "Female | \n", "0 | \n", "1902 | \n", "40 | \n", "United-States | \n", ">=50k | \n", "
| 1 | \n", "44 | \n", "Private | \n", "236746 | \n", "Masters | \n", "14.0 | \n", "Divorced | \n", "Exec-managerial | \n", "Not-in-family | \n", "White | \n", "Male | \n", "10520 | \n", "0 | \n", "45 | \n", "United-States | \n", ">=50k | \n", "
| 2 | \n", "38 | \n", "Private | \n", "96185 | \n", "HS-grad | \n", "NaN | \n", "Divorced | \n", "NaN | \n", "Unmarried | \n", "Black | \n", "Female | \n", "0 | \n", "0 | \n", "32 | \n", "United-States | \n", "<50k | \n", "
| 3 | \n", "38 | \n", "Self-emp-inc | \n", "112847 | \n", "Prof-school | \n", "15.0 | \n", "Married-civ-spouse | \n", "Prof-specialty | \n", "Husband | \n", "Asian-Pac-Islander | \n", "Male | \n", "0 | \n", "0 | \n", "40 | \n", "United-States | \n", ">=50k | \n", "
| 4 | \n", "42 | \n", "Self-emp-not-inc | \n", "82297 | \n", "7th-8th | \n", "NaN | \n", "Married-civ-spouse | \n", "Other-service | \n", "Wife | \n", "Black | \n", "Female | \n", "0 | \n", "0 | \n", "50 | \n", "United-States | \n", "<50k | \n", "
from_df[source]from_df(**`path`**, **`df`**:`DataFrame`, **`dep_var`**:`str`, **`valid_idx`**:`Collection`\\[`int`\\], **`procs`**:`Optional`\\[`Collection`\\[[`TabularProc`](/tabular.transform.html#TabularProc)\\]\\]=***`None`***, **`cat_names`**:`OptStrList`=***`None`***, **`cont_names`**:`OptStrList`=***`None`***, **`classes`**:`Collection`\\[`T_co`\\]=***`None`***, **`test_df`**=***`None`***, **`bs`**:`int`=***`64`***, **`val_bs`**:`int`=***`None`***, **`num_workers`**:`int`=***`4`***, **`dl_tfms`**:`Optional`\\[`Collection`\\[`Callable`\\]\\]=***`None`***, **`device`**:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=***`None`***, **`collate_fn`**:`Callable`=***`'data_collate'`***, **`no_check`**:`bool`=***`False`***) → [`DataBunch`](/basic_data.html#DataBunch)\n",
"\n",
"Create a [`DataBunch`](/basic_data.html#DataBunch) from `df` and `valid_idx` with `dep_var`. `kwargs` are passed to [`DataBunch.create`](/basic_data.html#DataBunch.create). "
],
"text/plain": [
"tabular_learner[source]tabular_learner(**`data`**:[`DataBunch`](/basic_data.html#DataBunch), **`layers`**:`Collection`\\[`int`\\], **`emb_szs`**:`Dict`\\[`str`, `int`\\]=***`None`***, **`metrics`**=***`None`***, **`ps`**:`Collection`\\[`float`\\]=***`None`***, **`emb_drop`**:`float`=***`0.0`***, **`y_range`**:`OptRange`=***`None`***, **`use_bn`**:`bool`=***`True`***, **\\*\\*`learn_kwargs`**)\n",
"\n",
"Get a [`Learner`](/basic_train.html#Learner) using `data`, with `metrics`, including a [`TabularModel`](/tabular.models.html#TabularModel) created using the remaining params. "
],
"text/plain": [
"class TabularList[source]TabularList(**`items`**:`Iterator`\\[`T_co`\\], **`cat_names`**:`OptStrList`=***`None`***, **`cont_names`**:`OptStrList`=***`None`***, **`procs`**=***`None`***, **\\*\\*`kwargs`**) → `TabularList` :: [`ItemList`](/data_block.html#ItemList)\n",
"\n",
"Basic [`ItemList`](/data_block.html#ItemList) for tabular data. "
],
"text/plain": [
"from_df[source]from_df(**`df`**:`DataFrame`, **`cat_names`**:`OptStrList`=***`None`***, **`cont_names`**:`OptStrList`=***`None`***, **`procs`**=***`None`***, **\\*\\*`kwargs`**) → `ItemList`\n",
"\n",
"Get the list of inputs in the `col` of `path/csv_name`. "
],
"text/plain": [
"get_emb_szs[source]get_emb_szs(**`sz_dict`**=***`None`***)\n",
"\n",
"Return the default embedding sizes suitable for this data or takes the ones in `sz_dict`. "
],
"text/plain": [
"show_xys[source]show_xys(**`xs`**, **`ys`**)\n",
"\n",
"Show the `xs` (inputs) and `ys` (targets). "
],
"text/plain": [
"show_xyzs[source]show_xyzs(**`xs`**, **`ys`**, **`zs`**)\n",
"\n",
"Show `xs` (inputs), `ys` (targets) and `zs` (predictions). "
],
"text/plain": [
"class TabularLine[source]TabularLine(**`cats`**, **`conts`**, **`classes`**, **`names`**) :: [`ItemBase`](/core.html#ItemBase)"
],
"text/plain": [
"class TabularProcessor[source]TabularProcessor(**`ds`**:[`ItemBase`](/core.html#ItemBase)=***`None`***, **`procs`**=***`None`***) :: [`PreProcessor`](/data_block.html#PreProcessor)\n",
"\n",
"Regroup the `procs` in one [`PreProcessor`](/data_block.html#PreProcessor). "
],
"text/plain": [
"process_one[source]process_one(**`item`**)"
],
"text/plain": [
"new[source]new(**`items`**:`Iterator`\\[`T_co`\\], **`processor`**:`Union`\\[[`PreProcessor`](/data_block.html#PreProcessor), `Collection`\\[[`PreProcessor`](/data_block.html#PreProcessor)\\]\\]=***`None`***, **\\*\\*`kwargs`**) → `ItemList`\n",
"\n",
"Create a new [`ItemList`](/data_block.html#ItemList) from `items`, keeping the same attributes. "
],
"text/plain": [
"get[source]get(**`o`**)\n",
"\n",
"Subclass if you want to customize how to create item `i` from `self.items`. "
],
"text/plain": [
"process[source]process(**`ds`**)"
],
"text/plain": [
"reconstruct[source]reconstruct(**`t`**:`Tensor`)\n",
"\n",
"Reconstruct one of the underlying item for its data `t`. "
],
"text/plain": [
"