{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tabular data handling" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This module defines the main class to handle tabular data in the fastai library: [`TabularDataset`](/tabular.data.html#TabularDataset). As always, there is also a helper function to quickly get your data.\n", "\n", "To allow you to easily create a [`Learner`](/basic_train.html#Learner) for your data, it provides [`tabular_learner`](/tabular.data.html#tabular_learner)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [], "source": [ "from fastai.gen_doc.nbdoc import *\n", "from fastai.tabular import * \n", "from fastai import *" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide_input": true }, "outputs": [ { "data": { "text/markdown": [ "
class
TabularDataBunch
`TabularDataBunch`(`train_dl`:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), `valid_dl`:[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), `test_dl`:`Optional`\\[[`DataLoader`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader)\\]=`None`, `device`:[`device`](https://pytorch.org/docs/stable/tensor_attributes.html#torch-device)=`None`, `tfms`:`Optional`\\[`Collection`\\[`Callable`\\]\\]=`None`, `path`:`PathOrStr`=`'.'`, `collate_fn`:`Callable`=`'data_collate'`) :: [`DataBunch`](/basic_data.html#DataBunch)"
],
"text/plain": [
"| | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | >=50k |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 49 | Private | 101320 | Assoc-acdm | 12.0 | Married-civ-spouse | NaN | Wife | White | Female | 0 | 1902 | 40 | United-States | 1 |
| 1 | 44 | Private | 236746 | Masters | 14.0 | Divorced | Exec-managerial | Not-in-family | White | Male | 10520 | 0 | 45 | United-States | 1 |
| 2 | 38 | Private | 96185 | HS-grad | NaN | Divorced | NaN | Unmarried | Black | Female | 0 | 0 | 32 | United-States | 0 |
| 3 | 38 | Self-emp-inc | 112847 | Prof-school | 15.0 | Married-civ-spouse | Prof-specialty | Husband | Asian-Pac-Islander | Male | 0 | 0 | 40 | United-States | 1 |
| 4 | 42 | Self-emp-not-inc | 82297 | 7th-8th | NaN | Married-civ-spouse | Other-service | Wife | Black | Female | 0 | 0 | 50 | United-States | 0 |\n", "
from_df
`from_df`(`path`, `df`:`DataFrame`, `dep_var`:`str`, `valid_idx`:`Collection`\\[`int`\\], `procs`:`Optional`\\[`Collection`\\[[`TabularProc`](/tabular.transform.html#TabularProc)\\]\\]=`None`, `cat_names`:`OptStrList`=`None`, `cont_names`:`OptStrList`=`None`, `classes`:`Collection`=`None`, `**kwargs`) → [`DataBunch`](/basic_data.html#DataBunch)"
],
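As a rough illustration of the `valid_idx` argument above: the rows whose positional index appears in `valid_idx` are presumably held out for validation, and the remaining rows are used for training. The sketch below mimics that split on plain Python dictionaries rather than a `DataFrame`. The helper `split_by_valid_idx` is hypothetical and not part of fastai, and the rows carry only three of the sample columns.

```python
from typing import Collection, Dict, List, Tuple

Row = Dict[str, object]

def split_by_valid_idx(rows: List[Row], valid_idx: Collection[int]) -> Tuple[List[Row], List[Row]]:
    """Split rows into (train, valid) by positional index (hypothetical helper)."""
    valid = set(valid_idx)
    train_rows = [r for i, r in enumerate(rows) if i not in valid]
    valid_rows = [r for i, r in enumerate(rows) if i in valid]
    return train_rows, valid_rows

# A few columns from five sample rows; ">=50k" plays the role of `dep_var`.
rows = [
    {"age": 49, "workclass": "Private", ">=50k": 1},
    {"age": 44, "workclass": "Private", ">=50k": 1},
    {"age": 38, "workclass": "Private", ">=50k": 0},
    {"age": 38, "workclass": "Self-emp-inc", ">=50k": 1},
    {"age": 42, "workclass": "Self-emp-not-inc", ">=50k": 0},
]

# Hold out the last two rows for validation.
train_rows, valid_rows = split_by_valid_idx(rows, valid_idx=range(3, 5))
print(len(train_rows), len(valid_rows))
```

Passing a `range` keeps the example short; any collection of integer positions would work the same way in this sketch.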
"text/plain": [
"tabular_learner
`tabular_learner`(`data`:[`DataBunch`](/basic_data.html#DataBunch), `layers`:`Collection`\\[`int`\\], `emb_szs`:`Dict`\\[`str`, `int`\\]=`None`, `metrics`=`None`, `ps`:`Collection`\\[`float`\\]=`None`, `emb_drop`:`float`=`0.0`, `y_range`:`OptRange`=`None`, `use_bn`:`bool`=`True`, `**kwargs`)\n",
"\n",
"Get a [`Learner`](/basic_train.html#Learner) using `data`, with `metrics`, including a [`TabularModel`](/tabular.models.html#TabularModel) created using the remaining params. "
],
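To make the role of `layers` more concrete, here is a minimal sketch of how the widths of the fully connected layers could be derived. It assumes the model concatenates one embedding per categorical column with the continuous inputs and then stacks linear layers of the widths given in `layers`. The function `linear_layer_shapes` and its parameters `emb_widths`, `n_cont` and `out_sz` are hypothetical names used for illustration, not fastai API.

```python
from typing import Dict, List, Sequence, Tuple

def linear_layer_shapes(emb_widths: Dict[str, int], n_cont: int,
                        layers: Sequence[int], out_sz: int) -> List[Tuple[int, int]]:
    """Return (in_features, out_features) for each linear layer (hypothetical helper)."""
    # Input width: all embedding vectors concatenated with the continuous columns.
    n_in = sum(emb_widths.values()) + n_cont
    sizes = [n_in] + list(layers) + [out_sz]
    # Pair each width with the next one to get the shape of every layer.
    return list(zip(sizes[:-1], sizes[1:]))

shapes = linear_layer_shapes(
    emb_widths={"workclass": 6, "education": 8},  # illustrative embedding widths
    n_cont=3,                                      # e.g. three continuous columns
    layers=[200, 100],
    out_sz=2,                                      # two target classes
)
print(shapes)  # [(17, 200), (200, 100), (100, 2)]
```

Under these assumptions, `layers=[200, 100]` yields two hidden layers followed by an output layer sized to the number of classes.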
"text/plain": [
"class
TabularList
`TabularList`(`items`:`Iterator`, `cat_names`:`OptStrList`=`None`, `cont_names`:`OptStrList`=`None`, `procs`=`None`, `**kwargs`) → `TabularList` :: [`ItemList`](/data_block.html#ItemList)"
],
"text/plain": [
"from_df
`from_df`(`df`:`DataFrame`, `cat_names`:`OptStrList`=`None`, `cont_names`:`OptStrList`=`None`, `procs`=`None`, `**kwargs`) → `ItemList`\n",
"\n",
"Get the list of inputs from the `cat_names` and `cont_names` columns of `df`. "
],
"text/plain": [
"get_emb_szs
`get_emb_szs`(`sz_dict`)\n",
"\n",
"Return the default embedding sizes suitable for this data, or the ones given in `sz_dict` where provided. "
],
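The behaviour described for `get_emb_szs` (use a default size unless `sz_dict` overrides it) can be sketched in plain Python. The default rule below, half the cardinality plus one and capped at 50, is an assumption chosen for illustration and may differ from the rule fastai actually applies. `resolve_emb_szs` and `default_rule` are hypothetical names.

```python
from typing import Callable, Dict, Optional, Tuple

def default_rule(n_cat: int) -> int:
    """Assumed rule of thumb (not necessarily fastai's): half the cardinality, capped at 50."""
    return min(50, n_cat // 2 + 1)

def resolve_emb_szs(cardinalities: Dict[str, int],
                    sz_dict: Optional[Dict[str, int]] = None,
                    rule: Callable[[int], int] = default_rule) -> Dict[str, Tuple[int, int]]:
    """Map each categorical column to (cardinality, embedding size)."""
    sz_dict = sz_dict or {}
    # An entry in sz_dict wins; otherwise fall back to the default rule.
    return {name: (n_cat, sz_dict.get(name, rule(n_cat)))
            for name, n_cat in cardinalities.items()}

# Illustrative cardinalities; only "education" is overridden.
szs = resolve_emb_szs({"workclass": 9, "education": 16}, sz_dict={"education": 10})
print(szs)  # {'workclass': (9, 5), 'education': (16, 10)}
```

Columns absent from `sz_dict` fall back to the default rule, so a caller only needs to list the columns to tune by hand.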
"text/plain": [
"show_xys
`show_xys`(`xs`, `ys`)\n",
"\n",
"Show the `xs` and `ys`. "
],
"text/plain": [
"show_xyzs
`show_xyzs`(`xs`, `ys`, `zs`)\n",
"\n",
"Show `xs` (inputs), `ys` (targets) and `zs` (predictions). "
],
"text/plain": [
"class
TabularLine
`TabularLine`(`cats`, `conts`, `classes`, `names`) :: [`ItemBase`](/core.html#ItemBase)"
],
"text/plain": [
"class
TabularProcessor
`TabularProcessor`(`ds`:[`ItemBase`](/core.html#ItemBase)=`None`, `procs`=`None`) :: [`PreProcessor`](/data_block.html#PreProcessor)"
],
"text/plain": [
"process_one
`process_one`(`item`)"
],
"text/plain": [
"new
`new`(`items`:`Iterator`, `**kwargs`) → `TabularList`"
],
"text/plain": [
"get
`get`(`o`)"
],
"text/plain": [
"process
`process`(`ds`)"
],
"text/plain": [
"