{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# \"pytorch-widedeep, deep learning for tabular data I: data preprocessing, model components and basic use\"\n", "> a flexible package to combine tabular data with text and images using wide and deep models.\n", "\n", "- author: Javier Rodriguez\n", "- toc: true \n", "- badges: true\n", "- comments: true" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the first of a series of posts introducing [pytorch-widedeep](https://github.com/jrzaurin/pytorch-widedeep), which is intended to be a flexible package to use Deep Learning (hereafter DL) with tabular data and combine it with text and images via wide and deep models. `pytorch-widedeep` is partially based on Heng-Tze Cheng et al., 2016 [paper](https://arxiv.org/abs/1606.07792) [1].\n", "\n", "in this post I describe the data preprocessing functionalities of the library, the main components of the model, and the basic use of the library. In a separate post I will show a more advance use of `pytorch-widedeep`.\n", "\n", "Before I move any further I just want to emphasize that there are a number of libraries that implement functionalities to use DL on tabular data. To cite a few, the ubiquitous and fantastic [FastAI](https://docs.fast.ai/tutorial.tabular.html) (and their tabular api), NVIDIA's [NVTabular](https://github.com/NVIDIA/NVTabular), the powerful [pytorch-tabnet](https://github.com/dreamquark-ai/tabnet) based on work of Sercan O. Arik and Tomas Pfisterfrom [2], which is starting to take victories in Kaggle competitions, and perhaps my favourite [AutoGluon Tabular](https://arxiv.org/abs/2003.06505) [3].\n", "\n", "It is not my intention to \"compete\" against these libraries. `pytorch-widedeep` started as an attempt to package and automate an algorithm I had to use a couple of times at work and ended up becoming the entertaining process that is building a library. Needless to say that if you wanted to apply DL to tabular data you should go and check all the libraries I mentioned before (as well as this one πŸ™‚. You can find the source code [here]((https://github.com/jrzaurin/pytorch-widedeep))). \n", "\n", "## 1. Installation \n", "\n", "To install the package simply use pip:\n", "\n", "```bash\n", "pip install pytorch-widedeep\n", "```\n", "\n", "or directly from github\n", "\n", "```bash\n", "pip install git+https://github.com/jrzaurin/pytorch-widedeep.git\n", "```\n", "\n", "**Important note for Mac Users**\n", "\n", "Note that the following comments are not directly related to the package, but to the interplay between `pytorch` and `OSX` (more precisely `pytorch`'s dependency on `OpenMP` I believe) and in general parallel processing in Mac. \n", "\n", "In the first place, at the time of writing the latest `pytorch` version is `1.7`. This version is known to have some [issues](https://stackoverflow.com/questions/64772335/pytorch-w-parallelnative-cpp206) when running on Mac and the data-loaders might not run in parallel. \n", "\n", "On the other hand, since `Python 3.8` the `multiprocessing` library start method changed from ['fork' to 'spawn'](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods). This also affects the data-loaders (for any torch version) and they will not run in parallel. \n", "\n", "Therefore, for Mac users I suggest using `python 3.7` and `torch <= 1.6` (with its corresponding `torchvision` version, i.e. `<= 0.7.0`). I could have enforced this versioning via the `setup.py` file. 
However, there are a number of unknowns and I preferred to leave it as it is. For example, I developed the package using *macOS Catalina* and maybe some of these issues are not present in the new release, *Big Sur*. Also, I hope that a patch for `pytorch 1.7` is released soon and some, if not all, of these problems disappear. \n", "\n", "Installing `pytorch-widedeep` via `pip` will install the latest version. Therefore, if these problems are present and the dataloaders do not run in parallel, one can easily downgrade manually: \n", "\n", "```bash\n", "pip install torch==1.6.0 torchvision==0.7.0\n", "```\n", "\n", "*None of these issues affect Linux users*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. `pytorch-widedeep` architectures\n", "\n", "In general terms, `pytorch-widedeep` is a package to use deep learning with tabular data. In particular, it is intended to facilitate the combination of text and images with corresponding tabular data using wide and deep models. With that in mind, there are a number of architectures that can be implemented with just a few lines of code. The main components of those architectures are shown in the Figure below:\n", "\n", "![](figures/pytorch-widedeep/widedeep_arch.png)\n", "\n", "The dashed boxes in the figure represent optional, overall components, and the dashed lines/arrows indicate the corresponding connections, depending on whether or not certain components are present. For example, the dashed blue arrows indicate that the `deeptabular`, `deeptext` and `deepimage` components are connected directly to the output neuron or neurons (depending on whether we are performing a binary classification or regression, or a multi-class classification) if the optional `deephead` is not present. Finally, the components within the faded-pink rectangle are concatenated.\n", "\n", "Note that it is not possible to illustrate the number of architectures and components available in ``pytorch-widedeep`` in one Figure. This is why I wrote \"overall components\" above: within the components represented by the boxes there are a number of options as well. Therefore, for more details on possible architectures (and more) please see the [documentation](https://pytorch-widedeep.readthedocs.io/en/latest/index.html), or the Examples folder and the notebooks in the [repo](https://github.com/jrzaurin/pytorch-widedeep).\n", "\n", "In math terms, and following the notation in the [paper](https://arxiv.org/abs/1606.07792), the expression for the architecture without a ``deephead`` component can be formulated as:\n", "\n", "$$\n", "preds = \\sigma(W^{T}_{wide}[x, \\phi(x)] + W^{T}_{deeptabular}a^{(l_f)}_{deeptabular} + W^{T}_{deeptext}a^{(l_f)}_{text} + W^{T}_{deepimage}a^{(l_f)}_{image} + b) \n", "$$\n", "\n", "\n", "Where $W$ are the weight matrices applied to the wide model and to the final activations of the deep models, $a$ are these final activations, and $\\phi(x)$ are the cross product transformations of the original features $x$. In case you are wondering what \"*cross product transformations*\" are, here is a quote taken directly from the paper: \"*For binary features, a cross-product transformation (e.g., β€œAND(gender=female, language=en)”) is 1 if and only if the constituent features (β€œgender=female” and β€œlanguage=en”) are all 1, and 0 otherwise*\".
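\n", "\n", "To make this concrete, here is a small, self-contained pandas sketch (illustrative only; later we will see that `WidePreprocessor` builds these crossed columns for us) of how such a crossed feature could be built before one-hot encoding:\n", "\n", "```python\n", "import pandas as pd\n", "\n", "df = pd.DataFrame({\"gender\": [\"female\", \"male\"], \"language\": [\"en\", \"es\"]})\n", "\n", "# a crossed column simply records the co-occurrence of two categorical values\n", "df[\"gender_language\"] = df[\"gender\"] + \"-\" + df[\"language\"]\n", "\n", "# one-hot encoding it yields one binary feature per observed combination,\n", "# e.g. AND(gender=female, language=en)\n", "print(pd.get_dummies(df[\"gender_language\"]))\n", "```\n", "\n", "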
If, on the other hand, there is a ``deephead`` component, the previous expression turns into:\n", "\n", "$$\n", "preds = \\sigma(W^{T}_{wide}[x, \\phi(x)] + W^{T}_{deephead}a^{(l_f)}_{deephead} + b)\n", "$$\n", "\n", "It is important to emphasize that **each individual component, `wide`, `deeptabular`, `deeptext` and `deepimage`, can be used independently** and in isolation. For example, one could use only `wide`, which is simply a linear\n", "model. In fact, one of the most interesting offerings in ``pytorch-widedeep`` is the ``deeptabular`` component, and I intend to write a dedicated post focused on that component alone. \n", "\n", "Finally, while I recommend using the ``wide`` and ``deeptabular`` models in ``pytorch-widedeep``, it is very likely that users will want to use their own models for the ``deeptext`` and ``deepimage`` components. That is perfectly\n", "possible as long as the custom models have an attribute called ``output_dim`` with the size of the last layer of activations, so that ``WideDeep`` can be constructed. Again, examples on how to use custom components can be found in the Examples folder in the repo. Just in case, ``pytorch-widedeep`` includes standard text (stack of LSTMs) and image\n", "(pre-trained ResNets or stack of CNNs) models.\n", "\n", "\n", "## 3. Quick start (TL;DR)\n", "\n", "Maybe I should have started with this section, but I thought that knowing at least the architectures one can build with `pytorch-widedeep` was \"kind of\" necessary. In any case, before diving into the details of the library, let's say you just want to quickly run one example and get a feel for how `pytorch-widedeep` works. Let's do so using the [adult census dataset](http://archive.ics.uci.edu/ml/datasets/Adult). \n", "\n", "In this example we will be fitting a model comprised of two components: `wide` and `deeptabular`." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "#hide\n", "import warnings\n", "warnings.filterwarnings(\"ignore\", category=DeprecationWarning)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "#collapse-hide\n", "import pandas as pd\n", "import numpy as np\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.metrics import accuracy_score" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "#collapse-hide\n", "adult = pd.read_csv(\"data/adult/adult.csv.zip\")\n", "adult.columns = [c.replace(\"-\", \"_\") for c in adult.columns]\n", "adult[\"income_label\"] = (adult[\"income\"].apply(lambda x: \">50K\" in x)).astype(int)\n", "adult.drop(\"income\", axis=1, inplace=True)\n", "\n", "for c in adult.columns:\n", "    if adult[c].dtype == 'O':\n", "        adult[c] = adult[c].apply(lambda x: \"unknown\" if x == \"?\" else x)\n", "        adult[c] = adult[c].str.lower()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ageworkclassfnlwgteducationeducational_nummarital_statusoccupationrelationshipracegendercapital_gaincapital_losshours_per_weeknative_countryincome_label
025private22680211th7never-marriedmachine-op-inspctown-childblackmale0040united-states0
138private89814hs-grad9married-civ-spousefarming-fishinghusbandwhitemale0050united-states0
228local-gov336951assoc-acdm12married-civ-spouseprotective-servhusbandwhitemale0040united-states1
344private160323some-college10married-civ-spousemachine-op-inspcthusbandblackmale7688040united-states1
418unknown103497some-college10never-marriedunknownown-childwhitefemale0030united-states0
\n", "
" ], "text/plain": [ " age workclass fnlwgt education educational_num marital_status \\\n", "0 25 private 226802 11th 7 never-married \n", "1 38 private 89814 hs-grad 9 married-civ-spouse \n", "2 28 local-gov 336951 assoc-acdm 12 married-civ-spouse \n", "3 44 private 160323 some-college 10 married-civ-spouse \n", "4 18 unknown 103497 some-college 10 never-married \n", "\n", " occupation relationship race gender capital_gain capital_loss \\\n", "0 machine-op-inspct own-child black male 0 0 \n", "1 farming-fishing husband white male 0 0 \n", "2 protective-serv husband white male 0 0 \n", "3 machine-op-inspct husband black male 7688 0 \n", "4 unknown own-child white female 0 0 \n", "\n", " hours_per_week native_country income_label \n", "0 40 united-states 0 \n", "1 50 united-states 0 \n", "2 40 united-states 1 \n", "3 40 united-states 1 \n", "4 30 united-states 0 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "adult_train, adult_test = train_test_split(adult, test_size=0.2, stratify=adult.income_label)\n", "\n", "adult.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following lines below is all you need" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "epoch 1: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 153/153 [00:03<00:00, 43.06it/s, loss=0.428, metrics={'acc': 0.802}] \n", "epoch 2: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 153/153 [00:03<00:00, 44.41it/s, loss=0.389, metrics={'acc': 0.8217}]\n", "predict: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 39/39 [00:00<00:00, 149.41it/s]\n" ] } ], "source": [ "from pytorch_widedeep import Trainer\n", "from pytorch_widedeep.preprocessing import WidePreprocessor, TabPreprocessor\n", "from pytorch_widedeep.models import Wide, TabMlp, WideDeep\n", "from pytorch_widedeep.metrics import Accuracy\n", "\n", "# define wide, crossed, embedding and continuous columns, and target\n", "wide_cols = [\"education\", \"relationship\", \"workclass\", \"occupation\", \"native_country\", \"gender\"]\n", "cross_cols = [(\"education\", \"occupation\"), (\"native_country\", \"occupation\")]\n", "embed_cols = [(\"education\", 32), (\"workclass\", 32), (\"occupation\", 32), (\"native_country\", 32)]\n", "cont_cols = [\"age\", \"hours_per_week\"]\n", "target = adult_train[\"income_label\"].values\n", "\n", "# prepare wide component\n", "wide_preprocessor = WidePreprocessor(wide_cols=wide_cols, crossed_cols=cross_cols)\n", "X_wide = wide_preprocessor.fit_transform(adult_train)\n", "wide = Wide(wide_dim=np.unique(X_wide).shape[0], pred_dim=1)\n", "\n", "# prepare deeptabular component\n", "tab_preprocessor = TabPreprocessor(embed_cols=embed_cols, continuous_cols=cont_cols)\n", "X_tab = tab_preprocessor.fit_transform(adult_train)\n", "deeptabular = TabMlp(\n", " mlp_hidden_dims=[200, 100],\n", " column_idx=tab_preprocessor.column_idx,\n", " embed_input=tab_preprocessor.embeddings_input, \n", " continuous_cols=cont_cols,\n", ")\n", " \n", "# build, compile and fit\n", "model = WideDeep(wide=wide, deeptabular=deeptabular)\n", "\n", "# Train\n", "trainer = Trainer(model, objective=\"binary\", metrics=[(Accuracy)])\n", "trainer.fit(X_wide=X_wide, X_tab=X_tab, target=target, n_epochs=2, batch_size=256) \n", "\n", "# predict\n", "X_wide_te = wide_preprocessor.transform(adult_test)\n", "X_tab_te = tab_preprocessor.transform(adult_test)\n", "preds = trainer.predict(X_wide=X_wide_te, X_tab=X_tab_te)" ] }, { "cell_type": "markdown", "metadata": 
{}, "source": [ "## 4. Preprocessors\n", "\n", "As you can see in Section 3, and as with any ML algorithm, the data need to be prepared/preprocessed before going through the model. This is handled by the `pytorch-widedeep` preprocessors. There is one preprocessor per `WideDeep` model component:\n", "\n", "```\n", "WidePreprocessor\n", "TabPreprocessor\n", "TextPreprocessor\n", "ImagePreprocessor\n", "```\n", "\n", "\"Behind the scenes\", these preprocessors use a series of helper functions and classes that are in the `utils` module. Initially I did not intend to \"expose\" them to the user, but I believe they can be useful for all sorts of preprocessing tasks, even if they are not related to `pytorch-widedeep`, so I made them available. The `utils` tools are:\n", "\n", "```\n", "deep_utils.LabelEncoder\n", "text_utils.simple_preprocess\n", "text_utils.get_texts\n", "text_utils.pad_sequences\n", "text_utils.build_embeddings_matrix\n", "fastai_transforms.Tokenizer\n", "fastai_transforms.Vocab\n", "image_utils.SimplePreprocessor\n", "image_utils.AspectAwarePreprocessor\n", "```\n", "\n", "They are accessible directly from `utils`, e.g.:\n", "\n", "```python\n", "from pytorch_widedeep.utils import LabelEncoder\n", "```\n", "\n", "Note that here I will be concentrating directly on the preprocessors. If you want more details on the `utils` tools, have a look to the [source code](https://github.com/jrzaurin/pytorch-widedeep/tree/master/pytorch_widedeep/utils) or read the [documentation](https://pytorch-widedeep.readthedocs.io/en/latest/index.html).\n", "\n", "### 4.1. `WidePreprocessor`\n", "\n", "The Wide component of the model is a linear model that in principle, could be implemented as a linear layer receiving the result of on one-hot encoded categorical columns. However, this is not memory efficient (at all). Therefore, we implement a liner layer as an Embedding layer plus a bias. I will explain it in a bit more detail later. For now, just know that `WidePreprocessor` simply encodes the categories numerically so that they are the indexes of the lookup table that is an Embedding layer." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/javier/.pyenv/versions/3.7.9/envs/wdposts/lib/python3.7/site-packages/ipykernel/ipkernel.py:283: DeprecationWarning: `should_run_async` will not call `transform_cell` automatically in the future. 
Please pass the result to `transformed_cell` argument and any exception that happen during thetransform in `preprocessing_exc_tuple` in IPython 7.17 and above.\n", " and should_run_async(code)\n" ] } ], "source": [ "#hide\n", "warnings.filterwarnings(\"ignore\", category=DeprecationWarning)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "from pytorch_widedeep.preprocessing import WidePreprocessor\n", "\n", "wide_cols = ['education', 'relationship','workclass','occupation','native_country','gender']\n", "crossed_cols = [('education', 'occupation'), ('native_country', 'occupation')]\n", "\n", "wide_preprocessor = WidePreprocessor(wide_cols=wide_cols, crossed_cols=crossed_cols)\n", "X_wide = wide_preprocessor.fit_transform(adult)\n", "# From here on, any new observation can be prepared by simply running `.transform`\n", "# new_X_wide = wide_preprocessor.transform(new_df)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 1, 17, 23, ..., 89, 91, 316],\n", " [ 2, 18, 23, ..., 89, 92, 317],\n", " [ 3, 18, 24, ..., 89, 93, 318],\n", " ...,\n", " [ 2, 20, 23, ..., 90, 103, 323],\n", " [ 2, 17, 23, ..., 89, 103, 323],\n", " [ 2, 21, 29, ..., 90, 115, 324]])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X_wide" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1, 17, 23, 32, 47, 89, 91, 316])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X_wide[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the label encoding starts from 1. This is because it is convenient to leave 0 for padding, i.e. unknown categories. Let's take from example the first entry" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
educationrelationshipworkclassoccupationnative_countrygendereducation_occupationnative_country_occupation
011thown-childprivatemachine-op-inspctunited-statesmale11th-machine-op-inspctunited-states-machine-op-inspct
\n", "
" ], "text/plain": [ " education relationship workclass occupation native_country gender \\\n", "0 11th own-child private machine-op-inspct united-states male \n", "\n", " education_occupation native_country_occupation \n", "0 11th-machine-op-inspct united-states-machine-op-inspct " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "wide_preprocessor.inverse_transform(X_wide[:1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we can see, `wide_preprocessor` numerically encodes the `wide_cols` and the `crossed_cols`, which can be recovered using the method `inverse_transform`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.2 `TabPreprocessor`\n", "\n", "Simply, `TabPreprocessor` label-encodes the categorical columns and normalizes the numerical ones (unless otherwise specified)." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "from pytorch_widedeep.preprocessing import TabPreprocessor\n", "\n", "# cat_embed_cols = [(column_name, embed_dim), ...]\n", "cat_embed_cols = [('education',10), ('relationship',8), ('workclass',10), ('occupation',10),('native_country',10)]\n", "continuous_cols = [\"age\",\"hours_per_week\"]\n", "\n", "tab_preprocessor = TabPreprocessor(embed_cols=cat_embed_cols, continuous_cols=continuous_cols)\n", "X_tab = tab_preprocessor.fit_transform(adult)\n", "# From here on, any new observation can be prepared by simply running `.transform`\n", "# new_X_deep = deep_preprocessor.transform(new_df)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[ 1. 1. 1. 1. 1. -0.99512893\n", " -0.03408696]\n", " [ 2. 2. 1. 2. 1. -0.04694151\n", " 0.77292975]\n", " [ 3. 2. 2. 3. 1. -0.77631645\n", " -0.03408696]\n", " [ 4. 2. 1. 1. 1. 0.39068346\n", " -0.03408696]\n", " [ 4. 1. 3. 4. 1. -1.50569139\n", " -0.84110367]]\n" ] } ], "source": [ "print(X_tab[:5])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the label encoding starts from 1. This is because it is convenient to leave 0 for padding, i.e. unknown categories. Let's take from example the first entry\n", "\n", "Behind the scenes, `TabPreprocessor` uses [LabelEncoder](https://pytorch-widedeep.readthedocs.io/en/latest/utils/dense_utils.html), simply a custom numerical encoder for categorical features, available via\n", "\n", "```python\n", "from pytorch_widedeep.utils import LabelEncoder\n", "```\n", "\n", "### 4.3. `TextPreprocessor`\n", "\n", "This preprocessor returns the tokenized, padded sequences that will be directly \"fed\" to the `deeptext` component.\n", "\n", "To illustrate the text and image preprocessors I will use a small sample of the Airbnb listing dataset, which you can get [here](http://insideairbnb.com/get-the-data.html)." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "airbnb=pd.read_csv(\"data/airbnb/airbnb_sample.csv\")" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "\"My bright double bedroom with a large window has a relaxed feeling! It comfortably fits one or two and is centrally located just two blocks from Finsbury Park. Enjoy great restaurants in the area and easy access to easy transport tubes, trains and buses. Babies and children of all ages are welcome. 
Hello Everyone, I'm offering my lovely double bedroom in Finsbury Park area (zone 2) for let in a shared apartment. You will share the apartment with me and it is fully furnished with a self catering kitchen. Two people can easily sleep well as the room has a queen size bed. I also have a travel cot for a baby for guest with small children. I will require a deposit up front as a security gesture on both our parts and will be given back to you when you return the keys. I trust anyone who will be responding to this add would treat my home with care and respect . Best Wishes Alina Guest will have access to the self catering kitchen and bathroom. There is the flat is equipped wifi internet,\"" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "texts = airbnb.description.tolist()\n", "texts[0]" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The vocabulary contains 2192 tokens\n" ] } ], "source": [ "from pytorch_widedeep.preprocessing import TextPreprocessor\n", "\n", "text_preprocessor = TextPreprocessor(text_col='description')\n", "X_text = text_preprocessor.fit_transform(airbnb)\n", "# From here on, any new observation can be prepared by simply running `.transform`\n", "# new_X_text = text_preprocessor.transform(new_df)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 29 48 37 367 818 17 910 17 177 15 122 349 53 879\n", " 1174 126 393 40 911 0 23 228 71 819 9 53 55 1380\n", " 225 11 18 308 18 1564 10 755 0 942 239 53 55 0\n", " 11 36 1013 277 1974 70 62 15 1475 9 943 5 251 5\n", " 0 5 0 5 177 53 37 75 11 10 294 726 32 9\n", " 42 5 25 12 10 22 12 136 100 145]\n" ] } ], "source": [ "print(X_text[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`TextPreprocessor` uses the utilities within the [text_utils](https://pytorch-widedeep.readthedocs.io/en/latest/utils/text_utils.html) and the [fastai_transforms](https://pytorch-widedeep.readthedocs.io/en/latest/utils/fastai_transforms.html) modules. Again, all the utilities within those modules are are directly accessible from `utils`, e.g.:\n", "\n", "```python\n", "from pytorch_widedeep.utils import simple_preprocess, pad_sequences, build_embeddings_matrix, Tokenizer, Vocab\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4.4 `ImagePreprocessor`\n", "\n", "Finally, `ImagePreprocessor` simply resizes the images, being aware of the aspect ratio. By default they will be resized to `(224, 224, ...)`. 
This is because the default `deepimage` component of the model is a pre-trained `ResNet` model, which requires inputs of height and width of 224.\n", "\n", "Let's have a look" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Reading Images from data/airbnb/property_picture/\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ " 4%|▍ | 41/1001 [00:00<00:02, 396.72it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Resizing\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 1001/1001 [00:02<00:00, 354.70it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Computing normalisation metrics\n" ] } ], "source": [ "from pytorch_widedeep.preprocessing import ImagePreprocessor\n", "\n", "image_preprocessor = ImagePreprocessor(img_col='id', img_path=\"data/airbnb/property_picture/\")\n", "X_images = image_preprocessor.fit_transform(airbnb)\n", "# From here on, any new observation can be prepared by simply running `.transform`\n", "# new_X_images = image_preprocessor.transform(new_df)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(224, 224, 3)" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X_images[0].shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`ImagePreprocessor` uses two helpers: [`SimplePreprocessor` and `AspectAwarePreprocessor`](https://pytorch-widedeep.readthedocs.io/en/latest/utils/image_utils.html), available from the `utils` module, e.g.: \n", "\n", "```python\n", "from pytorch_widedeep.utils import SimplePreprocessor, AspectAwarePreprocessor\n", "```\n", "\n", "These two classes are directly taken from Adrian Rosebrock's fantastic book \"Deep Learning for Computer Vision\". Therefore, all credit to Adrian." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Model Components" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now have a look at the components that can be used to build a wide and deep model. The 5 main components of `WideDeep` are:\n", "\n", "```\n", "wide\n", "deeptabular\n", "deeptext\n", "deepimage\n", "deephead\n", "```\n", "\n", "The first 4 will be collected and combined by the `WideDeep` class, while the 5th one can be optionally added to the `WideDeep` model through its corresponding parameters: `deephead` or alternatively `head_layers`, `head_dropout` and `head_batchnorm`.\n", "\n", "\n", "### 5.1. `wide`\n", "\n", "The wide component is a Linear layer \"plugged\" into the output neuron(s).\n", "\n", "The only particularity of our implementation is that we have implemented the linear layer via an Embedding layer plus a bias. While the implementations are equivalent, the latter is faster and far more memory efficient, since we do not need to one hot encode the categorical features.\n", "\n", "Let's have a look:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "import torch\n", "import pandas as pd\n", "import numpy as np\n", "\n", "from torch import nn" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
colorsize
0rs
1bn
2gl
\n", "
" ], "text/plain": [ " color size\n", "0 r s\n", "1 b n\n", "2 g l" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame({'color': ['r', 'b', 'g'], 'size': ['s', 'n', 'l']})\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "one hot encoded, the first observation (`color: r, size: s`) would be" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "obs_0_oh = (np.array([1., 0., 0., 1., 0., 0.])).astype('float32')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "if we simply numerically encode (or label encode) the values:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "obs_0_le = (np.array([0, 3])).astype('int64')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that in the implementation of the package we start from 1, saving 0 for padding, i.e. unseen values.\n", "\n", "Now, let's see if the two implementations are equivalent" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "# we have 6 different values. Let's assume we are performing a regression, so pred_dim = 1\n", "lin = nn.Linear(6, 1)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "emb = nn.Embedding(6, 1) \n", "emb.weight = nn.Parameter(lin.weight.reshape_as(emb.weight))" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([0.0656], grad_fn=)" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lin(torch.tensor(obs_0_oh))" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([0.0656], grad_fn=)" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "emb(torch.tensor(obs_0_le)).sum() + lin.bias" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And this is precisely how the linear component `Wide` is implemented" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Wide(\n", " (wide_linear): Embedding(11, 1, padding_idx=0)\n", ")" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pytorch_widedeep.models import Wide\n", "wide = Wide(wide_dim=10, pred_dim=1)\n", "wide" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, let me emphasize that even though the input dim is 10, the `Embedding` layer has 11 weights. This is because we save 0 for padding, which is used for unseen values during the encoding process" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5.2. `deeptabular`\n", "\n", "There are 3 alternatives for the so called `deepdense` component of the model: `TabMlp` and `TabResnet` and the `TabTransformer`:\n", "\n", "\n", "1. ``TabMlp``: this is almost identical to the [tabular model](https://docs.fast.ai/tutorial.tabular.html) in the fantastic [fastai](https://docs.fast.ai/) library, and consists simply in embeddings representing the categorical features, concatenated with the continuous features, and passed then through a MLP.\n", "\n", "2. ``TabRenset``: This is similar to the previous model but the embeddings are passed through a series of ResNet blocks built with dense layers.\n", "\n", "3. 
``TabTransformer``: Details on the TabTransformer can be found in: [TabTransformer: Tabular Data Modeling Using Contextual Embeddings](https://arxiv.org/pdf/2012.06678.pdf)\n", "\n", "\n", "For details on these 3 models and their options please see the examples in the [Examples folder](https://github.com/jrzaurin/pytorch-widedeep/tree/master/examples) and the [documentation](https://pytorch-widedeep.readthedocs.io/en/latest/). \n", "\n", "Through the development of the package, the `deeptabular` component became one of the core values of the package. The possibilities are numerous, and therefore, I will further describe this component in detail in a separate post. \n", "\n", "For now let's have a quick look:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's have a look first to `TabMlp`:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "TabMlp(\n", " (embed_layers): ModuleDict(\n", " (emb_layer_a): Embedding(5, 8, padding_idx=0)\n", " (emb_layer_b): Embedding(5, 8, padding_idx=0)\n", " (emb_layer_c): Embedding(5, 8, padding_idx=0)\n", " (emb_layer_d): Embedding(5, 8, padding_idx=0)\n", " )\n", " (embedding_dropout): Dropout(p=0.1, inplace=False)\n", " (tab_mlp): MLP(\n", " (mlp): Sequential(\n", " (dense_layer_0): Sequential(\n", " (0): BatchNorm1d(33, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (1): Dropout(p=0.5, inplace=False)\n", " (2): Linear(in_features=33, out_features=16, bias=False)\n", " (3): LeakyReLU(negative_slope=0.01, inplace=True)\n", " )\n", " (dense_layer_1): Sequential(\n", " (0): Linear(in_features=16, out_features=8, bias=True)\n", " (1): LeakyReLU(negative_slope=0.01, inplace=True)\n", " )\n", " )\n", " )\n", ")" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pytorch_widedeep.models import TabMlp\n", "\n", "# fake dataset\n", "X_tab = torch.cat((torch.empty(5, 4).random_(4), torch.rand(5, 1)), axis=1)\n", "colnames = ['a', 'b', 'c', 'd', 'e']\n", "embed_input = [(u,i,j) for u,i,j in zip(colnames[:4], [4]*4, [8]*4)]\n", "column_idx = {k:v for v,k in enumerate(colnames)}\n", "continuous_cols = ['e']\n", "\n", "# my advice would be to not use dropout in the last layer, but I add the option because you never \n", "# know..there is crazy people everywhere.\n", "tabmlp = TabMlp(\n", " mlp_hidden_dims=[16,8], \n", " mlp_dropout=[0.5, 0.], \n", " mlp_batchnorm=True, \n", " mlp_activation=\"leaky_relu\",\n", " column_idx=column_idx,\n", " embed_input=embed_input, \n", " continuous_cols=continuous_cols)\n", "tabmlp" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([[-2.0658e-03, 5.0888e-01, 2.1883e-01, -3.1523e-03, -3.2836e-03,\n", " 8.3450e-02, -3.4315e-03, -8.6029e-04],\n", " [-2.8116e-03, 2.1922e-01, 5.0364e-01, -1.3522e-03, -9.8741e-04,\n", " -1.2356e-03, -1.4323e-03, 2.7542e-03],\n", " [ 1.1020e-01, 4.0867e-01, 4.3776e-01, 3.1146e-03, 2.7392e-01,\n", " -1.2640e-02, 1.2793e-02, 5.7851e-01],\n", " [-4.4498e-03, 2.0174e-01, 1.1082e+00, 2.3353e-01, -1.9922e-05,\n", " -4.9581e-03, 6.1367e-01, 9.4608e-01],\n", " [-5.7167e-03, 2.7813e-01, 7.8706e-01, -3.6171e-03, 1.5563e-01,\n", " -1.1303e-02, -7.6483e-04, 5.0236e-01]], grad_fn=)" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tabmlp(X_tab)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's now have a look to `TabResnet`:" ] }, { "cell_type": 
"code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "TabResnet(\n", " (embed_layers): ModuleDict(\n", " (emb_layer_a): Embedding(5, 8, padding_idx=0)\n", " (emb_layer_b): Embedding(5, 8, padding_idx=0)\n", " (emb_layer_c): Embedding(5, 8, padding_idx=0)\n", " (emb_layer_d): Embedding(5, 8, padding_idx=0)\n", " )\n", " (embedding_dropout): Dropout(p=0.1, inplace=False)\n", " (tab_resnet): DenseResnet(\n", " (dense_resnet): Sequential(\n", " (lin1): Linear(in_features=33, out_features=16, bias=True)\n", " (bn1): BatchNorm1d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (block_0): BasicBlock(\n", " (lin1): Linear(in_features=16, out_features=8, bias=True)\n", " (bn1): BatchNorm1d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (leaky_relu): LeakyReLU(negative_slope=0.01, inplace=True)\n", " (dp): Dropout(p=0.1, inplace=False)\n", " (lin2): Linear(in_features=8, out_features=8, bias=True)\n", " (bn2): BatchNorm1d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (resize): Sequential(\n", " (0): Linear(in_features=16, out_features=8, bias=True)\n", " (1): BatchNorm1d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " )\n", " )\n", " )\n", " )\n", ")" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pytorch_widedeep.models import TabResnet\n", "\n", "tabresnet = TabResnet(\n", " blocks_dims=[16, 8],\n", " blocks_dropout=0.1, \n", " column_idx=column_idx,\n", " embed_input=embed_input, \n", " continuous_cols=continuous_cols,\n", ")\n", " \n", "\n", "tabresnet" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([[-1.7038e-02, -2.2898e-03, 6.7239e-01, -1.1374e-02, -1.4843e-03,\n", " -1.0570e-02, 5.0264e-01, -1.3277e-02],\n", " [ 2.2679e+00, -5.1538e-04, -2.6135e-02, -2.9038e-02, -2.2504e-02,\n", " 5.5052e-01, 1.0497e+00, 1.3348e+00],\n", " [ 2.5005e-01, 7.7862e-01, 4.0052e-01, 7.6070e-01, 5.2203e-01,\n", " 6.5057e-01, -2.3226e-02, -4.0509e-04],\n", " [-1.3928e-02, -6.9325e-03, 1.6976e-01, 1.3968e+00, 5.9813e-01,\n", " -9.4279e-03, -9.0917e-03, 7.7908e-01],\n", " [ 5.7862e-01, 1.9515e-01, 1.3709e+00, 1.8836e+00, 1.2787e+00,\n", " 7.9873e-01, 1.6794e+00, -7.4565e-03]], grad_fn=)" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tabresnet(X_tab)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and finally, the `TabTransformer`:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/plain": [ "TabTransformer(\n", " (embed_layers): ModuleDict(\n", " (emb_layer_a): Embedding(5, 32, padding_idx=0)\n", " (emb_layer_b): Embedding(5, 32, padding_idx=0)\n", " (emb_layer_c): Embedding(5, 32, padding_idx=0)\n", " (emb_layer_d): Embedding(5, 32, padding_idx=0)\n", " )\n", " (embedding_dropout): Dropout(p=0.1, inplace=False)\n", " (blks): Sequential(\n", " (block0): TransformerEncoder(\n", " (self_attn): MultiHeadedAttention(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (inp_proj): Linear(in_features=32, out_features=96, bias=True)\n", " (out_proj): Linear(in_features=32, out_features=32, bias=True)\n", " )\n", " (feed_forward): PositionwiseFF(\n", " (w_1): Linear(in_features=32, out_features=128, bias=True)\n", " (w_2): Linear(in_features=128, out_features=32, bias=True)\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (activation): 
GELU()\n", " )\n", " (attn_addnorm): AddNorm(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (ln): LayerNorm((32,), eps=1e-05, elementwise_affine=True)\n", " )\n", " (ff_addnorm): AddNorm(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (ln): LayerNorm((32,), eps=1e-05, elementwise_affine=True)\n", " )\n", " )\n", " (block1): TransformerEncoder(\n", " (self_attn): MultiHeadedAttention(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (inp_proj): Linear(in_features=32, out_features=96, bias=True)\n", " (out_proj): Linear(in_features=32, out_features=32, bias=True)\n", " )\n", " (feed_forward): PositionwiseFF(\n", " (w_1): Linear(in_features=32, out_features=128, bias=True)\n", " (w_2): Linear(in_features=128, out_features=32, bias=True)\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (activation): GELU()\n", " )\n", " (attn_addnorm): AddNorm(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (ln): LayerNorm((32,), eps=1e-05, elementwise_affine=True)\n", " )\n", " (ff_addnorm): AddNorm(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (ln): LayerNorm((32,), eps=1e-05, elementwise_affine=True)\n", " )\n", " )\n", " (block2): TransformerEncoder(\n", " (self_attn): MultiHeadedAttention(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (inp_proj): Linear(in_features=32, out_features=96, bias=True)\n", " (out_proj): Linear(in_features=32, out_features=32, bias=True)\n", " )\n", " (feed_forward): PositionwiseFF(\n", " (w_1): Linear(in_features=32, out_features=128, bias=True)\n", " (w_2): Linear(in_features=128, out_features=32, bias=True)\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (activation): GELU()\n", " )\n", " (attn_addnorm): AddNorm(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (ln): LayerNorm((32,), eps=1e-05, elementwise_affine=True)\n", " )\n", " (ff_addnorm): AddNorm(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (ln): LayerNorm((32,), eps=1e-05, elementwise_affine=True)\n", " )\n", " )\n", " (block3): TransformerEncoder(\n", " (self_attn): MultiHeadedAttention(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (inp_proj): Linear(in_features=32, out_features=96, bias=True)\n", " (out_proj): Linear(in_features=32, out_features=32, bias=True)\n", " )\n", " (feed_forward): PositionwiseFF(\n", " (w_1): Linear(in_features=32, out_features=128, bias=True)\n", " (w_2): Linear(in_features=128, out_features=32, bias=True)\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (activation): GELU()\n", " )\n", " (attn_addnorm): AddNorm(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (ln): LayerNorm((32,), eps=1e-05, elementwise_affine=True)\n", " )\n", " (ff_addnorm): AddNorm(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (ln): LayerNorm((32,), eps=1e-05, elementwise_affine=True)\n", " )\n", " )\n", " (block4): TransformerEncoder(\n", " (self_attn): MultiHeadedAttention(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (inp_proj): Linear(in_features=32, out_features=96, bias=True)\n", " (out_proj): Linear(in_features=32, out_features=32, bias=True)\n", " )\n", " (feed_forward): PositionwiseFF(\n", " (w_1): Linear(in_features=32, out_features=128, bias=True)\n", " (w_2): Linear(in_features=128, out_features=32, bias=True)\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (activation): GELU()\n", " )\n", " (attn_addnorm): AddNorm(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (ln): LayerNorm((32,), eps=1e-05, elementwise_affine=True)\n", " )\n", " (ff_addnorm): AddNorm(\n", " (dropout): Dropout(p=0.1, 
inplace=False)\n", " (ln): LayerNorm((32,), eps=1e-05, elementwise_affine=True)\n", " )\n", " )\n", " (block5): TransformerEncoder(\n", " (self_attn): MultiHeadedAttention(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (inp_proj): Linear(in_features=32, out_features=96, bias=True)\n", " (out_proj): Linear(in_features=32, out_features=32, bias=True)\n", " )\n", " (feed_forward): PositionwiseFF(\n", " (w_1): Linear(in_features=32, out_features=128, bias=True)\n", " (w_2): Linear(in_features=128, out_features=32, bias=True)\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (activation): GELU()\n", " )\n", " (attn_addnorm): AddNorm(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (ln): LayerNorm((32,), eps=1e-05, elementwise_affine=True)\n", " )\n", " (ff_addnorm): AddNorm(\n", " (dropout): Dropout(p=0.1, inplace=False)\n", " (ln): LayerNorm((32,), eps=1e-05, elementwise_affine=True)\n", " )\n", " )\n", " )\n", " (tab_transformer_mlp): MLP(\n", " (mlp): Sequential(\n", " (dense_layer_0): Sequential(\n", " (0): Linear(in_features=129, out_features=516, bias=True)\n", " (1): ReLU(inplace=True)\n", " (2): Dropout(p=0.1, inplace=False)\n", " )\n", " (dense_layer_1): Sequential(\n", " (0): Linear(in_features=516, out_features=258, bias=True)\n", " (1): ReLU(inplace=True)\n", " (2): Dropout(p=0.1, inplace=False)\n", " )\n", " )\n", " )\n", ")" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pytorch_widedeep.models import TabTransformer\n", "embed_input = [(u,i) for u,i in zip(colnames[:4], [4]*4)]\n", "tabtransformer = TabTransformer(\n", " column_idx=column_idx, \n", " embed_input=embed_input, \n", " continuous_cols=continuous_cols\n", ")\n", "tabtransformer" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([[0.0000, 0.0000, 0.0000, ..., 0.0399, 0.2358, 0.3762],\n", " [0.1373, 0.0000, 0.0000, ..., 0.0550, 0.0000, 0.0000],\n", " [0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0212, 0.0000],\n", " [0.3322, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],\n", " [0.2914, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.6590]],\n", " grad_fn=)" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tabtransformer(X_tab)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5.3. `deeptext`\n", "\n", "`pytorch-widedeep` offers one model that can be passed to `WideDeep` as the `deeptext` component, `DeepText`, which is a standard and simple stack of LSTMs on top of word embeddings. You could also add a FC-Head on top of the LSTMs. The word embeddings can be pre-trained. In the future I aim to include some simple pre-trained models so that the combination between text and images is fair.\n", "\n", "On the other hand, while I recommend using the `wide` and `deeptabular` models within this package when building the corresponding wide and deep model components, it is very likely that the user will want to use custom text and image models. That is perfectly possible. Simply, build them and pass them as the corresponding parameters. Note that the custom models MUST return a last layer of activations (i.e. not the final prediction) so that these activations are collected by `WideDeep` and combined accordingly. 
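A minimal, hypothetical sketch of such a custom model (all class names and sizes here are made up, and this is not the library's built-in `DeepText`) could look like this:\n", "\n", "```python\n", "from torch import nn\n", "\n", "class MyDeepText(nn.Module):\n", "    def __init__(self, vocab_size, embed_dim=32, hidden_dim=64):\n", "        super().__init__()\n", "        self.word_embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)\n", "        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)\n", "        # WideDeep reads this attribute to connect the component to the output neuron(s)\n", "        self.output_dim = hidden_dim\n", "\n", "    def forward(self, X):\n", "        _, (h, _) = self.rnn(self.word_embed(X.long()))\n", "        # return the last layer of activations, NOT the final prediction\n", "        return h[-1]\n", "```\n", "\n", "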
In addition, the models MUST also contain an attribute output_dim with the size of these last layers of activations.\n", "\n", "I will illustrate all of the above more in detail in the second post of these series.\n", "\n", "Let's have a look to `DeepText`" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "#hide\n", "warnings.filterwarnings(\"ignore\", category=DeprecationWarning)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "import torch\n", "from pytorch_widedeep.models import DeepText" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/javier/.pyenv/versions/3.7.9/envs/wdposts/lib/python3.7/site-packages/torch/nn/modules/rnn.py:60: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.1 and num_layers=1\n", " \"num_layers={}\".format(dropout, num_layers))\n" ] }, { "data": { "text/plain": [ "DeepText(\n", " (word_embed): Embedding(4, 4, padding_idx=0)\n", " (rnn): LSTM(4, 4, batch_first=True, dropout=0.1)\n", ")" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X_text = torch.cat((torch.zeros([5,1]), torch.empty(5, 4).random_(1,4)), axis=1)\n", "deeptext = DeepText(vocab_size=4, hidden_dim=4, n_layers=1, padding_idx=0, embed_dim=4)\n", "deeptext" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([[ 0.1727, -0.0800, -0.2599, -0.1245],\n", " [ 0.1530, -0.2874, -0.2385, -0.1379],\n", " [-0.0747, -0.1666, -0.0124, -0.1875],\n", " [-0.0382, -0.1085, -0.0167, -0.1702],\n", " [-0.0393, -0.0926, -0.0141, -0.1371]], grad_fn=)" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "deeptext(X_text)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You could, if you wanted, add a Fully Connected Head (FC-Head) on top of it" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "deeptext = DeepText(vocab_size=4, hidden_dim=8, n_layers=3, padding_idx=0, embed_dim=4, \n", " head_hidden_dims=[8,4], head_batchnorm=True, head_dropout=[0.5, 0.5])" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DeepText(\n", " (word_embed): Embedding(4, 4, padding_idx=0)\n", " (rnn): LSTM(4, 8, num_layers=3, batch_first=True, dropout=0.1)\n", " (texthead): MLP(\n", " (mlp): Sequential(\n", " (dense_layer_0): Sequential(\n", " (0): Dropout(p=0.5, inplace=False)\n", " (1): Linear(in_features=8, out_features=4, bias=True)\n", " (2): ReLU(inplace=True)\n", " )\n", " )\n", " )\n", ")" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "deeptext" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([[0.4726, 0.0555, 0.0000, 0.1431],\n", " [0.4907, 0.1357, 0.0000, 0.2591],\n", " [0.4019, 0.0831, 0.0000, 0.1308],\n", " [0.3942, 0.1759, 0.0000, 0.2517],\n", " [0.3184, 0.0902, 0.0000, 0.1955]], grad_fn=)" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "deeptext(X_text)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5.4. 
`deepimage`\n", "\n", "Similarly to `deeptext`, `pytorch-widedeep` offers one model that can be passed to `WideDeep` as the `deepimage` component, `DeepImage`, which is either a pre-trained ResNet (18, 34, or 50. Default is 18) or a stack of CNNs, to which one can add a FC-Head. If is a pre-trained ResNet, you can chose how many layers you want to defrost deep into the network with the parameter `freeze_n`" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "DeepImage(\n", " (backbone): Sequential(\n", " (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)\n", " (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (2): ReLU(inplace=True)\n", " (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)\n", " (4): Sequential(\n", " (0): BasicBlock(\n", " (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n", " (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (relu): ReLU(inplace=True)\n", " (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n", " (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " )\n", " (1): BasicBlock(\n", " (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n", " (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (relu): ReLU(inplace=True)\n", " (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n", " (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " )\n", " )\n", " (5): Sequential(\n", " (0): BasicBlock(\n", " (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)\n", " (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (relu): ReLU(inplace=True)\n", " (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n", " (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (downsample): Sequential(\n", " (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)\n", " (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " )\n", " )\n", " (1): BasicBlock(\n", " (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n", " (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (relu): ReLU(inplace=True)\n", " (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n", " (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " )\n", " )\n", " (6): Sequential(\n", " (0): BasicBlock(\n", " (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)\n", " (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (relu): ReLU(inplace=True)\n", " (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n", " (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (downsample): Sequential(\n", " (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)\n", " (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " )\n", " )\n", 
" (1): BasicBlock(\n", " (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n", " (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (relu): ReLU(inplace=True)\n", " (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n", " (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " )\n", " )\n", " (7): Sequential(\n", " (0): BasicBlock(\n", " (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)\n", " (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (relu): ReLU(inplace=True)\n", " (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n", " (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (downsample): Sequential(\n", " (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)\n", " (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " )\n", " )\n", " (1): BasicBlock(\n", " (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n", " (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (relu): ReLU(inplace=True)\n", " (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n", " (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " )\n", " )\n", " (8): AdaptiveAvgPool2d(output_size=(1, 1))\n", " )\n", " (imagehead): MLP(\n", " (mlp): Sequential(\n", " (dense_layer_0): Sequential(\n", " (0): Dropout(p=0.1, inplace=False)\n", " (1): Linear(in_features=512, out_features=64, bias=True)\n", " (2): LeakyReLU(negative_slope=0.01, inplace=True)\n", " )\n", " (dense_layer_1): Sequential(\n", " (0): Dropout(p=0.1, inplace=False)\n", " (1): Linear(in_features=64, out_features=8, bias=True)\n", " (2): LeakyReLU(negative_slope=0.01, inplace=True)\n", " )\n", " )\n", " )\n", ")" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pytorch_widedeep.models import DeepImage\n", "\n", "X_img = torch.rand((2,3,224,224))\n", "deepimage = DeepImage(head_hidden_dims=[512, 64, 8], head_activation=\"leaky_relu\")\n", "\n", "deepimage" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tensor([[ 0.0965, 0.0056, 0.1143, -0.0007, 0.3860, -0.0050, -0.0023, -0.0011],\n", " [ 0.2437, -0.0020, -0.0021, 0.2480, 0.6217, -0.0033, -0.0030, 0.0566]],\n", " grad_fn=)" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "deepimage(X_img)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5.5. `deephead`\n", "\n", "The are two possibilities when defining the so-called `deephead` component.\n", "\n", "1. When defining the `WideDeep` model there is a parameter called `head_hidden_dims` (and the corresponding related parameters. See the package documentation) that define the FC-head on top of the `deeptabular`, `deeptext` and `deepimage` components.\n", "\n", "2. Of course, you could also chose to define it yourself externally and pass it using the parameter `deephead`. Have a look at the [documentation](https://pytorch-widedeep.readthedocs.io/en/latest/wide_deep.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. 
Conclusion\n", "\n", "This is the first of a series of posts introducing the Python library `pytorch-widedeep`. This library is intended to be a flexible framework to combine tabular data with text and images via wide and deep models. Of course, it can also be used directly on \"traditional\" tabular data, without text and/or images.\n", "\n", "In this post I have shown how to quickly start using the library (Section 3) and explained the utilities available in the `preprocessing` module (Section 4) and the model component definitions (Section 5), available in the `models` module. \n", "\n", "In the next post I will show more advanced uses that hopefully will illustrate `pytorch-widedeep`'s flexibility to build wide and deep models. \n", "\n", "#### References\n", "\n", "[1] Wide & Deep Learning for Recommender Systems. Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, et al., 2016. [arXiv:1606.07792](https://arxiv.org/abs/1606.07792)\n", "\n", "[2] TabNet: Attentive Interpretable Tabular Learning. Sercan O. Arik, Tomas Pfister, 2020. [arXiv:1908.07442](https://arxiv.org/abs/1908.07442)\n", "\n", "[3] AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data. Nick Erickson, Jonas Mueller, Alexander Shirkov, et al., 2020. [arXiv:2003.06505](https://arxiv.org/abs/2003.06505)\n", "\n", "[4] Universal Language Model Fine-tuning for Text Classification. Jeremy Howard, Sebastian Ruder, 2018. [arXiv:1801.06146](https://arxiv.org/abs/1801.06146)\n", "\n", "[5] Single Headed Attention RNN: Stop Thinking With Your Head. Stephen Merity, 2019. [arXiv:1911.11423](https://arxiv.org/abs/1911.11423)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.9" } }, "nbformat": 4, "nbformat_minor": 2 }