{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "$$\n", "\\newcommand{\\mat}[1]{\\boldsymbol {#1}}\n", "\\newcommand{\\mattr}[1]{\\boldsymbol {#1}^\\top}\n", "\\newcommand{\\matinv}[1]{\\boldsymbol {#1}^{-1}}\n", "\\newcommand{\\vec}[1]{\\boldsymbol {#1}}\n", "\\newcommand{\\vectr}[1]{\\boldsymbol {#1}^\\top}\n", "\\newcommand{\\rvar}[1]{\\mathrm {#1}}\n", "\\newcommand{\\rvec}[1]{\\boldsymbol{\\mathrm{#1}}}\n", "\\newcommand{\\diag}{\\mathop{\\mathrm {diag}}}\n", "\\newcommand{\\set}[1]{\\mathbb {#1}}\n", "\\newcommand{\\cset}[1]{\\mathcal{#1}}\n", "\\newcommand{\\norm}[1]{\\left\\lVert#1\\right\\rVert}\n", "\\newcommand{\\pderiv}[2]{\\frac{\\partial #1}{\\partial #2}}\n", "\\newcommand{\\bb}[1]{\\boldsymbol{#1}}\n", "$$\n", "\n", "# CS236781: Deep Learning\n", "# Tutorial 7: Transfer Learning and Domain Adaptation" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Introduction\n", "\n", "In this tutorial, we will cover:\n", "\n", "- Transfer learning contexts\n", "- Leveraging pre-trained models for supervised domain adaptation\n", "- Unsupervised domain adaptation using adversarial training" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "# Setup\n", "%matplotlib inline\n", "import os\n", "import sys\n", "import torch\n", "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "plt.rcParams['font.size'] = 20\n", "data_dir = './' #os.path.expanduser('~/.pytorch-datasets')\n", "device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Transfer learning" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### The supervised learning context" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "We have a labeled dataset of $N$ labelled samples: $\\left\\{ (\\vec{x}^i,y^i) \\right\\}_{i=1}^N$, where\n", "- $\\vec{x}^i = \\left(x^i_1, \\dots, x^i_D\\right) \\in \\mathcal{X}$ is a **sample** or **feature vector**.\n", "- $y^i \\in \\mathcal{Y}$ is the **label**.\n", "- For classification with $C$ classes, $\\mathcal{Y} = \\{0,\\dots,C-1\\}$, so each $y^i$ is a **class label**.\n", "- Usually we assume each labeled sample $(\\vec{x}^i,y^i)$\n", " is drawn from a joint distribution\n", " $$P_{(\\rvec{X}, \\rvar{Y})}=P_{\\rvec{X}}\\cdot P_{\\rvar{Y}|\\rvec{X}}$$\n", " - We assume some marginal sample distribution $P_{\\rvar{X}}$ exists.\n", " - We want to estimate properties of $P_{\\rvar{Y}|\\rvec{X}}$ from the data." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "So far, we considered mostly the traditional **supervised learning** setting:\n", "\n", "We assumed the **train** and **test** (which is supposed to represent future unseen data)\n", "sets are both sampled from the same **distribution** $P_{(\\rvec{X}, \\rvar{Y})}$ and both labeled.\n", "\n", "We assume this since we wanted to solve one task with one dataset, and we could\n", "therefore split our dataset into such sets.\n", "\n", "What happens when this is not the case?" 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "In the real world, we often don't have the perfect training set for our problem.\n", "I.e. we may not be able to sample i.i.d. from the underlying distribution.\n", "\n", "What should we do when the supervised learning assumption is invalid?\n", "\n", "