{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Sveučilište u Zagrebu
\n",
"Fakultet elektrotehnike i računarstva\n",
"\n",
"# Strojno učenje\n",
"\n",
"http://www.fer.unizg.hr/predmet/su\n",
"\n",
"Ak. god. 2015./2016.\n",
"\n",
"# Bilježnica 5: Regresija\n",
"\n",
"(c) 2015 Jan Šnajder\n",
"\n",
"Verzija: 0.3 (2015-11-09)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Populating the interactive namespace from numpy and matplotlib\n"
]
}
],
"source": [
"import scipy as sp\n",
"import scipy.stats as stats\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"%pylab inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sadržaj:\n",
"\n",
"* Uvod\n",
"\n",
"* Osnovni pojmovi\n",
"\n",
"* Model, funkcija gubitka i optimizacijski postupak\n",
"\n",
"* Postupak najmanjih kvadrata\n",
"\n",
"* Probabilistička interpretacija regresije\n",
"\n",
"* Poopćeni linearan model regresije\n",
"\n",
"* Odabir modela\n",
"\n",
"* Regularizirana regresija\n",
"\n",
"* Sažetak"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Osnovni pojmovi\n",
"\n",
"* Označen skup podataka: $\\mathcal{D}=\\{(\\mathbf{x}^{(i)},y^{(i)})\\},\\quad \\mathbf{x}\\in\\mathbb{R}^n,\\quad y\\in\\mathbb{R}$\n",
"\n",
"\n",
"* Hipoteza $h$ aproksimira nepoznatu funkciju $f:\\mathbb{R}^n\\to\\mathbb{R}$\n",
"\n",
"\n",
"* Idealno, $y^{(i)}=f(\\mathbf{x}^{(i)})$, ali zbog šuma: $$y^{(i)}=f(\\mathbf{x}^{(i)})+\\varepsilon$$\n",
"\n",
"\n",
"* $\\mathbf{x}$ - **ulazna varijabla** (nezavisna, prediktorska)\n",
"\n",
"\n",
"* $y$ - **izlazna varijabla** (zavisna, kriterijska)\n",
"\n",
"\n",
"### Vrste regresije\n",
"\n",
"* Broj **ulaznih** (nezavisnih) varijabli:\n",
" * Univarijatna (jednostavna, jednostruka) regresija: $n=1$\n",
" * Multivarijatna (višestruka, multipla) regresija: $n>1$\n",
"\n",
"\n",
"* Broj **izlaznih** (zavisnih) varijabli:\n",
" * Jednoizlazna regresija: $f(\\mathbf{x}) = y$\n",
" * Višeizlazna regresija: $f(\\mathbf{x})=\\mathbf{y}$\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Model, funkcija gubitka i optimizacijski postupak\n",
"\n",
"\n",
"### (1) Model\n",
"\n",
"* **Linearan model regresije**: $h$ je linearna funkcija parametara\n",
"$\\mathbf{w} = (w_0,\\dots,w_n)$\n",
"\n",
"\n",
"* Linearna regresija:\n",
" $$h(\\mathbf{x}|\\mathbf{w}) = w_0 + w_1 x_1 + w_2 x_2 + \\dots + w_n x_n$$\n",
"\n",
"\n",
"* Polinomijalna regresija:\n",
" * Univarijatna polinomijalna: $$h(x|\\mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \\dots + w_d x^d\\quad (n=1)$$\n",
" * Multivarijatna polinomijalna: $$h(\\mathbf{x}|\\mathbf{w}) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1 x_2 + w_4 x_1^2 + w_5 x_2^2\\quad (n=2, d=2)$$\n",
" * Modelira međuovisnost značajki (*cross-terms* $x_1 x_2, \\dots$) \n",
"\n",
"\n",
"* Općenite **bazne funkcije**:\n",
" $$h(\\mathbf{x}|\\mathbf{w}) = w_0 + w_1\\phi_1(\\mathbf{x}) + \\dots + w_m\\phi_m(\\mathbf{x})$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### (2) Funkcija gubitka (funkcija pogreške)\n",
"\n",
"* Kvadratni gubitak (engl. *quadratic loss*)\n",
"\n",
"$$\n",
"L(y^{(i)},h(\\mathbf{x}^{(i)})) = \\big(y^{(i)}-h(\\mathbf{x}^{(i)})\\big)^2\n",
"$$\n",
"\n",
"* Funkcija pogreške (proporcionalna s empirijskim očekivanjem gubitka):\n",
"$$\n",
"E(h|\\mathcal{D})=\\frac{1}{2}\n",
"\\sum_{i=1}^N\\big(y^{(i)}-h(\\mathbf{x}^{(i)})\\big)^2\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### (3) Optimizacijski postupak\n",
"\n",
"* Postupak **najmanjih kvadrata** (engl. *least squares*)\n",
"\n",
"$$\n",
"\\mathrm{argmin}_{\\mathbf{w}} E(\\mathbf{w}|\\mathcal{D})\n",
"$$\n",
"\n",
"\n",
"* Rješenje ovog optimizacijskog problema postoji u **zatvorenoj formi**\n",
"\n",
"\n",
"# Postupak najmanjih kvadrata\n",
"\n",
"\n",
"* Razmotrimo najprije linearnu regresiju:\n",
"$$h(\\mathbf{x}|\\mathbf{w}) = w_0 + w_1 x_1 + w_2 x_2 + \\dots + w_n x_n = \\sum_{i=1}^n w_i x_i + w_0$$\n",
"\n",
"\n",
"* Izračun je jednostavniji ako pređemo u matrični račun\n",
" * Svaki vektor primjera $\\mathbf{x}^{(i)}$ proširujemo *dummy* značajkom $x^{(i)}_0 = 1$, pa je model onda:\n",
"\n",
"$$h(\\mathbf{x}|\\mathbf{w}) = \\mathbf{w}^\\intercal \\mathbf{x}$$\n",
"\n",
"\n",
"* Skup primjera:\n",
"\n",
"$$\n",
"\\mathbf{X} = \n",
"\\begin{pmatrix}\n",
"1 & x^{(1)}_1 & x^{(1)}_2 \\dots & x^{(1)}_n\\\\\n",
"1 & x^{(2)}_1 & x^{(2)}_2 \\dots & x^{(2)}_n\\\\\n",
"\\vdots\\\\\n",
"1 & x^{(N)}_1 & x^{(N)}_2 \\dots & x^{(N)}_n\\\\\n",
"\\end{pmatrix}_{N\\times (n+1)}\n",
"=\n",
"\\begin{pmatrix}\n",
"1 & (\\mathbf{x}^{(1)})^\\intercal \\\\\n",
"1 & (\\mathbf{x}^{(2)})^\\intercal \\\\\n",
"\\vdots\\\\\n",
"1 & (\\mathbf{x}^{(N)})^\\intercal \\\\\n",
"1 & \\end{pmatrix}_{N\\times (n+1)}\n",
"$$\n",
"* Matricu primjera $\\mathbf{X}$ zovemo **dizajn-matrica**\n",
"\n",
"\n",
"* Vektor izlaznih vrijednosti:\n",
"$$\n",
"\\mathbf{y} = \n",
"\\begin{pmatrix}\n",
"y^{(1)}\\\\\n",
"y^{(2)}\\\\\n",
"\\vdots\\\\\n",
"y^{(N)}\\\\\n",
"\\end{pmatrix}_{N\\times 1}\n",
"$$\n",
"\n",
"### Egzaktno rješenje\n",
"\n",
"* Idealno, tražimo egzaktno rješenje, tj. rješenje za koje vrijedi\n",
"$$\n",
"(\\mathbf{x}^{(i)}, y^{(i)})\\in\\mathcal{D}.\\ h(\\mathbf{x}^{(i)}) = y^{(i)}\n",
"$$\n",
"odnosno\n",
"$$\n",
"(\\mathbf{x}^{(i)}, y^{(i)})\\in\\mathcal{D}.\\ \\mathbf{w}^\\intercal \\mathbf{x} = y^{(i)}\n",
"$$\n",
"\n",
"\n",
"* Možemo napisati kao matričnu jednadžbu ($N$ jednadžbi s $(n+1)$ nepoznanica):\n",
"\n",
"$$\n",
"\\mathbf{X}\\mathbf{w} = \\mathbf{y}\n",
"$$\n",
"\n",
"$$\n",
"\\begin{pmatrix}\n",
"1 & x^{(1)}_1 & x^{(1)}_2 \\dots & x^{(1)}_n\\\\\n",
"1 & x^{(2)}_1 & x^{(2)}_2 \\dots & x^{(2)}_n\\\\\n",
"\\vdots\\\\\n",
"1 & x^{(N)}_1 & x^{(N)}_2 \\dots & x^{(N)}_n\\\\\n",
"\\end{pmatrix}\n",
"\\cdot\n",
"\\begin{pmatrix}\n",
"w_0\\\\\n",
"w_1\\\\\n",
"\\vdots\\\\\n",
"w_n\\\\\n",
"\\end{pmatrix}\n",
"=\n",
"\\begin{pmatrix}\n",
"y^{(1)}\\\\\n",
"y^{(2)}\\\\\n",
"\\vdots\\\\\n",
"y^{(N)}\\\\\n",
"\\end{pmatrix}\n",
"$$\n",
"\n",
"* Egzaktno rješenje ovog sustava jednadžbi je\n",
"\n",
"$$\n",
"\\mathbf{w} = \\mathbf{X}^{-1}\\mathbf{y}\n",
"$$\n",
"\n",
"Međutim, rješenja nema ili ono nije jedinstveno ako:\n",
"\n",
"* (1) $\\mathbf{X}$ nije kvadratna, pa nema inverz. U pravilu:\n",
" * $N>(n+1)$
\n",
" $\\Rightarrow$ sustav je **preodređen** (engl. *overdetermined*) i nema rješenja\n",
" * $N<(n+1)$
\n",
" $\\Rightarrow$ sustav je **pododređen** (engl. *underdetermined*) i ima višestruka rješenja\n",
" \n",
"* (2) $\\boldsymbol{X}$ jest kvadratna (tj. $N=(n+1)$), ali ipak nema inverz (ovisno o rangu matrice)
$\\Rightarrow$ sustav je **nekonzistentan**\n",
"\n",
"\n",
"### Rješenje najmanjih kvadrata\n",
"\n",
"\n",
"* Približno rješenje sustava $\\mathbf{X}\\mathbf{w}=\\mathbf{y}$\n",
"\n",
"\n",
"* Funkcija pogreške: \n",
"$$\n",
"E(\\mathbf{w}|\\mathcal{D})=\\frac{1}{2}\n",
"\\sum_{i=1}^N\\big(\\mathbf{w}^\\intercal\\mathbf{x}^{(i)} - y^{(i)}\\big)^2\n",
"$$\n",
"\n",
"\n",
"* Matrični oblik:\n",
"\\begin{align*}\n",
"E(\\mathbf{w}|\\mathcal{D}) \n",
"=& \n",
"\\frac{1}{2} (\\mathbf{X}\\mathbf{w} - \\mathbf{y})^\\intercal (\\mathbf{X}\\mathbf{w} - \\mathbf{y})\\\\\n",
"=&\n",
"\\frac{1}{2}\n",
"(\\mathbf{w}^\\intercal\\mathbf{X}^\\intercal\\mathbf{X}\\mathbf{w} - \\mathbf{w}^\\intercal\\mathbf{X}^\\intercal\\mathbf{y} - \\mathbf{y}^\\intercal\\mathbf{X}\\mathbf{w} + \\mathbf{y}^\\intercal\\mathbf{y})\\\\\n",
"=&\n",
"\\frac{1}{2}\n",
"(\\mathbf{w}^\\intercal\\mathbf{X}^\\intercal\\mathbf{X}\\mathbf{w} - 2\\mathbf{y}^\\intercal\\mathbf{X}\\mathbf{w} + \\mathbf{y}^\\intercal\\mathbf{y})\n",
"\\end{align*}\n",
"\n",
"> Jednakosti linearne algebre:\n",
"> * $(A^\\intercal)^\\intercal = A$\n",
"> * $(AB)^\\intercal = B^\\intercal A^\\intercal$\n",
"\n",
"* Minimizacija pogreške:\n",
"$$\n",
"\\begin{align*}\n",
"\\nabla_{\\mathbf{w}}E &= \n",
"\\frac{1}{2}\\Big(\\mathbf{w}^\\intercal\\big(\\mathbf{X}^\\intercal\\mathbf{X}+(\\mathbf{X}^\\intercal\\mathbf{X})^\\intercal\\big) -\n",
"2\\mathbf{y}^\\intercal\\mathbf{X}\\Big) = \n",
"\\mathbf{X}^\\intercal\\mathbf{X}\\mathbf{w} - \\mathbf{X}^\\intercal\\mathbf{y} = \\mathbf{0}\n",
"\\end{align*}\n",
"$$\n",
"\n",
"\n",
"> Jednakosti linearne algebre:\n",
"> * $\\frac{\\mathrm{d}}{\\mathrm{d}x}x^\\intercal A x=x^\\intercal(A+A^\\intercal)$\n",
"> * $\\frac{\\mathrm{d}}{\\mathrm{d}x}A x=A$\n",
"\n",
"\n",
"* Dobivamo sustav tzv. **normalnih jednadžbi**:\n",
"$$\n",
"\\mathbf{X}^\\intercal\\mathbf{X}\\mathbf{w} = \\mathbf{X}^\\intercal\\mathbf{y}\n",
"$$\n",
"\n",
"\n",
"* Rješenje:\n",
"$$\n",
"\\mathbf{w} = (\\mathbf{X}^\\intercal\\mathbf{X})^{-1}\\mathbf{X}^\\intercal\\mathbf{y} = \\color{red}{\\mathbf{X}^{+}}\\mathbf{y}\n",
"$$\n",
"\n",
"\n",
"* Matrica $\\mathbf{X}^{+}=(\\mathbf{X}^\\intercal\\mathbf{X})^{-1}\\mathbf{X}^\\intercal$ je **pseudoinverz** (Moore-Penroseov inverz) matrice $\\mathbf{X}$\n",
"\n",
"\n",
"* **Q:** Kojih je dimenzija matrica $(\\mathbf{X}^\\intercal\\mathbf{X})^{-1}$?\n",
"* **Q:** Što utječe na složenost izračuna inverza matrice: broj primjera $N$ ili broj dimenzija $n$?"
]
},
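{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Worked example: normal equations on three points\n",
"\n",
"The closed-form solution can be checked by hand on a tiny dataset. The three points below are made up purely for illustration (they are not from the original notes), with $n=1$ and $N=3$:\n",
"\n",
"$$\n",
"\\mathcal{D}=\\{(-1,1),\\,(0,3),\\,(1,2)\\},\\qquad\n",
"\\mathbf{X} =\n",
"\\begin{pmatrix}\n",
"1 & -1\\\\\n",
"1 & 0\\\\\n",
"1 & 1\n",
"\\end{pmatrix},\\qquad\n",
"\\mathbf{y} =\n",
"\\begin{pmatrix}\n",
"1\\\\\n",
"3\\\\\n",
"2\n",
"\\end{pmatrix}\n",
"$$\n",
"\n",
"$$\n",
"\\mathbf{X}^\\intercal\\mathbf{X} =\n",
"\\begin{pmatrix}\n",
"3 & 0\\\\\n",
"0 & 2\n",
"\\end{pmatrix},\\qquad\n",
"\\mathbf{X}^\\intercal\\mathbf{y} =\n",
"\\begin{pmatrix}\n",
"6\\\\\n",
"1\n",
"\\end{pmatrix},\\qquad\n",
"\\mathbf{w} = (\\mathbf{X}^\\intercal\\mathbf{X})^{-1}\\mathbf{X}^\\intercal\\mathbf{y} =\n",
"\\begin{pmatrix}\n",
"2\\\\\n",
"1/2\n",
"\\end{pmatrix}\n",
"$$\n",
"\n",
"So $h(x|\\mathbf{w}) = 2 + x/2$. Because $N > n+1$ the system is overdetermined and the fit is not exact: the residuals are $\\mathbf{y}-\\mathbf{X}\\mathbf{w} = (-1/2,\\ 1,\\ -1/2)^\\intercal$, which gives $E(\\mathbf{w}|\\mathcal{D}) = \\frac{1}{2}\\big(\\frac{1}{4}+1+\\frac{1}{4}\\big) = \\frac{3}{4}$. As a sanity check, $\\mathbf{X}^\\intercal(\\mathbf{y}-\\mathbf{X}\\mathbf{w}) = \\mathbf{0}$, which is exactly what the normal equations require."
]
},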
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Probabilistička interpretacija regresije"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Ograničimo se BSO na univarijatnu ($n=1$) linearnu regresiju:\n",
"\n",
"$$\n",
"h(x|w_0, w_1) = w_0 + w_1 x\n",
"$$\n",
"\n",
"\n",
"* Zbog šuma u $\\mathcal{D}$:\n",
"$$\n",
" y^{(i)} = f(x^{(i)}) + \\color{red}{\\varepsilon}\n",
"$$\n",
"\n",
"* Prepostavka:\n",
"$$\n",
" \\color{red}{\\varepsilon}\\ \\sim\\ \\mathcal{N}(0, \\sigma^2)\n",
"$$\n",
"\n",
"* Posjedično:\n",
"$$\n",
" \\color{red}{y|x}\\ \\sim\\ \\mathcal{N}\\big(f(x), \\sigma^2\\big)\n",
"$$\n",
"odnosno\n",
"$$\n",
" \\color{red}{p(y|x)} = \\mathcal{N}\\big(f(x), \\sigma^2\\big)\n",
"$$\n",
"\n",
"* Vrijedi \n",
"$$\\mathbb{E}[y|x] = \\mu = f(x)$$\n",
"\n",
"\n",
"* Naš cilj je: $h(x|\\mathbf{w}) = f(x)$\n",
"\n",
"\n",
"* [Skica]\n",
"\n",
"\n",
"* $p(y^{(i)}|x^{(i)})$ je vjerojatnost da je $f(x^{(i)})$ generirala vrijednost $y^{(i)}$\n",
" * (Formulacija nije baš točna, jer je $x$ kontinuirana varijabla a $p$ je gustoća vjerojatnosti.)\n",
" \n",
"### Log-izglednost\n",
"\n",
"$$\n",
"\\begin{align*}\n",
"\\ln\\mathcal{L}(\\mathbf{w}|\\mathcal{D}) \n",
"&= \n",
"\\ln p(\\mathcal{D}|\\mathbf{w}) = \n",
"\\ln\\prod_{i=1}^N p(x^{(i)}, y^{(i)}) =\n",
"\\ln\\prod_{i=1}^N p(y^{(i)}|x^{(i)}) p(x^{(i)}) \\\\ \n",
"&= \n",
"\\ln\\prod_{i=1}^N p(y^{(i)}|x^{(i)}) + \\underbrace{\\color{gray}{\\ln\\prod_{i=1}^N p(x^{(i)})}}_{\\text{Ne ovisi o $\\mathbf{w}$}} \\\\\n",
"& \\Rightarrow \\ln\\prod_{i=1}^N p(y^{(i)}|x^{(i)}) =\n",
"\\ln\\prod_{i=1}^N\\mathcal{N}\\big(h(x^{(i)}|\\mathbf{w}),\\sigma^2\\big)\\\\ &= \n",
"\\ln\\prod_{i=1}^N\\frac{1}{\\sqrt{2\\pi}\\sigma}\\exp\\Big\\{-\\frac{\\big(y^{(i)}-h(x^{(i)}|\\mathbf{w})\\big)^2}{2\\sigma^2}\\Big\\}\\\\ \n",
"&=\n",
"\\underbrace{\\color{gray}{-N\\ln(\\sqrt{2\\pi}\\sigma)}}_{\\text{konst.}} -\n",
"\\frac{1}{2\\color{gray}{\\sigma^2}}\\sum_{i=1}^N\\big(y^{(i)}-h(x^{(i)}|\\mathbf{w})\\big)^2\\\\\n",
"& \\Rightarrow\n",
"-\\frac{1}{2}\\sum_{i=1}^N\\big(y^{(i)}-h(x^{(i)}|\\mathbf{w})\\big)^2\n",
"\\end{align*}\n",
"$$\n",
"\n",
"\n",
"* Uz pretpostavku Gaussovog šuma, **maksimizacija izglednosti** odgovara **minimizaciji funkcije pogreške** definirane kao **zbroj kvadratnih odstupanja**:\n",
"\n",
"$$\n",
"\\begin{align*}\n",
"\\mathrm{argmax}_{\\mathbf{w}} \\ln\\mathcal{L}(\\mathbf{w}|\\mathcal{D}) &= \\mathrm{argmin}_{\\mathbf{w}} E(\\mathbf{w}|\\mathcal{D})\\\\\n",
"E(h|\\mathcal{D}) &=\\frac{1}{2} \\sum_{i=1}^N\\big(y^{(i)}-h(x^{(i)}|\\mathbf{w})\\big)^2\\\\\n",
"L\\big(y,h(x|\\mathbf{w})\\big)\\ &\\propto\\ \\big(y - h(x|\\mathbf{w})\\big)^2\n",
"\\end{align*}\n",
"$$\n",
"\n",
"\n",
"* $\\Rightarrow$ Probabilističko opravdanje za kvadratnu funkciju gubitka\n",
"\n",
"\n",
"* Rješenje MLE jednako je rješenju koje daje postupak najmanjih kvadrata!\n",
"\n"
]
},
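{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Worked step: from the product of Gaussians to the sum of squares\n",
"\n",
"The log-likelihood derivation moves from a product of Gaussian densities directly to a sum of squared deviations. Written out in full, and assuming (as above) that $\\sigma^2$ is the same for every example and does not depend on $\\mathbf{w}$, the intermediate step is:\n",
"\n",
"$$\n",
"\\begin{align*}\n",
"\\ln\\prod_{i=1}^N\\frac{1}{\\sqrt{2\\pi}\\sigma}\\exp\\Big\\{-\\frac{\\big(y^{(i)}-h(x^{(i)}|\\mathbf{w})\\big)^2}{2\\sigma^2}\\Big\\}\n",
"&=\n",
"\\sum_{i=1}^N\\Big[\\ln\\frac{1}{\\sqrt{2\\pi}\\sigma} - \\frac{\\big(y^{(i)}-h(x^{(i)}|\\mathbf{w})\\big)^2}{2\\sigma^2}\\Big]\\\\\n",
"&=\n",
"-N\\ln(\\sqrt{2\\pi}\\sigma) - \\frac{1}{2\\sigma^2}\\sum_{i=1}^N\\big(y^{(i)}-h(x^{(i)}|\\mathbf{w})\\big)^2\n",
"\\end{align*}\n",
"$$\n",
"\n",
"The first term is constant in $\\mathbf{w}$, and multiplying the second term by the positive constant $\\sigma^2$ does not move the maximizer. What remains is $-\\frac{1}{2}\\sum_{i=1}^N\\big(y^{(i)}-h(x^{(i)}|\\mathbf{w})\\big)^2 = -E(\\mathbf{w}|\\mathcal{D})$, so maximizing the log-likelihood and minimizing $E$ select the same $\\mathbf{w}$."
]
},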
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Poopćeni linearan model regresije"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* Zanima nas poopćenje na $n>1$ koje obuhvaća sve multivarijatne linearne modele regresije: univarijatna regresija, linearna regresija, polinomijalna regresija, ...\n",
" * $h(\\mathbf{x}|\\mathbf{w}) = w_0 + w_1 x_1 + w_2 x_2 + \\dots + w_n x_n$\n",
" * $h(x|\\mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \\dots + w_d x^d$\n",
" * $h(\\mathbf{x}|\\mathbf{w}) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1 x_2 + w_4 x_1^2 + w_5 x_2^2$\n",
" * ...\n",
"\n",
"\n",
"* Uvodimo fiksan skup **baznih funkcija** (nelinearne funkcije ulaznih varijabli):\n",
"$$\n",
" \\{\\phi_0, \\phi_1, \\phi_2, \\dots, \\phi_m\\}\n",
"$$ \n",
"gdje $\\phi_j:\\mathbb{R}^n\\to\\mathbb{R}$\n",
"\n",
"\n",
"* Dogovorno: $\\phi_0(\\mathbf{x}) = 1$\n",
"\n",
"\n",
"* Svaki vektor primjera u $n$-dimenzijskom originalnom ulaznom prostoru (engl. *input space*) $\\mathcal{X}$:\n",
"$$\n",
"\\mathbf{x} = (x_1, x_2, \\dots, x_n)\n",
"$$\n",
"preslikavamo u nov, $m$-dimenzijski prostor, tzv. **prostor značajki** (engl. *feature space*):\n",
"$$\n",
"\\boldsymbol{\\phi}(\\mathbf{x}) = \\big(\\phi_0(\\mathbf{x}), \\phi_1(\\mathbf{x}), \\dots, \\phi_m(\\mathbf{x})\\big)\n",
"$$\n",
"\n",
"\n",
"* **Funkija preslikavanja** (vektor baznih funkcija)\n",
"$$\n",
"\\begin{align*}\n",
"\\boldsymbol{\\phi}&:\\mathbb{R}^n\\to\\mathbb{R}^m:\\\\\n",
"\\boldsymbol{\\phi}(\\mathbf{x}) &= \\big(\\phi_0(\\mathbf{x}),\\dots,\\phi_m(\\mathbf{x})\\big)\\\\\n",
"\\end{align*}\n",
"$$\n",
"\n",
"\n",
"* Poopćen linearan model:\n",
"$$\n",
" h(\\mathbf{x}|\\mathbf{w}) = \\sum_{j=0}^m w_j\\phi_j(\\mathbf{x}) = \\mathbf{w}^\\intercal\\boldsymbol{\\phi}(\\mathbf{x})\n",
"$$\n",
"\n",
"\n",
"### Uobičajene funkcije preslikavanja\n",
"\n",
"\n",
"* Linearna regresija:\n",
"$$\n",
"\\boldsymbol{\\phi}(\\mathbf{x}) = (1,x_1,x_2,\\dots,x_n)\n",
"$$\n",
"\n",
"\n",
"* Univarijatna polinomijalna regresija: \n",
"$$\n",
"\\boldsymbol{\\phi}(x) = (1,x,x^2,\\dots,x^m)\n",
"$$\n",
"\n",
"\n",
"* Polinomijalna regresija drugog stupnja: \n",
"$$\n",
"\\boldsymbol{\\phi}(\\mathbf{x}) = (1,x_1,x_2,x_1 x_2, x_1^2, x_2^2)\n",
"$$\n",
"\n",
"\n",
"* Gaussove bazne funkcije (RBF):\n",
"$$\n",
"\\phi_j(x) = \\exp\\Big\\{-\\frac{(x-\\mu_j)^2}{2\\sigma^2}\\Big\\}\n",
"$$\n",
"\n",
"\n",
"* [Skica: RBF] \n",
"\n",
"### Prostor značajki\n",
"\n",
"\n",
"* **Funkcija preslikavanja značajki** $\\mathbf{\\phi} : \\mathbb{R}^n \\to \\mathbb{R}^m $ preslikava primjere iz $n$-dimenzijskog ulaznog prostora u $m$-dimenzijski prostor značajki\n",
"\n",
"\n",
"* Tipično je $m>n$\n",
"\n",
"\n",
"* Tada je funkcija koja je linearna u prostoru značajki **nelinearna u ulaznom prostoru**\n",
"\n",
"\n",
"* Dakle, možemo koristiti linearan model za nelinearne probleme\n",
"\n",
"\n",
"* Imamo unificiran postupak, neovisno koju funkciju $\\boldsymbol{\\phi}$ odaberemo"
]
},
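{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Worked example: evaluating Gaussian basis functions\n",
"\n",
"To see what a Gaussian (RBF) feature vector looks like numerically, here is a small sketch. The centers $\\mu_1=0$, $\\mu_2=1$, $\\mu_3=2$ and the width $\\sigma=1$ are illustrative values assumed for this example only. For the input $x=1$:\n",
"\n",
"$$\n",
"\\begin{align*}\n",
"\\phi_1(1) &= \\exp\\Big\\{-\\frac{(1-0)^2}{2}\\Big\\} = e^{-1/2} \\approx 0.607\\\\\n",
"\\phi_2(1) &= \\exp\\Big\\{-\\frac{(1-1)^2}{2}\\Big\\} = 1\\\\\n",
"\\phi_3(1) &= \\exp\\Big\\{-\\frac{(1-2)^2}{2}\\Big\\} = e^{-1/2} \\approx 0.607\n",
"\\end{align*}\n",
"$$\n",
"\n",
"With the convention $\\phi_0(x)=1$ this gives $\\boldsymbol{\\phi}(1) \\approx (1,\\ 0.607,\\ 1,\\ 0.607)$ and the prediction $h(1|\\mathbf{w}) \\approx w_0 + 0.607\\,w_1 + w_2 + 0.607\\,w_3$. Each basis function responds most strongly to inputs near its own center, and the model remains linear in $\\mathbf{w}$."
]
},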
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Primjer: Preslikavanje iz ulaznog prostora u prostor značajki\n",
"\n",
"* $\\mathcal{X} = \\mathbb{R}$\n",
"* $n=1$, $m=3$\n",
"* $\\boldsymbol{\\phi} : \\mathbb{R} \\to \\mathbb{R}^3$\n",
"* $\\boldsymbol{\\phi}(x) = (1,x,x^2)$\n",
"* [Skica]"
]
},
{
"cell_type": "code",
"execution_count": 261,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def f(x) : return 3*(x - 2)**2 + 1\n",
"\n",
"x1 = 1\n",
"x2 = 2\n",
"x3 = 3"
]
},
{
"cell_type": "code",
"execution_count": 262,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAW0AAAEACAYAAAB4ayemAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAG+hJREFUeJzt3Xt0ldWdxvHvT24R7yhiFSwUraXaAaoFKgoHEEK5eKGV\nilpHnVFb5dLaOhXtKK4ubV2d6VhiL85SUFuJFi1WEkQQiAEVBOUmNyXeQBAstmoREMieP/bBwZCQ\nN8k5Z7/vOc9nrSwPOS8nj5vkl332uy/mnENERJLhkNABREQkOhVtEZEEUdEWEUkQFW0RkQRR0RYR\nSRAVbRGRBGke5SIzewv4CNgL7HbO9chmKBERqV2kog04IOWc+yCbYURE5OAaMjxiWUshIiKRRC3a\nDnjWzJaY2TXZDCQiInWLOjzS2zm32czaArPNbK1zbn42g4mIyIEiFW3n3Ob0f983s2lAD2A+gJlp\n8xIRkUZwzjV42Lne4REza21mR6QfHwYMAlbW+MKx/7j99ttz/jWHD3c8+mj8cyalPZUz/EdDct54\no+PnP493xpAfjRVlTLsdMN/MlgGLgDLn3KxGf8UCMmgQzFJLSYGaNcv/DEhm1Ts84px7E+iWgyx5\nZ9AguPtucA5Mc2+kgGza5D/OPDN0kvxTMCsiU6lUzr/mqadCs2awdm30vxMiZ2MoZ2blW87Zs2HA\nAP/9n2tJacvGsqaMrYC/EdnU18hn114Lp58O48aFTiKSO5ddBv36wb//e+gk8WVmuGzciJSm0bi2\nFJrqat/THjgwdJL8pKKdZf37w/z5sGtX6CQiubF8ObRpA1/8Yugk+UlFO8vatIGvfhVeeCF0EpHc\n0KyR7FLRzgENkUghUdHOLhXtHFDRlkKxfTu89BLk+QSOoFS0c6BnT6iqgvffD51EJLsqK/3c7MMP\nD50kf6lo50CLFr7n8eyzoZOIZJeGRrJPRTtHNEQihUBFO/u0uCZH1q+Hvn1h40YtaZf8tHEjdOsG\nW7aEWQmZNFpcE3OdO0OrVvDqq6GTiGTHrFlw3nkq2Nmmop0jZjBkCMyYETqJSHaUl/vvcckuFe0c\nGjYMyspCpxDJvF27YM4c+Na3QifJfyraOZRKwYoV8IHOtJc8M38+dOkCbduGTpL/VLRzqKjI34x8\n5pnQSUQyq7wchg4NnaIwqGjnmIZIJB+Vlfnvbck+Fe0cGzIEZs6EPXtCJxHJjNdegx07oGvX0EkK\ng4p2jrVvDx06wMKFoZOIZEZZme+MaP1BbqhoBzBsmB8DFMkH5eUaGsklFe0Ahg5V0Zb88NFHsHix\nPw9SckNFO4AePWDzZnjnndBJRJpm1izo3RsOOyx0ksKhoh1As2Z+EYJ625J0muqXeyragWiIRJKu\nutpvy6CinVsq2oEUF/sN4z/5JHQSkcZZsgSOOw46dQqdpLCoaAdy9NHw9a/DvHmhk4g0joZGwlDR\nDkirIyXJtAoyDB2CENCaNX6Y5O23tTBBkmXzZjj9dNi6FZo3D50mmXQIQgJ95Sv+G14HI0jSzJjh\njxVTwc49Fe2AzDREIsmkoZFwVLQD09Q/SZpdu2DuXBg8OHSSwqSiHVjfvrByJWzbFjqJSDSVlXDG\nGX66n+SeinZgRUXQr5/frlUkCcrKNNUvJBXtGNAQiSSFcyraoalox8DQof4Ist27QycRObh16+DT\nT+Ff/iV0ksKloh0DJ54InTv7sUKROJs2DS64QOsKQopUtM2smZktNbPp2Q5UqEaM8D8QInE2bZr/\nXpVwok6NHwesBo7IYpaCdvIx5ZTdPpHbV+5ib1ErBo0dSx8NHEpMlJdXcvfds1i6tDm//OUeduwY\nxNChfULHKkj1Fm0zaw8MAe4Ebsx6ogJUWV7Oql+NY8GnVZAeIrm1qgpAhVuCKy+vZNy4Z6iquhOA\n2bPhjTduBVDhDiDK8Mj/ADcB1VnOUrBmTZzI
nekivc+dVVXMLikJlEjk/02cOOuzgr1PVdWdlJTM\nDpSosB20p21mw4CtzrmlZpaq67oJEyZ89jiVSpFK1Xmp1KL5rl21fr7Zzp05TiJyoF27ai8TO3c2\ny3GSZKuoqKCioqLJr1Pf8MjZwPlmNgQoAo40s4edc1fsf9H+RVsabk+rVrV+fm9RUY6TiByoVas9\ntX6+qGhvjpMkW80O7R133NGo1zno8Ihz7hbnXAfnXCfgEmBuzYItTTdo7Fhu7dz5c5+7pXNnBo4Z\nEyiRyP8bO3YQrVvf+rnPde58C2PGDAyUqLA1dGNFbZydBftuNv5nSQn/3LKT5VVFTPjNGN2ElFg4\n99w+OAcDBvwne/Y0o6hoL2PGDNZNyEB0CELM7N0LJ50Ezz/vF9yIhPboo/DHP2qrhUzTIQh5olkz\nv+JMC20kLv7yFy2oiRMV7RjS6kiJi507YdYsOP/80ElkHxXtGOrXz58fuXlz6CRS6GbPhm7doG3b\n0ElkHxXtGGrZEoYMgSefDJ1ECp2GRuJHRTumNEQioe3ZA9Onw0UXhU4i+1PRjqniYli0CD74IHQS\nKVSVldCpE3ToEDqJ7E9FO6YOOwz699dJ7RKOhkbiSUU7xkaM8D84IrlWXa29s+NKRTvGhg2DuXNh\n+/bQSaTQLF4MRx0Fp50WOonUpKIdY8ccA7166aR2yT0NjcSXinbMaYhEcs05Fe04U9GOuQsvhBkz\nYMeO0EmkUCxb5qf7de8eOonURkU75k44Ac48U5v1SO6UlsKoUTpxPa5UtBNg1Cj/gySSbdXVfle/\nUaNCJ5G6qGgnwIgR8Oyz8OGHoZNIvnvhBTjySPja10InkbqoaCfAMcdAKqW9SCT79g2NSHypaCeE\nhkgk2/bsgalT4ZJLQieRg1HRTojhw2HhQnj//dBJJF/NmeP3GtGJSfGmop0Qhx3mt2udOjV0EslX\nGhpJBhXtBNEQiWTLzp3w17/CyJGhk0h9VLQTpLgYVq+Gd94JnUTyzYwZfjHNiSeGTiL1UdFOkJYt\n/fS/xx4LnUTyjYZGksOcc017ATPX1NeQ6ObOhR//GJYuDZ1E8sVHH/mDDt58E9q0CZ2mcJgZzrkG\nrztVTzth+vaFLVtg7drQSSRf/PWv0KePCnZSqGgnTLNm/maRbkhKpmhoJFk0PJJAixbB974H69Zp\nUx9pmr/9zc/LfvddOPzw0GkKi4ZHCkiPHrB3L7zySugkknSPPw7f+pYKdpKoaCeQmeZsS2aUlsKl\nl4ZOIQ2h4ZGEWrUKBg+Gt9+GQ/SrVxph40bo2hU2bYJWrUKnKTwaHikwp5/ud/9bsCB0Ekmqxx6D\niy5SwU4aFe0EGzUKpkwJnUKSasoUzRpJIg2PJNg770C3bv7O/6GHhk4jSbJiBQwdCm+95aeRSu5p\neKQAnXwyfOMbMG1a6CSSNJMnw5VXqmAnkYp2wl19NUyaFDqFJMmnn8Ijj/iiLcmjop1wF1wAy5b5\nt7kiUUyf7m9k67CDZKq3aJtZkZktMrNlZrbazH6Ri2ASTVGRv5n04IOhk0hSTJrk36FJMkW6EWlm\nrZ1zn5hZc2AB8BPn3IL0c7oRGdjSpXDhhX6XNs3ZloN5911/0vrGjdC6deg0hS2rNyKdc5+kH7YE\nmgEfNPQLSfZ07+53aJs7N3QSibuHH4aLL1bBTrJIRdvMDjGzZcAWYJ5zbnV2Y0lD6Yak1Mc5DY3k\ng6g97WrnXDegPdDHzFJZTSUNdumlUF4Of/976CQSVwsW+NOPevQInUSaonlDLnbOfWhm5cBZQMW+\nz0+YMOGza1KpFKlUKjPpJLJjj/V7kZSWwvXXh04jcbSvl63tfMOoqKigoqKiya9T741IMzsO2OOc\n+4eZHQo8A9zhnJuTfl43ImPimWfg1lthyZLQSSRuPv7YHym2bh20axc6jUB2b0R+AZibHtNeBEzf\nV7AlXs47
D7ZuheXLQyeRuPnzn6FfPxXsfKC9R/LMbbf5g1rvuSd0EomT3r1h/HgYNix0EtmnsT1t\nFe0888Yb0LOnn4/bsmXoNBIHa9dC//5+g7HmDbqLJdmkDaMEgC99yS+emD49dBKJi8mT4YorVLDz\nhYp2HtKcbdln926/oOaqq0InkUxR0c5DI0bAiy/6IRIpbDNn+o2hTjstdBLJFBXtPNS6NYwc6d8W\nS2G7/371svONbkTmqeXL/UyBN9/UWGaheustOOssf/jzYYeFTiM16UakfE7XrtCpEzz5ZOgkEsrv\nfucPOlDBzi/qaeexqVPh3nvhuedCJ5Fc++QT+OIXYdEiP6NI4kc9bTnAhRdCVZU/xFUKy5Qp0KuX\nCnY+UtHOYy1awPe/DyUloZNILjnn/83HjAmdRLJBwyN5butWP92rqsoflCD5r7ISrr0WVq/WSUZx\npuERqdXxx8Pw4fDAA6GTSK6UlMDo0SrY+Uo97QKweLGft71+PTRrFjqNZNOGDX7m0NtvwxFHhE4j\nB6OettTpG9/wW3KWl4dOItl2331w+eUq2PlMPe0C8cgj8OCDMHt26CSSLTt3+ml+8+fDl78cOo3U\nRz1tOaiLL4ZXX4U1a0InkWz585+he3cV7Hynol0gWraEa67xi20k/+yb5jd6dOgkkm0aHikgmzbB\nGWf4/UiOOip0GsmkhQvhssvgtdd0szkpNDwi9TrxRBg0yI9tS34pKYEbblDBLgTqaReY55/3mwit\nW6d5vPnivfegSxf/Duroo0OnkajU05ZIzj7bTwd7+unQSSRT/vAH+O53VbALhXraBai01G/bOX9+\n6CTSVB9/7DeFev55zRpJGvW0JbKRI/1b6srK0Emkqe67DwYMUMEuJOppF6j774fHH/dnCEoy7dzp\ne9lPP+2XrkuyqKctDXLFFbBqFSxZEjqJNNbkyXDmmSrYhUY97QL2m9/4IZInngidRBpq924/JDJl\nCnzzm6HTSGOopy0Nds01sGCB33dZkqW01J8BqoJdeFS0C1jr1jBuHPzyl6GTSENUV8MvfgG33BI6\niYTQPHQACeuGG6BzZ3jjDZ0nmBTTpsGRR/pZI1J41NMucEcdBdddB7/6VegkEoVzcNddvpdtDR4N\nlXygG5HC++/7cyRffdXvTyLxNXMm3HQTLF+ubQiSTjcipdHatvVTAH/969BJpD533QXjx6tgFzL1\ntAWAjRv9fN/XXoNjjw2dRmozfz5cdRWsXQvNdTcq8dTTliZp3x6+/W0/d1vi6c474eabVbALnXra\n8pn166FXL6iq0iEJcfPyy3Dhhf7fqFWr0GkkE9TTliY75RQYNkwzSeJo/Hj/oYIt9fa0zawD8DBw\nPOCA/3XOTdzvefW088iGDdCtG6xYASedFDqNAMya5c9+XLUKWrQInUYypbE97ShF+wTgBOfcMjM7\nHHgZuNA5tyb9vIp2nrn5Zvjb3/xOgBLW3r1+U6jbboMRI0KnkUzK2vCIc+4959yy9ON/AmsAzebN\nYzffDE895edtS1iPPOK3G7jootBJJC4adCPSzDoCzwGnpwu4etp56p574NlnoawsdJLCtWOHX/RU\nWgq9e4dOI5mW9RuR6aGRx4Fx+wq25K8f/MDv/jdvXugkhaukBM46SwVbPi/SjE8zawE8AfzJOfdk\nzecnTJjw2eNUKkUqlcpQPAmlVSu/+u4//gMWLdIKvFzbts3P4lmwIHQSyZSKigoqKiqa/DpRbkQa\n8BCwzTn3o1qe1/BInqquhp494cc/hksuCZ2msNx4oz9O7He/C51EsiWbs0fOASqBFfgpfwDjnXMz\n08+raOexefPg3/4N1qzRHOFcefNNPyyyejW0axc6jWRL1op2hC+sop3nhg2D886DH/4wdJLCcOml\n8JWv+Gl+kr9UtCVrXn0V+vf3m0kdfXToNPltyRI4/3zf1ocfHjqNZJOWsU
vWnHGGLyQ6liy7nPN7\nZU+YoIItdVNPWyJ5912/deuLL8Kpp4ZOk5+mTYNbb/VbCGgnv/yn4RHJul//2i+2mTNHR11l2j/+\n4d/RlJbCueeGTiO5oOERybqxY+Gjj2Dy5NBJ8s/NN8PQoSrYUj/1tKVBli2DQYNg5UpNR8uU+fNh\n1Ch/w1c3eguHetqSE926wdVXw7hxoZPkh1274NprYeJEFWyJRkVbGuz22/3UNG0m1XR33eXnZGvb\nVYlKwyPSKHPnwpVX+o35jzgidJpkWrUKUik/5KQDJwqPZo9Izl19tZ9PPHFi/dfK51VXwznnwBVX\nwPe/HzqNhKCiLTn3wQdw+ul+fnGvXqHTJMtvfwuPPgrPPacdFAuVirYE8dhj8POfwyuvQMuWodMk\nw4YN8PWvQ2UldOkSOo2EotkjEsTIkdCxI9x9d+gkyeAc3HADjBmjgi2No562NNmGDX4r0SefhG9+\nM3SaePv97+G+++Cll/TOpNBpeESCKiuD66+Hl1+Gtm1Dp4mnxYv9qscXXoBTTgmdRkLT8IgENWwY\nXH45XHYZ7N0bOk38bNvmh5Luu08FW5pGPW3JmD17YOBA6NvXby8qXnU1DB/ux7D/679Cp5G40PCI\nxMJ778GZZ8KkSVBcHDpNPNx1F8yY4Y9ua9EidBqJCxVtiY3KSj8UsHgxdOgQOk1Yc+f6IaMlS7Tq\nUT5PY9oSG336+NPER46ETz8NnSacTZv8OP+f/qSCLZmjnrZkRXU1XHQRdOoE99wTOk3u7d7tz9Us\nLoaf/Sx0Gokj9bQlVg45BB58EJ56yq+aLDTjx/uNtG65JXQSyTc6iU6y5phj4IknfG+zTRs/s6QQ\n/Pd/+19WL76ofUUk8/QtJVnVvbsv3Jde6k9oyXe//z3ce68/R/PYY0OnkXykoi1Zd+65MGUKfPvb\nfvl2vnroIT+9b84czZqR7FHRlpwYOBAeeMAvMlmxInSazJs61Y9jz54NX/pS6DSSz1S0JWeGD4eS\nEhg8GNauDZ0mc8rKYPRoePppf3SYSDbpRqTk1MiRsGOH73k/91zye6XPPutP8Ckrg65dQ6eRQqCi\nLTn3r/8K27fDgAG+cJ98cuhEjbNggb/B+sQT0KNH6DRSKDQ8IkFcfz388Id+/+3KytBpGm7SJL94\n6JFH/I1WkVzRikgJauZM3/MePx7GjQNr8Pqw3Nq1C8aO9b9o/vIXnT4jjacVkZJIgwfDwoXw8MN+\nqGH79tCJ6vbOO75XvW2bn7qogi0hqGhLcJ06wfPPQ1GRP9X99ddDJzrQnDnQs6e/kTp1ql+iLhKC\nirbEwqGH+nHi0aOhd2+/DDwOnPOHFl9+uV8g9JOfxH8IR/KbxrQldhYtgosvhn79/Ak4nTqFyzF+\nvB+yefxxrXKUzNKYtuSNnj1h5Uro2NGf8j56NGzenLuvv3IlXHABfOc7cMklfmqfCrbERb1F28wm\nmdkWM1uZi0AiAEcdBXfc4VdOtmoFZ5wBP/2pvwmYLevX+1NmBg6EVMqPrV97rY4Ik3iJ0tOeDAzO\ndhCR2rRt67c6Xb4cPvwQTjvNF/PXX/fjzXUpn11O8VXFpK5MUXxVMeWzy2u9bu9eePlluO46fxO0\nSxf/2j/6kb8xKhI39a6IdM7NN7OO2Y8iUrf27eEPf4CbbvI3Bvv183tVDxjgP/r3hxNP9NeWzy5n\n3G/HUdW96rO/X/Vb/3jIeUNZt86f3ThnDlRUwPHH+x0IX3vN7/stEmeRbkSmi/Z059zXanlONyIl\n55zzRXbOHP8xbx60awfnnAOzVhXzTvGsA/7OF6YXc8h7Mz9X7Pv10/mNEkZjb0RmZO+RCRMmfPY4\nlUqRSqUy8bIidTLzQyWnneaXxO/dC8uW+YU6897cVevfOeLYnZSVwimnaNqe5F5FRQUVFRVNfh31\ntCXvFF9VzKyOB/a0i98uZuakmQESiR
xIU/5E0sZeOpbOSzt/7nOdX+nMmFFjAiUSyZx6e9pmVgr0\nBY4FtgK3Oecm7/e8etoSO+WzyykpLWFn9U6KDilizKgxDB04NHQskc80tqetFZEiIgFoeEREpACo\naIuIJIiKtohIgqhoi4gkiIq2iEiCqGiLiCSIiraISIKoaIuIJIiKtohIgqhoi4gkiIq2iEiCqGiL\niCSIiraISIKoaIuIJIiKtohIgqhoi4gkiIq2iEiCqGiLiCSIiraISIKoaIuIJIiKtohIgqhoi4gk\niIq2iEiCqGiLiCSIiraISIKoaIuIJIiKtohIgqhoi4gkiIq2iEiCqGiLiCSIiraISIKoaIuIJIiK\ntohIgqhoi4gkiIq2iEiC1Fu0zWywma01s9fN7Ke5CCUiIrU7aNE2s2bAvcBg4KvAKDPrkotgmVZR\nURE6QiTKmVnKmVlJyJmEjE1RX0+7B7DeOfeWc2438ChwQfZjZV5S/iGVM7OUM7OSkDMJGZuivqJ9\nErBhvz9vTH9OREQCqK9ou5ykEBGRSMy5uuuymfUCJjjnBqf/PB6ods7dvd81KuwiIo3gnLOG/p36\ninZzYB0wANgEvASMcs6taWxIERFpvOYHe9I5t8fMRgPPAM2AB1SwRUTCOWhPW0RE4iXyisgoi2zM\nbGL6+eVm1j1zMaOrL6eZpczsQzNbmv74WYCMk8xsi5mtPMg1cWjLg+aMQ1umc3Qws3lmtsrMXjWz\nsXVcF7RNo+QM3aZmVmRmi8xsmZmtNrNf1HFd6LasN2fotqyRpVk6w/Q6no/ens65ej/wQyPrgY5A\nC2AZ0KXGNUOAGenHPYGFUV47kx8Rc6aAp3KdrUaGc4HuwMo6ng/elhFzBm/LdI4TgG7px4fj78PE\n8fszSs7gbQq0Tv+3ObAQOCdubRkxZ/C23C/LjcAjteVpaHtG7WlHWWRzPvAQgHNuEXC0mbWL+PqZ\nEnUxUIPv2GaSc24+8PeDXBKHtoySEwK3JYBz7j3n3LL0438Ca4ATa1wWvE0j5oTw35+fpB+2xHeE\nPqhxSfC2TH/t+nJCDL4/zaw9vjDfT+15GtSeUYt2lEU2tV3TPuLrZ0qUnA44O/02ZIaZfTVn6aKL\nQ1tGEbu2NLOO+HcHi2o8Fas2PUjO4G1qZoeY2TJgCzDPObe6xiWxaMsIOYO3Zdr/ADcB1XU836D2\njFq0o96trPlbJNd3OaN8vVeADs65rkAJ8GR2IzVa6LaMIlZtaWaHA48D49I92QMuqfHnIG1aT87g\nbeqcq3bOdcMXjj5mlqrlsuBtGSFn8LY0s2HAVufcUg7e64/cnlGL9rtAh/3+3AH/2+Bg17RPfy6X\n6s3pnPt439sq59zTQAsza5O7iJHEoS3rFae2NLMWwBPAn5xztf1wxqJN68sZpzZ1zn0IlANn1Xgq\nFm25T105Y9KWZwPnm9mbQCnQ38wernFNg9ozatFeApxqZh3NrCXwXeCpGtc8BVwBn62k/IdzbkvE\n18+UenOaWTszs/TjHvhpj7WNhYUUh7asV1zaMp3hAWC1c+6eOi4L3qZRcoZuUzM7zsyOTj8+FBgI\nLK1xWRzast6codsSwDl3i3Oug3OuE3AJMNc5d0WNyxrUngddXLPfF651kY2ZXZd+/j7n3AwzG2Jm\n64HtwFUN/R9sqig5ge8APzCzPcAn+IbMKTMrBfoCx5nZBuB2/GyX2LRllJzEoC3TegOXAyvMbN8P\n7i3AyRCrNq03J+Hb9AvAQ2Z2CL5T90fn3Jy4/axHyUn4tqyNA2hKe2pxjYhIgui4MRGRBFHRFhFJ\nEBVtEZEEUdEWEUkQFW0RkQRR0RYRSRAVbRGRBFHRFhFJkP8D3vOeLBEdiMkAAAAASUVORK5CYII=\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"xs = linspace(0, 4)\n",
"y = f(xs)\n",
"plt.ylim(0,5)\n",
"plt.plot(xs, y)\n",
"plt.plot(x1, f(x1), 'ro')\n",
"plt.plot(x2, f(x2), 'go')\n",
"plt.plot(x3, f(x3), 'bo')\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 263,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def phi(x): return sp.array([1, x, x**2])"
]
},
{
"cell_type": "code",
"execution_count": 264,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 1, 1])"
]
},
"execution_count": 264,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"phi(x1)"
]
},
{
"cell_type": "code",
"execution_count": 265,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 2, 4])"
]
},
"execution_count": 265,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"phi(x2)"
]
},
{
"cell_type": "code",
"execution_count": 266,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 3, 9])"
]
},
"execution_count": 266,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"phi(x3)"
]
},
{
"cell_type": "code",
"execution_count": 267,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"xs1 = linspace(0, 5)\n",
"xs2 = linspace(0, 10)\n",
"X1, X2 = np.meshgrid(xs1, xs2)"
]
},
{
"cell_type": "code",
"execution_count": 268,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"phi_X = 3*X2 - 12*X1 + 13"
]
},
{
"cell_type": "code",
"execution_count": 286,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAWwAAAD7CAYAAABOi672AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFJBJREFUeJzt3X+Q1PV9x/HXG40VDWhakiCnU5xODSaSgWgQKzRro8Gm\nmqoBCYfEybS520wSjpN4KJkxN2KbmplLYpPOJBg1Os2dp4QzcElZTnAb03aSHGCkcgFbjsbaKsYc\n55gcxw8//YNdesDdsfv9sd9fz8fMjXvL97v33uF4z9fXvr+fjznnBACIvwlRFwAAqAwNGwASgoYN\nAAlBwwaAhKBhA0BC0LABICHODOuFzYx5QQDwwDlnoz0f6hW2c87T15e+9CXP5yb1i/ecjS/ecza+\n/Lzn8RCJAEBC0LABICFi2bBzuVzUJdQc7zkbeM/ZENZ7tvEyEzN7WNJfSNrvnJtZeu73JXVK+kNJ\n+yTd6pw7MMq57nR5DADgRGYm5/FDx0ckXX/Sc3dJ6nHOXSJpS+l7AEDIxm3YzrlnJQ2c9PTHJD1a\nevyopJtCqAsAcBIvGfa7nXOvlh6/KundAdYDABiDrw8dSyE1QTUQsv/68Y/12/37PZ374ouv6/nn\nXz39gYg9L3c6vmpmU51zr5jZBZLG/C1qbW09/jiXy2Xy02LAr4H+fj2xcKHqu7t17rveVdW5Q0OH\ntWjRk8rnr9D738//DMdRsVhUsVis6Nhxp0QkycymS9o4YkrkK5Jed87db2Z3STrfOXfKB49MiQD+\nHT10SA/Pm6eZ9fWau2JF1efn8906cOCgOjo+LrNRBw8QM+NNiYx7hW1mHZI+JGmKmb0k6R5Jfyfp\nCTP7K5XG+oItF0BZT0uLJk2bpiubmqo+t6Njp7Zs6de2bQ0065QYt2E755aM8UfXhlALgBH6urq0\n+wc/UMP27VU33D17Xtfy5Zu0efNtmjz590KqELUW2mp9ALwb6O9Xd2Oj6ru7NfEd76jq3KGhw7r1\n1ie1Zs01mj37gpAqRBROm2F7fmEybMCTI8PDemTePM1cutRTbt3YuFGDg8Pk1gnlOcMGUHs9LS2a\nVFfnKbdub9+prVv3kVunFA0biJG+ri7t2bDBc27d1ERunWY0bCAm/ObWixaRW6cdGTYQA37nrcmt\n04MMG4g5v/PW5NbZQMMGIuZn3nr37l8zb50hNGwgQv7nrdeRW2cIGTYQkSPDw3pk/nxfufUbbxxS\ne/stRCEpQoYNxJDf3PqZZ/apt5fcOkto2EAE+tavD2Deehm5dcbQsIEaG9i7V935vO91QmbNmhpS\nhYgrGjZQA4VCQW1ta2VvHdV1v9ql+atXq27OnKpfZ8WKTZoxY4oaGi4PoUrEHQ0bCFmhUNDNN9+u\noaH7db2+px0TfqXLZsyo+nXIreFrT0cAp9fWtlZDQ/drhibrPXpR69/6qr761Qereo3y+tadnQvJ\nrTOMhg3UwPl6TTeqUevUqYM6t6pzWd8aZTRsIGTNyz+lW+1uPatr9bL6NHHiKq1c2VD5+c0FzZgx\nRY2N5NZZR4YNhMyeflrvm/tB9Z97UNfZBq1c+agWLFhQ0bnsy4iRaNhAiEauE7K8yhE+9mXEyWjY\nQEjYlxFBYy0RIAR+17fO57s1ODjMOiEZxFoiQI35X9+6n3lrnIKGDQTMz/rW5dy6p4d1QnAqGjYQ\noKBya9YJwWjIsIGAlHPry5Ys0VXNzVWfn89368CBg+zLmHFk2EANlHNrLx8yMm+NStCwgQAEkVsz\nb43ToWEDPjFvjVohwwZ8ILdG0MiwgZCQW6OWaNiAR+TWqDXPy6ua2d1m9oKZ7TSzdjPjtw6ZUc6t\nF3Z2klujZjw1bDObLunTkj7gnJsp6QxJnwiuLCC+jh46pHWLF2ve3Xd72pexubmg97yH9a1RPa+R\nyBuSDks6x8yOSjpH0suBVQXEGLk1ouKpYTvn
fmNmbZJ+JWlIUsE593SglQExRG6NKHlq2Gb2R5JW\nSJouaVDSk2a21Dn3vZHHtba2Hn+cy+WUy+W81glErpxbL9m4kdwagSkWiyoWixUd62kO28wWS7rO\nOffXpe+XSZrrnPvsiGOYw0ZqBDFvPTBwUI8/zrw1xjfeHLbXKZFfSpprZhPt2G/ftZJ2eS0QiLsg\ncusHH7yRZg1fvGbYvzCzxyT1SnpL0nZJa4MsDIgLcmvEBbemA+MY6O/Xd668UvXd3VWP8A0NHdZV\nVz2kfP4K5fNXhFQh0ma8SISGDYyBdUIQBdYSATzYfOedzFsjVmjYwCj61q/Xng0byK0RKzRs4CQD\ne/eqO59n3hqxQ4YNjMC8NaJGhg1UiHVCEGc0bKCEeWvEHQ0bUDD7Mt57b47cGqEiw0bmMW+NOCHD\nBsbhd95669Z+9faSWyN8NGxkWhDz1j09y8itURM0bGTWQH+/uvN5X7n1ffddo1mzpoZUIXAiMmxk\nUjm3nllf7ykKyee7NTg4rPb2W4hCECgybOAk5XnrK5uaqj6X3BpRoWEjc4KYtya3RhRo2MiUIOat\n16wht0Y0yLCRGeTWSAIybEDHcuvJdXXk1kgsGjYygdwaaUDDRuoFkVszb404IMNGqpFbI2nIsJFZ\nPatWkVsjNWjYSK1fPvXUsdx62zZya6QCDRupVM6t/e7LSG6NOCHDRuocz62XLtVcD1EIuTWiRIaN\nTDk+b718edXnklsjzmjYSBXmrZFmNGykBvPWSDsybKQC89ZICzJspB7rhCALaNhIPHJrZAUNG4lG\nbo0smeD1RDM738zWmVmfme0ys7lBFgacztFDh7Ru8WLN/+IXVTdnTtXnNzcXNGPGFDU0XB5CdUDw\n/FxhPyDpR865hWZ2pqRzA6oJqAjz1sgaTw3bzM6TNN85d7skOeeOSBoMsjBgPOTWyCKvkcjFkl4z\ns0fMbLuZPWhm5wRZGDCWcm69sLOTdUKQKV4jkTMlfUDS55xzPzezr0u6S9I9Iw9qbW09/jiXyymX\ny3n8ccAxx3Pr1at95daNjeTWiIdisahisVjRsZ5unDGzqZL+zTl3cen7eZLucs7dMOIYbpxB4Dat\nWKED+/ZpcVdX1VFIR8dO3XNPUdu2NRCFILYCv3HGOfeKmb1kZpc45/ZIulbSC36KBE4niNx68+bb\naNZILD9TIp+X9D0zO0vSf0r6VDAlAacKYt56zZprNHv2BSFVCISPtUQQe0GsE3LgwEF1dHycET7E\nHmuJINF6Wlo0ado0z+uEbNnSr23bmLdG8tGwEWvk1sD/o2EjtsitgRORYSOWyK2RVWTYSBxya+BU\nNGzEDrk1MDoaNmKF3BoYGxk2YoPcGiDDRkKQWwPjo2EjFsitgdOjYSNy5NZAZciwESlya+BEZNiI\nLXJroHI0bESG3BqoDg0bkSC3BqpHho2aI7cGxkaGjVghtwa8oWGjpsitAe9o2KgZcmvAHzJs1AS5\nNVAZMmxEjtwa8I+GjdCRWwPBoGEjVOTWQHDIsBGaIHLrwcFhtbffQhSCzCDDRiT85tZbt/art5fc\nGiijYSMUfnPrpqZN2rx5Gbk1MAING4ELKreeNWtqSBUCyUSGjUCRWwP+kGGjZsitgfDQsBGYIOat\ne3rIrYGx0LARiCBy6/vuI7cGxjPBz8lmdoaZ7TCzjUEVhOQ5euiQ1i1erPmrV6tuzpyqz29uLujS\nS9+phobLQ6gOSA+/V9hNknZJmhRALUionpYWTa6rqzi3LhQKavtGmyTpA5fdrq1bf0NuDVTAc8M2\nswslfVTS30i6I7CKkCjV5taFQkE3f+JmDeWGpDf/QD1fma1/+OaV5NZABfxEIl+TdKektwKqBQlT\nzq0XdnZWnFu3faPtWLN+35nSzkXS+7fqqR99K+RKgXTwdIVtZjdI2u+c22FmubGOa21tPf44l8sp\nlxvzUCSM
39xam66X3vmaNL1XOnJd8AUCCVEsFlUsFis61tONM2b2t5KWSToi6WxJkyV93zn3yRHH\ncONMim1asUIH9u3T4q6uqrLnQqGgG29ZrcN2tZRbq4n/MkFdj3dpwYIFIVYLJMd4N874vtPRzD4k\n6QvOuRtPep6GnVJ9XV3afMcdati+veoRvj17XtecOd/SjJm9mnzeb7Xy8ytp1sAItbjTkc6cEeXc\nesnGjZ7mrRctelJf/vJH9JnPfDGkCoH0Yi0RVMzvOiGNjRs1ODjMvozAOFhLBIHwv07IPvZlBHyg\nYaMi7MsIRI+GjdMKIre+994c+zICPpFhY1zk1kBtkWHDMz+5dXs7uTUQJBo2xhTMvozk1kBQaNgY\n1cDeveTWQMyQYeMUR4aH9ci8eZq5dCm5NVBjZNioSk9LiyZVsb71SMxbA+GhYeMEfevXa8+GDZ5y\n6927f828NRAiGjaOG9i7V935vOfc+tZb15FbAyEiw4YkcmsgLsiwcVpPr1rlObdm3hqoDRo21Ld+\nved56927f828NVAjNOyMK+fW9d3d5NZAzJFhZxi5NRA/ZNgYFfPWQLLQsDOKeWsgeWjYGURuDSQT\nGXbGkFsD8UaGjeOeXrVKky+8kNwaSCAadoawLyOQbDTsjCjvy+g9t35Sa9ZcQ24NRIgMOwPK+zJe\ntmSJrmpurvp8cmugdsiwM668L6OXDxnJrYH4oGGnnJ/cmnlrIF5o2CnmP7dm3hqIEzLslCK3BpKJ\nDDuDelpaNLmujtwaSBEadgoxbw2kEw07ZZi3BtKLDDtF/ObW+Xy3Dhw4SG4NRCjwDNvMLpL0mKR3\nSXKS1jrn/t57iQiC33nrLVv6ya2BGPMaiRyW1Oyce87M3i5pm5n1OOf6AqwNVSC3BtJvgpeTnHOv\nOOeeKz1+U1KfpGlBFobKlXPrhZ2d5NZAivnOsM1suqR/lvS+UvMuP0+GXQPk1kC6hDaHXYpD1klq\nGtmsy1pbW48/zuVyyuVyfn4cRuF/3rpfvb3k1kBUisWiisViRcd6vsI2s7dJ6pb0T865r4/y51xh\nh6yvq0ub77hDDdu3Vx2F7Nnzuq6++mH19CzTrFlTQ6oQQLXCmBIxSQ9J2jVas0b4gpi3vu++a2jW\nQIJ4usI2s3mSfizpeR0b65Oku51zm0YcwxV2SMq59cz6ek9RCLk1EF+BX2E7534ijxMm8K88b+11\nX0bmrYFk4tb0hGF9ayC7aNgJEsT61sxbA8nFWiIJwfrWQDawHnYKsC8jABp2ArBOCACJhh175dx6\nycaNrBMCZBwZdoyxTgiQPWTYCcX61gBGomHHFLk1gJPRsGOIfRkBjIYMO2ZYJwTINjLsBPG7Tgjr\nWwPpRcOOWKFQ0Nq2NknSwrlztd9nbt3Ts4zcGkgpGnaECoWCbr/5Zt0/NKTDkn7R06MPPvCAp9x6\n0SLWtwbSjiVSI7S2rU33Dw3pNh1bVHyapPbu7qpfZ8WKTbr00ilqaLg86BIBxAhX2DHQI2mSpLd5\nOLejY6eeeWYfuTWQAVxhR6hh5Up966yztE3S7yTdNXGiGlaurPj8cm79xBOLyK2BDOAKO0JzLrlE\nN0ycqL5Zs7T9vPP06MqVWrBgQUXnklsD2cMcdkT8zluzvjWQTsxhx5Cfeev2dta3BrKIhh0Bv/sy\nNjWxTgiQRTTsGgtiX8Z7782xTgiQQWTYNXRkeFiPzJuny+rr2ZcRwKjIsGOip6VFk+rq2JcRgCc0\n7BrpW79eezZs8Jxbs741ABp2DQzs3avufN5Xbs361gDIsEPG+tYAqkGGHSHWtwYQFBp2iILYl5H1\nrQGU0bBDEsS+jKwTAmAkMuwQlHPry5Ys8TRvnc93a3BwWO3ttxCFABkzXobteXlVM7vezH5pZi+a\n2Srv5aVPObf2Om+9ZUu/vv3tG2jWAE7gKRIxszMkfVPStZJelvRzM9vgnO
sLsrgkCiK3Zt4awGi8\nXmHPkfQfzrl9zrnDkh6X9JfBlZVM5dx6YWen59yaeWsAY/HasOskvTTi+/8uPZdZRw8d0rrFizV/\n9WrVzZlT9fnNzQXNmDFFjY3sywhgdF6nRLL5aeI49r/wgt753vd6mrceHDyol156g5tjAIzLa8N+\nWdJFI76/SMeusk/Q2tp6/HEul1Mul/P44+LvgtmzddN3v+vp3PPOO1s//GF9sAUBSIRisahisVjR\nsZ7G+szsTEm7JX1Y0v9I+pmkJSM/dMzyWB8AeBX4renOuSNm9jlJBUlnSHqICREACBc3zgBAjIRy\n4wwAoLZo2ACQELFs2JV+YpomvOds4D1nQ1jvmYYdE7znbOA9Z0OmGjYA4FQ0bABIiFDH+kJ5YQBI\nubHG+kJr2ACAYBGJAEBC0LABICFi17CztvWYmT1sZq+a2c6oa6kVM7vIzJ4xsxfM7N/NbHnUNYXN\nzM42s5+a2XNmtsvMvhx1TbVgZmeY2Q4z2xh1LbVgZvvM7PnSe/5Z4K8fpwy7tPXYbo3YekwnrQKY\nNmY2X9Kbkh5zzs2Mup5aMLOpkqY6554zs7dL2ibppjT/PUuSmZ3jnPtdabXLn0j6gnPuJ1HXFSYz\nu0PS5ZImOec+FnU9YTOzfkmXO+d+E8brx+0KO3NbjznnnpU0EHUdteSce8U591zp8ZuS+iRNi7aq\n8Dnnfld6eJaOrXIZyj/quDCzCyV9VNJ3JGVpZ47Q3mvcGjZbj2WMmU2XNFvST6OtJHxmNsHMnpP0\nqqRnnHO7oq4pZF+TdKekt6IupIacpKfNrNfMPh30i8etYccnn0HoSnHIOklNpSvtVHPOveWcmyXp\nQkl/ama5iEsKjZndIGm/c26HsnV1fbVzbrakP5f02VLkGZi4NeyKth5D8pnZ2yR9X9I/Oueeirqe\nWnLODUr6oaQroq4lRH8i6WOlTLdD0p+Z2WMR1xQ659z/lv77mqQuHYt5AxO3ht0r6Y/NbLqZnSVp\nsaQNEdeEgNmxnYYfkrTLOff1qOupBTObYmbnlx5PlHSdpB3RVhUe59xq59xFzrmLJX1C0lbn3Cej\nritMZnaOmU0qPT5X0kckBTr9FauG7Zw7Iqm89dguSZ0ZmBzokPSvki4xs5fM7FNR11QDV0u6TdI1\npfGnHWZ2fdRFhewCSVtLGfZPJW10zm2JuKZaykLc+W5Jz474O+52zm0O8gfEaqwPADC2WF1hAwDG\nRsMGgISgYQNAQtCwASAhaNgAkBA0bABICBo2ACQEDRsAEuL/AFYNWgmCywNbAAAAAElFTkSuQmCC\n",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt.contour(X, Y, phi_X, levels=[1,4])\n",
"# Label each mapped point so that plt.legend() has entries to display\n",
"plt.scatter(phi(x1)[1], phi(x1)[2], c='r', label='x1')\n",
"plt.scatter(phi(x2)[1], phi(x2)[2], c='g', label='x2')\n",
"plt.scatter(phi(x3)[1], phi(x3)[2], c='b', label='x3')\n",
"plt.legend()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Optimization procedure\n",
"\n",
"* Nothing changes with respect to what we have already derived, except that instead of $\\mathbf{X}$ we now have the design matrix $\\boldsymbol{\\Phi}$\n",
"\n",
"\n",
"* Design matrix:\n",
"$$\n",
"\\boldsymbol{\\Phi} = \n",
"\\begin{pmatrix}\n",
"1 & \\phi_1(\\mathbf{x}^{(1)}) & \\dots & \\phi_m(\\mathbf{x}^{(1)})\\\\\n",
"1 & \\phi_1(\\mathbf{x}^{(2)}) & \\dots & \\phi_m(\\mathbf{x}^{(2)})\\\\\n",
"\\vdots\\\\\n",
"1 & \\phi_1(\\mathbf{x}^{(N)}) & \\dots & \\phi_m(\\mathbf{x}^{(N)})\\\\\n",
"\\end{pmatrix}_{N\\times (m+1)}\n",
"=\n",
"\\begin{pmatrix}\n",
"\\boldsymbol{\\phi}(\\mathbf{x}^{(1)})^\\intercal \\\\\n",
"\\boldsymbol{\\phi}(\\mathbf{x}^{(2)})^\\intercal \\\\\n",
"\\vdots\\\\\n",
"\\boldsymbol{\\phi}(\\mathbf{x}^{(N)})^\\intercal \\\\\n",
"\\end{pmatrix}_{N\\times (m+1)}\n",
"$$\n",
"\n",
"* Previously we had:\n",
"$$\n",
"\\mathbf{w} = (\\mathbf{X}^\\intercal\\mathbf{X})^{-1}\\mathbf{X}^\\intercal\\mathbf{y} = \\color{red}{\\mathbf{X}^{+}}\\mathbf{y}\n",
"$$\n",
"and now we have:\n",
"$$\n",
"\\mathbf{w} = (\\boldsymbol{\\Phi}^\\intercal\\boldsymbol{\\Phi})^{-1}\\boldsymbol{\\Phi}^\\intercal\\mathbf{y} = \\color{red}{\\boldsymbol{\\Phi}^{+}}\\mathbf{y}\n",
"$$\n",
"where\n",
"$$\n",
"\\boldsymbol{\\Phi}^{+}=(\\boldsymbol{\\Phi}^\\intercal\\boldsymbol{\\Phi})^{-1}\\boldsymbol{\\Phi}^\\intercal\n",
"$$"
]
},
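{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal NumPy sketch of the computation above, under assumptions that are not part of the notebook: the toy data (`x_toy`, `y_toy`), the polynomial basis $\\phi_j(x)=x^j$, and the helper name `design_matrix` are all illustrative choices. It builds $\\boldsymbol{\\Phi}$ with a leading column of ones and checks that the pseudoinverse gives the same weights as solving the normal equations. `np.linalg.pinv` works through the SVD, so it remains usable when $\\boldsymbol{\\Phi}^\\intercal\\boldsymbol{\\Phi}$ is close to singular."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical toy data (not from the notebook): noisy samples of a sine curve\n",
"rng = np.random.RandomState(0)\n",
"x_toy = np.linspace(0, 1, 20)\n",
"y_toy = np.sin(2 * np.pi * x_toy) + rng.normal(0, 0.1, x_toy.shape[0])\n",
"\n",
"def design_matrix(x, m):\n",
"    # Columns are 1, x, x^2, ..., x^m (polynomial basis, an assumed choice)\n",
"    return np.vander(x, m + 1, increasing=True)\n",
"\n",
"Phi_toy = design_matrix(x_toy, 3)   # shape (N, m+1)\n",
"\n",
"# Weights via the pseudoinverse of the design matrix\n",
"w_pinv = np.dot(np.linalg.pinv(Phi_toy), y_toy)\n",
"\n",
"# The same weights from the normal equations, solved without forming an explicit inverse\n",
"w_normal = np.linalg.solve(np.dot(Phi_toy.T, Phi_toy), np.dot(Phi_toy.T, y_toy))\n",
"\n",
"print(np.allclose(w_pinv, w_normal))"
]
},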
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Model selection"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* The generalized linear regression model has one **hyperparameter**: the mapping function $\\boldsymbol{\\phi}$\n",
"\n",
"\n",
"* Alternatively, we can say that there are two hyperparameters:\n",
" * the form of the basis functions $\\phi_j$\n",
" * the number of basis functions $m$ (the dimension of the feature space)\n",
"\n",
"\n",
"* The hyperparameters must be set to suit the data, that is, we must **select the model** well\n",
"\n",
"\n",
"* Otherwise the model may **underfit** or **overfit**\n",
"\n",
"\n",
"* A model with many parameters is easy to overfit\n",
"\n",
"\n",
"* Preventing overfitting:\n",
" 1. Use more training examples\n",
" 2. Select the model by cross-validation\n",
" 3. **Regularization**\n",
" 4. Bayesian regression (Bayesian model selection) $\\Rightarrow$ not covered here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Regularized regression\n",
"\n",
"\n",
"### Idea\n",
"\n",
"* Observation: in linear models, the more complex the model, the larger the values of its parameters $\\mathbf{w}$\n",
"\n",
"\n",
"* Overfitted linear models have:\n",
" * too many parameters (weights) overall and/or\n",
" * individual parameters with excessively large values\n",
"\n",
"\n",
"* Idea: **limit the growth of parameter values** by penalizing hypotheses with large parameter values\n",
"\n",
"\n",
"* This achieves a **trade-off** between model accuracy and simplicity, and does so **during training itself**\n",
"\n",
"\n",
"* Effectively, this **limits the complexity** of the model and prevents overfitting\n",
"\n",
"\n",
"* Goal: pull as many parameters (weights) as possible to zero $\\Rightarrow$ **sparse models**\n",
"\n",
"\n",
"* Sparse models are:\n",
" * harder to overfit\n",
" * computationally simpler\n",
" * more interpretable"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Regularization\n",
"\n",
"* We build a measure of model complexity into the error function (which we minimize):\n",
"\n",
"$$\n",
" E' = \\textrm{empirical error} + \\color{red}{\\lambda\\times\\textrm{model complexity}}\n",
"$$\n",
"\n",
"$$\n",
" E'(\\mathbf{w}|\\mathcal{D}) = E(\\mathbf{w}|\\mathcal{D}) + \\underbrace{\\color{red}{\\lambda E_R(\\mathbf{w})}}_{\\text{reg. term}}\n",
"$$\n",
"\n",
"* $\\lambda$ is the **regularization factor**\n",
" * $\\lambda=0\\ \\Rightarrow$ unregularized error function\n",
" * A larger regularization factor $\\lambda$ reduces the effective complexity of the model\n",
"\n",
"\n",
"* [Sketch: Regularized regression]\n",
"\n",
"\n",
"* General regularization term: the **p-norm of the weight vector**\n",
"$$\n",
" E_R(\\mathbf{w}) = \\|\\mathbf{w}\\|_p = \\Big(\\sum_{j=\\color{red}{1}}^m |w_j|^p\\Big)^{\\frac{1}{p}}\n",
"$$\n",
"\n",
"\n",
"* L2 norm ($p=2$):\n",
"$$\\|\\mathbf{w}\\|_2 = \\sqrt{\\sum_{j=\\color{red}{1}}^m w_j^2} = \\sqrt{\\mathbf{w}^\\intercal\\mathbf{w}}$$\n",
"\n",
"\n",
"* L1 norm ($p=1$):\n",
"$$\\|\\mathbf{w}\\|_1 = \\sum_{j=\\color{red}{1}}^m |w_j|$$\n",
"\n",
"\n",
"* L0 norm ($p=0$; strictly speaking not a norm, it counts the non-zero weights):\n",
"$$\\|\\mathbf{w}\\|_0 = \\sum_{j=\\color{red}{1}}^m \\mathbf{1}\\{w_j\\neq 0\\}$$\n",
"\n",
"\n",
"* **NB:** The weight $w_0$ is not regularized\n",
" * **Q:** Why?"
]
},
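{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small worked illustration of the three norms (the weight vector is made up for this example), take $m=3$ and $\\mathbf{w}=(w_0,w_1,w_2,w_3)=(5,3,0,-4)$. Leaving $w_0$ out of the sums, as the index $j=1$ in the formulas indicates:\n",
"\n",
"$$\n",
"\\begin{align*}\n",
"\\|\\mathbf{w}\\|_2 &= \\sqrt{3^2+0^2+(-4)^2} = 5\\\\\n",
"\\|\\mathbf{w}\\|_1 &= |3|+|0|+|-4| = 7\\\\\n",
"\\|\\mathbf{w}\\|_0 &= \\mathbf{1}\\{3\\neq0\\}+\\mathbf{1}\\{0\\neq0\\}+\\mathbf{1}\\{-4\\neq0\\} = 2\n",
"\\end{align*}\n",
"$$\n",
"\n",
"The large value $w_0=5$ contributes to none of the three, and the zero weight $w_2$ is the only one that the L0 count ignores."
]
},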
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Regularized linear regression model\n",
"\n",
"* **L2 regularization** or Tikhonov regularization $\\Rightarrow$ **ridge regression**:\n",
"$$\n",
"E'(\\mathbf{w}|\\mathcal{D})=\\frac{1}{2}\n",
"\\sum_{i=1}^N\\big(\\mathbf{w}^\\intercal\\boldsymbol{\\phi}(\\mathbf{x}^{(i)}) - y^{(i)}\\big)^2\n",
"+ \\color{red}{\\frac{\\lambda}{2}\\|\\mathbf{w}\\|^2_2}\n",
"$$\n",
" * has a closed-form solution\n",
"\n",
"\n",
"* **L1 regularization** $\\Rightarrow$ **LASSO** (*least absolute shrinkage and selection operator*)\n",
"$$\n",
"E'(\\mathbf{w}|\\mathcal{D})=\\frac{1}{2}\n",
"\\sum_{i=1}^N\\big(\\mathbf{w}^\\intercal\\boldsymbol{\\phi}(\\mathbf{x}^{(i)}) - y^{(i)}\\big)^2\n",
"+ \\color{red}{\\frac{\\lambda}{2}\\|\\mathbf{w}\\|_1}\n",
"$$\n",
" * has no closed-form solution!\n",
"\n",
"\n",
"* **L0 regularization**\n",
"$$\n",
"E'(\\mathbf{w}|\\mathcal{D})=\\frac{1}{2}\n",
"\\sum_{i=1}^N\\big(\\mathbf{w}^\\intercal\\boldsymbol{\\phi}(\\mathbf{x}^{(i)}) - y^{(i)}\\big)^2\n",
"+ \\color{red}{\\frac{\\lambda}{2}\\sum_{j=1}^m\\mathbf{1}\\{w_j\\neq0\\}}\n",
"$$\n",
" * NP-hard problem!\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### L2 regularization\n",
"\n",
"* Linear regression with L2 regularization has a closed-form solution:\n",
"\n",
"$$\n",
"\\begin{align*}\n",
"E'(\\mathbf{w}|\\mathcal{D}) &= \\frac{1}{2}\n",
"(\\boldsymbol{\\Phi}\\mathbf{w} - \\mathbf{y})^\\intercal\n",
"(\\boldsymbol{\\Phi}\\mathbf{w} - \\mathbf{y}) + \\color{red}{\\frac{\\lambda}{2}\\mathbf{w}^\\intercal\\mathbf{w}}\\\\\n",
"&=\n",
"\\frac{1}{2}\n",
"(\\mathbf{w}^\\intercal\\boldsymbol{\\Phi}^\\intercal\\boldsymbol{\\Phi}\\mathbf{w} - 2\\mathbf{y}^\\intercal\\boldsymbol{\\Phi}\\mathbf{w} + \\mathbf{y}^\\intercal\\mathbf{y}\n",
"+ \\color{red}{\\lambda\\mathbf{w}^\\intercal\\mathbf{w}})\\\\\n",
"\\nabla_{\\mathbf{w}}E' &= \n",
"\\boldsymbol{\\Phi}^\\intercal\\boldsymbol{\\Phi}\\mathbf{w} - \\boldsymbol{\\Phi}^\\intercal\\mathbf{y} + \\color{red}{\\lambda\\mathbf{w}} \\\\\n",
"&=\n",
"(\\boldsymbol{\\Phi}^\\intercal\\boldsymbol{\\Phi} + \\color{red}{\\lambda\\mathbf{I}})\\mathbf{w} - \\boldsymbol{\\Phi}^\\intercal\\mathbf{y} = 0 \\\\\n",
"\\mathbf{w} &= (\\boldsymbol{\\Phi}^\\intercal\\boldsymbol{\\Phi} + \\color{red}{\\lambda\\mathbf{I}})^{-1}\\boldsymbol{\\Phi}^\\intercal\\mathbf{y}\\\\\n",
"\\end{align*}\n",
"$$"
]
},
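{
"cell_type": "markdown",
"metadata": {},
"source": [
"The closed-form solution derived above can be tried out with a short NumPy sketch. The toy data, the degree-9 polynomial features, the chosen values of $\\lambda$, and the name `ridge_fit` are assumptions made for illustration. Like the matrix formula, the sketch penalizes every weight, including $w_0$. It prints the norm of the weight vector for a few values of $\\lambda$, which should shrink as $\\lambda$ grows."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical toy data: 15 noisy samples of a sine curve, degree-9 polynomial features\n",
"rng = np.random.RandomState(1)\n",
"x_r = np.linspace(0, 1, 15)\n",
"y_r = np.sin(2 * np.pi * x_r) + rng.normal(0, 0.2, x_r.shape[0])\n",
"Phi_r = np.vander(x_r, 10, increasing=True)\n",
"\n",
"def ridge_fit(Phi, y, lam):\n",
"    # Solves (Phi^T Phi + lam * I) w = Phi^T y instead of forming the inverse explicitly\n",
"    A = np.dot(Phi.T, Phi) + lam * np.eye(Phi.shape[1])\n",
"    return np.linalg.solve(A, np.dot(Phi.T, y))\n",
"\n",
"lams = [1e-3, 1e-1, 10.0]\n",
"norms = [np.linalg.norm(ridge_fit(Phi_r, y_r, lam)) for lam in lams]\n",
"\n",
"for lam, norm in zip(lams, norms):\n",
"    print('lambda = %g, ||w|| = %.3f' % (lam, norm))"
]
},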
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Remarks\n",
"\n",
"* The magnitude of the parameter $w_j$ corresponds to the importance of the feature, and its sign indicates the direction of the feature's influence (positive or negative) on the output value\n",
"\n",
"\n",
"* Regularization reduces model complexity by damping the weights of individual features, effectively removing those features (when $w_j\\to0$)\n",
" * If the model is nonlinear, this means reducing the nonlinearity\n",
"\n",
"\n",
"* The weight $w_0$ should be excluded from the regularization term (because it defines the offset), or the data should be centered so that $\\overline{y}=0$, because then $w_0\\to0$\n",
"\n",
"\n",
"* L2 regularization penalizes weights in proportion to their magnitude (large weights more, small weights less). The parameters will therefore rarely be pulled exactly to zero, which is why **L2 regularization does not yield sparse models**\n",
"\n",
"\n",
"* L1-regularized regression yields sparse models but has no closed-form solution (iterative optimization procedures can be used instead)\n",
"\n",
"\n",
"* Regularization is useful for models with many parameters, because such models are easy to overfit\n",
"\n",
"\n",
"* Regularization reduces the risk of overfitting, but the problem of choosing the hyperparameter $\\lambda$ remains\n",
" * This choice is most often made by **cross-validation**\n",
"\n",
"\n",
"* **Q:** What optimal value of $\\lambda$ would we obtain if we optimized it on the training set?\n"
]
},
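{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible iterative procedure for the L1 case is proximal gradient descent (ISTA), sketched below. The notes do not prescribe a particular method, so this is only an illustration. The toy data, $\\lambda=20$, the iteration count, and the function names are assumptions. The data are generated without an offset, so no $w_0$ is fitted. The sketch counts how many weights end up exactly zero under L1 and under L2 regularization."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical toy data: 8 features, only two of which influence the output\n",
"rng = np.random.RandomState(0)\n",
"X_s = rng.normal(size=(50, 8))\n",
"w_true = np.array([2.0, 0, 0, -3.0, 0, 0, 0, 0])\n",
"y_s = np.dot(X_s, w_true) + rng.normal(0, 0.1, 50)\n",
"lam = 20.0\n",
"\n",
"def soft_threshold(z, t):\n",
"    # Proximal operator of t * ||.||_1: shrinks towards zero, sets small entries to exactly 0\n",
"    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)\n",
"\n",
"def lasso_ista(X, y, lam, n_iter=500):\n",
"    # Minimizes 0.5 * ||Xw - y||^2 + (lam / 2) * ||w||_1 by proximal gradient steps\n",
"    step = 1.0 / np.linalg.norm(X, 2) ** 2      # 1 / largest eigenvalue of X^T X\n",
"    w = np.zeros(X.shape[1])\n",
"    for _ in range(n_iter):\n",
"        grad = np.dot(X.T, np.dot(X, w) - y)    # gradient of the squared-error term\n",
"        w = soft_threshold(w - step * grad, step * lam / 2.0)\n",
"    return w\n",
"\n",
"w_l1 = lasso_ista(X_s, y_s, lam)\n",
"w_l2 = np.linalg.solve(np.dot(X_s.T, X_s) + lam * np.eye(8), np.dot(X_s.T, y_s))\n",
"\n",
"print('exact zeros with L1: %d' % np.sum(w_l1 == 0))\n",
"print('exact zeros with L2: %d' % np.sum(w_l2 == 0))"
]
},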
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Summary\n",
"\n",
"\n",
"* A **linear regression model** is linear in its parameters\n",
"\n",
"\n",
"* With the quadratic loss function, the parameters of a linear model have a closed-form solution given by the **pseudoinverse of the design matrix**\n",
"\n",
"\n",
"* Nonlinearity of the regression function is achieved by using nonlinear **basis functions** (mapping the input space into a feature space)\n",
"\n",
"\n",
"* Assuming normally distributed noise, **MLE is equivalent to the least-squares procedure**, which gives a probabilistic justification for using the quadratic loss function\n",
"\n",
"\n",
"* **Regularization reduces overfitting** by adding a term to the error function that penalizes model complexity\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}