{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "4db23f5f-90f5-4f80-8efa-d9137af4a447", "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/jovyan/work/d2l/notebooks/d2l_utils/d2l.py:119: SyntaxWarning: assertion is always true, perhaps remove parentheses?\n", "  assert(self, 'net'), 'Neural network is defined'\n", "/home/jovyan/work/d2l/notebooks/d2l_utils/d2l.py:123: SyntaxWarning: assertion is always true, perhaps remove parentheses?\n", "  assert(self, 'trainer'), 'trainer is not inited'\n" ] } ], "source": [ "import sys\n", "sys.path.append('/home/jovyan/work/d2l/notebooks/d2l_utils')\n", "import d2l\n", "import torch\n", "import warnings\n", "warnings.filterwarnings(\"ignore\")\n", "\n", "\n", "class HighDimData(d2l.DataModule):\n", "    def __init__(self, num_train, num_val, num_inputs, batch_size):\n", "        super().__init__()\n", "        self.save_hyperparameters()\n", "        n = num_train + num_val\n", "        self.X = torch.randn(n, num_inputs)\n", "        noise = torch.randn(n, 1) * 0.01\n", "        self.w, self.b = torch.ones(num_inputs, 1) * 0.01, 0.05\n", "        self.y = torch.matmul(self.X, self.w) + self.b + noise\n", "\n", "    def get_dataloader(self, train):\n", "        i = slice(0, self.num_train) if train else slice(self.num_train, None)\n", "        return self.get_tensorloader([self.X, self.y], train, i)\n", "\n", "\n", "class WeightDecayScratch(d2l.LinearRegressScratch):\n", "    def __init__(self, num_inputs, lambd, lr, sigma=0.01):\n", "        super().__init__(num_inputs, lr, sigma)\n", "        self.save_hyperparameters()\n", "\n", "    def loss(self, y_hat, y):\n", "        return super().loss(y_hat, y) + self.lambd * d2l.l2_penalty(self.w)\n", "    \n", "\n", "class WeightDecay(d2l.LinearRegression):\n", "    def __init__(self, wd, lr):\n", "        super().__init__(lr)\n", "        self.save_hyperparameters()\n", "\n", "    def configure_optimizers(self):\n", "        return torch.optim.SGD([{'params': self.net.weight, 'weight_decay': self.wd},\n", "                                {'params': self.net.bias}], 
lr=self.lr)\n", "\n", "\n", "def train_scratch(lambd, trainer, data):\n", "    model = WeightDecayScratch(num_inputs=200, lambd=lambd, lr=0.01)\n", "    trainer.fit(model, data)\n", "    print(f'l2 norm of w:{d2l.l2_penalty(model.w):.2g}')" ] }, { "cell_type": "code", "execution_count": 2, "id": "4f33664e-2337-42bd-b62e-9a9fcd4a59bd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "l2 norm of w:0.0097\n" ] }, { "data": { "image/svg+xml": [ "\n", "\n", "\n" ], "text/plain": [ "

" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "data = HighDimData(num_train=20, num_val=100, num_inputs=200, batch_size=5)\n", "trainer = d2l.Trainer(max_epochs=10)\n", "train_scratch(0, trainer, data)" ] }, { "cell_type": "code", "execution_count": 3, "id": "453d5bf7-55f1-4647-b43f-23685c3f3c36", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "l2 norm of w:0.0013\n" ] }, { "data": { "image/svg+xml": [ "\n", "\n", "\n" ], "text/plain": [ "

" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "trainer = d2l.Trainer(max_epochs=10)\n", "train_scratch(3, trainer, data)" ] }, { "cell_type": "code", "execution_count": 9, "id": "16fc3c26-9fac-4a7d-94fc-df69475f453f", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "l2 norm of w: 0.0136\n" ] }, { "data": { "image/svg+xml": [ "\n", "\n", "\n" ], "text/plain": [ "

" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "model = WeightDecay(wd=3, lr=0.01)\n", "model.board.yscale = 'log'\n", "trainer = d2l.Trainer(max_epochs=10)\n", "trainer.fit(model, data)\n", "print(f'l2 norm of w: {d2l.l2_penalty(model.net.weight):.4g}')" ] }, { "cell_type": "markdown", "id": "2ae432e3-29db-4fb4-8428-ef121a7c203a", "metadata": {}, "source": [ "# 3.7.6. Exercises" ] }, { "cell_type": "markdown", "id": "1a390f42-f0b5-4c5a-8a23-ea76e14a0865", "metadata": {}, "source": [ "## 1. Experiment with the value of $\\lambda$ in the estimation problem in this section. Plot training and validation accuracy as a function of $\\lambda$. What do you observe?" ] }, { "cell_type": "code", "execution_count": null, "id": "a2a20c67-360d-420d-8f26-c46d464b18bd", "metadata": {}, "outputs": [], "source": [ "data = HighDimData(num_train=20, num_val=100, num_inputs=200, batch_size=5)\n", "model = WeightDecay(wd=3, lr=0.01)\n", "model.board.yscale = 'log'\n", "trainer = d2l.Trainer(max_epochs=10)\n", "trainer.fit(model, data)" ] }, { "cell_type": "markdown", "id": "5e07b28b-6333-43b4-9770-da514de917d8", "metadata": {}, "source": [] }, { "cell_type": "markdown", "id": "563b57e8-2d8f-4cc1-94fe-7067fa0415c0", "metadata": {}, "source": [ "## 2. Use a validation set to find the optimal value of $\\lambda$. Is it really the optimal value? Does this matter?" ] }, { "cell_type": "markdown", "id": "cc8a4140-5e47-48cf-aa49-6b2e2b33b50d", "metadata": {}, "source": [] }, { "cell_type": "markdown", "id": "6f451971-adba-42fb-9bef-6847759f270b", "metadata": {}, "source": [ "## 3. What would the update equations look like if instead of $\\|\\mathbf{w}\\|^2$ we used $\\sum_i |w_i|$ as our penalty of choice ($\\ell_1$ regularization)?" ] }, { "cell_type": "markdown", "id": "c6f21f5d-fd89-48c1-b756-c74341a99d56", "metadata": { "tags": [] }, "source": [] }, { "cell_type": "markdown", "id": "c03b039e-e1af-44a1-9c91-2913a1edd69c", "metadata": {}, "source": [ "## 4. We know that $\\|\\mathbf{w}\\|^2 = \\mathbf{w}^\\top \\mathbf{w}$. Can you find a similar equation for matrices (see the Frobenius norm in Section 2.3.11)?" ] }, { "cell_type": "markdown", "id": "1cf46005-f6d0-4281-bcd6-d05185d2232c", "metadata": {}, "source": [] }, { "cell_type": "markdown", "id": "6964c72b-372e-442c-8661-779679ba22e4", "metadata": {}, "source": [ "## 5. Review the relationship between training error and generalization error. In addition to weight decay, increased training, and the use of a model of suitable complexity, what other ways might help us deal with overfitting?\n", "\n" ] }, { "cell_type": "markdown", "id": "d4e30a7e-66d4-4881-b181-42900e5147db", "metadata": {}, "source": [] }, { "cell_type": "markdown", "id": "591d7cfd-121f-4ca6-b103-8e5e44680937", "metadata": {}, "source": [ "## 6. In Bayesian statistics we use the product of prior and likelihood to arrive at a posterior via $P(w \\mid x) \\propto P(x \\mid w) P(w)$. How can you identify $P(w)$ with regularization?" ] }, { "cell_type": "markdown", "id": "5d38d78d-8e40-4fe7-9d26-4f44a15df6c3", "metadata": {}, "source": [] }, { "cell_type": "markdown", "id": "479e5a48-5c38-4e64-8d9a-b2e71d373c22", "metadata": {}, "source": [] }, { "cell_type": "markdown", "id": "a9095a9b-2679-4a14-afe7-56c8e8f28e49", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:d2l]", "language": "python", "name": "conda-env-d2l-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 5 }