{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "4db23f5f-90f5-4f80-8efa-d9137af4a447",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/jovyan/work/d2l/notebooks/d2l_utils/d2l.py:119: SyntaxWarning: assertion is always true, perhaps remove parentheses?\n",
" assert(self, 'net'), 'Neural network is defined'\n",
"/home/jovyan/work/d2l/notebooks/d2l_utils/d2l.py:123: SyntaxWarning: assertion is always true, perhaps remove parentheses?\n",
" assert(self, 'trainer'), 'trainer is not inited'\n"
]
}
],
"source": [
"import sys\n",
"sys.path.append('/home/jovyan/work/d2l/notebooks/d2l_utils')\n",
"import d2l\n",
"import torch\n",
"import warnings\n",
"warnings.filterwarnings(\"ignore\")\n",
"\n",
"\n",
"class HighDimData(d2l.DataModule):\n",
" def __init__(self, num_train, num_val, num_inputs, batch_size):\n",
" super().__init__()\n",
" self.save_hyperparameters()\n",
" n = num_train + num_val\n",
" self.X = torch.randn(n, num_inputs)\n",
" noise = torch.randn(n, 1) * 0.01\n",
" self.w, self.b = torch.ones(num_inputs, 1) * 0.01, 0.05\n",
" self.y = torch.matmul(self.X, self.w) + self.b + noise\n",
"\n",
" def get_dataloader(self, train):\n",
" i = slice(0, self.num_train) if train else slice(self.num_train, None)\n",
" return self.get_tensorloader([self.X, self.y], train, i)\n",
"\n",
"\n",
"class WeightDecayScratch(d2l.LinearRegressScratch):\n",
" def __init__(self, num_inputs, lambd, lr, sigma=0.01):\n",
" super().__init__(num_inputs, lr, sigma)\n",
" self.save_hyperparameters()\n",
"\n",
" def loss(self, y_hat, y):\n",
" return super().loss(y_hat, y) + self.lambd * d2l.l2_penalty(self.w)\n",
" \n",
"\n",
"class WeightDecay(d2l.LinearRegression):\n",
" def __init__(self, wd, lr):\n",
" super().__init__(lr)\n",
" self.save_hyperparameters()\n",
"\n",
" def configure_optimizers(self):\n",
" return torch.optim.SGD([{'params': self.net.weight, 'weight_decay': self.wd},\n",
" {'params': self.net.bias}], lr=self.lr)\n",
"\n",
"\n",
"def train_strach(lambd, trainer, data):\n",
" model = WeightDecayScratch(num_inputs=200, lambd=lambd, lr=0.01)\n",
" trainer.fit(model, data)\n",
" print(f'l2 norm of w:{d2l.l2_penalty(model.w):.2g}')"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "4f33664e-2337-42bd-b62e-9a9fcd4a59bd",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"l2 norm of w:0.0097\n"
]
},
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"data = HighDimData(num_train=20, num_val=100, num_inputs=200, batch_size=5)\n",
"trainer = d2l.Trainer(max_epochs=10)\n",
"train_strach(0, trainer, data)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "453d5bf7-55f1-4647-b43f-23685c3f3c36",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"l2 norm of w:0.0013\n"
]
},
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"trainer = d2l.Trainer(max_epochs=10)\n",
"train_strach(3, trainer, data)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "16fc3c26-9fac-4a7d-94fc-df69475f453f",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"l2 norm of w: 0.0136\n"
]
},
{
"data": {
"image/svg+xml": [
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"model = WeightDecay(wd=3, lr=0.01)\n",
"model.board.yscale = 'log'\n",
"trainer = d2l.Trainer(max_epochs=10)\n",
"trainer.fit(model, data)\n",
"print(f'l2 norm of w: {d2l.l2_penalty(model.net.weight):.4g}')"
]
},
{
"cell_type": "markdown",
"id": "2ae432e3-29db-4fb4-8428-ef121a7c203a",
"metadata": {},
"source": [
"# 3.7.6. Exercises"
]
},
{
"cell_type": "markdown",
"id": "1a390f42-f0b5-4c5a-8a23-ea76e14a0865",
"metadata": {},
"source": [
"## 1. Experiment with the value of $\\lambda$ in the estimation problem in this section. Plot training and validation accuracy as a function of $\\lambda$. What do you observe?"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a2a20c67-360d-420d-8f26-c46d464b18bd",
"metadata": {},
"outputs": [],
"source": [
"data = HighDimData(num_train=20, num_val=100, num_inputs=200, batch_size=5)\n",
"model = WeightDecay(wd=3, lr=0.01)\n",
"model.board.yscale = 'log'\n",
"trainer = d2l.Trainer(max_epochs=10)\n",
"trainer.fit(model, data)"
]
},
{
"cell_type": "markdown",
"id": "5e07b28b-6333-43b4-9770-da514de917d8",
"metadata": {},
"source": []
},
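{
"cell_type": "markdown",
"id": "e1a0c3d2-lambda-sweep-sketch",
"metadata": {},
"source": [
"A minimal sketch of how the sweep over $\\lambda$ asked for in Exercise 1 might be set up. It uses plain PyTorch rather than this notebook's `d2l` helpers, so `make_data` and `fit` are hypothetical names introduced only for illustration, and the grid of $\\lambda$ values is an arbitrary choice. Since this is a regression problem, mean squared error stands in for accuracy.\n",
"\n",
"```python\n",
"import torch\n",
"\n",
"\n",
"def make_data(num_train=20, num_val=100, num_inputs=200, seed=0):\n",
"    # Same synthetic setup as HighDimData: y = 0.05 + sum_i 0.01 * x_i + noise\n",
"    g = torch.Generator().manual_seed(seed)\n",
"    n = num_train + num_val\n",
"    X = torch.randn(n, num_inputs, generator=g)\n",
"    noise = torch.randn(n, 1, generator=g) * 0.01\n",
"    w = torch.ones(num_inputs, 1) * 0.01\n",
"    y = X @ w + 0.05 + noise\n",
"    return (X[:num_train], y[:num_train]), (X[num_train:], y[num_train:])\n",
"\n",
"\n",
"def fit(train, val, lambd, lr=0.01, epochs=10, batch_size=5, seed=0):\n",
"    # Train a linear model with weight decay on the weight only (not the bias)\n",
"    torch.manual_seed(seed)\n",
"    X, y = train\n",
"    net = torch.nn.Linear(X.shape[1], 1)\n",
"    torch.nn.init.normal_(net.weight, std=0.01)\n",
"    torch.nn.init.zeros_(net.bias)\n",
"    opt = torch.optim.SGD([\n",
"        {'params': [net.weight], 'weight_decay': lambd},\n",
"        {'params': [net.bias]}], lr=lr)\n",
"    loss_fn = torch.nn.MSELoss()\n",
"    for _ in range(epochs):\n",
"        perm = torch.randperm(len(X))  # reshuffle the training set each epoch\n",
"        for k in range(0, len(X), batch_size):\n",
"            idx = perm[k:k + batch_size]\n",
"            opt.zero_grad()\n",
"            loss_fn(net(X[idx]), y[idx]).backward()\n",
"            opt.step()\n",
"    with torch.no_grad():\n",
"        # Return (training MSE, validation MSE) of the final model\n",
"        return loss_fn(net(X), y).item(), loss_fn(net(val[0]), val[1]).item()\n",
"\n",
"\n",
"train, val = make_data()\n",
"lambdas = [0.0, 0.1, 0.3, 1.0, 3.0, 10.0]\n",
"results = [(l, *fit(train, val, l)) for l in lambdas]\n",
"for l, tr, va in results:\n",
"    print(f'lambda={l:<5} train MSE={tr:.4f}  val MSE={va:.4f}')\n",
"```\n",
"\n",
"The `results` list holds one `(lambda, train MSE, validation MSE)` triple per setting, so it can be passed to any plotting routine with $\\lambda$ on a logarithmic x-axis. The same triples could also serve Exercise 2 by picking the $\\lambda$ with the lowest validation error, keeping in mind that a minimum found on 100 validation points and a coarse grid is only an estimate."
]
},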
{
"cell_type": "markdown",
"id": "563b57e8-2d8f-4cc1-94fe-7067fa0415c0",
"metadata": {},
"source": [
"## 2. Use a validation set to find the optimal value of $\\lambda$. Is it really the optimal value? Does this matter?"
]
},
{
"cell_type": "markdown",
"id": "cc8a4140-5e47-48cf-aa49-6b2e2b33b50d",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
"id": "6f451971-adba-42fb-9bef-6847759f270b",
"metadata": {},
"source": [
"## 3. What would the update equations look like if instead of $\\|\\mathbf{w}\\|^2$ we used $\\sum_i |w_i|$ as our penalty of choice ($\\ell_1$ regularization)?"
]
},
{
"cell_type": "markdown",
"id": "c6f21f5d-fd89-48c1-b756-c74341a99d56",
"metadata": {
"tags": []
},
"source": []
},
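{
"cell_type": "markdown",
"id": "e1a0c3d2-l1-update-sketch",
"metadata": {},
"source": [
"A worked sketch for Exercise 3, assuming the squared loss of linear regression and the objective $L(\\mathbf{w}, b) + \\lambda \\|\\mathbf{w}\\|_1$ (how $\\lambda$ is scaled is a convention chosen here). Because $|w_i|$ is not differentiable at zero, a subgradient is used, with $\\operatorname{sign}(0)$ taken to be $0$. Under these assumptions the minibatch update would read:\n",
"\n",
"$$\\mathbf{w} \\leftarrow \\mathbf{w} - \\eta \\lambda \\operatorname{sign}(\\mathbf{w}) - \\frac{\\eta}{|\\mathcal{B}|} \\sum_{i \\in \\mathcal{B}} \\mathbf{x}^{(i)} \\left(\\mathbf{w}^\\top \\mathbf{x}^{(i)} + b - y^{(i)}\\right)$$\n",
"\n",
"For comparison, the penalty $\\frac{\\lambda}{2} \\|\\mathbf{w}\\|^2$ gives $\\mathbf{w} \\leftarrow (1 - \\eta \\lambda) \\mathbf{w} - \\ldots$, which shrinks each weight in proportion to its size. The $\\ell_1$ step instead moves every nonzero weight toward zero by the same amount $\\eta \\lambda$, which is why it tends to produce sparse weight vectors."
]
},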
{
"cell_type": "markdown",
"id": "c03b039e-e1af-44a1-9c91-2913a1edd69c",
"metadata": {},
"source": [
"## 4. We know that $\\|\\mathbf{w}\\|^2 = \\mathbf{w}^\\top \\mathbf{w}$. Can you find a similar equation for matrices (see the Frobenius norm in Section 2.3.11)?"
]
},
{
"cell_type": "markdown",
"id": "1cf46005-f6d0-4281-bcd6-d05185d2232c",
"metadata": {},
"source": []
},
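{
"cell_type": "markdown",
"id": "e1a0c3d2-frobenius-trace-sketch",
"metadata": {},
"source": [
"One candidate for the matrix analogue in Exercise 4 is the trace identity $\\|\\mathbf{A}\\|_F^2 = \\sum_{i,j} a_{ij}^2 = \\operatorname{tr}(\\mathbf{A}^\\top \\mathbf{A})$. The short check below is a sketch that compares the three expressions numerically on an arbitrary random matrix; the shape and the seed are illustrative choices, not part of the exercise.\n",
"\n",
"```python\n",
"import torch\n",
"\n",
"torch.manual_seed(0)\n",
"A = torch.randn(3, 4, dtype=torch.float64)  # arbitrary test matrix\n",
"\n",
"sum_of_squares = (A ** 2).sum()                        # sum of squared entries\n",
"trace_form = torch.trace(A.T @ A)                      # tr(A^T A)\n",
"norm_squared = torch.linalg.norm(A, ord='fro') ** 2    # squared Frobenius norm\n",
"\n",
"print(sum_of_squares.item(), trace_form.item(), norm_squared.item())\n",
"```\n",
"\n",
"The three printed numbers should agree up to floating-point rounding. The identity holds because the $j$-th diagonal entry of $\\mathbf{A}^\\top \\mathbf{A}$ is the squared norm of the $j$-th column of $\\mathbf{A}$."
]
},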
{
"cell_type": "markdown",
"id": "6964c72b-372e-442c-8661-779679ba22e4",
"metadata": {},
"source": [
"## 5. Review the relationship between training error and generalization error. In addition to weight decay, increased training, and the use of a model of suitable complexity, what other ways might help us deal with overfitting?\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "d4e30a7e-66d4-4881-b181-42900e5147db",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
"id": "591d7cfd-121f-4ca6-b103-8e5e44680937",
"metadata": {},
"source": [
"## 6. In Bayesian statistics we use the product of prior and likelihood to arrive at a posterior via $P(w \\mid x) \\propto P(x \\mid w) P(w)$. How can you identify $P(w)$ with regularization?"
]
},
{
"cell_type": "markdown",
"id": "5d38d78d-8e40-4fe7-9d26-4f44a15df6c3",
"metadata": {},
"source": []
},
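{
"cell_type": "markdown",
"id": "e1a0c3d2-map-prior-sketch",
"metadata": {},
"source": [
"A worked sketch for Exercise 6, assuming we look for the maximum a posteriori estimate of $w$. Taking the negative logarithm of the posterior turns the product into a sum:\n",
"\n",
"$$-\\log P(w \\mid x) = -\\log P(x \\mid w) - \\log P(w) + \\text{const}$$\n",
"\n",
"The first term plays the role of the training loss and the second that of the penalty. As one example, assuming a zero-mean Gaussian prior $P(\\mathbf{w}) \\propto \\exp\\left(-\\frac{\\lambda}{2} \\|\\mathbf{w}\\|^2\\right)$ gives\n",
"\n",
"$$-\\log P(\\mathbf{w}) = \\frac{\\lambda}{2} \\|\\mathbf{w}\\|^2 + \\text{const},$$\n",
"\n",
"which is the weight decay penalty. A Laplace prior $P(\\mathbf{w}) \\propto \\exp(-\\lambda \\|\\mathbf{w}\\|_1)$ would correspond to $\\ell_1$ regularization in the same way."
]
},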
{
"cell_type": "markdown",
"id": "a9095a9b-2679-4a14-afe7-56c8e8f28e49",
"metadata": {},
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:d2l]",
"language": "python",
"name": "conda-env-d2l-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}