{ "cells": [ { "cell_type": "markdown", "id": "e59a56ee", "metadata": {}, "source": [ "**Lab 4: Forward and Backward Propagation in Neural Networks**" ] }, { "cell_type": "code", "execution_count": 2, "id": "62f100c9", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "id": "eebaf0fe", "metadata": {}, "source": [ "**Part 1: Introduction to PyTorch**" ] }, { "cell_type": "markdown", "id": "2a394db6", "metadata": {}, "source": [ "This section introduces a small selection of commonly used PyTorch modules and functions; for anything beyond it, see the [official PyTorch tutorials](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html) and the [official PyTorch documentation](https://pytorch.org/docs/stable/index.html)." ] }, { "cell_type": "code", "execution_count": 3, "id": "6fae149b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2.1.0+cpu\n" ] } ], "source": [ "import torch # the package is imported as torch, not pytorch\n", "print(torch.__version__) # print the current PyTorch version" ] }, { "cell_type": "markdown", "id": "392780bb", "metadata": {}, "source": [ "1. Tensor" ] }, { "cell_type": "markdown", "id": "904623e4", "metadata": {}, "source": [ "A Tensor is very similar to NumPy's ndarray, but a Tensor can also use a GPU to accelerate computation (although this course does not need one)." ] }, { "cell_type": "markdown", "id": "23642996", "metadata": {}, "source": [ "1.1. 
Creating Tensors" ] }, { "cell_type": "code", "execution_count": 4, "id": "38cdf3ad", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[-2.9818e-08, 1.8357e-42, 0.0000e+00],\n", " [ 0.0000e+00, 0.0000e+00, 0.0000e+00]])\n", "tensor([[1, 2, 3],\n", " [4, 5, 6]])\n", "tensor([[0.6417, 0.6556, 0.7544, 0.3146],\n", " [0.3440, 0.0655, 0.0828, 0.5430],\n", " [0.5535, 0.4441, 0.1285, 0.8876]])\n", "tensor([[0., 0., 0.],\n", " [0., 0., 0.]])\n", "tensor([[1., 1., 1.],\n", " [1., 1., 1.]])\n" ] } ], "source": [ "# create an uninitialized Tensor\n", "x = torch.empty(2, 3)\n", "print(x)\n", "\n", "# create a Tensor from a nested list\n", "x = torch.tensor([[1,2,3],[4,5,6]])\n", "print(x)\n", "\n", "# create a Tensor with uniformly random entries\n", "x = torch.rand([3, 4])\n", "print(x)\n", "\n", "# create an all-zeros Tensor\n", "x = torch.zeros([2, 3])\n", "print(x)\n", "\n", "# create an all-ones Tensor\n", "x = torch.ones([2, 3])\n", "print(x)" ] }, { "cell_type": "markdown", "id": "9f354fd4", "metadata": {}, "source": [ "1.2. Tensor operations" ] }, { "cell_type": "code", "execution_count": 5, "id": "8070a903", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[7, 7, 7],\n", " [7, 7, 7]])\n", "tensor([[-5, -3, -1],\n", " [ 1, 3, 5]])\n", "tensor([[ 6, 10, 12],\n", " [12, 10, 6]])\n", "tensor([[ 6, 10, 12],\n", " [12, 10, 6]])\n", "tensor([[28, 10],\n", " [73, 28]])\n", "tensor([[28, 10],\n", " [73, 28]])\n", "tensor([[1, 2],\n", " [3, 4],\n", " [5, 6]])\n", "tensor([[1, 2, 3],\n", " [4, 5, 6],\n", " [6, 5, 4],\n", " [3, 2, 1]])\n", "tensor([[1, 2, 3, 6, 5, 4],\n", " [4, 5, 6, 3, 2, 1]])\n" ] } ], "source": [ "# addition and subtraction\n", "x = torch.tensor([[1,2,3],[4,5,6]])\n", "y = torch.tensor([[6,5,4],[3,2,1]])\n", "print(x + y)\n", "print(x - y)\n", "\n", "# element-wise multiplication\n", "print(x * y)\n", "print(x.mul(y))\n", "\n", "# matrix multiplication\n", "print(x.matmul(y.T))\n", "print(x @ y.T)\n", "\n", "# reshape\n", "print(x.reshape(3, 2))\n", "\n", "# concatenation\n", "print(torch.cat([x,y], dim=0)) # concatenate along dim 0 (vertically)\n", "print(torch.cat([x,y], dim=1)) # 
concatenate along dim 1 (horizontally)" ] }, { "cell_type": "markdown", "id": "95961cee", "metadata": {}, "source": [ "1.3. Converting between Tensor and ndarray" ] }, { "cell_type": "code", "execution_count": 6, "id": "d1269212", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[1, 2, 3],\n", " [4, 5, 6]])\n", "[[1 2 3]\n", " [4 5 6]]\n", "[[0 0 0]\n", " [0 0 0]]\n", "tensor([[0, 0, 0],\n", " [0, 0, 0]])\n" ] } ], "source": [ "x = torch.tensor([[1,2,3],[4,5,6]])\n", "print(x)\n", "\n", "# convert a Tensor to an ndarray\n", "y = x.numpy()\n", "print(y)\n", "\n", "# the Tensor and the ndarray share the same underlying memory\n", "x[:] = 0\n", "print(y)\n", "\n", "# convert an ndarray back to a Tensor\n", "z = torch.from_numpy(y)\n", "print(z)" ] }, { "cell_type": "markdown", "id": "bb6c36a7", "metadata": {}, "source": [ "2. Automatic differentiation" ] }, { "cell_type": "code", "execution_count": 7, "id": "3b30fec3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[3., 4.]]) tensor(1.)\n", "tensor([0.1966, 0.1966, 0.1966, 0.1966, 0.1966, 0.1966, 0.1966, 0.1966, 0.1966,\n", " 0.1966, 0.1966, 0.1966, 0.1966, 0.1966, 0.1966, 0.1966, 0.1966, 0.1966,\n", " 0.1966, 0.1966])\n", "tensor(4.)\n", "tensor(5.)\n", "tensor(0.)\n" ] } ], "source": [ "a = torch.tensor([[1.,2.]], requires_grad=True) # setting requires_grad=True starts tracking all operations based on this tensor\n", "b = torch.tensor([[3.],[4.]])\n", "c = torch.tensor(5., requires_grad=True)\n", "y = a @ b + c\n", "y.backward() # compute gradients automatically\n", "print(a.grad, c.grad) # print the gradients of the leaf nodes a and c\n", "\n", "# gradients flow through many kinds of operations, e.g. torch.mean(), torch.sum(), etc.\n", "a = torch.ones(20, requires_grad=True)\n", "z = torch.sum(torch.sigmoid(a))\n", "z.backward()\n", "print(a.grad)\n", "\n", "# gradients accumulate over repeated backward calls; clear them manually with tensor.grad.zero_()\n", "x = torch.tensor(2., requires_grad=True)\n", "y = x ** 2\n", "y.backward()\n", "print(x.grad)\n", "z = x + 3\n", "z.backward()\n", "print(x.grad)\n", "x.grad.zero_()\n", "print(x.grad)" ] }, { "cell_type": "markdown", "id": "96d22808", "metadata": {}, "source": [ "3. 
Neural networks (the example from the official tutorial)" ] }, { "cell_type": "code", "execution_count": 8, "id": "207b6433", "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Net(\n", " (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))\n", " (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))\n", " (fc1): Linear(in_features=400, out_features=120, bias=True)\n", " (fc2): Linear(in_features=120, out_features=84, bias=True)\n", " (fc3): Linear(in_features=84, out_features=10, bias=True)\n", ")\n" ] } ], "source": [ "import torch\n", "import torch.nn as nn\n", "import torch.nn.functional as F\n", "\n", "# subclass nn.Module\n", "class Net(nn.Module):\n", "\n", "    def __init__(self):\n", "        super(Net, self).__init__()\n", "        # convolutional layers\n", "        # 1 input image channel, 6 output channels, 5x5 square convolution\n", "        # kernel\n", "        self.conv1 = nn.Conv2d(1, 6, 5)\n", "        self.conv2 = nn.Conv2d(6, 16, 5)\n", "        # an affine operation: y = Wx + b\n", "        # nn.Linear(in_features, out_features, bias=True, device=None, dtype=None)\n", "        # in_features is the number of inputs; out_features is the number of neurons in this layer\n", "        self.fc1 = nn.Linear(16 * 5 * 5, 120) # 5*5 from image dimension\n", "        self.fc2 = nn.Linear(120, 84)\n", "        self.fc3 = nn.Linear(84, 10)\n", "\n", "    def forward(self, x):\n", "        # Max pooling over a (2, 2) window\n", "        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))\n", "        # If the size is a square, you can specify with a single number\n", "        x = F.max_pool2d(F.relu(self.conv2(x)), 2)\n", "        x = torch.flatten(x, 1) # flatten all dimensions except the batch dimension\n", "        x = F.relu(self.fc1(x))\n", "        x = F.relu(self.fc2(x))\n", "        x = self.fc3(x)\n", "        return x\n", "\n", "\n", "net = Net()\n", "print(net)" ] }, { "cell_type": "code", "execution_count": 9, "id": "6c2a943f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10\n", "torch.Size([6, 1, 5, 5])\n" ] } ], "source": [ "# the learnable parameters of the network can be accessed via net.parameters()\n", "params = list(net.parameters())\n", 
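"# each of the 5 layers (conv1, conv2, fc1, fc2, fc3) contributes a weight and a bias,\n", "# so net.parameters() should yield 10 parameter tensors in total\n", "assert len(params) == 10\n", 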
"print(len(params))\n", "print(params[0].size()) # conv1's .weight" ] }, { "cell_type": "code", "execution_count": 10, "id": "b733263e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[-0.0647, 0.0301, 0.0526, -0.1407, 0.0690, -0.0806, -0.0420, 0.1001,\n", " 0.0486, 0.1200]], grad_fn=<AddmmBackward0>)\n" ] } ], "source": [ "# feed a random input into net; apart from dim 0 (the number of samples), the dimensions must match those of the argument x of forward()\n", "input = torch.randn(1, 1, 32, 32) # 1 sample: a 1-channel 32x32 image\n", "out = net(input) # one forward() pass\n", "print(out)" ] }, { "cell_type": "code", "execution_count": 11, "id": "168659b3", "metadata": {}, "outputs": [], "source": [ "net.zero_grad() # manually zero the gradients of the network parameters\n", "out.backward(torch.randn(1, 10)) # one backward() pass with a random upstream gradient" ] }, { "cell_type": "code", "execution_count": 12, "id": "af35f14f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor(0.5634, grad_fn=<MseLossBackward0>)\n" ] } ], "source": [ "output = net(input)\n", "target = torch.randn(10) # a dummy target, for example\n", "target = target.view(1, -1) # make it the same shape as output\n", "# the nn module provides many kinds of loss functions, e.g. nn.CrossEntropyLoss(), nn.MSELoss(), etc.\n", "criterion = nn.MSELoss()\n", "\n", "loss = criterion(output, target)\n", "print(loss)" ] }, { "cell_type": "markdown", "id": "1428f7ce", "metadata": {}, "source": [ "The computation graph:\n", "```\n", "input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d\n", "      -> flatten -> linear -> relu -> linear -> relu -> linear\n", "      -> MSELoss\n", "      -> loss\n", "```" ] }, { "cell_type": "code", "execution_count": 13, "id": "e9ada5be", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<MseLossBackward0 object at 0x...>\n", "<AddmmBackward0 object at 0x...>\n", "<AccumulateGrad object at 0x...>\n" ] } ], "source": [ "# inspect the functions in the computation graph\n", "print(loss.grad_fn) # MSELoss\n", "print(loss.grad_fn.next_functions[0][0]) # Linear\n", "print(loss.grad_fn.next_functions[0][0].next_functions[0][0]) # ReLU\n", "# note: depending on the PyTorch version, this last slot may be the AccumulateGrad\n", "# node of the last linear layer's bias rather than the ReLU node" ] }, { "cell_type": "code", "execution_count": 14, "id": "854ee4d1", "metadata": {}, "outputs": [ { "name": "stdout", 
"output_type": "stream", "text": [ "conv1.bias.grad before backward\n", "None\n", "conv1.bias.grad after backward\n", "tensor([ 0.0008, -0.0093, -0.0016, 0.0013, 0.0035, -0.0099])\n" ] } ], "source": [ "net.zero_grad() # zeroes the gradient buffers of all parameters\n", "\n", "print('conv1.bias.grad before backward')\n", "print(net.conv1.bias.grad)\n", "\n", "# one backward pass\n", "loss.backward()\n", "\n", "print('conv1.bias.grad after backward')\n", "print(net.conv1.bias.grad)" ] }, { "cell_type": "code", "execution_count": 15, "id": "d7187507", "metadata": {}, "outputs": [], "source": [ "# update the parameters of net manually with gradient descent\n", "learning_rate = 0.01\n", "for f in net.parameters():\n", "    f.data.sub_(f.grad.data * learning_rate)" ] }, { "cell_type": "code", "execution_count": 16, "id": "06d237aa", "metadata": {}, "outputs": [], "source": [ "# update the parameters of net with a PyTorch optimizer\n", "import torch.optim as optim\n", "\n", "# create your optimizer\n", "optimizer = optim.SGD(net.parameters(), lr=0.01)\n", "\n", "# what each training iteration should do:\n", "optimizer.zero_grad() # zero the gradients\n", "output = net(input) # forward pass\n", "loss = criterion(output, target) # compute the loss\n", "loss.backward() # backward pass\n", "optimizer.step() # update the parameters" ] }, { "cell_type": "markdown", "id": "50be2d1f", "metadata": {}, "source": [ "**Part 2: Lab Tasks**" ] }, { "cell_type": "markdown", "id": "ba5214e5", "metadata": {}, "source": [ "[Red Wine Quality](https://www.kaggle.com/datasets/uciml/red-wine-quality-cortez-et-al-2009) is a dataset on red-wine quality with 1599 samples in total; each sample has 11 (all continuous) features and 1 label, and the labels take continuous values. For this lab it has already been split 8:2 into a training set 'wine_train.csv' and a test set 'wine_test.csv', and both sets have been normalized." ] }, { "cell_type": "markdown", "id": "d4d9ccac", "metadata": {}, "source": [ "1) Load the training set 'wine_train.csv' and the test set 'wine_test.csv'." ] }, { "cell_type": "code", "execution_count": null, "id": "c759660f", "metadata": {}, "outputs": [], "source": [ "# -- Your code here --\n", "\n", "\n" ] }, { "cell_type": "markdown", "id": "d77619fd", "metadata": {}, "source": [ "2) 
Using linear layers and activation functions, build a neural network whose input and output dimensions match those of the dataset; hyperparameters such as the network depth, the hidden-layer sizes, and the choice of activation function are up to you." ] }, { "cell_type": "code", "execution_count": null, "id": "26359bae", "metadata": {}, "outputs": [], "source": [ "# -- Your code here --\n", "\n", "\n" ] }, { "cell_type": "markdown", "id": "9b706eea", "metadata": {}, "source": [ "3) Update the model parameters with gradient descent, recording the training loss and the test loss in every iteration." ] }, { "cell_type": "code", "execution_count": null, "id": "cad2f094", "metadata": {}, "outputs": [], "source": [ "# -- Your code here --\n", "\n", "\n" ] }, { "cell_type": "markdown", "id": "ff5ff45e", "metadata": {}, "source": [ "4) Plot the training loss and the test loss against the number of iterations." ] }, { "cell_type": "code", "execution_count": null, "id": "3ba3071f", "metadata": {}, "outputs": [], "source": [ "# -- Your code here --\n", "\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 5 }