{"cells":[{"metadata":{"_uuid":"defd8d7b388efe04b784f271aadad03d18257db8"},"cell_type":"markdown","source":"![Next Generation Weapon](https://s3-ap-south-1.amazonaws.com/av-blog-media/wp-content/uploads/2018/02/pytorch-logo-flat-300x210.png)\n\n\n---\n10 Minute of Pytorch\n---\n---\n* [Pytorch Introduction](#Pytorch-Introduction)\n    * [PyTorch provides two high-level features](#PyTorch-provides-two-high-level-features)\n* [Environment Configuration](#Environment-Configuration)\n    * [Install pytorch on Windows Snapshop Tutorial](https://www.superdatascience.com/pytorch/)\n* [**Section : 1. Pytorch Basic Foundation**](#Section-1-:-Pytorch-Basic-Foundation)\n    * [Tensor](#Tensor)\n        * [Construct a 5x3 matrix uninitialized](#Construct-a-5x3-matrix-uninitialized)\n        * [Convert to numpy](#Convert-to-numpy)\n        * [Size of tensor](#Size-of-tensor)\n        * [From Numpy to tensor](#From-Numpy-to-tensor)\n    * [Tensor Operation](#Tensor-Operation)\n        * [Random similar to numpy](#Random-similar-to-numpy)\n        * [Construct a matrix filled zeros and of dtype long](#Construct-a-matrix-filled-zeros-and-of-dtype-long)\n        * [Construct a tensor directly from data](#Construct-a-tensor-directly-from-data)\n        * [Create tensor based on existing tensor](#Create-tensor-based-on-existing-tensor)\n        * [Basic Tensor Operation](#Basic-Tensor-Operation)\n    * [Variable](#Variable)\n    * [Activation Function](#Activation-Function)\n        * [Generate Fake Data](#Generate-Fake-Data)\n        * [Popular Activation Function](#Popular-Activation-Function)\n        * [Activation Function plot from data](#Activation-Function-plot-from-data)\n        \n* [**Section : 2. 
Neural Network**](#Section-:-2.-Neural-Network)\n     * [Linear Regression](#Linear-Regression)\n     * [Relationship Fitting Regression Model](#Relationship-Fitting-Regression-Model)\n     * [Distinguish type classification](#Distinguish-type-classification)\n     * [Easy way to Buid Neural Network](#Easy-way-to-Buid-Neural-Network)\n     * [Save and Reload Model](#Save-and-Reload-Model)\n     * [Train on Batch](#Train-on-batch)\n     * [Optimizers](#Optimizers)\n \n* [**Section : 3. Advance Neural Network**](#Section-:-3.-Advance-Neural-Network)\n    * [CNN](#CNN)\n    * [RNN-Classification](#RNN-Classification)\n    * [RNN-Regression](#RNN-Regression)\n    * [AutoEncoder](#AutoEncoder)\n    * [DQN Reinforcement Learning](#DQN-Reinforcement-Learning)\n    * [A3C Reinforcement Learning](https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-8-asynchronous-actor-critic-agents-a3c-c88f72a5e9f2) [(Code)](https://github.com/nailo2c/a3c/blob/master/tutorial.ipynb)\n    * [Generative Adversarial Network](#Generative-Adversarial-Network)\n    * [Conditional GAN](#Conditional-GAN)\n    \n    \n    \n\n\n---\nPytorch Introduction\n---\n---\nPyTorch is an open-source machine learning library for Python, based on [Torch](https://en.wikipedia.org/wiki/Torch_(machine_learning)) and used for applications such as natural language processing. It is primarily developed by Facebook's artificial-intelligence research group, and Uber's \"Pyro\" software for probabilistic programming is built on it.\n\n#### PyTorch provides two high-level features:\n\n1. *Tensor computation (like NumPy) with strong GPU acceleration*\n1. 
*Deep Neural Networks built on a tape-based autodiff system*\n\n---\nEnvironment Configuration\n---\n---\n![](https://s3-ap-south-1.amazonaws.com/av-blog-media/wp-content/uploads/2018/02/3-768x368.png)\n\n* [Install pytorch on Windows Snapshop Tutorial](https://www.superdatascience.com/pytorch/)\n"},{"metadata":{"_uuid":"8f2839f25d086af736a60e9eeb907d3b93b6e0e5","_cell_guid":"b1076dfc-b9ad-4769-8c92-a6c4dae69d19","trusted":true,"_kg_hide-input":true},"cell_type":"code","source":"# This Python 3 environment comes with many helpful analytics libraries installed\n# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python\n# For example, here's several helpful packages to load in \n\nimport numpy as np # linear algebra\nimport pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)\nimport torch\n# Input data files are available in the \"../input/\" directory.\n# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory\nimport seaborn as sns\n\nimport matplotlib.pyplot as plt\nplt.style.use(\"fivethirtyeight\")\n%matplotlib inline\n\nimport os\n# print(os.listdir(\"../input\"))\n\n# Any results you write to the current directory are saved as output.","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"2bce7b12cff40b56ce8af6d91a6ae9d66f3002f2"},"cell_type":"markdown","source":"Section-1 : Pytorch Basic Foundation\n---"},{"metadata":{"_uuid":"0926a73445c87f7922ea64e7291898c05d3433b6"},"cell_type":"markdown","source":"### Tensor\n---\n* Tensors are matrix-like data structures which are essential components in deep learning libraries and efficient computation. **Graphical Processing Units (GPUs)** are especially effective at calculating operations between **tensors**, and this has spurred the surge in deep learning capability in recent times. 
In PyTorch, tensors can be declared simply in a number of ways:\n![](https://cdn-images-1.medium.com/max/2000/1*_D5ZvufDS38WkhK9rK32hQ.jpeg)"},{"metadata":{"_uuid":"db856481041bacaaff7b2807d1542c624b12783b"},"cell_type":"markdown","source":"#### **Construct a 5x3 matrix uninitialized**"},{"metadata":{"trusted":true,"_uuid":"e283b512132cf225224c27ff66758781ce217a70"},"cell_type":"code","source":"x = torch.empty(5, 3)\nprint(x)","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"1bcc3b6d14990723c7217273d08a839181d81f4a"},"cell_type":"markdown","source":"#### **Convert to numpy**"},{"metadata":{"trusted":true,"_uuid":"fecc35e44cde3030f69d93a00522e7e1b3113b29"},"cell_type":"code","source":"x.numpy()","execution_count":null,"outputs":[]},{"metadata":{"_cell_guid":"79c7e3d0-c299-4dcb-8224-4455121ee9b0","collapsed":true,"_uuid":"d629ff2d2480ee46fbb7e2d37f6b5fab8052498a","trusted":false},"cell_type":"markdown","source":"#### **Size of tensor**"},{"metadata":{"trusted":true,"_uuid":"ede73451e22abb3b147855ee808708230a99db5a"},"cell_type":"code","source":"x.size()","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"e12343d1d7be74bbbeade807afee7b076f8e12d8"},"cell_type":"markdown","source":"#### **From Numpy to tensor**"},{"metadata":{"trusted":true,"_uuid":"8b1b2db2bcddcbec0d3033a64324713d79685fb6"},"cell_type":"code","source":"a = np.array([[3,4],[4,3]])\nb = torch.from_numpy(a)\nprint(b)","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"c48d6aa48851df032302d10234f51f665ab67d4b"},"cell_type":"markdown","source":"#### Tensor Operations"},{"metadata":{"_uuid":"a5c42a5adafe466547bded96386c8807b2180bb6"},"cell_type":"markdown","source":"#### **Random similar to numpy**"},{"metadata":{"trusted":true,"_uuid":"74c7afd6c8e24be2d0978ba6354d884641f01085"},"cell_type":"code","source":"x = torch.rand(5, 3)\nprint(x)","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"ec7bf86d20194f77503de0322c3fb1c9c25b680f"},"cell_type":"markdown","source":"#### 
**Construct a matrix filled zeros and of dtype long**"},{"metadata":{"trusted":true,"_uuid":"aaa10a15d8004587c51590796880ebdaa5d9e76d"},"cell_type":"code","source":"x = torch.zeros(5, 3, dtype=torch.long)\nprint(x)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"3298ee02fe9253e24691e6eec6e3605a1273df28"},"cell_type":"code","source":"x = torch.ones(3, 3, dtype=torch.long)\nprint(x)","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"14f281d96ce1f98ebc6523e38da2c65cf6ff6d7d"},"cell_type":"markdown","source":"#### **Construct a tensor directly from data**"},{"metadata":{"trusted":true,"_uuid":"67526d6597c4b90a5d68830f880726456a975cc7"},"cell_type":"code","source":"x = torch.tensor([2.5, 7])\nprint(x)","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"ecdbb1621488e4e13e8ac717cac80dd821f888a2"},"cell_type":"markdown","source":"#### **Create tensor based on existing tensor** - These methods reuse properties of the input tensor (e.g. dtype) unless new values are provided"},{"metadata":{"trusted":true,"_uuid":"ec1b8393fe846f4bcc98be4263d20e0390cf9733"},"cell_type":"code","source":"x = x.new_ones(5, 3, dtype=torch.double)      # new_* methods take in sizes\nprint(x)\nprint(x.size())","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"7950f96c69110a17ca256d33acee61dc6cdf0262"},"cell_type":"code","source":"x = torch.randn_like(x, dtype=torch.float)    # same size as x, but override dtype\nprint(x) \nprint(x.size())","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"8471fcdbe158773139c8989b5cf55887f54e7b31"},"cell_type":"markdown","source":"#### Basic Tensor Operation\n---\n![](https://devblogs.nvidia.com/wp-content/uploads/2018/05/tesnor_core_diagram.png)"},{"metadata":{"_uuid":"91b5fa058babe04545e8a5b2a3436a907895ba33"},"cell_type":"markdown","source":"* **Addition(`torch.add()`)**\n* **Subtraction(`torch.sub()`)**\n* **Division(`torch.div()`)**\n* **Multiplication( 
`torch.mul()`)**"},{"metadata":{"trusted":true,"_uuid":"fb062b21a4287197eda0a48c5c0495e14a2376f9"},"cell_type":"code","source":"x = torch.rand(5, 3)\ny = torch.rand(5, 3)\nprint(x + y) # operator syntax\nprint(torch.add(x, y)) # equivalent function call","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"24f38d02e51ea66bdb7272de7a9c46d981af8367"},"cell_type":"code","source":"x = torch.rand(5, 3)\ny = torch.rand(5, 3)\nprint(x - y) # operator syntax\nprint(torch.sub(x, y)) # equivalent function call","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"7982ed05ec8d51a9e4934395e105dd0620492ea0"},"cell_type":"code","source":"x = torch.rand(5, 3)\ny = torch.rand(5, 3)\nprint(x / y) # operator syntax\nprint(torch.div(x, y)) # equivalent function call","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"cfb775e2bdbbc1751852e904c882380f4082ff22"},"cell_type":"code","source":"x = torch.rand(5, 3)\ny = torch.rand(5, 3)\nprint(x * y) # operator syntax (element-wise)\nprint(torch.mul(x, y)) # equivalent function call","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"c70aa755a1f8ac9e02dd0af5b48fcad623d14577"},"cell_type":"markdown","source":"* **Add x to y in-place**"},{"metadata":{"trusted":true,"_uuid":"67c3f59055d15ee8dae750cdee345811639b6ec6"},"cell_type":"code","source":"# adds x to y in-place (y is modified)\ny.add_(x)\nprint(y)","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"8dac2e92070e4a5e881594517fc01dbe352e0828"},"cell_type":"markdown","source":"* **Standard NumPy-like Indexing**"},{"metadata":{"trusted":true,"_uuid":"b7f46bb13d9fa7df23a5baa71d85522b621216ad"},"cell_type":"code","source":"print(x[:, 1])","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"93a4492d08e9927dab04a47f6f1a01c520a2b87a"},"cell_type":"markdown","source":"* **Resizing**"},{"metadata":{"trusted":true,"_uuid":"44a0a96742874dac897f074e8e05496bede452ce"},"cell_type":"code","source":"x = torch.randn(4, 4)\ny = x.view(16)\nz = x.view(-1, 8)  # the size -1 is inferred from other 
dimensions\nprint(x.size(), y.size(), z.size())","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"70c712b8acad65e71cb6d5592312328ff5eeb349"},"cell_type":"markdown","source":"---\n* **Note**: Any operation that mutates a tensor in-place is post-fixed with an underscore (`_`). For example, `x.copy_(y)` and `x.t_()` will change `x`.  \nFor more about tensors, see https://pytorch.org/docs/stable/tensors.html\n---"},{"metadata":{"_uuid":"f6ba17b313dc8a80fae284665875bb469e9816e8"},"cell_type":"markdown","source":"### Variable\n* The key difference between PyTorch and NumPy is that PyTorch provides automatic differentiation, which can automatically give you the gradients of the parameters you want. In this notebook, this operation is provided by another basic element, Variable. (Since PyTorch 0.4, `Variable` has been merged into `Tensor`, so a tensor created with `requires_grad=True` behaves the same way; the `Variable` wrapper still works but is deprecated.)\n![](https://raw.githubusercontent.com/pytorch/pytorch/master/docs/source/_static/img/dynamic_graph.gif)\n* A Variable wraps a Tensor. It supports nearly all the APIs defined by a Tensor. Variable also provides a `backward` method to perform backpropagation. For example, to backpropagate a loss function to train model parameter `x`, we use a variable `loss` to store the value computed by a loss function. Then, we call `loss.backward()`, which computes the gradients ∂loss/∂x for all trainable parameters. 
PyTorch will store the gradient results back in the corresponding variable x.\n* A Variable in torch builds a computational graph, and this graph is dynamic, in contrast to the static graphs of TensorFlow or Theano. So torch has no placeholders; you can pass variables directly into the computational graph."},{"metadata":{"trusted":true,"_uuid":"d939f0526f7892d728a25428ded113dbb595ed94"},"cell_type":"code","source":"import torch\nfrom torch.autograd import Variable","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"485da1755da35f5350aff6aaab02100b9527ff0a"},"cell_type":"markdown","source":"* Build a **tensor**\n* Build a **variable, usually to compute gradients**"},{"metadata":{"trusted":true,"_uuid":"c2ca2eb93e5dd72108b458053a52f5cefda82867"},"cell_type":"code","source":"tensor = torch.FloatTensor([[1,2],[3,4]])  \nvariable = Variable(tensor, requires_grad=True)   \nprint(tensor)       # [torch.FloatTensor of size 2x2]\nprint(variable)     # [torch.FloatTensor of size 2x2]","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"13aee16d4362a934150b3e6f6651504692c28fdb"},"cell_type":"markdown","source":"* So far the **tensor and variable** seem the same. However, the **variable is a part of the graph**; it's a part of the **auto-gradient.**"},{"metadata":{"trusted":true,"_uuid":"09b3f9b15860b64f3259bca6ba15fe971fe64288"},"cell_type":"code","source":"t_out = torch.mean(tensor*tensor)       # mean of x^2\nv_out = torch.mean(variable*variable)   # mean of x^2\nprint(t_out)\nprint(v_out)    # 7.5","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"7573a5a4d883bf41e857a0a2eb98ff114b4f7543"},"cell_type":"markdown","source":"*  Backpropagation from v_out\n*  `v_out  =  1 / 4  *  sum(variable * variable)`\n* The gradients w.r.t. the variable: `d(v_out)/d(variable) = 1/4*2*variable = variable/2`"},{"metadata":{"trusted":true,"_uuid":"fceab449af3ec6b1519b85ca115749acf3003c76"},"cell_type":"code","source":"v_out.backward()   
\nprint(variable.grad)","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"e626633f80a78ffcb1a087ff0963702c76fe86ec"},"cell_type":"markdown","source":"*  This is data in **variable format**"},{"metadata":{"trusted":true,"_uuid":"62b36b73db4c357eac8100af2bde7838e6cda9a8"},"cell_type":"code","source":"print(variable)    ","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"9c690507873b381e889002121b1dc4ce059f93c9"},"cell_type":"markdown","source":"* This is data in **tensor format**"},{"metadata":{"trusted":true,"_uuid":"206c2abb5c80eb4cbd9299c25525dafdd7323c19"},"cell_type":"code","source":"print(variable.data)    # the underlying tensor","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"73664766c6b150c57c7a6a3817d6020eb6d906b0"},"cell_type":"markdown","source":"* This is in **numpy format**"},{"metadata":{"trusted":true,"_uuid":"4a3cd640c19724ff42fc15f6bc46eae5d529c222"},"cell_type":"code","source":"print(variable.data.numpy())    # numpy format","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"12d05ef4479530eb66550609f9ac40e7a39bd312"},"cell_type":"markdown","source":"### Activation Function\n---\n* Activation functions allow an artificial neural network to learn complicated, non-linear functional mappings between the **inputs and the response variable.** They introduce non-linear properties to our network.\n* Their main purpose is to convert the **input signal of a node in an ANN to an output signal.** That **output signal** is then used as an **input to the next layer** in the stack.\n![](https://cdn-images-1.medium.com/max/1200/1*ZafDv3VUm60Eh10OeJu1vw.png)"},{"metadata":{"trusted":true,"_uuid":"521babb82c128a3a6d649dd54bdc3894b82a2326"},"cell_type":"code","source":"import torch\nimport torch.nn.functional as F\nfrom torch.autograd import Variable\nimport matplotlib.pyplot as plt\nimport 
warnings\nwarnings.filterwarnings(action='ignore')","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"d1e9a7648eb9ca56e77ab3313565bc600bb15f19"},"cell_type":"markdown","source":"#### Generate Fake Data"},{"metadata":{"trusted":true,"_uuid":"e7cee2ae3128b0317f27a9897f3f48963f29feff"},"cell_type":"code","source":"x = torch.linspace(-5, 5, 200)  # x data (tensor), shape=(200,)\nx = Variable(x)\nx_np = x.data.numpy()   # numpy array for plotting","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"5868b4385fd6165ca8dd5614a7892e82fb4880f5"},"cell_type":"markdown","source":"#### Popular Activation Function"},{"metadata":{"trusted":true,"_uuid":"30dd763dba3d0cbc34d80b952bb91ed0e61f8518"},"cell_type":"code","source":"y_relu = F.relu(x).data.numpy()\ny_sigmoid = torch.sigmoid(x).data.numpy()   # F.sigmoid is deprecated in favor of torch.sigmoid\ny_tanh = torch.tanh(x).data.numpy()         # F.tanh is deprecated in favor of torch.tanh\ny_softplus = F.softplus(x).data.numpy()\n\n# y_softmax = F.softmax(x, dim=0)\n# softmax is a special kind of activation function: it outputs probabilities\n# that sum to 1 along the given dimension.","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"fe08d0f08c4999146140262ab437d8ffbd180323"},"cell_type":"markdown","source":"The cell above applies several popular activation functions. Softmax is a special kind of activation function that is used for **multiclass classification problems.**"},{"metadata":{"_uuid":"50aef3287aa7502409a718771d3655b3ca392447"},"cell_type":"markdown","source":"#### Activation Function plot from data"},{"metadata":{"_kg_hide-input":true,"trusted":true,"_uuid":"108717eb6476ff939f293734825b375f8e3beb4d"},"cell_type":"code","source":"%matplotlib inline\n\nplt.figure(1, figsize=(20, 20))\nplt.subplot(221)\nplt.scatter(x_np, y_relu, c='red', label='relu')\nplt.ylim((-1, 5))\nplt.legend(loc='best')\n\nplt.subplot(222)\nplt.scatter(x_np, y_sigmoid, c='orange', label='sigmoid')\nplt.ylim((-0.2, 1.2))\nplt.legend(loc='best')\n\nplt.subplot(223)\nplt.scatter(x_np, y_tanh, c='green', label='tanh')\nplt.ylim((-1.2, 
1.2))\nplt.legend(loc='best')\n\nplt.subplot(224)\nplt.scatter(x_np, y_softplus, c='blue', label='softplus')\nplt.ylim((-0.2, 6))\nplt.legend(loc='best')\n\nplt.show()","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"01615e045b29501645385fbe79165e885e61f7cb"},"cell_type":"markdown","source":"Section : 2. Neural Network\n---"},{"metadata":{"_uuid":"b7c09ba200f3149fd845caefefb0d274d84670ed"},"cell_type":"markdown","source":"---\n### Linear Regression\n\n---\n\n* It is worth underlining that this is an example focused on re-applying the techniques introduced. Indeed, PyTorch offers much more advanced methodologies to accomplish the same task, introduced in the following tutorials.\n* In this example we will consider a simple one-dimensional synthetic problem (with some added noise):\nExample : \n![](https://i.pinimg.com/564x/cd/6e/c7/cd6ec7106cfd22cdfd5c542a84a8b869.jpg)"},{"metadata":{"trusted":true,"_uuid":"69ee80cb72d6b1ec3938e55ca0b13fb2f1f6ca99"},"cell_type":"code","source":"X = np.random.rand(30, 1)*2.0\nw = np.random.rand(2, 1)\ny = X*w[0] + w[1] + np.random.randn(30, 1) * 0.05","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"2f62441abf7a6403597ba1555a850949beb76770"},"cell_type":"markdown","source":"![](https://iaml.it/blog/fun-with-pytorch-part-1/images/regressione_lineare_1.png)"},{"metadata":{"_uuid":"89c536061d2ff997cc67dfb9aa0e0bbb25100f94"},"cell_type":"markdown","source":"In order to detect the line's coefficient, we define a linear model:\n#### Define Weight and Bias"},{"metadata":{"trusted":true,"_uuid":"9903d5b844f7ecd9efb2519ac4ab0505dcd2252b"},"cell_type":"code","source":"W = Variable(torch.rand(1, 1), requires_grad=True)\nb = Variable(torch.rand(1), requires_grad=True)\n\ndef linear(x):\n    return torch.matmul(x, W) + b","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"b3b02335a0506d2e467b35b080bef1007d935b12"},"cell_type":"markdown","source":"* Using `torch.matmul` is redundant in this case, but we want the 
function to be as general as possible to be re-used for more complex models."},{"metadata":{"trusted":true,"_uuid":"ce81031d6790ac2ba2356e58c361575f16aa4bb7"},"cell_type":"code","source":"Xt = Variable(torch.from_numpy(X)).float()\nyt = Variable(torch.from_numpy(y)).float()\n\nfor epoch in range(2500):\n\n    # Compute predictions\n    y_pred = linear(Xt)\n\n    # Compute cost function\n    loss = torch.mean((y_pred - yt) ** 2)\n\n    # Run back-propagation\n    loss.backward()\n\n    # Update variables\n    W.data = W.data - 0.005*W.grad.data\n    b.data = b.data - 0.005*b.grad.data\n\n    # Reset gradients\n    W.grad.data.zero_()\n    b.grad.data.zero_()","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"e974cf7bbf0ddc5b66d21e212ba1f0438fd3fbb3"},"cell_type":"markdown","source":"* After Training we can see the graph like this\n![](https://iaml.it/blog/fun-with-pytorch-part-1/images/regressione_lineare_2.png)"},{"metadata":{"_uuid":"e762e75e07cac863383ea1111e8c037e96cf44fe"},"cell_type":"markdown","source":"### Relationship Fitting Regression Model\n---\n\n> In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships among variables. 
It includes many techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables.\n\n...[Regression Analysis](https://en.wikipedia.org/wiki/Regression_analysis)\n\n![](https://mahritaharahap.files.wordpress.com/2014/06/relationship.png)"},{"metadata":{"trusted":true,"_uuid":"5fae06436ce660125b7dd13e6834308c8577130e"},"cell_type":"code","source":"import torch\nfrom torch.autograd import Variable\nimport torch.nn.functional as F\nimport matplotlib.pyplot as plt\n%matplotlib inline","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"97fdd07fa5b9f6c0ace91a0f950acc50473c10ed"},"cell_type":"code","source":"torch.manual_seed(1)    # reproducible","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"21b9f4f0724a0a92be06cd65fca7e2d8a68a2d71"},"cell_type":"code","source":"x = torch.unsqueeze(torch.linspace(-1, 1, 100), dim=1)  # x data (tensor), shape=(100, 1)\ny = x.pow(2) + 0.2*torch.rand(x.size())                 # noisy y data (tensor), shape=(100, 1)\n\n# wrap the tensors in Variable (required before PyTorch 0.4; optional since then)\nx, y = Variable(x), Variable(y)\nplt.figure(figsize=(20,8))\nplt.scatter(x.data.numpy(), y.data.numpy(), color = \"orange\")\nplt.title('Regression Analysis')\nplt.xlabel('Independent variable')\nplt.ylabel('Dependent variable')\nplt.show()","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"ae8798af66da9081f5828291e89ddbd67114f14b"},"cell_type":"markdown","source":"### Model Fitting"},{"metadata":{"trusted":true,"_uuid":"d35956346940eb6da4b25f996066abc6b582e356"},"cell_type":"code","source":"class Net(torch.nn.Module):\n    def __init__(self, n_feature, n_hidden, n_output):\n        super(Net, self).__init__()\n        self.hidden = torch.nn.Linear(n_feature, n_hidden)   # hidden layer\n        self.predict = torch.nn.Linear(n_hidden, n_output)   # output layer\n\n    def forward(self, x):\n        x = 
F.relu(self.hidden(x))      # activation function for hidden layer\n        x = self.predict(x)             # linear output\n        return x","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"67257384c501210e61a73778130b36a95eea243d"},"cell_type":"code","source":"net = Net(n_feature=1, n_hidden=10, n_output=1)     # define the network\nprint(net)  # net architecture","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"1d29e19aaeab0768c108206d22efe24123d92497"},"cell_type":"markdown","source":"* **Optimizer** - Gradient descent is the most popular optimization algorithm for training neural networks. It is used to update the weights of a neural network model, i.e. to tune the model's parameters in the direction that minimizes the loss function."},{"metadata":{"trusted":true,"_uuid":"8bdd37e53efbb350fc05338540886457821ea45f"},"cell_type":"code","source":"optimizer = torch.optim.SGD(net.parameters(), lr=0.2)\nloss_func = torch.nn.MSELoss()  # mean squared error loss, used for regression","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"63f93f579a275fa5c3bce0ebdb652cc4eb67acde"},"cell_type":"code","source":"plt.ion()   # turn on matplotlib's interactive mode","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"scrolled":false,"_uuid":"da70cd89afb2ac8ca2b6c740c8b94d150077f3ef"},"cell_type":"code","source":"for t in range(100):\n    prediction = net(x)     # input x and predict based on x\n\n    loss = loss_func(prediction, y)     # must be (1. nn output, 2. 
target)\n\n    optimizer.zero_grad()   # clear gradients for next train\n    loss.backward()         # backpropagation, compute gradients\n    optimizer.step()        # apply gradients\n    if t % 10 == 0:\n        # plot and show learning process\n        plt.figure(figsize=(20,6))\n        plt.cla()\n        plt.scatter(x.data.numpy(), y.data.numpy(), color = \"orange\")\n        plt.plot(x.data.numpy(), prediction.data.numpy(), 'g-', lw=3)\n        plt.text(0.3, 0, 'Loss=%.4f' % loss.data.numpy(), fontdict={'size': 25, 'color':  'red'})\n        plt.show()\n        plt.pause(0.1)\n\nplt.ioff()","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"25bdb46e92c63cd00ba49cce8128b656ba47bf8a"},"cell_type":"markdown","source":"### Distinguish type classification\n---\n* Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. For example, a classification model could be used to identify loan applicants as low, medium, or high credit risks."},{"metadata":{"_uuid":"4d58418ee0d2aab3479b014cb583125521e1518c"},"cell_type":"markdown","source":"![](https://morvanzhou.github.io/static/results/torch/1-1-3.gif)"},{"metadata":{"trusted":true,"_uuid":"1212b62f799d3904a41fff753bc73571bd1dda78"},"cell_type":"code","source":"import torch\nfrom torch.autograd import Variable\nimport torch.nn.functional as F\nimport matplotlib.pyplot as plt\n%matplotlib inline\n\ntorch.manual_seed(1)    # reproducible","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"cef3c10d00a1e3812821093ab52eb845b23da24c"},"cell_type":"markdown","source":"### Generate Fake Data"},{"metadata":{"trusted":true,"_uuid":"fae92cfddbf6a813c2b3ecad2b5ad375f4a8fb6a"},"cell_type":"code","source":"n_data = torch.ones(100, 2)\nx0 = torch.normal(2*n_data, 1)      # class0 x data (tensor), shape=(100, 2)\ny0 = torch.zeros(100)               # class0 y data (tensor), 
shape=(100, 1)\nx1 = torch.normal(-2*n_data, 1)     # class1 x data (tensor), shape=(100, 2)\ny1 = torch.ones(100)                # class1 y data (tensor), shape=(100, 1)\nx = torch.cat((x0, x1), 0).type(torch.FloatTensor)  # shape (200, 2) FloatTensor = 32-bit floating\ny = torch.cat((y0, y1), ).type(torch.LongTensor)    # shape (200,) LongTensor = 64-bit integer\n\n# torch can only train on Variable, so convert them to Variable\nx, y = Variable(x), Variable(y)\nplt.figure(figsize=(10,10))\nplt.scatter(x.data.numpy()[:, 0], x.data.numpy()[:, 1], c=y.data.numpy(), s=100, lw=0, cmap='RdYlGn')\nplt.show()","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"dcab0282bb5fd385fd3b6586281e0406d3bd87be"},"cell_type":"markdown","source":"### Model design"},{"metadata":{"trusted":true,"_uuid":"8e700f2b785ca3e448f5ea4dc66da1c5eb7883b6"},"cell_type":"code","source":"class Net(torch.nn.Module):\n    def __init__(self, n_feature, n_hidden, n_output):\n        super(Net, self).__init__()\n        self.hidden = torch.nn.Linear(n_feature, n_hidden)   # hidden layer\n        self.out = torch.nn.Linear(n_hidden, n_output)   # output layer\n\n    def forward(self, x):\n        x = F.relu(self.hidden(x))      # activation function for hidden layer\n        x = self.out(x)\n        return x","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"c3d3e56f2d8844686b6a70be7c67121ef3eb2c0b"},"cell_type":"markdown","source":"### Define Network"},{"metadata":{"trusted":true,"_uuid":"14a2581e8ec4216e9d2b90f59fc86733a90aa1d3"},"cell_type":"code","source":"net = Net(n_feature=2, n_hidden=10, n_output=2)     # define the network\nprint(net)  # net architecture\n\n# Loss and Optimizer\n# Softmax is internally computed.\n# Set parameters to be updated.\noptimizer = torch.optim.SGD(net.parameters(), lr=0.02)\nloss_func = torch.nn.CrossEntropyLoss()  # the target label is NOT an 
one-hotted","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"5da1dee078e1f6601d3c18b90dfd9d6e94932a40"},"cell_type":"markdown","source":"### Plot the Model"},{"metadata":{"trusted":true,"_uuid":"b034bea947d56ffde1c0c1d98f8d6498de4205a9"},"cell_type":"code","source":"plt.ion()   # turn on matplotlib's interactive mode","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"scrolled":false,"_uuid":"e125b50242695e6cd0d28bd386b5a06dd76db5cc"},"cell_type":"code","source":"for t in range(100):\n    out = net(x)                 # input x and predict based on x\n    loss = loss_func(out, y)     # must be (1. nn output, 2. target), the target label is NOT one-hotted\n\n    optimizer.zero_grad()   # clear gradients for next train\n    loss.backward()         # backpropagation, compute gradients\n    optimizer.step()        # apply gradients\n    \n    if t % 10 == 0 or t in [3, 6]:\n        # plot and show learning process\n        plt.figure(figsize=(20,6))\n        plt.cla()\n        _, prediction = torch.max(F.softmax(out, dim=1), 1)   # softmax over the class dimension\n        pred_y = prediction.data.numpy().squeeze()\n        target_y = y.data.numpy()\n        plt.scatter(x.data.numpy()[:, 0], x.data.numpy()[:, 1], c=pred_y, s=100, lw=0, cmap='RdYlGn')\n        accuracy = sum(pred_y == target_y)/200.\n        plt.text(1.5, -4, 'Accuracy=%.2f' % accuracy, fontdict={'size': 20, 'color':  'red'})\n        plt.show()\n        plt.pause(0.1)\n\nplt.ioff()","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"34b7f38e9c535b5440370e80ab57881f19d89546"},"cell_type":"markdown","source":"### Easy way to Buid Neural Network\n---"},{"metadata":{"trusted":true,"_uuid":"e56972abb6726f0899ca01c6145a027c531dbaad"},"cell_type":"markdown","source":"**Neural Network** - Neural networks are a beautiful, biologically-inspired programming paradigm which enables a computer to learn from observational 
data.\n\n![](https://www.researchgate.net/profile/Reza_Rezaee8/publication/230905093/figure/fig1/AS:300580867198976@1448675471259/a-Schematic-diagram-of-an-artificial-neural-network-structure-which-consists-of-an.png)"},{"metadata":{"trusted":true,"_uuid":"9ca39db84619c26aa2465fe5af5dafdf406abb2c"},"cell_type":"code","source":"import torch\nimport torch.nn.functional as F\nfrom torch.autograd import Variable\nimport matplotlib.pyplot as plt\n%matplotlib inline\n\ntorch.manual_seed(1)    # reproducible","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"3c98f22b3315ee1de66a2597b9b72aed45fdcaa9"},"cell_type":"markdown","source":"### Design Model"},{"metadata":{"trusted":true,"_uuid":"d800ea7867d072a44d44a544fd93708469152ede"},"cell_type":"code","source":"# replace following class code with an easy sequential network\nclass Net(torch.nn.Module):\n    def __init__(self, n_feature, n_hidden, n_output):\n        super(Net, self).__init__()\n        self.hidden = torch.nn.Linear(n_feature, n_hidden)   # hidden layer\n        self.predict = torch.nn.Linear(n_hidden, n_output)   # output layer\n\n    def forward(self, x):\n        x = F.relu(self.hidden(x))      # activation function for hidden layer\n        x = self.predict(x)             # linear output\n        return x","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"73236a12281950cc2652825d88c4132f7a2d533d"},"cell_type":"markdown","source":"### Define Model"},{"metadata":{"trusted":true,"_uuid":"3f98b3fa8140b6bff2e99315e9fa7b99730b1ad7"},"cell_type":"code","source":"net1 = Net(1, 10, 1)\n\n# easy and fast way to build your network\nnet2 = torch.nn.Sequential(\n    torch.nn.Linear(1, 10),\n    torch.nn.ReLU(),\n    torch.nn.Linear(10, 1)\n)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"f6211ed3e35e0cc0080f3f316f04b28e9e21fa3d"},"cell_type":"code","source":"print(net1)     # net1 architecture\nprint(net2)     # net2 
architecture","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"f1d24b5a9284ad1caeda3262eab9612fd4102987"},"cell_type":"markdown","source":"## Save and Reload Model\n---\n\n### Generate Fake Data"},{"metadata":{"trusted":true,"_uuid":"e8bc1b721de759fd1a5972ee821aeec59c6ed151"},"cell_type":"code","source":"x = torch.unsqueeze(torch.linspace(-1, 1, 100), dim=1)  # x data (tensor), shape=(100, 1)\ny = x.pow(2) + 0.2*torch.rand(x.size())  # noisy y data (tensor), shape=(100, 1)\nx, y = Variable(x, requires_grad=False), Variable(y, requires_grad=False)","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"3a7952b838b049841882e22c0efbf77a470cf4f3"},"cell_type":"markdown","source":"### Model Saving Function in Pickle Format"},{"metadata":{"trusted":true,"_uuid":"760b511aa0a22e12cf98290131a65fa8d4cf449a"},"cell_type":"code","source":"def save():\n    # build and train net1, then save it\n    net1 = torch.nn.Sequential(\n        torch.nn.Linear(1, 10),\n        torch.nn.ReLU(),\n        torch.nn.Linear(10, 1)\n    )\n    optimizer = torch.optim.SGD(net1.parameters(), lr=0.5)\n    loss_func = torch.nn.MSELoss()\n\n    for t in range(100):\n        prediction = net1(x)\n        loss = loss_func(prediction, y)\n        optimizer.zero_grad()\n        loss.backward()\n        optimizer.step()\n\n    # plot result\n    plt.figure(1, figsize=(20, 5))\n    plt.subplot(131)\n    plt.title('Net1')\n    plt.scatter(x.data.numpy(), y.data.numpy(), color = \"orange\")\n    plt.plot(x.data.numpy(), prediction.data.numpy(), 'r-', lw=5)\n\n    # 2 ways to save the net\n    torch.save(net1, 'net.pkl')  # save entire net\n    torch.save(net1.state_dict(), 'net_params.pkl')   # save only the parameters","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"88ae3fbaaf521ad9dbc4940d147cbb647ee42668"},"cell_type":"markdown","source":"### Load saved model"},{"metadata":{"trusted":true,"_uuid":"8ab407f76b335009a089aa9408f5bc70b40bf981"},"cell_type":"code","source":"def restore_net():\n    # restore 
entire net1 to net2\n    net2 = torch.load('net.pkl')\n    prediction = net2(x)\n\n    # plot result\n    plt.figure(1, figsize=(20, 5))\n    plt.subplot(132)\n    plt.title('Net2')\n    plt.scatter(x.data.numpy(), y.data.numpy(), color = \"orange\")\n    plt.plot(x.data.numpy(), prediction.data.numpy(), 'r-', lw=5)","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"b1b6e64a124fe6fc21f7f873d501e0525aee4b2e"},"cell_type":"markdown","source":"### Restore Parameters from net1 into net3"},{"metadata":{"trusted":true,"_uuid":"a5e3e88e84439062f4a3d3bd57bcc2ec2c0468b3"},"cell_type":"code","source":"def restore_params():\n    # restore only the parameters in net1 to net3\n    net3 = torch.nn.Sequential(\n        torch.nn.Linear(1, 10),\n        torch.nn.ReLU(),\n        torch.nn.Linear(10, 1)\n    )\n\n    # copy net1's parameters into net3\n    net3.load_state_dict(torch.load('net_params.pkl'))\n    prediction = net3(x)\n\n    # plot result\n    plt.figure(1, figsize=(20, 5))\n    plt.subplot(133)\n    plt.title('Net3')\n    plt.scatter(x.data.numpy(), y.data.numpy(), color = \"orange\")\n    plt.plot(x.data.numpy(), prediction.data.numpy(), 'r-', lw=5)\n    plt.show()","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"833b1b336e747d020f544648b2f41c18874ec3e4"},"cell_type":"markdown","source":"### Plot the Models"},{"metadata":{"trusted":true,"_uuid":"33dc6ebe683568f2f4d37276efcc915f9fe53ecf"},"cell_type":"code","source":"# save net1\nsave()\n# restore entire net (may be slow)\nrestore_net()\n# restore only the net parameters\nrestore_params()","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"a0fb3ec5cd6ef96995abeceaad33dd50e7f396e4"},"cell_type":"markdown","source":"## Train on 
batch\n---\n![](https://i0.wp.com/mlexplained.com/wp-content/uploads/2018/02/%E3%82%B9%E3%82%AF%E3%83%AA%E3%83%BC%E3%83%B3%E3%82%B7%E3%83%A7%E3%83%83%E3%83%88-2018-02-07-10.32.59.png?w=1500)"},{"metadata":{"trusted":true,"_uuid":"cd93111f4958fdba7066063ac35f7fc8d2d9e29e"},"cell_type":"code","source":"import torch\nimport torch.utils.data as Data\n\ntorch.manual_seed(1)    # reproducible","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"e4df6d5f9b60869e5790cc34f7a5824342988ce6"},"cell_type":"code","source":"BATCH_SIZE = 5\n# BATCH_SIZE = 8","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"759845a9532dd71d1be03832480b9f855cbf73b0"},"cell_type":"markdown","source":"#### Dataset Preparation"},{"metadata":{"trusted":true,"_uuid":"1637aede0a38256a951751633ae50ea2d550ef86"},"cell_type":"code","source":"x = torch.linspace(1, 10, 10)       # this is x data (torch tensor)\ny = torch.linspace(10, 1, 10)       # this is y data (torch tensor)","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"7fa6c468de10a0834daeb33e6a06097407650ca6"},"cell_type":"markdown","source":"### Load dataset "},{"metadata":{"trusted":true,"_uuid":"e325d0c264888ae2196b86729e99db62c4c609b6"},"cell_type":"code","source":"torch_dataset = Data.TensorDataset(x, y)\nloader = Data.DataLoader(\n    dataset=torch_dataset,      # torch TensorDataset format\n    batch_size=BATCH_SIZE,      # mini batch size\n    shuffle=True,               # random shuffle for training\n    num_workers=2,              # subprocesses for loading data\n)","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"c00e2de8795f9ae071fb0dcb752d9e8d3872888d"},"cell_type":"markdown","source":"### Model Training"},{"metadata":{"trusted":true,"_uuid":"4f6d7a6b4af28de27bcbe84aa8bda4b277c0c251"},"cell_type":"code","source":"def show_batch():\n    for epoch in range(3):   # train entire dataset 3 times\n        for step, (batch_x, batch_y) in enumerate(loader):  # for each training step\n            
# train your data...\n            print('Epoch: ', epoch, '| Step: ', step, '| batch x: ',\n                  batch_x.numpy(), '| batch y: ', batch_y.numpy())","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"a27025cb3e2d4dc2a345c070eb564f9216f54b84"},"cell_type":"code","source":"if __name__ == '__main__':\n    show_batch()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"9dcb09ca82ac749898d89630e1fd1fd619eb8797"},"cell_type":"markdown","source":"## Optimizers\n\nA nice article on optimizers: https://blog.paperspace.com/intro-to-optimization-momentum-rmsprop-adam/"},{"metadata":{"_uuid":"cc703ff01f78a668888f05c8eb99722ff6c99d1c"},"cell_type":"markdown","source":"![](https://image.slidesharecdn.com/optimizationtalk-171126132036/95/optimization-for-deep-learning-26-638.jpg?cb=1511702523)"},{"metadata":{"trusted":true,"_uuid":"6ff7c611149c03cadd18797e0ed4fb38dc54731e"},"cell_type":"code","source":"import torch\nimport torch.utils.data as Data\nimport torch.nn.functional as F\nfrom torch.autograd import Variable\nimport matplotlib.pyplot as plt\n%matplotlib inline","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"b63b18948228a4fce9b36c15cf5c5411b7751769"},"cell_type":"code","source":"torch.manual_seed(1)    # reproducible","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"f6f3478018d2b93bbca670d8a92487e10af6d166"},"cell_type":"markdown","source":"### Parameters Defined"},{"metadata":{"trusted":true,"_uuid":"39d9da5a142762ea248a4777309fa51ff661bc15"},"cell_type":"code","source":"LR = 0.01\nBATCH_SIZE = 32\nEPOCH = 12","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"26a07f02d6294539b7b9925bba02eaeb219ce5f2"},"cell_type":"markdown","source":"### Generate Fake Data"},{"metadata":{"trusted":true,"_uuid":"612a5203bce5f21e22294482a96909e97a8e2f44"},"cell_type":"code","source":"# fake dataset\nx = torch.unsqueeze(torch.linspace(-1, 1, 1000), dim=1)\ny = x.pow(2) + 
0.1*torch.normal(torch.zeros(*x.size()))\n\n# plot dataset\nplt.figure(figsize=(20,8))\nplt.scatter(x.numpy(), y.numpy(), color = \"orange\")\nplt.show()","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"65e0be82889e540ede0147a8f40306f3322b408a"},"cell_type":"markdown","source":"### Put Dataset in torch dataset"},{"metadata":{"trusted":true,"_uuid":"b6b9db50ce6d47eb31707e486f903b5a5f308119"},"cell_type":"code","source":"torch_dataset = Data.TensorDataset(x, y)\nloader = Data.DataLoader(\n    dataset=torch_dataset, \n    batch_size=BATCH_SIZE, \n    shuffle=True, num_workers=2,)","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"58679fd5782c39433ccd577dbce8a981f4b733a3"},"cell_type":"markdown","source":"### Default Network"},{"metadata":{"trusted":true,"_uuid":"9f0c9cca868d165c957877d987397c3cec241433"},"cell_type":"code","source":"class Net(torch.nn.Module):\n    def __init__(self):\n        super(Net, self).__init__()\n        self.hidden = torch.nn.Linear(1, 20)   # hidden layer\n        self.predict = torch.nn.Linear(20, 1)   # output layer\n\n    def forward(self, x):\n        x = F.relu(self.hidden(x))      # activation function for hidden layer\n        x = self.predict(x)             # linear output\n        return x","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"b908e56325c63d029bb31bcfa3641e78cc990034"},"cell_type":"markdown","source":"### Different Net"},{"metadata":{"trusted":true,"_uuid":"9087a692b668c991bfc2aa2447f10b27e10e69e6"},"cell_type":"code","source":"net_SGD         = Net()\nnet_Momentum    = Net()\nnet_RMSprop     = Net()\nnet_Adam        = Net()\nnets = [net_SGD, net_Momentum, net_RMSprop, net_Adam]","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"bad12067f0237a7318cd8ce5491f0c740fd06769"},"cell_type":"markdown","source":"### Different optimizers"},{"metadata":{"trusted":true,"_uuid":"59ec56cb6adf9c25f44d3b65c4bfc3636b432a11"},"cell_type":"code","source":"opt_SGD         = 
torch.optim.SGD(net_SGD.parameters(), lr=LR)\nopt_Momentum    = torch.optim.SGD(net_Momentum.parameters(), lr=LR, momentum=0.8)\nopt_RMSprop     = torch.optim.RMSprop(net_RMSprop.parameters(), lr=LR, alpha=0.9)\nopt_Adam        = torch.optim.Adam(net_Adam.parameters(), lr=LR, betas=(0.9, 0.99))\noptimizers = [opt_SGD, opt_Momentum, opt_RMSprop, opt_Adam]","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"03d5944542e6d563d61de977880bf324b58ba758"},"cell_type":"code","source":"loss_func = torch.nn.MSELoss()\nlosses_his = [[], [], [], []]   # record loss","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"df2ab81582ed834fd1ee9eb59aeed7b62d89df44"},"cell_type":"markdown","source":"### Model Training"},{"metadata":{"trusted":true,"_uuid":"fea175c4b991489cf71eb553830949af25855044"},"cell_type":"code","source":"for epoch in range(EPOCH):\n    print('Epoch: ', epoch)\n    for step, (batch_x, batch_y) in enumerate(loader):          # for each training step\n        b_x = Variable(batch_x)\n        b_y = Variable(batch_y)\n\n        for net, opt, l_his in zip(nets, optimizers, losses_his):\n            output = net(b_x)              # get output for every net\n            loss = loss_func(output, b_y)  # compute loss for every net\n            opt.zero_grad()                # clear gradients for this training step\n            loss.backward()                # backpropagation, compute gradients\n            opt.step()                     # apply gradients\n            l_his.append(loss.item())      # record the loss as a Python float\nplt.figure(figsize = (20,10))\nlabels = ['SGD', 'Momentum', 'RMSprop', 'Adam']\nfor i, l_his in enumerate(losses_his):\n    plt.plot(l_his, label=labels[i])\nplt.legend(loc='best')\nplt.xlabel('Steps')\nplt.ylabel('Loss')\nplt.ylim((0, 0.2))\nplt.show()","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"87b71d0bc6b70fd6e15caf1f9c318210951e17ef"},"cell_type":"markdown","source":"# Section : 3. 
Advance Neural Network\n---"},{"metadata":{"trusted":true,"_uuid":"54f76d1f0f0458f46b42bf3facff14dbb2540803"},"cell_type":"markdown","source":"## CNN \nLearning Link : http://cs231n.github.io/convolutional-networks/\n---\n\n* **Convolutional neural networks** are an *artificial neural network structure that has become prominent in recent years.* Because **convolutional neural networks** give **better prediction results in image and speech recognition, this technology** is widely used. Their most common application is **computer image recognition,** but thanks to constant innovation they are also **used in video analysis, natural language processing, drug discovery,** etc. More recently, AlphaGo, which lets a computer read a Go board, also uses this technology.\n\n---\n\n![](https://software.intel.com/sites/default/files/managed/20/df/dev-journey-article15-fig11-examples-of-pooling.png)\n![](https://www.researchgate.net/profile/Serkan_Kiranyaz/publication/313676923/figure/fig2/AS:461584393871361@1487061703456/A-standard-2D-CNN-10.png)\n\n---\n\n* Let's take a look at the term convolutional neural network, which combines **\"convolution\" and \"neural network\".** \n* ***Convolution*** means that **the neural network no longer processes the input one pixel at a time. Instead, it processes a small block of pixels at a time, which preserves the continuity of the picture information. This lets the neural network see patterns in the picture instead of isolated points, and deepens its understanding of the picture.**\n* Specifically, *the convolutional neural network has a **batch filter** that slides continuously across the image. **Each time, it collects only a small pixel area and then summarizes the collected information. 
The summarized information then forms a concrete representation.***\n* **For example,** the neural network first picks up edge information; then, in the same way, a similar batch filter scans the generated edge information, and the network summarizes it into higher-level structures. For example, **edges can be combined into eyes, a nose, etc.** After another filter, facial information is in turn summarized from the eyes and nose. Finally, this information is fed into several fully connected layers for classification, which tells us which class the input belongs to."},{"metadata":{"_uuid":"a40320290a50a6a2ce40bd7077d1697d1f86f1ca"},"cell_type":"markdown","source":"### Convolution\n![](https://morvanzhou.github.io/static/results/ML-intro/cnn4.png)\n![](https://thigiacmaytinh.com/wp-content/uploads/2018/05/kernel.png)\n\n### Pooling\n![](https://morvanzhou.github.io/static/results/ML-intro/cnn5.png)\n* Studies have found that at each convolution the layer may inadvertently lose some information, and pooling handles this problem well. Pooling is a filtering step: it keeps the useful information in a layer and passes it on to the next layer, which also reduces the computational burden of the neural network. In other words, the convolution layer does not compress the width and height, so that it retains as much information as possible, and the compression work is handed over to the pooling layer. This extra step can be very effective at improving accuracy. With these techniques, we can build a convolutional neural network of our own.\n\n### Popular CNN Structure\n![](https://morvanzhou.github.io/static/results/ML-intro/cnn6.png)\n* A popular structure is the following. From bottom to top: first comes the input image, which goes through a convolution, and then the convolution output is processed by pooling. Here, the max pooling method is used. 
Then, after the same processing is applied a second time, the result is passed to two fully connected layers, which form an ordinary two-layer neural network. Finally, it is connected to a classifier for classification prediction. "},{"metadata":{"trusted":true,"_uuid":"164f2f52fb59815b2dbf3e13f7fd1ad554a681ed"},"cell_type":"code","source":"import torch\nimport torch.nn as nn\nfrom torch.autograd import Variable\nimport torch.utils.data as Data\nimport torchvision\nimport matplotlib.pyplot as plt\n%matplotlib inline","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"232b6611549738cece0785aa4d0e00a3ecc8e575"},"cell_type":"code","source":"torch.manual_seed(1)    # reproducible","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"4ba324d5941d2cc740e49229f675ae68f83803e3"},"cell_type":"markdown","source":"### Hyper Parameters"},{"metadata":{"trusted":true,"_uuid":"f869a1177c1d46d6d8b232b25c29869768a8fd92"},"cell_type":"code","source":"# Hyper Parameters\nEPOCH = 1               # train the training data n times, to save time, we just train 1 epoch\nBATCH_SIZE = 50\nLR = 0.001              # learning rate\nDOWNLOAD_MNIST = False","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"7cb80bb727854b4beb5271d24449245abb03beb8"},"cell_type":"markdown","source":"### Dataset Reading"},{"metadata":{"trusted":true,"_uuid":"b0dc03fd1d238e278a84b08173f90db3b7bc09fb"},"cell_type":"code","source":"# MNIST digits dataset\ntrain_data = torchvision.datasets.MNIST(\n    root='../input/mnist/mnist/',\n    train=True,                                     # this is training data\n    transform=torchvision.transforms.ToTensor(),    # Converts a PIL.Image or numpy.ndarray to\n                                                    # torch.FloatTensor of shape (C x H x W) and normalize in the range [0.0, 1.0]\n    download=DOWNLOAD_MNIST,                        # download it if you don't have 
it\n)","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"f8dbc24149dabcd2799df64bc8bad83916089d0f"},"cell_type":"markdown","source":"### Show data"},{"metadata":{"trusted":true,"_uuid":"d2f85d23ac27842acccf895aece811cffe145737"},"cell_type":"code","source":"# plot one example\nprint(train_data.train_data.size())                 # (60000, 28, 28)\nprint(train_data.train_labels.size())               # (60000)\nplt.imshow(train_data.train_data[0].numpy(), cmap='gray')\nplt.title('%i' % train_data.train_labels[0])\nplt.show()","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"4a7c5ad7551a9f48a84575bb827fcd0fd2604506"},"cell_type":"markdown","source":"### Data Convert into Variable"},{"metadata":{"trusted":true,"_uuid":"8fcbd936ca670ccb97890dfdf123e97ca0e65b1f"},"cell_type":"code","source":"# Data Loader for easy mini-batch return in training, the image batch shape will be (50, 1, 28, 28)\ntrain_loader = Data.DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"f93dd126a421e0eaaf382fe629f240d7aced03ee"},"cell_type":"code","source":"# convert test data into Variable, pick 2000 samples to speed up testing\ntest_data = torchvision.datasets.MNIST(root='../input/mnist/mnist/', train=False)\ntest_x = Variable(torch.unsqueeze(test_data.test_data, dim=1)).type(torch.FloatTensor)[:2000]/255.   
# shape from (2000, 28, 28) to (2000, 1, 28, 28), value in range(0,1)\ntest_y = test_data.test_labels[:2000]","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"61a153b7ad53df7157d4ac1b39c1c6ab0fcb4c92"},"cell_type":"markdown","source":"### Model Design"},{"metadata":{"trusted":true,"_uuid":"c91b65bceeccefad626b848c3fddeb531b62f3ae"},"cell_type":"code","source":"class CNN(nn.Module):\n    def __init__(self):\n        super(CNN, self).__init__()\n        self.conv1 = nn.Sequential(         # input shape (1, 28, 28)\n            nn.Conv2d(\n                in_channels=1,              # number of input channels (1 for grayscale)\n                out_channels=16,            # n_filters\n                kernel_size=5,              # filter size\n                stride=1,                   # filter movement/step\n                padding=2,                  # to keep the same width and height after conv2d, use padding=(kernel_size-1)/2 when stride=1\n            ),                              # output shape (16, 28, 28)\n            nn.ReLU(),                      # activation\n            nn.MaxPool2d(kernel_size=2),    # choose max value in 2x2 area, output shape (16, 14, 14)\n        )\n        self.conv2 = nn.Sequential(         # input shape (16, 14, 14)\n            nn.Conv2d(16, 32, 5, 1, 2),     # output shape (32, 14, 14)\n            nn.ReLU(),                      # activation\n            nn.MaxPool2d(2),                # output shape (32, 7, 7)\n        )\n        self.out = nn.Linear(32 * 7 * 7, 10)   # fully connected layer, output 10 classes\n\n    def forward(self, x):\n        x = self.conv1(x)\n        x = self.conv2(x)\n        x = x.view(x.size(0), -1)           # flatten the output of conv2 to (batch_size, 32 * 7 * 7)\n        output = self.out(x)\n        return output, x    # return x for visualization","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"a89b066b153727adb04fb1e3555c6f843106b682"},"cell_type":"code","source":"cnn = 
CNN()\nprint(cnn)  # net architecture","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"c451b9f2845cb310265b1cabd0cd001f61c08e69"},"cell_type":"code","source":"optimizer = torch.optim.Adam(cnn.parameters(), lr=LR)   # optimize all cnn parameters\nloss_func = nn.CrossEntropyLoss()                       # the target label is not one-hotted","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"ab1afa191360790390cb6f5cbcd2a33cff8869df"},"cell_type":"markdown","source":"### Resulting Graphs"},{"metadata":{"trusted":true,"scrolled":false,"_uuid":"575c8189b4fa9899cc858162cfa3fe3841d395fb"},"cell_type":"code","source":"# following function (plot_with_labels) is for visualization, can be ignored if not interested\nfrom matplotlib import cm\ntry: from sklearn.manifold import TSNE; HAS_SK = True\nexcept: HAS_SK = False; print('Please install sklearn for layer visualization')\ndef plot_with_labels(lowDWeights, labels):\n    plt.figure(figsize = (20,6))\n    plt.cla()\n    X, Y = lowDWeights[:, 0], lowDWeights[:, 1]\n    for x, y, s in zip(X, Y, labels):\n        c = cm.rainbow(int(255 * s / 9)); plt.text(x, y, s, backgroundcolor=c, fontsize=9)\n    plt.xlim(X.min(), X.max()); plt.ylim(Y.min(), Y.max()); plt.title('Visualize last layer'); plt.show(); plt.pause(0.01)\n\nplt.ion()\n# training and testing\nfor epoch in range(EPOCH):\n    for step, (x, y) in enumerate(train_loader):   # gives batch data, normalize x when iterate train_loader\n        b_x = Variable(x)   # batch x\n        b_y = Variable(y)   # batch y\n\n        output = cnn(b_x)[0]               # cnn output\n        loss = loss_func(output, b_y)   # cross entropy loss\n        optimizer.zero_grad()           # clear gradients for this training step\n        loss.backward()                 # backpropagation, compute gradients\n        optimizer.step()                # apply gradients\n\n        if step % 100 == 0:\n            test_output, last_layer = cnn(test_x)\n            pred_y 
= torch.max(test_output, 1)[1].data.squeeze()\n            accuracy = (pred_y == test_y).sum().item() / float(test_y.size(0))\n            print('Epoch: ', epoch, '| train loss: %.4f' % loss.item(), '| test accuracy: %.2f' % accuracy)\n            if HAS_SK:\n                # Visualization of trained flatten layer (T-SNE)\n                tsne = TSNE(perplexity=30, n_components=2, init='pca', n_iter=5000)\n                plot_only = 500\n                low_dim_embs = tsne.fit_transform(last_layer.data.numpy()[:plot_only, :])\n                labels = test_y.numpy()[:plot_only]\n                plot_with_labels(low_dim_embs, labels)\nplt.ioff()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"476ec2b293effe78558feacf0bde13386a2fef51"},"cell_type":"code","source":"# print 10 predictions from test data\ntest_output, _ = cnn(test_x[:10])\npred_y = torch.max(test_output, 1)[1].data.numpy().squeeze()\nprint(pred_y, 'prediction number')\nprint(test_y[:10].numpy(), 'real number')","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"22b0528e550f47439c9a1d3f09d6ee56587abe85"},"cell_type":"markdown","source":" ## RNN-Classification\n ---"},{"metadata":{"_uuid":"2e658ebf8870e9402eb27cff85a0033b82679dc6"},"cell_type":"markdown","source":"![](https://i.stack.imgur.com/WSOie.png)\n\n### Sequence Modeling\n\n![](https://morvanzhou.github.io/static/results/ML-intro/rnn2.png)\n* Imagine a set of sequence data, **data 0, 1, 2, 3.** When **predicting result0**, we rely on **data0, and when predicting the other results,** we likewise rely only on the single corresponding data point. Every prediction uses the same NN. However, the data is ordered, just like cooking in the kitchen: sauce A must be added before sauce B, otherwise the flavors get mixed up. 
So an ordinary neural network structure cannot let the NN understand the association between these data points.\n\n### Neural network for processing sequence\n\n![](https://morvanzhou.github.io/static/results/ML-intro/rnn3.png)\n\n* So how do we let the NN analyze the association between data? Think about how humans analyze the ***relationship between things.*** The most basic way is to remember *what happened before.* So we give the neural network the same thing: ***the ability to remember what happened before***. \n* When analyzing **data0, we store the analysis result in memory. Then, when analyzing data1, the NN generates a new memory, but the new memory and the old memory are not yet connected, so we simply recall the old memory and analyze the two together. As we continue to analyze more ordered data, the RNN accumulates the previous memories and analyzes them together.**\n\n![](https://morvanzhou.github.io/static/results/ML-intro/rnn4.png)\n\n* Let's repeat the process, this time with a little mathematics. Each time the RNN finishes a step, it produces a description of the **current state**, which we write in **shorthand as s(t). The RNN then analyzes x(t+1), which produces s(t+1) from x(t+1), but y(t+1) is created from both s(t) and s(t+1).** So the RNN we usually see can also be expressed like this.\n\n### Application of RNN\n\n* The form of an RNN is not limited to this; its structure is very flexible. For a classification problem, for example, **deciding whether the sentiment of a sentence someone says is positive or negative, we can use an RNN that outputs its judgment only at the last time step.**\n* For image captioning, we need only a single input X, the picture, and the RNN generates a paragraph describing it.\n* For **language translation, the RNN is given a paragraph of English and translates it into another language.**\n* With these different forms, RNNs become powerful. 
There are many interesting RNN applications. **For example, RNNs can describe photos, write academic papers, write program scripts, and compose music. The average person may not even be able to tell that the result was written by a machine.**"},{"metadata":{"trusted":true,"_uuid":"e2b0e7e687520bf03721ae4a6fd075e6cee88b4c"},"cell_type":"code","source":"import torch\nfrom torch import nn\nfrom torch.autograd import Variable\nimport torchvision.datasets as dsets\nimport torchvision.transforms as transforms\nimport matplotlib.pyplot as plt\n%matplotlib inline\n\ntorch.manual_seed(1)    # reproducible","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"459e21565a29112fdbbf6c6ca65fc9c2c9b44d80"},"cell_type":"markdown","source":"### Hyper Parameters"},{"metadata":{"trusted":true,"_uuid":"36ec0007e3293dcdd2df5abe7df429510e212b54"},"cell_type":"code","source":"EPOCH = 1               # train the training data n times, to save time, we just train 1 epoch\nBATCH_SIZE = 64\nTIME_STEP = 28          # rnn time step / image height\nINPUT_SIZE = 28         # rnn input size / image width\nLR = 0.01               # learning rate\nDOWNLOAD_MNIST = True   # set to True if you haven't downloaded the data","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"cf5f642e58ea79a56b92e121bee0c24c5236e166"},"cell_type":"code","source":"# MNIST digits dataset\ntrain_data = dsets.MNIST(\n    root='../input/mnist/mnist/',\n    train=True,                         # this is training data\n    transform=transforms.ToTensor(),    # Converts a PIL.Image or numpy.ndarray to\n                                        # torch.FloatTensor of shape (C x H x W) and normalize in the range [0.0, 1.0]\n    download=DOWNLOAD_MNIST,            # download it if you don't have it\n)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"e6a0fc0131bbcaf8aa6837cf1043c9b59d8e603e"},"cell_type":"code","source":"# plot one example\nprint(train_data.train_data.size())     # 
(60000, 28, 28)\nprint(train_data.train_labels.size())   # (60000)\nplt.imshow(train_data.train_data[0].numpy(), cmap='gray')\nplt.title('%i' % train_data.train_labels[0])\nplt.show()","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"915d464d5611ad0b52ed4c6e058db3302a1069c7"},"cell_type":"markdown","source":"### Data Loader for easy mini-batch return in training"},{"metadata":{"trusted":true,"_uuid":"cbf6003a1dece8b3606ca31f3a19808951fb069a"},"cell_type":"code","source":"train_loader = torch.utils.data.DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"b3acffc271ce37b10f394292fcc8dca93487a463"},"cell_type":"markdown","source":"### Convert test data into Variable, pick 2000 samples to speed up testing"},{"metadata":{"trusted":true,"_uuid":"905d5257f6868bb35eeb7344a3ee6b7367f1bb24"},"cell_type":"code","source":"test_data = dsets.MNIST(root='../input/mnist/mnist/', train=False, transform=transforms.ToTensor())\ntest_x = Variable(test_data.test_data, volatile=True).type(torch.FloatTensor)[:2000]/255.   # shape (2000, 28, 28) value in range(0,1)\ntest_y = test_data.test_labels.numpy().squeeze()[:2000]    # convert to numpy array","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"41fbe751f225524863ce8ac3770fe05ef24e8b51"},"cell_type":"markdown","source":"### Model Design"},{"metadata":{"trusted":true,"_uuid":"685e5e43c1e61da25812e9e29b842f36ee901b18"},"cell_type":"code","source":"class RNN(nn.Module):\n    def __init__(self):\n        super(RNN, self).__init__()\n\n        self.rnn = nn.LSTM(         # if use nn.RNN(), it hardly learns\n            input_size=INPUT_SIZE,\n            hidden_size=64,         # rnn hidden unit\n            num_layers=1,           # number of rnn layer\n            batch_first=True,       # input & output will have batch size as the 1st dimension, e.g. 
(batch, time_step, input_size)\n        )\n\n        self.out = nn.Linear(64, 10)\n\n    def forward(self, x):\n        # x shape (batch, time_step, input_size)\n        # r_out shape (batch, time_step, hidden_size)\n        # h_n shape (n_layers, batch, hidden_size)\n        # h_c shape (n_layers, batch, hidden_size)\n        r_out, (h_n, h_c) = self.rnn(x, None)   # None represents zero initial hidden state\n\n        # choose r_out at the last time step\n        out = self.out(r_out[:, -1, :])\n        return out","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"60f24312fdbc2f3661a07e3c94a497e795bd3443"},"cell_type":"code","source":"rnn = RNN()\nprint(rnn)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"ebc1a1daa6f30762d3dcc022da47415163c1a454"},"cell_type":"code","source":"optimizer = torch.optim.Adam(rnn.parameters(), lr=LR)   # optimize all rnn parameters\nloss_func = nn.CrossEntropyLoss() ","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"3c86a795e17863cf11545391cef15d56d297b830"},"cell_type":"markdown","source":"### Training and Testing"},{"metadata":{"_kg_hide-output":false,"_kg_hide-input":true,"trusted":true,"_uuid":"3aae73599c4ee8ff5066a652e7e2b70a765bcc1a"},"cell_type":"code","source":"# training and testing\nfor epoch in range(EPOCH):\n    for step, (x, y) in enumerate(train_loader):        # gives batch data\n        b_x = Variable(x.view(-1, 28, 28))              # reshape x to (batch, time_step, input_size)\n        b_y = Variable(y)                               # batch y\n\n        output = rnn(b_x)                               # rnn output\n        loss = loss_func(output, b_y)                   # cross entropy loss\n        optimizer.zero_grad()                           # clear gradients for this training step\n        loss.backward()                                 # backpropagation, compute gradients\n        optimizer.step()                                # apply gradients\n\n   
     if step % 50 == 0:\n            test_output = rnn(test_x)                   # test_x shape (samples, time_step, input_size)\n            pred_y = torch.max(test_output, 1)[1].data.numpy().squeeze()\n            accuracy = sum(pred_y == test_y) / float(test_y.size)\n            print('Epoch: ', epoch, '| train loss: %.4f' % loss.item(), '| test accuracy: %.2f' % accuracy)","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"e240370063394d2651a779ab79a14f4dd58d4220"},"cell_type":"markdown","source":"### Predicted and Actual Value match"},{"metadata":{"trusted":true,"_uuid":"4980f16a020cb54a249b8db3d75b05025a0c1ea8"},"cell_type":"code","source":"# print 20 predictions from test data\ntest_output = rnn(test_x[:20].view(-1, 28, 28))\npred_y = torch.max(test_output, 1)[1].data.numpy().squeeze()\nprint(pred_y, 'prediction number')\nprint(test_y[:20], 'real number')","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"81d98fe36b40734f282213b4e15fd83852b1c5aa"},"cell_type":"code","source":"plt.figure(1, figsize=(20, 8))\nplt.plot(pred_y, c='green', label='Predicted')\nplt.plot(test_y[:20], c='orange', label='Actual')\nplt.xlabel(\"Index\")\nplt.ylabel(\"Predicted/Actual Value\")\nplt.title(\"RNN Classification Result Analysis\")\nplt.legend(loc='best')","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"5cd501338fc9b0406f37be1eea49bd3f14f9aafa"},"cell_type":"markdown","source":"## RNN-Regression\n---\nNote : **The regression concept is already explained above, so here the RNN is applied directly to regression.**"},{"metadata":{"trusted":true,"_uuid":"3c0055c221537ec58c1bd0284777971578639a31"},"cell_type":"code","source":"import torch\nfrom torch import nn\nfrom torch.autograd import Variable\nimport numpy as np\nimport matplotlib.pyplot as plt\n%matplotlib inline\n\ntorch.manual_seed(1)    # reproducible","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"70f217d164646aff355c44fad7fe83dd4b763afc"},"cell_type":"markdown","source":"###  
Hyper Parameters"},{"metadata":{"trusted":true,"_uuid":"c007938f26483126323865073f0b4aa1d0f86fab"},"cell_type":"code","source":"# Hyper Parameters\nTIME_STEP = 10      # rnn time step\nINPUT_SIZE = 1      # rnn input size\nLR = 0.02           # learning rate","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"19cd6842458f534aec49e6ee8fd8fe129b2d3dee"},"cell_type":"markdown","source":"### Show data"},{"metadata":{"trusted":true,"_uuid":"e6c575b9d6c201a1251a1f1912cf07c4b3d92667"},"cell_type":"code","source":"steps = np.linspace(0, np.pi*2, 100, dtype=np.float32)\nx_np = np.sin(steps)    # float32 for converting torch FloatTensor\ny_np = np.cos(steps)\nplt.figure(figsize=(20,5))\nplt.plot(steps, y_np, 'r-', label='target (cos)')\nplt.plot(steps, x_np, 'b-', label='input (sin)')\nplt.legend(loc='best')\nplt.show()","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"2a4cb5be7c39f2db1aebec98bc05a6ec50916882"},"cell_type":"markdown","source":"### Model Design"},{"metadata":{"trusted":true,"_uuid":"c56e3203a104bf2a8c17856d9a1b9a5c65317e05"},"cell_type":"code","source":"class RNN(nn.Module):\n    def __init__(self):\n        super(RNN, self).__init__()\n\n        self.rnn = nn.RNN(\n            input_size=INPUT_SIZE,\n            hidden_size=32,     # rnn hidden unit\n            num_layers=1,       # number of rnn layer\n            batch_first=True,   # input & output will has batch size as 1s dimension. e.g. 
(batch, time_step, input_size)\n        )\n        self.out = nn.Linear(32, 1)\n\n    def forward(self, x, h_state):\n        # x (batch, time_step, input_size)\n        # h_state (n_layers, batch, hidden_size)\n        # r_out (batch, time_step, hidden_size)\n        r_out, h_state = self.rnn(x, h_state)\n\n        outs = []    # save all predictions\n        for time_step in range(r_out.size(1)):    # calculate output for each time step\n            outs.append(self.out(r_out[:, time_step, :]))\n        return torch.stack(outs, dim=1), h_state\n\n        # instead, for simplicity, you can replace the loop above with the following\n        # r_out = r_out.view(-1, 32)\n        # outs = self.out(r_out)\n        # return outs, h_state","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"2d2a7c0493de189a788997f69a9dfb6c02e0f1e8"},"cell_type":"code","source":"rnn = RNN()\nprint(rnn)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"aadcbc7697fe659bd418eb115220e97720f540a5"},"cell_type":"code","source":"optimizer = torch.optim.Adam(rnn.parameters(), lr=LR)   # optimize all rnn parameters\nloss_func = nn.MSELoss()","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"db76f1b5d8764df6a7be15072e8154890cb2965a"},"cell_type":"markdown","source":"### Model Training and Result Plotting"},{"metadata":{"trusted":true,"_uuid":"7184dc9ba7fe21860de28b79299034178943aead"},"cell_type":"code","source":"h_state = None      # for initial hidden state","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"fcbd317e1fe40c5a974b4cf628f9297254c90485"},"cell_type":"code","source":"plt.ion()           # continuously plot","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"scrolled":false,"_uuid":"1d9d27477d6a5944f8f67bcc50d31b0bd6c77d5a"},"cell_type":"code","source":"for step in range(50):\n    start, end = step * np.pi, (step+1)*np.pi   # time range\n    # use sin to predict cos\n    steps = np.linspace(start, end, 
TIME_STEP, dtype=np.float32)\n    x_np = np.sin(steps)    # float32 for converting torch FloatTensor\n    y_np = np.cos(steps)\n\n    x = torch.from_numpy(x_np[np.newaxis, :, np.newaxis])    # shape (batch, time_step, input_size)\n    y = torch.from_numpy(y_np[np.newaxis, :, np.newaxis])\n\n    prediction, h_state = rnn(x, h_state)   # rnn output\n    # !! next step is important !!\n    h_state = h_state.data        # repack the hidden state, break the connection from last iteration\n\n    loss = loss_func(prediction, y)         # calculate loss\n    optimizer.zero_grad()                   # clear gradients for this training step\n    loss.backward()                         # backpropagation, compute gradients\n    optimizer.step()                        # apply gradients\n\n    # plotting\n    plt.figure(1, figsize=(20, 5))\n    plt.plot(steps, y_np.flatten(), 'r-', label=\"Actual\")\n    plt.plot(steps, prediction.data.numpy().flatten(), 'b-', label=\"Predicted\")\n    plt.xlabel(\"Steps\")\n    plt.ylabel(\"Actual/Predicted\")\n    plt.title(\"Result\")\n    plt.draw(); plt.pause(0.05)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"bc0d8961a7dadf87aee5fb5f5fc9ed6c96310f08"},"cell_type":"code","source":"plt.ioff()\nplt.show()","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"989502e4ae3bab3d15f73d45dec49b97b247f3ba"},"cell_type":"markdown","source":"## AutoEncoder\n---\n\n* ***An autoencoder is an unsupervised form of neural network.***\n\n* Autoencoders (AE) are neural networks that aim to copy their inputs to their outputs. They work by compressing the input into a latent-space representation, and then reconstructing the output from this representation. This kind of network is composed of two parts:\n\n* **1) Encoder:** This is the part of the network that compresses the input into a latent-space representation. 
It can be represented by an encoding function h=f(x).\n* **2) Decoder:** This part aims to reconstruct the input from the latent space representation. It can be represented by a decoding function r=g(h).\n\n![](https://cdn-images-1.medium.com/max/880/1*V_YtxTFUqDrmmu2JqMZ-rA.png)\n\n### What are autoencoders used for?\n* Data denoising and dimensionality reduction for data visualization are considered the two main practical applications of autoencoders. With appropriate dimensionality and sparsity constraints, autoencoders can learn data projections that are more interesting than PCA or other basic techniques.\n\n### Types of autoencoder:\n* **Vanilla autoencoder**: In its simplest form, the autoencoder is a three-layer net, i.e. a neural net with one hidden layer. The input and output are the same, and we learn how to reconstruct the input, for example using the Adam optimizer and the mean squared error loss function. ([Code](https://gist.githubusercontent.com/nathanhubens/604fd9cfc2d7f3d022e3ef0cf4b787de/raw/bc437b49f974b767e488a4a896b0e869d87a39d6/vanilla%20autoencoder))\n\n* **Multilayer autoencoder**: If one hidden layer is not enough, we can extend the autoencoder to more hidden layers. ([Code](https://gist.githubusercontent.com/nathanhubens/219ab6efcfbab95508495eb6d6e41884/raw/d50fe4cb3b5d361da68156c789d5bd25f5dad321/multilayer%20autoencoder))\n* **Convolutional autoencoder**: We may also ask ourselves: can autoencoders be used with convolutions instead of fully-connected layers?\nThe answer is yes and the principle is the same, but using images (3D arrays) instead of flattened 1D vectors. The input image is downsampled to give a latent representation of smaller dimensions, forcing the autoencoder to learn a compressed version of the images. 
([Code](https://gist.github.com/nathanhubens/2f11dd9257263874b94966eb48e42922/raw/48f545769116607f713cded00c82d698ce9fb25a/convolutional%20autoencoder))\n\n* **Regularized autoencoder**: There are other ways to constrain the reconstruction of an autoencoder than imposing a hidden layer of smaller dimension than the input. Rather than limiting the model capacity by keeping the encoder and decoder shallow and the code size small, regularized autoencoders use a loss function that encourages the model to have other properties besides the ability to copy its input to its output. In practice, we usually find two types of regularized autoencoder: the sparse autoencoder and the denoising autoencoder.\n\n    * **Sparse autoencoder**: Sparse autoencoders are typically used to learn features for another task such as classification. An autoencoder that has been regularized to be sparse must respond to unique statistical features of the dataset it has been trained on, rather than simply acting as an identity function. In this way, training to perform the copying task with a sparsity penalty can yield a model that has learned useful features as a byproduct. ([Code](https://gist.github.com/nathanhubens/c6000eee8d6f919d01465183f79a62b6/raw/2e5085740299cf1d9f0a28ddfb438eee4bfe5903/Sparse%20autoencoder))\n\n    * **Denoising autoencoder:** Rather than adding a penalty to the loss function, we can obtain an autoencoder that learns something useful by changing the reconstruction error term of the loss function. This can be done by adding some noise to the input image and making the autoencoder learn to remove it. 
By this means, the encoder will extract the most important features and learn a more robust representation of the data. ([Code](https://gist.github.com/nathanhubens/2c2a7cc138e3d170956c109b10f5a7f7/raw/3b44ff899fe23f2224edf782e8dd068551d83ec5/denoising%20ae))\n"},{"metadata":{"trusted":true,"_uuid":"7ca49132e6961f71377f78573cd35588cca4b5a5"},"cell_type":"code","source":"import torch\nimport torch.nn as nn\nfrom torch.autograd import Variable\nimport torch.utils.data as Data\nimport torchvision\nimport matplotlib.pyplot as plt\nfrom mpl_toolkits.mplot3d import Axes3D\nfrom matplotlib import cm\nimport numpy as np\n%matplotlib inline\n\ntorch.manual_seed(1)    # reproducible","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"031d2820d9f63c8c797940298ea3b45bbeba22b4"},"cell_type":"markdown","source":"### Hyper Parameters"},{"metadata":{"trusted":true,"_uuid":"87b602fd510e5645119566a74bf4c96599f61008"},"cell_type":"code","source":"EPOCH = 10\nBATCH_SIZE = 64\nLR = 0.005         # learning rate\nDOWNLOAD_MNIST = False\nN_TEST_IMG = 5","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"cb9e58cbfe55692a887ed5dfe363ba31cc95a9fd"},"cell_type":"code","source":"# Mnist digits dataset\ntrain_data = torchvision.datasets.MNIST(\n    root='../input/mnist/mnist/',\n    train=True,                                     # this is training data\n    transform=torchvision.transforms.ToTensor(),    # Converts a PIL.Image or numpy.ndarray to\n                                                    # torch.FloatTensor of shape (C x H x W) and normalize in the range [0.0, 1.0]\n    download=DOWNLOAD_MNIST,                        # download it if you don't have it\n)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"d3cbb367b3c2fff9471c3942b7bf0c6ae66e905a"},"cell_type":"code","source":"# plot one example\nprint(train_data.train_data.size())     # (60000, 28, 28)\nprint(train_data.train_labels.size())   # 
(60000)\nplt.imshow(train_data.train_data[2].numpy(), cmap='gray')\nplt.title('%i' % train_data.train_labels[2])\nplt.show()","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"499aa7e8181942ab9c5d7008a93f6558133710fe"},"cell_type":"markdown","source":"### Data Loader for easy mini-batch return in training; with BATCH_SIZE = 64 the image batch shape will be (64, 1, 28, 28)"},{"metadata":{"trusted":true,"_uuid":"2fa1f2d9dc2229f5eb64f317d7052ee356d632e3"},"cell_type":"code","source":"train_loader = Data.DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"1dc4630ce1a1014a93d1b03c94ec52e4f7be21f4"},"cell_type":"markdown","source":"### Model Design"},{"metadata":{"trusted":true,"_uuid":"1be24d229b8f7cd2aef6ca03a78ccec08a3238f6"},"cell_type":"code","source":"class AutoEncoder(nn.Module):\n    def __init__(self):\n        super(AutoEncoder, self).__init__()\n\n        self.encoder = nn.Sequential(\n            nn.Linear(28*28, 128),\n            nn.Tanh(),\n            nn.Linear(128, 64),\n            nn.Tanh(),\n            nn.Linear(64, 12),\n            nn.Tanh(),\n            nn.Linear(12, 3),   # compress to 3 features which can be visualized in plt\n        )\n        self.decoder = nn.Sequential(\n            nn.Linear(3, 12),\n            nn.Tanh(),\n            nn.Linear(12, 64),\n            nn.Tanh(),\n            nn.Linear(64, 128),\n            nn.Tanh(),\n            nn.Linear(128, 28*28),\n            nn.Sigmoid(),       # compress to a range (0, 1)\n        )\n\n    def forward(self, x):\n        encoded = self.encoder(x)\n        decoded = self.decoder(encoded)\n        return encoded, decoded","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"773ce99aa7aa977da7dd7afd88be78fc2751b0c8"},"cell_type":"markdown","source":"### Model Training"},{"metadata":{"trusted":true,"_uuid":"f825000880040d93de24f838917de15512d04d94"},"cell_type":"code","source":"autoencoder = 
AutoEncoder()\nprint(autoencoder)\n\noptimizer = torch.optim.Adam(autoencoder.parameters(), lr=LR)\nloss_func = nn.MSELoss()\n\n# original data (first row) for viewing\nview_data = Variable(train_data.train_data[:N_TEST_IMG].view(-1, 28*28).type(torch.FloatTensor)/255.)\n\nfor epoch in range(EPOCH):\n    for step, (x, y) in enumerate(train_loader):\n        b_x = Variable(x.view(-1, 28*28))   # batch x, shape (batch, 28*28)\n        b_y = Variable(x.view(-1, 28*28))   # batch y, shape (batch, 28*28)\n        b_label = Variable(y)               # batch label\n\n        encoded, decoded = autoencoder(b_x)\n\n        loss = loss_func(decoded, b_y)      # mean square error\n        optimizer.zero_grad()               # clear gradients for this training step\n        loss.backward()                     # backpropagation, compute gradients\n        optimizer.step()                    # apply gradients\n\n        if step % 500 == 0 and epoch in [0, 5, EPOCH-1]:\n            print('Epoch: ', epoch, '| train loss: %.4f' % loss.item())\n\n            # plotting decoded image (second row)\n            _, decoded_data = autoencoder(view_data)\n            \n            # initialize figure\n            f, a = plt.subplots(2, N_TEST_IMG, figsize=(5, 2))\n            \n            for i in range(N_TEST_IMG):\n                a[0][i].imshow(np.reshape(view_data.data.numpy()[i], (28, 28)), cmap='gray'); a[0][i].set_xticks(()); a[0][i].set_yticks(())\n    \n            for i in range(N_TEST_IMG):\n                a[1][i].clear()\n                a[1][i].imshow(np.reshape(decoded_data.data.numpy()[i], (28, 28)), cmap='gray')\n                a[1][i].set_xticks(()); a[1][i].set_yticks(())\n            plt.show(); plt.pause(0.05)","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"464981595e556d7d8de70c70f329ea2918ad7669"},"cell_type":"markdown","source":"### Visualize in 3D 
plot"},{"metadata":{"trusted":true,"_uuid":"b1184d0a00f064bdf635d8935c55461abad9a3a8"},"cell_type":"code","source":"# visualize in 3D plot\nview_data = Variable(train_data.train_data[:200].view(-1, 28*28).type(torch.FloatTensor)/255.)\nencoded_data, _ = autoencoder(view_data)\nfig = plt.figure(2,figsize=(15,6)); ax = fig.add_subplot(111, projection='3d')\nX, Y, Z = encoded_data.data[:, 0].numpy(), encoded_data.data[:, 1].numpy(), encoded_data.data[:, 2].numpy()\nvalues = train_data.train_labels[:200].numpy()\nfor x, y, z, s in zip(X, Y, Z, values):\n    c = cm.rainbow(int(255*s/9)); ax.text(x, y, z, s, backgroundcolor=c)\nax.set_xlim(X.min(), X.max()); ax.set_ylim(Y.min(), Y.max()); ax.set_zlim(Z.min(), Z.max())\nplt.show()","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"779a02c5f0e9a2b72aa154a5e994a867cc764936"},"cell_type":"markdown","source":"## DQN Reinforcement Learning\n---\n\n*  **Reinforcement learning:** Deep Q Network is referred to as DQN for short. The Google DeepMind team relies on DQN to make computers play better than we do.\n\n![](https://cdn-images-1.medium.com/max/1600/1*M8RWevLxhus56RABFEGYYQ.png)\n\n### Reinforcement learning and neural networks\n---\n\n* **Classic reinforcement learning methods are relatively traditional.** Nowadays, with the various applications of **machine learning in daily life**, various machine learning methods are also integrated, merged and upgraded. The method explored here combines **neural networks and Q-learning, and is called Deep Q Network.** Why was this new structure proposed? Because traditional table-based reinforcement learning has a bottleneck.\n\n### The role of neural networks\n---\n![](https://morvanzhou.github.io/static/results/ML-intro/DQN2.png)\n* We use a table to store each state and the Q value of each action in that state. The problem is that real tasks are too complicated: there can be more states than stars in the sky (as in Go). 
If we stored them all in a table, our computer would not have enough memory, and searching such a large table for the corresponding state would be time consuming. However, machine learning has a tool that is very good at this kind of thing: the neural network. \n* We can use the state and action as the input of the neural network and let the network compute the Q value of that action, so we do not need to record Q values in a table; the neural network generates them directly. Another form is to input only the state, output the values of all actions, and then directly select the action with the maximum value as the next action, according to the principle of Q-learning. \n* We can imagine that the neural network accepts external information, which is like collecting information through the eyes, nose and ears, and then outputs the value of each action through brain processing. Finally, the action is selected by means of reinforcement learning.\n\n### Q-Learning [Reference](https://medium.com/@jonathan_hui/rl-dqn-deep-q-network-e207751f7ae4)\n---\n* Q-learning learns the action-value function Q(s, a): how good it is to take an action in a particular state. For example, for the board position below, how good it is to move the pawn two steps forward. Literally, we assign a scalar value to the benefit of making such a move.\n\n![](https://cdn-images-1.medium.com/max/800/1*srmv0GScAs6vObPfPj0-uQ.png)\n\n* In Q-learning, we build a memory table Q[s, a] to store Q-values for all possible combinations of s and a. If you are a chess player, it is the cheat sheet for the best move. In the example above, we may realize that moving the pawn 2 steps ahead has the highest Q-value of all moves. (The memory consumption will be too high for the chess game. But let’s stick with this approach a little bit longer.)\n\n* Technically speaking, we sample an action from the current state. 
We find out the reward R (if any) and the new state s’ (the new board position). From the memory table, we determine the next action a’ to take which has the maximum Q(s’, a’).\n\n![](https://cdn-images-1.medium.com/max/800/1*yh8Z2t41HBvY5gwBPTqVVQ.jpeg)\n\n* In a video game, we score points (rewards) by shooting down the enemy. In a chess game, the reward is +1 when we win or -1 if we lose. So there is only one reward given and it takes a while to get it.\n\n* We can take a single move a and see what reward R we get. This creates a one-step look ahead. R + Q(s’, a’) becomes the target that we want Q(s, a) to be. For example, say all Q values are equal to one now. If we move the joystick to the right and score 2 points, we want to move Q(s, a) closer to 3 (i.e. 2 + 1).\n\n![](https://cdn-images-1.medium.com/max/800/1*9CdBkaFzFRyACvj5P2Af2Q.png)\n\n* As we keep playing, we maintain a running average for Q. The values will get better and, with some tricks, the Q values will converge.\n\n---\n### Q-Learning Algorithm\n\n---\n![](https://cdn-images-1.medium.com/max/800/1*5ffOxpSgIJCYn0XccfFYUQ.png)\n\n\n* However, if there are too many combinations of states and actions, the memory and computation requirements for Q will be too high. To address that, we switch to a deep network Q (DQN) to approximate Q(s, a). This is called Deep Q-learning. 
With the new approach, we generalize the approximation of the Q-value function rather than remembering the solutions.\n\n![](https://rubenfiszel.github.io/posts/rl4j/qmodeling.png)\n\n---\n### Deep Q-Network Algorithm with experience replay\n\n---\n\n**Algorithms**\n![](https://cdn-images-1.medium.com/max/800/1*8coZ4g_pRtfyoHmsuzMH6g.png)\n"},{"metadata":{"trusted":true,"_uuid":"e0290b77192ffa9c0c9abbe85fbd12b7546fe448"},"cell_type":"code","source":"import torch\nimport torch.nn as nn\nfrom torch.autograd import Variable\nimport torch.nn.functional as F\nimport numpy as np\nimport gym","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"921888128091e55ab348a784a9e2df90148b7eb2"},"cell_type":"markdown","source":"### Hyper Parameters"},{"metadata":{"trusted":true,"_uuid":"619858da52d2fb5ef2e1e1fcf9976feece3a8279"},"cell_type":"code","source":"# Hyper Parameters\nBATCH_SIZE = 32\nLR = 0.01                   # learning rate\nEPSILON = 0.9               # greedy policy\nGAMMA = 0.9                 # reward discount\nTARGET_REPLACE_ITER = 100   # target update frequency\nMEMORY_CAPACITY = 2000\nenv = gym.make('CartPole-v0')\nenv = env.unwrapped\nN_ACTIONS = env.action_space.n\nN_STATES = env.observation_space.shape[0]","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"8c5587261dbf3e82f755272487d19752fceb2f02"},"cell_type":"markdown","source":"### Neural Network Design"},{"metadata":{"trusted":true,"_uuid":"01ea194a6b40baf77cae3e9a379735a3dd37aee6"},"cell_type":"code","source":"class Net(nn.Module):\n    def __init__(self, ):\n        super(Net, self).__init__()\n        self.fc1 = nn.Linear(N_STATES, 10)\n        self.fc1.weight.data.normal_(0, 0.1)   # initialization\n        self.out = nn.Linear(10, N_ACTIONS)\n        self.out.weight.data.normal_(0, 0.1)   # initialization\n\n    def forward(self, x):\n        x = self.fc1(x)\n        x = F.relu(x)\n        actions_value = self.out(x)\n        return 
actions_value","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"a52d6c876e88da7bb66233d32244b06d28432da4"},"cell_type":"markdown","source":"### DQN Model Design"},{"metadata":{"trusted":true,"_uuid":"c51a70fe92dbf3d64784faf607bc61f63675c89a"},"cell_type":"code","source":"class DQN(object):\n    def __init__(self):\n        self.eval_net, self.target_net = Net(), Net()\n\n        self.learn_step_counter = 0                                     # for target updating\n        self.memory_counter = 0                                         # for storing memory\n        self.memory = np.zeros((MEMORY_CAPACITY, N_STATES * 2 + 2))     # initialize memory\n        self.optimizer = torch.optim.Adam(self.eval_net.parameters(), lr=LR)\n        self.loss_func = nn.MSELoss()\n\n    def choose_action(self, x):\n        x = Variable(torch.unsqueeze(torch.FloatTensor(x), 0))\n        # input only one sample\n        if np.random.uniform() < EPSILON:   # greedy\n            actions_value = self.eval_net.forward(x)\n            action = torch.max(actions_value, 1)[1].data.numpy()[0]     # return the argmax (index tensor has shape (1,))\n        else:   # random\n            action = np.random.randint(0, N_ACTIONS)\n        return action\n\n    def store_transition(self, s, a, r, s_):\n        transition = np.hstack((s, [a, r], s_))\n        # replace the old memory with new memory\n        index = self.memory_counter % MEMORY_CAPACITY\n        self.memory[index, :] = transition\n        self.memory_counter += 1\n\n    def learn(self):\n        # target parameter update\n        if self.learn_step_counter % TARGET_REPLACE_ITER == 0:\n            self.target_net.load_state_dict(self.eval_net.state_dict())\n        self.learn_step_counter += 1\n\n        # sample batch transitions\n        sample_index = np.random.choice(MEMORY_CAPACITY, BATCH_SIZE)\n        b_memory = self.memory[sample_index, :]\n        b_s = Variable(torch.FloatTensor(b_memory[:, :N_STATES]))\n        b_a = 
Variable(torch.LongTensor(b_memory[:, N_STATES:N_STATES+1].astype(int)))\n        b_r = Variable(torch.FloatTensor(b_memory[:, N_STATES+1:N_STATES+2]))\n        b_s_ = Variable(torch.FloatTensor(b_memory[:, -N_STATES:]))\n\n        # q_eval w.r.t the action in experience\n        q_eval = self.eval_net(b_s).gather(1, b_a)  # shape (batch, 1)\n        q_next = self.target_net(b_s_).detach()     # detach from graph, don't backpropagate\n        q_target = b_r + GAMMA * q_next.max(1)[0].view(BATCH_SIZE, 1)   # shape (batch, 1)\n        loss = self.loss_func(q_eval, q_target)\n\n        self.optimizer.zero_grad()\n        loss.backward()\n        self.optimizer.step()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"0a0c4178c49a9a2b6080f9e9d66d71699da55431"},"cell_type":"code","source":"dqn = DQN()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"34074d0620ac95081082177e86980e68d74599a3"},"cell_type":"code","source":"'''\nprint('\\nCollecting experience...')\nfor i_episode in range(400):\n    s = env.reset()\n    ep_r = 0\n    while True:\n        env.render()\n        a = dqn.choose_action(s)\n\n        # take action\n        s_, r, done, info = env.step(a)\n\n        # modify the reward\n        x, x_dot, theta, theta_dot = s_\n        r1 = (env.x_threshold - abs(x)) / env.x_threshold - 0.8\n        r2 = (env.theta_threshold_radians - abs(theta)) / env.theta_threshold_radians - 0.5\n        r = r1 + r2\n\n        dqn.store_transition(s, a, r, s_)\n\n        ep_r += r\n        if dqn.memory_counter > MEMORY_CAPACITY:\n            dqn.learn()\n            if done:\n                print('Ep: ', i_episode,\n                      '| Ep_r: ', round(ep_r, 2))\n\n        if done:\n            break\n        s = s_\n'''","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"3897a655df567ccfedae0b5a272a107bf26050cd"},"cell_type":"markdown","source":"#### Note : After running this code you will see the output below. It will not 
open here; you have to run it on your own computer, because `env.render()` needs to open a display window.\n\n![](https://pytorch.org/tutorials/_images/cartpole1.gif)"},{"metadata":{"_uuid":"d0e85e492aea0a1e1284972ae04d06be0c2be610"},"cell_type":"markdown","source":"## Generative Adversarial Network\n---\n![](https://cdn-images-1.medium.com/max/2000/1*AZ5-3WdNdYyC2U0Aq7RhIg.png)\n\n* In **2014, Ian Goodfellow and his colleagues at the University of Montreal published** a [stunning paper](https://arxiv.org/pdf/1406.2661.pdf) introducing the world to **GANs, or generative adversarial networks.** Through an innovative combination of computational graphs and game theory they showed that, given enough modeling power, two models fighting against each other would be able to co-train through plain old backpropagation.\n\n![](https://cdn-images-1.medium.com/max/800/1*-gFsbymY9oJUQJ-A3GTfeg.png)\n\n\n* The models play **two distinct (literally, adversarial) roles.** Given some real data set **R, G** is the **generator**, trying to **create fake data** that looks just like **the genuine data,** while **D** is the **discriminator**, getting data from either the real set or **G** and labeling the difference. **Goodfellow’s metaphor** (and a fine one it is) was that **G** was like a **team of forgers trying to match real paintings** with their output, while **D** was the team of **detectives trying to tell the difference**. (Except that in this case, the forgers G never get to see the original data, only the judgments of D. They’re like blind forgers.)\n\n* **GANs** or **Generative Adversarial Networks** are a kind of neural network composed of **2** separate deep neural networks competing with each other: the **generator** and **the discriminator**.\n\n* A **GAN** can be used to generate various things. 
It can **generate realistic images, 3D models, videos, and a lot more, like the example below.**\n![](https://cdn-images-1.medium.com/max/800/1*NmRWSaTpBydHKnGIEAyWMw.png)\n\n---\n### Idea of GAN\n\n---\n#### Architecture of GAN\n![](https://cdn-images-1.medium.com/max/2000/1*39Nnni_nhPDaLu9AnTLoWw.png)\n**1) Generator**\n![](https://cdn-images-1.medium.com/max/1000/1*7i9iCdLZraZkrMy1-KADrA.png)\n**2) Discriminator:**\n![](https://www.researchgate.net/profile/Sinan_Kaplan2/publication/319093376/figure/fig20/AS:526859935731712@1502624605127/Architecture-of-proposed-discriminator-network-which-is-part-of-GAN-based-on-CNN-units.png)\n\n### Conceptual Diagram\n\n![](https://cdn-images-1.medium.com/max/1200/1*M2Er7hbryb2y0RP1UOz5Rw.png)\n\n> * **“The generator will try to generate fake images that fool the discriminator into thinking that they’re real. And the discriminator will try to distinguish between a real and a generated image as best as it can when an image is fed.”**\n\n* They both get stronger together until the **discriminator** cannot **distinguish** between the **real and the generated images anymore**. It can then do no better than **predicting real or fake with only 50% accuracy.** This is no more useful than **flipping a coin.** \n* This inaccuracy of the **discriminator occurs because the generator generates face images so realistic** that they seem to be real. So, it is expected that the discriminator cannot distinguish them. When that happens, the most educated guess is only as useful as an uneducated random guess.\n\n---\n### The optimal generator\n---\n* Intuitively, the code vector shown earlier in the generator represents abstract things. For example, if the code vector has 100 dimensions, there might be a dimension that represents “face age” or “gender” automatically.\n* Why would it learn such a representation? 
Because knowing people ages and their gender helps you draw their face more properly!\n\n---\n### The optimal discriminator\n---\n* When given an image, the discriminator must be looking for components of the face to be able to distinguish correctly.\n* Intuitively, some of the discriminator’s hidden neurons will be excited when it sees things like eyes, mouths, hair, etc. These features are good for other purposes later like classification!"},{"metadata":{"trusted":true,"_uuid":"202315ba7045e5a1ab5d57d1694d4dba07b5faf9"},"cell_type":"code","source":"import torch\nimport torch.nn as nn\nfrom torch.autograd import Variable\nimport numpy as np\nimport matplotlib.pyplot as plt\n%matplotlib inline\n\ntorch.manual_seed(1)    # reproducible\nnp.random.seed(1)","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"b133648a4abe7fa97465ff8d252b753986bbc6c4"},"cell_type":"markdown","source":"### Hyper Parameters"},{"metadata":{"trusted":true,"_uuid":"9ab37aef9f695880734072e92bee6afa223e3de6"},"cell_type":"code","source":"# Hyper Parameters\nBATCH_SIZE = 64\nLR_G = 0.0001           # learning rate for generator\nLR_D = 0.0001           # learning rate for discriminator\nN_IDEAS = 5             # think of this as number of ideas for generating an art work (Generator)\nART_COMPONENTS = 15     # it could be total point G can draw in the canvas\nPAINT_POINTS = np.vstack([np.linspace(-1, 1, ART_COMPONENTS) for _ in range(BATCH_SIZE)])","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"56aa4c25ea1cc8ade6e470267e62986c5a4f52df"},"cell_type":"code","source":"# show our beautiful painting range\nplt.figure(figsize=(20,6))\nplt.plot(PAINT_POINTS[0], 2 * np.power(PAINT_POINTS[0], 2) + 1, c='#74BCFF', lw=3, label='upper bound')\nplt.plot(PAINT_POINTS[0], 1 * np.power(PAINT_POINTS[0], 2) + 0, c='#FF9359', lw=3, label='lower bound')\nplt.legend(loc='upper 
right')\nplt.show()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"46aa593166b142f74829754a9bb2767e05ae6240"},"cell_type":"code","source":"def artist_works():     # painting from the famous artist (real target)\n    a = np.random.uniform(1, 2, size=BATCH_SIZE)[:, np.newaxis]\n    paintings = a * np.power(PAINT_POINTS, 2) + (a-1)\n    paintings = torch.from_numpy(paintings).float()\n    return Variable(paintings)\n\nG = nn.Sequential(                      # Generator\n    nn.Linear(N_IDEAS, 128),            # random ideas (could from normal distribution)\n    nn.ReLU(),\n    nn.Linear(128, ART_COMPONENTS),     # making a painting from these random ideas\n)\n\nD = nn.Sequential(                      # Discriminator\n    nn.Linear(ART_COMPONENTS, 128),     # receive art work either from the famous artist or a newbie like G\n    nn.ReLU(),\n    nn.Linear(128, 1),\n    nn.Sigmoid(),                       # tell the probability that the art work is made by artist\n)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"a7ef55a26481701f82c496f0a15ef6952a1f8eea"},"cell_type":"code","source":"opt_D = torch.optim.Adam(D.parameters(), lr=LR_D)\nopt_G = torch.optim.Adam(G.parameters(), lr=LR_G)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"scrolled":true,"_uuid":"62cad22133e3fdc165eb496e63b048f835134a6d"},"cell_type":"code","source":"for step in range(800):\n    artist_paintings = artist_works()           # real painting from artist\n    G_ideas = torch.randn(BATCH_SIZE, N_IDEAS)  # random ideas\n    G_paintings = G(G_ideas)                    # fake painting from G (random ideas)\n\n    prob_artist0 = D(artist_paintings)          # D try to increase this prob\n    prob_artist1 = D(G_paintings)               # D try to reduce this prob\n\n    D_loss = - torch.mean(torch.log(prob_artist0) + torch.log(1. - prob_artist1))\n    G_loss = torch.mean(torch.log(1. 
- prob_artist1))\n\n    opt_D.zero_grad()\n    D_loss.backward(retain_graph=True)      # keep G's graph for the generator update below\n    opt_D.step()\n\n    # re-score the fakes with the updated D: backpropagating the earlier G_loss after opt_D.step()\n    # raises an in-place modification error on recent PyTorch versions\n    G_loss = torch.mean(torch.log(1. - D(G_paintings)))\n    opt_G.zero_grad()\n    G_loss.backward()\n    opt_G.step()\n\n    if step % 50 == 0:  # plotting\n        plt.figure(figsize=(20,5))\n        plt.cla()\n        plt.plot(PAINT_POINTS[0], G_paintings.data.numpy()[0], c='#4AD631', lw=3, label='Generated painting',)\n        plt.plot(PAINT_POINTS[0], 2 * np.power(PAINT_POINTS[0], 2) + 1, c='#74BCFF', lw=3, label='upper bound')\n        plt.plot(PAINT_POINTS[0], 1 * np.power(PAINT_POINTS[0], 2) + 0, c='#FF9359', lw=3, label='lower bound')\n        plt.text(-.5, 2.3, 'D accuracy=%.2f (0.5 for D to converge)' % prob_artist0.data.numpy().mean(), fontdict={'size': 13})\n        plt.text(-.5, 2, 'D score= %.2f (-1.38 for G to converge)' % -D_loss.data.numpy(), fontdict={'size': 13})\n        plt.ylim((0, 3));plt.legend(loc='upper right', fontsize=10);plt.draw();plt.pause(0.01)\n\nplt.ioff()\nplt.show()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"9d4e161bba1861f332c1eafa00a6e1791188b472"},"cell_type":"markdown","source":"### Conditional GAN\n---"},{"metadata":{"trusted":true,"_uuid":"67d57864c70eae7169e535b03a69eb4fc4434a00"},"cell_type":"code","source":"import torch\nimport torchvision\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom torch.utils.data import DataLoader\nfrom torchvision import datasets\nfrom torchvision import transforms\nfrom torchvision.utils import save_image\nimport numpy as np\nimport datetime\nimport scipy.misc","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"4602c23dbc76e26daf1b2bd511a13a195da56c6b"},"cell_type":"code","source":"MODEL_NAME = 'ConditionalGAN'\nDEVICE = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"05faed4660886b95a065a54dc7008a7ef2e25cce"},"cell_type":"code","source":"def 
to_cuda(x):\n    return x.to(DEVICE)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"2ccf5d4f3cc2aabb6dadd20b7328bbc071aab751"},"cell_type":"code","source":"def to_onehot(x, num_classes=10):\n    assert isinstance(x, int) or isinstance(x, (torch.LongTensor, torch.cuda.LongTensor))\n    if isinstance(x, int):\n        c = torch.zeros(1, num_classes).long()\n        c[0][x] = 1\n    else:\n        x = x.cpu()\n        c = torch.LongTensor(x.size(0), num_classes)\n        c.zero_()\n        c.scatter_(1, x, 1) # dim, index, src value\n    return c","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"eaaffdf1d48445c5334546ccf1bcadc21a5c187a"},"cell_type":"code","source":"def get_sample_image(G, n_noise=100):\n    \"\"\"\n        save sample 100 images\n    \"\"\"\n    for num in range(10):\n        c = to_cuda(to_onehot(num))\n        for i in range(10):\n            z = to_cuda(torch.randn(1, n_noise))\n            y_hat = G(z,c)\n            line_img = torch.cat((line_img, y_hat.view(28, 28)), dim=1) if i > 0 else y_hat.view(28, 28)\n        all_img = torch.cat((all_img, line_img), dim=0) if num > 0 else line_img\n    img = all_img.cpu().data.numpy()\n    return img","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"ab59b945ee0369856fa3a8f45ff8b33ffdb160c6"},"cell_type":"code","source":"class Discriminator(nn.Module):\n    \"\"\"\n        Simple Discriminator w/ MLP\n    \"\"\"\n    def __init__(self, input_size=784, label_size=10, num_classes=1):\n        super(Discriminator, self).__init__()\n        self.layer1 = nn.Sequential(\n            nn.Linear(input_size+label_size, 200),\n            nn.ReLU(),\n            nn.Dropout(),\n        )\n        self.layer2 = nn.Sequential(\n            nn.Linear(200, 200),\n            nn.ReLU(),\n            nn.Dropout(),\n        )\n        self.layer3 = nn.Sequential(\n            nn.Linear(200, num_classes),\n            nn.Sigmoid(),\n        )\n    
\n    def forward(self, x, y):        \n        x, y = x.view(x.size(0), -1), y.view(y.size(0), -1).float()\n        v = torch.cat((x, y), 1) # v: [input, label] concatenated vector\n        y_ = self.layer1(v)\n        y_ = self.layer2(y_)\n        y_ = self.layer3(y_)\n        return y_","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"ae756784296e8b9f21724afa5467d871fee8ae26"},"cell_type":"code","source":"class Generator(nn.Module):\n    \"\"\"\n        Simple Generator w/ MLP\n    \"\"\"\n    def __init__(self, input_size=100, label_size=10, num_classes=784):\n        super(Generator, self).__init__()\n        self.layer = nn.Sequential(\n            nn.Linear(input_size+label_size, 200),\n            nn.LeakyReLU(0.2),\n            nn.Linear(200, 200),\n            nn.LeakyReLU(0.2),\n            nn.Linear(200, num_classes),\n            nn.Tanh()\n        )\n        \n    def forward(self, x, y):\n        x, y = x.view(x.size(0), -1), y.view(y.size(0), -1).float()\n        v = torch.cat((x, y), 1) # v: [input, label] concatenated vector\n        y_ = self.layer(v)\n        y_ = y_.view(x.size(0), 1, 28, 28)\n        return y_","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"031872ce378fc23ece1058a30b6bb0716f126cfb"},"cell_type":"code","source":"D = to_cuda(Discriminator())\nG = to_cuda(Generator())","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"1521fe42b371068b5c4729bda3d747c6778bf978"},"cell_type":"code","source":"# MNIST images have a single channel, so mean and std take one value each\ntransform = transforms.Compose([transforms.ToTensor(),\n                                transforms.Normalize(mean=(0.5,),\n                                std=(0.5,))]\n)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"32e4b5aac2ae8ac0790053185bcba95a9278e6ad"},"cell_type":"code","source":"mnist = datasets.MNIST(root='../input/mnist/mnist/', train=True, transform=transform, 
download=True)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"85bf28a52cd8c3a0bd7d5c7e5693c6bfd986f2fb"},"cell_type":"code","source":"batch_size = 64\ncondition_size = 10","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"5c24f47cdef0e4019d582a73c407180bb9f80fe0"},"cell_type":"code","source":"data_loader = DataLoader(dataset=mnist, batch_size=batch_size, shuffle=True, drop_last=True)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"034053c3cff4d04b3595b65816a1ab3757358058"},"cell_type":"code","source":"criterion = nn.BCELoss()\nD_opt = torch.optim.Adam(D.parameters())\nG_opt = torch.optim.Adam(G.parameters())","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"b7cbfdd37bf0c0054aa1d55bedbaf961400cfacf","trusted":true},"cell_type":"code","source":"max_epoch = 100 # the generator may need more than 200 epochs to train well\nstep = 0\nn_critic = 5 # the Generator is updated once every n_critic Discriminator steps\nn_noise = 100","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"02eba95d352eb1d11e5b06162864706c4daf8463"},"cell_type":"code","source":"# shape (batch_size, 1) matches the Discriminator output, as nn.BCELoss requires\nD_labels = to_cuda(torch.ones(batch_size, 1)) # Discriminator target for real samples\nD_fakes = to_cuda(torch.zeros(batch_size, 1)) # Discriminator target for fake samples","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"c746c1c1bfa8fb83169269f6631c2f60df27fd2a","scrolled":false},"cell_type":"code","source":"for epoch in range(max_epoch):\n    for idx, (images, labels) in enumerate(data_loader):\n        step += 1\n        # Training Discriminator\n        x = to_cuda(images)\n        y = labels.view(batch_size, 1)\n        y = to_cuda(to_onehot(y))\n        x_outputs = D(x, y)\n        D_x_loss = criterion(x_outputs, D_labels)\n\n        z = to_cuda(torch.randn(batch_size, n_noise))\n        z_outputs = D(G(z, y), y)\n        D_z_loss = criterion(z_outputs, D_fakes)\n        D_loss = D_x_loss + D_z_loss\n        \n        D.zero_grad()\n      
  D_loss.backward()\n        D_opt.step()\n        \n        if step % n_critic == 0:\n            # Training Generator\n            z = to_cuda(torch.randn(batch_size, n_noise))\n            z_outputs = D(G(z, y), y)\n            G_loss = criterion(z_outputs, D_labels)\n\n            G.zero_grad()\n            G_loss.backward()\n            G_opt.step()\n        \n        if step % 1000 == 0:\n            # .item() reads a scalar loss; indexing a 0-dim tensor with [0] fails on PyTorch >= 0.5\n            print('Epoch: {}/{}, Step: {}, D Loss: {}, G Loss: {}'.format(epoch, max_epoch, step, D_loss.item(), G_loss.item()))\n            \n    if epoch % 5 == 0:  # save a sample grid once per epoch, not on every batch\n        G.eval()\n        img = get_sample_image(G)\n        # scipy.misc.imsave was removed from SciPy; save_image (imported above) rescales to [0, 1] with normalize=True\n        save_image(torch.from_numpy(img), '{}_epoch_{}_type1.jpg'.format(MODEL_NAME, epoch), normalize=True)\n        G.train()","execution_count":null,"outputs":[]},{"metadata":{"trusted":true,"_uuid":"4d16ed63dfbac95999e3afe7e41a8ca320297875"},"cell_type":"code","source":"def save_checkpoint(state, file_name='checkpoint.pth.tar'):\n    torch.save(state, file_name)\n    \n# Saving params.\n# torch.save(D.state_dict(), 'D_c.pkl')\n# torch.save(G.state_dict(), 'G_c.pkl')\nsave_checkpoint({'epoch': epoch + 1, 'state_dict':D.state_dict(), 'optimizer' : D_opt.state_dict()}, 'D_dc.pth.tar')\nsave_checkpoint({'epoch': epoch + 1, 'state_dict':G.state_dict(), 'optimizer' : G_opt.state_dict()}, 'G_dc.pth.tar')","execution_count":null,"outputs":[]},{"metadata":{"_uuid":"07063ffde7c645b19b76c49a7ce16a335e793a4f"},"cell_type":"markdown","source":"### If you find this useful, upvote it so I will make more tutorials like this!\n### Thanks for reading...🙏🙏🙏"}],"metadata":{"kernelspec":{"display_name":"Python 3","language":"python","name":"python3"},"language_info":{"name":"python","version":"3.6.6","mimetype":"text/x-python","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3","nbconvert_exporter":"python","file_extension":".py"}},"nbformat":4,"nbformat_minor":1}