\n",
"\n",
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"final_animation"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2021-03-01T15:16:28.416633Z",
"start_time": "2021-03-01T15:16:28.271580Z"
}
},
"outputs": [],
"source": [
"plt.rcParams[\"animation.html\"] = \"none\"\n",
"plt.ion()\n",
"fig.clf()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Adaline: The math behind the train formula"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"The training process consists of updating the connection weights by a $\\Delta$ that improves the network's predictive ability:\n",
"\n",
"$\n",
"w = w + \\Delta\n",
"$\n",
"\n",
" \n",
"\n",
"Using gradient descent we compute $\\Delta$ based on:\n",
"- A cost function $J(w)$ that expresses how large the network's error is for a given weight value;\n",
"- A learning rate parameter $\\mu$ that determines the step size to apply to weight updates.\n",
"\n",
"$\n",
"w = w - \\mu \\cdot \\frac{\\partial}{\\partial w}J(w)\n",
"$\n",
"\n",
" \n",
"\n",
"In the context of Adaline we can use the Mean Squared Error (MSE) as the cost (error) function, where $i$ indexes the samples of the training dataset, $n$ is the number of samples, $y$ is the network's prediction for sample $i$ and $t$ is the corresponding target (expected) output:\n",
"\n",
"$J(w) = MSE = \\frac{1}{n}\\sum_{i=1}^{n}(y-t)^2$\n",
"\n",
"$\n",
"w = w - \\mu \\cdot \\frac{\\partial}{\\partial w}\\frac{1}{n}\\sum_{i} (y - t)^2\n",
"$\n",
"\n",
"The derivative of a sum is the sum of the derivatives, and the constant factor $\\frac{1}{n}$ can be moved outside the derivative:\n",
"\n",
"$\n",
"\\Leftarrow\\kern-4pt\\Rightarrow w = w - \\mu \\cdot \\frac{1}{n}\\sum_{i} \\frac{\\partial}{\\partial w} (y - t)^2\n",
"$\n",
"\n",
"Applying the power rule combined with the chain rule (for $f(x) = g(x)^p$ we have $f'(x) = p \\cdot g(x)^{p-1} \\cdot g'(x)$):\n",
"\n",
"$\n",
"\\Leftarrow\\kern-4pt\\Rightarrow w = w - \\mu \\cdot \\frac{1}{n}\\sum_{i} 2(y-t)\\frac{\\partial}{\\partial w} (y - t)\n",
"$\n",
"\n",
"$y$ is in fact the output of the network, so we can expand it in order to get $w$ explicitly and compute the derivative:\n",
"\n",
"$\n",
"\\Leftarrow\\kern-4pt\\Rightarrow w = w - \\mu \\cdot \\frac{1}{n}\\sum_{i} 2(y-t)\\frac{\\partial}{\\partial w} (f(wx + b) - t)\n",
"$\n",
"\n",
"Adaline's activation function $f(z)$ is the linear function, meaning $f(z)=z$:\n",
"\n",
"$\n",
"\\Leftarrow\\kern-4pt\\Rightarrow w = w - \\mu \\cdot \\frac{1}{n}\\sum_{i} 2(y-t)\\frac{\\partial}{\\partial w} (wx + b - t)\n",
"$\n",
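"\n",
"Since $x$, $b$ and $t$ do not depend on $w$, we have $\\frac{\\partial}{\\partial w}(wx + b - t) = x$:\n",
"\n",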
"$\n",
"\\Leftarrow\\kern-4pt\\Rightarrow w = w - \\mu \\cdot \\frac{1}{n}\\sum_{i} 2(y-t) \\cdot x\n",
"$\n",
"\n",
" \n",
"\n",
"Final formula (for a single weight, applied once per sample of the dataset, so the average over the $n$ samples is dropped):\n",
"\n",
"$\n",
"w = w - \\mu \\cdot 2(y-t) \\cdot x\n",
"$\n",
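"\n",
"As a quick sanity check of the formula above, here is one update step worked through by hand. The numbers are illustrative assumptions, not values taken from this notebook: $w = 0.5$, $b = 0.1$ (kept fixed), $x = 2$, $t = 1$ and $\\mu = 0.1$.\n",
"\n",
"$\n",
"y = wx + b = 0.5 \\cdot 2 + 0.1 = 1.1\n",
"$\n",
"\n",
"$\n",
"w = w - \\mu \\cdot 2(y-t) \\cdot x = 0.5 - 0.1 \\cdot 2 \\cdot (1.1 - 1) \\cdot 2 = 0.5 - 0.04 = 0.46\n",
"$\n",
"\n",
"The prediction was above the target, so the weight decreases. With the new weight the prediction becomes $0.46 \\cdot 2 + 0.1 = 1.02$, which is closer to $t$.\n",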
"\n",
"Bear in mind that slightly different versions of this formula exist and are equally valid. For example, one can define the cost as $\\frac{1}{2n}\\sum_{i=1}^{n}(y-t)^2$, in which case the factor $2$ cancels out in the derivative."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Multilayer Network: The math behind the train formula"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"ExecuteTime": {
"end_time": "2021-03-01T15:16:28.435929Z",
"start_time": "2021-03-01T15:16:28.423985Z"
}
},
"outputs": [],
"source": [
"# TODO"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Gradient Descent - Visualizing the different flavours with an Adaline network\n",
"\n",
"From the data processing point of view, three flavours of gradient descent are commonly used in industry: batch training, mini-batch training and stochastic gradient descent (SGD)."
]
},
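{
"cell_type": "markdown",
"metadata": {},
"source": [
"In terms of the Adaline update rule derived earlier, the three flavours differ only in how many samples contribute to each weight update. The following is a sketch; the mini-batch $B$ and its size $m$ are notation introduced here for illustration.\n",
"\n",
"Batch (all $n$ samples per update):\n",
"\n",
"$\n",
"w = w - \\mu \\cdot \\frac{1}{n}\\sum_{i=1}^{n} 2(y-t) \\cdot x\n",
"$\n",
"\n",
"Mini-batch (a subset $B$ of $m$ samples per update, with $1 < m < n$):\n",
"\n",
"$\n",
"w = w - \\mu \\cdot \\frac{1}{m}\\sum_{i \\in B} 2(y-t) \\cdot x\n",
"$\n",
"\n",
"Stochastic (a single sample per update):\n",
"\n",
"$\n",
"w = w - \\mu \\cdot 2(y-t) \\cdot x\n",
"$"
]
},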
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"ExecuteTime": {
"end_time": "2021-03-01T15:16:28.476667Z",
"start_time": "2021-03-01T15:16:28.447730Z"
}
},
"outputs": [],
"source": [
"# Build a 1-D toy dataset: two Gaussian clusters, one per class\n",
"n_samples = 2000  # number of samples per class\n",
"\n",
"data_dictionary = {\n",
" 'x' : np.concatenate((np.random.normal(12, 1, n_samples), np.random.normal(3, 0.5, n_samples)), axis=None),\n",
" 'class' : ['blue']*n_samples + ['orange']*n_samples\n",
"}\n",
"\n",
"dataset = pd.DataFrame(data_dictionary)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"ExecuteTime": {
"end_time": "2021-03-01T15:16:29.093551Z",
"start_time": "2021-03-01T15:16:28.480623Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"text/plain": [
"