\n",
"Small CheatSheet on matrix derivatives (click the triangle to extend)
\n",
"\n",
"$$\\large \\begin{array}{rcl} \n",
"\\frac{\\partial}{\\partial \\textbf{X}} \\textbf{X}^{\\text{T}} \\textbf{A} &=& \\textbf{A} \\\\\n",
"\\frac{\\partial}{\\partial \\textbf{X}} \\textbf{X}^{\\text{T}} \\textbf{A} \\textbf{X} &=& \\left(\\textbf{A} + \\textbf{A}^{\\text{T}}\\right)\\textbf{X} \\\\\n",
"\\frac{\\partial}{\\partial \\textbf{A}} \\textbf{X}^{\\text{T}} \\textbf{A} \\textbf{y} &=& \\textbf{X}^{\\text{T}} \\textbf{y}\\\\\n",
"\\frac{\\partial}{\\partial \\textbf{X}} \\textbf{A}^{-1} &=& -\\textbf{A}^{-1} \\frac{\\partial \\textbf{A}}{\\partial \\textbf{X}} \\textbf{A}^{-1} \n",
"\\end{array}$$\n",
"
\n",
" \n",
"\n",
"\n",
"What we get is:\n",
"\n",
"$$\\Large \\begin{array}{rcl} \\frac{\\partial \\mathcal{L}}{\\partial \\textbf{w}} &=& \\frac{\\partial}{\\partial \\textbf{w}} \\frac{1}{2n} \\left( \\textbf{y}^{\\text{T}} \\textbf{y} -2\\textbf{y}^{\\text{T}} \\textbf{X} \\textbf{w} + \\textbf{w}^{\\text{T}} \\textbf{X}^{\\text{T}} \\textbf{X} \\textbf{w}\\right) \\\\\n",
"&=& \\frac{1}{2n} \\left(-2 \\textbf{X}^{\\text{T}} \\textbf{y} + 2\\textbf{X}^{\\text{T}} \\textbf{X} \\textbf{w}\\right)\n",
"\\end{array}$$\n",
"\n",
"$$\\Large \\begin{array}{rcl} \\frac{\\partial \\mathcal{L}}{\\partial \\textbf{w}} = 0 &\\Leftrightarrow& \\frac{1}{2n} \\left(-2 \\textbf{X}^{\\text{T}} \\textbf{y} + 2\\textbf{X}^{\\text{T}} \\textbf{X} \\textbf{w}\\right) = 0 \\\\\n",
"&\\Leftrightarrow& -\\textbf{X}^{\\text{T}} \\textbf{y} + \\textbf{X}^{\\text{T}} \\textbf{X} \\textbf{w} = 0 \\\\\n",
"&\\Leftrightarrow& \\textbf{X}^{\\text{T}} \\textbf{X} \\textbf{w} = \\textbf{X}^{\\text{T}} \\textbf{y} \\\\\n",
"&\\Leftrightarrow& \\textbf{w} = \\left(\\textbf{X}^{\\text{T}} \\textbf{X}\\right)^{-1} \\textbf{X}^{\\text{T}} \\textbf{y}\n",
"\\end{array}$$\n",
"\n",
"Bearing in mind all the definitions and conditions described above, we can say that, based on the