{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Derivatives with Respect to Vectors and Matrices\n", "\n", "

2018.01.02 조준우 metamath@gmail.com

\n", "\n", "
\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Derivatives between scalars and vectors, scalars and matrices, and vectors and vectors\n", "\n", "### Derivative of a scalar with respect to a scalar\n", "\n", "- The ordinary case; no separate explanation is needed\n", "\n", "### Derivative of a scalar with respect to a vector\n", "\n", "- $\\mathbf{x}: n \\times 1$\n", "\n", "- Numerator layout\n", "\n", "$$\n", "\\frac{\\partial \\, x}{\\partial \\, \\mathbf{x}} = \\begin{bmatrix}\n", "\\dfrac{\\partial x}{\\partial x_{1}} & \\dfrac{\\partial x}{\\partial x_{2}} & \\cdots & \\dfrac{\\partial x}{\\partial x_{n}}\n", "\\end{bmatrix}\n", "$$\n", "\n", "- Denominator layout\n", "\n", "$$\n", "\\frac{\\partial \\, x}{\\partial \\, \\mathbf{x}} = \\begin{bmatrix}\n", "\\dfrac{\\partial x}{\\partial x_{1}} \\\\\n", "\\dfrac{\\partial x}{\\partial x_{2}} \\\\\n", "\\vdots \\\\\n", "\\dfrac{\\partial x}{\\partial x_{n}} \\\\\n", "\\end{bmatrix}\n", "$$\n", "\n", "- Which layout to use is up to the author, as long as it is applied consistently\n", "\n", "### Derivative of a scalar with respect to a matrix\n", "\n", "- $\\mathbf{X} : m \\times n$\n", "\n", "$$\n", "\\frac{\\partial \\, x}{\\partial \\, \\mathbf{X}} =\n", "\\begin{bmatrix}\n", "\\dfrac{\\partial x}{\\partial X_{11}} & \\dfrac{\\partial x}{\\partial X_{12}} & \\cdots & \\dfrac{\\partial x}{\\partial X_{1n}} \\\\\n", "\\dfrac{\\partial x}{\\partial X_{21}} & \\dfrac{\\partial x}{\\partial X_{22}} & \\cdots & \\dfrac{\\partial x}{\\partial X_{2n}} \\\\\n", "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", "\\dfrac{\\partial x}{\\partial X_{m1}} & \\dfrac{\\partial x}{\\partial X_{m2}} & \\cdots & \\dfrac{\\partial x}{\\partial X_{mn}}\n", "\\end{bmatrix}\n", "$$\n", "\n", "\n", "### Derivative of a vector with respect to a scalar\n", "\n", "- Numerator layout\n", "\n", "$$\n", "\\frac{\\partial \\, \\mathbf{x}}{\\partial \\, x} =\n", "\\begin{bmatrix}\n", "\\dfrac{\\partial x_{1}}{\\partial x} \\\\\n", "\\dfrac{\\partial x_{2}}{\\partial x} \\\\\n", "\\vdots \\\\\n", "\\dfrac{\\partial x_{n}}{\\partial x} \n", "\\end{bmatrix}\n", "$$\n", "\n", "- Denominator layout\n", "\n", "$$\n", "\\frac{\\partial \\, \\mathbf{x}}{\\partial \\, x} =\n", "\\begin{bmatrix}\n", "\\dfrac{\\partial 
x_{1}}{\\partial x} &\n", "\\dfrac{\\partial x_{2}}{\\partial x} &\n", "\\cdots &\n", "\\dfrac{\\partial x_{n}}{\\partial x} \n", "\\end{bmatrix}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Derivative of a vector with respect to a vector\n", "\n", "- $\\mathbf{f} : m \\times 1$, $\\mathbf{x} : n \\times 1$\n", "\n", "- Numerator layout (this is the Jacobian)\n", "\n", "$$\n", "\\frac{\\partial \\, \\mathbf{f}}{\\partial \\, \\mathbf{x}} =\n", "\\begin{bmatrix}\n", "\\dfrac{\\partial f_{1}}{\\partial x_{1}} & \\dfrac{\\partial f_{1}}{\\partial x_{2}} & \\cdots & \\dfrac{\\partial f_{1}}{\\partial x_{n}} \\\\\n", "\\dfrac{\\partial f_{2}}{\\partial x_{1}} & \\dfrac{\\partial f_{2}}{\\partial x_{2}} & \\cdots & \\dfrac{\\partial f_{2}}{\\partial x_{n}} \\\\\n", "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", "\\dfrac{\\partial f_{m}}{\\partial x_{1}} & \\dfrac{\\partial f_{m}}{\\partial x_{2}} & \\cdots & \\dfrac{\\partial f_{m}}{\\partial x_{n}}\n", "\\end{bmatrix}\n", "$$\n", "\n", "- Denominator layout\n", "\n", "$$\n", "\\frac{\\partial \\, \\mathbf{f}}{\\partial \\, \\mathbf{x}} =\n", "\\begin{bmatrix}\n", "\\dfrac{\\partial f_{1}}{\\partial x_{1}} & \\dfrac{\\partial f_{2}}{\\partial x_{1}} & \\cdots & \\dfrac{\\partial f_{m}}{\\partial x_{1}} \\\\\n", "\\dfrac{\\partial f_{1}}{\\partial x_{2}} & \\dfrac{\\partial f_{2}}{\\partial x_{2}} & \\cdots & \\dfrac{\\partial f_{m}}{\\partial x_{2}} \\\\\n", "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", "\\dfrac{\\partial f_{1}}{\\partial x_{n}} & \\dfrac{\\partial f_{2}}{\\partial x_{n}} & \\cdots & \\dfrac{\\partial f_{m}}{\\partial x_{n}}\n", "\\end{bmatrix}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Chain rule for vector-by-vector derivatives\n", "\n", "For three vector variables $\\mathbf{x}$, $\\mathbf{y}$, $\\mathbf{z}$ related by $\\mathbf{y} = f(\\mathbf{x})$ and $\\mathbf{z} = g(\\mathbf{y})$, consider $\\dfrac{\\partial \\, \\mathbf{z}}{\\partial \\, \\mathbf{x}}$\n", "\n", "$$\n", "\\mathbf{x} = \\begin{bmatrix}\n", "x_{1} \\\\\n", " x_{2} \\\\\n", "\\vdots \\\\\n", 
"x_{n} \\\\\n", "\\end{bmatrix} \\qquad\n", "\\mathbf{y} = \\begin{bmatrix}\n", "y_{1} \\\\\n", "y_{2} \\\\\n", "\\vdots \\\\\n", "y_{r} \\\\\n", "\\end{bmatrix} \\qquad\n", "\\mathbf{z} = \\begin{bmatrix}\n", "z_{1} \\\\\n", "z_{2} \\\\\n", "\\vdots \\\\\n", "z_{m} \\\\\n", "\\end{bmatrix}\n", "$$\n", "\n", "- Numerator layout\n", "\n", "$$\n", "\\frac{\\partial \\, \\mathbf{z}}{\\partial \\, \\mathbf{x}} =\n", "\\begin{bmatrix}\n", "\\dfrac{\\partial z_{1}}{\\partial x_{1}} & \\dfrac{\\partial z_{1}}{\\partial x_{2}} & \\cdots & \\dfrac{\\partial z_{1}}{\\partial x_{n}} \\\\\n", "\\dfrac{\\partial z_{2}}{\\partial x_{1}} & \\dfrac{\\partial z_{2}}{\\partial x_{2}} & \\cdots & \\dfrac{\\partial z_{2}}{\\partial x_{n}} \\\\\n", "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", "\\dfrac{\\partial z_{m}}{\\partial x_{1}} & \\dfrac{\\partial z_{m}}{\\partial x_{2}} & \\cdots & \\dfrac{\\partial z_{m}}{\\partial x_{n}}\n", "\\end{bmatrix}\n", "$$\n", "\n", "- By the chain rule each element is as follows, where summation over the repeated index $k = 1, \\dots, r$ is implied\n", "\n", "$$\n", "\\frac{\\partial\\, z_i}{\\partial \\, x_j} = \\frac{\\partial\\, z_i}{\\partial \\, y_k}\\frac{\\partial \\, y_k}{\\partial \\, x_j}\n", "$$\n", "\n", "- Applying the chain rule to every element and factoring the result into a matrix product gives \n", "\n", "$$\n", "\\begin{align}\n", "\\frac{\\partial \\, \\mathbf{z}}{\\partial \\, \\mathbf{x}} &=\n", "\\begin{bmatrix}\n", "\\dfrac{\\partial z_{1}}{\\partial y_k} \\dfrac{\\partial y_k}{\\partial x_{1}} & \n", "\\dfrac{\\partial z_{1}}{\\partial y_k} \\dfrac{\\partial y_k}{\\partial x_{2}} & \n", "\\cdots & \n", "\\dfrac{\\partial z_{1}}{\\partial y_k} \\dfrac{\\partial y_k}{\\partial x_{n}} \\\\\n", "\\dfrac{\\partial z_{2}}{\\partial y_k} \\dfrac{\\partial y_k}{\\partial x_{1}} & \n", "\\dfrac{\\partial z_{2}}{\\partial y_k} \\dfrac{\\partial y_k}{\\partial x_{2}} & \n", "\\cdots & \n", "\\dfrac{\\partial z_{2}}{\\partial y_k} \\dfrac{\\partial y_k}{\\partial x_{n}} \\\\\n", "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", "\\dfrac{\\partial z_{m}}{\\partial y_k} \\dfrac{\\partial y_k}{\\partial x_{1}} 
& \n", "\\dfrac{\\partial z_{m}}{\\partial y_k} \\dfrac{\\partial y_k}{\\partial x_{2}} & \n", "\\cdots &\n", "\\dfrac{\\partial z_{m}}{\\partial y_k} \\dfrac{\\partial y_k}{\\partial x_{n}}\n", "\\end{bmatrix} \\\\[10pt] &=\n", "\\begin{bmatrix}\n", "\\dfrac{\\partial z_{1}}{\\partial y_{1}} & \\dfrac{\\partial z_{1}}{\\partial y_{2}} & \\cdots & \\dfrac{\\partial z_{1}}{\\partial y_{r}} \\\\\n", "\\dfrac{\\partial z_{2}}{\\partial y_{1}} & \\dfrac{\\partial z_{2}}{\\partial y_{2}} & \\cdots & \\dfrac{\\partial z_{2}}{\\partial y_{r}} \\\\\n", "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", "\\dfrac{\\partial z_{m}}{\\partial y_{1}} & \\dfrac{\\partial z_{m}}{\\partial y_{2}} & \\cdots & \\dfrac{\\partial z_{m}}{\\partial y_{r}}\n", "\\end{bmatrix} \\,\n", "\\begin{bmatrix}\n", "\\dfrac{\\partial y_{1}}{\\partial x_{1}} & \\dfrac{\\partial y_{1}}{\\partial x_{2}} & \\cdots & \\dfrac{\\partial y_{1}}{\\partial x_{n}} \\\\\n", "\\dfrac{\\partial y_{2}}{\\partial x_{1}} & \\dfrac{\\partial y_{2}}{\\partial x_{2}} & \\cdots & \\dfrac{\\partial y_{2}}{\\partial x_{n}} \\\\\n", "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", "\\dfrac{\\partial y_{r}}{\\partial x_{1}} & \\dfrac{\\partial y_{r}}{\\partial x_{2}} & \\cdots & \\dfrac{\\partial y_{r}}{\\partial x_{n}}\n", "\\end{bmatrix}\\\\[10pt] &=\n", "\\frac{\\partial \\, \\mathbf{z}}{\\partial \\, \\mathbf{y}} \\frac{\\partial \\, \\mathbf{y}}{\\partial \\, \\mathbf{x}}\n", "\\end{align}\n", "$$\n", "\n", "- In numerator layout this looks just like the chain rule for scalar derivatives\n", "\n", "- For scalars, \n", "\n", "$$\n", "\\frac{\\partial \\, z}{\\partial \\, x} = \\frac{\\partial \\, z}{\\partial \\, y} \\frac{\\partial \\, y }{\\partial \\, x} = \\frac{\\partial \\, y }{\\partial \\, x} \\frac{\\partial \\, z}{\\partial \\, y} \n", "$$\n", "\n", "the factors can be written in either order, but by convention the chain rule is written left to right as in the first form. 
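
As a minimal numerical sketch of the numerator-layout chain rule above, the snippet below approximates each Jacobian with central differences and compares the Jacobian of the composition with the matrix product. The functions `f` and `g`, the weights `W1` and `W2`, the sizes and the helper `jacobian` are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def jacobian(f, x, eps=1e-6):
    # Central-difference Jacobian in numerator layout:
    # row i corresponds to output i, column j to input j.
    cols = []
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        cols.append((f(x + e) - f(x - e)) / (2 * eps))
    return np.stack(cols, axis=1)

rng = np.random.default_rng(0)
n, r, m = 3, 4, 2                       # sizes of x, y, z (assumed for the demo)
W1 = rng.normal(size=(r, n))
W2 = rng.normal(size=(m, r))

f = lambda x: np.tanh(W1 @ x)           # y = f(x)
g = lambda y: np.sin(W2 @ y)            # z = g(y)

x0 = rng.normal(size=n)
J_zx = jacobian(lambda x: g(f(x)), x0)  # m x n
J_zy = jacobian(g, f(x0))               # m x r
J_yx = jacobian(f, x0)                  # r x n

print(J_zx.shape)
print(np.allclose(J_zx, J_zy @ J_yx, atol=1e-6))
```

Unlike the scalar case, the two factors cannot be swapped here: `J_zy` is m x r and `J_yx` is r x n, so in general only the order `J_zy @ J_yx` has conforming shapes.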
\n", "\n", "- In denominator layout, however, it becomes the following.\n", "\n", "$$\n", "\\begin{align}\n", "\\left(\\frac{\\partial \\, \\mathbf{z}}{\\partial \\, \\mathbf{x}}\\right)^{\\text{T}} \n", "&= \\left(\\frac{\\partial \\, \\mathbf{z}}{\\partial \\, \\mathbf{y}} \\frac{\\partial \\, \\mathbf{y}}{\\partial \\, \\mathbf{x}} \\right)^{\\text{T}} \\\\[5pt]\n", "&= \\left( \\frac{\\partial \\, \\mathbf{y}}{\\partial \\, \\mathbf{x}} \\right)^{\\text{T}} \\left(\\frac{\\partial \\, \\mathbf{z}}{\\partial \\, \\mathbf{y}} \\right)^{\\text{T}}\n", "\\end{align}\n", "$$\n", "\n", "- Note that in denominator layout the chain rule is built up from right to left" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Derivative of a matrix with respect to a scalar\n", "\n", "- $\\mathbf{X} : m \\times n$\n", "\n", "$$\n", "\\frac{\\partial \\, \\mathbf{X}}{\\partial \\, x} =\n", "\\begin{bmatrix}\n", "\\dfrac{\\partial X_{11}}{\\partial x} & \\dfrac{\\partial X_{12}}{\\partial x} & \\cdots & \\dfrac{\\partial X_{1n}}{\\partial x} \\\\\n", "\\dfrac{\\partial X_{21}}{\\partial x} & \\dfrac{\\partial X_{22}}{\\partial x} & \\cdots & \\dfrac{\\partial X_{2n}}{\\partial x} \\\\\n", "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", "\\dfrac{\\partial X_{m1}}{\\partial x} & \\dfrac{\\partial X_{m2}}{\\partial x} & \\cdots & \\dfrac{\\partial X_{mn}}{\\partial x}\n", "\\end{bmatrix}\n", "$$\n", "\n", "## New operators\n", "\n", "- Define the operators needed to differentiate a vector or a matrix with respect to a matrix\n", "\n", "### Kronecker product[1]\n", "\n", "\n", "\n", "- For $\\mathbf{A}: p \\times q $ and $\\mathbf{B}: r \\times s $, the Kronecker product of the two matrices is defined as follows and is a $pr \\times qs$ matrix.\n", "\n", "$$\n", "\\mathbf{A} \\otimes \\mathbf{B} = \\{ a_{ij}\\mathbf{B} \\}\n", "$$\n", "\n", "If $\\mathbf{A}$ is a $2 \\times 2$ matrix, this becomes\n", "\n", "$$\n", "\\begin{bmatrix} a_{11} & a_{12} \\\\ a_{21} & a_{22} \\end{bmatrix} \\otimes \\mathbf{B} = \\begin{bmatrix} a_{11}\\mathbf{B} & a_{12}\\mathbf{B} \\\\ a_{21}\\mathbf{B} & a_{22}\\mathbf{B} \\end{bmatrix}\n", "$$\n", "\n", "- A concrete example\n", "\n", "$$\n", "\\mathbf{A} = 
\\begin{bmatrix} \\color{RoyalBlue}{3} & \\color{OrangeRed}{5} \\\\ \\color{YellowGreen}{9} & \\color{Goldenrod}{7} \\end{bmatrix} \\qquad \\text{and} \\qquad \\mathbf{b} = \\begin{bmatrix} 4 \\\\ 5\\\\ 6 \\end{bmatrix}\n", "$$\n", "\n", "$$\n", "\\mathbf{A} \\otimes \\mathbf{b} = \\begin{bmatrix}\n", "\\color{RoyalBlue}{3 \\cdot 4} & \\color{OrangeRed}{5 \\cdot 4} \\\\\n", "\\color{RoyalBlue}{3 \\cdot 5} & \\color{OrangeRed}{5 \\cdot 5} \\\\\n", "\\color{RoyalBlue}{3 \\cdot 6} & \\color{OrangeRed}{5 \\cdot 6} \\\\\n", "\\color{YellowGreen}{9 \\cdot 4} & \\color{Goldenrod}{7 \\cdot 4} \\\\\n", "\\color{YellowGreen}{9 \\cdot 5} & \\color{Goldenrod}{7 \\cdot 5} \\\\\n", "\\color{YellowGreen}{9 \\cdot 6} & \\color{Goldenrod}{7 \\cdot 6} \n", "\\end{bmatrix}= \\begin{bmatrix}\n", "\\color{RoyalBlue}{12} & \\color{OrangeRed}{20} \\\\\n", "\\color{RoyalBlue}{15} & \\color{OrangeRed}{25} \\\\\n", "\\color{RoyalBlue}{18} & \\color{OrangeRed}{30} \\\\\n", "\\color{YellowGreen}{36} & \\color{Goldenrod}{28} \\\\\n", "\\color{YellowGreen}{45} & \\color{Goldenrod}{35} \\\\\n", "\\color{YellowGreen}{54} & \\color{Goldenrod}{42} \n", "\\end{bmatrix}\n", "$$\n", "\n", "\n", "### vec and vec-transpose[2]\n", "\n", "- vec is the operator that vectorizes a matrix by stacking its column vectors on top of one another\n", "\n", "$$\n", "\\text{vec}\\left( \\begin{bmatrix} \\color{RoyalBlue}{a_{11}} & \\color{OrangeRed}{a_{12}} \\\\ \\color{RoyalBlue}{a_{21}} & \\color{OrangeRed}{a_{22}} \\end{bmatrix} \\right) = \n", "\\begin{bmatrix} \\color{RoyalBlue}{a_{11} \\\\ a_{21}} \\\\ \\color{OrangeRed}{a_{12} \\\\ a_{22}} \\end{bmatrix}\n", "$$\n", "\n", "- vec-transpose: a generalization of the transpose\n", "\n", "$$\n", "\\begin{bmatrix}\n", "\\color{RoyalBlue}{a_{11}} & \\color{Goldenrod}{a_{12}} \\\\\n", "\\color{RoyalBlue}{a_{21}} & \\color{Goldenrod}{a_{22}} \\\\\n", "\\color{OrangeRed}{a_{31}} & \\color{Violet}{a_{32}} \\\\\n", "\\color{OrangeRed}{a_{41}} & \\color{Violet}{a_{42}} \\\\\n", "\\color{YellowGreen}{a_{51}} & \\color{Emerald}{a_{52}} \\\\\n", 
"\\color{YellowGreen}{a_{61}} & \\color{Emerald}{a_{62}} \\\\\n", "\\end{bmatrix}^{(2)} =\n", "\\begin{bmatrix}\n", "\\color{RoyalBlue}{a_{11}} & \\color{OrangeRed}{a_{31}} & \\color{YellowGreen}{a_{51}} \\\\\n", "\\color{RoyalBlue}{a_{21}} & \\color{OrangeRed}{a_{41}} & \\color{YellowGreen}{a_{61}} \\\\\n", "\\color{Goldenrod}{a_{12}} & \\color{Violet}{a_{32}} & \\color{Emerald}{a_{52}} \\\\\n", "\\color{Goldenrod}{a_{22}} & \\color{Violet}{a_{42}} & \\color{Emerald}{a_{62}}\n", "\\end{bmatrix}\n", "$$\n", "\n", "$$\n", "\\begin{bmatrix}\n", "\\color{RoyalBlue}{a_{11}} & \\color{Goldenrod}{a_{12}} \\\\\n", "\\color{RoyalBlue}{a_{21}} & \\color{Goldenrod}{a_{22}} \\\\\n", "\\color{RoyalBlue}{a_{31}} & \\color{Goldenrod}{a_{32}} \\\\\n", "\\color{OrangeRed}{a_{41}} & \\color{Violet}{a_{42}} \\\\\n", "\\color{OrangeRed}{a_{51}} & \\color{Violet}{a_{52}} \\\\\n", "\\color{OrangeRed}{a_{61}} & \\color{Violet}{a_{62}} \\\\\n", "\\end{bmatrix}^{(3)} =\n", "\\begin{bmatrix}\n", "\\color{RoyalBlue}{a_{11}} & \\color{OrangeRed}{a_{41}} \\\\\n", "\\color{RoyalBlue}{a_{21}} & \\color{OrangeRed}{a_{51}} \\\\\n", "\\color{RoyalBlue}{a_{31}} & \\color{OrangeRed}{a_{61}} \\\\\n", "\\color{Goldenrod}{a_{12}} & \\color{Violet}{a_{42}} \\\\\n", "\\color{Goldenrod}{a_{22}} & \\color{Violet}{a_{52}} \\\\\n", "\\color{Goldenrod}{a_{32}} & \\color{Violet}{a_{62}}\n", "\\end{bmatrix}\n", "$$\n", "\n", "- The (1)-transpose is the ordinary transpose: $\\mathbf{A}^{(1)} = \\mathbf{A}^{\\text{T}}$\n", "\n", "- The (number of rows)-transpose is the same as vec: $\\mathbf{A}^{(rows(\\mathbf{A}))} = \\text{vec}(\\mathbf{A})$\n", "\n", "- The index $(r)$ of the transpose must be a natural number that divides the number of rows\n", "\n", "- Hence a row vector admits only the (1)-transpose" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Derivative of a vector with respect to a matrix\n", "\n", "- $\\mathbf{x} : p \\times 1$\n", "- $\\mathbf{X} : m \\times n$\n", "- Use the Kronecker product to reduce the problem to derivatives of the vector with respect to each scalar entry, then differentiate the vector in numerator layout\n", "\n", "$$\n", "\\begin{align}\n", "\\frac{\\partial \\, \\mathbf{x}}{\\partial \\, \\mathbf{X}} = \\frac{\\partial}{\\partial \\, 
\\mathbf{X}} \\otimes \\mathbf{x} \n", "&= \\begin{bmatrix}\n", "\\dfrac{\\partial}{\\partial \\, X_{11}} & \\dfrac{\\partial}{\\partial \\, X_{12}} & \\cdots & \\dfrac{\\partial}{\\partial \\, X_{1n}} \\\\\n", "\\dfrac{\\partial}{\\partial \\, X_{21}} & \\dfrac{\\partial}{\\partial \\, X_{22}} & \\cdots & \\dfrac{\\partial}{\\partial \\, X_{2n}} \\\\\n", "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", "\\dfrac{\\partial}{\\partial \\, X_{m1}} & \\dfrac{\\partial}{\\partial \\, X_{m2}} & \\cdots & \\dfrac{\\partial}{\\partial \\, X_{mn}}\n", "\\end{bmatrix} \\otimes \\mathbf{x} \\\\[5pt]\n", "&=\\begin{bmatrix}\n", "\\dfrac{\\partial \\, \\mathbf{x}}{\\partial \\, X_{11}} & \\dfrac{\\partial \\, \\mathbf{x}}{\\partial \\, X_{12}} & \\cdots & \\dfrac{\\partial \\, \\mathbf{x}}{\\partial \\, X_{1n}} \\\\\n", "\\dfrac{\\partial \\, \\mathbf{x}}{\\partial \\, X_{21}} & \\dfrac{\\partial \\, \\mathbf{x}}{\\partial \\, X_{22}} & \\cdots & \\dfrac{\\partial \\, \\mathbf{x}}{\\partial \\, X_{2n}} \\\\\n", "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", "\\dfrac{\\partial \\, \\mathbf{x}}{\\partial \\, X_{m1}} & \\dfrac{\\partial \\, \\mathbf{x}}{\\partial \\, X_{m2}} & \\cdots & \\dfrac{\\partial \\, \\mathbf{x}}{\\partial \\, X_{mn}}\n", "\\end{bmatrix} \\\\[5pt]\n", "&=\\begin{bmatrix}\n", "\\begin{pmatrix} \\dfrac{\\partial \\, x_1}{\\partial \\, X_{11}} \\\\ \\dfrac{\\partial \\, x_2}{\\partial \\, X_{11}} \\\\ \\vdots \\\\ \\dfrac{\\partial \\, x_p}{\\partial \\, X_{11}} \\end{pmatrix} & \n", "\\begin{pmatrix} \\dfrac{\\partial \\, x_1}{\\partial \\, X_{12}} \\\\ \\dfrac{\\partial \\, x_2}{\\partial \\, X_{12}} \\\\ \\vdots \\\\ \\dfrac{\\partial \\, x_p}{\\partial \\, X_{12}} \\end{pmatrix} &\n", "\\cdots &\n", "\\begin{pmatrix} \\dfrac{\\partial \\, x_1}{\\partial \\, X_{1n}} \\\\ \\dfrac{\\partial \\, x_2}{\\partial \\, X_{1n}} \\\\ \\vdots \\\\ \\dfrac{\\partial \\, x_p}{\\partial \\, X_{1n}} \\end{pmatrix} \\\\\n", "\\begin{pmatrix} \\dfrac{\\partial \\, x_1}{\\partial \\, X_{21}} \\\\ \\dfrac{\\partial \\, x_2}{\\partial \\, X_{21}} \\\\ \\vdots \\\\ \\dfrac{\\partial \\, x_p}{\\partial \\, X_{21}} \\end{pmatrix} & \n", 
"\\begin{pmatrix} \\dfrac{\\partial \\, x_1}{\\partial \\, X_{22}} \\\\ \\dfrac{\\partial \\, x_2}{\\partial \\, X_{22}} \\\\ \\vdots \\\\ \\dfrac{\\partial \\, x_p}{\\partial \\, X_{22}} \\end{pmatrix} &\n", "\\cdots &\n", "\\begin{pmatrix} \\dfrac{\\partial \\, x_1}{\\partial \\, X_{2n}} \\\\ \\dfrac{\\partial \\, x_2}{\\partial \\, X_{2n}} \\\\ \\vdots \\\\ \\dfrac{\\partial \\, x_p}{\\partial \\, X_{2n}} \\end{pmatrix} \\\\\n", "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", "\\begin{pmatrix} \\dfrac{\\partial \\, x_1}{\\partial \\, X_{m1}} \\\\ \\dfrac{\\partial \\, x_2}{\\partial \\, X_{m1}} \\\\ \\vdots \\\\ \\dfrac{\\partial \\, x_p}{\\partial \\, X_{m1}} \\end{pmatrix} & \n", "\\begin{pmatrix} \\dfrac{\\partial \\, x_1}{\\partial \\, X_{m2}} \\\\ \\dfrac{\\partial \\, x_2}{\\partial \\, X_{m2}} \\\\ \\vdots \\\\ \\dfrac{\\partial \\, x_p}{\\partial \\, X_{m2}} \\end{pmatrix} &\n", "\\cdots &\n", "\\begin{pmatrix} \\dfrac{\\partial \\, x_1}{\\partial \\, X_{mn}} \\\\ \\dfrac{\\partial \\, x_2}{\\partial \\, X_{mn}} \\\\ \\vdots \\\\ \\dfrac{\\partial \\, x_p}{\\partial \\, X_{mn}} \\end{pmatrix} \\\\\n", "\\end{bmatrix} \n", "\\end{align}\n", "$$\n", "\n", "- It is also possible to apply the vec operator to the denominator and differentiate a vector with respect to a vector (verified in the example below)\n", "\n", "- Both approaches apply in the same way when differentiating a matrix with respect to a matrix" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example[3]\n", "- $\\mathbf{X} : m \\times n $, $\\mathbf{b} : n \\times 1$, $\\mathbf{Xb} : m \\times 1$\n", "\n", "- Differentiating with the denominator kept as a matrix\n", "\n", "$$\n", "\\begin{align}\n", "\\frac{\\partial \\, \\mathbf{Xb}}{\\partial \\, \\mathbf{X}} \n", "&= \\frac{\\partial }{\\partial \\mathbf{X}} \\otimes \\mathbf{Xb} \\\\[5pt]\n", "&= \\begin{bmatrix}\n", "\\dfrac{\\partial \\, \\mathbf{Xb}}{\\partial \\, X_{11}} & \\dfrac{\\partial \\, \\mathbf{Xb}}{\\partial \\, X_{12}} & \\cdots & \\dfrac{\\partial \\, \\mathbf{Xb}}{\\partial \\, X_{1n}} \\\\\n", "\\dfrac{\\partial \\, \\mathbf{Xb}}{\\partial \\, X_{21}} & \\dfrac{\\partial \\, \\mathbf{Xb}}{\\partial \\, X_{22}} & \\cdots & \\dfrac{\\partial \\, \\mathbf{Xb}}{\\partial \\, X_{2n}} \\\\\n", "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", "\\dfrac{\\partial \\, \\mathbf{Xb}}{\\partial \\, X_{m1}} & 
\\dfrac{\\partial \\, \\mathbf{Xb}}{\\partial \\, X_{m2}} & \\cdots & \\dfrac{\\partial \\, \\mathbf{Xb}}{\\partial \\, X_{mn}}\n", "\\end{bmatrix} \\\\[5pt]\n", "&= \\begin{bmatrix}\n", "\\begin{pmatrix} \\dfrac{\\partial \\, (\\mathbf{Xb})_{1}}{\\partial \\, X_{11}}\\\\\\dfrac{\\partial \\, (\\mathbf{Xb})_{2}}{\\partial \\, X_{11}}\\\\\\vdots\\\\\\dfrac{\\partial \\, (\\mathbf{Xb})_{m}}{\\partial \\, X_{11}} \\end{pmatrix} & \\begin{pmatrix} \\dfrac{\\partial \\, (\\mathbf{Xb})_{1}}{\\partial \\, X_{12}}\\\\\\dfrac{\\partial \\, (\\mathbf{Xb})_{2}}{\\partial \\, X_{12}}\\\\\\vdots\\\\\\dfrac{\\partial \\, (\\mathbf{Xb})_{m}}{\\partial \\, X_{12}} \\end{pmatrix} & \\cdots & \\begin{pmatrix} \\dfrac{\\partial \\, (\\mathbf{Xb})_{1}}{\\partial \\, X_{1n}}\\\\\\dfrac{\\partial \\, (\\mathbf{Xb})_{2}}{\\partial \\, X_{1n}}\\\\\\vdots\\\\\\dfrac{\\partial \\, (\\mathbf{Xb})_{m}}{\\partial \\, X_{1n}} \\end{pmatrix} \\\\\n", "\\begin{pmatrix} \\dfrac{\\partial \\, (\\mathbf{Xb})_{1}}{\\partial \\, X_{21}}\\\\\\dfrac{\\partial \\, (\\mathbf{Xb})_{2}}{\\partial \\, X_{21}}\\\\\\vdots\\\\\\dfrac{\\partial \\, (\\mathbf{Xb})_{m}}{\\partial \\, X_{21}} \\end{pmatrix} & \\begin{pmatrix} \\dfrac{\\partial \\, (\\mathbf{Xb})_{1}}{\\partial \\, X_{22}}\\\\\\dfrac{\\partial \\, (\\mathbf{Xb})_{2}}{\\partial \\, X_{22}}\\\\\\vdots\\\\\\dfrac{\\partial \\, (\\mathbf{Xb})_{m}}{\\partial \\, X_{22}} \\end{pmatrix} & \\cdots & \\begin{pmatrix} \\dfrac{\\partial \\, (\\mathbf{Xb})_{1}}{\\partial \\, X_{2n}}\\\\\\dfrac{\\partial \\, (\\mathbf{Xb})_{2}}{\\partial \\, X_{2n}}\\\\\\vdots\\\\\\dfrac{\\partial \\, (\\mathbf{Xb})_{m}}{\\partial \\, X_{2n}} \\end{pmatrix} \\\\\n", "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", "\\begin{pmatrix} \\dfrac{\\partial \\, (\\mathbf{Xb})_{1}}{\\partial \\, X_{m1}}\\\\\\dfrac{\\partial \\, (\\mathbf{Xb})_{2}}{\\partial \\, X_{m1}}\\\\\\vdots\\\\\\dfrac{\\partial \\, (\\mathbf{Xb})_{m}}{\\partial \\, X_{m1}} \\end{pmatrix} & \\begin{pmatrix} 
\\dfrac{\\partial \\, (\\mathbf{Xb})_{1}}{\\partial \\, X_{m2}}\\\\\\dfrac{\\partial \\, (\\mathbf{Xb})_{2}}{\\partial \\, X_{m2}}\\\\\\vdots\\\\\\dfrac{\\partial \\, (\\mathbf{Xb})_{m}}{\\partial \\, X_{m2}} \\end{pmatrix} & \\cdots & \\begin{pmatrix} \\dfrac{\\partial \\, (\\mathbf{Xb})_{1}}{\\partial \\, X_{mn}}\\\\\\dfrac{\\partial \\, (\\mathbf{Xb})_{2}}{\\partial \\, X_{mn}}\\\\\\vdots\\\\\\dfrac{\\partial \\, (\\mathbf{Xb})_{m}}{\\partial \\, X_{mn}} \\end{pmatrix}\n", "\\end{bmatrix} \\\\[5pt]\n", "&= \\begin{bmatrix}\n", "\\begin{pmatrix} b_1 \\\\ 0 \\\\ \\vdots \\\\ 0 \\end{pmatrix} & \n", "\\begin{pmatrix} b_2 \\\\ 0 \\\\ \\vdots \\\\ 0 \\end{pmatrix} & \\cdots & \n", "\\begin{pmatrix} b_n \\\\ 0 \\\\ \\vdots \\\\ 0 \\end{pmatrix} \\\\\n", "\\begin{pmatrix} 0 \\\\ b_1 \\\\ \\vdots \\\\ 0 \\end{pmatrix} & \n", "\\begin{pmatrix} 0 \\\\ b_2 \\\\ \\vdots \\\\ 0 \\end{pmatrix} & \\cdots & \n", "\\begin{pmatrix} 0 \\\\ b_n \\\\ \\vdots \\\\ 0 \\end{pmatrix} \\\\\n", "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", "\\begin{pmatrix} 0 \\\\ 0 \\\\ \\vdots \\\\ b_1 \\end{pmatrix} & \n", "\\begin{pmatrix} 0 \\\\ 0 \\\\ \\vdots \\\\ b_2 \\end{pmatrix} & \\cdots & \n", "\\begin{pmatrix} 0 \\\\ 0 \\\\ \\vdots \\\\ b_n \\end{pmatrix} \n", "\\end{bmatrix}\n", "\\end{align}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Differentiating with the denominator replaced by $\\text{vec}(\\mathbf{X})$ (denominator layout)\n", "\n", "$$\n", "\\begin{align}\n", "\\frac{\\partial \\, \\mathbf{Xb}}{\\partial \\, \\mathbf{X}} &= \\frac{\\partial \\, \\mathbf{Xb}}{\\partial \\, \\left(\\text{vec}(\\mathbf{X}) \\right)} \\\\[5pt]\n", "&= \\begin{bmatrix}\n", "\\dfrac{\\partial \\, (\\mathbf{Xb})_{1}}{\\partial \\, X_{11}} & \\dfrac{\\partial \\, (\\mathbf{Xb})_{2}}{\\partial \\, X_{11}} & \\cdots & \\dfrac{\\partial \\, (\\mathbf{Xb})_{m}}{\\partial \\, X_{11}} \\\\\n", "\\dfrac{\\partial \\, (\\mathbf{Xb})_{1}}{\\partial \\, X_{21}} & \\dfrac{\\partial \\, (\\mathbf{Xb})_{2}}{\\partial \\, X_{21}} & 
\\cdots & \\dfrac{\\partial \\, (\\mathbf{Xb})_{m}}{\\partial \\, X_{21}} \\\\\n", "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", "\\dfrac{\\partial \\, (\\mathbf{Xb})_{1}}{\\partial \\, X_{mn}} & \\dfrac{\\partial \\, (\\mathbf{Xb})_{2}}{\\partial \\, X_{mn}} & \\cdots & \\dfrac{\\partial \\, (\\mathbf{Xb})_{m}}{\\partial \\, X_{mn}}\n", "\\end{bmatrix} \\\\[5pt]\n", "&= \\begin{bmatrix}\n", "b_1 & 0 & \\cdots & 0 \\\\\n", "0 & b_1 & \\cdots & 0 \\\\\n", "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", "0 & 0 & \\cdots & b_1 \\\\ \n", "b_2 & 0 & \\cdots & 0 \\\\\n", "0 & b_2 & \\cdots & 0 \\\\\n", "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", "0 & 0 & \\cdots & b_2 \\\\ - & - & - & - \\\\\n", "\\vdots & \\vdots & \\vdots & \\vdots \\\\ - & - & - & - \\\\\n", "b_n & 0 & \\cdots & 0 \\\\\n", "0 & b_n & \\cdots & 0 \\\\\n", "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", "0 & 0 & \\cdots & b_n\n", "\\end{bmatrix}\n", "\\end{align}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Do the two results differ? 
Applying the (m)-transpose to the first result makes it equal to the second.\n", "\n", "$$\n", "\\begin{bmatrix}\n", "\\begin{pmatrix} \\color{RoyalBlue}{b_1 \\\\ 0 \\\\ \\vdots \\\\ 0} \\end{pmatrix} & \n", "\\begin{pmatrix} \\color{OrangeRed}{b_2 \\\\ 0 \\\\ \\vdots \\\\ 0} \\end{pmatrix} & \\cdots & \n", "\\begin{pmatrix} \\color{YellowGreen}{b_n \\\\ 0 \\\\ \\vdots \\\\ 0} \\end{pmatrix} \\\\\n", "\\begin{pmatrix} \\color{RoyalBlue}{0 \\\\ b_1 \\\\ \\vdots \\\\ 0} \\end{pmatrix} & \n", "\\begin{pmatrix} \\color{OrangeRed}{0 \\\\ b_2 \\\\ \\vdots \\\\ 0} \\end{pmatrix} & \\cdots & \n", "\\begin{pmatrix} \\color{YellowGreen}{0 \\\\ b_n \\\\ \\vdots \\\\ 0} \\end{pmatrix} \\\\\n", "\\color{RoyalBlue}{\\vdots} & \\vdots & \\ddots & \\vdots \\\\\n", "\\begin{pmatrix} \\color{RoyalBlue}{0 \\\\ 0 \\\\ \\vdots \\\\ b_1} \\end{pmatrix} & \n", "\\begin{pmatrix} \\color{OrangeRed}{0 \\\\ 0 \\\\ \\vdots \\\\ b_2} \\end{pmatrix} & \\cdots & \n", "\\begin{pmatrix} \\color{YellowGreen}{0 \\\\ 0 \\\\ \\vdots \\\\ b_n} \\end{pmatrix} \n", "\\end{bmatrix}^{(m)} = \\begin{bmatrix}\n", "\\color{RoyalBlue}{b_1} & \\color{RoyalBlue}{0} & \\color{RoyalBlue}{\\cdots} & \\color{RoyalBlue}{0} \\\\\n", "\\color{RoyalBlue}{0} & \\color{RoyalBlue}{b_1} & \\color{RoyalBlue}{\\cdots} & \\color{RoyalBlue}{0} \\\\\n", "\\color{RoyalBlue}{\\vdots} & \\color{RoyalBlue}{\\vdots} & \\color{RoyalBlue}{\\ddots} & \\color{RoyalBlue}{\\vdots} \\\\\n", "\\color{RoyalBlue}{0} & \\color{RoyalBlue}{0} & \\color{RoyalBlue}{\\cdots} & \\color{RoyalBlue}{b_1} \\\\ \n", "\\color{OrangeRed}{b_2} & \\color{OrangeRed}{0} & \\color{OrangeRed}{\\cdots} & \\color{OrangeRed}{0} \\\\\n", "\\color{OrangeRed}{0} & \\color{OrangeRed}{b_2} & \\color{OrangeRed}{\\cdots} & \\color{OrangeRed}{0} \\\\\n", "\\color{OrangeRed}{\\vdots} & \\color{OrangeRed}{\\vdots} & \\color{OrangeRed}{\\ddots} & \\color{OrangeRed}{\\vdots} \\\\\n", "\\color{OrangeRed}{0} & \\color{OrangeRed}{0} & \\color{OrangeRed}{\\cdots} & \\color{OrangeRed}{b_2} \\\\ - & - & - & - \\\\\n", "\\vdots & 
\\vdots & \\vdots & \\vdots \\\\ - & - & - & - \\\\\n", "\\color{YellowGreen}{b_n} & \\color{YellowGreen}{0} & \\color{YellowGreen}{\\cdots} & \\color{YellowGreen}{0} \\\\\n", "\\color{YellowGreen}{0} & \\color{YellowGreen}{b_n} & \\color{YellowGreen}{\\cdots} & \\color{YellowGreen}{0} \\\\\n", "\\color{YellowGreen}{\\vdots} & \\color{YellowGreen}{\\vdots} & \\color{YellowGreen}{\\ddots} & \\color{YellowGreen}{\\vdots} \\\\\n", "\\color{YellowGreen}{0} & \\color{YellowGreen}{0} & \\color{YellowGreen}{\\cdots} & \\color{YellowGreen}{b_n}\n", "\\end{bmatrix}\n", "$$\n", "\n", "- Therefore the following holds\n", "\n", "$$\n", "\\left( \\frac{\\partial }{\\partial \\mathbf{X}} \\otimes \\mathbf{Xb} \\right)^{(m)} = \\frac{\\partial \\, \\mathbf{Xb}}{\\partial \\, \\left(\\text{vec}(\\mathbf{X}) \\right)} \n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Derivative of a product[1]\n", "\n", "For $\\mathbf{X} : m \\times n$, $\\mathbf{Y} : n \\times r$ , $\\mathbf{Z} : p \\times q$, the derivative is an $mp \\times rq$ matrix\n", "\n", "$$\n", "\\frac{\\partial \\, (\\mathbf{XY})}{\\partial \\, \\mathbf{Z}} = \\left( \\frac{\\partial \\, \\mathbf{X} }{\\partial \\, \\mathbf{Z}} \\right) \\left( \\mathbf{I}_{q} \\otimes \\mathbf{Y} \\right) + \\left( \\mathbf{I}_{p} \\otimes \\mathbf{X} \\right)\\left( \\frac{\\partial \\, \\mathbf{Y}}{\\partial \\, \\mathbf{Z}} \\right)\n", "$$\n", "\n", "The product rule still applies when differentiating with respect to a matrix, but care is needed to make the dimensions conform. \n", "\n", "The derivative above does not take the naive form \n", "\n", "$$\n", "\\frac{\\partial \\, (\\mathbf{XY})}{\\partial \\, \\mathbf{Z}} = \\left( \\frac{\\partial \\, \\mathbf{X} }{\\partial \\, \\mathbf{Z}} \\right) \\mathbf{Y} + \\mathbf{X} \\left( \\frac{\\partial \\, \\mathbf{Y}}{\\partial \\, \\mathbf{Z}} \\right)\n", "$$\n", "\n", "because $\\frac{\\partial \\, \\mathbf{X} }{\\partial \\, \\mathbf{Z}}$ is $mp \\times nq$, so the $n \\times r$ matrix $\\mathbf{Y}$ cannot be multiplied onto it directly. 
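
The Kronecker-form product rule above can be checked numerically. The following is a minimal sketch under stated assumptions: the helper `kron_derivative` builds the block matrix whose (i, j) block is the central-difference derivative with respect to `Z[i, j]`, and the functions `X_of` and `Y_of` together with the constant matrices `A`, `B`, `C`, `D` are hypothetical choices made only for this demo.

```python
import numpy as np

def kron_derivative(F, Z, eps=1e-6):
    # Block matrix whose (i, j) block is dF/dZ[i, j] (central differences).
    p, q = Z.shape
    blocks = []
    for i in range(p):
        row = []
        for j in range(q):
            E = np.zeros_like(Z)
            E[i, j] = eps
            row.append((F(Z + E) - F(Z - E)) / (2 * eps))
        blocks.append(row)
    return np.block(blocks)

rng = np.random.default_rng(0)
m, n, r, p, q = 2, 3, 4, 2, 3           # sizes assumed for the demo
A, B = rng.normal(size=(m, p)), rng.normal(size=(q, n))
C, D = rng.normal(size=(n, p)), rng.normal(size=(q, r))

X_of = lambda Z: np.sin(A @ Z @ B)      # X(Z): m x n
Y_of = lambda Z: C @ Z @ D              # Y(Z): n x r

Z0 = rng.normal(size=(p, q))
lhs = kron_derivative(lambda Z: X_of(Z) @ Y_of(Z), Z0)
rhs = (kron_derivative(X_of, Z0) @ np.kron(np.eye(q), Y_of(Z0))
       + np.kron(np.eye(p), X_of(Z0)) @ kron_derivative(Y_of, Z0))

print(lhs.shape)
print(np.allclose(lhs, rhs, atol=1e-5))
```

Here `kron_derivative(X_of, Z0)` has shape (m*p, n*q), so it conforms with `np.kron(np.eye(q), Y_of(Z0))` of shape (n*q, r*q) but not with `Y_of(Z0)` itself.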
\n", "\n", "To see what form the right-multiplied $\\mathbf{Y}$ must take for the product to remain valid, take $\\mathbf{X} : 1 \\times 2$, $\\mathbf{Y} : 2 \\times 1$, $\\mathbf{Z} : 2 \\times 2$ as an example:\n", "\n", "$$\n", "\\mathbf{X}\\mathbf{Y} = \n", "\\begin{bmatrix} \\color{RoyalBlue}{X_1} & \\color{OrangeRed}{X_2} \\end{bmatrix} \n", "\\begin{bmatrix} \\color{RoyalBlue}{Y_1} \\\\ \\color{OrangeRed}{Y_2} \\end{bmatrix} = \\color{RoyalBlue}{X_1} \\color{RoyalBlue}{Y_1} + \\color{OrangeRed}{X_2} \\color{OrangeRed}{Y_2}\n", "$$\n", "\n", "so the product of $\\mathbf{X}$ and $\\mathbf{Y}$ must pair the entries as $X_i Y_i$.\n", "\n", "For $\\mathbf{Y}$ to multiply the derivative of $\\mathbf{X}$ shown below in this $X_i Y_i$ pattern, \n", "\n", "$$\n", "\\frac{\\partial \\, \\mathbf{X} }{\\partial \\, \\mathbf{Z}} =\n", "\\begin{bmatrix}\n", "\\dfrac{\\partial \\, \\color{RoyalBlue}{X_1}}{\\partial \\, Z_{11}} & \\dfrac{\\partial \\, \\color{OrangeRed}{X_2}}{\\partial \\, Z_{11}} & \\dfrac{\\partial \\, \\color{RoyalBlue}{X_1}}{\\partial \\, Z_{12}} & \\dfrac{\\partial \\, \\color{OrangeRed}{X_2}}{\\partial \\, Z_{12}} \\\\\n", "\\dfrac{\\partial \\, \\color{RoyalBlue}{X_1}}{\\partial \\, Z_{21}} & \\dfrac{\\partial \\, \\color{OrangeRed}{X_2}}{\\partial \\, Z_{21}} & \\dfrac{\\partial \\, \\color{RoyalBlue}{X_1}}{\\partial \\, Z_{22}} & \\dfrac{\\partial \\, \\color{OrangeRed}{X_2}}{\\partial \\, Z_{22}}\n", "\\end{bmatrix}\n", "$$\n", "\n", "$\\mathbf{Y}$ must be expanded as follows.\n", "\n", "$$\n", "\\begin{bmatrix}\n", "\\color{RoyalBlue}{Y_1} & 0 \\\\ \\color{OrangeRed}{Y_2} & 0 \\\\ 0 & \\color{RoyalBlue}{Y_1} \\\\ 0 & \\color{OrangeRed}{Y_2}\n", "\\end{bmatrix} = \\mathbf{I}_{2} \\otimes \\mathbf{Y}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Product rule example: posed by 임성빈 in the TensorFlow Korea group, https://www.facebook.com/groups/TensorFlowKR/permalink/581553265519069/\n", "\n", "With $\\mathbf{X} : m \\times n = m \\times 1$, $\\mathbf{Y} : n \\times r = 1 \\times 1$, $\\mathbf{Z} : p \\times q = p \\times 1$, consider $\\dfrac{\\partial \\, 
(\\mathbf{XY})}{\\partial \\, \\mathbf{Z}}$\n", "\n", "- The numerator $\\mathbf{XY}$ is an $m \\times 1$ vector, so this amounts to differentiating an $m \\times 1$ vector with respect to a $p \\times 1$ vector, and the result is the Jacobian matrix below (numerator layout).\n", "\n", "\n", "$$\n", "\\frac{\\partial \\, (\\mathbf{XY})}{\\partial \\, \\mathbf{Z}}=\n", "\\begin{bmatrix}\n", "\\dfrac{\\partial \\, \\left(\\mathbf{XY}\\right)_1}{\\partial \\, \\mathbf{Z}_1} & \\dfrac{\\partial \\, \\left(\\mathbf{XY}\\right)_1}{\\partial \\, \\mathbf{Z}_2} & \\cdots & \\dfrac{\\partial \\, \\left(\\mathbf{XY}\\right)_1}{\\partial \\, \\mathbf{Z}_p} \\\\\n", "\\dfrac{\\partial \\, \\left(\\mathbf{XY}\\right)_2}{\\partial \\, \\mathbf{Z}_1} & \\dfrac{\\partial \\, \\left(\\mathbf{XY}\\right)_2}{\\partial \\, \\mathbf{Z}_2} & \\cdots & \\dfrac{\\partial \\, \\left(\\mathbf{XY}\\right)_2}{\\partial \\, \\mathbf{Z}_p} \\\\\n", "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", "\\dfrac{\\partial \\, \\left(\\mathbf{XY}\\right)_m}{\\partial \\, \\mathbf{Z}_1} & \\dfrac{\\partial \\, \\left(\\mathbf{XY}\\right)_m}{\\partial \\, \\mathbf{Z}_2} & \\cdots & \\dfrac{\\partial \\, \\left(\\mathbf{XY}\\right)_m}{\\partial \\, \\mathbf{Z}_p}\n", "\\end{bmatrix}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Differentiating with the vec operator gives the same result as above, since the numerator and the denominator are already vectors.\n", "\n", "- Expressed with the Kronecker product approach, however, it becomes somewhat more involved:\n", "\n", "$$\n", "\\begin{align}\n", "\\frac{\\partial \\, (\\mathbf{XY})}{\\partial \\, \\mathbf{Z}} &= \\left( \\frac{\\partial \\, \\mathbf{X} }{\\partial \\, \\mathbf{Z}} \\right) \\left( \\mathbf{I}_{1} \\otimes \\mathbf{Y} \\right) + \\left( \\mathbf{I}_{p} \\otimes \\mathbf{X} \\right)\\left( \\frac{\\partial \\, \\mathbf{Y}}{\\partial \\, \\mathbf{Z}} \\right) \\\\\n", "&=\\left( \\frac{\\partial \\,}{\\partial \\, \\mathbf{Z}} \\otimes \\mathbf{X} \\right) \\left( \\mathbf{I}_{1} \\otimes \\mathbf{Y} \\right) + \\left( \\mathbf{I}_{p} \\otimes \\mathbf{X} \\right)\\left( \\frac{\\partial \\, }{\\partial \\, \\mathbf{Z}} \\otimes 
\\mathbf{Y} \\right)\n", "\\end{align}\n", "$$\n", "\n", "Here $\\left( \\frac{\\partial \\, \\mathbf{X} }{\\partial \\, \\mathbf{Z}} \\right)$ is a vector-by-vector derivative and could be written as a Jacobian, but when the derivative is computed with the Kronecker product, every derivative must be written consistently in Kronecker-product form. Otherwise the dimensions no longer conform. Expanding every derivative with the Kronecker product gives\n", "\n", "$$\n", "\\underbrace{\\left( \\frac{\\partial \\,}{\\partial \\, \\mathbf{Z}} \\otimes \\mathbf{X} \\right)}_{mp \\times 1} \\underbrace{\\left( \\mathbf{I}_{1} \\otimes \\mathbf{Y} \\right)}_{1 \\times 1} + \\underbrace{\\left( \\mathbf{I}_{p} \\otimes \\mathbf{X} \\right)}_{mp \\times p} \\underbrace{\\left( \\frac{\\partial \\, }{\\partial \\, \\mathbf{Z}} \\otimes \\mathbf{Y} \\right)}_{p \\times 1}\n", "$$\n", "\n", "so the result is $mp \\times 1$, and applying the (m)-transpose to it yields the $m \\times p$ Jacobian. Computing each factor directly,\n", "\n", "$$\n", "\\begin{align}\n", "\\frac{\\partial \\, \\mathbf{X} }{\\partial \\, \\mathbf{Z}} \n", "&= \\frac{\\partial \\,}{\\partial \\, \\mathbf{Z}} \\otimes \\mathbf{X} = \\begin{bmatrix}\n", "\\dfrac{\\partial}{\\partial \\, Z_{1}} \\\\\n", "\\dfrac{\\partial}{\\partial \\, Z_{2}} \\\\\n", "\\vdots \\\\\n", "\\dfrac{\\partial}{\\partial \\, Z_{p}}\n", "\\end{bmatrix}\\otimes \\mathbf{X} = \\left[\n", "\\begin{array}{c}\n", "\\color{RoyalBlue}{\\begin{matrix}\n", "\\dfrac{\\partial \\, X_1}{\\partial \\, Z_1} \\\\\n", "\\dfrac{\\partial \\, X_2}{\\partial \\, Z_1} \\\\\n", "\\vdots \\\\\n", "\\dfrac{\\partial \\, X_m}{\\partial \\, Z_1}\n", "\\end{matrix}} \\\\\n", "\\hline\n", "\\color{OrangeRed}{\\begin{matrix}\n", "\\dfrac{\\partial \\, X_1}{\\partial \\, Z_2} \\\\\n", "\\dfrac{\\partial \\, X_2}{\\partial \\, Z_2} \\\\\n", "\\vdots \\\\\n", "\\dfrac{\\partial \\, X_m}{\\partial \\, Z_2}\n", "\\end{matrix}} \\\\\n", "\\hline\n", "\\vdots \\\\\n", "\\hline\n", "\\color{YellowGreen}{\\begin{matrix}\n", "\\dfrac{\\partial \\, X_1}{\\partial \\, Z_p} \\\\\n", "\\dfrac{\\partial \\, X_2}{\\partial \\, Z_p} \\\\\n", "\\vdots \\\\\n", "\\dfrac{\\partial 
\\, X_m}{\\partial \\, Z_p}\n", "\\end{matrix}}\n", "\\end{array}\n", "\\right]\n", "\\end{align}\n", "$$\n", "\n", "$$\n", "\\mathbf{I}_{1} \\otimes \\mathbf{Y} = \\left[ Y_1 \\right]\n", "$$\n", "\n", "$$\n", "\\begin{align}\n", "\\mathbf{I}_p \\otimes \\mathbf{X} &= \\begin{bmatrix}\n", "1 & 0 & \\cdots & 0 \\\\\n", "0 & 1 & \\cdots & 0 \\\\\n", "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", "0 & 0 & \\cdots & 1\n", "\\end{bmatrix} \\otimes \\mathbf{X} = \\left[\n", "\\begin{array}{c}\n", "\\begin{matrix}\n", " X_1 & 0 & \\cdots & 0 \\\\\n", " X_2 & 0 & \\cdots & 0 \\\\\n", " \\vdots & \\vdots & \\ddots & \\vdots \\\\\n", " X_m & 0 & \\cdots & 0 \n", "\\end{matrix} \\\\\n", "\\hline \n", "\\begin{matrix}\n", " 0 & X_1 & \\cdots & 0 \\\\\n", " 0 & X_2 & \\cdots & 0 \\\\\n", " \\vdots & \\vdots & \\ddots & \\vdots \\\\\n", " 0 & X_m & \\cdots & 0 \n", "\\end{matrix} \\\\\n", "\\hline\n", "\\vdots \\\\\n", "\\hline \n", "\\begin{matrix}\n", " 0 & 0 & \\cdots & X_1 \\\\\n", " 0 & 0 & \\cdots & X_2 \\\\\n", " \\vdots & \\vdots & \\ddots & \\vdots \\\\\n", " 0 & 0 & \\cdots & X_m \n", "\\end{matrix}\n", "\\end{array}\n", "\\right]\n", "\\end{align}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$\n", "\\frac{\\partial \\, \\mathbf{Y}}{\\partial \\, \\mathbf{Z}} = \\frac{\\partial}{\\partial \\, \\mathbf{Z}} \\otimes \\mathbf{Y} = \\begin{bmatrix}\n", "\\dfrac{\\partial \\, Y_1}{\\partial \\, Z_1} \\\\\n", "\\dfrac{\\partial \\, Y_1}{\\partial \\, Z_2} \\\\\n", "\\vdots \\\\\n", "\\dfrac{\\partial \\, Y_1}{\\partial \\, Z_p}\n", "\\end{bmatrix}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Multiplying out each term gives\n", "\n", "$$\n", "\\begin{align}\n", "&\\underbrace{\\left( \\frac{\\partial \\,}{\\partial \\, \\mathbf{Z}} \\otimes \\mathbf{X} \\right)}_{mp \\times 1} \\underbrace{\\left( \\mathbf{I}_{1} \\otimes \\mathbf{Y} \\right)}_{1 \\times 1} + \\underbrace{\\left( \\mathbf{I}_{p} \\otimes \\mathbf{X} \\right)}_{mp \\times p} 
\\underbrace{\\left( \\frac{\\partial \\, }{\\partial \\, \\mathbf{Z}} \\otimes \\mathbf{Y} \\right)}_{p \\times 1} \\\\\n", "=& \\, \\left[\n", "\\begin{array}{c}\n", "\\color{RoyalBlue}{\\begin{matrix}\n", "Y_1 \\dfrac{\\partial \\, X_1}{\\partial \\, Z_1} \\\\\n", "Y_1 \\dfrac{\\partial \\, X_2}{\\partial \\, Z_1} \\\\\n", "\\vdots \\\\\n", "Y_1 \\dfrac{\\partial \\, X_m}{\\partial \\, Z_1}\n", "\\end{matrix}} \\\\\n", "\\hline\n", "\\color{OrangeRed}{\\begin{matrix}\n", "Y_1 \\dfrac{\\partial \\, X_1}{\\partial \\, Z_2} \\\\\n", "Y_1 \\dfrac{\\partial \\, X_2}{\\partial \\, Z_2} \\\\\n", "\\vdots \\\\\n", "Y_1 \\dfrac{\\partial \\, X_m}{\\partial \\, Z_2}\n", "\\end{matrix}} \\\\\n", "\\hline\n", "\\vdots \\\\\n", "\\hline\n", "\\color{YellowGreen}{\\begin{matrix}\n", "Y_1 \\dfrac{\\partial \\, X_1}{\\partial \\, Z_p} \\\\\n", "Y_1 \\dfrac{\\partial \\, X_2}{\\partial \\, Z_p} \\\\\n", "\\vdots \\\\\n", "Y_1 \\dfrac{\\partial \\, X_m}{\\partial \\, Z_p}\n", "\\end{matrix}}\n", "\\end{array}\n", "\\right] + \n", "\\left[\n", "\\begin{array}{c}\n", "\\color{RoyalBlue}{\\begin{matrix}\n", "X_1 \\dfrac{\\partial \\, Y_1}{\\partial \\, Z_1} \\\\\n", "X_2 \\dfrac{\\partial \\, Y_1}{\\partial \\, Z_1} \\\\\n", "\\vdots \\\\\n", "X_m \\dfrac{\\partial \\, Y_1}{\\partial \\, Z_1}\n", "\\end{matrix}} \\\\\n", "\\hline\n", "\\color{OrangeRed}{\\begin{matrix}\n", "X_1 \\dfrac{\\partial \\, Y_1}{\\partial \\, Z_2} \\\\\n", "X_2 \\dfrac{\\partial \\, Y_1}{\\partial \\, Z_2} \\\\\n", "\\vdots \\\\\n", "X_m \\dfrac{\\partial \\, Y_1}{\\partial \\, Z_2}\n", "\\end{matrix}} \\\\\n", "\\hline\n", "\\vdots \\\\\n", "\\hline\n", "\\color{YellowGreen}{\\begin{matrix}\n", "X_1 \\dfrac{\\partial \\, Y_1}{\\partial \\, Z_p} \\\\\n", "X_2 \\dfrac{\\partial \\, Y_1}{\\partial \\, Z_p} \\\\\n", "\\vdots \\\\\n", "X_m \\dfrac{\\partial \\, Y_1}{\\partial \\, Z_p}\n", "\\end{matrix}}\n", "\\end{array}\n", "\\right] \\quad = \\quad\n", "\\left[\n", "\\begin{array}{c}\n", 
"\\color{RoyalBlue}{\\begin{matrix}\n", "Y_1 \\dfrac{\\partial \\, X_1}{\\partial \\, Z_1} + X_1 \\dfrac{\\partial \\, Y_1}{\\partial \\, Z_1} \\\\\n", "Y_1 \\dfrac{\\partial \\, X_2}{\\partial \\, Z_1} + X_2 \\dfrac{\\partial \\, Y_1}{\\partial \\, Z_1} \\\\\n", "\\vdots \\\\\n", "Y_1 \\dfrac{\\partial \\, X_m}{\\partial \\, Z_1} + X_m \\dfrac{\\partial \\, Y_1}{\\partial \\, Z_1}\n", "\\end{matrix}} \\\\\n", "\\hline\n", "\\color{OrangeRed}{\\begin{matrix}\n", "Y_1 \\dfrac{\\partial \\, X_1}{\\partial \\, Z_2} + X_1 \\dfrac{\\partial \\, Y_1}{\\partial \\, Z_2} \\\\\n", "Y_1 \\dfrac{\\partial \\, X_2}{\\partial \\, Z_2} + X_2 \\dfrac{\\partial \\, Y_1}{\\partial \\, Z_2} \\\\\n", "\\vdots \\\\\n", "Y_1 \\dfrac{\\partial \\, X_m}{\\partial \\, Z_2} + X_m \\dfrac{\\partial \\, Y_1}{\\partial \\, Z_2}\n", "\\end{matrix}} \\\\\n", "\\hline\n", "\\vdots \\\\\n", "\\hline\n", "\\color{YellowGreen}{\\begin{matrix}\n", "Y_1 \\dfrac{\\partial \\, X_1}{\\partial \\, Z_p} + X_1 \\dfrac{\\partial \\, Y_1}{\\partial \\, Z_p} \\\\\n", "Y_1 \\dfrac{\\partial \\, X_2}{\\partial \\, Z_p} + X_2 \\dfrac{\\partial \\, Y_1}{\\partial \\, Z_p} \\\\\n", "\\vdots \\\\\n", "Y_1 \\dfrac{\\partial \\, X_m}{\\partial \\, Z_p} + X_m \\dfrac{\\partial \\, Y_1}{\\partial \\, Z_p}\n", "\\end{matrix}}\n", "\\end{array}\n", "\\right] \\quad = \\quad\n", "\\left[\n", "\\begin{array}{c}\n", "\\color{RoyalBlue}{\\begin{matrix}\n", "\\dfrac{\\partial \\, X_1 Y_1}{\\partial \\, Z_1} \\\\\n", "\\dfrac{\\partial \\, X_2 Y_1}{\\partial \\, Z_1} \\\\\n", "\\vdots \\\\\n", "\\dfrac{\\partial \\, X_m Y_1}{\\partial \\, Z_1}\n", "\\end{matrix}} \\\\\n", "\\hline\n", "\\color{OrangeRed}{\\begin{matrix}\n", "\\dfrac{\\partial \\, X_1 Y_1}{\\partial \\, Z_2} \\\\\n", "\\dfrac{\\partial \\, X_2 Y_1}{\\partial \\, Z_2} \\\\\n", "\\vdots \\\\\n", "\\dfrac{\\partial \\, X_m Y_1}{\\partial \\, Z_2}\n", "\\end{matrix}} \\\\\n", "\\hline\n", "\\vdots \\\\\n", "\\hline\n", "\\color{YellowGreen}{\\begin{matrix}\n", 
"\\dfrac{\\partial \\, X_1 Y_1}{\\partial \\, Z_p} \\\\\n", "\\dfrac{\\partial \\, X_2 Y_1}{\\partial \\, Z_p} \\\\\n", "\\vdots \\\\\n", "\\dfrac{\\partial \\, X_m Y_1}{\\partial \\, Z_p}\n", "\\end{matrix}}\n", "\\end{array}\n", "\\right]\n", "\\end{align}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Taking the (m)-transpose of the final result gives the Jacobian.\n", "\n", "$$\n", "\\left[\\left( \\frac{\\partial \\,}{\\partial \\, \\mathbf{Z}} \\otimes \\mathbf{X} \\right) \\left( \\mathbf{I}_{1} \\otimes \\mathbf{Y} \\right) + \\left( \\mathbf{I}_{p} \\otimes \\mathbf{X} \\right)\\left( \\frac{\\partial \\, }{\\partial \\, \\mathbf{Z}} \\otimes \\mathbf{Y} \\right)\\right]^{(m)} =\n", "\\left[\n", "\\begin{array}{c}\n", "\\color{RoyalBlue}{\\begin{matrix}\n", "\\dfrac{\\partial \\, X_1 Y_1}{\\partial \\, Z_1} \\\\\n", "\\dfrac{\\partial \\, X_2 Y_1}{\\partial \\, Z_1} \\\\\n", "\\vdots \\\\\n", "\\dfrac{\\partial \\, X_m Y_1}{\\partial \\, Z_1}\n", "\\end{matrix}} \\\\\n", "\\hline\n", "\\color{OrangeRed}{\\begin{matrix}\n", "\\dfrac{\\partial \\, X_1 Y_1}{\\partial \\, Z_2} \\\\\n", "\\dfrac{\\partial \\, X_2 Y_1}{\\partial \\, Z_2} \\\\\n", "\\vdots \\\\\n", "\\dfrac{\\partial \\, X_m Y_1}{\\partial \\, Z_2}\n", "\\end{matrix}} \\\\\n", "\\hline\n", "\\vdots \\\\\n", "\\hline\n", "\\color{YellowGreen}{\\begin{matrix}\n", "\\dfrac{\\partial \\, X_1 Y_1}{\\partial \\, Z_p} \\\\\n", "\\dfrac{\\partial \\, X_2 Y_1}{\\partial \\, Z_p} \\\\\n", "\\vdots \\\\\n", "\\dfrac{\\partial \\, X_m Y_1}{\\partial \\, Z_p}\n", "\\end{matrix}}\n", "\\end{array}\n", "\\right] ^{(m)} = \\quad\n", "\\begin{bmatrix}\n", "\\color{RoyalBlue}{\\dfrac{\\partial \\, X_1 Y_1}{\\partial \\, Z_1}} & \\color{OrangeRed}{\\dfrac{\\partial \\, X_1 Y_1}{\\partial \\, Z_2}} & \\cdots & \\color{YellowGreen}{\\dfrac{\\partial \\, X_1 Y_1}{\\partial \\, Z_p}} \\\\\n", "\\color{RoyalBlue}{\\dfrac{\\partial \\, X_2 Y_1}{\\partial \\, Z_1}} & \\color{OrangeRed}{\\dfrac{\\partial \\, X_2 Y_1}{\\partial 
\\, Z_2}} & \\cdots & \\color{YellowGreen}{\\dfrac{\\partial \\, X_2 Y_1}{\\partial \\, Z_p}}\\\\\n", "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", "\\color{RoyalBlue}{\\dfrac{\\partial \\, X_m Y_1}{\\partial \\, Z_1}} & \\color{OrangeRed}{\\dfrac{\\partial \\, X_m Y_1}{\\partial \\, Z_2}} & \\cdots & \\color{YellowGreen}{\\dfrac{\\partial \\, X_m Y_1}{\\partial \\, Z_p}}\n", "\\end{bmatrix} = \\frac{\\partial \\, \\mathbf{XY}}{\\partial \\, \\mathbf{Z}}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Some matrix derivative formulas[4]\n", "\n", "- In machine learning, matrix derivatives appear in the backpropagation algorithm and in deriving the MLE of a normal distribution. The formulas collected below are useful in those cases.\n", "\n", "#### [1] $\\dfrac{\\partial \\, \\mathbf{X}^{-1}}{\\partial \\, x} = -\\mathbf{X}^{-1}\\dfrac{\\partial \\, \\mathbf{X}}{\\partial \\, x}\\mathbf{X}^{-1}$ (matrix cookbook eq.59)\n", "\n", "$$\n", "\\begin{align}\n", "&\\mathbf{X}^{-1} \\mathbf{X} = \\mathbf{I} \\\\[5pt]\n", "&\\mathbf{X}^{-1} \\frac{\\partial \\, \\mathbf{X}}{\\partial \\, x} + \\frac{\\partial \\, \\mathbf{X}^{-1}}{\\partial \\, x} \\mathbf{X} = \\mathbf{0} \\\\[5pt]\n", "& \\frac{\\partial \\, \\mathbf{X}^{-1}}{\\partial \\, x} \\mathbf{X} = - \\mathbf{X}^{-1} \\frac{\\partial \\, \\mathbf{X}}{\\partial \\, x} \\\\[5pt]\n", "&\\frac{\\partial \\, \\mathbf{X}^{-1}}{\\partial \\, x} = - \\mathbf{X}^{-1} \\frac{\\partial \\, \\mathbf{X}}{\\partial \\, x} \\mathbf{X}^{-1}\n", "\\end{align} \n", "$$\n", "\n", "#### [2] $\\dfrac{\\partial \\, }{\\partial \\, \\mathbf{A}} \\text{tr}(\\mathbf{AB}) = \\mathbf{B}^{\\text{T}}$ (matrix cookbook eq.100)\n", "\n", "Since\n", "\n", "$$\n", "\\text{tr}(\\mathbf{AB}) =\\sum_{i} \\sum_{j} (\\mathbf{A})_{ij}(\\mathbf{B})_{ji}\n", "$$\n", "\n", "in index form (writing the summation indices as $k$, $l$ to keep them distinct from the free indices $i$, $j$) we have\n", "\n", "$$\n", "\\dfrac{\\partial \\, }{\\partial \\, (\\mathbf{A})_{ij}} \\sum_{k} \\sum_{l} (\\mathbf{A})_{kl}(\\mathbf{B})_{lk} = (\\mathbf{B})_{ji}\n", "$$\n", "\n", "Therefore\n", "\n", "$$\\dfrac{\\partial \\, }{\\partial \\, \\mathbf{A}} \\text{tr}(\\mathbf{AB}) = 
\\mathbf{B}^{\\text{T}}$$\n", "\n", "In the same way,\n", "\n", "$$\n", "\\dfrac{\\partial \\, }{\\partial \\, (\\mathbf{B})_{ji}} \\sum_{k} \\sum_{l} (\\mathbf{A})_{kl}(\\mathbf{B})_{lk} = (\\mathbf{A})_{ij}\n", "$$\n", "\n", "Since the $(j,i)$ entry of the derivative is $(\\mathbf{A})_{ij}$,\n", "\n", "$$\\dfrac{\\partial \\, }{\\partial \\, \\mathbf{B}} \\text{tr}(\\mathbf{AB}) = \\mathbf{A}^{\\text{T}}$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### [3] $\\dfrac{\\partial \\, \\lvert \\mathbf{X} \\rvert}{\\partial \\, \\mathbf{X}} = \\lvert \\mathbf{X} \\rvert \\left(\\mathbf{X}^{-1}\\right)^{\\text{T}}$ (matrix cookbook eq.49)\n", "\n", "Showing this requires a few preliminaries. First, the inverse can be computed as follows[5]:\n", "\n", "$$\n", "\\mathbf{X}^{-1} = \\frac{1}{\\lvert \\mathbf{X} \\rvert } \\left[ C_{ij} \\right]^{\\text{T}}\n", "$$\n", "\n", "Here $C_{ij}$ is the cofactor defined below, and $M_{ij}$ is the determinant of the submatrix obtained by deleting row $i$ and column $j$ of $\\mathbf{X}$.\n", "\n", "$$\n", "C_{ij} = (-1)^{i+j}M_{ij}\n", "$$\n", "\n", "The transpose of the cofactor matrix, $\\left[ C_{ij} \\right]^{\\text{T}}$, is called the adjugate matrix[6] and is written as\n", "\n", "$$\n", "\\text{adj}(\\mathbf{X}) = \\left[ C_{ij} \\right]^{\\text{T}}\n", "$$\n", "\n", "With this notation the inverse becomes\n", "\n", "$$\n", "\\mathbf{X}^{-1} = \\frac{1}{\\lvert \\mathbf{X} \\rvert } \\text{adj}(\\mathbf{X}) \\tag{1}\n", "$$\n", "\n", "For a canonical differential form, the equivalent derivative form can be written in several cases as follows.[7] They are written here in the layout of formula [2], where the $(i,j)$ entry of $\\frac{dy}{\\text{d}\\mathbf{X}}$ is the derivative with respect to $X_{ij}$; [7] lists them in numerator layout, in which the right-hand sides are transposed.\n", "\n", "$$\n", "\\begin{align}\n", "dy = a dx &\\implies \\frac{dy}{dx} = a \\\\[5pt]\n", "dy = \\mathbf{a}^{\\text{T}} d\\mathbf{x} &\\implies \\frac{dy}{\\text{d}\\mathbf{x}} = \\mathbf{a} \\\\[5pt]\n", "dy = \\text{tr}(\\mathbf{A} \\text{d}\\mathbf{X}) &\\implies \\frac{dy}{\\text{d}\\mathbf{X}} = \\mathbf{A}^{\\text{T}}\n", "\\end{align} \\tag{2}\n", "$$\n", "\n", "Here $dy$, $dx$, and $\\text{d}\\mathbf{X}$ are differentials (infinitesimals) representing infinitesimal changes of the variables, and the ratios of these changes, $\\frac{dy}{dx}$ and $\\frac{dy}{\\text{d}\\mathbf{X}}$, are called derivatives.[8],[9] \n", "\n", "The third identity follows directly from $\\frac{\\partial \\, }{\\partial \\, \\mathbf{B}} 
\\text{tr}(\\mathbf{AB}) = \\mathbf{A}^{\\text{T}}$ shown above:\n", "\n", "$$\n", "\\frac{ \\text{tr}(\\mathbf{A} \\text{d}\\mathbf{X}) }{\\text{d}\\mathbf{X}} = \\frac{ \\text{d}\\left(\\text{tr}(\\mathbf{A} \\mathbf{X})\\right) }{\\text{d}\\mathbf{X}}= \\mathbf{A}^{\\text{T}}\n", "$$\n", "\n", "For the differential of a determinant there is Jacobi's formula[10]. Its proof is too long to reproduce here, so we simply accept the following result.\n", "\n", "$$\n", "\\text{d} \\lvert \\mathbf{X} \\rvert = \\text{tr} \\left( \\text{adj}(\\mathbf{X}) \\text{d}\\mathbf{X} \\right) \\tag{3}\n", "$$\n", "\n", "A detailed proof is given in the Wikipedia article[10].\n", "\n", "With these results the desired derivative follows fairly easily. From equation (1),\n", "\n", "$$\n", "\\begin{align}\n", "\\mathbf{X}^{-1} &= \\frac{1}{\\lvert \\mathbf{X} \\rvert} \\text{adj}(\\mathbf{X}) \\\\[5pt]\n", "\\lvert \\mathbf{X} \\rvert \\mathbf{X}^{-1} &= \\text{adj}(\\mathbf{X}) \\\\[5pt]\n", "\\lvert \\mathbf{X} \\rvert \\mathbf{X}^{-1} \\partial \\mathbf{X} &= \\text{adj}(\\mathbf{X})\\partial \\mathbf{X} \\\\[5pt]\n", "\\text{tr}\\left(\\lvert \\mathbf{X} \\rvert \\mathbf{X}^{-1} \\partial \\mathbf{X}\\right) &= \\text{tr}\\left(\\text{adj}(\\mathbf{X})\\partial \\mathbf{X}\\right)\n", "\\end{align}\n", "$$\n", "\n", "and by equation (3),\n", "\n", "$$\n", "\\partial \\lvert \\mathbf{X} \\rvert = \\text{tr}\\left(\\lvert \\mathbf{X} \\rvert \\mathbf{X}^{-1} \\partial \\mathbf{X}\\right)\n", "$$\n", "\n", "so the third identity of (2) gives\n", "\n", "$$\n", "\\partial \\lvert \\mathbf{X} \\rvert = \\text{tr}\\left(\\lvert \\mathbf{X} \\rvert \\mathbf{X}^{-1} \\partial \\mathbf{X}\\right) \\implies \\frac{ \\partial \\, \\lvert \\mathbf{X} \\rvert}{ \\partial\\mathbf{X}} = \\left( \\lvert \\mathbf{X} \\rvert \\mathbf{X}^{-1} \\right)^{\\text{T}} = \\lvert \\mathbf{X} \\rvert \\left( \\mathbf{X}^{-1} \\right)^{\\text{T}}\n", "$$\n", "\n", "\n", "Written out more explicitly, since $\\text{tr}(\\mathbf{AB}) = \\text{tr}(\\mathbf{BA})$ and $\\frac{\\partial \\, }{\\partial \\, \\mathbf{A}} \\text{tr}(\\mathbf{AB}) = \\mathbf{B}^{\\text{T}}$,\n", "\n", 
"$$\n", "\\frac{ \\partial \\, \\lvert \\mathbf{X} \\rvert}{ \\partial\\mathbf{X}} = \\frac{ \\text{tr}\\left(\\lvert \\mathbf{X} \\rvert \\mathbf{X}^{-1} \\color{RoyalBlue}{ \\partial \\mathbf{X}}\\right) }{\\color{RoyalBlue}{ \\partial \\mathbf{X}}} = \\frac{ \\text{tr}\\left( \\color{RoyalBlue}{ \\partial \\mathbf{X}} \\lvert \\mathbf{X} \\rvert \\mathbf{X}^{-1} \\right) }{\\color{RoyalBlue}{ \\partial \\mathbf{X}}} = \\left( \\lvert \\mathbf{X} \\rvert \\mathbf{X}^{-1} \\right)^{\\text{T}} = \\lvert \\mathbf{X} \\rvert \\left( \\mathbf{X}^{-1} \\right)^{\\text{T}}\n", "$$\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### [4] $\\dfrac{\\partial \\, }{\\partial \\, \\mathbf{X}} \\ln \\lvert \\mathbf{X} \\rvert = \\left( \\mathbf{X}^{-1} \\right)^{\\text{T}}$ (matrix cookbook eq.57)\n", "\n", "Using the result above together with the chain rule,\n", "\n", "$$\n", "\\frac{\\partial \\, }{\\partial \\, \\mathbf{X}} \\ln \\lvert \\mathbf{X} \\rvert = \\frac{1}{ \\lvert \\mathbf{X} \\rvert } \\frac{\\partial \\, \\lvert \\mathbf{X} \\rvert}{\\partial \\, \\mathbf{X}} = \\frac{1}{ \\lvert \\mathbf{X} \\rvert } \\lvert \\mathbf{X} \\rvert \\left(\\mathbf{X}^{-1}\\right)^{\\text{T}}= \\left(\\mathbf{X}^{-1}\\right)^{\\text{T}}\n", "$$\n", "\n", "In numerator layout the same steps give the transposed result, $\\mathbf{X}^{-1}$.\n", "\n", "#### [5] $\\dfrac{\\partial \\, \\mathbf{a}^{\\text{T}} \\mathbf{X} \\mathbf{b}}{\\partial \\, \\mathbf{X}}=\\mathbf{ab}^{\\text{T}}$ (matrix cookbook eq.70)\n", 
"\n", "Assume arbitrary vectors and a matrix with $\\mathbf{a}^{\\text{T}} : 1 \\times m$, $\\mathbf{X} : m \\times n$, and $\\mathbf{b} : n \\times 1$. \n", "\n", "Since $\\mathbf{a}^{\\text{T}} \\mathbf{X} \\mathbf{b}$ is a scalar, taking its trace, $\\text{tr}(\\mathbf{a}^{\\text{T}} \\mathbf{X} \\mathbf{b})$, does not change its value. Using the cyclic property of the trace and $\\frac{\\partial \\, }{\\partial \\, \\mathbf{A}} \\text{tr}(\\mathbf{AB}) = \\mathbf{B}^{\\text{T}}$, the result follows directly.\n", "\n", "$$\n", "\\frac{\\partial \\, \\mathbf{a}^{\\text{T}} \\mathbf{X} \\mathbf{b}}{\\partial \\, \\mathbf{X}} = \\frac{\\partial \\, \\text{tr}\\left(\\mathbf{a}^{\\text{T}} \\mathbf{X} \\mathbf{b}\\right)}{\\partial \\, \\mathbf{X}} = \\frac{\\partial \\, \\text{tr}\\left(\\mathbf{X} \\mathbf{b} \\mathbf{a}^{\\text{T}} \\right)}{\\partial \\, \\mathbf{X}} = \\left( \\mathbf{b} \\mathbf{a}^{\\text{T}} \\right)^{\\text{T}} = \\mathbf{a}\\mathbf{b}^{\\text{T}}\n", "$$\n", "\n", "Alternatively, though somewhat more tedious, the same result can be shown by applying the Kronecker product and the product rule directly.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$\n", "\\begin{align}\n", "\\frac{\\partial \\, \\mathbf{a}^{\\text{T}} \\mathbf{X} \\mathbf{b}}{\\partial \\, \\mathbf{X}} \n", "&= \\frac{\\partial \\, \\left(\\mathbf{a}^{\\text{T}} \\mathbf{X} \\right) \\mathbf{b}}{\\partial \\, \\mathbf{X}} \\\\[5pt]\n", "&= \\underbrace{\\frac{\\partial \\, \\mathbf{a}^{\\text{T}} \\mathbf{X}}{\\partial \\, \\mathbf{X}}}_{m \\times n^2} \\underbrace{\\left( \\mathbf{I}_n \\otimes \\mathbf{b} \\right)}_{n^2 \\times n} + \\underbrace{\\left(\\mathbf{I}_m \\otimes \\mathbf{a}^{\\text{T}} \\mathbf{X}\\right)}_{m \\times mn} \\underbrace{\\frac{\\partial \\, \\mathbf{b}}{\\partial \\, \\mathbf{X}}}_{mn \\times n} \\\\[5pt]\n", "&= \\left\\{ \\underbrace{\\frac{\\partial \\, \\mathbf{a}^{\\text{T}}}{\\partial \\, \\mathbf{X}}}_{m \\times mn} \\underbrace{ \\left(\\mathbf{I}_n \\otimes \\mathbf{X} \\right)}_{mn \\times n^2} + \\underbrace{\\left( \\mathbf{I}_m \\otimes \\mathbf{a}^{\\text{T}}\\right)}_{m \\times m^2} 
\\underbrace{ \\frac{\\partial \\, \\mathbf{X}}{\\partial \\, \\mathbf{X}}}_{m^2 \\times n^2} \\right\\} \\left( \\mathbf{I}_n \\otimes \\mathbf{b} \\right) \\\\[5pt]\n", "&= \\left( \\mathbf{I}_m \\otimes \\mathbf{a}^{\\text{T}}\\right) \\frac{\\partial \\, \\mathbf{X}}{\\partial \\, \\mathbf{X}} \\left( \\mathbf{I}_n \\otimes \\mathbf{b} \\right)\n", "\\end{align}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The terms containing $\\frac{\\partial \\, \\mathbf{a}^{\\text{T}}}{\\partial \\, \\mathbf{X}}$ and $\\frac{\\partial \\, \\mathbf{b}}{\\partial \\, \\mathbf{X}}$ vanish because $\\mathbf{a}$ and $\\mathbf{b}$ do not depend on $\\mathbf{X}$.\n", "\n", "$\\left( \\mathbf{I}_m \\otimes \\mathbf{a}^{\\text{T}}\\right)$ consists of $m$ submatrices of size $m \\times m$ placed side by side along the row direction, giving an $m \\times m^2$ matrix. The $k$-th submatrix has $\\mathbf{a}^{\\text{T}}$ in its $k$-th row and zeros elsewhere.\n", "\n", "$$\n", "\\left( \\mathbf{I}_m \\otimes \\mathbf{a}^{\\text{T}}\\right) =\n", "\\begin{bmatrix}\n", "\\color{RoyalBlue}{\\begin{matrix}a_1 & a_2 & \\cdots & a_m\\end{matrix}} & | & \\mathbf{0}^{\\text{T}} & | & \\cdots & | & \\mathbf{0}^{\\text{T}} \\\\\n", "\\mathbf{0}^{\\text{T}} & | & \\color{RoyalBlue}{\\begin{matrix}a_1 & a_2 & \\cdots & a_m\\end{matrix}} & | & \\cdots & | & \\mathbf{0}^{\\text{T}} \\\\\n", "\\vdots & | & \\vdots & | & \\ddots & | & \\vdots \\\\\n", "\\mathbf{0}^{\\text{T}} & | & \\mathbf{0}^{\\text{T}} & | & \\cdots & | & \\color{RoyalBlue}{\\begin{matrix}a_1 & a_2 & \\cdots & a_m\\end{matrix}} \n", "\\end{bmatrix}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$\\frac{\\partial \\, \\mathbf{X}}{\\partial \\, \\mathbf{X}}$ is an $m^2 \\times n^2$ matrix made of $m \\times n$ submatrices arranged in an $m \\times n$ grid. The submatrix at grid position $(i,j)$ has a 1 only in its own $(i,j)$ entry and zeros everywhere else.\n", "\n", "That is, as shown below, the first submatrix has a 1 only at (1,1), the submatrix to its right has a 1 only at (1,2), and so on.\n", "\n", "$$\n", "\\frac{\\partial \\, \\mathbf{X}}{\\partial \\, \\mathbf{X}} =\n", "\\left[\n", "\\begin{array}{c|c|c|c}\n", "\\begin{matrix}\n", "1 & 0 & \\cdots & 0 \\\\ 0 & 0 & \\cdots & 0 \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ 0 & 0 & \\cdots & 0\n", "\\end{matrix} & \n", "\\begin{matrix}\n", "0 & 1 & 
\\cdots & 0 \\\\ 0 & 0 & \\cdots & 0 \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ 0 & 0 & \\cdots & 0\n", "\\end{matrix} &\n", "\\begin{matrix}\\cdots \\\\ \\cdots \\\\ \\cdots \\\\ \\cdots \\end{matrix} &\n", "\\begin{matrix}\n", "0 & 0 & \\cdots & 1 \\\\ 0 & 0 & \\cdots & 0 \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ 0 & 0 & \\cdots & 0\n", "\\end{matrix} \\\\\n", "\\begin{matrix}-&-&-&-\\end{matrix} & \\begin{matrix}-&-&-&-\\end{matrix} & \\begin{matrix}-&-&-&-\\end{matrix} & \\begin{matrix}-&-&-&-\\end{matrix} \\\\\n", "\\begin{matrix}\n", "0 & 0 & \\cdots & 0 \\\\ 1 & 0 & \\cdots & 0 \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ 0 & 0 & \\cdots & 0\n", "\\end{matrix} & \n", "\\begin{matrix}\n", "0 & 0 & \\cdots & 0 \\\\ 0 & 1 & \\cdots & 0 \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ 0 & 0 & \\cdots & 0\n", "\\end{matrix} &\n", "\\begin{matrix}\\cdots \\\\ \\cdots \\\\ \\cdots \\\\ \\cdots \\end{matrix} &\n", "\\begin{matrix}\n", "0 & 0 & \\cdots & 0 \\\\ 0 & 0 & \\cdots & 1 \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ 0 & 0 & \\cdots & 0\n", "\\end{matrix} \\\\\n", "\\begin{matrix}-&-&-&-\\end{matrix} & \\begin{matrix}-&-&-&-\\end{matrix} & \\begin{matrix}-&-&-&-\\end{matrix} & \\begin{matrix}-&-&-&-\\end{matrix} \\\\\n", "\\begin{matrix}\\vdots&\\vdots&\\vdots&\\vdots\\end{matrix} & \\begin{matrix}\\vdots&\\vdots&\\vdots&\\vdots\\end{matrix} & \\begin{matrix}\\vdots&\\vdots&\\vdots&\\vdots\\end{matrix} & \\begin{matrix}\\vdots&\\vdots&\\vdots&\\vdots\\end{matrix} \\\\\n", "\\begin{matrix}-&-&-&-\\end{matrix} & \\begin{matrix}-&-&-&-\\end{matrix} & \\begin{matrix}-&-&-&-\\end{matrix} & \\begin{matrix}-&-&-&-\\end{matrix} \\\\\n", "\\begin{matrix}\n", "0 & 0 & \\cdots & 0 \\\\ 0 & 0 & \\cdots & 0 \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ 1 & 0 & \\cdots & 0\n", "\\end{matrix} & \n", "\\begin{matrix}\n", "0 & 0 & \\cdots & 0 \\\\ 0 & 0 & \\cdots & 0 \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ 0 & 1 & \\cdots & 0\n", "\\end{matrix} &\n", 
"\\begin{matrix}\\cdots \\\\ \\cdots \\\\ \\cdots \\\\ \\cdots \\end{matrix} &\n", "\\begin{matrix}\n", "0 & 0 & \\cdots & 0 \\\\ 0 & 0 & \\cdots & 0 \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ 0 & 0 & \\cdots & 1\n", "\\end{matrix}\n", "\\end{array}\n", "\\right]\n", "$$\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Multiplying the two matrices above first gives an $m \\times n^2$ matrix made of $n$ submatrices of size $m \\times n$ placed side by side along the row direction. The $k$-th submatrix has $\\mathbf{a}$ in its $k$-th column and zeros elsewhere.\n", "\n", "$$\n", "\\left( \\mathbf{I}_m \\otimes \\mathbf{a}^{\\text{T}}\\right) \\frac{\\partial \\, \\mathbf{X}}{\\partial \\, \\mathbf{X}} = \n", "\\left[\n", "\\begin{array}{c|c|c|c}\n", "\\begin{matrix}\n", "a_1 & 0 & \\cdots & 0 \\\\ a_2 & 0 & \\cdots & 0 \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ a_m & 0 & \\cdots & 0\n", "\\end{matrix} &\n", "\\begin{matrix}\n", "0 & a_1 & \\cdots & 0 \\\\ 0 & a_2 & \\cdots & 0 \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ 0 & a_m & \\cdots & 0\n", "\\end{matrix} & \n", "\\begin{matrix}\\cdots \\\\ \\cdots \\\\ \\cdots \\\\ \\cdots \\end{matrix} &\n", "\\begin{matrix}\n", "0 & 0 & \\cdots & a_1 \\\\ 0 & 0 & \\cdots & a_2 \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ 0 & 0 & \\cdots & a_m\n", "\\end{matrix}\n", "\\end{array}\n", "\\right]\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Meanwhile, $\\left( \\mathbf{I}_n \\otimes \\mathbf{b} \\right)$ is an $n^2 \\times n$ matrix made of $n$ submatrices of size $n \\times n$ stacked along the column direction. The $k$-th submatrix has $\\mathbf{b}$ in its $k$-th column and zeros elsewhere.\n", "\n", "$$\n", "\\left( \\mathbf{I}_n \\otimes \\mathbf{b} \\right) = \n", "\\begin{bmatrix}\n", "\\begin{matrix}\n", "b_1 & 0 & \\cdots & 0 \\\\ b_2 & 0 & \\cdots & 0 \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ b_n & 0 & \\cdots & 0\n", "\\end{matrix} \\\\\n", "\\begin{matrix} - & - & - & - \\end{matrix} \\\\\n", "\\begin{matrix}\n", "0 & b_1 & \\cdots & 0 \\\\ 0 & b_2 & \\cdots & 0 \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ 0 & b_n & \\cdots & 0\n", "\\end{matrix} \\\\\n", 
"\\begin{matrix} - & - & - & - \\end{matrix} \\\\\n", "\\begin{matrix} \\vdots & \\vdots & \\vdots & \\vdots \\end{matrix}\\\\\n", "\\begin{matrix} - & - & - & - \\end{matrix} \\\\\n", "\\begin{matrix}\n", "0 & 0 & \\cdots & b_1 \\\\ 0 & 0 & \\cdots & b_2 \\\\ \\vdots & \\vdots & \\ddots & \\vdots \\\\ 0 & 0 & \\cdots & b_n\n", "\\end{matrix} \n", "\\end{bmatrix}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, multiplying these two matrices gives the desired result.\n", "\n", "$$\n", "\\left( \\mathbf{I}_m \\otimes \\mathbf{a}^{\\text{T}}\\right) \\frac{\\partial \\, \\mathbf{X}}{\\partial \\, \\mathbf{X}} \\left( \\mathbf{I}_n \\otimes \\mathbf{b} \\right) = \n", "\\begin{bmatrix}\n", "a_1 b_1 & a_1 b_2 & \\cdots & a_1 b_n \\\\\n", "a_2 b_1 & a_2 b_2 & \\cdots & a_2 b_n \\\\\n", "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", "a_m b_1 & a_m b_2 & \\cdots & a_m b_n\n", "\\end{bmatrix} = \\mathbf{a} \\mathbf{b}^{\\text{T}}\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### [6] $\\dfrac{\\partial \\, \\mathbf{a}^{\\text{T}} \\mathbf{X}^{-1} \\mathbf{b}}{\\partial \\, \\mathbf{X}}= -\\mathbf{X}^{-\\text{T}} \\mathbf{ab}^{\\text{T}} \\mathbf{X}^{-\\text{T}}$ (matrix cookbook eq.61)\n", "\n", "This derivative can be shown using the earlier formula (matrix cookbook eq.70) together with the following two properties of the Kronecker product[11]:\n", "\n", "$$\n", "\\left( \\mathbf{A} \\otimes \\mathbf{B} \\right)^{-1} = \\mathbf{A}^{-1} \\otimes \\mathbf{B}^{-1}\n", "$$\n", "\n", "$$\n", "\\left( \\mathbf{A} \\otimes \\mathbf{B} \\right)\\left( \\mathbf{C} \\otimes \\mathbf{D} \\right) = \\mathbf{A}\\mathbf{C} \\otimes \\mathbf{B}\\mathbf{D}\n", "$$\n", "\n", "Assume an invertible matrix $\\mathbf{X} : m \\times m$ and arbitrary vectors with $\\mathbf{a}^{\\text{T}} : 1 \\times m$ and $\\mathbf{b} : m \\times 1$. 
\n", "\n", "$$\n", "\\begin{align}\n", "\\frac{\\partial \\, \\mathbf{a}^{\\text{T}} \\mathbf{X}^{-1} \\mathbf{b}}{\\partial \\, \\mathbf{X}} \n", "&= \\frac{\\partial \\, \\mathbf{a}^{\\text{T}} \\mathbf{X}^{-1}}{\\partial \\, \\mathbf{X}} \\left( \\mathbf{I}_m \\otimes \\mathbf{b} \\right) + \\left( \\mathbf{I}_m \\otimes \\mathbf{a}^{\\text{T}} \\mathbf{X}^{-1} \\right) \\frac{\\partial \\, \\mathbf{b}}{\\partial \\, \\mathbf{X}} \\\\[5pt]\n", "&= \\left( \\frac{\\partial \\, \\mathbf{a}^{\\text{T}}}{\\partial \\, \\mathbf{X}} \\left( \\mathbf{I}_m \\otimes \\mathbf{X}^{-1} \\right) + \\left( \\mathbf{I}_m \\otimes \\mathbf{a}^{\\text{T}} \\right) \\frac{\\partial \\, \\mathbf{X}^{-1}}{\\partial \\, \\mathbf{X}} \\right) \\left( \\mathbf{I}_m \\otimes \\mathbf{b} \\right) \\\\[5pt]\n", "&= \\left( \\mathbf{I}_m \\otimes \\mathbf{a}^{\\text{T}} \\right) \\frac{\\partial \\, \\mathbf{X}^{-1}}{\\partial \\, \\mathbf{X}} \\left( \\mathbf{I}_m \\otimes \\mathbf{b} \\right)\n", "\\end{align} \\tag{1}\n", "$$\n", "\n", "The terms containing $\\frac{\\partial \\, \\mathbf{a}^{\\text{T}}}{\\partial \\, \\mathbf{X}}$ and $\\frac{\\partial \\, \\mathbf{b}}{\\partial \\, \\mathbf{X}}$ vanish because $\\mathbf{a}$ and $\\mathbf{b}$ are constant. Meanwhile, differentiating both sides of $\\mathbf{X}^{-1} \\mathbf{X} = \\mathbf{I}$ gives\n", "\n", "$$\n", "\\frac{\\partial \\,\\mathbf{X}^{-1} \\mathbf{X}}{\\partial \\, \\mathbf{X}} = \\frac{\\partial \\, \\mathbf{I}}{\\partial \\, \\mathbf{X}} \\\\[5pt]\n", "\\frac{\\partial \\,\\mathbf{X}^{-1} }{\\partial \\, \\mathbf{X}}\\left( \\mathbf{I}_m \\otimes \\mathbf{X} \\right) + \\left( \\mathbf{I}_m \\otimes \\mathbf{X}^{-1} \\right) \\frac{\\partial \\, \\mathbf{X}}{\\partial \\, \\mathbf{X}} = \\mathbf{0} \\\\[5pt]\n", "\\frac{\\partial \\,\\mathbf{X}^{-1} }{\\partial \\, \\mathbf{X}} \\left( \\mathbf{I}_m \\otimes \\mathbf{X} \\right) \\left( \\mathbf{I}_m \\otimes \\mathbf{X} \\right)^{-1} + \\left( \\mathbf{I}_m \\otimes \\mathbf{X}^{-1} \\right) \\frac{\\partial \\, \\mathbf{X}}{\\partial \\, \\mathbf{X}} \\left( \\mathbf{I}_m \\otimes \\mathbf{X} \\right)^{-1}= \\mathbf{0} \\\\[5pt]\n", "\\frac{\\partial \\,\\mathbf{X}^{-1} }{\\partial \\, \\mathbf{X}} = - \\left( \\mathbf{I}_m 
\\otimes \\mathbf{X}^{-1} \\right) \\frac{\\partial \\, \\mathbf{X}}{\\partial \\, \\mathbf{X}} \\left( \\mathbf{I}_m \\otimes \\mathbf{X} \\right)^{-1}\n", "$$\n", "\n", "Now using $\\left( \\mathbf{A} \\otimes \\mathbf{B} \\right)^{-1} = \\mathbf{A}^{-1} \\otimes \\mathbf{B}^{-1}$, with $\\mathbf{I}_m^{-1} = \\mathbf{I}_m$, this becomes\n", "\n", "$$\n", "\\frac{\\partial \\,\\mathbf{X}^{-1} }{\\partial \\, \\mathbf{X}} = - \\left( \\mathbf{I}_m \\otimes \\mathbf{X}^{-1} \\right) \\frac{\\partial \\, \\mathbf{X}}{\\partial \\, \\mathbf{X}} \\left( \\mathbf{I}_m \\otimes \\mathbf{X}^{-1} \\right) \\tag{2}\n", "$$\n", "\n", "Substituting (2) into (1) gives\n", "\n", "$$\n", "\\begin{align}\n", "\\frac{\\partial \\, \\mathbf{a}^{\\text{T}} \\mathbf{X}^{-1} \\mathbf{b}}{\\partial \\, \\mathbf{X}} &= - \\left( \\mathbf{I}_m \\otimes \\mathbf{a}^{\\text{T}} \\right) \\left( \\mathbf{I}_m \\otimes \\mathbf{X}^{-1} \\right) \\frac{\\partial \\, \\mathbf{X}}{\\partial \\, \\mathbf{X}} \\left( \\mathbf{I}_m \\otimes \\mathbf{X}^{-1} \\right) \\left( \\mathbf{I}_m \\otimes \\mathbf{b} \\right) \\\\[5pt]\n", "&= - \\left( \\mathbf{I}_m \\otimes \\mathbf{a}^{\\text{T}} \\mathbf{X}^{-1} \\right) \\frac{\\partial \\, \\mathbf{X}}{\\partial \\, \\mathbf{X}} \\left( \\mathbf{I}_m \\otimes \\mathbf{X}^{-1} \\mathbf{b} \\right) \\quad \\because \\left( \\mathbf{A} \\otimes \\mathbf{B} \\right)\\left( \\mathbf{C} \\otimes \\mathbf{D} \\right) = \\mathbf{A}\\mathbf{C} \\otimes \\mathbf{B}\\mathbf{D}\n", "\\end{align}\n", "$$\n", "\n", "Compare this expression with the steps used to show the earlier formula \n", "\n", "$$\n", "\\begin{align}\n", "\\frac{\\partial \\, \\mathbf{a}^{\\text{T}} \\mathbf{X} \\mathbf{b}}{\\partial \\, \\mathbf{X}} \n", "= \\left( \\mathbf{I}_m \\otimes \\mathbf{a}^{\\text{T}}\\right) \\frac{\\partial \\, \\mathbf{X}}{\\partial \\, \\mathbf{X}} \\left( \\mathbf{I}_n \\otimes \\mathbf{b} \\right) = \\mathbf{a} \\mathbf{b}^{\\text{T}}\n", "\\end{align}\n", "$$\n", "\n", "It has the same form, with $\\mathbf{a}^{\\text{T}}$ replaced by $\\mathbf{a}^{\\text{T}} \\mathbf{X}^{-1}$ and $\\mathbf{b}$ replaced by $\\mathbf{X}^{-1} \\mathbf{b}$. The result is therefore $-\\left( \\mathbf{X}^{-\\text{T}} \\mathbf{a} \\right) \\left( \\mathbf{X}^{-1} \\mathbf{b} \\right)^{\\text{T}}$, that is,\n", "\n", "\n", "$$\n", "\\frac{\\partial \\, \\mathbf{a}^{\\text{T}} 
\\mathbf{X}^{-1} \\mathbf{b}}{\\partial \\, \\mathbf{X}} = - \\mathbf{X}^{-\\text{T}} \\mathbf{a} \\mathbf{b}^{\\text{T}} \\mathbf{X}^{-\\text{T}} \n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References\n", "\n", "1. COURSE NOTES: STATISTICS 550 ADVANCED MATHEMATICAL STATISTICS SPRING 2008, Robert J. Boik, Department of Mathematical Sciences\n", "Montana State University, 2012\n", "\n", "2. Old and New Matrix Algebra Useful for Statistics, Thomas P. Minka (December 28, 2000), MIT Media Lab note (1997; revised 12/00). Retrieved 5 February 2016.\n", "\n", "3. Linear Algebra & Matrix Calculus:https://www.slideshare.net/ssuser7e10e4/matrix-calculus, 임성빈\n", "\n", "4. The Matrix Cookbook, Kaare Brandt Petersen & Michael Syskind Pedersen, 2012\n", "\n", "5. Advanced Engineering Mathematics 7.7 & 7.8, Erwin Kreyszig, Wiley\n", "\n", "6. Adjugate_matrix:https://en.wikipedia.org/wiki/Adjugate_matrix\n", "\n", "7. Matrix_calculus:https://en.wikipedia.org/wiki/Matrix_calculus\n", "\n", "8. Differential_(infinitesimal):https://en.wikipedia.org/wiki/Differential_(infinitesimal) Note: the URL includes the trailing (infinitesimal), with an underscore before the '('\n", "\n", "9. Derivative:https://en.wikipedia.org/wiki/Derivative\n", "\n", "10. Jacobi's_formula:https://en.wikipedia.org/wiki/Jacobi%27s_formula\n", "\n", "11. 
Kronecker product:https://en.wikipedia.org/wiki/Kronecker_product" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "\n", "\n", "\n", "\n", "\n", "" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 2 }