{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Perspectives on matrix multiplication\n", "\n", "\n", "Lots of you seem to have learned how to multiply matrices ([matrix multiplication](https://en.wikipedia.org/wiki/Matrix_multiplication)) in high school. \n", "We compute the product $C=AB$ of an $m \\times n$ matrix $A$ with an $n \\times p$ matrix $B$ to produce an $m \\times p$ matrix $C$.\n", "\n", "Did you ever wonder why \"matmul\" has such a fancy definition?\n", "\n", "When we add matrices we add elements. Why coudn't matmul be just as easy?\n", "\n", "## Compare Elementwise Multiply\n", "\n", "Of course the elementwise multiply is doable but never seems to be quite as important:\n", "\n", "(I'll bet your high school teacher never mentioned elementwise multiply!)\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "A .* B = [7 4; 9 8]\n", "A * B = [13 6; 33 14]\n" ] } ], "source": [ "A=[1 2\n", " 3 4]\n", "B=[7 2\n", " 3 2]\n", "@show(A .* B) # Elementwise times is the \"dot star\"\n", "@show(A * B); # Matmul is just the \"star\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For square n x n matrices, elementwise multiply requires $n^2$ operations, while matmul requires about $2n^3$. (Think $n^2$ dot products, each requiring $n$ mults and almost $n$ adds.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Raising the Abstraction\n", "\n", "Why is matmul defined this way? We will find out later in the course when we begin to understand that a matrix represents a linear transformation, and matmul is the natural representation of the composition of transformations. It is only then you can understand the true nature of matrix multiplication. (Bet your high school teacher never told you that!)\n", "\n", "One of our goals in 18.06 is to sometimes stop thinking of matrices as arrays of numbers, and more as holistic objects.\n", "\n", "Abstractly, the rules for matrix multiplication are determined once you define how to multiply matrices by vectors $Ax$, the central [linear operation](https://en.wikipedia.org/wiki/Linear_map) of 18.06, by requiring that multiplication be [associative](https://en.wikipedia.org/wiki/Associative_property). That is, we require:\n", "$$\n", "A(Bx)=(AB)x\n", "$$\n", "for all matrices $A$ and $B$ and all vectors $x$. The expression $A(Bx)$ involves only matrix × vector (computing $y=Bx$ then $Ay$), and requiring this to equal $(AB)x$ actually uniquely defines the matrix–matrix product $AB$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Perspective 1 (high school!): rows × columns\n", "\n", "The most familar definition is that you take **dot products of rows of A with columns of B** to get the product $C$. For example:\n", "$$\n", "\\begin{pmatrix}\n", " -14 & 5 & 10 \\\\\n", " \\color{red}{-5} & -20 & 10 \\\\\n", " -6 & 10 & 6\n", "\\end{pmatrix} =\n", "\\begin{pmatrix}\n", " 2 & -1 & 5 \\\\\n", " \\color{red}{3} & \\color{red}{4} & \\color{red}{4} \\\\\n", " -4 & -2 & 0\n", "\\end{pmatrix}\n", "\\begin{pmatrix}\n", "\\color{red}{1} & 0 & -2 \\\\\n", " \\color{red}{1} & -5 & 1 \\\\\n", " \\color{red}{-3} & 0 & 3\n", "\\end{pmatrix}\n", "$$\n", "where we have highlighted the entry $\\color{red}{-5 = 3 \\times 1 + 4 \\times 1 + 4 \\times -3}$ (second row of $A$ ⋅ first column of $B$).\n", "\n", "This can be written out as the formula\n", "$$\n", "c_{ij} = \\sum_{k=1}^n a_{ik} b_{kj}\n", "$$\n", "in terms of the entries of the matrices, e.g. $c_{ij}$ is the entry in row $i$, column $j$ of $C$, assuming $A$ has $n$ columns and $B$ has $n$ rows.\n", "\n", "Essentially all matrix multiplications in practice are done with a version of this formula — at least, with the same operations, but often the *order* in which you multiply/add individual numbers is re-arranged.\n", "\n", "**In this notebook, we will explore several ways to *think* about these operations by re-arranging their order.**" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3×3 Array{Int64,2}:\n", " -14 5 10\n", " -5 -20 10\n", " -6 10 6" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A = [ 2 -1 5\n", " 3 4 4\n", " -4 -2 0]\n", "B = [ 1 0 -2\n", " 1 -5 1\n", " -3 0 3]\n", "C = A * B" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because matrix multiplication is generally [not commutative](https://en.wikipedia.org/wiki/Commutative_property), $AB$ and $BA$ give *different* matrices:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3×3 Array{Int64,2}:\n", " 10 3 5\n", " -17 -23 -15\n", " -18 -3 -15" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "B*A" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3×3 Array{Int64,2}:\n", " -24 2 5\n", " 12 3 25\n", " 12 13 21" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A*B - B*A" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Though sometimes it can happen to be commutative." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3×3 Array{Float64,2}:\n", " -138.0 -89.0 -147.0\n", " -101.0 -24.0 92.0\n", " 44.0 46.0 -132.0" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A*(A^2 + 2*A + inv(A)*10) " ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3×3 Array{Float64,2}:\n", " -138.0 -89.0 -147.0\n", " -101.0 -24.0 92.0\n", " 44.0 46.0 -132.0" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(A^2 + 2*A + inv(A)*10) * A" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want, we can compute the individual dot products in Julia too. For example, let's compute $c_{2,1} = -5$ (the 2nd row and first column of $C$, or `C[2,1]` in Julia) by taking the dot product of the second row of $A$ with the first column of $B$.\n", "\n", "To extract rows and columns of a matrix, Julia supports a syntax for \"array slicing\" pioneered by APL. The second row of $A$ is `A[2,:]`, and the first column of `B` is `B[:,1]`:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3×3 Array{Int64,2}:\n", " 2 -1 5\n", " 3 4 4\n", " -4 -2 0" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A[2,3] # 2nd row, 3rd col" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3-element Array{Int64,1}:\n", " 3\n", " 4\n", " 4" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A[2,:] # 2nd row of A" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3×3 Array{Int64,2}:\n", " 1 0 -2\n", " 1 -5 1\n", " -3 0 3" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "B" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3-element Array{Int64,1}:\n", " 1\n", " 1\n", " -3" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "B[:,1] # 1st column of B" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can compute $c_{2,1}$ by their dot product via the `dot` function:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-5" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dot(A[2,:], B[:,1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also write the dot product with the $x \\dot y$ notation:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-5" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A[2,:] ⋅ B[:,1] # type \\cdot + tab" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "α = A[2,3] # \\alpha + tab" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This matches $c_{2,1}$ from above, or `C[2,1]` in Julia:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-5" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "C[2,1]" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "-5" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A[2,:]' * B[:,1] # yet another way to get a dot product" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The summation $$c_{ij} = \\sum_{k=1}^n a_{ik} b_{kj}$$ directly in a triple loop code" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "matmul_ijk (generic function with 1 method)" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function matmul_ijk(A,B)\n", " m,n = size(A)\n", " n2,p = size(B)\n", " if n≠n2 error(\"No good, n=$n ≠ n2=$(n2)\") end\n", " \n", " C = fill(0,m,p) # m x p \"zeros\" matrix\n", " \n", " for i=1:m, j=1:p, k=1:n\n", " C[i,j] += A[i,k]*B[k,j] # shorthand for C[i,j] = C[i,j] + A[i,k]*B[k,j] \n", " end\n", " \n", " return C \n", "end" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3×3 Array{Int64,2}:\n", " -14 5 10\n", " -5 -20 10\n", " -6 10 6" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "matmul_ijk(A,B)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Perspective 2: matrix × columns\n", "\n", "$AB$ can be viewed as multiplying $A$ on the *left* by each *column* of $B$.\n", "\n", "For example, let's multiply $A$ by the first column of $B$:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3×3 Array{Int64,2}:\n", " -14 5 10\n", " -5 -20 10\n", " -6 10 6" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A*B" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3-element Array{Int64,1}:\n", " 1\n", " 1\n", " -3" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "B[:,1]" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3-element Array{Int64,1}:\n", " -14\n", " -5\n", " -6" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "A * B[:,1]" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ " \n" ], "text/plain": [ "HTML{String}(\" \\n\")" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "HTML{String}(\"\")" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "HTML{String}(\"\")" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " \n" ], "text/plain": [ "HTML{String}(\" \\n\")" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "using Interact" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "