{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# The Meataxe64 Package, Core Functionality" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "LoadPackage(\"meataxe64\");; Read(\"../gap/bench.g\"); \n", "Read(\"../gap/mtx64utils.g\"); LoadPackage(\"jupyterviz\");;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Monster Matrix Multiplication\n", "\n", "The Classification of Finite Simple Groups was one of the highlights of 20th Century mathematics. \n", "\n", "It included the discovery of the sporadic simple Fischer-Griess Monster $\\mathbb{M}$ of order \n", "
$808017424794512875886459904961710757005754368000000000$
.\n", "\n", "$\\mathbb{M}$ has many intriguing links to other areas of mathematics and even to theoretical physics, so we'd like to compute in it. \n", "\n", "**But**, the *most* tractable representation of this group is as $196882\\times 196882$ matrices of integers mod 2 -- Each matrix is 5GB.\n", "\n", "For many years this was clearly beyond any practical computation. The first actual multiplication of two general elements was done in 1998, using most of the computing resources of a large maths department for **45 hours**.\n", "\n", "Before OpenDreamKit this computation would still have taken days. We have contributed to the development of a massively improved C and assembler library and a GAP interface for it.\n", "\n", "A Monster matrix multiplication now takes about **8 minutes** on a laptop, scaling to 2-3 minutes on a multicore server. \n", "\n", "We start with matrices half that size for speed." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "< matrix 98441x98441 : >" ] }, "execution_count": 9, "metadata": { "text/plain": "" }, "output_type": "execute_result" } ], "source": [ "size := 196882/2;; f := MTX64_FiniteField(2);;\n", "m := MTX64_RandomMat(f,size,size);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can multiply this by itself in about 1 minute:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "wall time: 53.7s cpu time: 505s memory allocated: 1.12GB result returned\n" ] } ], "source": [ "ShowBench(ParMult,m,m);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll start the full sized computation and come back to it after the rest of Clement's talk" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "wall time: 469s cpu time: 4800s memory allocated: 4.51GB result returned\n" ] } ], "source": [ "m := MTX64_RandomMat(f,2*size,2*size);; ShowBench(ParMult, m, m);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is not just for the Monster. High performance linear algebra over these small fields are an essential component of a VRE for users in many areas. As well as multiplication we need\n", "\n", "## Gaussian Elimination\n", "\n", "Our other key primitive operation. Much more difficult to parallelize in general.\n", "\n", "Our basic Gaussian elimnination operation applied to a matrix $A$, computes $M$, $K$, $R$, $\\gamma$ and $\\rho$ satisfying: \n", "\n", "$$\\pmatrix{M&0\\cr K & 1} \\rho A \\gamma = \\pmatrix{-1&R\\cr0&0}$$ \n", "\n", "where $\\gamma$ and $\\rho$ are permutations that effectively select the pivot columns and pivot rows of $A$. This is effectively full reduced row echelon form, with transforming matrices.\n", "\n", "Using this, we can compute inverses, solve systems of equations, determine nullspaces, etc. efficiently.\n", "\n", "To see what it does properly we need a singular matrix. We take the Kronecker (tensor) product of two rectangular matrices. (If $A$ is $n\\times m$ and $B$ is $m\\times n$ with $m < n$ then $A\\otimes B$ will be $nm\\times nm$ of rank at most $m^2$.)\n", "\n", "We'll try a different small finite field." ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "< matrix 19800x19800 : >" ] }, "execution_count": 42, "metadata": { "text/plain": "" }, "output_type": "execute_result" } ], "source": [ "f := MTX64_FiniteField(9);;\n", "m1 := MTX64_RandomMat(f, 200,99);;\n", "m2 := MTX64_RandomMat(f, 99,200);;\n", "m := MTX64_KroneckerProduct(m1,m2);" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "wall time: 11.8s cpu time: 11.0s memory allocated: 747.81MB no result returned\n" ] }, { "data": { "text/plain": [ "< matrix 9801x9801 : >" ] }, "execution_count": 45, "metadata": { "text/plain": "" }, "output_type": "execute_result" }, { "data": { "text/plain": [ "< matrix 9999x9801 : >" ] }, "execution_count": 46, "metadata": { "text/plain": "" }, "output_type": "execute_result" }, { "data": { "text/plain": [ "< matrix 9801x9999 : >" ] }, "execution_count": 47, "metadata": { "text/plain": "" }, "output_type": "execute_result" } ], "source": [ "ech := fail;; # suppress a warning.\n", "ShowBench(function() ech := MTX64_Echelize(m);end); \n", "ech.multiplier; ech.cleaner; ech.remnant; # M, K and R in the above formula" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also use the multi-threaded version of the Gaussian elimination (although this problenm is a little small)." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "wall time: 4.37s cpu time: 44.1s memory allocated: 144B result returned\n" ] }, { "data": { "text/plain": [ "true" ] }, "execution_count": 34, "metadata": { "text/plain": "" }, "output_type": "execute_result" } ], "source": [ "MTX64_WriteMatrix(m, \"a\"); \n", "ShowBench(MTX64_fEchelize, \".\", \"a\", \"gamma\", \"rho\", \"m\", \"k\", \"r\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run-time of Multiplication versus matrix size" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We set the field and maximum dimension and make a set of random matrices of different sizes" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[ < matrix 1200x1200 : >, < matrix 2400x2400 : >, < matrix 3600x3600 : >, < matrix 4800x4800 : >, < matrix 6000x6000 : >, < matrix 7200x7200 : >, < matrix 8400x8400 : >, < matrix 9600x9600 : >, < matrix 10800x10800 : >, < matrix 12000x12000 : > ]" ] }, "execution_count": 18, "metadata": { "text/plain": "" }, "output_type": "execute_result" } ], "source": [ "q := 5;; maxdim := 12000;; steps := 10;;\n", "sizes := List([1..steps], i-> i*QuoInt(maxdim, steps));;\n", "mats := List(sizes, i-> MTX64_RandomMat(MTX64_FiniteField(q), i, i));" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And look at the timing for squaring them:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "application/javascript": [ "( function ( element ) {\n", "if(!window.hasOwnProperty(\"runGAP\")){window.runGAP=function(code,callback){if(!Jupyter||!Jupyter.notebook||!Jupyter.notebook.kernel)return callback(null,\"JavaScript-based output must be re-evaluated when the notebook\"+\" is reloaded. (Re-run the cell to clear this problem.)\");var errorMessages=[];var timeout=null;Jupyter.notebook.kernel.execute(code,{iopub:{output:function(message){if(\"data\"in message.content)return callback(message.content.data,null);if(!(\"text\"in message.content))return callback(null,message.content);errorMessages.push(message.content.text);if(timeout!==null)clearTimeout(timeout);timeout=setTimeout(function(){callback(null,errorMessages.join(\"\\n\"))},200)},clear_output:function(message){}},shell:{reply:function(message){}},input:function(message){}},{})}}if(!window.hasOwnProperty(\"librariesLoadedFromGAP\")){window.librariesLoadedFromGAP={}}function myCallback(){if(!window.hasOwnProperty(\"librariesLoadedFromGAP\")){window.librariesLoadedFromGAP={}}function myCallback(){window.createVisualization(element,{\"data\" : {\"data\" : [{\"x\" : [1200,2400,3600,4800,6000,7200,8400,9600,10800,12000],\"y\" : [19,106,293,677,1211,2157,3434,4766,6676,8715],\"name\" : \"single-threaded\",\"type\" : \"line\"},{\"x\" : [1200,2400,3600,4800,6000,7200,8400,9600,10800,12000],\"y\" : [16,34,89,297,356,609,974,1527,1896,2789],\"name\" : \"multi-threaded wall time\",\"type\" : \"line\"}],\"layout\" : {\"height\" : 400,\"title\" : \"Meataxe64 runtimes for matrix multiply\",\"xaxis\" : {\"title\" : \"Dimension\"},\"yaxis\" : {\"title\" : \"ms\"}}},\"tool\" : \"plotly\"}\n", ",function(element,visualization){\n", "});\n", "}if(window.librariesLoadedFromGAP.hasOwnProperty(\"viz-tool-plotly\"\n", ")){myCallback()}else{var filenameString=JSON.stringify(\"viz-tool-plotly\"\n", ");var GAPcode=\"JUPVIZFileContents( LoadJavaScriptFile( \"+filenameString+\" ) );\";window.runGAP(GAPcode,function(result,error){if(error)throw Error(\"When loading library \"+filenameString+\": \"+error);result=result[\"text/plain\"];window.librariesLoadedFromGAP[\"viz-tool-plotly\"\n", "]=result;try{var whatItEvaluatesTo=eval(result)}catch(e){throw Error(\"Error evaluating code for library \"+filenameString+\": \"+e)}return myCallback()})}\n", "}if(window.librariesLoadedFromGAP.hasOwnProperty(\"main\"\n", ")){myCallback()}else{var filenameString=JSON.stringify(\"main\"\n", ");var GAPcode=\"JUPVIZFileContents( LoadJavaScriptFile( \"+filenameString+\" ) );\";window.runGAP(GAPcode,function(result,error){if(error)throw Error(\"When loading library \"+filenameString+\": \"+error);result=result[\"text/plain\"];window.librariesLoadedFromGAP[\"main\"\n", "]=result;try{var whatItEvaluatesTo=eval(result)}catch(e){throw Error(\"Error evaluating code for library \"+filenameString+\": \"+e)}return myCallback()})}\n", ";\n", "} )( element.get( 0 ) )" ] }, "execution_count": 21, "metadata": { "application/javascript": "" }, "output_type": "execute_result" } ], "source": [ "marks1 := List(mats, x-> BenchMark(\\*,x,x));;\n", "marksm := List(mats, x-> BenchMark(ParMult,x,x));;\n", "Plot(\n", "[sizes,List(marks1, x-> x.cpu), rec(name := \"single-threaded\",\n", "title := \"Meataxe64 runtimes for matrix multiply\", xaxis := \"Dimension\", yaxis := \"ms\")],\n", "[sizes,List(marksm, x-> QuoInt(x.wall,10^6)), rec(name := \"multi-threaded wall time\")]\n", ");\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "GAP 4 Native", "language": "gap", "name": "gap-4" }, "language_info": { "codemirror_mode": "gap", "file_extension": ".g", "mimetype": "text/x-gap", "name": "GAP 4", "nbconvert_exporter": "", "pygments_lexer": "gap", "version": "4.dev" } }, "nbformat": 4, "nbformat_minor": 2 }