{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "# NRPy+ SymPy LaTeX Interface (NRPyLaTeX)\n", "\n", "## Author: Ken Sible\n", "\n", "### Formatting Updates by Gabriel M Steward\n", "\n", "## The following notebook demonstrates the conversion of LaTeX to SymPy, including support for tensor operations and [Einstein notation](https://en.wikipedia.org/wiki/Einstein_notation)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "# Table of Contents\n", "$$\\label{toc}$$\n", "\n", "- [Step 1](#step_1): Lexical Analysis and Syntax Analysis\n", "- [Step 2](#step_2): Grammar Demonstration and Sandbox\n", "- [Step 3](#step_3): Tensor Support with Einstein Notation\n", " - [Example 1](#example_1): Tensor Contraction\n", " - [Example 2](#example_2): Index Raising\n", " - [Example 3](#example_3): Cross Product\n", " - [Example 4](#example_4): Covariant Derivative\n", " - [Example 5 (1)](#example_5_1): Schwarzschild Metric\n", " - [Example 5 (2)](#example_5_2): Kretschmann Scalar\n", " - [Example 6 (1)](#example_6_1): Extrinsic Curvature (ADM Formalism)\n", " - [Example 6 (2)](#example_6_2): Hamiltonian/Momentum Constraint\n", "- [Step 4](#step_4): Exception Handling and Index Checking\n", "- [Step 5](#step_5): Output Notebook to PDF\n", "\n", "Further Reading: [Parsing BSSN (Cartesian) Notebook](Tutorial-LaTeX_Interface_Example-BSSN_Cartesian.ipynb)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Step 1: Lexical Analysis and Syntax Analysis [ [^](#top) ]\n", "\n", "In the following section, we discuss [lexical analysis](https://en.wikipedia.org/wiki/Lexical_analysis) (lexing) and [syntax analysis](https://en.wikipedia.org/wiki/Parsing) (parsing). In lexical analysis, a lexical analyzer (or scanner) can tokenize a character string, called a sentence, using substring pattern matching. 
In syntax analysis, a syntax analyzer (or parser) receives a token iterator from the lexical analyzer and constructs a parse tree that captures the syntactic structure of the sentence, as specified by a [formal grammar](https://en.wikipedia.org/wiki/Formal_grammar).\n", "\n", "For LaTeX to SymPy conversion, we implemented a [recursive descent parser](https://en.wikipedia.org/wiki/Recursive_descent_parser) that constructs a parse tree in [preorder traversal](https://en.wikipedia.org/wiki/Tree_traversal#Pre-order_(NLR)), starting from the root [nonterminal](https://en.wikipedia.org/wiki/Terminal_and_nonterminal_symbols) and using a [right recursive](https://en.wikipedia.org/wiki/Left_recursion) grammar (partially shown below in [extended BNF](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form) notation).\n", "\n", "```\n", "<EXPRESSION> -> <TERM> { ( '+' | '-' ) <TERM> }*\n", "<TERM>       -> <FACTOR> { [ '/' ] <FACTOR> }*\n", "<FACTOR>     -> <BASE> { '^' <EXPONENT> }*\n", "<BASE>       -> [ '-' ] ( <ATOM> | <SUBEXPR> )\n", "<EXPONENT>   -> <BASE> | '{' <BASE> '}' | '{' '{' <BASE> '}' '}'\n", "<ATOM>       -> <VARIABLE> | <NUMBER> | <COMMAND> | <OPERATOR>\n", "<SUBEXPR>    -> '(' <EXPRESSION> ')' | '[' <EXPRESSION> ']' | '\\' '{' <EXPRESSION> '\\' '}'\n", "<COMMAND>    -> <FUNC> | <FRAC> | <SQRT> | <NLOG> | <TRIG>\n", " ⋮            ⋮\n", "```\n", "\n", "**Source**: Robert W. Sebesta. Concepts of Programming Languages. Pearson Education Limited, 2016."
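, "\n",
"\n",
"As an illustration of the scanner/parser split described above, here is a minimal, hypothetical sketch in plain Python. It is **not** NRPyLaTeX's implementation. `TOKENS`, `tokenize`, and `ToyParser` are our own names, and the grammar is a simplified subset of the one above: it has no LaTeX commands or braces, and unary minus is handled only in the base rule. The parse tree is a nested tuple rather than a SymPy expression. The scanner tries an ordered list of regular expressions at each position. The parser has one method per nonterminal, and the right recursion in `factor` makes `^` right associative.\n",
"\n",
"```python\n",
"import re\n",
"\n",
"# Hypothetical token table: (regex, token name); None marks skipped whitespace\n",
"TOKENS = [(r'\\d+', 'INTEGER'), (r'[a-zA-Z]', 'LETTER'), (r'\\+', 'PLUS'),\n",
"          (r'-', 'MINUS'), (r'/', 'DIVIDE'), (r'\\^', 'CARET'),\n",
"          (r'\\(', 'LPAREN'), (r'\\)', 'RPAREN'), (r'\\s+', None)]\n",
"\n",
"def tokenize(sentence):\n",
"    # Scan left to right, yielding (token, lexeme) pairs; the first matching pattern wins\n",
"    pos = 0\n",
"    while pos < len(sentence):\n",
"        for pattern, name in TOKENS:\n",
"            match = re.compile(pattern).match(sentence, pos)\n",
"            if match:\n",
"                if name:\n",
"                    yield name, match.group()\n",
"                pos = match.end()\n",
"                break\n",
"        else:  # no pattern matched at this position\n",
"            raise SyntaxError(f'unexpected {sentence[pos]!r} at position {pos}')\n",
"    yield 'EOF', ''\n",
"\n",
"class ToyParser:\n",
"    # expression -> term { ('+' | '-') term }*\n",
"    # term       -> factor { ['/'] factor }*   (juxtaposition = multiplication)\n",
"    # factor     -> base [ '^' factor ]\n",
"    # base       -> '-' base | INTEGER | LETTER | '(' expression ')'\n",
"    def __init__(self, sentence):\n",
"        self.tokens, self.index = list(tokenize(sentence)), 0\n",
"\n",
"    def peek(self):\n",
"        return self.tokens[self.index][0]\n",
"\n",
"    def expect(self, name):\n",
"        token, lexeme = self.tokens[self.index]\n",
"        if token != name:\n",
"            raise SyntaxError(f'expected token {name}, got {token}')\n",
"        self.index += 1\n",
"        return lexeme\n",
"\n",
"    def parse(self):\n",
"        tree = self.expression()\n",
"        self.expect('EOF')\n",
"        return tree\n",
"\n",
"    def expression(self):\n",
"        tree = self.term()\n",
"        while self.peek() in ('PLUS', 'MINUS'):\n",
"            operator = self.expect(self.peek())\n",
"            tree = (operator, tree, self.term())\n",
"        return tree\n",
"\n",
"    def term(self):\n",
"        tree = self.factor()\n",
"        while self.peek() in ('DIVIDE', 'INTEGER', 'LETTER', 'LPAREN'):\n",
"            if self.peek() == 'DIVIDE':\n",
"                self.expect('DIVIDE')\n",
"                tree = ('/', tree, self.factor())\n",
"            else:  # juxtaposition is implicit multiplication\n",
"                tree = ('*', tree, self.factor())\n",
"        return tree\n",
"\n",
"    def factor(self):\n",
"        base = self.base()\n",
"        if self.peek() == 'CARET':  # right recursion makes '^' right associative\n",
"            self.expect('CARET')\n",
"            return ('^', base, self.factor())\n",
"        return base\n",
"\n",
"    def base(self):\n",
"        if self.peek() == 'MINUS':\n",
"            self.expect('MINUS')\n",
"            return ('neg', self.base())\n",
"        if self.peek() == 'LPAREN':\n",
"            self.expect('LPAREN')\n",
"            tree = self.expression()\n",
"            self.expect('RPAREN')\n",
"            return tree\n",
"        if self.peek() == 'INTEGER':\n",
"            return int(self.expect('INTEGER'))\n",
"        return self.expect('LETTER')\n",
"\n",
"print([token for token, _ in tokenize('(1 + x/n)^n')])\n",
"print(ToyParser('(1 + x/n)^n').parse())\n",
"```\n",
"\n",
"For `(1 + x/n)^n`, the token names match the `Scanner` output shown below, followed by a final `EOF`. The parse tree is `('^', ('+', 1, ('/', 'x', 'n')), 'n')`, with exponentiation at the root."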
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "nrpylatex==1.2.3\r\n" ] } ], "source": [ "import sympy as sp\n", "!pip install nrpylatex~=1.2 > /dev/null\n", "!pip freeze | grep nrpylatex\n", "from nrpylatex import *" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2021-03-07T17:14:13.875016Z", "iopub.status.busy": "2021-03-07T17:14:13.873993Z", "iopub.status.idle": "2021-03-07T17:14:13.878036Z", "shell.execute_reply": "2021-03-07T17:14:13.878534Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "LPAREN, INTEGER, PLUS, LETTER, DIVIDE, LETTER, RPAREN, CARET, LETTER\n" ] } ], "source": [ "scanner = Scanner(); scanner.initialize(r'(1 + x/n)^n')\n", "print(', '.join(token for token in scanner.tokenize()))" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2021-03-07T17:14:13.885902Z", "iopub.status.busy": "2021-03-07T17:14:13.885200Z", "iopub.status.idle": "2021-03-07T17:14:13.954505Z", "shell.execute_reply": "2021-03-07T17:14:13.953822Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(1 + x/n)**n \n", " >> Pow(Add(Integer(1), Mul(Pow(Symbol('n', real=True), Integer(-1)), Symbol('x', real=True))), Symbol('n', real=True))\n" ] } ], "source": [ "expr = parse_latex(r'(1 + x/n)^n')\n", "print(expr, '\\n >>', sp.srepr(expr))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`Grammar Derivation: (1 + x/n)^n`\n", "```\n", "<EXPRESSION> -> <TERM>\n", "             -> <FACTOR>\n", "             -> <BASE>^<EXPONENT>\n", "             -> <SUBEXPR>^<EXPONENT>\n", "             -> (<EXPRESSION>)^<EXPONENT>\n", "             -> (<TERM> + <TERM>)^<EXPONENT>\n", "             -> (<FACTOR> + <TERM>)^<EXPONENT>\n", "             -> (<BASE> + <TERM>)^<EXPONENT>\n", "             -> (<ATOM> + <TERM>)^<EXPONENT>\n", "  
           -> (<NUMBER> + <TERM>)^<EXPONENT>\n", "             -> (<INTEGER> + <TERM>)^<EXPONENT>\n", "             -> (1 + <TERM>)^<EXPONENT>\n", "             -> (1 + <FACTOR> / <FACTOR>)^<EXPONENT>\n", "             -> ...\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Step 2: Grammar Demonstration and Sandbox [ [^](#top) ]\n", "\n", "In the following section, we demonstrate the process for 
extending the parsing module to include a (previously) unsupported LaTeX command.\n", "\n", "1. Update the `grammar` dictionary in the `Scanner` class with the mapping `regex` $\\mapsto$ `token`.\n", "1. Write a grammar abstraction in BNF notation (similar to a regular expression) for the command.\n", "1. Implement a private method for the nonterminal (command name) to parse the grammar abstraction.\n", "\n", "```<SQRT> -> <SQRT_CMD> [ '[' <INTEGER> ']' ] '{' <EXPRESSION> '}'```\n", "```python\n", "# Parser method excerpt; assumes: from sympy import Pow, Rational, sqrt\n", "def _sqrt(self):\n", "    self.expect('SQRT_CMD')\n", "    if self.accept('LBRACK'):  # optional [n] selects the n-th root\n", "        integer = self.scanner.lexeme\n", "        self.expect('INTEGER')\n", "        root = Rational(1, int(integer))\n", "        self.expect('RBRACK')\n", "    else:\n", "        root = Rational(1, 2)\n", "    self.expect('LBRACE')\n", "    expr = self._expression()\n", "    self.expect('RBRACE')\n", "    if root == Rational(1, 2):\n", "        return sqrt(expr)\n", "    return Pow(expr, root)\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition to expression parsing, we included support for equation parsing. Given an equation, the parser produces a mapping `LHS` $\\mapsto$ `RHS`, where `LHS` must be a symbol. It then inserts that mapping into the global namespace of the previous stack frame (i.e., the caller) and returns the names it defined, as demonstrated below."
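, "\n",
"\n",
"To make the *previous stack frame* idea concrete, here is a minimal, hypothetical sketch. It is not NRPyLaTeX's code: `export_to_caller` is our own illustrative helper. It writes each `LHS` $\\mapsto$ `RHS` pair into the caller's global namespace using `inspect.currentframe()`, which is a CPython implementation detail and may return `None` on other Python implementations. A plain string stands in for the SymPy expression.\n",
"\n",
"```python\n",
"import inspect\n",
"\n",
"def export_to_caller(mapping):\n",
"    # Hypothetical helper: publish each LHS -> RHS pair in the caller's global namespace\n",
"    frame = inspect.currentframe()\n",
"    if frame is None:  # Python implementations without stack-frame support\n",
"        raise RuntimeError('stack frame inspection is unavailable')\n",
"    try:\n",
"        frame.f_back.f_globals.update(mapping)\n",
"    finally:\n",
"        del frame  # drop the frame reference to avoid a reference cycle\n",
"    return tuple(mapping)  # the defined names, in insertion order\n",
"\n",
"names = export_to_caller({'s_n': '(1 + 1/n)**n'})\n",
"print(names, s_n)\n",
"```\n",
"\n",
"Returning a tuple of names mirrors the `('s_n',)` result shown for `parse_latex` in the next cells. The variable itself becomes visible only as a side effect on the caller's globals."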
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$ \\mathit{s_n} = \\left(1 + \\frac{1}{n}\\right)^n $$" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2021-03-07T17:14:13.963874Z", "iopub.status.busy": "2021-03-07T17:14:13.963086Z", "iopub.status.idle": "2021-03-07T17:14:13.966039Z", "shell.execute_reply": "2021-03-07T17:14:13.966529Z" } }, "outputs": [ { "data": { "text/plain": [ "('s_n',)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "parse_latex(r'\\text{s_n} = \\left(1 + \\frac{1}{n}\\right)^n')" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "s_n = (1 + 1/n)**n\n" ] } ], "source": [ "print('s_n =', s_n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Furthermore, we implemented detailed error messaging using the custom `ScanError` and `ParseError` exceptions. These aim to pinpoint invalid syntax inside a LaTeX sentence as precisely as possible, including the position of the offending character or token. The following cells show runnable examples of these error messages."
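, "\n",
"\n",
"The caret-style layout of these messages can be imitated in a few lines. This sketch is hypothetical and far simpler than NRPyLaTeX's scanner: `ToyScanError`, `ALLOWED`, and `check_characters` are illustrative names. The allowed alphabet is an assumption, chosen only so that `5x^{{4$}}` fails at the same position as in the first example.\n",
"\n",
"```python\n",
"import string\n",
"\n",
"# Assumed, deliberately tiny alphabet for this illustration only\n",
"ALLOWED = set(string.ascii_letters + string.digits + '+-/^_{}()[] ')\n",
"\n",
"class ToyScanError(Exception):\n",
"    # Hypothetical error type: echoes the sentence and marks the offending position with a caret\n",
"    def __init__(self, sentence, position, reason):\n",
"        self.position = position\n",
"        pad = ' ' * position\n",
"        super().__init__(f'{sentence}\\n{pad}^\\n{reason} at position {position}')\n",
"\n",
"def check_characters(sentence):\n",
"    # Raise on the first character outside ALLOWED, reporting its zero-based position\n",
"    for position, char in enumerate(sentence):\n",
"        if char not in ALLOWED:\n",
"            raise ToyScanError(sentence, position, f'unexpected {char!r}')\n",
"    return sentence\n",
"\n",
"try:\n",
"    check_characters('5x^{{4$}}')\n",
"except ToyScanError as error:\n",
"    print(error)\n",
"```"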
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2021-03-07T17:14:13.978893Z", "iopub.status.busy": "2021-03-07T17:14:13.977817Z", "iopub.status.idle": "2021-03-07T17:14:13.981334Z", "shell.execute_reply": "2021-03-07T17:14:13.981835Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ScanError: 5x^{{4$}}\n", " ^\n", "unexpected '$' at position 6\n" ] } ], "source": [ "try: parse_latex(r'5x^{{4$}}')\n", "except ScanError as e:\n", " print(type(e).__name__ + ': ' + str(e))" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2021-03-07T17:14:13.986633Z", "iopub.status.busy": "2021-03-07T17:14:13.985867Z", "iopub.status.idle": "2021-03-07T17:14:13.988752Z", "shell.execute_reply": "2021-03-07T17:14:13.989250Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ParseError: \\sqrt[0.1]{5x^{{4}}}\n", " ^\n", "expected token INTEGER at position 6\n" ] } ], "source": [ "try: parse_latex(r'\\sqrt[0.1]{5x^{{4}}}')\n", "except ParseError as e:\n", " print(type(e).__name__ + ': ' + str(e))" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2021-03-07T17:14:13.993768Z", "iopub.status.busy": "2021-03-07T17:14:13.993143Z", "iopub.status.idle": "2021-03-07T17:14:13.995947Z", "shell.execute_reply": "2021-03-07T17:14:13.996531Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "ParseError: \\int_0^5 5x^{{4}}dx\n", " ^\n", "unsupported command '\\int' at position 0\n" ] } ], "source": [ "try: parse_latex(r'\\int_0^5 5x^{{4}}dx')\n", "except ParseError as e:\n", " print(type(e).__name__ + ': ' + str(e))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the sandbox code cell below, you can experiment with converting LaTeX to SymPy using `parse_latex(sentence)`, where `sentence` should be a Python [raw 
string](https://docs.python.org/3/reference/lexical_analysis.html) so that each backslash is interpreted as a literal character rather than as the start of an [escape sequence](https://en.wikipedia.org/wiki/Escape_sequence). Alternatively, you can use the supported cell magic `%%parse_latex`, which automatically escapes every backslash and parses the entire cell (more convenient than `parse_latex(sentence)` in a notebook)." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# Write Sandbox Code Here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Step 3: Tensor Support with Einstein Notation [ [^](#top) ]\n", "\n", "In the following section, we demonstrate parsing tensor notation using the Einstein summation convention. In each example, every tensor should appear either on the LHS of an equation or on the RHS of a `vardef` macro before appearing on the RHS of an equation. Furthermore, an exception will be raised upon violation of the Einstein summation convention, i.e., the occurrence of an invalid free or bound index.\n", "\n", "**Configuration Grammar**\n", "\n", "```\n", " -> | | | | | \n", " -> { ',' }* '\\\\'\n", " -> [ '-' ] { ',' }*\n", " -> { '-' (