{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Writing An Parser In Nim\n", "\n", "- 2019/2/6" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Table of Contents\n", "\n", "1. Overview\n", "2. What is Parser?\n", "4. Demo\n", "3. What is Pratt Parser?\n", "5. Where it was difficult to make this Nim's parser ?\n", "6. Introduction to Generator\n", "7. Impressions" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Overview" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "- 「Goで作るインタプリタ」 \n", " - Chapter 2\n", "- Continuation of Lexer\n", "![](https://www.oreilly.co.jp/books/images/picture_large978-4-87311-822-2.jpeg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# What is Parser?\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "- Make AST from Token\n", " - Abstract Syntax Tree of program\n", " - Leave out whitespace and parentheses and end of line semicolon" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Example\n", "\n", "\n", "![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/inputLetStatementCode.png?raw=true)\n", "\n", "- `let = ;`" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## \"expression\" and \"statement\" and ..\n", "\n", "### Expression\n", "\n", "- generate a value\n", "- ex.\n", " - `42`\n", " - `hoge`\n", " \n", " \n", "### Statement\n", "\n", "- do not generate a value\n", "- ex.\n", " - `let hoge = 42`\n", " - `return 42`\n", " - `if..else`\n", " - `for..`\n", " \n", "### Expression Statement\n", "\n", "- statement consisting of only one expression\n", "- ex.\n", " - `x + 42`" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Token\n", "\n", "- output of lexer\n", "\n", "```\n", "[Type = \"LET\", Literal = \"let\"]\n", "[Type = \"IDENT\", Literal = \"hoge\"]\n", "[Type = \"=\", Literal = \"=\"]\n", "[Type = \"INT\", Literal = \"1\"]\n", "[Type = \"+\", Literal = \"+\"]\n", "[Type = \"INT\", Literal = \"2\"]\n", "[Type = \"*\", Literal = \"*\"]\n", "[Type = \"INT\", Literal = \"3\"]\n", "[Type = \"/\", Literal = \"/\"]\n", "[Type = \"INT\", Literal = \"4\"]\n", "[Type = \"+\", Literal = \"+\"]\n", "[Type = \"INT\", Literal = \"5\"]\n", "[Type = \"-\", Literal = \"-\"]\n", "[Type = \"INT\", Literal = \"6\"]\n", "[Type = \";\", Literal = \";\"]\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## AST Object\n", "\n", "- output of Parser\n", "\n", "\n", "```\n", "Token: {\n", " Type: 'LET',\n", " Literal: 'let'\n", "},\n", "Let: {\n", " Token: {\n", " Type: 'IDENT',\n", " Literal: 'hoge'\n", " },\n", " IdentValue: 'hoge'\n", "},\n", "LetValue: {\n", " Left: {\n", " Left: {\n", " Token: {\n", " Type: 'INT',\n", " Literal: '1'\n", " },\n", " IntValue: 1\n", " },\n", " Operator: '+',\n", " Right: {\n", " Left: {\n", " Left: {\n", " Token: {\n", " Type: 'INT',\n", " Literal: '2'\n", " },\n", " IntValue: 2\n", " },\n", " Operator: '*',\n", " Right: {\n", " Token: {\n", " Type: 'INT',\n", " Literal: '3'\n", " },\n", " IntValue: 3\n", " }\n", " },\n", " Operator: '/',\n", " Right: {\n", " Token: {\n", " Type: 'INT',\n", " Literal: '4'\n", " },\n", " IntValue: 4\n", " }\n", " }\n", " },\n", " Operator: '+',\n", " InRight: {\n", " Left: {\n", " Token: {\n", " Type: 'INT',\n", " Literal: '5'\n", " },\n", " IntValue: 5\n", " },\n", " Operator: '*',\n", " Right: {\n", " Token: {\n", " Type: 'INT',\n", " Literal: '6'\n", " },\n", " IntValue: 6\n", " }\n", " }\n", "}\n", "\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## AST\n", "\n", "- AST image\n", "![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/ast.png?raw=true)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Demo\n", "\n", "`$ nim c -r src/repl/repl.nim`" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# What is Pratt Parser ?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Pratt Parser\n", "\n", "- Vaughan Pratt, 1973\n", "- he said in this thesis about this parser\n", " - very simple to understand\n", " - trivial to implement\n", " - easy to use\n", " - extremely efficient in practice if not in theory\n", "- use when make parser without parser generator\n", "\n", "\n", "paper: [link](https://tdop.github.io/) \n", "explanation with js: [link](http://crockford.com/javascript/tdop/tdop.html)←he made JSLint" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Types of parsers\n", "\n", "- Top-down parsing\n", " - recurcive-descent parser ← this!\n", " - LL parser\n", " - packrat parser\n", "- Bottom-up parsing\n", " - Earley parser\n", " - CYK parser\n", " - LR parser\n", " - Simple LR parser\n", " - LALR parser\n", " - CLR parser\n", " - GLR parser\n", "\n", "\n", "\n", "ref: [Parsing - Wikipedia](https://en.wikipedia.org/wiki/Parsing)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## How It Works " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Example\n", "\n", "- input: `2 * 3 / 4`\n", "- output: AST\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### AST\n", "\n", "- a part of the AST just before\n", "- the red part of the figure below\n", "\n", "![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/ast2.png?raw=true)\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### output object\n", "\n", "![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/struct.png?raw=true)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### make \"2 * 3\"" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 1. Whether \"Statement\" or \"Expression Statement\"\n", "\n", "- determine the type of expression\n", "\n", "![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/1.png?raw=true)\n", "\n", "![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/parseStatementCode-2.png?raw=true)\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 2. save \"2\" in the object\n", "\n", "\n", "\n", "![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/2.png?raw=true)\n", "\n", "![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/parseExpressionCode-INT.png?raw=true)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 3. Enter the while loop\n", "\n", "- Enter \"while\"\n", " - the conditional formulas are described later\n", "\n", "![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/3.png?raw=true)\n", "\n", "![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/parseExpressionCode-ASTERISC.png?raw=true)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 4. save \"*\" in the object\n", "\n", "- Call parseExpression recursively!\n", "\n", "![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/4.png?raw=true)\n", "\n", "\n", "![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/parseInfixExpression-2.png?raw=true)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 5. parseExpression again\n", "\n", "- not enter \"while\"\n", "\n", "![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/5.png?raw=true)\n", "\n", "![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/parseExpressionCode-INT-2.png?raw=true)\n", "|" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 6. return to the first parseExpression\n", "\n", "- Since the value of parseInfixExpression is returned, it returns to the top of while\n", "\n", "![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/6.png?raw=true)\n", "\n", "![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/parseExpressionCode-while.png?raw=true)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Determine precedence of Tokens\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "- priority gets larger as going down\n", "\n", "![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/PrecedenceCode-2.png?raw=true)\n", "\n", "\n", "```\n", "Sum < Product # true\n", "```\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### while\n", "\n", "- conditions\n", " - `precedence < self.peekPrecedence()`\n", " - `not self.peekTokenIs(SEMICOLON)`\n", "\n", "- ex.\n", " - \"1 + 2 * 3\"\n", " - Sum < Product\n", " - → enter while loop..\n", "\n", "\n", "\n", "![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/tree.png?raw=true)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### make \"/ 4\"" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 7. in while\n", "\n", "![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/parseExpressionCode-SLASH.png?raw=true)\n", "\n", "\n", "![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/parseInfixExpression-Operator.png?raw=true)\n", "\n", "![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/7.png?raw=true)\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 8. parseInfixExpression again again\n", "\n", "- save \"4\" to object\n", "- don't enter the while loop\n", "\n", "\n", "![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/parseInfixExpression-right.png?raw=true)\n", "![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/8.png?raw=true)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### 9. completion\n", "\n", "\n", "![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/9.png?raw=true)\n", "\n", "![](https://github.com/mrsekut/slides/blob/master/img/WritingParserInNim/parseInfixExpression-last.png?raw=true)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Where it was difficult to make this Nim's parser ?\n", "- don't know about Nim\n", " - posted a question on the forum\n", "- comparison with Golang\n", " - don't have 'Interface'\n", " - reffered to the code of Rust\n", "- read the Nim's parser code\n", " - use enum well" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Impression\n", "\n", "- complexity\n", " - easy to write using a functional languages\n", " - too hard to make this slide\n", "- next, make a evaluator \n", " - the book is still harf\n", " " ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" }, "toc": { "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": false, "toc_cell": false, "toc_position": {}, "toc_section_display": "block", "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }