{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Week 5: Context-free languages" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from tock import *" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Monday reading\n", "\n", "Read Section 2.1, but skip \"Chomsky Normal Form.\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Tuesday class\n", "\n", "We're beginning the second unit of the course, in which we look at _context-free grammars_ and _pushdown automata_. Recall that in the first class we briefly introduced Turing machines, and then introduced finite automata as being like Turing machines but restricted to have only a one-way, read-only tape. Pushdown automata are a little bit less restricted: they have a one-way, read-only tape, but they also have a stack. But the book starts with context-free grammars, probably because they're a little more familiar and/or easier to understand.\n", "\n", "## Context-free grammars\n", "\n", "Context-free grammars were invented in the late 1950s by Noam Chomsky as a way of describing the syntax of human languages (he called them *phrase-structure grammars*). Later, they were appropriated by the inventors of the programming language ALGOL-60 as a way to describe that language's syntax, in a notation that came to be called *Backus-Naur form*.\n", "\n", "There is some variation in terminology that the book doesn't mention. Most importantly, variables are also called *nonterminal symbols* or *nonterminals*. 
You'll hear me call them nonterminals almost exclusively -- I can't get myself to call them variables.\n", "\n", "We can start with a simpler example than the book's.\n", "\\begin{align*}\n", "S &\\rightarrow \\mathtt{a}~S~\\mathtt{b} \\\\\n", "S &\\rightarrow \\varepsilon\n", "\\end{align*}\n", "**Question.** What language does this CFG generate?\n", "\n", "And here's a version of Example 2.3 that is rewritten using parentheses:\n", "\\begin{align*}\n", "S &\\rightarrow \\mathtt{(}~S~\\mathtt{)} \\\\\n", "S &\\rightarrow S~S \\\\\n", "S &\\rightarrow \\varepsilon\n", "\\end{align*}\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Real-world CFGs\n", "\n", "You can see a CFG for C in the [draft standard](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf), starting on page 458, or as a [YACC grammar](http://www.quut.com/c/ANSI-C-grammar-y.html).\n", "\n", "[Generalized Phrase Structure Grammar](https://en.wikipedia.org/wiki/Generalized_phrase_structure_grammar) was an attempt to write a CFG for English. The [Berkeley Parser](https://github.com/slavpetrov/berkeleyparser) builds a CFG for English partly using human input and partly automatically.\n", "\n", "[L-systems](https://en.wikipedia.org/wiki/L-system) are a kind of visual grammar; context-free L-systems can be used to draw surprisingly natural-looking images, especially of trees:\n", "\n", "![Dragon trees](https://upload.wikimedia.org/wikipedia/commons/7/74/Dragon_trees.jpg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Wednesday reading\n", "\n", "Read Section 2.2, up to but not including \"Equivalence with Context-Free Grammars\"." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Thursday class\n", "\n", "## Pushdown automata\n", "\n", "Pushdown automata are equipped with, in addition to the input tape, a _pushdown store_, said to have been inspired by the tray dispenser in a university dining hall:\n", "\n", "![Tray dispenser](https://www.foodservicedirect.com/media/catalog/product/cache/f4ff5e8081d8e52f84a2ea6986aee80a/h/t/httpss3.amazonaws.comfoodservicedirect.comproductimageslargedndxidt2e1622l.jpg)\n", "\n", "Nowadays, a pushdown store is more commonly known as a stack, which you should be quite familiar with. Here's an example (2.14):" ] },
{ "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[State diagram of m1 (SVG not preserved). Start state: q1. Transitions:\n", "  q1 → q2: ε,ε → $\n", "  q2 → q2: 0,ε → 0\n", "  q2 → q3: 1,0 → ε\n", "  q3 → q3: 1,0 → ε\n", "  q3 → q4: ε,$ → ε]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "m1 = read_csv(\"pda-m1.csv\")\n", "m1" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[Run graph of m1 on input 0 0 0 1 1 1 (SVG not preserved). Configurations in order:\n", "  q1,ε → q2,$ → q2,[0] $ → q2,[0] 0 $ → q2,[0] 0 0 … → q3,[0] 0 $ → q3,[0] $ → q3,$ → q4,ε]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "run(m1, \"0 0 0 1 1 1\")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The run graph shows both the state and the stack at each time step. The square brackets just indicate the top of the stack. Note that when the stack gets deeper, ellipses (…) are used to keep the node labels compact.\n", "\n", "This happens to be a deterministic PDA: at every time step, there's only one possibility for the next step." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "More examples from the book:" ] },
{ "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[State diagram of m2 (SVG not preserved). Start state: q1. Transitions:\n", "  q1 → q2: ε,ε → $\n", "  q2 → q2: a,ε → a\n", "  q2 → q3: ε,ε → ε\n", "  q2 → q5: ε,ε → ε\n", "  q3 → q3: b,a → ε\n", "  q3 → q4: ε,$ → ε\n", "  q4 → q4: c,ε → ε\n", "  q5 → q5: b,ε → ε\n", "  q5 → q6: ε,ε → ε\n", "  q6 → q6: c,a → ε\n", "  q6 → q7: ε,$ → ε]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "m2 = read_csv(\"pda-m2.csv\")\n", "m2" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "**Question.** Does the above PDA accept the strings $\\mathtt{aabb}$? $\\mathtt{aabc}$? $\\mathtt{aacc}$?" 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[State diagram of m3 (SVG not preserved). Start state: q1. Transitions:\n", "  q1 → q2: ε,ε → $\n", "  q2 → q2: 0,ε → 0 and 1,ε → 1\n", "  q2 → q3: ε,ε → ε\n", "  q3 → q3: 0,0 → ε and 1,1 → ε\n", "  q3 → q4: ε,$ → ε]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "m3 = read_csv(\"pda-m3.csv\")\n", "m3" ] },
{ "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[Run graph of m3 on input 0 1 0 0 1 0 (SVG not preserved). The graph branches at every q2 configuration: the machine either stays in q2 and pushes the next symbol or moves to q3 and starts popping. The branch that ends in the final q4,ε node:\n", "  q1,ε → q2,$ → q2,[0] $ → q2,[1] 0 $ → q2,[0] 1 0 … → q3,[0] 1 0 … → q3,[1] 0 $ → q3,[0] $ → q3,$ → q4,ε]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "run(m3, \"0 1 0 0 1 0\")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "**Question.** Design a PDA that recognizes the language of matching left and right parentheses (like Example 2.3)." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "**Question.** Design a PDA that recognizes the language over $\\Sigma = \\{\\mathtt{a}, \\mathtt{u}, \\mathtt{c}, \\mathtt{g}\\}$ such that every symbol is paired with exactly one other symbol -- $\\mathtt{a}$ with $\\mathtt{u}$ and $\\mathtt{c}$ with $\\mathtt{g}$, and the pairings are nested like parentheses in the previous question." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question.** Do you think a queue automaton would be more or less powerful than a pushdown (stack) automaton?" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.5" } }, "nbformat": 4, "nbformat_minor": 1 }