{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Fuzzing with Input Fragments\n",
    "\n",
    "In this chapter, we introduce how to combine [parsing](Parser.ipynb) and [fuzzing](GrammarFuzzer.ipynb).  This allows to _mutate_ existing inputs while preserving syntactical correctness, and to _reuse_ fragments from existing inputs while generating new ones.  The combination of parsing and fuzzing, as demonstrated in this chapter, has been highly successful in practice: The _LangFuzz_ fuzzer for JavaScript has found more than 2,600 bugs in JavaScript interpreters this way."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "**Prerequisites**\n",
    "\n",
    "* You should know [how to use a parser](Parser.ipynb).\n",
    "* You should have read the [chapter on efficient grammar fuzzing](GrammarFuzzer.ipynb)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "import fuzzingbook_utils"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "from Parser import PEGParser\n",
    "from GrammarFuzzer import GrammarFuzzer"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "toc-hr-collapsed": true
   },
   "source": [
    "## Recombining Parsed Inputs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Recombining parsed inputs was pioneered by _Langfuzz_ \\cite{Holler2012}. The main challenge is that program inputs often carry additional constraints beyond what is described by the syntax. For example, in Java, one needs to declare a variable (using a specific format for declaration) before it can be used in an expression. This restriction is not captured in the _Java CFG_. Checking for type correctness is another example for additional restrictions carried by program definitions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "When fuzzing compilers and interpreters, naive generation of programs using the language *CFG* often fails to achieve significant deeper coverage due to these kinds of checks external to the grammar. Holler et al. suggests using pre-existing valid code fragments to get around these restrictions. The idea is that the pre-existing valid code fragments already conform to the restrictions external to the grammar, and can often provide a means to evade validity checks."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "### A Grammar-based Mutational Fuzzer\n",
    "\n",
    "The idea is that one can treat the derivation tree of a preexisting program as the scaffolding, poke holes in it, and patch it with generated inputs from our grammar. Given below is a grammar for a language that allows assignment of variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "import string"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "from Grammars import crange, syntax_diagram"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "VAR_GRAMMAR = {\n",
    "    '<start>': ['<statements>'],\n",
    "    '<statements>': ['<statement>;<statements>', '<statement>'],\n",
    "    '<statement>': ['<assignment>'],\n",
    "    '<assignment>': ['<identifier>=<expr>'],\n",
    "    '<identifier>': ['<word>'],\n",
    "    '<word>': ['<alpha><word>', '<alpha>'],\n",
    "    '<alpha>': list(string.ascii_letters),\n",
    "    '<expr>': ['<term>+<expr>', '<term>-<expr>', '<term>'],\n",
    "    '<term>': ['<factor>*<term>', '<factor>/<term>', '<factor>'],\n",
    "    '<factor>':\n",
    "    ['+<factor>', '-<factor>', '(<expr>)', '<identifier>', '<number>'],\n",
    "    '<number>': ['<integer>.<integer>', '<integer>'],\n",
    "    '<integer>': ['<digit><integer>', '<digit>'],\n",
    "    '<digit>': crange('0', '9')\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "syntax_diagram(VAR_GRAMMAR)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Let us use our new grammar to parse a program."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "mystring = 'va=10;vb=20'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "def hl_predicate(_d, _n, symbol, _a): return symbol in {\n",
    "    '<number>', '<identifier>'}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "from Parser import PEGParser, highlight_node\n",
    "from GrammarFuzzer import display_tree"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "parser = PEGParser(VAR_GRAMMAR)\n",
    "for tree in parser.parse(mystring):\n",
    "    display_tree(tree, node_attr=highlight_node(hl_predicate))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "As can be seen from the above example, our grammar is rather detailed. So we need to define our token nodes, which correspond to the *red* nodes above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "VAR_TOKENS = {'<number>', '<identifier>'}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "These tokens are pruned using the `prune_tree` method that we mentioned previously.\n",
    "Here is a slightly more complex program to parse, but with the tree pruned using tokens:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "mystring = 'avar=1.3;bvar=avar-3*(4+300)'\n",
    "parser = PEGParser(VAR_GRAMMAR, tokens=VAR_TOKENS)\n",
    "for tree in parser.parse(mystring):\n",
    "    display_tree(tree, node_attr=highlight_node(hl_predicate))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "We develop a `LangFuzzer` class that generates recombined inputs. To apply the _Langfuzz_ approach, we need a few parsed strings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "mystrings = [\n",
    "    'abc=12+(3+3.3)',\n",
    "    'a=1;b=2;c=a+b',\n",
    "    'avar=1.3;bvar=avar-3*(4+300)',\n",
    "    'a=1.3;b=a-1*(4+3+(2/a))',\n",
    "    'a=10;b=20;c=34;d=-b+(b*b-4*a*c)/(2*a)',\n",
    "    'x=10;y=20;z=(x+y)*(x-y)',\n",
    "    'x=23;y=51;z=x*x-y*y',\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "We recurse through any given tree, collecting parsed fragments corresponding to each nonterminal. Further, we also name each node so that we can address each node separately."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "from Fuzzer import Fuzzer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "class LangFuzzer(Fuzzer):\n",
    "    def __init__(self, parser):\n",
    "        self.parser = parser\n",
    "        self.fragments = {k: [] for k in self.parser.cgrammar}\n",
    "\n",
    "    def traverse_tree(self, node):\n",
    "        counter = 1\n",
    "        nodes = {}\n",
    "\n",
    "        def helper(node, id):\n",
    "            nonlocal counter\n",
    "            name, children = node\n",
    "            new_children = []\n",
    "            nodes[id] = node\n",
    "            for child in children:\n",
    "                counter += 1\n",
    "                new_children.append(helper(child, counter))\n",
    "            return name, new_children, id\n",
    "\n",
    "        return helper(node, counter), nodes\n",
    "\n",
    "    def fragment(self, strings):\n",
    "        self.trees = []\n",
    "        for string in strings:\n",
    "            for tree in self.parser.parse(string):\n",
    "                tree, nodes = self.traverse_tree(tree)\n",
    "                self.trees.append((tree, nodes))\n",
    "                for node in nodes:\n",
    "                    symbol = nodes[node][0]\n",
    "                    if symbol in self.fragments:\n",
    "                        self.fragments[symbol].append(nodes[node])\n",
    "        return self.fragments"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "We thus obtain all valid fragments from our parsed strings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "lf = LangFuzzer(PEGParser(VAR_GRAMMAR, tokens=VAR_TOKENS))\n",
    "fragments = lf.fragment(mystrings)\n",
    "for key in fragments:\n",
    "    print(\"%s: %d\" % (key, len(fragments[key])))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "All that remains is to actually find a place to poke a hole using `candidate()`, and patch that hole using `generate_new_tree()`. We will explain how to do this next.\n",
    "\n",
    "But before that, we update our `initialization` method with a call to `fragment()."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "import random"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "class LangFuzzer(LangFuzzer):\n",
    "    def __init__(self, parser, strings):\n",
    "        self.parser = parser\n",
    "        self.fragments = {k: [] for k in self.parser.cgrammar}\n",
    "        self.fragment(strings)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Candidate\n",
    "`LangFuzzer` accepts a list of strings, which are stored as derivation trees in the object.\n",
    "\n",
    "The method `candidate()` chooses one of the derivation trees randomly as the template, and identifies a node such that it can be replaced by another node that is different from itself. That is, it chooses a node such that, if the non-terminal name of the node is `node_type`, there is at least one other entry in `fragment[node_type])` "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "class LangFuzzer(LangFuzzer):\n",
    "    def candidate(self):\n",
    "        tree, nodes = random.choice(self.trees)\n",
    "        interesting_nodes = [\n",
    "            n for n in nodes if nodes[n][0] in self.fragments\n",
    "            and len(self.fragments[nodes[n][0]]) > 1\n",
    "        ]\n",
    "        node = random.choice(interesting_nodes)\n",
    "        return tree, node"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Here is how it is used -- the *red* node is the node chosen."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "random.seed(1)\n",
    "lf = LangFuzzer(PEGParser(VAR_GRAMMAR, tokens=VAR_TOKENS), mystrings)\n",
    "tree, node = lf.candidate()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "def hl_predicate(_d, nid, _s, _a): return nid in {node}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "display_tree(tree, node_attr=highlight_node(hl_predicate))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Generate New Tree\n",
    "Once we have identified the node, one can generate a new tree by replacing that node with another node of similar type from our fragment pool."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "class LangFuzzer(LangFuzzer):\n",
    "    def generate_new_tree(self, node, choice):\n",
    "        name, children, id = node\n",
    "        if id == choice:\n",
    "            return random.choice(self.fragments[name])\n",
    "        else:\n",
    "            return (name, [self.generate_new_tree(c, choice)\n",
    "                           for c in children])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Again, the red node indicates where the replacement has occurred."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "random.seed(1)\n",
    "lf = LangFuzzer(PEGParser(VAR_GRAMMAR, tokens=VAR_TOKENS), mystrings)\n",
    "tree, node = lf.candidate()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "def hl_predicate(_d, nid, _s, _a): return nid in {node}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "from GrammarFuzzer import tree_to_string"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "new_tree = lf.generate_new_tree(tree, node)\n",
    "for s in [tree_to_string(i) for i in [tree, new_tree]]:\n",
    "    print(s)\n",
    "display_tree(new_tree, node_attr=highlight_node(hl_predicate))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Fuzz\n",
    "\n",
    "The `fuzz()` method simply calls the procedures defined before in order."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "class LangFuzzer(LangFuzzer):\n",
    "    def fuzz(self):\n",
    "        tree, node = self.candidate()\n",
    "        modified = self.generate_new_tree(tree, node)\n",
    "        return tree_to_string(modified)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Here is our fuzzer in action."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "lf = LangFuzzer(PEGParser(VAR_GRAMMAR, tokens=VAR_TOKENS), mystrings)\n",
    "for i in range(10):\n",
    "    print(lf.fuzz())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "How effective was our fuzzing? Let us find out!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "from Timer import Timer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "trials = 100\n",
    "\n",
    "lf = LangFuzzer(PEGParser(VAR_GRAMMAR, tokens=VAR_TOKENS), mystrings)\n",
    "valid = []\n",
    "time = 0\n",
    "for i in range(trials):\n",
    "    with Timer() as t:\n",
    "        s = lf.fuzz()\n",
    "        try:\n",
    "            exec(s, {}, {})\n",
    "            valid.append((s, t.elapsed_time()))\n",
    "        except:\n",
    "            pass\n",
    "        time += t.elapsed_time()\n",
    "print(\"%d valid strings, that is LangFuzzer generated %f%% valid entries\" %\n",
    "      (len(valid), len(valid) * 100.0 / trials))\n",
    "print(\"Total time of %f seconds\" % time)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "gf = GrammarFuzzer(VAR_GRAMMAR)\n",
    "valid = []\n",
    "time = 0\n",
    "for i in range(trials):\n",
    "    with Timer() as t:\n",
    "        s = gf.fuzz()\n",
    "        try:\n",
    "            exec(s, {}, {})\n",
    "            valid.append(s)\n",
    "        except:\n",
    "            pass\n",
    "        time += t.elapsed_time()\n",
    "print(\"%d valid strings, that is GrammarFuzzer generated %f%% valid entries\" %\n",
    "      (len(valid), len(valid) * 100.0 / trials))\n",
    "print(\"Total time of %f seconds\" % time)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "That is, our `LangFuzzer` is rather effective on generating valid entries when compared to the `GrammarFuzzer`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "new_sheet": true,
    "run_control": {
     "read_only": false
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Grammar-Based Mutation\n",
    "\n",
    "General idea: Take a derivation tree and a matching grammar; apply a random mutation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "from Grammars import EXPR_GRAMMAR\n",
    "from GrammarFuzzer import display_tree\n",
    "from Parser import EarleyParser"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "parser = EarleyParser(EXPR_GRAMMAR)\n",
    "tree = parser.parse(\"1 + 2 * 3\")[0]\n",
    "display_tree(tree)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "1. Pick any node in the tree\n",
    "2. Produce a new expansion.\n",
    "\n",
    "We have seen this for `LangFuzzer` already, right?\n",
    "\n",
    "How about we factor this out (from the Parser notebook), and have two notebook on mutational (and genetic fuzzing):\n",
    "\n",
    "1. `LangFuzzer` – a chapter on\n",
    "    * Mutating trees (randomly)\n",
    "    * Mutating trees from a given population (the LangFuzz approach)\n",
    "    * Tree recombination (and crossover)\n",
    "    \n",
    "2. `EvoGrammarFuzzer` – a chapter on\n",
    "    * Genetic improvement (using coverage only)\n",
    "    * Genetic improvement (using a fitness function from search-based fuzzing)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "def mutate_tree(tree, grammar):\n",
    "    pass"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "new_sheet": true,
    "run_control": {
     "read_only": false
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Lessons Learned\n",
    "\n",
    "* We can generate a pool of fragments using the _LangFuzz_ approach, and use it to generate nearly valid strings."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "new_sheet": false,
    "run_control": {
     "read_only": false
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Next Steps\n",
    "\n",
    "* In the [chapter on evolutionary fuzzing](EvoGrammarFuzzer.ipynb), we discuss how to systematically evolve a population of inputs through mutation and crossover operations (as discussed in this chapter) to achieve a specific goal such as code coverage."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Background\n",
    "\n",
    "Recombining parsed inputs was pioneered by _Langfuzz_ \\cite{Holler2012}, which also gave its name to the main class of this chapter."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "button": false,
    "new_sheet": true,
    "run_control": {
     "read_only": false
    },
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Exercises\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    },
    "solution2": "hidden",
    "solution2_first": true
   },
   "source": [
    "### Exercise 1: A Different LangFuzzer\n",
    "\n",
    "Sometimes we do not want to use our pool of strings for various reasons – the number of items in the pool may be inadequate, or not varied enough. Extend the `LangFuzzer` to use a separate function to check if the number of items in the pool corresponding to the selected non-terminal is large enough (say greater than 10), and if not, use the tree expansion technique from `GrammarFuzzer` to patch the hole."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "solution2": "hidden"
   },
   "source": [
    "**Solution.** Here is a possible solution.  Before we can make use of `GrammarFuzzer`, we need to change it a little bit. GrammarFuzzer relies on the grammar being in the `fuzzing` format with the expansions represented as strings. Our `LangFuzzer` expects the expansions to be a list of tokens. So we fix the output of `GrammarFuzzer`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "solution2": "hidden"
   },
   "outputs": [],
   "source": [
    "class LangFuzzer2(LangFuzzer):\n",
    "    def __init__(self, parser, strings):\n",
    "        super().__init__(parser, strings)\n",
    "        self.gfuzz = GrammarFuzzer(parser.grammar())\n",
    "\n",
    "    def check_diversity(self, pool):\n",
    "        return len(pool) > 10\n",
    "\n",
    "    def candidate(self):\n",
    "        tree, nodes = random.choice(self.trees)\n",
    "        interesting_nodes = [\n",
    "            n for n in nodes if nodes[n][0] in self.fragments\n",
    "            and nodes[n][0] is not self.parser.start_symbol()\n",
    "            and len(self.fragments[nodes[n][0]]) > 0\n",
    "        ]\n",
    "        node = random.choice(interesting_nodes)\n",
    "        return tree, node\n",
    "\n",
    "    def generate_new_tree(self, node, choice):\n",
    "        name, children, id = node\n",
    "        if id == choice:\n",
    "            pool = self.fragments[name]\n",
    "            if self.check_diversity(pool):\n",
    "                return random.choice(self.fragments[name])\n",
    "            else:\n",
    "                return None\n",
    "        else:\n",
    "            return (name,\n",
    "                    [self.generate_new_tree(c, choice) for c in children])\n",
    "\n",
    "    def fuzz(self):\n",
    "        tree, node = self.candidate()\n",
    "        tree_with_a_hole = self.generate_new_tree(tree, node)\n",
    "        modified = self.gfuzz.expand_tree(tree_with_a_hole)\n",
    "        return tree_to_string(modified)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "solution2": "hidden"
   },
   "outputs": [],
   "source": [
    "parser = EarleyParser(VAR_GRAMMAR, tokens=VAR_TOKENS)\n",
    "lf = LangFuzzer2(parser, mystrings)\n",
    "for i in range(100):\n",
    "    print(lf.fuzz())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    },
    "solution2": "hidden"
   },
   "source": [
    "With these changes, our `LangFuzzer2` is able to use the pool of fragments when necessary, but can rely on the grammar when the pool is not sufficient."
   ]
  }
 ],
 "metadata": {
  "ipub": {
   "bibliography": "fuzzingbook.bib",
   "toc": true
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": true,
   "title_cell": "",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": true
  },
  "toc-autonumbering": false
 },
 "nbformat": 4,
 "nbformat_minor": 2
}