{ "cells": [ { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "slide" } }, "source": [ "# Testing Web Applications\n", "\n", "In this chapter, we explore how to generate tests for Graphical User Interfaces (GUIs), notably on Web interfaces. We set up a (vulnerable) Web server and demonstrate how to systematically explore its behavior – first with hand-written grammars, then with grammars automatically inferred from the user interface. We also show how to conduct systematic attacks on these servers, notably with code and SQL injection." ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "source": [ "**Prerequisites**\n", "\n", "* The techniques in this chapter make use of [grammars for fuzzing](Grammars.ipynb).\n", "* Basic knowledge of HTML and HTTP is required.\n", "* Knowledge of SQL databases is helpful." ] }, { "cell_type": "markdown", "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" }, "toc-hr-collapsed": false }, "source": [ "## A Web User Interface\n", "\n", "Let us start with a simple example. We want to set up a _Web server_ that allows readers of this book to buy fuzzingbook-branded fan articles. In reality, we would make use of an existing Web shop (or an appropriate framework) for this purpose. For the purpose of this book, we _write our own Web server_, building on the HTTP server facilities provided by the Python library." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "All of our Web server is defined in a `HTTPRequestHandler`, which, as the name suggests, handles arbitrary Web page requests." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from http.server import HTTPServer, BaseHTTPRequestHandler, HTTPStatus" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "class SimpleHTTPRequestHandler(BaseHTTPRequestHandler):\n", "    pass" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Taking Orders\n", "\n", "For our Web server, we need a number of Web pages:\n", "* We want one page where customers can place an order.\n", "* We want one page where they see their order confirmed.\n", "* Additionally, we need pages that display error messages such as \"Page Not Found\"." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We start with the order form. The dictionary `FUZZINGBOOK_SWAG` holds the items that customers can order, together with long descriptions:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import fuzzingbook_utils" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "FUZZINGBOOK_SWAG = {\n", "    \"tshirt\": \"One FuzzingBook T-Shirt\",\n", "    \"drill\": \"One FuzzingBook Rotary Hammer\",\n", "    \"lockset\": \"One FuzzingBook Lock Set\"\n", "}" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This is the HTML code for the order form. The menu for selecting the swag to be ordered is created dynamically from `FUZZINGBOOK_SWAG`. We omit plenty of details such as precise shipping address, payment, shopping cart, and more." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "HTML_ORDER_FORM = \"\"\"\n", "
\n", "\n", "\n", "\"\"\"" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "This is what the order form looks like:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from IPython.display import display" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from fuzzingbook_utils import HTML" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "HTML(HTML_ORDER_FORM)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "This form is not yet functional, as there is no server behind it; pressing \"place order\" will lead you to a nonexistent page." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" }, "toc-hr-collapsed": false }, "source": [ "### Order Confirmation\n", "\n", "Once we have gotten an order, we show a confirmation page, which is instantiated with the customer information submitted before. Here is the HTML and the rendering:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "HTML_ORDER_RECEIVED = \"\"\"\n", "\n", "\n",
" We will send {item_name} to {name} in {city}, {zip}
\n",
" A confirmation mail will be sent to {email}.\n",
"
\n", " Want more swag? Use our order form!\n", "
\n", "\n", " The content of this project is licensed under the\n", " Creative Commons\n", " Attribution-NonCommercial-ShareAlike 4.0 International License.\n", "
\n", "\n", " To place an order, use our order form.\n", "
\n", "\n", " This page does not exist. Try our order form instead.\n", "
\n", "\n", " The server has encountered an internal error. Go to our order form.\n", "
{error_message}\n", " \n", "
' +\n", " message +\n", " \"\"))\n", " else:\n", " print(terminal_escape(message))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "display_httpd_message(\"I am a httpd server message\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The function `print_httpd_messages()` prints all messages accumulated in the queue so far:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def print_httpd_messages():\n", "    while not HTTPD_MESSAGE_QUEUE.empty():\n", "        message = HTTPD_MESSAGE_QUEUE.get()\n", "        display_httpd_message(message)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import time" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "time.sleep(1)\n", "print_httpd_messages()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "With `clear_httpd_messages()`, we can silently discard all pending messages:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def clear_httpd_messages():\n", "    while not HTTPD_MESSAGE_QUEUE.empty():\n", "        HTTPD_MESSAGE_QUEUE.get()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The method `log_message()` in the request handler makes use of the queue to store its messages:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "class SimpleHTTPRequestHandler(SimpleHTTPRequestHandler):\n", "    def log_message(self, format, *args):\n", "        message = (\"%s - - [%s] %s\\n\" %\n", "                   (self.address_string(),\n", "                    self.log_date_time_string(),\n", "                    format % args))\n", "        HTTPD_MESSAGE_QUEUE.put(message)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In [the chapter on carving](Carver.ipynb), we introduced a `webbrowser()` function that retrieves the contents of a given URL. We now extend it so that it also prints any log messages produced by the server:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import requests" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def webbrowser(url, mute=False):\n", "    \"\"\"Download the http/https resource given by the URL\"\"\"\n", "    try:\n", "        r = requests.get(url)\n", "        contents = r.text\n", "    finally:\n", "        if not mute:\n", "            print_httpd_messages()\n", "        else:\n", "            clear_httpd_messages()\n", "\n", "    return contents" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Running the Server\n", "\n", "After all these definitions, we are now ready to get the Web server up and running. We run the server on the *local host* – that is, the same machine that also runs this notebook. We check for an accessible port and put the resulting URL in the queue created earlier." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "button": false, "new_sheet": false, "run_control": { "read_only": false }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "def run_httpd_forever(handler_class):\n", "    host = \"127.0.0.1\"  # localhost IP\n", "    for port in range(8800, 9000):\n", "        httpd_address = (host, port)\n", "\n", "        try:\n", "            httpd = HTTPServer(httpd_address, handler_class)\n", "            break\n", "        except OSError:\n", "            continue\n", "\n", "    httpd_url = \"http://\" + host + \":\" + repr(port)\n", "    HTTPD_MESSAGE_QUEUE.put(httpd_url)\n", "    httpd.serve_forever()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The function `start_httpd()` starts the server in a separate process, which we create using the `multiprocessing` module. It retrieves the server URL from the message queue and returns it, so that we can start talking to the server." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from multiprocessing import Process" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def start_httpd(handler_class=SimpleHTTPRequestHandler):\n", "    clear_httpd_messages()\n", "\n", "    httpd_process = Process(target=run_httpd_forever, args=(handler_class,))\n", "    httpd_process.start()\n", "\n", "    httpd_url = HTTPD_MESSAGE_QUEUE.get()\n", "    return httpd_process, httpd_url" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Let us now start the server and save its URL:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "httpd_process, httpd_url = start_httpd()\n", "httpd_url" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, 
"source": [ "### Interacting with the Server\n", "\n", "Let us now access the server just created." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Direct Browser Access\n", "\n", "If you are running the Jupyter notebook server on the local host as well, you can now access the server directly at the given URL. Simply open the address in `httpd_url` by clicking on the link below.\n", "\n", "**Note**: This only works if you are running the Jupyter notebook server on the local host." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "def print_url(url):\n", " if rich_output():\n", " display(HTML('
%s' % (url, url)))\n", " else:\n", " print(terminal_escape(url))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "print_url(httpd_url)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Even more convenient, you may be able to interact directly with the server using the window below. \n", "\n", "**Note**: This only works if you are running the Jupyter notebook server on the local host." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "HTML('')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "After interaction, you can retrieve the messages produced by the server:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "print_httpd_messages()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "We can also see any orders placed in the database:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "print(db.execute(\"SELECT * FROM orders\").fetchall())" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "And we can clear the order database:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "db.execute(\"DELETE FROM orders\")\n", "db.commit()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Retrieving the Home Page\n", "\n", "Even if our browser cannot directly interact with the server, the _notebook_ can. 
We can, for instance, retrieve the contents of the home page and display them:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "contents = webbrowser(httpd_url)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "HTML(contents)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Placing Orders\n", "\n", "To test this form, we can generate URLs with orders and have the server process them." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The method `urljoin()` puts together a base URL (i.e., the URL of our server) and a path – say, the path towards our order." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from urllib.parse import urljoin, urlsplit" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "urljoin(httpd_url, \"/order?foo=bar\")" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "With `urljoin()`, we can create a full URL that is the same as the one generated by the browser as we submit the order form. 
Sending this URL to the browser effectively places the order, as we can see in the server log produced:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "contents = webbrowser(urljoin(httpd_url,\n", " \"/order?item=tshirt&name=Jane+Doe&email=doe%40example.com&city=Seattle&zip=98104\"))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "The web page returned confirms the order:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "HTML(contents)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "And the order is in the database, too:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "print(db.execute(\"SELECT * FROM orders\").fetchall())" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Error Messages\n", "\n", "We can also test whether the server correctly responds to invalid requests. Nonexistent pages, for instance, are correctly handled:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "HTML(webbrowser(urljoin(httpd_url, \"/some/other/path\")))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "You may remember we also have a page for internal server errors. Can we get the server to produce this page? To find this out, we have to test the server thoroughly – which we do in the remainder of this chapter." 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Fuzzing Input Forms\n", "\n", "After setting up and starting the server, let us now go and systematically test it – first with expected, and then with less expected values." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Fuzzing with Expected Values\n", "\n", "Since placing orders is all done by creating appropriate URLs, we define a [grammar](Grammars.ipynb) `ORDER_GRAMMAR` which encodes ordering URLs. It comes with a few sample values for names, email addresses, cities and (random) digits." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "To make it easier to define strings that become part of a URL, we define the function `cgi_encode()`, which takes a string and automatically applies CGI encoding:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import string" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def cgi_encode(s, do_not_encode=\"\"):\n", "    ret = \"\"\n", "    for c in s:\n", "        if (c in string.ascii_letters or c in string.digits\n", "                or c in \"$-_.+!*'(),\" or c in do_not_encode):\n", "            ret += c\n", "        elif c == ' ':\n", "            ret += '+'\n", "        else:\n", "            ret += \"%%%02x\" % ord(c)\n", "    return ret" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "s = cgi_encode('Is \"DOW30\" down .24%?')\n", "s" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The optional parameter `do_not_encode` allows us to exclude certain characters from encoding. 
This is useful when encoding grammar rules:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "cgi_encode(\"