{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Project 3 - Search-Based Web Fuzzer " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import fuzzingbook_utils" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "\n", "Fuzzing web applications can be challenging. Even a simple web form requires a specific input format for each field, and this format must be satisfied for a valid web interaction (e.g., an email field is expected to match the regex pattern `r'^([a-z0-9_\\.-]+)@([\\da-z\\.-]+)\\.([a-z\\.]{2,6})$'`).\n", "\n", "In this project, we demonstrate how to apply a search-based algorithm to generate specific test inputs for web interfaces. The generated inputs should exercise the web application by fulfilling the input validation requirements of all fields required to reach specific web pages (both normal and error pages). \n", "\n", "The task is to employ a genetic algorithm (GA) for fuzzing. Your implementation is expected to generate inputs that match the expected pattern or constraints of each input field (e.g., the email `fuzzingbook@gmail.com` fulfils the expected email regex pattern `r'^([a-z0-9_\\.-]+)@([\\da-z\\.-]+)\\.([a-z\\.]{2,6})$'`). This is to be achieved by searching the input space, starting from an initial (random) input. " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from WebFuzzer import init_db, ORDERS_DB, SimpleHTTPRequestHandler \n", "from http.server import HTTPServer, HTTPStatus " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Objective\n", "\n", "The goal of this project is to _implement a search-based fuzzing algorithm that generates specifically formatted inputs which fulfil the input validation requirements of a web form_. 
\n", "\n", "You will apply techniques learned in the lecture ([SBST](SBST.ipynb) and [WebFuzzer](WebFuzzer.ipynb)) to automatically fill web forms by producing inputs that improve the reachability of web pages. \n", "\n", "Your fuzzer should search the input space until it produces an input that satisfies the validation format of each input field. Consequently, fulfilling (or deliberately not fulfilling) such input validation schemes makes specific web pages reachable and improves the coverage of the site map. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Web Application\n", "\n", "We create an (HTML-only) web application for placing an order of items, similar to the order form in the [WebFuzzer](WebFuzzer.ipynb) lecture.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Input Validation\n", "\n", "We create a `ProjectHTTPRequestHandler` class that provides a web order form with input validation schemes for all input fields. \n", "\n", "First, we create a sample set of regexes for validating each input field in our order form. \n", "\n", "__Note that your implementation will be evaluated with a _secret set of regexes_ other than the ones provided below.__\n", "\n", "For some input fields, we provide additional sample regexes that you may use to test your implementation; the others are provided to indicate the kind of regexes that may be used to evaluate your final solution. \n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "itemList = r'^(tshirt|drill|lockset)$'\n", "name_regex = [r'^[A-Z][a-z]+ [A-Z][a-z]+ [A-Z][a-z]+$', r'^[A-Z][a-z]+ [A-Z]\\. 
[A-Z][a-z]+$', \\\n", " r'^[A-Z][a-z]+ [A-Z][a-z]+$', r'^[a-z]+ [a-z]+$']\n", "email_regex = [r'^([a-z0-9_\\.-]+)@([\\da-z\\.-]+)\\.([a-z\\.]{2,6})$', r'^([a-z0-9_\\.-]+)@cispa\\.saarland$', \\\n", " r'^([a-z0-9_\\.-]+)@gmail\\.com$', r'^([a-z0-9_\\.-]+)@hotmail\\.co\\.uk$']\n", "zip_regex = [r'^\\d{5}$', r'^\\d{3}-\\d{4}$', r'^\\d{5}([ \\-]\\d{4})?$', \\\n", " r'^[ABCEGHJKLMNPRSTVXY]\\d[ABCEGHJ-NPRSTV-Z][ ]?\\d[ABCEGHJ-NPRSTV-Z]\\d$']\n", "city_regex = [r'^[A-Z]\\w{2,9}$', r'^[a-zA-Z]+(?:[\\s-][a-zA-Z]+)*$', r'^\\p{Lu}\\p{L}*(?:[\\s-]\\p{Lu}\\p{L}*)*$', r'^\\w{2,12}$']\n", "terms_regex = r'^on$'\n", "\n", "list_regex = [itemList, name_regex[0], email_regex[0], city_regex[0], zip_regex[0], terms_regex]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We define diagnostic error messages for each error page." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "error_msg_list = [\"INCOMPLETE INPUT: \\n All fields have to be filled \",\\\n", " \"INVALID INPUT: \\n Item is not in item list\",\\\n", " \"INVALID INPUT: \\n Name does not match expected pattern :\\n \",\\\n", " \"INVALID INPUT: \\n Email does not match expected pattern :\\n \",\\\n", " \"INVALID INPUT: \\n City does not match expected pattern :\\n \",\\\n", " \"INVALID INPUT: \\n Zip does not match expected pattern :\\n \",\\\n", " \"INVALID INPUT: \\n Terms and Conditions have to be checked \"\n", " ]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We override the `handle_order()` method to validate the value of each input field using the predefined regex patterns above. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use the [`regex` package](https://pypi.org/project/regex/), specifically version 2.5.23, instead of Python's native `re`, because `re` does not support the `\\p{}` syntax. 
Confirm regex version using: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install regex \n", "import regex\n", "print(regex.__version__)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "#import re\n", "\n", "class ProjectHTTPRequestHandler(SimpleHTTPRequestHandler):\n", " def handle_order(self): \n", " values = self.get_field_values()\n", "\n", " if len(values) < 6:\n", " error_type = error_msg_list[0]\n", " self.internal_server_error(error_type)\n", " \n", " elif regex.match(list_regex[0], values[\"item\"]) and regex.match(list_regex[1], values[\"name\"]) and \\\n", " regex.match(list_regex[2], values[\"email\"]) and regex.match(list_regex[4], values[\"zip\"]) and \\\n", " regex.match(list_regex[3], values[\"city\"]) and regex.match(list_regex[5], values[\"terms\"]):\n", " self.store_order(values)\n", " self.send_order_received(values)\n", "\n", " elif not regex.match(list_regex[0], values[\"item\"]):\n", " error_type = error_msg_list[1]\n", " self.internal_server_error(error_type) \n", " \n", " elif not regex.match(list_regex[1], values[\"name\"]):\n", " error_type = error_msg_list[2] + name_regex[0] \n", " self.internal_server_error(error_type)\n", " \n", " elif not regex.match(list_regex[2], values[\"email\"]):\n", " error_type = error_msg_list[3] + email_regex[0]\n", " self.internal_server_error(error_type)\n", " \n", " elif not regex.match(list_regex[3], values[\"city\"]):\n", " error_type = error_msg_list[4] + city_regex[0]\n", " self.internal_server_error(error_type)\n", " \n", " elif not regex.match(list_regex[4], values[\"zip\"]):\n", " error_type = error_msg_list[5] + zip_regex[0]\n", " self.internal_server_error(error_type)\n", " \n", " elif not regex.match(list_regex[5], values[\"terms\"]):\n", " error_type = error_msg_list[6]\n", " self.internal_server_error(error_type)\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 
Internal Errors\n", "For diagnostic (and identification) purposes, the internal server error pages include the error message indicating the input validation scheme that failed." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "HTML_INTERNAL_SERVER_ERROR = \"\"\"\n", "
\n", "\n", " The server has encountered an internal error. Go to our order form.\n", "
{error_message}\n", " \n", "
\n", " The server has encountered an internal error. Go to our order form.\n", "
{error_message}\n", " \n", "
\n",
" We will send {item_name} to {name} in {city}, {zip}
\n",
" A confirmation mail will be sent to {email}.\n",
"
\n", " Want more swag? Use our order form!\n", "
\n", "' +\n", " message +\n", " \"\"))\n", " else:\n", " print(terminal_escape(message))\n", "\n", "def print_httpd_messages():\n", " while not HTTPD_MESSAGE_QUEUE.empty():\n", " message = HTTPD_MESSAGE_QUEUE.get()\n", " display_httpd_message(message)\n", "\n", "def clear_httpd_messages():\n", " while not HTTPD_MESSAGE_QUEUE.empty():\n", " HTTPD_MESSAGE_QUEUE.get()\n", "\n", "class ProjectHTTPRequestHandler(ProjectHTTPRequestHandler):\n", " def log_message(self, format, *args):\n", " message = (\"%s - - [%s] %s\\n\" %\n", " (self.address_string(),\n", " self.log_date_time_string(),\n", " format % args))\n", " HTTPD_MESSAGE_QUEUE.put(message)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We extend the `webbrowser()` function so that it prints the log messages produced by the server:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "from Carver import webbrowser as simple_webbrowser\n", "\n", "def webbrowser(url, mute=False):\n", " try:\n", " contents = simple_webbrowser(url)\n", " finally:\n", " if not mute:\n", " print_httpd_messages()\n", " else:\n", " clear_httpd_messages()\n", "\n", " return contents" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Running the Web Application\n", "As in the [WebFuzzer](WebFuzzer.ipynb) lecture, we implement the following functions to run the web application:\n", "\n", "* `run_httpd_forever()`\n", "* `start_httpd()`\n", "* `print_url()`\n" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
http://127.0.0.1:8800" ], "text/plain": [ "
%s' % (url, url)))\n", " else:\n", " print(terminal_escape(url))\n", "\n", "print_url(httpd_url)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note**: The above URL only works if you are running the Jupyter notebook server on the local host." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Testing the Web Application\n", "### Input Format\n", "\n", "When the user clicks `Submit` on the order form, the Web browser creates and retrieves a URL of the form:\n", "\n", "```\n", "
127.0.0.1 - - [24/Feb/2019 22:01:14] \"GET /order?item=tshirt&name=Jane+Lee+Doe&email=doe%40example.com&city=Seattle&zip=98104&terms=on HTTP/1.1\" 200 -\n", "" ], "text/plain": [ "
\n",
" We will send One FuzzingBook T-Shirt to Jane Lee Doe in Seattle, 98104
\n",
" A confirmation mail will be sent to doe@example.com.\n",
"
\n", " Want more swag? Use our order form!\n", "
\n", "127.0.0.1 - - [24/Feb/2019 22:01:14] \"GET /order?item=tshirt&name=Jane+Lee+Doe&email=doe%40example.com&city=Seattle&zip=98104&terms=off HTTP/1.1\" 200 -\n", "" ], "text/plain": [ "
127.0.0.1 - - [24/Feb/2019 22:01:14] NoneType: None\n", "" ], "text/plain": [ "
\n", " The server has encountered an internal error. Go to our order form.\n", "
INVALID INPUT: \n", " Terms and Conditions have to be checked\n", " \n", "
127.0.0.1 - - [24/Feb/2019 22:01:14] \"GET /order?item=tshirt&name=Jane+Lee+Doe&email=doe%40example.com&city=Seattle&zip=98104 HTTP/1.1\" 200 -\n", "" ], "text/plain": [ "
127.0.0.1 - - [24/Feb/2019 22:01:14] NoneType: None\n", "" ], "text/plain": [ "
\n", " The server has encountered an internal error. Go to our order form.\n", "
INCOMPLETE INPUT: \n", " All fields have to be filled\n", " \n", "
http://127.0.0.1:8801" ], "text/plain": [ "
127.0.0.1 - - [24/Feb/2019 22:01:14] \"GET /order?item=tshirt&name=Jane+L.+Doe&email=doe%40cispa.saarland&city=Seattle&zip=984-1234&terms=on HTTP/1.1\" 200 -\n", "" ], "text/plain": [ "
http://127.0.0.1:8802" ], "text/plain": [ "
127.0.0.1 - - [24/Feb/2019 22:01:15] \"GET /order?item=tshirt&name=Jane+Lee+Doe&email=doe%40example.com&city=Seattle&zip=98104&terms=on HTTP/1.1\" 200 -\n", "" ], "text/plain": [ "
127.0.0.1 - - [24/Feb/2019 22:01:15] \"GET /order?item=tshirt&name=Jane+Lee+Doe&email=doe%40example.com&city=1&zip=98104&terms=on HTTP/1.1\" 200 -\n", "" ], "text/plain": [ "
127.0.0.1 - - [24/Feb/2019 22:01:15] NoneType: None\n", "" ], "text/plain": [ "
127.0.0.1 - - [24/Feb/2019 22:01:15] \"GET /order?item=tshirt&name=Jane+Lee+Doe&email=doe%40example.com&city=Seattle&zip=104&terms=on HTTP/1.1\" 200 -\n", "" ], "text/plain": [ "
127.0.0.1 - - [24/Feb/2019 22:01:15] NoneType: None\n", "" ], "text/plain": [ "
127.0.0.1 - - [24/Feb/2019 22:01:15] \"GET /order?item=tshirt&name=Jane+Lee+Doe1&email=doe%40example.com&city=Seattle&zip=98104&terms=on HTTP/1.1\" 200 -\n", "" ], "text/plain": [ "
127.0.0.1 - - [24/Feb/2019 22:01:15] NoneType: None\n", "" ], "text/plain": [ "
127.0.0.1 - - [24/Feb/2019 22:01:15] \"GET /order?item=tshirt&name=Rhvf+Ctz+Yvun&email=doe%40example.com&city=Seattle&zip=98104&terms=on HTTP/1.1\" 200 -\n", "" ], "text/plain": [ "