{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": "true" }, "source": [ "# Table of Contents\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Benchmark of the SHA256 hash function, with Python, Cython and Numba\n", "\n", "This small [Jupyter notebook](https://www.Jupyter.org/) is a short experiment, to compare the time complexity of three different implementations of the [SHA-256 hash function](https://en.wikipedia.org/wiki/SHA-2), in pure [Python](https://www.Python.org/), with [Cython](http://Cython.org/), and with [Numba](http://Numba.PyData.org/).\n", "\n", "- *Reference*: Wikipedia pages on [Hash functions](https://en.wikipedia.org/wiki/Hash_function) and [SHA-2](https://en.wikipedia.org/wiki/SHA-2).\n", "- *Date*: 21 June 2017.\n", "- *Author*: [Lilian Besson](https://GitHub.com/Naereen/notebooks).\n", "- *License*: [MIT Licensed](https://LBesson.MIT-License.org/)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "## What is a hash function?\n", "> TL;DR : [Hash functions](https://en.wikipedia.org/wiki/Hash_function) and [cryptographic hashing functions](https://en.wikipedia.org/wiki/Cryptographic_hash_function) on Wikipedia." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "## Common API for the different classes\n", "\n", "I will copy the API proposed by [the `hashlib` module in Python standard library](https://docs.python.org/3/library/hashlib.html), so it will be very easy to compare my implementations with the one provided with your default [Python](https://www.Python.org/) installation." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "code_folding": [ 0 ], "collapsed": true }, "outputs": [], "source": [ "class Hash(object):\n", " \"\"\" Common class for all hash methods.\n", " \n", " It copies the one of the hashlib module (https://docs.python.org/3.5/library/hashlib.html).\n", " \"\"\"\n", " \n", " def __init__(self, *args, **kwargs):\n", " \"\"\" Create the Hash object.\"\"\"\n", " self.name = self.__class__.__name__ # https://docs.python.org/3.5/library/hashlib.html#hashlib.hash.name\n", " self.byteorder = 'little'\n", " self.digest_size = 0 # https://docs.python.org/3.5/library/hashlib.html#hashlib.hash.digest_size\n", " self.block_size = 0 # https://docs.python.org/3.5/library/hashlib.html#hashlib.hash.block_size\n", "\n", " def __str__(self):\n", " return self.name\n", " \n", " def update(self, arg):\n", " \"\"\" Update the hash object with the object arg, which must be interpretable as a buffer of bytes.\"\"\"\n", " pass\n", "\n", " def digest(self):\n", " \"\"\" Return the digest of the data passed to the update() method so far. This is a bytes object of size digest_size which may contain bytes in the whole range from 0 to 255.\"\"\"\n", " return b\"\"\n", "\n", " def hexdigest(self):\n", " \"\"\" Like digest() except the digest is returned as a string object of double length, containing only hexadecimal digits. This may be used to exchange the value safely in email or other non-binary environments.\"\"\"\n", " digest = self.digest()\n", " raw = digest.to_bytes(self.digest_size, byteorder=self.byteorder)\n", " format_str = '{:0' + str(2 * self.digest_size) + 'x}'\n", " return format_str.format(int.from_bytes(raw, byteorder='big'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "## Checking the [the `hashlib` module in Python standard library](https://docs.python.org/3/library/hashlib.html)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import hashlib" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can check [the available algorithms](https://docs.python.org/3.5/library/hashlib.html#hashlib.algorithms_available), some of them being [guaranteed to be on any platform](https://docs.python.org/3.5/library/hashlib.html#hashlib.algorithms_guaranteed), some are not." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "['SHA',\n", " 'dsaWithSHA',\n", " 'dsaEncryption',\n", " 'md4',\n", " 'md5',\n", " 'sha512',\n", " 'sha256',\n", " 'sha1',\n", " 'RIPEMD160',\n", " 'ecdsa-with-SHA1',\n", " 'MD4',\n", " 'SHA256',\n", " 'MD5',\n", " 'SHA224',\n", " 'SHA1',\n", " 'sha224',\n", " 'whirlpool',\n", " 'DSA',\n", " 'ripemd160',\n", " 'sha',\n", " 'SHA512',\n", " 'SHA384',\n", " 'DSA-SHA',\n", " 'sha384']" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(hashlib.algorithms_available)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I will need at least this one:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "assert 'SHA256' in hashlib.algorithms_available" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lets check that they have the block size and digest size announced:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "For SHA256 : the block size is 64 and the digest size is 32.\n" ] } ], "source": [ "name = 'SHA256'\n", "s = hashlib.sha256()\n", "print(\"For {:<8} : the block size is {:<3} and the digest size is {:<2}.\".format(name, s.block_size, s.digest_size))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "## Pure Python code for the SHA-2 hashing function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let now study and implement a last hashing function, again slightly harder to write but more secure: SHA-2, \"Secure Hash Algorithm, version 2\".\n", "See [the SHA-2 hashing function](https://en.wikipedia.org/wiki/SHA-2) on Wikipedia, if needed.\n", "\n", "