{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": "true" }, "source": [ "# Table of Contents\n", "

1  Benchmark of the SHA256 hash function, with Python, Cython and Numba
1.1  What is a hash function?
1.2  Common API for the different classes
1.3  Checking the the hashlib module in Python standard library
1.4  Pure Python code for the SHA-2 hashing function
1.4.1  Useful functions the SHA-2 algorithm
1.4.2  The SHA2 class
1.4.3  Check on SHA-2
1.4.4  Trying 1000 random examples
1.5  Numba-powered code for the SHA-2 hashing function
1.5.1  Requirements
1.5.2  Useful functions the SHA-2 algorithm
1.5.3  The SHA2_Numba class
1.5.4  Check on SHA-2
1.6  Cython-power code for the SHA-2 hashing function
1.6.1  Requirements
1.6.2  Useful functions the SHA-2 algorithm
1.6.3  The SHA2_Cython class
1.6.4  Check on SHA-2
1.7  Conclusion
1.7.1  Bonus
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Benchmark of the SHA256 hash function, with Python, Cython and Numba\n", "\n", "This small [Jupyter notebook](https://www.Jupyter.org/) is a short experiment, to compare the time complexity of three different implementations of the [SHA-256 hash function](https://en.wikipedia.org/wiki/SHA-2), in pure [Python](https://www.Python.org/), with [Cython](http://Cython.org/), and with [Numba](http://Numba.PyData.org/).\n", "\n", "- *Reference*: Wikipedia pages on [Hash functions](https://en.wikipedia.org/wiki/Hash_function) and [SHA-2](https://en.wikipedia.org/wiki/SHA-2).\n", "- *Date*: 21 June 2017.\n", "- *Author*: [Lilian Besson](https://GitHub.com/Naereen/notebooks).\n", "- *License*: [MIT Licensed](https://LBesson.MIT-License.org/)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "## What is a hash function?\n", "> TL;DR : [Hash functions](https://en.wikipedia.org/wiki/Hash_function) and [cryptographic hashing functions](https://en.wikipedia.org/wiki/Cryptographic_hash_function) on Wikipedia." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "## Common API for the different classes\n", "\n", "I will copy the API proposed by [the `hashlib` module in Python standard library](https://docs.python.org/3/library/hashlib.html), so it will be very easy to compare my implementations with the one provided with your default [Python](https://www.Python.org/) installation." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "code_folding": [ 0 ], "collapsed": true }, "outputs": [], "source": [ "class Hash(object):\n", " \"\"\" Common class for all hash methods.\n", " \n", " It copies the one of the hashlib module (https://docs.python.org/3.5/library/hashlib.html).\n", " \"\"\"\n", " \n", " def __init__(self, *args, **kwargs):\n", " \"\"\" Create the Hash object.\"\"\"\n", " self.name = self.__class__.__name__ # https://docs.python.org/3.5/library/hashlib.html#hashlib.hash.name\n", " self.byteorder = 'little'\n", " self.digest_size = 0 # https://docs.python.org/3.5/library/hashlib.html#hashlib.hash.digest_size\n", " self.block_size = 0 # https://docs.python.org/3.5/library/hashlib.html#hashlib.hash.block_size\n", "\n", " def __str__(self):\n", " return self.name\n", " \n", " def update(self, arg):\n", " \"\"\" Update the hash object with the object arg, which must be interpretable as a buffer of bytes.\"\"\"\n", " pass\n", "\n", " def digest(self):\n", " \"\"\" Return the digest of the data passed to the update() method so far. This is a bytes object of size digest_size which may contain bytes in the whole range from 0 to 255.\"\"\"\n", " return b\"\"\n", "\n", " def hexdigest(self):\n", " \"\"\" Like digest() except the digest is returned as a string object of double length, containing only hexadecimal digits. This may be used to exchange the value safely in email or other non-binary environments.\"\"\"\n", " digest = self.digest()\n", " raw = digest.to_bytes(self.digest_size, byteorder=self.byteorder)\n", " format_str = '{:0' + str(2 * self.digest_size) + 'x}'\n", " return format_str.format(int.from_bytes(raw, byteorder='big'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "## Checking the [the `hashlib` module in Python standard library](https://docs.python.org/3/library/hashlib.html)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import hashlib" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can check [the available algorithms](https://docs.python.org/3.5/library/hashlib.html#hashlib.algorithms_available), some of them being [guaranteed to be on any platform](https://docs.python.org/3.5/library/hashlib.html#hashlib.algorithms_guaranteed), some are not." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [ { "data": { "text/plain": [ "['SHA',\n", " 'dsaWithSHA',\n", " 'dsaEncryption',\n", " 'md4',\n", " 'md5',\n", " 'sha512',\n", " 'sha256',\n", " 'sha1',\n", " 'RIPEMD160',\n", " 'ecdsa-with-SHA1',\n", " 'MD4',\n", " 'SHA256',\n", " 'MD5',\n", " 'SHA224',\n", " 'SHA1',\n", " 'sha224',\n", " 'whirlpool',\n", " 'DSA',\n", " 'ripemd160',\n", " 'sha',\n", " 'SHA512',\n", " 'SHA384',\n", " 'DSA-SHA',\n", " 'sha384']" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(hashlib.algorithms_available)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I will need at least this one:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "assert 'SHA256' in hashlib.algorithms_available" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lets check that they have the block size and digest size announced:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "For SHA256 : the block size is 64 and the digest size is 32.\n" ] } ], "source": [ "name = 'SHA256'\n", "s = hashlib.sha256()\n", "print(\"For {:<8} : the block size is {:<3} and the digest size is {:<2}.\".format(name, s.block_size, s.digest_size))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "## Pure Python code for the SHA-2 hashing function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let now study and implement a last hashing function, again slightly harder to write but more secure: SHA-2, \"Secure Hash Algorithm, version 2\".\n", "See [the SHA-2 hashing function](https://en.wikipedia.org/wiki/SHA-2) on Wikipedia, if needed.\n", "\n", "
Remark: it is not (yet) considered broken, and it is the military standard for security and cryptographic hashing. SHA-3 is preferred for security purposes.
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Useful functions the SHA-2 algorithm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is exactly like for MD5. But SHA-2 requires right-rotate as well." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def leftrotate(x, c):\n", " \"\"\" Left rotate the number x by c bytes.\"\"\"\n", " x &= 0xFFFFFFFF\n", " return ((x << c) | (x >> (32 - c))) & 0xFFFFFFFF\n", "\n", "def rightrotate(x, c):\n", " \"\"\" Right rotate the number x by c bytes.\"\"\"\n", " x &= 0xFFFFFFFF\n", " return ((x >> c) | (x << (32 - c))) & 0xFFFFFFFF" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As SHA-2 plays with big-endian and little-endian integers, and at the end it requires a leftshift to combine the 5 hash pieces into one." ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def leftshift(x, c):\n", " \"\"\" Left shift the number x by c bytes.\"\"\"\n", " return x << c\n", "\n", "def rightshift(x, c):\n", " \"\"\" Right shift the number x by c bytes.\"\"\"\n", " return x >> c" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The `SHA2` class\n", "\n", "I will use a simple class, very similar to the class used for the SHA-1 algorithm (see above).\n", "It is a direct implementation of the pseudo-code, as given for instance on the Wikipedia page.\n", "\n", "I will only implement the simpler one, SHA-256, of digest size of 256 bits. Other variants are SHA-224, SHA-384, SHA-512 (and others include SHA-512/224, SHA-512/256)." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": true }, "outputs": [], "source": [ "class SHA2(Hash):\n", " \"\"\"SHA256 hashing, see https://en.wikipedia.org/wiki/SHA-2#Pseudocode.\"\"\"\n", " \n", " def __init__(self):\n", " self.name = \"SHA256\"\n", " self.byteorder = 'big'\n", " self.block_size = 64\n", " self.digest_size = 32\n", " # Note 2: For each round, there is one round constant k[i] and one entry in the message schedule array w[i], 0 ≤ i ≤ 63\n", " # Note 3: The compression function uses 8 working variables, a through h\n", " # Note 4: Big-endian convention is used when expressing the constants in this pseudocode,\n", " # and when parsing message block data from bytes to words, for example,\n", " # the first word of the input message \"abc\" after padding is 0x61626380\n", "\n", " # Initialize hash values:\n", " # (first 32 bits of the fractional parts of the square roots of the first 8 primes 2..19):\n", " h0 = 0x6a09e667\n", " h1 = 0xbb67ae85\n", " h2 = 0x3c6ef372\n", " h3 = 0xa54ff53a\n", " h4 = 0x510e527f\n", " h5 = 0x9b05688c\n", " h6 = 0x1f83d9ab\n", " h7 = 0x5be0cd19\n", " \n", " # Initialize array of round constants:\n", " # (first 32 bits of the fractional parts of the cube roots of the first 64 primes 2..311):\n", " self.k = [\n", " 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,\n", " 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,\n", " 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,\n", " 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,\n", " 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,\n", " 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,\n", " 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,\n", " 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2\n", " ]\n", "\n", " # Store them\n", " self.hash_pieces = [h0, h1, h2, h3, h4, h5, h6, h7]\n", " \n", " def update(self, arg):\n", " h0, h1, h2, h3, h4, h5, h6, h7 = self.hash_pieces\n", " # 1. Pre-processing, exactly like MD5\n", " data = bytearray(arg)\n", " orig_len_in_bits = (8 * len(data)) & 0xFFFFFFFFFFFFFFFF\n", " # 1.a. Add a single '1' bit at the end of the input bits\n", " data.append(0x80)\n", " # 1.b. Padding with zeros as long as the input bits length ≡ 448 (mod 512)\n", " while len(data) % 64 != 56:\n", " data.append(0)\n", " # 1.c. append original length in bits mod (2 pow 64) to message\n", " data += orig_len_in_bits.to_bytes(8, byteorder='big')\n", " assert len(data) % 64 == 0, \"Error in padding\"\n", " # 2. Computations\n", " # Process the message in successive 512-bit = 64-bytes chunks:\n", " for offset in range(0, len(data), 64):\n", " # 2.a. 512-bits = 64-bytes chunks\n", " chunks = data[offset : offset + 64]\n", " w = [0 for i in range(64)]\n", " # 2.b. Break chunk into sixteen 32-bit = 4-bytes words w[i], 0 ≤ i ≤ 15\n", " for i in range(16):\n", " w[i] = int.from_bytes(chunks[4*i : 4*i + 4], byteorder='big')\n", " # 2.c. Extend the first 16 words into the remaining 48\n", " # words w[16..63] of the message schedule array:\n", " for i in range(16, 64):\n", " s0 = (rightrotate(w[i-15], 7) ^ rightrotate(w[i-15], 18) ^ rightshift(w[i-15], 3)) & 0xFFFFFFFF\n", " s1 = (rightrotate(w[i-2], 17) ^ rightrotate(w[i-2], 19) ^ rightshift(w[i-2], 10)) & 0xFFFFFFFF\n", " w[i] = (w[i-16] + s0 + w[i-7] + s1) & 0xFFFFFFFF\n", " # 2.d. Initialize hash value for this chunk\n", " a, b, c, d, e, f, g, h = h0, h1, h2, h3, h4, h5, h6, h7\n", " # 2.e. Main loop, cf. https://tools.ietf.org/html/rfc6234\n", " for i in range(64):\n", " S1 = (rightrotate(e, 6) ^ rightrotate(e, 11) ^ rightrotate(e, 25)) & 0xFFFFFFFF\n", " ch = ((e & f) ^ ((~e) & g)) & 0xFFFFFFFF\n", " temp1 = (h + S1 + ch + self.k[i] + w[i]) & 0xFFFFFFFF\n", " S0 = (rightrotate(a, 2) ^ rightrotate(a, 13) ^ rightrotate(a, 22)) & 0xFFFFFFFF\n", " maj = ((a & b) ^ (a & c) ^ (b & c)) & 0xFFFFFFFF\n", " temp2 = (S0 + maj) & 0xFFFFFFFF\n", "\n", " new_a = (temp1 + temp2) & 0xFFFFFFFF\n", " new_e = (d + temp1) & 0xFFFFFFFF\n", " # Rotate the 8 variables\n", " a, b, c, d, e, f, g, h = new_a, a, b, c, new_e, e, f, g\n", "\n", " # Add this chunk's hash to result so far:\n", " h0 = (h0 + a) & 0xFFFFFFFF\n", " h1 = (h1 + b) & 0xFFFFFFFF\n", " h2 = (h2 + c) & 0xFFFFFFFF\n", " h3 = (h3 + d) & 0xFFFFFFFF\n", " h4 = (h4 + e) & 0xFFFFFFFF\n", " h5 = (h5 + f) & 0xFFFFFFFF\n", " h6 = (h6 + g) & 0xFFFFFFFF\n", " h7 = (h7 + h) & 0xFFFFFFFF\n", " # 3. Conclusion\n", " self.hash_pieces = [h0, h1, h2, h3, h4, h5, h6, h7]\n", "\n", " def digest(self):\n", " # h0 append h1 append h2 append h3 append h4 append h5 append h6 append h7\n", " return sum(leftshift(x, 32 * i) for i, x in enumerate(self.hash_pieces[::-1]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also write a function to directly compute the hex digest from some bytes data." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def hash_SHA2(data):\n", " \"\"\" Shortcut function to directly receive the hex digest from SHA2(data).\"\"\"\n", " h = SHA2()\n", " if isinstance(data, str):\n", " data = bytes(data, encoding='utf8')\n", " h.update(data)\n", " return h.hexdigest()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Check on SHA-2\n", "\n", "Let try the example from [SHA-2 Wikipedia page](https://en.wikipedia.org/wiki/SHA-2#Test_vectors) :" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592'" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash_SHA2(\"The quick brown fox jumps over the lazy dog\")\n", "assert hash_SHA2(\"The quick brown fox jumps over the lazy dog\") == 'd7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Even a small change in the message will (with overwhelming probability) result in a mostly different hash, due to the [**avalanche effect**](https://en.wikipedia.org/wiki/Avalanche_effect). For example, adding a period at the end of the sentence:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'ef537f25c895bfa782526529a9b63d97aa631564d5d789c2b765448c8635fb6c'" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash_SHA2(\"The quick brown fox jumps over the lazy dog.\")\n", "assert hash_SHA2(\"The quick brown fox jumps over the lazy dog.\") == 'ef537f25c895bfa782526529a9b63d97aa631564d5d789c2b765448c8635fb6c'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The hash of the zero-length string is:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash_SHA2(\"\")\n", "assert hash_SHA2(\"\") == 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$\\implies$ We obtained the same result, OK our function works!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Trying 1000 random examples\n", "On a small sentence:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'10a6dec377c28d2d34001c103760339f8d6ab02660d200d6014329b86253552c'" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash_SHA2(\"My name is Zorro !\")" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'10a6dec377c28d2d34001c103760339f8d6ab02660d200d6014329b86253552c'" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "h = hashlib.sha256()\n", "h.update(b\"My name is Zorro !\")\n", "h.hexdigest()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It starts to look good." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def true_hash_SHA2(data):\n", " h = hashlib.sha256()\n", " if isinstance(data, str):\n", " data = bytes(data, encoding='utf8')\n", " h.update(data)\n", " return h.hexdigest()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On some random data:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy.random as nr\n", "alphabets = \"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz\"\n", "\n", "def random_string(size=10000):\n", " return ''.join(alphabets[nr.randint(len(alphabets))] for _ in range(size))" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'bXOwDYtUOz'" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "random_string(10)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from tqdm import tqdm_notebook as tqdm" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "b17aa534a4544b7ba4a0c3058c84e880" } }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "CPU times: user 1min 2s, sys: 88 ms, total: 1min 2s\n", "Wall time: 1min 2s\n" ] } ], "source": [ "%%time\n", "for _ in tqdm(range(1000)):\n", " x = random_string()\n", " assert hash_SHA2(x) == true_hash_SHA2(x), \"Error: x = {} gave two different SHA2 hashes: my implementation = {} != hashlib implementation = {}...\".format(x, hash_SHA2(x), true_hash_SHA2(x))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "## Numba-powered code for the SHA-2 hashing function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Requirements\n", "You need [numba](http://numba.pydata.org/) to be installed." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from numba import jit, jitclass" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Useful functions the SHA-2 algorithm\n", "\n", "Let just add the [`numba.jit`](http://numba.pydata.org/numba-doc/latest/user/jit.html) decorator to every function we defined before:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": true }, "outputs": [], "source": [ "@jit\n", "def leftrotate_numba(x, c):\n", " \"\"\" Left rotate the number x by c bytes.\"\"\"\n", " x &= 0xFFFFFFFF\n", " return ((x << c) | (x >> (32 - c))) & 0xFFFFFFFF\n", "\n", "@jit\n", "def rightrotate_numba(x, c):\n", " \"\"\" Right rotate the number x by c bytes.\"\"\"\n", " x &= 0xFFFFFFFF\n", " return ((x >> c) | (x << (32 - c))) & 0xFFFFFFFF" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": true }, "outputs": [], "source": [ "@jit\n", "def leftshift_numba(x, c):\n", " \"\"\" Left shift the number x by c bytes.\"\"\"\n", " return x << c\n", "\n", "@jit\n", "def rightshift_numba(x, c):\n", " \"\"\" Right shift the number x by c bytes.\"\"\"\n", " return x >> c" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The `SHA2_Numba` class\n", "\n", "And similarly for the `SHA2` class, with the [`numba.jit`](http://numba.pydata.org/numba-doc/latest/user/jit.html) decorator to the `update` function." ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": true }, "outputs": [], "source": [ "class SHA2_Numba(Hash):\n", " \"\"\"SHA256 hashing, speed-up with Numba.jit, see https://en.wikipedia.org/wiki/SHA-2#Pseudocode.\"\"\"\n", " \n", " def __init__(self):\n", " self.name = \"SHA256\"\n", " self.byteorder = 'big'\n", " self.block_size = 64\n", " self.digest_size = 32\n", " # Note 2: For each round, there is one round constant k[i] and one entry in the message schedule array w[i], 0 ≤ i ≤ 63\n", " # Note 3: The compression function uses 8 working variables, a through h\n", " # Note 4: Big-endian convention is used when expressing the constants in this pseudocode,\n", " # and when parsing message block data from bytes to words, for example,\n", " # the first word of the input message \"abc\" after padding is 0x61626380\n", "\n", " # Initialize hash values:\n", " # (first 32 bits of the fractional parts of the square roots of the first 8 primes 2..19):\n", " h0 = 0x6a09e667\n", " h1 = 0xbb67ae85\n", " h2 = 0x3c6ef372\n", " h3 = 0xa54ff53a\n", " h4 = 0x510e527f\n", " h5 = 0x9b05688c\n", " h6 = 0x1f83d9ab\n", " h7 = 0x5be0cd19\n", " \n", " # Initialize array of round constants:\n", " # (first 32 bits of the fractional parts of the cube roots of the first 64 primes 2..311):\n", " self.k = [\n", " 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,\n", " 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,\n", " 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,\n", " 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,\n", " 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,\n", " 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,\n", " 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,\n", " 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2\n", " ]\n", "\n", " # Store them\n", " self.hash_pieces = [h0, h1, h2, h3, h4, h5, h6, h7]\n", " \n", " @jit\n", " def update(self, arg):\n", " h0, h1, h2, h3, h4, h5, h6, h7 = self.hash_pieces\n", " # 1. Pre-processing, exactly like MD5\n", " data = bytearray(arg)\n", " orig_len_in_bits = (8 * len(data)) & 0xFFFFFFFFFFFFFFFF\n", " # 1.a. Add a single '1' bit at the end of the input bits\n", " data.append(0x80)\n", " # 1.b. Padding with zeros as long as the input bits length ≡ 448 (mod 512)\n", " while len(data) % 64 != 56:\n", " data.append(0)\n", " # 1.c. append original length in bits mod (2 pow 64) to message\n", " data += orig_len_in_bits.to_bytes(8, byteorder='big')\n", " assert len(data) % 64 == 0, \"Error in padding\"\n", " # 2. Computations\n", " # Process the message in successive 512-bit = 64-bytes chunks:\n", " for offset in range(0, len(data), 64):\n", " # 2.a. 512-bits = 64-bytes chunks\n", " chunks = data[offset : offset + 64]\n", " w = [0 for i in range(64)]\n", " # 2.b. Break chunk into sixteen 32-bit = 4-bytes words w[i], 0 ≤ i ≤ 15\n", " for i in range(16):\n", " w[i] = int.from_bytes(chunks[4*i : 4*i + 4], byteorder='big')\n", " # 2.c. Extend the first 16 words into the remaining 48\n", " # words w[16..63] of the message schedule array:\n", " for i in range(16, 64):\n", " s0 = (rightrotate(w[i-15], 7) ^ rightrotate(w[i-15], 18) ^ rightshift(w[i-15], 3)) & 0xFFFFFFFF\n", " s1 = (rightrotate(w[i-2], 17) ^ rightrotate(w[i-2], 19) ^ rightshift(w[i-2], 10)) & 0xFFFFFFFF\n", " w[i] = (w[i-16] + s0 + w[i-7] + s1) & 0xFFFFFFFF\n", " # 2.d. Initialize hash value for this chunk\n", " a, b, c, d, e, f, g, h = h0, h1, h2, h3, h4, h5, h6, h7\n", " # 2.e. Main loop, cf. https://tools.ietf.org/html/rfc6234\n", " for i in range(64):\n", " S1 = (rightrotate(e, 6) ^ rightrotate(e, 11) ^ rightrotate(e, 25)) & 0xFFFFFFFF\n", " ch = ((e & f) ^ ((~e) & g)) & 0xFFFFFFFF\n", " temp1 = (h + S1 + ch + self.k[i] + w[i]) & 0xFFFFFFFF\n", " S0 = (rightrotate(a, 2) ^ rightrotate(a, 13) ^ rightrotate(a, 22)) & 0xFFFFFFFF\n", " maj = ((a & b) ^ (a & c) ^ (b & c)) & 0xFFFFFFFF\n", " temp2 = (S0 + maj) & 0xFFFFFFFF\n", "\n", " new_a = (temp1 + temp2) & 0xFFFFFFFF\n", " new_e = (d + temp1) & 0xFFFFFFFF\n", " # Rotate the 8 variables\n", " a, b, c, d, e, f, g, h = new_a, a, b, c, new_e, e, f, g\n", "\n", " # Add this chunk's hash to result so far:\n", " h0 = (h0 + a) & 0xFFFFFFFF\n", " h1 = (h1 + b) & 0xFFFFFFFF\n", " h2 = (h2 + c) & 0xFFFFFFFF\n", " h3 = (h3 + d) & 0xFFFFFFFF\n", " h4 = (h4 + e) & 0xFFFFFFFF\n", " h5 = (h5 + f) & 0xFFFFFFFF\n", " h6 = (h6 + g) & 0xFFFFFFFF\n", " h7 = (h7 + h) & 0xFFFFFFFF\n", " # 3. Conclusion\n", " self.hash_pieces = [h0, h1, h2, h3, h4, h5, h6, h7]\n", "\n", " def digest(self):\n", " # h0 append h1 append h2 append h3 append h4 append h5 append h6 append h7\n", " return sum(leftshift(x, 32 * i) for i, x in enumerate(self.hash_pieces[::-1]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also write a function to directly compute the hex digest from some bytes data." ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def hash_SHA2_Numba(data):\n", " \"\"\" Shortcut function to directly receive the hex digest from SHA2_Numba(data).\"\"\"\n", " h = SHA2_Numba()\n", " if isinstance(data, str):\n", " data = bytes(data, encoding='utf8')\n", " h.update(data)\n", " return h.hexdigest()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Check on SHA-2\n", "\n", "Let try the example from [SHA-2 Wikipedia page](https://en.wikipedia.org/wiki/SHA-2#Test_vectors) :" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "ename": "AttributeError", "evalue": "Failed at object (analyzing bytecode)\n'DataFlowAnalysis' object has no attribute 'op_MAKE_FUNCTION'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mhash_SHA2_Numba\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"The quick brown fox jumps over the lazy dog\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mhash_SHA2_Numba\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"The quick brown fox jumps over the lazy dog\"\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m'd7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m\u001b[0m in \u001b[0;36mhash_SHA2_Numba\u001b[0;34m(data)\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mdata\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mbytes\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mencoding\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'utf8'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m \u001b[0mh\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mupdate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 7\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mh\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhexdigest\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/usr/local/lib/python3.5/dist-packages/numba/dispatcher.py\u001b[0m in \u001b[0;36m_compile_for_args\u001b[0;34m(self, *args, **kws)\u001b[0m\n\u001b[1;32m 284\u001b[0m \u001b[0margtypes\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtypeof_pyval\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 285\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 286\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcompile\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtuple\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0margtypes\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 287\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0merrors\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mTypingError\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 288\u001b[0m \u001b[0;31m# Intercept typing error that may be due to an argument\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/usr/local/lib/python3.5/dist-packages/numba/dispatcher.py\u001b[0m in \u001b[0;36mcompile\u001b[0;34m(self, sig)\u001b[0m\n\u001b[1;32m 530\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 531\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_cache_misses\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0msig\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m+=\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 532\u001b[0;31m \u001b[0mcres\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_compiler\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcompile\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mreturn_type\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 533\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0madd_overload\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcres\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 534\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_cache\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msave_overload\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msig\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcres\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/usr/local/lib/python3.5/dist-packages/numba/dispatcher.py\u001b[0m in \u001b[0;36mcompile\u001b[0;34m(self, args, return_type)\u001b[0m\n\u001b[1;32m 79\u001b[0m \u001b[0mimpl\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 80\u001b[0m \u001b[0margs\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mreturn_type\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mreturn_type\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 81\u001b[0;31m flags=flags, locals=self.locals)\n\u001b[0m\u001b[1;32m 82\u001b[0m \u001b[0;31m# Check typing error if object mode is used\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 83\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mcres\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtyping_error\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0;32mNone\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mflags\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0menable_pyobject\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/usr/local/lib/python3.5/dist-packages/numba/compiler.py\u001b[0m in \u001b[0;36mcompile_extra\u001b[0;34m(typingctx, targetctx, func, args, return_type, flags, locals, library)\u001b[0m\n\u001b[1;32m 691\u001b[0m pipeline = Pipeline(typingctx, targetctx, library,\n\u001b[1;32m 692\u001b[0m args, return_type, flags, locals)\n\u001b[0;32m--> 693\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mpipeline\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcompile_extra\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfunc\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 694\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 695\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/usr/local/lib/python3.5/dist-packages/numba/compiler.py\u001b[0m in \u001b[0;36mcompile_extra\u001b[0;34m(self, func)\u001b[0m\n\u001b[1;32m 348\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlifted\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 349\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlifted_from\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 350\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_compile_bytecode\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 351\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 352\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mcompile_ir\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfunc_ir\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlifted\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlifted_from\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/usr/local/lib/python3.5/dist-packages/numba/compiler.py\u001b[0m in \u001b[0;36m_compile_bytecode\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 656\u001b[0m \"\"\"\n\u001b[1;32m 657\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfunc_ir\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 658\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_compile_core\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 659\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 660\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_compile_ir\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/usr/local/lib/python3.5/dist-packages/numba/compiler.py\u001b[0m in \u001b[0;36m_compile_core\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 643\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 644\u001b[0m \u001b[0mpm\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfinalize\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 645\u001b[0;31m \u001b[0mres\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpm\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrun\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstatus\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 646\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mres\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 647\u001b[0m \u001b[0;31m# Early pipeline completion\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/usr/local/lib/python3.5/dist-packages/numba/compiler.py\u001b[0m in \u001b[0;36mrun\u001b[0;34m(self, status)\u001b[0m\n\u001b[1;32m 234\u001b[0m \u001b[0;31m# No more fallback pipelines?\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 235\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mis_final_pipeline\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 236\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mpatched_exception\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 237\u001b[0m \u001b[0;31m# Go to next fallback pipeline\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 238\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/usr/local/lib/python3.5/dist-packages/numba/compiler.py\u001b[0m in \u001b[0;36mrun\u001b[0;34m(self, status)\u001b[0m\n\u001b[1;32m 226\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 227\u001b[0m \u001b[0mevent\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstage_name\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 228\u001b[0;31m \u001b[0mstage\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 229\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0m_EarlyPipelineCompletion\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 230\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mresult\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/usr/local/lib/python3.5/dist-packages/numba/compiler.py\u001b[0m in \u001b[0;36mstage_analyze_bytecode\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 362\u001b[0m \u001b[0mAnalyze\u001b[0m \u001b[0mbytecode\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0mtranslating\u001b[0m \u001b[0mto\u001b[0m \u001b[0mNumba\u001b[0m \u001b[0mIR\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 363\u001b[0m \"\"\"\n\u001b[0;32m--> 364\u001b[0;31m \u001b[0mfunc_ir\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtranslate_stage\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfunc_id\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbc\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 365\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_set_and_check_ir\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfunc_ir\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 366\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/usr/local/lib/python3.5/dist-packages/numba/compiler.py\u001b[0m in \u001b[0;36mtranslate_stage\u001b[0;34m(func_id, bytecode)\u001b[0m\n\u001b[1;32m 754\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mtranslate_stage\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfunc_id\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mbytecode\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 755\u001b[0m \u001b[0minterp\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0minterpreter\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mInterpreter\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfunc_id\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 756\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0minterp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0minterpret\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mbytecode\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 757\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 758\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/usr/local/lib/python3.5/dist-packages/numba/interpreter.py\u001b[0m in \u001b[0;36minterpret\u001b[0;34m(self, bytecode)\u001b[0m\n\u001b[1;32m 95\u001b[0m \u001b[0;31m# Data flow analysis\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 96\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdfa\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdataflow\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mDataFlowAnalysis\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcfa\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 97\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdfa\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrun\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 98\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 99\u001b[0m \u001b[0;31m# Temp states during interpretation\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/usr/local/lib/python3.5/dist-packages/numba/dataflow.py\u001b[0m in \u001b[0;36mrun\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 25\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mrun\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 26\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mblk\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcfa\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0miterliveblocks\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 27\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0minfos\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mblk\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0moffset\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrun_on_block\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mblk\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 28\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 29\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mrun_on_block\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mblk\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/usr/local/lib/python3.5/dist-packages/numba/dataflow.py\u001b[0m in \u001b[0;36mrun_on_block\u001b[0;34m(self, blk)\u001b[0m\n\u001b[1;32m 69\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0moffset\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mblk\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 70\u001b[0m \u001b[0minst\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbytecode\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0moffset\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 71\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdispatch\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0minfo\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minst\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 72\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0minfo\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 73\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/usr/local/lib/python3.5/dist-packages/numba/dataflow.py\u001b[0m in \u001b[0;36mdispatch\u001b[0;34m(self, info, inst)\u001b[0m\n\u001b[1;32m 78\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mdispatch\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minfo\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minst\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 79\u001b[0m \u001b[0mfname\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"op_%s\"\u001b[0m \u001b[0;34m%\u001b[0m \u001b[0minst\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mopname\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreplace\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'+'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'_'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 80\u001b[0;31m \u001b[0mfn\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgetattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 81\u001b[0m \u001b[0mfn\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0minfo\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minst\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 82\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mAttributeError\u001b[0m: Failed at object (analyzing bytecode)\n'DataFlowAnalysis' object has no attribute 'op_MAKE_FUNCTION'" ] } ], "source": [ "hash_SHA2_Numba(\"The quick brown fox jumps over the lazy dog\")\n", "assert hash_SHA2_Numba(\"The quick brown fox jumps over the lazy dog\") == 'd7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I failed to make `numba.jit` work on that function :-(" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "## Cython-power code for the `SHA-2` hashing function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Requirements\n", "You need [cython](http://cython.org/) and the cython Jupyter extension to be installed." ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "scrolled": true }, "outputs": [], "source": [ "%load_ext cython" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Useful functions the SHA-2 algorithm\n", "\n", "For the functions defined before, we rewrite them with type annotations in `%%cython` cells.\n", "All variables are `int`, i.e., 32-bits integer (64-bits are `long`)." ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [], "source": [ "%%cython\n", "\n", "cpdef int leftrotate_cython(int x, int c):\n", " \"\"\" Left rotate the number x by c bytes.\"\"\"\n", " return (x << c) | (x >> (32 - c))\n", "\n", "cpdef int rightrotate_cython(int x, int c):\n", " \"\"\" Right rotate the number x by c bytes.\"\"\"\n", " return (x >> c) | (x << (32 - c))" ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[0;31mDocstring:\u001b[0m Left rotate the number x by c bytes.\n", "\u001b[0;31mType:\u001b[0m builtin_function_or_method\n", "\n", "\u001b[0;31mDocstring:\u001b[0m Right rotate the number x by c bytes.\n", "\u001b[0;31mType:\u001b[0m builtin_function_or_method\n", "\n" ] } ], "source": [ "leftrotate_cython?\n", "rightrotate_cython?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On basic functions like this, of course we don't get any speedup with Cython:" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.57 µs ± 42.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n", "1.4 µs ± 60.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n", "1.63 µs ± 19.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n", "1.44 µs ± 88.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n" ] } ], "source": [ "from numpy.random import randint\n", "\n", "%timeit leftrotate(randint(0, 100000), 5)\n", "%timeit leftrotate_cython(randint(0, 100000), 5)\n", "\n", "%timeit rightrotate(randint(0, 100000), 5)\n", "%timeit rightrotate_cython(randint(0, 100000), 5)" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%%cython\n", "\n", "cpdef int leftshift_cython(int x, int c):\n", " \"\"\" Left shift the number x by c bytes.\"\"\"\n", " return x << c\n", "\n", "cpdef int rightshift_cython(int x, int c):\n", " \"\"\" Right shift the number x by c bytes.\"\"\"\n", " return x >> c" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[0;31mDocstring:\u001b[0m Left shift the number x by c bytes.\n", "\u001b[0;31mType:\u001b[0m builtin_function_or_method\n", "\n", "\u001b[0;31mDocstring:\u001b[0m Right shift the number x by c bytes.\n", "\u001b[0;31mType:\u001b[0m builtin_function_or_method\n", "\n" ] } ], "source": [ "leftshift_cython?\n", "rightshift_cython?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On basic functions like this, of course we don't get any speedup with Cython:" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.4 µs ± 33.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n", "1.31 µs ± 24.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n", "1.36 µs ± 35 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n", "1.33 µs ± 51.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n" ] } ], "source": [ "%timeit leftshift(randint(0, 100000), 5)\n", "%timeit leftshift_cython(randint(0, 100000), 5)\n", "\n", "%timeit rightshift(randint(0, 100000), 5)\n", "%timeit rightshift_cython(randint(0, 100000), 5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The `SHA2_Cython` class\n", "\n", "And similarly for the `SHA2` class, we write it in a `%%cython` cell, and we type everything." ] }, { "cell_type": "code", "execution_count": 182, "metadata": { "code_folding": [] }, "outputs": [], "source": [ "%%cython\n", "# cython: c_string_type=unicode, c_string_encoding=utf8\n", "\n", "cdef int rightrotate_cython(int x, int c):\n", " \"\"\" Right rotate the number x by c bytes.\"\"\"\n", " return (x >> c) | (x << (32 - c))\n", "\n", "cdef int rightshift_cython(int x, int c):\n", " \"\"\" Right shift the number x by c bytes.\"\"\"\n", " return x >> c\n", "\n", "# See http://cython.readthedocs.io/en/latest/src/tutorial/array.html\n", "from cpython cimport array\n", "import array\n", "\n", "cdef array.array empty_64 = array.array('i', [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])\n", "cdef int[:] view_empty_64 = empty_64\n", "\n", "\n", "cpdef void update_cython(int[:] hash_pieces, int[:] k, bytearray arg):\n", " \"\"\" One pass of the SHA-256 algorithm, update hash_pieces on place. \"\"\"\n", " # Extract the 8 variables\n", " cdef int h0 = hash_pieces[0], h1 = hash_pieces[1], h2 = hash_pieces[2], h3 = hash_pieces[3], h4 = hash_pieces[4], h5 = hash_pieces[5], h6 = hash_pieces[6], h7 = hash_pieces[7]\n", " # 1. Pre-processing, exactly like MD5\n", " cdef bytearray data = arg\n", " cdef long orig_len_in_bits = 8 * len(data)\n", " # 1.a. Add a single '1' bit at the end of the input bits\n", " data.append(0x80)\n", " # 1.b. Padding with zeros as long as the input bits length ≡ 448 (mod 512)\n", " while len(data) % 64 != 56:\n", " data.append(0x0)\n", " # 1.c. append original length in bits mod (2 pow 64) to message\n", " data += orig_len_in_bits.to_bytes(8, byteorder='big')\n", " assert len(data) % 64 == 0, \"Error in padding\"\n", "\n", " # Declare loop indexes and variables\n", " cdef int offset, i\n", " cdef int a, b, c, d, e, f, g, h\n", " cdef int temp1, temp2\n", "\n", " # 2. Computations\n", " # Process the message in successive 512-bit = 64-bytes chunks:\n", " cdef int[:] w = view_empty_64\n", "\n", " for offset in range(0, len(data), 64):\n", " # 2.a. 512-bits = 64-bytes chunks\n", " # 2.b. Break chunk into sixteen 32-bit = 4-bytes words w[i], 0 ≤ i ≤ 15\n", " for i in range(16):\n", " w[i] = int.from_bytes(data[offset : offset + 64][4*i : 4*i + 4], byteorder='big')\n", " # 2.c. Extend the first 16 words into the remaining 48\n", " # words w[16..63] of the message schedule array:\n", " for i in range(16, 64):\n", " w[i] = w[i-16] + (rightrotate_cython(w[i-15], 7) ^ rightrotate_cython(w[i-15], 18) ^ rightshift_cython(w[i-15], 3)) + w[i-7] + (rightrotate_cython(w[i-2], 17) ^ rightrotate_cython(w[i-2], 19) ^ rightshift_cython(w[i-2], 10))\n", " # 2.d. Initialize hash value for this chunk\n", " a = h0\n", " b = h1\n", " c = h2\n", " d = h3\n", " e = h4\n", " f = h5\n", " g = h6\n", " h = h7\n", " # 2.e. Main loop, cf. https://tools.ietf.org/html/rfc6234\n", " for i in range(64):\n", " temp1 = h + (rightrotate_cython(e, 6) ^ rightrotate_cython(e, 11) ^ rightrotate_cython(e, 25)) + ((e & f) ^ ((~e) & g)) + k[i] + w[i]\n", " temp2 = (rightrotate_cython(a, 2) ^ rightrotate_cython(a, 13) ^ rightrotate_cython(a, 22)) + ((a & b) ^ (a & c) ^ (b & c))\n", "\n", " # Rotate the 8 variables\n", " a, b, c, d, e, f, g, h = temp1 + temp2, a, b, c, d + temp1, e, f, g\n", "\n", " # Add this chunk's hash to result so far:\n", " h0 += a\n", " h1 += b\n", " h2 += c\n", " h3 += d\n", " h4 += e\n", " h5 += f\n", " h6 += g\n", " h7 += h\n", " # 3. Conclusion\n", " hash_pieces[0] = h0\n", " hash_pieces[1] = h1\n", " hash_pieces[2] = h2\n", " hash_pieces[3] = h3\n", " hash_pieces[4] = h4\n", " hash_pieces[5] = h5\n", " hash_pieces[6] = h6\n", " hash_pieces[7] = h7\n" ] }, { "cell_type": "code", "execution_count": 183, "metadata": { "code_folding": [ 3 ], "collapsed": true }, "outputs": [], "source": [ "class SHA2_Cython(Hash):\n", " \"\"\"SHA256 hashing, speed-up with Numba.jit, see https://en.wikipedia.org/wiki/SHA-2#Pseudocode.\"\"\"\n", " \n", " def __init__(self):\n", " self.name = \"SHA256\"\n", " self.byteorder = 'big'\n", " self.block_size = 64\n", " self.digest_size = 32\n", " # Note 2: For each round, there is one round constant k[i] and one entry in the message schedule array w[i], 0 ≤ i ≤ 63\n", " # Note 3: The compression function uses 8 working variables, a through h\n", " # Note 4: Big-endian convention is used when expressing the constants in this pseudocode,\n", " # and when parsing message block data from bytes to words, for example,\n", " # the first word of the input message \"abc\" after padding is 0x61626380\n", "\n", " # Initialize hash values:\n", " # (first 32 bits of the fractional parts of the square roots of the first 8 primes 2..19):\n", " h0 = 0x6a09e667\n", " h1 = 0xbb67ae85\n", " h2 = 0x3c6ef372\n", " h3 = 0xa54ff53a\n", " h4 = 0x510e527f\n", " h5 = 0x9b05688c\n", " h6 = 0x1f83d9ab\n", " h7 = 0x5be0cd19\n", " \n", " # Initialize array of round constants:\n", " # (first 32 bits of the fractional parts of the cube roots of the first 64 primes 2..311):\n", " self.k = [\n", " 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,\n", " 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,\n", " 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,\n", " 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,\n", " 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,\n", " 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,\n", " 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,\n", " 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2\n", " ]\n", "\n", " # Store them\n", " self.hash_pieces = [h0, h1, h2, h3, h4, h5, h6, h7]\n", " \n", " def update(self, data):\n", " update_cython(self.hash_pieces, self.k, data)\n", "\n", " def digest(self):\n", " # h0 append h1 append h2 append h3 append h4 append h5 append h6 append h7\n", " return sum(leftshift(x, 32 * i) for i, x in enumerate(self.hash_pieces[::-1]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also write a function to directly compute the hex digest from some bytes data." ] }, { "cell_type": "code", "execution_count": 184, "metadata": { "code_folding": [ 0 ], "collapsed": true }, "outputs": [], "source": [ "def hash_SHA2_Cython(data):\n", " \"\"\" Shortcut function to directly receive the hex digest from SHA2_Cython(data).\"\"\"\n", " h = SHA2_Cython()\n", " if isinstance(data, str):\n", " data = bytes(data, encoding='utf8')\n", " print(\"type(data) =\", type(data))\n", " h.update(data)\n", " return h.hexdigest()" ] }, { "cell_type": "code", "execution_count": 185, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1779033703]" ] }, "execution_count": 185, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "list" ] }, "execution_count": 185, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "[1116352408]" ] }, "execution_count": 185, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "list" ] }, "execution_count": 185, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "b''" ] }, "execution_count": 185, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "bytes" ] }, "execution_count": 185, "metadata": {}, "output_type": "execute_result" }, { "ename": "TypeError", "evalue": "a bytes-like object is required, not 'list'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[0mtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 10\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 11\u001b[0;31m \u001b[0mupdate_cython\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mh\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhash_pieces\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mh\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mbytearray\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m_cython_magic_892b302b9e5d7477154227e427367045.pyx\u001b[0m in \u001b[0;36m_cython_magic_892b302b9e5d7477154227e427367045.update_cython (/home/lilian/.cache/ipython/cython/_cython_magic_892b302b9e5d7477154227e427367045.c:2840)\u001b[0;34m()\u001b[0m\n", "\u001b[0;32m~/.cache/ipython/cython/_cython_magic_892b302b9e5d7477154227e427367045.cpython-35m-x86_64-linux-gnu.so\u001b[0m in \u001b[0;36mView.MemoryView.memoryview_cwrapper (/home/lilian/.cache/ipython/cython/_cython_magic_892b302b9e5d7477154227e427367045.c:9311)\u001b[0;34m()\u001b[0m\n", "\u001b[0;32m~/.cache/ipython/cython/_cython_magic_892b302b9e5d7477154227e427367045.cpython-35m-x86_64-linux-gnu.so\u001b[0m in \u001b[0;36mView.MemoryView.memoryview.__cinit__ (/home/lilian/.cache/ipython/cython/_cython_magic_892b302b9e5d7477154227e427367045.c:5546)\u001b[0;34m()\u001b[0m\n", "\u001b[0;31mTypeError\u001b[0m: a bytes-like object is required, not 'list'" ] } ], "source": [ "data = bytes(\"\", encoding='utf8')\n", "h = SHA2_Cython()\n", "\n", "h.hash_pieces[:1]\n", "type(h.hash_pieces)\n", "h.k[:1]\n", "type(h.k)\n", "data\n", "type(data)\n", "\n", "update_cython(h.hash_pieces, h.k, bytearray(data))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Check on SHA-2\n", "\n", "Let try the example from [SHA-2 Wikipedia page](https://en.wikipedia.org/wiki/SHA-2#Test_vectors) :" ] }, { "cell_type": "code", "execution_count": 90, "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "a bytes-like object is required, not 'list'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mhash_SHA2_Cython\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"The quick brown fox jumps over the lazy dog\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mhash_SHA2_Cython\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"The quick brown fox jumps over the lazy dog\"\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;34m'd7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m\u001b[0m in \u001b[0;36mhash_SHA2_Cython\u001b[0;34m(data)\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mstr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mdata\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mbytes\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mencoding\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'utf8'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m \u001b[0mh\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mupdate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 7\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mh\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhexdigest\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m\u001b[0m in \u001b[0;36mupdate\u001b[0;34m(self, data)\u001b[0m\n\u001b[1;32m 41\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 42\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mupdate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdata\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 43\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mupdate_cython\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhash_pieces\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mk\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdata\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 44\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 45\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mdigest\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m_cython_magic_3a51a049b19292c89caa81e80cc99ee0.pyx\u001b[0m in \u001b[0;36m_cython_magic_3a51a049b19292c89caa81e80cc99ee0.update_cython (/home/lilian/.cache/ipython/cython/_cython_magic_3a51a049b19292c89caa81e80cc99ee0.c:3101)\u001b[0;34m()\u001b[0m\n", "\u001b[0;32m~/.cache/ipython/cython/_cython_magic_3a51a049b19292c89caa81e80cc99ee0.cpython-35m-x86_64-linux-gnu.so\u001b[0m in \u001b[0;36mView.MemoryView.memoryview_cwrapper (/home/lilian/.cache/ipython/cython/_cython_magic_3a51a049b19292c89caa81e80cc99ee0.c:9569)\u001b[0;34m()\u001b[0m\n", "\u001b[0;32m~/.cache/ipython/cython/_cython_magic_3a51a049b19292c89caa81e80cc99ee0.cpython-35m-x86_64-linux-gnu.so\u001b[0m in \u001b[0;36mView.MemoryView.memoryview.__cinit__ (/home/lilian/.cache/ipython/cython/_cython_magic_3a51a049b19292c89caa81e80cc99ee0.c:5804)\u001b[0;34m()\u001b[0m\n", "\u001b[0;31mTypeError\u001b[0m: a bytes-like object is required, not 'list'" ] } ], "source": [ "hash_SHA2_Cython(\"The quick brown fox jumps over the lazy dog\")\n", "assert hash_SHA2_Cython(\"The quick brown fox jumps over the lazy dog\") == 'd7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "## Conclusion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I still have to work on that.\n", "\n", "[![made-with-jupyter](https://img.shields.io/badge/Made%20for-Jupyter%20notebook-1f425f.svg)](https://www.jupyter.org/)\n", "[![GitHub license](https://img.shields.io/github/license/Naereen/notebooks.svg)](https://github.com/Naereen/notebooks/blob/master/LICENSE.txt)\n", "[![forthebadge made-with-python](http://ForTheBadge.com/images/badges/made-with-python.svg)](https://www.python.org/) \n", "[![ForTheBadge built-with-science](http://ForTheBadge.com/images/badges/built-with-science.svg)](https://GitHub.com/Naereen/)\n", "[![ForTheBadge powered-by-electricity](http://ForTheBadge.com/images/badges/powered-by-electricity.svg)](http://ForTheBadge.com)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Bonus\n", "\"SHA\" is pronouced like the French word \"chat\", which means *cat*.\n", "\n", "![a cat playing on a computer](https://media.giphy.com/media/JIX9t2j0ZTN9S/giphy.gif)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> See [my GitHub `notebooks` project](https://GitHub.com/Naereen/notebooks/) for others notebooks." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.3" }, "notify_time": "5", "toc": { "colors": { "hover_highlight": "#DAA520", "running_highlight": "#FF0000", "selected_highlight": "#FFD700" }, "moveMenuLeft": true, "nav_menu": { "height": "303px", "width": "251px" }, "navigate_menu": true, "number_sections": true, "sideBar": true, "threshold": 4, "toc_cell": true, "toc_position": { "height": "552px", "left": "0px", "right": "1164px", "top": "117px", "width": "212px" }, "toc_section_display": "block", "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 2 }