{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": "true" }, "source": [ "# Table of Contents\n", "

1  Manual implementation of some hash functions
1.1  What is a hash function?
1.2  Common API for the different classes
1.3  Checking the the hashlib module in Python standard library
1.4  First stupid example: a stupid hashing function
1.5  First real example: the MD5 hashing function
1.5.1  Useful functions for the MD5 algorithm
1.5.2  The MD5 class
1.5.3  First check on MD5
1.5.4  A less stupid check on MD5
1.5.5  Trying 1000 random examples
1.6  Second real example: the SHA-1 hashing function
1.6.1  Useful functions the SHA-1 algorithm
1.6.2  The SHA1 class
1.6.3  First check on SHA-1
1.6.4  A less stupid check on SHA-1
1.6.5  Trying 1000 random examples
1.7  Third real example: the SHA-2 hashing function
1.7.1  Useful functions the SHA-2 algorithm
1.7.2  The SHA2 class
1.7.3  First check on SHA-2
1.7.4  Check on SHA-2
1.7.5  Trying 1000 random examples
1.8  Comparison : MD5 vs SHA-1 vs SHA-2
1.9  Bonus : SHA-2 variants
1.9.1  SHA-224
1.9.1.1  The SHA224 class
1.9.1.2  Checks on SHA-224
1.9.2  SHA-512
1.9.2.1  Useful functions the SHA-512 algorithm
1.9.2.2  The SHA512 class
1.9.2.3  Checks on SHA-512
1.9.3  SHA-384
1.9.3.1  The SHA384 class
1.9.3.2  Checks on SHA-384
1.9.4  More comparison
1.10  Conclusion
1.10.1  Bonus
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Manual implementation of some hash functions\n", "\n", "This small [Jupyter notebook](https://www.Jupyter.org/) is a short experiment, to see if I can implement the some basic [Hashing functions](https://en.wikipedia.org/wiki/Hash_function), more specifically [cryptographic hashing functions](https://en.wikipedia.org/wiki/Cryptographic_hash_function), like `MD5`, `SHA1`, `SHA256`, `SHA512` etc\n", "\n", "And then I want compare my manual implementations with the functions implemented in [the `hashlib` module in Python standard library](https://docs.python.org/3/library/hashlib.html).\n", "Ideally, my implementation should work exactly like the reference one, only slower!\n", "\n", "\n", "- *Reference*: Wikipedia pages on [Hash functions](https://en.wikipedia.org/wiki/Hash_function), [MD5](https://en.wikipedia.org/wiki/MD5), [SHA-1](https://en.wikipedia.org/wiki/SHA1) and [SHA-2](https://en.wikipedia.org/wiki/SHA-2), as well as [the `hashlib` module in Python standard library](https://docs.python.org/3/library/hashlib.html).\n", "- *Date*: 13 May 2017 (first part about MD5), 19 June 2017 (second part about SHA-1), 19 June 2017 (last part about SHA-2).\n", "- *Author*: [Lilian Besson](https://GitHub.com/Naereen/notebooks).\n", "- *License*: [MIT Licensed](https://LBesson.MIT-License.org/)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "## What is a hash function?\n", "> TL;DR : [Hash functions](https://en.wikipedia.org/wiki/Hash_function) and [cryptographic hashing functions](https://en.wikipedia.org/wiki/Cryptographic_hash_function) on Wikipedia." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "## Common API for the different classes\n", "\n", "I will copy the API proposed by [the `hashlib` module in Python standard library](https://docs.python.org/3/library/hashlib.html), so it will be very easy to compare my implementations with the one provided with your default [Python](https://www.Python.org/) installation." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "class Hash(object):\n", " \"\"\" Common class for all hash methods.\n", " \n", " It copies the one of the hashlib module (https://docs.python.org/3.5/library/hashlib.html).\n", " \"\"\"\n", " \n", " def __init__(self, *args, **kwargs):\n", " \"\"\" Create the Hash object.\"\"\"\n", " self.name = self.__class__.__name__ # https://docs.python.org/3.5/library/hashlib.html#hashlib.hash.name\n", " self.byteorder = 'little'\n", " self.digest_size = 0 # https://docs.python.org/3.5/library/hashlib.html#hashlib.hash.digest_size\n", " self.block_size = 0 # https://docs.python.org/3.5/library/hashlib.html#hashlib.hash.block_size\n", "\n", " def __str__(self):\n", " return self.name\n", " \n", " def update(self, arg):\n", " \"\"\" Update the hash object with the object arg, which must be interpretable as a buffer of bytes.\"\"\"\n", " pass\n", "\n", " def digest(self):\n", " \"\"\" Return the digest of the data passed to the update() method so far. This is a bytes object of size digest_size which may contain bytes in the whole range from 0 to 255.\"\"\"\n", " return b\"\"\n", "\n", " def hexdigest(self):\n", " \"\"\" Like digest() except the digest is returned as a string object of double length, containing only hexadecimal digits. This may be used to exchange the value safely in email or other non-binary environments.\"\"\"\n", " digest = self.digest()\n", " raw = digest.to_bytes(self.digest_size, byteorder=self.byteorder)\n", " format_str = '{:0' + str(2 * self.digest_size) + 'x}'\n", " return format_str.format(int.from_bytes(raw, byteorder='big'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "## Checking the [the `hashlib` module in Python standard library](https://docs.python.org/3/library/hashlib.html)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import hashlib" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can check [the available algorithms](https://docs.python.org/3.5/library/hashlib.html#hashlib.algorithms_available), some of them being [guaranteed to be on any platform](https://docs.python.org/3.5/library/hashlib.html#hashlib.algorithms_guaranteed), some are not." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['ripemd160',\n", " 'MD4',\n", " 'SHA224',\n", " 'sha384',\n", " 'md4',\n", " 'DSA-SHA',\n", " 'SHA256',\n", " 'SHA384',\n", " 'MD5',\n", " 'whirlpool',\n", " 'sha',\n", " 'sha1',\n", " 'RIPEMD160',\n", " 'DSA',\n", " 'dsaWithSHA',\n", " 'dsaEncryption',\n", " 'sha224',\n", " 'SHA',\n", " 'sha512',\n", " 'md5',\n", " 'SHA512',\n", " 'ecdsa-with-SHA1',\n", " 'SHA1',\n", " 'sha256']" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(hashlib.algorithms_available)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I will need at least these ones:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "assert 'MD5' in hashlib.algorithms_available\n", "assert 'SHA1' in hashlib.algorithms_available\n", "assert 'SHA256' in hashlib.algorithms_available\n", "assert 'SHA224' in hashlib.algorithms_available\n", "assert 'SHA512' in hashlib.algorithms_available\n", "assert 'SHA384' in hashlib.algorithms_available" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lets check that they have the block size and digest size announced:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "For MD5 : the block size is 64 and the digest size is 16.\n", "For SHA1 : the block size is 64 and the digest size is 20.\n", "For SHA256 : the block size is 64 and the digest size is 32.\n", "For SHA224 : the block size is 64 and the digest size is 28.\n", "For SHA512 : the block size is 128 and the digest size is 64.\n", "For SHA384 : the block size is 128 and the digest size is 48.\n" ] } ], "source": [ "for name, s in zip(\n", " ('MD5', 'SHA1', 'SHA256', 'SHA224', 'SHA512', 'SHA384'),\n", " (hashlib.md5(), hashlib.sha1(), hashlib.sha256(), hashlib.sha224(), hashlib.sha512(), hashlib.sha384())\n", " ):\n", " print(\"For {:<8} : the block size is {:<3} and the digest size is {:<2}.\".format(name, s.block_size, s.digest_size))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "## First stupid example: a stupid hashing function\n", "\n", "This \"stupid\" hashing function will use `digest_size` of 128 bytes (= 16 bits), and compute it by ... just looking at the first 128 bytes of the input data.\n", "\n", "This is just to check the API and how to read from a bytes buffer." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "class HeaderHash(Hash):\n", " \"\"\" This \"stupid\" hashing function will use `digest_size` of 128 bytes, and compute it by ... just looking at the first 128 bytes of the input data.\n", " \"\"\"\n", " \n", " def __init__(self):\n", " # Common part\n", " self.digest_size = 16\n", " self.block_size = 16\n", " self.name = \"Header\"\n", " # Specific part\n", " self._data = b\"\"\n", "\n", " def update(self, arg):\n", " \"\"\" Update the hash object with the object arg, which must be interpretable as a buffer of bytes.\"\"\"\n", " if len(self._data) == 0:\n", " self._data = arg[:self.block_size]\n", "\n", " def digest(self):\n", " \"\"\" Return the digest of the data passed to the update() method so far. This is a bytes object of size digest_size which may contain bytes in the whole range from 0 to 255.\"\"\"\n", " return self._data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us try it:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "h1 = HeaderHash()" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<__main__.HeaderHash at 0x7fc84c200630>" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" }, { "name": "stdout", "output_type": "stream", "text": [ "Header\n" ] } ], "source": [ "h1\n", "print(h1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us use some toy data, to test here and after." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "data = b\"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ\" * 100" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "b'0123456789ABCDEF'" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "h1.update(data)\n", "h1.digest()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> Well... It seems to work, even if this first example is stupid." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "## First real example: the MD5 hashing function\n", "Let start with a simple one: [the MD5 hashing function](https://en.wikipedia.org/wiki/MD5), from Rivest in 1992.\n", "\n", "
Warning: it is considered broken since at least 2012, never use it for security purposes.
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Useful functions for the MD5 algorithm\n", "Instead of writing the complete MD5 algorithm in the class below, I preferred to define here some useful functions, using [Bitwise operators](https://wiki.python.org/moin/BitwiseOperators)." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def MD5_f1(b, c, d):\n", " \"\"\" First ternary bitwise operation.\"\"\"\n", " return ((b & c) | ((~b) & d)) & 0xFFFFFFFF\n", "\n", "def MD5_f2(b, c, d):\n", " \"\"\" Second ternary bitwise operation.\"\"\"\n", " return ((b & d) | (c & (~d))) & 0xFFFFFFFF\n", "\n", "def MD5_f3(b, c, d):\n", " \"\"\" Third ternary bitwise operation.\"\"\"\n", " return (b ^ c ^ d) & 0xFFFFFFFF\n", "\n", "def MD5_f4(b, c, d):\n", " \"\"\" Forth ternary bitwise operation.\"\"\"\n", " return (c ^ (b | (~d))) & 0xFFFFFFFF" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def leftrotate(x, c):\n", " \"\"\" Left rotate the number x by c bytes.\"\"\"\n", " x &= 0xFFFFFFFF\n", " return ((x << c) | (x >> (32 - c))) & 0xFFFFFFFF" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def leftshift(x, c):\n", " \"\"\" Left shift the number x by c bytes.\"\"\"\n", " return x << c" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from math import floor, sin" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The `MD5` class" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is a direct implementation of the pseudo-code, as given for instance on the Wikipedia page, or the original research article by Rivest." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": true }, "outputs": [], "source": [ "class MD5(Hash):\n", " \"\"\"MD5 hashing, see https://en.wikipedia.org/wiki/MD5#Algorithm.\"\"\"\n", " \n", " def __init__(self):\n", " self.name = \"MD5\"\n", " self.byteorder = 'little'\n", " self.block_size = 64\n", " self.digest_size = 16\n", " # Internal data\n", " s = [0] * 64\n", " K = [0] * 64\n", " # Initialize s, s specifies the per-round shift amounts\n", " s[ 0:16] = [7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22]\n", " s[16:32] = [5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20, 5, 9, 14, 20]\n", " s[32:48] = [4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23]\n", " s[48:64] = [6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21]\n", " # Store it\n", " self._s = s\n", " # Use binary integer part of the sines of integers (Radians) as constants:\n", " for i in range(64):\n", " K[i] = floor(2**32 * abs(sin(i + 1))) & 0xFFFFFFFF\n", " # Store it\n", " self._K = K\n", " # Initialize variables:\n", " a0 = 0x67452301 # A\n", " b0 = 0xefcdab89 # B\n", " c0 = 0x98badcfe # C\n", " d0 = 0x10325476 # D\n", " self.hash_pieces = [a0, b0, c0, d0]\n", " \n", " def update(self, arg):\n", " s, K = self._s, self._K\n", " a0, b0, c0, d0 = self.hash_pieces\n", " # 1. Pre-processing\n", " data = bytearray(arg)\n", " orig_len_in_bits = (8 * len(data)) & 0xFFFFFFFFFFFFFFFF\n", " # 1.a. Add a single '1' bit at the end of the input bits\n", " data.append(0x80)\n", " # 1.b. Padding with zeros as long as the input bits length ≡ 448 (mod 512)\n", " while len(data) % 64 != 56:\n", " data.append(0)\n", " # 1.c. append original length in bits mod (2 pow 64) to message\n", " data += orig_len_in_bits.to_bytes(8, byteorder='little')\n", " assert len(data) % 64 == 0, \"Error in padding\"\n", " # 2. Computations\n", " # Process the message in successive 512-bit = 64-bytes chunks:\n", " for offset in range(0, len(data), 64):\n", " # 2.a. 512-bits = 64-bytes chunks\n", " chunks = data[offset : offset + 64]\n", " # 2.b. Break chunk into sixteen 32-bit = 4-bytes words M[j], 0 ≤ j ≤ 15\n", " A, B, C, D = a0, b0, c0, d0\n", " # 2.c. Main loop\n", " for i in range(64):\n", " if 0 <= i <= 15:\n", " F = MD5_f1(B, C, D)\n", " g = i\n", " elif 16 <= i <= 31:\n", " F = MD5_f2(B, C, D)\n", " g = (5 * i + 1) % 16\n", " elif 32 <= i <= 47:\n", " F = MD5_f3(B, C, D)\n", " g = (3 * i + 5) % 16\n", " elif 48 <= i <= 63:\n", " F = MD5_f4(B, C, D)\n", " g = (7 * i) % 16\n", " # Be wary of the below definitions of A, B, C, D\n", " to_rotate = (A + F + K[i] + int.from_bytes(chunks[4*g : 4*g+4], byteorder='little')) & 0xFFFFFFFF\n", " new_B = (B + leftrotate(to_rotate, s[i])) & 0xFFFFFFFF\n", " A, B, C, D = D, new_B, B, C\n", " # Add this chunk's hash to result so far:\n", " a0 = (a0 + A) & 0xFFFFFFFF\n", " b0 = (b0 + B) & 0xFFFFFFFF\n", " c0 = (c0 + C) & 0xFFFFFFFF\n", " d0 = (d0 + D) & 0xFFFFFFFF\n", " # 3. Conclusion\n", " self.hash_pieces = [a0, b0, c0, d0]\n", "\n", " def digest(self):\n", " return sum(leftshift(x, (32 * i)) for i, x in enumerate(self.hash_pieces))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also write a function to directly compute the hex digest from some bytes data." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def hash_MD5(data):\n", " \"\"\" Shortcut function to directly receive the hex digest from MD5(data).\"\"\"\n", " h = MD5()\n", " if isinstance(data, str):\n", " data = bytes(data, encoding='utf8')\n", " h.update(data)\n", " return h.hexdigest()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
*Note:* [This page helped for debugging](https://rosettacode.org/wiki/MD5/Implementation#Python).
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### First check on MD5\n", "\n", "Let us try it:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<__main__.MD5 at 0x7fc84c46b668>" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" }, { "name": "stdout", "output_type": "stream", "text": [ "MD5\n" ] } ], "source": [ "h2 = MD5()\n", "h2\n", "print(h2)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "52666558089014014065978771967570616878" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "h2.update(data)\n", "h2.digest()" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'2e224cd661b6b83e0f3a0a06cb359f27'" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "h2.hexdigest()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A less stupid check on MD5\n", "\n", "Let try the example from [MD5 Wikipedia page](https://en.wikipedia.org/wiki/MD5#MD5_hashes) :" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'9e107d9d372bb6826bd81d3542a419d6'" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash_MD5(\"The quick brown fox jumps over the lazy dog\")\n", "assert hash_MD5(\"The quick brown fox jumps over the lazy dog\") == '9e107d9d372bb6826bd81d3542a419d6'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Even a small change in the message will (with overwhelming probability) result in a mostly different hash, due to the [**avalanche effect**](https://en.wikipedia.org/wiki/Avalanche_effect). For example, adding a period to the end of the sentence:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'e4d909c290d0fb1ca068ffaddf22cbd0'" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash_MD5(\"The quick brown fox jumps over the lazy dog.\")\n", "assert hash_MD5(\"The quick brown fox jumps over the lazy dog.\") == 'e4d909c290d0fb1ca068ffaddf22cbd0'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The hash of the zero-length string is:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'d41d8cd98f00b204e9800998ecf8427e'" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash_MD5(\"\")\n", "assert hash_MD5(\"\") == 'd41d8cd98f00b204e9800998ecf8427e'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$\\implies$ We obtained the same result, OK our function works!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Trying 1000 random examples\n", "On a small sentence:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'0ad8cb82874690906cf732223adeebbe'" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash_MD5(\"My name is Zorro !\")" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'0ad8cb82874690906cf732223adeebbe'" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "h = hashlib.md5()\n", "h.update(b\"My name is Zorro !\")\n", "h.hexdigest()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It starts to look good." ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def true_hash_MD5(data):\n", " h = hashlib.md5()\n", " if isinstance(data, str):\n", " data = bytes(data, encoding='utf8')\n", " h.update(data)\n", " return h.hexdigest()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On some random data:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import numpy.random as nr\n", "alphabets = \"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz\"\n", "\n", "def random_string(size=10000):\n", " return ''.join(alphabets[nr.randint(len(alphabets))] for _ in range(size))" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'3rsYqmZDxE'" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "random_string(10)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from tqdm import tqdm_notebook as tqdm" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "543521e9ef89409c94620250e345c1d7" } }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "CPU times: user 33.1 s, sys: 12 ms, total: 33.1 s\n", "Wall time: 33.1 s\n" ] } ], "source": [ "%%time\n", "for _ in tqdm(range(1000)):\n", " x = random_string()\n", " assert hash_MD5(x) == true_hash_MD5(x), \"Error: x = {} gave two different MD5 hashes: my implementation = {} != hashlib implementation = {}...\".format(x, hash_MD5(x), true_hash_MD5(x))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "## Second real example: the SHA-1 hashing function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let now study and implement another hashing function, slightly harder to write but more secure: SHA1, \"Secure Hash Algorithm, version 1\".\n", "See [the SHA1 hashing function](https://en.wikipedia.org/wiki/SHA-1) on Wikipedia, if needed.\n", "\n", "
Warning: it is considered broken since at least 2011, it is not advised to use it for real security purposes. SHA-2 or SHA-3 is better advised.
\n", "\n", "For instance, see [this nice blog post](https://konklone.com/post/why-google-is-hurrying-the-web-to-kill-sha-1)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Useful functions the SHA-1 algorithm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pretty similar to the ones used for the MD5 algorithm." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def SHA1_f1(b, c, d):\n", " \"\"\" First ternary bitwise operation.\"\"\"\n", " return ((b & c) | ((~b) & d)) & 0xFFFFFFFF\n", "\n", "def SHA1_f2(b, c, d):\n", " \"\"\" Second ternary bitwise operation.\"\"\"\n", " return (b ^ c ^ d) & 0xFFFFFFFF\n", "\n", "def SHA1_f3(b, c, d):\n", " \"\"\" Third ternary bitwise operation.\"\"\"\n", " return ((b & c) | (b & d) | (c & d) ) & 0xFFFFFFFF\n", "\n", "def SHA1_f4(b, c, d):\n", " \"\"\" Forth ternary bitwise operation, = SHA1_f1.\"\"\"\n", " return (b ^ c ^ d) & 0xFFFFFFFF" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is exactly like for MD5." ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def leftrotate(x, c):\n", " \"\"\" Left rotate the number x by c bytes.\"\"\"\n", " x &= 0xFFFFFFFF\n", " return ((x << c) | (x >> (32 - c))) & 0xFFFFFFFF" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As SHA-1 plays with big-endian and little-endian integers, and at the end it requires a leftshift to combine the 5 hash pieces into one." ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def leftshift(x, c):\n", " \"\"\" Left shift the number x by c bytes.\"\"\"\n", " return x << c" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The `SHA1` class\n", "\n", "I will use a simple class, very similar to the class used for the MD5 algorithm (see above).\n", "It is a direct implementation of the pseudo-code, as given for instance on the Wikipedia page." ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": true }, "outputs": [], "source": [ "class SHA1(Hash):\n", " \"\"\"SHA1 hashing, see https://en.wikipedia.org/wiki/SHA-1#Algorithm.\"\"\"\n", " \n", " def __init__(self):\n", " self.name = \"SHA1\"\n", " self.byteorder = 'big'\n", " self.block_size = 64\n", " self.digest_size = 20\n", " # Initialize variables\n", " h0 = 0x67452301\n", " h1 = 0xEFCDAB89\n", " h2 = 0x98BADCFE\n", " h3 = 0x10325476\n", " h4 = 0xC3D2E1F0\n", " # Store them\n", " self.hash_pieces = [h0, h1, h2, h3, h4]\n", " \n", " def update(self, arg):\n", " h0, h1, h2, h3, h4 = self.hash_pieces\n", " # 1. Pre-processing, exactly like MD5\n", " data = bytearray(arg)\n", " orig_len_in_bits = (8 * len(data)) & 0xFFFFFFFFFFFFFFFF\n", " # 1.a. Add a single '1' bit at the end of the input bits\n", " data.append(0x80)\n", " # 1.b. Padding with zeros as long as the input bits length ≡ 448 (mod 512)\n", " while len(data) % 64 != 56:\n", " data.append(0)\n", " # 1.c. append original length in bits mod (2 pow 64) to message\n", " data += orig_len_in_bits.to_bytes(8, byteorder='big')\n", " assert len(data) % 64 == 0, \"Error in padding\"\n", " # 2. Computations\n", " # Process the message in successive 512-bit = 64-bytes chunks:\n", " for offset in range(0, len(data), 64):\n", " # 2.a. 512-bits = 64-bytes chunks\n", " chunks = data[offset : offset + 64]\n", " w = [0 for i in range(80)]\n", " # 2.b. Break chunk into sixteen 32-bit = 4-bytes words w[i], 0 ≤ i ≤ 15\n", " for i in range(16):\n", " w[i] = int.from_bytes(chunks[4*i : 4*i + 4], byteorder='big')\n", " # 2.c. Extend the sixteen 32-bit words into eighty 32-bit words\n", " for i in range(16, 80):\n", " w[i] = leftrotate(w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16], 1)\n", " # 2.d. Initialize hash value for this chunk\n", " a, b, c, d, e = h0, h1, h2, h3, h4\n", " # 2.e. Main loop, cf. http://www.faqs.org/rfcs/rfc3174.html\n", " for i in range(80):\n", " if 0 <= i <= 19 :\n", " f = SHA1_f1(b, c, d)\n", " k = 0x5A827999\n", " elif 20 <= i <= 39 :\n", " f = SHA1_f2(b, c, d)\n", " k = 0x6ED9EBA1\n", " elif 40 <= i <= 59 :\n", " f = SHA1_f3(b, c, d)\n", " k = 0x8F1BBCDC\n", " elif 60 <= i <= 79 :\n", " f = SHA1_f4(b, c, d)\n", " k = 0xCA62C1D6\n", "\n", " new_a = leftrotate(a, 5) + f + e + k + w[i] & 0xFFFFFFFF\n", " new_c = leftrotate(b, 30)\n", " # Rotate the 5 variables\n", " a, b, c, d, e = new_a, a, new_c, c, d\n", "\n", " # Add this chunk's hash to result so far:\n", " h0 = (h0 + a) & 0xFFFFFFFF\n", " h1 = (h1 + b) & 0xFFFFFFFF\n", " h2 = (h2 + c) & 0xFFFFFFFF\n", " h3 = (h3 + d) & 0xFFFFFFFF\n", " h4 = (h4 + e) & 0xFFFFFFFF\n", " # 3. Conclusion\n", " self.hash_pieces = [h0, h1, h2, h3, h4]\n", "\n", " def digest(self):\n", " return sum(leftshift(x, 32 * i) for i, x in enumerate(self.hash_pieces[::-1]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also write a function to directly compute the hex digest from some bytes data." ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def hash_SHA1(data):\n", " \"\"\" Shortcut function to directly receive the hex digest from SHA1(data).\"\"\"\n", " h = SHA1()\n", " if isinstance(data, str):\n", " data = bytes(data, encoding='utf8')\n", " h.update(data)\n", " return h.hexdigest()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### First check on SHA-1\n", "\n", "Let us try it:" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<__main__.SHA1 at 0x7fc82832f6a0>" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" }, { "name": "stdout", "output_type": "stream", "text": [ "SHA1\n" ] } ], "source": [ "h3 = SHA1()\n", "h3\n", "print(h3)" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "144539618284518333681008855956641116845695054279" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "h3.update(data)\n", "h3.digest()" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'0b1100101010001011000010111000111101000100111010111100000100010111111110001010100111110000001000110111111011010111001110100001011101000111111000011000111000111'" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "159" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "digest = h3.digest()\n", "bin(digest)\n", "len(bin(digest))" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'19516171e89d7822ff153e046fdae742e8fc31c7'" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "h3.hexdigest()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A less stupid check on SHA-1\n", "\n", "Let try the example from [SHA-1 Wikipedia page](https://en.wikipedia.org/wiki/SHA-1#SHA-1_hashes) :" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'2fd4e1c67a2d28fced849ee1bb76e7391b93eb12'" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash_SHA1(\"The quick brown fox jumps over the lazy dog\")\n", "assert hash_SHA1(\"The quick brown fox jumps over the lazy dog\") == '2fd4e1c67a2d28fced849ee1bb76e7391b93eb12'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Even a small change in the message will (with overwhelming probability) result in a mostly different hash, due to the [**avalanche effect**](https://en.wikipedia.org/wiki/Avalanche_effect). For example, changing one letter in the sentence:" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'de9f2c7fd25e1b3afad3e85a0bd17d9b100db4b3'" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash_SHA1(\"The quick brown fox jumps over the lazy cog\")\n", "assert hash_SHA1(\"The quick brown fox jumps over the lazy cog\") == 'de9f2c7fd25e1b3afad3e85a0bd17d9b100db4b3'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The hash of the zero-length string is:" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'da39a3ee5e6b4b0d3255bfef95601890afd80709'" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash_SHA1(\"\")\n", "assert hash_SHA1(\"\") == 'da39a3ee5e6b4b0d3255bfef95601890afd80709'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$\\implies$ We obtained the same result, OK our function works!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Trying 1000 random examples\n", "On a small sentence:" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'1f475bf19d9e7dd6b3714164116392ee1e477ec5'" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash_SHA1(\"My name is Zorro !\")" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'1f475bf19d9e7dd6b3714164116392ee1e477ec5'" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "h = hashlib.sha1()\n", "h.update(b\"My name is Zorro !\")\n", "h.hexdigest()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It starts to look good." ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def true_hash_SHA1(data):\n", " h = hashlib.sha1()\n", " if isinstance(data, str):\n", " data = bytes(data, encoding='utf8')\n", " h.update(data)\n", " return h.hexdigest()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On some random data:" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'lzqCyNkHUQ'" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "random_string(10)" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from tqdm import tqdm_notebook as tqdm" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "15b0ae19b01744b591ef9a86f4ceafa4" } }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "CPU times: user 38.4 s, sys: 72 ms, total: 38.4 s\n", "Wall time: 38.5 s\n" ] } ], "source": [ "%%time\n", "for _ in tqdm(range(1000)):\n", " x = random_string()\n", " assert hash_SHA1(x) == true_hash_SHA1(x), \"Error: x = {} gave two different SHA1 hashes: my implementation = {} != hashlib implementation = {}...\".format(x, hash_SHA1(x), true_hash_SHA1(x))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "## Third real example: the SHA-2 hashing function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let now study and implement a last hashing function, again slightly harder to write but more secure: SHA-2, \"Secure Hash Algorithm, version 2\".\n", "See [the SHA-2 hashing function](https://en.wikipedia.org/wiki/SHA-2) on Wikipedia, if needed.\n", "\n", "
Remark: it is not (yet) considered broken, and it is the military standard for security and cryptographic hashing. SHA-3 is preferred for security purposes.
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Useful functions the SHA-2 algorithm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is exactly like for MD5. But SHA-2 requires right-rotate as well." ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def leftrotate(x, c):\n", " \"\"\" Left rotate the number x by c bytes.\"\"\"\n", " x &= 0xFFFFFFFF\n", " return ((x << c) | (x >> (32 - c))) & 0xFFFFFFFF\n", "\n", "def rightrotate(x, c):\n", " \"\"\" Right rotate the number x by c bytes.\"\"\"\n", " x &= 0xFFFFFFFF\n", " return ((x >> c) | (x << (32 - c))) & 0xFFFFFFFF" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As SHA-2 plays with big-endian and little-endian integers, and at the end it requires a leftshift to combine the 5 hash pieces into one." ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def leftshift(x, c):\n", " \"\"\" Left shift the number x by c bytes.\"\"\"\n", " return x << c\n", "\n", "def rightshift(x, c):\n", " \"\"\" Right shift the number x by c bytes.\"\"\"\n", " return x >> c" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The `SHA2` class\n", "\n", "I will use a simple class, very similar to the class used for the SHA-1 algorithm (see above).\n", "It is a direct implementation of the pseudo-code, as given for instance on the Wikipedia page.\n", "\n", "I will only implement the simpler one, SHA-256, of digest size of 256 bits. Other variants are SHA-224, SHA-384, SHA-512 (and others include SHA-512/224, SHA-512/256)." ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "collapsed": true }, "outputs": [], "source": [ "class SHA2(Hash):\n", " \"\"\"SHA256 hashing, see https://en.wikipedia.org/wiki/SHA-2#Pseudocode.\"\"\"\n", " \n", " def __init__(self):\n", " self.name = \"SHA256\"\n", " self.byteorder = 'big'\n", " self.block_size = 64\n", " self.digest_size = 32\n", " # Note 2: For each round, there is one round constant k[i] and one entry in the message schedule array w[i], 0 ≤ i ≤ 63\n", " # Note 3: The compression function uses 8 working variables, a through h\n", " # Note 4: Big-endian convention is used when expressing the constants in this pseudocode,\n", " # and when parsing message block data from bytes to words, for example,\n", " # the first word of the input message \"abc\" after padding is 0x61626380\n", "\n", " # Initialize hash values:\n", " # (first 32 bits of the fractional parts of the square roots of the first 8 primes 2..19):\n", " h0 = 0x6a09e667\n", " h1 = 0xbb67ae85\n", " h2 = 0x3c6ef372\n", " h3 = 0xa54ff53a\n", " h4 = 0x510e527f\n", " h5 = 0x9b05688c\n", " h6 = 0x1f83d9ab\n", " h7 = 0x5be0cd19\n", " \n", " # Initialize array of round constants:\n", " # (first 32 bits of the fractional parts of the cube roots of the first 64 primes 2..311):\n", " self.k = [\n", " 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,\n", " 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,\n", " 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,\n", " 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,\n", " 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,\n", " 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,\n", " 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,\n", " 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2\n", " ]\n", "\n", " # Store them\n", " self.hash_pieces = [h0, h1, h2, h3, h4, h5, h6, h7]\n", " \n", " def update(self, arg):\n", " h0, h1, h2, h3, h4, h5, h6, h7 = self.hash_pieces\n", " # 1. Pre-processing, exactly like MD5\n", " data = bytearray(arg)\n", " orig_len_in_bits = (8 * len(data)) & 0xFFFFFFFFFFFFFFFF\n", " # 1.a. Add a single '1' bit at the end of the input bits\n", " data.append(0x80)\n", " # 1.b. Padding with zeros as long as the input bits length ≡ 448 (mod 512)\n", " while len(data) % 64 != 56:\n", " data.append(0)\n", " # 1.c. append original length in bits mod (2 pow 64) to message\n", " data += orig_len_in_bits.to_bytes(8, byteorder='big')\n", " assert len(data) % 64 == 0, \"Error in padding\"\n", " # 2. Computations\n", " # Process the message in successive 512-bit = 64-bytes chunks:\n", " for offset in range(0, len(data), 64):\n", " # 2.a. 512-bits = 64-bytes chunks\n", " chunks = data[offset : offset + 64]\n", " w = [0 for i in range(64)]\n", " # 2.b. Break chunk into sixteen 32-bit = 4-bytes words w[i], 0 ≤ i ≤ 15\n", " for i in range(16):\n", " w[i] = int.from_bytes(chunks[4*i : 4*i + 4], byteorder='big')\n", " # 2.c. Extend the first 16 words into the remaining 48\n", " # words w[16..63] of the message schedule array:\n", " for i in range(16, 64):\n", " s0 = (rightrotate(w[i-15], 7) ^ rightrotate(w[i-15], 18) ^ rightshift(w[i-15], 3)) & 0xFFFFFFFF\n", " s1 = (rightrotate(w[i-2], 17) ^ rightrotate(w[i-2], 19) ^ rightshift(w[i-2], 10)) & 0xFFFFFFFF\n", " w[i] = (w[i-16] + s0 + w[i-7] + s1) & 0xFFFFFFFF\n", " # 2.d. Initialize hash value for this chunk\n", " a, b, c, d, e, f, g, h = h0, h1, h2, h3, h4, h5, h6, h7\n", " # 2.e. Main loop, cf. https://tools.ietf.org/html/rfc6234\n", " for i in range(64):\n", " S1 = (rightrotate(e, 6) ^ rightrotate(e, 11) ^ rightrotate(e, 25)) & 0xFFFFFFFF\n", " ch = ((e & f) ^ ((~e) & g)) & 0xFFFFFFFF\n", " temp1 = (h + S1 + ch + self.k[i] + w[i]) & 0xFFFFFFFF\n", " S0 = (rightrotate(a, 2) ^ rightrotate(a, 13) ^ rightrotate(a, 22)) & 0xFFFFFFFF\n", " maj = ((a & b) ^ (a & c) ^ (b & c)) & 0xFFFFFFFF\n", " temp2 = (S0 + maj) & 0xFFFFFFFF\n", "\n", " new_a = (temp1 + temp2) & 0xFFFFFFFF\n", " new_e = (d + temp1) & 0xFFFFFFFF\n", " # Rotate the 8 variables\n", " a, b, c, d, e, f, g, h = new_a, a, b, c, new_e, e, f, g\n", "\n", " # Add this chunk's hash to result so far:\n", " h0 = (h0 + a) & 0xFFFFFFFF\n", " h1 = (h1 + b) & 0xFFFFFFFF\n", " h2 = (h2 + c) & 0xFFFFFFFF\n", " h3 = (h3 + d) & 0xFFFFFFFF\n", " h4 = (h4 + e) & 0xFFFFFFFF\n", " h5 = (h5 + f) & 0xFFFFFFFF\n", " h6 = (h6 + g) & 0xFFFFFFFF\n", " h7 = (h7 + h) & 0xFFFFFFFF\n", " # 3. Conclusion\n", " self.hash_pieces = [h0, h1, h2, h3, h4, h5, h6, h7]\n", "\n", " def digest(self):\n", " # h0 append h1 append h2 append h3 append h4 append h5 append h6 append h7\n", " return sum(leftshift(x, 32 * i) for i, x in enumerate(self.hash_pieces[::-1]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also write a function to directly compute the hex digest from some bytes data." ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def hash_SHA2(data):\n", " \"\"\" Shortcut function to directly receive the hex digest from SHA2(data).\"\"\"\n", " h = SHA2()\n", " if isinstance(data, str):\n", " data = bytes(data, encoding='utf8')\n", " h.update(data)\n", " return h.hexdigest()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### First check on SHA-2\n", "\n", "Let us try it:" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<__main__.SHA2 at 0x7fc82727f470>" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" }, { "name": "stdout", "output_type": "stream", "text": [ "SHA256\n" ] } ], "source": [ "h4 = SHA2()\n", "h4\n", "print(h4)" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "72202089257263067108821672097406107534247682087137282803352466770222342186230" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "h4.update(data)\n", "h4.digest()" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'0b1001111110100000111011110010111110100111101111001011110000000101110110010001110010001110000100110000100010010011011010011010001000100000100010011100101100000010000011011001000001010000011011101110100010000011010011001100011000001001110011111010010011110110'" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" }, { "data": { "text/plain": [ "258" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "digest = h4.digest()\n", "bin(digest)\n", "len(bin(digest))" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'9fa0ef2fa7bcbc05d91c8e13089369a22089cb020d90506ee8834cc609cfa4f6'" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "h4.hexdigest()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Check on SHA-2\n", "\n", "Let try the example from [SHA-2 Wikipedia page](https://en.wikipedia.org/wiki/SHA-2#Test_vectors) :" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592'" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash_SHA2(\"The quick brown fox jumps over the lazy dog\")\n", "assert hash_SHA2(\"The quick brown fox jumps over the lazy dog\") == 'd7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Even a small change in the message will (with overwhelming probability) result in a mostly different hash, due to the [**avalanche effect**](https://en.wikipedia.org/wiki/Avalanche_effect). For example, adding a period at the end of the sentence:" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'ef537f25c895bfa782526529a9b63d97aa631564d5d789c2b765448c8635fb6c'" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash_SHA2(\"The quick brown fox jumps over the lazy dog.\")\n", "assert hash_SHA2(\"The quick brown fox jumps over the lazy dog.\") == 'ef537f25c895bfa782526529a9b63d97aa631564d5d789c2b765448c8635fb6c'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The hash of the zero-length string is:" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash_SHA2(\"\")\n", "assert hash_SHA2(\"\") == 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$\\implies$ We obtained the same result, OK our function works!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Trying 1000 random examples\n", "On a small sentence:" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'10a6dec377c28d2d34001c103760339f8d6ab02660d200d6014329b86253552c'" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash_SHA2(\"My name is Zorro !\")" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'10a6dec377c28d2d34001c103760339f8d6ab02660d200d6014329b86253552c'" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "h = hashlib.sha256()\n", "h.update(b\"My name is Zorro !\")\n", "h.hexdigest()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It starts to look good." ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def true_hash_SHA2(data):\n", " h = hashlib.sha256()\n", " if isinstance(data, str):\n", " data = bytes(data, encoding='utf8')\n", " h.update(data)\n", " return h.hexdigest()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On some random data:" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'CpGQBPn3dP'" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "random_string(10)" ] }, { "cell_type": "code", "execution_count": 63, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from tqdm import tqdm_notebook as tqdm" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "dd08b930c4f04376a0827792342e12df" } }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "CPU times: user 1min 1s, sys: 52 ms, total: 1min 1s\n", "Wall time: 1min 1s\n" ] } ], "source": [ "%%time\n", "for _ in tqdm(range(1000)):\n", " x = random_string()\n", " assert hash_SHA2(x) == true_hash_SHA2(x), \"Error: x = {} gave two different SHA2 hashes: my implementation = {} != hashlib implementation = {}...\".format(x, hash_SHA2(x), true_hash_SHA2(x))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "## Comparison : MD5 vs SHA-1 vs SHA-2\n", "\n", "It can be interesting to compare each hash functions, with respect to its time complexity." ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "36 ms ± 4.58 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" ] } ], "source": [ "def test_MD5():\n", " x = random_string()\n", " return hash_MD5(x) == true_hash_MD5(x)\n", "\n", "%timeit test_MD5()" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "40.2 ms ± 2.22 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" ] } ], "source": [ "def test_SHA1():\n", " x = random_string()\n", " return hash_SHA1(x) == true_hash_SHA1(x)\n", "\n", "%timeit test_SHA1()" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "61.4 ms ± 2.65 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" ] } ], "source": [ "def test_SHA2():\n", " x = random_string()\n", " return hash_SHA2(x) == true_hash_SHA2(x)\n", "\n", "%timeit test_SHA2()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As expected, the MD5 hash is the fastest, and SHA-2 is slower than SHA-1.\n", "\n", "$\\implies$ The more secure, the slowest. Of course." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "## Bonus : SHA-2 variants\n", "\n", "Now that we have implemented SHA-256, it should be quick to add the SHA-224 variant, which is simply the SHA-256 with different initial values and a shorter digest.\n", "\n", "The SHA-512 variant is working with 64-bits words and 1024-bits chunks, instead of 32-bits words and 512-bits chunks, and the SHA-384 is very similar to SHA-512 with different initial values and a shorter digest.\n", "\n", "> Sorry about the length of this part, I know all these variants are very similar, but I wanted to write them all." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### SHA-224\n", "\n", "As said on the Wikipedia page:\n", "> SHA-224 is identical to SHA-256, except that:\n", "> - the initial hash values `h0` through `h7` are different, and\n", "> - the output is constructed by omitting `h7`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### The `SHA224` class" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "collapsed": true }, "outputs": [], "source": [ "class SHA224(Hash):\n", " \"\"\"SHA224 hashing, see https://en.wikipedia.org/wiki/SHA-2#Pseudocode.\"\"\"\n", " \n", " def __init__(self):\n", " self.name = \"SHA224\"\n", " self.byteorder = 'big'\n", " self.block_size = 64\n", " self.digest_size = 28\n", " # Note 2: For each round, there is one round constant k[i] and one entry in the message schedule array w[i], 0 ≤ i ≤ 63\n", " # Note 3: The compression function uses 8 working variables, a through h\n", " # Note 4: Big-endian convention is used when expressing the constants in this pseudocode,\n", " # and when parsing message block data from bytes to words, for example,\n", " # the first word of the input message \"abc\" after padding is 0x61626380\n", "\n", " # Initialize hash values:\n", " # (The second 32 bits of the fractional parts of the square roots of the 9th through 16th primes 23..53)\n", " h0 = 0xc1059ed8\n", " h1 = 0x367cd507\n", " h2 = 0x3070dd17\n", " h3 = 0xf70e5939\n", " h4 = 0xffc00b31\n", " h5 = 0x68581511\n", " h6 = 0x64f98fa7\n", " h7 = 0xbefa4fa4\n", " \n", " # Initialize array of round constants:\n", " # (first 32 bits of the fractional parts of the cube roots of the first 64 primes 2..311):\n", " self.k = [\n", " 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,\n", " 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,\n", " 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,\n", " 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,\n", " 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,\n", " 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,\n", " 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,\n", " 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2\n", " ]\n", "\n", " # Store them\n", " self.hash_pieces = [h0, h1, h2, h3, h4, h5, h6, h7]\n", " \n", " def update(self, arg):\n", " h0, h1, h2, h3, h4, h5, h6, h7 = self.hash_pieces\n", " # 1. Pre-processing, exactly like MD5\n", " data = bytearray(arg)\n", " orig_len_in_bits = (8 * len(data)) & 0xFFFFFFFFFFFFFFFF\n", " # 1.a. Add a single '1' bit at the end of the input bits\n", " data.append(0x80)\n", " # 1.b. Padding with zeros as long as the input bits length ≡ 448 (mod 512)\n", " while len(data) % 64 != 56:\n", " data.append(0)\n", " # 1.c. append original length in bits mod (2 pow 64) to message\n", " data += orig_len_in_bits.to_bytes(8, byteorder='big')\n", " assert len(data) % 64 == 0, \"Error in padding\"\n", " # 2. Computations\n", " # Process the message in successive 512-bit = 64-bytes chunks:\n", " for offset in range(0, len(data), 64):\n", " # 2.a. 512-bits = 64-bytes chunks\n", " chunks = data[offset : offset + 64]\n", " w = [0 for i in range(64)]\n", " # 2.b. Break chunk into sixteen 32-bit = 4-bytes words w[i], 0 ≤ i ≤ 15\n", " for i in range(16):\n", " w[i] = int.from_bytes(chunks[4*i : 4*i + 4], byteorder='big')\n", " # 2.c. Extend the first 16 words into the remaining 48\n", " # words w[16..63] of the message schedule array:\n", " for i in range(16, 64):\n", " s0 = (rightrotate(w[i-15], 7) ^ rightrotate(w[i-15], 18) ^ rightshift(w[i-15], 3)) & 0xFFFFFFFF\n", " s1 = (rightrotate(w[i-2], 17) ^ rightrotate(w[i-2], 19) ^ rightshift(w[i-2], 10)) & 0xFFFFFFFF\n", " w[i] = (w[i-16] + s0 + w[i-7] + s1) & 0xFFFFFFFF\n", " # 2.d. Initialize hash value for this chunk\n", " a, b, c, d, e, f, g, h = h0, h1, h2, h3, h4, h5, h6, h7\n", " # 2.e. Main loop, cf. https://tools.ietf.org/html/rfc6234\n", " for i in range(64):\n", " S1 = (rightrotate(e, 6) ^ rightrotate(e, 11) ^ rightrotate(e, 25)) & 0xFFFFFFFF\n", " ch = ((e & f) ^ ((~e) & g)) & 0xFFFFFFFF\n", " temp1 = (h + S1 + ch + self.k[i] + w[i]) & 0xFFFFFFFF\n", " S0 = (rightrotate(a, 2) ^ rightrotate(a, 13) ^ rightrotate(a, 22)) & 0xFFFFFFFF\n", " maj = ((a & b) ^ (a & c) ^ (b & c)) & 0xFFFFFFFF\n", " temp2 = (S0 + maj) & 0xFFFFFFFF\n", "\n", " new_a = (temp1 + temp2) & 0xFFFFFFFF\n", " new_e = (d + temp1) & 0xFFFFFFFF\n", " # Rotate the 8 variables\n", " a, b, c, d, e, f, g, h = new_a, a, b, c, new_e, e, f, g\n", "\n", " # Add this chunk's hash to result so far:\n", " h0 = (h0 + a) & 0xFFFFFFFF\n", " h1 = (h1 + b) & 0xFFFFFFFF\n", " h2 = (h2 + c) & 0xFFFFFFFF\n", " h3 = (h3 + d) & 0xFFFFFFFF\n", " h4 = (h4 + e) & 0xFFFFFFFF\n", " h5 = (h5 + f) & 0xFFFFFFFF\n", " h6 = (h6 + g) & 0xFFFFFFFF\n", " h7 = (h7 + h) & 0xFFFFFFFF\n", " # 3. Conclusion\n", " self.hash_pieces = [h0, h1, h2, h3, h4, h5, h6, h7]\n", "\n", " def digest(self):\n", " # h0 append h1 append h2 append h3 append h4 append h5 append h6\n", " pieces_without_h7 = self.hash_pieces[:-1]\n", " return sum(leftshift(x, 32 * i) for i, x in enumerate(pieces_without_h7[::-1]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also write a function to directly compute the hex digest from some bytes data." ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def hash_SHA224(data):\n", " \"\"\" Shortcut function to directly receive the hex digest from SHA224(data).\"\"\"\n", " h = SHA224()\n", " if isinstance(data, str):\n", " data = bytes(data, encoding='utf8')\n", " h.update(data)\n", " return h.hexdigest()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Checks on SHA-224\n", "\n", "Let try the example from [SHA-2 Wikipedia page](https://en.wikipedia.org/wiki/SHA-2#Test_vectors) :" ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def true_hash_SHA224(data):\n", " h = hashlib.sha224()\n", " if isinstance(data, str):\n", " data = bytes(data, encoding='utf8')\n", " h.update(data)\n", " return h.hexdigest()" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'730e109bd7a8a32b1cb9d9a09aa2325d2430587ddbc0c38bad911525'" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash_SHA224(\"The quick brown fox jumps over the lazy dog\")\n", "assert hash_SHA224(\"The quick brown fox jumps over the lazy dog\") == '730e109bd7a8a32b1cb9d9a09aa2325d2430587ddbc0c38bad911525'\n", "assert hash_SHA224(\"The quick brown fox jumps over the lazy dog\") == true_hash_SHA224(\"The quick brown fox jumps over the lazy dog\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Even a small change in the message will (with overwhelming probability) result in a mostly different hash, due to the [**avalanche effect**](https://en.wikipedia.org/wiki/Avalanche_effect). For example, adding a period at the end of the sentence:" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'619cba8e8e05826e9b8c519c0a5c68f4fb653e8a3d8aa04bb2c8cd4c'" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash_SHA224(\"The quick brown fox jumps over the lazy dog.\")\n", "assert hash_SHA224(\"The quick brown fox jumps over the lazy dog.\") == '619cba8e8e05826e9b8c519c0a5c68f4fb653e8a3d8aa04bb2c8cd4c'\n", "assert hash_SHA224(\"The quick brown fox jumps over the lazy dog.\") == true_hash_SHA224(\"The quick brown fox jumps over the lazy dog.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The hash of the zero-length string is:" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'d14a028c2a3a2bc9476102bb288234c415a2b01f828ea62ac5b3e42f'" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash_SHA224(\"\")\n", "assert hash_SHA224(\"\") == 'd14a028c2a3a2bc9476102bb288234c415a2b01f828ea62ac5b3e42f'\n", "assert hash_SHA224(\"\") == true_hash_SHA224(\"\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$\\implies$ We obtained the same result, OK our function works!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### SHA-512\n", "\n", "As said on the Wikipedia page:\n", "\n", "> SHA-512 is identical in structure to SHA-256, but:\n", "> \n", "> - the message is broken into 1024-bit chunks,\n", "> - the initial hash values and round constants are extended to 64 bits,\n", "> - there are 80 rounds instead of 64,\n", "> - the message schedule array w has 80 64-bit words instead of 64 32-bit words,\n", "> - to extend the message schedule array w, the loop is from 16 to 79 instead of from 16 to 63,\n", "> - the round constants are based on the first 80 primes 2..409,\n", "> - the word size used for calculations is 64 bits long,\n", "> - the appended length of the message (before pre-processing), in bits, is a 128-bit big-endian integer, and\n", "> - the shift and rotate amounts used are different.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Useful functions the SHA-512 algorithm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is exactly like for SHA-256, except we work with 64-bits words instead of 32-bits." ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def leftrotate_64(x, c):\n", " \"\"\" Left rotate the number x by c bytes, for 64-bits numbers.\"\"\"\n", " x &= 0xFFFFFFFFFFFFFFFF\n", " return ((x << c) | (x >> (64 - c))) & 0xFFFFFFFFFFFFFFFF\n", "\n", "def rightrotate_64(x, c):\n", " \"\"\" Right rotate the number x by c bytes, for 64-bits numbers.\"\"\"\n", " x &= 0xFFFFFFFFFFFFFFFF\n", " return ((x >> c) | (x << (64 - c))) & 0xFFFFFFFFFFFFFFFF" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### The `SHA512` class" ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "collapsed": true }, "outputs": [], "source": [ "class SHA512(Hash):\n", " \"\"\"SHA384 hashing, see https://en.wikipedia.org/wiki/SHA-2#Pseudocode.\"\"\"\n", " \n", " def __init__(self):\n", " self.name = \"SHA512\"\n", " self.byteorder = 'big'\n", " self.block_size = 128\n", " self.digest_size = 64\n", " # Note 2: For each round, there is one round constant k[i] and one entry in the message schedule array w[i], 0 ≤ i ≤ 79\n", " # Note 3: The compression function uses 8 working variables, a through h\n", " # Note 4: Big-endian convention is used when expressing the constants in this pseudocode\n", "\n", " # Initialize hash values:\n", " # (The second 64 bits of the fractional parts of the square roots of the first 8 primes 2..19)\n", " h0 = 0x6a09e667f3bcc908\n", " h1 = 0xbb67ae8584caa73b\n", " h2 = 0x3c6ef372fe94f82b\n", " h3 = 0xa54ff53a5f1d36f1\n", " h4 = 0x510e527fade682d1\n", " h5 = 0x9b05688c2b3e6c1f\n", " h6 = 0x1f83d9abfb41bd6b\n", " h7 = 0x5be0cd19137e2179\n", "\n", " # Initialize array of round constants:\n", " # (first 64 bits of the fractional parts of the cube roots of the first 80 primes 2..409):\n", " self.k = [\n", " 0x428a2f98d728ae22, 0x7137449123ef65cd, 0xb5c0fbcfec4d3b2f, 0xe9b5dba58189dbbc, 0x3956c25bf348b538, \n", " 0x59f111f1b605d019, 0x923f82a4af194f9b, 0xab1c5ed5da6d8118, 0xd807aa98a3030242, 0x12835b0145706fbe, \n", " 0x243185be4ee4b28c, 0x550c7dc3d5ffb4e2, 0x72be5d74f27b896f, 0x80deb1fe3b1696b1, 0x9bdc06a725c71235, \n", " 0xc19bf174cf692694, 0xe49b69c19ef14ad2, 0xefbe4786384f25e3, 0x0fc19dc68b8cd5b5, 0x240ca1cc77ac9c65, \n", " 0x2de92c6f592b0275, 0x4a7484aa6ea6e483, 0x5cb0a9dcbd41fbd4, 0x76f988da831153b5, 0x983e5152ee66dfab, \n", " 0xa831c66d2db43210, 0xb00327c898fb213f, 0xbf597fc7beef0ee4, 0xc6e00bf33da88fc2, 0xd5a79147930aa725, \n", " 0x06ca6351e003826f, 0x142929670a0e6e70, 0x27b70a8546d22ffc, 0x2e1b21385c26c926, 0x4d2c6dfc5ac42aed, \n", " 0x53380d139d95b3df, 0x650a73548baf63de, 0x766a0abb3c77b2a8, 0x81c2c92e47edaee6, 0x92722c851482353b, \n", " 0xa2bfe8a14cf10364, 0xa81a664bbc423001, 0xc24b8b70d0f89791, 0xc76c51a30654be30, 0xd192e819d6ef5218, \n", " 0xd69906245565a910, 0xf40e35855771202a, 0x106aa07032bbd1b8, 0x19a4c116b8d2d0c8, 0x1e376c085141ab53, \n", " 0x2748774cdf8eeb99, 0x34b0bcb5e19b48a8, 0x391c0cb3c5c95a63, 0x4ed8aa4ae3418acb, 0x5b9cca4f7763e373, \n", " 0x682e6ff3d6b2b8a3, 0x748f82ee5defb2fc, 0x78a5636f43172f60, 0x84c87814a1f0ab72, 0x8cc702081a6439ec, \n", " 0x90befffa23631e28, 0xa4506cebde82bde9, 0xbef9a3f7b2c67915, 0xc67178f2e372532b, 0xca273eceea26619c, \n", " 0xd186b8c721c0c207, 0xeada7dd6cde0eb1e, 0xf57d4f7fee6ed178, 0x06f067aa72176fba, 0x0a637dc5a2c898a6, \n", " 0x113f9804bef90dae, 0x1b710b35131c471b, 0x28db77f523047d84, 0x32caab7b40c72493, 0x3c9ebe0a15c9bebc, \n", " 0x431d67c49c100d4c, 0x4cc5d4becb3e42b6, 0x597f299cfc657e2a, 0x5fcb6fab3ad6faec, 0x6c44198c4a475817\n", " ]\n", "\n", " # Store them\n", " self.hash_pieces = [h0, h1, h2, h3, h4, h5, h6, h7]\n", " \n", " def update(self, arg):\n", " h0, h1, h2, h3, h4, h5, h6, h7 = self.hash_pieces\n", " # 1. Pre-processing, exactly like MD5\n", " data = bytearray(arg)\n", " orig_len_in_bits = (8 * len(data)) & 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF\n", " # 1.a. Add a single '1' bit at the end of the input bits\n", " data.append(0x80)\n", " # 1.b. Padding with zeros as long as the input bits length ≡ 896 (mod 1024)\n", " while len(data) % 128 != 112:\n", " data.append(0)\n", " # 1.c. append original length in bits mod (2 pow 128) to message\n", " data += orig_len_in_bits.to_bytes(16, byteorder='big')\n", " assert len(data) % 128 == 0, \"Error in padding\"\n", " # 2. Computations\n", " # Process the message in successive 1024-bit = 128-bytes chunks:\n", " for offset in range(0, len(data), 128):\n", " # 2.a. 1024-bits = 128-bytes chunks\n", " chunks = data[offset : offset + 128]\n", " w = [0 for i in range(80)]\n", " # 2.b. Break chunk into sixteen 128-bit = 8-bytes words w[i], 0 ≤ i ≤ 15\n", " for i in range(16):\n", " w[i] = int.from_bytes(chunks[8*i : 8*i + 8], byteorder='big')\n", " # 2.c. Extend the first 16 words into the remaining 64\n", " # words w[16..79] of the message schedule array:\n", " for i in range(16, 80):\n", " s0 = (rightrotate_64(w[i-15], 1) ^ rightrotate_64(w[i-15], 8) ^ rightshift(w[i-15], 7)) & 0xFFFFFFFFFFFFFFFF\n", " s1 = (rightrotate_64(w[i-2], 19) ^ rightrotate_64(w[i-2], 61) ^ rightshift(w[i-2], 6)) & 0xFFFFFFFFFFFFFFFF\n", " w[i] = (w[i-16] + s0 + w[i-7] + s1) & 0xFFFFFFFFFFFFFFFF\n", " # 2.d. Initialize hash value for this chunk\n", " a, b, c, d, e, f, g, h = h0, h1, h2, h3, h4, h5, h6, h7\n", " # 2.e. Main loop, cf. https://tools.ietf.org/html/rfc6234\n", " for i in range(80):\n", " S1 = (rightrotate_64(e, 14) ^ rightrotate_64(e, 18) ^ rightrotate_64(e, 41)) & 0xFFFFFFFFFFFFFFFF\n", " ch = ((e & f) ^ ((~e) & g)) & 0xFFFFFFFFFFFFFFFF\n", " temp1 = (h + S1 + ch + self.k[i] + w[i]) & 0xFFFFFFFFFFFFFFFF\n", " S0 = (rightrotate_64(a, 28) ^ rightrotate_64(a, 34) ^ rightrotate_64(a, 39)) & 0xFFFFFFFFFFFFFFFF\n", " maj = ((a & b) ^ (a & c) ^ (b & c)) & 0xFFFFFFFFFFFFFFFF\n", " temp2 = (S0 + maj) & 0xFFFFFFFFFFFFFFFF\n", "\n", " new_a = (temp1 + temp2) & 0xFFFFFFFFFFFFFFFF\n", " new_e = (d + temp1) & 0xFFFFFFFFFFFFFFFF\n", " # Rotate the 8 variables\n", " a, b, c, d, e, f, g, h = new_a, a, b, c, new_e, e, f, g\n", "\n", " # Add this chunk's hash to result so far:\n", " h0 = (h0 + a) & 0xFFFFFFFFFFFFFFFF\n", " h1 = (h1 + b) & 0xFFFFFFFFFFFFFFFF\n", " h2 = (h2 + c) & 0xFFFFFFFFFFFFFFFF\n", " h3 = (h3 + d) & 0xFFFFFFFFFFFFFFFF\n", " h4 = (h4 + e) & 0xFFFFFFFFFFFFFFFF\n", " h5 = (h5 + f) & 0xFFFFFFFFFFFFFFFF\n", " h6 = (h6 + g) & 0xFFFFFFFFFFFFFFFF\n", " h7 = (h7 + h) & 0xFFFFFFFFFFFFFFFF\n", " # 3. Conclusion\n", " self.hash_pieces = [h0, h1, h2, h3, h4, h5, h6, h7]\n", "\n", " def digest(self):\n", " # h0 append h1 append h2 append h3 append h4 append h5 append h6 append h7\n", " return sum(leftshift(x, 64 * i) for i, x in enumerate(self.hash_pieces[::-1]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also write a function to directly compute the hex digest from some bytes data." ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def hash_SHA512(data):\n", " \"\"\" Shortcut function to directly receive the hex digest from SHA512(data).\"\"\"\n", " h = SHA512()\n", " if isinstance(data, str):\n", " data = bytes(data, encoding='utf8')\n", " h.update(data)\n", " return h.hexdigest()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Checks on SHA-512\n", "\n", "Let try the example from [SHA-2 Wikipedia page](https://en.wikipedia.org/wiki/SHA-2#Test_vectors) :" ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def true_hash_SHA512(data):\n", " h = hashlib.sha512()\n", " if isinstance(data, str):\n", " data = bytes(data, encoding='utf8')\n", " h.update(data)\n", " return h.hexdigest()" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'07e547d9586f6a73f73fbac0435ed76951218fb7d0c8d788a309d785436bbb642e93a252a954f23912547d1e8a3b5ed6e1bfd7097821233fa0538f3db854fee6'" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash_SHA512(\"The quick brown fox jumps over the lazy dog\")\n", "assert hash_SHA512(\"The quick brown fox jumps over the lazy dog\") == true_hash_SHA512(\"The quick brown fox jumps over the lazy dog\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Even a small change in the message will (with overwhelming probability) result in a mostly different hash, due to the [**avalanche effect**](https://en.wikipedia.org/wiki/Avalanche_effect). For example, adding a period at the end of the sentence:" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'91ea1245f20d46ae9a037a989f54f1f790f0a47607eeb8a14d12890cea77a1bbc6c7ed9cf205e67b7f2b8fd4c7dfd3a7a8617e45f3c463d481c7e586c39ac1ed'" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash_SHA512(\"The quick brown fox jumps over the lazy dog.\")\n", "assert hash_SHA512(\"The quick brown fox jumps over the lazy dog.\") == true_hash_SHA512(\"The quick brown fox jumps over the lazy dog.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The hash of the zero-length string is:" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'cf83e1357eefb8bdf1542850d66d8007d620e4050b5715dc83f4a921d36ce9ce47d0d13c5d85f2b0ff8318d2877eec2f63b931bd47417a81a538327af927da3e'" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash_SHA512(\"\")\n", "assert hash_SHA512(\"\") == 'cf83e1357eefb8bdf1542850d66d8007d620e4050b5715dc83f4a921d36ce9ce47d0d13c5d85f2b0ff8318d2877eec2f63b931bd47417a81a538327af927da3e'\n", "assert hash_SHA512(\"\") == true_hash_SHA512(\"\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$\\implies$ We obtained the same result, OK our function works!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### SHA-384\n", "\n", "As said on the Wikipedia page:\n", "\n", "> SHA-384 is identical to SHA-512, except that:\n", "> - the initial hash values h0 through h7 are different (taken from the 9th through 16th primes), and\n", "> - the output is constructed by omitting h6 and h7." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### The `SHA384` class" ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "collapsed": true }, "outputs": [], "source": [ "class SHA384(Hash):\n", " \"\"\"SHA384 hashing, see https://en.wikipedia.org/wiki/SHA-2#Pseudocode.\"\"\"\n", " \n", " def __init__(self):\n", " self.name = \"SHA384\"\n", " self.byteorder = 'big'\n", " self.block_size = 96\n", " self.digest_size = 48\n", " # Note 2: For each round, there is one round constant k[i] and one entry in the message schedule array w[i], 0 ≤ i ≤ 79\n", " # Note 3: The compression function uses 8 working variables, a through h\n", " # Note 4: Big-endian convention is used when expressing the constants in this pseudocode\n", "\n", " # Initialize hash values:\n", " # (The second 64 bits of the fractional parts of the square roots of the first 9th through 16th primes 23..53)\n", " h0 = 0xcbbb9d5dc1059ed8\n", " h1 = 0x629a292a367cd507\n", " h2 = 0x9159015a3070dd17\n", " h3 = 0x152fecd8f70e5939\n", " h4 = 0x67332667ffc00b31\n", " h5 = 0x8eb44a8768581511\n", " h6 = 0xdb0c2e0d64f98fa7\n", " h7 = 0x47b5481dbefa4fa4\n", " \n", "\n", " # Initialize array of round constants:\n", " # (first 64 bits of the fractional parts of the cube roots of the first 80 primes 2..409):\n", " self.k = [\n", " 0x428a2f98d728ae22, 0x7137449123ef65cd, 0xb5c0fbcfec4d3b2f, 0xe9b5dba58189dbbc, 0x3956c25bf348b538, \n", " 0x59f111f1b605d019, 0x923f82a4af194f9b, 0xab1c5ed5da6d8118, 0xd807aa98a3030242, 0x12835b0145706fbe, \n", " 0x243185be4ee4b28c, 0x550c7dc3d5ffb4e2, 0x72be5d74f27b896f, 0x80deb1fe3b1696b1, 0x9bdc06a725c71235, \n", " 0xc19bf174cf692694, 0xe49b69c19ef14ad2, 0xefbe4786384f25e3, 0x0fc19dc68b8cd5b5, 0x240ca1cc77ac9c65, \n", " 0x2de92c6f592b0275, 0x4a7484aa6ea6e483, 0x5cb0a9dcbd41fbd4, 0x76f988da831153b5, 0x983e5152ee66dfab, \n", " 0xa831c66d2db43210, 0xb00327c898fb213f, 0xbf597fc7beef0ee4, 0xc6e00bf33da88fc2, 0xd5a79147930aa725, \n", " 0x06ca6351e003826f, 0x142929670a0e6e70, 0x27b70a8546d22ffc, 0x2e1b21385c26c926, 0x4d2c6dfc5ac42aed, \n", " 0x53380d139d95b3df, 0x650a73548baf63de, 0x766a0abb3c77b2a8, 0x81c2c92e47edaee6, 0x92722c851482353b, \n", " 0xa2bfe8a14cf10364, 0xa81a664bbc423001, 0xc24b8b70d0f89791, 0xc76c51a30654be30, 0xd192e819d6ef5218, \n", " 0xd69906245565a910, 0xf40e35855771202a, 0x106aa07032bbd1b8, 0x19a4c116b8d2d0c8, 0x1e376c085141ab53, \n", " 0x2748774cdf8eeb99, 0x34b0bcb5e19b48a8, 0x391c0cb3c5c95a63, 0x4ed8aa4ae3418acb, 0x5b9cca4f7763e373, \n", " 0x682e6ff3d6b2b8a3, 0x748f82ee5defb2fc, 0x78a5636f43172f60, 0x84c87814a1f0ab72, 0x8cc702081a6439ec, \n", " 0x90befffa23631e28, 0xa4506cebde82bde9, 0xbef9a3f7b2c67915, 0xc67178f2e372532b, 0xca273eceea26619c, \n", " 0xd186b8c721c0c207, 0xeada7dd6cde0eb1e, 0xf57d4f7fee6ed178, 0x06f067aa72176fba, 0x0a637dc5a2c898a6, \n", " 0x113f9804bef90dae, 0x1b710b35131c471b, 0x28db77f523047d84, 0x32caab7b40c72493, 0x3c9ebe0a15c9bebc, \n", " 0x431d67c49c100d4c, 0x4cc5d4becb3e42b6, 0x597f299cfc657e2a, 0x5fcb6fab3ad6faec, 0x6c44198c4a475817\n", " ]\n", "\n", " # Store them\n", " self.hash_pieces = [h0, h1, h2, h3, h4, h5, h6, h7]\n", " \n", " def update(self, arg):\n", " h0, h1, h2, h3, h4, h5, h6, h7 = self.hash_pieces\n", " # 1. Pre-processing, exactly like MD5\n", " data = bytearray(arg)\n", " orig_len_in_bits = (8 * len(data)) & 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF\n", " # 1.a. Add a single '1' bit at the end of the input bits\n", " data.append(0x80)\n", " # 1.b. Padding with zeros as long as the input bits length ≡ 896 (mod 1024)\n", " while len(data) % 128 != 112:\n", " data.append(0)\n", " # 1.c. append original length in bits mod (2 pow 128) to message\n", " data += orig_len_in_bits.to_bytes(16, byteorder='big')\n", " assert len(data) % 128 == 0, \"Error in padding\"\n", " # 2. Computations\n", " # Process the message in successive 1024-bit = 128-bytes chunks:\n", " for offset in range(0, len(data), 128):\n", " # 2.a. 1024-bits = 128-bytes chunks\n", " chunks = data[offset : offset + 128]\n", " w = [0 for i in range(80)]\n", " # 2.b. Break chunk into sixteen 128-bit = 8-bytes words w[i], 0 ≤ i ≤ 15\n", " for i in range(16):\n", " w[i] = int.from_bytes(chunks[8*i : 8*i + 8], byteorder='big')\n", " # 2.c. Extend the first 16 words into the remaining 64\n", " # words w[16..79] of the message schedule array:\n", " for i in range(16, 80):\n", " s0 = (rightrotate_64(w[i-15], 1) ^ rightrotate_64(w[i-15], 8) ^ rightshift(w[i-15], 7)) & 0xFFFFFFFFFFFFFFFF\n", " s1 = (rightrotate_64(w[i-2], 19) ^ rightrotate_64(w[i-2], 61) ^ rightshift(w[i-2], 6)) & 0xFFFFFFFFFFFFFFFF\n", " w[i] = (w[i-16] + s0 + w[i-7] + s1) & 0xFFFFFFFFFFFFFFFF\n", " # 2.d. Initialize hash value for this chunk\n", " a, b, c, d, e, f, g, h = h0, h1, h2, h3, h4, h5, h6, h7\n", " # 2.e. Main loop, cf. https://tools.ietf.org/html/rfc6234\n", " for i in range(80):\n", " S1 = (rightrotate_64(e, 14) ^ rightrotate_64(e, 18) ^ rightrotate_64(e, 41)) & 0xFFFFFFFFFFFFFFFF\n", " ch = ((e & f) ^ ((~e) & g)) & 0xFFFFFFFFFFFFFFFF\n", " temp1 = (h + S1 + ch + self.k[i] + w[i]) & 0xFFFFFFFFFFFFFFFF\n", " S0 = (rightrotate_64(a, 28) ^ rightrotate_64(a, 34) ^ rightrotate_64(a, 39)) & 0xFFFFFFFFFFFFFFFF\n", " maj = ((a & b) ^ (a & c) ^ (b & c)) & 0xFFFFFFFFFFFFFFFF\n", " temp2 = (S0 + maj) & 0xFFFFFFFFFFFFFFFF\n", "\n", " new_a = (temp1 + temp2) & 0xFFFFFFFFFFFFFFFF\n", " new_e = (d + temp1) & 0xFFFFFFFFFFFFFFFF\n", " # Rotate the 8 variables\n", " a, b, c, d, e, f, g, h = new_a, a, b, c, new_e, e, f, g\n", "\n", " # Add this chunk's hash to result so far:\n", " h0 = (h0 + a) & 0xFFFFFFFFFFFFFFFF\n", " h1 = (h1 + b) & 0xFFFFFFFFFFFFFFFF\n", " h2 = (h2 + c) & 0xFFFFFFFFFFFFFFFF\n", " h3 = (h3 + d) & 0xFFFFFFFFFFFFFFFF\n", " h4 = (h4 + e) & 0xFFFFFFFFFFFFFFFF\n", " h5 = (h5 + f) & 0xFFFFFFFFFFFFFFFF\n", " h6 = (h6 + g) & 0xFFFFFFFFFFFFFFFF\n", " h7 = (h7 + h) & 0xFFFFFFFFFFFFFFFF\n", " # 3. Conclusion\n", " self.hash_pieces = [h0, h1, h2, h3, h4, h5, h6, h7]\n", "\n", " def digest(self):\n", " # h0 append h1 append h2 append h3 append h4 append h5\n", " hash_pieces_without_67 = self.hash_pieces[:-2]\n", " return sum(leftshift(x, 64 * i) for i, x in enumerate(hash_pieces_without_67[::-1]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also write a function to directly compute the hex digest from some bytes data." ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def hash_SHA384(data):\n", " \"\"\" Shortcut function to directly receive the hex digest from SHA384(data).\"\"\"\n", " h = SHA384()\n", " if isinstance(data, str):\n", " data = bytes(data, encoding='utf8')\n", " h.update(data)\n", " return h.hexdigest()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Checks on SHA-384\n", "\n", "Let try the example from [SHA-2 Wikipedia page](https://en.wikipedia.org/wiki/SHA-2#Test_vectors) :" ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def true_hash_SHA384(data):\n", " h = hashlib.sha384()\n", " if isinstance(data, str):\n", " data = bytes(data, encoding='utf8')\n", " h.update(data)\n", " return h.hexdigest()" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'ca737f1014a48f4c0b6dd43cb177b0afd9e5169367544c494011e3317dbf9a509cb1e5dc1e85a941bbee3d7f2afbc9b1'" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash_SHA384(\"The quick brown fox jumps over the lazy dog\")\n", "assert hash_SHA384(\"The quick brown fox jumps over the lazy dog\") == true_hash_SHA384(\"The quick brown fox jumps over the lazy dog\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Even a small change in the message will (with overwhelming probability) result in a mostly different hash, due to the [**avalanche effect**](https://en.wikipedia.org/wiki/Avalanche_effect). For example, adding a period at the end of the sentence:" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'ed892481d8272ca6df370bf706e4d7bc1b5739fa2177aae6c50e946678718fc67a7af2819a021c2fc34e91bdb63409d7'" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash_SHA384(\"The quick brown fox jumps over the lazy dog.\")\n", "assert hash_SHA384(\"The quick brown fox jumps over the lazy dog.\") == true_hash_SHA384(\"The quick brown fox jumps over the lazy dog.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The hash of the zero-length string is:" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'38b060a751ac96384cd9327eb1b1e36a21fdb71114be07434c0cc7bf63f6e1da274edebfe76f65fbd51ad2f14898b95b'" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hash_SHA384(\"\")\n", "assert hash_SHA384(\"\") == '38b060a751ac96384cd9327eb1b1e36a21fdb71114be07434c0cc7bf63f6e1da274edebfe76f65fbd51ad2f14898b95b'\n", "assert hash_SHA384(\"\") == true_hash_SHA384(\"\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$\\implies$ We obtained the same result, OK our function works!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### More comparison" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "63.2 ms ± 3.52 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" ] } ], "source": [ "def test_SHA224():\n", " x = random_string()\n", " return hash_SHA224(x) == true_hash_SHA224(x)\n", "\n", "%timeit test_SHA224()" ] }, { "cell_type": "code", "execution_count": 88, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "42.7 ms ± 465 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" ] } ], "source": [ "def test_SHA512():\n", " x = random_string()\n", " return hash_SHA512(x) == true_hash_SHA512(x)\n", "\n", "%timeit test_SHA512()" ] }, { "cell_type": "code", "execution_count": 89, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "42.8 ms ± 474 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" ] } ], "source": [ "def test_SHA384():\n", " x = random_string()\n", " return hash_SHA384(x) == true_hash_SHA384(x)\n", "\n", "%timeit test_SHA384()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`SHA512` and `SHA384` are slower than `SHA256` obviously, but it's weird that `SHA224` is slower than the 64-bits versions..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "----\n", "## Conclusion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Well, it was fun and interesting to implement these hashing functions, manually.\n", "Using [Python](https://www.Python.org) made it easy!\n", "\n", "> Note that a Python 2 library implementing manually all these hashing functions already exist: [`pysha2`](https://github.com/thomdixon/pysha2), by [@thomdixon](https://github.com/thomdixon).\n", "> (I discovered it *after* writing this notebook!)\n", "\n", "[![made-with-jupyter](https://img.shields.io/badge/Made%20for-Jupyter%20notebook-1f425f.svg)](https://www.jupyter.org/)\n", "[![GitHub license](https://img.shields.io/github/license/Naereen/notebooks.svg)](https://github.com/Naereen/notebooks/blob/master/LICENSE.txt)\n", "[![forthebadge made-with-python](http://ForTheBadge.com/images/badges/made-with-python.svg)](https://www.python.org/) \n", "[![ForTheBadge built-with-science](http://ForTheBadge.com/images/badges/built-with-science.svg)](https://GitHub.com/Naereen/)\n", "[![ForTheBadge powered-by-electricity](http://ForTheBadge.com/images/badges/powered-by-electricity.svg)](http://ForTheBadge.com)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Bonus\n", "\"SHA\" is pronouced like the French word \"chat\", which means *cat*.\n", "\n", "![a cat playing on a computer](https://media.giphy.com/media/JIX9t2j0ZTN9S/giphy.gif)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> See [my GitHub `notebooks` project](https://GitHub.com/Naereen/notebooks/) for others notebooks." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.3" }, "notify_time": "5", "toc": { "colors": { "hover_highlight": "#DAA520", "running_highlight": "#FF0000", "selected_highlight": "#FFD700" }, "moveMenuLeft": true, "nav_menu": { "height": "303px", "width": "251px" }, "navigate_menu": true, "number_sections": true, "sideBar": true, "threshold": 4, "toc_cell": true, "toc_position": { "height": "552px", "left": "0px", "right": "1164px", "top": "117px", "width": "212px" }, "toc_section_display": "block", "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 2 }