{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 7:  The RSA Cryptosystem in Python 3.x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the previous notebook, we studied a few basic ciphers together with Diffie-Hellman key exchange.  The Vigenère cipher we studied uses a **secret key** for encrypting and decrypting messages.  The same key is used for both encryption and decryption, so we say it is a **symmetric key** cipher.  In order for two parties to share the same secret key, we studied the Diffie-Hellman protocol, whose security rests on the difficulty of the discrete logarithm problem.\n",
    "\n",
    "Although this represents progress towards secure communication, it is particularly vulnerable to problems of authentication.  For example, imagine a \"man-in-the-middle attack\":  Alice and Bob wish to communicate securely, and begin the Diffie-Hellman protocol over an insecure line.  But **Eve** has intercepted the line.  To Alice, she pretends to be Bob, and to Bob, she pretends to be Alice.  She goes through the Diffie-Hellman protocol with each, obtaining two secret keys, and decrypting/encrypting messages as they pass through her computer.  In this way, Alice and Bob think they are talking to each other, but Eve is just passing (and understanding) their messages the whole time!\n",
    "\n",
    "To thwart such an attack, we need some type of authentication.  We need something **asymmetric** -- something one person can do that no other person can do, like a verifiable signature, so that we can be sure we're communicating with the intended person.  For such a purpose, we introduce the RSA cryptosystem.  Computationally based on modular exponentiation, its security rests on the difficulty of factoring large numbers.\n",
    "\n",
    "The material in this notebook complements Chapter 7 of [An Illustrated Theory of Numbers](http://illustratedtheoryofnumbers.com/index.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Table of Contents\n",
    "\n",
    "- [Euler's theorem and modular roots](#euler)\n",
    "- [The RSA protocol](#RSA)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id='euler'></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Euler's Theorem and Modular Roots\n",
    "\n",
    "Recall Fermat's Little Theorem:  if $p$ is prime and $GCD(a,p) = 1$, then $a^{p-1} \\equiv 1$ mod $p$.  This is a special case of Euler's theorem, which holds for any modulus $m$.\n",
    "\n",
    "### Euler's theorem and the totient\n",
    "\n",
    "Euler's theorem states:  if $m$ is a positive integer and $GCD(a,m) = 1$, then $a^{\\phi(m)} \\equiv 1$ mod $m$.  Here $\\phi(m)$ denotes the **totient** of $m$, which is the number of elements of $\\{ 1,...,m \\}$ which are coprime to $m$.  We give a brute force implementation of the totient first, using our old Euclidean algorithm code for the GCD."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def GCD(a,b):\n",
    "    while b:   # Recall that != means \"not equal to\".\n",
    "        a, b = b, a % b\n",
    "    return abs(a)\n",
    "\n",
    "def totient(m):\n",
    "    tot = 0 # The running total.\n",
    "    j = 0\n",
    "    while j < m:  # We go up to m, because the totient of 1 is 1 by convention.\n",
    "        j = j + 1  # Last step of while loop:  j = m-1, and then j = j+1, so j = m.\n",
    "        if GCD(j,m) == 1:\n",
    "            tot = tot + 1\n",
    "    return tot"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "totient(17)  # The totient of a prime p should be p-1."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "totient(1000)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "totient(1)  # Check a borderline case, to make sure we didn't make an off-by-one error."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "17**totient(1000) % 1000 # Let's demonstrate Euler's theorem.  Note GCD(17,1000) = 1."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pow(17,totient(1000),1000) # A more efficient version, using the pow command."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%timeit totient(123456)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The totient is a **multiplicative function**, meaning that if $GCD(a,b) = 1$, then $\\phi(ab) = \\phi(a) \\phi(b)$.  Therefore, the totient of number can be found quickly from the totient of the prime powers within its decomposition.  We can re-implement the totient using our functions for prime decomposition and multiplicative functions from [Notebook 4](http://illustratedtheoryofnumbers.com/prog.html#notebooks)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from math import sqrt  # We'll want to use the square root.\n",
    "\n",
    "def smallest_factor(n):\n",
    "    '''\n",
    "    Gives the smallest prime factor of n.\n",
    "    '''\n",
    "    if n < 2:\n",
    "        return None # No prime factors!\n",
    "    \n",
    "    test_factor = 2 # The smallest possible prime factor.\n",
    "    max_factor = sqrt(n) # we don't have to search past sqrt(n).\n",
    "    \n",
    "    while test_factor <= max_factor:\n",
    "        if n%test_factor == 0:\n",
    "            return test_factor\n",
    "        test_factor = test_factor + 1  # This could be sped up.\n",
    "    \n",
    "    return n  # If we didn't find a factor up to sqrt(n), n itself is prime!\n",
    "\n",
    "def decompose(N):\n",
    "    '''\n",
    "    Gives the unique prime decomposition of a positive integer N,\n",
    "    as a dictionary with primes as keys and exponents as values.\n",
    "    '''\n",
    "    current_number = N  # We'll divide out factors from current_number until we get 1.\n",
    "    decomp = {} # An empty dictionary to start.\n",
    "    while current_number > 1:\n",
    "        p = smallest_factor(current_number) # The smallest prime factor of the current number.\n",
    "        if p in decomp.keys():  # Is p already in the list of keys?\n",
    "            decomp[p] = decomp[p] + 1 # Increase the exponent (value with key p) by 1.\n",
    "        else:  # \"else\" here means \"if p is not in decomp.keys()\".\n",
    "            decomp[p] = 1  # Creates a new entry in the dictionary, with key p and value 1.\n",
    "        current_number = current_number // p # Factor out p.\n",
    "    return decomp\n",
    "\n",
    "def mult_function(f_pp):\n",
    "    '''\n",
    "    When a function f_pp(p,e) of two arguments is input,\n",
    "    this outputs a multiplicative function obtained from f_pp\n",
    "    via prime decomposition.\n",
    "    '''\n",
    "    def f(n):\n",
    "        D = decompose(n)\n",
    "        result = 1\n",
    "        for p in D:\n",
    "            result = result * f_pp(p, D[p])\n",
    "        return result\n",
    "    \n",
    "    return f"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When $p^e$ is a prime power, the numbers among $1, \\ldots, p^e$ are coprime to $p^e$ precisely when they are **not** multiples of $p$.  Therefore, the totient of a prime power is pretty easy to compute:\n",
    "$$\\phi(p^e) = p^e - p^{e-1} = p^{e-1} (p-1).$$\n",
    "We implement this, and use the multiplicative function code to complete the implementation of the totient."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def totient_pp(p,e):\n",
    "    return (p**(e-1)) * (p-1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that for efficiency, the computation of $p^{e-1}(p-1)$ is probably faster than the computation of $p^e - p^{e-1}$ (which relies on two exponents).  But Python is very smart about these sorts of things, and might do some optimization which makes the computation faster.  It should be very fast anyways, in the 10-20 nanosecond range!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "totient = mult_function(totient_pp)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "totient(1000)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%timeit totient(123456)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This should be **much** faster than the previous brute-force computation of the totient."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Modular roots"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A consequence of Euler's theorem is that -- depending on the modulus -- some exponentiation can be \"reversed\" by another exponentiation, a form of taking a \"root\" in modular arithmetic.  For example, if we work modulo $100$, and $GCD(a,100) = 1$, then Euler's theorem states that\n",
    "$$a^{40} \\equiv 1 \\text{ mod } 100.$$\n",
    "It follows that $a^{80} \\equiv 1$ and $a^{81} \\equiv a$, modulo $100$.  Expanding this, we find\n",
    "$$a \\equiv a^{81} = a^{3 \\cdot 27} = (a^3)^{27} \\text{ mod } 100.$$\n",
    "\n",
    "What all this computation shows is that \"raising to the 27th power\" is like \"taking the cube root\", modulo 100 (and with appropriate bases)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for b in range(20):\n",
    "    b_cubed = pow(b,3,100)\n",
    "    bb = pow(b_cubed,27,100)\n",
    "    print(b, b_cubed, bb, GCD(b,100) == 1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In every line ending with `True`, the first and third numbers should match.  This will happen in some `False` lines too, but not reliably since Euler's theorem does not apply there.\n",
    "\n",
    "We found the exponent 27 -- reversing the cubing operation -- by an ad hoc sort of procedure.  The relationship between 27 and 3, which made things work, is that\n",
    "$$3 \\cdot 27 \\equiv 1 \\text{ mod } 40.$$\n",
    "In other words, $3 \\cdot 27 = 1 + 40k$ for some (positive, in fact) integer $k$.\n",
    "\n",
    "Recalling that $40 = \\phi(100)$, this relationship and Euler's theorem imply that\n",
    "$$a^{3 \\cdot 27} = a^{1 + 40k} \\equiv a^1 = a \\text{ mod } 100.$$\n",
    "By this argument, we have the following consequence of Euler's theorem.  \n",
    "\n",
    "**Theorem: ** If $GCD(a,m) = 1$, and $ef \\equiv 1$ mod $\\phi(m)$, then\n",
    "$$a^{ef} \\equiv a \\text{ mod } \\phi(m).$$\n",
    "In this way, \"raising to the $f$ power\" is like \"taking the $e$-th root\", modulo $m$. \n",
    "\n",
    "If we are given $f$, then $e$ is a **multiplicative inverse** of $f$ modulo $\\phi(m)$.  In particular, such a multiplicative inverse exists if and only if $GCD(e,\\phi(m)) = 1$.  The following function computes a multiplicative inverse, by adapting the `solve_LDE` function from [Notebook 2]().  After all, solving $ex \\equiv 1$ mod $m$ is equivalent to solving the linear Diophantine equation $ex + my = 1$ (and only caring about the $x$-value). \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def mult_inverse(a,m):\n",
    "    '''\n",
    "    Finds the multiplicative inverse of a, mod m.\n",
    "    If GCD(a,m) = 1, this is returned via its natural representative.\n",
    "    Otherwise, None is returned.\n",
    "    '''  \n",
    "    u = a # We use u instead of dividend.\n",
    "    v = m # We use v instead of divisor.\n",
    "    u_hops, u_skips = 1,0 # u is built from one hop (a) and no skips.\n",
    "    v_hops, v_skips = 0,1 # v is built from no hops and one skip (b).\n",
    "    while v != 0:   # We could just write while v:\n",
    "        q = u // v  # q stands for quotient.\n",
    "        r = u % v  # r stands for remainder.  So u = q(v) + r.\n",
    "        \n",
    "        r_hops = u_hops - q * v_hops  # Tally hops\n",
    "        r_skips = u_skips - q * v_skips  # Tally skips\n",
    "        \n",
    "        u,v = v,r  # The new dividend,divisor is the old divisor,remainder.\n",
    "        u_hops, v_hops = v_hops, r_hops # The new u_hops, v_hops is the old v_hops, r_hops\n",
    "        u_skips, v_skips = v_skips, r_skips # The new u_skips, v_skips is the old v_skips, r_skips\n",
    "    \n",
    "    g = u # The variable g now describes the GCD of a and b.\n",
    "    if g == 1:\n",
    "        return u_hops % m\n",
    "    else:  # When GCD(a,m) is not 1...\n",
    "        return None"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mult_inverse(3,40)  # 3 times what is congruent to 1, mod 40?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "mult_inverse(5,40)  # None should be returned."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's test this out on some bigger numbers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from random import randint\n",
    "while True:\n",
    "    m = randint(1000000, 9999999) # a random 7-digit number\n",
    "    e = randint(100,999) # a random 3-digit number\n",
    "    a = randint(10,99) # a random 2-digit number\n",
    "    if GCD(a,m) == 1:\n",
    "        tot = totient(m)\n",
    "        if GCD(e,tot) == 1:\n",
    "            f = mult_inverse(e,tot)\n",
    "            test_number = pow(a, e*f, m)\n",
    "            print(\"Success!\")\n",
    "            print(\"{} ^ ({} * {}) = {}, mod {}\".format(a,e,f,test_number,m))\n",
    "            break # Escapes the loop once an example is found!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercises\n",
    "\n",
    "1.  What is the largest integer whose totient is 100?  Study this by a brute force search, i.e., looping through the integers up to 10000\n",
    "\n",
    "2.  For which integers $n$ is it true that $\\phi(n) = n/2$?  Study this by a brute force search in order to make a conjecture.  Then prove it if you can.\n",
    "\n",
    "3.  Compute the totient of the numbers from 1 to 10000, and analyze the results.  The totient $\\phi(n)$ is always less than $n$ when $n > 1$, but how does the ratio $\\phi(n) / n$ behave? Create a graph.  What is the average value of the ratio?  \n",
    "\n",
    "4.  If $a^{323} \\equiv 802931$, mod $5342481$, and $GCD(a, 5342481) = 1$, and $0 < a < 5342481$, then what is $a$? \n",
    "\n",
    "5.  Challenge:  Create a function `superpow(x,y,z,m)` which computes $x^{y^z}$ modulo $m$ efficiently, when $GCD(x,m) = 1$ and $m$ is small enough to factor."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id='RSA'></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The RSA protocol"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Like Diffie-Hellman, the RSA protocol involves a series of computations in modular arithmetic, taking care to keep some numbers private while making others public.  RSA was published two years after Diffie-Hellman, in 1978 by Rivest, Shamir, and Adelman (hence its name).  The great advance of the RSA protocol was its **asymmetry**.  While Diffie-Hellman is used for symmetric key cryptography (using the same key to encrypt and decrypt), the RSA protocol has two keys:  a **public key** that can be used by anyone for encryption and a **private key** that can be used by its owner for decryption.  \n",
    "\n",
    "In this way, if Alice publishes her public key online, anyone can send her an encrypted message.  But as long as she keeps her private key private, **only** Alice can decrypt the messages sent to her.  Such an asymmetry allows RSA to be used for authentication -- if the owner of a private key has an ability nobody else has, then this ability can be used to prove the owner's identity.  In practice, this is one of the most common applications of RSA, guaranteeing that we are communicating with the intended person."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the RSA protocol, the **private key** is a pair of large (e.g. 512 bit) prime numbers, called $p$ and $q$.  The **public key** is the pair $(N, e)$, where $N$ is defined to be the product $N = pq$ and $e$ is an auxiliary number called the exponent.  The number $e$ is often (for computational efficiency and other reasons) taken to be 65537 -- the same number $e$ can be used over and over by different people.  But it is absolutely crucial that the same private keys $p$ and $q$ are not used by different individuals.  Individuals must create and safely keep their own private key."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We begin with the creation of a private key $(p,q)$.  We use the `SystemRandom` function (see the previous Python Notebook) to cook up cryptographically secure random numbers, and the Miller-Rabin test to certify primality."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from random import SystemRandom, randint\n",
    "\n",
    "def Miller_Rabin(p, base):\n",
    "    '''\n",
    "    Tests whether p is prime, using the given base.\n",
    "    The result False implies that p is definitely not prime.\n",
    "    The result True implies that p **might** be prime.\n",
    "    It is not a perfect test!\n",
    "    '''\n",
    "    result = 1\n",
    "    exponent = p-1\n",
    "    modulus = p\n",
    "    bitstring = bin(exponent)[2:] # Chop off the '0b' part of the binary expansion of exponent\n",
    "    for bit in bitstring: # Iterates through the \"letters\" of the string.  Here the letters are '0' or '1'.\n",
    "        sq_result = result*result % modulus  # We need to compute this in any case.\n",
    "        if sq_result == 1:\n",
    "            if (result != 1) and (result != exponent):  # Note that exponent is congruent to -1, mod p.\n",
    "                return False  # a ROO violation occurred, so p is not prime\n",
    "        if bit == '0':\n",
    "            result = sq_result \n",
    "        if bit == '1':\n",
    "            result = (sq_result * base) % modulus\n",
    "    if result != 1:\n",
    "        return False  # a FLT violation occurred, so p is not prime.\n",
    "    \n",
    "    return True  # If we made it this far, no violation occurred and p might be prime.\n",
    "\n",
    "def is_prime(p, witnesses=50):  # witnesses is a parameter with a default value.\n",
    "    '''\n",
    "    Tests whether a positive integer p is prime.\n",
    "    For p < 2^64, the test is deterministic, using known good witnesses.\n",
    "    Good witnesses come from a table at Wikipedia's article on the Miller-Rabin test,\n",
    "    based on research by Pomerance, Selfridge and Wagstaff, Jaeschke, Jiang and Deng.\n",
    "    For larger p, a number (by default, 50) of witnesses are chosen at random.\n",
    "    '''\n",
    "    if (p%2 == 0): # Might as well take care of even numbers at the outset!\n",
    "        if p == 2:\n",
    "            return True\n",
    "        else:\n",
    "            return False \n",
    "    \n",
    "    if p > 2**64:  # We use the probabilistic test for large p.\n",
    "        trial = 0\n",
    "        while trial < witnesses:\n",
    "            trial = trial + 1\n",
    "            witness = randint(2,p-2) # A good range for possible witnesses\n",
    "            if Miller_Rabin(p,witness) == False:\n",
    "                return False\n",
    "        return True\n",
    "    \n",
    "    else:  # We use a determinisic test for p <= 2**64.\n",
    "        verdict = Miller_Rabin(p,2)\n",
    "        if p < 2047:\n",
    "            return verdict # The witness 2 suffices.\n",
    "        verdict = verdict and Miller_Rabin(p,3)\n",
    "        if p < 1373653:\n",
    "            return verdict # The witnesses 2 and 3 suffice.\n",
    "        verdict = verdict and Miller_Rabin(p,5)\n",
    "        if p < 25326001:\n",
    "            return verdict # The witnesses 2,3,5 suffice.\n",
    "        verdict = verdict and Miller_Rabin(p,7)\n",
    "        if p < 3215031751:\n",
    "            return verdict # The witnesses 2,3,5,7 suffice.\n",
    "        verdict = verdict and Miller_Rabin(p,11)\n",
    "        if p < 2152302898747:\n",
    "            return verdict # The witnesses 2,3,5,7,11 suffice.\n",
    "        verdict = verdict and Miller_Rabin(p,13)\n",
    "        if p < 3474749660383:\n",
    "            return verdict # The witnesses 2,3,5,7,11,13 suffice.\n",
    "        verdict = verdict and Miller_Rabin(p,17)\n",
    "        if p < 341550071728321:\n",
    "            return verdict # The witnesses 2,3,5,7,11,17 suffice.\n",
    "        verdict = verdict and Miller_Rabin(p,19) and Miller_Rabin(p,23)\n",
    "        if p < 3825123056546413051:\n",
    "            return verdict # The witnesses 2,3,5,7,11,17,19,23 suffice.\n",
    "        verdict = verdict and Miller_Rabin(p,29) and Miller_Rabin(p,31) and Miller_Rabin(p,37)\n",
    "        return verdict # The witnesses 2,3,5,7,11,17,19,23,29,31,37 suffice for testing up to 2^64. \n",
    "    \n",
    "\n",
    "def random_prime(bitlength):\n",
    "    while True:\n",
    "        p = SystemRandom().getrandbits(bitlength)  # A cryptographically secure random number.\n",
    "        if is_prime(p):\n",
    "            return p"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "random_prime(100) # A random 100-bit prime"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "random_prime(512) # A random 512-bit prime.  Should be quick, thanks to Miller-Rabin!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%timeit random_prime(1024)  # Even 1024-bit primes should be quick!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def RSA_privatekey(bitlength):\n",
    "    '''\n",
    "    Create private key for RSA, with given bitlength.\n",
    "    Just a pair of big primes!\n",
    "    '''\n",
    "    p = random_prime(bitlength)\n",
    "    q = random_prime(bitlength)\n",
    "    return p,q # Returns both values, as a \"tuple\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "type(RSA_privatekey(8))  # When a function returns multiple values, the type is \"tuple\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "p,q = RSA_privatekey(512)  # If a function outputs two values, you can assign them to two variables.\n",
    "print(\"Private key p = {}\".format(p))\n",
    "print(\"Private key q = {}\".format(q))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def RSA_publickey(p,q, e = 65537):\n",
    "    '''\n",
    "    Makes the RSA public key out of \n",
    "    two prime numbers p,q (the private key), \n",
    "    and an auxiliary exponent e.  \n",
    "    By default, e = 65537.\n",
    "    '''\n",
    "    N = p*q\n",
    "    return N,e"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "N,e = RSA_publickey(p,q) # No value of e is input, so it will default to 65537\n",
    "\n",
    "print(\"Public key N = {}\".format(N)) # A big number!\n",
    "print(\"Public key e = {}\".format(e))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Encryption and decryption\n",
    "\n",
    "Now we explain how the public key $(N,e)$ and the private key $(p,q)$ are used to encrypt and decrypt a (numerical) message $m$.  If you wish to encrypt/decrypt a text message, one case use a numerical scheme like ASCII, of course.  The message should be significantly shorter than the modulus, $m < N$ (ideally, shorter than the private key primes), but big enough so that $m^e$ is much bigger than $N$ (not usually a problem if $e = 65537$).  \n",
    "\n",
    "The encryption procedure is simple -- it requires just one line of Python code, using the message and public key.  The ciphertext $c$ is given by the formula\n",
    "$$c = m^e \\text{ mod } N,$$\n",
    "where here we mean the \"natural representative\" of $m^e$ modulo $N$.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def RSA_encrypt(message, N, e):\n",
    "    '''\n",
    "    Encrypts message, using the public keys N,e.\n",
    "    '''\n",
    "    return pow(message, e, N)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "c = RSA_encrypt(17,N,e) # c is the ciphertext.\n",
    "print(\"The ciphertext is {}\".format(c)) # A very long number!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To decrypt the ciphertext, we need to \"undo\" the operation of raising to the $e$-th power modulo $N$.  We must, effectively, take the $e$-th root of the ciphertext, modulo $N$.  This is what we studied earlier in this notebook.  Namely, if $c \\equiv m^e \\text{ mod } N$ is the ciphertext, and $ef \\equiv 1$ modulo $\\phi(N)$, then\n",
    "$$c^f \\equiv m^{ef} \\equiv m \\text{ mod } N.$$\n",
    "\n",
    "So we must raise the ciphertext to the $f$ power, where $f$ is the multiplicative inverse of $e$ modulo $\\phi(N)$.  Given a giant number $N$, it is difficult to compute the totient $\\phi(N)$.  But, with the **private key** $p$ and $q$ (primes), the fact that $N = pq$ implies\n",
    "$$\\phi(N) = (p-1) \\cdot (q-1) = pq - p - q + 1 = N - p - q + 1.$$\n",
    "\n",
    "Armed with the private key (and the public key, which everyone has), we can decrypt a message in just a few lines of Python code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def RSA_decrypt(ciphertext, p,q,N,e):\n",
    "    '''\n",
    "    Decrypts message, using the private key (p,q) \n",
    "    and the public key (N,e).  We allow the public key N as\n",
    "    an input parameter, to avoid recomputing it.\n",
    "    '''\n",
    "    tot = N - (p+q) + 1\n",
    "    f = mult_inverse(e,tot)  # This uses the Euclidean algorithm... very quick!\n",
    "    return pow(ciphertext,f,N)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "RSA_decrypt(c,p,q,N,e)  # We decrypt the ciphertext... what is the result?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That's the entire process of encryption and decryption.  Encryption requires just the public key $(N,e)$ and decryption requires the private key $(p,q)$ too.  Everything else is modular arithmetic, using Euler's theorem and the Euclidean algorithm to find modular multiplicative inverses.\n",
    "\n",
    "From a practical standpoint, there are many challenges, and we just mention a few here.\n",
    "\n",
    "**Key generation:**  The person who constructs the private key $(p,q)$ needs to be careful.  The primes $p$ and $q$ need to be pretty large (512 bits, or 1024 bits perhaps), which is not so difficult.  They also need to be constructed **randomly**.  For imagine that Alice comes up with her private key $(p,q)$ and Anne comes up with her private key $(q,r)$, with the same prime $q$ in common.  Their public keys will include the numbers $N = pq$ and $M = qr$.  If someone like Arjen comes along and starts taking GCDs of all the public keys in a database, that person will stumble upon the fact that $GCD(N,M) = q$, from which the private keys $(p,q)$ and $(q,r)$ can be derived.  And this sort of disaster has happened!  Poorly generated keys were stored in a database, and [discovered by Arjen Lenstra et al.](https://eprint.iacr.org/2012/064.pdf).\n",
    "\n",
    "**Security by difficulty of factoring:**  The security of RSA is based on the difficulty of obtaining the private key $(p,q)$ from the public key $(N,e)$.  Since $N = pq$, this is precisely the difficulty of factoring a large number $N$ into two primes (given the knowledge that it is the product of two primes).  Currently it seems very difficult to factor large numbers.  The RSA factoring challenges give monetary rewards for factoring such large $N$.  The record (2017) is factoring a [768-bit (232 digit) number, RSA-768](https://en.wikipedia.org/wiki/RSA_numbers#RSA-768).  For this reason, we may consider a 1024-bit number secure for now (i.e. $p$ and $q$ are 512-bit primes), or use a 2048-bit number if we are paranoid.  If quantum computers develop sufficiently, they could make factoring large numbers easy, and RSA will have to be replaced by a quantum-secure protocol. \n",
    "\n",
    "**Web of Trust:**  Trust needs to begin somewhere.  Every time Alice and Bob communicate, Alice should not come up with a new private key, and give Bob the new public key.  For if they are far apart, how does Bob know he's receiving Alice's public key and not talking to an eavesdropper Eve?  Instead, it is better for Alice to register (in person, perhaps) her public key at some time.  She can create her private key $(p,q)$ and register the resulting public key $(N,e)$ with some \"key authority\" who checks her identity at the time.  The key authority then stores everyone's public keys -- effectively they say \"if you want to send a secure message to Alice, use the following public key:  (..., ...)\"  Then Bob can consult the key authority when he wishes to communicate securely to Alice, and this private/public key combination can be used for years.\n",
    "\n",
    "But, as one might guess, this kicks the trust question to another layer.  How does Bob know he's communicating with the key authority?  The key authorities need to have their own authentication mechanism, etc..  One way to avoid going down a rabbithole of mistrust is to *distribute* trust across a network of persons.  Instead of a centralized \"key authority\", one can distribute one's public keys across an entire network of communicators (read about [openPGP](https://en.wikipedia.org/wiki/Pretty_Good_Privacy#OpenPGP)).  Then Bob, if he wishes, can double-check Alice's public keys against the records of numerous members of the network -- assuming that hackers haven't gotten to all of them!  \n",
    "\n",
    "In practice, some implementations of RSA use a more centralized authority and others rely on a web of trust.  Cryptography requires a clever application of modular arithmetic (in Diffie-Hellman, RSA, and many other systems), but also a meticulous approach to implementation.  Often the challenges of implementation introduce new problems in number theory."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Digital signatures\n",
    "\n",
    "A variant of RSA is used to \"digitally sign\" a document.  Not worrying about keeping a message private for now, suppose that Alice wants to send Bob a message and **sign** the message in such a way that Bob can be confident it was sent by Alice.  \n",
    "\n",
    "Alice can digitally sign her message by first **hashing** the message and then encrypting the hash using her **private key** (previously the *public key* was used for encryption -- this is different!).  Hashing a message is an irreversible process that turns a message of possibly long length into a nonsense-message of typically fixed length, in such a way that the original message cannot be recovered from the nonsense-message.  \n",
    "\n",
    "There is a science to secure hashing, which we don't touch on here.  Instead, we import the `sha512` hashing function from the Python standard package `hashlib`.  The input to `sha512` is a string of arbitrary length, and the output is a sequence of 512 bits.  Here in Python 3, we also must specify the Unicode encoding (`utf-8`) for the string to be hashed.  One can convert the 512 bits of hash into 64 bytes (512/8 = 64) which can be viewed as a 64-character string via ASCII.  This 64-byte string is called the *digest* of the hash.  It's all very biological.  Note that many of the characters won't display very nicely, since they are out of the code range 32-126!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from hashlib import sha512\n",
    "\n",
    "print(sha512(\"I like sweet potato hash.\".encode('utf-8')).digest()) # A 64-character string of hash."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Hashes are designed to be irreversible.  Nobody should be able to reverse the hashing process and recover the original string ('I like sweet potato hash') from the output of sha512.  Moreover, hashes should avoid collisions -- although different input strings *can* yield the same hashed result, such \"collisions\" should be exceedingly rare in practice.  You can read more about [SHA-512 and others at Wikipedia](https://en.wikipedia.org/wiki/SHA-2).\n",
    "\n",
    "Now, if Alice wishes to sign a message `m`, and send it to Bob, she goes through the following steps (in the RSA signature protocol).\n",
    "\n",
    "1.  Alice hashes her message using a method such as SHA-512.  Let `h` be the resulting hash.\n",
    "\n",
    "2.  Alice **encrypts** the hash by computing $s = h^f$ modulo $N$.  Here $f$ is the multiplicative inverse of $e$ modulo $\\phi(N)$, just as it was in RSA encryption.  Note that this step requires knowledge of the private key $(p,q)$ to compute $\\phi(N)$ to compute $f$.  Hence only Alice can encrypt the has in this way.\n",
    "\n",
    "3.  Alice sends this encrypted hash along with her message to Bob, along with a note that she signed it using SHA-512 and her RSA key (without revealing her private keys, of course!).\n",
    "\n",
    "When Bob receives the (plaintext) message $m$ and signature $s$, he carries out the following authentication process.\n",
    "\n",
    "1.  Bob computes $s^e$ modulo $N$.  Since $s^e = h^{fe} = h$ modulo $N$, Bob now has the hash of the message $h$.  \n",
    "\n",
    "2.  Bob also computes the SHA-512 hash of the received message $m$, and compares it to the has he computed in step 1.  If they match, then Alice indeed signed the message he received.\n",
    "\n",
    "We leave implementation of this process to the exercises."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercises\n",
    "\n",
    "1.  Why might the exponent $e = 65537$ be a computationally convenient choice?  Consider Pingala's algorithm.\n",
    "\n",
    "2.  What would happen if the message $m$ encrypted by RSA were *equal* to one of these primes of the private key?  How might such an occurrence be avoided, and is this something to be concerned about in practice?\n",
    "\n",
    "3.  Look at the technical document [on OpenPGP](https://tools.ietf.org/html/rfc4880), and try to figure out how RSA and/or other cryptosystems are being used and implemented in practice.  Write a 1-2 paragraph nontechnical summary of your findings.\n",
    "\n",
    "4.  Suppose that you choose $M$ prime numbers at random between $2^{b-1}$ and $2^{b}$.  Assuming that prime numbers near $x$ have a density $1 / \\log(x)$, estimate the probability that there is a \"collision\" -- that at least two of the $M$ prime numbers are the same.  What is this probability when $b = 512$ and $M$ is two billion?  (E.g., if a billion people use a pair of 512-bit private keys).  Look up the \"Birthday Problem\" for some hints if you haven't seen this before.\n",
    "\n",
    "5.  Create a function `sign(message, p,q,N,e)` to carry out the RSA signature protocol described above, based on a private key $(p,q)$ with public key $(N,e)$.  Create a function `verify(message, signature, N, e)` to verify a signed message.  Use the SHA-512 algorithm throughout, and check that your functions work.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}