{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from hashlib import sha256\n",
    "import base58"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Bitcoin address manipulation  - \n",
    "## Why you should check every character in an address when signing a transaction.\n",
    "\n",
    "Generating a valid bitcoin address that shares most of the characters with another address is not difficult for an attacker if they don't care about having the private key. This allows an attacker to cause you to lose funds with a relatively trivial amount of work. **I can only recommend that bitcoin users verify every character in every address they interact with.**\n",
    "\n",
    "At it's most difficult, you can generate a different valid non-segwit address by changing any 6 non-sequential characters in 10's of hours using this notebook and a standard laptop. If you fix the last 4 characters and up to the first 25 charaters, it it significantly easier (~2-3min to generate a valid malicious address). Fixing the final 6 characters is equivalent to the most difficult case.\n",
    "\n",
    "For segwit addresses, the manipulation is easier. Modifying non-sequential characters only affect the final 6 characters instead of all characters to the right of the modification. Finding a malicious address that matches the last 4 characters and most of the other characters only takes a minute via this notebook.\n",
    "\n",
    "The goal of this post is to demonstrate why it’s important to verify every character in an address, not just the first and last few values, by showing how easy it is for an adversary to generate an address that looks similar to a valid address generated by your wallet. While such an attack couldn’t be used to steal funds, it would still result in lost bitcoin. \n",
    "\n",
    "I’ll also cover some of the most important mechanics behind bitcoin address serialization, including Base58 encoding, Bech32 encoding and checksums.\n",
    "\n",
    "All code for this post is written in python. You can run it yourself from the jupyter notebook, available on GitHub [here](https://github.com/destrys/address_manipulation).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**NOTE:** For this example, I will be using this address: `1HU1LDBXUg73f2ro2e2dB3XY8cFoYLFgZZ`, it was randomly grabbed from a\n",
    "block explorer, apologies if it is yours."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Base58 Basics\n",
    "\n",
    "Before I explain address serialization, I must first explain Base58, and give some examples of converting between bases.\n",
    "\n",
    "When we talk about the “base” of a number, what we’re referring to is the formatting used to represent that data. This is also known as [Positional Notation](https://en.wikipedia.org/wiki/Positional_notation). Most recognize the term “binary”, which is how we can understand computers to interpret data. Binary is a base 2 representation, i.e. 1s and 0s (bi - two numbers). The decimal system is base 10 (deci - ten, i.e. 0-9) and [hexadecimal](https://en.wikipedia.org/wiki/Hexadecimal) is base 16 which uses the numbers 0-9 and letters A-F, generally used to present data in a more human readable format.\n",
    "\n",
    "Base58 uses 58 characters: \n",
    "(`123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz`)\n",
    "\n",
    "> In contrast to Base32 or Base64, the digits of the encoding do not line up well with byte boundaries of the original data. For this reason, the method is well-suited to encode large integers, but not designed to encode longer portions of binary data.\n",
    "\n",
    "Because 58 is not a power of two, the characters in base58 don't correspond to a fixed number of bytes.\n",
    "Decimal has the same problem, one byte has 256 values, but that can require 1, 2, or 3 decimal characters.\n",
    "\n",
    "Because of this, when converting a number to base58 from any other standard base, it may be impossible to know what the least significant digits will be, as they are dependent on all the digits in the input. (Unfortunately I couldn’t find a reference for this, so if any readers can help verify it would be greatly appreciated.)\n",
    "\n",
    "The most significant digits, however, are not dependendent on all the digits, only the most significant ones.\n",
    "If you’re curious to learn more about what we’re talking about when we reference most and least significant digits, the best way is to learn about “Endianness” which you can see [here](https://en.wikipedia.org/wiki/Endianness)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An example might help explain this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "a_big_number = 1234567890"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then we’ll convert it into 4 different types: decimal (should be the same), hexadecimal, bytes (as a list of integers), and base58. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Decimal:  1234567890  Length:  10\n",
      "Hexidecimal:  499602d2  Length:  8\n",
      "Bytes: [73, 150, 2, 210]  Length:  4\n",
      "base58:  2t6V2H  Length:  6\n",
      "base58 from int:  2t6V2H  Length:  6\n"
     ]
    }
   ],
   "source": [
    "print(\"Decimal: \", a_big_number, \" Length: \", len(str(a_big_number)))\n",
    "a_big_hex_number = str(hex(a_big_number))[2:]\n",
    "print(\"Hexidecimal: \", a_big_hex_number, \" Length: \", len(a_big_hex_number))\n",
    "a_big_byte_number = a_big_number.to_bytes(4,byteorder='big')\n",
    "print(\"Bytes:\", list(a_big_byte_number), \" Length: \", len(a_big_byte_number))\n",
    "a_big_base58_number = base58.b58encode(a_big_number.to_bytes(4,byteorder='big'))\n",
    "print(\"base58: \", a_big_base58_number.decode('ascii'), \" Length: \", len(a_big_base58_number))\n",
    "a_big_base58_number = base58.b58encode_int(a_big_number)\n",
    "print(\"base58 from int: \", a_big_base58_number.decode('ascii'), \" Length: \", len(a_big_base58_number))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let’s change the least significant digit now, the one on the right side (the `0`), and see what happens."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Decimal:  1234567891  Length:  10\n",
      "Hexidecimal:  499602d3  Length:  8\n",
      "Bytes: [73, 150, 2, 211]  Length:  4\n",
      "base58:  2t6V2J  Length:  6\n"
     ]
    }
   ],
   "source": [
    "a_big_number = 1234567891\n",
    "print(\"Decimal: \", a_big_number, \" Length: \", len(str(a_big_number)))\n",
    "a_big_hex_number = str(hex(a_big_number))[2:]\n",
    "print(\"Hexidecimal: \", a_big_hex_number, \" Length: \", len(a_big_hex_number))\n",
    "a_big_byte_number = a_big_number.to_bytes(4,byteorder='big')\n",
    "print(\"Bytes:\", list(a_big_byte_number), \" Length: \", len(a_big_byte_number))\n",
    "a_big_base58_number = base58.b58encode(a_big_number.to_bytes(4,byteorder='big'))\n",
    "print(\"base58: \", a_big_base58_number.decode('ascii'), \" Length: \", len(a_big_base58_number))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let’s next change one of the most significant digits (on the left side) and see what the result is."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Decimal:  1334567890  Length:  10\n",
      "Hexidecimal:  4f8be3d2  Length:  8\n",
      "Bytes: [79, 139, 227, 210]  Length:  4\n",
      "base58:  32w1YD  Length:  6\n"
     ]
    }
   ],
   "source": [
    "a_big_number = 1334567890\n",
    "print(\"Decimal: \", a_big_number, \" Length: \", len(str(a_big_number)))\n",
    "a_big_hex_number = str(hex(a_big_number))[2:]\n",
    "print(\"Hexidecimal: \", a_big_hex_number, \" Length: \", len(a_big_hex_number))\n",
    "a_big_byte_number = a_big_number.to_bytes(4,byteorder='big')\n",
    "print(\"Bytes:\", list(a_big_byte_number), \" Length: \", len(a_big_byte_number))\n",
    "a_big_base58_number = base58.b58encode(a_big_number.to_bytes(4,byteorder='big'))\n",
    "print(\"base58: \", a_big_base58_number.decode('ascii'), \" Length: \", len(a_big_base58_number))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that when we change the least significant digit, only the least significant digit changes for the other bases. But if we change a highly significant digit, all the other digits change in the other bases. This is because one decimal digit doesn't map cleanly to a fixed number of digits in any of the other bases. Since bytes and hex do map cleanly, you can change a highly significant digit and not affect the less significant digits in the other base, as you can see next:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "hex      bytes               binary                          decimal\n",
      "4f8be3d2 [79, 139, 227, 210] 1001111100010111110001111010010 1334567890\n",
      "5f8be3d2 [95, 139, 227, 210] 1011111100010111110001111010010 1603003346\n"
     ]
    }
   ],
   "source": [
    "print('hex', '     bytes', '              binary', '                         decimal')\n",
    "hex_number = \"4f8be3d2\"\n",
    "print(hex_number, list(bytes.fromhex(hex_number)), \"{0:b}\".format(int(hex_number,16)), int(hex_number,16))\n",
    "hex_number_2 = \"5f8be3d2\"\n",
    "print(hex_number_2, list(bytes.fromhex(hex_number_2)), \"{0:b}\".format(int(hex_number_2,16)), int(hex_number_2,16))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Bitcoin Addresses\n",
    "Now let's talk about bitcoin addresses. A bitcoin address has three components: \n",
    "- The version byte\n",
    "- The data\n",
    "- The Checksum"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The Version Byte\n",
    "The version byte gives us information about the context of the address.\n",
    "\n",
    "> 0x00 for P2PKH addresses on the main Bitcoin network (mainnet)  \n",
    "> 0x6f for P2PKH addresses on the Bitcoin testing network (testnet)  \n",
    "> 0x05 for P2SH addresses on mainnet  \n",
    "> 0xc4 for P2SH addresses on testnet  \n",
    "\n",
    "This gives wallets and other related software the ability to check that the address is being used where it’s expected. For example, if you paste in an address with the version byte `0x6f` into a mainnet wallet, the wallet will be able to show an error, letting you know that you probably don’t want to send funds to an address meant for testnet. \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The Data\n",
    "This will depend on the type of address you’re generating, but for a P2PKH address, the data portion of an address is the hash (actually hashed twice: RIPEMD(SHA256(pubkey))) of a public key."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The Checksum\n",
    "Bitcoin Addresses are actually Base58Check encoded, **not** simply Base58. \n",
    "\n",
    "A checksum is used in computing for data validation. For an address, this can detect typos by comparing the data to the checksum and making sure we get what we expected.\n",
    "\n",
    "In Base58Check, you first double-hash the version byte and data with sha256. You take the first 4 bytes of the resulting hash and append it to the data, and then you convert the full (version byte + data + checksum) byte array into Base58."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### A Note on Checksum collisions\n",
    "\n",
    "By only taking the first 4 bytes of the hash, the checksum field is far from immune to collisions. SHA256 properly generates any even distribution of outputs. 4 bytes can only encode $256^4$ values (4,294,967,296), so if you want to find a hash collision, you only need to try an expected 4.3 Billion values per collision. This is important for this discussion, but it isn't the end of the story.\n",
    "\n",
    "You might think at this point, \"Cool, so if I want to create a malicious bitcoin address, I'll just decode it to bytes, and start changing bytes until I find a checksum collision, and I'm set!\" But you'd be wrong. If you skimmed the Base58 discussion above, maybe take a closer read. **If you change some bytes in the middle of an address, when you go to encode it in base58, all of the lower significant digits are modified in base58.**\n",
    "\n",
    "So now you might think: \"Shit, so I have to modify the data, find a checksum collision, and then hope it encodes to the same last 4 digits? I'm screwed, bitcoin addresses are hard.\"\n",
    "\n",
    "It's true, bitcoin addresses are confusing, but no, it's not that difficult to create a malicious address, and that's because while the checksum has to be *valid*, it doesn't have to be the *same* checksum of the original address."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Generating a Malicious address.\n",
    "\n",
    "Let's start simple. Let's say you want a valid bitcoin address with the same $A$ number of characters at the beginning, and the last digit to be the same. Let's assume that the resulting addresses will have an even distibution of final characters, since we're going to be changing a bunch of digits (some in the data, and all of the checksum).\n",
    "\n",
    "If it's an even distribution, for each value we try, we have a 1-in-58 probability that it will be the correct value.\n",
    "Let's give it a try!\n",
    "\n",
    "We'll try every value in one byte, that's 256 values, but the last character can be only one of the 58 characters in Bse58, so we should expect to see 256/58 = 4.4 matches"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Note for the nerds: I'm not optimizing the code, instead I'm favoring readability.\n",
    "def cycle_through_one_byte(ver, data, digit, match):\n",
    "    dataarray = bytearray(data)\n",
    "    for i in range(256):\n",
    "        # Change a byte\n",
    "        dataarray[digit] = i\n",
    "        # Compute the double sha256 hash and take the first 4 bytes\n",
    "        check1 = sha256(sha256(ver+dataarray).digest()).digest()[:4]\n",
    "        # convert to base58\n",
    "        addr = base58.b58encode(ver+dataarray+check1)\n",
    "        # Check the last digit\n",
    "        if (addr[-1:] == match):\n",
    "            print('found: ', addr.decode('ascii'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "original_address ='1HU1LDBXUg73f2ro2e2dB3XY8cFoYLFgZZ'\n",
    "k = base58.b58decode(original_address)\n",
    "verbyte, data, check0 = k[0:1], k[1:-4], k[-4:]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8cFoYLFgZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8cFxYGxT7Z\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8cFyeGfzQZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8cG438ERVZ\n"
     ]
    }
   ],
   "source": [
    "cycle_through_one_byte(verbyte, data, 19, b'Z')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8cFkFSmCQ1\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8cFwPyAZ91\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8cFysN4xF1\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8cG4pxmyY1\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8cGB5Rhk51\n"
     ]
    }
   ],
   "source": [
    "cycle_through_one_byte(verbyte, data, 19,  b'1')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8cFnB2tYx2\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8cFqTayv52\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8cFzcYYU22\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8cG2q3M6u2\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8cG4RpLuB2\n",
      "CPU times: user 4.57 ms, sys: 680 µs, total: 5.25 ms\n",
      "Wall time: 4.68 ms\n"
     ]
    }
   ],
   "source": [
    "%time cycle_through_one_byte(verbyte, data, 19, b'2')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Looks pretty solid. I think this is dependent on which digit you're changing, though I don't know how to generalize at this time. For example, if you change the 10th byte instead of the 19th:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "found:  1HU1LDBXUg73f2rS5EwnM2GQLJizdsKtzZ\n",
      "found:  1HU1LDBXUg73f2rZo25fuygJ3nYycfXiyZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8cFoYLFgZZ\n",
      "found:  1HU1LDBXUg73f2rqEaMT3tW5UkCwaKM2oZ\n",
      "found:  1HU1LDBXUg73f2rw7jtKPUSBjn5Jr7ykzZ\n",
      "found:  1HU1LDBXUg73f2rzReNZCjtzkz11r3GvJZ\n",
      "found:  1HU1LDBXUg73f2rzny6MrDPkefzYDh1k1Z\n",
      "found:  1HU1LDBXUg73f2s73TM2qGpcoPrRqdNtSZ\n",
      "found:  1HU1LDBXUg73f2sKYRrNoPgM6qaDAqb3MZ\n",
      "found:  1HU1LDBXUg73f2sf2Y413WxTLP8vqTGk6Z\n"
     ]
    }
   ],
   "source": [
    "cycle_through_one_byte(verbyte, data, 10, b'Z')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice how the addresses look significantly different from each other when not changing the least significant byte.\n",
    "\n",
    "But if we want the last digit to be a `1` there’s only one match."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "found:  1HU1LDBXUg73f2sgs9f2GtSEnq6YQ4sgH1\n"
     ]
    }
   ],
   "source": [
    "cycle_through_one_byte(verbyte, data, 10, b'1')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Though maybe that's just statistics being fun.\n",
    "\n",
    "Let's move on to 2 digit matches. With 2 digits we have $58^2$ so 3,364 options, But if we cycle through all values in 2 bytes, that's $256^2$ or 65,536. So we should expect to see, 19.5 matches"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "def cycle_through_two_bytes(ver, data, match):\n",
    "    total_matches = 0\n",
    "    dataarray = bytearray(data)\n",
    "    for i in range(256):\n",
    "        # Change a byte\n",
    "        dataarray[19] = i\n",
    "        for j in range(256):\n",
    "            dataarray[18] = j\n",
    "            # Compute the double sha256 hash and take the first 4 bytes\n",
    "            check1 = sha256(sha256(ver+dataarray).digest()).digest()[:4]\n",
    "            # convert to base58\n",
    "            addr = base58.b58encode(ver+dataarray+check1)\n",
    "            # CHeck the last two digits\n",
    "            if (addr[-2:] == match):\n",
    "                total_matches+=1\n",
    "                print('found: ', addr.decode('ascii'))\n",
    "    print(\"Total Matches:\", total_matches)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8bhL6FNdZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8b5VnZNvZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8cY9otddZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8cQhSD81ZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8aq5LtioZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8ay3kBjzZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8an97xSxZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8cFoYLFgZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8cXKEfctZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8bd2iMNCZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8b99mNuQZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8bt1pxGTZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8c7zDqXXZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8cBVPbUQZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8b1nWCCgZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8cSTME4jZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8bp7Nc5dZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8c549A7UZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8bxasQpoZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8bRE69KdZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8bVDxYCzZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8agwJoHTZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8aZWPiFsZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8bdmu7CMZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8bTr4n3aZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8bsknhJ6ZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8b5xBt2fZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8agZgnthZZ\n",
      "Total Matches: 28\n"
     ]
    }
   ],
   "source": [
    "cycle_through_two_bytes(verbyte, data, b'ZZ')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8ae7QdPKaZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8bdPxVDTaZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8bJzuAiZaZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8b99L4dDaZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8bb3swQDaZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8bm3A3i8aZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8bSekXz2aZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8cAVywr3aZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8bLif2joaZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8bAmk3KXaZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8beCoj1JaZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8aZy1QL2aZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8bwAkqDYaZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8bmiDSWGaZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8c1CFZ9waZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8bhH2ykPaZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8ayy1yF9aZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8bPtonGxaZ\n",
      "Total Matches: 18\n"
     ]
    }
   ],
   "source": [
    "cycle_through_two_bytes(verbyte, data, b'aZ')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8cQfuTxL11\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8bvp6pgm11\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8bkN6vHT11\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8az3nv1o11\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8bMVM9AF11\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8c1rC2wg11\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8btWHoTC11\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8cWrnG9d11\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8bR8RfEb11\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8bEgqkAu11\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8cht15Xm11\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8cGVVvF811\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8agtFyKQ11\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8bjfNS2s11\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8bPkcyca11\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8bGniJ6M11\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8cM4y1rG11\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8biGTxzX11\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8akWcvRn11\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XY8cXd3wQH11\n",
      "Total Matches: 20\n",
      "CPU times: user 968 ms, sys: 12.6 ms, total: 981 ms\n",
      "Wall time: 977 ms\n"
     ]
    }
   ],
   "source": [
    "%time cycle_through_two_bytes(verbyte, data, b'11')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pretty solid so far. Notice that I'm changing only the least significant byte in the data section. This means the whole first section of address is unchanged.\n",
    "\n",
    "Let's move on to 3 digits.\n",
    "\n",
    "With 3 digits we have $58^3$ so 195,112 options, But if we cycle through all values in 3 bytes, that's $256^3$ or 16,777,216. So we should expect to see, 86 matches"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
    "def cycle_through_three_bytes(ver, data, match):\n",
    "    total_matches = 0\n",
    "    dataarray = bytearray(data)\n",
    "    for i in range(256):\n",
    "        # Change a byte\n",
    "        dataarray[19] = i\n",
    "        for j in range(256):\n",
    "            dataarray[18] = j\n",
    "            for k in range(256):\n",
    "                dataarray[17] = k\n",
    "                # Compute the double sha256 hash and take the first 4 bytes\n",
    "                check1 = sha256(sha256(ver+dataarray).digest()).digest()[:4]\n",
    "                # convert to base58\n",
    "                addr = base58.b58encode(ver+dataarray+check1)\n",
    "                # CHeck the last digit\n",
    "                if (addr[-3:] == match):\n",
    "                    total_matches+=1\n",
    "                    #print('found: ', addr)\n",
    "    print(\"Total Matches:\", total_matches)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1HU1LDBXUg73f2ro2e2dB3XY8cFoYLFgZZ\n"
     ]
    }
   ],
   "source": [
    "print(original_address)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Total Matches: 79\n",
      "CPU times: user 4min 45s, sys: 422 ms, total: 4min 46s\n",
      "Wall time: 4min 46s\n"
     ]
    }
   ],
   "source": [
    "%time cycle_through_three_bytes(verbyte, data, b'gZZ')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Along with agreeing with our statistics so far, notice that the execution time has grown linearly with number of operations. 8ms for one byte, 1s for two bytes (256x the ops of one byte), 305s for three bytes (256x the ops of two bytes)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's move on to 4 digits. Which for some reason is a standard, \"check the first and last 4\". This is probably due to misunderstanding the checksum field. 4 base58 digits can only encode 3 bytes worth of data $(58^4 < 256^3)$, and as mentioned earlier, you can't decode the least significant digits only, you need the full value to decode.\n",
    "\n",
    "With 4 digits we have $58^4$ so 11,316,496 options, But we could cycle through all values in only 3 bytes because that's $256^3$ or 16,777,216. So we should expect to see, 1.5 matches.\n",
    "\n",
    "We can modifed a fourth byte, but instead of using all 256 available values, just use 4 values. This will boost the expect matches to 6 (scanning over $256^3 * 4$ or 67,108,864 values) while expecting ~20min execution time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "def cycle_through_four_bytes(ver, data, match):\n",
    "    total_matches = 0\n",
    "    dataarray = bytearray(data)\n",
    "    for i in range(4):\n",
    "        # Change a byte\n",
    "        dataarray[19] = i\n",
    "        for j in range(256):\n",
    "            dataarray[18] = j\n",
    "            for k in range(256):\n",
    "                dataarray[17] = k\n",
    "                for l in range(256):\n",
    "                    dataarray[16] = l\n",
    "                    # Compute the double sha256 hash and take the first 4 bytes\n",
    "                    check1 = sha256(sha256(ver+dataarray).digest()).digest()[:4]\n",
    "                    # convert to base58\n",
    "                    addr = base58.b58encode(ver+dataarray+check1)\n",
    "                    # Check the last four digits\n",
    "                    if (addr[-4:] == match):\n",
    "                        total_matches+=1\n",
    "                        print('found: ', addr.decode('ascii'))\n",
    "    print(\"Total Matches:\", total_matches)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1HU1LDBXUg73f2ro2e2dB3XY8cFoYLFgZZ\n"
     ]
    }
   ],
   "source": [
    "print(original_address)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "found:  1HU1LDBXUg73f2ro2e2dB3YAZkLzhmFgZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3YAsH6C6bFgZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XeYUj3vkFgZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3Xx2JDVAbFgZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3Y6ZfhPG1FgZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3Y5QbkmDLFgZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3YEXcH33JFgZZ\n",
      "found:  1HU1LDBXUg73f2ro2e2dB3XhYFrPERFgZZ\n",
      "Total Matches: 8\n",
      "CPU times: user 18min 12s, sys: 699 ms, total: 18min 12s\n",
      "Wall time: 23min 49s\n"
     ]
    }
   ],
   "source": [
    "%time cycle_through_four_bytes(verbyte, data, b'FgZZ')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Great. So we can create duplicate valid first4...last4 bitcoin addresses at a rate of once every 3 minutes or so."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now what about fixing the last 5 digits? \n",
    "\n",
    "Now we have $58^5$ = 656,356,768, which is 40x more than 3 bytes, but still well less than the full $256^4$ (4,294,967,296). That'll be roughly 3 hours of execution to generate one, which, while being more than feasible using off-the-shelf hardware, is more than I’m willing to put my computer through at the moment, so we'll leave that as an exercise for you to try at home if you’re interested."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And what about 6 or 7 digits?\n",
    "\n",
    "6 digits gives us a 1 in 38,068,692,544 chance to find a valid address. That's interesting because now we're above $256^4$ which is the hash-collision scale. At this point, I think it will be faster to invert this process.\n",
    "\n",
    "Instead of searching for matches to our base58 pattern, let's search through base58 address values to find valid addresses.\n",
    "A given base58 string should have a 1-in-4,294,967,296 chance of being valid $(256^4)$.\n",
    "\n",
    "Based on my earlier benchmarks, that should take roughly 20 hours of execution (4200M tries per collision / 3M tries per minute $\\approx$ 20hrs) . To get to that number of options, you would need to be able to change 6 digits of the address, and I don't think they need to be contiguous.\n",
    "\n",
    "If any readers want to try this, I'd love to see the results."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Bech32"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[BIP173](https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki) introduced a new addresses encoding for bitcoin addresses that use native segwit: Bech32. If you guessed that Bech32 is a base 32 encoding, you're right (almost). If you noticed that 32 is a power of 2 ($2^5$) you may also guess that this would make modifying Bech32 addresses easier, and you're right again! Don't feel bad if you thought segwit address would be more resilient to this kind of modification. Several smart folks I talked to about this post also thought that at first."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "import bech32\n",
    "import itertools"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is the list of characters used in Bech32 encoding."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "023456789acdefghjklmnpqrstuvwxyz\n"
     ]
    }
   ],
   "source": [
    "print(''.join(sorted(list(bech32.CHARSET))))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "an_address = 'bc1pw508d6qejxtdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k7grplx'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Like non-segwit addresses, segwit addresses contain several pieces of information:\n",
    "\n",
    "### Human Readable Part\n",
    "This is analogous to the version bytes in a non-segwit address. `bc` for mainnet addresses, `tb` for testnet addresses. The Bech32 spec allows for many different values here, but only these two options are currently used for bitcoin addresses.\n",
    "\n",
    "### Separator\n",
    "This is the number `1`. Note that `1` isn't part of the Bech32 encoding character set. This means that the data part will not contain the number `1`.\n",
    "\n",
    "### Data\n",
    "The data part of the address starts with the 'witness version' character, followed by the characters that actually determine the address. Currently, that is 20 bytes of data for single signature addresses or 32 bytes for script adddresses (which includes multi-sig addresses). These bytes are first converted to 5-bit chunks so that they can be encoded with the base32 character set.\n",
    "\n",
    "### Checksum\n",
    "Like non-segwit addresses, a checksum is appended to the end of the address. Bech32 addresses use a 6-character checksum that is defined in the BIP. So the last 6 characters in a segwit address are the checksum."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Segwit Address Manipulation\n",
    "\n",
    "I'll follow the same approach that I used above for non-segwit addresses, but this time it's easier because modifying one character only changes that character and the last 6 characters that are the checksum. So I'll cycle through the 32 values for a particular character and see if the last character or two in the checksum matches the original address"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [],
   "source": [
    "def cycle_through_one_char_and_match_one(address, digit):\n",
    "    hrp, data = bech32.bech32_decode(an_address)\n",
    "    checksum = bech32.bech32_create_checksum(hrp, data)\n",
    "    for i in range(32):\n",
    "        # Change a byte\n",
    "        data[digit] = i\n",
    "        # Compute the checksum\n",
    "        check1 = bech32.bech32_create_checksum(hrp, data)\n",
    "        # Check the last digit\n",
    "        if (check1[-1:] == checksum[-1:]):\n",
    "            print('found: ', bech32.bech32_encode(hrp, data))\n",
    "                \n",
    "def cycle_through_two_char_and_match_one(address, digits):\n",
    "    hrp, data = bech32.bech32_decode(an_address)\n",
    "    checksum = bech32.bech32_create_checksum(hrp, data)\n",
    "    for values in itertools.product(range(32), repeat=2):\n",
    "        # Change the bytes\n",
    "        for indx, digit in enumerate(digits):\n",
    "            data[digit] = values[indx]\n",
    "        # Compute the checksum\n",
    "        check1 = bech32.bech32_create_checksum(hrp, data)\n",
    "        # Check the last digit\n",
    "        if (check1[-1:] == checksum[-1:]):\n",
    "            print('found: ', bech32.bech32_encode(hrp, data))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "bc1pw508d6qejxtdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k7grplx\n",
      "found:  bc1pw508d6qejxtdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k7grplx\n",
      "CPU times: user 6.76 ms, sys: 1.02 ms, total: 7.78 ms\n",
      "Wall time: 7.05 ms\n"
     ]
    }
   ],
   "source": [
    "print(an_address)\n",
    "%time cycle_through_one_char_and_match_one(an_address, 8)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It shouldn't be a surprise that we don't get any modified matches when cycling through one character. There is a 1/32 chance of getting checksum with the same last digit, and we only tried 32 values (and found the original address).\n",
    "\n",
    "We should expect if we cycle through a second character and only match on the last character that we will find 32 matches:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "bc1pw508d6qejxtdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k7grplx\n",
      "found:  bc1pw508d6qqj8tdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kshclgx\n",
      "found:  bc1pw508d6qpjjtdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7ken6wqx\n",
      "found:  bc1pw508d6qzjytdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kzlu5cx\n",
      "found:  bc1pw508d6qrj3tdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7ktm79sx\n",
      "found:  bc1pw508d6qyjptdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7ka8sfpx\n",
      "found:  bc1pw508d6q9j5tdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k5rjcfx\n",
      "found:  bc1pw508d6qxjztdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k005z3x\n",
      "found:  bc1pw508d6q8jhtdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kxtknex\n",
      "found:  bc1pw508d6qgjttdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k27g66x\n",
      "found:  bc1pw508d6qfj7tdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kr62tjx\n",
      "found:  bc1pw508d6q2jgtdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kckv32x\n",
      "found:  bc1pw508d6qtjatdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k3jwqzx\n",
      "found:  bc1pw508d6qvjdtdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k8wqvnx\n",
      "found:  bc1pw508d6qdjctdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kw2zamx\n",
      "found:  bc1pw508d6qwjwtdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k4xy8rx\n",
      "found:  bc1pw508d6q0jmtdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kuzxktx\n",
      "found:  bc1pw508d6qsjltdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kd9349x\n",
      "found:  bc1pw508d6q3j2tdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kypnydx\n",
      "found:  bc1pw508d6qjjutdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kld474x\n",
      "found:  bc1pw508d6qnjftdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kkfh0ax\n",
      "found:  bc1pw508d6q5jetdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kq4ervx\n",
      "found:  bc1pw508d6q4jvtdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kf3mjyx\n",
      "found:  bc1pw508d6qkj6tdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kjaagux\n",
      "found:  bc1pw508d6qhj0tdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kmele5x\n",
      "found:  bc1pw508d6qcjntdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7khvpshx\n",
      "found:  bc1pw508d6qejxtdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k7grplx\n",
      "found:  bc1pw508d6q6jstdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k9y9m8x\n",
      "found:  bc1pw508d6qmj9tdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kvq820x\n",
      "found:  bc1pw508d6quj4tdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k6ufx7x\n",
      "found:  bc1pw508d6qajqtdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kncthkx\n",
      "found:  bc1pw508d6q7jktdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kg5ddwx\n",
      "found:  bc1pw508d6qljrtdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kps0uxx\n",
      "CPU times: user 119 ms, sys: 3.95 ms, total: 123 ms\n",
      "Wall time: 121 ms\n"
     ]
    }
   ],
   "source": [
    "print(an_address)\n",
    "%time cycle_through_two_char_and_match_one(an_address, [8, 10])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And 32 matches is what we find! And notice that only the two characters I chose to modify have changed, along with the \n",
    "five character before the last character (after the 'k').\n",
    "\n",
    "Now for cycling through 3 characters looking for matches of the last two characters:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [],
   "source": [
    "def cycle_through_three_char_and_match_two(address, digits):\n",
    "    hrp, data = bech32.bech32_decode(an_address)\n",
    "    checksum = bech32.bech32_create_checksum(hrp, data)\n",
    "    for values in itertools.product(range(32), repeat=3):\n",
    "        # Change the bytes\n",
    "        for indx, digit in enumerate(digits):\n",
    "            data[digit] = values[indx]\n",
    "        # Compute the checksum\n",
    "        check1 = bech32.bech32_create_checksum(hrp, data)\n",
    "        # Check the last two digits\n",
    "        if (check1[-2:] == checksum[-2:]):\n",
    "            print('found: ', bech32.bech32_encode(hrp, data))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "found:  bc1pw508d6qqj6t9g4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7krrerlx\n",
      "found:  bc1pw508d6qpjgtfg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kyewqlx\n",
      "found:  bc1pw508d6qzjhtag4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kd779lx\n",
      "found:  bc1pw508d6qrj9t3g4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k2yfxlx\n",
      "found:  bc1pw508d6qyjqtug4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7klsh0lx\n",
      "found:  bc1pw508d6q9jjtsg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kc2qvlx\n",
      "found:  bc1pw508d6qxjdtyg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k3dsflx\n",
      "found:  bc1pw508d6q8jltgg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kkh82lx\n",
      "found:  bc1pw508d6qgj8t7g4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kjv9mlx\n",
      "found:  bc1pw508d6qfj4tjg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k4kjclx\n",
      "found:  bc1pw508d6q2j2txg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7ku3zalx\n",
      "found:  bc1pw508d6qtjct2g4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kmt47lx\n",
      "found:  bc1pw508d6qvjat8g4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kwlthlx\n",
      "found:  bc1pw508d6qdj0ttg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kf9u5lx\n",
      "found:  bc1pw508d6qwjstlg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kqzv3lx\n",
      "found:  bc1pw508d6q0jztng4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k8cmjlx\n",
      "found:  bc1pw508d6qsjft6g4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kgag6lx\n",
      "found:  bc1pw508d6q3jmtkg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k08lelx\n",
      "found:  bc1pw508d6qjjytzg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kxq0ulx\n",
      "found:  bc1pw508d6qnjktwg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kp6cllx\n",
      "found:  bc1pw508d6q5jntrg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k5wxklx\n",
      "found:  bc1pw508d6q4jpt0g4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kn534lx\n",
      "found:  bc1pw508d6qkj7tmg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k6npslx\n",
      "found:  bc1pw508d6qhjvthg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kafknlx\n",
      "found:  bc1pw508d6qcj5tpg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kej5zlx\n",
      "found:  bc1pw508d6qejxtdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k7grplx\n",
      "found:  bc1pw508d6q6jeteg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kh0nylx\n",
      "found:  bc1pw508d6qmjtt4g4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7ks4y8lx\n",
      "found:  bc1pw508d6qujwtcg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k9p6wlx\n",
      "found:  bc1pw508d6qajut5g4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kzmddlx\n",
      "found:  bc1pw508d6q7jrtqg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7ktuaglx\n",
      "found:  bc1pw508d6qlj3tvg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kvx2tlx\n",
      "CPU times: user 2.82 s, sys: 25.1 ms, total: 2.84 s\n",
      "Wall time: 2.85 s\n"
     ]
    }
   ],
   "source": [
    "%time cycle_through_three_char_and_match_two(an_address, [8, 10, 12])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And now we cycle through 4 characters matching the last 3 characters:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [],
   "source": [
    "def cycle_through_four_char_and_match_three(address, digits):\n",
    "    hrp, data = bech32.bech32_decode(an_address)\n",
    "    checksum = bech32.bech32_create_checksum(hrp, data)\n",
    "    for values in itertools.product(range(32), repeat=4):\n",
    "        # Change the bytes\n",
    "        for indx, digit in enumerate(digits):\n",
    "            data[digit] = values[indx]    \n",
    "        # Compute the checksum\n",
    "        check1 = bech32.bech32_create_checksum(hrp, data)\n",
    "        # Check the last digit\n",
    "        if (check1[-3:] == checksum[-3:]):\n",
    "            print('found: ', bech32.bech32_encode(hrp, data))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "found:  bc1pw508d6qqjftlgmy5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kuhwplx\n",
      "found:  bc1pw508d6qpj4tygjy5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kln3plx\n",
      "found:  bc1pw508d6qzjctqgfy5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k6leplx\n",
      "found:  bc1pw508d6qrjytmgqy5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kemxplx\n",
      "found:  bc1pw508d6qyjztggky5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7ks8fplx\n",
      "found:  bc1pw508d6q9j7tngly5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7knrkplx\n",
      "found:  bc1pw508d6qxjnthgyy5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kk07plx\n",
      "found:  bc1pw508d6q8j0tvgdy5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k4tpplx\n",
      "found:  bc1pw508d6qgjltcgpy5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7ky7qplx\n",
      "found:  bc1pw508d6qfjrtrggy5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k86lplx\n",
      "found:  bc1pw508d6q2jwt8gny5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kzkhplx\n",
      "found:  bc1pw508d6qtjjtug6y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kpjgplx\n",
      "found:  bc1pw508d6qvj5t0gvy5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kgw8plx\n",
      "found:  bc1pw508d6qdjgt5g9y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kt2cplx\n",
      "found:  bc1pw508d6qwj9tsg7y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kwxsplx\n",
      "found:  bc1pw508d6q0jettghy5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kdz0plx\n",
      "found:  bc1pw508d6qsjvt3gxy5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k99jplx\n",
      "found:  bc1pw508d6q3jst2g0y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kxpdplx\n",
      "found:  bc1pw508d6qjjatwg5y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7krd9plx\n",
      "found:  bc1pw508d6qnjpt4gay5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kqf6plx\n",
      "found:  bc1pw508d6q5j8txgty5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kf44plx\n",
      "found:  bc1pw508d6q4jmtagzy5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k232plx\n",
      "found:  bc1pw508d6qkjktegey5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k0azplx\n",
      "found:  bc1pw508d6qhj2tzgsy5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kveaplx\n",
      "found:  bc1pw508d6qcj6tkguy5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kavuplx\n",
      "found:  bc1pw508d6qejxtdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k7grplx\n",
      "found:  bc1pw508d6q6jttfgwy5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kmytplx\n",
      "found:  bc1pw508d6qmjhtjg8y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kcq5plx\n",
      "found:  bc1pw508d6quj3tpg3y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k3umplx\n",
      "found:  bc1pw508d6qajdt6gcy5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kjcyplx\n",
      "found:  bc1pw508d6q7jqt7gry5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kh5vplx\n",
      "found:  bc1pw508d6qljut9g2y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k5snplx\n",
      "CPU times: user 1min 35s, sys: 124 ms, total: 1min 35s\n",
      "Wall time: 1min 35s\n"
     ]
    }
   ],
   "source": [
    "%time cycle_through_four_char_and_match_three(an_address, [8, 10, 12, 14])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As above, I'll just cycle through 5 characters 4 times to save time, we should expect to see 4 matches."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [],
   "source": [
    "def cycle_through_five_char_and_match_four(address, digits):\n",
    "    hrp, data = bech32.bech32_decode(an_address)\n",
    "    checksum = bech32.bech32_create_checksum(hrp, data)\n",
    "    for values in itertools.product(range(4), range(32), range(32), range(32), range(32)):\n",
    "        # Change the bytes\n",
    "        for indx, digit in enumerate(digits):\n",
    "            data[digit] = values[indx]        \n",
    "        # Compute the checksum\n",
    "        check1 = bech32.bech32_create_checksum(hrp, data)\n",
    "        # Check the last four characters\n",
    "        if (check1[-4:] == checksum[-4:]):\n",
    "            print('found: ', bech32.bech32_encode(hrp, data))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "found:  bc1pw508dqq2j7tvguy5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k3arplx\n",
      "found:  bc1pw508dpqwjrtzgey5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7ke5rplx\n",
      "found:  bc1pw508dzqzjdtsgky5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kp0rplx\n",
      "found:  bc1pw508drqxjst7gny5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kfxrplx\n",
      "CPU times: user 7min 26s, sys: 649 ms, total: 7min 27s\n",
      "Wall time: 7min 27s\n"
     ]
    }
   ],
   "source": [
    "%time cycle_through_five_char_and_match_four(an_address, [6, 8, 10, 12, 14])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Extrapolating out, looks like it would take ~45min to do a full run of 5 alterations and we should expect ~32 addresses with the same last 4 characters.\n",
    "\n",
    "it would take ~24 hours to do a full run of 6 alterations and we would expect to see ~32 addresses with the same last 5 characters, and this would be the same timescale to find collisions of the full last 6 characters. And unlike base58Check, the least significant digits in a bech32 address won't have changed when we do this brute-force algorithm, so once the checksum matches, we can loop over any arbitrary set of characters in the data section of the address."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To illustrate why the exercises above matter, imagine you’re trying to send your bitcoin from an exchange to your hardware wallet. While none of what we produced in the scripts above would allow an attacker to steal your bitcoin, it could let an attacker destroy that bitcoin, if you don’t check the full address. When you get that address from your wallet's interface, you might copy-paste it in to the exchange website. If the attacker has gained access to your system’s clipboard, this is an opportunity to replace that address with one that looks extremely similar to the valid address you pasted in. To a user who only verifies the first and last few characters, these will look identical.\n",
    "\n",
    "The code presented here was run on a standard laptop using unoptimized code, but this type of code is trivial to parallelize, and would be simple to optimize for speed. I do not doubt that the most difficult address manipulation could happen during the time it takes for a page to load. Because of this, **I can only recommend that bitcoin users verify every character in every address they interact with.**"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}