{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
Peter Norvig
June 2021
\n", "\n", "# Split the States\n", "\n", "The [538 Riddler for 11 June 2021](https://fivethirtyeight.com/features/can-you-split-the-states/) poses this problem (paraphrased):\n", "\n", "> Given a map of the lower 48 states of the USA, remove a subset of the states so that the map is cut into two disjoint contiguous regions that are near-halves by area. Call the regions *A* and *B*, where *A* has the larger area. You can treat Michigan’s upper and lower peninsulas as two non-adjacent \"states,\" for a total of 49. \n", ">\n", "> To be precise, 538's question is: \n", ">\n", "> **1.** What states should you remove to maximize the area of *B*? What is *B*'s area and percent of the country's area?\n", ">\n", "> There is some ambiguity in the phrase \"near-halves by area\" and [Philip Bump](https://twitter.com/pbump/status/1400185939629117442) is interested in a second question:\n", ">\n", "> **2.** What states should you remove to minimize the difference of the area of *A* and the area of *B*? \n", ">\n", "> Philip Bump hypothesized that {IL, MO, OK, NM} is the best subset to remove. Is he right?\n", "\n", "# Vocabulary terms \n", "\n", "Let's start by clarifying some concepts:\n", "- **State**: denoted by the standard 2-letter abbreviations like `'CA'` (plus `'UP'` and `'LP'` for the Michigan peninsulas). \n", "- **States**: a set of states; implemented as a (hashable) `frozenset`. I'll use `states('OR CA')` for `frozenset({'OR', 'CA'})`.\n", "- **Region**: a set of states that are **contiguous**—they are all connected by a single tree of **neighbor** relations.\n", "- **Neighbor**: a relation saying two states share a border. Implemented as [adjacency sets](https://en.wikipedia.org/wiki/Adjacency_list) in the dict `neighbors`.\n", "- **Cut**: a set of states that, when removed from the map, cuts the map into disjoint regions. \n", "- **Split**: a tuple `(A, B, C)`, where `A` and `B` are the two regions made by the cut `C`, with `A` larger than `B`.\n", "- **Border**: the states on the edge of the map (neighbors of Canada, Mexico, Atlantic, or Pacific).\n", "- **Area**: Each state has an area in square miles, given by the `areas` dict, and a region has a total area, given by the function `area`.\n", "\n", "\n", "Code to implement the concepts: " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from typing import *\n", "from collections import defaultdict\n", "\n", "State = str # Two-letter abbrerviation\n", "States = frozenset # Any set of states\n", "Region = frozenset # A contiguous set of states\n", "Split = Tuple[Region, Region, Region] # (A, B, C) = (large region, small region, cut)\n", "\n", "def states(string) -> States: \"Set of states\"; return States(string.split())\n", "def statedict(**dic) -> dict: \"{State:States}\"; return {ST: states(dic[ST]) for ST in dic}\n", "\n", "neighbors = statedict( # https://theincidentaleconomist.com/wordpress/list-of-neighboring-states-with-stata-code/\n", " AK='', AL='FL GA MS TN', AR='LA MO MS OK TN TX', AZ='CA CO NM NV UT', CA='AZ NV OR', \n", " CO='AZ KS NE NM OK UT WY', CT='MA NY RI', DC='MD VA', DE='MD NJ PA', FL='AL GA', \n", " GA='AL FL NC SC TN', HI='', IA='IL MN MO NE SD WI', ID='MT NV OR UT WA WY', IL='IA IN KY MO WI', \n", " IN='IL KY LP MI OH', KS='CO MO NE OK', KY='IL IN MO OH TN VA WV', LA='AR MS TX', \n", " MA='CT NH NY RI VT', MD='DC DE PA VA WV', ME='NH', MI='IN OH WI', MN='IA ND SD WI', \n", " MO='AR IA IL KS KY NE OK TN', MS='AL AR LA TN', MT='ID ND SD WY', NC='GA SC TN VA', \n", " ND='MN MT SD', NE='CO IA KS MO SD WY', NH='MA ME VT', NJ='DE NY PA', NM='AZ CO OK TX UT', \n", " NV='AZ CA ID OR UT', NY='CT MA NJ PA VT', OH='IN KY LP MI PA WV', OK='AR CO KS MO NM TX', \n", " OR='CA ID NV WA', PA='DE MD NJ NY OH WV', RI='CT MA', SC='GA NC', SD='IA MN MT ND NE WY', \n", " TN='AL AR GA KY MO MS NC VA', TX='AR LA NM OK', UT='AZ CO ID NM NV WY', VA='DC KY MD NC TN WV', \n", " VT='MA NH NY', WA='ID OR', WI='IA IL MI MN UP', WV='KY MD OH PA VA', WY='CO ID MT NE SD UT', \n", " UP='WI', LP='IN OH')\n", "\n", "def area(states) -> int: \"Total area\"; return sum(areas[s] for s in states)\n", "\n", "areas = dict( # https://www.census.gov/geographies/reference-files/2010/geo/state-area.html\n", " AK=665384, AL=52420, AZ=113990, AR=53179, CA=163695, CO=104094, CT=5543, DE=2489, DC=68, \n", " FL=65758, GA=59425, HI=10932, ID=83569, IL=57914, IN=36420, IA=56273, KS=82278, KY=40408, \n", " LA=52378, ME=35380, MD=12406, MA=10554, MI=96714, MN=86936, MS=48432, MO=69707, MT=147040, \n", " NE=77348, NV=110572, NH=9349, NJ=8723, NM=121590, NY=54555, NC=53819, ND=70698, OH=44826, \n", " OK=69899, OR=98379, PA=46054, RI=1545, SC=32020, SD=77116, TN=42144, TX=268596, UT=84897, \n", " VT=9616, VA=42775, WA=71298, WV=24230, WI=65496, WY=97813, UP=16377, LP=80337) \n", "\n", "# Borders:\n", "north = states('WA ID MT ND MN WI MI UP IL IN LP OH PA NY VT NH ME') \n", "south = states('CA AZ NM TX LA MS AL FL') \n", "west = states('WA OR CA')\n", "east = states('ME NH MA RI CT NY NJ DE MD VA NC SC GA FL')\n", "border = north | south | west | east\n", "\n", "# \"Countries\":\n", "usa50 = States(areas) - states('DC UP LP') # 50 actual US states\n", "usa49 = States(areas) - states('AK HI DC MI') # lower 49 \"states\": MI split into UP, LP\n", "usa48 = States(areas) - states('AK HI DC UP LP') # lower 48 states\n", "four = states('UT CO AZ NM') # The \"four corners\" states\n", "western = states('WA OR CA ID NV UT AZ MT WY CO NM') # The 11 states west of the Rockies" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Strategy for answering the two questions\n", "\n", "My overall strategy:\n", "- Generate a large number of **cuts**. \n", "- For each cut *C*, determine the **split** into regions *A* and *B*. Discard cuts that don't produce exactly two regions.\n", "- Find the split that **maximizes** the area of *B*, and the split that **minimizes** the difference in area of *A* and *B*.\n", "\n", "# Making cuts\n", "\n", "Is it feasible to consider all possible cuts? A cut is a subset of the 49 states, so there are 249 or 500 trillion possible cuts, so **no**, we can't look at them all. I have four ideas to reduce the number of cuts considered:\n", "- **Limit the total area in a cut.** A large area in the cut means there won't be much area left to make *B* big. \n", "- **Limit the number of states in a cut.** Similarly, if there are too many states in a cut, there won't be many left for *A* or *B*.\n", "- **Make cuts contiguous.** Noncontiguous cuts can't be optimal for question 1, so I won't consider them for now.\n", "- **Make cuts go border-to-border.** A cut can produce exactly two regions only if (a) the cut runs from one place on the border to another place on the border or (b) the cut forms a \"donut\" that surrounds some interior region. The US map isn't big enough to support a decent-sized donut (there are only 14 non-border states, and only KS and NE are not neighbors of a border state). \n", "\n", "By default, the function `make_cuts` will yield all cuts that are contiguous regions up to twice the area and twice the number of states as the {IL, MO, NM, OK} cut, as long as they go from the north border of the US to the south border.\n", "\n", "It starts by building a set of regions where each region contains a single `start` state. Then in each iteration of the `while` loop, it yields each region from the current set of regions that intersects the `end` states, and creates a new set of regions formed by adding a neighboring state to a current region in all possible ways, as long as the area does not exceed `maxarea` and the size does not exceed `maxsize`. (On each iteration all the regions have the same size.)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "maxarea = 2 * area(states('IL MO NM OK'))\n", "\n", "def make_cuts(country, maxsize=8, maxarea=maxarea, start=north, end=south) -> Iterator[Region]:\n", " \"\"\"All contiguous regions up to `maxsize` and `maxarea` that contain a `start` and `end` state.\"\"\"\n", " regions = {Region({s}) for s in start & country} \n", " while regions:\n", " yield from filter(end.intersection, regions) \n", " regions = {region | {s1}\n", " for region in regions if len(region) + 1 <= maxsize \n", " for s in region for s1 in (neighbors[s] & country) - region\n", " if area(region) + areas[s1] <= maxarea} " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For example, the north-south cuts of size up to 3 states:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{frozenset({'AZ', 'ID', 'UT'}),\n", " frozenset({'ID', 'NM', 'UT'}),\n", " frozenset({'CA', 'ID', 'OR'}),\n", " frozenset({'CA', 'ID', 'NV'}),\n", " frozenset({'AZ', 'ID', 'NV'}),\n", " frozenset({'CA', 'OR', 'WA'})}" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "set(make_cuts(usa49, 3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Making splits\n", "\n", "Now, given some cuts, the function `make_splits` creates **splits**: tuples `(A, B, C)`, where `A` and `B` are the two regions defined by the cut `C`, with `A` larger than `B`. A split requires that the cut divides the country into exactly two regions, but not all cuts do that. For example, the cut {WA, OR, CA} leaves only one region (consisting of all the other states). The cut {ID, OR, NV, AZ} leaves three regions: {WA}, {CA}, and {everything else}. To verify whether the cut makes two regions, `make_splits` first finds a maximal-size contiguous region `A` from the non-cut states, then finds a region `B` from the remaining states, then checks that `A` and `B` are non-empty and make up all the non-cut states.\n", "\n", "We find the extent of a region with the `floodfill` [algorithm](https://en.wikipedia.org/wiki/Flood_fill): it maintains a mutable `region` and a `frontier` of the states that neighbor the region but are not in the region. We iterate adding the frontier into the region and computing a new frontier until there is no new frontier. " ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def make_splits(country, cuts) -> Iterator[Split]:\n", " \"\"\"For each cut C, find regions A and B and yield the Split (A, B, C) if valid.\"\"\"\n", " for C in cuts:\n", " noncut = country - C\n", " A = floodfill(noncut) \n", " B = floodfill(noncut - A)\n", " if A and B and (A | B | C == country):\n", " if area(B) > area(A): A, B = B, A # Ensure A larger than B\n", " B, A = sorted([A, B], key=area) \n", " yield (A, B, C)\n", " \n", "def floodfill(legal: States) -> Region:\n", " \"\"\"Starting at one state, fill out to all legal contiguous states.\"\"\"\n", " region = set() \n", " frontier = {min(legal)} if legal else None\n", " while frontier:\n", " region |= frontier\n", " frontier = {s1 for s in frontier for s1 in neighbors[s]\n", " if s1 in legal and s1 not in region}\n", " return Region(region)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For example, the splits (*A*, *B*, *C*) of the western states from cuts *C* of size up to 3:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{(frozenset({'CO', 'MT', 'NM', 'WY'}),\n", " frozenset({'CA', 'NV', 'OR', 'WA'}),\n", " frozenset({'AZ', 'ID', 'UT'})),\n", " (frozenset({'CO', 'MT', 'NM', 'UT', 'WY'}),\n", " frozenset({'CA', 'OR', 'WA'}),\n", " frozenset({'AZ', 'ID', 'NV'})),\n", " (frozenset({'AZ', 'CO', 'MT', 'NM', 'UT', 'WY'}),\n", " frozenset({'OR', 'WA'}),\n", " frozenset({'CA', 'ID', 'NV'})),\n", " (frozenset({'AZ', 'CO', 'MT', 'NM', 'NV', 'UT', 'WY'}),\n", " frozenset({'WA'}),\n", " frozenset({'CA', 'ID', 'OR'}))}" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "set(make_splits(western, make_cuts(western, 3)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# The answers\n", "\n", "The function `answers` puts it all together: makes cuts; makes splits from those cuts; finds the splits that answer the two questions; and depending on the value of the parameter `do`, either prints information in a pretty format, or returns the four values, or both. By default, just print." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def answers(country, maxsize=8, maxarea=maxarea, start=north, end=south, do='print') -> Optional[tuple]:\n", " \"\"\"Find the splits that answer the 2 questions.\n", " Print information in pretty format if 'print' is a substring of `do`.\n", " Return the tuple (cuts, splits, answer1, answer2) if 'return' is a substring of `do`.\"\"\"\n", " cuts = list(make_cuts(country, maxsize, maxarea, start, end))\n", " splits = list(make_splits(country, cuts))\n", " answer1 = max(splits, key=lambda s: area(s[1]))\n", " answer2 = min(splits, key=lambda s: area(s[0]) - area(s[1]))\n", " if 'print' in do:\n", " print(f'{len(country)} states ⇒ {len(cuts):,d} cuts',\n", " f'(maxsize ≤ {maxsize}, area ≤ {maxarea:,d}) ⇒ {len(splits):,d} splits.')\n", " show('1. Split that maximizes area(B)', country, answer1)\n", " show('2. Split that minimizes ∆ = area(A) - area(B)', country, answer2)\n", " if 'return' in do:\n", " return cuts, splits, answer1, answer2\n", " \n", "def show(title, country, split):\n", " \"\"\"Print a title, and a summary of the split in four rows. The columns shown are:\n", " 'region name|area|percent of country area|number of states in region|states in region'.\n", " The ∆ row of the table is not a region; it is the difference in area between A and B.\"\"\"\n", " A, B, C = split\n", " def print_row(name, region, sqmi): \n", " statelist = f'{len(region):2d}|{{{\",\".join(sorted(region))}}}' if region else ''\n", " print(f'{name}|{sqmi:9,d}|{sqmi/area(country):5.1%}|{statelist}')\n", " print(f'\\n{title}:')\n", " print_row('A', A, area(A))\n", " print_row('B', B, area(B))\n", " print_row('C', C, area(C))\n", " print_row('∆', '', area(A) - area(B))" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "49 states ⇒ 43,901 cuts (maxsize ≤ 8, area ≤ 638,220) ⇒ 14,149 splits.\n", "\n", "1. Split that maximizes area(B):\n", "A|1,345,558|43.1%|29|{AL,AR,CT,DE,FL,GA,IN,KS,KY,LA,LP,MA,MD,ME,MS,NC,NH,NJ,NY,OH,OK,PA,RI,SC,TN,TX,VA,VT,WV}\n", "B|1,344,149|43.1%|15|{AZ,CA,IA,ID,MN,MT,ND,NV,OR,SD,UP,UT,WA,WI,WY}\n", "C| 430,653|13.8%| 5|{CO,IL,MO,NE,NM}\n", "∆| 1,409| 0.0%|\n", "\n", "2. Split that minimizes ∆ = area(A) - area(B):\n", "A|1,267,033|40.6%|14|{AZ,CA,IA,ID,MN,MT,ND,NV,OR,UP,UT,WA,WI,WY}\n", "B|1,266,994|40.6%|27|{AL,AR,CT,DE,FL,GA,KS,KY,LA,LP,MA,MD,ME,MS,NC,NH,NJ,NY,OH,OK,PA,RI,SC,TX,VA,VT,WV}\n", "C| 586,333|18.8%| 8|{CO,IL,IN,MO,NE,NM,SD,TN}\n", "∆| 39| 0.0%|\n" ] } ], "source": [ "answers(usa49)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Adding DC\n", "\n", "Would anything change if we made DC a state (besides, obviously, the voting rights of the citizens)?" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "50 states ⇒ 45,810 cuts (maxsize ≤ 8, area ≤ 638,220) ⇒ 14,137 splits.\n", "\n", "1. Split that maximizes area(B):\n", "A|1,345,626|43.1%|30|{AL,AR,CT,DC,DE,FL,GA,IN,KS,KY,LA,LP,MA,MD,ME,MS,NC,NH,NJ,NY,OH,OK,PA,RI,SC,TN,TX,VA,VT,WV}\n", "B|1,344,149|43.1%|15|{AZ,CA,IA,ID,MN,MT,ND,NV,OR,SD,UP,UT,WA,WI,WY}\n", "C| 430,653|13.8%| 5|{CO,IL,MO,NE,NM}\n", "∆| 1,477| 0.0%|\n", "\n", "2. Split that minimizes ∆ = area(A) - area(B):\n", "A|1,267,062|40.6%|28|{AL,AR,CT,DC,DE,FL,GA,KS,KY,LA,LP,MA,MD,ME,MS,NC,NH,NJ,NY,OH,OK,PA,RI,SC,TX,VA,VT,WV}\n", "B|1,267,033|40.6%|14|{AZ,CA,IA,ID,MN,MT,ND,NV,OR,UP,UT,WA,WI,WY}\n", "C| 586,333|18.8%| 8|{CO,IL,IN,MO,NE,NM,SD,TN}\n", "∆| 29| 0.0%|\n" ] } ], "source": [ "answers(usa49 | {'DC'})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The cuts are the same, but for question 2, adding DC's 68 square miles to the eastern region means that it is now only 29 square miles larger than the western region (previously it was 39 square miles smaller)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Reuniting Michigan\n", "\n", "What if Michigan counts as one state, rather than two separate penninsulas? What if we then also add in DC?" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "48 states ⇒ 43,941 cuts (maxsize ≤ 8, area ≤ 638,220) ⇒ 19,811 splits.\n", "\n", "1. Split that maximizes area(B):\n", "A|1,348,646|43.2%|30|{AL,CT,DE,FL,GA,IA,IL,IN,KY,MA,MD,ME,MI,MN,MT,NC,ND,NH,NJ,NY,OH,PA,RI,SC,SD,TN,VA,VT,WI,WV}\n", "B|1,341,666|43.0%|12|{AZ,CA,CO,KS,LA,NM,NV,OK,OR,TX,UT,WA}\n", "C| 430,048|13.8%| 6|{AR,ID,MO,MS,NE,WY}\n", "∆| 6,980| 0.2%|\n", "\n", "2. Split that minimizes ∆ = area(A) - area(B):\n", "A|1,267,816|40.6%|13|{AZ,CA,ID,KS,MN,MT,ND,NE,NV,OR,SD,UT,WA}\n", "B|1,267,672|40.6%|28|{AL,AR,CT,DE,FL,GA,IL,IN,KY,LA,MA,MD,ME,MI,MS,NC,NH,NJ,NY,OH,PA,RI,SC,TN,TX,VA,VT,WV}\n", "C| 584,872|18.7%| 7|{CO,IA,MO,NM,OK,WI,WY}\n", "∆| 144| 0.0%|\n" ] } ], "source": [ "answers(usa48)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "49 states ⇒ 45,856 cuts (maxsize ≤ 8, area ≤ 638,220) ⇒ 19,860 splits.\n", "\n", "1. Split that maximizes area(B):\n", "A|1,348,714|43.2%|31|{AL,CT,DC,DE,FL,GA,IA,IL,IN,KY,MA,MD,ME,MI,MN,MT,NC,ND,NH,NJ,NY,OH,PA,RI,SC,SD,TN,VA,VT,WI,WV}\n", "B|1,341,666|43.0%|12|{AZ,CA,CO,KS,LA,NM,NV,OK,OR,TX,UT,WA}\n", "C| 430,048|13.8%| 6|{AR,ID,MO,MS,NE,WY}\n", "∆| 7,048| 0.2%|\n", "\n", "2. Split that minimizes ∆ = area(A) - area(B):\n", "A|1,267,816|40.6%|13|{AZ,CA,ID,KS,MN,MT,ND,NE,NV,OR,SD,UT,WA}\n", "B|1,267,740|40.6%|29|{AL,AR,CT,DC,DE,FL,GA,IL,IN,KY,LA,MA,MD,ME,MI,MS,NC,NH,NJ,NY,OH,PA,RI,SC,TN,TX,VA,VT,WV}\n", "C| 584,872|18.7%| 7|{CO,IA,MO,NM,OK,WI,WY}\n", "∆| 76| 0.0%|\n" ] } ], "source": [ "answers(usa48 | {'DC'})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The results are not as good (I think because splitting MI allows IL and IN to be north border states rather than interior states)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Four-state cuts\n", "\n", "If we are restricted to four-state cuts, the proposed {IL, MO, NM, OK} cut is indeed best:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "49 states ⇒ 61 cuts (maxsize ≤ 4, area ≤ 638,220) ⇒ 45 splits.\n", "\n", "1. Split that maximizes area(B):\n", "A|1,607,869|51.5%|18|{AZ,CA,CO,IA,ID,KS,MN,MT,ND,NE,NV,OR,SD,UP,UT,WA,WI,WY}\n", "B|1,193,381|38.2%|27|{AL,AR,CT,DE,FL,GA,IN,KY,LA,LP,MA,MD,ME,MS,NC,NH,NJ,NY,OH,PA,RI,SC,TN,TX,VA,VT,WV}\n", "C| 319,110|10.2%| 4|{IL,MO,NM,OK}\n", "∆| 414,488|13.3%|\n", "\n", "2. Split that minimizes ∆ = area(A) - area(B):\n", "A|1,607,869|51.5%|18|{AZ,CA,CO,IA,ID,KS,MN,MT,ND,NE,NV,OR,SD,UP,UT,WA,WI,WY}\n", "B|1,193,381|38.2%|27|{AL,AR,CT,DE,FL,GA,IN,KY,LA,LP,MA,MD,ME,MS,NC,NH,NJ,NY,OH,PA,RI,SC,TN,TX,VA,VT,WV}\n", "C| 319,110|10.2%| 4|{IL,MO,NM,OK}\n", "∆| 414,488|13.3%|\n" ] } ], "source": [ "answers(usa49, 4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Achieving equality on question 2\n", "\n", "Can we find regions with *exactly* equal areas (to the nearest square mile)? The function `make_equals` generates contiguous regions (up to maxsize 10 by default), keeping track of the areas, and when it finds a second disjoint region with the same area, it yields the two regions with their area. \n", "\n", "With `make_cuts` and `make_splits` we generated a contiguous cut first, then checked that the cut formed two valid regions. Now with `make_equals` we generate two equal-area regions first, then check that they are separated by a not-necessarily-contiguous cut." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "def make_equals(country, maxsize=10) -> Iterator[Tuple[int, Region, Region]]:\n", " \"\"\"Yield (area, A, B) for disjoint regions A, B up to `maxsize` with exactly equal area.\"\"\"\n", " table = defaultdict(set) # {area: [regions_with_that_area...]}\n", " for A in make_cuts(country, maxsize, area(country) / 2, country, country):\n", " a = area(A)\n", " for B in table[a]:\n", " if separated(A, B):\n", " yield (a, A, B)\n", " table[a].add(A)\n", " \n", "def separated(A, B) -> bool: \n", " \"\"\"Are regions A and B disjoint with no shared border?\"\"\"\n", " return A.isdisjoint(B) and all(neighbors[a].isdisjoint(B) for a in A)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the first computation that will take more than a couple of seconds:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 47.6 s, sys: 525 ms, total: 48.2 s\n", "Wall time: 48.2 s\n", "\n", "Split with ∆ = 0:\n", "A| 874,595|28.0%|10|{AL,FL,GA,KS,LA,MS,NC,NM,OK,TX}\n", "B| 874,595|28.0%|10|{CA,IA,ID,IL,IN,MT,ND,NV,SD,WA}\n", "C|1,371,170|43.9%|29|{AR,AZ,CO,CT,DE,KY,LP,MA,MD,ME,MN,MO,NE,NH,NJ,NY,OH,OR,PA,RI,SC,TN,UP,UT,VA,VT,WI,WV,WY}\n", "∆| 0| 0.0%|\n" ] } ], "source": [ "%time equals = list(make_equals(usa49, 10))\n", "(a, A, B) = max(equals)\n", "show('Split with ∆ = 0', usa49, (A, B, usa49 - A - B))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There may be larger regions with equal area. I searched up to `maxsize=12` and didn't find anything.\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A difference in area of ∆ = 0 is obviously an optimal answer to question 2. How about question 1? \n", "\n", "# Proving optimality on question 1\n", "\n", "We arbitrarily limited cuts to 8 states, going from the north to south border. To prove that we have the best cut, we'll have to eliminate those constraints, allowing cuts of any number of states going between any border states. This will increase run time, probably by an order of magnitude. Fortunately, we can tighten the area constraint a bit. We found that the cut {CO, IL, MO, NE, NM} produces a region *B* with area 1,344,149, so that means that any cut that is better for question 1 must create a split where the areas of both *A* and *B* are greater than 1,344,149, Therefore, we can lower the `maxarea` for the cut to:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "432062" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "area(usa49) - 2 * 1344149" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "49 states ⇒ 547,779 cuts (maxsize ≤ 49, area ≤ 432,062) ⇒ 42,685 splits.\n", "\n", "1. Split that maximizes area(B):\n", "A|1,345,558|43.1%|29|{AL,AR,CT,DE,FL,GA,IN,KS,KY,LA,LP,MA,MD,ME,MS,NC,NH,NJ,NY,OH,OK,PA,RI,SC,TN,TX,VA,VT,WV}\n", "B|1,344,149|43.1%|15|{AZ,CA,IA,ID,MN,MT,ND,NV,OR,SD,UP,UT,WA,WI,WY}\n", "C| 430,653|13.8%| 5|{CO,IL,MO,NE,NM}\n", "∆| 1,409| 0.0%|\n", "\n", "2. Split that minimizes ∆ = area(A) - area(B):\n", "A|1,345,558|43.1%|29|{AL,AR,CT,DE,FL,GA,IN,KS,KY,LA,LP,MA,MD,ME,MS,NC,NH,NJ,NY,OH,OK,PA,RI,SC,TN,TX,VA,VT,WV}\n", "B|1,344,149|43.1%|15|{AZ,CA,IA,ID,MN,MT,ND,NV,OR,SD,UP,UT,WA,WI,WY}\n", "C| 430,653|13.8%| 5|{CO,IL,MO,NE,NM}\n", "∆| 1,409| 0.0%|\n" ] } ], "source": [ "answers(usa49, maxsize=49, maxarea=area(usa49) - 2 * 1344149, start=border, end=border)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This confirms that {CO, IL, MO, NE, NM} is indeed the optimal cut for question 1." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Unit tests\n", "\n", "Here are some unit tests; they also serve as examples of input/output of the various functions:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'ok'" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def test():\n", " assert states('AZ CA OR') == frozenset({'AZ', 'CA', 'OR'})\n", "\n", " assert len(usa48) == 48 and len(usa49) == 49 and len(usa50) == 50\n", " assert len(western) == 11 and len(four) == 4\n", " \n", " assert set(areas) == set(neighbors)\n", " assert areas['MI'] == areas['UP'] + areas['LP'] \n", " assert area(states('AZ CA OR')) == area(states('IA IL KY NE SD VA WV')) == 376_064\n", "\n", " assert all((x in neighbors[y]) == (y in neighbors[x]) \n", " for x in neighbors for y in neighbors)\n", " \n", " assert set(make_cuts(western, 3)) == {\n", " states('AZ ID NV'),\n", " states('CA ID OR'),\n", " states('CA OR WA'),\n", " states('CA ID NV'),\n", " states('ID NM UT'),\n", " states('AZ ID UT')}\n", "\n", " assert set(make_splits(western, make_cuts(western, 3))) == {\n", " (states('CO MT NM WY'), states('CA NV OR WA'), states('AZ ID UT')),\n", " (states('CO MT NM UT WY'), states('CA OR WA'), states('AZ ID NV')),\n", " (states('AZ CO MT NM UT WY'), states('OR WA'), states('CA ID NV')),\n", " (states('AZ CO MT NM NV UT WY'), states('WA'), states('CA ID OR'))}\n", "\n", " assert set(make_cuts(four, 4, maxarea, four, four)) == {\n", " states('UT'), states('CO'), states('AZ'), states('NM'),\n", " states('AZ CO'), states('AZ NM'), states('CO NM'), states('NM UT'), states('AZ UT'), states('CO UT'),\n", " states('AZ CO UT'), states('AZ CO NM'), states('CO NM UT'), states('AZ NM UT'),\n", " states('AZ CO NM UT')}\n", "\n", " assert floodfill(western - states('AZ ID NV')) == states('CA OR WA')\n", " \n", " for country in (usa48, usa49, border, western, four):\n", " assert floodfill(country) == country\n", " \n", " assert set(make_equals(usa49, 5)) == {\n", " (207853, states('LP MD OH PA WV'), states('IA MO UP WI')),\n", " (258498, states('AL KY MO NC TN'), states('ID SD WY'))}\n", " \n", " assert not separated(south, states('GA SC'))\n", " assert separated(north, south)\n", " \n", " return 'ok'\n", " \n", "test()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }