{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "title: Geographical Clustering With Additional Constraint\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Originally Contributed by**: Matthew Helm ([with help from Mathieu Tanneau on Julia Discourse](https://discourse.julialang.org/t/which-jump-jl-solver-for-this-problem/43350/17?u=mthelm85))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The goal of this exercise is to cluster $n$ cities into $k$ groups, minimizing the total pairwise distance between cities \n", "*and* ensuring that the variance in the total populations of each group is relatively small." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this example, we'll use the 20 most populous cities in the United States." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "

20 rows × 4 columns

citypopulationlatlon
StringInt64Float64Float64
1New York, NY840583740.7127-74.0059
2Los Angeles, CA388430734.0522-118.244
3Chicago, IL271878241.8781-87.6297
4Houston, TX219591429.7604-95.3698
5Philadelphia, PA155316539.9525-75.1652
6Phoenix, AZ151336733.4483-112.074
7San Antonio, TX140901929.4241-98.4936
8San Diego, CA135589632.7157-117.161
9Dallas, TX125767632.7766-96.7969
10San Jose, CA99853737.3382-121.886
11Austin, TX88540030.2671-97.743
12Indianapolis, IN84339339.7684-86.158
13Jacksonville, FL84258330.3321-81.6556
14San Francisco, CA83744237.7749-122.419
15Columbus, OH82255339.9611-82.9987
16Charlotte, NC79286235.227-80.8431
17Fort Worth, TX79272732.7554-97.3307
18Detroit, MI68870142.3314-83.0457
19El Paso, TX67443331.7775-106.442
20Memphis, TN65345035.1495-90.0489
" ], "text/latex": [ "\\begin{tabular}{r|cccc}\n", "\t& city & population & lat & lon\\\\\n", "\t\\hline\n", "\t& String & Int64 & Float64 & Float64\\\\\n", "\t\\hline\n", "\t1 & New York, NY & 8405837 & 40.7127 & -74.0059 \\\\\n", "\t2 & Los Angeles, CA & 3884307 & 34.0522 & -118.244 \\\\\n", "\t3 & Chicago, IL & 2718782 & 41.8781 & -87.6297 \\\\\n", "\t4 & Houston, TX & 2195914 & 29.7604 & -95.3698 \\\\\n", "\t5 & Philadelphia, PA & 1553165 & 39.9525 & -75.1652 \\\\\n", "\t6 & Phoenix, AZ & 1513367 & 33.4483 & -112.074 \\\\\n", "\t7 & San Antonio, TX & 1409019 & 29.4241 & -98.4936 \\\\\n", "\t8 & San Diego, CA & 1355896 & 32.7157 & -117.161 \\\\\n", "\t9 & Dallas, TX & 1257676 & 32.7766 & -96.7969 \\\\\n", "\t10 & San Jose, CA & 998537 & 37.3382 & -121.886 \\\\\n", "\t11 & Austin, TX & 885400 & 30.2671 & -97.743 \\\\\n", "\t12 & Indianapolis, IN & 843393 & 39.7684 & -86.158 \\\\\n", "\t13 & Jacksonville, FL & 842583 & 30.3321 & -81.6556 \\\\\n", "\t14 & San Francisco, CA & 837442 & 37.7749 & -122.419 \\\\\n", "\t15 & Columbus, OH & 822553 & 39.9611 & -82.9987 \\\\\n", "\t16 & Charlotte, NC & 792862 & 35.227 & -80.8431 \\\\\n", "\t17 & Fort Worth, TX & 792727 & 32.7554 & -97.3307 \\\\\n", "\t18 & Detroit, MI & 688701 & 42.3314 & -83.0457 \\\\\n", "\t19 & El Paso, TX & 674433 & 31.7775 & -106.442 \\\\\n", "\t20 & Memphis, TN & 653450 & 35.1495 & -90.0489 \\\\\n", "\\end{tabular}\n" ], "text/plain": [ "20×4 DataFrame\n", "│ Row │ city │ population │ lat │ lon │\n", "│ │ \u001b[90mString\u001b[39m │ \u001b[90mInt64\u001b[39m │ \u001b[90mFloat64\u001b[39m │ \u001b[90mFloat64\u001b[39m │\n", "├─────┼───────────────────┼────────────┼─────────┼──────────┤\n", "│ 1 │ New York, NY │ 8405837 │ 40.7127 │ -74.0059 │\n", "│ 2 │ Los Angeles, CA │ 3884307 │ 34.0522 │ -118.244 │\n", "│ 3 │ Chicago, IL │ 2718782 │ 41.8781 │ -87.6297 │\n", "│ 4 │ Houston, TX │ 2195914 │ 29.7604 │ -95.3698 │\n", "│ 5 │ Philadelphia, PA │ 1553165 │ 39.9525 │ -75.1652 │\n", "│ 6 │ Phoenix, AZ │ 1513367 │ 33.4483 │ -112.074 │\n", "│ 7 │ San Antonio, TX │ 1409019 │ 29.4241 │ -98.4936 │\n", "│ 8 │ San Diego, CA │ 1355896 │ 32.7157 │ -117.161 │\n", "│ 9 │ Dallas, TX │ 1257676 │ 32.7766 │ -96.7969 │\n", "│ 10 │ San Jose, CA │ 998537 │ 37.3382 │ -121.886 │\n", "│ 11 │ Austin, TX │ 885400 │ 30.2671 │ -97.743 │\n", "│ 12 │ Indianapolis, IN │ 843393 │ 39.7684 │ -86.158 │\n", "│ 13 │ Jacksonville, FL │ 842583 │ 30.3321 │ -81.6556 │\n", "│ 14 │ San Francisco, CA │ 837442 │ 37.7749 │ -122.419 │\n", "│ 15 │ Columbus, OH │ 822553 │ 39.9611 │ -82.9987 │\n", "│ 16 │ Charlotte, NC │ 792862 │ 35.227 │ -80.8431 │\n", "│ 17 │ Fort Worth, TX │ 792727 │ 32.7554 │ -97.3307 │\n", "│ 18 │ Detroit, MI │ 688701 │ 42.3314 │ -83.0457 │\n", "│ 19 │ El Paso, TX │ 674433 │ 31.7775 │ -106.442 │\n", "│ 20 │ Memphis, TN │ 653450 │ 35.1495 │ -90.0489 │" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using Cbc\n", "using DataFrames\n", "using Distances\n", "using JuMP\n", "using LinearAlgebra\n", "\n", "cities = DataFrame(\n", " city=[ \"New York, NY\", \"Los Angeles, CA\", \"Chicago, IL\", \"Houston, TX\", \"Philadelphia, PA\", \"Phoenix, AZ\", \"San Antonio, TX\", \"San Diego, CA\", \"Dallas, TX\", \"San Jose, CA\", \"Austin, TX\", \"Indianapolis, IN\", \"Jacksonville, FL\", \"San Francisco, CA\", \"Columbus, OH\", \"Charlotte, NC\", \"Fort Worth, TX\", \"Detroit, MI\", \"El Paso, TX\", \"Memphis, TN\"],\n", " population=[8405837,3884307,2718782,2195914,1553165,1513367,1409019,1355896,1257676,998537,885400,843393,842583,837442,822553,792862,792727,688701,674433,653450],\n", " lat=[40.7127,34.0522,41.8781,29.7604,39.9525,33.4483,29.4241,32.7157,32.7766,37.3382,30.2671,39.7684,30.3321,37.7749,39.9611,35.2270,32.7554,42.3314,31.7775,35.1495],\n", " lon=[-74.0059,-118.2436,-87.6297,-95.3698,-75.1652,-112.0740,-98.4936,-117.1610,-96.7969,-121.8863,-97.7430,-86.1580,-81.6556,-122.4194,-82.9987,-80.8431,-97.3307,-83.0457,-106.4424,-90.0489])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Model Specifics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will cluster these 20 cities into 3 different groups and we will assume that the ideal or target population $P$ for a\n", "group is simply the total population of the 20 cities divided by 3:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.1042014666666666e7" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "n = size(cities,1)\n", "k = 3\n", "P = sum(cities.population) / k" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Obtaining the distances between each city" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's leverage the *Distances.jl* package to compute the pairwise Haversine distance between each of the cities in our data\n", "set and store the result in a variable we'll call `dm`:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "20×20 Array{Float64,2}:\n", " 0.0 4912.35 1515.39 2368.0 … 1006.0 3596.53 1784.38\n", " 4912.35 0.0 3402.79 2546.2 3908.33 1315.89 3135.99\n", " 1515.39 3402.79 0.0 856.813 509.874 2088.88 269.042\n", " 2368.0 2546.2 856.813 0.0 1362.63 1232.11 591.849\n", " 130.886 4785.46 1386.56 2240.3 877.76 3469.8 1655.44\n", " 4225.58 686.808 2716.26 1859.56 … 3221.54 629.312 2449.77\n", " 2711.55 2201.01 1203.48 347.477 1707.37 885.739 939.3\n", " 4788.58 138.828 3281.54 2424.73 3785.69 1192.78 3015.58\n", " 2529.82 2385.68 1017.17 162.61 1524.16 1073.03 750.56\n", " 5323.39 444.484 3809.46 2955.49 4317.63 1734.53 3541.16\n", " 2630.43 2282.74 1120.72 264.038 … 1625.65 968.161 855.806\n", " 1351.71 3566.87 164.157 1020.79 347.121 2252.77 432.753\n", " 881.628 4068.0 671.873 1525.36 234.725 2756.75 933.537\n", " 5383.2 509.162 3868.9 3015.49 4377.33 1796.19 3600.46\n", " 1000.35 3916.78 515.312 1370.62 32.4496 2601.92 784.149\n", " 771.152 4159.75 757.234 1614.03 … 268.399 2846.11 1023.93\n", " 2588.88 2326.34 1076.48 221.121 1583.3 1013.67 809.933\n", " 1006.0 3908.33 509.874 1362.63 0.0 2593.01 778.898\n", " 3596.53 1315.89 2088.88 1232.11 2593.01 0.0 1823.4\n", " 1784.38 3135.99 269.042 591.849 778.898 1823.4 0.0" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dm = Distances.pairwise(Haversine(6372.8), Matrix(cities[:, [3,4]])', dims=2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our distance matrix is symmetric so we'll convert it to a `LowerTriangular` matrix so that we can better interpret the\n", "objective value of our model (if we don't do this the total distance will be doubled):" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "20×20 LowerTriangular{Float64,Array{Float64,2}}:\n", " 0.0 ⋅ ⋅ ⋅ … ⋅ ⋅ ⋅ ⋅ \n", " 4912.35 0.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ \n", " 1515.39 3402.79 0.0 ⋅ ⋅ ⋅ ⋅ ⋅ \n", " 2368.0 2546.2 856.813 0.0 ⋅ ⋅ ⋅ ⋅ \n", " 130.886 4785.46 1386.56 2240.3 ⋅ ⋅ ⋅ ⋅ \n", " 4225.58 686.808 2716.26 1859.56 … ⋅ ⋅ ⋅ ⋅ \n", " 2711.55 2201.01 1203.48 347.477 ⋅ ⋅ ⋅ ⋅ \n", " 4788.58 138.828 3281.54 2424.73 ⋅ ⋅ ⋅ ⋅ \n", " 2529.82 2385.68 1017.17 162.61 ⋅ ⋅ ⋅ ⋅ \n", " 5323.39 444.484 3809.46 2955.49 ⋅ ⋅ ⋅ ⋅ \n", " 2630.43 2282.74 1120.72 264.038 … ⋅ ⋅ ⋅ ⋅ \n", " 1351.71 3566.87 164.157 1020.79 ⋅ ⋅ ⋅ ⋅ \n", " 881.628 4068.0 671.873 1525.36 ⋅ ⋅ ⋅ ⋅ \n", " 5383.2 509.162 3868.9 3015.49 ⋅ ⋅ ⋅ ⋅ \n", " 1000.35 3916.78 515.312 1370.62 ⋅ ⋅ ⋅ ⋅ \n", " 771.152 4159.75 757.234 1614.03 … ⋅ ⋅ ⋅ ⋅ \n", " 2588.88 2326.34 1076.48 221.121 0.0 ⋅ ⋅ ⋅ \n", " 1006.0 3908.33 509.874 1362.63 1583.3 0.0 ⋅ ⋅ \n", " 3596.53 1315.89 2088.88 1232.11 1013.67 2593.01 0.0 ⋅ \n", " 1784.38 3135.99 269.042 591.849 809.933 778.898 1823.4 0.0" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dm = LowerTriangular(dm)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Build the model\n", "Now that we have the basics taken care of, we can set up our model, create decision variables, add constraints, and then\n", "solve." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we'll set up a model that leverages the [Cbc](https://github.com/coin-or/Cbc) solver. Next, we'll set up a binary\n", "variable $x_{i,k}$ that takes the value $1$ if city $i$ is in group $k$ and $0$ otherwise. Each city must be in a group, so\n", "we'll add the constraint $\\sum_kx_{i,k} = 1$ for every $i$." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "model = Model(Cbc.Optimizer)\n", "\n", "@variable(model, x[1:n, 1:k], Bin)\n", "\n", "for i in 1:n\n", " @constraint(model, sum(x[i,:]) == 1)\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The total population of a group $k$ is $Q_k = \\sum_ix_{i,k}q_i$ where $q_i$ is simply the $i$th value from the `population`\n", "column in our `cities` DataFrame. Let's add constraints so that $\\alpha \\leq (Q_k - P) \\leq \\beta$. We'll set $\\alpha$\n", "equal to -2,500,000 and $\\beta$ equal to 2,500,000. By adjusting these thresholds you'll find that there is a tradeoff\n", "between having relatively even populations between groups and having geographically close cities within each group. In\n", "other words, the larger the absolute values of $\\alpha$ and $\\beta$, the closer together the cities in a group will be but\n", "the variance between the group populations will be higher." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "α = -2_500_000\n", "β = 2_500_000\n", "\n", "for i in 1:k\n", " @constraint(model, (x' * cities.population)[i] - P <= β)\n", " @constraint(model, (x' * cities.population)[i] - P >= α)\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we need to add one last binary variable $z_{i,j}$ to our model that we'll use to compute the total distance between the \n", "cities in our groups, defined as $\\sum_{i,j}d_{i,j}z_{i,j}$. Variable $z_{i,j}$ will equal $1$ if cities $i$ and $j$ are \n", "in the same group, and $0$ if they are not in the same group." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To ensure that $z_{i,j} = 1$ if and only if cities $i$ and $j$ are in the same group, we add the constraints $z_{i,j} \\geq \n", "x_{i,k} + x_{j,k} - 1$ for every pair $i,j$ and every $k$:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "@variable(model, z[1:n,1:n], Bin)\n", "\n", "for k in 1:k, i in 1:n, j in 1:n\n", " @constraint(model, z[i,j] >= x[i,k] + x[j,k] - 1)\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now add an objective to our model which will simply be to minimize the dot product of $z$ and our distance matrix,\n", "`dm`. We can then call `optimize!` and review the results." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Welcome to the CBC MILP Solver \n", "Version: 2.10.3 \n", "Build Date: Jan 1 1970 \n", "\n", "command line - Cbc_C_Interface -solve -quit (default strategy 1)\n", "Continuous objective value is 0 - 0.00 seconds\n", "Cgl0004I processed model has 593 rows, 250 columns (250 integer (250 of which binary)) and 1830 elements\n", "Cbc0031I 41 added rows had average density of 39.170732\n", "Cbc0013I At root node, 41 cuts changed objective from 0 to 604.80759 in 100 passes\n", "Cbc0014I Cut generator 0 (Probing) - 0 row cuts average 0.0 elements, 0 column cuts (6 active) in 0.030 seconds - new frequency is -100\n", "Cbc0014I Cut generator 1 (Gomory) - 2145 row cuts average 156.6 elements, 0 column cuts (0 active) in 0.082 seconds - new frequency is 1\n", "Cbc0014I Cut generator 2 (Knapsack) - 0 row cuts average 0.0 elements, 0 column cuts (0 active) in 0.010 seconds - new frequency is -100\n", "Cbc0014I Cut generator 3 (Clique) - 0 row cuts average 0.0 elements, 0 column cuts (0 active) in 0.004 seconds - new frequency is -100\n", "Cbc0014I Cut generator 4 (MixedIntegerRounding2) - 0 row cuts average 0.0 elements, 0 column cuts (0 active) in 0.030 seconds - new frequency is -100\n", "Cbc0014I Cut generator 5 (FlowCover) - 0 row cuts average 0.0 elements, 0 column cuts (0 active) in 0.003 seconds - new frequency is -100\n", "Cbc0014I Cut generator 6 (TwoMirCuts) - 206 row cuts average 79.3 elements, 0 column cuts (0 active) in 0.025 seconds - new frequency is -100\n", "Cbc0010I After 0 nodes, 1 on tree, 1e+50 best solution, best possible 604.80759 (0.99 seconds)\n", "Cbc0016I Integer solution of 84337.912 found by strong branching after 24889 iterations and 159 nodes (6.84 seconds)\n", "Cbc0038I Full problem 593 rows 250 columns, reduced to 252 rows 130 columns\n", "Cbc0012I Integer solution of 80741.38 found by RINS after 45753 iterations and 300 nodes (10.49 seconds)\n", "Cbc0038I Full problem 593 rows 250 columns, reduced to 275 rows 123 columns\n", "Cbc0012I Integer solution of 75417.987 found by RINS after 58777 iterations and 400 nodes (12.61 seconds)\n", "Cbc0004I Integer solution of 68371.228 found after 71303 iterations and 496 nodes (14.43 seconds)\n", "Cbc0038I Full problem 593 rows 250 columns, reduced to 274 rows 144 columns\n", "Cbc0038I Full problem 593 rows 250 columns, reduced to 236 rows 108 columns\n", "Cbc0012I Integer solution of 65415.739 found by RINS after 84618 iterations and 600 nodes (16.31 seconds)\n", "Cbc0004I Integer solution of 55478.28 found after 92931 iterations and 680 nodes (17.59 seconds)\n", "Cbc0038I Full problem 593 rows 250 columns, reduced to 212 rows 112 columns\n", "Cbc0012I Integer solution of 46044.441 found by RINS after 95178 iterations and 700 nodes (18.00 seconds)\n", "Cbc0004I Integer solution of 38460.224 found after 98573 iterations and 738 nodes (18.46 seconds)\n", "Cbc0038I Full problem 593 rows 250 columns, reduced to 329 rows 131 columns - 54 fixed gives 0, 0 - ok now\n", "Cbc0038I Full problem 593 rows 250 columns, reduced to 0 rows 0 columns\n", "Cbc0010I After 1000 nodes, 4 on tree, 38460.224 best solution, best possible 604.80759 (24.54 seconds)\n", "Cbc0038I Full problem 593 rows 250 columns, reduced to 427 rows 136 columns - 54 fixed gives 0, 0 - ok now\n", "Cbc0038I Full problem 593 rows 250 columns, reduced to 0 rows 0 columns\n", "Cbc0038I Full problem 593 rows 250 columns, reduced to 409 rows 130 columns - 54 fixed gives 0, 0 - ok now\n", "Cbc0038I Full problem 593 rows 250 columns, reduced to 0 rows 0 columns\n", "Cbc0001I Search completed - best objective 38460.22443254192, took 171264 iterations and 1186 nodes (30.38 seconds)\n", "Cbc0032I Strong branching done 13908 times (402686 iterations), fathomed 67 nodes and fixed 16 variables\n", "Cbc0035I Maximum depth 31, 9226 variables fixed on reduced cost\n", "Cuts at root node changed objective from 0 to 604.808\n", "Probing was tried 100 times and created 0 cuts of which 6 were active after adding rounds of cuts (0.030 seconds)\n", "Gomory was tried 4049 times and created 16026 cuts of which 0 were active after adding rounds of cuts (1.986 seconds)\n", "Knapsack was tried 100 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.010 seconds)\n", "Clique was tried 100 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.004 seconds)\n", "MixedIntegerRounding2 was tried 100 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.030 seconds)\n", "FlowCover was tried 100 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.003 seconds)\n", "TwoMirCuts was tried 100 times and created 206 cuts of which 0 were active after adding rounds of cuts (0.025 seconds)\n", "ZeroHalf was tried 1 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.001 seconds)\n", "\n", "Result - Optimal solution found\n", "\n", "Objective value: 38460.22443254\n", "Enumerated nodes: 1186\n", "Total iterations: 171264\n", "Time (CPU seconds): 30.40\n", "Time (Wallclock seconds): 31.06\n", "\n", "Total time (CPU seconds): 30.40 (Wallclock seconds): 31.06\n", "\n" ] } ], "source": [ "@objective(model, Min, dot(z,dm));\n", "\n", "optimize!(model)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Reviewing the Results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have results, we can add a column to our `cities` DataFrame for the group and then loop through our $x$\n", "variable to assign each city to its group. Once we have that, we can look at the total population for each group and also\n", "plot the cities and their groups to verify visually that they are grouped by geographic proximity." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "group = 6×5 SubDataFrame\n", "│ Row │ city │ population │ lat │ lon │ group │\n", "│ │ String │ Int64 │ Float64 │ Float64 │ Float64 │\n", "├─────┼──────────────────┼────────────┼─────────┼──────────┼─────────┤\n", "│ 1 │ New York, NY │ 8405837 │ 40.7127 │ -74.0059 │ 2.0 │\n", "│ 2 │ Philadelphia, PA │ 1553165 │ 39.9525 │ -75.1652 │ 2.0 │\n", "│ 3 │ Jacksonville, FL │ 842583 │ 30.3321 │ -81.6556 │ 2.0 │\n", "│ 4 │ Columbus, OH │ 822553 │ 39.9611 │ -82.9987 │ 2.0 │\n", "│ 5 │ Charlotte, NC │ 792862 │ 35.227 │ -80.8431 │ 2.0 │\n", "│ 6 │ Detroit, MI │ 688701 │ 42.3314 │ -83.0457 │ 2.0 │\n", "\n", "sum(group.population) = 13105701\n", "\n", "group = 6×5 SubDataFrame\n", "│ Row │ city │ population │ lat │ lon │ group │\n", "│ │ String │ Int64 │ Float64 │ Float64 │ Float64 │\n", "├─────┼───────────────────┼────────────┼─────────┼──────────┼─────────┤\n", "│ 1 │ Los Angeles, CA │ 3884307 │ 34.0522 │ -118.244 │ 3.0 │\n", "│ 2 │ Phoenix, AZ │ 1513367 │ 33.4483 │ -112.074 │ 3.0 │\n", "│ 3 │ San Diego, CA │ 1355896 │ 32.7157 │ -117.161 │ 3.0 │\n", "│ 4 │ San Jose, CA │ 998537 │ 37.3382 │ -121.886 │ 3.0 │\n", "│ 5 │ San Francisco, CA │ 837442 │ 37.7749 │ -122.419 │ 3.0 │\n", "│ 6 │ El Paso, TX │ 674433 │ 31.7775 │ -106.442 │ 3.0 │\n", "\n", "sum(group.population) = 9263982\n", "\n", "group = 8×5 SubDataFrame\n", "│ Row │ city │ population │ lat │ lon │ group │\n", "│ │ String │ Int64 │ Float64 │ Float64 │ Float64 │\n", "├─────┼──────────────────┼────────────┼─────────┼──────────┼─────────┤\n", "│ 1 │ Chicago, IL │ 2718782 │ 41.8781 │ -87.6297 │ 1.0 │\n", "│ 2 │ Houston, TX │ 2195914 │ 29.7604 │ -95.3698 │ 1.0 │\n", "│ 3 │ San Antonio, TX │ 1409019 │ 29.4241 │ -98.4936 │ 1.0 │\n", "│ 4 │ Dallas, TX │ 1257676 │ 32.7766 │ -96.7969 │ 1.0 │\n", "│ 5 │ Austin, TX │ 885400 │ 30.2671 │ -97.743 │ 1.0 │\n", "│ 6 │ Indianapolis, IN │ 843393 │ 39.7684 │ -86.158 │ 1.0 │\n", "│ 7 │ Fort Worth, TX │ 792727 │ 32.7554 │ -97.3307 │ 1.0 │\n", "│ 8 │ Memphis, TN │ 653450 │ 35.1495 │ -90.0489 │ 1.0 │\n", "\n", "sum(group.population) = 10756361\n", "\n" ] } ], "source": [ "cities.group = zeros(n)\n", "\n", "for i in 1:n, j in 1:k\n", " if round(value.(x)[i,j]) == 1.0\n", " cities.group[i] = j\n", " end\n", "end\n", "\n", "for group in groupby(cities, :group)\n", " @show group\n", " println(\"\")\n", " @show sum(group.population)\n", " println(\"\")\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The populations of each group are fairly even and we can see from the plot below that the groupings look good in terms of\n", "geographic proximity:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\"Geographic" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Julia 1.5.1", "language": "julia", "name": "julia-1.5" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.5.1" } }, "nbformat": 4, "nbformat_minor": 2 }