# SMILE — Graph User Guide ## Table of Contents 1. [Overview](#overview) 2. [Quick Start](#quick-start) 3. [Graph Representations](#graph-representations) - [AdjacencyList](#adjacencylist) - [AdjacencyMatrix](#adjacencymatrix) - [Choosing a Representation](#choosing-a-representation) 4. [Creating and Editing Graphs](#creating-and-editing-graphs) - [Constructors](#constructors) - [Adding Edges](#adding-edges) - [Removing Edges](#removing-edges) - [Querying Edges and Degrees](#querying-edges-and-degrees) 5. [Traversal](#traversal) - [Depth-First Search (DFS)](#depth-first-search-dfs) - [Breadth-First Search (BFS)](#breadth-first-search-bfs) - [VertexVisitor](#vertexvisitor) 6. [Connected Components](#connected-components) 7. [Topological Sort](#topological-sort) 8. [Shortest Paths — Dijkstra](#shortest-paths--dijkstra) 9. [Minimum Spanning Tree — Prim](#minimum-spanning-tree--prim) 10. [Maximum Flow — Push-Relabel](#maximum-flow--push-relabel) 11. [Travelling Salesman Problem (TSP)](#travelling-salesman-problem-tsp) - [Exact: Held-Karp](#exact-held-karp) - [Exact: Branch and Bound](#exact-branch-and-bound) - [Heuristic: Nearest Insertion](#heuristic-nearest-insertion) - [Heuristic: Farthest Insertion](#heuristic-farthest-insertion) - [Heuristic: Arbitrary Insertion](#heuristic-arbitrary-insertion) - [Heuristic: Christofides](#heuristic-christofides) - [Local Search: 2-opt](#local-search-2-opt) - [TSP Algorithm Comparison](#tsp-algorithm-comparison) 12. [Nearest Neighbor Graph](#nearest-neighbor-graph) - [Exact k-NN Graph](#exact-k-nn-graph) - [Approximate k-NN Graph: Random](#approximate-k-nn-graph-random) - [Approximate k-NN Graph: NN-Descent](#approximate-k-nn-graph-nn-descent) - [Largest Connected Component](#largest-connected-component) 13. [Matrix Conversion](#matrix-conversion) 14. [Graphviz Visualization](#graphviz-visualization) 15. [Mathematical Background](#mathematical-background) 16. [References](#references) --- ## Overview The `smile.graph` package provides weighted directed and undirected graph data structures together with a rich suite of graph algorithms: | Class / Record | Role | |---|---| | `Graph` | Abstract base class; all algorithms live here | | `AdjacencyList` | Sparse graph backed by `SparseArray[]` — preferred for most uses | | `AdjacencyMatrix` | Dense graph backed by `double[][]` — best for small, dense graphs | | `NearestNeighborGraph` | k-NN graph builder with exact and approximate methods | | `VertexVisitor` | Functional interface for DFS / BFS callbacks | **Algorithm summary:** | Algorithm | Method | Complexity | |---|---|---| | DFS traversal | `dfs(VertexVisitor)` | O(V + E) | | BFS traversal | `bfs(VertexVisitor)` | O(V + E) | | DFS connected components | `dfcc()` | O(V + E) | | BFS connected components | `bfcc()` | O(V + E) | | DFS topological sort | `dfsort()` | O(V + E) | | BFS topological sort (Kahn) | `bfsort()` | O(V + E) | | Single-source shortest path | `dijkstra(s)` | O((V + E) log V) | | All-pairs shortest path | `dijkstra()` | O(V(V + E) log V) | | Minimum spanning tree/forest | `prim(mst)` | O(V²) | | Maximum flow | `pushRelabel(flow, s, t)` | O(V² √E) | | Optimal TSP (small) | `heldKarp()` | O(2ⁿ n²) | | Optimal TSP (branch & bound) | `tsp()` | O(n!) worst case | | TSP heuristics | `nearestInsertion()`, `farthestInsertion()`, `arbitraryInsertion()`, `christofides()` | O(n²) | | TSP local improvement | `opt2(tour, maxIter)` | O(n²) per iteration | --- ## Quick Start ```java import smile.graph.*; import java.util.ArrayList; import java.util.List; // --- Undirected weighted graph --- Graph g = new AdjacencyList(6); g.addEdge(0, 1, 0.41); g.addEdge(1, 2, 0.51); g.addEdge(2, 3, 0.50); g.addEdge(0, 5, 0.29); g.addEdge(5, 4, 0.21); g.addEdge(1, 4, 0.32); g.addEdge(4, 2, 0.32); g.addEdge(5, 1, 0.29); g.addEdge(4, 3, 0.36); g.addEdge(3, 5, 0.38); g.addEdge(3, 0, 0.45); // Shortest paths from vertex 0 double[] dist = g.dijkstra(0); System.out.printf("0 → 3 = %.2f%n", dist[3]); // 0.86 // Minimum spanning tree List mst = new ArrayList<>(); double cost = g.prim(mst); System.out.printf("MST cost = %.2f%n", cost); // 1.47 // Optimal TSP tour (Held-Karp, exact) int[] tour = g.heldKarp(); System.out.printf("TSP cost = %.2f%n", g.getPathDistance(tour)); // 2.17 ``` --- ## Graph Representations ### AdjacencyList ```java AdjacencyList g = new AdjacencyList(n); // undirected AdjacencyList g = new AdjacencyList(n, true); // directed (digraph) ``` Backed by a `SparseArray[]` — one sparse array per vertex. Each sparse array stores `(neighbor, weight)` pairs for that vertex's outgoing edges. | Property | Value | |---|---| | Memory | O(V + E) | | `hasEdge(u, v)` | O(degree(u)) | | `getWeight(u, v)` | O(degree(u)) | | `setWeight(u, v, w)` | O(degree(u)) amortized | | `getOutDegree(v)` | O(1) | | `getInDegree(v)` | O(V) — scans all rows | | Serializable | Yes (`serialVersionUID = 3`) | **Construction from a sparse matrix:** ```java SparseMatrix mat = /* ... */; AdjacencyList g = AdjacencyList.of(mat); // Symmetric matrix → undirected graph // Asymmetric or rectangular matrix → bipartite graph (nrow + ncol vertices) ``` ### AdjacencyMatrix ```java AdjacencyMatrix g = new AdjacencyMatrix(n); // undirected AdjacencyMatrix g = new AdjacencyMatrix(n, true); // directed ``` Backed by a `double[n][n]` array. Entry `graph[i][j]` holds the weight of edge `i → j` (0.0 means no edge). | Property | Value | |---|---| | Memory | O(V²) | | `hasEdge(u, v)` | O(1) | | `getWeight(u, v)` | O(1) | | `setWeight(u, v, w)` | O(1) | | `getOutDegree(v)` | O(V) | | `getInDegree(v)` | O(V) | | `pushRelabel()` available | Yes (only on `AdjacencyMatrix`) | | Serializable | Yes (`serialVersionUID = 2`) | **Additional matrix methods:** ```java double[][] arr = ((AdjacencyMatrix) g).toArray(); // defensive copy of the weight matrix ``` ### Choosing a Representation | Criterion | Use `AdjacencyList` | Use `AdjacencyMatrix` | |---|---|---| | Graph density | Sparse (E ≪ V²) | Dense (E ≈ V²) | | Vertex count | Any | Small–medium (≤ ~10 000) | | Need max-flow | No | Yes (`pushRelabel`) | | Frequent `getWeight` | Not critical | Critical (O(1)) | | Memory budget | Constrained | Generous | | Default choice | **Yes** | Only when dense or max-flow needed | --- ## Creating and Editing Graphs ### Constructors ```java // n vertices, undirected Graph g = new AdjacencyList(5); Graph g = new AdjacencyMatrix(5); // n vertices, directed Graph g = new AdjacencyList(5, true); Graph g = new AdjacencyMatrix(5, true); ``` The `isDigraph()` method returns whether the graph is directed: ```java g.isDigraph(); // true / false ``` ### Adding Edges ```java // Unweighted (weight = 1.0) g.addEdge(source, target); // Weighted g.addEdge(source, target, 2.5); // Bulk add from a collection List edges = List.of( new Graph.Edge(0, 1, 1.0), new Graph.Edge(1, 2, 2.0) ); g.addEdges(edges); ``` For **undirected** graphs, `addEdge(u, v, w)` automatically sets both `graph[u][v]` and `graph[v][u]` to `w`. For **directed** graphs, only `graph[u][v]` is set. `setWeight(source, target, weight)` is the primitive underlying `addEdge`. Setting weight to `0.0` removes the edge (equivalent to `removeEdge`). ### Removing Edges ```java // Remove a single edge g.removeEdge(source, target); // Bulk remove g.removeEdges(edgeCollection); ``` Removal sets the edge weight to `0.0`. For undirected graphs, both directions are zeroed. ### Querying Edges and Degrees ```java g.hasEdge(u, v); // true if edge exists g.getWeight(u, v); // weight, or 0.0 if absent g.getDistance(u, v); // weight, or +Infinity if absent — use for TSP/shortest-path g.getDegree(v); // undirected: out-degree; directed: in + out g.getInDegree(v); // number of incoming edges g.getOutDegree(v); // number of outgoing edges List edges = g.getEdges(v); // all edges from v g.getVertexCount(); // number of vertices ``` **Functional edge operations:** ```java // Sum all weights from vertex 0 double total = g.mapEdges(0, (neighbor, w) -> w).sum(); // Scale all weights from vertex 0 by 2 g.updateEdges(0, (neighbor, w) -> w * 2.0); // Accumulate with a side effect double[] sum = {0.0}; g.forEachEdge(0, (neighbor, w) -> sum[0] += w); ``` **The `Graph.Edge` record:** ```java record Edge(int u, int v, double weight) ``` - `u()` — source (tail for directed) - `v()` — target (head for directed) - `weight()` — edge weight (`1.0` for unweighted) - `compareTo(Edge)` — orders by weight (natural order for priority queues) - Unweighted constructor: `new Graph.Edge(u, v)` sets `weight = 1.0` --- ## Traversal ### Depth-First Search (DFS) `dfs(VertexVisitor)` visits all vertices reachable from each unvisited root, performing a recursive post-order style traversal. The entire graph is covered, including disconnected components. ```java List order = new ArrayList<>(); g.dfs(order::add); // order contains all vertices in DFS visit order ``` ### Breadth-First Search (BFS) `bfs(VertexVisitor)` visits vertices in level-order (shortest hop-count from each component root first). ```java List order = new ArrayList<>(); g.bfs(order::add); ``` ### VertexVisitor `VertexVisitor` is a `@FunctionalInterface` with a single method: ```java void accept(int vertex); ``` Any lambda or method reference accepting an `int` can be used: ```java g.dfs(v -> System.out.println("Visiting " + v)); g.bfs(results::add); ``` --- ## Connected Components Two algorithms are provided; both return a `int[][]` where each row is a sorted array of vertex indices belonging to the same component. ```java // DFS-based (recursive — may StackOverflow on very large graphs) int[][] cc = g.dfcc(); // BFS-based (iterative — safe for large graphs) int[][] cc = g.bfcc(); ``` Both methods require an **undirected** graph; calling them on a digraph throws `UnsupportedOperationException`. ```java int[][] cc = g.bfcc(); System.out.printf("%d components%n", cc.length); for (int[] component : cc) { System.out.println(Arrays.toString(component)); } ``` **Single vertex:** A graph with no edges produces one component per vertex: ```java AdjacencyList g = new AdjacencyList(4); // no edges int[][] cc = g.bfcc(); // cc.length == 4, each cc[i] == {i} ``` --- ## Topological Sort Topological sort is only defined for **directed** graphs (digraphs). Calling these on an undirected graph throws `UnsupportedOperationException`. ```java // DFS-based reverse post-order (standard topological sort) int[] order = g.dfsort(); // BFS-based (Kahn's algorithm) int[] order = g.bfsort(); ``` `bfsort()` returns `-1` for vertices that are not reachable in topological order (i.e., those involved in a cycle). `dfsort()` always produces a complete ordering. ```java Graph dag = new AdjacencyList(13, true); dag.addEdge(0, 1); dag.addEdge(0, 5); dag.addEdge(0, 6); dag.addEdge(2, 0); dag.addEdge(2, 3); dag.addEdge(3, 5); // ... int[] topo = dag.bfsort(); // Process vertices in topological order for (int v : topo) { if (v >= 0) process(v); } ``` --- ## Shortest Paths — Dijkstra Dijkstra's algorithm computes shortest paths on non-negative weighted graphs. ```java // Single-source (weighted) double[] dist = g.dijkstra(source); // dist[v] = shortest distance from source to v, or +Infinity if unreachable // Single-source (unweighted = hop count) double[] hops = g.dijkstra(source, false); // All-pairs (calls dijkstra(i) for every vertex i) double[][] allPairs = g.dijkstra(); ``` ```java Graph g = new AdjacencyList(6, true); g.addEdge(0, 1, 0.41); g.addEdge(0, 5, 0.29); g.addEdge(5, 4, 0.21); g.addEdge(5, 1, 0.29); g.addEdge(1, 4, 0.32); g.addEdge(1, 2, 0.51); g.addEdge(4, 2, 0.32); g.addEdge(4, 3, 0.36); g.addEdge(2, 3, 0.50); g.addEdge(3, 5, 0.38); g.addEdge(3, 0, 0.45); double[] d = g.dijkstra(0); // d[0]=0.00, d[1]=0.41, d[2]=0.82, d[3]=0.86, d[4]=0.50, d[5]=0.29 ``` > **Note:** Dijkstra does not handle negative edge weights. Use Bellman-Ford > for graphs with negative edges. --- ## Minimum Spanning Tree — Prim Prim's algorithm finds the **minimum spanning tree** (MST) of a weighted undirected graph. For disconnected graphs it returns the **minimum spanning forest** (one tree per component). Calling on a digraph throws `UnsupportedOperationException`. ```java List mst = new ArrayList<>(); double totalCost = g.prim(mst); // mst contains the n-1 edges of the spanning tree (or forest) // totalCost is the sum of their weights ``` Pass `null` instead of a list if you only need the total cost: ```java double cost = g.prim(null); ``` ```java Graph g = new AdjacencyList(6); // ... add edges ... List mst = new ArrayList<>(); double cost = g.prim(mst); System.out.printf("MST cost = %.2f, edges = %d%n", cost, mst.size()); for (var e : mst) { System.out.printf(" %d -- %d (%.2f)%n", e.u(), e.v(), e.weight()); } ``` > **Complexity:** O(V²) — suitable for dense graphs. For very sparse graphs > a priority-queue version would be O((V + E) log V), but Prim's current > implementation uses a simple scan. --- ## Maximum Flow — Push-Relabel Maximum flow is available **only on `AdjacencyMatrix`** via the push-relabel (highest-label) algorithm. ```java AdjacencyMatrix g = new AdjacencyMatrix(n, true); g.addEdge(source, sink, capacity); // ... add capacity edges ... double[][] flow = new double[n][n]; // output: flow on each edge double maxFlow = g.pushRelabel(flow, source, sink); ``` `flow[u][v]` contains the net flow on arc `u → v` after the call; negative values on the reverse arc represent the return flow. The method returns the total maximum flow value. ```java AdjacencyMatrix g = new AdjacencyMatrix(6, true); g.addEdge(0, 1, 2); g.addEdge(0, 2, 9); g.addEdge(1, 2, 1); g.addEdge(2, 4, 7); g.addEdge(4, 5, 4); g.addEdge(3, 5, 7); double[][] flow = new double[6][6]; double maxFlow = g.pushRelabel(flow, 0, 5); // maxFlow == 4.0 ``` > **Complexity:** O(V² √E) — the highest-label push-relabel variant. --- ## Travelling Salesman Problem (TSP) The graph package provides both **exact** and **heuristic** TSP solvers, plus a 2-opt local improvement pass. All TSP methods return an `int[]` tour of length `n + 1` where `tour[0] == tour[n] == 0` (the route starts and ends at vertex 0). Use `getPathDistance(tour)` to evaluate tour cost: ```java int[] tour = g.heldKarp(); double cost = g.getPathDistance(tour); ``` ### Exact: Held-Karp Dynamic-programming exact TSP solver. Optimal for graphs up to **31 vertices**. ```java int[] tour = g.heldKarp(); // O(2ⁿ n²) — feasible up to n ≈ 20–25 ``` - Memory: O(2ⁿ n) - Guaranteed optimal - Throws `UnsupportedOperationException` if `n > 31` ### Exact: Branch and Bound Branch-and-bound exact solver seeded with the nearest-insertion heuristic. ```java int[] tour = g.tsp(); // best-first search with MST lower bound pruning ``` - Optimal in theory; practical for n up to ~15–20 depending on graph density - Uses MST lower bound when more than 5 vertices remain in the subproblem - Logs the final tour cost at `INFO` level via SLF4J ### Heuristic: Nearest Insertion Builds the tour by repeatedly inserting the **nearest** unvisited city. ```java int[] tour = g.nearestInsertion(); // O(n²) ``` Typically gives tours within 20–25% of optimal on random instances. ### Heuristic: Farthest Insertion Builds the tour by repeatedly inserting the **farthest** unvisited city. ```java int[] tour = g.farthestInsertion(); // O(n²) ``` Often produces better tours than nearest insertion on structured instances. ### Heuristic: Arbitrary Insertion Inserts cities in their natural index order (1, 2, 3, …), choosing the cheapest insertion position each time. ```java int[] tour = g.arbitraryInsertion(); // O(n²) ``` ### Heuristic: Christofides A classical 3/2-approximation algorithm for metric TSP instances: 1. Compute a minimum spanning tree (Prim). 2. Find odd-degree vertices in the MST. 3. Apply greedy minimum-weight perfect matching on odd-degree vertices. 4. Form a multigraph from MST + matching; find an Eulerian circuit. 5. Shortcut the Eulerian circuit to a Hamiltonian path. ```java int[] tour = g.christofides(); // O(n²); guaranteed ≤ 1.5× OPT on metric instances ``` ### Local Search: 2-opt Improves any existing tour by repeatedly swapping pairs of non-adjacent edges whenever the swap reduces total cost. ```java // Start from a heuristic tour int[] tour = g.nearestInsertion(); // Improve with 2-opt (up to 10 outer iterations) double cost = g.opt2(tour, 10); // tour[] is modified in-place; cost is the new tour length ``` `opt2` throws `IllegalArgumentException` if `tour.length != n + 1`. ### TSP Algorithm Comparison | Algorithm | Type | Optimal? | Practical n | Notes | |---|---|---|---|---| | `heldKarp()` | Exact DP | Yes | ≤ 25 | O(2ⁿ n²), memory O(2ⁿ n) | | `tsp()` | Exact B&B | Yes | ≤ 20 | MST pruning; seeded with `nearestInsertion` | | `nearestInsertion()` | Heuristic | No | Any | Fast; ≈ 20% above OPT | | `farthestInsertion()` | Heuristic | No | Any | Often better than nearest | | `arbitraryInsertion()` | Heuristic | No | Any | Simplest; insertion order = 1,2,… | | `christofides()` | Heuristic | No | Any | ≤ 1.5× OPT on metric graphs | | `opt2(tour, k)` | Local search | No | Any | Post-process any tour | **Recommended workflow for large instances:** ```java // 1. Get a good starting tour int[] tour = g.farthestInsertion(); // 2. Improve locally double cost = g.opt2(tour, 100); System.out.printf("Tour cost after 2-opt: %.4f%n", cost); ``` --- ## Nearest Neighbor Graph `NearestNeighborGraph` is a `record` that builds and stores k-nearest neighbor graphs for datasets, returning an `AdjacencyList` for downstream graph algorithms. ```java record NearestNeighborGraph(int k, int[][] neighbors, double[][] distances, int[] index) ``` - `neighbors[i][j]` — index of the j-th nearest neighbor of point i - `distances[i][j]` — distance from point i to its j-th nearest neighbor - `index` — original dataset indices (useful when working with a subset) - `size()` — number of vertices (same as `neighbors.length`) The canonical constructor validates all fields: `k ≥ 1`, non-null arrays, and consistent lengths between `neighbors`, `distances`, and `index`. `k` must be ≥ 2 for all factory methods (`of`, `random`, `descent`). ### Exact k-NN Graph Computes the exact k nearest neighbors using a brute-force pairwise scan. ```java // Euclidean distance (double[][] data) NearestNeighborGraph knn = NearestNeighborGraph.of(data, k); // Custom distance NearestNeighborGraph knn = NearestNeighborGraph.of(data, MyDistance::d, k); ``` `k` must be ≥ 2. Complexity: O(n² · d) for n points in d dimensions. ### Approximate k-NN Graph: Random Builds a k-random-neighbor graph and extends it with reverse neighbors. Faster than exact but less accurate; useful as initialization for NN-Descent. ```java NearestNeighborGraph approx = NearestNeighborGraph.random(data, MathEx::distance, k); ``` ### Approximate k-NN Graph: NN-Descent The **NN-Descent** algorithm iteratively refines an approximate k-NN graph. The intuition: "a neighbor of my neighbor is also likely my neighbor." ```java // With Euclidean distance and random projection forest initialization NearestNeighborGraph knn = NearestNeighborGraph.descent(data, k); // With custom metric (no forest initialization) NearestNeighborGraph knn = NearestNeighborGraph.descent(data, distance, k); // Full control NearestNeighborGraph knn = NearestNeighborGraph.descent( data, k, numTrees, // number of random projection trees for seeding leafSize, // max leaf size in each tree maxCandidates, // max candidates per iteration maxIter, // max NN-Descent iterations delta // convergence threshold (fraction of k*n updates) ); ``` NN-Descent achieves near-exact quality with O(n^1.14) empirical complexity on many real datasets (vs O(n²) for brute force). **Algorithm parameters:** | Parameter | Default | Effect | |---|---|---| | `numTrees` | 5 | More trees → better initialization → faster convergence | | `leafSize` | k | Larger leaves → more initial candidates | | `maxCandidates` | 50 | More candidates per iteration → higher quality, slower | | `maxIter` | 50 | More iterations → higher quality, slower | | `delta` | 0.001 | Smaller → more iterations; larger → early stop | ### Largest Connected Component Returns a new `NearestNeighborGraph` restricted to the vertices of the largest connected component, with neighbor indices remapped into the subgraph's local coordinate space (`[0, largest.size())`). ```java NearestNeighborGraph knn = NearestNeighborGraph.descent(data, k); NearestNeighborGraph largest = knn.largest(false); // false = undirected ``` - If the graph is already fully connected, the **same object** is returned (`assertSame` holds). - The returned graph has `index.length` vertices (not the original `n`), so `largest.size() ≤ knn.size()`. - `largest.index()[i]` gives the original dataset index for vertex `i` in the filtered subgraph. - All neighbor indices in the result are in the range `[0, largest.size())`. - Logs the number of components and the size of the largest at `INFO` level. ### Converting to AdjacencyList ```java NearestNeighborGraph knn = NearestNeighborGraph.of(data, k); AdjacencyList undirected = knn.graph(false); // undirected (mutual edges) AdjacencyList directed = knn.graph(true); // directed (only i → neighbor[i][j]) ``` **Full pipeline example:** ```java import smile.graph.*; import smile.math.MathEx; double[][] data = /* n × d dataset */; // Build approximate 7-NN graph NearestNeighborGraph knn = NearestNeighborGraph.descent(data, 7); // Keep only the largest connected component NearestNeighborGraph largest = knn.largest(false); System.out.printf("Keeping %d / %d points%n", largest.size(), data.length); // Convert to AdjacencyList for further analysis AdjacencyList graph = largest.graph(false); // Compute connected components (should be 1) int[][] cc = graph.bfcc(); System.out.println("Components: " + cc.length); ``` --- ## Matrix Conversion Both graph implementations can be converted to their matrix form: ```java // AdjacencyList → SparseMatrix SparseMatrix m = ((AdjacencyList) g).toMatrix(); // AdjacencyMatrix → DenseMatrix Matrix m = ((AdjacencyMatrix) g).toMatrix(); // AdjacencyMatrix → raw double[][] (defensive copy) double[][] arr = ((AdjacencyMatrix) g).toArray(); ``` Conversion from a sparse matrix back to a graph: ```java // Symmetric → undirected; asymmetric/rectangular → bipartite (nrow+ncol vertices) AdjacencyList g = AdjacencyList.of(sparseMatrix); ``` **Subgraph extraction:** ```java int[] vertices = {1, 3, 7}; Graph sub = g.subgraph(vertices); // sub has vertices.length vertices, with indices 0, 1, 2, … // mapped to vertices[0], vertices[1], vertices[2], … in the original graph ``` --- ## Graphviz Visualization Both graph types support export to [Graphviz](https://graphviz.org/) DOT format. Paste the output at https://viz-js.com/ to render the graph. ```java // Simple DOT output String dot = g.dot(); // With graph name and vertex labels String[] labels = {"Alice", "Bob", "Carol", "Dave"}; String dot = g.dot("Friends", labels); System.out.println(dot); ``` Example output for a directed graph: ```dot digraph { node [shape=box, style="rounded", color="black", fontname=helvetica]; edge [fontname=helvetica]; 1 -> 2 [label="1"]; 2 -> 3 [label="0.5"]; } ``` Edge labels display the weight. For unweighted edges the label is `"1"`. The `Strings.format()` utility is used for compact floating-point formatting. --- ## Mathematical Background ### Graph Definitions A **graph** G = (V, E) consists of a vertex set V and an edge set E ⊆ V × V. - **Directed graph (digraph):** edges are ordered pairs (u, v); only `u → v` exists, not necessarily `v → u`. - **Undirected graph:** edges are unordered pairs {u, v}; both directions are stored symmetrically. - **Weighted graph:** each edge carries a numerical weight `w(u, v) > 0`. - **Simple graph:** no self-loops, no multi-edges. ### Dijkstra's Algorithm For a graph with non-negative weights, Dijkstra maintains a priority queue of tentative distances and relaxes edges greedily: ``` dist[s] = 0; dist[v] = ∞ for all v ≠ s while queue not empty: u = vertex with minimum dist[u] for each neighbor v of u: if dist[u] + w(u,v) < dist[v]: dist[v] = dist[u] + w(u,v) ``` ### Prim's MST Algorithm Prim grows the MST from vertex 0 by always adding the cheapest edge that connects a tree vertex to a non-tree vertex: ``` inMST[0] = true; minEdge[0] = 0; minEdge[v] = ∞ for v ≠ 0 repeat n times: u = non-tree vertex with minimum minEdge[u] add u to MST; totalCost += minEdge[u] for each neighbor v of u: minEdge[v] = min(minEdge[v], w(u,v)) ``` For disconnected graphs, when no reachable non-tree vertex exists the algorithm seeds the next unvisited vertex, producing a **minimum spanning forest**. ### Held-Karp TSP Held-Karp uses bitmask dynamic programming. `dp[S][v]` = minimum cost to visit exactly the vertices in set S, ending at vertex v, starting from 0: ``` dp[{0}][0] = 0 dp[S][v] = min over u in S\{v} of: dp[S\{v}][u] + dist(u, v) OPT = min over v ≠ 0 of: dp[{all}][v] + dist(v, 0) ``` Memory: O(2ⁿ × n); time: O(2ⁿ × n²). Feasible for n ≤ 25. ### NN-Descent NN-Descent exploits the transitivity of nearest-neighbor relations: if j is close to i, and k is close to j, then k may also be close to i. Each iteration evaluates "neighbor-of-neighbor" candidate pairs and updates the k-NN heaps, stopping when the fraction of updates falls below `delta × k × n`. --- ## References 1. Robert Sedgewick and Kevin Wayne. *Algorithms, 4th Edition.* Addison-Wesley, 2011. 2. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. *Introduction to Algorithms, 3rd Edition.* MIT Press, 2009. 3. Held, M. and Karp, R. M. *A dynamic programming approach to sequencing problems.* Journal of the Society for Industrial and Applied Mathematics, 10(1):196–210, 1962. 4. Wei Dong, Charikar Moses, and Kai Li. *Efficient k-nearest neighbor graph construction for generic similarity measures.* Proceedings of the 20th International Conference on World Wide Web (WWW), 2011. 5. Andrew V. Goldberg and Robert E. Tarjan. *A new approach to the maximum-flow problem.* Journal of the ACM, 35(4):921–940, 1988. 6. Nicos Christofides. *Worst-case analysis of a new heuristic for the travelling salesman problem.* Technical Report, Carnegie Mellon University, 1976. --- *SMILE — © 2010-2026 Haifeng Li. GNU GPL licensed.*