---
name: generating-ovn-topology
description: Generates and displays OVN-Kubernetes network topology diagrams showing logical switches, routers, and ports with IP/MAC addresses in Mermaid format
tools: [Bash, Read, Write]
---

# Quick Start - OVN Topology Generation

**IMMEDIATE ACTIONS** (follow these steps in order):

1. **Detect Cluster**: Find the OVN-Kubernetes cluster kubeconfig

   Run: `scripts/detect-cluster.sh 2>/dev/null`

   The script discovers OVN-Kubernetes clusters:
   - Scans all kubeconfig files: current KUBECONFIG env, ~/.kube/kind-config, ~/ovn.conf, ~/.kube/config
   - Tests ALL contexts in each kubeconfig (not just current-context)
   - Returns a parseable list to stdout: `index|kubeconfig|cluster_name|node_count|namespace`
   - Diagnostics go to stderr
   - Exit code: 0=success, 1=no clusters found

   **How to handle the output:**

   The script returns pipe-delimited lines to stdout, one per cluster found, e.g.:

   ```text
   1|/home/user/.kube/kind-config|kind-ovn|3|ovn-kubernetes
   2|/home/user/.kube/config|prod-cluster|12|openshift-ovn-kubernetes
   ```

   **Decision logic:**
   - If **one cluster** is found → use it automatically (extract the kubeconfig path from column 2)
   - If **multiple clusters** are found → show the list to the user and ask them to choose by number
   - After selection, extract the kubeconfig path from column 2 of the chosen line
   - Store the selected kubeconfig path in variable `KC` for use in subsequent steps

   **Example output format parsing:**
   - Column 1: Index number (for user selection)
   - Column 2: Kubeconfig file path (this is what you need for `$KC`)
   - Column 3: Cluster display name
   - Column 4: Number of nodes
   - Column 5: OVN namespace name

   **Important**: Parse the output using standard text processing. The exact implementation is up to you - use whatever approach works best (awk, Python, inline parsing, etc.).
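The decision logic above can be sketched in Python (one valid choice, per the note that the implementation is up to you). `parse_clusters` and the sample output are illustrative, not part of the skill's helper scripts:

```python
# Hypothetical sketch: parse the pipe-delimited stdout of detect-cluster.sh
# and pick $KC. Field layout follows the documented format:
# index|kubeconfig|cluster_name|node_count|namespace
def parse_clusters(output):
    clusters = []
    for line in output.strip().splitlines():
        index, kubeconfig, name, nodes, namespace = line.split("|")
        clusters.append({
            "index": int(index),
            "kubeconfig": kubeconfig,
            "name": name,
            "nodes": int(nodes),
            "namespace": namespace,
        })
    return clusters

sample = (
    "1|/home/user/.kube/kind-config|kind-ovn|3|ovn-kubernetes\n"
    "2|/home/user/.kube/config|prod-cluster|12|openshift-ovn-kubernetes\n"
)
clusters = parse_clusters(sample)
if len(clusters) == 1:
    KC = clusters[0]["kubeconfig"]  # single cluster: use it automatically
else:
    # Multiple clusters: show the list and ask the user to choose by number.
    # Here we assume the user picked cluster 1.
    KC = clusters[0]["kubeconfig"]
```

Any equivalent awk or inline-parsing approach is equally acceptable; the only contract is column 2 → `$KC`.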
2. **Check Permissions**: Verify the user's Kubernetes access level and inform about write permissions

   Run: `scripts/check_permissions.py "$KC"`

   The script returns:
   - **Exit 0**: Read-only access or user confirmed → proceed
   - **Exit 1**: Error or user cancelled → stop
   - **Exit 2**: Write permissions detected → AI must ask the user for confirmation

   **When exit code 2 is returned:**
   1. Parse the stdout to get the list of write permissions
   2. Display the permissions clearly to the user in a formatted message
   3. Explain that:
      - This skill performs ONLY read-only operations
      - No cluster modifications will be made
      - The warning is for transparency about their access level
      - List read-only operations: kubectl get, kubectl exec (ovn-nbctl list), local file writes
      - List forbidden operations: kubectl create/delete/patch, ovn-nbctl modifications
   4. **Ask the user explicitly**: "You have cluster admin permissions. This command will only perform read-only operations. Do you want to proceed?"
   5. If the user says yes → continue; if no → stop

   **Example of proper user communication:**

   ```text
   ⚠️ WARNING: Write Permissions Detected

   Your kubeconfig has cluster admin permissions:
   • Delete pods, deployments, services
   • Create and modify resources
   • Full cluster access

   📋 IMPORTANT: This command will ONLY perform read-only operations:
   ✅ kubectl get (pods, nodes)
   ✅ kubectl exec (to run read-only ovn-nbctl list commands)
   ✅ Local file writes (topology diagram)

   Operations that will NEVER be performed:
   ❌ kubectl create/delete/patch/apply
   ❌ ovn-nbctl modifications
   ❌ Any cluster state changes

   Do you want to proceed with read-only topology generation?
   ```

   **Security Note**: This step ensures informed consent. The user must be explicitly aware that their cluster admin credentials are accessible to the AI agent (acting on their behalf), even though only read-only operations will be performed. This transparency is critical for security and trust.
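The exit-code handling can be reduced to a small dispatch table; a minimal sketch, where `classify_exit` is a hypothetical helper and only the three documented exit codes are assumed:

```python
# Map check_permissions.py exit codes to the documented actions.
def classify_exit(returncode):
    if returncode == 0:
        return "proceed"    # read-only access, or user already confirmed
    if returncode == 2:
        return "ask_user"   # write perms: show stdout, get explicit consent
    return "stop"           # exit 1 (error/cancelled) or anything unexpected

# Typical driver (not executed here):
#   import subprocess
#   result = subprocess.run(["scripts/check_permissions.py", KC],
#                           capture_output=True, text=True)
#   action = classify_exit(result.returncode)
```

Treating unknown codes as "stop" keeps the failure mode conservative.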
3. **Check Output File**: If `ovn-topology-diagram.md` exists, ask the user to choose:
   - (1) Overwrite, (2) Custom path, (3) Timestamp, (4) Cancel

4. **Create Private Temp Directory**: Create a private temporary directory using `mktemp -d` and use it for all temporary files.

   ```bash
   TMPDIR=$(mktemp -d)
   ```

5. **Collect OVN Data**: Get full topology data from the cluster

   Run: `scripts/collect_ovn_data.py "$KC" "$TMPDIR"`

   Detail files written to `$TMPDIR`:
   - `ovn_switches_detail.txt` - node|uuid|name|other_config
   - `ovn_routers_detail.txt` - node|uuid|name|external_ids|options
   - `ovn_lsps_detail.txt` - node|name|addresses|type|options
   - `ovn_lrps_detail.txt` - node|name|mac|networks|options
   - `ovn_pods_detail.txt` - namespace|name|ip|node

6. **Analyze Placement**: Determine per-node vs cluster-wide components

   Run: `scripts/analyze_placement.py "$TMPDIR"`

   Placement results written to `$TMPDIR`:
   - `ovn_switch_placement.txt` - name|placement (per-node|cluster-wide|cluster-wide-visual)
   - `ovn_router_placement.txt` - name|placement (per-node|cluster-wide|cluster-wide-visual)

7. **Generate Diagram**: Create a Mermaid `graph BT` diagram
   - Read `$TMPDIR/ovn_switch_placement.txt` to determine where each switch goes
   - Read `$TMPDIR/ovn_router_placement.txt` to determine where each router goes
   - Read the detail files directly (ovn_switches_detail.txt, ovn_routers_detail.txt, etc.)
   - Skip the UUID column when parsing the switches/routers detail files
   - If placement is `per-node` → put inside the node subgraph
   - If placement is `cluster-wide` or `cluster-wide-visual` → put outside subgraphs
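Reading the placement files and splitting components into inside/outside groups is a few lines; a minimal sketch, assuming the documented `name|placement` format (`load_placement` and `split_by_scope` are illustrative helpers, not part of the skill):

```python
# Parse a name|placement file written by analyze_placement.py.
def load_placement(lines):
    placements = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, placement = line.split("|")
        placements[name] = placement
    return placements

# per-node goes inside a node subgraph; cluster-wide and
# cluster-wide-visual are drawn outside all subgraphs.
def split_by_scope(placements):
    inside = [n for n, p in placements.items() if p == "per-node"]
    outside = [n for n, p in placements.items() if p != "per-node"]
    return inside, outside

sample = ["join|per-node", "transit_switch|cluster-wide-visual"]
inside, outside = split_by_scope(load_placement(sample))
```

The same split drives where each switch or router is emitted in the Mermaid output.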
8. **Save & Report**: Write the diagram to the file, show a summary, and clean up temporary files

**CRITICAL RULES**:
- ❌ NO codebase searching for IPs/MACs
- ❌ NO synthetic/example data
- ❌ NO inline multi-line bash (use helper scripts)
- ❌ NO direct kubectl commands (must use helper scripts only)
- ✅ Use helper scripts for all kubectl interactions and architecture discovery
- ✅ **For helper scripts only**: If kubectl is required, use `KUBECONFIG="$KC" kubectl --kubeconfig="$KC"`
- ✅ **SECURITY**: Create a private temp directory with `TMPDIR=$(mktemp -d)` - never use `/tmp` directly
- ✅ Temporary files go in `$TMPDIR` (the private directory created with `mktemp -d`)
- ✅ Clean up temporary files when done: `rm -rf "$TMPDIR"`

## Safety & Security Guarantees

### 🔒 Read-Only Operations

This skill performs **ONLY read-only operations** against your Kubernetes cluster. No cluster state is modified.

**Allowed Operations:**
- ✅ `kubectl get` - Query resources
- ✅ `kubectl exec ... ovn-nbctl list` - Query the OVN database (read-only)
- ✅ Local file writes (temporary files in `$TMPDIR`, output diagram)

**Forbidden Operations (NEVER used):**
- ❌ `kubectl create/apply/delete/patch` - No resource modifications
- ❌ `kubectl scale/drain/cordon` - No cluster operations
- ❌ `ovn-nbctl create/set/add/remove/destroy` - No OVN modifications
- ❌ No pod restarts or service disruptions

**Privacy Consideration**: The generated diagram contains network topology information. Control sharing appropriately based on your security policies.
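The private-temp-directory rule has a direct Python analogue if a helper script needs its own scratch space: `tempfile.mkdtemp` creates a mode-0700 directory, so other local users cannot read the collected topology data. `with_private_tmpdir` and the sample file below are illustrative, not part of the skill:

```python
import os
import shutil
import tempfile

# Run `work` inside a private temp directory, guaranteeing cleanup
# even on errors (the equivalent of rm -rf "$TMPDIR").
def with_private_tmpdir(work):
    tmpdir = tempfile.mkdtemp(prefix="ovn-topology-")
    try:
        return work(tmpdir)
    finally:
        shutil.rmtree(tmpdir)

def write_sample_detail(tmpdir):
    # Illustrative only: one line in the documented pods-detail format.
    path = os.path.join(tmpdir, "ovn_pods_detail.txt")
    with open(path, "w") as f:
        f.write("kube-system|coredns-abc|10.244.0.5|ovn-worker\n")
    return os.path.isdir(tmpdir)

created_ok = with_private_tmpdir(write_sample_detail)
```

Like `mktemp -d`, this never touches a shared `/tmp` path directly.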
---

# Architecture Concepts

## Interconnect Mode (Distributed NBDB)

In **interconnect mode**, each node runs its own NBDB with local copies of components:

**Per-Node Components** (different UUIDs on each node):
- `ovn_cluster_router` - Each node has its own cluster router instance
- `join` switch - Each node has its own join switch instance
- `transit_switch` - Each node has its own transit switch instance
- Node switches (e.g., `ovn-control-plane`, `ovn-worker`)
- External switches (e.g., `ext_ovn-control-plane`)
- Gateway routers (e.g., `GR_ovn-control-plane`)

**Visualization Overrides**:
- `transit_switch`: PER-NODE in reality → shown CLUSTER-WIDE for visualization clarity
- `join`: PER-NODE → kept PER-NODE (no override)

## Helper Scripts

All helper scripts are in the `scripts/` directory.

| Script | Purpose | Input | Output |
|--------|---------|-------|--------|
| [detect-cluster.sh](scripts/detect-cluster.sh) | Find the OVN cluster kubeconfig across all contexts. Scans multiple kubeconfig files and all their contexts. Returns a parseable list. | None | Parseable list to stdout: `index\|kubeconfig\|cluster\|nodes\|namespace`. Exit: 0=success, 1=none found |
| [check_permissions.py](scripts/check_permissions.py) | Check user permissions and warn if write access is detected. | KUBECONFIG path | Exit: 0=proceed, 1=cancelled/error, 2=write perms (needs user confirmation) |
| [collect_ovn_data.py](scripts/collect_ovn_data.py) | **Data collector**: Queries each node for all data, with **graceful degradation** (continues on node failures). Writes detail files. | KUBECONFIG path, TMPDIR | Detail files: `ovn_switches_detail.txt`, `ovn_routers_detail.txt`, `ovn_lsps_detail.txt`, `ovn_lrps_detail.txt`, `ovn_pods_detail.txt` |
| [analyze_placement.py](scripts/analyze_placement.py) | **Placement analyzer**: Analyzes UUID patterns from the detail files to determine per-node vs cluster-wide placement. | TMPDIR (reads detail files) | Placement files: `ovn_switch_placement.txt`, `ovn_router_placement.txt` |

---

# Diagram Generation Rules

## Structure

```mermaid
graph BT
    subgraph node1["Node: name (node_ip)"]
        direction BT

        %% LAYER 1 (Bottom): Pods and Management Ports
        POD_example["Pod: pod-name<br/>Namespace: ns<br/>IP: x.x.x.x"]
        MGMT["Management Port: k8s-node<br/>IP: x.x.x.x"]

        %% LAYER 2: Pod LSPs
        LSP_pod["LSP: namespace_pod-name<br/>MAC: xx:xx:xx:xx:xx:xx<br/>IP: x.x.x.x"]
        LSP_mgmt["LSP: k8s-node<br/>MAC: xx:xx:xx:xx:xx:xx<br/>IP: x.x.x.x"]

        %% LAYER 3: Node Switch
        LS_node["Logical Switch: node-name<br/>Subnet: x.x.x.x/24"]

        %% LAYER 4: Node Switch LSPs
        LSP_stor["LSP: stor-node<br/>Type: router"]

        %% LAYER 5: Cluster Router
        LR_cluster["Logical Router: ovn_cluster_router"]
        LRP_rtos["LRP: rtos-node<br/>MAC: xx:xx<br/>IP: x.x.x.x/24"]
        LRP_rtoj["LRP: rtoj-ovn_cluster_router<br/>IP: x.x.x.x/16"]
        LRP_rtots["LRP: rtots-node<br/>IP: x.x.x.x/16"]

        %% LAYER 6: Join Switch
        LS_join["Logical Switch: join"]
        LSP_jtor_cr["LSP: jtor-ovn_cluster_router<br/>Type: router"]
        LSP_jtor_gr["LSP: jtor-GR_node<br/>Type: router"]

        %% LAYER 7: Gateway Router
        LR_gr["Gateway Router: GR_node"]
        LRP_rtoj_gr["LRP: rtoj-GR_node<br/>IP: x.x.x.x/16"]
        LRP_rtoe["LRP: rtoe-GR_node<br/>IP: x.x.x.x/24"]

        %% LAYER 8: External Switch
        LS_ext["Logical Switch: ext_node"]
        LSP_etor["LSP: etor-GR_node<br/>Type: router"]
        LSP_breth0["LSP: breth0_node<br/>Type: localnet"]

        %% LAYER 9 (Top): External Network
        EXT_NET["External Network<br/>Physical bridge: breth0<br/>Node IP: x.x.x.x"]

        %% Connections (bottom-to-top flow)
        POD_example --> LSP_pod --> LS_node
        MGMT --> LSP_mgmt --> LS_node
        LS_node --> LSP_stor
        LSP_stor -.->|peer| LRP_rtos --> LR_cluster
        LR_cluster --> LRP_rtoj -.->|peer| LSP_jtor_cr --> LS_join
        LS_join --> LSP_jtor_gr -.->|peer| LRP_rtoj_gr --> LR_gr
        LR_gr --> LRP_rtoe -.->|peer| LSP_etor --> LS_ext
        LS_ext --> LSP_breth0 -.->|physical| EXT_NET
        LR_cluster --> LRP_rtots
    end

    %% Cluster-wide components (AFTER nodes to appear on top)
    %% Only include components with placement=cluster-wide or cluster-wide-visual
    %% Example: transit_switch in interconnect mode
    LS_cluster_component["Component Name<br/>Details"]

    %% Connections from nodes to cluster-wide components
    LRP_from_node -.->|connects to| LSP_cluster_port --> LS_cluster_component
```

## Key Requirements

1. **Graph Direction**: Always `graph BT` (bottom-to-top)
2. **Component Placement**: Determined by `$TMPDIR/ovn_*_placement.txt`
   - `per-node` → INSIDE the node subgraph
   - `cluster-wide` or `cluster-wide-visual` → OUTSIDE all subgraphs, **defined AFTER all node subgraphs**
   - **CRITICAL**: Define ALL cluster-wide components AFTER all nodes to position them at the TOP
     - Prevents connection lines from overlapping with node subgraphs
     - Applies to ANY component with `cluster-wide` or `cluster-wide-visual` placement
3. **Node Subgraphs**: Each physical node gets a subgraph with `direction BT`
   - **Node Ordering**: ALWAYS order nodes as: control-plane node first, then worker nodes sorted alphabetically by name
   - Title format: `"Node: {node_name} ({external_ip})"`
   - Get the external IP from the `rtoe-GR_{node}` router port network field
   - Use a different background color for each node for visual distinction
   - **CRITICAL: Define components in BOTTOM-TO-TOP order** (matches packet flow):
     1. **Bottom Layer**: Pods and Management Ports (traffic originates here)
     2. **Layer 2**: Pod LSPs
     3. **Layer 3**: Node Switch (where pods connect)
     4. **Layer 4**: Node Switch LSPs (stor, breth0)
     5. **Layer 5**: Cluster Router + Router Ports (rtos, rtoj, rtots)
     6. **Layer 6**: Join Switch + Join LSPs
     7. **Layer 7**: Gateway Router + Router Ports (rtoj, rtoe)
     8. **Layer 8**: External Switch + External LSPs (etor, breth0)
     9. **Top Layer**: External Network (physical bridge)
4. **Pod Representation**:
   - **CRITICAL**: Show ALL pods from `$TMPDIR/ovn_pods_detail.txt` as SEPARATE entities
   - **DO NOT** discover pods from LSPs - many pods (host-network) don't have individual LSPs
   - Pod format: `POD_{id}["Pod: {name}<br/>Namespace: {ns}<br/>IP: {ip}"]`
   - Connect pods to their respective LSPs:
     - **Host-network pods** (IP == node IP): `POD_{id} --> MGMT_{node} --> LSP_k8s_{node}`
     - **Pod-network pods** (IP in 10.244.x.x): `POD_{id} --> LSP_{namespace}_{podname}`
5. **Physical Network Layer**:
   - Show explicit external network entities per node
   - Format: `EXT_NET_{node}["External Network<br/>Physical bridge: breth0<br/>Node IP: {external_ip}"]`
   - Connect localnet LSPs to the external network: `LSP_breth0_{node} -.->|physical| EXT_NET_{node}`
6. **Colors** (apply using classDef with **color:#000** for black text):
   - Pods: `fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px,color:#000`
   - Switches: `fill:#fff3e0,stroke:#e65100,stroke-width:2px,color:#000`
   - Routers: `fill:#fff9c4,stroke:#f57f17,stroke-width:2px,color:#000`
   - LSPs (ALL types): `fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#000`
   - LRPs: `fill:#f3e5f5,stroke:#6a1b9a,stroke-width:2px,color:#000`
   - External Network: `fill:#e0e0e0,stroke:#424242,stroke-width:2px,color:#000`
   - Management Ports (MGMT entities only, not LSPs): `fill:#fff8e1,stroke:#f57c00,stroke-width:2px,color:#000`
   - **Node Subgraph Backgrounds** (use `style` statements, NOT classDef): Apply at the END of the diagram after all class assignments. Rotate through these 3 colors:
     - Node 1 (index 0): `style node1 fill:#e8f5e9,stroke:#2e7d32,stroke-width:3px,stroke-dasharray: 5 5,color:#000` (green)
     - Node 2 (index 1): `style node2 fill:#e1f5fe,stroke:#0277bd,stroke-width:3px,stroke-dasharray: 5 5,color:#000` (blue)
     - Node 3 (index 2): `style node3 fill:#fff3e0,stroke:#ef6c00,stroke-width:3px,stroke-dasharray: 5 5,color:#000` (orange)
     - Node 4+: use index % 3 to rotate the colors (0=green, 1=blue, 2=orange)
7. **Required Info**:
   - Pods: Name, Namespace, IP
   - Switches: Name, Subnet (from other_config), Gateway (optional)
   - LSPs: Name, MAC, IP (or Type + Router Port for router ports)
   - LRPs: Name, MAC, Network (IP/CIDR)
   - Routers: Name, Description
   - External Network: Physical bridge name (breth0), Node IP
8. **Connections**:
   - Solid arrows: Layer connections
   - Dashed arrows with `|peer|`: Peer port relationships

## Pod Discovery

**CRITICAL: Discover pods from `$TMPDIR/ovn_pods_detail.txt`, NOT from LSPs**

The file `$TMPDIR/ovn_pods_detail.txt` contains ALL running pods in the cluster (populated by collect_ovn_data.py).
Format: `namespace|pod_name|pod_ip|node_name`

**Pod-to-LSP Mapping Logic:**

1. **Read ALL pods** from `$TMPDIR/ovn_pods_detail.txt`
2. **For each pod**, determine which LSP it connects to:

   a. **Host-Network Pods** (pod IP == node IP, e.g., 10.89.0.10):
      - **Connect to the management port LSP**: `k8s-{node}`
      - These pods share the management port's IP address
      - **MUST be included in the diagram** despite not having individual LSPs

   b. **Pod-Network Pods** (pod IP in the pod network range, e.g., 10.244.x.x):
      - **Connect to an individual LSP**: `{namespace}_{pod-name-with-hash}`
      - LSP name format: `kube-system_coredns-674b8bbfcf-qhfrq`
      - Filter LSPs where `type=""` (empty string)
      - Extract the MAC and IP from the LSP addresses field

3. **Diagram representation**:

   ```mermaid
   POD_id["Pod: {name}<br/>Namespace: {ns}<br/>IP: {ip}"]
   POD_id --> LSP_id
   ```

**Special LSPs (NOT pods, treat as infrastructure):**
- `k8s-{node}`: Management port LSP (multiple host-network pods connect to this)
- `stor-{node}`: Router port (type="router")
- `breth0_{node}`: LocalNet port (type="localnet")
- `jtor-*`, `etor-*`, `tstor-*`: Router ports (type="router")

### Example: Control Plane Node with Host-Network Pods

```mermaid
%% 8 host-network pods sharing the management port
POD_etcd["Pod: etcd-ovn-control-plane<br/>Namespace: kube-system<br/>IP: 10.89.0.10"]
POD_apiserver["Pod: kube-apiserver-ovn-control-plane<br/>Namespace: kube-system<br/>IP: 10.89.0.10"]
POD_controller["Pod: kube-controller-manager-ovn-control-plane<br/>Namespace: kube-system<br/>IP: 10.89.0.10"]
POD_scheduler["Pod: kube-scheduler-ovn-control-plane<br/>Namespace: kube-system<br/>IP: 10.89.0.10"]
POD_ovnkube_cp["Pod: ovnkube-control-plane-ovn-control-plane<br/>Namespace: ovn-kubernetes<br/>IP: 10.89.0.10"]
POD_ovnkube_id["Pod: ovnkube-identity-ovn-control-plane<br/>Namespace: ovn-kubernetes<br/>IP: 10.89.0.10"]
POD_ovnkube_node["Pod: ovnkube-node-xyz<br/>Namespace: ovn-kubernetes<br/>IP: 10.89.0.10"]
POD_ovs_node["Pod: ovs-node-xyz<br/>Namespace: ovn-kubernetes<br/>IP: 10.89.0.10"]

%% Management port LSP (shared by all host-network pods)
MGMT_cp["Management Port: k8s-ovn-control-plane<br/>IP: 10.89.0.10"]
LSP_mgmt["LSP: k8s-ovn-control-plane<br/>MAC: xx:xx:xx:xx:xx:xx<br/>IP: 10.244.0.2"]

%% All host-network pods connect to the same management port
POD_etcd --> MGMT_cp
POD_apiserver --> MGMT_cp
POD_controller --> MGMT_cp
POD_scheduler --> MGMT_cp
POD_ovnkube_cp --> MGMT_cp
POD_ovnkube_id --> MGMT_cp
POD_ovnkube_node --> MGMT_cp
POD_ovs_node --> MGMT_cp
MGMT_cp --> LSP_mgmt --> LS_node
```

---

# Final Steps

1. Generate the complete Mermaid diagram following the structure above
2. Save it to the file chosen by the user
3. Show a summary: nodes, switches, routers, ports, mode
4. Clean up the temporary directory:

   ```bash
   rm -rf "$TMPDIR"
   ```

5. Tell the user to open the file in an IDE to view the rendered diagram
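The summary in step 3 amounts to counting entries in the detail files written to `$TMPDIR`; a minimal sketch, where `summarize` is a hypothetical helper operating on already-read file lines (the sample lines are illustrative):

```python
# Count non-empty lines, one entry per line in the detail-file format.
def count_entries(lines):
    return sum(1 for line in lines if line.strip())

# details maps a label to the lines of the matching detail file,
# e.g. "switches" -> lines of ovn_switches_detail.txt
def summarize(details):
    return {label: count_entries(lines) for label, lines in details.items()}

summary = summarize({
    "switches": ["node1|uuid1|join|...", "node1|uuid2|ovn-worker|..."],
    "routers": ["node1|uuid3|ovn_cluster_router|...|..."],
})
```

In practice the same counters would be fed from `ovn_switches_detail.txt`, `ovn_routers_detail.txt`, `ovn_lsps_detail.txt`, `ovn_lrps_detail.txt`, and `ovn_pods_detail.txt`.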