---
name: infrastructure
description: |
  Manage NixOS infrastructure for this nix flake project. Deploy configurations
  with Colmena, manage Proxmox LXC containers, troubleshoot services, and
  maintain servers.

  Use when:
  (1) Deploying NixOS configurations with colmena,
  (2) Managing Proxmox LXC containers (start, stop, reboot, status),
  (3) Troubleshooting server issues via SSH or pct exec,
  (4) Checking service status across hosts,
  (5) Any infrastructure maintenance task.

  IMPORTANT architecture notes:
  - All servers other than the Proxmox nodes themselves are Proxmox LXC containers.
---

# Infrastructure Management

## Quick Reference

### Deploy with Colmena

```bash
# Single host
colmena apply --on <host> --impure

# Multiple hosts (comma-separated)
colmena apply --on host1,host2,host3 --impure

# Build only (no deploy)
colmena build --on <host> --impure
```

### Proxmox Container Management

SSH to the Proxmox host first, then use `pct`:

```bash
# List containers on a host
ssh <proxmox-host> "pct list"

# Container status
ssh <proxmox-host> "pct status <vmid>"
ssh <proxmox-host> "pct status <vmid> --verbose"

# Start/stop/reboot
ssh <proxmox-host> "pct start <vmid>"
ssh <proxmox-host> "pct stop <vmid>"
ssh <proxmox-host> "pct reboot <vmid>"

# Execute a command in the container (use absolute paths: NixOS keeps
# system binaries under /run/current-system/sw/bin)
ssh <proxmox-host> "pct exec <vmid> -- /run/current-system/sw/bin/<command>"

# Common commands via pct exec
ssh <proxmox-host> "pct exec <vmid> -- /run/current-system/sw/bin/systemctl status <service>"
ssh <proxmox-host> "pct exec <vmid> -- /run/current-system/sw/bin/journalctl -u <service> -n 50"
```

## Server Inventory

### Proxmox Hosts

| Host | Description |
|------|-------------|
| thrall | Proxmox cluster node |
| sylvanas | Proxmox cluster node |
| voljin | Proxmox cluster node |

### Proxmox LXC Containers

All other hosts, apart from the `fredpc` workstation below, are LXC containers. Run `pct list` on a Proxmox host to see their VMIDs.

Common hosts: gitea-runner-1/2/3, prometheus, grafana, uptime-kuma, sonarqube, jellyseerr, prowlarr, n8n, minio, scanner, external-metrics, ironforge (gitea, woodpecker, paperless, calibre, nixarr, resume)

### NixOS Workstation Services

- `fredpc`: glance dashboard (native NixOS module, port 8084)

## Troubleshooting Workflows

### Container Won't Respond

1. Check status: `ssh <proxmox-host> "pct status <vmid> --verbose"`
2. If it is running but commands fail, reboot it: `ssh <proxmox-host> "pct reboot <vmid>"`
3. Wait 15-30 seconds, then verify: `ssh <proxmox-host> "pct status <vmid>"`
4. Redeploy if needed: `colmena apply --on <host> --impure`

### Service Not Working

1. Check the service status:
   ```bash
   ssh <proxmox-host> "pct exec <vmid> -- /run/current-system/sw/bin/systemctl status <service>"
   ```
2. Check the logs:
   ```bash
   ssh <proxmox-host> "pct exec <vmid> -- /run/current-system/sw/bin/journalctl -u <service> -n 100"
   ```
3. Restart the service:
   ```bash
   ssh <proxmox-host> "pct exec <vmid> -- /run/current-system/sw/bin/systemctl restart <service>"
   ```

### Podman/Container Issues

Check the socket status:

```bash
ssh <proxmox-host> "pct exec <vmid> -- /run/current-system/sw/bin/systemctl status podman.socket"
```

List all containers, including stopped ones:

```bash
ssh <proxmox-host> "pct exec <vmid> -- /run/current-system/sw/bin/podman ps -a"
```

### SSH Connection Issues

If colmena fails with SSH errors:

1. Verify the container is running on Proxmox: `ssh <proxmox-host> "pct status <vmid>"`
2. Check whether SSH is listening on port 22: `ssh <proxmox-host> "pct exec <vmid> -- /run/current-system/sw/bin/ss -tlnp" | grep ':22 '`
3. Reboot the container if necessary.

## Common Colmena Patterns

### Deploy All Gitea Runners

```bash
colmena apply --on gitea-runner-1,gitea-runner-2,gitea-runner-3 --impure
```

### Deploy Monitoring Stack

```bash
colmena apply --on prometheus,grafana --impure
```

### Update Secrets Before Deploy

```bash
just update-secrets
colmena apply --on <host> --impure
```

## File Locations

| Purpose | Path |
|---------|------|
| Colmena host configs | `colmena/hosts/<hostname>.nix` |
| NixOS host configs | `modules/nixos/host/<hostname>/configuration.nix` |
| Application configs | `apps/<app>.nix` |
| Secrets configs | `modules/secrets/<name>.nix` |
| Container image SHAs | `apps/fetcher/containers-sha.nix` |
| Container definitions | `apps/fetcher/containers.toml` |

## Related Skills

- **provision-nixos-server**: Creates new servers from scratch. Use the `/provision-nixos-server` skill instead of this one when creating new hosts.
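## Helper Sketch: Running Commands in a Container

The troubleshooting commands in this skill all repeat the same `ssh <proxmox-host> "pct exec <vmid> -- /run/current-system/sw/bin/..."` prefix. The function below is a minimal sketch that wraps that pattern. It assumes you have SSH access to the Proxmox host and that the container runs NixOS. The function name `pct_run` and the `DRY_RUN` switch are hypothetical and not part of this repository. Treat it as a starting point, not an existing tool.

```bash
# Minimal sketch of a helper for the `pct exec` pattern (hypothetical; source this file).
# Assumes: SSH access to the Proxmox host, and a NixOS container whose system
# binaries live under /run/current-system/sw/bin.
NIXOS_BIN=/run/current-system/sw/bin

# pct_run <proxmox-host> <vmid> <command> [args...]
# Runs <command> inside container <vmid> by SSHing to <proxmox-host>.
# With DRY_RUN=1 (hypothetical switch), prints the ssh command instead of running it.
pct_run() {
  if (( $# < 3 )); then
    echo "usage: pct_run <proxmox-host> <vmid> <command> [args...]" >&2
    return 2
  fi
  local host=$1 vmid=$2 cmd=$3
  shift 3

  # The Proxmox host's shell parses this string, so each argument is
  # shell-quoted with printf %q to survive that extra round of parsing.
  local remote="pct exec ${vmid} -- ${NIXOS_BIN}/${cmd}"
  local arg
  for arg in "$@"; do
    remote+=" $(printf '%q' "$arg")"
  done

  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf 'ssh %s "%s"\n' "$host" "$remote"
  else
    ssh "$host" "$remote"
  fi
}
```

After sourcing the file, run for example `pct_run <proxmox-host> <vmid> journalctl -u <service> -n 100`. Prefixing a call with `DRY_RUN=1` prints the command without running it. This lets you check quoting before touching a live container.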
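## Helper Sketch: Waiting for a Container After Reboot

The "Container Won't Respond" workflow waits a fixed 15-30 seconds after `pct reboot` before checking `pct status`. A polling loop removes the need to guess that delay. This is a minimal sketch. `wait_until` and `container_running` are hypothetical names, and `container_running` assumes that `pct status <vmid>` prints a line of the form `status: running`.

```bash
# Minimal sketch (hypothetical helpers; source this file): poll instead of
# sleeping a fixed 15-30 seconds after `pct reboot`.

# wait_until <timeout-seconds> <interval-seconds> <command> [args...]
# Re-runs <command> every <interval> seconds until it succeeds.
# Returns 0 on success, or 1 if <timeout> seconds pass first.
wait_until() {
  local timeout=$1 interval=$2
  shift 2
  local deadline=$((SECONDS + timeout))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep "$interval"
  done
  return 0
}

# container_running <proxmox-host> <vmid>
# Succeeds if Proxmox reports the container as running. Assumes
# `pct status` prints a line like "status: running".
container_running() {
  ssh "$1" "pct status $2" | grep -q '^status: running'
}
```

A typical call is `ssh <proxmox-host> "pct reboot <vmid>" && wait_until 60 5 container_running <proxmox-host> <vmid>`. A `running` status only means the container has started. SSH and the services inside may need a few more seconds, so a follow-up `colmena apply` can still need a retry.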