# 🧠 **Self-Hosted Agents – Deep Dive**

## **Agent Architecture, Security, Scaling & Pitfalls**

> A self-hosted agent is **your machine**, running **Microsoft’s agent software**, executing **pipeline jobs under your responsibility**.
> Azure DevOps controls **what** runs.
> You control **where**, **how**, and **with what privileges** it runs.

---

![Image](https://learn.microsoft.com/en-us/azure/devops/pipelines/agents/media/agent-connections-devops.png?view=azure-devops)

![Image](https://cloud-cod.com/wp-content/uploads/2025/01/diagram_pipeline_agent_private_endpoint_terraform.png)

![Image](https://pmorisseau-preview.azurewebsites.net/img/tips.06.schema-EphemeralAgentsV2.svg)

---

```mermaid
graph TD
    A@{ shape: hex, label: "🧠 Azure Pipelines Service" }
    B@{ shape: hex, label: "📡 Agent Listener" }
    C@{ shape: processes, label: "🤖 Self-Hosted Agent" }
    D@{ shape: processes, label: "🖥️ VM / Server / Container" }
    E@{ shape: rect, label: "⚙️ Tools, Files, Network" }

    A --> B
    B --> C
    C --> D
    D --> E

    classDef animate stroke-dasharray: 9,5,stroke-dashoffset: 700,animation: dash 22s linear infinite;
    class A,B,C,D,E animate
```

---

## 🔴 **Problem: “Hosted Agents Are Too Slow / Limited”**

Typical reasons teams move to self-hosted agents:

- Terraform plans take too long
- Docker builds are slow
- Builds need private network access
- Tools must be preinstalled
- Compliance forbids shared infrastructure

This is **exactly when self-hosted agents are the right choice**.

---

## 1️⃣ Self-Hosted Agent Architecture (What Actually Runs)

### 🧠 What Is a Self-Hosted Agent?

It consists of:

1. **A machine** (VM, bare metal, container)
2. **Azure Pipelines Agent software**
3. **An agent pool registration**

Once registered, the agent:

- **Polls Azure DevOps**
- Receives jobs
- Executes them locally

---

### 🧪 Installation Reality

On the machine:

```bash
# Register the agent with your organization and pool (prompts interactively)
./config.sh
# Start the agent and begin listening for jobs
./run.sh
```

The agent:

- Opens an **outbound HTTPS connection**
- Requires no inbound ports
- Long-polls for jobs

✔ Firewall-friendly
✔ Secure by default

---

### 🔑 Key Architectural Rule

> **Azure DevOps never connects _into_ your machine.**
> The agent connects _out_.

This is why self-hosted agents work even behind:

- Firewalls
- NAT
- Corporate networks

---

## 2️⃣ Security Boundaries (EXTREMELY IMPORTANT)

### 🧨 Truth Most Teams Learn the Hard Way

> **A self-hosted agent executes pipeline code with the permissions of the OS account the agent runs as.**

This has massive implications.

---

### ❌ Dangerous Setup (Common Mistake)

- Agent runs as **root / Administrator**
- Pipeline runs arbitrary scripts
- Any repo contributor can:
  - Delete files
  - Read secrets
  - Exfiltrate data

This is a **supply-chain vulnerability**.

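
One way to guard against the dangerous setup above is a preflight check that refuses to start the agent under a root-equivalent account. The sketch below is a minimal illustration for Linux, not part of the Azure Pipelines agent: the function names and the `ADMIN_GROUPS` list are assumptions made for this example. A wrapper script could call `preflight()` before launching `./run.sh`.

```python
import grp
import os

# Assumed list of groups that grant root-equivalent rights; adjust for your distro.
ADMIN_GROUPS = {"sudo", "wheel"}


def is_overprivileged(uid, group_names):
    """Return True when the account is root or belongs to an admin group."""
    return uid == 0 or bool(set(group_names) & ADMIN_GROUPS)


def current_account():
    """Return (effective uid, names of the groups the current process belongs to)."""
    names = set()
    for gid in set(os.getgroups()) | {os.getegid()}:
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            # The gid has no entry in the group database (common in containers).
            continue
    return os.geteuid(), names


def preflight():
    """Hypothetical helper: raise if the agent would start with root-level rights."""
    uid, groups = current_account()
    if is_overprivileged(uid, groups):
        raise PermissionError("agent account has root or admin-group rights")
```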

---

### ✅ Secure Agent Design (Senior Pattern)

| Layer       | Best Practice                   |
| ----------- | ------------------------------- |
| OS user     | Dedicated low-privilege user    |
| Agent pool  | Restricted to trusted pipelines |
| Repo access | Protected branches              |
| Secrets     | Key Vault, not files            |
| Network     | Scoped access                   |

---

### 🧪 Example: Least Privilege Agent

- Linux user: `azagent`
- No sudo
- Access only to required folders
- Network access limited via NSGs

✔ Safer
✔ Auditable
✔ Compliant

---

## 3️⃣ Agent Pools & Trust Boundaries

### 🧠 Critical Design Rule

> **Agent pools are trust boundaries.**

Never mix these in the same pool:

- Prod deployments
- Dev builds
- Untrusted repos

---

### ❌ Bad Design

```ini
Agent Pool: Default
Used by:
- All pipelines
- All repos
- All environments
```

❌ One compromised pipeline = full access

---

### ✅ Correct Design

```ini
Agent Pools:
- build-linux
- build-windows
- deploy-prod
- deploy-nonprod
```

Each pool:

- Has limited permissions
- Serves a specific purpose

---

## 4️⃣ Scaling Strategies (Where Most Fail)

Self-hosted agents **do not scale automatically** unless you design them to.

---

### 🧱 Strategy 1: Static Agents (Simplest)

- Fixed number of VMs
- One agent per VM

✔ Easy
❌ Queues build up at peak load
❌ Underutilization when idle

Used for:

- Low-volume pipelines
- Stable environments

---

### 🧠 Strategy 2: VM Scale Set (VMSS) Agents (Enterprise Standard)

> Azure DevOps can **automatically scale agents** using VMSS.

---

#### How It Works

1. Pipeline queues job
2. Azure DevOps scales out the VMSS to request a VM
3. VM boots from image
4. Agent auto-registers
5. Job runs
6. VM is deleted or recycled, depending on the pool's teardown setting

✔ Elastic
✔ Cost-efficient
✔ Secure

This gives you **hosted-agent behavior** on **your infrastructure**.

---

### 🧪 Real Example Use Case

- Terraform pipelines
- Docker-heavy builds
- Private registry access
- On-demand scale

---

### 🔥 Senior Insight

> VMSS agents are the **default choice** for serious self-hosted setups.

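
The elastic behavior described in the VMSS steps can be pictured as a small sizing rule. The sketch below is a simplified illustration, not Azure DevOps' actual scaling algorithm: it assumes one agent per VM, and the names `target_vm_count`, `scale_delta`, `standby`, and `max_vms` are hypothetical. For example, with 3 queued jobs, 2 busy agents, 1 standby VM, and a cap of 10, the target is 6 VMs.

```python
def target_vm_count(queued_jobs, busy_agents, standby, max_vms):
    """Return how many VMs the pool should have, assuming one agent per VM.

    Every busy agent and every queued job needs a VM, plus `standby` idle
    VMs kept warm; the result never exceeds `max_vms`.
    """
    if min(queued_jobs, busy_agents, standby, max_vms) < 0:
        raise ValueError("all inputs must be non-negative")
    wanted = busy_agents + queued_jobs + standby
    return min(wanted, max_vms)


def scale_delta(current_vms, queued_jobs, busy_agents, standby, max_vms):
    """Return the number of VMs to add (positive) or remove (negative)."""
    target = target_vm_count(queued_jobs, busy_agents, standby, max_vms)
    return target - current_vms
```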

---

## 5️⃣ Maintenance Pitfalls (Where Teams Suffer)

Self-hosted agents **shift responsibility to you**.

---

### ❌ Pitfall #1 – Tool Drift

- Node version upgraded
- Terraform changed
- Docker updated

Pipelines start failing **randomly**.

---

### ✅ Fix: Immutable Images

- Bake tools into VM image
- Version the image
- Roll out changes deliberately

Same principle as:

- Docker images
- AMIs

---

### ❌ Pitfall #2 – Dirty Agents

- Leftover files
- Cached credentials
- Broken workspaces

Consequences:

- Flaky builds
- Security leaks

---

### ✅ Fix

- Clean work directories
- Use disposable agents (VMSS)
- Periodic rebuilds

---

### ❌ Pitfall #3 – Agent Starvation

Symptoms:

- Jobs stuck in queue
- “Waiting for agent…”

Cause:

- Too few agents
- No autoscaling

---

### ✅ Fix

- Monitor queue length
- Scale pools
- Use VMSS or containers

---

## 6️⃣ Self-Hosted vs Hosted (Reality Comparison)

| Aspect      | Hosted      | Self-Hosted         |
| ----------- | ----------- | ------------------- |
| Setup       | None        | Required            |
| Control     | Low         | High                |
| Performance | Medium      | High                |
| Security    | Managed     | Your responsibility |
| Network     | Public only | Full access         |
| Scaling     | Automatic   | Manual / VMSS       |

---

## 🧠 Mental Model (Lock This In)

```ini
Hosted Agent = Convenience
Self-Hosted  = Responsibility
VMSS Agent   = Best of both
```

---

## 🧠 Memorization Tips

### 🔑 Mnemonic: **"A-S-S-M"**

| Letter | Meaning                       |
| ------ | ----------------------------- |
| **A**  | Agent runs as OS user         |
| **S**  | Security is yours             |
| **S**  | Scaling is manual unless VMSS |
| **M**  | Maintenance is mandatory      |

---

## ❌ Top Self-Hosted Agent Mistakes

| Mistake             | Consequence     |
| ------------------- | --------------- |
| Running as root     | Security breach |
| One pool for all    | Trust collapse  |
| No image versioning | Random failures |
| No autoscaling      | Queue backlog   |
| Ignoring cleanup    | Flaky builds    |
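
The mistakes table can also be read as a checklist. The sketch below turns each row into a check against a description of your setup. It is only an illustration: the `audit` function and its parameters are hypothetical, not an Azure DevOps API, and you would fill in the values by hand or from your own inventory.

```python
def audit(*, runs_as_root, pool_count, image_version, autoscaling, cleans_workspace):
    """Return one finding per mistake from the table that applies to this setup."""
    findings = []
    if runs_as_root:
        findings.append("Running as root: security breach")
    if pool_count < 2:
        findings.append("One pool for all: trust collapse")
    if not image_version:
        findings.append("No image versioning: random failures")
    if not autoscaling:
        findings.append("No autoscaling: queue backlog")
    if not cleans_workspace:
        findings.append("Ignoring cleanup: flaky builds")
    return findings


# Example: a single static, root-run agent with no versioned image or cleanup.
for finding in audit(
    runs_as_root=True,
    pool_count=1,
    image_version=None,
    autoscaling=False,
    cleans_workspace=False,
):
    print(finding)
```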