# Kandev System Architecture
> **Note:** This document describes both the current implementation and the planned architecture.
> Sections marked with ✅ are implemented, while sections marked with 📋 are planned for future development.
## Current Implementation Status
The current implementation uses a **unified binary architecture** instead of separate microservices:
- ✅ **Single Go binary** (`cmd/kandev/main.go`) running all services
- ✅ **SQLite database** for persistence (instead of PostgreSQL)
- ✅ **In-memory event bus** (instead of NATS)
- ✅ **Docker-based agent execution** with ACP streaming
- ✅ **Native ACP (Agent Client Protocol)** - JSON-RPC 2.0 over stdin/stdout
- ✅ **Permission request handling** - Auto-approval of workspace indexing and tool permissions
- ✅ **Session resumption** for multi-turn agent conversations
- ✅ **WebSocket-first API** for all operations (replacing REST)
---
## Overview
Kandev uses an event-driven architecture with the **Agent Client Protocol (ACP)** for real-time streaming between AI agents and the backend. The system is designed to run on client machines (local workstations), with cloud deployment planned for later phases.
## Current Architecture (Unified Binary)
```mermaid
flowchart TB
subgraph Binary["Kandev Binary (Port 38429)"]
Task["Task Service"]
Agent["Agent Manager"]
Orch["Orchestrator Service"]
Task & Agent & Orch --> EventBus["In-Memory Event Bus"]
EventBus --> DB["SQLite Database (kandev.db)"]
end
Binary --> Docker["Docker Engine"]
subgraph Container["Augment Agent Container"]
ACP["ACP via stdin/stdout"]
end
Docker --> Container
```
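To make the in-memory event bus in the diagram concrete, here is a minimal publish/subscribe sketch. The `Event` and `Bus` types and their method names are assumptions for illustration, not the actual Kandev types; subjects are matched exactly, without wildcards.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical envelope; the real Kandev event type may differ.
type Event struct {
	Subject string
	Data    string
}

// Bus is a minimal in-memory publish/subscribe bus keyed by exact subject.
type Bus struct {
	mu   sync.RWMutex
	subs map[string][]func(Event)
}

// NewBus returns an empty bus.
func NewBus() *Bus {
	return &Bus{subs: make(map[string][]func(Event))}
}

// Subscribe registers a handler for a single subject.
func (b *Bus) Subscribe(subject string, h func(Event)) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.subs[subject] = append(b.subs[subject], h)
}

// Publish synchronously calls every handler registered for the event's
// subject and returns how many handlers ran.
func (b *Bus) Publish(e Event) int {
	b.mu.RLock()
	// Copy the slice so handlers can subscribe without deadlocking.
	handlers := append([]func(Event){}, b.subs[e.Subject]...)
	b.mu.RUnlock()
	for _, h := range handlers {
		h(e)
	}
	return len(handlers)
}

func main() {
	bus := NewBus()
	bus.Subscribe("task.state_changed", func(e Event) {
		fmt.Println("orchestrator saw:", e.Data)
	})
	bus.Publish(Event{Subject: "task.state_changed", Data: "TODO"})
}
```

Handlers run synchronously in the publisher's goroutine here; a production bus would likely add buffering or per-subscriber goroutines so that a slow subscriber cannot block publishers.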
---
### 3. Orchestrator Service
**Port:** 8082
**Purpose:** Automated task orchestration, agent coordination, and real-time ACP streaming
**Responsibilities:**
- Subscribe to task state change events
- Determine when to launch AI agents
- Manage task queue with priority
- Coordinate with Agent Manager for container launches
- **Aggregate ACP messages from NATS event bus**
- **Provide WebSocket endpoints for real-time ACP streaming to frontend**
- **Expose comprehensive REST API for task execution control**
- Implement retry logic for failed tasks
- Monitor agent execution
**Dependencies:**
- NATS (event subscription + ACP message aggregation)
- Agent Manager (gRPC calls)
- PostgreSQL (read task details, store ACP messages)
**Events Subscribed:**
- `task.state_changed`
- `agent.completed`
- `agent.failed`
- **`acp.message.*` - All ACP messages from agents**
**WebSocket Actions:**
- `orchestrator.subscribe` - Subscribe to real-time ACP streaming for a task
- `orchestrator.unsubscribe` - Unsubscribe from ACP streaming
- `orchestrator.start` - Start agent execution for a task
- `orchestrator.stop` - Stop agent execution for a task
- `orchestrator.status` - Get execution status for a task
**Orchestration Logic:**
```
When task.state_changed to "TODO" with agent_type set:
1. Add task to priority queue
2. Check available resources
3. Request Agent Manager to launch container
4. Update task state to "IN_PROGRESS"
5. Subscribe to ACP messages for this task
6. Stream ACP messages to connected WebSocket clients
When acp.message received:
1. Store message in database
2. Broadcast to WebSocket clients subscribed to this task
3. Update task progress/status
When agent.completed:
1. Update task state to "COMPLETED"
2. Record completion time
3. Send final ACP result message
4. Clean up resources
When agent.failed:
1. Check retry count
2. If retries available: re-queue task
3. Else: Update task state to "FAILED"
4. Send ACP error message
```
---
### 4. Agent Manager Service
**Port:** 8083
**Purpose:** Docker container lifecycle management with ACP streaming and credential mounting
**Responsibilities:**
- Launch Docker containers for AI agents
- **Mount host credentials (SSH keys, Git config) into containers**
- **Capture and parse ACP messages from container stdout**
- **Publish ACP messages to NATS event bus**
- Monitor container health and status
- Stream and store container logs
- Manage agent type registry
- Resource allocation and limits
- Container cleanup
**Dependencies:**
- Docker Engine (container operations)
- PostgreSQL (agent_instances, agent_logs, agent_types tables)
- NATS (event publishing + ACP message publishing)
- **Host filesystem (for credential mounting)**
**Events Published:**
- `agent.started`
- `agent.running`
- `agent.completed`
- `agent.failed`
- `agent.stopped`
- **`acp.message.{task_id}` - Real-time ACP messages from agents**
**Container Launch Flow with ACP:**
```
1. Receive launch request (task_id, agent_type)
2. Lookup agent_type configuration
3. Create agent_instance record (status: PENDING)
4. Pull Docker image if needed
5. Prepare host credential mounts (SSH, Git config)
6. Checkout repository on host (if repository_url provided)
7. Prepare volume mounts (workspace, credentials)
8. Create container with resource limits and mounts
9. Start container
10. Update agent_instance (status: RUNNING, container_id)
11. Attach to container stdout for ACP message capture
12. Parse ACP messages from stdout
13. Publish ACP messages to NATS (acp.message.{task_id})
14. Store ACP messages in database
15. Publish agent.started event
16. Monitor container until completion
17. Collect exit code and final logs
18. Update agent_instance (status: COMPLETED/FAILED)
19. Publish completion event
20. Cleanup container and workspace
```
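Steps 11 to 13 of the flow above (attach, parse, publish) might look roughly like the following. It assumes one JSON object per line and that the stream has already been demultiplexed (Docker multiplexes stdout and stderr on non-TTY attach). The `ACPMessage` fields follow the example message used elsewhere in this document; `ParseACPLine`, `Subject`, and `CaptureACP` are hypothetical names.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// ACPMessage mirrors the stdout message shape shown in this document.
type ACPMessage struct {
	Type      string          `json:"type"`
	Timestamp string          `json:"timestamp"`
	AgentID   string          `json:"agent_id"`
	TaskID    string          `json:"task_id"`
	Data      json.RawMessage `json:"data"`
}

// ParseACPLine decodes one stdout line. It returns false for plain log
// output or for JSON that lacks a type or task_id.
func ParseACPLine(line string) (ACPMessage, bool) {
	line = strings.TrimSpace(line)
	if line == "" || line[0] != '{' {
		return ACPMessage{}, false
	}
	var msg ACPMessage
	if err := json.Unmarshal([]byte(line), &msg); err != nil || msg.Type == "" || msg.TaskID == "" {
		return ACPMessage{}, false
	}
	return msg, true
}

// Subject builds the event subject acp.message.{task_id}.
func Subject(msg ACPMessage) string {
	return "acp.message." + msg.TaskID
}

// CaptureACP reads lines from r until EOF, publishes every valid ACP
// message, and returns the number of messages published.
func CaptureACP(r io.Reader, publish func(subject string, msg ACPMessage)) (int, error) {
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // allow lines up to 1 MiB
	published := 0
	for sc.Scan() {
		if msg, ok := ParseACPLine(sc.Text()); ok {
			publish(Subject(msg), msg)
			published++
		}
	}
	return published, sc.Err()
}

func main() {
	stdout := strings.NewReader("starting agent\n" +
		`{"type":"progress","task_id":"t1","data":{"progress":10}}` + "\n")
	n, err := CaptureACP(stdout, func(subject string, msg ACPMessage) {
		fmt.Println(subject, msg.Type, string(msg.Data))
	})
	fmt.Println("published:", n, "err:", err)
}
```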
**Credential Mounting Strategy:**
```
Mounts (Read-Only):
- ~/.ssh → /root/.ssh (SSH keys)
- ~/.gitconfig → /root/.gitconfig (Git configuration)
- ~/.git-credentials → /root/.git-credentials (HTTPS credentials)
Environment Variables:
- GITHUB_TOKEN (from host environment)
- GITLAB_TOKEN (from host environment)
- GEMINI_API_KEY (from host environment)
Workspace Mount (Read-Write):
- /tmp/kandev/workspaces/{task_id} → /workspace
```
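As an illustration of the mounting strategy above, the following sketch builds bind specifications in Docker's `host:container[:ro]` string form and forwards only the listed environment variables. `BuildBinds` and `ForwardEnv` are hypothetical helper names, and the sketch does not validate `taskID`, which a real implementation should do before using it in a path.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// BuildBinds returns bind-mount specs in Docker's "host:container[:ro]"
// form. Credentials are read-only; the workspace is read-write.
func BuildBinds(home, taskID string) []string {
	return []string{
		filepath.Join(home, ".ssh") + ":/root/.ssh:ro",
		filepath.Join(home, ".gitconfig") + ":/root/.gitconfig:ro",
		filepath.Join(home, ".git-credentials") + ":/root/.git-credentials:ro",
		filepath.Join("/tmp/kandev/workspaces", taskID) + ":/workspace",
	}
}

// ForwardEnv copies the allow-listed variables that are set and non-empty
// on the host into KEY=VALUE form for the container.
func ForwardEnv(lookup func(string) (string, bool)) []string {
	var env []string
	for _, key := range []string{"GITHUB_TOKEN", "GITLAB_TOKEN", "GEMINI_API_KEY"} {
		if v, ok := lookup(key); ok && v != "" {
			env = append(env, key+"="+v)
		}
	}
	return env
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Println("cannot resolve home directory:", err)
		return
	}
	for _, b := range BuildBinds(home, "task-123") {
		fmt.Println(b)
	}
	fmt.Println("forwarded variables:", len(ForwardEnv(os.LookupEnv)))
}
```

Passing the lookup function as a parameter keeps `ForwardEnv` testable without touching the real process environment.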
---
## Data Flow Examples
### Example 1: User Creates a Task with AI Agent (with Real-time ACP Streaming)
```
1. User → API Gateway: WebSocket message: task.create
{
"action": "task.create",
"payload": {
"boardId": "{id}",
"title": "Fix login bug",
"description": "Users can't login with email",
"agent_type": "auggie-cli",
"repository_url": "https://github.com/user/repo",
"branch": "main"
}
}
2. API Gateway → Task Service: Forward request
3. Task Service:
- Validate request
- Create task in database (state: TODO)
- Publish event to NATS: task.created
4. Orchestrator (subscribed to task.created):
- Receive event
- Check if agent_type is set
- Add task to priority queue
- Process queue
5. Orchestrator → Agent Manager (gRPC):
LaunchAgent(task_id, agent_type="auggie-cli")
6. Agent Manager:
- Create agent_instance record
- Pull docker image "augmentcode/auggie-cli:latest"
- Mount host credentials (SSH keys, Git config)
- Clone repository to /tmp/kandev/workspaces/{task_id}
- Create container with volume mounts:
* /tmp/kandev/workspaces/{task_id} → /workspace
* ~/.ssh → /root/.ssh (read-only)
* ~/.gitconfig → /root/.gitconfig (read-only)
- Start container
- Attach to container stdout for ACP capture
- Publish agent.started event
7. Orchestrator (subscribed to agent.started):
- Update task state to IN_PROGRESS
- Subscribe to acp.message.{task_id}
8. Frontend → API Gateway: WebSocket connection
WS /api/v1/orchestrator/tasks/{task_id}/stream
9. API Gateway → Orchestrator: Proxy WebSocket connection
10. Agent Container → stdout:
Writes ACP messages in JSON format:
{"type":"progress","timestamp":"...","agent_id":"...","task_id":"...","data":{"progress":10,"message":"Analyzing codebase..."}}
11. Agent Manager:
- Captures ACP message from container stdout
- Parses JSON ACP message
- Publishes to NATS: acp.message.{task_id}
- Stores ACP message in database
12. Orchestrator (subscribed to acp.message.{task_id}):
- Receives ACP message from NATS
- Broadcasts to all WebSocket clients for this task
13. Frontend (via WebSocket):
- Receives real-time ACP message
- Updates UI with progress: "Analyzing codebase... 10%"
14. [Steps 10-13 repeat for each ACP message from agent]
15. Agent Container completes:
- Writes final ACP result message
- Exits with code 0
16. Agent Manager:
- Captures final ACP message
- Publishes to NATS
- Collects exit code
- Publishes agent.completed event
17. Orchestrator (subscribed to agent.completed):
- Update task state to COMPLETED
- Record completion time
- Send final ACP result to WebSocket clients
- Close ACP subscription
18. Frontend:
- Receives completion notification
- Displays final results
- Closes WebSocket connection
```
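The `task.create` message in step 1 can be decoded in two stages: first the `action`/`payload` envelope, then the action-specific payload. The sketch below uses the field names from the example message; the `Envelope` and `CreateTaskPayload` types, the `DecodeCreateTask` and `NeedsAgent` helpers, and the rule that `title` is required are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Envelope mirrors the {"action", "payload"} shape of the example message.
type Envelope struct {
	Action  string          `json:"action"`
	Payload json.RawMessage `json:"payload"`
}

// CreateTaskPayload uses the JSON field names from the example in step 1.
type CreateTaskPayload struct {
	BoardID       string `json:"boardId"`
	Title         string `json:"title"`
	Description   string `json:"description"`
	AgentType     string `json:"agent_type"`
	RepositoryURL string `json:"repository_url"`
	Branch        string `json:"branch"`
}

// DecodeCreateTask parses and validates a task.create message.
func DecodeCreateTask(raw []byte) (CreateTaskPayload, error) {
	var env Envelope
	var p CreateTaskPayload
	if err := json.Unmarshal(raw, &env); err != nil {
		return p, fmt.Errorf("decode envelope: %w", err)
	}
	if env.Action != "task.create" {
		return p, errors.New("unexpected action: " + env.Action)
	}
	if err := json.Unmarshal(env.Payload, &p); err != nil {
		return p, fmt.Errorf("decode payload: %w", err)
	}
	if p.Title == "" {
		return p, errors.New("title is required") // assumed validation rule
	}
	return p, nil
}

// NeedsAgent reports whether the orchestrator should queue the task,
// which step 4 ties to agent_type being set.
func NeedsAgent(p CreateTaskPayload) bool {
	return p.AgentType != ""
}

func main() {
	raw := []byte(`{"action":"task.create","payload":{"boardId":"b1","title":"Fix login bug","agent_type":"auggie-cli"}}`)
	p, err := DecodeCreateTask(raw)
	if err != nil {
		fmt.Println("rejected:", err)
		return
	}
	fmt.Println("title:", p.Title, "needs agent:", NeedsAgent(p))
}
```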
### Example 2: Manual Task State Change
```
1. User → API Gateway: WebSocket message: task.state
{
"action": "task.state",
"payload": {
"taskId": "{id}",
"state": "IN_PROGRESS"
}
}
2. API Gateway → Task Service: Forward request
3. Task Service:
- Validate state transition
- Update task state
- Create task_event record
- Publish task.state_changed event
4. Orchestrator (subscribed to task.state_changed):
- Check if agent should be launched
- If no agent_type set: ignore
- If agent_type set: add to queue
```
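A sketch of the "validate state transition" and "check if agent should be launched" steps follows. The state names come from this document, but the transition table itself is an assumption: the document does not specify which transitions are legal. `CanTransition` and `ShouldEnqueue` are hypothetical helpers.

```go
package main

import "fmt"

// allowed is an ASSUMED transition table; the real rules live in the
// Task Service and may differ.
var allowed = map[string][]string{
	"TODO":        {"IN_PROGRESS"},
	"IN_PROGRESS": {"COMPLETED", "FAILED", "TODO"},
	"FAILED":      {"TODO"},
	"COMPLETED":   {},
}

// CanTransition reports whether moving from one state to another is
// permitted by the table above. Unknown states are rejected.
func CanTransition(from, to string) bool {
	for _, next := range allowed[from] {
		if next == to {
			return true
		}
	}
	return false
}

// ShouldEnqueue mirrors the orchestrator rule in this document: a task is
// queued when it changes to TODO and has an agent_type set.
func ShouldEnqueue(newState, agentType string) bool {
	return newState == "TODO" && agentType != ""
}

func main() {
	fmt.Println(CanTransition("TODO", "IN_PROGRESS"))
	fmt.Println(ShouldEnqueue("TODO", ""))
}
```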
---
## Agent Client Protocol (ACP)
### Protocol Overview
Kandev uses the **Agent Client Protocol (ACP)**, an open protocol for communication between AI agents and client applications. The implementation follows the official ACP specification, using **JSON-RPC 2.0 over stdin/stdout**.
**Reference:** https://agentclientprotocol.com
**Key Characteristics:**
- **JSON-RPC 2.0**: Standard request/response/notification format
- **Bidirectional**: Client sends requests, agent sends responses/notifications/requests
- **Session-based**: All communication happens within a session context
- **Permission model**: Agent requests permissions before tool execution
### Current Implementation ✅
The backend implements full ACP support with:
1. **Session Management** (`apps/backend/internal/agent/acp/session.go`)
- Create sessions with `session/new`
- Send prompts with `session/prompt`
- Resume sessions with stored session IDs
2. **JSON-RPC Client** (`apps/backend/pkg/acp/jsonrpc/client.go`)
- Handles responses to client requests
- Handles notifications from agent (`session/update`)
- Handles requests from agent (`session/request_permission`)
3. **Permission Handling**
- Auto-approves workspace indexing permissions
- Auto-approves tool execution permissions (selects first "allow" option)
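The three kinds of incoming message that the JSON-RPC client handles can be told apart by which fields are present, as defined by JSON-RPC 2.0: a request has `method` and `id`, a notification has `method` but no `id`, and a response has `id` with `result` or `error`. The sketch below shows that dispatch decision only; the `Message`, `Kind`, and `Classify` names are illustrative and are not taken from `client.go`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Message holds the JSON-RPC 2.0 fields needed for classification.
type Message struct {
	JSONRPC string          `json:"jsonrpc"`
	ID      json.RawMessage `json:"id,omitempty"`
	Method  string          `json:"method,omitempty"`
	Result  json.RawMessage `json:"result,omitempty"`
	Error   json.RawMessage `json:"error,omitempty"`
}

// Kind is the category of an incoming message.
type Kind int

const (
	Invalid      Kind = iota
	Request           // method + id, e.g. session/request_permission
	Notification      // method, no id, e.g. session/update
	Response          // id + result or error
)

var kindNames = [...]string{"invalid", "request", "notification", "response"}

func (k Kind) String() string { return kindNames[k] }

// Classify inspects one raw JSON-RPC message and reports its kind.
func Classify(raw []byte) Kind {
	var m Message
	if err := json.Unmarshal(raw, &m); err != nil || m.JSONRPC != "2.0" {
		return Invalid
	}
	hasID := len(m.ID) > 0 && string(m.ID) != "null"
	switch {
	case m.Method != "" && hasID:
		return Request
	case m.Method != "":
		return Notification
	case hasID && (len(m.Result) > 0 || len(m.Error) > 0):
		return Response
	default:
		return Invalid
	}
}

func main() {
	fmt.Println(Classify([]byte(`{"jsonrpc":"2.0","method":"session/update","params":{}}`)))
	fmt.Println(Classify([]byte(`{"jsonrpc":"2.0","id":5,"method":"session/request_permission","params":{}}`)))
	fmt.Println(Classify([]byte(`{"jsonrpc":"2.0","id":1,"result":{"sessionId":"uuid"}}`)))
}
```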
### ACP Message Types
**1. Client → Agent Requests**
```json
// session/new - Create a new session
{
"jsonrpc": "2.0",
"id": 1,
"method": "session/new",
"params": {
"cwd": "/workspace",
"mcpServers": []
}
}
// session/prompt - Send a prompt to the agent
{
"jsonrpc": "2.0",
"id": 2,
"method": "session/prompt",
"params": {
"sessionId": "uuid",
"content": [{"type": "text", "text": "Analyze this code"}]
}
}
```
**2. Agent → Client Responses**
```json
{
"jsonrpc": "2.0",
"id": 1,
"result": {
"sessionId": "uuid",
"status": "ok"
}
}
```
**3. Agent → Client Notifications**
```json
// session/update - Progress updates
{
"jsonrpc": "2.0",
"method": "session/update",
"params": {
"sessionId": "uuid",
"content": [{"type": "text", "text": "Analyzing..."}],
"stopReason": null
}
}
// session/update - Completion
{
"jsonrpc": "2.0",
"method": "session/update",
"params": {
"sessionId": "uuid",
"content": [{"type": "text", "text": "Done!"}],
"stopReason": "end_turn"
}
}
```
**4. Agent → Client Requests (Permissions)**
```json
// session/request_permission - Agent requests permission
{
"jsonrpc": "2.0",
"id": 5,
"method": "session/request_permission",
"params": {
"sessionId": "uuid",
"toolCall": {
"toolCallId": "workspace-indexing-permission",
"title": "Workspace Indexing Permission"
},
"options": [
{"optionId": "enable", "name": "Enable indexing", "kind": "allow_always"},
{"optionId": "disable", "name": "Disable indexing", "kind": "reject_always"}
]
}
}
// Client responds with selected option
{
"jsonrpc": "2.0",
"id": 5,
"result": {
"outcome": {
"outcome": "selected",
"optionId": "enable"
}
}
}
```
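The auto-approval rule described earlier (select the first "allow" option) can be sketched against the request and response shapes shown above. `SelectAllow` and `BuildPermissionResponse` are hypothetical names, and treating any `kind` that starts with `allow` as an approval is an assumption based on the `allow_always` value in the example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// PermissionOption matches one entry of params.options in the request.
type PermissionOption struct {
	OptionID string `json:"optionId"`
	Name     string `json:"name"`
	Kind     string `json:"kind"`
}

// The types below mirror the response shape shown in the example.
type selectedOutcome struct {
	Outcome  string `json:"outcome"`
	OptionID string `json:"optionId"`
}

type permissionResult struct {
	Outcome selectedOutcome `json:"outcome"`
}

type permissionResponse struct {
	JSONRPC string           `json:"jsonrpc"`
	ID      int              `json:"id"`
	Result  permissionResult `json:"result"`
}

// SelectAllow returns the optionId of the first option whose kind starts
// with "allow" (assumed to cover values such as allow_always).
func SelectAllow(opts []PermissionOption) (string, bool) {
	for _, o := range opts {
		if strings.HasPrefix(o.Kind, "allow") {
			return o.OptionID, true
		}
	}
	return "", false
}

// BuildPermissionResponse encodes the JSON-RPC response for request id.
// It returns an error when no allow option is offered.
func BuildPermissionResponse(id int, opts []PermissionOption) ([]byte, error) {
	optionID, ok := SelectAllow(opts)
	if !ok {
		return nil, errors.New("no allow option offered")
	}
	return json.Marshal(permissionResponse{
		JSONRPC: "2.0",
		ID:      id,
		Result:  permissionResult{Outcome: selectedOutcome{Outcome: "selected", OptionID: optionID}},
	})
}

func main() {
	opts := []PermissionOption{
		{OptionID: "enable", Name: "Enable indexing", Kind: "allow_always"},
		{OptionID: "disable", Name: "Disable indexing", Kind: "reject_always"},
	}
	out, err := BuildPermissionResponse(5, opts)
	fmt.Println(string(out), err)
}
```

Auto-approval is convenient for unattended runs, but it removes the human check that the permission model is designed to provide, so it is worth keeping the selection rule in one place where it can later be made configurable.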
### ACP Flow Through System (Current Implementation)
```mermaid
flowchart TB
subgraph Backend["Kandev Backend"]
Orch["Orchestrator<br/>Execute(task)"]
ACP["ACP Session Manager<br/>Create / Prompt / Permission"]
RPC["JSON-RPC Client<br/>Send requests / Handle responses"]
Orch --> ACP --> RPC
end
RPC <-->|"Docker attach (stdin/stdout)"| Agent
subgraph Agent["Agent Container"]
CLI["Auggie CLI (--acp)"]
stdin["stdin ← JSON-RPC requests"]
stdout["stdout → responses/notifications"]
CLI --- stdin
CLI --- stdout
end
Workspace["/workspace<br/>Mounted from host"] --> Agent
```
### Initial Agent Types
**1. Auggie CLI Agent**
- **Image**: `augmentcode/auggie-cli:latest`
- **Capabilities**: Code analysis, generation, debugging, refactoring
- **ACP Support**: Native
- **Resources**: 2 CPU, 2GB RAM
**2. Gemini Agent**
- **Image**: `kandev/gemini-agent:latest`
- **Capabilities**: Code review, documentation, testing
- **ACP Support**: Wrapper implementation
- **Resources**: 1.5 CPU, 1.5GB RAM
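One way to model the agent type registry is a lookup table whose resource figures convert into the units Docker uses for container limits (CPU in billionths of a core, memory in bytes). The `AgentType` struct, the `gemini-agent` registry key, and reading "GB" as GiB are assumptions; only the images and resource figures come from the list above.

```go
package main

import "fmt"

// AgentType is a hypothetical registry entry.
type AgentType struct {
	Name      string
	Image     string
	CPUs      float64
	MemoryGB  float64
	NativeACP bool
}

// registry holds the two initial agent types. The "gemini-agent" key is
// an assumed name; "auggie-cli" matches the agent_type used in examples.
var registry = map[string]AgentType{
	"auggie-cli": {
		Name: "auggie-cli", Image: "augmentcode/auggie-cli:latest",
		CPUs: 2, MemoryGB: 2, NativeACP: true,
	},
	"gemini-agent": {
		Name: "gemini-agent", Image: "kandev/gemini-agent:latest",
		CPUs: 1.5, MemoryGB: 1.5, NativeACP: false,
	},
}

// Lookup returns the configuration for an agent type, if registered.
func Lookup(name string) (AgentType, bool) {
	a, ok := registry[name]
	return a, ok
}

// NanoCPUs converts the CPU limit to billionths of a core.
func (a AgentType) NanoCPUs() int64 {
	return int64(a.CPUs * 1e9)
}

// MemoryBytes converts the memory limit to bytes, treating GB as GiB.
func (a AgentType) MemoryBytes() int64 {
	return int64(a.MemoryGB * float64(1<<30))
}

func main() {
	if a, ok := Lookup("auggie-cli"); ok {
		fmt.Println(a.Image, a.NanoCPUs(), a.MemoryBytes())
	}
}
```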
---
## Database Schema Summary
**Tables:**
- `users` - User accounts
- `boards` - Kanban boards
- `tasks` - Tasks on boards
- `task_events` - Audit log of task changes
- `agent_types` - Registry of available agent types
- `agent_instances` - Running/completed agent containers
- `agent_logs` - Logs from agent execution
**Key Relationships:**
- Board → Tasks (1:N)
- Task → Agent Instances (1:N)
- Agent Instance → Agent Logs (1:N)
- User → Boards (1:N)
- User → Tasks (created_by)
---
## Technology Stack
**Backend Services:**
- Language: Go 1.21+
- Web Framework: Gin
- Database: PostgreSQL 15+
- Event Bus: NATS
- Container Runtime: Docker
**Key Libraries:**
- `github.com/docker/docker` - Docker SDK
- `github.com/gin-gonic/gin` - HTTP framework
- **`github.com/gorilla/websocket` - WebSocket support**
- `github.com/jackc/pgx/v5` - PostgreSQL driver
- `github.com/nats-io/nats.go` - NATS client
- `github.com/golang-jwt/jwt/v5` - JWT auth
- `go.uber.org/zap` - Structured logging
- `google.golang.org/grpc` - gRPC communication
---
## Deployment Architecture
### Primary Target: Client Machines (Local Workstations)
The system is designed for deployment on **client machines** (developer workstations, local servers) with direct access to user credentials.
**Architecture:**
```mermaid
flowchart TB
subgraph Machine["User's Local Machine"]
subgraph Infra["Infrastructure (Docker)"]
PG["PostgreSQL"]
NATS["NATS Server"]
end
subgraph Services["Kandev Backend Services"]
Gateway["API Gateway"]
TaskSvc["Task Service"]
OrchSvc["Orchestrator"]
AgentMgr["Agent Manager"]
end
subgraph Docker["Docker Engine"]
Auggie["Auggie Agent"]
Gemini["Gemini Agent"]
More["..."]
end
Creds["Host Credentials (Read-Only)<br/>~/.ssh/ ~/.gitconfig ~/.git-credentials"]
end
Services --> Docker
Creds -.-> Docker
```
**Deployment Methods:**
1. **Docker Compose** (Development)
```bash
docker compose up -d
```
2. **Native Binaries** (Production)
```bash
systemctl start kandev-gateway
systemctl start kandev-task-service
systemctl start kandev-orchestrator
systemctl start kandev-agent-manager
```
**System Requirements:**
- OS: Linux (Ubuntu 20.04+) or macOS 12+
- CPU: 4+ cores (8+ recommended)
- RAM: 8GB minimum (16GB recommended)
- Disk: 50GB free space
- Docker: 20.10+
### Future Deployment Options (Roadmap)
**Cloud Deployment (Phase 7+):**
- Deploy to AWS/GCP/Azure
- Managed PostgreSQL and NATS
- Cloud secret managers for credentials
- Horizontal scaling with load balancers

**Kubernetes (Phase 8+):**
- Helm charts for deployment
- StatefulSets for databases
- Deployments for services
- Horizontal Pod Autoscaling
- Ingress for external access

**Multi-User SaaS (Phase 9+):**
- Tenant isolation
- Centralized credential management
- Usage-based billing