# Architecture

This page describes how the Mobile Tester Agent is wired together end-to-end: the runtime topology, the request lifecycle, and the responsibilities of each module. The agent targets **Android, iOS, and React Native** apps through a unified tool catalog.

---

## 1. Runtime topology

Four groups of components cooperate at runtime:

```mermaid
flowchart TB
    subgraph Browser
        FE[React/Vite Dashboard<br/>web/]
    end

    subgraph JVM[JVM process, Ktor server :8080]
        ROUTING[server.Routing<br/>POST /run-test, /stop-test, /config]
        AGENT[agent.MobileTestAgent<br/>Koog AIAgent singleton]
        STRAT[agent.strategy.TestingStrategy<br/>Koog graph]
        TOOLS[agent.tool.mobile.test.MobileTestTools<br/>+ ReportingTools<br/>platform-aware]
        EXEC[agent.executor.*<br/>Opus 4.7 / DeepSeek V4 Flash /<br/>Gemini 3 Pro / GPT-5.2 Pro / Ollama]
    end

    subgraph Targets[Device targets]
        AND[Android<br/>device / emulator]
        IOS[iOS<br/>device / simulator]
        RN[React Native<br/>app on Android or iOS]
    end

    subgraph Cloud
        LLM[(LLM provider)]
        FS[(Firebase Firestore)]
    end

    FE -- /api/* via Vite proxy --> ROUTING
    ROUTING --> AGENT
    AGENT --> STRAT
    AGENT --> TOOLS
    AGENT --> EXEC
    EXEC <--> LLM
    TOOLS --> AND
    TOOLS --> IOS
    TOOLS --> RN
    FE <-.-> FS
```

* **Frontend** runs in the browser. In development, Vite proxies `/api/*` to `http://localhost:8080` (see [vite.config.ts](../web/vite.config.ts)).
* **Backend** is a single Ktor/Netty process. The `MobileTestAgent` is a Kotlin `object` (singleton) so its config persists across requests.
* **Device targets**: physical Android devices, Android emulators, iOS devices, iOS simulators, and React Native apps running on top of either OS. The tool layer selects the right interaction primitives per platform.
* **Cloud**: the chosen LLM provider is called by the executor; Firestore stores user-authored scenarios from the dashboard.

---

## 2. Request lifecycle: `POST /run-test`

```mermaid
sequenceDiagram
    autonumber
    participant FE as Frontend
    participant Ktor as Ktor Routing
    participant Agent as MobileTestAgent
    participant Strat as TestingStrategy
    participant LLM as LLM (via executor)
    participant Tools as MobileTestTools
    participant Target as Device (Android / iOS / RN)

    FE->>Ktor: POST /run-test {goal, packageName, steps[]}
    Ktor->>Agent: runAgent(goal, packageName, steps)
    Agent->>Agent: build AIAgentConfig (prompt + model + maxIterations)
    Agent->>Strat: start strategy graph

    loop reasoning loop
        Strat->>LLM: nodeCallLLM (request next action)
        alt LLM returns tool calls
            LLM-->>Strat: tool_calls[]
            Strat->>Tools: nodeExecuteToolMultiple (parallel)
            Tools->>Target: platform-specific UI interaction
            Target-->>Tools: stdout / OK | NOT_FOUND | …
            Tools-->>Strat: ReceivedToolResult[]
            opt tokens > 8000
                Strat->>Strat: nodeCompressHistory
            end
            Strat->>LLM: nodeSendToolResultMultiple
        else LLM returns assistant message
            LLM-->>Strat: final summary
            Strat-->>Agent: nodeFinish
        end
    end

    Agent-->>Ktor: result string (PASS/FAIL per step)
    Ktor-->>FE: 200 OK + body
```

The two important loops live in the strategy:

1. **LLM → tools → LLM** until the model emits a plain assistant message.
2. **History compression** kicks in if `prompt.latestTokenUsage > MAX_TOKENS_THRESHOLD` (8000); see [TestingStrategy.kt:11](../src/main/kotlin/agent/strategy/TestingStrategy.kt).

Within each step, the system prompt mandates an **act → verify → recover** cycle: the LLM is told never to assume an action worked from the tool's return string alone, and to call `verifyElementVisible` (or `verifyElementNotVisible`) before moving on. See [ai-agent.md](ai-agent.md) for the full prompt.

---

## 3. Module map

Backend source lives under [src/main/kotlin/](../src/main/kotlin/):

```
src/main/kotlin/
├── server/
│   ├── Application.kt                # Ktor entry point (EngineMain)
│   ├── HTTP.kt                       # ContentNegotiation (kotlinx.serialization JSON)
│   ├── Monitoring.kt                 # CallLogging
│   ├── Routing.kt                    # POST /run-test, POST /stop-test, POST /config
│   └── model/
│       ├── AgentRequest.kt           # {goal, packageName, steps[]}
│       └── MobileTesterConfigAPI.kt  # external config DTO + toMobileConfig() mapper
└── agent/
    ├── MobileTestAgent.kt            # singleton: builds & runs the Koog AIAgent
    ├── strategy/
    │   └── TestingStrategy.kt        # Koog strategy graph (LLM ↔ tools ↔ compress)
    ├── executor/
    │   ├── ExecutorInfo.kt                      # interface { executor, llmModel }
    │   ├── anthropic/Opus47Executor.kt          # Claude Opus 4.7
    │   ├── deepSeek/DeepSeekV4FlashExecutor.kt  # DeepSeek V4 Flash (default)
    │   ├── google/Gemini3ProExecutor.kt         # Google Gemini 3 Pro Preview
    │   ├── openRouter/GPT52ProExecutor.kt       # OpenRouter GPT-5.2 Pro
    │   ├── ollama/QWEN36BExecutor.kt            # Local Ollama, Qwen 3 0.6B
    │   ├── ollama/Llama4Executor.kt             # Local Ollama (Groq Llama 3 tool-use 8B)
    │   └── ollama/Grok8BExecutor.kt             # Local Ollama (Groq Llama 3 tool-use 8B)
    ├── model/
    │   ├── MobileTesterConfig.kt     # internal runtime config
    │   └── TestScenarioReport.kt     # report shape (goal, steps, dates)
    └── tool/
        ├── mobile/test/
        │   ├── MobileTestTools.kt    # @Tool catalog (taps, swipes, verify, …)
        │   ├── ReportingTools.kt     # screenshots / screen recording
        │   └── utils/
        │       ├── AdbUtils.kt          # raw adb command runner + device pinning
        │       ├── UiAutomatorUtils.kt  # XML dump parser, taps, swipes
        │       ├── MediaUtils.kt        # screencap / screenrecord
        │       ├── Formatter.kt         # slugify
        │       └── UiMatchResult.kt     # data class {value, cx, cy}
        └── reporting/
            ├── ReportingTools.kt            # init/update/generate report
            └── utils/TestReportUtils.kt     # formatter for report text
```

Frontend source lives under [web/src/](../web/src/); see [frontend.md](frontend.md) for the page-level breakdown.

---

## 4. Component responsibilities

| Component | Responsibility | Key file |
|---|---|---|
| **Ktor server** | HTTP transport, JSON (de)serialization, request logging | [server/Application.kt](../src/main/kotlin/server/Application.kt) |
| **Routing** | Validates payload, dispatches to `MobileTestAgent`, maps exceptions to HTTP codes | [server/Routing.kt](../src/main/kotlin/server/Routing.kt) |
| **MobileTestAgent** | Builds the `AIAgent`, installs event handlers (tool-call logging, completion), holds the mutable `MobileTesterConfig` | [agent/MobileTestAgent.kt](../src/main/kotlin/agent/MobileTestAgent.kt) |
| **TestingStrategy** | Defines the Koog graph: LLM ↔ tool execution ↔ history compression ↔ finish | [agent/strategy/TestingStrategy.kt](../src/main/kotlin/agent/strategy/TestingStrategy.kt) |
| **Executors** | Wire a specific LLM provider + model into the agent; read API keys from `.env` | [agent/executor/](../src/main/kotlin/agent/executor/) |
| **MobileTestTools** | The `@Tool`-annotated catalog exposed to the LLM; every method returns a status-prefixed string (`OK`, `TAPPED`, `VISIBLE`, …) the model pattern-matches on | [agent/tool/mobile/test/MobileTestTools.kt](../src/main/kotlin/agent/tool/mobile/test/MobileTestTools.kt) |
| **AdbUtils** | Builds and runs `adb` subprocesses; pins a target device serial; auto-recovers offline devices by restarting the adb server | [agent/tool/mobile/test/utils/AdbUtils.kt](../src/main/kotlin/agent/tool/mobile/test/utils/AdbUtils.kt) |
| **UiAutomatorUtils** | Calls `uiautomator dump`, parses the XML with regex, computes tap centers from bounds, performs swipes/taps | [agent/tool/mobile/test/utils/UiAutomatorUtils.kt](../src/main/kotlin/agent/tool/mobile/test/utils/UiAutomatorUtils.kt) |
| **MediaUtils** | Runs `screencap` / `screenrecord` and pulls artifacts from the device to `$HOME_PATH` | [agent/tool/mobile/test/utils/MediaUtils.kt](../src/main/kotlin/agent/tool/mobile/test/utils/MediaUtils.kt) |
| **Frontend** | Scenario CRUD (Firestore), settings UI, runs tests via the Vite dev proxy | [web/src/](../web/src/) |

---

## 5. Why this shape?

A few non-obvious design decisions worth highlighting:

* **Status-prefixed tool returns.** Every `@Tool` returns a string starting with one of `OK | TAPPED | VISIBLE | NOT_VISIBLE | NOT_FOUND | AMBIGUOUS | ERROR | TIMEOUT`. The LLM is told (in the system prompt) to pattern-match the prefix to decide its next move. This is far cheaper and more reliable than asking the model to interpret free-form text from `adb`. See the constant convention in [MobileTestTools.kt](../src/main/kotlin/agent/tool/mobile/test/MobileTestTools.kt).
* **Device serial pinning.** With multiple ADB devices attached, every `adb` call routes through `AdbUtils.runAdb()`, which injects `-s <serial>` after `connectDevice()` selects one. This prevents "more than one device" ambiguity errors mid-test.
* **History compression threshold = 8000 tokens.** The original 1000-token threshold caused compression on virtually every tool call, destroying step-by-step memory. See the comment in [TestingStrategy.kt:9](../src/main/kotlin/agent/strategy/TestingStrategy.kt).
* **`startTestingScenario` launches via `monkey`, then verifies foreground.** It force-stops any stale instance, wakes the screen, dismisses the keyguard, then launches with `monkey -p <package> -c android.intent.category.LAUNCHER 1` and re-reads `dumpsys activity top` to confirm the package is in the foreground. It retries once. See [AdbUtils.launchAndVerify](../src/main/kotlin/agent/tool/mobile/test/utils/AdbUtils.kt).
* **`hideKeyboard` uses `KEYCODE_BACK` (4), not `KEYCODE_ESCAPE` (111).** Empirically, ESC leaves the keyboard up on the target device; BACK dismisses it on first press. Documented inline.
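
Two of the decisions above, the status prefixes and device serial pinning, can be illustrated with a minimal Kotlin sketch. The names `ToolStatus`, `parseStatus`, and `buildAdbCommand` are hypothetical and are not taken from the project source; the real logic lives in `MobileTestTools.kt` and `AdbUtils.runAdb()`. In the actual agent it is the LLM, not Kotlin code, that pattern-matches the prefix, so `parseStatus` only shows why a fixed leading token is easy to interpret.

```kotlin
// Hypothetical sketch: these names are illustrative, not the project's API.

/** The status prefixes listed in the bullet on status-prefixed tool returns. */
enum class ToolStatus {
    OK, TAPPED, VISIBLE, NOT_VISIBLE, NOT_FOUND, AMBIGUOUS, ERROR, TIMEOUT
}

/**
 * Maps the leading token of a tool result to a ToolStatus.
 * Returns null when the string does not start with a known prefix.
 * Assumption: the prefix is followed by the end of the string or by a
 * character that is neither a letter nor '_' (for example ':' or a space).
 */
fun parseStatus(toolResult: String): ToolStatus? {
    // Take the whole leading token so that "NOT_VISIBLE" is never confused
    // with "NOT_FOUND" or with a longer word that merely starts with "OK".
    val token = toolResult.trimStart().takeWhile { it.isLetter() || it == '_' }
    return ToolStatus.values().firstOrNull { it.name == token }
}

/**
 * Builds the argument list for an adb invocation, inserting `-s <serial>`
 * right after `adb` when a device serial has been pinned.
 * Assumption: a null or blank serial means "no device pinned yet".
 */
fun buildAdbCommand(pinnedSerial: String?, vararg args: String): List<String> {
    val serialArgs: List<String> =
        if (pinnedSerial.isNullOrBlank()) emptyList() else listOf("-s", pinnedSerial)
    return listOf("adb") + serialArgs + args
}
```

For example, `buildAdbCommand("emulator-5554", "shell", "input", "keyevent", "4")` yields the argument list for `adb -s emulator-5554 shell input keyevent 4`, one plausible form of the BACK key press that `hideKeyboard` relies on. Building the command as a list rather than a single string keeps each argument intact when it is handed to a process launcher such as `ProcessBuilder`.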