# System Architecture | 系统架构

DocSentinel Architect

**DocSentinel** — System Architecture Document (open-source style) | | | | :--------------- | :------------------------------------------------------------------------ | | **Version** | 5.0 Phase 0 | | **Author** | PAN CHAO | | **Last updated** | 2026-06 | | **Related** | [Product Requirements (PRD)](./SPEC.md) · [Design docs](./docs/README.md) | --- ## Overview | 概述 DocSentinel is an **AI-powered SSDLC (Secure Software Development Lifecycle) platform** that automates security activities across all six phases of the software development lifecycle. Built on **LangChain** and **LangGraph**, the system orchestrates phase-specific AI agents to perform security assessments of documents, questionnaires, and reports — from requirements analysis and threat modeling to vulnerability monitoring and incident response — providing stateful, graph-based agent workflows with stage-aware routing. This document describes the **system architecture**: high-level design, components, data flow, integrations, and deployment. For product goals and requirements, see [SPEC.md](./SPEC.md). --- ## Goals & Context | 目标与背景 - **Goal**: Provide AI-assisted security coverage across the entire SSDLC — Requirements, Design, Development, Testing, Deployment, and Operations — reducing manual effort for security teams while improving coverage and consistency. Automate first-pass assessment of security-related documents (questionnaires, design docs, compliance evidence) and produce structured reports (risks, compliance gaps, remediations). - **Context**: Enterprise security teams must embed security into every phase of delivery (Shift-Left), aligned with frameworks like NIST SSDF, OWASP SAMM, Microsoft SDL, and SOC2. The system provides phase-specific agents orchestrated by LangGraph, a unified knowledge base (RAG) with phase-specific collections, multi-format parsing, pluggable LLMs (cloud or local) via LangChain, and **SSDLC-aware assessment pipelines**. --- ## High-Level Architecture | 高层架构 The system is organized in layers: **React Console / REST / Agent Gateway** → **Shared Assessment Service** → **SSDLC Orchestration (LangGraph)** → **Core Services** → **LLM Abstraction** → **LLM Backends**. MCP and A2A are adapters, not alternate paths around application policy. ![DocSentinel Architecture Overview](docsentinel_architecture.png) ### Mermaid: Logical View ```mermaid flowchart TB subgraph Users["Users"] Staff["Security Staff"] APIUser["API / CI-CD / External Agents"] end subgraph Access["Access Layer"] Console["React Console\n(Vite + Tailwind)"] API["REST API\n(FastAPI)"] Gateway["Agent Gateway"] MCP["MCP\n(stdio + HTTP)"] A2A["A2A 1.0\n(JSON-RPC)"] Tasks["Shared Assessment\nService"] end subgraph Orchestration["SSDLC Orchestration (LangGraph)"] Router["Phase Router"] subgraph Agents["Phase Agents"] A1["Requirements\nAgent"] A2["Design\nAgent"] A3["Development\nAgent"] A4["Testing\nAgent"] A5["Deployment\nAgent"] A6["Operations\nAgent"] end State["Shared State\n& Checkpointing"] end subgraph Core["Core Services"] KB["Knowledge Base\n(RAG)"] Parser["Parser"] Mem["Memory"] Skill["Skills"] end subgraph LLM["LLM Layer (LangChain)"] Abst["LLM Abstraction"] end subgraph Backends["LLM Backends"] Cloud["OpenAI / Claude / Qwen"] Local["Ollama / vLLM"] end subgraph Integrations["Integrations"] AAD["AAD (SSO)"] SN["ServiceNow"] Tools["SAST / DAST\nTools"] end Staff --> Console Staff --> API Console --> API APIUser --> API APIUser --> MCP APIUser --> A2A MCP --> Gateway A2A --> Gateway API --> Tasks Gateway --> Tasks Tasks --> Router Router --> A1 Router --> A2 Router --> A3 Router --> A4 Router --> A5 Router --> A6 A1 & A2 & A3 & A4 & A5 & A6 <--> State A1 & A2 & A3 & A4 & A5 & A6 --> KB A1 & A2 & A3 & A4 & A5 & A6 --> Parser A1 & A2 & A3 & A4 & A5 & A6 --> Skill A1 & A2 & A3 & A4 & A5 & A6 --> Abst Abst --> Cloud Abst --> Local Router -.-> AAD Router -.-> SN A4 -.-> Tools ``` --- ## SSDLC Agent Design | SSDLC Agent 设计 ### LangGraph State Machine | LangGraph 状态机 The core orchestration is a **LangGraph StateGraph** where each SSDLC phase is a node. Edges define the workflow — sequential, parallel, or conditional based on project context and user configuration. ```mermaid stateDiagram-v2 [*] --> Router Router --> Requirements: phase=requirements Router --> Design: phase=design Router --> Development: phase=development Router --> Testing: phase=testing Router --> Deployment: phase=deployment Router --> Operations: phase=operations Router --> FullSSDLC: phase=full state FullSSDLC { [*] --> Requirements Requirements --> Design Design --> Development Development --> Testing Testing --> Deployment Deployment --> Operations Operations --> [*] } Requirements --> Reviewer Design --> Reviewer Development --> Reviewer Testing --> Reviewer Deployment --> Reviewer Operations --> Reviewer Reviewer --> [*] ``` **Key Design Decisions:** - **Shared State**: All agents read/write to a shared `SSDLCState` TypedDict managed by LangGraph. This enables cross-phase traceability (e.g. threat from Design is linked to test case in Testing). - **Checkpointing**: LangGraph's built-in checkpointing persists state across requests, enabling long-running multi-phase assessments. - **Conditional Routing**: The Router node inspects the request (phase, project type, risk level) and routes to the appropriate agent(s). For full SSDLC, agents execute in sequence with optional parallel sub-steps. - **Human-in-the-Loop**: LangGraph interrupt points allow human review before progressing to the next phase. ### Phase Agent Details | 阶段 Agent 详情 | Agent | SSDLC Phase | Key Tools / Skills | Input Examples | Output | | :--- | :--- | :--- | :--- | :--- | | **Requirements Agent** | Requirements | Compliance matcher, risk classifier, requirements extractor | PRDs, BRDs, user stories | Security requirements list, compliance obligations, risk classification | | **Design Agent** | Design | STRIDE analyzer, architecture reviewer, SDR generator | Architecture docs, design specs, data flow diagrams | Threat model (STRIDE/DREAD), SDR report, security architecture findings | | **Development Agent** | Development | Secure coding checker, SAST triage, code reviewer | Source code, coding guidelines, SAST reports | Secure coding findings, SAST triage results, coding recommendations | | **Testing Agent** | Testing | SAST/DAST parser, pentest analyzer, remediation tracker | SAST/DAST reports, pentest findings | Prioritized vulnerabilities, remediation plan, fix verification | | **Deployment Agent** | Deployment | Config reviewer, hardening checker, sign-off generator | Deployment configs, infra-as-code, release checklists | Configuration findings, hardening gaps, release sign-off report | | **Operations Agent** | Operations | CVE analyzer, incident assistant, log auditor | CVE feeds, incident reports, security logs | Vulnerability alerts, incident analysis, audit findings | --- ## Component Design | 组件设计 ### 1. Access Layer | 接入层 - **REST API** (FastAPI): Request validation, routing to SSDLC assessment / KB / health / skills endpoints. Phase-aware endpoints (e.g. `POST /assessments/{phase}`). - **Agent Gateway**: MCP stdio/Streamable HTTP and A2A 1.0 JSON-RPC adapters. Tokenless remote access is loopback-only; all agent submissions require human review. - **Shared Assessment Service**: Owns task state, activity, remediation tracking, and calls into LangGraph for REST, MCP, and A2A alike. - **React Console**: FastAPI serves the production Vite build at `/console`; the console covers assessment, evidence, knowledge-base, skill, and runtime workflows. - **Current boundary**: Authentication (AAD/JWT), tenant isolation, and rate limiting remain future enterprise controls and are not yet wired into endpoints. ### 2. SSDLC Orchestrator (LangGraph) | SSDLC 编排器 - Built on **LangChain + LangGraph**: stateful, graph-based agent workflow with conditional edges. - **Graph Definition**: `StateGraph` with nodes for Router, 6 phase agents, and Reviewer. Graph nodes: Parser → SSDLC Router → Policy+History Agent ∥ Evidence Agent → Drafter Agent → Reviewer Agent. - **State Schema**: `SSDLCState` TypedDict containing parsed documents, phase findings, threat models, cross-phase references, and metadata. - **Conditional Edges**: Route based on requested phase, project risk level, or full SSDLC mode. SSDLC Router node determines the lifecycle stage and injects stage-specific skill + checklist. - **Parallel Execution**: Policy and Evidence nodes run **in parallel** (LangGraph fan-out/fan-in). Within phases, sub-tasks (e.g. KB lookup + document parsing) run concurrently via `asyncio.gather`. - **Checkpointing**: Persistent state via LangGraph `MemorySaver` or database-backed checkpointer. - Assessment submission is **non-blocking** — returns task_id immediately, processes in background. - Singleton `KnowledgeBaseService` and cached LLM client shared across requests. ### 3. Memory | 记忆体 - **Working memory**: LangGraph shared state (`SSDLCState`) persisted via checkpointing. - **Cross-phase context**: Findings from earlier phases are carried forward automatically (e.g. Design threats → Testing test cases). - **History reuse**: Past assessment reports are indexed into a dedicated Chroma collection and retrieved as context for new assessments. - **Status**: LangGraph `MemorySaver` for MVP; database-backed checkpointer for production. ### 4. Skills & Personas | 技能与角色 - **Persona-based Assessment**: Defines "who" is assessing (e.g. ISO 27001 Auditor vs. AppSec Engineer). - **Built-in Persona Skills**: 4 hardcoded personas (ISO 27001 Auditor, AppSec Engineer, GDPR DPO, Cloud Architect) in `skills_registry.py`. - **Phase-specific Personas**: Each SSDLC phase has dedicated personas: - Requirements: Compliance Analyst, Risk Assessor - Design: Threat Modeler, Security Architect - Development: Secure Code Reviewer, SAST Analyst - Testing: Pentest Analyst, Vulnerability Manager - Deployment: Release Security Reviewer, Hardening Specialist - Operations: Vulnerability Monitor, Incident Responder - **Built-in SSDLC Skills**: 6 stage-specific skills (one per SSDLC stage) with tailored `system_prompt`, `risk_focus`, checklists, and `compliance_frameworks`. - **Custom Skills**: File-backed (`data/skills.json`) CRUD via REST API. - **Dynamic Orchestration**: LangGraph injects skill-specific context into RAG queries and LLM prompts based on the selected persona and SSDLC stage. ### 5. Knowledge Base (RAG) | 知识库 - **Vector Store**: ChromaDB for chunk-level similarity search (sentence-transformers embeddings). - **Graph RAG**: LightRAG for entity-relationship aware retrieval (controls → policies → vulnerabilities → threats). Enabled via `ENABLE_GRAPH_RAG` config. - **Phase-specific Collections**: Separate knowledge collections per SSDLC phase: - `kb_requirements`: Compliance frameworks, security policies, requirement templates - `kb_design`: Security patterns, threat catalogs, architecture guidelines - `kb_development`: Secure coding standards, OWASP guidelines, language-specific practices - `kb_testing`: Vulnerability databases, testing methodologies, remediation guides - `kb_deployment`: CIS benchmarks, hardening guides, configuration standards - `kb_operations`: CVE databases, incident playbooks, audit checklists - **Hybrid Query**: When Graph RAG is enabled, results from both vector and graph retrieval are merged and deduplicated. - **History Reuse**: Indexes past assessment responses into a dedicated Chroma collection. - **Singleton**: Single `KnowledgeBaseService` instance shared across the application lifecycle. ### 6. Parser | 文件解析 - **Primary engine**: Docling — preserves tables, headings, and supports OCR for scanned PDFs. Outputs structured Markdown. - **Fallback engine**: Legacy parsers (PyMuPDF, python-docx, openpyxl, python-pptx) for when Docling is unavailable. - **SAST/DAST Report Parsers**: Dedicated parsers for SARIF format, SonarQube JSON, Checkmarx XML, Burp Suite XML, OWASP ZAP reports. - **Engine selection**: Configurable via `PARSER_ENGINE` (`auto` / `docling` / `legacy`). `auto` tries Docling first, falls back to legacy. - Shared pipeline for both assessment input and KB document ingestion. ### 7. LLM Abstraction (LangChain) | LLM 抽象层 - Single interface for chat/completion via **LangChain** (`ChatOpenAI` / `ChatOllama`). - LangChain is also the foundation for LangGraph agent nodes — each node uses LangChain's `Runnable` interface. - **LangChain Tools**: Phase agents use LangChain tools for structured interactions (KB query, document parsing, report generation). - **Prompt Management**: LangChain `ChatPromptTemplate` with phase-specific system prompts and few-shot examples. - **Cached client**: LLM instance is `@lru_cache`d — one client per process lifetime. - Supported providers: OpenAI (and compatible APIs), **Ollama** (local). ```mermaid flowchart LR subgraph Agents["Phase Agents"] A["Requirements / Design / Dev / Test / Deploy / Ops"] end subgraph LangChain["LangChain"] Tools["Tools"] Prompts["Prompt Templates"] LLM["LLM Abstraction"] end subgraph Providers["Providers"] O["OpenAI"] Ol["Ollama"] end A --> Tools A --> Prompts Tools --> LLM Prompts --> LLM LLM --> O LLM --> Ol ``` --- ## SSDLC Pipeline | SSDLC 流水线 DocSentinel supports all 6 SSDLC stages defined by NIST, OWASP, and Microsoft SDL. Each stage has a dedicated skill with stage-specific prompts, checklists, and risk focus areas. ```mermaid flowchart LR subgraph SSDLC["SSDLC Stages"] S1["1. Requirements\n需求"] S2["2. Design\n设计"] S3["3. Development\n开发"] S4["4. Testing\n测试"] S5["5. Deployment\n部署"] S6["6. Operations\n运维"] end S1 --> S2 --> S3 --> S4 --> S5 --> S6 ``` | Stage | Key Assessment Focus | Example Inputs | | :---- | :------------------- | :------------- | | **Requirements** | Security requirements completeness, compliance mapping (GDPR, ISO 27001), risk analysis | Requirements docs, compliance checklists | | **Design** | Security architecture, STRIDE/DREAD threat model, encryption/permission design, SDR | Architecture docs, threat models, data flow diagrams | | **Development** | Secure coding standards, built-in controls (anti-injection, XSS), code review findings | Code review reports, coding guidelines | | **Testing** | SAST/DAST triage, penetration test evaluation, vulnerability fix verification | Scan reports, pen-test findings | | **Deployment** | Release readiness, config security, key management, least privilege, hardening | Deployment configs, release checklists | | **Operations** | Vulnerability monitoring, incident response, patch management, log audit | Incident reports, audit logs, monitoring alerts | ### LangGraph Agent Flow ```mermaid stateDiagram-v2 [*] --> Parse: files received Parse --> SSDLCRouter: parsed docs SSDLCRouter --> PolicyAgent: stage + skill loaded SSDLCRouter --> EvidenceAgent: stage + skill loaded state fork_state <> PolicyAgent --> fork_state EvidenceAgent --> fork_state fork_state --> DrafterAgent: policy_chunks + evidence DrafterAgent --> ReviewerAgent: draft report ReviewerAgent --> [*]: final report + confidence ``` The **SSDLC Router** is a LangGraph node that: 1. Accepts an explicit `ssdlc_stage` parameter, or auto-detects the stage from document content. 2. Loads the corresponding stage skill (system prompt, risk focus, checklist). 3. Routes to the parallel Policy+Evidence fan-out, then sequentially to Drafter and Reviewer. --- ## Data Flow | 数据流 End-to-end flow for an SSDLC assessment: ```mermaid sequenceDiagram participant U as User participant API as REST API participant LG as LangGraph Router participant Agent as Phase Agent participant Parser as Parser participant Router as SSDLC Router participant KB as Knowledge Base participant Skill as Skill participant LLM as LLM (LangChain) participant Review as Human Review U->>API: POST /assessments (files, phase, scenario_id, ssdlc_stage?) API-->>U: 202 Accepted (task_id) API->>LG: background task: route(phase, files, context) LG->>Parser: parse(files) Parser-->>LG: parsed docs LG->>Agent: execute(parsed_docs, state) Agent->>KB: query(phase_collection, relevant policy) KB-->>Agent: chunks Agent->>Skill: apply(persona, parsed_docs, chunks) Skill->>LLM: prompt + context LLM-->>Skill: structured findings Skill-->>Agent: phase findings Agent-->>LG: update state (findings, threats, gaps) LG->>LG: checkpoint state LG-->>API: assessment report U->>API: GET /assessments/{task_id} API-->>U: report U->>Review: review & approve/reject ``` **Full SSDLC Flow:** 1. User submits files and selects phase(s) or "full SSDLC" mode (and optional `ssdlc_stage` / skill ID). API returns `task_id` immediately (non-blocking). 2. **Parser** converts files to unified Markdown/text format (Docling or legacy). 3. **LangGraph Router** determines which phase agent(s) to invoke and loads stage-specific skill + checklist. 4. For full SSDLC, agents execute sequentially (Requirements → Design → Development → Testing → Deployment → Operations), with findings from each phase carried forward in shared state. **Policy+History Agent** queries KB (vector + graph RAG) and **Evidence Agent** scans documents — these run **in parallel** via LangGraph fan-out. 5. Each **Phase Agent**: parses documents → queries phase-specific KB → applies skill persona → calls LLM → produces structured findings. 6. **Drafter Agent** synthesizes findings into a structured report via LLM, guided by the stage checklist. 7. **Reviewer** node validates completeness, assigns confidence (0.0-1.0), cross-references findings across phases. 8. Report with cross-phase traceability is returned for **human-in-the-loop** review. User polls `GET /assessments/{task_id}` to retrieve the completed report. --- ## Integration Points | 集成 ```mermaid flowchart LR subgraph DocSentinel["DocSentinel"] API["API"] LG["LangGraph\nOrchestrator"] Agents["Phase Agents"] end subgraph IdP["Identity"] AAD["Azure AD / Entra ID"] end subgraph PM["Project Management"] SN["ServiceNow"] end subgraph SecTools["Security Tools"] SAST["SAST\n(SonarQube, Checkmarx)"] DAST["DAST\n(Burp, ZAP)"] Scanner["Vuln Scanners\n(Nessus, Qualys)"] end User["User"] -->|Login / Token| AAD AAD -->|JWT / SSO| API LG -->|Project metadata| SN Agents -->|Parse reports| SAST Agents -->|Parse reports| DAST Agents -->|CVE feeds| Scanner ``` - **AAD**: SSO and API token validation (OAuth2/OIDC). - **ServiceNow**: Read project metadata (type, compliance scope, owner); optional write-back of assessment results. - **SAST/DAST Tools**: Import scan results in SARIF, native JSON/XML formats for automated triage by Testing Agent. - **Vulnerability Scanners**: CVE feed integration for Operations Agent monitoring. See [docs/04-integration-guide.md](./docs/04-integration-guide.md) for configuration and field mapping. --- ## Security Architecture | 安全架构 Security is designed along five areas (detailed in [PRD §7.2](./SPEC.md)): | Area | Summary | | :--- | :--- | | **Identity & access** | AAD/SSO, RBAC (analyst, lead, project owner, API consumer, admin), token/API key, data isolation by project/role. | | **Data** | TLS for transport; secrets not in code; minimal retention; optional local-only LLM for data sovereignty. | | **Application** | Input validation, injection prevention (including prompt injection), dependency/SCA, safe error responses, security headers, rate limiting. | | **Operations** | Audit log (who/what/when), LangGraph state transition logging, alerting, backup and recovery. | | **Supply chain** | Trusted dependencies, vulnerability handling, license compliance. | --- ## Deployment View | 部署视图 ```mermaid flowchart TB subgraph Client["Client"] Browser["Browser / CLI / CI-CD / MCP Agent"] end subgraph Server["Server / Container"] App["DocSentinel\n(FastAPI + LangGraph)"] Chroma["Chroma\n(vector store)"] Checkpoint["LangGraph\nCheckpoint Store"] end subgraph External["External"] AAD["AAD"] SN["ServiceNow"] LLM["LLM (OpenAI / Ollama)"] SecTools["SAST/DAST Tools"] end Browser --> App App --> Chroma App --> Checkpoint App --> AAD App --> SN App --> LLM App --> SecTools ``` - **Runtime**: Python 3.10+, FastAPI, Uvicorn, LangGraph, LangChain. - **Storage**: Vector store (Chroma) persisted on disk; LangGraph checkpoint store (memory/SQLite/PostgreSQL); optional Redis for sessions. - **Network**: Outbound to AAD, ServiceNow, LLM endpoints, and SAST/DAST tools; TLS recommended for production. - **Deployment**: Single node / container for MVP; scale out by separating API and agent workers if needed. See [docs/05-deployment-runbook.md](./docs/05-deployment-runbook.md) for environment, configuration, and runbook. --- ## References | 参考 | Document | Description | | :--- | :--- | | [SPEC.md](./SPEC.md) | Product requirements, SSDLC phases, features, security controls. | | [docs/01-architecture-and-tech-stack.md](./docs/01-architecture-and-tech-stack.md) | Technology choices and module layout. | | [docs/openapi.json](./docs/openapi.json) | Authoritative OpenAPI spec generated from FastAPI. | | [docs/03-assessment-report-and-skill-contract.md](./docs/03-assessment-report-and-skill-contract.md) | Report schema and Skill I/O. | | [docs/04-integration-guide.md](./docs/04-integration-guide.md) | AAD, ServiceNow, SAST/DAST integration. | | [docs/05-deployment-runbook.md](./docs/05-deployment-runbook.md) | Deployment and operations. | --- *This architecture document is part of the [DocSentinel](https://github.com/arthurpanhku/DocSentinel) open-source project.*