# 01 — Technology Choices and Architecture

|                 |                                                             |
| :-------------- | :---------------------------------------------------------- |
| **Status**      | [x] Updated (v4.0 aligned) \| [ ] In Review \| [ ] Approved |
| **Version**     | 1.0                                                         |
| **Related PRD** | Section 5 System Architecture, Section 8 Tech Stack         |

---

## 1. Technology Choices

### 1.1 Language and Runtime

| Item            | Choice           | Version | Notes                          |
| :-------------- | :--------------- | :------ | :----------------------------- |
| **Language**    | Python           | 3.10+   |                                |
| **Package Mgr** | pip              | —       | `requirements.txt`             |
| **Runtime**     | CPython / Docker | —       | See `05-deployment-runbook.md` |

### 1.2 Web and API

| Item          | Choice      | Version | Notes                                                 |
| :------------ | :---------- | :------ | :---------------------------------------------------- |
| **Framework** | FastAPI     | >=0.109 | Async, auto OpenAPI                                   |
| **Server**    | Uvicorn     | >=0.27  | ASGI server                                           |
| **Docs**      | OpenAPI 3.x | —       | Generated by FastAPI; see `02-api-specification.yaml` |

### 1.3 Agent Orchestration

| Item                 | Choice                  | Version | Notes |
| :------------------- | :---------------------- | :------ | :---- |
| **Workflow Engine**  | LangGraph               | >=0.2   | Stateful graph-based agent orchestration; StateGraph with conditional routing, parallel execution, checkpointing |
| **LLM Framework**    | LangChain               | >=0.2   | Unified LLM abstraction, prompt templates, tool integration, RAG chains |
| **State Management** | LangGraph Checkpointing | —       | Cross-phase state persistence; MemorySaver (MVP) or DB-backed (production) |

### 1.4 LLM Providers

| Item           | Choice           | Version | Notes |
| :------------- | :--------------- | :------ | :---- |
| **Cloud LLM**  | OpenAI (ChatGPT) | —       | Via LangChain `ChatOpenAI`; compatible with Azure OpenAI, Claude, Qwen |
| **Local LLM**  | Ollama           | —       | Via LangChain `ChatOllama`; data stays on-prem |
| **LLM Client** | Cached           | —       | `@lru_cache` — one client per process lifetime |

### 1.5 Vector Store and RAG

| Item           | Choice             | Version | Notes |
| :------------- | :----------------- | :------ | :---- |
| **Vector DB**  | Chroma             | >=0.4   | Embedded, persisted to `CHROMA_PERSIST_DIR`; phase-specific collections |
| **Embeddings** | HuggingFace        | —       | `sentence-transformers/all-MiniLM-L6-v2` |
| **Chunking**   | RecursiveCharacter | —       | 1024 chars, 128 overlap (configurable) |
| **Graph RAG**  | LightRAG           | —       | Entity-relationship aware retrieval; enabled via `ENABLE_GRAPH_RAG` |

### 1.6 Document Parsing

| Format            | Library        | Version | Notes |
| :---------------- | :------------- | :------ | :---- |
| **All (primary)** | Docling        | >=2.0   | Table/heading preserving; OCR capable; `PARSER_ENGINE=auto` |
| **PDF**           | PyMuPDF (fitz) | >=1.23  | Fallback when Docling unavailable |
| **Word**          | python-docx    | >=1.1   | |
| **Excel**         | openpyxl       | >=3.1   | |
| **PPT**           | python-pptx    | >=0.6   | |
| **SAST/DAST**     | Custom parsers | —       | SARIF, SonarQube JSON, Checkmarx XML, Burp XML, ZAP |
| **Text/MD**       | Built-in       | —       | `.txt`, `.md` |
| **Router**        | Custom         | —       | Dispatches by extension in `parse_file()` |

### 1.7 Identity and Integrations

| Item          | Choice            | Notes |
| :------------ | :---------------- | :---- |
| **Auth**      | OAuth2/OIDC (AAD) | Placeholder in `app/integrations`; see `docs/04-integration-guide.md` |
| **Metadata**  | ServiceNow        | Placeholder in `app/integrations`; see `docs/04-integration-guide.md` |
| **SAST/DAST** | Tool connectors   | SonarQube, Checkmarx, Burp Suite, OWASP ZAP; see `docs/04-integration-guide.md` |
| **Config**    | pydantic-settings | `app/core/config.py` reads `.env` |

### 1.8 Storage and Cache

| Item             | Choice                | Notes |
| :--------------- | :-------------------- | :---- |
| **Task State**   | LangGraph Checkpoints | Persistent state across phases; MemorySaver for MVP, DB-backed for production |
| **Vector Store** | Local disk            | Persisted to `CHROMA_PERSIST_DIR`; separate collections per SSDLC phase |
| **Files**        | Transient             | Stream processing; parsed content goes to KB/Agent |
| **Checkpoints**  | Local disk / DB       | `LANGGRAPH_CHECKPOINT_DIR` for MVP; PostgreSQL for production |

---

## 2. Architecture and Data Flow

### 2.1 Logical Architecture

Aligned with PRD Section 5.1.

```text
[ Access Layer ]         API (FastAPI) / MCP Server (stdio) / CLI
        |
[ SSDLC Orchestration ]  LangGraph StateGraph
        |                ├── Phase Router (conditional edges)
        |                ├── SSDLC Pipeline (6-stage router)
        |                ├── Requirements Agent
        |                ├── Design Agent
        |                ├── Development Agent
        |                ├── Testing Agent
        |                ├── Deployment Agent
        |                ├── Operations Agent
        |                └── Reviewer Agent
        |
[ Core Services ]        Knowledge Base (Vector + Graph RAG) | Parser (Docling / legacy) | Memory | Skills (persona + SSDLC stage skills)
        |
[ LLM Layer ]            LangChain Abstraction
        |                ├── OpenAI / Claude / Qwen (Cloud)
        |                └── Ollama / vLLM (Local)
        |
[ Integrations ]         AAD (Auth) | ServiceNow (Metadata) | SAST/DAST Tools
```

### 2.2 Components and Interfaces

| Component                  | Responsibility | Interface |
| :------------------------- | :------------- | :-------- |
| **API Layer**              | Auth, routing, rate limiting, validation. | REST, see `02-api-specification.yaml` |
| **LangGraph Orchestrator** | SSDLC workflow, phase routing, state management, checkpointing. | Internal Python API; `StateGraph` definition |
| **Phase Agents**           | Phase-specific assessment logic via LangChain tools. | LangGraph node functions; shared `SSDLCState` |
| **Memory**                 | Cross-phase context, LangGraph checkpoints. | LangGraph state + checkpointer |
| **Skills**                 | Phase-specific assessment capabilities. | I/O Contract, see `03-assessment-report...md` |
| **Knowledge Base**         | Ingest (Parse->Chunk->Embed) and Retrieve per phase. | `upload()`, `query(text, collection)` |
| **Parser**                 | File to unified JSON/Markdown (including SAST/DAST reports). | `parse(file_stream)` -> Schema 03 |
| **LLM Layer**              | Unified chat/completion API via LangChain. | `invoke(prompt, context)` / LangChain Tools |

### 2.3 KB Chunking Strategy

| Parameter       | Value (Default) | Description |
| :-------------- | :-------------- | :---------- |
| **Chunk Size**  | 1024            | Characters per chunk. |
| **Overlap**     | 128             | Characters shared by adjacent chunks to preserve context at boundaries. |
| **Splitter**    | Recursive       | Splits by paragraphs, then sentences. |
| **Metadata**    | Yes             | Filename, page number, section headers, SSDLC phase tag. |
| **Collections** | Per phase       | `kb_requirements`, `kb_design`, `kb_development`, `kb_testing`, `kb_deployment`, `kb_operations` |

---

## 3. Module Layout

Target implementation structure:

```text
DocSentinel/
├── app/
│   ├── api/                    # FastAPI routes: health, assessments, kb, skills
│   ├── core/                   # Configuration (pydantic-settings), guardrails
│   ├── agent/                  # LangGraph orchestrator and phase agents
│   │   ├── orchestrator.py     # LangGraph StateGraph definition
│   │   ├── state.py            # SSDLCState TypedDict
│   │   ├── router.py           # Phase routing logic
│   │   ├── ssdlc/              # SSDLC pipeline: router, stage skills, checklists
│   │   ├── agents/             # Phase agent implementations
│   │   │   ├── requirements.py
│   │   │   ├── design.py
│   │   │   ├── development.py
│   │   │   ├── testing.py
│   │   │   ├── deployment.py
│   │   │   └── operations.py
│   │   ├── reviewer.py         # Cross-phase review agent
│   │   ├── skills_registry.py  # Built-in skills per SSDLC phase
│   │   └── skills_service.py   # Skill CRUD and management
│   ├── kb/                     # KnowledgeBaseService (Chroma + chunking + phase collections)
│   │   └── graph_rag.py        # LightRAG integration
│   ├── llm/                    # LangChain LLM factory and invocation
│   ├── parser/                 # Parsers: Docling + legacy + SAST/DAST report parsers
│   ├── integrations/           # AAD, ServiceNow, SAST/DAST tool connectors
│   ├── models/                 # Pydantic models for API and internal data
│   ├── main.py                 # App entry point
│   └── mcp_server.py           # MCP Server
├── docs/                       # Design docs and schemas
├── tests/                      # Pytest suite
├── requirements.txt            # Production dependencies
└── .env.example                # Environment template
```

---

## 4. Key Dependencies

Maintained in `requirements.txt`. Key architectural dependencies:

```text
# Web & API
fastapi>=0.109.0
uvicorn[standard]>=0.27.0

# Agent Orchestration
langgraph>=0.2.0        # Graph-based agent orchestration
langchain>=0.2.0
langchain-community
langchain-openai

# Vector Store & Graph RAG
chromadb>=0.4.22
lightrag-hku            # Graph RAG (entity-relationship retrieval)

# Parsing
docling>=2.0.0          # Primary parser (table/heading/OCR)
pymupdf>=1.23           # PDF fallback
python-docx>=1.1        # Word fallback
openpyxl>=3.1           # Excel fallback
python-pptx>=0.6        # PPT fallback

# Embeddings
sentence-transformers

# MCP
mcp[cli]                # Model Context Protocol server

# Utils
httpx
pydantic-settings>=2.1
python-multipart
```

---

## 5. Changelog

| Version | Date    | Changes |
| :------ | :------ | :------ |
| **1.0** | 2026-03 | Major rewrite: LangGraph orchestration, SSDLC phase agents, phase-specific KB collections, SAST/DAST parsers, SSDLC stage skills. |
| **0.4** | 2026-03 | Added Graph RAG, Docling parser, MCP Server, singleton KB, async assessment. |
| **0.2** | 2025-03 | Updated tech stack versions and module layout. |
| **0.1** | Initial | Draft selection. |
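
---

## 6. Appendix: KB Chunking Sketch

The chunking defaults in Section 2.3 (1024 characters per chunk, 128 characters of overlap, one collection per SSDLC phase) can be illustrated with a short sketch. This is a deliberately simplified fixed-size sliding window, not the recursive paragraph-then-sentence splitter the design calls for, and it uses only the standard library. The names `Chunk`, `collection_for`, and `chunk_text` are hypothetical and are not taken from the DocSentinel codebase; only the default values and the `kb_<phase>` collection names come from this document.

```python
from dataclasses import dataclass

# Defaults from Section 2.3; all helper names below are hypothetical.
CHUNK_SIZE = 1024
CHUNK_OVERLAP = 128
PHASES = ("requirements", "design", "development",
          "testing", "deployment", "operations")


@dataclass(frozen=True)
class Chunk:
    text: str
    start: int       # character offset of the chunk in the source text
    collection: str  # target collection, e.g. "kb_design"


def collection_for(phase: str) -> str:
    """Map an SSDLC phase to its collection name (kb_<phase>)."""
    if phase not in PHASES:
        raise ValueError(f"unknown SSDLC phase: {phase!r}")
    return f"kb_{phase}"


def chunk_text(text: str, phase: str,
               size: int = CHUNK_SIZE,
               overlap: int = CHUNK_OVERLAP) -> list[Chunk]:
    """Split text into fixed-size windows that share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    collection = collection_for(phase)
    step = size - overlap  # each window starts this far after the previous one
    chunks: list[Chunk] = []
    for start in range(0, len(text), step):
        chunks.append(Chunk(text[start:start + size], start, collection))
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = "x" * 2000
    for chunk in chunk_text(sample, "design"):
        print(chunk.collection, chunk.start, len(chunk.text))
```

With the defaults, each window starts 896 characters after the previous one (1024 minus 128), so the last 128 characters of one chunk reappear at the start of the next. A recursive splitter would additionally prefer paragraph and sentence boundaries over hard character offsets, which this sketch does not attempt.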