# Security Policy | 安全策略

This document covers vulnerability disclosure and security-related practices for the **DocSentinel** project — an AI-powered SSDLC platform. It aligns with [**PRD §7.2 Security Requirements and Controls**](./SPEC.md).

本文档涵盖 **DocSentinel** 项目（AI 驱动的 SSDLC 平台）的漏洞披露与安全实践，遵循 [**PRD §7.2 安全需求与控制**](./SPEC.md)。

---

## Supported Versions | 支持版本

| Version   | Supported          |
| :-------- | :----------------- |
| **4.0.x** | :white_check_mark: |
| **3.1.x** | :white_check_mark: |
| **3.0.x** | :white_check_mark: |
| **2.0.x** | :warning: Limited  |
| < 2.0     | :x:                |

---

## Reporting a Vulnerability | 漏洞报告

If you discover a security vulnerability, please report it responsibly:

1.  **Do not** open a public GitHub issue for security-sensitive findings.
2.  **Email** the maintainers (e.g. the contact in the PRD: `u3638376@connect.hku.hk`) with:
    -   A description of the vulnerability and steps to reproduce.
    -   Impact and suggested fix if possible.
3.  We will acknowledge receipt and aim to respond within a reasonable timeframe. We may ask for more details and will keep you updated on remediation and disclosure.

如果您发现了安全漏洞，请负责任地进行报告：

1.  **请勿**针对敏感安全问题提交公开的 GitHub Issue。
2.  请**发送邮件**给维护者（联系方式见 PRD：`u3638376@connect.hku.hk`），包含：
    -   漏洞描述与复现步骤。
    -   影响范围与建议修复方案（如有）。
3.  我们将在合理时间内确认收到并回复。可能会向您询问更多细节，并同步后续的修复与披露进度。

---

## Security-Related Configuration | 安全相关配置

-   **Secrets**: Do not commit `.env` or any file containing `SECRET_KEY`, API keys, or passwords. Use `.env.example` as a template only.
-   **Input Validation**: File type and size limits are enforced (see `UPLOAD_MAX_FILE_SIZE_MB`, `UPLOAD_MAX_FILES`). Only allowed extensions are parsed (see `app/parser/service.py`).
-   **MCP Document Roots**: `assess_document.file_path` is confined to `MCP_DOCUMENT_ROOTS` before any file read. Configure this to the smallest approved document directory; never expose the MCP server with broad roots such as `/`, a user home directory, or a shared workspace containing secrets.
-   **KB Reindex Roots**: `/api/v1/kb/reindex` is confined to `KB_REINDEX_ROOTS`, including resolved symlink targets. Do not reuse a broad application working directory as the reindex root.
-   **Prompt Injection Guardrails**: Input sanitization via regex pattern detection and length limits is enforced before content reaches the LLM (see `app/core/guardrails.py`). Malicious inputs are rejected with HTTP 400.
-   **TLS**: In production, use HTTPS and TLS 1.2+ for all endpoints and external calls ([PRD §7.2 DATA-01](./SPEC.md)).
-   **Auth**: API currently does not enforce authentication in the MVP; add AAD/API Key as per [PRD §7.2 IAM](./SPEC.md) before exposing externally.
-   **LangGraph State**: Assessment state and checkpoints may contain sensitive document content. Ensure `LANGGRAPH_CHECKPOINT_DIR` is on encrypted storage in production.
-   **SAST/DAST Integration**: When ingesting scan results from external tools, validate report integrity and source authenticity.

-   **机密信息**：请勿提交 `.env` 或任何包含 `SECRET_KEY`、API Key、密码的文件。`.env.example` 仅作为模板使用。
-   **输入验证**：强制执行文件类型与大小限制（见 `UPLOAD_MAX_FILE_SIZE_MB`、`UPLOAD_MAX_FILES`）。仅解析允许的扩展名（见 `app/parser/service.py`）。
-   **MCP 文档根目录**：`assess_document.file_path` 必须在任何文件读取之前被限制在 `MCP_DOCUMENT_ROOTS` 内。请将该值配置为最小必要的批准文档目录；不要在对外暴露 MCP server 时使用 `/`、用户 home 目录或包含密钥的共享工作区等宽泛根目录。
-   **知识库重建根目录**：`/api/v1/kb/reindex` 必须限制在 `KB_REINDEX_ROOTS` 内，并检查解析后的 symlink 目标。不要直接把宽泛的应用工作目录作为重建根目录。
-   **提示注入防护**：通过正则模式检测和长度限制对输入进行清洗，在内容到达 LLM 之前执行（见 `app/core/guardrails.py`）。恶意输入将被 HTTP 400 拒绝。
-   **TLS**：生产环境中，所有端点与外部调用必须使用 HTTPS 和 TLS 1.2+（[PRD §7.2 DATA-01](./SPEC.md)）。
-   **认证**：MVP 阶段 API 暂未强制认证；在对外暴露前，请根据 [PRD §7.2 IAM](./SPEC.md) 添加 AAD/API Key 认证。
-   **LangGraph 状态**：评估状态和检查点可能包含敏感文档内容。生产环境中请确保 `LANGGRAPH_CHECKPOINT_DIR` 位于加密存储上。
-   **SAST/DAST 集成**：从外部工具接入扫描结果时，请验证报告完整性和来源真实性。

---

## Secure Development Guidelines | 安全开发准则

Use the following principles when adding new API, MCP, parser, KB, or agent features.

新增 API、MCP、Parser、KB 或 Agent 功能时，请遵循以下原则。

### Treat External Inputs as Authority Requests | 将外部输入视为权限请求

Any value controlled by an API client, MCP caller, LLM tool call, browser UI, uploaded file, or environment-adjacent integration is untrusted. Before using it to access local resources, ask:

-   **Who supplied this value?**
-   **Whose authority will execute the action?**
-   **What boundary proves this caller is allowed to do it?**

API client、MCP caller、LLM tool call、浏览器 UI、上传文件或外部集成提供的值都不可信。使用它访问本地资源前，应先问：

-   **这个值是谁提供的？**
-   **实际执行动作的是谁的权限？**
-   **有什么边界能证明调用者被允许这样做？**

### File and Path Handling | 文件与路径处理

File paths are not ordinary strings. A caller-controlled path asks the server process to use server-side filesystem permissions.

-   Resolve paths with `Path.resolve()` or equivalent realpath semantics before access.
-   Check that the resolved path is inside an explicit allow-root such as `MCP_DOCUMENT_ROOTS` or `KB_REINDEX_ROOTS`.
-   Validate symlink targets after resolution; symlinks must not escape the allow-root.
-   Reject directories, devices, sockets, and other non-regular files.
-   Validate file extension and size before reading content.
-   Never use extension allow-lists as a substitute for directory confinement.
-   Add tests for absolute paths, `..`, symlink escape, unsupported extensions, and missing files.

文件路径不是普通字符串。调用者可控路径意味着调用者请求 server 进程使用 server 端文件系统权限。

-   访问前使用 `Path.resolve()` 或等价 realpath 语义解析路径。
-   检查解析后的路径是否位于显式允许根目录内，例如 `MCP_DOCUMENT_ROOTS` 或 `KB_REINDEX_ROOTS`。
-   解析后检查 symlink 目标；symlink 不得逃逸允许根目录。
-   拒绝目录、设备文件、socket 和其他非普通文件。
-   在读取内容前验证扩展名和大小。
-   不要把扩展名白名单当作目录访问控制。
-   为绝对路径、`..`、symlink 逃逸、不支持扩展名和不存在文件添加测试。

### MCP and Agent Tools | MCP 与 Agent 工具

MCP tools and A2A messages are security boundaries because an agent may call them based on user input or prompt-injected instructions.

-   Keep tool scopes narrow and explicit.
-   Prefer IDs, handles, or uploaded document references over arbitrary local paths.
-   If a tool must touch local files, require an allow-root and document the expected configuration.
-   Keep tokenless remote protocols confined to loopback. Require `AGENT_GATEWAY_TOKEN`, TLS, and an upstream identity layer before network exposure.
-   Keep MCP DNS-rebinding protection enabled. Set `AGENT_GATEWAY_ALLOWED_HOSTS` and `AGENT_GATEWAY_ALLOWED_ORIGINS` to the smallest production allow-lists that work.
-   Do not let an agent disable collaborative review or approve its own assessment.
-   Treat MCP tools as bounded capabilities and A2A as task delegation; both must call the same application service and policy checks.
-   Return minimal error details; do not echo sensitive paths or content.
-   Assume tool output may be visible to the caller and may be copied into an LLM transcript.

MCP 工具和 A2A 消息都是安全边界，因为 agent 可能基于用户输入或 prompt injection 指令调用它们。

-   保持工具作用域小而明确。
-   优先使用 ID、handle 或上传文档引用，而不是任意本地路径。
-   如果工具必须访问本地文件，必须要求允许根目录并记录配置方式。
-   未配置 token 时，远程协议只能绑定本机回环地址。对网络开放前必须配置 `AGENT_GATEWAY_TOKEN`、TLS 和上游身份认证。
-   保持 MCP DNS rebinding 防护开启，并将 `AGENT_GATEWAY_ALLOWED_HOSTS`、`AGENT_GATEWAY_ALLOWED_ORIGINS` 收敛到最小可信范围。
-   不允许 agent 关闭协作评审，也不允许 agent 审批自己的评估结果。
-   MCP 用于受限工具能力，A2A 用于任务委派；两者必须复用相同的应用服务和策略检查。
-   返回最小必要错误信息；不要回显敏感路径或内容。
-   假设工具输出会被调用者看到，也可能进入 LLM transcript。

### LLM Data Flow | LLM 数据流

Anything sent to an LLM provider can leave the local process. Before passing data to the LLM:

-   Confirm the data was intentionally selected by an authorized workflow.
-   Avoid sending secrets, credentials, raw `.env` content, private keys, or unrelated local files.
-   Preserve citations and metadata so generated findings can be audited.
-   Make local-model and private-deployment modes clear for sensitive use cases.

任何发送给 LLM provider 的内容都可能离开本地进程。传给 LLM 前应确认：

-   数据是由授权工作流有意选择的。
-   避免发送密钥、凭据、原始 `.env` 内容、私钥或无关本地文件。
-   保留引用和元数据，方便审计生成结果。
-   对敏感场景清楚说明本地模型和私有部署模式。

### Required Review Checklist | 必要 Review 检查清单

Before merging security-relevant changes, reviewers should verify:

-   New external inputs have validation, authorization, and boundary checks.
-   File access is confined before `open()`, parser invocation, indexing, or LLM processing.
-   Tests cover the denied path as well as the successful path.
-   Documentation and `.env.example` describe any new security-sensitive setting.
-   The change has been checked with `ruff`, `pytest`, and relevant frontend build/tests.

合并安全相关改动前，reviewer 应确认：

-   新增外部输入具备验证、授权和边界检查。
-   文件访问在 `open()`、parser 调用、索引或 LLM 处理前已完成范围限制。
-   测试覆盖拒绝路径和成功路径。
-   文档和 `.env.example` 描述了新的安全敏感配置。
-   改动已通过 `ruff`、`pytest` 以及相关前端 build/tests。

---

## References | 参考

-   [**SPEC.md Section 7.2**](./SPEC.md) — Security Requirements and Controls (identity, data, application, operations, supply chain).
-   [**ARCHITECTURE.md**](./ARCHITECTURE.md) — System architecture with LangGraph design and security architecture section.
-   [**docs/05-deployment-runbook.md**](./docs/05-deployment-runbook.md) — Deployment, configuration, and network requirements.

-   [**SPEC.md 第 7.2 节**](./SPEC.md) — 安全需求与控制（身份、数据、应用、运维、供应链）。
-   [**ARCHITECTURE.md**](./ARCHITECTURE.md) — 系统架构，含 LangGraph 设计与安全架构章节。
-   [**docs/05-deployment-runbook.md**](./docs/05-deployment-runbook.md) — 部署、配置与网络需求。