---
namespace: aiwg
name: confusable-unicode-audit
platforms: [all]
description: Detect bidi controls, zero-width characters, mixed-script identifiers, and homoglyph risks in source and release metadata
requires:
  - git: repository file enumeration
ensures:
  - report: suspicious Unicode occurrences with code point, context, and allowlist status
  - exit-code: non-zero when violations found and --fail-on-violation is set
errors:
  - allowlist-invalid: .aiwg/security/confusable-unicode-allowlist.yaml cannot be parsed
invariants:
  - exact code points are reported so reviewers do not need visual inspection
  - allowlisted non-ASCII is still included in the exceptions section
commandHint:
  argumentHint: "[--fail-on-violation] [--include-metadata] [--format text|json]"
  allowedTools: Read, Bash, Grep
  model: sonnet
  category: security
  orchestration: false
---

# Confusable Unicode Audit

Detect Trojan Source and homoglyph risks in source files, dependency names, and release metadata. This enforces `no-confusable-unicode` and maps curl Practice 8 into an AIWG control.

## Detection Targets

- Bidirectional controls: U+202A through U+202E, U+2066 through U+2069.
- Zero-width characters: U+200B through U+200F, U+FEFF.
- Non-ASCII identifiers in source code.
- Mixed-script identifiers, especially Latin plus Cyrillic or Greek.
- Package/dependency names containing non-ASCII or confusable characters.
- Optional metadata scan: commit subjects, PR titles, release notes.

## Allowlist

Legitimate non-ASCII is declared in `.aiwg/security/confusable-unicode-allowlist.yaml`:

```yaml
version: 1
allow:
  - path: "docs/i18n/**"
    reason: "localized documentation"
  - identifier: "naïve_bayes"
    codepoints: ["U+00EF"]
    reason: "historical exported API spelling"
```

## Output

Reports show file, line, column, Unicode code point, character name, and remediation. Bidi and zero-width controls are always HIGH severity.
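
As a rough illustration of the detection targets and report fields described above, the following Python sketch flags bidi and zero-width code points and applies a simple mixed-script heuristic. It is not the skill's actual implementation: the names `Finding`, `scan_text`, and `scripts_in` are hypothetical, the script check relies on Unicode character-name prefixes rather than the full TR39 procedure, and file enumeration, allowlist handling, and remediation text are omitted.

```python
import unicodedata
from typing import Iterator, NamedTuple

# Code point ranges copied from the Detection Targets list.
BIDI_CONTROLS = set(range(0x202A, 0x202F)) | set(range(0x2066, 0x206A))
ZERO_WIDTH = set(range(0x200B, 0x2010)) | {0xFEFF}


class Finding(NamedTuple):
    """Hypothetical report record; field names are assumptions."""
    path: str
    line: int        # 1-based line number
    column: int      # 1-based column, counted in code points
    codepoint: str   # exact code point, e.g. "U+202E"
    name: str        # Unicode character name
    category: str    # "bidi-control" or "zero-width"
    severity: str


def scan_text(path: str, text: str) -> Iterator[Finding]:
    """Yield one Finding per bidi or zero-width character in text."""
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            cp = ord(ch)
            if cp in BIDI_CONTROLS:
                category = "bidi-control"
            elif cp in ZERO_WIDTH:
                category = "zero-width"
            else:
                continue
            yield Finding(
                path, line_no, col, f"U+{cp:04X}",
                unicodedata.name(ch, "<unnamed>"),
                category,
                "HIGH",  # bidi and zero-width controls are always HIGH
            )


def scripts_in(identifier: str) -> set[str]:
    """Heuristic: collect LATIN/CYRILLIC/GREEK from character names."""
    found = set()
    for ch in identifier:
        first_word = unicodedata.name(ch, "").split(" ")[0]
        if first_word in {"LATIN", "CYRILLIC", "GREEK"}:
            found.add(first_word)
    return found


if __name__ == "__main__":
    sample = "access = 'user\u202e'\n"
    for f in scan_text("example.py", sample):
        print(f"{f.path}:{f.line}:{f.column} {f.codepoint} {f.name} [{f.severity}]")
    # "p\u0430ypal" contains CYRILLIC SMALL LETTER A in place of Latin "a".
    print(len(scripts_in("p\u0430ypal")) > 1)
```

Reporting the code point as a `U+XXXX` string, rather than echoing the character itself, keeps the output reviewable without visual inspection, in line with the invariant in the frontmatter.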

## References

- `agentic/code/frameworks/security-engineering/rules/no-confusable-unicode.md`
- Unicode TR39
- Trojan Source / CVE-2021-42574