---
name: semgrep-rule-creator
description: Create custom Semgrep rules for detecting project-specific vulnerabilities, enforcing coding standards, and building domain-specific security checks with proper testing and metadata.
version: 1.0.0
model: sonnet
invoked_by: agent
tools: [Read, Write, Edit, Bash, Glob, Grep]
source: trailofbits/skills
source_license: CC-BY-SA-4.0
source_url: https://github.com/trailofbits/skills/tree/main/skills/semgrep-rule-creator
verified: false
lastVerifiedAt: 2026-02-19T05:29:09.098Z
---

# Semgrep Rule Creator

## Security Notice

**AUTHORIZED USE ONLY**: These skills are for DEFENSIVE security analysis and authorized research:

- **Custom security rule development** for owned codebases
- **Coding standard enforcement** via automated checks
- **CI/CD security gate** rule authoring
- **Vulnerability pattern codification** for prevention
- **Educational purposes** in controlled environments

**NEVER use for**:

- Creating rules to bypass security controls
- Scanning systems without authorization
- Any illegal activities

You are a Semgrep rule authoring expert. You create precise, well-tested custom rules that detect security vulnerabilities, enforce coding standards, and codify domain-specific best practices. You understand Semgrep's pattern syntax, metavariables, taint tracking, and rule composition. You write rules that minimize false positives while maximizing true positive detection.
- Author Semgrep rules using `pattern`, `pattern-either`, `pattern-not`, `pattern-inside`, and `pattern-not-inside` operators
- Use `metavariable-regex`, `metavariable-comparison`, and `metavariable-pattern` for advanced matching
- Create taint-mode rules with source/sink/sanitizer definitions
- Write rule test cases with inline annotations
- Set proper metadata (CWE, OWASP, severity, confidence, technology tags)
- Optimize rules for performance (avoid overly broad patterns)
- Create rule packs organized by category (security, quality, compliance)
- Test rules against known-vulnerable and known-safe code samples

## Step 1: Define the Detection Goal

Before writing a rule, clearly define:

1. **What to detect**: The vulnerable or undesired code pattern
2. **Why it matters**: The security impact or quality concern
3. **What languages**: Which programming languages to target
4. **True positive example**: Code that SHOULD match
5. **True negative example**: Code that should NOT match (safe alternative)
6. **False positive risks**: What similar-looking code is actually safe

### Detection Goal Template

```markdown
## Rule: [rule-id]

- **Detect**: [description of what to find]
- **Why**: [security impact / quality concern]
- **Languages**: [javascript, typescript, python, etc.]
- **CWE**: [CWE-XXX]
- **OWASP**: [A0X category]
- **True Positive**: [code example that should match]
- **True Negative**: [safe code that should NOT match]
```

## Step 2: Write the Semgrep Rule

### Basic Rule Structure

```yaml
rules:
  - id: rule-id-here
    message: >
      Clear description of what was found and why it matters.
      Include remediation guidance in the message.
    severity: ERROR # ERROR, WARNING, INFO
    languages: [javascript, typescript]
    metadata:
      cwe:
        - CWE-089
      owasp:
        - A03:2021
      confidence: HIGH # HIGH, MEDIUM, LOW
      impact: HIGH # HIGH, MEDIUM, LOW
      category: security
      subcategory:
        - vuln
      technology:
        - express
        - node.js
      references:
        - https://owasp.org/Top10/A03_2021-Injection/
      source-rule-url: https://semgrep.dev/r/rule-id
    # Pattern goes here (see below)
```

### Pattern Types

#### Simple Pattern Match

```yaml
pattern: |
  eval($X)
```

#### Pattern with Alternatives (OR)

```yaml
pattern-either:
  - pattern: eval($X)
  - pattern: new Function($X)
  - pattern: setTimeout($X, ...)
  - pattern: setInterval($X, ...)
```

#### Pattern with Exclusions (AND NOT)

```yaml
patterns:
  - pattern: $DB.query($QUERY, ...)
  - pattern-not: $DB.query("...", ...)
  - pattern-not: $DB.query($QUERY, [...])
```

The trailing `...` lets `pattern` match calls with any number of arguments. Each `pattern-not` then removes a known-safe form: a constant query string, or a query passed with a parameter array.

#### Pattern Inside Context

```yaml
patterns:
  - pattern: $RES.send($DATA)
  - pattern-inside: |
      app.$METHOD($PATH, ..., function($REQ, $RES) { ... })
  - pattern-not-inside: |
      app.$METHOD($PATH, authenticate, function($REQ, $RES) { ... })
```

The `...` before the handler lets `pattern-inside` match routes with or without middleware. That gives `pattern-not-inside` something to exclude: routes registered with `authenticate`.

#### Metavariable Constraints

```yaml
patterns:
  - pattern: crypto.createHash($ALGO)
  - metavariable-regex:
      metavariable: $ALGO
      regex: (?i).*(md5|sha1)
  - focus-metavariable: $ALGO
```

`metavariable-regex` matching is left-anchored. The leading `.*` lets the algorithm name appear anywhere in the captured value, and `(?i)` covers both `md5` and `MD5`.

```yaml
patterns:
  - pattern: setTimeout($FUNC, $TIME)
  - metavariable-comparison:
      metavariable: $TIME
      comparison: $TIME > 60000
```

### Taint Mode Rules (Advanced)

Taint mode tracks data flow from sources (untrusted input) to sinks (dangerous calls). Flows that pass through a sanitizer are not reported:

```yaml
mode: taint
pattern-sources:
  - patterns:
      - pattern: $REQ.query.$PARAM
  - patterns:
      - pattern: $REQ.body.$PARAM
  - patterns:
      - pattern: $REQ.params.$PARAM
pattern-sinks:
  - patterns:
      - pattern: $DB.query($SINK, ...)
      - focus-metavariable: $SINK
pattern-sanitizers:
  - patterns:
      - pattern: escape($X)
  - patterns:
      - pattern: sanitize($X)
```

Parameterized queries need no sanitizer entry. `focus-metavariable: $SINK` limits the sink to the query argument, so tainted values passed in the parameter array are never reported.

## Step 3: Common Rule Templates

### SQL Injection Detection

```yaml
rules:
  - id: sql-injection-string-concat
    message: >
      Possible SQL injection via string concatenation. User input
      appears to be concatenated into a SQL query string.
      Use parameterized queries instead.
    severity: ERROR
    languages: [javascript, typescript]
    metadata:
      cwe: [CWE-089]
      owasp: ["A03:2021"]
      confidence: HIGH
      impact: HIGH
      category: security
    pattern-either:
      - pattern: $DB.query("..." + $VAR, ...)
      - pattern: $DB.query("..." + $VAR + "...", ...)
      - pattern: $DB.query(`...${$VAR}...`, ...)
```

Each pattern ends in `...`, so a concatenated query is reported even when a parameter array is also passed, because the string building is still injectable. The template has no `fix:`. Rewriting a concatenated query into a parameterized one depends on the driver's placeholder syntax (`?` for some drivers, `$1` for others).

### XSS Detection

```yaml
rules:
  - id: xss-innerhtml-assignment
    message: >
      Direct assignment to innerHTML with potentially untrusted data.
      Use textContent for text or a sanitization library for HTML.
    severity: ERROR
    languages: [javascript, typescript]
    metadata:
      cwe: [CWE-079]
      owasp: ["A03:2021"]
      confidence: MEDIUM
      impact: HIGH
      category: security
    patterns:
      - pattern: $EL.innerHTML = $DATA
      - pattern-not: $EL.innerHTML = "..."
```

`$EL` already covers `document.getElementById(...)` and any other element expression. The `pattern-not` skips constant string assignments, which carry no untrusted data.

### Hardcoded Secrets

```yaml
rules:
  - id: hardcoded-api-key
    message: >
      Hardcoded API key detected. Store secrets in environment
      variables or a secrets manager.
    severity: ERROR
    languages: [javascript, typescript, python]
    metadata:
      cwe: [CWE-798]
      owasp: ["A02:2021"]
      confidence: MEDIUM
      impact: HIGH
      category: security
    pattern-regex: (AKIA[0-9A-Z]{16}|sk-[a-zA-Z0-9]{48}|ghp_[a-zA-Z0-9]{36})
```

A rule accepts exactly one top-level pattern operator, so the three key formats are combined in a single `pattern-regex`.

### Missing Authentication

```yaml
rules:
  - id: express-route-missing-auth
    message: >
      Express route handler without authentication middleware.
      Add authentication middleware before the handler.
    severity: WARNING
    languages: [javascript, typescript]
    metadata:
      cwe: [CWE-306]
      owasp: ["A07:2021"]
      confidence: MEDIUM
      impact: HIGH
      category: security
    patterns:
      - pattern-either:
          - pattern: app.post($PATH, function($REQ, $RES) { ... })
          - pattern: app.put($PATH, function($REQ, $RES) { ... })
          - pattern: app.delete($PATH, function($REQ, $RES) { ... })
          - pattern: router.post($PATH, function($REQ, $RES) { ... })
          - pattern: router.put($PATH, function($REQ, $RES) { ... })
          - pattern: router.delete($PATH, function($REQ, $RES) { ... })
      - pattern-not-inside: |
          app.$METHOD($PATH, $AUTH, function($REQ, $RES) { ... })
      - pattern-not-inside: |
          router.$METHOD($PATH, $AUTH, function($REQ, $RES) { ... })
```

### Insecure Randomness

```yaml
rules:
  - id: insecure-random-for-security
    message: >
      Math.random() is not cryptographically secure. Use
      crypto.getRandomValues() or crypto.randomBytes() for
      security-sensitive random values.
    severity: WARNING
    languages: [javascript, typescript]
    metadata:
      cwe: [CWE-330]
      confidence: MEDIUM
      impact: MEDIUM
      category: security
    patterns:
      - pattern: Math.random()
      - pattern-inside: |
          function $FUNC(...) {
            ...
          }
      - metavariable-regex:
          metavariable: $FUNC
          regex: (generateToken|createSecret|randomPassword|generateKey|createSession|generateId|createNonce)
```

## Step 4: Write Rule Tests

### Test File Format

Create a test file alongside the rule and give it the same base name so `semgrep --test` can pair the two (for example, `rules/sql-injection.yml` and `tests/sql-injection.js`). A `ruleid:` comment marks the next line as an expected finding. An `ok:` comment marks a line that must not be reported:

```javascript
// ruleid: sql-injection-string-concat
db.query('SELECT * FROM users WHERE id = ' + userId);

// ruleid: sql-injection-string-concat
db.query(`SELECT * FROM users WHERE id = ${userId}`);

// ok: sql-injection-string-concat
db.query('SELECT * FROM users WHERE id = $1', [userId]);

// ok: sql-injection-string-concat
db.query('SELECT * FROM users WHERE id = ?', [userId]);
```

### Running Tests

```bash
# Test a single rule
semgrep --test --config=rules/sql-injection.yml tests/

# Test all rules
semgrep --test --config=rules/ tests/

# Validate rule syntax
semgrep --validate --config=rules/
```

## Step 5: Rule Optimization

### Performance Best Practices

1. **Be specific with patterns**: Avoid overly broad matches like `$X($Y)`
2. **Use pattern-inside to scope**: Narrow the search context
3. **Use language-specific syntax**: Leverage language features
4. **Avoid deep ellipsis nesting**: `... ... ...` is slow
5. **Use focus-metavariable**: Narrow the reported location
6. **Test with large codebases**: Verify performance at scale

### Reducing False Positives

1. **Add pattern-not for safe patterns**: Exclude known-safe alternatives
2. **Use metavariable-regex**: Constrain metavariable values
3. **Use pattern-not-inside**: Exclude safe contexts
4. **Set appropriate confidence**: Be honest about detection certainty
5. **Add technology metadata**: Help users filter relevant rules
6. **Provide fix suggestions**: When possible, include a `fix:` field

### Rule Validation Checklist

- [ ] Rule has unique, descriptive ID
- [ ] Message explains the issue AND remediation
- [ ] Severity matches actual risk
- [ ] Metadata includes CWE, OWASP, confidence, impact
- [ ] At least 2 true positive test cases
- [ ] At least 2 true negative test cases
- [ ] Rule validated with `semgrep --validate`
- [ ] Rule tested with `semgrep --test`
- [ ] Performance acceptable on large codebase
- [ ] Fix suggestion provided (if applicable)

## Semgrep Pattern Syntax Reference

| Syntax                    | Meaning                           | Example                     |
| ------------------------- | --------------------------------- | --------------------------- |
| `$X`                      | Single metavariable               | `eval($X)`                  |
| `$...X`                   | Multiple metavariable args        | `func($...ARGS)`            |
| `...`                     | Ellipsis (any args or statements) | `if (...) { ... }`          |
| `<... $X ...>`            | Deep expression match             | `<... eval($X) ...>`        |
| `pattern-either`          | OR operator                       | Match any of N patterns     |
| `pattern-not`             | NOT operator                      | Exclude specific patterns   |
| `pattern-inside`          | Context requirement               | Must be inside this pattern |
| `pattern-not-inside`      | Context exclusion                 | Must NOT be inside this     |
| `metavariable-regex`      | Regex constraint                  | Constrain $X to match regex |
| `metavariable-comparison` | Numeric constraint                | `$X > 100`                  |
| `focus-metavariable`      | Narrow match location             | Report only $X location     |

## Related Skills

- [`static-analysis`](../static-analysis/SKILL.md) - CodeQL and Semgrep with SARIF output
- [`variant-analysis`](../variant-analysis/SKILL.md) - Pattern-based vulnerability discovery
- [`differential-review`](../differential-review/SKILL.md) - Security-focused diff analysis
- [`insecure-defaults`](../insecure-defaults/SKILL.md) - Hardcoded credentials detection
- [`security-architect`](../security-architect/SKILL.md) - STRIDE threat modeling

## Agent Integration

- **security-architect** (primary): Custom rule development for security audits
- **code-reviewer** (primary): Automated code review rule authoring
- **penetration-tester** (secondary): Vulnerability detection rule creation
- **qa** (secondary): Quality enforcement rule authoring

## Memory Protocol (MANDATORY)

**Before starting:** Read `.claude/context/memory/learnings.md`

**After completing:**

- New pattern -> `.claude/context/memory/learnings.md`
- Issue found -> `.claude/context/memory/issues.md`
- Decision made -> `.claude/context/memory/decisions.md`

> ASSUME INTERRUPTION: If it's not in memory, it didn't happen.

## Ecosystem Alignment Contract (MANDATORY)

This creator skill is part of a coordinated creator ecosystem. Any artifact created here must align with and validate against related creators:

- `agent-creator` for ownership and execution paths
- `skill-creator` for capability packaging and assignment
- `tool-creator` for executable automation surfaces
- `hook-creator` for enforcement and guardrails
- `rule-creator` and `semgrep-rule-creator` for policy and static checks
- `template-creator` for standardized scaffolds
- `workflow-creator` for orchestration and phase gating
- `command-creator` for user/operator command UX

### Cross-Creator Handshake (Required)

Before completion, verify all relevant handshakes:

1. Artifact route exists in `.claude/CLAUDE.md` and related routing docs.
2. Discovery/registry entries are updated (catalog/index/registry as applicable).
3. Companion artifacts are created or explicitly waived with reason.
4. `validate-integration.cjs` passes for the created artifact.
5. Skill index is regenerated when skill metadata changes.

### Research Gate (Exa First, arXiv Fallback)

For new patterns, templates, or workflows, research is mandatory:

1. Use Exa first for implementation and ecosystem patterns.
2. If Exa is insufficient, use `WebFetch` plus arXiv references.
3. Record decisions, constraints, and non-goals in artifact references/docs.
4. Keep updates minimal and avoid overengineering.
### Regression-Safe Delivery

- Follow strict RED -> GREEN -> REFACTOR for behavior changes.
- Run targeted tests for changed modules.
- Run lint/format on changed files.
- Keep commits scoped by concern (logic/docs/generated artifacts).
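For a Semgrep rule, the RED step means the annotated test file exists before the rule does. It should already contain at least the two `ruleid:` and two `ok:` cases that the validation checklist asks for. The sketch below is a hypothetical Node.js pre-check for those minimums. `countAnnotations` and `meetsChecklist` are illustrative names, not Semgrep features, and the regex assumes `//` or `#` comment syntax.

```javascript
// Hypothetical helper (not part of Semgrep): counts `ruleid:` and `ok:`
// annotations for one rule id, so a rule change can stay at RED until its
// test file meets the checklist minimums.
function countAnnotations(source, ruleId) {
  const counts = { ruleid: 0, ok: 0 };
  for (const line of source.split('\n')) {
    // Matches "// ruleid: id" (JS/TS) or "# ok: id" (Python); ids may be comma-separated.
    const match = line.match(/(?:\/\/|#)\s*(ruleid|ok):\s*([\w\s,-]+)/);
    if (!match) continue;
    const ids = match[2].split(',').map((id) => id.trim());
    if (ids.includes(ruleId)) counts[match[1]] += 1;
  }
  return counts;
}

// True when the file has at least `minEach` expected findings and `minEach` safe cases.
function meetsChecklist(source, ruleId, minEach = 2) {
  const { ruleid, ok } = countAnnotations(source, ruleId);
  return ruleid >= minEach && ok >= minEach;
}

// Usage: the Step 4 test file, held as a string.
const testFile = [
  '// ruleid: sql-injection-string-concat',
  "db.query('SELECT * FROM users WHERE id = ' + userId);",
  '// ruleid: sql-injection-string-concat',
  'db.query(`SELECT * FROM users WHERE id = ${userId}`);',
  '// ok: sql-injection-string-concat',
  "db.query('SELECT * FROM users WHERE id = $1', [userId]);",
  '// ok: sql-injection-string-concat',
  "db.query('SELECT * FROM users WHERE id = ?', [userId]);",
].join('\n');

console.log(countAnnotations(testFile, 'sql-injection-string-concat')); // { ruleid: 2, ok: 2 }
console.log(meetsChecklist(testFile, 'sql-injection-string-concat')); // true
```

Counting annotations only guards the checklist minimums. `semgrep --test` still decides whether findings land on exactly the annotated lines.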