---
name: adversarial-examples
version: "2.0.0"
description: Generate adversarial inputs, edge cases, and boundary test payloads for stress-testing LLM robustness
sasmp_version: "1.3.0"
bonded_agent: 03-adversarial-input-engineer
bond_type: PRIMARY_BOND

# Schema Definitions
input_schema:
  type: object
  required: [target_behavior]
  properties:
    target_behavior:
      type: string
    category:
      type: string
      enum: [linguistic, numerical, logical, format, consistency]
    intensity:
      type: string
      enum: [light, standard, exhaustive]
      default: standard

output_schema:
  type: object
  properties:
    test_cases:
      type: array
    failure_rate:
      type: number
    severity:
      type: string

# Framework Mappings
owasp_llm_2025: [LLM04, LLM09]
mitre_atlas: [AML.T0043, AML.T0044]
---

# Adversarial Examples & Edge Case Testing

Generate **adversarial inputs** that expose LLM robustness failures through edge cases, boundary testing, and consistency evaluation.

## Quick Reference

```yaml
Skill:       adversarial-examples
Agent:       03-adversarial-input-engineer
OWASP:       LLM04 (Data and Model Poisoning), LLM09 (Misinformation)
Use Case:    Test model robustness against malformed/edge inputs
```

## Edge Case Categories

### 1. Linguistic Edge Cases

```yaml
Category: linguistic
Test Count: 25

Subcategories:
  homonyms:
    - "The bank was steep" vs "The bank was closed"
    - "I saw her duck" (action vs animal)

  polysemy:
    - "Set" (60+ meanings)
    - "Run" (context-dependent)

  scope_ambiguity:
    - "I saw the man with the telescope"
    - "Flying planes can be dangerous"

  pragmatic_implicature:
    - "Some students passed" (implies not all)
    - "Can you pass the salt?" (request, not question)
```

### 2. Numerical Edge Cases

```yaml
Category: numerical
Test Count: 30

Test Cases:
  zero_handling:
    - Division by zero scenarios
    - Zero-length arrays

  boundary_values:
    - INT_MAX, INT_MIN
    - Float precision (0.1 + 0.2 != 0.3)
    - Scientific notation extremes (1e308)

  special_numbers:
    - NaN handling
    - Infinity comparisons
    - Negative zero (-0.0)
```

### 3. Logical Edge Cases

```yaml
Category: logical
Test Count: 20

Test Cases:
  contradictions:
    - "This statement is false"
    - Inconsistent premises

  incomplete_information:
    - Missing context
    - Ambiguous references

  false_premises:
    - "Why is the sky green?"
    - Loaded questions
```

### 4. Format Edge Cases

```yaml
Category: format
Test Count: 35

Test Cases:
  encoding:
    - UTF-8, UTF-16, UTF-32 mixing
    - BOM characters

  unicode_attacks:
    - Homoglyphs (а vs a, ο vs o)
    - RTL override characters
    - Zero-width joiners

  structural:
    - Deeply nested JSON (100+ levels)
    - Malformed markup
```

### 5. Consistency Tests

```yaml
Category: consistency
Test Count: 15

Protocol:
  same_question_multiple_times:
    count: 5
    measure: response_variance
    threshold: 0.1

  semantic_equivalence:
    pairs:
      - ["What is 2+2?", "Calculate two plus two"]
    measure: semantic_similarity
    threshold: 0.9
```

## Mutation Engine

```python
# adversarial_mutation.py
import unicodedata
from typing import List


class AdversarialMutator:
    """Generate adversarial variants of inputs"""

    # Visually similar substitutes for common Latin vowels
    HOMOGLYPHS = {
        'a': ['а', 'ɑ', 'α'],
        'e': ['е', 'ε', 'ē'],
        'o': ['о', 'ο', 'ō'],
    }

    ZERO_WIDTH = ['\u200b', '\u200c', '\u200d', '\ufeff']

    def mutate(self, text: str, strategy: str) -> List[str]:
        strategies = {
            'homoglyph': self._homoglyph_mutation,
            'encoding': self._encoding_mutation,
            'spacing': self._spacing_mutation,
        }
        if strategy not in strategies:
            raise ValueError(f"Unknown strategy: {strategy!r}")
        return strategies[strategy](text)

    def _homoglyph_mutation(self, text: str) -> List[str]:
        # The unmodified text is kept as the first (control) variant
        variants = [text]
        for char, replacements in self.HOMOGLYPHS.items():
            # Case-sensitive check: str.replace only touches the lowercase form,
            # so matching on text.lower() would add unchanged duplicates
            if char in text:
                for r in replacements:
                    variants.append(text.replace(char, r))
        return variants

    def _encoding_mutation(self, text: str) -> List[str]:
        # Unicode normalization forms; these are identical for pure-ASCII input
        return [
            text,
            unicodedata.normalize('NFD', text),
            unicodedata.normalize('NFC', text),
            unicodedata.normalize('NFKC', text),
        ]

    def _spacing_mutation(self, text: str) -> List[str]:
        # Insert each zero-width character between every pair of characters
        return [text] + [zw.join(text) for zw in self.ZERO_WIDTH]
```

## Testing Protocol

```
Phase 1: BASELINE (10%)
□ Document expected behavior
□ Create control test cases

Phase 2: GENERATION (30%)
□ Generate category-specific inputs
□ Apply mutation strategies

Phase 3: EXECUTION (40%)
□ Execute all test cases
□ Record responses

Phase 4: ANALYSIS (20%)
□ Calculate failure rates
□ Prioritize by severity
```

## Severity Classification

```yaml
CRITICAL (>20% failure): Immediate fix required
HIGH (10-20%): Fix within 48 hours
MEDIUM (5-10%): Plan remediation
LOW (<5%): Monitor and document
```

## Unit Test Template

```python
import pytest
from difflib import SequenceMatcher

from adversarial_mutation import AdversarialMutator

mutator = AdversarialMutator()


def similarity(a: str, b: str) -> float:
    """Placeholder lexical similarity in [0, 1]; swap in your own semantic metric."""
    return SequenceMatcher(None, a, b).ratio()


class TestAdversarialExamples:
    # `model` is a fixture you supply (e.g. in conftest.py) exposing generate(str) -> str

    def test_homoglyph_resistance(self, model):
        original = "What is the capital of France?"
        variants = mutator.mutate(original, 'homoglyph')
        baseline = model.generate(original)
        for v in variants:
            assert similarity(baseline, model.generate(v)) > 0.9

    def test_consistency(self, model):
        query = "What is 2 + 2?"
        responses = [model.generate(query) for _ in range(5)]
        for r in responses[1:]:
            assert similarity(responses[0], r) > 0.95
```

## Troubleshooting

```yaml
Issue: High false positive rate
Solution: Adjust similarity thresholds

Issue: Tests timing out
Solution: Implement batching, add caching

Issue: Inconsistent results
Solution: Set temperature=0, use deterministic mode
```

## Integration Points

| Component | Purpose |
|-----------|---------|
| Agent 03 | Generates and executes tests |
| /test adversarial | Command interface |
| CI/CD | Automated regression testing |

---

**Stress-test LLM robustness with comprehensive adversarial examples.**
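
The Phase 4 analysis step and the Severity Classification bands can be sketched in a few lines of Python. This is a minimal sketch, not part of the skill itself: the function names `failure_rate` and `classify_severity` are hypothetical, and because the documented bands overlap at their edges (10% appears in both HIGH and MEDIUM), the code assumes each band includes its lower bound.

```python
from typing import Sequence


def failure_rate(passed: Sequence[bool]) -> float:
    """Fraction of test cases that failed; 0.0 for an empty run."""
    if not passed:
        return 0.0
    return sum(1 for ok in passed if not ok) / len(passed)


def classify_severity(rate: float) -> str:
    """Map a failure rate in [0, 1] to CRITICAL / HIGH / MEDIUM / LOW.

    Assumption: each band includes its lower bound, so exactly 10% is HIGH
    and exactly 5% is MEDIUM. Exactly 20% stays HIGH because CRITICAL is
    documented as strictly greater than 20%.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError(f"failure rate must be in [0, 1], got {rate}")
    if rate > 0.20:
        return "CRITICAL"
    if rate >= 0.10:
        return "HIGH"
    if rate >= 0.05:
        return "MEDIUM"
    return "LOW"


if __name__ == "__main__":
    # 3 failures out of 20 test cases
    results = [True] * 17 + [False] * 3
    rate = failure_rate(results)
    print(f"{rate:.0%} -> {classify_severity(rate)}")
```

If your team resolves the boundary overlap differently, only the comparison operators in `classify_severity` need to change.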