---
name: agent-safety
description: Ensure agent safety - guardrails, content filtering, monitoring, and compliance
sasmp_version: "1.3.0"
bonded_agent: 07-agent-safety
bond_type: PRIMARY_BOND
version: "2.0.0"
---

# Agent Safety

Implement safety systems for responsible AI agent deployment.

## When to Use This Skill

Invoke this skill when:
- Adding input/output guardrails
- Implementing content filtering
- Setting up rate limiting
- Ensuring compliance (GDPR, SOC 2)

## Parameter Schema

| Parameter | Type | Required | Description | Default |
|-----------|------|----------|-------------|---------|
| `task` | string | Yes | Safety goal | - |
| `risk_level` | enum | No | `strict`, `moderate`, `permissive` | `strict` |
| `filters` | list | No | Filter types to enable | `["injection", "pii", "toxicity"]` |

## Quick Start

```python
# Import paths and validator names vary across Guardrails releases;
# verify them against the installed version.
from guardrails import Guard
from guardrails.validators import ToxicLanguage, PIIFilter

guard = Guard.from_validators([
    ToxicLanguage(threshold=0.8, on_fail="exception"),
    PIIFilter(on_fail="fix")
])

# Validate output (llm_response is the text returned by your model)
validated = guard.validate(llm_response)
```

## Guardrail Types

### Input Guardrails

```python
# Prompt injection detection
INJECTION_PATTERNS = [
    r"ignore (previous|all) instructions",
    r"you are now",
    r"forget everything"
]
```

### Output Guardrails

```python
# Content filtering
# (illustrative filter names; substitute the validators available in your stack)
filters = [
    ToxicityFilter(),
    PIIRedactor(),
    HallucinationDetector()
]
```

## Rate Limiting

```python
class RateLimiter:
    def __init__(self, rpm=60, tpm=100000):
        self.rpm = rpm  # requests per minute
        self.tpm = tpm  # tokens per minute

    def check(self, user_id, tokens):
        # Token bucket algorithm (placeholder, not implemented here)
        pass
```

## Troubleshooting

| Issue | Solution |
|-------|----------|
| False positives | Tune thresholds |
| Injection bypass | Add LLM-based detection |
| PII leakage | Add secondary validation |
| Performance hit | Cache filter results |

## Best Practices

- Defense in depth (multiple layers)
- Fail-safe defaults (deny by default)
- Audit everything
- Regular red team testing

## Compliance Checklist

- [ ] Input validation active
- [ ] Output filtering enabled
- [ ] Audit logging configured
- [ ] Rate limits set
- [ ] PII handling compliant

## Related Skills

- `tool-calling` - Input validation
- `llm-integration` - API security
- `multi-agent` - Per-agent permissions

## References

- [OWASP LLM Top 10](https://owasp.org/www-project-top-10-for-large-language-model-applications/)
- [Guardrails AI](https://docs.guardrailsai.com/)
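
## Worked Example: Token Bucket Rate Limiter

The `RateLimiter` stub in the Rate Limiting section names the token bucket algorithm without showing it. The following is one possible sketch, not a definitive implementation. It assumes in-memory, per-user state, a refill rate of `rpm` requests and `tpm` tokens spread evenly over 60 seconds, and a single process. The class name `TokenBucketLimiter` and the `clock` parameter are hypothetical additions for illustration; `clock` exists only so tests can substitute a fake time source.

```python
import time
from collections import defaultdict


class TokenBucketLimiter:
    """Per-user limiter with two buckets: requests/minute and tokens/minute."""

    def __init__(self, rpm=60, tpm=100_000, clock=time.monotonic):
        self.rpm = rpm
        self.tpm = tpm
        self.clock = clock  # hypothetical hook so tests can fake time
        # user_id -> [request budget, token budget, time of last refill]
        self._state = defaultdict(lambda: [float(rpm), float(tpm), clock()])

    def check(self, user_id, tokens):
        """Return True and spend budget if the call is allowed, else False."""
        state = self._state[user_id]
        now = self.clock()
        elapsed = now - state[2]
        # Refill both buckets in proportion to elapsed time, capped at capacity.
        state[0] = min(self.rpm, state[0] + elapsed * self.rpm / 60.0)
        state[1] = min(self.tpm, state[1] + elapsed * self.tpm / 60.0)
        state[2] = now
        # Deny by default: spend nothing unless both budgets cover the call.
        if state[0] < 1 or state[1] < tokens:
            return False
        state[0] -= 1
        state[1] -= tokens
        return True
```

A caller would run `limiter.check(user_id, estimated_tokens)` before forwarding a request to the model and reject the request when it returns `False`. Because the state lives in a plain dictionary, this sketch does not cover multi-process or multi-host deployments, which would need a shared store and locking.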