DeepTeam.

The LLM Red Teaming Framework

Documentation | Vulnerabilities, Attacks, and Features | Getting Started | Confident AI

**DeepTeam** is a simple-to-use, open-source red teaming framework for LLM systems. Think of it as penetration testing, but for LLMs. DeepTeam simulates attacks — jailbreaking, prompt injection, multi-turn exploitation, and more — to uncover vulnerabilities like bias, PII leakage, and SQL injection in your AI agents, RAG pipelines, and chatbots. It also offers **guardrails** to prevent these issues in production. DeepTeam runs **locally on your machine** and is built on [DeepEval](https://github.com/confident-ai/deepeval), the open-source LLM evaluation framework. > [!IMPORTANT] > Need a place for your red teaming results to live? Sign up to the [Confident AI](https://app.confident-ai.com?utm_source=GitHub) platform to manage risk assessments, monitor vulnerabilities in production, and share reports with your team.

Confident AI + DeepTeam

> Want to talk LLM security, need help picking attacks, or just to say hi? [Come join our discord.](https://discord.com/invite/3SEyvpgu2f) # 🔥 Vulnerabilities, Attacks, and Features - 📐 50+ ready-to-use [vulnerabilities](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities) (all with explanations) powered by **ANY** LLM of your choice. Each vulnerability uses LLM-as-a-Judge metrics that run **locally on your machine** to produce binary pass/fail scores with reasoning: -

Data Privacy

- [PII Leakage](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-pii-leakage) — disclosure of sensitive personal information - [Prompt Leakage](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-prompt-leakage) — exposure of system prompt secrets and instructions

Responsible AI

- [Bias](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-bias) — stereotypes and unfair treatment across gender, race, religion, politics - [Toxicity](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-toxicity) — harmful, offensive, or demeaning content - [Child Protection](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-child-protection) — child-related privacy and safety risks - [Ethics](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-ethics) — violations of moral reasoning and organizational values - [Fairness](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-fairness) — discriminatory outcomes across groups and contexts

Security

- [BFLA](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-bfla) — broken function-level authorization - [BOLA](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-bola) — broken object-level authorization - [RBAC](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-rbac) — role-based access control bypass - [Debug Access](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-debug-access) — unauthorized access to debug modes and dev endpoints - [Shell Injection](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-shell-injection) — unauthorized system command execution - [SQL Injection](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-sql-injection) — database query manipulation - [SSRF](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-ssrf) — server-side request forgery to internal services - [Tool Metadata Poisoning](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-tool-metadata-poisoning) — corrupted tool schemas and descriptions - [Cross-Context Retrieval](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-cross-context-retrieval) — data access across isolation boundaries - [System Reconnaissance](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-system-reconnaissance) — probing internal architecture and configurations

Safety

- [Illegal Activity](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-illegal-activity) — facilitation of fraud, weapons, drugs, or other unlawful actions - [Graphic Content](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-graphic-content) — explicit, violent, or sexual material - [Personal Safety](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-personal-safety) — self-harm, harassment, or dangerous advice - [Unexpected Code Execution](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-unexpected-code-execution) — coerced execution of unauthorized code

Business

- [Misinformation](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-misinformation) — factual errors and unsupported claims - [Intellectual Property](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-intellectual-property) — copyright, trademark, and patent violations - [Competition](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-competition) — competitor endorsement and market manipulation

Agentic

- [Goal Theft](https://www.trydeepteam.com/docs/red-teaming-agentic-vulnerabilities-goal-theft) — extracting or redirecting an agent's objectives - [Recursive Hijacking](https://www.trydeepteam.com/docs/red-teaming-agentic-vulnerabilities-recursive-hijacking) — self-modifying goal chains that alter objectives - [Excessive Agency](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-excessive-agency) — agents acting beyond their authority - [Robustness](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-robustness) — input overreliance and prompt hijacking - [Indirect Instruction](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-indirect-instruction) — hidden instructions in retrieved content - [Tool Orchestration Abuse](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-tool-orchestration-abuse) — exploiting tool calling sequences - [Agent Identity & Trust Abuse](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-agent-identity-abuse) — impersonating agent identity - [Inter-Agent Communication Compromise](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-inter-agent-communication-compromise) — spoofing multi-agent message passing - [Autonomous Agent Drift](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-autonomous-agent-drift) — agents deviating from intended goals over time - [Exploit Tool Agent](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-exploit-tool-agent) — weaponizing tools for unintended actions - [External System Abuse](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-external-system-abuse) — using agents to attack external services

Custom

- [Custom Vulnerabilities](https://www.trydeepteam.com/docs/red-teaming-custom-vulnerability) — define and test your own criteria in a few lines of code

- 💥 20+ research-backed [adversarial attack](https://www.trydeepteam.com/docs/red-teaming-adversarial-attacks) methods for both single-turn and multi-turn (conversational) red teaming. Attacks enhance baseline vulnerability probes using SOTA techniques like jailbreaking, prompt injection, and encoding-based obfuscation: -

Single-Turn

- [Prompt Injection](https://www.trydeepteam.com/docs/red-teaming-adversarial-attacks-prompt-injection) — crafted injections that bypass LLM restrictions - [Roleplay](https://www.trydeepteam.com/docs/red-teaming-adversarial-attacks-roleplay) — persona-based scenarios exploiting collaborative training - [Leetspeak](https://www.trydeepteam.com/docs/red-teaming-adversarial-attacks-leetspeak) — symbolic character substitution to avoid keyword detection - [ROT13](https://www.trydeepteam.com/docs/red-teaming-adversarial-attacks-rot13-encoding) — alphabetic rotation to evade content filters - [Base64](https://www.trydeepteam.com/docs/red-teaming-adversarial-attacks-base64-encoding) — encoding attacks as random-looking data - [Gray Box](https://www.trydeepteam.com/docs/red-teaming-adversarial-attacks-gray-box-attack) — leveraging partial system knowledge for targeted attacks - [Math Problem](https://www.trydeepteam.com/docs/red-teaming-adversarial-attacks-math-problem) — disguising attacks within mathematical inputs - [Multilingual](https://www.trydeepteam.com/docs/red-teaming-adversarial-attacks-multilingual) — translating attacks to less-spoken languages - Prompt Probing — probing the LLM to extract system prompt details - [Adversarial Poetry](https://www.trydeepteam.com/docs/red-teaming-adversarial-attacks-adversarial-poetry) — transforming attacks into poetic verse with metaphor - [System Override](https://www.trydeepteam.com/docs/red-teaming-agentic-attacks-system-override) — disguising attacks as legitimate system commands - [Permission Escalation](https://www.trydeepteam.com/docs/red-teaming-agentic-attacks-permission-escalation) — shifting perceived identity to bypass role restrictions - [Goal Redirection](https://www.trydeepteam.com/docs/red-teaming-agentic-attacks-goal-redirection) — reframing agent objectives for unauthorized outcomes - [Linguistic Confusion](https://www.trydeepteam.com/docs/red-teaming-agentic-attacks-semantic-manipulation) — semantic ambiguity to confuse language understanding - [Input Bypass](https://www.trydeepteam.com/docs/red-teaming-agentic-attacks-input-bypass) — circumventing validation via exception handling claims - [Context Poisoning](https://www.trydeepteam.com/docs/red-teaming-agentic-attacks-context-poisoning) — injecting false background context to bias reasoning - [Character Stream](https://www.trydeepteam.com/docs/red-teaming-adversarial-attacks-character-stream) — character-by-character input to bypass filters - [Context Flooding](https://www.trydeepteam.com/docs/red-teaming-adversarial-attacks-context-flooding) — flooding input with benign text to hide malicious instructions - [Embedded Instruction JSON](https://www.trydeepteam.com/docs/red-teaming-adversarial-attacks-embedded-instruction-json) — hiding attacks inside realistic JSON structures - [Synthetic Context Injection](https://www.trydeepteam.com/docs/red-teaming-adversarial-attacks-synthetic-context-injection) — fabricating system context to exploit long-context handling - [Authority Escalation](https://www.trydeepteam.com/docs/red-teaming-adversarial-attacks-authority-escalation) — framing requests from positions of power - [Emotional Manipulation](https://www.trydeepteam.com/docs/red-teaming-adversarial-attacks-emotional-manipulation) — high-intensity emotional pressure for unsafe compliance

Multi-Turn

- [Linear Jailbreaking](https://www.trydeepteam.com/docs/red-teaming-adversarial-attacks-linear-jailbreaking) — iteratively refining attacks using target LLM responses - [Tree Jailbreaking](https://www.trydeepteam.com/docs/red-teaming-adversarial-attacks-tree-jailbreaking) — exploring parallel attack variations to find the best bypass - [Crescendo Jailbreaking](https://www.trydeepteam.com/docs/red-teaming-adversarial-attacks-crescendo-jailbreaking) — gradual escalation from benign to harmful prompts - [Sequential Jailbreak](https://www.trydeepteam.com/docs/red-teaming-adversarial-attacks-sequential-jailbreaking) — multi-turn conversational scaffolding toward restricted outputs - [Bad Likert Judge](https://www.trydeepteam.com/docs/red-teaming-adversarial-attacks-bad-likert-judge) — exploiting Likert scale evaluation roles to extract harmful content

- 🏛️ Red team against established [AI safety frameworks](https://www.trydeepteam.com/docs/guidelines-and-frameworks) out-of-the-box. Each framework automatically maps its categories to the right vulnerabilities and attacks: - OWASP Top 10 for LLMs 2025 - OWASP Top 10 for Agents 2026 - NIST AI RMF - MITRE ATLAS - BeaverTails - Aegis - 🛡️ 7 production-ready [guardrails](https://www.trydeepteam.com/docs/guardrails) for fast binary classification to guard LLM inputs and outputs in real time. - 🧩 Build your own **custom vulnerabilities** and attacks that integrate seamlessly with DeepTeam's ecosystem. - 🔗 Run red teaming from the **CLI** with YAML configs, or programmatically in Python. - 📊 Access risk assessments, display in dataframes, and save locally in JSON. # 🚀 QuickStart DeepTeam does not require you to define what LLM system you are red teaming — because neither will malicious users. All you need to do is install `deepteam`, define a `model_callback`, and you're good to go. ## Installation ``` pip install -U deepteam ``` ## Red Team Your First LLM ```python from deepteam import red_team from deepteam.vulnerabilities import Bias from deepteam.attacks.single_turn import PromptInjection async def model_callback(input: str) -> str: # Replace this with your LLM application return f"I'm sorry but I can't answer this: {input}" risk_assessment = red_team( model_callback=model_callback, vulnerabilities=[Bias(types=["race"])], attacks=[PromptInjection()] ) ``` Don't forget to set your `OPENAI_API_KEY` as an environment variable before running (you can also use [any custom model](https://deepeval.com/guides/guides-using-custom-llms) supported in DeepEval), and run the file: ```bash python red_team_llm.py ``` **That's it! Your first red team is complete.** Here's what happened: - `model_callback` wraps your LLM system and generates a `str` output for a given `input`. - At red teaming time, `deepteam` simulates a [`PromptInjection`](https://www.trydeepteam.com/docs/red-teaming-adversarial-attacks-prompt-injection) attack targeting [`Bias`](https://www.trydeepteam.com/docs/red-teaming-vulnerabilities-bias) vulnerabilities. - Your `model_callback`'s outputs are evaluated using the `BiasMetric`, producing a binary score of 0 or 1. - The final passing rate for `Bias` is determined by the proportion of scores that equal 1. Unlike traditional evaluation, red teaming does not require a prepared dataset — adversarial attacks are dynamically generated based on the vulnerabilities you want to test for. ## Red Team Against Safety Frameworks Use established AI safety standards like OWASP and NIST instead of manually picking vulnerabilities: ```python from deepteam import red_team from deepteam.frameworks import OWASPTop10 async def model_callback(input: str) -> str: # Replace this with your LLM application return f"I'm sorry but I can't answer this: {input}" risk_assessment = red_team( model_callback=model_callback, framework=OWASPTop10() ) ``` This automatically maps the framework's categories to the right vulnerabilities and attacks. Available frameworks include `OWASPTop10`, `OWASP_ASI_2026`, `NIST`, `MITRE`, `Aegis`, and `BeaverTails`. ## Guard Your LLM in Production Once you've found your vulnerabilities, use DeepTeam's guardrails to prevent them in production: ```python from deepteam import Guardrails from deepteam.guardrails import PromptInjectionGuard, ToxicityGuard, PrivacyGuard guardrails = Guardrails( input_guards=[PromptInjectionGuard(), PrivacyGuard()], output_guards=[ToxicityGuard()] ) # Guard inputs before they reach your LLM input_result = guardrails.guard_input("Tell me how to hack a database") print(input_result.breached) # True # Guard outputs before they reach your users output_result = guardrails.guard_output(input="Hi", output="Here is some toxic content...") print(output_result.breached) # True ``` 7 guards are available out-of-the-box: `ToxicityGuard`, `PromptInjectionGuard`, `PrivacyGuard`, `IllegalGuard`, `HallucinationGuard`, `TopicalGuard`, and `CybersecurityGuard`. [Read the full guardrails docs here.](https://www.trydeepteam.com/docs/guardrails) # DeepTeam with Confident AI [Confident AI](https://app.confident-ai.com?utm_source=GitHub) is the all-in-one platform that integrates natively with DeepTeam and [DeepEval](https://github.com/confident-ai/deepeval). - **Manage risk assessments** — view, compare, and track red teaming results across iterations - **Monitor in production** — detect and alert on vulnerabilities hitting your live LLM system - **Share reports** — generate and distribute security reports across your team - **Run from your IDE** — use Confident AI's MCP server to run red teams, pull results, and inspect vulnerabilities without leaving Cursor or Claude Code

Confident AI

# Contributing Please read [CONTRIBUTING.md](https://github.com/confident-ai/deepteam/blob/main/CONTRIBUTING.md) for details on our code of conduct, and the process for submitting pull requests to us. # Authors Built by the founders of Confident AI. Contact jeffreyip@confident-ai.com for all enquiries. # License DeepTeam is licensed under Apache 2.0 - see the [LICENSE.md](https://github.com/confident-ai/deepteam/blob/main/LICENSE.md) file for details.