giskardlogo

Evals, Red Teaming and Test Generation for Agentic Systems

Modular, Lightweight, Dynamic and Async-first

[![GitHub release](https://img.shields.io/github/v/release/Giskard-AI/giskard)](https://github.com/Giskard-AI/giskard/releases) [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://github.com/Giskard-AI/giskard/blob/main/LICENSE) [![Downloads](https://static.pepy.tech/badge/giskard/month)](https://pepy.tech/project/giskard) [![CI](https://github.com/Giskard-AI/giskard-oss/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/Giskard-AI/giskard-oss/actions/workflows/ci.yml/badge.svg?branch=main) [![Giskard on Discord](https://img.shields.io/discord/939190303397666868?label=Discord)](https://gisk.ar/discord)

Docs • Website • Community

> [!IMPORTANT] > **Giskard v3** is a fresh rewrite designed for dynamic, multi-turn testing of AI agents. This release drops heavy dependencies for better efficiency while introducing a more powerful AI vulnerability scanner and enhanced RAG evaluation capabilities. For now, the vulnerability scanner and RAG evaluation still rely on Giskard v2. > **Giskard v2 remains available but is no longer actively maintained.** > Follow progress → [Read the v3 Announcement](https://github.com/orgs/Giskard-AI/discussions/2250) · [Roadmap](https://github.com/Giskard-AI/giskard-oss/issues/2252) ## Install ```sh pip install giskard ``` Requires Python 3.12+. **Telemetry:** Libraries built on `giskard-core` (including `giskard-checks`) may send **optional, aggregated usage analytics** to help improve the product. No prompts, model outputs, or scenario text are included. See [what is collected and how to opt out](libs/giskard-core/README.md#telemetry). --- Giskard is an open-source Python library for **testing and evaluating agentic systems**. The v3 architecture is a modular set of focused packages — each carrying only the dependencies it needs — built from scratch to wrap anything: an LLM, a black-box agent, or a multi-step pipeline. | Status | Package | Description | | -------------- | ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | ✅ Beta | `giskard-checks` | Testing & evaluation — scenario API, built-in checks, LLM-as-judge | | 🚧 In progress | `giskard-scan` | Agent vulnerability scanner — red teaming, prompt injection, data leakage (successor of [v2 Scan](https://legacy-docs.giskard.ai/en/stable/open_source/scan/index.html)) | | 📋 Planned | `giskard-rag` | RAG evaluation & synthetic data generation (successor of [v2 RAGET](https://legacy-docs.giskard.ai/en/stable/open_source/testset_generation/index.html)) | ## Giskard Checks — create and apply evals for testing agents ```sh pip install giskard-checks ``` **[Giskard Checks](https://docs.giskard.ai/oss/checks)** is a lightweight library for creating evaluations (evals) that test LLM-based systems — from simple assertions to LLM-as-judge assessments. Unlike traditional unit tests, evals are designed for **non-deterministic outputs** where the same input can produce different valid responses. Use Giskard Checks to: - **Catch regressions** — verify your system still behaves correctly after changes - **Validate RAG quality** — check if answers are grounded in retrieved context - **Enforce safety rules** — ensure outputs conform to your content policies - **Evaluate multi-turn agents** — test full conversations, not just single exchanges Built-in evals include string matching, comparisons, regex, semantic similarity, and LLM-as-judge checks (`Groundedness`, `Conformity`, `LLMJudge`). ### Quickstart ```python from openai import OpenAI from giskard.checks import Scenario, Groundedness client = OpenAI() def get_answer(inputs: str) -> str: response = client.chat.completions.create( model="gpt-5-mini", messages=[{"role": "user", "content": inputs}], ) return response.choices[0].message.content scenario = ( Scenario("test_dynamic_output") .interact( inputs="What is the capital of France?", outputs=get_answer, ) .check( Groundedness( name="answer is grounded", context="France is a country in Western Europe. Its capital is Paris.", ) ) ) result = await scenario.run() result.print_report() ``` > The `run()` method is async. In a script, wrap it with `asyncio.run()`. See the [full docs](https://docs.giskard.ai/oss/checks) for `Suites`, `LLMJudge`, multi-turn scenarios, and more. --- ## Giskard Scan — vulnerability scanner for AI agents ```sh pip install giskard-scan ``` **Giskard Scan** is the red-teaming and vulnerability scanning layer for agentic systems. It generates adversarial test suites automatically from a plain-language description of your agent, covering prompt injection, harmful content, stereotypes, misinformation, and more. Use Giskard Scan to: - **Red-team your agent** — automatically generate adversarial inputs across OWASP LLM Top-10 threat categories - **Run prompt-injection probes** — built-in dataset of injection payloads ready to use - **Extend with custom generators** — pass your own `ScenarioGenerator` instances to `generate_suite`, or register them on `vulnerability_suite_generator_registry` ### Quickstart ```python import asyncio from giskard.scan import vulnerability_scan async def main(): await vulnerability_scan( target=my_agent, description="A customer support chatbot for an e-commerce platform.", languages=["en"], ) asyncio.run(main()) ``` ## Looking for Giskard v2? Giskard v2 included **Scan** (automatic vulnerability detection) and **RAGET** (RAG evaluation test set generation) for both ML models and LLM applications. These features are not available in v3. ```sh pip install "giskard[llm]>2,<3" ``` ### [Scan](https://legacy-docs.giskard.ai/en/stable/open_source/scan/index.html) — automatically detect performance, bias & security issues Wrap your model and run the scan: ```python import giskard import pandas as pd # Replace my_llm_chain with your actual LLM chain or model inference logic def model_predict(df: pd.DataFrame): """The function takes a DataFrame and must return a list of outputs (one per row).""" return [my_llm_chain.run({"query": question}) for question in df["question"]] giskard_model = giskard.Model( model=model_predict, model_type="text_generation", name="My LLM Application", description="A question answering assistant", feature_names=["question"], ) scan_results = giskard.scan(giskard_model) display(scan_results) ```

Scan Example

### [RAGET](https://legacy-docs.giskard.ai/en/stable/open_source/testset_generation/index.html) — generate evaluation datasets for RAG applications Automatically generate questions, reference answers, and context from your knowledge base: ```python import pandas as pd from giskard.rag import generate_testset, KnowledgeBase # Load your knowledge base documents df = pd.read_csv("path/to/your/knowledge_base.csv") knowledge_base = KnowledgeBase.from_pandas(df, columns=["column_1", "column_2"]) testset = generate_testset( knowledge_base, num_questions=60, language='en', agent_description="A customer support chatbot for company X", ) ```

RAGET Example

[Full v2 docs](https://legacy-docs.giskard.ai)

👋 Community

We welcome contributions from the AI community! Read this [guide](./CONTRIBUTING.md) to get started, and join our thriving community on [Discord](https://gisk.ar/discord). Follow the progress and share feedback: [v3 Announcement](https://github.com/orgs/Giskard-AI/discussions/2250) · [Roadmap](https://github.com/Giskard-AI/giskard-oss/issues/2252) 🌟 [Leave us a star](https://github.com/Giskard-AI/giskard), it helps the project to get discovered by others and keeps us motivated to build awesome open-source tools! 🌟 ❤️ If you find our work useful, please consider [sponsoring us](https://github.com/sponsors/Giskard-AI) on GitHub. With a monthly sponsoring, you can get a sponsor badge, display your company in this readme, and get your bug reports prioritized. We also offer one-time sponsoring if you want us to get involved in a consulting project, run a workshop, or give a talk at your company.