GitHub Trending Today for python - Python Daily https://github.com/trending The most popular GitHub repositories today for python. Sat, 07 Feb 2026 00:07:04 GMT https://validator.w3.org/feed/docs/rss2.html GitHub Trending RSS Generator en All rights reserved 2026, GitHub <![CDATA[openai/skills]]> https://github.com/openai/skills https://github.com/openai/skills Sat, 07 Feb 2026 00:07:04 GMT openai/skills

Skills Catalog for Codex

Language: Python

Stars: 4,843

Forks: 277

Stars today: 583

README

# Agent Skills

Agent Skills are folders of instructions, scripts, and resources that AI agents can discover and use to perform specific tasks. Write once, use everywhere.

Codex uses skills to package capabilities that teams and individuals can apply to complete specific tasks in a repeatable way. This repository catalogs skills for use and distribution with Codex.
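A skill is just a folder. Here is a minimal sketch of one, assuming the layout described by the Agent Skills open standard (the file and folder names below are illustrative):

```
my-skill/
├── SKILL.md      # instructions for the agent, with name/description frontmatter
└── scripts/
    └── helper.py # optional supporting scripts and resources
```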

Learn more:
- [Using skills in Codex](https://developers.openai.com/codex/skills)
- [Create custom skills in Codex](https://developers.openai.com/codex/skills/create-skill)
- [Agent Skills open standard](https://agentskills.io)

## Installing a skill

Skills in [`.system`](skills/.system/) are automatically installed in the latest version of Codex.

To install [curated](skills/.curated/) or [experimental](skills/.experimental/) skills, you can use the `$skill-installer` inside Codex.

Curated skills can be installed by name (defaults to `skills/.curated`):

```
$skill-installer gh-address-comments
```

For experimental skills, specify the skill folder. For example:

```
$skill-installer install the create-plan skill from the .experimental folder
```

Or provide the GitHub directory URL:

```
$skill-installer install https://github.com/openai/skills/tree/main/skills/.experimental/create-plan
```

After installing a skill, restart Codex to pick it up.

## License

The license of an individual skill can be found in the `LICENSE.txt` file inside that skill's directory.
]]>
Python
<![CDATA[topoteretes/cognee]]> https://github.com/topoteretes/cognee https://github.com/topoteretes/cognee Sat, 07 Feb 2026 00:07:03 GMT topoteretes/cognee

Memory for AI Agents in 6 lines of code

Language: Python

Stars: 11,998

Forks: 1,172

Stars today: 257

README

<div align="center">
  <a href="https://github.com/topoteretes/cognee">
    <img src="https://raw.githubusercontent.com/topoteretes/cognee/refs/heads/dev/assets/cognee-logo-transparent.png" alt="Cognee Logo" height="60">
  </a>

  <br />

  Cognee - Accurate and Persistent AI Memory

  <p align="center">
  <a href="https://www.youtube.com/watch?v=1bezuvLwJmw&t=2s">Demo</a>
  .
  <a href="https://docs.cognee.ai/">Docs</a>
  .
  <a href="https://cognee.ai">Learn More</a>
  ·
  <a href="https://discord.gg/NQPKmU5CCg">Join Discord</a>
  ·
  <a href="https://www.reddit.com/r/AIMemory/">Join r/AIMemory</a>
  .
  <a href="https://github.com/topoteretes/cognee-community">Community Plugins & Add-ons</a>
  </p>


  [![GitHub forks](https://img.shields.io/github/forks/topoteretes/cognee.svg?style=social&label=Fork&maxAge=2592000)](https://GitHub.com/topoteretes/cognee/network/)
  [![GitHub stars](https://img.shields.io/github/stars/topoteretes/cognee.svg?style=social&label=Star&maxAge=2592000)](https://GitHub.com/topoteretes/cognee/stargazers/)
  [![GitHub commits](https://badgen.net/github/commits/topoteretes/cognee)](https://GitHub.com/topoteretes/cognee/commit/)
  [![GitHub tag](https://badgen.net/github/tag/topoteretes/cognee)](https://github.com/topoteretes/cognee/tags/)
  [![Downloads](https://static.pepy.tech/badge/cognee)](https://pepy.tech/project/cognee)
  [![License](https://img.shields.io/github/license/topoteretes/cognee?colorA=00C586&colorB=000000)](https://github.com/topoteretes/cognee/blob/main/LICENSE)
  [![Contributors](https://img.shields.io/github/contributors/topoteretes/cognee?colorA=00C586&colorB=000000)](https://github.com/topoteretes/cognee/graphs/contributors)
  <a href="https://github.com/sponsors/topoteretes"><img src="https://img.shields.io/badge/Sponsor-❤️-ff69b4.svg" alt="Sponsor"></a>

<p>
  <a href="https://www.producthunt.com/posts/cognee?embed=true&utm_source=badge-top-post-badge&utm_medium=badge&utm_souce=badge-cognee" target="_blank" style="display:inline-block; margin-right:10px;">
    <img src="https://api.producthunt.com/widgets/embed-image/v1/top-post-badge.svg?post_id=946346&theme=light&period=daily&t=1744472480704" alt="cognee - Memory&#0032;for&#0032;AI&#0032;Agents&#0032;&#0032;in&#0032;5&#0032;lines&#0032;of&#0032;code | Product Hunt" width="250" height="54" />
  </a>

  <a href="https://trendshift.io/repositories/13955" target="_blank" style="display:inline-block;">
    <img src="https://trendshift.io/api/badge/repositories/13955" alt="topoteretes%2Fcognee | Trendshift" width="250" height="55" />
  </a>
</p>

Use your data to build personalized and dynamic memory for AI Agents. Cognee lets you replace RAG with scalable and modular ECL (Extract, Cognify, Load) pipelines.

  <p align="center">
  🌐 Available Languages
  :
  <!-- Keep these links. Translations will automatically update with the README. -->
  <a href="https://www.readme-i18n.com/topoteretes/cognee?lang=de">Deutsch</a> |
  <a href="https://www.readme-i18n.com/topoteretes/cognee?lang=es">Español</a> |
  <a href="https://www.readme-i18n.com/topoteretes/cognee?lang=fr">Français</a> |
  <a href="https://www.readme-i18n.com/topoteretes/cognee?lang=ja">日本語</a> |
  <a href="README_ko.md">한국어</a> |
  <a href="https://www.readme-i18n.com/topoteretes/cognee?lang=pt">Português</a> |
  <a href="https://www.readme-i18n.com/topoteretes/cognee?lang=ru">Русский</a> |
  <a href="https://www.readme-i18n.com/topoteretes/cognee?lang=zh">中文</a>
  </p>


<div style="text-align: center">
  <img src="https://raw.githubusercontent.com/topoteretes/cognee/refs/heads/main/assets/cognee_benefits.png" alt="Why cognee?" width="50%" />
</div>
</div>




## About Cognee

Cognee is an open-source tool and platform that transforms your raw data into persistent, dynamic AI memory for agents. It combines vector search with graph databases to make your documents both searchable by meaning and connected by relationships.
Cognee offers default memory creation and search, which we describe below, but you can also build your own!


:star: _Help us reach more developers and grow the cognee community. Star this repo!_


### Cognee Open Source:

- Interconnects any type of data — including past conversations, files, images, and audio transcriptions
- Replaces traditional RAG systems with a unified memory layer built on graphs and vectors
- Reduces developer effort and infrastructure cost while improving quality and precision
- Provides Pythonic data pipelines for ingestion from 30+ data sources
- Offers high customizability through user-defined tasks, modular pipelines, and built-in search endpoints


## Basic Usage & Feature Guide

To learn more, [check out this short, end-to-end Colab walkthrough](https://colab.research.google.com/drive/12Vi9zID-M3fpKpKiaqDBvkk98ElkRPWy?usp=sharing) of Cognee's core features.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/12Vi9zID-M3fpKpKiaqDBvkk98ElkRPWy?usp=sharing)

## Quickstart

Let’s try Cognee in just a few lines of code. For detailed setup and configuration, see the [Cognee Docs](https://docs.cognee.ai/getting-started/installation#environment-configuration).

### Prerequisites

- Python 3.10 to 3.13

### Step 1: Install Cognee

You can install Cognee with **pip**, **poetry**, **uv**, or your preferred Python package manager.

```bash
uv pip install cognee
```
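With plain pip, the equivalent is:

```bash
pip install cognee
```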

### Step 2: Configure the LLM
```python
import os
os.environ["LLM_API_KEY"] = "YOUR_OPENAI_API_KEY"
```
Alternatively, create a `.env` file using our [template](https://github.com/topoteretes/cognee/blob/main/.env.template).

To integrate other LLM providers, see our [LLM Provider Documentation](https://docs.cognee.ai/setup-configuration/llm-providers).
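For reference, a minimal `.env` sketch (this quickstart only uses `LLM_API_KEY`; provider-specific variables are listed in the template and docs linked above):

```bash
LLM_API_KEY="your-openai-api-key"
```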

### Step 3: Run the Pipeline

Cognee will take your documents, generate a knowledge graph from them, and then answer queries over the combined relationships in that graph.

Now, run a minimal pipeline:

```python
import cognee
import asyncio
from pprint import pprint


async def main():
    # Add text to cognee
    await cognee.add("Cognee turns documents into AI memory.")

    # Generate the knowledge graph
    await cognee.cognify()

    # Add memory algorithms to the graph
    await cognee.memify()

    # Query the knowledge graph
    results = await cognee.search("What does Cognee do?")

    # Display the results
    for result in results:
        pprint(result)


if __name__ == '__main__':
    asyncio.run(main())

```

As you can see, the output is generated from the document we previously stored in Cognee:

```bash
  Cognee turns documents into AI memory.
```

### Use the Cognee CLI

As an alternative, you can get started with these essential commands:

```bash
cognee-cli add "Cognee turns documents into AI memory."

cognee-cli cognify

cognee-cli search "What does Cognee do?"
cognee-cli delete --all

```

To open the local UI, run:
```bash
cognee-cli -ui
```

## Demos & Examples

See Cognee in action:

### Persistent Agent Memory

[Cognee Memory for LangGraph Agents](https://github.com/user-attachments/assets/e113b628-7212-4a2b-b288-0be39a93a1c3)

### Simple GraphRAG

[Watch Demo](https://github.com/user-attachments/assets/f2186b2e-305a-42b0-9c2d-9f4473f15df8)

### Cognee with Ollama

[Watch Demo](https://github.com/user-attachments/assets/39672858-f774-4136-b957-1e2de67b8981)


## Community & Support

### Contributing
We welcome contributions from the community! Your input helps make Cognee better for everyone. See [`CONTRIBUTING.md`](CONTRIBUTING.md) to get started.

### Code of Conduct

We're committed to fostering an inclusive and respectful community. Read our [Code of Conduct](https://github.com/topoteretes/cognee/blob/main/CODE_OF_CONDUCT.md) for guidelines.

## Research & Citation

We recently published a research paper on optimizing knowledge graphs for LLM reasoning:

```bibtex
@misc{markovic2025optimizinginterfaceknowledgegraphs,
      title={Optimizing the Interface Between Knowledge Graphs and LLMs for Complex Reasoning},
      author={Vasilije Markovic and Lazar Obradovic and Laszlo Hajdu and Jovan Pavlovic},
      year={2025},
      eprint={2505.24478},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2505.24478},
}
```
]]>
Python
<![CDATA[OpenBMB/ChatDev]]> https://github.com/OpenBMB/ChatDev https://github.com/OpenBMB/ChatDev Sat, 07 Feb 2026 00:07:02 GMT OpenBMB/ChatDev

ChatDev 2.0: Dev All through LLM-powered Multi-Agent Collaboration

Language: Python

Stars: 30,429

Forks: 3,757

Stars today: 187

README

# ChatDev 2.0 - DevAll

<p align="center">
  <img src="frontend/public/media/logo.png" alt="DevAll Logo" width="500"/>
</p>


<p align="center">
  <strong>A Zero-Code Multi-Agent Platform for Developing Everything</strong>
</p>

<p align="center">
  【<a href="./README.md">English</a> | <a href="./README-zh.md">简体中文</a>】
</p>
<p align="center">
    【📚 <a href="#developers">Developers</a> | 👥 <a href="#primary-contributors">Contributors</a> | ⭐️ <a href="https://github.com/OpenBMB/ChatDev/tree/chatdev1.0">ChatDev 1.0 (Legacy)</a>】
</p>

## 📖 Overview
ChatDev has evolved from a specialized software development multi-agent system into a comprehensive multi-agent orchestration platform.

- <a href="https://github.com/OpenBMB/ChatDev/tree/main">**ChatDev 2.0 (DevAll)**</a> is a **Zero-Code Multi-Agent Platform** for "Developing Everything". It empowers users to rapidly build and execute customized multi-agent systems through simple configuration. No coding is required—users can define agents, workflows, and tasks to orchestrate complex scenarios such as data visualization, 3D generation, and deep research.
- <a href="https://github.com/OpenBMB/ChatDev/tree/chatdev1.0">**ChatDev 1.0 (Legacy)**</a> operates as a **Virtual Software Company**. It utilizes various intelligent agents (e.g., CEO, CTO, Programmer) participating in specialized functional seminars to automate the entire software development life cycle—including designing, coding, testing, and documenting. It serves as the foundational paradigm for communicative agent collaboration.

## 🎉 News
• **Jan 07, 2026: 🚀 We are excited to announce the official release of ChatDev 2.0 (DevAll)!** This version introduces a zero-code multi-agent orchestration platform. The classic ChatDev (v1.x) has been moved to the [`chatdev1.0`](https://github.com/OpenBMB/ChatDev/tree/chatdev1.0) branch for maintenance. More details about ChatDev 2.0 can be found on [our official post](https://x.com/OpenBMB/status/2008916790399701335).

<details>
<summary>Old News</summary>

• Sep 24, 2025: 🎉 Our paper [Multi-Agent Collaboration via Evolving Orchestration](https://arxiv.org/abs/2505.19591) has been accepted to NeurIPS 2025. The implementation is available in the `puppeteer` branch of this repository.

• May 26, 2025: 🎉 We propose a novel puppeteer-style paradigm for multi-agent collaboration among large-language-model-based agents. By leveraging a learnable central orchestrator optimized with reinforcement learning, our method dynamically activates and sequences agents to construct efficient, context-aware reasoning paths. This approach not only improves reasoning quality but also reduces computational costs, enabling scalable and adaptable multi-agent cooperation in complex tasks.
See our paper: [Multi-Agent Collaboration via Evolving Orchestration](https://arxiv.org/abs/2505.19591).
  <p align="center">
  <img src='./assets/puppeteer.png' width=800>
  </p>

• June 25, 2024: 🎉 To foster development in LLM-powered multi-agent collaboration 🤖🤖 and related fields, the ChatDev team has curated a collection of seminal papers 📄 presented in an [open-source](https://github.com/OpenBMB/ChatDev/tree/main/MultiAgentEbook) interactive e-book 📚 format. Now you can explore the latest advancements on the [Ebook Website](https://thinkwee.top/multiagent_ebook) and download the [paper list](https://github.com/OpenBMB/ChatDev/blob/main/MultiAgentEbook/papers.csv).
  <p align="center">
  <img src='./assets/ebook.png' width=800>
  </p>
  
• June 12, 2024: We introduced Multi-Agent Collaboration Networks (MacNet) 🎉, which utilize directed acyclic graphs to facilitate effective task-oriented collaboration among agents through linguistic interactions 🤖🤖. MacNet supports cooperation across various topologies and among more than a thousand agents without exceeding context limits. More versatile and scalable, MacNet can be considered a more advanced version of ChatDev's chain-shaped topology. Our preprint paper is available at [https://arxiv.org/abs/2406.07155](https://arxiv.org/abs/2406.07155). This technique has been incorporated into the [macnet](https://github.com/OpenBMB/ChatDev/tree/macnet) branch, enhancing support for diverse organizational structures and offering richer solutions beyond software development (e.g., logical reasoning, data analysis, story generation, and more).
  <p align="center">
  <img src='./assets/macnet.png' width=500>
  </p>

• May 07, 2024: We introduced "Iterative Experience Refinement" (IER), a novel method where instructor and assistant agents enhance shortcut-oriented experiences to efficiently adapt to new tasks. This approach encompasses experience acquisition, utilization, propagation, and elimination across a series of tasks, making the process shorter and more efficient. Our preprint paper is available at https://arxiv.org/abs/2405.04219, and this technique will soon be incorporated into ChatDev.
  <p align="center">
  <img src='./assets/ier.png' width=220>
  </p>

• January 25, 2024: We have integrated Experiential Co-Learning Module into ChatDev. Please see the [Experiential Co-Learning Guide](wiki.md#co-tracking).

• December 28, 2023: We present Experiential Co-Learning, an innovative approach where instructor and assistant agents accumulate shortcut-oriented experiences to effectively solve new tasks, reducing repetitive errors and enhancing efficiency. Check out our preprint paper at https://arxiv.org/abs/2312.17025; this technique will soon be integrated into ChatDev.
  <p align="center">
  <img src='./assets/ecl.png' width=860>
  </p>
• November 15, 2023: We launched ChatDev as a SaaS platform that enables software developers and innovative entrepreneurs to build software efficiently at very low cost, removing the barrier to entry. Try it out at https://chatdev.modelbest.cn/.
  <p align="center">
  <img src='./assets/saas.png' width=560>
  </p>

• November 2, 2023: ChatDev is now supported with a new feature: incremental development, which allows agents to develop upon existing codes. Try ```--config "incremental" --path "[source_code_directory_path]"``` to start it.
  <p align="center">
  <img src='./assets/increment.png' width=700>
  </p>

• October 26, 2023: ChatDev is now supported with Docker for safe execution (thanks to contribution from [ManindraDeMel](https://github.com/ManindraDeMel)). Please see [Docker Start Guide](wiki.md#docker-start).
  <p align="center">
  <img src='./assets/docker.png' width=400>
  </p>
  
• September 25, 2023: The **Git** mode is now available, enabling the programmer <img src='visualizer/static/figures/programmer.png' height=20> to utilize Git for version control. To enable this feature, simply set ``"git_management"`` to ``"True"`` in ``ChatChainConfig.json``. See [guide](wiki.md#git-mode).
  <p align="center">
  <img src='./assets/github.png' width=600>
  </p>

• September 20, 2023: The **Human-Agent-Interaction** mode is now available! You can get involved with the ChatDev team by playing the role of reviewer <img src='visualizer/static/figures/reviewer.png' height=20> and making suggestions to the programmer <img src='visualizer/static/figures/programmer.png' height=20>;
  try ``python3 run.py --task [description_of_your_idea] --config "Human"``. See [guide](wiki.md#human-agent-interaction) and [example](WareHouse/Gomoku_HumanAgentInteraction_20230920135038).
  <p align="center">
  <img src='./assets/Human_intro.png' width=600>
  </p>

• September 1, 2023: The **Art** mode is available now! You can activate the designer agent <img src='visualizer/static/figures/designer.png' height=20> to generate images used in the software;
  try ``python3 run.py --task [description_of_your_idea] --config "Art"``. See [guide](wiki.md#art) and [example](WareHouse/gomokugameArtExample_THUNLP_20230831122822).
  
• August 28, 2023: The system is publicly available.

• August 17, 2023: The v1.0.0 version was ready for release.

• July 30, 2023: Users can customize ChatChain, Phase, and Role settings. Additionally, both online Log mode and replay mode are now supported.

• July 16, 2023: The [preprint paper](https://arxiv.org/abs/2307.07924) associated with this project was published.

• June 30, 2023: The initial version of the ChatDev repository was released.
</details>


## 🚀 Quick Start

### 📋 Prerequisites

*   **OS**: macOS / Linux / WSL / Windows
*   **Python**: 3.12+
*   **Node.js**: 18+
*   **Package Manager**: [uv](https://docs.astral.sh/uv/)

### 📦 Installation

1.  **Backend Dependencies** (Python managed by `uv`):
    ```bash
    uv sync
    ```

2.  **Frontend Dependencies** (Vite + Vue 3):
    ```bash
    cd frontend && npm install
    ```

### ⚡️ Run the Application

1.  **Start Backend** :
    ```bash
    # Run from the project root
    uv run python server_main.py --port 6400 --reload
    ```
    > Remove `--reload` if output files (e.g., from GameDev) trigger restarts, which interrupt tasks and lose progress.

2.  **Start Frontend**:
    ```bash
    cd frontend
    VITE_API_BASE_URL=http://localhost:6400 npm run dev
    ```
    > Then access the Web Console at **[http://localhost:5173](http://localhost:5173)**. 
    
    
    > **💡 Tip**: If the frontend fails to connect to the backend, the default port `6400` may already be occupied.
    > Please switch both services to an available port, for example:
    >
    > * **Backend**: start with `--port 6401`
    > * **Frontend**: set `VITE_API_BASE_URL=http://localhost:6401`


### 🔑 Configuration

*   **Environment Variables**: Create a `.env` file in the project root.
*   **Model Keys**: Set `API_KEY` and `BASE_URL` in `.env` for your LLM provider.
*   **YAML placeholders**: Use `${VAR}` (e.g., `${API_KEY}`) in configuration files to reference these variables, as sketched below.
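As a sketch, suppose `.env` contains `API_KEY=sk-xxxx` and `BASE_URL=https://api.example.com/v1`; a workflow config could then reference them like this (the field names here are illustrative, the real schemas live in `yaml_instance/`):

```yaml
model:
  api_key: ${API_KEY}    # resolved from .env at runtime
  base_url: ${BASE_URL}
```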

---

## 💡 How to Use

### 🖥️ Web Console

The DevAll interface provides a seamless experience for both construction and execution.

*   **Tutorial**: Comprehensive step-by-step guides and documentation integrated directly into the platform to help you get started quickly.
<img src="assets/tutorial-en.png"/> 

*   **Workflow**: A visual canvas to design your multi-agent systems. Configure node parameters, define context flows, and orchestrate complex agent interactions with drag-and-drop ease.
<img src="assets/workflow.gif"/>

*   **Launch**: Initiate workflows, monitor real-time logs, inspect intermediate artifacts, and provide human-in-the-loop feedback.
<img src="assets/launch.gif"/>

### 🧰 Python SDK
For automation and batch processing, use our lightweight Python SDK to execute workflows programmatically and retrieve results directly.

```python
from runtime.sdk import run_workflow

# Execute a workflow and get the final node message
result = run_workflow(
    yaml_file="yaml_instance/demo.yaml",
    task_prompt="Summarize the attached document in one sentence.",
    attachments=["/path/to/document.pdf"],
    variables={"API_KEY": "sk-xxxx"} # Override .env variables if needed
)

if result.final_message:
    print(f"Output: {result.final_message.text_content()}")
```

---

<a id="developers"></a>
## ⚙️ For Developers

**For secondary development and extensions, start with this section.**

Extend DevAll with new nodes, providers, and tools.
The project is organized into a modular structure:
*   **Core Systems**: `server/` hosts the FastAPI backend, while `runtime/` manages agent abstraction and tool execution.
*   **Orchestration**: `workflow/` handles the multi-agent logic, driven by configurations in `entity/`.
*   **Frontend**: `frontend/` contains the Vue 3 Web Console.
*   **Extensibility**: `functions/` is the place for custom Python tools.

Relevant reference documentation:
*   **Getting Started**: [Start Guide](./docs/user_guide/en/index.md)
*   **Core Modules**: [Workflow Authoring](./docs/user_guide/en/workflow_authoring.md), [Memory](./docs/user_guide/en/modules/memory.md), and [Tooling](./docs/user_guide/en/modules/tooling/index.md)

---

## 🌟 Featured Workflows
We provide robust, out-of-the-box templates for common scenarios. All runnable workflow configs are located in `yaml_instance/`.
*   **Demos**: Files named `demo_*.yaml` showcase specific features or modules.
*   **Implementations**: Files named directly (e.g., `ChatDev_v1.yaml`) are full in-house or recreated workflows, listed below:

### 📋 Workflow Collection

| Category | Workflow                                                                                                    | Case | 
| :--- |:------------------------------------------------------------------------------------------------------------| :--- | 
| **📈 Data Visualization** | `data_visualization_basic.yaml`<br>`data_visualization_enhanced.yaml`                                       | <img src="assets/cases/data_analysis/data_analysis.gif" width="100%"><br>Prompt: *"Create 4–6 high-quality PNG charts for my large real-estate transactions dataset."* |
| **🛠️ 3D Generation**<br>*(Requires [Blender](https://www.blender.org/) & [blender-mcp](https://github.com/ahujasid/blender-mcp))* | `blender_3d_builder_simple.yaml`<br>`blender_3d_builder_hub.yaml`<br>`blender_scientific_illustration.yaml` | <img src="assets/cases/3d_generation/3d.gif" width="100%"><br>Prompt: *"Please build a Christmas tree."* |
| **🎮 Game Dev** | `GameDev_v1.yaml`<br>`ChatDev_v1.yaml`                                                                      | <img src="assets/cases/game_development/game.gif" width="100%"><br>Prompt: *"Please help me design and develop a Tank Battle game."* |
| **📚 Deep Research** | `deep_research_v1.yaml`                                                                                     | <img src="assets/cases/deep_research/deep_research.gif" width="85%"><br>Prompt: *"Research about recent advances in the field of LLM-based agent RL"* |
| **🎓 Teach Video** | `teach_video.yaml` (Please run command `uv add manim` before running this workflow)                         | <img src="assets/cases/video_generation/video.gif" width="140%"><br>Prompt: *"讲一下什么是凸优化"* |

---

### 💡 Usage Guide
For these implementations, you can use the **Launch** tab to execute them, or run them from the Python SDK as sketched after these steps.
1.  **Select**: Choose a workflow in the **Launch** tab.
2.  **Upload**: Upload necessary files (e.g., `.csv` for data analysis) if required.
3.  **Prompt**: Enter your request (e.g., *"Visualize the sales trends"* or *"Design a snake game"*).
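These workflows can also be run headlessly with the SDK shown earlier. A minimal sketch reusing `run_workflow` from above (the prompt is illustrative):

```python
from runtime.sdk import run_workflow

# Run a featured workflow config from yaml_instance/ without the Web Console
result = run_workflow(
    yaml_file="yaml_instance/deep_research_v1.yaml",
    task_prompt="Research recent advances in LLM-based agent RL",
)

if result.final_message:
    print(result.final_message.text_content())
```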

---

## 🤝 Contributing

We welcome contributions from the community! Whether you're fixing bugs, adding new workflow templates, or sharing high-quality cases/artifacts produced by DevAll, your help is much appreciated. Feel free to contribute by submitting **Issues** or **Pull Requests**.

By contributing to DevAll, you'll be recognized in our **Contributors** list below. Check out our [Developer Guide](#developers) to get started!

### 👥 Contributors

#### Primary Contributors

<table>
  <tr>
    <td align="center"><a href="https://github.com/NA-Wen"><img src="https://github.com/NA-Wen.png?size=100" width="64px;" alt=""/><br /><sub><b>NA-Wen</b></sub></a></td>
    <td align="center"><a href="https://github.com/zxrys"><img src="https://github.com/zxrys.png?size=100" width="64px;" alt=""/><br /><sub><b>zxrys</b></sub></a></td>
    <td align="center"><a href="https://github.com/swugi"><img src="https://github.com/swugi.png?size=100" width="64px;" alt=""/><br /><sub><b>swugi</b></sub></a></td>
    <td align="center"><a href="https://github.com/huatl98"><img src="https://github.com/huatl98.png?size=100" width="64px;" alt=""/><br /><sub><b>huatl98</b></sub></a></td>
  </tr>
</table>

#### Contributors
<table>
  <tr>
    <td align="center"><a href="https://github.com/shiowen"><img src="https://github.com/shiowen.png?size=100" width="64px;" alt=""/><br /><sub><b>shiowen</b></sub></a></td>
    <td align="center"><a href="https://github.com/kilo2127"><img src="https://github.com/kilo2127.png?size=100" width="64px;" alt=""/><br /><sub><b>kilo2127</b></sub></a></td>
    <td align="center"><a href="https://github.com/AckerlyLau"><img src="https://github.com/AckerlyLau.png?size=100" width="64px;" alt=""/><br /><sub><b>AckerlyLau</b></sub></a></td>
  </tr>
</table>

## 🤝 Acknowledgments

<a href="http://nlp.csai.tsinghua.edu.cn/"><img src="assets/thunlp.png" height=50pt></a>&nbsp;&nbsp;
<a href="https://modelbest.cn/"><img src="assets/modelbest.png" height=50pt></a>&nbsp;&nbsp;
<a href="https://github.com/OpenBMB/AgentVerse/"><img src="assets/agentverse.png" height=50pt></a>&nbsp;&nbsp;
<a href="https://github.com/OpenBMB/RepoAgent"><img src="assets/repoagent.png"  height=50pt></a>
<a href="https://app.commanddash.io/agent?github=https://github.com/OpenBMB/ChatDev"><img src="assets/CommandDash.png" height=50pt></a>
<a href="www.teachmaster.cn"><img src="assets/teachmaster.png" height=50pt></a>
<a href="https://github.com/OpenBMB/AppCopilot"><img src="assets/appcopilot.png" height=50pt></a>

## 🔎 Citation

```
@article{chatdev,
    title = {ChatDev: Communicative Agents for Software Development},
    author = {Chen Qian and Wei Liu and Hongzhang Liu and Nuo Chen and Yufan Dang and Jiahao Li and Cheng Yang and Weize Chen and Yusheng Su and Xin Cong and Juyuan Xu and Dahai Li and Zhiyuan Liu and Maosong Sun},
    journal = {arXiv preprint arXiv:2307.07924},
    url = {https://arxiv.org/abs/2307.07924},
    year = {2023}
}

@article{colearning,
    title = {Experiential Co-Learning of Software-Developing Agents},
    author = {Chen Qian and Yufan Dang and Jiahao Li and Wei Liu and Zihao Xie and Yifei Wang and Weize Chen and Cheng Yang and Xin Cong and Xiaoyin Che and Zhiyuan Liu and Maosong Sun},
    journal = {arXiv preprint arXiv:2312.17025},
    url = {https://arxiv.org/abs/2312.17025},
    year = {2023}
}

@article{macnet,
    title={Scaling Large-Language-Model-based Multi-Agent Collaboration},
    author={Chen Qian and Zihao Xie and Yifei Wang and Wei Liu and Yufan Dang and Zhuoyun Du and Weize Chen and Cheng Yang and Zhiyuan Liu and Maosong Sun},
    journal={arXiv preprint arXiv:2406.07155},
    url = {https://arxiv.org/abs/2406.07155},
    year={2024}
}

@article{iagents,
    title={Autonomous Agents for Collaborative Task under Information Asymmetry},
    author={Wei Liu and Chenxi Wang and Yifei Wang and Zihao Xie and Rennai Qiu and Yufan Dang and Zhuoyun Du and Weize Chen and Cheng Yang and Chen Qian},
    journal={arXiv preprint arXiv:2406.14928},
    url = {https://arxiv.org/abs/2406.14928},
    year={2024}
}

@article{puppeteer,
      title={Multi-Agent Collaboration via Evolving Orchestration}, 
      author={Yufan Dang and Chen Qian and Xueheng Luo and Jingru Fan and Zihao Xie and Ruijie Shi and Weize Chen and Cheng Yang and Xiaoyin Che and Ye Tian and Xuantang Xiong and Lei Han and Zhiyuan Liu and Maosong Sun},
      journal={arXiv preprint arXiv:2505.19591},
      url={https://arxiv.org/abs/2505.19591},
      year={2025}
}
```

## 📬 Contact

If you have any questions, feedback, or would like to get in touch, please feel free to reach out to us via email at [qianc62@gmail.com](mailto:qianc62@gmail.com).
]]>
Python
<![CDATA[ComposioHQ/awesome-claude-skills]]> https://github.com/ComposioHQ/awesome-claude-skills https://github.com/ComposioHQ/awesome-claude-skills Sat, 07 Feb 2026 00:07:01 GMT ComposioHQ/awesome-claude-skills

A curated list of awesome Claude Skills, resources, and tools for customizing Claude AI workflows

Language: Python

Stars: 31,322

Forks: 3,005

Stars today: 594

README

<h1 align="center">Awesome Claude Skills</h1>

<p align="center">
<a href="https://platform.composio.dev/?utm_source=Github&utm_medium=Youtube&utm_campaign=2025-11&utm_content=AwesomeSkills">
  <img width="1280" height="640" alt="Composio banner" src="https://github.com/user-attachments/assets/e91255af-e4ba-4d71-b1a8-bd081e8a234a">
</a>


</p>

<p align="center">
  <a href="https://awesome.re">
    <img src="https://awesome.re/badge.svg" alt="Awesome" />
  </a>
  <a href="https://makeapullrequest.com">
    <img src="https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square" alt="PRs Welcome" />
  </a>
  <a href="https://www.apache.org/licenses/LICENSE-2.0">
    <img src="https://img.shields.io/badge/License-Apache_2.0-blue.svg?style=flat-square" alt="License: Apache-2.0" />
  </a>
</p>
<div>
<p align="center">
  <a href="https://twitter.com/composio">
    <img src="https://img.shields.io/badge/Follow on X-000000?style=for-the-badge&logo=x&logoColor=white" alt="Follow on X" />
  </a>
  <a href="https://www.linkedin.com/company/composiohq/">
    <img src="https://img.shields.io/badge/Follow on LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white" alt="Follow on LinkedIn" />
  </a>
  <a href="https://discord.com/invite/composio">
    <img src="https://img.shields.io/badge/Join our Discord-5865F2?style=for-the-badge&logo=discord&logoColor=white" alt="Join our Discord" />
  </a>
  </p>
</div>

A curated list of practical Claude Skills for enhancing productivity across Claude.ai, Claude Code, and the Claude API.


> **Want skills that do more than generate text?** Claude can send emails, create issues, post to Slack, and take actions across 1000+ apps. [See how →](./connect/)

---

## Quickstart: Connect Claude to 500+ Apps

The **connect-apps** plugin lets Claude perform real actions - send emails, create issues, post to Slack. It handles auth and connects to 500+ apps using Composio under the hood.

### 1. Install the Plugin

```bash
claude --plugin-dir ./connect-apps-plugin
```

### 2. Run Setup

```
/connect-apps:setup
```

Paste your API key when asked. (Get a free key at [platform.composio.dev](https://platform.composio.dev/?utm_source=Github&utm_content=AwesomeSkills))

### 3. Restart & Try It

```bash
exit
claude
```

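Then ask Claude to take a real action, for example (an illustrative prompt): "Send me a test email saying hello."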

If you receive the email, Claude is now connected to 500+ apps.

**[See all supported apps →](https://composio.dev/toolkits)**

---

## Contents

- [What Are Claude Skills?](#what-are-claude-skills)
- [Skills](#skills)
  - [Document Processing](#document-processing)
  - [Development & Code Tools](#development--code-tools)
  - [Data & Analysis](#data--analysis)
  - [Business & Marketing](#business--marketing)
  - [Communication & Writing](#communication--writing)
  - [Creative & Media](#creative--media)
  - [Productivity & Organization](#productivity--organization)
  - [Collaboration & Project Management](#collaboration--project-management)
  - [Security & Systems](#security--systems)
  - [App Automation via Composio](#app-automation-via-composio)
- [Getting Started](#getting-started)
- [Creating Skills](#creating-skills)
- [Contributing](#contributing)
- [Resources](#resources)
- [License](#license)

## What Are Claude Skills?

Claude Skills are customizable workflows that teach Claude how to perform specific tasks according to your unique requirements. Skills enable Claude to execute tasks in a repeatable, standardized manner across all Claude platforms.
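Under the hood, a skill is a folder containing a `SKILL.md` file whose frontmatter tells Claude when the skill applies. A minimal sketch (the skill name and steps here are illustrative):

```markdown
---
name: changelog-generator
description: Turns git commit history into a user-facing changelog.
---

# Changelog Generator

1. Read recent commits with `git log --oneline`.
2. Group changes into Features, Fixes, and Docs.
3. Rewrite each entry in plain, customer-friendly language.
```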

## Skills

### Document Processing

- [docx](https://github.com/anthropics/skills/tree/main/skills/docx) - Create, edit, analyze Word docs with tracked changes, comments, formatting.
- [pdf](https://github.com/anthropics/skills/tree/main/skills/pdf) - Extract text, tables, metadata, merge & annotate PDFs.
- [pptx](https://github.com/anthropics/skills/tree/main/skills/pptx) - Read, generate, and adjust slides, layouts, templates.
- [xlsx](https://github.com/anthropics/skills/tree/main/skills/xlsx) - Spreadsheet manipulation: formulas, charts, data transformations.
- [Markdown to EPUB Converter](https://github.com/smerchek/claude-epub-skill) - Converts markdown documents and chat summaries into professional EPUB ebook files. *By [@smerchek](https://github.com/smerchek)*

### Development & Code Tools

- [artifacts-builder](https://github.com/anthropics/skills/tree/main/skills/web-artifacts-builder) - Suite of tools for creating elaborate, multi-component claude.ai HTML artifacts using modern frontend web technologies (React, Tailwind CSS, shadcn/ui).
- [aws-skills](https://github.com/zxkane/aws-skills) - AWS development with CDK best practices, cost optimization MCP servers, and serverless/event-driven architecture patterns.
- [Changelog Generator](./changelog-generator/) - Automatically creates user-facing changelogs from git commits by analyzing history and transforming technical commits into customer-friendly release notes.
- [Claude Code Terminal Title](https://github.com/bluzername/claude-code-terminal-title) - Gives each Claude Code terminal window a dynamic title that describes the work being done, so you don't lose track of which window is doing what.
- [D3.js Visualization](https://github.com/chrisvoncsefalvay/claude-d3js-skill) - Teaches Claude to produce D3 charts and interactive data visualizations. *By [@chrisvoncsefalvay](https://github.com/chrisvoncsefalvay)*
- [FFUF Web Fuzzing](https://github.com/jthack/ffuf_claude_skill) - Integrates the ffuf web fuzzer so Claude can run fuzzing tasks and analyze results for vulnerabilities. *By [@jthack](https://github.com/jthack)*
- [finishing-a-development-branch](https://github.com/obra/superpowers/tree/main/skills/finishing-a-development-branch) - Guides completion of development work by presenting clear options and handling chosen workflow.
- [iOS Simulator](https://github.com/conorluddy/ios-simulator-skill) - Enables Claude to interact with iOS Simulator for testing and debugging iOS applications. *By [@conorluddy](https://github.com/conorluddy)*
- [jules](https://github.com/sanjay3290/ai-skills/tree/main/skills/jules) - Delegate coding tasks to Google Jules AI agent for async bug fixes, documentation, tests, and feature implementation on GitHub repos. *By [@sanjay3290](https://github.com/sanjay3290)*
- [LangSmith Fetch](./langsmith-fetch/) - Debug LangChain and LangGraph agents by automatically fetching and analyzing execution traces from LangSmith Studio. First AI observability skill for Claude Code. *By [@OthmanAdi](https://github.com/OthmanAdi)*
- [MCP Builder](./mcp-builder/) - Guides creation of high-quality MCP (Model Context Protocol) servers for integrating external APIs and services with LLMs using Python or TypeScript.
- [move-code-quality-skill](https://github.com/1NickPappas/move-code-quality-skill) - Analyzes Move language packages against the official Move Book Code Quality Checklist for Move 2024 Edition compliance and best practices.
- [Playwright Browser Automation](https://github.com/lackeyjb/playwright-skill) - Model-invoked Playwright automation for testing and validating web applications. *By [@lackeyjb](https://github.com/lackeyjb)*
- [prompt-engineering](https://github.com/NeoLabHQ/context-engineering-kit/tree/master/plugins/customaize-agent/skills/prompt-engineering) - Teaches well-known prompt engineering techniques and patterns, including Anthropic best practices and agent persuasion principles.
- [pypict-claude-skill](https://github.com/omkamal/pypict-claude-skill) - Design comprehensive test cases using PICT (Pairwise Independent Combinatorial Testing) for requirements or code, generating optimized test suites with pairwise coverage.
- [reddit-fetch](https://github.com/ykdojo/claude-code-tips/tree/main/skills/reddit-fetch) - Fetches Reddit content via Gemini CLI when WebFetch is blocked or returns 403 errors.
- [Skill Creator](./skill-creator/) - Provides guidance for creating effective Claude Skills that extend capabilities with specialized knowledge, workflows, and tool integrations.
- [Skill Seekers](https://github.com/yusufkaraaslan/Skill_Seekers) - Automatically converts any documentation website into a Claude AI skill in minutes. *By [@yusufkaraaslan](https://github.com/yusufkaraaslan)*
- [software-architecture](https://github.com/NeoLabHQ/context-engineering-kit/tree/master/plugins/ddd/skills/software-architecture) - Implements design patterns including Clean Architecture, SOLID principles, and comprehensive software design best practices.
- [subagent-driven-development](https://github.com/NeoLabHQ/context-engineering-kit/tree/master/plugins/sadd/skills/subagent-driven-development) - Dispatches independent subagents for individual tasks with code review checkpoints between iterations for rapid, controlled development.
- [test-driven-development](https://github.com/obra/superpowers/tree/main/skills/test-driven-development) - Use when implementing any feature or bugfix, before writing implementation code.
- [using-git-worktrees](https://github.com/obra/superpowers/blob/main/skills/using-git-worktrees/) - Creates isolated git worktrees with smart directory selection and safety verification.
- [Connect](./connect/) - Connect Claude to any app. Send emails, create issues, post messages, update databases - take real actions across Gmail, Slack, GitHub, Notion, and 1000+ services.
- [Webapp Testing](./webapp-testing/) - Tests local web applications using Playwright for verifying frontend functionality, debugging UI behavior, and capturing screenshots.

### Data & Analysis

- [CSV Data Summarizer](https://github.com/coffeefuelbump/csv-data-summarizer-claude-skill) - Automatically analyzes CSV files and generates comprehensive insights with visualizations without requiring user prompts. *By [@coffeefuelbump](https://github.com/coffeefuelbump)*
- [deep-research](https://github.com/sanjay3290/ai-skills/tree/main/skills/deep-research) - Execute autonomous multi-step research using Gemini Deep Research Agent for market analysis, competitive landscaping, and literature reviews. *By [@sanjay3290](https://github.com/sanjay3290)*
- [postgres](https://github.com/sanjay3290/ai-skills/tree/main/skills/postgres) - Execute safe read-only SQL queries against PostgreSQL databases with multi-connection support and defense-in-depth security. *By [@sanjay3290](https://github.com/sanjay3290)*
- [root-cause-tracing](https://github.com/obra/superpowers/tree/main/skills/root-cause-tracing) - Use when errors occur deep in execution and you need to trace back to find the original trigger.

### Business & Marketing

- [Brand Guidelines](./brand-guidelines/) - Applies Anthropic's official brand colors and typography to artifacts for consistent visual identity and professional design standards.
- [Competitive Ads Extractor](./competitive-ads-extractor/) - Extracts and analyzes competitors' ads from ad libraries to understand messaging and creative approaches that resonate.
- [Domain Name Brainstormer](./domain-name-brainstormer/) - Generates creative domain name ideas and checks availability across multiple TLDs including .com, .io, .dev, and .ai extensions.
- [Internal Comms](./internal-comms/) - Helps write internal communications including 3P updates, company newsletters, FAQs, status reports, and project updates using company-specific formats.
- [Lead Research Assistant](./lead-research-assistant/) - Identifies and qualifies high-quality leads by analyzing your product, searching for target companies, and providing actionable outreach strategies.

### Communication & Writing

- [article-extractor](https://github.com/michalparkola/tapestry-skills-for-claude-code/tree/main/article-extractor) - Extract full article text and metadata from web pages.
- [brainstorming](https://github.com/obra/superpowers/tree/main/skills/brainstorming) - Transform rough ideas into fully-formed designs through structured questioning and alternative exploration.
- [Content Research Writer](./content-research-writer/) - Assists in writing high-quality content by conducting research, adding citations, improving hooks, and providing section-by-section feedback.
- [family-history-research](https://github.com/emaynard/claude-family-history-research-skill) - Provides assistance with planning family history and genealogy research projects.
- [Meeting Insights Analyzer](./meeting-insights-analyzer/) - Analyzes meeting transcripts to uncover behavioral patterns including conflict avoidance, speaking ratios, filler words, and leadership style.
- [NotebookLM Integration](https://github.com/PleasePrompto/notebooklm-skill) - Lets Claude Code chat directly with NotebookLM for source-grounded answers based exclusively on uploaded documents. *By [@PleasePrompto](https://github.com/PleasePrompto)*
- [Twitter Algorithm Optimizer](./twitter-algorithm-optimizer/) - Analyze and optimize tweets for maximum reach using Twitter's open-source algorithm insights. Rewrite and edit tweets to improve engagement and visibility.

### Creative & Media

- [Canvas Design](./canvas-design/) - Creates beautiful visual art in PNG and PDF documents using design philosophy and aesthetic principles for posters, designs, and static pieces.
- [imagen](https://github.com/sanjay3290/ai-skills/tree/main/skills/imagen) - Generate images using Google Gemini's image generation API for UI mockups, icons, illustrations, and visual assets. *By [@sanjay3290](https://github.com/sanjay3290)*
- [Image Enhancer](./image-enhancer/) - Improves image and screenshot quality by enhancing resolution, sharpness, and clarity for professional presentations and documentation.
- [Slack GIF Creator](./slack-gif-creator/) - Creates animated GIFs optimized for Slack with validators for size constraints and composable animation primitives.
- [Theme Factory](./theme-factory/) - Applies professional font and color themes to artifacts including slides, docs, reports, and HTML landing pages with 10 pre-set themes.
- [Video Downloader](./video-downloader/) - Downloads videos from YouTube and other platforms for offline viewing, editing, or archival with support for various formats and quality options.
- [youtube-transcript](https://github.com/michalparkola/tapestry-skills-for-claude-code/tree/main/youtube-transcript) - Fetch transcripts from YouTube videos and prepare summaries.

### Productivity & Organization

- [File Organizer](./file-organizer/) - Intelligently organizes files and folders by understanding context, finding duplicates, and suggesting better organizational structures.
- [Invoice Organizer](./invoice-organizer/) - Automatically organizes invoices and receipts for tax preparation by reading files, extracting information, and renaming consistently.
- [kaizen](https://github.com/NeoLabHQ/context-engineering-kit/tree/master/plugins/kaizen/skills/kaizen) - Applies continuous improvement methodology with multiple analytical approaches, based on Japanese Kaizen philosophy and Lean methodology.
- [n8n-skills](https://github.com/haunchen/n8n-skills) - Enables AI assistants to directly understand and operate n8n workflows.
- [Raffle Winner Picker](./raffle-winner-picker/) - Randomly selects winners from lists, spreadsheets, or Google Sheets for giveaways and contests with cryptographically secure randomness.
- [Tailored Resume Generator](./tailored-resume-generator/) - Analyzes job descriptions and generates tailored resumes that highlight relevant experience, skills, and achievements to maximize interview chances.
- [ship-learn-next](https://github.com/michalparkola/tapestry-skills-for-claude-code/tree/main/ship-learn-next) - Skill to help iterate on what to build or learn next, based on feedback loops.
- [tapestry](https://github.com/michalparkola/tapestry-skills-for-claude-code/tree/main/tapestry) - Interlink and summarize related documents into knowledge networks.

### Collaboration & Project Management

- [git-pushing](https://github.com/mhattingpete/claude-skills-marketplace/tree/main/engineering-workflow-plugin/skills/git-pushing) - Automate git operations and repository interactions.
- [google-workspace-skills](https://github.com/sanjay3290/ai-skills/tree/main/skills) - Suite of Google Workspace integrations: Gmail, Calendar, Chat, Docs, Sheets, Slides, and Drive with cross-platform OAuth. *By [@sanjay3290](https://github.com/sanjay3290)*
- [outline](https://github.com/sanjay3290/ai-skills/tree/main/skills/outline) - Search, read, create, and manage documents in Outline wiki instances (cloud or self-hosted). *By [@sanjay3290](https://github.com/sanjay3290)*
- [review-implementing](https://github.com/mhattingpete/claude-skills-marketplace/tree/main/engineering-workflow-plugin/skills/review-implementing) - Evaluate code implementation plans and align with specs.
- [test-fixing](https://github.com/mhattingpete/claude-skills-marketplace/tree/main/engineering-workflow-plugin/skills/test-fixing) - Detect failing tests and propose patches or fixes.

### Security & Systems

- [computer-forensics](https://github.com/mhattingpete/claude-skills-marketplace/tree/main/computer-forensics-skills/skills/computer-forensics) - Digital forensics analysis and investigation techniques.
- [file-deletion](https://github.com/mhattingpete/claude-skills-marketplace/tree/main/computer-forensics-skills/skills/file-deletion) - Secure file deletion and data sanitization methods.
- [metadata-extraction](https://github.com/mhattingpete/claude-skills-marketplace/tree/main/computer-forensics-skills/skills/metadata-extraction) - Extract and analyze file metadata for forensic purposes.
- [threat-hunting-with-sigma-rules](https://github.com/jthack/threat-hunting-with-sigma-rules-skill) - Use Sigma detection rules to hunt for threats and analyze security events.

### App Automation via Composio

Pre-built workflow skills for 78 SaaS apps via [Rube MCP (Composio)](https://composio.dev). Each skill includes tool sequences, parameter guidance, known pitfalls, and quick reference tables — all using real tool slugs discovered from Composio's API.

**CRM & Sales**
- [Close Automation](./close-automation/) - Automate Close CRM: leads, contacts, opportunities, activities, and pipelines.
- [HubSpot Automation](./hubspot-automation/) - Automate HubSpot CRM: contacts, deals, companies, tickets, and email engagement.
- [Pipedrive Automation](./pipedrive-automation/) - Automate Pipedrive: deals, contacts, organizations, activities, and pipelines.
- [Salesforce Automation](./salesforce-automation/) - Automate Salesforce: objects, records, SOQL queries, and bulk operations.
- [Zoho CRM Automation](./zoho-crm-automation/) - Automate Zoho CRM: leads, contacts, deals, accounts, and modules.

**Project Management**
- [Asana Automation](./asana-automation/) - Automate Asana: tasks, projects, sections, assignments, and workspaces.
- [Basecamp Automation](./basecamp-automation/) - Automate Basecamp: to-do lists, messages, people, groups, and projects.
- [ClickUp Automation](./clickup-automation/) - Automate ClickUp: tasks, lists, spaces, goals, and time tracking.
- [Jira Automation](./jira-automation/) - Automate Jira: issues, projects, boards, sprints, and JQL queries.
- [Linear Automation](./linear-automation/) - Automate Linear: issues, projects, cycles, teams, and workflows.
- [Monday Automation](./monday-automation/) - Automate Monday.com: boards, items, columns, groups, and workspaces.
- [Notion Automation](./notion-automation/) - Automate Notion: pages, databases, blocks, comments, and search.
- [Todoist Automation](./todoist-automation/) - Automate Todoist: tasks, projects, sections, labels, and filters.
- [Trello Automation](./trello-automation/) - Automate Trello: boards, cards, lists, members, and checklists.
- [Wrike Automation](./wrike-automation/) - Automate Wrike: tasks, folders, projects, comments, and workflows.


... [README content truncated due to size. Visit the repository for the complete README] ...
]]>
Python
<![CDATA[LearningCircuit/local-deep-research]]> https://github.com/LearningCircuit/local-deep-research https://github.com/LearningCircuit/local-deep-research Sat, 07 Feb 2026 00:07:00 GMT LearningCircuit/local-deep-research

Local Deep Research achieves ~95% on SimpleQA benchmark (tested with GPT-4.1-mini). Supports local and cloud LLMs (Ollama, Google, Anthropic, ...). Searches 10+ sources - arXiv, PubMed, web, and your private documents. Everything Local & Encrypted.

Language: Python

Stars: 3,961

Forks: 372

Stars today: 33

README

# Local Deep Research

<div align="center">

[![GitHub stars](https://img.shields.io/github/stars/LearningCircuit/local-deep-research?style=for-the-badge)](https://github.com/LearningCircuit/local-deep-research/stargazers)
[![Docker Pulls](https://img.shields.io/docker/pulls/localdeepresearch/local-deep-research?style=for-the-badge)](https://hub.docker.com/r/localdeepresearch/local-deep-research)
[![PyPI Downloads](https://img.shields.io/pypi/dm/local-deep-research?style=for-the-badge)](https://pypi.org/project/local-deep-research/)

[![Trendshift](https://trendshift.io/api/badge/repositories/14116)](https://trendshift.io/repositories/14116)

[![Commits](https://img.shields.io/github/commit-activity/m/LearningCircuit/local-deep-research?style=for-the-badge)](https://github.com/LearningCircuit/local-deep-research/commits/main)
[![Last Commit](https://img.shields.io/github/last-commit/LearningCircuit/local-deep-research?style=for-the-badge)](https://github.com/LearningCircuit/local-deep-research/commits/main)

[![SimpleQA Accuracy](https://img.shields.io/badge/SimpleQA-~95%25_Accuracy-gold?style=for-the-badge)](https://github.com/LearningCircuit/local-deep-research/tree/main/community_benchmark_results)
[![SQLCipher](https://img.shields.io/badge/Database-SQLCipher_Encrypted-red?style=for-the-badge&logo=sqlite&logoColor=white)](docs/SQLCIPHER_INSTALL.md)

<!-- Well-known security scanners that visitors will recognize -->
[![OpenSSF Scorecard](https://api.securityscorecards.dev/projects/github.com/LearningCircuit/local-deep-research/badge)](https://securityscorecards.dev/viewer/?uri=github.com/LearningCircuit/local-deep-research)
[![CodeQL](https://github.com/LearningCircuit/local-deep-research/actions/workflows/codeql.yml/badge.svg?branch=main)](https://github.com/LearningCircuit/local-deep-research/security/code-scanning)
[![Semgrep](https://github.com/LearningCircuit/local-deep-research/actions/workflows/semgrep.yml/badge.svg?branch=main)](https://github.com/LearningCircuit/local-deep-research/actions/workflows/semgrep.yml)

[![✅ All Tests](https://github.com/LearningCircuit/local-deep-research/actions/workflows/docker-tests.yml/badge.svg?branch=main)](https://github.com/LearningCircuit/local-deep-research/actions/workflows/docker-tests.yml)
[![🔧 Pre-commit](https://github.com/LearningCircuit/local-deep-research/actions/workflows/pre-commit.yml/badge.svg?branch=main)](https://github.com/LearningCircuit/local-deep-research/actions/workflows/pre-commit.yml)

[![🐳 Docker Publish](https://github.com/LearningCircuit/local-deep-research/actions/workflows/docker-publish.yml/badge.svg)](https://github.com/LearningCircuit/local-deep-research/actions/workflows/docker-publish.yml)
[![📦 PyPI Publish](https://github.com/LearningCircuit/local-deep-research/actions/workflows/publish.yml/badge.svg)](https://github.com/LearningCircuit/local-deep-research/actions/workflows/publish.yml)

[![Discord](https://img.shields.io/discord/1352043059562680370?style=for-the-badge&logo=discord)](https://discord.gg/ttcqQeFcJ3)
[![Reddit](https://img.shields.io/badge/Reddit-r/LocalDeepResearch-FF4500?style=for-the-badge&logo=reddit)](https://www.reddit.com/r/LocalDeepResearch/)
[![YouTube](https://img.shields.io/badge/YouTube-Channel-red?style=for-the-badge&logo=youtube)](https://www.youtube.com/@local-deep-research)


**AI-powered research assistant for deep, iterative research**

*Performs deep, iterative research using multiple LLMs and search engines with proper citations*

<a href="https://www.youtube.com/watch?v=pfxgLX-MxMY&t=1999">
  ▶️ Watch Review by The Art Of The Terminal
</a>

</div>

## 🚀 What is Local Deep Research?

AI research assistant you control. Run locally for privacy, use any LLM and build your own searchable knowledge base. You own your data and see exactly how it works.

## ⚡ Quick Start



**Docker Run (Linux):**
```bash
# Step 1: Pull and run Ollama
docker run -d -p 11434:11434 --name ollama ollama/ollama
docker exec ollama ollama pull gpt-oss:20b

# Step 2: Pull and run SearXNG for optimal search results
docker run -d -p 8080:8080 --name searxng searxng/searxng

# Step 3: Pull and run Local Deep Research
docker run -d -p 5000:5000 --network host \
  --name local-deep-research \
  --volume 'deep-research:/data' \
  -e LDR_DATA_DIR=/data \
  localdeepresearch/local-deep-research
```

**Example Docker Compose setups:**
1. **Mac or no NVIDIA GPU:** [Docker Compose File](https://github.com/LearningCircuit/local-deep-research/blob/main/docker-compose.yml)
```bash
# download and up -d
curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.yml && docker compose up -d
```

2. **With NVIDIA GPU (Linux):**
```bash
# download and up -d
curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.yml && \
curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.gpu.override.yml && \
docker compose -f docker-compose.yml -f docker-compose.gpu.override.yml up -d
```

Open http://localhost:5000 after ~30 seconds.

**pip install (for programmatic/API usage):**
```bash
pip install local-deep-research
```
> ⚠️ Docker is preferred for most users. pip installation requires manual setup of SQLCipher for database encryption—see [SQLCipher Guide](docs/SQLCIPHER_INSTALL.md). Best suited for programmatic integration into existing Python projects.

[More install options →](#-installation-options)

## 🏗️ How It Works

### Research

You ask a complex question. LDR:
- Does the research for you automatically
- Searches across web, academic papers, and your own documents
- Synthesizes everything into a report with proper citations

Choose from 20+ research strategies for quick facts, deep analysis, or academic research.

### Build Your Knowledge Base

```mermaid
flowchart LR
    R[Research] --> D[Download Sources]
    D --> L[(Library)]
    L --> I[Index & Embed]
    I --> S[Search Your Docs]
    S -.-> R
```

Every research session finds valuable sources. Download them directly into your encrypted library—academic papers from ArXiv, PubMed articles, web pages. LDR extracts text, indexes everything, and makes it searchable. Next time you research, ask questions across your own documents and the live web together. Your knowledge compounds over time.

## 🛡️ Security

<div align="center">

<!-- Comprehensive Security Scanning -->
[![🛡️ Security Release Gate](https://github.com/LearningCircuit/local-deep-research/actions/workflows/security-release-gate.yml/badge.svg?branch=main)](https://github.com/LearningCircuit/local-deep-research/actions/workflows/security-release-gate.yml)

<!-- Static Analysis (additional scanners beyond CodeQL/Semgrep) -->
[![DevSkim](https://github.com/LearningCircuit/local-deep-research/actions/workflows/devskim.yml/badge.svg?branch=main)](https://github.com/LearningCircuit/local-deep-research/actions/workflows/devskim.yml)
[![Bearer](https://github.com/LearningCircuit/local-deep-research/actions/workflows/bearer.yml/badge.svg?branch=main)](https://github.com/LearningCircuit/local-deep-research/actions/workflows/bearer.yml)

<!-- Dependency & Secrets Scanning -->
[![Gitleaks](https://github.com/LearningCircuit/local-deep-research/actions/workflows/gitleaks-main.yml/badge.svg?branch=main)](https://github.com/LearningCircuit/local-deep-research/actions/workflows/gitleaks-main.yml)
[![OSV-Scanner](https://github.com/LearningCircuit/local-deep-research/actions/workflows/osv-scanner.yml/badge.svg?branch=main)](https://github.com/LearningCircuit/local-deep-research/actions/workflows/osv-scanner.yml)
[![npm-audit](https://github.com/LearningCircuit/local-deep-research/actions/workflows/npm-audit.yml/badge.svg?branch=main)](https://github.com/LearningCircuit/local-deep-research/actions/workflows/npm-audit.yml)
[![Retire.js](https://github.com/LearningCircuit/local-deep-research/actions/workflows/retirejs.yml/badge.svg?branch=main)](https://github.com/LearningCircuit/local-deep-research/actions/workflows/retirejs.yml)

<!-- Container Security -->
[![Container Security](https://github.com/LearningCircuit/local-deep-research/actions/workflows/container-security.yml/badge.svg?branch=main)](https://github.com/LearningCircuit/local-deep-research/actions/workflows/container-security.yml)
[![Dockle](https://github.com/LearningCircuit/local-deep-research/actions/workflows/dockle.yml/badge.svg?branch=main)](https://github.com/LearningCircuit/local-deep-research/actions/workflows/dockle.yml)
[![Hadolint](https://github.com/LearningCircuit/local-deep-research/actions/workflows/hadolint.yml/badge.svg?branch=main)](https://github.com/LearningCircuit/local-deep-research/actions/workflows/hadolint.yml)
[![Checkov](https://github.com/LearningCircuit/local-deep-research/actions/workflows/checkov.yml/badge.svg?branch=main)](https://github.com/LearningCircuit/local-deep-research/actions/workflows/checkov.yml)

<!-- Workflow & Runtime Security -->
[![Zizmor](https://github.com/LearningCircuit/local-deep-research/actions/workflows/zizmor-security.yml/badge.svg?branch=main)](https://github.com/LearningCircuit/local-deep-research/actions/workflows/zizmor-security.yml)
[![OWASP ZAP](https://github.com/LearningCircuit/local-deep-research/actions/workflows/owasp-zap-scan.yml/badge.svg?branch=main)](https://github.com/LearningCircuit/local-deep-research/actions/workflows/owasp-zap-scan.yml)
[![Security Tests](https://github.com/LearningCircuit/local-deep-research/actions/workflows/security-tests.yml/badge.svg?branch=main)](https://github.com/LearningCircuit/local-deep-research/actions/workflows/security-tests.yml)

</div>

```mermaid
flowchart LR
    U1[User A] --> D1[(Encrypted DB)]
    U2[User B] --> D2[(Encrypted DB)]
```

Your data stays yours. Each user gets their own isolated SQLCipher database encrypted with AES-256 (Signal-level security). No password recovery means true zero-knowledge—even server admins can't read your data. Run fully local with Ollama + SearXNG and nothing ever leaves your machine.
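
To illustrate the mechanism (this is not LDR's code), opening an SQLCipher database from Python looks roughly like the sketch below, assuming the `sqlcipher3` binding is installed. Everything on disk is ciphertext until the correct key is supplied.

```python
# Sketch of opening an SQLCipher-encrypted database from Python.
# Assumes the sqlcipher3 binding (pip install sqlcipher3-binary);
# illustrative only, not code from Local Deep Research.
from sqlcipher3 import dbapi2 as sqlcipher

conn = sqlcipher.connect("user_a.db")
conn.execute("PRAGMA key = 'per-user-passphrase'")  # key derivation happens here
conn.execute("CREATE TABLE IF NOT EXISTS notes (body TEXT)")
conn.execute("INSERT INTO notes VALUES ('unreadable without the key')")
conn.commit()
conn.close()
```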

**Supply Chain Security**: Docker images are signed with [Cosign](https://github.com/sigstore/cosign), include SLSA provenance attestations, and attach SBOMs. Verify with:
```bash
cosign verify localdeepresearch/local-deep-research:latest
```

[Detailed Architecture →](docs/architecture.md) | [Security Policy →](SECURITY.md)

## 📊 Performance

**~95% accuracy on SimpleQA benchmark** (preliminary results)
- Tested with GPT-4.1-mini + SearXNG + focused-iteration strategy
- Comparable to state-of-the-art AI research systems
- Local models can achieve similar performance with proper configuration
- [Join our community benchmarking effort →](https://github.com/LearningCircuit/local-deep-research/tree/main/community_benchmark_results)

## ✨ Key Features

### 🔍 Research Modes
- **Quick Summary** - Get answers in 30 seconds to 3 minutes with citations
- **Detailed Research** - Comprehensive analysis with structured findings
- **Report Generation** - Professional reports with sections and table of contents
- **Document Analysis** - Search your private documents with AI

### 🛠️ Advanced Capabilities
- **[LangChain Integration](docs/LANGCHAIN_RETRIEVER_INTEGRATION.md)** - Use any vector store as a search engine
- **[REST API](docs/api-quickstart.md)** - Authenticated HTTP access with per-user databases
- **[Benchmarking](docs/BENCHMARKING.md)** - Test and optimize your configuration
- **[Analytics Dashboard](docs/analytics-dashboard.md)** - Track costs, performance, and usage metrics
- **Real-time Updates** - WebSocket support for live research progress
- **Export Options** - Download results as PDF or Markdown
- **Research History** - Save, search, and revisit past research
- **Adaptive Rate Limiting** - Intelligent retry system that learns optimal wait times (see the sketch after this list)
- **Keyboard Shortcuts** - Navigate efficiently (ESC, Ctrl+Shift+1-5)
- **Per-User Encrypted Databases** - Secure, isolated data storage for each user
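
The adaptive rate limiting mentioned above can be pictured with this toy sketch: keep a per-engine wait estimate, back off sharply on rate-limit errors, and decay slowly after successes so the wait converges near the real limit. It illustrates the general technique, not LDR's implementation.

```python
# Toy sketch of adaptive rate limiting -- the general technique, not
# Local Deep Research's implementation.
import time

class AdaptiveLimiter:
    def __init__(self, initial_wait: float = 1.0):
        self.wait = initial_wait  # learned wait estimate, in seconds

    def before_request(self) -> None:
        time.sleep(self.wait)

    def record(self, rate_limited: bool) -> None:
        if rate_limited:
            self.wait *= 2.0  # back off quickly on 429-style errors
        else:
            self.wait = max(0.1, self.wait * 0.95)  # creep back toward faster rates
```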

### 📰 News & Research Subscriptions
- **Automated Research Digests** - Subscribe to topics and receive AI-powered research summaries
- **Customizable Frequency** - Daily, weekly, or custom schedules for research updates
- **Smart Filtering** - AI filters and summarizes only the most relevant developments
- **Multi-format Delivery** - Get updates as markdown reports or structured summaries
- **Topic & Query Support** - Track specific searches or broad research areas

### 🌐 Search Sources

#### Free Search Engines
- **Academic**: arXiv, PubMed, Semantic Scholar
- **General**: Wikipedia, SearXNG
- **Technical**: GitHub, Elasticsearch
- **Historical**: Wayback Machine
- **News**: The Guardian, Wikinews

#### Premium Search Engines
- **Tavily** - AI-powered search
- **Google** - Via SerpAPI or Programmable Search Engine
- **Brave Search** - Privacy-focused web search

#### Custom Sources
- **Local Documents** - Search your files with AI
- **LangChain Retrievers** - Any vector store or database
- **Meta Search** - Combine multiple engines intelligently

[Full Search Engines Guide →](docs/search-engines.md)

## 📦 Installation Options

### Option 1: Docker

```bash
# Step 1: Pull and run SearXNG for optimal search results
docker run -d -p 8080:8080 --name searxng searxng/searxng

# Step 2: Pull and run Local Deep Research
# Note: with --network host, the -p mapping is ignored; the app listens on host port 5000
docker run -d -p 5000:5000 --network host \
  --name local-deep-research \
  --volume 'deep-research:/data' \
  -e LDR_DATA_DIR=/data \
  localdeepresearch/local-deep-research
```

### Option 2: Docker Compose (Recommended)

LDR uses Docker Compose to bundle the web app and all of its dependencies so
you can get up and running quickly.

#### Option 2a: Quick Start (One Command)

**Default: CPU-only base (works on all platforms)**

The base configuration works on macOS (M1/M2/M3/M4 and Intel), Windows, and Linux without requiring any GPU hardware.

**Quick Start Command:**

**Note:** `curl -O` will overwrite existing docker-compose.yml files in the current directory.

Linux/macOS:

```bash
curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.yml && docker compose up -d
```

Windows (PowerShell required):

```powershell
curl.exe -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.yml
if ($?) { docker compose up -d }
```

**Use with a different model:**

```bash
curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.yml && MODEL=gpt-oss:20b docker compose up -d
```

---

##### **Option 2a-GPU: Add NVIDIA GPU Acceleration (Linux only)**

For users with NVIDIA GPUs who want hardware acceleration.

**Prerequisites:**

Install the NVIDIA Container Toolkit first (Ubuntu/Debian):

```bash
# Install NVIDIA Container Toolkit (for GPU support)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install nvidia-container-toolkit -y
sudo systemctl restart docker

# Verify installation
nvidia-smi
```

**Verify:** The `nvidia-smi` command should display your GPU information. If it fails, check your NVIDIA driver installation.

**Note:** For RHEL/CentOS/Fedora, Arch, or other Linux distributions, see the [NVIDIA Container Toolkit installation guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html).

**Quick Start Commands:**

**Note:** `curl -O` will overwrite existing files in the current directory.

```bash
curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.yml && \
curl -O https://raw.githubusercontent.com/LearningCircuit/local-deep-research/main/docker-compose.gpu.override.yml && \
docker compose -f docker-compose.yml -f docker-compose.gpu.override.yml up -d
```

**Optional: Create an alias for convenience**

```bash
alias docker-compose-gpu='docker compose -f docker-compose.yml -f docker-compose.gpu.override.yml'
# Then simply use: docker-compose-gpu up -d
```

---

Open http://localhost:5000 after ~30 seconds. This starts LDR with SearXNG and all dependencies.

#### Option 2b: DIY docker-compose
See [docker-compose.yml](./docker-compose.yml) for a compose file with reasonable defaults that runs Ollama, SearXNG, and Local Deep Research together locally.

Things you may want/need to configure:
* Ollama GPU driver
* Ollama context length (depends on available VRAM)
* Ollama keep-alive (how long the model stays loaded in VRAM while idle before being unloaded automatically)
* Deep Research model (depends on available VRAM and preference)

#### Option 2c: Use Cookiecutter to tailor a docker-compose to your needs

##### Prerequisites

- [Docker](https://docs.docker.com/engine/install/)
- [Docker Compose](https://docs.docker.com/compose/install/)
- `cookiecutter`: Run `pip install --user cookiecutter`

Clone the repository:

```bash
git clone https://github.com/LearningCircuit/local-deep-research.git
cd local-deep-research
```

### Configuring with Docker Compose

Cookiecutter will interactively guide you through the process of creating a
`docker-compose` configuration that meets your specific needs. This is the
recommended approach if you are not very familiar with Docker.

In the LDR repository, run the following command
to generate the compose file:

```bash
cookiecutter cookiecutter-docker/
docker compose -f docker-compose.default.yml up
```

[Docker Compose Guide →](docs/docker-compose-guide.md)

### Option 3: Python Package (pip)

> **Note:** This option is recommended primarily for **programmatic/API usage** or advanced users who want to integrate LDR into existing Python projects. For most users, **Docker is preferred** as it handles all dependencies automatically, including SQLCipher encryption which requires additional system libraries when installed via pip outside of Docker.

```bash
# Step 1: Install the package
pip install local-deep-research

# Step 2: Setup SearXNG for best results
docker pull searxng/searxng
docker run -d -p 8080:8080 --name searxng searxng/searxng

# Step 3: Install Ollama from https://ollama.ai

# Step 4: Download a model
ollama pull gemma3:12b

# Step 5: Start the web interface
python -m local_deep_research.web.app
```

> **⚠️ SQLCipher Note:** The pip installation uses standard SQLite by default. For full SQLCipher encryption support (AES-256 encrypted databases), you'll need to install system-level SQLCipher libraries. See [SQLCipher Installation Guide](docs/SQLCIPHER_INSTALL.md) for platform-specific instructions. Docker images include SQLCipher pre-configured.

> **Note:** For development from source, see the [Development Guide](docs/developing.md).

#### Optional Dependencies

VLLM support (for running transformer models directly):
```bash
pip install "local-deep-research[vllm]"
```
This installs torch, transformers, and vllm for advanced local model hosting. Most users running Ollama or LlamaCpp don't need this.

[Full Installation Guide →](https://github.com/LearningCircuit/local-deep-research/wiki/Installation)

### Option 4: Unraid

**For Unraid users:**

Local Deep Research is fully compatible with Unraid servers!

#### Quick Install (Template Method)

1. Navigate to **Docker** tab → **Docker Repositories**
2. Add template repository:
   ```
   https://github.com/LearningCircuit/local-deep-research
   ```
3. Click **Add Container** → Select **LocalDeepResearch** from template
4. Configure paths (default: `/mnt/user/appdata/local-deep-research/`)
5. Click **Apply**

#### Docker Compose Manager Plugin

If you prefer using Docker Compose on Unraid:

1. Install "Do

... [README content truncated due to size. Visit the repository for the complete README] ...
]]>
Python
<![CDATA[confident-ai/deepeval]]> https://github.com/confident-ai/deepeval https://github.com/confident-ai/deepeval Sat, 07 Feb 2026 00:06:59 GMT confident-ai/deepeval

The LLM Evaluation Framework

Language: Python

Stars: 13,525

Forks: 1,224

Stars today: 39 stars today

README

<p align="center">
    <img src="https://github.com/confident-ai/deepeval/blob/main/docs/static/img/deepeval.png" alt="DeepEval Logo" width="100%">
</p>

<p align="center">
    <h1 align="center">The LLM Evaluation Framework</h1>
</p>

<p align="center">
<a href="https://trendshift.io/repositories/5917" target="_blank"><img src="https://trendshift.io/api/badge/repositories/5917" alt="confident-ai%2Fdeepeval | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
</p>

<p align="center">
    <a href="https://discord.gg/3SEyvpgu2f">
        <img alt="discord-invite" src="https://dcbadge.vercel.app/api/server/3SEyvpgu2f?style=flat">
    </a>
</p>

<h4 align="center">
    <p>
        <a href="https://deepeval.com/docs/getting-started?utm_source=GitHub">Documentation</a> |
        <a href="#-metrics-and-features">Metrics and Features</a> |
        <a href="#-quickstart">Getting Started</a> |
        <a href="#-integrations">Integrations</a> |
        <a href="https://confident-ai.com?utm_source=GitHub">DeepEval Platform</a>
    </p>
</h4>

<p align="center">
    <a href="https://github.com/confident-ai/deepeval/releases">
        <img alt="GitHub release" src="https://img.shields.io/github/release/confident-ai/deepeval.svg?color=violet">
    </a>
    <a href="https://colab.research.google.com/drive/1PPxYEBa6eu__LquGoFFJZkhYgWVYE6kh?usp=sharing">
        <img alt="Try Quickstart in Colab" src="https://colab.research.google.com/assets/colab-badge.svg">
    </a>
    <a href="https://github.com/confident-ai/deepeval/blob/master/LICENSE.md">
        <img alt="License" src="https://img.shields.io/github/license/confident-ai/deepeval.svg?color=yellow">
    </a>
    <a href="https://x.com/deepeval">
        <img alt="Twitter Follow" src="https://img.shields.io/twitter/follow/deepeval?style=social&logo=x">
    </a>
</p>

<p align="center">
    <!-- Keep these links. Translations will automatically update with the README. -->
    <a href="https://www.readme-i18n.com/confident-ai/deepeval?lang=de">Deutsch</a> | 
    <a href="https://www.readme-i18n.com/confident-ai/deepeval?lang=es">Español</a> | 
    <a href="https://www.readme-i18n.com/confident-ai/deepeval?lang=fr">français</a> | 
    <a href="https://www.readme-i18n.com/confident-ai/deepeval?lang=ja">日本語</a> | 
    <a href="https://www.readme-i18n.com/confident-ai/deepeval?lang=ko">한국어</a> | 
    <a href="https://www.readme-i18n.com/confident-ai/deepeval?lang=pt">Português</a> | 
    <a href="https://www.readme-i18n.com/confident-ai/deepeval?lang=ru">Русский</a> | 
    <a href="https://www.readme-i18n.com/confident-ai/deepeval?lang=zh">中文</a>
</p>

**DeepEval** is a simple-to-use, open-source LLM evaluation framework for evaluating and testing large-language-model systems. It is similar to Pytest but specialized for unit testing LLM outputs. DeepEval incorporates the latest research to evaluate LLM outputs on metrics such as G-Eval, task completion, answer relevancy, and hallucination, using LLM-as-a-judge and other NLP models that run **locally on your machine**.

Whether your LLM applications are AI agents, RAG pipelines, or chatbots, implemented via LangChain or OpenAI, DeepEval has you covered. With it, you can easily determine the optimal models, prompts, and architecture to improve your RAG pipeline or agentic workflows, prevent prompt drift, or even transition from OpenAI to hosting your own DeepSeek R1 with confidence.

> [!IMPORTANT]
> Need a place for your DeepEval testing data to live 🏡❤️? [Sign up to the DeepEval platform](https://confident-ai.com?utm_source=GitHub) to compare iterations of your LLM app, generate & share testing reports, and more.
>
> ![Demo GIF](assets/demo.gif)

> Want to talk LLM evaluation, need help picking metrics, or just want to say hi? [Come join our Discord.](https://discord.com/invite/3SEyvpgu2f)

<br />

# 🔥 Metrics and Features

> 🥳 You can now share DeepEval's test results on the cloud directly on [Confident AI](https://confident-ai.com?utm_source=GitHub)

- Supports both end-to-end and component-level LLM evaluation.
- Large variety of ready-to-use LLM evaluation metrics (all with explanations) powered by **ANY** LLM of your choice, statistical methods, or NLP models that run **locally on your machine**:
  - G-Eval
  - DAG ([deep acyclic graph](https://deepeval.com/docs/metrics-dag))
  - **RAG metrics:**
    - Answer Relevancy
    - Faithfulness
    - Contextual Recall
    - Contextual Precision
    - Contextual Relevancy
    - RAGAS
  - **Agentic metrics:**
    - Task Completion
    - Tool Correctness
  - **Others:**
    - Hallucination
    - Summarization
    - Bias
    - Toxicity
  - **Conversational metrics:**
    - Knowledge Retention
    - Conversation Completeness
    - Conversation Relevancy
    - Role Adherence
  - etc.
- Build your own custom metrics that are automatically integrated with DeepEval's ecosystem (see the sketch after this list).
- Generate synthetic datasets for evaluation.
- Integrates seamlessly with **ANY** CI/CD environment.
- [Red team your LLM application](https://deepeval.com/docs/red-teaming-introduction) for 40+ safety vulnerabilities in a few lines of code, including:
  - Toxicity
  - Bias
  - SQL Injection
  - etc., using 10+ advanced attack enhancement strategies such as prompt injection.
- Easily benchmark **ANY** LLM on popular LLM benchmarks in [under 10 lines of code](https://deepeval.com/docs/benchmarks-introduction?utm_source=GitHub), including:
  - MMLU
  - HellaSwag
  - DROP
  - BIG-Bench Hard
  - TruthfulQA
  - HumanEval
  - GSM8K
- [100% integrated with Confident AI](https://confident-ai.com?utm_source=GitHub) for the full evaluation & observability lifecycle:
  - Curate/annotate evaluation datasets on the cloud
  - Benchmark your LLM app on a dataset and compare with previous iterations to find which models/prompts work best
  - Fine-tune metrics for custom results
  - Debug evaluation results via LLM traces
  - Monitor & evaluate LLM responses in production to improve datasets with real-world data
  - Repeat until perfection
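
For a sense of what a custom metric involves, here is a minimal sketch modeled on the `BaseMetric` interface from the custom-metrics docs. The hook names may vary across versions, so treat them as assumptions and consult the documentation.

```python
# Minimal custom-metric sketch modeled on DeepEval's BaseMetric interface.
# Hook names (measure / a_measure / is_successful) follow the docs but may
# differ between versions -- treat this as a sketch, not a reference.
from deepeval.metrics import BaseMetric
from deepeval.test_case import LLMTestCase

class ExactMatchMetric(BaseMetric):
    def __init__(self, threshold: float = 1.0):
        self.threshold = threshold

    def measure(self, test_case: LLMTestCase) -> float:
        # Toy scoring rule: 1.0 on an exact string match, else 0.0.
        self.score = float(test_case.actual_output == test_case.expected_output)
        self.success = self.score >= self.threshold
        return self.score

    async def a_measure(self, test_case: LLMTestCase) -> float:
        return self.measure(test_case)

    def is_successful(self) -> bool:
        return self.success

    @property
    def __name__(self):
        return "Exact Match"
```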

> [!NOTE]
> DeepEval is available on Confident AI, an LLM evals platform for AI observability and quality. Create an account [here.](https://app.confident-ai.com?utm_source=GitHub)

<br />

# 🔌 Integrations

- 🦄 LlamaIndex, to [**unit test RAG applications in CI/CD**](https://www.deepeval.com/integrations/frameworks/llamaindex?utm_source=GitHub)
- 🤗 Hugging Face, to [**enable real-time evaluations during LLM fine-tuning**](https://www.deepeval.com/integrations/frameworks/huggingface?utm_source=GitHub)

<br />

# 🚀 QuickStart

Let's pretend your LLM application is a RAG-based customer support chatbot; here's how DeepEval can help test what you've built.

## Installation

DeepEval works with **Python 3.9 or later**.

```
pip install -U deepeval
```

## Create an account (highly recommended)

Using the `deepeval` platform will allow you to generate shareable testing reports on the cloud. It is free, takes no additional code to set up, and we highly recommend giving it a try.

To login, run:

```
deepeval login
```

Follow the instructions in the CLI to create an account, copy your API key, and paste it into the CLI. All test cases will automatically be logged (find more information on data privacy [here](https://deepeval.com/docs/data-privacy?utm_source=GitHub)).

## Writing your first test case

Create a test file:

```bash
touch test_chatbot.py
```

Open `test_chatbot.py` and write your first test case to run an **end-to-end** evaluation using DeepEval, which treats your LLM app as a black box:

```python
import pytest
from deepeval import assert_test
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCase, LLMTestCaseParams

def test_case():
    correctness_metric = GEval(
        name="Correctness",
        criteria="Determine if the 'actual output' is correct based on the 'expected output'.",
        evaluation_params=[LLMTestCaseParams.ACTUAL_OUTPUT, LLMTestCaseParams.EXPECTED_OUTPUT],
        threshold=0.5
    )
    test_case = LLMTestCase(
        input="What if these shoes don't fit?",
        # Replace this with the actual output from your LLM application
        actual_output="You have 30 days to get a full refund at no extra cost.",
        expected_output="We offer a 30-day full refund at no extra costs.",
        retrieval_context=["All customers are eligible for a 30 day full refund at no extra costs."]
    )
    assert_test(test_case, [correctness_metric])
```

Set your `OPENAI_API_KEY` as an environment variable (you can also evaluate using your own custom model, for more details visit [this part of our docs](https://deepeval.com/docs/metrics-introduction#using-a-custom-llm?utm_source=GitHub)):

```
export OPENAI_API_KEY="..."
```

And finally, run `test_chatbot.py` in the CLI:

```
deepeval test run test_chatbot.py
```

**Congratulations! Your test case should have passed ✅** Let's break down what happened.

- The variable `input` mimics a user input, and `actual_output` is a placeholder for what your application is supposed to output based on this input.
- The variable `expected_output` represents the ideal answer for a given `input`, and [`GEval`](https://deepeval.com/docs/metrics-llm-evals) is a research-backed metric provided by `deepeval` for evaluating your LLM outputs on any custom criteria with human-like accuracy.
- In this example, the metric `criteria` is the correctness of the `actual_output` based on the provided `expected_output`.
- All metric scores range from 0 to 1; the `threshold=0.5` ultimately determines whether your test passed.

[Read our documentation](https://deepeval.com/docs/getting-started?utm_source=GitHub) for more information on more options to run end-to-end evaluation, how to use additional metrics, create your own custom metrics, and tutorials on how to integrate with other tools like LangChain and LlamaIndex.

<br />

## Evaluating Nested Components

If you wish to evaluate individual components within your LLM app, you need to run **component-level** evals - a powerful way to assess any single component within an LLM system.

Simply trace "components" such as LLM calls, retrievers, tool calls, and agents within your LLM application using the `@observe` decorator to apply metrics at the component level. Tracing with `deepeval` is non-intrusive (learn more [here](https://deepeval.com/docs/evaluation-llm-tracing#dont-be-worried-about-tracing)) and helps you avoid rewriting your codebase just for evals:

```python
from deepeval.tracing import observe, update_current_span
from deepeval.test_case import LLMTestCase, LLMTestCaseParams
from deepeval.dataset import Golden
from deepeval.metrics import GEval
from deepeval import evaluate

correctness = GEval(
    name="Correctness",
    criteria="Determine if the 'actual output' is correct based on the 'expected output'.",
    evaluation_params=[LLMTestCaseParams.ACTUAL_OUTPUT, LLMTestCaseParams.EXPECTED_OUTPUT],
)

@observe(metrics=[correctness])
def inner_component():
    # Component can be anything from an LLM call, retrieval, agent, tool use, etc.
    update_current_span(test_case=LLMTestCase(input="...", actual_output="..."))
    return

@observe
def llm_app(input: str):
    inner_component()
    return

evaluate(observed_callback=llm_app, goldens=[Golden(input="Hi!")])
```

You can learn everything about component-level evaluations [here.](https://www.deepeval.com/docs/evaluation-component-level-llm-evals)

<br />

## Evaluating Without Pytest Integration

Alternatively, you can evaluate without Pytest, which is more suited for a notebook environment.

```python
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

answer_relevancy_metric = AnswerRelevancyMetric(threshold=0.7)
test_case = LLMTestCase(
    input="What if these shoes don't fit?",
    # Replace this with the actual output from your LLM application
    actual_output="We offer a 30-day full refund at no extra costs.",
    retrieval_context=["All customers are eligible for a 30 day full refund at no extra costs."]
)
evaluate([test_case], [answer_relevancy_metric])
```

## Using Standalone Metrics

DeepEval is extremely modular, making it easy for anyone to use any of our metrics. Continuing from the previous example:

```python
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

answer_relevancy_metric = AnswerRelevancyMetric(threshold=0.7)
test_case = LLMTestCase(
    input="What if these shoes don't fit?",
    # Replace this with the actual output from your LLM application
    actual_output="We offer a 30-day full refund at no extra costs.",
    retrieval_context=["All customers are eligible for a 30 day full refund at no extra costs."]
)

answer_relevancy_metric.measure(test_case)
print(answer_relevancy_metric.score)
# All metrics also offer an explanation
print(answer_relevancy_metric.reason)
```

Note that some metrics are for RAG pipelines, while others are for fine-tuning. Make sure to use our docs to pick the right one for your use case.

## Evaluating a Dataset / Test Cases in Bulk

In DeepEval, a dataset is simply a collection of test cases. Here is how you can evaluate these in bulk:

```python
import pytest
from deepeval import assert_test
from deepeval.dataset import EvaluationDataset, Golden
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

dataset = EvaluationDataset(goldens=[Golden(input="What's the weather like today?")])

for golden in dataset.goldens:
    test_case = LLMTestCase(
        input=golden.input,
        actual_output=your_llm_app(golden.input)  # your_llm_app: your own LLM application
    )
    dataset.add_test_case(test_case)

@pytest.mark.parametrize(
    "test_case",
    dataset.test_cases,
)
def test_customer_chatbot(test_case: LLMTestCase):
    answer_relevancy_metric = AnswerRelevancyMetric(threshold=0.5)
    assert_test(test_case, [answer_relevancy_metric])
```

```bash
# Run this in the CLI, you can also add an optional -n flag to run tests in parallel
deepeval test run test_<filename>.py -n 4
```

<br/>

Alternatively, although we recommend using `deepeval test run`, you can evaluate a dataset/test cases without using our Pytest integration:

```python
from deepeval import evaluate
...

evaluate(dataset, [answer_relevancy_metric])
# or
dataset.evaluate([answer_relevancy_metric])
```


# DeepEval With Confident AI

DeepEval is available on [Confident AI](https://confident-ai.com?utm_source=Github), an evals & observability platform that allows you to:

1. Curate/annotate evaluation datasets on the cloud
2. Benchmark your LLM app on a dataset and compare with previous iterations to find which models/prompts work best
3. Fine-tune metrics for custom results
4. Debug evaluation results via LLM traces
5. Monitor & evaluate LLM responses in production to improve datasets with real-world data
6. Repeat until perfection

Everything on Confident AI, including how to use it, is available [here](https://www.confident-ai.com/docs?utm_source=GitHub).

To begin, login from the CLI:

```bash
deepeval login
```

Follow the instructions to log in, create your account, and paste your API key into the CLI.

Now, run your test file again:

```bash
deepeval test run test_chatbot.py
```

You should see a link displayed in the CLI once the test has finished running. Paste it into your browser to view the results!

![Demo GIF](assets/demo.gif)

<br />

## Configuration

### Environment variables via .env files

Using `.env.local` or `.env` is optional. If they are missing, DeepEval uses your existing environment variables. When present, dotenv environment variables are auto-loaded at import time (unless you set `DEEPEVAL_DISABLE_DOTENV=1`).

**Precedence:** process env -> `.env.local` -> `.env`

```bash
cp .env.example .env.local
# then edit .env.local (ignored by git)
```

<br />

# Contributing

Please read [CONTRIBUTING.md](https://github.com/confident-ai/deepeval/blob/main/CONTRIBUTING.md) for details on our code of conduct, and the process for submitting pull requests to us.

<br />

# Roadmap

Features:

- [x] Integration with Confident AI
- [x] Implement G-Eval
- [x] Implement RAG metrics
- [x] Implement Conversational metrics
- [x] Evaluation Dataset Creation
- [x] Red-Teaming
- [ ] DAG custom metrics
- [ ] Guardrails

<br />

# Authors

Built by the founders of Confident AI. Contact jeffreyip@confident-ai.com for all enquiries.

<br />

# License

DeepEval is licensed under Apache 2.0 - see the [LICENSE.md](https://github.com/confident-ai/deepeval/blob/main/LICENSE.md) file for details.
]]>
Python
<![CDATA[sgl-project/sglang]]> https://github.com/sgl-project/sglang https://github.com/sgl-project/sglang Sat, 07 Feb 2026 00:06:58 GMT sgl-project/sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

Language: Python

Stars: 23,402

Forks: 4,347

Stars today: 104 stars today

README

<div align="center" id="sglangtop">
<img src="https://raw.githubusercontent.com/sgl-project/sglang/main/assets/logo.png" alt="logo" width="400" margin="10px"></img>

[![PyPI](https://img.shields.io/pypi/v/sglang)](https://pypi.org/project/sglang)
![PyPI - Downloads](https://static.pepy.tech/badge/sglang?period=month)
[![license](https://img.shields.io/github/license/sgl-project/sglang.svg)](https://github.com/sgl-project/sglang/tree/main/LICENSE)
[![issue resolution](https://img.shields.io/github/issues-closed-raw/sgl-project/sglang)](https://github.com/sgl-project/sglang/issues)
[![open issues](https://img.shields.io/github/issues-raw/sgl-project/sglang)](https://github.com/sgl-project/sglang/issues)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/sgl-project/sglang)

</div>

--------------------------------------------------------------------------------

<p align="center">
<a href="https://lmsys.org/blog/"><b>Blog</b></a> |
<a href="https://docs.sglang.io/"><b>Documentation</b></a> |
<a href="https://roadmap.sglang.io/"><b>Roadmap</b></a> |
<a href="https://slack.sglang.io/"><b>Join Slack</b></a> |
<a href="https://meet.sglang.io/"><b>Weekly Dev Meeting</b></a> |
<a href="https://github.com/sgl-project/sgl-learning-materials?tab=readme-ov-file#slides"><b>Slides</b></a>
</p>

## News
- [2026/01] 🔥 SGLang Diffusion accelerates video and image generation ([blog](https://lmsys.org/blog/2026-01-16-sglang-diffusion/)).
- [2025/12] SGLang provides day-0 support for the latest open models ([MiMo-V2-Flash](https://lmsys.org/blog/2025-12-16-mimo-v2-flash/), [Nemotron 3 Nano](https://lmsys.org/blog/2025-12-15-run-nvidia-nemotron-3-nano/), [Mistral Large 3](https://github.com/sgl-project/sglang/pull/14213), [LLaDA 2.0 Diffusion LLM](https://lmsys.org/blog/2025-12-19-diffusion-llm/), [MiniMax M2](https://lmsys.org/blog/2025-11-04-miminmax-m2/)).
- [2025/10] 🔥 SGLang now runs natively on TPU with the SGLang-Jax backend ([blog](https://lmsys.org/blog/2025-10-29-sglang-jax/)).
- [2025/09] Deploying DeepSeek on GB200 NVL72 with PD and Large Scale EP (Part II): 3.8x Prefill, 4.8x Decode Throughput ([blog](https://lmsys.org/blog/2025-09-25-gb200-part-2/)).
- [2025/09] SGLang Day 0 Support for DeepSeek-V3.2 with Sparse Attention ([blog](https://lmsys.org/blog/2025-09-29-deepseek-V32/)).
- [2025/08] SGLang x AMD SF Meetup on 8/22: Hands-on GPU workshop, tech talks by AMD/xAI/SGLang, and networking ([Roadmap](https://github.com/sgl-project/sgl-learning-materials/blob/main/slides/amd_meetup_sglang_roadmap.pdf), [Large-scale EP](https://github.com/sgl-project/sgl-learning-materials/blob/main/slides/amd_meetup_sglang_ep.pdf), [Highlights](https://github.com/sgl-project/sgl-learning-materials/blob/main/slides/amd_meetup_highlights.pdf), [AITER/MoRI](https://github.com/sgl-project/sgl-learning-materials/blob/main/slides/amd_meetup_aiter_mori.pdf), [Wave](https://github.com/sgl-project/sgl-learning-materials/blob/main/slides/amd_meetup_wave.pdf)).

<details>
<summary>More</summary>

- [2025/11] SGLang Diffusion accelerates video and image generation ([blog](https://lmsys.org/blog/2025-11-07-sglang-diffusion/)).
- [2025/10] PyTorch Conference 2025 SGLang Talk ([slide](https://github.com/sgl-project/sgl-learning-materials/blob/main/slides/sglang_pytorch_2025.pdf)).
- [2025/10] SGLang x Nvidia SF Meetup on 10/2 ([recap](https://x.com/lmsysorg/status/1975339501934510231)).
- [2025/08] SGLang provides day-0 support for OpenAI gpt-oss model ([instructions](https://github.com/sgl-project/sglang/issues/8833))
- [2025/06] SGLang, the high-performance serving infrastructure powering trillions of tokens daily, has been awarded the third batch of the Open Source AI Grant by a16z ([a16z blog](https://a16z.com/advancing-open-source-ai-through-benchmarks-and-bold-experimentation/)).
- [2025/05] Deploying DeepSeek with PD Disaggregation and Large-scale Expert Parallelism on 96 H100 GPUs ([blog](https://lmsys.org/blog/2025-05-05-large-scale-ep/)).
- [2025/06] Deploying DeepSeek on GB200 NVL72 with PD and Large Scale EP (Part I): 2.7x Higher Decoding Throughput ([blog](https://lmsys.org/blog/2025-06-16-gb200-part-1/)).
- [2025/03] Supercharge DeepSeek-R1 Inference on AMD Instinct MI300X ([AMD blog](https://rocm.blogs.amd.com/artificial-intelligence/DeepSeekR1-Part2/README.html))
- [2025/03] SGLang Joins PyTorch Ecosystem: Efficient LLM Serving Engine ([PyTorch blog](https://pytorch.org/blog/sglang-joins-pytorch/))
- [2025/02] Unlock DeepSeek-R1 Inference Performance on AMD Instinct™ MI300X GPU ([AMD blog](https://rocm.blogs.amd.com/artificial-intelligence/DeepSeekR1_Perf/README.html))
- [2025/01] SGLang provides day one support for DeepSeek V3/R1 models on NVIDIA and AMD GPUs with DeepSeek-specific optimizations. ([instructions](https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3), [AMD blog](https://www.amd.com/en/developer/resources/technical-articles/amd-instinct-gpus-power-deepseek-v3-revolutionizing-ai-development-with-sglang.html), [10+ other companies](https://x.com/lmsysorg/status/1887262321636221412))
- [2024/12] v0.4 Release: Zero-Overhead Batch Scheduler, Cache-Aware Load Balancer, Faster Structured Outputs ([blog](https://lmsys.org/blog/2024-12-04-sglang-v0-4/)).
- [2024/10] The First SGLang Online Meetup ([slides](https://github.com/sgl-project/sgl-learning-materials?tab=readme-ov-file#the-first-sglang-online-meetup)).
- [2024/09] v0.3 Release: 7x Faster DeepSeek MLA, 1.5x Faster torch.compile, Multi-Image/Video LLaVA-OneVision ([blog](https://lmsys.org/blog/2024-09-04-sglang-v0-3/)).
- [2024/07] v0.2 Release: Faster Llama3 Serving with SGLang Runtime (vs. TensorRT-LLM, vLLM) ([blog](https://lmsys.org/blog/2024-07-25-sglang-llama3/)).
- [2024/02] SGLang enables **3x faster JSON decoding** with compressed finite state machine ([blog](https://lmsys.org/blog/2024-02-05-compressed-fsm/)).
- [2024/01] SGLang provides up to **5x faster inference** with RadixAttention ([blog](https://lmsys.org/blog/2024-01-17-sglang/)).
- [2024/01] SGLang powers the serving of the official **LLaVA v1.6** release demo ([usage](https://github.com/haotian-liu/LLaVA?tab=readme-ov-file#demo)).

</details>

## About
SGLang is a high-performance serving framework for large language models and multimodal models.
It is designed to deliver low-latency and high-throughput inference across a wide range of setups, from a single GPU to large distributed clusters.
Its core features include:

- **Fast Runtime**: Provides efficient serving with RadixAttention for prefix caching, a zero-overhead CPU scheduler, prefill-decode disaggregation, speculative decoding, continuous batching, paged attention, tensor/pipeline/expert/data parallelism, structured outputs, chunked prefill, quantization (FP4/FP8/INT4/AWQ/GPTQ), and multi-LoRA batching.
- **Broad Model Support**: Supports a wide range of language models (Llama, Qwen, DeepSeek, Kimi, GLM, GPT, Gemma, Mistral, etc.), embedding models (e5-mistral, gte, mcdse), reward models (Skywork), and diffusion models (WAN, Qwen-Image), with easy extensibility for adding new models. Compatible with most Hugging Face models and OpenAI APIs.
- **Extensive Hardware Support**: Runs on NVIDIA GPUs (GB200/B300/H100/A100/Spark), AMD GPUs (MI355/MI300), Intel Xeon CPUs, Google TPUs, Ascend NPUs, and more.
- **Active Community**: SGLang is open-source and supported by a vibrant community with widespread industry adoption, powering over 400,000 GPUs worldwide.
- **RL & Post-Training Backbone**: SGLang is a proven rollout backend used worldwide, with native RL integrations and adoption by well-known post-training frameworks such as [**AReaL**](https://github.com/inclusionAI/AReaL), [**Miles**](https://github.com/radixark/miles), [**slime**](https://github.com/THUDM/slime), [**Tunix**](https://github.com/google/tunix), [**verl**](https://github.com/volcengine/verl) and more.

## Getting Started
- [Install SGLang](https://docs.sglang.io/get_started/install.html)
- [Quick Start](https://docs.sglang.io/basic_usage/send_request.html) (see the sketch below)
- [Backend Tutorial](https://docs.sglang.io/basic_usage/openai_api_completions.html)
- [Frontend Tutorial](https://docs.sglang.io/references/frontend/frontend_tutorial.html)
- [Contribution Guide](https://docs.sglang.io/developer_guide/contribution_guide.html)
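
As a taste of the quick start, the sketch below assumes you have installed SGLang and launched a local server; the model path and port are example values, and the OpenAI-compatible endpoint follows the tutorials linked above.

```python
# Sketch: query a locally launched SGLang server via its OpenAI-compatible
# API. Assumes a server started with something like:
#   python -m sglang.launch_server --model-path Qwen/Qwen2.5-7B-Instruct --port 30000
# The model path and port are example values.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[{"role": "user", "content": "List three capitals in Europe."}],
    temperature=0,
)
print(response.choices[0].message.content)
```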

## Benchmark and Performance
Learn more in the release blogs: [v0.2 blog](https://lmsys.org/blog/2024-07-25-sglang-llama3/), [v0.3 blog](https://lmsys.org/blog/2024-09-04-sglang-v0-3/), [v0.4 blog](https://lmsys.org/blog/2024-12-04-sglang-v0-4/), [Large-scale expert parallelism](https://lmsys.org/blog/2025-05-05-large-scale-ep/), [GB200 rack-scale parallelism](https://lmsys.org/blog/2025-09-25-gb200-part-2/).

## Adoption and Sponsorship
SGLang has been deployed at large scale, generating trillions of tokens in production each day. It is trusted and adopted by a wide range of leading enterprises and institutions, including xAI, AMD, NVIDIA, Intel, LinkedIn, Cursor, Oracle Cloud, Google Cloud, Microsoft Azure, AWS, Atlas Cloud, Voltage Park, Nebius, DataCrunch, Novita, InnoMatrix, MIT, UCLA, the University of Washington, Stanford, UC Berkeley, Tsinghua University, Jam & Tea Studios, Baseten, and other major technology organizations across North America and Asia.
As an open-source LLM inference engine, SGLang has become the de facto industry standard, with deployments running on over 400,000 GPUs worldwide.
SGLang is currently hosted under the non-profit open-source organization [LMSYS](https://lmsys.org/about/).

<img src="https://raw.githubusercontent.com/sgl-project/sgl-learning-materials/refs/heads/main/slides/adoption.png" alt="logo" width="800" margin="10px"></img>

## Contact Us
For enterprises interested in adopting or deploying SGLang at scale, including technical consulting, sponsorship opportunities, or partnership inquiries, please contact us at sglang@lmsys.org

## Acknowledgment
We learned the design and reused code from the following projects: [Guidance](https://github.com/guidance-ai/guidance), [vLLM](https://github.com/vllm-project/vllm), [LightLLM](https://github.com/ModelTC/lightllm), [FlashInfer](https://github.com/flashinfer-ai/flashinfer), [Outlines](https://github.com/outlines-dev/outlines), and [LMQL](https://github.com/eth-sri/lmql).
]]>
Python
<![CDATA[microsoft/qlib]]> https://github.com/microsoft/qlib https://github.com/microsoft/qlib Sat, 07 Feb 2026 00:06:57 GMT microsoft/qlib

Qlib is an AI-oriented Quant investment platform that aims to use AI tech to empower Quant Research, from exploring ideas to implementing productions. Qlib supports diverse ML modeling paradigms, including supervised learning, market dynamics modeling, and RL, and is now equipped with https://github.com/microsoft/RD-Agent to automate R&D process.

Language: Python

Stars: 36,934

Forks: 5,731

Stars today: 153 stars today

README

[![Python Versions](https://img.shields.io/pypi/pyversions/pyqlib.svg?logo=python&logoColor=white)](https://pypi.org/project/pyqlib/#files)
[![Platform](https://img.shields.io/badge/platform-linux%20%7C%20windows%20%7C%20macos-lightgrey)](https://pypi.org/project/pyqlib/#files)
[![PypI Versions](https://img.shields.io/pypi/v/pyqlib)](https://pypi.org/project/pyqlib/#history)
[![Upload Python Package](https://github.com/microsoft/qlib/workflows/Upload%20Python%20Package/badge.svg)](https://pypi.org/project/pyqlib/)
[![Github Actions Test Status](https://github.com/microsoft/qlib/workflows/Test/badge.svg?branch=main)](https://github.com/microsoft/qlib/actions)
[![Documentation Status](https://readthedocs.org/projects/qlib/badge/?version=latest)](https://qlib.readthedocs.io/en/latest/?badge=latest)
[![License](https://img.shields.io/pypi/l/pyqlib)](LICENSE)
[![Join the chat at https://gitter.im/Microsoft/qlib](https://badges.gitter.im/Microsoft/qlib.svg)](https://gitter.im/Microsoft/qlib?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)

## :newspaper: **What's NEW!** &nbsp;   :sparkling_heart: 

Recently released features

### Introducing <a href="https://github.com/microsoft/RD-Agent"><img src="docs/_static/img/rdagent_logo.png" alt="RD_Agent" style="height: 2em"></a>: LLM-Based Autonomous Evolving Agents for Industrial Data-Driven R&D

We are excited to announce the release of **RD-Agent**📢, a powerful tool that supports automated factor mining and model optimization in quant investment R&D.

RD-Agent is now available on [GitHub](https://github.com/microsoft/RD-Agent), and we welcome your star🌟!

To learn more, please visit our [♾️Demo page](https://rdagent.azurewebsites.net/). Here, you will find demo videos in both English and Chinese to help you better understand the scenario and usage of RD-Agent.

We have prepared several demo videos for you:
| Scenario | Demo video (English) | Demo video (中文) |
| --                      | ------    | ------    |
| Quant Factor Mining | [Link](https://rdagent.azurewebsites.net/factor_loop?lang=en) | [Link](https://rdagent.azurewebsites.net/factor_loop?lang=zh) |
| Quant Factor Mining from reports | [Link](https://rdagent.azurewebsites.net/report_factor?lang=en) | [Link](https://rdagent.azurewebsites.net/report_factor?lang=zh) |
| Quant Model Optimization | [Link](https://rdagent.azurewebsites.net/model_loop?lang=en) | [Link](https://rdagent.azurewebsites.net/model_loop?lang=zh) |

- 📃**Paper**: [R&D-Agent-Quant: A Multi-Agent Framework for Data-Centric Factors and Model Joint Optimization](https://arxiv.org/abs/2505.15155)
- 👾**Code**: https://github.com/microsoft/RD-Agent/
```BibTeX
@misc{li2025rdagentquant,
    title={R\&D-Agent-Quant: A Multi-Agent Framework for Data-Centric Factors and Model Joint Optimization},
    author={Yuante Li and Xu Yang and Xiao Yang and Minrui Xu and Xisen Wang and Weiqing Liu and Jiang Bian},
    year={2025},
    eprint={2505.15155},
    archivePrefix={arXiv},
    primaryClass={cs.AI}
}
```
![image](https://github.com/user-attachments/assets/3198bc10-47ba-4ee0-8a8e-46d5ce44f45d)

***

| Feature | Status |
| --                      | ------    |
| [R&D-Agent-Quant](https://arxiv.org/abs/2505.15155): apply R&D-Agent to Qlib for quant trading | 📃 Published |
| BPQP for End-to-end learning | 📈 Coming soon! ([Under review](https://github.com/microsoft/qlib/pull/1863)) |
| 🔥LLM-driven Auto Quant Factory🔥 | 🚀 Released in [♾️RD-Agent](https://github.com/microsoft/RD-Agent) on Aug 8, 2024 |
| KRNN and Sandwich models | :chart_with_upwards_trend: [Released](https://github.com/microsoft/qlib/pull/1414/) on May 26, 2023 |
| Release Qlib v0.9.0 | :octocat: [Released](https://github.com/microsoft/qlib/releases/tag/v0.9.0) on Dec 9, 2022 |
| RL Learning Framework | :hammer: :chart_with_upwards_trend: Released on Nov 10, 2022. [#1332](https://github.com/microsoft/qlib/pull/1332), [#1322](https://github.com/microsoft/qlib/pull/1322), [#1316](https://github.com/microsoft/qlib/pull/1316),[#1299](https://github.com/microsoft/qlib/pull/1299),[#1263](https://github.com/microsoft/qlib/pull/1263), [#1244](https://github.com/microsoft/qlib/pull/1244), [#1169](https://github.com/microsoft/qlib/pull/1169), [#1125](https://github.com/microsoft/qlib/pull/1125), [#1076](https://github.com/microsoft/qlib/pull/1076)|
| HIST and IGMTF models | :chart_with_upwards_trend: [Released](https://github.com/microsoft/qlib/pull/1040) on Apr 10, 2022 |
| Qlib [notebook tutorial](https://github.com/microsoft/qlib/tree/main/examples/tutorial) | 📖 [Released](https://github.com/microsoft/qlib/pull/1037) on Apr 7, 2022 | 
| Ibovespa index data | :rice: [Released](https://github.com/microsoft/qlib/pull/990) on Apr 6, 2022 |
| Point-in-Time database | :hammer: [Released](https://github.com/microsoft/qlib/pull/343) on Mar 10, 2022 |
| Arctic Provider Backend & Orderbook data example | :hammer: [Released](https://github.com/microsoft/qlib/pull/744) on Jan 17, 2022 |
| Meta-Learning-based framework & DDG-DA  | :chart_with_upwards_trend:  :hammer: [Released](https://github.com/microsoft/qlib/pull/743) on Jan 10, 2022 | 
| Planning-based portfolio optimization | :hammer: [Released](https://github.com/microsoft/qlib/pull/754) on Dec 28, 2021 | 
| Release Qlib v0.8.0 | :octocat: [Released](https://github.com/microsoft/qlib/releases/tag/v0.8.0) on Dec 8, 2021 |
| ADD model | :chart_with_upwards_trend: [Released](https://github.com/microsoft/qlib/pull/704) on Nov 22, 2021 |
| ADARNN  model | :chart_with_upwards_trend: [Released](https://github.com/microsoft/qlib/pull/689) on Nov 14, 2021 |
| TCN  model | :chart_with_upwards_trend: [Released](https://github.com/microsoft/qlib/pull/668) on Nov 4, 2021 |
| Nested Decision Framework | :hammer: [Released](https://github.com/microsoft/qlib/pull/438) on Oct 1, 2021. [Example](https://github.com/microsoft/qlib/blob/main/examples/nested_decision_execution/workflow.py) and [Doc](https://qlib.readthedocs.io/en/latest/component/highfreq.html) |
| Temporal Routing Adaptor (TRA) | :chart_with_upwards_trend: [Released](https://github.com/microsoft/qlib/pull/531) on July 30, 2021 |
| Transformer & Localformer | :chart_with_upwards_trend: [Released](https://github.com/microsoft/qlib/pull/508) on July 22, 2021 |
| Release Qlib v0.7.0 | :octocat: [Released](https://github.com/microsoft/qlib/releases/tag/v0.7.0) on July 12, 2021 |
| TCTS Model | :chart_with_upwards_trend: [Released](https://github.com/microsoft/qlib/pull/491) on July 1, 2021 |
| Online serving and automatic model rolling | :hammer:  [Released](https://github.com/microsoft/qlib/pull/290) on May 17, 2021 | 
| DoubleEnsemble Model | :chart_with_upwards_trend: [Released](https://github.com/microsoft/qlib/pull/286) on Mar 2, 2021 | 
| High-frequency data processing example | :hammer: [Released](https://github.com/microsoft/qlib/pull/257) on Feb 5, 2021  |
| High-frequency trading example | :chart_with_upwards_trend: [Part of code released](https://github.com/microsoft/qlib/pull/227) on Jan 28, 2021  | 
| High-frequency data(1min) | :rice: [Released](https://github.com/microsoft/qlib/pull/221) on Jan 27, 2021 |
| Tabnet Model | :chart_with_upwards_trend: [Released](https://github.com/microsoft/qlib/pull/205) on Jan 22, 2021 |

Features released before 2021 are not listed here.

<p align="center">
  <img src="docs/_static/img/logo/1.png" />
</p>

Qlib is an open-source, AI-oriented quantitative investment platform that aims to realize the potential of AI technologies in quantitative investment, empower research, and create value, from exploring ideas to implementing production systems. Qlib supports diverse machine learning modeling paradigms, including supervised learning, market dynamics modeling, and reinforcement learning.

An increasing number of SOTA Quant research works/papers in diverse paradigms are being released in Qlib to collaboratively solve key challenges in quantitative investment. For example, 1) using supervised learning to mine the market's complex non-linear patterns from rich and heterogeneous financial data, 2) modeling the dynamic nature of the financial market using adaptive concept drift technology, and 3) using reinforcement learning to model continuous investment decisions and assist investors in optimizing their trading strategies.

It contains the full ML pipeline of data processing, model training, and back-testing, and covers the entire chain of quantitative investment: alpha seeking, risk modeling, portfolio optimization, and order execution.
For more details, please refer to our paper ["Qlib: An AI-oriented Quantitative Investment Platform"](https://arxiv.org/abs/2009.11189).


<table>
  <tbody>
    <tr>
      <th>Frameworks, Tutorial, Data & DevOps</th>
      <th>Main Challenges & Solutions in Quant Research</th>
    </tr>
    <tr>
      <td>
        <li><a href="#plans"><strong>Plans</strong></a></li>
        <li><a href="#framework-of-qlib">Framework of Qlib</a></li>
        <li><a href="#quick-start">Quick Start</a></li>
          <ul dir="auto">
            <li type="circle"><a href="#installation">Installation</a> </li>
            <li type="circle"><a href="#data-preparation">Data Preparation</a></li>
            <li type="circle"><a href="#auto-quant-research-workflow">Auto Quant Research Workflow</a></li>
            <li type="circle"><a href="#building-customized-quant-research-workflow-by-code">Building Customized Quant Research Workflow by Code</a></li></ul>
        <li><a href="#quant-dataset-zoo"><strong>Quant Dataset Zoo</strong></a></li>
        <li><a href="#learning-framework">Learning Framework</a></li>
        <li><a href="#more-about-qlib">More About Qlib</a></li>
        <li><a href="#offline-mode-and-online-mode">Offline Mode and Online Mode</a>
        <ul>
          <li type="circle"><a href="#performance-of-qlib-data-server">Performance of Qlib Data Server</a></li></ul>
        <li><a href="#related-reports">Related Reports</a></li>
        <li><a href="#contact-us">Contact Us</a></li>
        <li><a href="#contributing">Contributing</a></li>
      </td>
      <td valign="baseline">
        <li><a href="#main-challenges--solutions-in-quant-research">Main Challenges &amp; Solutions in Quant Research</a>
          <ul>
            <li type="circle"><a href="#forecasting-finding-valuable-signalspatterns">Forecasting: Finding Valuable Signals/Patterns</a>
              <ul>
                <li type="disc"><a href="#quant-model-paper-zoo"><strong>Quant Model (Paper) Zoo</strong></a>
                  <ul>
                    <li type="circle"><a href="#run-a-single-model">Run a Single Model</a></li>
                    <li type="circle"><a href="#run-multiple-models">Run Multiple Models</a></li>
                  </ul>
                </li>
              </ul>
            </li>
          <li type="circle"><a href="#adapting-to-market-dynamics">Adapting to Market Dynamics</a></li>
          <li type="circle"><a href="#reinforcement-learning-modeling-continuous-decisions">Reinforcement Learning: modeling continuous decisions</a></li>
          </ul>
        </li>
      </td>
    </tr>
  </tbody>
</table>

# Plans
New features under development (ordered by estimated release time).
Your feedback on these features is very important.
<!-- | Feature                        | Status      | -->
<!-- | --                      | ------    | -->

# Framework of Qlib

<div style="align: center">
<img src="docs/_static/img/framework-abstract.jpg" />
</div>

The high-level framework of Qlib is shown above (users can find the [detailed framework](https://qlib.readthedocs.io/en/latest/introduction/introduction.html#framework) of Qlib's design when getting into the nitty-gritty).
The components are designed as loosely coupled modules, and each component can be used stand-alone.

Qlib provides a strong infrastructure to support Quant research. [Data](https://qlib.readthedocs.io/en/latest/component/data.html) is always an important part.
A strong learning framework is designed to support diverse learning paradigms (e.g. [reinforcement learning](https://qlib.readthedocs.io/en/latest/component/rl.html), [supervised learning](https://qlib.readthedocs.io/en/latest/component/workflow.html#model-section)) and patterns at different levels (e.g. [market dynamic modeling](https://qlib.readthedocs.io/en/latest/component/meta.html)).
By modeling the market, [trading strategies](https://qlib.readthedocs.io/en/latest/component/strategy.html) generate trade decisions that are then executed. Multiple trading strategies and executors at different levels or granularities can be [nested to be optimized and run together](https://qlib.readthedocs.io/en/latest/component/highfreq.html).
Finally, a comprehensive [analysis](https://qlib.readthedocs.io/en/latest/component/report.html) is provided, and the model can be [served online](https://qlib.readthedocs.io/en/latest/component/online.html) at low cost.


# Quick Start

This quick start guide tries to demonstrate that
1. It's very easy to build a complete Quant research workflow and try your ideas with _Qlib_.
2. Even with *public data* and *simple models*, machine learning technologies **work very well** in practical Quant investment.

Here is a quick **[demo](https://terminalizer.com/view/3f24561a4470)** showing how to install ``Qlib`` and run LightGBM with ``qrun``. **But** please make sure you have already prepared the data following the [instructions](#data-preparation).


## Installation

This table demonstrates the supported Python versions of `Qlib`:
|               | install with pip      | install from source  |        plot        |
| ------------- |:---------------------:|:--------------------:|:------------------:|
| Python 3.8    | :heavy_check_mark:    | :heavy_check_mark:   | :heavy_check_mark: |
| Python 3.9    | :heavy_check_mark:    | :heavy_check_mark:   | :heavy_check_mark: |
| Python 3.10   | :heavy_check_mark:    | :heavy_check_mark:   | :heavy_check_mark: |
| Python 3.11   | :heavy_check_mark:    | :heavy_check_mark:   | :heavy_check_mark: |
| Python 3.12   | :heavy_check_mark:    | :heavy_check_mark:   | :heavy_check_mark: |

**Note**: 
1. **Conda** is suggested for managing your Python environment. In some cases, using Python outside of a `conda` environment may result in missing header files, causing certain packages to fail to install.
2. Note that installing Cython under Python 3.6 raises errors when installing ``Qlib`` from source. If you use Python 3.6 on your machine, it is recommended to *upgrade* to Python 3.8 or higher, or use `conda`'s Python to install ``Qlib`` from source.

### Install with pip
Users can install ``Qlib`` with pip using the following command.

```bash
  pip install pyqlib
```

**Note**: pip installs the latest stable version of qlib. However, the main branch of qlib is under active development. If you want to test the latest scripts or features on the main branch, please install qlib using the method below.

### Install from source
Alternatively, users can install the latest development version of ``Qlib`` from source as follows:

* Before installing ``Qlib`` from source, users need to install some dependencies:

  ```bash
  pip install numpy
  pip install --upgrade cython
  ```

* Clone the repository and install ``Qlib`` as follows.
    ```bash
    git clone https://github.com/microsoft/qlib.git && cd qlib
    pip install .  # `pip install -e .[dev]` is recommended for development. check details in docs/developer/code_standard_and_dev_guide.rst
    ```

**Tips**: If you fail to install `Qlib` or run the examples in your environment, comparing your steps with the [CI workflow](.github/workflows/test_qlib_from_source.yml) may help you find the problem.

**Tips for Mac**: If you are using a Mac with Apple Silicon (e.g. M1), you might encounter issues building the wheel for LightGBM due to a missing OpenMP dependency. To solve the problem, install OpenMP first with ``brew install libomp`` and then run ``pip install .`` to build it successfully.

## Data Preparation
❗ Due to a stricter data security policy, the official dataset is temporarily disabled. You can try [this data source](https://github.com/chenditc/investment_data/releases) contributed by the community.
Here is an example of downloading the latest data.
```bash
wget https://github.com/chenditc/investment_data/releases/latest/download/qlib_bin.tar.gz
mkdir -p ~/.qlib/qlib_data/cn_data
tar -zxvf qlib_bin.tar.gz -C ~/.qlib/qlib_data/cn_data --strip-components=1
rm -f qlib_bin.tar.gz
```

The official dataset below will be restored in the near future.


----

Load and prepare data by running the following code:

### Get with module
  ```bash
  # get 1d data
  python -m qlib.cli.data qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn

  # get 1min data
  python -m qlib.cli.data qlib_data --target_dir ~/.qlib/qlib_data/cn_data_1min --region cn --interval 1min

  ```

### Get from source

  ```bash
  # get 1d data
  python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn

  # get 1min data
  python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data_1min --region cn --interval 1min

  ```

This dataset is built from public data collected by [crawler scripts](scripts/data_collector/), which are released in
the same repository.
Users can recreate the same dataset with them. See the [description of the dataset](https://github.com/microsoft/qlib/tree/main/scripts/data_collector#description-of-dataset).

*Please pay **ATTENTION** that the data is collected from [Yahoo Finance](https://finance.yahoo.com/lookup), so it might not be perfect.
We recommend that users prepare their own data if they have a high-quality dataset. For more information, users can refer to the [related document](https://qlib.readthedocs.io/en/latest/component/data.html#converting-csv-format-into-qlib-format)*.

### Automatic update of daily frequency data (from yahoo finance)
  > This step is *Optional* if users only want to try their models and strategies on historical data.
  > 
  > It is recommended that users update the data manually once (--trading_date 2021-05-25) and then set it to update automatically.
  >
  > **NOTE**: Users can't incrementally update data based on the offline data provided by Qlib (some fields are removed to reduce the data size). Users should use the [yahoo collector](https://github.com/microsoft/qlib/tree/main/scripts/data_collector/yahoo#automatic-update-of-daily-frequency-datafrom-yahoo-finance) to download Yahoo data from scratch and then update it incrementally.
  > 
  > For more information, please refer to: [yahoo collector](https://github.com/microsoft/qlib/tree/main/scripts/data_collector/yahoo#automatic-update-of-daily-frequency-datafrom-yahoo-finance)

  * Automatic update of data to the "qlib" directory each trading day (Linux)
      * use *crontab*: `crontab -e`
      * set up timed tasks:

        ```
        # run once at 20:00 on each trading day (Mon-Fri); adjust the time as needed
        0 20 * * 1-5 python <script path> update_data_to_bin --qlib_data_1d_dir <user data dir>
        ```
        * **script path**: *scripts/data_collector/yahoo/collector.py*

  * Manual update of data
      ```
      python scripts/data_collector/yahoo/collector.py update_data_to_bin --qlib_data_1d_dir <user data dir> --trading_date <start date> --end_date <end date>
      ```
      * *trading_date*: start of the trading-day range
      * *end_date*: end of the trading-day range (not included)

### Checking the health of the data
  * We provide a script to check the health of the data; you can run the following command to check whether the data is healthy or not.
    ```
    python scripts/check_data_health.py check_data --qlib_dir ~/.qlib/qlib_data/cn_data
    ```
  * You can also pass parameters to adjust the checks, for example:
    ```
    python scripts/check_data_health.py check_data --qlib_dir ~/.qlib/qlib_data/cn_dat
    ```

... [README content truncated due to size. Visit the repository for the complete README] ...
]]>
Python
<![CDATA[shareAI-lab/learn-claude-code]]> https://github.com/shareAI-lab/learn-claude-code https://github.com/shareAI-lab/learn-claude-code Sat, 07 Feb 2026 00:06:56 GMT shareAI-lab/learn-claude-code

Bash is all you need - write a nano Claude Code from 0 to 1

Language: Python

Stars: 16,612

Forks: 3,570

Stars today: 81 stars today

README

# Learn Claude Code - Bash is all you & agent need

[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![Tests](https://github.com/shareAI-lab/learn-claude-code/actions/workflows/test.yml/badge.svg)](https://github.com/shareAI-lab/learn-claude-code/actions)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](./LICENSE)

> **Disclaimer**: This is an independent educational project by [shareAI Lab](https://github.com/shareAI-lab). It is not affiliated with, endorsed by, or sponsored by Anthropic. "Claude Code" is a trademark of Anthropic.

**Learn how modern AI agents work by building one from scratch.**

[Chinese / 中文](./README_zh.md) | [Japanese / 日本語](./README_ja.md)

---

## Why This Repository?

We created this repository out of admiration for Claude Code - **what we believe to be the most capable AI coding agent in the world**. Initially, we attempted to reverse-engineer its design through behavioral observation and speculation. The analysis we published was riddled with inaccuracies, unfounded guesses, and technical errors. We deeply apologize to the Claude Code team and anyone who was misled by that content.

Over the past six months, through building and iterating on real agent systems, our understanding of **"what makes a true AI agent"** has been fundamentally reshaped. We'd like to share these insights with you. All previous speculative content has been removed and replaced with original educational material.

---

> Works with **[Kode CLI](https://github.com/shareAI-lab/Kode)**, **Claude Code**, **Cursor**, and any agent supporting the [Agent Skills Spec](https://agentskills.io/specification).

<img height="400" alt="demo" src="https://github.com/user-attachments/assets/0e1e31f8-064f-4908-92ce-121e2eb8d453" />

## What You'll Learn

After completing this tutorial, you will understand:

- **The Agent Loop** - The surprisingly simple pattern behind all AI coding agents
- **Tool Design** - How to give AI models the ability to interact with the real world (see the sketch after this list)
- **Explicit Planning** - Using constraints to make AI behavior predictable
- **Context Management** - Keeping agent memory clean through subagent isolation
- **Knowledge Injection** - Loading domain expertise on-demand without retraining
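
As a concrete example of tool design: in the Anthropic Messages API, a tool is declared as a JSON schema the model can call. The sketch below shows a single bash tool; the schema shape is the standard tool format, while the naming and description text are illustrative:

```python
# A single bash tool in the Anthropic Messages API tool format.
# Field names follow the documented schema; descriptions are illustrative.
BASH_TOOL = {
    "name": "bash",
    "description": "Run a shell command and return its combined stdout and stderr.",
    "input_schema": {
        "type": "object",
        "properties": {
            "command": {"type": "string", "description": "The command to run."},
        },
        "required": ["command"],
    },
}
```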

## Learning Path

```
Start Here
    |
    v
[v0: Bash Agent] -----> "One tool is enough"
    |                    16-50 lines
    v
[v1: Basic Agent] ----> "The complete agent pattern"
    |                    4 tools, ~200 lines
    v
[v2: Todo Agent] -----> "Make plans explicit"
    |                    +TodoManager, ~300 lines
    v
[v3: Subagent] -------> "Divide and conquer"
    |                    +Task tool, ~450 lines
    v
[v4: Skills Agent] ---> "Domain expertise on-demand"
                         +Skill tool, ~550 lines
```

**Recommended approach:**
1. Read and run v0 first - understand the core loop
2. Compare v0 and v1 - see how tools evolve
3. Study v2 for planning patterns
4. Explore v3 for complex task decomposition
5. Master v4 for building extensible agents

## Quick Start

```bash
# Clone the repository
git clone https://github.com/shareAI-lab/learn-claude-code
cd learn-claude-code

# Install dependencies
pip install -r requirements.txt

# Configure API key
cp .env.example .env
# Edit .env with your ANTHROPIC_API_KEY

# Run any version
python v0_bash_agent.py      # Minimal (start here!)
python v1_basic_agent.py     # Core agent loop
python v2_todo_agent.py      # + Todo planning
python v3_subagent.py        # + Subagents
python v4_skills_agent.py    # + Skills
```

## The Core Pattern

Every coding agent is just this loop:

```python
def run_agent(model, messages, tools):
    while True:
        response = model(messages, tools)        # ask the model for its next step
        if response.stop_reason != "tool_use":   # no tool call: the task is done
            return response.text
        results = execute(response.tool_calls)   # run the requested tools
        messages.append(results)                 # feed the results back to the model
```

That's it. The model calls tools until done. Everything else is refinement.
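
For a concrete picture of the `execute` step, here is one way it could run bash tool calls with `subprocess`. This is a sketch only, assuming tool calls shaped like Anthropic `tool_use` blocks:

```python
import subprocess

def execute(tool_calls):
    """Run each requested bash command and package the output as tool results."""
    results = []
    for call in tool_calls:  # each call: {"id": ..., "name": "bash", "input": {"command": ...}}
        proc = subprocess.run(
            call["input"]["command"],
            shell=True, capture_output=True, text=True, timeout=120,
        )
        results.append({
            "type": "tool_result",
            "tool_use_id": call["id"],
            "content": proc.stdout + proc.stderr,
        })
    # Tool results go back to the model as a single user-role message.
    return {"role": "user", "content": results}
```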

## Version Comparison

| Version | Lines | Tools | Core Addition | Key Insight |
|---------|-------|-------|---------------|-------------|
| [v0](./v0_bash_agent.py) | ~50 | bash | Recursive subagents | One tool is enough |
| [v1](./v1_basic_agent.py) | ~200 | bash, read, write, edit | Core loop | Model as Agent |
| [v2](./v2_todo_agent.py) | ~300 | +TodoWrite | Explicit planning | Constraints enable complexity |
| [v3](./v3_subagent.py) | ~450 | +Task | Context isolation | Clean context = better results |
| [v4](./v4_skills_agent.py) | ~550 | +Skill | Knowledge loading | Expertise without retraining |
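
To make the v3 row concrete: a subagent is just the same loop started with a fresh message list, so the parent's history only ever receives the final answer. A minimal sketch, with illustrative names rather than the tutorial's exact code:

```python
def run_subagent(model, tools, task: str) -> str:
    # Fresh, isolated history: the subagent never sees the parent's messages,
    # and the parent only receives this single result string back.
    sub_messages = [{"role": "user", "content": task}]
    return run_agent(model, sub_messages, tools)
```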

## File Structure

```
learn-claude-code/
├── v0_bash_agent.py       # ~50 lines: 1 tool, recursive subagents
├── v0_bash_agent_mini.py  # ~16 lines: extreme compression
├── v1_basic_agent.py      # ~200 lines: 4 tools, core loop
├── v2_todo_agent.py       # ~300 lines: + TodoManager
├── v3_subagent.py         # ~450 lines: + Task tool, agent registry
├── v4_skills_agent.py     # ~550 lines: + Skill tool, SkillLoader
├── skills/                # Example skills (pdf, code-review, mcp-builder, agent-builder)
├── docs/                  # Technical documentation (EN + ZH + JA)
├── articles/              # Blog-style articles (ZH)
└── tests/                 # Unit and integration tests
```

## Documentation

### Technical Tutorials (docs/)

- [v0: Bash is All You Need](./docs/v0-bash-is-all-you-need.md)
- [v1: Model as Agent](./docs/v1-model-as-agent.md)
- [v2: Structured Planning](./docs/v2-structured-planning.md)
- [v3: Subagent Mechanism](./docs/v3-subagent-mechanism.md)
- [v4: Skills Mechanism](./docs/v4-skills-mechanism.md)

### Articles

See [articles/](./articles/) for blog-style explanations.

## Using the Skills System

### Example Skills Included

| Skill | Purpose |
|-------|---------|
| [agent-builder](./skills/agent-builder/) | Meta-skill: how to build agents |
| [code-review](./skills/code-review/) | Systematic code review methodology |
| [pdf](./skills/pdf/) | PDF manipulation patterns |
| [mcp-builder](./skills/mcp-builder/) | MCP server development |

### Scaffold a New Agent

```bash
# Use the agent-builder skill to create a new project
python skills/agent-builder/scripts/init_agent.py my-agent

# Specify complexity level
python skills/agent-builder/scripts/init_agent.py my-agent --level 0  # Minimal
python skills/agent-builder/scripts/init_agent.py my-agent --level 1  # 4 tools
```

### Install Skills for Production

```bash
# Kode CLI (recommended)
kode plugins install https://github.com/shareAI-lab/shareAI-skills

# Claude Code
claude plugins install https://github.com/shareAI-lab/shareAI-skills
```

## Configuration

```bash
# .env file options
ANTHROPIC_API_KEY=sk-ant-xxx      # Required: Your API key
ANTHROPIC_BASE_URL=https://...    # Optional: For API proxies
MODEL_ID=claude-sonnet-4-5-20250929  # Optional: Model selection
```
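
If you wire these options into your own script, one common approach is loading them with `python-dotenv`. This is a sketch under that assumption; the repository's own loading code may differ:

```python
import os

from dotenv import load_dotenv  # assumes the python-dotenv package

load_dotenv()  # read key=value pairs from .env into the environment
api_key = os.environ["ANTHROPIC_API_KEY"]                       # required
base_url = os.getenv("ANTHROPIC_BASE_URL")                      # optional proxy
model_id = os.getenv("MODEL_ID", "claude-sonnet-4-5-20250929")  # optional override
```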

## Related Projects

| Repository | Description |
|------------|-------------|
| [Kode](https://github.com/shareAI-lab/Kode) | Production-ready open source agent CLI |
| [shareAI-skills](https://github.com/shareAI-lab/shareAI-skills) | Production skills collection |
| [Agent Skills Spec](https://agentskills.io/specification) | Official specification |

## Philosophy

> **The model is 80%. Code is 20%.**

Modern agents like Kode and Claude Code work not because of clever engineering, but because the model is trained to be an agent. Our job is to give it tools and stay out of the way.

## Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

- Add new example skills in `skills/`
- Improve documentation in `docs/`
- Report bugs or suggest features via [Issues](https://github.com/shareAI-lab/learn-claude-code/issues)

## License

MIT

---

**Model as Agent. That's the whole secret.**

[@baicai003](https://x.com/baicai003) | [shareAI Lab](https://github.com/shareAI-lab)
]]>
Python
<![CDATA[SWE-agent/mini-swe-agent]]> https://github.com/SWE-agent/mini-swe-agent https://github.com/SWE-agent/mini-swe-agent Sat, 07 Feb 2026 00:06:55 GMT SWE-agent/mini-swe-agent

The 100 line AI agent that solves GitHub issues or helps you in your command line. Radically simple, no huge configs, no giant monorepo—but scores >74% on SWE-bench verified!

Language: Python

Stars: 2,743

Forks: 364

Stars today: 9 stars today

README

<div align="center">
<a href="https://mini-swe-agent.com/latest/"><img src="https://github.com/SWE-agent/mini-swe-agent/raw/main/docs/assets/mini-swe-agent-banner.svg" alt="mini-swe-agent banner" style="height: 7em"/></a>
</div>

# The minimal AI software engineering agent

📣 [New tutorial on building minimal AI agents](https://minimal-agent.com/)<br/>
📣 [Gemini 3 Pro reaches 74% on SWE-bench verified with mini-swe-agent!](https://x.com/KLieret/status/1991164693839270372)<br/>
📣 [New blogpost: Randomly switching between GPT-5 and Sonnet 4 boosts performance](https://www.swebench.com/SWE-bench/blog/2025/08/19/mini-roulette/)

[![Docs](https://img.shields.io/badge/Docs-green?style=for-the-badge&logo=materialformkdocs&logoColor=white)](https://mini-swe-agent.com/latest/)
[![Slack](https://img.shields.io/badge/Slack-4A154B?style=for-the-badge&logo=slack&logoColor=white)](https://join.slack.com/t/swe-bench/shared_invite/zt-36pj9bu5s-o3_yXPZbaH2wVnxnss1EkQ)
[![PyPI - Version](https://img.shields.io/pypi/v/mini-swe-agent?style=for-the-badge&logo=python&logoColor=white&labelColor=black&color=deeppink)](https://pypi.org/project/mini-swe-agent/)

> [!WARNING]
> This is **mini-swe-agent v2**. Read the [migration guide](https://mini-swe-agent.com/latest/advanced/v2_migration/). For the previous version, check out the [v1 branch](https://github.com/SWE-agent/mini-swe-agent/tree/v1).

In 2024, we built [SWE-bench](https://github.com/swe-bench/SWE-bench) & [SWE-agent](https://github.com/swe-agent/swe-agent) and helped kickstart the coding agent revolution.

We now ask: **What if our agent was 100x smaller, and still worked nearly as well?**

The `mini` agent is for

- **Researchers** who want to **[benchmark](https://swe-bench.com), [fine-tune](https://swesmith.com/), or run RL** without assumptions, bloat, or surprises
- **Developers** who like to **own, understand, and modify** their tools
- **Engineers** who want something **trivial to sandbox & to deploy anywhere**

Here are some details:

- **Minimal**: Just about 100 lines of Python for the [agent class](https://github.com/SWE-agent/mini-swe-agent/blob/main/src/minisweagent/agents/default.py) (and a bit more for the [environment](https://github.com/SWE-agent/mini-swe-agent/blob/main/src/minisweagent/environments/local.py),
[model](https://github.com/SWE-agent/mini-swe-agent/blob/main/src/minisweagent/models/litellm_model.py), and [run script](https://github.com/SWE-agent/mini-swe-agent/blob/main/src/minisweagent/run/hello_world.py)) — no fancy dependencies!
- **Performant:** Scores >74% on the [SWE-bench verified](https://www.swebench.com/) benchmark; starts much faster than Claude Code
- **Deployable:** In addition to local envs, you can use **docker**, **podman**, **singularity**, **apptainer**, and more
- Built by the Princeton & Stanford team behind [SWE-bench](https://swebench.com), [SWE-agent](https://swe-agent.com), and more (see below)
- **Widely adopted:** In use by Meta, NVIDIA, Essential AI, Anyscale, and others
- **Tested:** [![Codecov](https://img.shields.io/codecov/c/github/swe-agent/mini-swe-agent?style=flat-square)](https://codecov.io/gh/SWE-agent/mini-swe-agent)

<details>

<summary>More motivation (for research)</summary>

[SWE-agent](https://swe-agent.com/latest/) jump-started the development of AI agents in 2024. Back then, we placed a lot of emphasis on tools and special interfaces for the agent.
However, one year later, as LMs have become more capable, a lot of this is not needed at all to build a useful agent!
In fact, the `mini` agent

- **Does not have any tools other than bash** — it doesn't even need to use the tool-calling interface of the LMs.
  This means that you can run it with literally any model. When running in sandboxed environments you also don't need to take care
  of installing a single package — all it needs is bash.
- **Has a completely linear history** — every step of the agent just appends to the messages and that's it.
  So there's no difference between the trajectory and the messages that you pass on to the LM.
  Great for debugging & fine-tuning.
- **Executes actions with `subprocess.run`** — every action is completely independent (as opposed to keeping a stateful shell session running).
  This makes it trivial to execute the actions in sandboxes (literally just switch out `subprocess.run` with `docker exec`) and to
  scale up effortlessly. Seriously, this is [a big deal](https://mini-swe-agent.com/latest/faq/#why-no-shell-session), trust me.
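
A minimal sketch of that swap (class and method names are illustrative, not the package's actual API):

```python
import subprocess

class LocalEnvironment:
    def execute(self, command: str) -> str:
        # Every action is an independent process; no stateful shell session.
        proc = subprocess.run(["bash", "-c", command], capture_output=True, text=True)
        return proc.stdout + proc.stderr

class DockerEnvironment:
    def __init__(self, container: str):
        self.container = container

    def execute(self, command: str) -> str:
        # Identical interface; the command simply runs inside the container.
        proc = subprocess.run(
            ["docker", "exec", self.container, "bash", "-c", command],
            capture_output=True, text=True,
        )
        return proc.stdout + proc.stderr
```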

This makes it perfect as a baseline system, and as a system that puts the language model (rather than
the agent scaffold) at the center of attention.
You can see the result on the [SWE-bench (bash only)](https://www.swebench.com/) leaderboard, which evaluates the performance of different LMs with `mini`.

</details>

<details>
<summary>More motivation (as a tool)</summary>

Some agents are overfitted research artifacts. Others are UI-heavy frontend monsters.

The `mini` agent wants to be a hackable tool, not a black box.

- **Simple** enough to understand at a glance
- **Convenient** enough to use in daily workflows
- **Flexible** enough to extend

Unlike other agents (including our own [swe-agent](https://swe-agent.com/latest/)), it is radically simpler, because it:

- **Does not have any tools other than bash** — it doesn't even need to use the tool-calling interface of the LMs.
  Instead of implementing custom tools for every specific thing the agent might want to do, the focus is fully on the LM utilizing the shell to its full potential.
  Want it to do something specific like opening a PR?
  Just tell the LM to figure it out rather than spending time to implement it in the agent.
- **Executes actions with `subprocess.run`** — every action is completely independent (as opposed to keeping a stateful shell session running).
  This is [a big deal](https://mini-swe-agent.com/latest/faq/#why-no-shell-session) for the stability of the agent, trust me.
- **Has a completely linear history** — every step of the agent just appends to the messages that are passed to the LM in the next step and that's it.
  This is great for debugging and understanding what the LM is prompted with.

</details>

<details>
<summary>Should I use SWE-agent or mini-SWE-agent?</summary>

You should use `mini-swe-agent` if

- You want a quick command line tool that works locally
- You want an agent with a very simple control flow
- You want even faster, simpler & more stable sandboxing & benchmark evaluations
- You are doing FT or RL and don't want to overfit to a specific agent scaffold

You should use `swe-agent` if

- You need specific tools or want to experiment with different tools
- You want to experiment with different history processors
- You want very powerful yaml configuration without touching code

What you get with both

- Excellent performance on SWE-Bench
- A trajectory browser

</details>

<table>
<tr>
<td width="50%">
<a href="https://mini-swe-agent.com/latest/usage/mini/"><strong>CLI</strong></a> (<code>mini</code>)
</td>
<td>
<a href="https://mini-swe-agent.com/latest/usage/swebench/"><strong>Batch inference</strong></a>
</td>
</tr>
<tr>
<td width="50%">

![mini](https://github.com/SWE-agent/swe-agent-media/blob/main/media/mini/gif/mini.gif?raw=true)

</td>
<td>

![swebench](https://github.com/SWE-agent/swe-agent-media/blob/main/media/mini/gif/swebench.gif?raw=true)

</td>
</tr>
<tr>
<td>
<a href="https://mini-swe-agent.com/latest/usage/inspector/"><strong>Trajectory browser</strong></a>
</td>
<td>
<a href="https://mini-swe-agent.com/latest/advanced/cookbook/"><strong>Python bindings</strong></a>
</td>
</tr>
<tr>
<td>

![inspector](https://github.com/SWE-agent/swe-agent-media/blob/main/media/mini/gif/inspector.gif?raw=true)

</td>
<td>

```python
# Import paths inferred from the repo layout linked above.
from minisweagent.agents.default import DefaultAgent
from minisweagent.environments.local import LocalEnvironment
from minisweagent.models.litellm_model import LitellmModel

agent = DefaultAgent(
    LitellmModel(model_name=...),
    LocalEnvironment(),
)
agent.run("Write a sudoku game")
```

</td>
</tr>
</table>

## Let's get started!

**Option 1:** If you just want to try out the CLI (the package is installed in a temporary, isolated virtual environment)

```bash
pip install uv && uvx mini-swe-agent
# or
pip install pipx && pipx ensurepath && pipx run mini-swe-agent
```

**Option 2:** Install CLI & python bindings in current environment

```bash
pip install mini-swe-agent
mini  # run the CLI
```

**Option 3:** Install from source (developer setup)

```bash
git clone https://github.com/SWE-agent/mini-swe-agent.git
cd mini-swe-agent && pip install -e .
mini  # run the CLI
```

Read more in our [documentation](https://mini-swe-agent.com/latest/):

* [Quick start guide](https://mini-swe-agent.com/latest/quickstart/)
* [Using the `mini` CLI](https://mini-swe-agent.com/latest/usage/mini/)
* [Global configuration](https://mini-swe-agent.com/latest/advanced/global_configuration/)
* [Yaml configuration files](https://mini-swe-agent.com/latest/advanced/yaml_configuration/)
* [Power up with the cookbook](https://mini-swe-agent.com/latest/advanced/cookbook/)
* [FAQ](https://mini-swe-agent.com/latest/faq/)
* [Contribute!](https://mini-swe-agent.com/latest/contributing/)

## Attribution

If you found this work helpful, please consider citing the [SWE-agent paper](https://arxiv.org/abs/2405.15793) in your work:

```bibtex
@inproceedings{yang2024sweagent,
  title={{SWE}-agent: Agent-Computer Interfaces Enable Automated Software Engineering},
  author={John Yang and Carlos E Jimenez and Alexander Wettig and Kilian Lieret and Shunyu Yao and Karthik R Narasimhan and Ofir Press},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024},
  url={https://arxiv.org/abs/2405.15793}
}
```

Our other projects:

<div align="center">
  <a href="https://github.com/SWE-agent/SWE-agent"><img src="https://raw.githubusercontent.com/SWE-agent/swe-agent-media/refs/heads/main/media/logos_banners/sweagent_logo_text_below.svg" alt="SWE-agent" height="120px"></a>
   &nbsp;&nbsp;
  <a href="https://github.com/SWE-agent/SWE-ReX"><img src="https://raw.githubusercontent.com/SWE-agent/swe-agent-media/refs/heads/main/media/logos_banners/swerex_logo_text_below.svg" alt="SWE-ReX" height="120px"></a>
   &nbsp;&nbsp;
  <a href="https://github.com/SWE-bench/SWE-bench"><img src="https://raw.githubusercontent.com/SWE-agent/swe-agent-media/refs/heads/main/media/logos_banners/swebench_logo_text_below.svg" alt="SWE-bench" height="120px"></a>
  &nbsp;&nbsp;
  <a href="https://github.com/SWE-bench/SWE-smith"><img src="https://raw.githubusercontent.com/SWE-agent/swe-agent-media/refs/heads/main/media/logos_banners/swesmith_logo_text_below.svg" alt="SWE-smith" height="120px"></a>
  &nbsp;&nbsp;
  <a href="https://github.com/codeclash-ai/codeclash"><img src="https://raw.githubusercontent.com/SWE-agent/swe-agent-media/refs/heads/main/media/logos_banners/codeclash_logo_text_below.svg" alt="CodeClash" height="120px"></a>
  &nbsp;&nbsp;
  <a href="https://github.com/SWE-bench/sb-cli"><img src="https://raw.githubusercontent.com/SWE-agent/swe-agent-media/refs/heads/main/media/logos_banners/sbcli_logo_text_below.svg" alt="sb-cli" height="120px"></a>
</div>
]]>
Python
<![CDATA[httpie/cli]]> https://github.com/httpie/cli https://github.com/httpie/cli Sat, 07 Feb 2026 00:06:54 GMT httpie/cli

🥧 HTTPie CLI — modern, user-friendly command-line HTTP client for the API era. JSON support, colors, sessions, downloads, plugins & more.

Language: Python

Stars: 37,492

Forks: 3,807

Stars today: 13 stars today

README

<h2 align="center">
    <a href="https://httpie.io" target="blank_">
        <img height="100" alt="HTTPie" src="https://raw.githubusercontent.com/httpie/cli/master/docs/httpie-logo.svg" />
    </a>
    <br>
    HTTPie CLI: human-friendly HTTP client for the API era
</h2>

<div align="center">

[![HTTPie for Desktop](https://img.shields.io/static/v1?label=HTTPie&message=Desktop&color=4B78E6)](https://httpie.io/product)
[![](https://img.shields.io/static/v1?label=HTTPie&message=Web%20%26%20Mobile&color=73DC8C)](https://httpie.io/app)
[![](https://img.shields.io/static/v1?label=HTTPie&message=CLI&color=FA9BFA)](https://httpie.io/cli)
[![Twitter](https://img.shields.io/twitter/follow/httpie?style=flat&color=%234B78E6&logoColor=%234B78E6)](https://twitter.com/httpie)
[![Chat](https://img.shields.io/discord/725351238698270761?style=flat&label=Chat%20on%20Discord&color=%23FA9BFA)](https://httpie.io/discord)

</div>


<div align="center">

[![Docs](https://img.shields.io/badge/stable%20docs-httpie.io%2Fdocs%2Fcli-brightgreen?style=flat&color=%2373DC8C&label=Docs)](https://httpie.org/docs/cli)
[![Latest version](https://img.shields.io/pypi/v/httpie.svg?style=flat&label=Latest&color=%234B78E6&logo=&logoColor=white)](https://pypi.python.org/pypi/httpie)
[![Build](https://img.shields.io/github/actions/workflow/status/httpie/cli/tests.yml?branch=master&color=%23FA9BFA&label=Build)](https://github.com/httpie/cli/actions)
[![Coverage](https://img.shields.io/codecov/c/github/httpie/cli?style=flat&label=Coverage&color=%2373DC8C)](https://codecov.io/gh/httpie/cli)
[![PyPi downloads](https://img.shields.io/pepy/dt/httpie?style=flat&label=Downloads%20from%20PyPi%20only&color=4B78E6)](https://www.pepy.tech/projects/httpie)

</div>

HTTPie (pronounced _aitch-tee-tee-pie_) is a command-line HTTP client.
Its goal is to make CLI interaction with web services as human-friendly as possible.
HTTPie is designed for testing, debugging, and generally interacting with APIs & HTTP servers.
The `http` & `https` commands allow for creating and sending arbitrary HTTP requests.
They use simple and natural syntax and provide formatted and colorized output.

<div align="center">

<img src="https://raw.githubusercontent.com/httpie/cli/master/docs/httpie-animation.gif" alt="HTTPie in action" width="100%"/>


</div>




## We lost 54k GitHub stars

Please note we recently accidentally made this repo private for a moment, and GitHub deleted our community that took a decade to build. Read the full story here: https://httpie.io/blog/stardust

![](docs/stardust.png)


## Getting started

- [Installation instructions →](https://httpie.io/docs#installation)
- [Full documentation →](https://httpie.io/docs)

## Features

- Expressive and intuitive syntax
- Formatted and colorized terminal output
- Built-in JSON support
- Forms and file uploads
- HTTPS, proxies, and authentication
- Arbitrary request data
- Custom headers
- Persistent sessions
- `wget`-like downloads

[See all features →](https://httpie.io/docs)

## Examples

Hello World:

```bash
https httpie.io/hello
```

Custom [HTTP method](https://httpie.io/docs#http-method), [HTTP headers](https://httpie.io/docs#http-headers) and [JSON](https://httpie.io/docs#json) data:

```bash
http PUT pie.dev/put X-API-Token:123 name=John
```

Build and print a request without sending it using [offline mode](https://httpie.io/docs/cli/offline-mode):

```bash
http --offline pie.dev/post hello=offline
```

Use [GitHub API](https://developer.github.com/v3/issues/comments/#create-a-comment) to post a comment on an [Issue](https://github.com/httpie/cli/issues/83) with [authentication](https://httpie.io/docs#authentication):

```bash
http -a USERNAME POST https://api.github.com/repos/httpie/cli/issues/83/comments body='HTTPie is awesome! :heart:'
```

[See more examples →](https://httpie.io/docs#examples)

## Community & support

- Visit the [HTTPie website](https://httpie.io) for full documentation and useful links.
- Join our [Discord server](https://httpie.io/discord) to ask questions, discuss features, and chat about APIs in general.
- Tweet at [@httpie](https://twitter.com/httpie) on Twitter.
- Ask questions on [StackOverflow](https://stackoverflow.com/questions/tagged/httpie) using the `httpie` tag.
- Create [GitHub Issues](https://github.com/httpie/cli/issues) for bug reports and feature requests.
- Subscribe to the [HTTPie newsletter](https://httpie.io) for occasional updates.

## Contributing

Have a look through existing [Issues](https://github.com/httpie/cli/issues) and [Pull Requests](https://github.com/httpie/cli/pulls) that you could help with. If you'd like to request a feature or report a bug, please [create a GitHub Issue](https://github.com/httpie/cli/issues) using one of the templates provided.

[See contribution guide →](https://github.com/httpie/cli/blob/master/CONTRIBUTING.md)
]]>
Python