# Agent Runtime Intelligence Layer
**Cost Savings:** 69% (MT-Bench), 93% (GSM8K), 52% (MMLU), and 80% (TruthfulQA), while retaining 96% of GPT-5 quality.
**[Python](https://docs.cascadeflow.ai/api-reference/python/overview) • [TypeScript](https://docs.cascadeflow.ai/api-reference/typescript/overview) • [LangChain](https://docs.cascadeflow.ai/integrations/langchain) • [OpenAI Agents](https://docs.cascadeflow.ai/integrations/openai-agents) • [CrewAI](https://docs.cascadeflow.ai/integrations/crewai) • [PydanticAI](https://docs.cascadeflow.ai/integrations/pydantic-ai) • [Google ADK](https://docs.cascadeflow.ai/integrations/google-adk) • [n8n](https://docs.cascadeflow.ai/integrations/n8n) • [Vercel AI](https://docs.cascadeflow.ai/integrations/vercel-ai) • [OpenClaw](https://docs.cascadeflow.ai/integrations/openclaw) • [Hermes Agent](https://docs.cascadeflow.ai/integrations/hermes-agent) • [📖 Docs](https://docs.cascadeflow.ai) • [💡 Examples](#examples)**
---
**The in-process intelligence layer for AI agents.** Optimize cost, latency, quality, budget, compliance, and energy — inside the execution loop, not at the HTTP boundary.
cascadeflow works where external proxies can't: per-step model decisions based on agent state, per-tool-call budget gating, runtime stop/continue/escalate actions, and business KPI injection during agent loops. It accumulates insight from every model call, tool result, and quality score — the agent gets smarter the more it runs. Sub-5ms overhead. Works with LangChain, OpenAI Agents SDK, CrewAI, PydanticAI, Google ADK, n8n, Vercel AI SDK, and Hermes Agent.
> **Update**
> ### Hermes Agent delegation cascading
>
> cascadeflow now provides a Hermes Agent integration that supports per-skill model cascading, task-complexity cascading, topic-aware subagent cascading, observe-mode rollout, and auditable decisions. It does this without taking over provider credentials, base URLs, fallback chains, or API modes.
```bash
pip install cascadeflow
```
```bash
npm install @cascadeflow/core
```
---
## Why cascadeflow?
### Proxy vs In-Process Harness
| Dimension | External Proxy | cascadeflow Harness |
|---|---|---|
| **Scope** | HTTP request boundary | Inside agent execution loop |
| **Dimensions** | Cost only | Cost + quality + latency + budget + compliance + energy |
| **Latency overhead** | 10-50ms network RTT | <5ms in-process |
| **Business logic** | None | KPI weights and targets |
| **Enforcement** | None (observe only) | stop, deny_tool, switch_model |
| **Auditability** | Request logs | Per-step decision traces |
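
To make the enforcement and auditability rows concrete, here is a minimal, library-independent sketch of an in-process budget gate. It is not the cascadeflow API: `BudgetGate`, `Action`, `Decision`, the `downgrade_at` threshold, and the decision rules are all hypothetical names and choices made for illustration. Only the action names `stop`, `deny_tool`, and `switch_model` come from the table above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    CONTINUE = "continue"
    SWITCH_MODEL = "switch_model"
    DENY_TOOL = "deny_tool"
    STOP = "stop"


@dataclass
class Decision:
    """One entry in the per-step decision trace."""
    step: int
    action: Action
    reason: str


@dataclass
class BudgetGate:
    """Hypothetical gate that is consulted before every agent step."""
    budget_usd: float
    downgrade_at: float = 0.8  # assumed: fraction of budget that triggers a cheaper model
    spent_usd: float = 0.0
    trace: list[Decision] = field(default_factory=list)

    def check(self, step: int, est_cost_usd: float, is_tool_call: bool = False) -> Action:
        projected = self.spent_usd + est_cost_usd
        if self.spent_usd >= self.budget_usd:
            action, reason = Action.STOP, "budget exhausted"
        elif projected > self.budget_usd:
            # The step itself would overshoot: refuse the tool call, or downgrade the model.
            action = Action.DENY_TOOL if is_tool_call else Action.SWITCH_MODEL
            reason = "step would exceed budget"
        elif projected > self.downgrade_at * self.budget_usd:
            action, reason = Action.SWITCH_MODEL, "approaching budget"
        else:
            action, reason = Action.CONTINUE, "within budget"
        self.trace.append(Decision(step, action, reason))
        return action

    def record(self, cost_usd: float) -> None:
        """Add the actual cost of a completed step to the running total."""
        self.spent_usd += cost_usd


gate = BudgetGate(budget_usd=1.00)
print(gate.check(step=1, est_cost_usd=0.30))                     # Action.CONTINUE
gate.record(0.30)
print(gate.check(step=2, est_cost_usd=0.60))                     # Action.SWITCH_MODEL
gate.record(0.60)
print(gate.check(step=3, est_cost_usd=0.20, is_tool_call=True))  # Action.DENY_TOOL
```

Because the gate runs inside the agent loop, it can see whether the next step is a tool call and how much has already been spent, which a proxy at the HTTP boundary typically cannot. The `trace` list stands in for the per-step decision traces mentioned in the table.
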
cascadeflow is a **library** and **agent harness**: a model-cascading package that uses speculative execution to select a model dynamically for each query or tool call. It builds on research indicating that 40-70% of queries don't require slow, expensive flagship models, and that domain-specific smaller models often outperform large general-purpose models on specialized tasks. Queries that do need advanced reasoning are escalated automatically to a flagship model.
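
The cascading idea described above can be sketched in a few lines. This is a generic illustration under stated assumptions, not cascadeflow's implementation or API: `Model`, `CascadeResult`, `cascade`, the stub models, the toy `non_empty` scorer, and the per-call costs are all hypothetical. The sketch assumes that a cheap model drafts an answer first and that the flagship model is called only when the draft fails a quality check.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Model:
    name: str
    cost_per_call: float            # illustrative flat cost, not real pricing
    generate: Callable[[str], str]


@dataclass
class CascadeResult:
    text: str
    model: str
    cost: float
    escalated: bool


def cascade(
    query: str,
    drafter: Model,
    verifier: Model,
    score: Callable[[str, str], float],
    threshold: float = 0.7,
) -> CascadeResult:
    """Try the cheap drafter first; escalate only if its answer scores too low."""
    draft = drafter.generate(query)
    cost = drafter.cost_per_call
    if score(query, draft) >= threshold:
        return CascadeResult(draft, drafter.name, cost, escalated=False)
    # Draft rejected: pay for the flagship call on top of the draft.
    final = verifier.generate(query)
    return CascadeResult(final, verifier.name, cost + verifier.cost_per_call, escalated=True)


# Stub models: the small one can only answer one trivial question.
small = Model("small", 0.001, lambda q: "4" if q == "What is 2 + 2?" else "")
large = Model("large", 0.030, lambda q: f"[detailed answer to: {q}]")


def non_empty(query: str, draft: str) -> float:
    """Toy quality check: any non-empty draft passes."""
    return 1.0 if draft.strip() else 0.0


easy = cascade("What is 2 + 2?", small, large, non_empty)
hard = cascade("Prove that sqrt(2) is irrational.", small, large, non_empty)
print(easy.model, easy.escalated)  # small False
print(hard.model, hard.escalated)  # large True
```

The trade-off is visible in the costs: an accepted draft costs only the small model's call, while an escalated query pays for both calls. The approach saves money only when most drafts are accepted, which is what the 40-70% figure above suggests.
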