# Circuit Breaker

When an upstream provider (e.g. Stripe, Twilio) suffers an extended outage, repeatedly firing requests at it slows down your application, wastes resources, and delays error responses.

Meridian builds a **Circuit Breaker** directly into the `RequestPipeline` for every provider.

---

## How It Works

The circuit breaker has three states:
1. **Closed**: Normal state. All requests are sent to the provider, and failures are counted.
2. **Open**: When either trigger fires (the consecutive failure count reaches `failureThreshold`, or the error rate inside the rolling window reaches `errorThresholdPercentage`), the circuit opens. All subsequent requests fail fast, throwing a `MeridianError` with category `"provider"` and message `"Circuit breaker is open"`, avoiding network calls entirely.
3. **Half-Open**: After the cooldown (`timeout`) expires, the circuit enters half-open mode and requests go through as trials. Once `successThreshold` trials succeed, the circuit closes. If any trial fails, it re-opens.

```
                Threshold Crossed
  +----------+ -------------------> +----------+
  |  CLOSED  |                      |   OPEN   |<----------+
  +----------+                      +----------+           |
       ^                                 |                 |
       | (Trial Succeeds)                | (Cooldown       | (Trial Fails)
       |                                 |  Expired)       |
       |                                 v                 |
       |                          +-------------+          |
       +--------------------------|  HALF-OPEN  |----------+
                                  +-------------+
```
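
The transitions in the diagram can be sketched as a small class. This is a minimal illustration rather than Meridian's actual `ProviderCircuitBreaker`: the name `SketchBreaker`, its method names, and the explicit `now` argument are assumptions made so the example runs on its own, and it models only the consecutive-failure trigger.

```typescript
type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";

// Hypothetical, simplified breaker (not Meridian's implementation).
class SketchBreaker {
  private state: BreakerState = "CLOSED";
  private failures = 0;
  private successes = 0;
  private nextAttempt = 0;
  private readonly failureThreshold: number;
  private readonly successThreshold: number;
  private readonly timeoutMs: number;

  constructor(failureThreshold: number, successThreshold: number, timeoutMs: number) {
    this.failureThreshold = failureThreshold;
    this.successThreshold = successThreshold;
    this.timeoutMs = timeoutMs;
  }

  getState(): BreakerState {
    return this.state;
  }

  // Returns true if a request may be sent at time `now` (milliseconds).
  allowRequest(now: number): boolean {
    if (this.state !== "OPEN") return true;
    if (now < this.nextAttempt) return false; // still cooling down: fail fast
    this.state = "HALF_OPEN"; // cooldown expired: let trials through
    this.successes = 0;
    return true;
  }

  recordSuccess(): void {
    if (this.state === "HALF_OPEN") {
      this.successes += 1;
      if (this.successes >= this.successThreshold) {
        this.state = "CLOSED";
        this.failures = 0;
      }
    } else {
      this.failures = 0; // a success resets the consecutive count
    }
  }

  recordFailure(now: number): void {
    this.failures += 1;
    const tripped = this.failures >= this.failureThreshold;
    if (this.state === "HALF_OPEN" || tripped) {
      this.state = "OPEN";
      this.nextAttempt = now + this.timeoutMs;
    }
  }
}

const breaker = new SketchBreaker(2, 1, 1000);
breaker.recordFailure(0);
breaker.recordFailure(0); // second consecutive failure trips the circuit
const blocked = !breaker.allowRequest(500); // inside the cooldown: rejected
const trial = breaker.allowRequest(1000); // cooldown expired: HALF_OPEN trial
breaker.recordSuccess(); // one success meets successThreshold, circuit closes
```

Passing `now` explicitly instead of calling `Date.now()` keeps the sketch deterministic, which makes the transitions easy to step through by hand.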

---

## The two-trigger algorithm

`ProviderCircuitBreaker.shouldOpenCircuit()` has two independent triggers — either is sufficient to open the circuit:

**Trigger 1 — Consecutive failure count**

```typescript
if (this.failures >= this.config.failureThreshold) return true;
```

Opens the circuit as soon as `failureThreshold` consecutive failures accumulate, regardless of total request volume. Useful for catching hard outages fast.

**Trigger 2 — Rolling window error rate**

```typescript
const windowStart = Date.now() - this.config.rollingWindowMs;
const recentInWindow = this.recentResults.filter(r => r.timestamp >= windowStart);
if (recentInWindow.length < this.config.volumeThreshold) return false; // not enough data
// Count failed results in the window (assumes each entry carries a boolean `success` flag).
const failuresInWindow = recentInWindow.filter(r => !r.success).length;
const errorRate = (failuresInWindow / recentInWindow.length) * 100;
return errorRate >= this.config.errorThresholdPercentage;
```

Requires at least `volumeThreshold` requests in the window before the percentage check activates. This prevents a single failure on a cold provider from tripping the circuit.

A **success** in the `CLOSED` state resets the consecutive failure counter to zero, but does not remove past failures from the rolling window. A provider that alternates success/failure can still trip via the error-rate trigger.
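
To make the interaction between the two triggers concrete, here is a self-contained sketch that evaluates both against a list of recorded results. The `Result` shape (with a `success` flag), the `TriggerConfig` interface, and the standalone `shouldOpen` function are assumptions for illustration; the real logic lives inside `ProviderCircuitBreaker`.

```typescript
// Assumed shape of one recorded outcome.
interface Result {
  timestamp: number;
  success: boolean;
}

interface TriggerConfig {
  failureThreshold: number;
  volumeThreshold: number;
  rollingWindowMs: number;
  errorThresholdPercentage: number;
}

function shouldOpen(
  consecutiveFailures: number,
  recentResults: Result[],
  cfg: TriggerConfig,
  now: number,
): boolean {
  // Trigger 1: consecutive failure count.
  if (consecutiveFailures >= cfg.failureThreshold) return true;

  // Trigger 2: error rate inside the rolling window.
  const windowStart = now - cfg.rollingWindowMs;
  const inWindow = recentResults.filter((r) => r.timestamp >= windowStart);
  if (inWindow.length < cfg.volumeThreshold) return false; // not enough data
  const failuresInWindow = inWindow.filter((r) => !r.success).length;
  const errorRate = (failuresInWindow / inWindow.length) * 100;
  return errorRate >= cfg.errorThresholdPercentage;
}

const triggerConfig: TriggerConfig = {
  failureThreshold: 5,
  volumeThreshold: 10,
  rollingWindowMs: 60000,
  errorThresholdPercentage: 50,
};

// Ten results alternating success/failure: the consecutive count never
// exceeds 1, but 5 of 10 requests in the window failed (50%).
const alternating: Result[] = Array.from({ length: 10 }, (_, i) => ({
  timestamp: 1000 + i * 100,
  success: i % 2 === 0,
}));

const tripsOnErrorRate = shouldOpen(1, alternating, triggerConfig, 2000);
const tooFewSamples = shouldOpen(1, alternating.slice(0, 9), triggerConfig, 2000);
```

With these inputs `tripsOnErrorRate` is `true` even though no two failures were consecutive, while `tooFewSamples` is `false` because nine results fall below `volumeThreshold`.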

## HALF_OPEN state machine

When the cooldown expires (`Date.now() >= nextAttempt`), the next call transitions to `HALF_OPEN`. In this state:

- Successes increment `this.successes`. Once `successThreshold` successes accumulate, the circuit closes and `failures` resets.
- Any failure immediately reopens the circuit and sets a new `nextAttempt`.

`OPEN` itself fails every caller fast and in sync — `state` and `nextAttempt` live on one `ProviderCircuitBreaker` instance per provider, so every caller hitting a tripped circuit sees the same `nextAttempt` and rejects immediately without a network call. But the transition out of `OPEN` is not currently gated: as soon as `nextAttempt` passes, **every** caller in flight at that moment flips the breaker to `HALF_OPEN` and sends its own trial request — there's no lock limiting it to a single probe. If several callers are queued up when a long-tripped provider's cooldown expires, they will all dispatch trial requests at once. See [Retry-After and Shared Cooldowns](#retry-after-and-shared-cooldowns) below.

## Position in the pipeline

The circuit breaker wraps the retry strategy, not individual attempts:

```
rateLimiter.acquire()
  └─ circuitBreaker.execute(
       └─ retryStrategy.execute(
            └─ fetch(builtRequest.url)
         )
     )
```

This architecture means:
- The circuit breaker tracks **logical requests**, not physical retry attempts. A request that fails and retries 3 times counts as 1 failure, not 3.
- Tripping follows the configured `failureThreshold` exactly: a breaker set to open after 5 failures opens after 5 failed logical requests, not after 5 physical attempts (which one or two retried requests could produce on their own).
- The service-layer failover sees the final `MeridianError` after all retries have been exhausted. If the circuit is `OPEN`, the error reaches the service layer synchronously (< 1ms) without any network calls.
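
The difference between logical requests and physical attempts can be shown with two plain wrapper functions. This is a simplified, synchronous sketch: `withRetry`, `withBreaker`, and the counters are hypothetical stand-ins for the retry strategy and breaker, not Meridian's API.

```typescript
let attempts = 0; // physical attempts (what the upstream would see)
let logicalFailures = 0; // what the breaker records

// Stand-in for a fetch against a provider that is down.
function alwaysFails(): never {
  attempts += 1;
  throw new Error("upstream unavailable");
}

// Inner layer: one initial attempt plus `retries` retries.
function withRetry<T>(fn: () => T, retries: number): T {
  let lastError: unknown;
  for (let i = 0; i <= retries; i++) {
    try {
      return fn();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Outer layer: sees only the final outcome of the retried call.
function withBreaker<T>(fn: () => T): T {
  try {
    return fn();
  } catch (err) {
    logicalFailures += 1;
    throw err;
  }
}

try {
  withBreaker(() => withRetry(alwaysFails, 3));
} catch {
  // The final error is what the service layer would receive.
}
```

After this runs, `attempts` is 4 (one initial try plus three retries) while `logicalFailures` is 1, which is the count the breaker compares against `failureThreshold`.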

## Fail-fast savings

When the circuit is `OPEN`, `execute()` throws `CircuitOpenError` synchronously before calling `fetch()`. The benchmark shows this takes **< 1 ms**, compared with 25–500 ms for a real upstream round-trip. For a provider that is down for 60 seconds under 1000 req/s of traffic, a circuit that stays closed wastes at least 60 000 network calls (1000 req/s × 60 s, before counting retries); a circuit that opens wastes 5 (the failures that tripped it) plus the trial requests sent during recovery.
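
Because an open circuit rejects without touching the network, a caller can recognize that rejection and move to a fallback right away. The sketch below checks for the category and message described under How It Works; the helper names `isCircuitOpen` and `chooseNextStep` are hypothetical, and the error objects are built by hand here rather than imported from the library.

```typescript
// Stand-in for the error fields described in this document.
interface ErrorLike {
  category?: unknown;
  message?: unknown;
}

function isCircuitOpen(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as ErrorLike;
  return e.category === "provider" && e.message === "Circuit breaker is open";
}

// Hypothetical caller logic: go straight to a fallback when the breaker
// rejected the call, otherwise surface the error as usual.
function chooseNextStep(err: unknown): "use-fallback" | "surface-error" {
  return isCircuitOpen(err) ? "use-fallback" : "surface-error";
}

// Hand-built examples of the two cases.
const openError = Object.assign(new Error("Circuit breaker is open"), {
  category: "provider",
});
const otherError = new Error("socket hang up");

const stepForOpen = chooseNextStep(openError);
const stepForOther = chooseNextStep(otherError);
```

Matching on the message string is brittle; if the library exports a dedicated error class or code for this case, checking that is the safer choice.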

---

## Configuration

You can configure circuit breaker parameters globally or override them per provider.

```typescript
const meridian = await Meridian.create({
  providers: {
    github: {
      auth: { token: "gh_token" },
      circuitBreaker: {
        failureThreshold: 5,        // Open after 5 consecutive failures
        timeout: 30000,             // Wait 30 seconds before half-open state
        volumeThreshold: 10,        // Minimum 10 requests required in rolling window
        rollingWindowMs: 10000,     // Window size of 10 seconds
        errorThresholdPercentage: 50 // Open if 50%+ of requests in window fail
      }
    }
  },
  localUnsafe: true
});
```

### Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `failureThreshold` | `number` | `5` | Trigger opening the circuit after this many consecutive errors. |
| `timeout` | `number` | `10000` | Cooldown duration (in milliseconds) the circuit remains open before transitioning to half-open. |
| `volumeThreshold` | `number` | `10` | The minimum number of requests in the window before percentage-based checks run. |
| `rollingWindowMs` | `number` | `60000` | The rolling time window (in milliseconds) to calculate stats over. |
| `errorThresholdPercentage` | `number` | `50` | Open the circuit if the failure rate in the window reaches or exceeds this percentage. |

---

## Retry-After and Shared Cooldowns

`timeout` (the breaker's cooldown) is a fixed, configured duration — it is **not** derived from any upstream `Retry-After` value. A provider returning `Retry-After: 60` doesn't make the breaker wait 60 seconds; it waits whatever `timeout` is set to, regardless of what the provider asked for. The parsed `retryAfter` is used elsewhere (the [rate limiter's token bucket](./rate-limits.md)), not here. See [429 Is Not One Signal](./retries.md#429-is-not-one-signal) for how that classification flows through the pipeline.

What *is* shared across callers today is the breaker's `OPEN` state itself — one breaker instance per provider means every caller fails fast against the same `nextAttempt`, so you don't get N callers each independently hammering a known-down provider. What is **not** yet coordinated is re-entry: as noted above, nothing currently staggers or rate-limits the trial requests once the cooldown expires, so recovery can itself produce a burst against a provider that just came back up. Today, backoff between an individual caller's own retry attempts is jittered per-caller (see [Retry Delay Strategy](./retries.md#retry-delay-strategy)) rather than coordinated across callers at the adapter level.

An adapter-level shared cooldown with staggered re-entry (e.g. one designated probe per recovery window, others waiting on its result) is on the roadmap but not implemented yet.
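
The roadmap idea of one designated probe could look roughly like the gate below. It is purely illustrative: `ProbeGate` and its methods are hypothetical names, not part of Meridian, and in this simplified version the other callers keep failing fast instead of waiting on the probe's result. It also coordinates callers only inside a single process.

```typescript
// Hypothetical single-probe gate (not implemented in Meridian today).
class ProbeGate {
  private probeInFlight = false;

  // The first caller after the cooldown gets the probe slot;
  // everyone else is told to keep failing fast.
  tryAcquire(): boolean {
    if (this.probeInFlight) return false;
    this.probeInFlight = true;
    return true;
  }

  // Called once the probe settles, whether it succeeded or failed.
  release(): void {
    this.probeInFlight = false;
  }
}

const gate = new ProbeGate();

// Five callers arrive at the moment the cooldown expires.
const admitted = Array.from({ length: 5 }, () => gate.tryAcquire());
const probesSent = admitted.filter((ok) => ok).length;
```

Here `probesSent` is 1: only the first caller dispatches a trial request, and the slot frees up again once `release()` is called.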

---

## Persistence in Production (StateStorage)

By default, circuit breaker state is kept in memory. In serverless or multi-instance containerized deployments, that means state is lost on every cold start and each instance tracks failures on its own, so instances can disagree about whether a provider's circuit is open.

For production, you should pass a shared storage implementation (like Redis or Upstash) using the `StateStorage` interface:

```typescript
import { Meridian, RedisStateStorage } from "meridianjs";
import { createClient } from "redis";

const redisClient = createClient({ url: "redis://localhost:6379" });
await redisClient.connect();

const meridian = await Meridian.create({
  providers: {
    stripe: { auth: { apiKey: "sk_key" } }
  },
  mode: "distributed",               // Enable distributed state mode
  stateStorage: new RedisStateStorage(redisClient) // Share breaker state across instances
});
```