{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "361e6e0a",
   "metadata": {},
   "source": [
    "---\n",
    "title: Anatomy of a Real Crypto Order Book\n",
    "summary: Spread, depth, and slippage measured on a real recorded BTC-USD order book from Coinbase — the microstructure that sets your true trading costs.\n",
    "tags: [crypto, microstructure, order-book, slippage]\n",
    "---\n",
    "\n",
    "# Anatomy of a Real Crypto Order Book\n",
    "\n",
    "Trading isn't free, and the \"price\" on a screen isn't the price you get. This post dissects a **real\n",
    "recorded BTC-USD limit order book** (Coinbase, ~600 snapshots over ~10 minutes — the same data that\n",
    "powers the [ConvexPi Arena](https://convexpi.ai/compete/arena-book)) to measure the three things that\n",
    "actually determine your cost: the **spread**, the **depth**, and the **slippage** of walking the book.\n",
    "See [how matching works](https://convexpi.ai/exchange) for the mechanics."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fd722456",
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import json, urllib.request\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "URL = \"https://raw.githubusercontent.com/convexpi/arena/f75088c55d58c8693fec8e85c9ef4a8927ab2f2e/data/btcusd_book.jsonl\"\n",
    "with urllib.request.urlopen(URL, timeout=30) as resp:  # context manager closes the connection\n",
    "    raw = resp.read().decode(\"utf-8\")\n",
    "frames = [json.loads(line) for line in raw.splitlines() if line.strip()]\n",
    "print(f\"{len(frames)} order-book snapshots; each has bids 'b' and asks 'a' = [[price, size], ...]\")"
   ]
  },
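  {
   "cell_type": "markdown",
   "id": "b7c1e2a0",
   "metadata": {},
   "source": [
    "The cells that follow read level 0 of `b` and `a` as the best bid and best ask, which only holds if\n",
    "each side is sorted best-first. Here is a minimal sanity check under that assumption, shown on a toy\n",
    "frame. `check_frame` and `toy_frame` are illustrative names, not part of the dataset or the Arena code.\n",
    "To apply it to the recorded data, one could run `all(check_frame(f) for f in frames)`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b7c1e2a1",
   "metadata": {},
   "outputs": [],
   "source": [
    "def check_frame(frame):\n",
    "    # True if the snapshot has both sides, is sorted best-first, and is not crossed.\n",
    "    bids, asks = frame[\"b\"], frame[\"a\"]\n",
    "    if not bids or not asks:\n",
    "        return False\n",
    "    bid_prices = [p for p, _ in bids]\n",
    "    ask_prices = [p for p, _ in asks]\n",
    "    bids_sorted = all(x > y for x, y in zip(bid_prices, bid_prices[1:]))  # strictly descending\n",
    "    asks_sorted = all(x < y for x, y in zip(ask_prices, ask_prices[1:]))  # strictly ascending\n",
    "    uncrossed = bid_prices[0] < ask_prices[0]\n",
    "    sizes_positive = all(q > 0 for _, q in bids + asks)\n",
    "    return bids_sorted and asks_sorted and uncrossed and sizes_positive\n",
    "\n",
    "# Hypothetical snapshot in the same {t, b, a} layout as the recorded frames.\n",
    "toy_frame = {\"t\": 0, \"b\": [[99.0, 1.0], [98.5, 2.0]], \"a\": [[100.0, 0.5], [100.5, 1.5]]}\n",
    "print(check_frame(toy_frame))"
   ]
  },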
  {
   "cell_type": "markdown",
   "id": "7725ac97",
   "metadata": {},
   "source": [
    "## 1. The spread\n",
    "\n",
    "The **best bid** and **best ask** bracket the mid; the gap between them is the **spread**. Buying at\n",
    "the ask and immediately selling at the bid loses the full spread, so a single market order pays about\n",
    "half of it relative to the mid. Coinbase BTC is famously tight, so we measure it in basis points\n",
    "(1 bp = 0.01% of the mid)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aecb1e1f",
   "metadata": {},
   "outputs": [],
   "source": [
    "rows = []\n",
    "for f in frames:\n",
    "    # Level 0 on each side is read as the best quote (assumes both sides are sorted best-first).\n",
    "    bb, ba = f[\"b\"][0][0], f[\"a\"][0][0]\n",
    "    mid = (bb + ba) / 2\n",
    "    rows.append({\"t\": f[\"t\"], \"mid\": mid, \"spread\": ba - bb, \"spread_bps\": (ba - bb) / mid * 1e4})\n",
    "df = pd.DataFrame(rows)\n",
    "df[\"time\"] = pd.to_datetime(df[\"t\"], unit=\"ms\")  # treats 't' as epoch milliseconds\n",
    "print(f\"mid price : ${df['mid'].mean():,.2f}\")\n",
    "print(f\"spread    : ${df['spread'].mean():.3f}  ({df['spread_bps'].mean():.2f} bps avg, \"\n",
    "      f\"{df['spread_bps'].median():.2f} bps median)\")\n",
    "\n",
    "fig, ax = plt.subplots(2, 1, figsize=(10, 5), sharex=True)\n",
    "ax[0].plot(df[\"time\"], df[\"mid\"], lw=1)\n",
    "ax[0].set_title(\"BTC-USD mid price\")\n",
    "ax[0].set_ylabel(\"USD\")\n",
    "ax[1].plot(df[\"time\"], df[\"spread_bps\"], lw=1, color=\"indianred\")\n",
    "ax[1].set_title(\"Spread (bps)\")\n",
    "ax[1].set_ylabel(\"bps\")\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
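  {
   "cell_type": "markdown",
   "id": "c3d4e5f0",
   "metadata": {},
   "source": [
    "To make the bps conversion concrete, here is a hand-checkable sketch with made-up quotes. `spread_bps`,\n",
    "`toy_bid`, and `toy_ask` are illustrative names and values, not taken from the recorded data. A $6 gap\n",
    "around a $60,000 mid works out to 1 bp, and a single crossing costs about half of that against the mid."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c3d4e5f1",
   "metadata": {},
   "outputs": [],
   "source": [
    "def spread_bps(best_bid, best_ask):\n",
    "    # Quoted spread expressed in basis points of the mid price.\n",
    "    mid = (best_bid + best_ask) / 2\n",
    "    return (best_ask - best_bid) / mid * 1e4\n",
    "\n",
    "toy_bid, toy_ask = 59_997.0, 60_003.0   # hypothetical quotes, not from the dataset\n",
    "toy_spread = spread_bps(toy_bid, toy_ask)\n",
    "print(f\"spread = {toy_spread:.2f} bps, half-spread = {toy_spread / 2:.2f} bps\")"
   ]
  },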
  {
   "cell_type": "markdown",
   "id": "bb9b3ce6",
   "metadata": {},
   "source": [
    "## 2. Depth: the shape of the book\n",
    "\n",
    "The spread only tells you the cost of a *tiny* trade. For real size, what matters is **depth**: how\n",
    "much is resting at each price level. For each of the first 15 levels on each side, we average across\n",
    "snapshots the cumulative size resting up to that level and the level's distance from the mid (in bps)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d4613595",
   "metadata": {},
   "outputs": [],
   "source": [
    "LEVELS = 15\n",
    "def cum_depth(side_key, sign):\n",
    "    # Per-level averages over all frames for the first LEVELS levels:\n",
    "    # distance from mid (bps) and cumulative size (BTC).\n",
    "    # sign=-1 for bids, +1 for asks, so the returned offsets are positive on both sides.\n",
    "    sizes = np.zeros(LEVELS)\n",
    "    offs = np.zeros(LEVELS)\n",
    "    cnt = np.zeros(LEVELS)\n",
    "    for f in frames:\n",
    "        mid = (f[\"b\"][0][0] + f[\"a\"][0][0]) / 2\n",
    "        cum = 0.0\n",
    "        for i, (price, qty) in enumerate(f[side_key][:LEVELS]):\n",
    "            cum += qty\n",
    "            sizes[i] += cum\n",
    "            offs[i] += sign * (price - mid) / mid * 1e4\n",
    "            cnt[i] += 1\n",
    "    # cnt[i] counts the frames that had a level i; a level no frame reached would yield NaN here.\n",
    "    return offs / cnt, sizes / cnt\n",
    "bid_off, bid_cum = cum_depth(\"b\", -1)\n",
    "ask_off, ask_cum = cum_depth(\"a\", +1)\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(9, 3.5))\n",
    "ax.step(-bid_off, bid_cum, where=\"mid\", color=\"seagreen\", label=\"bids (cumulative)\")\n",
    "ax.step(ask_off, ask_cum, where=\"mid\", color=\"indianred\", label=\"asks (cumulative)\")\n",
    "ax.axvline(0, color=\"grey\", lw=0.5)\n",
    "ax.set_xlabel(\"distance from mid (bps)\"); ax.set_ylabel(\"cumulative size (BTC)\")\n",
    "ax.set_title(\"Average book depth\"); ax.legend(); plt.tight_layout(); plt.show()\n",
    "print(f\"avg size in top {LEVELS} ask levels: {ask_cum[-1]:.2f} BTC  (~${ask_cum[-1]*df['mid'].mean():,.0f})\")"
   ]
  },
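  {
   "cell_type": "markdown",
   "id": "d5e6f7a0",
   "metadata": {},
   "source": [
    "The cell above averages depth level by level. A complementary view is the size resting within a fixed\n",
    "distance of the mid. Below is a minimal sketch on a made-up book; `depth_within_bps`, `toy_book`, and\n",
    "the band widths are illustrative assumptions rather than anything defined by the dataset. The same\n",
    "function could be called on `f[\"b\"]` or `f[\"a\"]` for any recorded frame `f`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d5e6f7a1",
   "metadata": {},
   "outputs": [],
   "source": [
    "def depth_within_bps(levels, mid, max_bps):\n",
    "    # Total size resting within max_bps of the mid on one side of the book.\n",
    "    return sum(qty for price, qty in levels if abs(price - mid) / mid * 1e4 <= max_bps)\n",
    "\n",
    "# Hypothetical book: levels sit roughly 5, 10 and 100 bps away from a mid of 100.\n",
    "toy_book = {\"b\": [[99.95, 1.0], [99.90, 2.0], [99.00, 5.0]],\n",
    "            \"a\": [[100.05, 0.5], [100.10, 1.5], [101.00, 4.0]]}\n",
    "toy_mid = (toy_book[\"b\"][0][0] + toy_book[\"a\"][0][0]) / 2\n",
    "for band_bps in (6, 12, 120):\n",
    "    bid_size = depth_within_bps(toy_book[\"b\"], toy_mid, band_bps)\n",
    "    ask_size = depth_within_bps(toy_book[\"a\"], toy_mid, band_bps)\n",
    "    print(f\"within {band_bps:>3} bps: {bid_size:.1f} BTC bid, {ask_size:.1f} BTC ask\")"
   ]
  },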
  {
   "cell_type": "markdown",
   "id": "03d3c42a",
   "metadata": {},
   "source": [
    "## 3. Slippage: what a market order really costs\n",
    "\n",
    "A market buy *walks the ask ladder*: it fills the cheapest offers first, then more expensive ones. The\n",
    "**volume-weighted average price you pay** ends up above the mid. That gap, measured here in bps of the\n",
    "mid, is the **slippage**; it includes the half-spread and grows with order size. This is the cost the\n",
    "Arena makes you feel when you send a market order into the real book."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cb0c0775",
   "metadata": {},
   "outputs": [],
   "source": [
    "def slippage_buy_bps(frame, size_btc):\n",
    "    # Cost of a market buy of size_btc, in bps above the mid (so it includes the half-spread).\n",
    "    mid = (frame[\"b\"][0][0] + frame[\"a\"][0][0]) / 2\n",
    "    remaining, cost, filled = size_btc, 0.0, 0.0\n",
    "    for price, qty in frame[\"a\"]:       # walk the asks from level 0 outward\n",
    "        take = min(remaining, qty)\n",
    "        cost += take * price\n",
    "        filled += take\n",
    "        remaining -= take\n",
    "        if remaining <= 1e-12:\n",
    "            break\n",
    "    if filled < size_btc * 0.999:       # book too thin to fill this size\n",
    "        return np.nan\n",
    "    return (cost / filled / mid - 1) * 1e4\n",
    "\n",
    "sizes = [0.05, 0.1, 0.25, 0.5, 1, 2, 3, 5]\n",
    "# One row per order size, one column per snapshot; NaN marks a snapshot too thin to fill the order.\n",
    "slip_all = np.array([[slippage_buy_bps(f, s) for f in frames] for s in sizes])\n",
    "slip = np.nanmean(slip_all, axis=1)               # average over fillable snapshots only\n",
    "fillable = (~np.isnan(slip_all)).mean(axis=1)     # share of snapshots that could fill each size\n",
    "\n",
    "fig, ax = plt.subplots(figsize=(9, 3.5))\n",
    "ax.plot(sizes, slip, \"o-\"); ax.set_xlabel(\"market-buy size (BTC)\"); ax.set_ylabel(\"avg slippage (bps)\")\n",
    "ax.set_title(\"Slippage vs order size\"); plt.tight_layout(); plt.show()\n",
    "for s, sl, fr in zip(sizes, slip, fillable):\n",
    "    print(f\"  buy {s:>4} BTC  (~${s*df['mid'].mean():>10,.0f})  ->  {sl:5.2f} bps slippage   [fillable in {fr:.0%} of snapshots]\")"
   ]
  },
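  {
   "cell_type": "markdown",
   "id": "e7f8a9b0",
   "metadata": {},
   "source": [
    "Here is the same ladder walk on a toy book small enough to check by hand, extended to a market sell,\n",
    "which walks the bids downward instead. `walk_vwap`, `toy_ladder`, and the prices are made up for\n",
    "illustration; slippage is again measured against the mid. Buying 2 units takes 1 at 101 and 1 at 102,\n",
    "for a volume-weighted price of 101.5 against a mid of 100."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e7f8a9b1",
   "metadata": {},
   "outputs": [],
   "source": [
    "def walk_vwap(levels, size):\n",
    "    # Volume-weighted fill price for taking `size` from best-first levels; None if the side is too thin.\n",
    "    if size <= 0:\n",
    "        raise ValueError(\"size must be positive\")\n",
    "    remaining, cost = size, 0.0\n",
    "    for price, qty in levels:\n",
    "        take = min(remaining, qty)\n",
    "        cost += take * price\n",
    "        remaining -= take\n",
    "        if remaining <= 1e-12:\n",
    "            return cost / size\n",
    "    return None\n",
    "\n",
    "# Hypothetical two-level book around a mid of 100.\n",
    "toy_ladder = {\"b\": [[99.0, 1.0], [98.0, 3.0]], \"a\": [[101.0, 1.0], [102.0, 3.0]]}\n",
    "ladder_mid = (toy_ladder[\"b\"][0][0] + toy_ladder[\"a\"][0][0]) / 2\n",
    "buy_vwap = walk_vwap(toy_ladder[\"a\"], 2.0)    # 1 @ 101 + 1 @ 102\n",
    "sell_vwap = walk_vwap(toy_ladder[\"b\"], 2.0)   # 1 @ 99 + 1 @ 98\n",
    "buy_slip_bps = (buy_vwap / ladder_mid - 1) * 1e4\n",
    "sell_slip_bps = (1 - sell_vwap / ladder_mid) * 1e4\n",
    "print(f\"buy  2: vwap {buy_vwap:.2f}, slippage {buy_slip_bps:.1f} bps\")\n",
    "print(f\"sell 2: vwap {sell_vwap:.2f}, slippage {sell_slip_bps:.1f} bps\")\n",
    "print(\"buy 10:\", walk_vwap(toy_ladder[\"a\"], 10.0))  # more than the 4 units on offer"
   ]
  },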
  {
   "cell_type": "markdown",
   "id": "c51af885",
   "metadata": {},
   "source": [
    "## Takeaways\n",
    "\n",
    "1. **The spread is the floor, not the cost.** Coinbase's BTC spread is ~1 bp, but that only covers a\n",
    "   dust-sized trade.\n",
    "2. **Depth is finite.** The top levels of the book hold only a few BTC; a larger order walks up the\n",
    "   ladder.\n",
    "3. **Slippage grows with size**, and past the visible depth a market order cannot be filled from the\n",
    "   snapshot at all. That is the real reason big orders are split, worked, or routed.\n",
    "\n",
    "This is exactly what the [Arena's real-order-book mode](https://convexpi.ai/compete/arena-book) makes\n",
    "tangible: your market orders pay *this* slippage, and a market maker earns it. Build an agent in the\n",
    "[market-making lesson](https://convexpi.ai/lessons/market-making)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}