# BTC Orderbook Microstructure Research

**38 days of real Binance BTC/USDT orderbook data (2026-03-10 to 2026-04-18), analyzed.**

[![Python](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/)
[![License](https://img.shields.io/badge/license-MIT-green.svg)]()

## TL;DR: Five findings

1. **OBI signal decays in ~26 seconds.** Order Book Imbalance (OBI) autocorrelation is ≈ 0.64 at lag 1 and falls to the noise floor (~0.05) in ~26 s, which leaves only a short window for signals based on order book state.
2. **OBI predicts short-horizon returns (ρ ≈ 0.20 at 10 s).** Spearman correlation with 10-second forward returns is ~0.20 for all three OBI levels; it decays to ~0.075 at 60 s (all p ≈ 0 across ~1.9M observations).
3. **CVD lags price rather than leading it.** Cross-correlation of cumulative volume delta (CVD) with mid-price log-returns peaks at lag −45 s (`cvd_60s`), meaning aggressive order flow reacts to price moves rather than anticipating them at 1-second resolution.
4. **Spread is ultra-tight and nearly constant.** Binance BTC/USDT spread stays around 0.014 bps throughout the day, with an intraday range of only 0.0005 bps. High-volatility periods are associated with slightly *tighter* spreads (Spearman ρ = −0.19), likely reflecting increased market-maker competition.
5. **Network interruptions matter.** Mean daily data coverage was 57.8%, with 562 gaps longer than 30 s totaling 308 hours of missing data. Gap-aware segmentation is essential before computing any rolling statistics.

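
Finding 5 calls for gap-aware segmentation. The sketch below shows one way to do it: split a sorted timestamp series wherever two consecutive samples are more than 30 s apart, so that rolling statistics can be computed per segment. The names `split_on_gaps` and `GAP_THRESHOLD` are hypothetical and are not taken from `data/loader.py`; the actual loader may work differently.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a "gap" is any step longer than 30 s, matching the ">30 s" definition above.
GAP_THRESHOLD = timedelta(seconds=30)


def split_on_gaps(timestamps, threshold=GAP_THRESHOLD):
    """Split sorted timestamps into contiguous segments.

    A new segment starts whenever two consecutive timestamps are more than
    `threshold` apart. Returns (start_index, end_index) pairs, end exclusive.
    """
    if not timestamps:
        return []
    segments = []
    start = 0
    for i in range(1, len(timestamps)):
        if timestamps[i] - timestamps[i - 1] > threshold:
            segments.append((start, i))
            start = i
    segments.append((start, len(timestamps)))
    return segments


# Toy example: 1 Hz samples with a single 45-second outage after the fourth sample.
t0 = datetime(2026, 3, 11, tzinfo=timezone.utc)
ts = [t0 + timedelta(seconds=s) for s in (0, 1, 2, 3, 48, 49, 50)]
segments = split_on_gaps(ts)
print(segments)
```

Each `(start, end)` pair can then be used to slice the data, so that a rolling window never spans an outage.
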
![daily coverage](figures/00_daily_coverage.png)

## What's in this repo

| Path | Contents |
|------|----------|
| `data/loader.py` | Parquet loader with gap detection and date-range queries |
| `analysis/00_data_quality.ipynb` | Coverage report, gap distribution, longest clean segments |
| `analysis/01_obi_analysis.ipynb` | OBI distribution, ACF, Spearman forward-return correlation |
| `analysis/02_cvd_analysis.ipynb` | CVD timeseries, price cross-correlation, timescale correlation |
| `analysis/03_spread_liquidity.ipynb` | Hourly spread, vol vs. spread, extreme-event frequency |
| `figures/` | Pre-rendered PNGs; no need to run code to see the results |
| `sample_data/` | 7-day parquet subset (2026-03-11 to 2026-03-17) for reproducibility |

## Figures

### Data Quality

![gap distribution](figures/00_gap_distribution.png)

### OBI Statistical Properties

![obi distribution](figures/01_obi_distribution.png)
![obi acf](figures/01_obi_acf.png)
![obi forward return](figures/01_obi_forward_return.png)

### CVD Analysis

![cvd timeseries](figures/02_cvd_timeseries.png)
![cvd xcorr](figures/02_cvd_xcorr.png)
![cvd correlation](figures/02_cvd_correlation.png)

### Spread & Liquidity

![spread hourly](figures/03_spread_hourly.png)
![spread vs vol](figures/03_spread_vs_vol.png)
![spread extremes](figures/03_spread_extremes.png)

## Data schema

One Parquet file per hour. Columns:

| Column | Description |
|--------|-------------|
| `timestamp` | ISO string, UTC |
| `micro_price`, `mid_price` | Derived from top of book |
| `spread_bps` | Bid-ask spread in basis points |
| `b1_price`, `a1_price` | Best bid and ask |
| `obi_5`, `obi_10`, `obi_20` | Order Book Imbalance at top 5/10/20 levels |
| `cvd_60s`, `cvd_300s`, `cvd_900s` | Cumulative Volume Delta over 1/5/15 minutes |

Sampling rate: ~1 Hz. Source: Binance WebSocket `@depth20@100ms` + `@aggTrade`.

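
To make the derived columns concrete, here is a minimal sketch of how `obi_N` and `spread_bps` are commonly defined. This README does not state the exact formulas used, so both definitions below are assumptions: OBI as (bid volume − ask volume) / (bid volume + ask volume) over the top N levels, and spread as (ask − bid) / mid in basis points. The function names are hypothetical.

```python
def order_book_imbalance(bid_sizes, ask_sizes, levels):
    """OBI over the top `levels` of the book, in [-1, 1].

    Assumed definition: (B - A) / (B + A), where B and A are the summed
    resting sizes on the bid and ask side. Positive means bid-heavy.
    """
    bid_vol = sum(bid_sizes[:levels])
    ask_vol = sum(ask_sizes[:levels])
    total = bid_vol + ask_vol
    if total == 0:
        return 0.0  # empty book: treat as balanced
    return (bid_vol - ask_vol) / total


def spread_in_bps(best_bid, best_ask):
    """Bid-ask spread relative to the mid price, in basis points (assumed definition)."""
    mid = (best_bid + best_ask) / 2
    return (best_ask - best_bid) / mid * 1e4


# Toy five-level book: bid-heavy at the touch, balanced over all five levels.
bids = [3.0, 1.0, 1.0, 1.0, 2.0]
asks = [1.0, 1.0, 2.0, 2.0, 2.0]
obi_1 = order_book_imbalance(bids, asks, levels=1)
obi_5 = order_book_imbalance(bids, asks, levels=5)
print(obi_1, obi_5)
```

The toy book shows why the repo tracks several depths: the imbalance at the touch and the imbalance over deeper levels can disagree.
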
## Reproducing

```bash
git clone https://github.com/whoareunot/btc-orderbook-research
cd btc-orderbook-research
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
python scripts/run_all.py   # regenerates all figures from sample_data/
```

The included `sample_data/` (2026-03-11 to 2026-03-17) is the week with the highest data coverage in the full dataset.

## What's NOT in this repo

This is a **statistical characterization** of the data, not a trading strategy. The following are intentionally excluded:

- Entry/exit thresholds or trading signals
- Model parameters or feature importance rankings
- Live PnL, win rate, or drawdown data

The methodology (Triple-Barrier labeling, gradient boosting on microstructure features) follows López de Prado (2018) and standard quant-research practice. The statistics here describe the data; the strategy that uses them is separate.

## Citation

```bibtex
@misc{btc_orderbook_research_2026,
  title  = {BTC Orderbook Microstructure Research},
  author = {},
  year   = {2026},
  url    = {https://github.com/whoareunot/btc-orderbook-research}
}
```

## License

MIT for code. `sample_data/` is CC-BY-4.0 (attribution required).

## About the Author

不小心: a muggle in 2025, an AI-native coder in 2026. Got to know BTC only two months ago. Looking forward to more conversations.