--- name: rangebar-eval-metrics description: Range bar evaluation metrics for quant trading. TRIGGERS - range bar metrics, Sharpe ratio, WFO metrics, PSR DSR MinTRL. allowed-tools: Read, Grep, Glob, Bash --- # Range Bar Evaluation Metrics Machine-readable reference + computation scripts for state-of-the-art metrics evaluating range bar (price-based sampling) data. ## When to Use This Skill Use this skill when: - Evaluating ML model performance on range bar data - Computing Sharpe ratios with non-IID bar sequences - Running Walk-Forward Optimization metric analysis - Calculating PSR, DSR, or MinTRL statistical tests - Generating evaluation reports from fold results ## Quick Start ```bash # Compute metrics from predictions + actuals python scripts/compute_metrics.py --predictions preds.npy --actuals actuals.npy --timestamps ts.npy # Generate full evaluation report python scripts/generate_report.py --results folds.jsonl --output report.md ``` ## Metric Tiers | Tier | Purpose | Metrics | Compute | | ---------------------- | ------------------ | ------------------------------------------------------------------------ | -------------------- | | **Primary** (5) | Research decisions | weekly_sharpe, hit_rate, cumulative_pnl, n_bars, positive_sharpe_rate | Per-fold + aggregate | | **Secondary/Risk** (5) | Additional context | max_drawdown, bar_sharpe, return_per_bar, profit_factor, cv_fold_returns | Per-fold | | **ML Quality** (3) | Prediction health | ic, prediction_autocorr, is_collapsed | Per-fold | | **Diagnostic** (5) | Final validation | psr, dsr, autocorr_lag1, effective_n, binomial_pvalue | Aggregate only | | **Extended Risk** (5) | Deep risk analysis | var_95, cvar_95, omega_ratio, sortino_ratio, ulcer_index | Per-fold (optional) | ## Why Range Bars Need Special Treatment Range bars violate standard IID assumptions: 1. **Variable duration**: Bars form based on price movement, not time 2. **Autocorrelation**: High-volatility periods cluster bars → temporal correlation 3. **Non-constant information**: More bars during volatility = more information per day **Canonical solution**: Daily aggregation via `_group_by_day()` before Sharpe calculation. ## References ### Core Reference Files | Topic | Reference File | | ------------------------------------ | ----------------------------------------------------------------- | | Sharpe Ratio Calculations | [sharpe-formulas.md](./references/sharpe-formulas.md) | | Risk Metrics (VaR, Omega, Ulcer) | [risk-metrics.md](./references/risk-metrics.md) | | ML Prediction Quality (IC, Autocorr) | [ml-prediction-quality.md](./references/ml-prediction-quality.md) | | Crypto Market Considerations | [crypto-markets.md](./references/crypto-markets.md) | | Temporal Aggregation Rules | [temporal-aggregation.md](./references/temporal-aggregation.md) | | JSON Schema for Metrics | [metrics-schema.md](./references/metrics-schema.md) | | Anti-Patterns (Transaction Costs) | [anti-patterns.md](./references/anti-patterns.md) | | SOTA 2025-2026 (SHAP, BOCPD, etc.) | [sota-2025-2026.md](./references/sota-2025-2026.md) | | Worked Examples (BTC, EUR/USD) | [worked-examples.md](./references/worked-examples.md) | | **Structured Logging (NDJSON)** | [structured-logging.md](./references/structured-logging.md) | ### Related Skills | Skill | Relationship | | ---------------------------------------------------- | ------------------------------------------------------ | | [adaptive-wfo-epoch](../adaptive-wfo-epoch/SKILL.md) | Uses `weekly_sharpe`, `psr`, `dsr` for WFE calculation | ### Dependencies ```bash pip install -r requirements.txt # Or: pip install numpy>=1.24 pandas>=2.0 scipy>=1.10 ``` ## Key Formulas ### Daily-Aggregated Sharpe (Primary Metric) ```python def weekly_sharpe(pnl: np.ndarray, timestamps: np.ndarray) -> float: """Sharpe with daily aggregation for range bars.""" daily_pnl = _group_by_day(pnl, timestamps) # Sum PnL per calendar day if len(daily_pnl) < 2 or np.std(daily_pnl) == 0: return 0.0 daily_sharpe = np.mean(daily_pnl) / np.std(daily_pnl) # For crypto (7-day week): sqrt(7). For equities: sqrt(5) return daily_sharpe * np.sqrt(7) # Crypto default ``` ### Information Coefficient (Prediction Quality) ```python from scipy.stats import spearmanr def information_coefficient(predictions: np.ndarray, actuals: np.ndarray) -> float: """Spearman rank IC - captures magnitude alignment.""" ic, _ = spearmanr(predictions, actuals) return ic # Range: [-1, 1]. >0.02 acceptable, >0.05 good, >0.10 excellent ``` ### Probabilistic Sharpe Ratio (Statistical Validation) ```python from scipy.stats import norm def psr(sharpe: float, se: float, benchmark: float = 0.0) -> float: """P(true Sharpe > benchmark).""" return norm.cdf((sharpe - benchmark) / se) ``` ## Annualization Factors | Market | Daily → Weekly | Daily → Annual | Rationale | | ----------------- | -------------- | ---------------- | ------------------- | | **Crypto (24/7)** | sqrt(7) = 2.65 | sqrt(365) = 19.1 | 7 trading days/week | | **Equity** | sqrt(5) = 2.24 | sqrt(252) = 15.9 | 5 trading days/week | **NEVER use sqrt(252) for crypto markets.** ## CRITICAL: Session Filter Changes Annualization | View | Filter | days_per_week | Rationale | | -------------------------------- | -------------------- | ------------- | --------------------- | | **Session-filtered** (London-NY) | Weekdays 08:00-16:00 | **sqrt(5)** | Trading like equities | | **All-bars** (unfiltered) | None | **sqrt(7)** | Full 24/7 crypto | **Using sqrt(7) for session-filtered data overstates Sharpe by ~18%!** See [crypto-markets.md](./references/crypto-markets.md#critical-session-specific-annualization) for detailed rationale. ## Dual-View Metrics For comprehensive analysis, compute metrics with BOTH views: 1. **Session-filtered** (London 08:00 to NY 16:00): Primary strategy evaluation 2. **All-bars**: Regime detection, data quality diagnostics ## Academic References | Concept | Citation | | ---------------------------- | ------------------------------ | | Deflated Sharpe Ratio | Bailey & López de Prado (2014) | | Sharpe SE with Non-Normality | Mertens (2002) | | Statistics of Sharpe Ratios | Lo (2002) | | Omega Ratio | Keating & Shadwick (2002) | | Ulcer Index | Peter Martin (1987) | ## Decision Framework ### Go Criteria (Research) ```yaml go_criteria: - positive_sharpe_rate > 0.55 - mean_weekly_sharpe > 0 - cv_fold_returns < 1.5 - mean_hit_rate > 0.50 ``` ### Publication Criteria ```yaml publication_criteria: - binomial_pvalue < 0.05 - psr > 0.85 - dsr > 0.50 # If n_trials > 1 ``` ## Scripts | Script | Purpose | | ---------------------------- | -------------------------------------------- | | `scripts/compute_metrics.py` | Compute all metrics from predictions/actuals | | `scripts/generate_report.py` | Generate Markdown report from fold results | | `scripts/validate_schema.py` | Validate metrics JSON against schema | ## Remediations (2026-01-19 Multi-Agent Audit) The following fixes were applied based on a 12-subagent adversarial audit: | Issue | Root Cause | Fix | Source | | ------------------------------ | ------------------------- | ---------------------------------------------- | ------------------ | | `weekly_sharpe=0` | Constant predictions | Model collapse detection + architecture fix | model-expert | | `IC=None` | Zero variance predictions | Return 1.0 for constant (semantically correct) | model-expert | | `prediction_autocorr=NaN` | Division by zero | Guard for std < 1e-10, return 1.0 | model-expert | | Ulcer Index divide-by-zero | Peak equity = 0 | Guard with np.where(peak > 1e-10, ...) | risk-analyst | | Omega/Profit Factor unreliable | Too few samples | min_days parameter (default: 5) | robustness-analyst | | BiLSTM mean collapse | Architecture too small | hidden_size: 16→48, dropout: 0.5→0.3 | model-expert | | `profit_factor=1.0` (n_bars=0) | Early return wrong value | Return NaN when no data to compute ratio | risk-analyst | ### Model Collapse Detection ```python # ALWAYS check for model collapse after prediction pred_std = np.std(predictions) if pred_std < 1e-6: logger.warning( f"Constant predictions detected (std={pred_std:.2e}). " "Model collapsed to mean - check architecture." ) ``` ### Recommended BiLSTM Architecture ```python # BEFORE (causes collapse on range bars) HIDDEN_SIZE = 16 DROPOUT = 0.5 # AFTER (prevents collapse) HIDDEN_SIZE = 48 # Triple capacity DROPOUT = 0.3 # Less aggressive regularization ``` See reference docs for complete implementation details. --- ## Troubleshooting | Issue | Cause | Solution | | -------------------------- | ---------------------------- | -------------------------------------------------- | | weekly_sharpe is 0 | Constant predictions | Check for model collapse, increase hidden_size | | IC returns None | Zero variance in predictions | Model collapsed - check architecture | | prediction_autocorr is NaN | Division by zero | Guard for std < 1e-10 in autocorr calculation | | Ulcer Index divide error | Peak equity is zero | Add guard: np.where(peak > 1e-10, ...) | | profit_factor = 1.0 | No bars processed | Return NaN when n_bars is 0 | | Sharpe inflated 18% | Wrong annualization for data | Use sqrt(5) for session-filtered, sqrt(7) for 24/7 | | PSR/DSR not computed | Missing scipy | Install: `pip install scipy` | | Timestamps not parsed | Wrong format | Ensure Unix timestamps, not datetime strings |