# OmniOracle

**Automatic discovery of non-trivial statistical truths from heterogeneous public data.**

OmniOracle ingests hundreds of public time series across domains (economics, commodities, labor, prices, demographics) and automatically discovers statistically significant lagged relationships using a rigorous multi-stage pipeline. No human hypotheses needed -- the engine finds them, validates them, and filters out the noise.

[![CI](https://github.com/cesabici-bit/omni-oracle/actions/workflows/ci.yml/badge.svg)](https://github.com/cesabici-bit/omni-oracle/actions/workflows/ci.yml)
[![Python 3.12+](https://img.shields.io/badge/python-3.12%2B-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)

## The Problem

Public data contains thousands of latent relationships. Economists know some (Oil -> CPI, Fed Funds -> Yields), but manually screening 500+ time series (125,000+ pairwise combinations) is intractable. Most automated approaches drown in false positives from multiple testing.

OmniOracle tackles this with a 6-stage statistical pipeline that goes from raw data to validated, ranked hypotheses -- automatically.

## Pipeline

```
Public Data APIs (FRED, World Bank, EIA, NOAA)
         |
   [Ingest + Normalize]       551 monthly time series
         |
   [Quality + Stationarity]   ADF/KPSS tests, differencing
         |
   [MI Screening]             Mutual Information: discard 99% of pairs
         |
   [Lagged MI Direction]      Non-linear directional test + optimal lag
         |
   [FDR Correction]           Benjamini-Hochberg at alpha=0.05
         |
   [OOS Validation]           Ridge/RF walk-forward, incremental R2
         |
   [Post-Filters]             Blacklist derived series, remove identity
         |                    pairs, high-correlation duplicates
   [Walk-Forward CV]          Multi-window robustness check
         |
   Ranked Hypothesis Cards
```

### Why This Order Matters

Each stage is more expensive than the previous one. MI screening (fast, non-parametric) eliminates 99% of pairs before the expensive directional test runs. FDR correction prevents the multiple-testing explosion. OOS validation catches overfitting. Post-filters catch tautologies (see [Lessons Learned](#lessons-learned)). Walk-forward CV catches regime-dependent relationships.

## Key Results

### Discovery: It Works

From 551 time series (253 FRED + 298 World Bank), pipeline v2 (Lagged MI + Ridge/RF walk-forward):

| Metric | Value |
|--------|-------|
| Clean hypotheses | 6,882 |
| Known relationships rediscovered | **8/8** (100%) |
| Walk-forward ROBUST signals | 5 (4 adjusted-robust) |

The engine finds known economic relationships **without being told to look for them**:

- Okun's Law (unemployment <-> GDP growth)
- Oil prices -> CPI (3-6 month lag)
- Fed Funds Rate <-> Treasury yields
- M2 money supply -> inflation
- Corporate credit spreads -> economic activity
- Manufacturing hours -> manufacturing employment
- Consumer confidence -> retail spending
- Housing starts -> construction employment

### Trading: It Doesn't Work

The 5 ROBUST signals were backtested with a simple directional strategy (Ridge, 60/40 train/test split). **None beat the random benchmark** (no Sharpe ratio > 2 sigma above random shuffles):

| Signal | Lag | OOS R2 | Backtest Sharpe | vs Random |
|--------|-----|--------|-----------------|-----------|
| Imports -> Gas Price | 8 | 0.57 | -0.10 | NO |
| Imports -> Gas Price | 3 | 0.53 | -0.57 | NO |
| Imports -> Trade Balance | 11 | 0.52 | -0.15 | NO |
| USD/EUR -> Semiconductor | 8 | 0.22 | -0.15 | NO |
| Fed Collateral -> Exports | 4 | 0.21 | +0.14 | NO |

**Why high R2 but no trading edge?** Walk-forward R2 measures variance explained -- the model captures the *shape* of the relationship. But directional trading needs consistent *sign* prediction, and with near-zero coefficients or regime-shifting relationships, the direction is essentially a coin flip. Additionally, Imports -> Trade Balance is near-tautological (imports are an accounting component of trade balance).

### Conclusions

1. **The discovery engine works**: it reliably finds genuine statistical relationships, including all known benchmarks
2. **Public monthly macro data has no tradable edge**: if a signal in FRED data were actionable, it would have been arbitraged away long ago
3. **Statistical significance != economic significance**: a relationship can be statistically robust but have zero practical value
4. **Honest negative results are valuable**: knowing that automated discovery from public data doesn't produce alpha is useful information for anyone considering this path

## Installation

```bash
git clone https://github.com/cesabici-bit/omni-oracle.git
cd omni-oracle

python -m venv .venv
source .venv/bin/activate  # or .venv\Scripts\activate on Windows
pip install -e ".[dev]"

# Set FRED API key (free: https://fred.stlouisfed.org/docs/api/api_key.html)
echo "FRED_API_KEY=your_key_here" > .env
```

## Usage

```bash
# Run all checks (lint + test + cross-tool verification)
make check-all

# Run tests only
pytest tests/ -v

# Ingest data (requires FRED_API_KEY)
python -m src.ingest.fred --limit 500
python -m src.ingest.worldbank --limit 300

# Run discovery pipeline
python -m src.run_f5

# Apply filters + cross-validation (fast, from cache)
python -m src.run_f5_filter --refilter

# Re-run full pipeline from scratch (~50 min)
python -m src.run_f5_filter --recompute

# Backtest ROBUST signals
python -m src.backtest
```

## Output Format

Each discovery is a **Hypothesis Card**:

```
+-----------------------------------------------------------+
| #6  Score: 7.1/10  [HIGH]
+-----------------------------------------------------------+
|  ICE BofA US Corporate Index Option-Adjusted Spread
|    x->y  (lag: 2 periods)
|  Chicago Fed National Activity Index
|
|  MI: 0.1335  |  Direction p: 9.90e-03  |  OOS R2: 0.2581
+-----------------------------------------------------------+
```

## Verification

5-level verification framework designed to catch errors at every stage:

| Level | What | How |
|-------|------|-----|
| **L1** Unit | Each function does what it claims | 46 unit tests |
| **L2** Domain | Results are plausible in the domain | 11 tests with values from published sources (Granger 1969, FRED documented relationships) |
| **L3** Property | Statistical invariants hold for any valid input | 6 property-based tests (Hypothesis library) |
| **L4** Golden | Pipeline output is stable and human-reviewed | Smoke test snapshot, approved once |
| **L5** Real data | System rediscovers known truths from literature | 10 tests against documented economic relationships |

**Cross-tool verification (M4)**: Alternative MI (histogram-based) and Granger (manual OLS) implementations in `verify/` confirm main pipeline results.

Total: **118 tests**, all passing.

## Lessons Learned

### EC-003: Derived Series Create Tautological Discoveries

The St. Louis Fed Price Pressures Measure (STLPPM) is a FAVAR model that takes 104 input series (including PCE and commodity prices) and outputs a 12-month forward inflation probability. When we included STLPPM as a discoverable variable, the engine correctly found that PCE and Brent Crude "predict" it -- but this is circular (input predicts output of model), not a genuine causal discovery.

**Fix**: Blacklist derived/model-based series. Before including any series in the discovery pool, verify: (1) is it forward-looking? (2) are its inputs already in the pool?

**Reference**: Jackson, Kliesen, Owyang (2015) "A Measure of Price Pressures", Federal Reserve Bank of St. Louis Review, 97(1), pp.25-52.

### High Walk-Forward R2 Does Not Imply Tradability

Walk-forward cross-validation measures whether a model consistently explains variance across time windows. A signal can have R2 = 0.57 (strong) but produce Sharpe = -0.10 (useless for trading) because:
- The regression coefficient can be near-zero (direction prediction is noise)
- The relationship can be near-tautological (accounting identity, not causal)
- Regime shifts can invert the coefficient sign between windows

**Takeaway**: OOS R2 validates *statistical* relationships. *Economic* significance requires separate testing (backtest, position sizing, transaction costs).

## Project Structure

```
omni-oracle/
  src/
    ingest/        # Data fetchers (FRED, World Bank, EIA, NOAA)
    storage/       # DuckDB repository layer
    preprocess/    # Quality checks, stationarity transforms
    discovery/     # MI screening, lagged MI directional test
    validation/    # FDR correction, OOS temporal validation (Ridge/RF)
    scoring/       # Composite ranking
    output/        # Hypothesis cards, trading reports, walk-forward CV
    pipeline.py    # End-to-end orchestrator
    backtest.py    # Trading signal backtester
  tests/           # 118 tests (L1-L5 verification levels)
  verify/          # M4 cross-tool verification
```

## Tech Stack

Python 3.12+ | Pandas | SciPy | Scikit-learn | Statsmodels | DuckDB | FRED API | World Bank API

## License

MIT

## Acknowledgments

Development assisted by [Claude Code](https://claude.ai/claude-code) (Anthropic).

## Disclaimer

All results are **statistical associations**, not proof of causation. This is a research tool, not financial advice. Past statistical relationships do not guarantee future persistence.