---
name: adding-api-sources
description: Use when implementing a new data source adapter for metapyle, before writing any source code
---

# Adding API Sources to Metapyle

## Overview

Add new financial data source adapters following TDD and established patterns. Each source provides `fetch()` and `get_metadata()` methods with lazy imports for optional dependencies.

**Core principle:** Use `brainstorming` skill first for design decisions, then implement following established patterns.

## Workflow

1. **Design** - Use `brainstorming` skill to decide data model mapping
2. **Plan** - Use `writing-plans` skill for implementation plan
3. **Implement** - Follow TDD with subagents (see Quick Reference)

## Design Questions (Brainstorming Phase)

Before coding, answer these questions using the `brainstorming` skill:

| Question | Why It Matters |
|----------|----------------|
| What maps to `symbol`? | Primary identifier (ticker, bbid, series ID) |
| What maps to `field`? | Secondary identifier if needed (PX_LAST, dataset::column) |
| Need `params` field? | Extra filters (tenor, location, deltaStrike) |
| Authentication model? | External (user calls auth) or internal (credentials passed) |
| Batch strategy? | Single call for all symbols, or group by some key? |
| Column naming? | Symbol only, or symbol::field for uniqueness? |
| Metadata available? | What can `get_metadata()` return? |

## Quick Reference

| Step | Files | Key Actions |
|------|-------|-------------|
| 1. Branch | — | `git checkout -b feature/-source` |
| 2. Skeleton | `sources/.py` | Lazy import + class with `NotImplementedError` |
| 3. Export | `sources/__init__.py` | Add import + `__all__` |
| 4. Tests | `tests/unit/test_sources_.py` | Mock-based tests (RED) |
| 5. Implement | `sources/.py` | `fetch()` then `get_metadata()` (GREEN) |
| 6. Config | `pyproject.toml` | Optional dep + mypy ignore |
| 7. Verify | — | pytest, mypy, ruff |

## Batch Fetch API

Sources receive batched requests via `Sequence[FetchRequest]`:

```python
from collections.abc import Sequence

from metapyle.sources.base import BaseSource, FetchRequest, make_column_name, register_source


@register_source("")
class Source(BaseSource):
    def fetch(
        self,
        requests: Sequence[FetchRequest],
        start: str,
        end: str,
    ) -> pd.DataFrame:
        """
        Parameters
        ----------
        requests : Sequence[FetchRequest]
            Each has: symbol, field (optional), path (optional), params (optional)
        start, end : str
            ISO dates (YYYY-MM-DD)

        Returns
        -------
        pd.DataFrame
            DatetimeIndex, columns named via make_column_name(symbol, field)
        """
        if not requests:
            return pd.DataFrame()

        # ... implementation
```

## FetchRequest Fields

```python
@dataclass(frozen=True, slots=True, kw_only=True)
class FetchRequest:
    symbol: str                            # Required - primary identifier
    field: str | None = None               # Optional - e.g., "PX_LAST", "dataset::col"
    path: str | None = None                # Optional - for localfile source
    params: dict[str, Any] | None = None   # Optional - extra filters
```

## Column Naming

Always use `make_column_name()` for output columns:

```python
from metapyle.sources.base import make_column_name

# In fetch(), rename columns:
for req in requests:
    col_name = make_column_name(req.symbol, req.field)  # "AAPL::PX_LAST" or "AAPL"
    result[col_name] = data[req.symbol]
```

## Batch Grouping Pattern

When API requires grouping (e.g., by dataset):

```python
def fetch(self, requests: Sequence[FetchRequest], start: str, end: str) -> pd.DataFrame:
    # Group by some key (dataset_id, field type, etc.)
    groups: dict[str, list[FetchRequest]] = {}
    for req in requests:
        key = extract_key(req.field)  # Your grouping logic
        groups.setdefault(key, []).append(req)

    # Fetch each group (potentially in parallel)
    result_dfs: list[pd.DataFrame] = []
    for key, group_requests in groups.items():
        symbols = [req.symbol for req in group_requests]
        df = api.batch_fetch(key, symbols, start, end)
        result_dfs.append(df)

    # Merge results
    result = result_dfs[0]
    for df in result_dfs[1:]:
        result = result.join(df, how="outer")
    return result
```

## Lazy Import Pattern

```python
_LIB_AVAILABLE: bool | None = None
_lib_modules: dict[str, Any] = {}


def _get_lib() -> dict[str, Any]:
    """Lazy import of library modules."""
    global _LIB_AVAILABLE, _lib_modules
    if _LIB_AVAILABLE is None:
        try:
            from library import Module1, Module2
            _lib_modules = {"Module1": Module1, "Module2": Module2}
            _LIB_AVAILABLE = True
        except (ImportError, Exception):
            _lib_modules = {}
            _LIB_AVAILABLE = False
    return _lib_modules
```

## Exception Handling

```python
try:
    data = api.fetch(symbols, start, end)
except (FetchError, NoDataError):
    raise  # Re-raise our exceptions as-is
except Exception as e:
    logger.error("fetch_failed: symbols=%s, error=%s", symbols, str(e))
    raise FetchError(f"API error: {e}") from e

if data.empty:
    raise NoDataError(f"No data returned for {symbols}")
```

## Test Pattern

```python
class TestSourceFetch:
    def test_single_request(self) -> None:
        with patch("metapyle.sources.._get_lib") as mock_get:
            mock_lib = {"API": MagicMock()}
            mock_lib["API"].fetch.return_value = mock_data
            mock_get.return_value = mock_lib

            source = Source()
            requests = [FetchRequest(symbol="SYM", field="FIELD")]
            df = source.fetch(requests, "2024-01-01", "2024-12-31")

            assert "SYM::FIELD" in df.columns
            assert isinstance(df.index, pd.DatetimeIndex)
```

## pyproject.toml

```toml
[project.optional-dependencies]
= [""]

[[tool.mypy.overrides]]
module = ["", ".*"]
ignore_missing_imports = true
```

## Common Mistakes

| Mistake | Fix |
|---------|-----|
| Wrong `fetch()` signature | Must be `fetch(requests: Sequence[FetchRequest], start, end)` |
| Import at module level | Use lazy import pattern with `_get_lib()` |
| Manual column naming | Use `make_column_name(symbol, field)` |
| f-strings in logging | Use `logger.debug("msg: %s", var)` |
| Missing empty request check | Return `pd.DataFrame()` if `not requests` |
| Catching exceptions silently | Re-raise `FetchError`/`NoDataError`, wrap others |

## TDD Order

1. **RED:** Write test for `_get_lib()` (library not installed)
2. **GREEN:** Implement lazy import
3. **RED:** Write test for single request fetch
4. **GREEN:** Implement basic fetch
5. **RED:** Write test for batch fetch
6. **GREEN:** Implement batch handling
7. **RED:** Write error handling tests
8. **GREEN:** Implement error handling
9. **VERIFY:** Run full test suite, ruff, mypy