{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": "# Multi-band feature extraction\n\n`light-curve` supports multi-band light curves natively in every feature extractor.\nPass `bands=[\"g\", \"r\", ...]` to the constructor to switch a feature into **multiband mode**:\nthe feature is evaluated independently per passband and the outputs are concatenated,\nwith names like `amplitude_g`, `amplitude_r`.\n\nThis tutorial covers:\n\n1. Building a multiband light curve\n2. Per-band feature extraction\n3. Pure-multiband color features (`ColorOfMedian`, `ColorOfMaximum`, `ColorOfMinimum`, `ColorSpread`)\n4. Mixed single-band + multiband `Extractor`\n5. Multiband `Bins` and `Periodogram`\n6. Batch processing with `.many()`\n7. Arrow input for multiband features"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %pip install light-curve pyarrow"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2",
   "metadata": {},
   "outputs": [],
   "source": [
    "import light_curve as licu\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(42)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3",
   "metadata": {},
   "source": [
    "## 1. Building a multiband light curve\n",
    "\n",
    "A multiband light curve is just a regular `(t, m, sigma)` triple plus a `band` string array\n",
    "that labels each observation.  We sort everything by time so we can pass `sorted=True` later.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4",
   "metadata": {},
   "outputs": [],
   "source": [
    "def make_multiband_lc(band_labels, n_per_band=80, rng=None):\n",
    "    \"\"\"Return time-sorted (t, m, sigma, band) with n_per_band obs per passband.\"\"\"\n",
    "    rng = np.random.default_rng(rng)\n",
    "    parts = []\n",
    "    for label in band_labels:\n",
    "        t_b = rng.uniform(0, 100, n_per_band)\n",
    "        m_b = rng.normal(15.0, 0.3, n_per_band)\n",
    "        s_b = np.full(n_per_band, 0.05)\n",
    "        b_b = np.full(n_per_band, label)\n",
    "        parts.append((t_b, m_b, s_b, b_b))\n",
    "    t, m, s, b = (np.concatenate([p[i] for p in parts]) for i in range(4))\n",
    "    idx = np.argsort(t)\n",
    "    return t[idx], m[idx], s[idx], b[idx]\n",
    "\n",
    "\n",
    "band_labels = [\"g\", \"r\", \"i\"]\n",
    "t, m, sigma, band = make_multiband_lc(band_labels, rng=rng)\n",
    "print(f\"Total observations: {len(t)}\")\n",
    "print(f\"Band counts: { {b: int(np.sum(band == b)) for b in band_labels} }\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5",
   "metadata": {},
   "source": "## 2. Per-band feature extraction\n\nPass `bands=` to any feature constructor to enter multiband mode.\nThe output array has `len(bands) * n_features_per_band` elements.\n\nBand labels can be **strings** *or* **integer IDs** (any NumPy integer dtype).\nInteger IDs are useful when survey data stores passbands as filter numbers\n(e.g. ZTF: 1 = g, 2 = r, 3 = i) and are slightly faster because they avoid\nstring hashing during band dispatch."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Amplitude per band — 3 values (one per passband)\n",
    "noband_amp = licu.Amplitude()\n",
    "noband_result = noband_amp(t, m, sigma)\n",
    "\n",
    "multiband_amp = licu.Amplitude(bands=band_labels)\n",
    "multiband_result = multiband_amp(t, m, sigma, band)\n",
    "\n",
    "print(f\"All-band amplitude: {noband_result[0]:.3f} mag\")\n",
    "for name, value in zip(multiband_amp.names, multiband_result):\n",
    "    print(f\"{name.removeprefix(\"amplitude_\")}-band amplitude: {value:.3f} mag\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7",
   "metadata": {},
   "outputs": [],
   "source": "# LinearFit per band using integer band IDs — no code changes needed in make_multiband_lc\n# numpy infers dtype=int64 from integer labels, which the feature accepts directly\nint_band_labels = [1, 2, 3]  # e.g. ZTF filter IDs: 1=g, 2=r, 3=i\nt_int, m_int, sigma_int, band_int = make_multiband_lc(int_band_labels, rng=rng)\n\nlf_int = licu.LinearFit(bands=int_band_labels)\nresult_int = lf_int(t_int, m_int, sigma_int, band_int, sorted=True)\n\nprint(\"Feature names with integer band IDs:\")\nfor name, val in zip(lf_int.names, result_int):\n    print(f\"  {name:35s} = {val:.4f}\")"
  },
  {
   "cell_type": "markdown",
   "id": "8",
   "metadata": {},
   "source": [
    "## 3. Pure-multiband color features\n",
    "\n",
    "Some features are **inherently multiband** — they always require `bands` and have no single-band mode.\n",
    "These compute cross-band statistics rather than per-band statistics.\n",
    "\n",
    "| Feature | Constructor | Output |\n",
    "|---------|-------------|--------|\n",
    "| `ColorOfMedian(bands)` | exactly 2 bands | `median(band[0]) − median(band[1])` |\n",
    "| `ColorOfMaximum(bands)` | exactly 2 bands | `max(band[0]) − max(band[1])` |\n",
    "| `ColorOfMinimum(bands)` | exactly 2 bands | `min(band[0]) − min(band[1])` |\n",
    "| `ColorSpread(bands)` | ≥2 bands | population std dev of per-band weighted mean magnitudes |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# g − r median color: filter to configured bands or it raises on unknown passbands\n",
    "gr = band != \"i\"\n",
    "com = licu.ColorOfMedian([\"g\", \"r\"])\n",
    "print(com.names[0], \"=\", com(t[gr], m[gr], sigma[gr], band[gr], sorted=True)[0].round(4), \"mag\")\n",
    "\n",
    "# spread of weighted-mean magnitudes across all three bands\n",
    "cs = licu.ColorSpread([\"g\", \"r\", \"i\"])\n",
    "print(cs.names[0], \"=\", cs(t, m, sigma, band, sorted=True)[0].round(4), \"mag\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "10",
   "metadata": {},
   "source": [
    "## 4. Mixed single-band + multiband `Extractor`\n",
    "\n",
    "`Extractor` accepts any combination of single-band and multiband features.\n",
    "Single-band features receive the **full** light curve; multiband features receive per-band splits.\n",
    "Output names and values are concatenated in declaration order.\n",
    "\n",
    "`ColorOfMedian` (a pure-multiband feature from section 3) fits naturally here alongside\n",
    "per-band and whole-curve features:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11",
   "metadata": {},
   "outputs": [],
   "source": [
    "ext = licu.Extractor(\n",
    "    licu.Amplitude(bands=band_labels),  # 3 values — per band\n",
    "    licu.WeightedMean(bands=band_labels),  # 3 values — per band\n",
    "    licu.ColorOfMedian(band_labels[:2]),  # 1 value  — g − r median color\n",
    "    licu.ReducedChi2(),  # 1 value  — whole light curve (single-band)\n",
    ")\n",
    "\n",
    "result_ext = ext(t, m, sigma, band=band, sorted=True)\n",
    "\n",
    "print(\"Feature names and values:\")\n",
    "for name, val in zip(ext.names, result_ext):\n",
    "    print(f\"  {name:35s} = {val:.4f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12",
   "metadata": {},
   "source": [
    "## 5a. Multiband `Bins`\n",
    "\n",
    "`Bins` bins observations by time window before evaluating inner features.\n",
    "Add `bands=` to apply the same binning independently per passband."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "13",
   "metadata": {},
   "outputs": [],
   "source": [
    "bins_mb = licu.Bins(\n",
    "    [licu.Amplitude(), licu.Mean()],\n",
    "    window=5.0, offset=0.0,\n",
    "    bands=[\"g\", \"r\"],\n",
    ")\n",
    "\n",
    "t2, m2, s2, b2 = make_multiband_lc([\"g\", \"r\"], n_per_band=100, rng=rng)\n",
    "result_bins = bins_mb(t2, m2, s2, band=b2)\n",
    "\n",
    "print(\"Bins multiband names:\", bins_mb.names)\n",
    "print(\"Values:              \", result_bins)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14",
   "metadata": {},
   "source": [
    "## 5b. Multiband `Periodogram`\n",
    "\n",
    "`Periodogram` supports multiband mode via `MultiColorPeriodogram`, which finds a single best\n",
    "period that fits all passbands simultaneously.  Use `multiband_normalization='chi2'` for the\n",
    "chi-squared-weighted variant."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "15",
   "metadata": {},
   "outputs": [],
   "source": [
    "pg = licu.Periodogram(peaks=2, bands=[\"g\", \"r\"], multiband_normalization=\"chi2\")\n",
    "\n",
    "t_pg, m_pg, s_pg, b_pg = make_multiband_lc([\"g\", \"r\"], n_per_band=150, rng=rng)\n",
    "result_pg = pg(t_pg, m_pg, s_pg, band=b_pg)\n",
    "\n",
    "print(\"Periodogram names:\", pg.names)\n",
    "print(\"Best period (days):\", 1.0 / result_pg[0] if result_pg[0] > 0 else \"N/A\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "16",
   "metadata": {},
   "source": [
    "## 6. Batch processing with `.many()`\n",
    "\n",
    "`.many()` processes a **list of light curves** in parallel.\n",
    "For multiband features each element must be a four-tuple `(t, m, sigma, band)`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "17",
   "metadata": {},
   "outputs": [],
   "source": [
    "feature = licu.Amplitude(bands=[\"g\", \"r\"])\n",
    "\n",
    "# Generate 500 two-band light curves\n",
    "n_lcs = 500\n",
    "lcs = [make_multiband_lc([\"g\", \"r\"], n_per_band=60, rng=rng) for _ in range(n_lcs)]\n",
    "\n",
    "# .many() returns shape (n_lcs, n_features)\n",
    "results = feature.many(lcs, sorted=True)\n",
    "print(f\"Shape: {results.shape}\")\n",
    "print(f\"Mean amplitude_g: {results[:, 0].mean():.4f}\")\n",
    "print(f\"Mean amplitude_r: {results[:, 1].mean():.4f}\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18",
   "metadata": {},
   "source": "## 7. Arrow input for multiband `.many()`\n\nFor zero-copy data exchange from polars / pyarrow / nanoarrow, pass an Arrow\n`List<Struct<...>>` array and include `\"band\"` in the `arrow_fields` dict.\n\nSee the [batch processing tutorial](../batch_processing/) for more complete examples\nincluding nested-pandas and Polars."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "19",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy.testing as npt\n",
    "import pyarrow as pa\n",
    "\n",
    "\n",
    "def make_arrow_lcs(lcs):\n",
    "    \"\"\"Convert list of (t, m, sigma, band) tuples to a pyarrow List<Struct>.\"\"\"\n",
    "    struct_type = pa.struct([\n",
    "        (\"t\", pa.float64()),\n",
    "        (\"m\", pa.float64()),\n",
    "        (\"sigma\", pa.float64()),\n",
    "        (\"band\", pa.utf8()),\n",
    "    ])\n",
    "    rows_per_lc = []\n",
    "    for t_i, m_i, s_i, b_i in lcs:\n",
    "        rows_per_lc.append([\n",
    "            {\"t\": float(t_i[j]), \"m\": float(m_i[j]),\n",
    "             \"sigma\": float(s_i[j]), \"band\": str(b_i[j])}\n",
    "            for j in range(len(t_i))\n",
    "        ])\n",
    "    return pa.array(rows_per_lc, type=pa.list_(struct_type))\n",
    "\n",
    "\n",
    "arrow_arr = make_arrow_lcs(lcs[:10])\n",
    "\n",
    "result_arrow = feature.many(\n",
    "    arrow_arr,\n",
    "    sorted=True,\n",
    "    arrow_fields={\"t\": \"t\", \"m\": \"m\", \"sigma\": \"sigma\", \"band\": \"band\"},\n",
    ")\n",
    "print(f\"Arrow result shape: {result_arrow.shape}\")\n",
    "\n",
    "# Verify against list input\n",
    "result_list = feature.many(lcs[:10], sorted=True)\n",
    "\n",
    "npt.assert_array_equal(result_list, result_arrow)\n",
    "print(\"Arrow and list results match ✓\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "20",
   "metadata": {},
   "source": "## Summary\n\n| Feature | Constructor | Output |\n|---------|-------------|--------|\n| Any Rust feature | `Feature(bands=[...])` | `n_bands × k` values per call |\n| Any Rust feature (integer IDs) | `Feature(bands=[0, 1, 2])` | same, slightly faster — band arrays must be integer dtype |\n| `Extractor` | Mix freely | Single-band and multiband combined |\n| `ColorOfMedian` / `ColorOfMaximum` / `ColorOfMinimum` | `cls([\"g\", \"r\"])` or `cls([0, 1])` | 1 value — cross-band statistic |\n| `ColorSpread` | `ColorSpread([\"g\", \"r\", \"i\"])` or `ColorSpread([0, 1, 2])` | 1 value — std dev of band means |\n| `Bins` | `Bins([...], window=…, bands=[...])` | Per-band binned features |\n| `Periodogram` | `Periodogram(bands=[...])` | Joint period, `n_peaks` values |\n| `.many()` | `feature.many([(t, m, sigma, band), ...])` | `(n_lcs, n_features)` array |\n| Arrow `.many()` | add `arrow_fields={..., \"band\": \"...\"}` | Same, zero-copy |"
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.10.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}