{ "cells": [ { "cell_type": "markdown", "id": "0", "metadata": {}, "source": "# Multi-band feature extraction\n\n`light-curve` supports multi-band light curves natively in every feature extractor.\nPass `bands=[\"g\", \"r\", ...]` to the constructor to switch a feature into **multiband mode**:\nthe feature is evaluated independently per passband and the outputs are concatenated,\nwith names like `amplitude_g`, `amplitude_r`.\n\nThis tutorial covers:\n\n1. Building a multiband light curve\n2. Per-band feature extraction\n3. Pure-multiband color features (`ColorOfMedian`, `ColorOfMaximum`, `ColorOfMinimum`, `ColorSpread`)\n4. Mixed single-band + multiband `Extractor`\n5. Multiband `Bins` and `Periodogram`\n6. Batch processing with `.many()`\n7. Arrow input for multiband features" }, { "cell_type": "code", "execution_count": null, "id": "1", "metadata": {}, "outputs": [], "source": [ "# %pip install light-curve pyarrow" ] }, { "cell_type": "code", "execution_count": null, "id": "2", "metadata": {}, "outputs": [], "source": [ "import light_curve as licu\n", "import numpy as np\n", "\n", "rng = np.random.default_rng(42)" ] }, { "cell_type": "markdown", "id": "3", "metadata": {}, "source": [ "## 1. Building a multiband light curve\n", "\n", "A multiband light curve is just a regular `(t, m, sigma)` triple plus a `band` string array\n", "that labels each observation. 
We sort everything by time so we can pass `sorted=True` later.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "4", "metadata": {}, "outputs": [], "source": [ "def make_multiband_lc(band_labels, n_per_band=80, rng=None):\n", "    \"\"\"Return time-sorted (t, m, sigma, band) with n_per_band obs per passband.\"\"\"\n", "    rng = np.random.default_rng(rng)\n", "    parts = []\n", "    for label in band_labels:\n", "        t_b = rng.uniform(0, 100, n_per_band)\n", "        m_b = rng.normal(15.0, 0.3, n_per_band)\n", "        s_b = np.full(n_per_band, 0.05)\n", "        b_b = np.full(n_per_band, label)\n", "        parts.append((t_b, m_b, s_b, b_b))\n", "    t, m, s, b = (np.concatenate([p[i] for p in parts]) for i in range(4))\n", "    idx = np.argsort(t)\n", "    return t[idx], m[idx], s[idx], b[idx]\n", "\n", "\n", "band_labels = [\"g\", \"r\", \"i\"]\n", "t, m, sigma, band = make_multiband_lc(band_labels, rng=rng)\n", "print(f\"Total observations: {len(t)}\")\n", "print(f\"Band counts: { {b: int(np.sum(band == b)) for b in band_labels} }\")" ] }, { "cell_type": "markdown", "id": "5", "metadata": {}, "source": "## 2. Per-band feature extraction\n\nPass `bands=` to any feature constructor to enter multiband mode.\nThe output array has `len(bands) * n_features_per_band` elements.\n\nBand labels can be **strings** *or* **integer IDs** (any NumPy integer dtype).\nInteger IDs are useful when survey data stores passbands as filter numbers\n(e.g. ZTF: 1 = g, 2 = r, 3 = i) and are slightly faster because they avoid\nstring hashing during band dispatch." 
}, { "cell_type": "code", "execution_count": null, "id": "6", "metadata": {}, "outputs": [], "source": [ "# Amplitude per band — 3 values (one per passband)\n", "noband_amp = licu.Amplitude()\n", "noband_result = noband_amp(t, m, sigma)\n", "\n", "multiband_amp = licu.Amplitude(bands=band_labels)\n", "multiband_result = multiband_amp(t, m, sigma, band)\n", "\n", "print(f\"All-band amplitude: {noband_result[0]:.3f} mag\")\n", "for name, value in zip(multiband_amp.names, multiband_result):\n", "    # Single quotes inside the f-string keep this valid on Python < 3.12\n", "    print(f\"{name.removeprefix('amplitude_')}-band amplitude: {value:.3f} mag\")" ] }, { "cell_type": "code", "execution_count": null, "id": "7", "metadata": {}, "outputs": [], "source": "# LinearFit per band using integer band IDs — no code changes needed in make_multiband_lc\n# NumPy infers an integer dtype (int64 on most platforms) from integer labels, which the feature accepts directly\nint_band_labels = [1, 2, 3]  # e.g. ZTF filter IDs: 1=g, 2=r, 3=i\nt_int, m_int, sigma_int, band_int = make_multiband_lc(int_band_labels, rng=rng)\n\nlf_int = licu.LinearFit(bands=int_band_labels)\nresult_int = lf_int(t_int, m_int, sigma_int, band_int, sorted=True)\n\nprint(\"Feature names with integer band IDs:\")\nfor name, val in zip(lf_int.names, result_int):\n    print(f\" {name:35s} = {val:.4f}\")" }, { "cell_type": "markdown", "id": "8", "metadata": {}, "source": [ "## 3. 
Pure-multiband color features\n", "\n", "Some features are **inherently multiband** — they always require `bands` and have no single-band mode.\n", "These compute cross-band statistics rather than per-band statistics.\n", "\n", "| Feature | Constructor | Output |\n", "|---------|-------------|--------|\n", "| `ColorOfMedian(bands)` | exactly 2 bands | `median(band[0]) − median(band[1])` |\n", "| `ColorOfMaximum(bands)` | exactly 2 bands | `max(band[0]) − max(band[1])` |\n", "| `ColorOfMinimum(bands)` | exactly 2 bands | `min(band[0]) − min(band[1])` |\n", "| `ColorSpread(bands)` | ≥2 bands | population std dev of per-band weighted mean magnitudes |" ] }, { "cell_type": "code", "execution_count": null, "id": "9", "metadata": {}, "outputs": [], "source": [ "# g − r median color: filter to configured bands or it raises on unknown passbands\n", "gr = band != \"i\"\n", "com = licu.ColorOfMedian([\"g\", \"r\"])\n", "print(com.names[0], \"=\", com(t[gr], m[gr], sigma[gr], band[gr], sorted=True)[0].round(4), \"mag\")\n", "\n", "# spread of weighted-mean magnitudes across all three bands\n", "cs = licu.ColorSpread([\"g\", \"r\", \"i\"])\n", "print(cs.names[0], \"=\", cs(t, m, sigma, band, sorted=True)[0].round(4), \"mag\")" ] }, { "cell_type": "markdown", "id": "10", "metadata": {}, "source": [ "## 4. 
Mixed single-band + multiband `Extractor`\n", "\n", "`Extractor` accepts any combination of single-band and multiband features.\n", "Single-band features receive the **full** light curve; multiband features receive per-band splits.\n", "Output names and values are concatenated in declaration order.\n", "\n", "`ColorOfMedian` (a pure-multiband feature from section 3) fits naturally here alongside\n", "per-band and whole-curve features:" ] }, { "cell_type": "code", "execution_count": null, "id": "11", "metadata": {}, "outputs": [], "source": [ "ext = licu.Extractor(\n", " licu.Amplitude(bands=band_labels), # 3 values — per band\n", " licu.WeightedMean(bands=band_labels), # 3 values — per band\n", " licu.ColorOfMedian(band_labels[:2]), # 1 value — g − r median color\n", " licu.ReducedChi2(), # 1 value — whole light curve (single-band)\n", ")\n", "\n", "result_ext = ext(t, m, sigma, band=band, sorted=True)\n", "\n", "print(\"Feature names and values:\")\n", "for name, val in zip(ext.names, result_ext):\n", " print(f\" {name:35s} = {val:.4f}\")" ] }, { "cell_type": "markdown", "id": "12", "metadata": {}, "source": [ "## 5a. Multiband `Bins`\n", "\n", "`Bins` bins observations by time window before evaluating inner features.\n", "Add `bands=` to apply the same binning independently per passband." ] }, { "cell_type": "code", "execution_count": null, "id": "13", "metadata": {}, "outputs": [], "source": [ "bins_mb = licu.Bins(\n", " [licu.Amplitude(), licu.Mean()],\n", " window=5.0, offset=0.0,\n", " bands=[\"g\", \"r\"],\n", ")\n", "\n", "t2, m2, s2, b2 = make_multiband_lc([\"g\", \"r\"], n_per_band=100, rng=rng)\n", "result_bins = bins_mb(t2, m2, s2, band=b2)\n", "\n", "print(\"Bins multiband names:\", bins_mb.names)\n", "print(\"Values: \", result_bins)\n" ] }, { "cell_type": "markdown", "id": "14", "metadata": {}, "source": [ "## 5b. 
Multiband `Periodogram`\n", "\n", "`Periodogram` supports multiband mode via `MultiColorPeriodogram`, which finds a single best\n", "period that fits all passbands simultaneously. Use `multiband_normalization='chi2'` for the\n", "chi-squared-weighted variant." ] }, { "cell_type": "code", "execution_count": null, "id": "15", "metadata": {}, "outputs": [], "source": [ "pg = licu.Periodogram(peaks=2, bands=[\"g\", \"r\"], multiband_normalization=\"chi2\")\n", "\n", "t_pg, m_pg, s_pg, b_pg = make_multiband_lc([\"g\", \"r\"], n_per_band=150, rng=rng)\n", "result_pg = pg(t_pg, m_pg, s_pg, band=b_pg)\n", "\n", "print(\"Periodogram names:\", pg.names)\n", "print(\"Best period (days):\", 1.0 / result_pg[0] if result_pg[0] > 0 else \"N/A\")\n" ] }, { "cell_type": "markdown", "id": "16", "metadata": {}, "source": [ "## 6. Batch processing with `.many()`\n", "\n", "`.many()` processes a **list of light curves** in parallel.\n", "For multiband features each element must be a four-tuple `(t, m, sigma, band)`." ] }, { "cell_type": "code", "execution_count": null, "id": "17", "metadata": {}, "outputs": [], "source": [ "feature = licu.Amplitude(bands=[\"g\", \"r\"])\n", "\n", "# Generate 500 two-band light curves\n", "n_lcs = 500\n", "lcs = [make_multiband_lc([\"g\", \"r\"], n_per_band=60, rng=rng) for _ in range(n_lcs)]\n", "\n", "# .many() returns shape (n_lcs, n_features)\n", "results = feature.many(lcs, sorted=True)\n", "print(f\"Shape: {results.shape}\")\n", "print(f\"Mean amplitude_g: {results[:, 0].mean():.4f}\")\n", "print(f\"Mean amplitude_r: {results[:, 1].mean():.4f}\")\n" ] }, { "cell_type": "markdown", "id": "18", "metadata": {}, "source": "## 7. Arrow input for multiband `.many()`\n\nFor zero-copy data exchange from polars / pyarrow / nanoarrow, pass an Arrow\n`List<Struct<t, m, sigma, band>>` array and include `\"band\"` in the `arrow_fields` dict.\n\nSee the [batch processing tutorial](../batch_processing/) for more complete examples\nincluding nested-pandas and Polars." 
}, { "cell_type": "code", "execution_count": null, "id": "19", "metadata": {}, "outputs": [], "source": [ "import numpy.testing as npt\n", "import pyarrow as pa\n", "\n", "\n", "def make_arrow_lcs(lcs):\n", "    \"\"\"Convert list of (t, m, sigma, band) tuples to a pyarrow List<Struct>.\"\"\"\n", "    struct_type = pa.struct([\n", "        (\"t\", pa.float64()),\n", "        (\"m\", pa.float64()),\n", "        (\"sigma\", pa.float64()),\n", "        (\"band\", pa.utf8()),\n", "    ])\n", "    rows_per_lc = []\n", "    for t_i, m_i, s_i, b_i in lcs:\n", "        rows_per_lc.append([\n", "            {\"t\": float(t_i[j]), \"m\": float(m_i[j]),\n", "             \"sigma\": float(s_i[j]), \"band\": str(b_i[j])}\n", "            for j in range(len(t_i))\n", "        ])\n", "    return pa.array(rows_per_lc, type=pa.list_(struct_type))\n", "\n", "\n", "arrow_arr = make_arrow_lcs(lcs[:10])\n", "\n", "result_arrow = feature.many(\n", "    arrow_arr,\n", "    sorted=True,\n", "    arrow_fields={\"t\": \"t\", \"m\": \"m\", \"sigma\": \"sigma\", \"band\": \"band\"},\n", ")\n", "print(f\"Arrow result shape: {result_arrow.shape}\")\n", "\n", "# Verify against list input\n", "result_list = feature.many(lcs[:10], sorted=True)\n", "\n", "npt.assert_array_equal(result_list, result_arrow)\n", "print(\"Arrow and list results match ✓\")" ] }, { "cell_type": "markdown", "id": "20", "metadata": {}, "source": "## Summary\n\n| Feature | Constructor | Output |\n|---------|-------------|--------|\n| Any Rust feature | `Feature(bands=[...])` | `n_bands × k` values per call |\n| Any Rust feature (integer IDs) | `Feature(bands=[0, 1, 2])` | same, slightly faster — band arrays must be integer dtype |\n| `Extractor` | Mix freely | Single-band and multiband combined |\n| `ColorOfMedian` / `ColorOfMaximum` / `ColorOfMinimum` | `cls([\"g\", \"r\"])` or `cls([0, 1])` | 1 value — cross-band statistic |\n| `ColorSpread` | `ColorSpread([\"g\", \"r\", \"i\"])` or `ColorSpread([0, 1, 2])` | 1 value — std dev of band means |\n| `Bins` | `Bins([...], window=…, bands=[...])` | Per-band binned features |\n| 
`Periodogram` | `Periodogram(bands=[...])` | Joint period, `n_peaks` values |\n| `.many()` | `feature.many([(t, m, sigma, band), ...])` | `(n_lcs, n_features)` array |\n| Arrow `.many()` | add `arrow_fields={..., \"band\": \"...\"}` | Same, zero-copy |" } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.10.0" } }, "nbformat": 4, "nbformat_minor": 5 }