{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "# Feature extractors: basics\n",
    "\n",
    "This tutorial introduces the `light_curve` feature extractor interface:\n",
    "creating features, combining them with [`Extractor`](../api/meta/#light_curve.Extractor),\n",
    "reading names and descriptions, batch processing, and multiband light curves."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %pip install light-curve nested-pandas polars universal-pathlib"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2",
   "metadata": {},
   "source": "## Single feature\n\nEach feature class is instantiated once, and the resulting object is called like a function: it accepts `(t, m, sigma)` arrays and returns a NumPy array of feature values.\nThe `.names` attribute lists the output column names.\n\nHere we use [`Amplitude`](../api/variability/#light_curve.Amplitude), the half peak-to-peak amplitude:"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3",
   "metadata": {},
   "outputs": [],
   "source": [
    "import light_curve as licu\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(42)\n",
    "t = np.sort(rng.uniform(0, 100, 200))\n",
    "m = 15.0 + 0.3 * np.sin(2 * np.pi * t / 20) + rng.normal(0, 0.05, 200)\n",
    "err = np.full(200, 0.05)\n",
    "\n",
    "amp = licu.Amplitude()\n",
    "result = amp(t, m, err)\n",
    "print(f'names:  {amp.names}')\n",
    "print(f'result: {result}')"
   ]
  },
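  {
   "cell_type": "markdown",
   "id": "3a",
   "metadata": {},
   "source": [
    "As a cross-check, the cell below reproduces the same value in plain NumPy. It is a minimal sketch that assumes `Amplitude` equals half of `max(m) - min(m)`.\n",
    "`half_peak_to_peak` is a hypothetical helper and is not part of `light_curve`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def half_peak_to_peak(m):\n",
    "    # Hypothetical helper; assumes Amplitude = (max(m) - min(m)) / 2\n",
    "    m = np.asarray(m, dtype=np.float64)\n",
    "    return 0.5 * (m.max() - m.min())\n",
    "\n",
    "\n",
    "# Regenerate the synthetic light curve used in the cell above (same seed, same calls)\n",
    "rng = np.random.default_rng(42)\n",
    "t = np.sort(rng.uniform(0, 100, 200))\n",
    "m = 15.0 + 0.3 * np.sin(2 * np.pi * t / 20) + rng.normal(0, 0.05, 200)\n",
    "print(f'NumPy half peak-to-peak: {half_peak_to_peak(m):.6f}')"
   ]
  },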
  {
   "cell_type": "markdown",
   "id": "4",
   "metadata": {},
   "source": [
    "`.descriptions` gives a human-readable explanation of each output:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5",
   "metadata": {},
   "outputs": [],
   "source": [
    "import light_curve as licu\n",
    "\n",
    "f = licu.EtaE()\n",
    "print(f.descriptions)"
   ]
  },
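  {
   "cell_type": "markdown",
   "id": "5a",
   "metadata": {},
   "source": [
    "For intuition, the cell below is a NumPy sketch of the time-weighted von Neumann ratio reported by `EtaE`. It assumes the definition\n",
    "$$\\eta^e = \\frac{(t_{N-1} - t_0)^2}{(N - 1)^3} \\, \\frac{\\sum_{i=0}^{N-2} \\left(\\frac{m_{i+1} - m_i}{t_{i+1} - t_i}\\right)^2}{\\sigma_m^2},$$\n",
    "where $\\sigma_m^2$ is the sample variance of the magnitudes; check the API reference for the exact normalization.\n",
    "`eta_e` and `eta` are hypothetical helpers. On an evenly sampled grid, $\\eta^e$ reduces to the classic von Neumann ratio $\\eta$:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def eta_e(t, m):\n",
    "    # Hypothetical helper: assumed EtaE definition for unevenly sampled data\n",
    "    t = np.asarray(t, dtype=np.float64)\n",
    "    m = np.asarray(m, dtype=np.float64)\n",
    "    n = m.size\n",
    "    slopes = np.diff(m) / np.diff(t)\n",
    "    return (t[-1] - t[0]) ** 2 / (n - 1) ** 3 * np.sum(slopes**2) / np.var(m, ddof=1)\n",
    "\n",
    "\n",
    "def eta(m):\n",
    "    # Classic von Neumann ratio for evenly sampled data\n",
    "    m = np.asarray(m, dtype=np.float64)\n",
    "    return np.sum(np.diff(m) ** 2) / ((m.size - 1) * np.var(m, ddof=1))\n",
    "\n",
    "\n",
    "rng = np.random.default_rng(1)\n",
    "t_even = np.arange(100, dtype=np.float64)\n",
    "m_noise = rng.normal(0, 1, 100)  # white noise: ratio close to 2\n",
    "m_smooth = np.sin(2 * np.pi * t_even / 50)  # smooth signal: much smaller ratio\n",
    "print(f'noise:  eta_e = {eta_e(t_even, m_noise):.4f}, eta = {eta(m_noise):.4f}')\n",
    "print(f'smooth: eta_e = {eta_e(t_even, m_smooth):.4f}')"
   ]
  },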
  {
   "cell_type": "markdown",
   "id": "6",
   "metadata": {},
   "source": [
    "## Combining features with `Extractor`\n",
    "\n",
    "[`Extractor`](../api/meta/#light_curve.Extractor) combines multiple features into a single callable that evaluates them all in one pass.\n",
    "It is especially efficient for **cheap features** (statistical moments, variability\n",
    "indices, etc.): intermediate quantities shared by several features can be computed once and reused,\n",
    "and a single Python–Rust call replaces one call per feature:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7",
   "metadata": {},
   "outputs": [],
   "source": [
    "import light_curve as licu\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(42)\n",
    "n = 200\n",
    "t = np.sort(rng.uniform(0, 100, n))\n",
    "m = 15.0 + 0.3 * np.sin(2 * np.pi * t / 20) + rng.normal(0, 0.1, n)\n",
    "err = np.full(n, 0.1)\n",
    "\n",
    "ext = licu.Extractor(\n",
    "    licu.InterPercentileRange(quantile=0.25),\n",
    "    licu.BeyondNStd(nstd=1),\n",
    "    licu.BeyondNStd(nstd=2),\n",
    "    licu.StandardDeviation(),\n",
    "    licu.WeightedMean(),\n",
    "    licu.LinearFit(),\n",
    "    licu.StetsonK(),\n",
    ")\n",
    "result = ext(t, m, err)\n",
    "for name, value in zip(ext.names, result):\n",
    "    print(f'  {name:35s} = {value:.5f}')"
   ]
  },
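  {
   "cell_type": "markdown",
   "id": "7a",
   "metadata": {},
   "source": [
    "Why can a combined extractor be cheaper? The toy sketch below computes shared statistics (mean, standard deviation, weighted mean) once per light curve and reuses them across several features.\n",
    "It is a hypothetical pure-NumPy illustration of the idea, not how `light_curve` implements `Extractor`. It assumes `BeyondNStd` is the fraction of points with $|m_i - \\bar{m}| > n\\,\\sigma_m$, where $\\sigma_m$ is the unbiased ($N-1$) standard deviation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def shared_stats(m, sigma):\n",
    "    # Intermediate quantities computed once per light curve\n",
    "    w = 1.0 / sigma**2\n",
    "    return {\n",
    "        'mean': m.mean(),\n",
    "        'std': m.std(ddof=1),\n",
    "        'weighted_mean': np.sum(w * m) / np.sum(w),\n",
    "    }\n",
    "\n",
    "\n",
    "def beyond_n_std(m, stats, nstd):\n",
    "    # Fraction of points farther than nstd standard deviations from the mean\n",
    "    return np.mean(np.abs(m - stats['mean']) > nstd * stats['std'])\n",
    "\n",
    "\n",
    "def toy_extract(t, m, sigma):\n",
    "    # Hypothetical combined extractor; t is unused by these particular features\n",
    "    stats = shared_stats(m, sigma)  # computed once, reused by every feature below\n",
    "    return np.array([\n",
    "        beyond_n_std(m, stats, 1),\n",
    "        beyond_n_std(m, stats, 2),\n",
    "        stats['std'],\n",
    "        stats['weighted_mean'],\n",
    "    ])\n",
    "\n",
    "\n",
    "m_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])\n",
    "err_demo = np.ones_like(m_demo)\n",
    "print(toy_extract(np.arange(5.0), m_demo, err_demo))"
   ]
  },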
  {
   "cell_type": "markdown",
   "id": "8",
   "metadata": {},
   "source": [
    "## Batch processing with `.many()`\n",
    "\n",
    "`.many()` processes a list of `(t, m, sigma)` tuples and returns a 2-D NumPy array\n",
    "of shape `(N, n_features)`, one row per light curve. It runs **multi-threaded** by default;\n",
    "the `n_jobs` parameter controls the number of threads. It is the preferred path for large datasets:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9",
   "metadata": {},
   "outputs": [],
   "source": [
    "import light_curve as licu\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "light_curves = [\n",
    "    (np.sort(rng.random(50)), rng.random(50), rng.random(50) * 0.1)\n",
    "    for _ in range(1000)\n",
    "]\n",
    "\n",
    "feature = licu.Extractor(licu.Skew(), licu.Kurtosis(), licu.ReducedChi2())\n",
    "results = feature.many(light_curves)\n",
    "print(f'Extracted from {len(light_curves)} light curves: shape = {results.shape}')\n",
    "for name, col in zip(feature.names, results.T):\n",
    "    print(f'  {name:20s} mean = {col.mean():.4f}')"
   ]
  },
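  {
   "cell_type": "markdown",
   "id": "9a",
   "metadata": {},
   "source": [
    "In practice, observations often arrive as flat columns (object ID, time, magnitude, error) rather than as a list of tuples.\n",
    "A minimal NumPy sketch of the conversion follows; `split_by_object` is a hypothetical helper. It sorts each light curve by time and casts to `float64`,\n",
    "on the assumption that `.many()` expects time-ordered arrays of a consistent floating-point dtype:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def split_by_object(obj_id, t, m, sigma):\n",
    "    # Hypothetical helper: one (t, m, sigma) tuple per object\n",
    "    # Sort by object ID first, then by time within each object\n",
    "    order = np.lexsort((t, obj_id))\n",
    "    obj_id, t, m, sigma = obj_id[order], t[order], m[order], sigma[order]\n",
    "    # First index of each object; the first start (0) is not a split point\n",
    "    _, starts = np.unique(obj_id, return_index=True)\n",
    "    bounds = starts[1:]\n",
    "    pieces = zip(np.split(t, bounds), np.split(m, bounds), np.split(sigma, bounds))\n",
    "    return [tuple(np.asarray(a, dtype=np.float64) for a in piece) for piece in pieces]\n",
    "\n",
    "\n",
    "obj_id = np.array([2, 1, 2, 1, 1])\n",
    "t_flat = np.array([3.0, 2.0, 1.0, 1.0, 0.5])\n",
    "m_flat = np.array([15.1, 14.9, 15.3, 15.0, 14.8])\n",
    "err_flat = np.full(5, 0.05)\n",
    "lcs = split_by_object(obj_id, t_flat, m_flat, err_flat)\n",
    "for lc_t, lc_m, lc_err in lcs:\n",
    "    print(lc_t, lc_m)\n",
    "# lcs can now be passed to feature.many(lcs)"
   ]
  },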
  {
   "cell_type": "markdown",
   "id": "10",
   "metadata": {},
   "source": [
    "## Batch processing with nested-pandas\n",
    "\n",
    "[nested-pandas](https://nested-pandas.readthedocs.io) stores the observations of all light curves in a single nested,\n",
    "Arrow-backed column (one nested row per object), which lets `.many()` consume it without copying.\n",
    "The `generate_data` helper creates a toy `NestedFrame`: its `nested` column holds\n",
    "`t`, `flux`, and `band` fields:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12",
   "metadata": {},
   "outputs": [],
   "source": "import light_curve as licu\nimport pyarrow as pa\nfrom nested_pandas.datasets import generate_data\n\n# Toy NestedFrame with base columns a, b and a nested column holding t, flux, band\nndf = generate_data(100, 50, seed=42)\n\nbands = [\"g\", \"r\"]\nfeature = licu.Extractor(\n    licu.ObservationCount(bands=bands),\n    licu.StandardDeviation(bands=bands),\n)\n# Map the feature inputs (t, m, band) to the nested field names\nresult = feature.many(\n    pa.array(ndf[\"nested\"]),\n    arrow_fields={\"t\": \"t\", \"m\": \"flux\", \"band\": \"band\"},\n)\n# Attach one column per feature output to the base frame\nndf = ndf.assign(**dict(zip(feature.names, result.T)))\nndf[[\"a\", \"b\", *feature.names]].head()"
  },
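  {
   "cell_type": "markdown",
   "id": "12a",
   "metadata": {},
   "source": [
    "Why can this be zero-copy? An Arrow list array stores all observations in one flat values buffer plus an offsets array marking where each light curve starts and ends.\n",
    "The simplified NumPy model below (an illustration, not `light_curve`'s actual code) shows that slicing by offsets yields views into the flat buffer rather than copies:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Flat values for three light curves stored back to back\n",
    "flux = np.array([1.0, 1.2, 0.9, 2.0, 2.1, 3.0, 3.3, 2.9, 3.1])\n",
    "# offsets[i]:offsets[i + 1] delimits light curve i\n",
    "offsets = np.array([0, 3, 5, 9])\n",
    "\n",
    "curves = [flux[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]\n",
    "for i, c in enumerate(curves):\n",
    "    # Basic slicing returns a view, so no observation data is copied\n",
    "    print(i, c, np.shares_memory(c, flux))"
   ]
  },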
  {
   "cell_type": "markdown",
   "id": "13",
   "metadata": {},
   "source": [
    "## Multiband light curves\n",
    "\n",
    "Every feature accepts a `bands=` constructor argument to switch into **per-passband mode**.\n",
    "When set, `__call__` expects a fourth `band` string array; outputs are named with a passband\n",
    "suffix (e.g. `amplitude_g`, `amplitude_r`).\n",
    "\n",
    "`Extractor` freely mixes single-band and multiband features — it filters the `band` array\n",
    "automatically for each sub-feature:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "14",
   "metadata": {},
   "outputs": [],
   "source": [
    "import light_curve as licu\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(42)\n",
    "t = np.sort(rng.uniform(0, 100, 200))\n",
    "m = 15.0 + 0.3 * np.sin(2 * np.pi * t / 20) + rng.normal(0, 0.05, 200)\n",
    "err = np.full(200, 0.05)\n",
    "band = np.tile([\"g\", \"r\"], 100)\n",
    "\n",
    "sk = licu.StetsonK(bands=[\"g\", \"r\"])\n",
    "print(\"StetsonK per band:\", {name: round(float(v), 4) for name, v in zip(sk.names, sk(t, m, err, band))})\n",
    "\n",
    "ext = licu.Extractor(\n",
    "    licu.EtaE(bands=[\"g\", \"r\"]),\n",
    "    licu.LinearFit(bands=[\"g\", \"r\"]),\n",
    "    licu.WeightedMean(),\n",
    ")\n",
    "result = ext(t, m, err, band)\n",
    "for name, val in zip(ext.names, result):\n",
    "    print(f\"  {name:35s} = {val:.4f}\")"
   ]
  },
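  {
   "cell_type": "markdown",
   "id": "14a",
   "metadata": {},
   "source": [
    "Conceptually, per-passband mode runs the single-band computation on the observations of each listed band and appends the band name to the output name.\n",
    "The NumPy sketch below mimics that for a weighted mean; `per_band` and `weighted_mean` are hypothetical helpers, and the exact output names `light_curve` produces may differ:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "14b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def weighted_mean(m, sigma):\n",
    "    # Inverse-variance weighted mean\n",
    "    w = 1.0 / sigma**2\n",
    "    return np.sum(w * m) / np.sum(w)\n",
    "\n",
    "\n",
    "def per_band(func, name, m, sigma, band, bands):\n",
    "    # Apply func to each passband separately; outputs get a band suffix\n",
    "    out = {}\n",
    "    for b in bands:\n",
    "        mask = band == b\n",
    "        out[f'{name}_{b}'] = float(func(m[mask], sigma[mask]))\n",
    "    return out\n",
    "\n",
    "\n",
    "m_demo = np.array([15.0, 14.0, 15.2, 14.2])\n",
    "err_demo = np.array([0.1, 0.1, 0.1, 0.1])\n",
    "band_demo = np.array(['g', 'r', 'g', 'r'])\n",
    "print(per_band(weighted_mean, 'weighted_mean', m_demo, err_demo, band_demo, ['g', 'r']))"
   ]
  },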
  {
   "cell_type": "markdown",
   "id": "15",
   "metadata": {},
   "source": [
    "## Next steps\n",
    "\n",
    "- [Feature table](../) — all 40+ extractors grouped by category\n",
    "- [API reference](../api/) — full signatures and equations\n",
    "- [Periodogram tutorial](../periodogram/) — Lomb–Scargle and period search\n",
    "- [Multiband tutorial](../multiband/) — per-band and cross-band features\n",
    "- [Rainbow fit tutorial](../multiband/rainbow/) — blackbody temperature and radius evolution\n",
    "- [Batch processing tutorial](batch_processing.ipynb) — nested-pandas with real survey data, PyArrow, Polars"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}