{ "cells": [ { "cell_type": "markdown", "id": "0", "metadata": {}, "source": [ "# Feature extractors: basics\n", "\n", "This tutorial introduces the `light_curve` feature extractor interface:\n", "creating features, combining them with [`Extractor`](../api/meta/#light_curve.Extractor),\n", "reading names and descriptions, and batch processing." ] }, { "cell_type": "code", "execution_count": null, "id": "1", "metadata": {}, "outputs": [], "source": [ "# %pip install light-curve nested-pandas polars universal-pathlib" ] }, { "cell_type": "markdown", "id": "2", "metadata": {}, "source": "## Single feature\n\nEach feature class is callable. It accepts `(t, m, sigma)` arrays and returns a NumPy array.\nThe `.names` attribute lists the output column names.\n\nHere we use [`Amplitude`](../api/variability/#light_curve.Amplitude) — the half peak-to-peak amplitude:" }, { "cell_type": "code", "execution_count": null, "id": "3", "metadata": {}, "outputs": [], "source": [ "import light_curve as licu\n", "import numpy as np\n", "\n", "rng = np.random.default_rng(42)\n", "t = np.sort(rng.uniform(0, 100, 200))\n", "m = 15.0 + 0.3 * np.sin(2 * np.pi * t / 20) + rng.normal(0, 0.05, 200)\n", "err = np.full(200, 0.05)\n", "\n", "amp = licu.Amplitude()\n", "result = amp(t, m, err)\n", "print(f'names: {amp.names}')\n", "print(f'result: {result}')" ] }, { "cell_type": "markdown", "id": "4", "metadata": {}, "source": [ "`.descriptions` gives a human-readable explanation of each output:" ] }, { "cell_type": "code", "execution_count": null, "id": "5", "metadata": {}, "outputs": [], "source": [ "import light_curve as licu\n", "\n", "f = licu.EtaE()\n", "print(f.descriptions)" ] }, { "cell_type": "markdown", "id": "6", "metadata": {}, "source": [ "## Combining features with `Extractor`\n", "\n", "[`Extractor`](../api/meta/#light_curve.Extractor) combines multiple features into a single callable evaluated in one pass.\n", "It is especially efficient for **cheap features** (statistical moments, 
variability\n", "indices, etc.) because it avoids repeating computations shared between features and reduces Python–Rust\n", "call overhead:" ] }, { "cell_type": "code", "execution_count": null, "id": "7", "metadata": {}, "outputs": [], "source": [ "import light_curve as licu\n", "import numpy as np\n", "\n", "rng = np.random.default_rng(42)\n", "n = 200\n", "t = np.sort(rng.uniform(0, 100, n))\n", "m = 15.0 + 0.3 * np.sin(2 * np.pi * t / 20) + rng.normal(0, 0.1, n)\n", "err = np.full(n, 0.1)\n", "\n", "ext = licu.Extractor(\n", "    licu.InterPercentileRange(quantile=0.25),\n", "    licu.BeyondNStd(nstd=1),\n", "    licu.BeyondNStd(nstd=2),\n", "    licu.StandardDeviation(),\n", "    licu.WeightedMean(),\n", "    licu.LinearFit(),\n", "    licu.StetsonK(),\n", ")\n", "result = ext(t, m, err)\n", "for name, value in zip(ext.names, result):\n", "    print(f' {name:35s} = {value:.5f}')" ] }, { "cell_type": "markdown", "id": "8", "metadata": {}, "source": [ "## Batch processing with `.many()`\n", "\n", "`.many()` processes a list of `(t, m, sigma)` tuples and returns a 2-D NumPy array\n", "(shape `(N, n_features)`). 
It supports **multi-threading** (enabled by default via the\n", "`n_jobs` parameter) and is the preferred path for large datasets:" ] }, { "cell_type": "code", "execution_count": null, "id": "9", "metadata": {}, "outputs": [], "source": [ "import light_curve as licu\n", "import numpy as np\n", "\n", "rng = np.random.default_rng(0)\n", "light_curves = [\n", "    (np.sort(rng.random(50)), rng.random(50), rng.random(50) * 0.1)\n", "    for _ in range(1000)\n", "]\n", "\n", "feature = licu.Extractor(licu.Skew(), licu.Kurtosis(), licu.ReducedChi2())\n", "results = feature.many(light_curves)\n", "print(f'Extracted from {len(light_curves)} light curves: shape = {results.shape}')\n", "for name, col in zip(feature.names, results.T):\n", "    print(f' {name:20s} mean = {col.mean():.4f}')" ] }, { "cell_type": "markdown", "id": "10", "metadata": {}, "source": [ "## Batch processing with nested-pandas\n", "\n", "[nested-pandas](https://nested-pandas.readthedocs.io) stores each light curve as a nested Arrow column,\n", "letting `.many()` consume it with zero copies.\n", "The `generate_data` helper creates a toy `NestedFrame` whose `nested` column holds\n", "`t`, `flux`, and `band` fields:" ] }, { "cell_type": "code", "execution_count": null, "id": "11", "metadata": {}, "outputs": [], "source": [ "# %pip install light-curve nested-pandas" ] }, { "cell_type": "code", "execution_count": null, "id": "12", "metadata": {}, "outputs": [], "source": "import light_curve as licu\nimport pyarrow as pa\nfrom nested_pandas.datasets import generate_data\n\nndf = generate_data(100, 50, seed=42)\n\nfeature = licu.Extractor(licu.ObservationCount(bands=[\"g\", \"r\"]), licu.StandardDeviation(bands=[\"g\", \"r\"]))\nresult = feature.many(pa.array(ndf[\"nested\"]), arrow_fields={\"t\": \"t\", \"m\": \"flux\", \"band\": \"band\"})\nndf = ndf.assign(**dict(zip(feature.names, result.T)))\nndf[[\"a\", \"b\", *feature.names]].head()" }, { "cell_type": "markdown", "id": "13", "metadata": {}, "source": [ "## 
Multiband light curves\n", "\n", "Every feature accepts a `bands=` constructor argument to switch into **per-passband mode**.\n", "When set, `__call__` expects a fourth `band` string array; outputs are named with a passband\n", "suffix (e.g. `amplitude_g`, `amplitude_r`).\n", "\n", "`Extractor` freely mixes single-band and multiband features; it filters the `band` array\n", "automatically for each sub-feature:" ] }, { "cell_type": "code", "execution_count": null, "id": "14", "metadata": {}, "outputs": [], "source": [ "import light_curve as licu\n", "import numpy as np\n", "\n", "rng = np.random.default_rng(42)\n", "t = np.sort(rng.uniform(0, 100, 200))\n", "m = 15.0 + 0.3 * np.sin(2 * np.pi * t / 20) + rng.normal(0, 0.05, 200)\n", "err = np.full(200, 0.05)\n", "band = np.tile([\"g\", \"r\"], 100)\n", "\n", "sk = licu.StetsonK(bands=[\"g\", \"r\"])\n", "print(\"StetsonK per band:\", dict(zip(sk.names, sk(t, m, err, band))))\n", "\n", "ext = licu.Extractor(\n", "    licu.EtaE(bands=[\"g\", \"r\"]),\n", "    licu.LinearFit(bands=[\"g\", \"r\"]),\n", "    licu.WeightedMean(),\n", ")\n", "result = ext(t, m, err, band)\n", "for name, val in zip(ext.names, result):\n", "    print(f\" {name:35s} = {val:.4f}\")" ] }, { "cell_type": "markdown", "id": "15", "metadata": {}, "source": [ "## Next steps\n", "\n", "- [Feature table](../): all 40+ extractors grouped by category\n", "- [API reference](../api/): full signatures and equations\n", "- [Periodogram tutorial](../periodogram/): Lomb–Scargle and period search\n", "- [Multiband tutorial](../multiband/): per-band and cross-band features\n", "- [Rainbow fit tutorial](../multiband/rainbow/): blackbody temperature and radius evolution\n", "- [Batch processing tutorial](batch_processing.ipynb): nested-pandas with real survey data, PyArrow, Polars" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, 
"file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.13" } }, "nbformat": 4, "nbformat_minor": 5 }