{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "# Batch processing\n",
    "\n",
    "This tutorial covers the `.many()` method for efficient bulk feature extraction:\n",
    "\n",
    "- Plain Python lists of `(t, m, sigma)` tuples\n",
    "- [nested-pandas](https://nested-pandas.readthedocs.io) with real ZTF survey data\n",
    "- [PyArrow](https://arrow.apache.org/docs/python/) `List<Struct>` arrays\n",
    "- [Polars](https://docs.pola.rs) Series\n",
    "\n",
    "All Arrow-compatible inputs avoid Python-level iteration and pass data to Rust with zero copies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1",
   "metadata": {},
   "outputs": [],
   "source": "# %pip install light-curve"
  },
  {
   "cell_type": "markdown",
   "id": "2",
   "metadata": {},
   "source": [
    "## Plain list of tuples\n",
    "\n",
    "`.many()` accepts a list of `(t, m, sigma)` tuples and returns a 2-D NumPy array of shape\n",
    "`(N, n_features)`. Multi-threading is enabled by default via the `n_jobs` parameter:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3",
   "metadata": {},
   "outputs": [],
   "source": [
    "import light_curve as licu\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "light_curves = [\n",
    "    (np.sort(rng.random(50)), rng.random(50), rng.random(50) * 0.1)\n",
    "    for _ in range(1000)\n",
    "]\n",
    "\n",
    "results = licu.Amplitude().many(light_curves)\n",
    "print(f'Extracted from {len(light_curves)} light curves: shape = {results.shape}')\n",
    "print(f'Mean amplitude = {results.mean():.4f} mag')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4",
   "metadata": {},
   "source": "## nested-pandas with ZTF survey data\n\n[nested-pandas](https://nested-pandas.readthedocs.io) extends pandas with nested Arrow column\nsupport, useful for catalog data such as ZTF or Rubin LSST."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5",
   "metadata": {},
   "outputs": [],
   "source": "# %pip install light-curve nested-pandas s3fs universal-pathlib"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6",
   "metadata": {},
   "outputs": [],
   "source": [
    "import light_curve as licu\n",
    "import nested_pandas as npd\n",
    "import numpy as np\n",
    "import pyarrow as pa\n",
    "from upath import UPath\n",
    "\n",
    "s3_path = UPath(\n",
    "    \"s3://ipac-irsa-ztf/contributed/dr23/lc/hats/ztf_dr23_lc-hats/dataset/Norder=6/Dir=30000/Npix=34623/part0.snappy.parquet\",\n",
    "    anon=True,\n",
    ")\n",
    "ndf = npd.read_parquet(\n",
    "    s3_path,\n",
    "    columns=[\"objectid\", \"lightcurve.hmjd\", \"lightcurve.mag\", \"lightcurve.magerr\"],\n",
    ")\n",
    "\n",
    "ndf = ndf.loc[ndf[\"lightcurve\"].list_lengths > 10]\n",
    "\n",
    "ndf[\"lightcurve.t\"] = np.asarray(ndf[\"lightcurve.hmjd\"] - 58000, dtype=np.float32)\n",
    "\n",
    "feature = licu.Extractor(licu.Chi2Pvar(), licu.InterPercentileRange(quantile=0.25), licu.LinearFit())\n",
    "result = feature.many(pa.array(ndf[\"lightcurve\"]), n_jobs=-1,\n",
    "                      arrow_fields={\"t\": \"t\", \"m\": \"mag\", \"sigma\": \"magerr\"})\n",
    "\n",
    "ndf = ndf.assign(**dict(zip(feature.names, result.T)))\n",
    "ndf.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7",
   "metadata": {},
   "source": "## PyArrow\n\n[PyArrow](https://arrow.apache.org/docs/python/) is the reference Python implementation of Apache Arrow.\nPass a `List<Struct<t, m, band>>` array directly to `.many()` for multiband extraction without sigma."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8",
   "metadata": {},
   "outputs": [],
   "source": "# %pip install light-curve pyarrow"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9",
   "metadata": {},
   "outputs": [],
   "source": "import light_curve as licu\nimport numpy as np\nimport pyarrow as pa\n\nBANDS = [\"g\", \"r\"]\nrng = np.random.default_rng(42)\nn_lc, n_per_band = 200, 40\n\nstruct_type = pa.struct([\n    (\"t\", pa.float64()),\n    (\"m\", pa.float64()),\n    (\"band\", pa.string()),\n])\n\n\ndef make_lc():\n    rows = []\n    for b in BANDS:\n        t = rng.uniform(0, 100, n_per_band)\n        m = rng.normal(15.0 if b == \"g\" else 15.3, 0.3, n_per_band)\n        rows.extend({\"t\": float(ti), \"m\": float(mi), \"band\": b} for ti, mi in zip(t, m))\n    rows.sort(key=lambda r: r[\"t\"])\n    return rows\n\n\nlcs_arrow = pa.array([make_lc() for _ in range(n_lc)], type=pa.list_(struct_type))\n\nfeature = licu.Extractor(\n    licu.InterPercentileRange(quantile=0.1, bands=BANDS),  # robust amplitude per band\n    licu.AndersonDarlingNormal(bands=BANDS),  # normality test per band\n    licu.ColorOfMaximum(BANDS),  # colour at brightness peak\n    licu.ColorOfMinimum(BANDS),  # colour at brightness trough\n)\nresult = feature.many(\n    lcs_arrow,\n    sorted=True,\n    arrow_fields={\"t\": \"t\", \"m\": \"m\", \"band\": \"band\"},\n)\nprint(f\"shape: {result.shape}\")  # (200, 6)\nprint(\"names:\", feature.names)"
  },
  {
   "cell_type": "markdown",
   "id": "10",
   "metadata": {},
   "source": "## Polars\n\n[Polars](https://docs.pola.rs) is a fast DataFrame library built on Arrow.\nGroup a flat multiband DataFrame by object and pass the nested Series to `.many()`."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "11",
   "metadata": {},
   "outputs": [],
   "source": "# %pip install light-curve polars"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12",
   "metadata": {},
   "outputs": [],
   "source": [
    "import light_curve as licu\n",
    "import numpy as np\n",
    "import polars as pl\n",
    "\n",
    "BANDS = [\"g\", \"r\"]\n",
    "rng = np.random.default_rng(42)\n",
    "n_obj, n_per_band = 200, 40\n",
    "\n",
    "object_id = np.repeat(np.arange(n_obj), n_per_band * len(BANDS))\n",
    "band_col = np.tile(np.repeat(BANDS, n_per_band), n_obj)\n",
    "t = np.sort(rng.uniform(0, 100, n_obj * n_per_band * len(BANDS)))\n",
    "m = rng.normal(15.0, 0.3, len(object_id))\n",
    "sigma = rng.uniform(0.01, 0.1, len(object_id))\n",
    "\n",
    "df = pl.DataFrame({\"object_id\": object_id, \"band\": band_col, \"t\": t, \"m\": m, \"sigma\": sigma})\n",
    "nested = df.group_by(\"object_id\").agg(pl.struct(\"t\", \"m\", \"sigma\", \"band\").alias(\"lc\"))\n",
    "\n",
    "feature = licu.Extractor(\n",
    "    licu.ExcessVariance(bands=BANDS),  # variability excess over noise per band\n",
    "    licu.StetsonK(bands=BANDS),  # variability index per band\n",
    "    licu.BeyondNStd(nstd=1.5, bands=BANDS),  # outlier fraction per band\n",
    "    licu.ColorOfMedian(BANDS),  # colour at median brightness\n",
    "    licu.ColorSpread(BANDS),  # std dev of per-band means\n",
    ")\n",
    "result = feature.many(\n",
    "    nested[\"lc\"],\n",
    "    arrow_fields={\"t\": \"t\", \"m\": \"m\", \"sigma\": \"sigma\", \"band\": \"band\"},\n",
    ")\n",
    "nested = nested.with_columns(\n",
    "    [pl.Series(name, result[:, i]) for i, name in enumerate(feature.names)]\n",
    ")\n",
    "nested.select([\"object_id\"] + feature.names)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "13",
   "metadata": {},
   "source": [
    "## Next steps\n",
    "\n",
    "- [Feature basics tutorial](basics.ipynb) — single features, Extractor, multiband intro\n",
    "- [Multiband tutorial](../multiband/) — per-band and cross-band features\n",
    "- [Periodogram tutorial](../periodogram/) — Lomb–Scargle and period search\n",
    "- [API reference](../api/) — full signatures and equations"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}