{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "32605182",
   "metadata": {},
   "source": [
    "---\n",
    "title: Astrophysics Papers Daily Summaries Notebook with the Jetstream LLM Inference Service\n",
    "date: 2025-05-08\n",
    "categories:\n",
    "    - astrophysics\n",
    "    - llm\n",
    "    - jetstream2\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0ede76e1",
   "metadata": {},
   "source": [
    "# Astrophysics Papers: Daily Summaries Notebook\n",
    "\n",
    "Fetch today's `astro-ph` papers from arXiv, summarize abstracts with `llm`, and output a Markdown summary.\n",
    "\n",
    "This is an example of using the [Jetstream Inference Service](https://docs.jetstream-cloud.org/inference-service/overview/), notice that you need first to [configure the `llm` package to access the Jetstream Inference Service via the API](https://docs.jetstream-cloud.org/inference-service/api/)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "5d18ec41",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Install required packages\n",
    "#!pip install llm requests"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "e4b93591",
   "metadata": {},
   "outputs": [],
   "source": [
    "import datetime\n",
    "import requests\n",
    "import xml.etree.ElementTree as ET\n",
    "import llm\n",
    "\n",
    "# Get default LLM model (uses configured default, e.g. deepseek) citeturn0search0\n",
    "model = llm.get_model()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "746a83f0",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define system prompt for concise summaries citeturn0search0\n",
    "system_prompt = (\n",
    "    'You are an expert summarization assistant. '\n",
    "    'Provide a single concise sentence capturing the main result of an astrophysics abstract.'\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "893a7a13",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fetch today's astro-ph submissions from arXiv API\n",
    "\n",
    "# Download and parse today's astro-ph RSS feed from arXiv, take only the latest 10 papers\n",
    "rss_url = \"https://rss.arxiv.org/rss/astro-ph\"\n",
    "res = requests.get(rss_url)\n",
    "root = ET.fromstring(res.content)\n",
    "items = root.find('channel').findall('item')[:3]\n",
    "\n",
    "papers = []\n",
    "for item in items:\n",
    "    title = item.findtext('title', default='')\n",
    "    link = item.findtext('link', default='')\n",
    "    desc = item.findtext('description', default='')\n",
    "    author = item.findtext('{http://purl.org/dc/elements/1.1/}creator', default='')\n",
    "    # Extract abstract from description (after 'Abstract:')\n",
    "    abstract = ''\n",
    "    if 'Abstract:' in desc:\n",
    "        abstract = desc.split('Abstract:', 1)[1].strip()\n",
    "    papers.append({\n",
    "        'title': title,\n",
    "        'link': link,\n",
    "        'abstract': abstract,\n",
    "        'author': author,\n",
    "        'inst': ''\n",
    "    })"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "455bd1ba",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "013f30f79f974e51b5bda7bd69ebe617",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Summarizing papers:   0%|          | 0/3 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from tqdm.notebook import tqdm\n",
    "import re\n",
    "\n",
    "today = datetime.date.today().strftime('%Y-%m-%d')\n",
    "# Summarize abstracts and build Markdown content citeturn0search0\n",
    "lines = [f'# Astrophysics Papers for {today}\\n']\n",
    "for p in tqdm(papers, desc=\"Summarizing papers\"):\n",
    "    resp = model.prompt(p['abstract'], system=system_prompt)\n",
    "    # Remove any <think>...</think> blocks from the response text\n",
    "    summary = re.sub(r'<think>.*?</think>', '', resp.text(), flags=re.DOTALL).strip()\n",
    "    lines.append(\n",
    "        f\"## {p['title']}\\n\"\n",
    "        f\"- **Author:** {p['author']}\\n\"\n",
    "        f\"- **Link:** {p['link']}\\n\"\n",
    "        f\"**Summary:** {summary}\\n\"\n",
    "    )\n",
    "md = '\\n'.join(lines)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "e79a68a8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Saved summary to astro_ph_summaries_2025-05-08.md\n"
     ]
    }
   ],
   "source": [
    "# Save Markdown summary\n",
    "fn = f'astro_ph_summaries_{today}.md'\n",
    "with open(fn, 'w') as f:\n",
    "    f.write(md)\n",
    "print(f'Saved summary to {fn}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "6f83253c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "# Astrophysics Papers for 2025-05-08\n",
      "\n",
      "## Machine Learning Workflow for Morphological Classification of Galaxies\n",
      "- **Author:** Bernd Doser, Kai L. Polsterer, Andreas Fehlner, Sebastian Trujillo-Gomez\n",
      "- **Institution:** \n",
      "- **Link:** https://arxiv.org/abs/2505.04676\n",
      "**Summary:** The study presents a reproducible, scalable machine-learning workflow leveraging open-source tools and FAIR principles to efficiently analyze exascale astrophysical simulations, enabling collaborative exploration of galaxy morphologies.\n",
      "\n",
      "## A data-driven approach for star formation parameterization using symbolic regression\n",
      "- **Author:** Diane M. Salim, Matthew E. Orr, Blakesley Burkhart, Rachel S. Somerville, Miles Cramner\n",
      "- **Institution:** \n",
      "- **Link:** https://arxiv.org/abs/2505.04681\n",
      "**Summary:** Machine learning-driven symbolic regression applied to FIRE-2 simulations reveals that star formation rate surface density at 100 Myr scales robustly with gas surface density, velocity dispersion, and stellar surface density, converging to physically interpretable Kennicutt-Schmidt-like relations that capture intrinsic scatter.\n",
      "\n",
      "## Reassessing the ZTF Volume-Limited Type Ia Supernova Sample and Its Implications for Continuous, Dust-Dependent Models of Intrinsic Scatter\n",
      "- **Author:** Yukei S. Murakami, Daniel Scolnic\n",
      "- **Institution:** \n",
      "- **Link:** https://arxiv.org/abs/2505.04686\n",
      "**Summary:** The re-analysis of the ZTF DR2 Type Ia supernovae sample using a continuous, color-dependent model (Host2D) supports the dust hypothesis for luminosity diversity, revealing methodological differences—not sample properties—as the source of prior conflicts, with a 4.0σ improvement over previous models.\n"
     ]
    }
   ],
   "source": [
    "!cat $fn"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aad15baa",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}