{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "ElM1_WkJMx97" }, "source": [ "\n", "\n", "# Single Lens Fitting & Pipelines\n", "\n", "
\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Learning Goals\n", "\n", "By the end of this session, you will be able to:\n", "\n", "- Configure a reproducible environment (local, Colab, or Nexus) for the single-lens workflow.\n", "- Inspect Roman Data Challenge light curves and understand the inputs required for `MulensModel`.\n", "- Fit single-lens microlensing events **at scale**.\n", "- Structure your code for **repeatability and automation**.\n", "- Apply **priors** that penalize unphysical solutions.\n", "- Detect when a pipeline is breaking—and treat that as actionable feedback instead of failure.\n", "\n", "In addition, you'll gain:\n", "- Hands-on experience with OGLE Early Warning System (EWS) data.\n", "- Practice writing your own event-finding heuristics.\n", "- An optional challenge that fits a full Roman observing season, tying the workflow together." ] }, { "cell_type": "markdown", "metadata": { "id": "v6WqTTenRT-M" }, "source": [ "## Introduction\n", "This session adapts the RGES-PIT minicourse material for the Roman Microlensing Data Challenge 2026, the AAS workshop, and self-directed learning. You will load challenge-provided light curves and run point-source point-lens fits with `MulensModel`. \n", "\n", "### Why Bulk Single-Lens Pipelines?\n", "\n", "The goal is to develop infrastructure that lets you process dozens—eventually thousands—of events automatically while still catching subtle astrophysics.\n", "\n", "### Why Are We Doing This?\n", "\n", "Simulations suggest the Roman Space Telescope will deliver tens of thousands of microlensing events. Manual, hand-tuned fits will not scale. 
Building lightweight pipelines that can stream data, test models, and flag interesting events is how we keep up with the volume.\n", "\n", "### What's in It for You?\n", "\n", "You will simulate processing a full observing season: downloading OGLE/Roman data, defining objective functions, writing event finders, applying priors, parallelizing computation, and adapting to subtle data challenges in real time. This is as much about workflow discipline as it is about models.\n", "\n", "### Microlensing Parameter Refresher\n", "A point-source point-lens (PSPL) event is described by a compact set of geometric and photometric parameters:\n", "\n", "- **$t_0$** – time of closest approach between the source and lens lines of sight.\n", "- **$u_0$** – impact parameter in units of the angular Einstein radius ($\\theta_E$). Values $|u_0| \\ll 1$ yield high magnifications.\n", "- **$t_E$** – Einstein timescale, i.e., the time required for the lens-source separation to change by $\\theta_E$; it sets the event duration.\n", "- **$F_s$** – unmagnified source flux in the observed bandpass.\n", "- **$F_b$** – blended flux from unrelated stars or the lens itself.\n", "\n", "Optional higher-order terms show up often in Roman analyses:\n", "- **$\\rho$** – source radius normalized by $\\theta_E$, needed when finite-source effects smear the peak.\n", "- **$\\pi_E$** – microlens parallax vector (north/east components) that captures the projected lens-source parallax.\n", "\n", "The instantaneous magnification for a PSPL event is\n", "$$\n", "A(u) = \\frac{u^2 + 2}{u\\sqrt{u^2 + 4}}, \\qquad u^2 = u_0^2 + \\left(\\frac{t - t_0}{t_E}\\right)^2.\n", "$$\n", "Packages like `MulensModel` will evaluate $A(u)$ and its finite-source/parallax extensions for us; the rest of this notebook shows how to fit $t_0$, $u_0$, $t_E$, $F_s$, $F_b$, and (optionally) $\\rho$ or $\\pi_E$ for bulk datasets.\n", "\n", "For a more robust introduction to the theory of microlensing, refer to the learning resources 
listed on the [RGES-PIT website](https://rges-pit.org/resources/#:~:text=Background%20material,Jupyter%20Notebook%20Course%20%C2%A0)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Notebook Contents\n", "\n", "The workflow for this notebook consists of:\n", "* [Environment Setup](#Environment-Setup)\n", " - [Imports](#Imports)\n", "* [Single-Lens Fitting with MulensModel](#single-lens-fitting-with-mulensmodel)\n", "* [OGLE EWS Bulk Light Curve Fit](#ogle-ews-bulk-light-curve-fit)\n", " - [Getting the Ground-Based Data](#getting-the-ground-based-data)\n", " - [Fitting a Basic PSPL Model](#fitting-a-basic-pspl-model)\n", " - [Priors](#priors)\n", " - [Parallel Processing](#parallel-processing)\n", " - [Custom Event Finder](#custom-event-finder)\n", " - [Advanced Modeling Techniques and Higher-Order Models](#advanced-modeling-techniques-and-higher-order-models)\n", "* [Full Season Roman Fit](#full-season-roman-fit)\n", " - [Nexus Data Access](#nexus-data-access)\n", " - [Alternate Data Access](#alternative-data-access)\n", " - [Load and Sort the Data](#load-and-sort-the-data)\n", " - [Adjusting the Model for an L2 Orbit](#adjusting-the-model-for-an-l2-orbit)\n", " - [Run the Fits](#run-the-fits)\n", " - [Evaluate Your Results](#evaluate-your-results)\n", "* [Exercises and Next Steps](#Exercises)\n", "* [Additional Resources](#additional-resources)\n", "* [About this Notebook](#about-this-notebook)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Environment Setup\n", "\n", "Choose whichever platform best matches your workflow:\n", "\n", "- **Colab** provides a zero-install option for quick experimentation.\n", "- **Roman Research Nexus** users should activate the `rges-pit-dc` kernel; all required packages are pre-installed there.\n", "- **Local** environments can be provisioned with the accompanying `env.yml` file for parity with the Nexus image.\n", "\n", "If you are new to Jupyter notebooks, the [Getting Started 
guide](https://medium.com/codingthesmartway-com-blog/getting-started-with-jupyter-notebook-for-python-4e7082bd5d46) offers a quick primer before you dive into this workflow.\n" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "nexus-only" ] }, "source": [ "\n", "> ### Running on the Roman Research Nexus\n", "> If you are following this notebook on the Roman Research Nexus (RRN), the required packages are preinstalled in the RGES-PIT kernel:\n", "> 1. In notebooks, open the **Kernel > Change Kernel** menu and select `RGES PIT Nexus`, or click the \\\"kernel status display\\\" (the activity circle) at the top right.\n", "> 1. In a terminal, if needed, run `mamba activate rges-pit-dc` to select the correct mamba environment.\n", "> 1. Store any files you generate in your home directory or team space; the `/roman/nexus-notebooks/notebooks` directory is read-only." ] }, { "cell_type": "markdown", "metadata": { "tags": [ "colab-only" ] }, "source": [ "\n", "> ### Running on Google Colab\n", "> 1. Open this notebook via the **Open in Colab** button in the workshop README or by pasting the GitHub URL into [https://colab.research.google.com](https://colab.research.google.com).\n", "> 1. Ensure you are signed into a Google account so Colab can save to your Drive.\n", "> 1. Use the `Runtime > Change runtime type` menu to make sure no GPU is selected, since this workflow is CPU bound, or connect to a local runtime.\n", "> 1. Remember to download your results or save a copy to Drive; Colab sessions are ephemeral.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "colab-only" ] }, "outputs": [], "source": [ "# COLAB-ONLY\n", "%pip install --quiet pathos MulensModel" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Please **save this notebook in a space you control** (GitHub fork, local repo, Nexus team storage, or Google Drive) so you can return to it later. 
Hosted environments such as Colab or the Nexus will not auto-save changes to the canonical repository." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Imports\n", "\n", "For this notebook, we rely on a standard scientific Python stack plus `MulensModel`, `pathos`, and a few utilities for working with OGLE/Roman data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#@title Imports and Setup\n", "\n", "# system tools\n", "import os\n", "from huggingface_hub import hf_hub_download\n", "import sys\n", "from io import StringIO\n", "import time\n", "from typing import Tuple, Callable, Optional, List\n", "import shutil\n", "from pathlib import Path\n", "\n", "# data analysis tools\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from IPython import get_ipython\n", "from IPython.display import display\n", "from scipy.optimize import minimize\n", "import astropy.units as u\n", "from astropy.coordinates import Angle, SkyCoord\n", "try:\n", " from google.colab import sheets # will only work if you are running on Colab\n", "except ImportError:\n", " pass\n", "\n", "# web scraping / data access tools\n", "import bs4 as bs\n", "import urllib\n", "import urllib.request\n", "import pandas as pd\n", "\n", "# parallel processing tools\n", "from pathos.multiprocessing import ProcessingPool as Pool # for multiprocessing inside Jupyter\n", "import multiprocessing as mp\n", "\n", "# microlensing tool\n", "import MulensModel\n", "\n", "import warnings\n", "\n", "# Suppress just this specific MulensModel warning because we know what it's doing\n", "# and we want to keep the output clean\n", "warnings.filterwarnings(\"ignore\", message=\".*does not have a limb-darkening coefficient.*\")" ] }, { "cell_type": "markdown", "metadata": { "id": "V-r0Prgp_SJq" }, "source": [ "## Single-Lens Fitting with `MulensModel`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "OII-gvK2yx1U" }, "outputs": [], 
"source": [ "#@title Available finite source methods\n", "\n", "finite_source_methods = [\n", " # Uniform source\n", " 'finite_source_uniform_Gould94', # 0, 10E-3 < rho < 1 (has a bug)\n", " 'finite_source_uniform_Gould94_direct', # 1, 10E-3 < rho < 1\n", " 'finite_source_uniform_WittMao94', # 2, rho < 0.01\n", " 'finite_source_uniform_Lee09', # 3, rho > 0.01\n", "\n", " # Limb-darkened source\n", " 'finite_source_LD_WittMao94', # 4, rho < 0.01\n", " 'finite_source_LD_Yoo04', # 5, 10E-3 < rho < 1\n", " 'finite_source_LD_Yoo04_direct', # 6, 10E-3 < rho < 1\n", " 'finite_source_LD_Lee09' # 7, rho > 0.01\n", "]" ] }, { "cell_type": "markdown", "metadata": { "id": "n_PL4GWmeJE4" }, "source": [ "Let's take a look at how different higher-order effects change the magnification model. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "colab": { "base_uri": "https://localhost:8080/", "height": 449 }, "id": "K6RPqiiZwGzt", "outputId": "a6f53116-e35d-4ccd-d086-e8b5fe68fc33" }, "outputs": [], "source": [ "#@title Plotting the magnification models\n", "\n", "# plot bounds\n", "t_min = 2452750\n", "t_max = 2452950\n", "t_range = [t_min, t_max]\n", "\n", "# Model parameters\n", "t_0 = 2452848.06\n", "u_0 = 0.133\n", "t_E = 61.5\n", "log_rho = -1.4 #@param {type:\"slider\", min:-3, max:0, step:0.1}\n", "rho = 10**log_rho\n", "pi_E_E = -0.5 #@param {\"type\":\"slider\",\"min\":-5,\"max\":5,\"step\":0.1}\n", "pi_E_N = 1.7 #@param {type:\"slider\", min:-5, max:5, step:0.1}\n", "t_0_par = 2452848.06 # should not change during modelling and needs to be close to t_0\n", "\n", "# Define a point source, point lens model\n", "pspl = MulensModel.Model({'t_0': t_0, 'u_0': u_0, 't_E': t_E})\n", "\n", "# Define a finite source, point lens model\n", "fspl = MulensModel.Model({'t_0': t_0, 'u_0': u_0, 't_E': t_E, 'rho': rho})\n", "\n", "# Define a parallax model\n", "fspl_pllx = MulensModel.Model({'t_0': t_0,\n", " 'u_0': u_0,\n", " 't_E': t_E,\n", " 'rho': 
rho,\n", " 'pi_E_E': pi_E_E,\n", " 'pi_E_N': pi_E_N,\n", " 't_0_par': t_0_par # fixed value for parallax calculations: ~t_0\n", " },\n", " ra='18:04:45.71',\n", " dec='-26:59:15.2'\n", " )\n", "\n", "# Plot the magnification curve:\n", "plt.close(0)\n", "plt.figure(0)\n", "pspl.plot_magnification(\n", " t_range=t_range,\n", " subtract_2450000=True,\n", " color='grey',\n", " linestyle='-',\n", " label='PSPL'\n", " )\n", "\n", "# calculate the magnification curve using a finite source model\n", "fspl.set_magnification_methods([2450000., finite_source_methods[1], 2470000.]) # rho = 0.1\n", "fspl.plot_magnification(\n", " t_range=t_range,\n", " subtract_2450000=True,\n", " color='blue',\n", " linestyle='--',\n", " label='FSPL with uniform-brightness'\n", " )\n", "\n", "# calculate the magnification curve using a finite source model with limb darkening\n", "fspl.set_magnification_methods([2450000., finite_source_methods[5], 2470000.]) # rho = 0.1\n", "fspl.plot_magnification(\n", " t_range=t_range,\n", " subtract_2450000=True,\n", " color='red',\n", " linestyle=':',\n", " linewidth=2,\n", " label='FSPL with Limb Darkening'\n", " )\n", "\n", "# calculate the magnification curve using a finite source model and parallax\n", "fspl_pllx.set_magnification_methods([2450000., finite_source_methods[1], 2470000.]) # rho = 0.1\n", "fspl_pllx.plot_magnification(\n", " t_range=t_range,\n", " subtract_2450000=True,\n", " color='green',\n", " linestyle=':',\n", " linewidth=2,\n", " label='FSPL with parallax'\n", " )\n", "\n", "\n", "# calculate the u_0 finite-source, parallax solution\n", "fspl_pllx.set_magnification_methods([2450000., finite_source_methods[3], 2470000.])\n", "parameters = [\"t_0\", \"u_0\", \"t_E\", \"rho\", \"pi_E_E\", \"pi_E_N\", \"t_0_par\"]\n", "setattr(fspl_pllx.parameters, \"u_0\", -u_0) # multiply u0 by -1\n", "fspl_pllx.plot_magnification(\n", " t_range=t_range,\n", " subtract_2450000=True,\n", " color='purple',\n", " linestyle='--',\n", " linewidth=2,\n", " 
label=r'-$u_0$ FSPL with parallax'\n", " )\n", "\n", "plt.legend(loc='upper left', bbox_to_anchor=(1.04, 1), borderaxespad=0)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If your version of `MulensModel` is working, this figure should have rendered without an error. If it did not, ensure you have `mulensmodel>=3.4.0` installed." ] }, { "cell_type": "markdown", "metadata": { "id": "KnDGc3ejACDP" }, "source": [ "> There are a few things to take away from this plot:\n", "> * the finite-source effect strongly changes the shape of the magnification curve\n", "> * the surface-brightness model for the source (e.g., uniform vs. limb-darkened) has a much smaller effect\n", "> * the degenerate $\\pm u_0$ parallax solutions can be noticeably different when the parallax is sufficiently large\n", "> * compared with a static model, the parallax does not need to be large to noticeably change the magnification curve." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "
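As a sanity check on the curves above, the PSPL formula from the parameter refresher can be evaluated directly with NumPy, independent of `MulensModel`. This minimal sketch reuses the same $t_0$, $u_0$, $t_E$ values as the plotting cell (the helper name `pspl_magnification` is ours, purely illustrative):

```python
import numpy as np

def pspl_magnification(t, t_0, u_0, t_E):
    """PSPL magnification A(u), with u^2 = u_0^2 + ((t - t_0) / t_E)^2."""
    u2 = u_0**2 + ((t - t_0) / t_E)**2
    return (u2 + 2.0) / (np.sqrt(u2) * np.sqrt(u2 + 4.0))

t_0, u_0, t_E = 2452848.06, 0.133, 61.5  # values from the plotting cell above

# Peak magnification occurs at t = t_0, where u = |u_0|
print(pspl_magnification(t_0, t_0, u_0, t_E))             # ~7.57
# Far from the peak (here 10 t_E away) the magnification returns to ~1
print(pspl_magnification(t_0 + 10 * t_E, t_0, u_0, t_E))  # ~1.0002
```

Checking your intuition against this closed form is a cheap way to catch unit or parameter mix-ups before handing everything to the fitter.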
\n", "> ### Exercise 1\n", ">\n", "> Try playing with the parallax (\"pi_E_N\", \"pi_E_E\") and finite-source (\"rho\") parameters and see how they change your magnification model.\n", ">\n", "> **Note:** This is not an interactive plot. You have to run the cell again after changing the values (or moving the slider, on Colab).\n" ] }, { "cell_type": "markdown", "metadata": { "id": "SSTBwBCFKqhQ" }, "source": [ "\n", "## OGLE EWS Bulk Light Curve Fit\n", "\n", "Roman-style timelines include thousands of epochs, making model evaluations computationally heavy. Ground-based surveys like OGLE’s Early Warning System (EWS) provide faster alert metadata and estimated PSPL parameters. We will prototype our bulk fitter on these tractable datasets, then adapt it to simulated Roman seasons.\n" ] }, { "cell_type": "markdown", "metadata": { "id": "48FwbafF0ddy" }, "source": [ "### Getting the Ground-Based Data\n", "\n", "\n", "Let's start by scraping some light curves and microlensing model parameter estimates." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "colab": { "base_uri": "https://localhost:8080/" }, "id": "_LnKQvQEF4km", "outputId": "c397091f-64dc-4f0c-f2ff-4fb4de2935bb" }, "outputs": [], "source": [ "#@title Web scraping functions\n", "\n", "def get_data_url(event: str) -> str:\n", " '''Takes an event name and returns the URL for the data page.'''\n", "\n", " event = event.split('-') # split the event name into its components, separated by '-'\n", " year = event[0] # the first component is the year\n", " region = event[1].lower() # the second component is the region (e.g., blg or gd), which we need to make lower case\n", " number = event[2] # the third component is the event number\n", " url = f'https://www.astrouw.edu.pl/ogle/ogle4/ews/{year}/{region}-{number}/phot.dat'\n", "\n", " return url\n", "\n", "def fetch_event_data(url: str) -> pd.DataFrame:\n", " '''Takes a URL and returns the data as a pandas DataFrame.'''\n", "\n", " # Read the data from the URL\n", " response = urllib.request.urlopen(url)\n", " data = response.read().decode('utf-8')\n", "\n", " # Convert the data to a pandas DataFrame\n", " df = 
pd.read_csv(StringIO(data), sep=r'\\s+', header=None, names=['HJD', 'I magnitude', 'magnitude error', 'seeing', 'sky level'])\n", "\n", " return df\n", "\n", "# Test\n", "event = '2017-BLG-0001'\n", "event_data_url = get_data_url(event)\n", "data = fetch_event_data(event_data_url)\n", "print(data)" ] }, { "cell_type": "markdown", "metadata": { "id": "XUMaAZzyJYok" }, "source": [ "Great. Now that we can fetch individual light curves from the OGLE EWS website, the next ingredient is metadata. The alert table provides first-pass PSPL estimates that keep our fits stable and fast:\n", "\n", "- `Tmax (HJD)` → proxy for $t_0$ (time of peak magnification)\n", "- `Umin` → proxy for $u_0$\n", "- `tau` → proxy for $t_E$\n", "- `Event` → the human-readable identifier we use to reconstruct download URLs\n", "\n", "We will shamelessly borrow these values as starting points so the optimizer begins near the global likelihood maximum instead of wasting time on divergent guesses.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "collapsed": true, "id": "n_oCJwoCGHuq", "outputId": "c92b1bb0-4002-49ae-c4f7-78148e069bb2" }, "outputs": [], "source": [ "#@title More web scraping (this time for the EWS table)\n", "\n", "def fetch_table_data(url):\n", " '''Takes a URL and returns the first table as a pandas DataFrame.'''\n", " source = urllib.request.urlopen(url).read()\n", " soup = bs.BeautifulSoup(source, 'lxml')\n", " table = soup.find_all('table')\n", " df = pd.read_html(StringIO(str(table)))[0]\n", "\n", " return df\n", "\n", "ews_url = \"https://ogle.astrouw.edu.pl/ogle4/ews/ews.html\" # https://ogle.astrouw.edu.pl/ogle4/ews/2025/ews.html for last year\n", "ews_df = fetch_table_data(ews_url)\n", "print(ews_df)" ] }, { "cell_type": "markdown", "metadata": { "id": "PTNBMc8oK5Kh" }, "source": [ "Next we attach a direct light-curve URL to every alert row so downstream code can iterate without re-deriving filenames. 
Think of this as building a lightweight catalog that couples OGLE metadata with downloadable photometry." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "collapsed": true, "id": "uw0PfdBvGLU4", "outputId": "e5955df0-ec27-4bf9-e9c3-af02b53dd997" }, "outputs": [], "source": [ "#@title Adding a URL column to the data frame\n", "\n", "# Add a new column to the EWS data frame ('ews_df'), using the column name 'event data url'.\n", "ews_df['event data url'] = ews_df['Event'].apply(get_data_url)\n", "print(ews_df)\n", "print(min(ews_df['Tmax (HJD)']), max(ews_df['Tmax (HJD)']))\n", "print(min(ews_df['Umin']), max(ews_df['Umin']))\n", "print(min(ews_df['tau']), max(ews_df['tau']))" ] }, { "cell_type": "markdown", "metadata": { "id": "DfpfmFgoLF_p" }, "source": [ "Nice! Now we have a miniature alert catalog—complete with URLs and first-pass PSPL estimates. Before we scale up, let’s sanity-check a single event by plotting the OGLE photometry alongside the EWS model parameters.\n", "\n", "\n", "\n", "
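Once we start looping over hundreds of these URLs, transient network failures become a question of when, not if; a bulk pipeline should log and skip a bad download rather than crash. Here is a minimal, hedged sketch of a retry wrapper (the `fetch_with_retries` name, retry count, and timeout are illustrative choices, not part of the official workflow):

```python
import time
import urllib.request

def fetch_with_retries(url, n_retries=3, delay_s=2.0, timeout_s=30.0):
    """Illustrative helper: fetch a URL, retrying on transient errors.
    Returns the decoded text, or None if every attempt failed."""
    for attempt in range(n_retries):
        try:
            response = urllib.request.urlopen(url, timeout=timeout_s)
            return response.read().decode('utf-8')
        except OSError as err:  # URLError, HTTPError, and timeouts are all OSErrors
            print(f'Attempt {attempt + 1}/{n_retries} failed for {url}: {err}')
            time.sleep(delay_s)
    return None  # caller should log this event and move on
```

A bulk fitter can then treat a `None` return as actionable feedback (record the event name, continue the loop) instead of a fatal error.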
\n", "> ### Exercise 2\n", ">\n", "> Extend the cell below by adding parallax parameters (pi_E_E, pi_E_N) and plotting the resulting model so you can see how higher-order terms perturb the fit.\n
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "colab": { "base_uri": "https://localhost:8080/", "height": 561 }, "id": "qcEnAWfQLbSe", "outputId": "daf6df51-68ef-4a13-fae2-2cdfeaed74f7" }, "outputs": [], "source": [ "#@title Plotting the '2025-BLG-0001' event with the EWS model\n", "\n", "###### [ ] add parallax to the model.\n", "\n", "# Function to process a single event\n", "def plot_event_data(i, ews_df):\n", " event = ews_df['Event'][i]\n", " print(event)\n", " print(ews_df.columns)\n", " url = ews_df['event data url'][i]\n", " data = fetch_event_data(url)\n", " t_0_0 = ews_df['Tmax (HJD)'][i]\n", " u_0_0 = ews_df['Umin'][i] * 1.1 # initial guess\n", " t_E_0 = ews_df['tau'][i] * 1.1 # initial guess\n", " rho_0 = 0.001 # initial guess\n", " ######\n", " # pi_E_E =\n", " # pi_E_N =\n", " # t_0_par =\n", " ######\n", "\n", "\n", " plt.close(i+1)\n", " plt.figure(i+1)\n", "\n", " plt.errorbar(data['HJD'],\n", " data['I magnitude'],\n", " yerr=data['magnitude error'],\n", " fmt='x',\n", " color='black'\n", " )\n", " plt.axvline(ews_df['Tmax (HJD)'][i], color='blue', linestyle='--', label='EWS Tmax')\n", "\n", " plt.title(event)\n", " plt.xlabel('HJD')\n", " plt.ylabel('I magnitude')\n", "\n", " # Data as a list of numpy arrays\n", " data_list = [data['HJD'].to_numpy(), data['I magnitude'].to_numpy(), data['magnitude error'].to_numpy()]\n", "\n", " # Pack everything into MulensModel objects\n", " data_object = MulensModel.MulensData(data_list=data_list,\n", " plot_properties={'color': 'thistle',\n", " 'label': 'OGLE',\n", " 'marker': 'x',\n", " 'markersize': 2\n", " },\n", " phot_fmt='mag',\n", " bandpass='I'\n", " )\n", " ######\n", " fspl_model = MulensModel.Model({'t_0': t_0_0,\n", " 'u_0': u_0_0,\n", " 't_E': t_E_0,\n", " 'rho': rho_0}\n", " ) # add parallax parameters\n", " # (initialize with (0.0, 0.0))\n", " ######\n", " fspl_model.set_magnification_methods([t_0_0 - 3.0 * t_E_0,\n", " finite_source_methods[0],\n", " 
t_0_0 + 3.0 * t_E_0\n", " ],\n", " source=None\n", " ) # rho <= 0.1\n", " event_object = MulensModel.Event(datasets=data_object, model=fspl_model)\n", "\n", " ######\n", " parameters_to_fit = [\"t_0\", \"u_0\", \"t_E\", \"rho\"] # add parallax parameters\n", "\n", " ######\n", "\n", " # Plot the initial model\n", " ######\n", " label = 'EWS model: %1.3f, %1.3f, %1.3f, %1.3f' %(t_0_0,\n", " u_0_0,\n", " t_E_0,\n", " rho_0\n", " ) # Add parallax to the\n", " # plot label\n", " ######\n", " event_object.plot_model(color='r',\n", " linestyle=':',\n", " t_range=[min(data['HJD']),\n", " max(data['HJD'])\n", " ],\n", " label=label\n", " )\n", "\n", " plt.legend()\n", " plt.savefig(f'./{event}.png', bbox_inches='tight')\n", " plt.show()\n", "\n", "plot_event_data(0, ews_df)\n" ] }, { "cell_type": "markdown", "metadata": { "id": "wXXGwp4YLqGA" }, "source": [ "Is it working?\n", "\n", "If so, we can move on and test our fitting algorithms." ] }, { "cell_type": "markdown", "metadata": { "id": "vRmBCne305Qu" }, "source": [ "### Fitting a Basic PSPL Model\n", "\n", "\n", "First, we’ll need an **objective function** — a way to measure how well our model fits the data (or, more precisely, how likely the model is to have generated the data, assuming Gaussian noise)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "-CZpDATdEIVl" }, "outputs": [], "source": [ "#@title Objective function\n", "def mulens_neglogP_function(theta, parameters_to_fit, event, verbose=False):\n", " '''Negative log-probability function for MulensModel fitting.'''\n", "\n", " # Create a dictionary from theta values for easier access\n", " params = dict(zip(parameters_to_fit, theta))\n", "\n", " # unpack params\n", " t_0_value = params['t_0']\n", " u_0_value = params['u_0']\n", " t_E_value = params['t_E']\n", " rho_value = None # rho is handled later\n", "\n", " # Prior Checks\n", " if not (2460600 <= t_0_value <= 2461000): # this needs to change if you\n", " # use this code to fit a different\n", " # season\n", " # For example, for exercise X\n", " # !!!\n", " return np.inf\n", " if not (0.000001 <= u_0_value <= 2.5): # u_0 can't be 0\n", " return np.inf\n", " if not (0.1 <= t_E_value <= 700):\n", " return np.inf\n", "\n", " # Handle rho or log_rho prior\n", " if 'rho' in params:\n", " rho_value = params['rho']\n", " if not (0 <= rho_value <= 0.2): # Assuming rho >= 0 is desired\n", " return np.inf\n", " elif 'log_rho' in params:\n", " log_rho_value = params['log_rho']\n", " # Check log_rho lower bound first\n", " if log_rho_value < -10: # Check the log value directly\n", " return np.inf\n", " rho_value = 10**log_rho_value # Convert to rho for the upper bound check\n", " if rho_value > 0.2: # Check rho upper bound\n", " return np.inf\n", "\n", " # Update Model Parameters\n", " # If all prior checks passed, NOW update the model\n", " setattr(event.model.parameters, 't_0', t_0_value)\n", " setattr(event.model.parameters, 'u_0', u_0_value)\n", " setattr(event.model.parameters, 't_E', t_E_value)\n", " if rho_value is not None: # only set rho if it is actually being fit\n", " setattr(event.model.parameters, 'rho', rho_value)\n", " # Add other parameters if needed\n", "\n", " # Example flux priors\n", " penalty = 0.0\n", " '''\n", " dataset = event.datasets[0]\n", " event.fit_fluxes() # This needs the model parameters to be set 
correctly\n", " ([FS], FB) = event.get_flux_for_dataset(dataset)\n", "\n", " if verbose:\n", " print(f'FS: {FS}, FB: {FB}')\n", "\n", " # Flux priors (check AFTER fitting fluxes)\n", " if FB <= 0:\n", " penalty = ((FB / 100)**2) # why 100? I told you, vibes.\n", " if FS <= 0 or (FS + FB) <= 0:\n", " return np.inf # Return inf if fluxes are non-physical '''\n", "\n", " # Calculate Chi2\n", " chi2 = event.get_chi2()\n", " if verbose:\n", " print('chi2 = ', chi2)\n", "\n", " # Return the objective function value (negative log likelihood ~ chi2/2)\n", " # Scipy minimize finds the minimum, so we return chi2 (or chi2/2)\n", " return chi2/2.0 + penalty # Technically negLogP is chi2/2 + constant" ] }, { "cell_type": "markdown", "metadata": { "id": "XruXVcLsAuAe" }, "source": [ "\n", "\n", "
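The same pattern, hard "top-hat" priors that return `np.inf` outside the allowed range and $\chi^2/2$ otherwise, works for any model, not just microlensing. Here is a self-contained toy sketch (synthetic straight-line data and made-up bounds, purely for illustration) of how `scipy.optimize.minimize` with Nelder-Mead consumes such an objective:

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic data: a straight line with Gaussian noise
rng = np.random.default_rng(42)
t = np.linspace(0.0, 10.0, 50)
y_err = np.full_like(t, 0.5)
y = 2.0 * t + 1.0 + rng.normal(0.0, y_err)

def neg_log_p(theta, t, y, y_err):
    """chi2/2 with hard (top-hat) priors: np.inf outside the bounds."""
    slope, intercept = theta
    if not (0.0 <= slope <= 10.0):       # made-up prior bounds
        return np.inf
    if not (-10.0 <= intercept <= 10.0):
        return np.inf
    model = slope * t + intercept
    chi2 = np.sum(((y - model) / y_err) ** 2)
    return chi2 / 2.0  # negative log-likelihood up to a constant

result = minimize(neg_log_p, x0=[1.0, 0.0], args=(t, y, y_err), method='Nelder-Mead')
print(result.x)  # should land close to the true (2.0, 1.0)
```

Nelder-Mead is a good match for these infinite "walls" because it never needs gradients; gradient-based methods would choke on the `np.inf` discontinuity.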
\n", "> ### Exercise 3\n", ">\n", "> Test out a fit of a subsample of the EWS data by running the following two cells.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Ouq9PSahGXRZ" }, "outputs": [], "source": [ "#@title Function to process a single event\n", "def process_event(i, ews_df, n, start_time, verbose=True, log_rho_prior=True):\n", "\n", " # Event stuff\n", " event = ews_df['Event'][i]\n", " event_start_time = time.time()\n", " url = ews_df['event data url'][i]\n", " data = fetch_event_data(url)\n", "\n", " # Model stuff\n", " t_0_0 = ews_df['Tmax (HJD)'][i]\n", " # We are going to change the next guesses from those provided by the EWS, to\n", " # give our optimizer some room to move\n", " u_0_0 = ews_df['Umin'][i] * 1.1 # initial guess\n", " t_E_0 = ews_df['tau'][i] * 1.1 # initial guess\n", " rho_0 = 0.001 # initial guess\n", "\n", " # Crop the data so that we don't have to fit 10 years worth\n", " t_window = (t_0_0 - 10.0 * t_E_0, t_0_0 + 10.0 * t_E_0)\n", " t_window = (max(min(data['HJD']), t_window[0]), min(max(data['HJD']), t_window[1]))\n", " # This creates a boolean mask: True for rows within the window, False otherwise\n", " time_mask = (data['HJD'] >= t_window[0]) & (data['HJD'] <= t_window[1])\n", " # Apply the mask to filter the DataFrame\n", " data = data[time_mask]\n", "\n", " if t_E_0 > 50: # likely remnant (they do derpy stuff in the wings)\n", " mag_method = finite_source_methods[0] # Note: `finite_source_uniform_Gould94`\n", " # is not ideal for events where\n", " # rho ≲ 1e-3, but we use it here\n", " # to ensure fast and consistent\n", " # modeling while testing functions.\n", " # For high-mass lenses or precision\n", " # modeling, consider switching to\n", " # `WittMao94` variants (method 2\n", " # or 4).\n", " else: # likely MS star\n", " mag_method = finite_source_methods[0]\n", "\n", " if verbose:\n", " print('\\n', event)\n", " print('-----------------')\n", " print(f't_0_0: {t_0_0:1.3f}')\n", " print(f'u_0_0: {u_0_0:1.3f}')\n", " print(f't_E_0: {t_E_0:1.3f}')\n", " print(f'rho_0: {rho_0:1.5f}')\n", " print(f'Method = 
{mag_method}')\n", " print('-----------------')\n", " print(f'Time started: {time.strftime(\"%Y-%m-%d %H:%M:%S\", time.localtime())}')\n", "\n", " if i % n == 0:\n", " plt.close(i)\n", " plt.figure(i)\n", "\n", " plt.errorbar(data['HJD'],\n", " data['I magnitude'],\n", " yerr=data['magnitude error'],\n", " fmt='x',\n", " color='black'\n", " )\n", " plt.axvline(ews_df['Tmax (HJD)'][i], color='blue', linestyle='--', label=r'EWS $T_{max}$')\n", "\n", " plt.title(event)\n", "\n", " # Data as a list of numpy arrays\n", " data_list = [data['HJD'].to_numpy(), data['I magnitude'].to_numpy(), data['magnitude error'].to_numpy()]\n", "\n", " # Pack everything into MulensModel objects\n", " data_object = MulensModel.MulensData(data_list=data_list,\n", " plot_properties={'color': 'thistle',\n", " 'label': 'OGLE',\n", " 'marker': 'x',\n", " 'markersize': 2\n", " },\n", " phot_fmt='mag',\n", " bandpass='I'\n", " )\n", " fspl_model = MulensModel.Model({'t_0': t_0_0, 'u_0': u_0_0, 't_E': t_E_0, 'rho': rho_0})\n", "\n", " fspl_model.set_magnification_methods([t_window[0],\n", " mag_method,\n", " t_window[1]\n", " ],\n", " source=None\n", " ) # rho <= 0.1\n", " event_object = MulensModel.Event(datasets=data_object, model=fspl_model)\n", "\n", " # Plot the initial model\n", " if i % n == 0:\n", " event_object.plot_model(color='r',\n", " linestyle=':',\n", " t_range=[min(data['HJD']),\n", " max(data['HJD'])\n", " ],\n", " label='Initial Guess: %1.3f, %1.3f, %1.3f, %1.5f' %(t_0_0, u_0_0, t_E_0, rho_0)\n", " )\n", "\n", " # get initial chi2\n", " initial_chi2 = event_object.get_chi2()\n", "\n", " # index order\n", " parameters_to_fit = [\"t_0\", \"u_0\", \"t_E\", \"rho\"]\n", " if log_rho_prior:\n", " parameters_to_fit = [\"t_0\", \"u_0\", \"t_E\", \"log_rho\"]\n", "\n", " # Fit using scipy Nelder-Mead\n", " if log_rho_prior:\n", " result = minimize(mulens_neglogP_function,\n", " [t_0_0, u_0_0, t_E_0, np.log10(rho_0)],\n", " args=(parameters_to_fit,\n", " event_object),\n", " 
method='Nelder-Mead'\n", " ) # log rho prior\n", " else:\n", " result = minimize(mulens_neglogP_function,\n", " [t_0_0, u_0_0, t_E_0, rho_0],\n", " args=(parameters_to_fit,\n", " event_object),\n", " method='Nelder-Mead'\n", " )\n", "\n", " # Make sure the values in event_object are the best fit values\n", " neglogP_final = mulens_neglogP_function(result.x, parameters_to_fit, event_object, verbose=False)\n", "\n", " if verbose:\n", " print('-----------------')\n", " print(f'Time elapsed: {time.time() - event_start_time} seconds')\n", " print('-----------------')\n", " neglogP_final = mulens_neglogP_function(result.x, parameters_to_fit, event_object, verbose=True)\n", " print('-----------------')\n", " print(f't_0: {event_object.model.parameters.t_0:1.3f} ({result.x[0]:1.3f})')\n", " print(f'u_0: {event_object.model.parameters.u_0:1.3f} ({result.x[1]:1.3f})')\n", " print(f't_E: {event_object.model.parameters.t_E:1.3f} ({result.x[2]:1.3f})')\n", " print(f'rho: {event_object.model.parameters.rho:1.5f} ({result.x[3]:1.5f})') # Always rho here\n", " print('-----------------')\n", " print(f'Final -logP check: {neglogP_final}')\n", " print('-----------------')\n", " print(f'Initial chi2 (from initial guess): {initial_chi2}')\n", " print(f'Final chi2 (from event object): {event_object.get_chi2()}')\n", " print(f'Delta chi2: {event_object.get_chi2() - initial_chi2}')\n", " print('Delta chi2/dof: ',(event_object.get_chi2() - initial_chi2) / (len(data['HJD']) - 4))\n", " print('-----------------')\n", "\n", "\n", " # Plot the fit model and show (event_object now has correct params)\n", " if i % n == 0:\n", " # Construct the label string from the corrected event_object model state\n", " label = '''Best Fit: {0.t_0:1.3f}, {0.u_0:1.3f}, {0.t_E:1.3f},\n", " {0.rho:1.5f}'''.format(event_object.model.parameters)\n", "\n", " event_object.plot_model(color='r',\n", " linestyle='-',\n", " t_range=[min(data['HJD']), max(data['HJD'])],\n", " label=label\n", " )\n", " # Use parameters 
from the event object model for plot limits etc.\n", " t_0 = event_object.model.parameters.t_0\n", " t_E = event_object.model.parameters.t_E\n", " # Define plot window based on final params or keep original? Your choice.\n", " # Using original t_window:\n", " plt.xlim(max(min(data['HJD']), t_window[0]),\n", " min(max(data['HJD']), t_window[1])\n", " )\n", " plt.legend(loc='upper left')\n", " plt.xlabel('HJD')\n", " plt.ylabel('I magnitude')\n", " plt.savefig(f'./{event}.png', bbox_inches='tight')\n", " plt.show()\n", "\n", " # Return the actual best-fit parameters found by minimize\n", " # You might want to return the converted values for consistency\n", " final_params_values = list(result.x)\n", " if log_rho_prior:\n", " log_rho_idx = parameters_to_fit.index('log_rho')\n", " final_params_values[log_rho_idx] = 10**final_params_values[log_rho_idx] # Convert log_rho back to rho\n", "\n", " event_object.fit_fluxes() # This needs the model parameters to be set correctly\n", " ([FS], FB) = event_object.get_flux_for_dataset(event_object.datasets[0])\n", " if verbose:\n", " print(f'FS: {FS}, FB: {FB}')\n", " print('-----------------')\n", " final_params_values = [FS, FB] + final_params_values\n", "\n", " return i, final_params_values # Return values, maybe with rho always as rho" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "collapsed": true, "id": "NrGby0EuAQYa", "outputId": "4a5006ca-ca20-4690-8efc-6e3b72e4a5ae" }, "outputs": [], "source": [ "#@title Test fitting\n", "\n", "# numpy array for the fit params\n", "fit_params = np.zeros((ews_df.shape[0], 8))\n", "\n", "plot_fraction = 0.1 # fraction of events to show plots for\n", "N = 10 # number of events to fit\n", "\n", "start_time = time.time()\n", "for i in range(N):\n", " i, result = process_event(i,\n", " ews_df,\n", " int(1/plot_fraction),\n", " start_time,\n", " verbose=True,\n", " log_rho_prior=True\n", " )\n", " n = 
len(result)\n", " fit_params[i, :n] = result\n", "\n", "# Geez, how long is this going to take?\n", "time_at_N = time.time()\n", "time_for_N = time_at_N - start_time\n", "print(f'Time taken to fit the first {N}: {time_for_N} seconds')\n", "\n", "# Estimate completion time\n", "completion_time = time_at_N + (ews_df.shape[0] - N) / N * time_for_N\n", "print(time_at_N, (ews_df.shape[0] - N) / N, time_for_N)\n", "\n", "# Print the completion time in human readable format hr:min:sec\n", "print(f'Estimated season completion time: {time.strftime(\"%H:%M:%S\", time.localtime(completion_time))}')\n", "end_time = time.time()" ] }, { "cell_type": "markdown", "metadata": { "id": "fC1XCvoKGnK2" }, "source": [ "After fitting a subsample, inspect the posterior summaries. Do the recovered $t_0$, $u_0$, and $t_E$ distributions cluster around the OGLE seeds? Large skews or multimodal shapes often signal that priors or initial guesses need another pass." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "7R0OStnUGlJJ", "outputId": "97692dc9-7c3e-429d-aecc-3c10cc6f7dac" }, "outputs": [], "source": [ "#@title Exercise 3 fit parameter distributions\n", "\n", "def plot_histograms(fit_params, exercise):\n", " ''' '''\n", "\n", " plt.close(100 + exercise)\n", " plt.figure(100 + exercise)\n", " fig, axes = plt.subplots(8, 1, figsize=(5, 25))\n", "\n", " # FS values\n", " min_FS = min(fit_params[:, 0])\n", " max_FS = max(fit_params[:, 0])\n", " print(min_FS, max_FS)\n", " axes[0].hist(fit_params[:, 0], bins=20, color='#361d49')\n", " axes[0].set_xlabel(r'$F_\\text{S}$')\n", " axes[0].set_ylabel('Frequency')\n", " #axes[0].set_yscale('log')\n", "\n", " # FB values\n", " min_FB = min(fit_params[:, 1])\n", " max_FB = max(fit_params[:, 1])\n", " print(min_FB, max_FB)\n", " axes[1].hist(fit_params[:, 1], bins=20, color='#361d49')\n", " axes[1].set_xlabel(r'$F_\\text{B}$')\n", " 
axes[1].set_ylabel('Frequency')\n", "    #axes[1].set_yscale('log')\n", "\n", "    # t_0 values\n", "    min_t0 = min(fit_params[:, 2])\n", "    max_t0 = max(fit_params[:, 2])\n", "    print(min_t0, max_t0)\n", "    axes[2].hist(fit_params[:, 2], bins=20, color='#361d49')\n", "    axes[2].set_xlabel(r'$t_0$')\n", "    axes[2].set_ylabel('Frequency')\n", "    #axes[2].set_yscale('log')\n", "\n", "    # u0 values\n", "    min_u0 = min(fit_params[:, 3])\n", "    max_u0 = max(fit_params[:, 3])\n", "    print(min_u0, max_u0)\n", "    axes[3].hist(fit_params[:, 3], bins=20, color='#361d49')\n", "    axes[3].set_xlabel(r'$u_0$')\n", "    axes[3].set_ylabel('Frequency')\n", "    #axes[3].set_yscale('log')\n", "\n", "    # tE values\n", "    min_tE = min(fit_params[:, 4])\n", "    max_tE = max(fit_params[:, 4])\n", "    print(min_tE, max_tE)\n", "    axes[4].hist(fit_params[:, 4], bins=20, color='#361d49')\n", "    axes[4].set_xlabel(r'$t_\\text{E}$')\n", "    axes[4].set_ylabel('Frequency')\n", "    #axes[4].set_yscale('log')\n", "\n", "    # rho values (histogrammed in log10, so the axis is labelled accordingly)\n", "    min_rho = min(fit_params[:, 5])\n", "    max_rho = max(fit_params[:, 5])\n", "    print(min_rho, max_rho)\n", "    print(np.log10(min_rho), np.log10(max_rho))\n", "    axes[5].hist(np.log10(fit_params[:, 5]), bins=20, color='#361d49')\n", "    axes[5].set_xlabel(r'$\\log_{10}\\rho$')\n", "    axes[5].set_ylabel('Frequency')\n", "    #axes[5].set_yscale('log')\n", "\n", "    # pi_E_E values\n", "    min_piEE = min(fit_params[:, 6])\n", "    max_piEE = max(fit_params[:, 6])\n", "    print(min_piEE, max_piEE)\n", "    axes[6].hist(fit_params[:, 6], bins=20, color='#361d49')\n", "    axes[6].set_xlabel(r'$\\pi_{\\text{E},E}$')\n", "    axes[6].set_ylabel('Frequency')\n", "    #axes[6].set_yscale('log')\n", "\n", "    # pi_E_N values\n", "    min_piEN = min(fit_params[:, 7])\n", "    max_piEN = max(fit_params[:, 7])\n", "    print(min_piEN, max_piEN)\n", "    axes[7].hist(fit_params[:, 7], bins=20, color='#361d49')\n", "    axes[7].set_xlabel(r'$\\pi_{\\text{E},N}$')\n", "    axes[7].set_ylabel('Frequency')\n", "    #axes[7].set_yscale('log')\n", "\n", "    plt.show()\n", 
"\n", "fit1_params = fit_params[:N].copy()\n", "\n", "plot_histograms(fit1_params, 3)" ] }, { "cell_type": "markdown", "metadata": { "id": "G9gl76c4S9c7" }, "source": [ "### Priors\n" ] }, { "cell_type": "markdown", "metadata": { "id": "GCQiM8RpTJpT" }, "source": [ "We will now interpret the deeply unphysical results where the blend flux comes out very negative.\n", "\n", "Likely at least a few of your events had negative $F_{\\textrm{B}}$ (*we're looking at you, 2025-BLG-0030*). This is a pretty common issue with single-lens modelling.\n", "\n", "But what does it mean?\n", "\n", "Nothing that makes any physical sense. If the blend is a little bit negative, that can be explained by systematics in the photometry (e.g. the background was measured with a very faint star in it, so our flux scale was zeroed slightly wrong), but there is no reason for the blend to be very negative; we can't detect anti-photons.\n", "\n", "So, how can we punish the optimizer for its sins? It's actually pretty simple. We punish it by adding a penalty to the objective function. If the penalty is too abrupt, it can interfere with a gradient-descent optimizer's ability to calculate gradients. But if we apply a gradual penalty, it leads the optimizer in the right direction. The exact shape of the penalty doesn't really matter; it just needs to make more sense than giant negative fluxes.\n", "\n", "This kind of penalty is called a prior: we use prior knowledge to shape the probability landscape. Basically, we tell the optimizer, “a very negative blend is dumb - don't do that.”\n", "\n", "You might argue that you would prefer to approach your modelling from an agnostic perspective, and I appreciate your integrity. The problem with that argument, though, is the assumption that having no prior is agnostic. Having no prior is like telling your optimizer, \"negative blend is just as reasonable as positive blend.\" That's not agnostic. 
You've still informed the fit. You just informed it that every solution was equally possible, which you know is not true. This not-actually-agnostic prior, where you don't code in any penalties, is called a uniform prior. The concept of \"no prior\" is a fallacy.\n", "\n", "So let's put a reasonable prior on the blend flux so that it stops acting up. Our other fit parameters can continue to have truncated priors (bounds).\n", "\n", "The next question you might ask is, \"how do I know which prior is the right prior?\" The answer to that is: vibes.\n", "I'm not even joking. You choose a prior that is physically informed, or informed by your \"prior knowledge\" of what your solution should be, and don't worry overly much about it. You should, however, always be wary of priors so strong that they dominate the fit. If what you get out of a fit is very similar to your prior, that is an indication that you have been too heavy-handed and need to loosen the leash.\n", "\n", "For example, if you made your prior a hard bound, requiring $F_{\\textrm{B}}>0$, and your best fit solution was $F_{\\textrm{B}}\\approx0$, you have likely stopped the optimizer from exploring valid parameter space." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "
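Concretely, the gradual penalty described above can be written as a quadratic addition to the negative log-probability. This is a minimal sketch: the helper name `blend_flux_penalty` and the width `sigma` are illustrative, and you would wire the idea into your own `mulens_neglogP_function` rather than use this standalone.

```python
def blend_flux_penalty(F_B, sigma=2000.0):
    """Piecewise prior penalty on the blend flux, in -log-probability units.

    Zero penalty for F_B >= 0; a gradual Gaussian (quadratic in -logP)
    penalty for F_B < 0, steering -- not walling -- the optimizer away
    from strongly negative blends. The width `sigma` is an assumption:
    tune it to your survey's flux scale.
    """
    if F_B >= 0:
        return 0.0
    return 0.5 * (F_B / sigma) ** 2

# A slightly negative blend costs almost nothing; a wildly negative one is expensive.
print(blend_flux_penalty(-100.0), blend_flux_penalty(-10000.0))
```

Because the penalty grows smoothly from zero, a Nelder-Mead or gradient-based optimizer can still walk through mildly negative blends (which honest photometric systematics allow) without ever settling deep in unphysical territory.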
\n", "

Exercise 4

\n", "

Edit mulens_neglogP_function() to constrain the blend flux with a piecewise prior: no penalty when the blend flux is non-negative, and a gradual Gaussian penalty when it drops below 0.

\n", " \n", "
\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "tSAMNBaKGdvB", "outputId": "615c1a2d-ce08-4cea-a4fe-68ff5b739a47" }, "outputs": [], "source": [ "#@title Exercise 4 fit parameter distributions\n", "\n", "fit2_params = fit_params[:N].copy()\n", "plot_histograms(fit2_params, 4)" ] }, { "cell_type": "markdown", "metadata": { "id": "ALusqxtSHSpb" }, "source": [ "\n", "\n", "
\n", "

Exercise 5

\n", "

If everything is working, try adding parallax to the model in the EWS event-processing function.


\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "UPD364REIby6", "outputId": "423424a0-a5b5-4dcc-bde7-878fc0b39bec" }, "outputs": [], "source": [ "#@title Exercise 5 fit parameter distributions\n", "\n", "fit3_params = fit_params[:N].copy()\n", "plot_histograms(fit3_params, 5)" ] }, { "cell_type": "markdown", "metadata": { "id": "dUIZpeMbHVpj" }, "source": [ "\n", "\n", "
\n", "

Exercise 6

\n", "

Increase the number of events you are testing on (N) so that we can do some more robust benchmarking. Aim for a number of events for which the cells above take about 5 minutes to run.


\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "2eVVCXcjI1mo", "outputId": "eb3ce953-44ba-407b-c1e9-d72f3bc107e3" }, "outputs": [], "source": [ "#@title Exercise 6 fit parameter distributions\n", "\n", "fit4_params = fit_params[:N].copy()\n", "plot_histograms(fit4_params, 6)" ] }, { "cell_type": "markdown", "metadata": { "id": "I8FUNkSQSmJj" }, "source": [ "### Parallel Processing\n", "\n", "\n", "We're going to demonstrate how to speed up batch fitting using parallel processing. But before we dive in, there are a few things you should know.\n", "\n", "If you're running this notebook on Google Colab, your code is executing on a cloud-based virtual machine. Colab typically allocates only two CPU cores per session. While we can parallelize across these, the speed-up won't be dramatic.\n", "\n", "If you have Jupyter and Python installed on your local machine, you can run this notebook there to take advantage of your full hardware.\n", "\n", "Want to keep the Colab interface but use your local computer's power?\n", "You can connect to a local runtime by following [these instructions](https://research.google.com/colaboratory/local-runtimes.html).\n", "This allows Colab to use your local machine's resources while maintaining the notebook interface.\n", "\n", "⚠️ If you choose this route, your notebook will use local packages. You should install the required environment using the provided `.yml` file and `conda` (or your preferred environment manager). See the setup instructions at the start of this notebook.\n", "\n", "This notebook uses the `pathos` package for parallelization.\n", "In scripts, you'll more commonly see `multiprocessing` used—it has a very similar interface. 
However, multiprocessing has known issues inside Jupyter notebooks, which is why we're using pathos instead.\n", "\n", "Pathos is great for educational work in notebooks, but it has quirks. For example:\n", "* It caches the function you give it the first time you run it.\n", "* If you change that function afterward, it won't update unless you restart the kernel.\n", "\n", "That's why we finalized our `process_event()` function earlier—so we could parallelize it now with confidence.\n", "\n", "In general, we don't recommend using all available cores for parallelization:\n", "\n", "```python\n", "with Pool(processes=mp.cpu_count()) as pool:\n", "```\n", "\n", "This often results in slower execution, as the system spends time managing processes instead of doing actual computation. It's usually better to leave **one core free** for system processes:\n", "\n", "```python\n", "with Pool(processes=mp.cpu_count() - 1) as pool:\n", "```\n", "\n", "That said, Colab's VM only provides two cores, so `mp.cpu_count() - 1` is pointless here. For now, let's just try a pathos batch run on 2 cores and compare the timing to the serial loop from earlier.\n", "\n", "\n", "\n", "
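For reference, the script-side version of the pool pattern above looks like this with the standard library's `multiprocessing`. This is a toy sketch with a stand-in work function (`slow_square` and `parallel_squares` are illustrative names, not part of this pipeline); in a `.py` script the `__main__` guard is required, and, as noted, this pattern is unreliable inside Jupyter itself.

```python
import multiprocessing as mp

def slow_square(x):
    """Stand-in for an expensive per-event fit."""
    return x * x

def parallel_squares(n=8):
    # Leave one core free for system processes, as recommended above.
    n_workers = max(1, mp.cpu_count() - 1)
    with mp.Pool(processes=n_workers) as pool:
        return pool.map(slow_square, range(n))

if __name__ == "__main__":
    print(parallel_squares())  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The shape is deliberately identical to the pathos calls used below, so moving a working notebook pipeline into a batch script is mostly a matter of swapping the import and adding the guard.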
\n", "

Exercise 7

\n", "

Test how different values of the processes argument to Pool() affect your batch-fitting time.

\n", "

Try using 1, 2, and maybe mp.cpu_count() if you're on your local runtime.

\n", "

What's the fastest? What's the most efficient?


\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "xJAlYnRHGZAI", "outputId": "27d7b4e3-3034-4dc4-8daf-7960a18098bb" }, "outputs": [], "source": [ "# Main function to parallelize the processing\n", "def run_parallel_processing(plot_fraction=None, kill_after=None):\n", " # numpy array for the fit params\n", " fit_params = np.zeros((ews_df.shape[0], 8))\n", "\n", " if plot_fraction is None:\n", " plot_fraction = 0.1\n", " if kill_after is None:\n", " kill_after = 30\n", "\n", " start_time = time.time()\n", "\n", " # Create a pool of worker processes\n", " print('CPU count:', mp.cpu_count())\n", " with Pool(processes=mp.cpu_count()) as pool: # if running locally, keep the\n", " # process count < mp.cpu_count()\n", " # or it will be very slow\n", " results = pool.map(lambda i: process_event(i, ews_df, int(1/plot_fraction), start_time), range(kill_after))\n", " for i, params in results:\n", " n = len(params)\n", " fit_params[i, :n] = params\n", "\n", " print(\"Total time:\", time.time() - start_time)\n", "\n", " # Save fit_params if needed\n", " # np.save('fit_params.npy', fit_params)\n", "\n", " return fit_params\n", "\n", "# Run the parallel processing function\n", "######\n", "#fit_params = run_parallel_processing() # test run\n", "fit_params = run_parallel_processing(kill_after=N, plot_fraction=plot_fraction) # test run\n", "#fit_params = run_parallel_processing(kill_after=ews_df.shape[0],\n", "# plot_fraction=0.01\n", "# ) # full run\n", "######" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "uQMhYxHAfIwT", "outputId": "0a1d5b93-5457-4682-ad38-292ffaa721bd" }, "outputs": [], "source": [ "fit_params[:N]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "LCT3QifHGlWu", "outputId": "ae77344a-78d0-4795-930f-1b0aea09d0a1" }, 
"outputs": [], "source": [ "#@title Exercise 7 fit parameter distributions\n", "\n", "fit5_params = fit_params[:N].copy()\n", "plot_histograms(fit5_params, 7)" ] }, { "cell_type": "markdown", "metadata": { "id": "yG0V-EU8HDGR" }, "source": [ "In beta testing, the timing results for fitting 30 events were:\n", "\n", "| Processing | Cores | Events (N) | Plots | Time |\n", "| :-: | :-: | :-: | :-: | :-: |\n", "| Serial | 1 | 30 | 100% | 246s |\n", "| Serial | 1 | 30 | 10% | 237s |\n", "| Parallel | 2 | 30 | 100% | 218s |\n", "| Parallel | 2 | 30 | 10% | 213s |\n", "\n", "A modest improvement, considering the extra coding overhead. However, on a local machine with more cores:\n", "\n", "| Processing | Cores | Events (N) | Plots | Time |\n", "| :-: | :-: | :-: | :-: | :-: |\n", "| Serial | 1 | 30 | 100% | 150s |\n", "| Serial | 1 | 30 | 10% | 144s |\n", "| Parallel | 7 | 30 | 100% | 56s |\n", "| Parallel | 7 | 30 | 10% | 55s |\n", "\n", "\n", "This shows the real power of parallelisation—when the hardware can actually support it.\n", "\n", "If you ever encounter this kind of problem in the wild, where you have a very parallelisable job but limited time to implement parallelisation, it's always worth considering poor-man's parallelisation:\n", "Break the job into batches, run multiple scripts or terminals at once, and let your operating system juggle the rest.\n", "\n", "If your code is loop-heavy, look for vectorisation opportunities using `numpy`,\n", "or `cython` if that's not an option.\n", "\n", "Now that you've seen how parallelisation affects efficiency, you're ready to run a full-season fit.\n", "However, if you're short on time, it's entirely reasonable to skip this exercise.\n", "\n", "\n", "\n", "
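To make "vectorisation" concrete: the PSPL magnification from the parameter refresher can be evaluated for an entire time array in one `numpy` expression, with no Python loop. A self-contained sketch (the parameter values are illustrative):

```python
import numpy as np

def pspl_magnification(t, t_0, u_0, t_E):
    """Vectorized PSPL magnification for an array of times.

    u^2 = u_0^2 + ((t - t_0) / t_E)^2
    A(u) = (u^2 + 2) / (u * sqrt(u^2 + 4))
    """
    t = np.asarray(t, dtype=float)
    u2 = u_0**2 + ((t - t_0) / t_E) ** 2
    return (u2 + 2.0) / (np.sqrt(u2) * np.sqrt(u2 + 4.0))

# Evaluate a whole light curve in one shot -- no Python loop needed.
t = np.linspace(-50.0, 50.0, 10001)
A = pspl_magnification(t, t_0=0.0, u_0=0.1, t_E=20.0)
# The peak sits at t = t_0, where u = u_0 and A is about 10.04 for u_0 = 0.1.
```

Ten thousand model evaluations land in a handful of array operations; looping over epochs in pure Python would be orders of magnitude slower, which matters when an optimizer calls the model thousands of times per event.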
\n", "

Exercise 8

\n", "

Use the more efficient method (serial or parallel) to process the full season of EWS data.

\n", "

Plot histograms of the resulting parameter distributions (e.g., tE, u0, etc.).

\n", "

Look for trends and outliers.


\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "wYysb_DpJwa9", "outputId": "4d7ab473-f284-45b9-ed86-d17c860357de" }, "outputs": [], "source": [ "#@title Exercise 8 fit parameter distributions\n", "\n", "fit6_params = fit_params[:N].copy()\n", "plot_histograms(fit6_params, 8)" ] }, { "cell_type": "markdown", "metadata": { "id": "bzd3gcOPyW4H" }, "source": [ "### Custom Event Finder\n" ] }, { "cell_type": "markdown", "metadata": { "id": "UPfkNPtHypE1" }, "source": [ "If we weren't borrowing initial fit parameters from the OGLE EWS, these fits would fail completely under downhill optimizers. Starting too far from the truth in parameter space means the optimizer gets trapped in local minima - the peaks and valleys of the likelihood landscape become a cage, not a guide.\n", "\n", "We'll explore more robust (and computationally expensive) methods in the next section. But even with advanced samplers, having a good initial guess makes the entire process faster, more stable, and more scientifically honest.\n", "\n", "Different ground-based surveys use different strategies to identify microlensing events in stellar light curves. You, however, have a luxury they don't: you already know which light curves contain events. That means your challenge isn't classification—it's localization.\n", "\n", "And the most critical parameter to localize is $t_0$. \n", "Other parameters (like $u_0$, $t_\\textrm{E}$, and $\\rho$) can be guessed with population-based heuristics. 
But if your $t_0$ is wrong, your fit will miss the peak entirely - and what you get back will be noise-dressed nonsense.\n", "\n", "This is a central problem in bulk fitting single-lens microlensing events: \n", "> We need an event-finding algorithm to guess $t_0$ accurately.\n", "\n", "You now have:\n", "- A full season of OGLE data\n", "- Light curves that are guaranteed to contain events\n", "- A set of academic papers describing common event-finding strategies:\n", "  - [OGLE Early Warning System (EWS)](https://ui.adsabs.harvard.edu/abs/1994AcA....44..227U/abstract)\n", "  - [KMTNet Alert Finder](https://ui.adsabs.harvard.edu/abs/2018arXiv180607545K/abstract)\n", "  - [KMTNet Event Finder](https://ui.adsabs.harvard.edu/abs/2018AJ....155...76K/abstract)\n", "  - [KMTNet Anomaly Finder](https://ui.adsabs.harvard.edu/abs/2021AJ....162..163Z/abstract)\n", "  - [RTModel (Real-Time Modeling)](https://ui.adsabs.harvard.edu/abs/2024A%26A...688A..83B/abstract) (see their section 4)\n", "\n", "Do you think you can do better?\n", "\n", "\n", "\n", "
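As a naive baseline to beat (a sketch on synthetic data, not a recommendation; the helper name `guess_t0` is illustrative): smooth the magnitudes with a running median, then take the brightest smoothed epoch as the $t_0$ guess.

```python
import numpy as np

def guess_t0(times, mags, window=11):
    """Crude t_0 estimator: running-median smooth, then take the
    brightest (minimum-magnitude) smoothed epoch.

    The running median suppresses single-epoch outliers that would
    otherwise hijack a plain argmin. `window` should be odd.
    """
    order = np.argsort(times)
    t, m = np.asarray(times)[order], np.asarray(mags)[order]
    half = window // 2
    smoothed = np.array([np.median(m[max(0, j - half):j + half + 1])
                         for j in range(len(m))])
    return t[np.argmin(smoothed)]

# Synthetic check: a PSPL-like brightening centred at t = 42,
# plus one wild outlier that a plain argmin would latch onto.
t = np.linspace(0.0, 100.0, 501)
m = 18.0 - 1.5 * np.exp(-0.5 * ((t - 42.0) / 5.0) ** 2)
m[100] = 14.0  # spurious bright point at t = 20
print(guess_t0(t, m))  # close to 42, not fooled by the outlier
```

The published finders linked above are far more sophisticated (signal-to-noise thresholds, $\chi^2$ comparisons against flat and Paczyński models), but even a baseline this crude makes the failure modes of your real algorithm easier to diagnose.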
\n", "

Exercise 9

\n", "

Write an algorithm to estimate $t_0$ for microlensing events. Keep it simple. Keep it efficient. The rest of your fit may depend on it.


\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Ki_q_-3xWjVO" }, "outputs": [], "source": [ "#@title Your code goes here" ] }, { "cell_type": "markdown", "metadata": { "id": "cwwlRkfxL-V5" }, "source": [ "### Advanced Modeling Techniques and Higher-Order Models\n", "\n", "The idea that a good $t_0$ guess is enough for a good fit holds true for FSPL events. But when an FSPL model fails to fit cleanly, that's often a sign that higher-order effects are at play. You've encountered some of these in other chapters of this course.\n", "\n", "We know that binary stars are common in the galaxy. And yet, we often model events using PSPL or FSPL, as if single-lens-single-source events are the default. In fact, recent simulations of ground-based surveys using modern Galactic models suggest that around 50% of single-peaked microlensing events are actually \"hidden binaries\"—either:\n", "- **Binary lenses** (multiple objects in the lens system), or \n", "- **Binary sources** (two source stars lensed simultaneously).\n", "\n", "#### Binary Source Stars\n", "\n", "They introduce subtle distortions to the light curve and can easily masquerade as single-lens or binary-lens events.\n", "\n", "If you'd like to learn more, see this [notebook on binary sources](https://github.com/AmberLee2427/TheMicrolensersGuideToTheGalaxy/blob/main/Notebooks/BinarySource.ipynb).\n", "\n", "#### Binary Lenses\n", "\n", "These are a different beast entirely. They require:\n", "- Higher-order models\n", "- Optimizers that can escape local minima (e.g. **Monte Carlo** methods that allow uphill steps)\n", "\n", " ([This notebook](https://github.com/AmberLee2427/TheMicrolensersGuideToTheGalaxy/blob/main/Notebooks/Modelling.ipynb) on modeling methods commonly used in microlensing may also be of interest)\n", "- Often a **grid search** in $s$, $q$, and $\\rho$ to even get reasonable initial conditions\n", "\n", "Even with that, **degeneracies are common**—and easy to miss. 
For example, see the modeling of OGLE-2016-BLG-1195 (Shvartzvald et al., 2017; Bond et al., 2017; Gould et al., 2023; Vandorou et al., 2024), where viable degenerate solutions were overlooked at each stage of modeling.\n", "\n", "If a light curve shows dramatic multi-peaked deviations from a Paczyński shape, you're **almost certainly dealing with lens multiplicity** (unless some other astrophysical event has contaminated your light curve). But determining the *correct* model requires balancing:\n", "- Evidence for complexity, and \n", "- The principle of parsimony (Occam's Razor), while knowing full well that these “complex” arrangements are not **rare** in *occurrence* so much as in *observability*.\n", "\n", "---\n", "\n", "For more on binary-lens microlensing fitting, see the next notebook in this series: [Fitting Binary Lenses](../Session%20C:%20Binary%20Lens/Fitting_Binary_Lenses.ipynb) (also available on [GitHub](https://github.com/rges-pit/data-challenge-notebooks/blob/main/AAS%20Workshop/Session%20C:%20Binary%20Lens/Fitting_Binary_Lenses.ipynb) and [Colab](https://colab.research.google.com/github/rges-pit/data-challenge-notebooks/blob/main/AAS%20Workshop/Session%20C:%20Binary%20Lens/Fitting_Binary_Lenses.ipynb)).\n", "\n", "---\n", "\n", "#### Other higher-order effects not covered in this series (but worth knowing):\n", "- **Lens orbital motion** \n", "- **Xallarap** (source orbital motion) \n", "- **Multiple lenses** (e.g. triple lenses) \n", "- **Variable stars** (source, lens, or blend stars with intrinsic variability) \n", "- Variable **blending** (ambient stars moving in/out of the aperture/PSF) \n", "- General data **systematics**\n" ] }, { "cell_type": "markdown", "metadata": { "id": "m3MbPK5SLH11" }, "source": [ "\n", "## Full Season Roman Fit\n", "\n", "First, the data. There is a trove of \"simple\" *Roman*-like microlensing light curves from the 2018 WFIRST Data Challenge. 
We start our mini data challenge by cloning that and pulling out all the relevant light curves.\n", "\n", "\n", "The dataset contains:\n", "\n", "- `master_file.txt` – event catalog with 96 metadata columns\n", "- `wfirstColumnNumbers.txt` – column descriptions for the master file\n", "- `event_info.txt` – sky coordinates, extinction, and distance estimates\n", "- `ulwdc1_XXX.dat` – combined W149/Z087 light curves for each event\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Define data paths\n", "DATA_ROOT = Path('data-challenge-1')\n", "LOCAL_MASTER = DATA_ROOT / 'Answers/master_file.txt'\n", "LOCAL_HEADER = DATA_ROOT / 'Answers/wfirstColumnNumbers.txt'\n", "LOCAL_EVENT_INFO = DATA_ROOT / 'event_info.txt'\n", "LOCAL_LC_DIR = DATA_ROOT / 'lc'" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "nexus-only" ] }, "outputs": [], "source": [ "#@title Cloning the GitHub repository (local/Colab/Nexus)\n", "\n", "if not DATA_ROOT.exists():\n", "    print(\"Cloning data-challenge-1 repository...\")\n", "    !git clone https://github.com/microlensing-data-challenge/data-challenge-1.git\n", "    \n", "    print(\"Extracting light curves...\")\n", "    !tar -xzvf data-challenge-1/lc.tar.gz -C data-challenge-1/\n", "else:\n", "    print(\"data-challenge-1 already cloned.\")\n", "    if not LOCAL_LC_DIR.exists():\n", "        print(\"Extracting light curves...\")\n", "        !tar -xzvf data-challenge-1/lc.tar.gz -C data-challenge-1/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load and Sort the Data\n", "\n", "Now that we have a means to access the data, no matter our chosen platform, we need to tidy it into a more programmatically friendly form." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Read data into memory\n", "\n", "def read_table(path: Path, **kwargs):\n", " \"\"\"Read a whitespace-delimited table from local disk into a DataFrame.\"\"\"\n", " with open(path, 'r') as handle:\n", " return pd.read_csv(handle, **kwargs)\n", "\n", "HEADER_SOURCE = LOCAL_HEADER\n", "MASTER_SOURCE = LOCAL_MASTER\n", "EVENT_INFO_SOURCE = LOCAL_EVENT_INFO" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Feel free to `Shift` + `Enter` through the rest of this section until you reach **Adjusting the Model for an L2 Orbit**, if you just want the pipeline results." ] }, { "cell_type": "markdown", "metadata": { "id": "6aYEHo7-QOgI" }, "source": [ "This dataset includes 293 light curves, 74 of which are single-lens events. We can cheat a little and specifically pull out the events that we know to be single lenses, keeping the challenge tractable for completion within the hour, with the added benefit of making the strangely organized `master_file.txt` easier to wrangle." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Key microlensing parameter columns:\n", " Column 30 (u0): u0\n", " Column 32 (t0): t0\n", " Column 33 (tE): tE\n", " Column 36 (piE): piE\n", " Column 37 (rhos): rhos\n" ] } ], "source": [ "# Derive descriptive column names from the header file (local path)\n", "# The wfirstColumnNumbers.txt file is tab-separated with columns:\n", "# colno, name, units, min, max, bins, lgscl, label, xmod\n", "\n", "header_df = read_table(\n", " HEADER_SOURCE,\n", " sep=r'\\s+',\n", " comment='#',\n", " header=None,\n", " engine='python',\n", " dtype=str # IMPORTANT: keep everything as string initially\n", ")\n", "\n", "# The first column is the index, the second is the name\n", "# Extract just these two columns\n", "header_df = header_df.iloc[:, [0, 1]]\n", "header_df.columns = ['index', 'name']\n", "\n", "# Convert index to numeric, dropping any rows that fail\n", "header_df['index'] = pd.to_numeric(header_df['index'], errors='coerce')\n", "header_df = header_df.dropna(subset=['index'])\n", "header_df['index'] = header_df['index'].astype(int)\n", "\n", "# Start with placeholder names\n", "colnames_96 = [f'col_{i:02d}' for i in range(96)]\n", "\n", "# Map column indices to their names from the header file\n", "for idx, name in zip(header_df['index'], header_df['name']):\n", " if 0 <= idx < len(colnames_96):\n", " # Skip separator rows marked with '|'\n", " if name != '|':\n", " colnames_96[idx] = name\n", "\n", "# Override specific columns that we reference later with consistent names\n", "# These ensure compatibility with the L2 orbit code\n", "colnames_96[30] = 'u0' # impact parameter\n", "colnames_96[32] = 't0' # time of closest approach \n", "colnames_96[33] = 'tE' # Einstein crossing time\n", "colnames_96[36] = 'piE' # parallax amplitude\n", "colnames_96[37] = 'rhos' # source size (rho_star)\n", "colnames_96[73] = 'N' # number of data points 
(or chi2_0)\n", "colnames_96[75] = 'Delta chi2' # delta chi-squared\n", "colnames_96[92] = 'sim type' # simulation type (dcnormffp, ombin, etc.)\n", "colnames_96[94] = 'filename' # filename string\n", "colnames_96[95] = 'lc_number' # light curve number\n", "\n", "# Replace any remaining '|' separators or empty strings with placeholder names\n", "colnames_96 = [\n", " name if isinstance(name, str) and name not in ('|', '')\n", " else f'col_{i:02d}'\n", " for i, name in enumerate(colnames_96)\n", "]\n", "\n", "# Print key columns for verification\n", "print(\"Key microlensing parameter columns:\")\n", "print(f\" Column 30 (u0): {colnames_96[30]}\")\n", "print(f\" Column 32 (t0): {colnames_96[32]}\")\n", "print(f\" Column 33 (tE): {colnames_96[33]}\")\n", "print(f\" Column 36 (piE): {colnames_96[36]}\")\n", "print(f\" Column 37 (rhos): {colnames_96[37]}\")" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "cellView": "form", "colab": { "base_uri": "https://localhost:8080/", "height": 461 }, "id": "UDJXNqxQ_WzH", "outputId": "f8259b94-d9e1-40f3-b6b2-0ba069b9969f" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
event_idlc_numberNDelta chi2
0ulwdc1_0055117.036836.2
1ulwdc1_017171361.0397992.0
2ulwdc1_021211014.0253501.0
3ulwdc1_02222330.027108.2
4ulwdc1_029293108.01991220.0
\n", "
" ], "text/plain": [ " event_id lc_number N Delta chi2\n", "0 ulwdc1_005 5 117.0 36836.2\n", "1 ulwdc1_017 17 1361.0 397992.0\n", "2 ulwdc1_021 21 1014.0 253501.0\n", "3 ulwdc1_022 22 330.0 27108.2\n", "4 ulwdc1_029 29 3108.0 1991220.0" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#@title Putting everything in a tidy data frame\n", "master_df = read_table(\n", " MASTER_SOURCE,\n", " sep=r'\\s+',\n", " comment='#',\n", " header=None,\n", " names=colnames_96,\n", " engine='python'\n", ")\n", "\n", "df_sl = (\n", " master_df[master_df['sim type'] == 'dcnormffp']\n", " .copy()\n", " .reset_index(drop=True)\n", ")\n", "\n", "df_sl['lc_number'] = df_sl['lc_number'].astype(int)\n", "df_sl['event_id'] = df_sl['lc_number'].apply(lambda n: f\"ulwdc1_{n:03d}\")\n", "\n", "df_sl[['event_id', 'lc_number', 'N', 'Delta chi2']].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "The helper functions above handle local files which works on Colab, your laptop, or the Nexus." ] }, { "cell_type": "markdown", "metadata": { "id": "ubuyz4fM-X78" }, "source": [ "Each row now includes the `event_id` (e.g., `ulwdc1_123`) and the numeric `lc_number`, so we can fetch the corresponding light curves from the local `data-challenge-1/lc/` directory. We'll construct those references next." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "colab": { "base_uri": "https://localhost:8080/", "height": 635 }, "id": "33wo759fRxfb", "outputId": "1ac6d3e1-c0ea-474e-c6dd-66eb197ebef8" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
event_idlc_file_path_W149lc_s3_uri
0ulwdc1_005data-challenge-1/lc/ulwdc1_005_W149.txts3://rmdc26-data-public/2018-test/ulwdc1_005.dat
1ulwdc1_017data-challenge-1/lc/ulwdc1_017_W149.txts3://rmdc26-data-public/2018-test/ulwdc1_017.dat
2ulwdc1_021data-challenge-1/lc/ulwdc1_021_W149.txts3://rmdc26-data-public/2018-test/ulwdc1_021.dat
3ulwdc1_022data-challenge-1/lc/ulwdc1_022_W149.txts3://rmdc26-data-public/2018-test/ulwdc1_022.dat
4ulwdc1_029data-challenge-1/lc/ulwdc1_029_W149.txts3://rmdc26-data-public/2018-test/ulwdc1_029.dat
\n", "
" ], "text/plain": [ " event_id lc_file_path_W149 \\\n", "0 ulwdc1_005 data-challenge-1/lc/ulwdc1_005_W149.txt \n", "1 ulwdc1_017 data-challenge-1/lc/ulwdc1_017_W149.txt \n", "2 ulwdc1_021 data-challenge-1/lc/ulwdc1_021_W149.txt \n", "3 ulwdc1_022 data-challenge-1/lc/ulwdc1_022_W149.txt \n", "4 ulwdc1_029 data-challenge-1/lc/ulwdc1_029_W149.txt \n", "\n", " lc_s3_uri \n", "0 s3://rmdc26-data-public/2018-test/ulwdc1_005.dat \n", "1 s3://rmdc26-data-public/2018-test/ulwdc1_017.dat \n", "2 s3://rmdc26-data-public/2018-test/ulwdc1_021.dat \n", "3 s3://rmdc26-data-public/2018-test/ulwdc1_022.dat \n", "4 s3://rmdc26-data-public/2018-test/ulwdc1_029.dat " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#@title Figuring out which files we want\n", "def make_local_lightcurve(event_id: str, band: str) -> str:\n", " return str(LOCAL_LC_DIR / f\"{event_id}_{band}.txt\")\n", "\n", "if LOCAL_LC_DIR.exists():\n", " df_sl['lc_file_path_W149'] = df_sl['event_id'].apply(lambda eid: make_local_lightcurve(eid, 'W149'))\n", " df_sl['lc_file_path_Z087'] = df_sl['event_id'].apply(lambda eid: make_local_lightcurve(eid, 'Z087'))\n", "else:\n", " print(\"Warning: Local light-curve directory not found. Please ensure the repository was cloned successfully.\")\n", "\n", "preview_cols = ['event_id']\n", "if 'lc_file_path_W149' in df_sl.columns:\n", " preview_cols.append('lc_file_path_W149')\n", "\n", "df_sl[preview_cols].head()" ] }, { "cell_type": "markdown", "metadata": { "id": "zlIxCof6VY9o" }, "source": [ "There are a few pieces of information that do **not** live inside the light curve files (sky position, extinction estimates, etc.). 
They are provided in `event_info.txt`, which is available locally in `data-challenge-1/event_info.txt`.\n", "\n", "Columns: `\"Event_name\"` `\"Event_number\"` `\"RA_(deg)\"` `\"Dec_(deg)\"` `\"Distance\"` `\"A_W149\"` `\"sigma_A_W149\"` `\"A_Z087\"` `\"sigma_A_Z087\"`\n", "\n", "`Distance` plus the extinction columns give you approximate clump-star distances and reddening along the line of sight; the sigma values represent the dispersion in each band." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "cellView": "form", "colab": { "base_uri": "https://localhost:8080/", "height": 479 }, "id": "ke7lpciNVtEN", "outputId": "b0840ec4-6759-4af6-ab04-ef47df25585d" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Event_nameEvent_numberRA_(deg)Dec_(deg)DistanceA_W149sigma_A_W149A_Z087sigma_A_Z087
0ulwdc1_0011269.165-29.02078.180.730.011.410.01
1ulwdc1_0022269.959-30.19188.090.490.010.950.01
2ulwdc1_0033269.100-29.09838.180.730.011.410.01
3ulwdc1_0044268.036-28.37448.251.350.072.600.14
4ulwdc1_0055269.319-29.08898.180.730.011.410.01
\n", "
" ], "text/plain": [ " Event_name Event_number RA_(deg) Dec_(deg) Distance A_W149 \\\n", "0 ulwdc1_001 1 269.165 -29.0207 8.18 0.73 \n", "1 ulwdc1_002 2 269.959 -30.1918 8.09 0.49 \n", "2 ulwdc1_003 3 269.100 -29.0983 8.18 0.73 \n", "3 ulwdc1_004 4 268.036 -28.3744 8.25 1.35 \n", "4 ulwdc1_005 5 269.319 -29.0889 8.18 0.73 \n", "\n", " sigma_A_W149 A_Z087 sigma_A_Z087 \n", "0 0.01 1.41 0.01 \n", "1 0.01 0.95 0.01 \n", "2 0.01 1.41 0.01 \n", "3 0.07 2.60 0.14 \n", "4 0.01 1.41 0.01 " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#@title Event information data frame\n", "\n", "header = [\n", " \"Event_name\",\n", " \"Event_number\",\n", " \"RA_(deg)\",\n", " \"Dec_(deg)\",\n", " \"Distance\",\n", " \"A_W149\",\n", " \"sigma_A_W149\",\n", " \"A_Z087\",\n", " \"sigma_A_Z087\"\n", "]\n", "\n", "event_info = read_table(\n", " EVENT_INFO_SOURCE,\n", " names=header,\n", " sep=r'\\s+',\n", " engine='python'\n", ")\n", "event_info.head()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "cellView": "form", "colab": { "base_uri": "https://localhost:8080/", "height": 635 }, "id": "H_wJ09k8W1Hr", "outputId": "13c370ee-d47e-409a-919e-821000e11be3" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "All required microlensing columns present.\n", "\n", "Microlensing parameters for first 5 events:\n", " event_id t0 u0 tE piE rhos\n", "ulwdc1_005 1582.212644 -0.071271 1.85542 0.321621 0.002806\n", "ulwdc1_017 1596.697828 0.237068 9.57852 0.083551 0.000801\n", "ulwdc1_021 521.559471 0.190811 8.92638 0.235505 0.000506\n", "ulwdc1_022 1445.681285 0.356571 13.14070 0.108506 0.000500\n", "ulwdc1_029 304.529470 0.370042 19.76860 0.473756 0.000375\n" ] } ], "source": [ "#@title Combining the two data frames\n", "\n", "# Convert 'lc_number' to numeric type before merging\n", "merged_df = pd.merge(event_info, df_sl.astype({'lc_number': 'int64'}), left_on='Event_number', right_on='lc_number', how='inner')\n", 
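"\n", "# Sanity check (an illustrative addition, not from the original notebook):\n", "# an inner merge silently drops rows whose keys fail to match, so confirm\n", "# that every single-lens event survived the join and none were duplicated.\n", "assert len(merged_df) == len(df_sl), 'inner merge dropped events'\n", "assert merged_df['event_id'].is_unique, 'duplicate event_id after merge'\n",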
"\n", "# Verify that the key microlensing columns are present\n", "required_cols = ['t0', 'u0', 'tE', 'piE', 'rhos']\n", "missing_cols = [c for c in required_cols if c not in merged_df.columns]\n", "\n", "if missing_cols:\n", " print(f\"WARNING: Missing columns: {missing_cols}\")\n", " print(\"Available columns:\", list(merged_df.columns))\n", "else:\n", " print(\"All required microlensing columns present.\")\n", " # Display a preview of key microlensing parameters\n", " print(\"\\nMicrolensing parameters for first 5 events:\")\n", " display_cols = ['event_id', 't0', 'u0', 'tE', 'piE', 'rhos']\n", " print(merged_df[display_cols].head().to_string(index=False))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 639 }, "id": "wkRfC7W26Akn", "outputId": "bc5c7dc8-96d6-4214-8634-0db94463585b" }, "outputs": [], "source": [ "try: # this will only work on Colab.\n", " sheet = sheets.InteractiveSheet(df=merged_df)\n", "except Exception:\n", " # don't limit the number of columns displayed by pandas\n", " pd.set_option('display.max_columns', None)\n", " # print explicitly: an expression inside an except block is not auto-displayed\n", " print(merged_df.head())" ] }, { "cell_type": "markdown", "metadata": { "id": "EPfdr0mJ7fOb" }, "source": [ "Great—data successfully wrangled. Time to pivot from OGLE to the simulated Roman season.\n", "\n", "### Adjusting the Model for an L2 Orbit\n", "\n", "Space-based light curves require ephemerides so `MulensModel` can evaluate the observer position relative to the lens. For the Roman DC1 data we simply pass `ephemerides_file='data-challenge-1/wfirst_ephemeris_W149.txt'` when instantiating each `MulensData` object.\n", "\n", "> Instructions specific to this data set for `MulensModel` are given [here](https://github.com/rpoleski/MulensModel/blob/master/documents/data_challenge.md).\n", "\n", "Most of the data for these events is in the W149 band, so we focus on that passband and avoid juggling multiple $(F_s, F_b)$ pairs. 
If you later need the source color, hold the microlensing parameters fixed and refit only the fluxes (linear regression) in each band—fast and reliable.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "JLvFYejBLQFY", "outputId": "adc8d511-817f-47cb-f6ab-4c0d06c28d97" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Event: ulwdc1_005\n", "Sky coordinates: RA=269.3190°, Dec=-29.0889°\n", "\n", "True parameters from master file:\n", " t0 (sim time): 1582.21\n", " u0: -0.0713\n", " tE: 1.86 days\n", " piE: 0.321621\n", " rhos: 0.002806\n", "\n", "SkyCoord: 17h57m16.56s -29d05m20.04s\n", "\n", "Initial guess parameters (true × 1.1):\n", " t_0: 2459816.21 (HJD)\n", " u_0: -0.0784\n", " t_E: 2.04 days\n", " rho: 0.003086\n", " pi_E_N: 0.227420\n", " pi_E_E: 0.227420\n", "\n", "MulensModel Event created successfully for ulwdc1_005\n" ] } ], "source": [ "#@title Example L2 data object\n", "\n", "# Select the first event from our merged dataframe\n", "event_idx = 0 # Change this to select a different event\n", "\n", "# Get the light curve file path\n", "data_file = merged_df['lc_file_path_W149'].iloc[event_idx]\n", "event_id = merged_df['event_id'].iloc[event_idx]\n", "\n", "# Extract microlensing parameters from the master file\n", "# These columns were defined during the data loading step\n", "t_0_sim = float(merged_df['t0'].iloc[event_idx]) # time of closest approach (simulation time)\n", "u_0 = float(merged_df['u0'].iloc[event_idx]) # impact parameter\n", "t_E = float(merged_df['tE'].iloc[event_idx]) # Einstein crossing time (days)\n", "pi_E = float(merged_df['piE'].iloc[event_idx]) # parallax amplitude\n", "rho_s = float(merged_df['rhos'].iloc[event_idx]) # source size (rho_star)\n", "\n", "# Split the parallax equally between N and E components (simplistic assumption)\n", "# In reality, you'd fit for both components independently\n", "pi_E_E = np.sqrt(pi_E**2 / 
2.0)\n", "pi_E_N = pi_E_E\n", "\n", "# Sky coordinates\n", "ra = merged_df['RA_(deg)'].iloc[event_idx]\n", "dec = merged_df['Dec_(deg)'].iloc[event_idx]\n", "\n", "print(f\"Event: {event_id}\")\n", "print(f\"Sky coordinates: RA={ra:.4f}°, Dec={dec:.4f}°\")\n", "print(f\"\\nTrue parameters from master file:\")\n", "print(f\" t0 (sim time): {t_0_sim:.2f}\")\n", "print(f\" u0: {u_0:.4f}\")\n", "print(f\" tE: {t_E:.2f} days\")\n", "print(f\" piE: {pi_E:.6f}\")\n", "print(f\" rhos: {rho_s:.6f}\")\n", "\n", "# Convert RA/Dec to SkyCoord format for MulensModel\n", "coord = SkyCoord(ra=ra * u.deg, dec=dec * u.deg, frame='icrs')\n", "hms_dms_string = coord.to_string('hmsdms')\n", "print(f\"\\nSkyCoord: {hms_dms_string}\")\n", "\n", "# Here is the main difference for space-based data - we provide the ephemeris for Roman:\n", "EPHEM_FILE = 'data-challenge-1/wfirst_ephemeris_W149.txt'\n", "data_Roman_W149 = MulensModel.MulensData(file_name=data_file,\n", " phot_fmt='mag',\n", " ephemerides_file=EPHEM_FILE,\n", " plot_properties={'color': '#a859e4',\n", " 'label': 'Roman W149'\n", " },\n", " bandpass='H'\n", " )\n", "\n", "# Convert t_0 from simulation time to HJD\n", "# Reference: https://github.com/microlensing-data-challenge/evaluation_code/blob/master/parse_table1.py\n", "# The simulation zero time corresponds to HJD 2458234.0\n", "t_0 = t_0_sim + 2458234.0\n", "\n", "# Build the parameter dictionary for MulensModel\n", "# We multiply by 1.1 to create \"guess\" parameters slightly offset from truth\n", "params = dict()\n", "parameters_to_fit = [\"t_0\", \"u_0\", \"t_E\", \"rho\", \"pi_E_N\", \"pi_E_E\"]\n", "params['t_0'] = t_0\n", "params['t_0_par'] = t_0 # parallax reference time (often same as t_0)\n", "params['u_0'] = u_0 * 1.1\n", "params['t_E'] = t_E * 1.1\n", "params['rho'] = rho_s * 1.1\n", "params['pi_E_N'] = pi_E_N\n", "params['pi_E_E'] = pi_E_E\n", "\n", "print(f\"\\nInitial guess parameters (true × 1.1):\")\n", "print(f\" t_0: {params['t_0']:.2f} (HJD)\")\n", 
"print(f\" u_0: {params['u_0']:.4f}\")\n", "print(f\" t_E: {params['t_E']:.2f} days\")\n", "print(f\" rho: {params['rho']:.6f}\")\n", "print(f\" pi_E_N: {params['pi_E_N']:.6f}\")\n", "print(f\" pi_E_E: {params['pi_E_E']:.6f}\")\n", "\n", "# Create the MulensModel with parallax\n", "# Note: Providing coords and ephemerides_file is essential for parallax calculations\n", "Roman_model = MulensModel.Model({**params},\n", " coords=coord,\n", " ephemerides_file=EPHEM_FILE\n", " )\n", "\n", "Roman_event = MulensModel.Event(datasets=data_Roman_W149, model=Roman_model)\n", "print(f\"\\nMulensModel Event created successfully for {event_id}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "OV7HUjMcyRVX", "outputId": "ca717797-df72-4e0b-dadf-18fc9b9edfba" }, "outputs": [], "source": [ "#@title Sneak peek at the data\n", "\n", "print(' ', data_file, '\\n')\n", "! head -5 data-challenge-1/lc/ulwdc1_005_W149.txt\n", "\n", "print('\\n', EPHEM_FILE)\n", "! 
head -5 data-challenge-1/wfirst_ephemeris_W149.txt\n", "\n", "print('\\n Object lightcurve dates')\n", "print(data_Roman_W149.time)\n", "\n", "print('\\n t_0:', params['t_0'])\n", "print(' t_0_par:', params['t_0_par'])\n", "\n", "# cool, everything is in full HJD and looking sensible" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 541 }, "id": "dUG01D7s4zeh", "outputId": "a65137fb-fcb7-4117-9fd4-96ebfc635ecf" }, "outputs": [], "source": [ "#@title Plot to check it's working\n", "\n", "Roman_event = MulensModel.Event(datasets=data_Roman_W149, model=Roman_model)\n", "\n", "t_min = params['t_0'] - 3.0 * params['t_E']\n", "t_max = params['t_0'] + 3.0 * params['t_E']\n", "print(t_min, t_max)\n", "print(t_0)\n", "\n", "# set finite source method\n", "Roman_model.set_magnification_methods([t_min,\n", " finite_source_methods[0],\n", " t_max],\n", " source=None\n", " ) # rho <= 0.1\n", "\n", "plt.close(200)\n", "plt.figure(200)\n", "\n", "#Roman_model.plot_magnification(t_range=[t_min, t_max],\n", "# subtract_2450000=True,\n", "# color='red',\n", "# linestyle=':'\n", "# )\n", "#Roman_event.plot_data(subtract_2450000=True)\n", "Roman_event.plot()\n", "\n", "plt.xlim(t_min-2450000, t_max-2450000)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "id": "W4hlbXoorffu" }, "source": [ "Yeah... not great. But good enough for a starting guess, if that's the route you're taking." ] }, { "cell_type": "markdown", "metadata": { "id": "tDj1BFoDHCOj" }, "source": [ "### Run the Fits\n", "\n", "\n", "This is it. You have every thing you need to fit a full season of single lenses. And this is the part where we push the baby bird out of the nest. No more hand-holding. No sample answers. Just do it.\n", "\n", "We believe in you.\n", "\n", "\n", "\n", "
\n", "

Exercise 10

\n", "

Perform FSPL (finite-source point-lens) fits on the provided simulated Roman single-lens events.

\n", "

Note. Don't forget to save your best-fit parameters for later inspection, and to add parallax to the model.


\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "wVjw47SzHBmf" }, "outputs": [], "source": [ "#@title Your code goes here" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "zIIH2x4MLZR0", "outputId": "d9ba42b4-3e57-4555-92b5-3fa66a6eca46" }, "outputs": [], "source": [ "#@title Exercise 10 fit parameter distributions\n", "\n", "fit7_params = fit_params[:N].copy()\n", "plot_histograms(fit7_params, 8)" ] }, { "cell_type": "markdown", "metadata": { "id": "RLwg-froHeQm" }, "source": [ "### Evaluate Your Results\n", "\n", "\n", "The next step is evalutaing how well your bulk fit went. Some simple histograms of fitted parameters compared with \"truths\" should do the trick.\n", "\n", "\n", "\n", "
\n", "

Exercise 11

\n", "

Make overlaid histograms of the true parameter distributions and your best-fit parameter distributions to evaluate the success of your algorithms.

\n", "
\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "aA8-nRnMIS_U" }, "outputs": [], "source": [ "#@title Your code goes here" ] }, { "cell_type": "markdown", "metadata": { "id": "LUrW-CXmIVY5" }, "source": [ "## Exercises and Next Steps\n", "\n", "- Re-run Exercises 1–4 but swap in your own priors or alternative optimizers; record how sensitive the recovered $t_E$ and $u_0$ distributions are.\n", "- Extend Exercise 5 by writing a simple event-finding or anomaly-finding heuristic.\n", "- In Section 4, batch-fit 10–20 Roman events and log runtime/memory usage—this helps you budget future Nexus jobs.\n", "- Package one of your fits with `microlens-submit` to confirm you can validate and export a submission artifact.\n", "- Move on to the next notebook in this series: [`Fitting_Binary_Lenses.ipynb`](../Session%20C:%20Binary%20Lens/Fitting_Binary_Lenses.ipynb).\n", "\n", "When you are ready for the real challenge, the Roman Microlensing Data Challenge releases can be found in their [HuggingFace repositories](https://huggingface.co/RGES-PIT) or locally on the Nexus." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Additional Resources\n", "- [MulensModel documentation](https://rpoleski.github.io/MulensModel/) — API details, geometry references, and Roman-specific notes.\n", "- [OGLE Early Warning System](https://ogle.astrouw.edu.pl/ogle4/ews/ews.html) — daily alert tables with PSPL estimates.\n", "- [microlens-submit docs](https://microlens-submit.readthedocs.io/en/latest/) — workflow for packaging and validating challenge submissions.\n", "- [Roman Research Nexus](https://roman.ipac.caltech.edu/nexus) — information about kernels, data access, and team spaces.\n", "- [RGES-PIT Microlensing resources](https://rges-pit.org/outreach/) — minicourse videos and supplementary tutorials referenced in this session." 
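] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Appendix: A Quick PSPL Sanity Check\n", "\n", "`MulensModel` evaluates the magnification for you, but a hand-rolled check of the PSPL formula from the parameter refresher is a cheap way to confirm that a fit is self-consistent. The sketch below is an illustrative addition (the helper name `pspl_magnification` is ours, not a library function):\n", "\n", "```python\n", "import numpy as np\n", "\n", "def pspl_magnification(t, t_0, u_0, t_E):\n", "    # A(u) = (u^2 + 2) / (u sqrt(u^2 + 4)), with u^2 = u_0^2 + ((t - t_0)/t_E)^2\n", "    u2 = u_0**2 + ((t - t_0) / t_E)**2\n", "    return (u2 + 2) / (np.sqrt(u2) * np.sqrt(u2 + 4))\n", "\n", "# At closest approach (t = t_0) the magnification depends only on u_0;\n", "# for small u_0 it approaches 1/u_0:\n", "print(pspl_magnification(t=0.0, t_0=0.0, u_0=0.1, t_E=20.0))  # ~ 10.04\n", "```\n", "\n", "If your best-fit $(t_0, u_0, t_E)$ reproduces the observed peak magnification under this formula, the geometric part of the solution is at least behaving sensibly."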
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## About this Notebook\n", "**Author(s):** Amber Malpas, Katarzyna Kruszyńska, Somayeh Khakpash, Ali Crisp, Meet J. Vyas \n", "**Maintainers:** RGES-PIT Working Group 9 \n", "**Last Updated:** 20 Nov 2025 \n", "**Contact:** malpas.1@osu.edu\n", "\n", "## Citations\n", "\n", "If you use `MulensModel`, `emcee`, or this notebook for published research, please cite the\n", "authors. Follow these links for more information about citing:\n", "\n", "* [Citing `MulensModel`](https://github.com/rpoleski/MulensModel/blob/master/CITATION.cff)\n", "* [Citing `emcee`](https://github.com/dfm/emcee#attribution)\n", "* [Citing **Roman Microlensing Data Challenge 2026 Notebooks**](https://github.com/rges-pit/data-challenge-notebooks/blob/main/zenodo.txt)\n", "\n", "### Citing Roman Microlensing Data Challenge 2026 Notebooks\n", "\n", "If you use our notebooks in your project, please cite:\n", "\n", "```\n", "Malpas, A., Murlidhar, A., Vandorou, K., Kruszyńska, K., Crisp, A., & Vyas, M. (2026). Roman Microlensing Data Challenge 2026 Notebooks (v1.0.0). Zenodo. 
https://doi.org/10.5281/zenodo.18262183\n", "```\n", "\n", "**BibTeX:**\n", "```bibtex\n", "@software{malpas_2025_17806271,\n", " author = {Malpas, Amber and Murlidhar, Arjun and Vyas, Meet and Vandorou, Katie and Kruszyńska, Katarzyna and Crisp, Ali},\n", " title = {Roman Microlensing Data Challenge 2026 Notebooks},\n", " month = dec,\n", " year = 2026,\n", " publisher = {Zenodo},\n", " version = {v1.0.0},\n", " doi = {10.5281/zenodo.18262183},\n", " url = {https://doi.org/10.5281/zenodo.18262183}\n", "}\n", "```\n", "\n", "\n", "\n", "" ] } ], "metadata": { "colab": { "authorship_tag": "ABX9TyP+TSchZRCAfwCcW+DZDMDV", "mount_file_id": "1dK339mAVNtIqC8WDUS8FfVj6K_a3z_jO", "provenance": [] }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 0 }