---
layout: '@/layouts/Doc.astro'
title: '📝 2025 Annual Report'
date: 2025-09-22
description: 'Draft annual report covering scientific accomplishments, publications, presentations, and goals at ALCF.'
draft: true
---

Sam Foreman
2025-09-22

- [Goals for Next Year (2026)](#goals-for-next-year-2026)
- [Goals from Last Year (2024)](#goals-from-last-year-2024)
- [Contributions to ALCF](#contributions-to-alcf)
- [Publications](#publications)
- [Presentations](#presentations)
- [Posts](#posts)
- [Organizational Efforts](#organizational-efforts)
- [Mentoring](#mentoring)
- [Scientific / Technical Accomplishments](#scientific--technical-accomplishments)
- [References](#references)

## Goals for Next Year (2026)

- [ ] Build out generic training services for science teams
- [ ] Continue to push on resilient / fault-tolerant training techniques

## Goals from Last Year (2024)

- [x] Continue to contribute to division(/lab)-wide efforts
- [x] Continue to work with application teams to efficiently scale on ALCF systems
- [x] \[WIP\] Publish retrospective on initial pre-training of AuroraGPT

## Contributions to ALCF

- AERIS: Argonne Earth Systems Model for Reliable and Skillful Predictions
  - [ACM Gordon Bell Prize Finalist](https://arxiv.org/abs/2509.13523) (co-author)
  - Contributed to model development, performance analysis, and scaling studies
- MProt-DPO: Breaking the ExaFLOPS Barrier for Multimodal Protein Design with DPO
  - [Finalist for the 2024 ACM Gordon Bell Prize](https://sc24.supercomputing.org/2024/10/presenting-the-finalists-for-the-2024-gordon-bell-prize/) (first-author)
- AuroraGPT
  - Co-lead Models and Training team with Venkat Vishwanath
  - Ongoing writeup of pre-training efforts
  - Successfully pre-trained:
    - AuroraGPT-7B on 2T tokens
    - AuroraGPT-2B on 4T tokens (ongoing)
- Catalyst for:
  - Arvind Ramanathan’s INCITE Project (`FoundEpidem`)
  - Zheng Zhang’s ALCC Project
  - Rao Kotamarthi’s ALCC Project
- Member of Software Committee
- [Intro to HPC Undergraduate Bootcamp](https://intro-hpc-bootcamp.alcf.anl.gov/):
  - Project lead for [Intro to {AI, HPC} for Science](https://saforem2.github.io/intro-hpc-bootcamp-2025/)

## Publications

1. [**AERIS**: _Argonne Earth Systems Model for Reliable and Skillful Predictions_](https://arxiv.org/abs/2509.13523) (Hatanpää et al. (2025))[^1]
2. [Aurora: Architecting Argonne’s First Exascale Supercomputer for Accelerated Scientific Discovery](https://arxiv.org/abs/2509.08207) (Allen et al. (2025))
3. [HiPerRAG: High-Performance Retrieval Augmented Generation for Scientific Insights](https://arxiv.org/abs/2505.04846) (Gokdemir et al. (2025))
4. [Automated Tuning for HMC Mass Ratios](https://www.osti.gov/biblio/2551828) (Torsiello et al. (2025))
5. [MOFA: Discovering Materials for Carbon Capture with a GenAI and Simulation-Based Workflow](https://arxiv.org/abs/2501.10651) (Yan et al. (2025))
6. [**MProt-DPO**: _Breaking the ExaFLOPS Barrier for Multimodal Protein Design with DPO_](https://doi.org/10.1109/SC41406.2024.00013) (Dharuman et al. (2024))[^2]

## Presentations

- [Scientific AI at Scale: AI for Science](https://samforeman.me/talks/openskai25/ai4science/index.html) @ [Open SkAI 2025](https://www.openskai-conference.org)
- [Scientific AI at Scale: Distributed Training](https://samforeman.me/talks/openskai25/training/index.html) @ [Open SkAI 2025](https://www.openskai-conference.org/)
- [Large Scale Training on Diverse Accelerators](https://samforeman.me/talks/AuroraGPT-SIAM25/index.html) @ [Scalable Deep Learning, SIAM AN2025](https://meetings.siam.org/sess/dsp_programsess.cfm?SESSIONCODE=84772)
- [LLMs on Aurora: 🌌 AuroraGPT](https://samforeman.me/talks/incite-hackathon-2025/AuroraGPT/index.html) @ [2025 ALCF INCITE GPU Hackathon](https://www.alcf.anl.gov/events/alcf-incite-gpu-hackathon)
- [LLMs on Aurora: 🍋 ezpz](https://samforeman.me/talks/incite-hackathon-2025/ezpz/index.html) @ [2025 ALCF INCITE GPU Hackathon](https://www.alcf.anl.gov/events/alcf-incite-gpu-hackathon)
- [AuroraGPT: Foundation Models for Science](https://samforeman.me/talks/aurora-gpt-fm-for-electric-grid/index.html) @ [Foundation Models for the Electric Grid](https://www.alcf.anl.gov/alcf-ai-science-training-series)
- [Parallel Training Methods](https://samforeman.me/talks/ai-for-science-2024/index.html) @ [AI-for-Science on Supercomputers](https://www.alcf.anl.gov/alcf-ai-science-training-series)
- [AuroraGPT](https://samforeman.me/talks/AuroraGPT/alcf-hpc-workshop-2024/index.html) @ [2024 ALCF Hands-On HPC Workshop](https://www.alcf.anl.gov/events/2024-alcf-hands-hpc-workshop)
- [Machine Learning and Foundation Models at Scale](https://samforeman.me/talks/alcf-hpc-workshop-2024/index.html) @ [2024 ALCF Hands-On HPC Workshop](https://www.alcf.anl.gov/events/2024-alcf-hands-hpc-workshop)

## Posts

- [📊 pbs-tui : TUI for PBS Job Scheduler Monitoring](https://samforeman.me/posts/2025/09/17/)
- [🍹 BlendCorpus + TorchTitan @ ALCF](https://samforeman.me/posts/2025/09/12/)
- [🏗️ Building PyTorch 2.8 from Source on Aurora](https://samforeman.me/posts/2025/06/14/)
- [🚧 Frameworks Issue with numpy \> 2](https://samforeman.me/posts/2025/05/03/)
- [🔥 Building PyTorch 2.6 from Source on Aurora](https://samforeman.me/posts/2025/04/28/)
- [🪛 Torchtune on Aurora](https://samforeman.me/posts/torchtune-aurora/)
- [🚑 Torchtune Patch on Aurora](https://samforeman.me/posts/torchtune-patch-aurora/)
- [💾 Converting Checkpoints](https://samforeman.me/posts/auroragpt/checkpoints/)

## Organizational Efforts

- Organizer for:
  - [SC25 Workshop: High Performance Python for Science at Scale (HPPSS)](https://hppss.github.io/SC25/)
  - [SC25 Tutorial: Accelerating and Scaling Python for HPC](https://sc25.conference-program.com/presentation/?id=tut121&sess=sess255)
  - [SC24 Workshop: High Performance Python for Science at Scale (HPPSS)](https://hppss.github.io/SC24/)
- Served as reviewer for:
  - HiPC 2025
  - SPIGM @ NeurIPS
  - ML4PS Workshop @ NeurIPS’24
  - AI4Science Workshop @ NeurIPS’24
  - GenBio Workshop @ NeurIPS’24
  - AI4Science Workshop @ ICML’24

## Mentoring

- Khalid Hossain: Supported Khalid’s successful transition from postdoc to staff
- Joseph Frimpong: Postdoc in [Center for Nanoscale Materials](https://cnm.anl.gov/group/Theory-and-Modeling)
- Hung Nguyen: Graduate student @ UIUC

## Scientific / Technical Accomplishments

## References