- Here to talk about AuroraGPT, Argonne’s internal effort to build a general-purpose scientific LLM, broadly trained on a general corpus of text + scientific \{papers, text, data\}
- As part of this effort, we plan to…
  - Explore pathways, build with international partners, multi-\{lingual, modal\}
- Rough timeline of the project and deliverables:
  - 202\{3,4\}: text-only models, plan to release a series of \{7B, 70B, 1T\} models
  - 202\{4,5\}: Basic multi-modal models
  - 202\{5,6\}: Advanced scientific multimodal models
- AuroraGPT: Exascale Pre-Training of Large Language Models on Diverse Accelerators
  > [argonne-lcf/Megatron-DeepSpeed](https://github.com/argonne-lcf/Megatron-DeepSpeed)
  >
  > Large Model Training: any scale, any accelerator
- Thoughts:
  - I’ll probably try to include:
- [x] \{tensor, pipeline, sequence\}-parallelism
- [x] DeepSpeed integration (ZeRO offloading, activation checkpointing, …)
- [x] Robust mechanisms for automatic experiment \{configuration, tracking, …\}
- [x] Support for modern (and experimental!) optimizers
- [x] Large batch training
- Goals
- Issues with existing models
- AuroraGPT
- Project Details
- Teams, Ongoing Efforts
- Scientific Evaluations
- Scaling Results
- MProt-DPO
- ~~aeris~~ (??)
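
For the DeepSpeed integration item in the checklist above (ZeRO offloading, activation checkpointing), here is a minimal sketch of what such a configuration might look like. The key names follow DeepSpeed's documented JSON config schema, but the helper `make_ds_config` and every numeric value are hypothetical and chosen only for illustration; this is not the configuration AuroraGPT actually uses.

```python
import json


def make_ds_config(micro_batch: int, grad_accum: int, dp_world_size: int) -> dict:
    """Build a DeepSpeed-style config dict (hypothetical helper, illustrative values)."""
    return {
        # DeepSpeed expects:
        #   train_batch_size = micro_batch * grad_accum * data-parallel world size
        "train_micro_batch_size_per_gpu": micro_batch,
        "gradient_accumulation_steps": grad_accum,
        "train_batch_size": micro_batch * grad_accum * dp_world_size,
        # ZeRO stage 2 with optimizer state offloaded to CPU memory
        # (the stage and device here are assumptions, not project settings).
        "zero_optimization": {
            "stage": 2,
            "offload_optimizer": {"device": "cpu"},
        },
        # Activation checkpointing, partitioning activations across ranks.
        "activation_checkpointing": {
            "partition_activations": True,
        },
    }


config = make_ds_config(micro_batch=4, grad_accum=8, dp_world_size=16)
print(json.dumps(config, indent=2))
```

A dict like this can be written out as JSON and passed to a DeepSpeed-based launcher; the batch-size relation in the first comment is the piece most worth checking when changing the number of data-parallel ranks.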

- AuroraGPT will be a publicly distributed, open-source foundation model for open science
- It is being trained on:
  - Scientific / engineering structured data
  - General text, media, news, etc.
  - Large amounts of low to medium quality data
  - Much less high quality data (that is publicly available for use)
- This data is then cleaned, processed, de-duplicated and used for the initial pre-training phase of the model
- The vast majority of the overall compute is spent during this initial pre-training phase
- This is the group I help to lead and will be talking a bit about today
- The initial pre-training phase is currently underway
- Eventually, given a bit of time, effort and magic, the model will be ready for fine-tuning and additional training for a variety of downstream tasks
- The pretrained model will then be handed off for additional fine-tuning on a variety of downstream tasks:
  - Scientific discovery
  - Accelerate scientific tasks
  - Digital twins
  - Inverse design
  - Code optimization
  - Accelerated simulations
  - Autonomous experiments
  - Co-design
- It is becoming increasingly clear that LLMs have the potential to drastically accelerate computational science
  - We’ve seen this already for \{GenSLMs, Weather / Climate / Earth Systems Modeling, Particle Physics, etc.\}
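
The de-duplication step mentioned above can be illustrated with a very small sketch. It assumes exact-match de-duplication after whitespace and case normalization, which is only the simplest possible variant; the notes do not say which method the project uses, and the function names `normalize` and `deduplicate` are hypothetical.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace and lowercase, so trivial variants hash alike."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(docs: list[str]) -> list[str]:
    """Keep the first occurrence of each document, compared by normalized hash."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


docs = [
    "Protein folding is hard.",
    "protein   folding is hard.",  # same text up to case and spacing
    "Climate models are large.",
]
unique_docs = deduplicate(docs)
print(unique_docs)
```

Hashing the normalized text keeps memory use proportional to the number of distinct documents rather than their total size. Near-duplicate detection (for example, documents that differ by a few words) would need a different technique and is not covered by this sketch.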

Dharuman, Gautham, Kyle Hippe, Alexander Brace, et al. 2024. “MProt-DPO: Breaking the ExaFLOPS Barrier for Multimodal Protein Design Workflows with Direct Preference Optimization.” _Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis_ (Atlanta, GA, USA), SC ’24. [https://doi.org/10.1109/SC41406.2024.00013](https://doi.org/10.1109/SC41406.2024.00013).

Hosseini, Ryien, Filippo Simini, Venkatram Vishwanath, Rebecca Willett, and Henry Hoffmann. 2025. “Quality Measures for Dynamic Graph Generative Models.” _The Thirteenth International Conference on Learning Representations_. [https://openreview.net/forum?id=8bjspmAMBk](https://openreview.net/forum?id=8bjspmAMBk).

McCandlish, Sam, Jared Kaplan, Dario Amodei, and OpenAI Dota Team. 2018. _An Empirical Model of Large-Batch Training_. [https://arxiv.org/abs/1812.06162](https://arxiv.org/abs/1812.06162).

Song, Shuaiwen Leon, Bonnie Kruft, Minjia Zhang, et al. 2023. _DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery Through Sophisticated AI System Technologies_. [https://arxiv.org/abs/2310.04610](https://arxiv.org/abs/2310.04610).

Wei, Jason, Yi Tay, Rishi Bommasani, et al. 2022. _Emergent Abilities of Large Language Models_. [https://arxiv.org/abs/2206.07682](https://arxiv.org/abs/2206.07682).