---
layout: '@/layouts/Doc.astro'
title: "AERIS: Argonne's Earth Systems Model"
date: '2025-10-08'
location: '2025 ALCF Hands On HPC Workshop'
---

Sam Foreman
2025-10-08

- [🌎 AERIS](#earth_americas-aeris)
- [High-Level Overview of AERIS](#high-level-overview-of-aeris)
- [Contributions](#contributions)
- [Model Overview](#model-overview)
- [Windowed Self-Attention](#windowed-self-attention)
- [Model Architecture: Details](#model-architecture-details)
- [Issues with the Deterministic Approach](#issues-with-the-deterministic-approach)
- [Transitioning to a Probabilistic Model](#transitioning-to-a-probabilistic-model)
- [Sequence-Window-Pipeline Parallelism `SWiPe`](#sequence-window-pipeline-parallelism-swipe)
- [Aurora](#aurora)
- [AERIS: Scaling Results](#aeris-scaling-results)
- [Hurricane Laura](#hurricane-laura)
- [S2S: Subseasonal-to-Seasonal Forecasts](#s2s-subseasonal-to-seasonal-forecasts)
- [Seasonal Forecast Stability](#seasonal-forecast-stability)
- [Next Steps](#next-steps)
- [References](#references)
- [Extras](#extras)
- [Overview of Diffusion Models](#overview-of-diffusion-models)
- [Diffusion Model: Forward Process](#diffusion-model-forward-process)
- [Acknowledgements](#acknowledgements)

## 🌎 AERIS
Figure: Forward Diffusion Process ($\pi \rightarrow \mathcal{N}$) and Reverse Diffusion Process
## Sequence-Window-Pipeline Parallelism `SWiPe`
- `SWiPe` is a **novel parallelism strategy** for Swin-based Transformers
- Hybrid 3D Parallelism strategy, combining:
  - Sequence parallelism (`SP`)
  - Window parallelism (`WP`)
  - Pipeline parallelism (`PP`)

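
To make the idea of a hybrid 3D layout concrete, here is a minimal sketch of how global ranks could be arranged on a `PP × WP × SP` mesh and grouped along each axis. The function names (`build_mesh`, `groups_along`) and the axis ordering (SP varying fastest) are assumptions made for illustration; this is not the AERIS implementation, and the actual `SWiPe` layout may differ.

```python
from itertools import product


def build_mesh(sp: int, wp: int, pp: int) -> dict[int, tuple[int, int, int]]:
    """Map each global rank to hypothetical (pp, wp, sp) mesh coordinates.

    Assumed ordering: SP varies fastest, then WP, then PP.
    """
    coords = product(range(pp), range(wp), range(sp))
    return {rank: coord for rank, coord in enumerate(coords)}


def groups_along(mesh: dict[int, tuple[int, int, int]], axis: int) -> list[list[int]]:
    """Group ranks that differ only along `axis` (0=PP, 1=WP, 2=SP)."""
    groups: dict[tuple[int, ...], list[int]] = {}
    for rank, coord in mesh.items():
        # Ranks sharing every coordinate except `axis` communicate together.
        key = coord[:axis] + coord[axis + 1:]
        groups.setdefault(key, []).append(rank)
    return sorted(groups.values())


mesh = build_mesh(sp=2, wp=2, pp=3)  # 12 ranks in total
sp_groups = groups_along(mesh, axis=2)
wp_groups = groups_along(mesh, axis=1)
pp_groups = groups_along(mesh, axis=0)
```

Each rank belongs to exactly one group per axis, so the product `SP × WP × PP` fixes the number of ranks a single model replica occupies.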
Figure 6
Figure 7: `SWiPe` Communication Patterns
## Aurora
Table 3: Aurora[^4] Specs

| Property | Value   |
| -------: | :------ |
|    Racks | 166     |
|    Nodes | 10,624  |
| XPUs[^5] | 127,488 |
|     CPUs | 21,248  |
|     NICs | 84,992  |
|      HBM | 8 PB    |
|    DDR5c | 10 PB   |
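
As a quick sanity check, the per-node and per-rack counts implied by Table 3 can be recovered by dividing the totals. The ratios below are derived from the table rather than quoted from it, and the variable names are illustrative.

```python
# Values copied from Table 3.
racks, nodes = 166, 10_624
xpus, cpus, nics = 127_488, 21_248, 84_992

# Per-node counts implied by the table (derived here, not quoted from it).
per_node = {
    "xpus": xpus // nodes,
    "cpus": cpus // nodes,
    "nics": nics // nodes,
}
nodes_per_rack = nodes // racks

# Each ratio should divide evenly if the table is self-consistent.
assert xpus % nodes == cpus % nodes == nics % nodes == nodes % racks == 0
print(per_node, nodes_per_rack)
```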
Figure 8: Aurora: [Fact Sheet](https://www.alcf.anl.gov/sites/default/files/2024-07/Aurora_FactSheet_2024.pdf).
## AERIS: Scaling Results
Figure 9: AERIS: Scaling Results
- **10 EFLOPs** (sustained) @ **120,960 GPUs**
- See Hatanpää et al. (2025) for additional details
  - [arXiv:2509.13523](https://arxiv.org/abs/2509.13523)
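
For a rough sense of scale, the headline numbers can be turned into a per-GPU figure. This sketch uses the rounded "10 EFLOPs" value from the bullet above, and assumes that the 120,960 GPUs are counted in the same units as the XPUs in Table 3 (12 per node); both are simplifications.

```python
sustained_flops = 10e18  # "10 EFLOPs" from the bullet above, rounded
n_gpus = 120_960

# Average sustained throughput per GPU, in TFLOP/s.
per_gpu_tflops = sustained_flops / n_gpus / 1e12

# Assumption: 12 XPUs per node, as implied by Table 3.
nodes_used = n_gpus // 12

print(f"{per_gpu_tflops:.1f} TFLOP/s per GPU across {nodes_used} nodes")
```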
## Hurricane Laura
Figure 10: Hurricane Laura tracks (top) and intensity (bottom). Forecasts initialized (a) 7, (b) 5, and (c) 3 days prior to 2020-08-28T00Z.
## S2S: Subseasonal-to-Seasonal Forecasts
> [!IMPORTANT] 🌡️ S2S Forecasts
>
> We demonstrate, for the first time, the ability of a generative,
> high-resolution (native ERA5) diffusion model to produce skillful
> forecasts on S2S timescales with realistic evolution of the Earth
> system (atmosphere + ocean).
- To assess trends beyond the range of our medium-range weather forecasts (beyond 14 days) and to evaluate the stability of our model, we made 3,000 forecasts (60 initial conditions, each with 50 ensemble members) out to 90 days.
- AERIS was found to be stable during these 90-day forecasts:
  - Realistic atmospheric states
  - Correct power spectra even at the smallest scales
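
One simple way to inspect how much energy a field carries at each spatial scale is a zonal power spectrum, computed with an FFT along longitude. The sketch below is illustrative only: the function name `zonal_power_spectrum` is hypothetical, the input is synthetic, and the evaluation in the paper may use a different spectral method.

```python
import numpy as np


def zonal_power_spectrum(field: np.ndarray) -> np.ndarray:
    """Mean power per zonal wavenumber for a (lat, lon) field."""
    n_lon = field.shape[-1]
    # Real FFT along longitude, normalized by the number of grid points.
    coeffs = np.fft.rfft(field, axis=-1) / n_lon
    power = np.abs(coeffs) ** 2
    # Average the spectra over all latitude rows.
    return power.mean(axis=0)


# Synthetic check: a pure wavenumber-4 wave should put its power at k = 4.
lon = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
field = np.tile(np.cos(4 * lon), (8, 1))  # shape (lat=8, lon=64)
spectrum = zonal_power_spectrum(field)
peak_wavenumber = int(np.argmax(spectrum))
```

A forecast that blurs over long rollouts would show up here as a loss of power at high wavenumbers relative to the reference data.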
## Seasonal Forecast Stability
Figure 11: S2S Stability: (a) Spring barrier El Niño with realistic ensemble spread in the ocean; (b) qualitatively sharp fields of SST and Q700 predicted 90 days in the future from the closest ensemble member to the ERA5 in (a); and (c) stable Hovmöller diagrams of U850 anomalies (climatology removed; m/s), averaged between 10°S and 10°N, for a 90-day rollout.
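
The Hovmöller diagram in panel (c) reduces a (time, lat, lon) field to (time, lon) by removing the climatology and averaging over a latitude band. A minimal sketch of that reduction follows; the function name `hovmoller`, the cosine-latitude weighting, and the random toy data are assumptions for illustration, not the authors' processing code.

```python
import numpy as np


def hovmoller(u850, climatology, lats, lat_min=-10.0, lat_max=10.0):
    """Return a (time, lon) array of band-averaged anomalies.

    u850:        (time, lat, lon) forecast field in m/s
    climatology: array broadcastable to u850, in m/s
    lats:        (lat,) latitudes in degrees north
    """
    anomalies = u850 - climatology
    band = (lats >= lat_min) & (lats <= lat_max)
    # cos(lat) weights approximate grid-cell area on a regular lat-lon grid.
    weights = np.cos(np.deg2rad(lats[band]))
    return np.average(anomalies[:, band, :], axis=1, weights=weights)


rng = np.random.default_rng(0)
lats = np.linspace(-90.0, 90.0, 37)      # 5-degree toy grid
u850 = rng.normal(size=(90, 37, 72))     # 90 daily steps of toy data
clim = u850.mean(axis=0, keepdims=True)  # toy stand-in for a real climatology
diagram = hovmoller(u850, clim, lats)    # shape (time, lon)
```

Plotting `diagram` with time on the vertical axis and longitude on the horizontal axis gives the familiar Hovmöller layout.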
## Next Steps

- [Swift](https://github.com/stockeh/swift): a single-step consistency model that, for the first time, enables autoregressive finetuning of a probability flow model with a continuous ranked probability score (CRPS) objective

## References

1. [What are Diffusion Models? \| Lil’Log](https://lilianweng.github.io/posts/2021-07-11-diffusion-models/)
2. [Step by Step visual introduction to Diffusion Models. - Blog by Kemal Erdem](https://erdem.pl/2023/11/step-by-step-visual-introduction-to-diffusion-models)
3. [Understanding Diffusion Models: A Unified Perspective](https://calvinyluo.com/2022/08/26/diffusion-tutorial.html)

Hatanpää, Väinö, Eugene Ku, Jason Stock, et al. 2025. _AERIS: Argonne Earth Systems Model for Reliable and Skillful Predictions_. [https://arxiv.org/abs/2509.13523](https://arxiv.org/abs/2509.13523).
Price, Ilan, Alvaro Sanchez-Gonzalez, Ferran Alet, et al. 2024. _GenCast: Diffusion-Based Ensemble Forecasting for Medium-Range Weather_. [https://arxiv.org/abs/2312.15796](https://arxiv.org/abs/2312.15796).
## Extras

### Overview of Diffusion Models

**Goal**: We would like to (efficiently) draw samples $x_{i}$ from a (potentially unknown) _target_ distribution $q(\cdot)$.

- Given $x_{0} \sim q(x)$, we can construct a _forward diffusion process_ by gradually adding noise to $x_{0}$ over $T$ steps: $x_{0} \rightarrow \left\{x_{1}, \ldots, x_{T}\right\}$.
- Step sizes $\beta_{t} \in (0, 1)$ are controlled by a _variance schedule_ $\{\beta_{t}\}_{t=1}^{T}$, with:

$$
\begin{aligned}
q(x_{t}|x_{t-1}) &= \mathcal{N}(x_{t}; \sqrt{1-\beta_{t}} x_{t-1}, \beta_{t} I) \\
q(x_{1:T}|x_{0}) &= \prod_{t=1}^{T} q(x_{t}|x_{t-1})
\end{aligned}
$$

### Diffusion Model: Forward Process

- Introduce:
  - $\alpha_{t} \equiv 1 - \beta_{t}$
  - $\bar{\alpha}_{t} \equiv \prod_{s=1}^{t} \alpha_{s}$

We can write the forward process as:

$$
q(x_{t}|x_{0}) = \mathcal{N}(x_{t}; \sqrt{\bar{\alpha}_{t}} x_{0}, (1-\bar{\alpha}_{t}) I)
$$

- We see that the one-step _mean_ $\sqrt{\alpha_{t}} x_{t-1}$, unrolled back to $x_{0}$, gives $\mu_{t} = \sqrt{\bar{\alpha}_{t}} x_{0}$

## Acknowledgements

> This research used resources of the Argonne Leadership Computing
> Facility, which is a DOE Office of Science User Facility supported
> under Contract DE-AC02-06CH11357.

[^1]: Relative to PDE-based models, e.g.: [GFS](https://www.ncdc.noaa.gov/data-access/model-data/model-datasets/global-forcast-system-gfs)

[^2]: Demonstrated on up to 120,960 GPUs on Aurora and 8,064 GPUs on LUMI.

[^3]: ~ 14,000 days of data

[^4]: 🏆 [Aurora Supercomputer Ranks Fastest for AI](https://www.intel.com/content/www/us/en/newsroom/news/intel-powered-aurora-supercomputer-breaks-exascale-barrier.html)

[^5]: Each node has 6 Intel Data Center GPU Max 1550 (code-named “Ponte Vecchio”) GPUs, each with 2 tiles (XPUs), for 12 XPUs per node.
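
Returning to the forward process described under Extras: because $q(x_{t}|x_{0})$ is Gaussian with mean $\sqrt{\bar{\alpha}_{t}} x_{0}$ and variance $(1-\bar{\alpha}_{t}) I$, where $\bar{\alpha}_{t} = \prod_{s=1}^{t} \alpha_{s}$, a noisy sample at any step can be drawn directly without iterating. The sketch below assumes a linear variance schedule and a toy constant dataset purely for illustration; the helper name `forward_sample` is hypothetical and none of these choices are taken from AERIS.

```python
import numpy as np


def forward_sample(x0, t, betas, rng):
    """Draw x_t ~ q(x_t | x_0) in closed form; t is 1-indexed."""
    # alpha_bar[t - 1] is the product of (1 - beta_s) for s = 1..t.
    alpha_bar = np.cumprod(1.0 - betas)
    a = alpha_bar[t - 1]
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise


rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)  # illustrative linear schedule (assumption)
x0 = np.full(10_000, 3.0)              # toy "data": every sample equals 3

x_mid = forward_sample(x0, t=100, betas=betas, rng=rng)   # partially noised
x_T = forward_sample(x0, t=1000, betas=betas, rng=rng)    # close to N(0, 1)
```

With this schedule the signal term has almost vanished by the final step, so `x_T` is approximately standard normal regardless of `x0`.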