---
layout: '@/layouts/Doc.astro'
title: "AERIS: Argonne's Earth Systems Model"
date: '2025-10-08'
location: '2025 ALCF Hands On HPC Workshop'
---
Sam Foreman
2025-10-08
- [🌎 AERIS](#earth_americas-aeris)
- [High-Level Overview of AERIS](#high-level-overview-of-aeris)
- [Contributions](#contributions)
- [Model Overview](#model-overview)
- [Windowed Self-Attention](#windowed-self-attention)
- [Model Architecture: Details](#model-architecture-details)
- [Issues with the Deterministic
Approach](#issues-with-the-deterministic-approach)
- [Transitioning to a Probabilistic
Model](#transitioning-to-a-probabilistic-model)
- [Sequence-Window-Pipeline Parallelism
`SWiPe`](#sequence-window-pipeline-parallelism-swipe)
- [Aurora](#aurora)
- [AERIS: Scaling Results](#aeris-scaling-results)
- [Hurricane Laura](#hurricane-laura)
- [S2S: Subseasonal-to-Seasonal
Forecasts](#s2s-subsseasonal-to-seasonal-forecasts)
- [Seasonal Forecast Stability](#seasonal-forecast-stability)
- [Next Steps](#next-steps)
- [References](#references)
- [Extras](#extras)
- [Overview of Diffusion Models](#overview-of-diffusion-models)
- [Diffusion Model: Forward Process](#diffusion-model-forward-process)
- [Acknowledgements](#acknowledgements)
## 🌎 AERIS
- `SWiPe` is a **novel parallelism strategy** for Swin-based
Transformers
- Hybrid 3D parallelism strategy, combining:
  - Sequence parallelism (`SP`)
  - Window parallelism (`WP`)
  - Pipeline parallelism (`PP`)
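
As a rough illustration of how three parallelism degrees can tile a set of ranks, the sketch below maps a flat rank index onto hypothetical `(pp, wp, sp)` coordinates. The axis ordering, the names `Coords`, `rank_to_coords`, and `coords_to_rank`, and the example degrees are all assumptions made for illustration; they are not taken from the AERIS implementation.

```python
from typing import NamedTuple


class Coords(NamedTuple):
    pp: int  # pipeline-stage index
    wp: int  # window-parallel index
    sp: int  # sequence-parallel index


def rank_to_coords(rank: int, pp_size: int, wp_size: int, sp_size: int) -> Coords:
    """Map a flat rank to (pp, wp, sp) coordinates, with sp varying fastest."""
    world = pp_size * wp_size * sp_size
    if not 0 <= rank < world:
        raise ValueError(f"rank {rank} outside world of size {world}")
    sp = rank % sp_size
    wp = (rank // sp_size) % wp_size
    pp = rank // (sp_size * wp_size)
    return Coords(pp, wp, sp)


def coords_to_rank(c: Coords, wp_size: int, sp_size: int) -> int:
    """Inverse of rank_to_coords for the same (assumed) axis ordering."""
    return (c.pp * wp_size + c.wp) * sp_size + c.sp


if __name__ == "__main__":
    # Hypothetical degrees: 2 pipeline stages x 3 window groups x 4 sequence shards.
    for r in (0, 5, 23):
        print(r, rank_to_coords(r, pp_size=2, wp_size=3, sp_size=4))
```

Placing the sequence-parallel index on the fastest-varying axis keeps ranks that share a window group adjacent, which is one plausible choice when those ranks communicate most often; other orderings are equally valid.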

Figure 6
Table 3: Aurora[^4] Specs
| Property | Value |
| -------: | :------ |
| Racks | 166 |
| Nodes | 10,624 |
| XPUs[^5] | 127,488 |
| CPUs | 21,248 |
| NICs | 84,992 |
| HBM | 8 PB |
| DDR5 | 10 PB |

Figure 8: Aurora: [Fact
Sheet](https://www.alcf.anl.gov/sites/default/files/2024-07/Aurora_FactSheet_2024.pdf).
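
The per-node figures implied by Table 3 follow from dividing each system-wide count by the node count. A small sketch of that arithmetic, using only the values in the table (the dictionary and variable names are illustrative):

```python
# System-wide counts copied from Table 3.
aurora = {
    "racks": 166,
    "nodes": 10_624,
    "xpus": 127_488,
    "cpus": 21_248,
    "nics": 84_992,
}

# Divide each component count by the node count to get per-node figures.
per_node = {
    name: count // aurora["nodes"]
    for name, count in aurora.items()
    if name not in ("racks", "nodes")
}

# Nodes per rack (the division is exact for these table values).
nodes_per_rack = aurora["nodes"] // aurora["racks"]

if __name__ == "__main__":
    print(per_node)
    print(nodes_per_rack)
```

With these values each node works out to 12 XPUs, 2 CPUs, and 8 NICs, and each rack to 64 nodes.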
## AERIS: Scaling Results
> [!IMPORTANT] 🌡️ S2S Forecasts
>
> We demonstrate, for the first time, the ability of a generative,
> high-resolution (native ERA5) diffusion model to produce skillful
> forecasts on S2S timescales with realistic evolution of the Earth
> system (atmosphere + ocean).
- To assess trends beyond the range of our medium-range weather
  forecasts (beyond 14 days) and to evaluate the stability of our model,
  we made 3,000 forecasts (60 initial conditions, each with 50 ensemble
  members) out to 90 days.
- AERIS was found to be stable during these 90-day forecasts:
  - Realistic atmospheric states
  - Correct power spectra even at the smallest scales
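
One way to check a claim about power spectra at small scales is to compare the zonal (along-longitude) power spectrum of a forecast field with that of the reference. The following is a minimal NumPy sketch of such a spectrum. The function name `zonal_power_spectrum`, the `(lat, lon)` array layout, and the synthetic test field are assumptions for illustration; the actual AERIS evaluation may use a different definition (for example, one based on spherical harmonics).

```python
import numpy as np


def zonal_power_spectrum(field: np.ndarray) -> np.ndarray:
    """Mean power per zonal wavenumber for a (lat, lon) field.

    Applies a real FFT along longitude for each latitude row, then
    averages the squared amplitudes over latitude.
    """
    if field.ndim != 2:
        raise ValueError("expected a 2D (lat, lon) array")
    n_lon = field.shape[1]
    coeffs = np.fft.rfft(field, axis=1) / n_lon  # normalized Fourier coefficients
    power = np.abs(coeffs) ** 2
    return power.mean(axis=0)  # shape: (n_lon // 2 + 1,)


if __name__ == "__main__":
    # Synthetic field: a single wave with zonal wavenumber 3 at every latitude.
    lon = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
    field = np.tile(np.cos(3 * lon), (8, 1))
    spectrum = zonal_power_spectrum(field)
    print(int(np.argmax(spectrum)))  # index of the wavenumber with the most power
```

A forecast that blurs over long rollouts would show a deficit of power at high wavenumbers relative to the reference spectrum.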
## Seasonal Forecast Stability

Figure 11: S2S Stability: (a) Spring barrier El Niño with realistic
ensemble spread in the ocean; (b) qualitatively sharp fields of SST and
Q700 predicted 90 days in the future from the
closest ensemble member to the ERA5 in (a);
and (c) stable Hovmöller diagrams of U850 anomalies (climatology removed; m/s),
averaged between 10°S and 10°N, for a 90-day rollout.
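
The quantity plotted in panel (c) can be sketched as a latitude-band average of anomalies. The example below assumes a `(time, lat, lon)` array and a precomputed climatology of the same shape. The function name `hovmoller_band_mean` and the unweighted latitude mean are simplifying assumptions (a cosine-latitude weighting is common, though it changes little this close to the equator), and the random field is only a stand-in for U850.

```python
import numpy as np


def hovmoller_band_mean(field, climatology, lats, lat_min=-10.0, lat_max=10.0):
    """Return a (time, lon) array of anomalies averaged over a latitude band.

    field, climatology: arrays of shape (time, lat, lon)
    lats: 1D array of latitudes in degrees, with length field.shape[1]
    """
    field = np.asarray(field, dtype=float)
    climatology = np.asarray(climatology, dtype=float)
    lats = np.asarray(lats, dtype=float)
    if field.shape != climatology.shape:
        raise ValueError("field and climatology must have the same shape")
    band = (lats >= lat_min) & (lats <= lat_max)
    if not band.any():
        raise ValueError("no latitudes fall inside the requested band")
    anomalies = field - climatology  # remove the climatology
    return anomalies[:, band, :].mean(axis=1)  # average over the latitude band


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lats = np.arange(-90.0, 90.1, 10.0)        # coarse 10-degree grid
    clim = np.zeros((90, lats.size, 36))       # 90 daily steps, 36 longitudes
    u850 = clim + rng.normal(size=clim.shape)  # synthetic stand-in for U850
    print(hovmoller_band_mean(u850, clim, lats).shape)
```

Plotting the returned array with time on one axis and longitude on the other gives the Hovmöller view.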
## Next Steps
- [Swift](https://github.com/stockeh/swift): a single-step consistency
  model that, for the first time, enables autoregressive finetuning of a
  probability flow model with a continuous ranked probability score
  (CRPS) objective
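
For readers unfamiliar with the CRPS, a commonly used ensemble estimator is $\mathbb{E}|X - y| - \tfrac{1}{2}\mathbb{E}|X - X'|$, where $X, X'$ are independent ensemble members and $y$ is the observation. The NumPy sketch below implements that textbook estimator for a scalar target. It is not taken from the Swift codebase, and the function name `crps_ensemble` is illustrative.

```python
import numpy as np


def crps_ensemble(members, obs: float) -> float:
    """Ensemble CRPS estimate: E|X - y| - 0.5 * E|X - X'| (lower is better)."""
    x = np.asarray(members, dtype=float).ravel()
    if x.size == 0:
        raise ValueError("need at least one ensemble member")
    skill = np.abs(x - obs).mean()
    # Mean absolute difference over all ordered pairs (including i == j).
    spread = np.abs(x[:, None] - x[None, :]).mean()
    return float(skill - 0.5 * spread)


if __name__ == "__main__":
    print(crps_ensemble([1.0, 1.0, 1.0], obs=1.0))  # zero-spread ensemble on target
    print(crps_ensemble([0.0, 2.0], obs=1.0))       # spread-out ensemble around target
```

For a single-member ensemble the spread term vanishes and the estimate reduces to the absolute error, which is why the CRPS is often described as a probabilistic generalization of MAE.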
## References
1. [What are Diffusion Models? \|
Lil’Log](https://lilianweng.github.io/posts/2021-07-11-diffusion-models/)
2. [Step by Step visual introduction to Diffusion Models. - Blog by
Kemal
Erdem](https://erdem.pl/2023/11/step-by-step-visual-introduction-to-diffusion-models)
3. [Understanding Diffusion Models: A Unified
Perspective](https://calvinyluo.com/2022/08/26/diffusion-tutorial.html)

Hatanpää, Väinö, Eugene Ku, Jason Stock, et al. 2025. _AERIS: Argonne
Earth Systems Model for Reliable and Skillful Predictions_.
[https://arxiv.org/abs/2509.13523](https://arxiv.org/abs/2509.13523).

Price, Ilan, Alvaro Sanchez-Gonzalez, Ferran Alet, et al. 2024.
_GenCast: Diffusion-Based Ensemble Forecasting for Medium-Range
Weather_. [https://arxiv.org/abs/2312.15796](https://arxiv.org/abs/2312.15796).

## Extras
### Overview of Diffusion Models
**Goal**: We would like to (efficiently) draw samples $x_{i}$ from a
(potentially unknown) _target_ distribution $q(\cdot)$.
- Given $x_{0} \sim q(x)$, we can construct a _forward diffusion
process_ by gradually adding noise to $x_{0}$ over $T$ steps:
$x_{0} \rightarrow \left\{x_{1}, \ldots, x_{T}\right\}$.
- Step sizes $\beta_{t} \in (0, 1)$ are controlled by a _variance
  schedule_ $\{\beta_{t}\}_{t=1}^{T}$, with:
$$
\begin{aligned}
q(x_{t}|x_{t-1}) = \mathcal{N}(x_{t}; \sqrt{1-\beta_{t}} x_{t-1}, \beta_{t} I) \\
q(x_{1:T}|x_{0}) = \prod_{t=1}^{T} q(x_{t}|x_{t-1})
\end{aligned}
$$
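
A minimal NumPy sketch of the forward process defined above. The linear schedule endpoints (`1e-4` to `0.02`) and $T = 1000$ are common choices in the diffusion literature and are assumed here purely for illustration; they are not stated in this document. The function name `forward_diffusion` is likewise hypothetical.

```python
import numpy as np


def forward_diffusion(x0: np.ndarray, betas: np.ndarray, rng: np.random.Generator):
    """Iterate q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) x_{t-1}, beta_t I).

    Returns an array of shape (T + 1, *x0.shape) holding x_0, x_1, ..., x_T.
    """
    xs = [np.asarray(x0, dtype=float)]
    for beta in betas:
        noise = rng.standard_normal(xs[-1].shape)
        xs.append(np.sqrt(1.0 - beta) * xs[-1] + np.sqrt(beta) * noise)
    return np.stack(xs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T = 1000                             # assumed number of steps
    betas = np.linspace(1e-4, 0.02, T)   # assumed linear variance schedule
    x0 = np.full(2_000, 3.0)             # toy "data": every sample equals 3
    xs = forward_diffusion(x0, betas, rng)
    # After many steps the samples should be close to a standard normal.
    print(xs.shape, float(xs[-1].mean()), float(xs[-1].std()))
```

Starting every sample at the same value makes it easy to see the signal being washed out: the sample mean decays toward 0 and the standard deviation grows toward 1 as $t$ increases.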
### Diffusion Model: Forward Process
- Introduce:
  - $\alpha_{t} \equiv 1 - \beta_{t}$
  - $\bar{\alpha}_{t} \equiv \prod_{s=1}^{t} \alpha_{s}$

We can write the forward process as:

$$ q(x_{t}|x_{0}) = \mathcal{N}(x_{t}; \sqrt{\bar{\alpha}_{t}} x_{0}, (1-\bar{\alpha}_{t}) I) $$

- We see that the _mean_ conditioned on $x_{t-1}$ is
  $\sqrt{\alpha_{t}} x_{t-1}$, while conditioned on $x_{0}$ it is
  $\mu_{t} = \sqrt{\bar{\alpha}_{t}} x_{0}$
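
The closed form in terms of $x_{0}$ follows by unrolling the one-step update. As a sketch, write each step with independent noise $\epsilon_{t} \sim \mathcal{N}(0, I)$ and use the fact that a sum of independent Gaussians is Gaussian with summed variances:

$$
\begin{aligned}
x_{t} &= \sqrt{\alpha_{t}}\, x_{t-1} + \sqrt{1-\alpha_{t}}\, \epsilon_{t} \\
&= \sqrt{\alpha_{t}\alpha_{t-1}}\, x_{t-2} + \sqrt{\alpha_{t}(1-\alpha_{t-1})}\, \epsilon_{t-1} + \sqrt{1-\alpha_{t}}\, \epsilon_{t} \\
&= \sqrt{\alpha_{t}\alpha_{t-1}}\, x_{t-2} + \sqrt{1-\alpha_{t}\alpha_{t-1}}\, \bar{\epsilon} \\
&\;\;\vdots \\
&= \sqrt{\bar{\alpha}_{t}}\, x_{0} + \sqrt{1-\bar{\alpha}_{t}}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)
\end{aligned}
$$

Here $\bar{\epsilon} \sim \mathcal{N}(0, I)$ merges the two independent noise terms, whose variances add to $\alpha_{t}(1-\alpha_{t-1}) + (1-\alpha_{t}) = 1 - \alpha_{t}\alpha_{t-1}$. Repeating the step down to $x_{0}$ gives the product $\bar{\alpha}_{t} = \prod_{s=1}^{t} \alpha_{s}$, so $x_{t}$ can be sampled from $x_{0}$ in a single step.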
## Acknowledgements
> This research used resources of the Argonne Leadership Computing
> Facility, which is a DOE Office of Science User Facility supported
> under Contract DE-AC02-06CH11357.
[^1]:
Relative to PDE-based models, e.g.:
[GFS](https://www.ncdc.noaa.gov/data-access/model-data/model-datasets/global-forcast-system-gfs)
[^2]:
Demonstrated on up to 120,960 GPUs on Aurora and 8,064 GPUs on
LUMI.
[^3]: ~ 14,000 days of data
[^4]:
🏆 [Aurora Supercomputer Ranks Fastest for
AI](https://www.intel.com/content/www/us/en/newsroom/news/intel-powered-aurora-supercomputer-breaks-exascale-barrier.html)
[^5]:
Each node has 6 Intel Data Center GPU Max 1550 (code-named “Ponte
Vecchio”) GPUs, with 2 tiles (XPUs) per GPU.