MaRS: A Multi-Modality Very-High-Resolution Remote Sensing Foundation Model with Cross-Granularity Meta-Modality Learning
✨AAAI 2026✨
Wuhan University

Overall framework of MaRS and examples of downstream tasks.
---
## 📰 Latest News
- **Nov 2025** — MaRS paper accepted to **AAAI 2026**.
- **Nov 2025** — Pretraining code and model weights officially released.
---
## 📦 Overview
**MaRS** is a large-scale multi-modality foundation model designed for very-high-resolution remote sensing imagery.
It introduces **Cross-Granularity Meta-Modality Learning**, which enables robust representation learning across optical RGB and SAR modalities at very high spatial resolutions.
This repository provides:
- Pretrained weights (`mars_base`, `mars_large`)
- Pretraining pipeline (data processing, configuration, and scripts)
- Instructions for loading MaRS using **timm** (compatible with SwinV2 architecture)
---
## 🔧 Using MaRS in Your Project
All pretrained weights are available at:
MaRS follows the **SwinV2** architecture and can be loaded directly using `timm==1.0.15`.
### ▶ Optical RGB Example
```python
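import timm

# Build a SwinV2-Base feature extractor and load the MaRS optical (RGB) encoder weights.
backbone_mars = timm.create_model(
    'swinv2_base_window8_256',
    pretrained=False,      # skip timm's default weights; MaRS weights come from checkpoint_path
    features_only=True,    # return multi-scale feature maps instead of classification logits
    in_chans=3,            # 3-channel optical RGB input
    img_size=512,
    checkpoint_path='mars_base_rgb_encoder_only.pth'
)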
```
### ▶ SAR Example
```python
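import timm

# Build a SwinV2-Base feature extractor and load the MaRS SAR encoder weights.
backbone_mars = timm.create_model(
    'swinv2_base_window8_256',
    pretrained=False,      # skip timm's default weights; MaRS weights come from checkpoint_path
    features_only=True,    # return multi-scale feature maps instead of classification logits
    in_chans=1,            # single-channel SAR input
    img_size=512,
    checkpoint_path='mars_base_sar_encoder_only.pth'
)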
```
The pretrained backbone has been validated on a wide range of high-resolution optical and multi-modal downstream tasks (details in the paper).
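
When wiring the backbone into a decoder or detection head, it helps to know the feature-pyramid sizes in advance. The sketch below is plain Python and does not call timm. It assumes the standard Swin-Base hierarchy (patch size 4, embedding width 128, four stages, each stage halving the resolution and doubling the width). `swin_feature_shapes` is a hypothetical helper, not part of this repository. The tensor layout actually returned by `features_only=True` may be channels-last depending on the timm version, so check one output tensor before relying on it.

```python
def swin_feature_shapes(img_size=512, embed_dim=128, num_stages=4, patch_size=4):
    """Return (channels, height, width) per stage for a Swin-style hierarchy."""
    shapes = []
    for stage in range(num_stages):
        stride = patch_size * 2 ** stage  # 4, 8, 16, 32 for the defaults
        if img_size % stride != 0:
            raise ValueError(f"img_size {img_size} is not divisible by stride {stride}")
        side = img_size // stride
        shapes.append((embed_dim * 2 ** stage, side, side))
    return shapes


for channels, height, width in swin_feature_shapes(img_size=512):
    print(f"{channels} channels at {height} x {width}")
```

For `img_size=512` this gives four levels, from 128 channels at 128 × 128 down to 1024 channels at 16 × 16.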
---
## 🏗️ Pretraining Pipeline
This section describes how to reproduce MaRS pretraining.
---
### 1. Environment Setup
A minimal software environment used in our experiments:
```text
python = 3.11.13
torch = 2.7.0
tifffile = 2025.3.30
timm = 1.0.15
```
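
A quick way to compare an existing environment against the versions listed above is sketched here. It uses only the standard library (`importlib.metadata`). `check_versions` and `EXPECTED` are hypothetical names introduced for illustration and are not shipped with this repository. Other versions may also work; this only reports differences.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above; adjust if your setup differs.
EXPECTED = {"torch": "2.7.0", "tifffile": "2025.3.30", "timm": "1.0.15"}


def check_versions(expected, get_version=version):
    """Map each package name to (wanted, found, matches)."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = None  # package is not installed
        # Ignore local build tags such as "+cu126" when comparing.
        matches = found is not None and found.split("+")[0] == wanted
        report[name] = (wanted, found, matches)
    return report


for name, (wanted, found, ok) in check_versions(EXPECTED).items():
    print(f"{name}: wanted {wanted}, found {found}, {'OK' if ok else 'MISMATCH'}")
```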
---
### 2. Data Preparation
The full **MaRS-16M** pretraining corpus (~5 TB) is too large for public hosting.
A **public experimental subset** will be released.
To request full dataset access for academic collaboration, please contact:
```
yangruoyu@whu.edu.cn
```
Note: The dataset is still being organized and is not yet publicly available. For collaboration inquiries, please contact us by email.
#### 2.1 Download & Organize Raw Data
```bash
mkdir -p ./data
# Place Umbra / Capella raw tiles into ./data
```
#### 2.2 Patch Extraction
Extract **1024 × 1024** training patches:
```bash
python ./data/split_patch.py
```
After extraction:
```text
data/
├── Capella_patches/
│   ├── rgb/
│   └── sar/
└── Umbra_patches/
    ├── rgb/
    └── sar/
```
- `rgb/`: optical patches
- `sar/`: SAR patches
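
To illustrate the tiling idea behind 1024 × 1024 patch extraction, here is a minimal sketch that computes non-overlapping patch windows for a tile of a given size. It is not the logic of `split_patch.py`, which may use overlap, padding, or filtering. `patch_windows` is a hypothetical helper, and dropping the partial patches at the right and bottom edges is an assumption made for simplicity.

```python
def patch_windows(height, width, patch_size=1024):
    """Return (top, left, bottom, right) for every full, non-overlapping patch."""
    windows = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            windows.append((top, left, top + patch_size, left + patch_size))
    return windows


# Hypothetical 2500 x 3000 pixel tile: two full rows and two full columns fit.
windows = patch_windows(2500, 3000)
print(len(windows))  # 4
print(windows[-1])   # (1024, 1024, 2048, 2048)
```

Each window can then be used to slice an array, for example `tile[top:bottom, left:right]`, and the same windows can be applied to the paired `rgb/` and `sar/` tiles when they are co-registered.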
---
### 3. Launching Pretraining
Example commands for 8×GPU single-node training using `torchrun`.
#### 3.1 MaRS-Base
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
torchrun \
--nproc-per-node=8 \
--nnodes=1 --node_rank=0 \
--master_addr=localhost --master_port=12345 \
main_pretrain.py \
--model mars_base \
--batch_size 16 \
--num_workers 8 \
--output_dir ./work_dirs/mars_base \
--log_dir ./work_dirs/mars_base \
--epochs 12 \
--warmup_epochs 1
```
#### 3.2 MaRS-Large
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
torchrun \
--nproc-per-node=8 \
--nnodes=1 --node_rank=0 \
--master_addr=localhost --master_port=12345 \
main_pretrain.py \
--model mars_large \
--batch_size 12 \
--num_workers 8 \
--output_dir ./work_dirs/mars_large \
--log_dir ./work_dirs/mars_large \
--epochs 12 \
--warmup_epochs 1
```
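
When adapting these commands to a different GPU count, the number of samples per optimizer step changes with it. The arithmetic below assumes that `--batch_size` is a per-GPU value and that no gradient accumulation is used; neither is confirmed here, so check `main_pretrain.py` before relying on the numbers. `global_batch_size` is a hypothetical helper.

```python
def global_batch_size(per_gpu_batch, num_gpus, num_nodes=1):
    """Samples processed per optimizer step under plain data parallelism."""
    return per_gpu_batch * num_gpus * num_nodes


print(global_batch_size(16, 8))  # MaRS-Base command: 128
print(global_batch_size(12, 8))  # MaRS-Large command: 96
```

Under these assumptions, training on 4 GPUs with the same `--batch_size` would halve the global batch, which usually calls for re-tuning the learning rate.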
---
### 4. Converting MaRS Weights to Swin Format
To make MaRS weights directly loadable by SwinTransformer (and `timm`), convert them via:
```bash
python utils/convert_mars_checkpoints_to_swin.py
```
The released weights have already undergone this conversion.
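
Conversions of this kind typically keep the encoder parameters and rename their keys so they match what the target model expects. The sketch below shows that general pattern on a toy dictionary. It is not the content of `convert_mars_checkpoints_to_swin.py`: the `encoder.` prefix, the key names, and the `extract_encoder_weights` helper are all hypothetical.

```python
def extract_encoder_weights(state_dict, prefix="encoder."):
    """Keep entries whose key starts with `prefix` and drop that prefix."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


# Toy checkpoint with made-up keys; strings stand in for tensors.
toy_checkpoint = {
    "encoder.patch_embed.proj.weight": "w0",
    "encoder.layers.0.blocks.0.attn.qkv.weight": "w1",
    "decoder.head.weight": "w2",
}
converted = extract_encoder_weights(toy_checkpoint)
print(sorted(converted))
# ['layers.0.blocks.0.attn.qkv.weight', 'patch_embed.proj.weight']
```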
---
## 📕 Downstream Task Datasets
- GUSO: Multi-modality Paired High-resolution Remote Sensing Dataset. [Under review]
- EarthMiss: Missing Modality Land Cover Mapping. [Download](https://rsidea.whu.edu.cn/EarthMiss.html)
- DFC25-T2: Multimodal VHR Dataset for All-weather Disaster Response. [Download](https://github.com/ChenHongruixuan/BRIGHT)
- SARDet-100k: SAR Modality Object Detection Dataset. [Download](https://github.com/zcablii/SARDet_100K)
- UBC-V2: Multi-modality High-resolution Remote Sensing Building Detection Dataset. [Download](https://github.com/AICyberTeam/UBC-dataset/tree/UBCv2)
- UBC: Multi-modality High-resolution Remote Sensing Building Height Estimation Dataset. [Download](https://github.com/AICyberTeam/UBC-dataset)
- WHU-CD: High-resolution Remote Sensing Change Detection Dataset. [Download](https://gpcv.whu.edu.cn/data/building_dataset.html)
- DeepGlobe: High-resolution Remote Sensing Road Extraction Dataset. [Download](https://www.eotdl.com/datasets/DeepGlobeRoadExtraction)
---
## 📖 Citation
If you find **MaRS** useful in your research, please cite:
```bibtex
@inproceedings{yang2026mars,
title={MaRS: A Multi-Modality Very-High-Resolution Remote Sensing Foundation Model with Cross-Granularity Meta-Modality Learning},
author={Ruoyu Yang and Yinhe Liu and Heng Yan and Yiheng Zhou and Yihan Fu and Han Luo and Yanfei Zhong},
booktitle={AAAI Conference on Artificial Intelligence},
year={2026}
}
```
---
## © Copyright & Usage
This method is copyrighted by the **Intelligent Remote Sensing Data Extraction, Analysis and Application Research Group (RSIDEA)**,
affiliated with the **State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University**.
**MaRS is released strictly for academic research purposes.**
---