---
name: wildworld-dataset
description: WildWorld large-scale action-conditioned world modeling dataset with 108M+ frames from a photorealistic ARPG game, featuring per-frame annotations, 450+ actions, and explicit state information for generative world modeling research.
triggers:
  - use WildWorld dataset
  - load WildWorld ARPG data
  - work with WildWorld annotations
  - WildWorld world modeling
  - action conditioned video dataset
  - WildBench benchmark evaluation
  - WildWorld frame annotations
  - generative ARPG dataset
---

# WildWorld Dataset Skill

> Skill by [ara.so](https://ara.so) (Daily 2026 Skills collection).

## What WildWorld Is

**WildWorld** is a large-scale action-conditioned world modeling dataset automatically collected from a photorealistic AAA action role-playing game (ARPG). It is designed for training and evaluating **dynamic world models**: generative models that predict future game states given past observations and player actions.

### Key Statistics

| Property | Value |
|---|---|
| Total frames | 108M+ |
| Actions | 450+ semantically meaningful |
| Monster species | 29 |
| Player characters | 4 |
| Weapon types | 4 |
| Distinct stages | 5 |
| Max clip length | 30+ minutes continuous |

### Per-Frame Annotations

Every frame includes:

- **Character skeletons**: joint positions for player and monsters
- **Actions & states**: HP, animation state, stamina, etc.
- **Camera poses**: position, rotation, field of view
- **Depth maps**: monocular depth for each frame
- **Hierarchical captions**: action-level and sample-level natural language descriptions

---

## Project Status

> ⚠️ As of March 2026, the dataset and WildBench benchmark have **not yet been released**. Monitor the repository for updates.

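Since the data is not yet available, it can help to picture what one frame's annotations might look like in memory. The following is a minimal sketch of the per-frame annotation list above, not the released schema: every class and field name here (`FrameAnnotation`, `CameraPose`, `fov_deg`, `action_name`, and so on) is an assumption made for illustration, and depth maps are left out because they would normally be stored as separate arrays.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class CameraPose:
    """Hypothetical camera record: position, rotation, field of view."""
    position: Vec3
    rotation: Vec3  # assumed Euler angles; the release may use another convention
    fov_deg: float


@dataclass
class FrameAnnotation:
    """Hypothetical grouping of the annotations attached to one frame."""
    frame_id: int
    action_name: str       # one of the 450+ action labels
    hp: float
    stamina: float
    animation_state: str
    camera: CameraPose
    # entity name -> joint name -> 3D joint position
    skeletons: Dict[str, Dict[str, Vec3]] = field(default_factory=dict)
    # hierarchical captions, e.g. [action-level, sample-level]
    captions: List[str] = field(default_factory=list)


# Example values are invented purely to show the shape of a record
example = FrameAnnotation(
    frame_id=0,
    action_name="idle",
    hp=1.0,
    stamina=1.0,
    animation_state="idle",
    camera=CameraPose(position=(0.0, 1.6, -3.0), rotation=(0.0, 0.0, 0.0), fov_deg=60.0),
    skeletons={"player": {"head": (0.0, 1.7, 0.0)}},
)
```

Keeping the camera pose as its own type makes it easy to swap the rotation convention once the real format is known.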
```bash
# Watch the repository for dataset release
# https://github.com/ShandaAI/WildWorld
```

---

## Repository Setup

```bash
# Clone the repository
git clone https://github.com/ShandaAI/WildWorld.git
cd WildWorld

# Install dependencies (when benchmark code is released)
pip install -r requirements.txt
```

---

## Expected Dataset Structure

Based on the paper and framework description, the dataset is expected to follow this structure:

```
WildWorld/
├── data/
│   ├── sequences/
│   │   ├── stage_01/
│   │   │   ├── clip_000001/
│   │   │   │   ├── frames/      # RGB frames (e.g., PNG)
│   │   │   │   ├── depth/       # Depth maps
│   │   │   │   ├── skeleton/    # Per-frame skeleton JSON
│   │   │   │   ├── states/      # HP, animation, stamina JSON
│   │   │   │   ├── camera/      # Camera pose JSON
│   │   │   │   └── actions/     # Action label files
│   │   │   └── clip_000002/
│   │   └── stage_02/
│   └── captions/
│       ├── action_level/        # Per-action descriptions
│       └── sample_level/        # Clip-level descriptions
├── benchmark/
│   └── wildbench/               # WildBench evaluation code
├── assets/
│   └── framework-arxiv.png
├── LICENSE
└── README.md
```

---

## Working with the Dataset (Anticipated API)

### Loading Frame Annotations

```python
import json
from pathlib import Path
from typing import Optional

import numpy as np
from PIL import Image


class WildWorldClip:
    """Helper class to load a WildWorld clip and its annotations."""

    def __init__(self, clip_dir: str):
        self.clip_dir = Path(clip_dir)
        self.frames_dir = self.clip_dir / "frames"
        self.depth_dir = self.clip_dir / "depth"
        self.skeleton_dir = self.clip_dir / "skeleton"
        self.states_dir = self.clip_dir / "states"
        self.camera_dir = self.clip_dir / "camera"
        self.actions_dir = self.clip_dir / "actions"

    def _load_json(self, directory: Path, frame_id: int) -> dict:
        """Read the per-frame JSON file for frame_id from the given directory."""
        with open(directory / f"{frame_id:06d}.json") as f:
            return json.load(f)

    def get_frame(self, frame_id: int) -> Image.Image:
        frame_path = self.frames_dir / f"{frame_id:06d}.png"
        return Image.open(frame_path)

    def get_depth(self, frame_id: int) -> np.ndarray:
        # Assumes depth is stored as .npy; the released format may differ
        depth_path = self.depth_dir / f"{frame_id:06d}.npy"
        return np.load(depth_path)

    def get_skeleton(self, frame_id: int) -> dict:
        return self._load_json(self.skeleton_dir, frame_id)

    def get_state(self, frame_id: int) -> dict:
        """Returns HP, animation state, stamina, etc."""
        return self._load_json(self.states_dir, frame_id)

    def get_camera(self, frame_id: int) -> dict:
        """Returns camera position, rotation, and FOV."""
        return self._load_json(self.camera_dir, frame_id)

    def get_action(self, frame_id: int) -> dict:
        return self._load_json(self.actions_dir, frame_id)

    def iter_frames(self, start: int = 0, end: Optional[int] = None):
        """Iterate over frames in the clip; start/end index the sorted file list."""
        frame_files = sorted(self.frames_dir.glob("*.png"))
        for frame_path in frame_files[start:end]:
            frame_id = int(frame_path.stem)
            yield {
                "frame_id": frame_id,
                "frame": self.get_frame(frame_id),
                "depth": self.get_depth(frame_id),
                "skeleton": self.get_skeleton(frame_id),
                "state": self.get_state(frame_id),
                "camera": self.get_camera(frame_id),
                "action": self.get_action(frame_id),
            }


# Usage
clip = WildWorldClip("data/sequences/stage_01/clip_000001")

for sample in clip.iter_frames(start=0, end=100):
    frame_id = sample["frame_id"]
    state = sample["state"]
    action = sample["action"]
    # Key names ("hp", "action_name") are anticipated, not confirmed
    print(f"Frame {frame_id}: HP={state.get('hp')}, Action={action.get('action_name')}")
```

### PyTorch Dataset

```python
import json
from pathlib import Path
from typing import Optional

import torch
import torchvision.transforms as T
from PIL import Image
from torch.utils.data import DataLoader, Dataset


class WildWorldDataset(Dataset):
    """
    PyTorch Dataset for WildWorld action-conditioned world modeling.
    Returns sequences of (frames, actions, states) for next-frame prediction.
    """

    def __init__(
        self,
        root_dir: str,
        sequence_length: int = 16,
        image_size: tuple = (256, 256),
        stage: Optional[str] = None,
        split: str = "train",
    ):
        self.root_dir = Path(root_dir)
        self.sequence_length = sequence_length
        self.image_size = image_size
        self.transform = T.Compose([
            T.Resize(image_size),
            T.ToTensor(),
            T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
        ])

        # Discover all clips
        self.clips = self._discover_clips(stage, split)
        self.samples = self._build_sample_index()

    def _discover_clips(self, stage, split):
        clips = []
        stage_dirs = (
            [self.root_dir / "data" / "sequences" / stage]
            if stage
            else sorted((self.root_dir / "data" / "sequences").iterdir())
        )
        for stage_dir in stage_dirs:
            if stage_dir.is_dir():
                for clip_dir in sorted(stage_dir.iterdir()):
                    if clip_dir.is_dir():
                        clips.append(clip_dir)

        # Simple train/val split: first 90% of clips for training, rest for validation
        split_idx = int(len(clips) * 0.9)
        return clips[:split_idx] if split == "train" else clips[split_idx:]

    def _build_sample_index(self):
        """Build index of (clip_dir, start_index) pairs with 50% window overlap."""
        samples = []
        # Clamp the stride to 1 so sequence_length=1 does not give a zero step
        stride = max(1, self.sequence_length // 2)
        for clip_dir in self.clips:
            n_frames = len(list((clip_dir / "frames").glob("*.png")))
            # +1 so the last full window (start = n_frames - sequence_length) is included
            for start in range(0, n_frames - self.sequence_length + 1, stride):
                samples.append((clip_dir, start))
        return samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        clip_dir, start = self.samples[idx]
        frames_dir = clip_dir / "frames"
        frame_files = sorted(frames_dir.glob("*.png"))[start:start + self.sequence_length]

        frames, actions, states = [], [], []
        for frame_path in frame_files:
            frame_id = int(frame_path.stem)

            # Load RGB frame
            img = Image.open(frame_path).convert("RGB")
            frames.append(self.transform(img))

            # Load action
            action_path = clip_dir / "actions" / f"{frame_id:06d}.json"
            with open(action_path) as f:
                action_data = json.load(f)
            actions.append(action_data.get("action_id", 0))

            # Load state
            state_path = clip_dir / "states" / f"{frame_id:06d}.json"
            with open(state_path) as f:
                state_data = json.load(f)
            states.append([
                state_data.get("hp", 1.0),
                state_data.get("stamina", 1.0),
                state_data.get("animation_id", 0),
            ])

        return {
            "frames": torch.stack(frames),                        # (T, C, H, W)
            "actions": torch.tensor(actions, dtype=torch.long),   # (T,)
            "states": torch.tensor(states, dtype=torch.float32),  # (T, S)
        }


# Usage
dataset = WildWorldDataset(
    root_dir="/path/to/WildWorld",
    sequence_length=16,
    image_size=(256, 256),
    split="train",
)
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=4)

for batch in loader:
    frames = batch["frames"]    # (B, T, C, H, W)
    actions = batch["actions"]  # (B, T)
    states = batch["states"]    # (B, T, S)
    print(f"Frames: {frames.shape}, Actions: {actions.shape}")
    break
```

### Filtering by Action Type

```python
import json
from pathlib import Path

# Illustrative action categories; the released label set may differ
ACTION_CATEGORIES = {
    "movement": ["walk", "run", "sprint", "dodge", "jump"],
    "attack": ["light_attack", "heavy_attack", "combo_finisher"],
    "skill": ["skill_cast_1", "skill_cast_2", "skill_cast_3", "skill_cast_4"],
    "defense": ["block", "parry", "guard"],
    "idle": ["idle", "idle_combat"],
}


def filter_clips_by_action(dataset_root: str, action_category: str) -> list:
    """Find all frames whose action label belongs to the given category."""
    root = Path(dataset_root)
    results = []
    target_actions = ACTION_CATEGORIES.get(action_category, [])

    # Clip directories live at data/sequences/<stage>/<clip>
    for clip_dir in sorted(root.glob("data/sequences/*/*")):
        if not clip_dir.is_dir():
            continue
        for action_file in sorted((clip_dir / "actions").glob("*.json")):
            with open(action_file) as f:
                data = json.load(f)
            if data.get("action_name") in target_actions:
                results.append({
                    "clip": str(clip_dir),
                    "frame_id": int(action_file.stem),
                    "action": data.get("action_name"),
                })
    return results


# Find all skill cast frames
skill_frames = filter_clips_by_action("/path/to/WildWorld", "skill")
print(f"Found {len(skill_frames)} skill cast frames")
```

---

## WildBench Evaluation

```python
# WildBench evaluates world models on next-frame prediction quality.
# Expected metrics: FVD, PSNR, SSIM, action accuracy
# (this sketch computes PSNR and SSIM only)
from pathlib import Path

import numpy as np
import torch


class WildBenchEvaluator:
    """Evaluator for world model predictions on WildBench."""

    def __init__(self, benchmark_dir: str):
        self.benchmark_dir = Path(benchmark_dir)
        self.metrics = {}

    def evaluate(self, model, dataloader):
        from torchmetrics.image import StructuralSimilarityIndexMeasure, PeakSignalNoiseRatio

        # data_range is left unset, so torchmetrics infers it from the data
        ssim = StructuralSimilarityIndexMeasure()
        psnr = PeakSignalNoiseRatio()

        all_psnr, all_ssim = [], []

        for batch in dataloader:
            frames = batch["frames"]    # (B, T, C, H, W)
            actions = batch["actions"]  # (B, T)
            states = batch["states"]    # (B, T, S)

            # Use first T-1 frames to predict the T-th frame
            context_frames = frames[:, :-1]
            context_actions = actions[:, :-1]
            target_frame = frames[:, -1]

            with torch.no_grad():
                predicted_frame = model(context_frames, context_actions, states[:, :-1])

            all_psnr.append(psnr(predicted_frame, target_frame).item())
            all_ssim.append(ssim(predicted_frame, target_frame).item())

        return {
            "PSNR": float(np.mean(all_psnr)),
            "SSIM": float(np.mean(all_ssim)),
        }
```

---

## Citation

```bibtex
@misc{li2026wildworldlargescaledatasetdynamic,
  title={WildWorld: A Large-Scale Dataset for Dynamic World Modeling with Actions and Explicit State toward Generative ARPG},
  author={Zhen Li and Zian Meng and Shuwei Shi and Wenshuo Peng and Yuwei Wu and Bo Zheng and Chuanhao Li and Kaipeng Zhang},
  year={2026},
  eprint={2603.23497},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2603.23497},
}
```

---

## Resources

- **Project Page**: https://shandaai.github.io/wildworld-project/
- **arXiv Paper**: https://arxiv.org/abs/2603.23497
- **YouTube Demo**: https://www.youtube.com/watch?v=9vcSg553r2g
- **GitHub**: https://github.com/ShandaAI/WildWorld

---

## Troubleshooting

| Issue | Solution |
|---|---|
| Dataset not yet available | Monitor the repo; dataset release is pending as of March 2026 |
| Frame loading OOM | Reduce `sequence_length` or `image_size` in the Dataset |
| Missing annotation files | Check that all subdirs (frames, depth, skeleton, states, camera, actions) are fully downloaded |
| Slow DataLoader | Increase `num_workers`, use SSD storage, or preprocess to HDF5 |
| Benchmark code not found | The `benchmark/wildbench` directory will be released separately; watch the repo |
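
The "Missing annotation files" check in the table above can be automated. Below is a minimal sketch, assuming the anticipated layout described in this document (six subdirectories per clip, one file per frame named by zero-padded frame id). The function name `find_incomplete_frames` and the extension map are hypothetical and not part of any released WildWorld tooling.

```python
from pathlib import Path

# Assumed per-frame file extension for each subdirectory; these mirror the
# anticipated loader in this document and may differ in the actual release.
EXPECTED_SUBDIRS = {
    "frames": ".png",
    "depth": ".npy",
    "skeleton": ".json",
    "states": ".json",
    "camera": ".json",
    "actions": ".json",
}


def find_incomplete_frames(clip_dir):
    """Return {subdir: [frame ids present in frames/ but missing in subdir]}."""
    clip_dir = Path(clip_dir)
    frames_dir = clip_dir / "frames"
    frame_ids = (
        {p.stem for p in frames_dir.glob("*.png")} if frames_dir.is_dir() else set()
    )

    missing = {}
    for subdir, ext in EXPECTED_SUBDIRS.items():
        sub = clip_dir / subdir
        # A subdirectory that does not exist counts as missing every frame
        present = {p.stem for p in sub.glob(f"*{ext}")} if sub.is_dir() else set()
        gaps = sorted(frame_ids - present)
        if gaps:
            missing[subdir] = gaps
    return missing
```

An empty result means every frame in `frames/` has a matching file in each annotation directory. Running this once per clip before training is cheaper than discovering a missing file partway through an epoch.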