# Viewport Transform

Single-image viewport transformation with hole filling, based on [GEN3C](https://research.nvidia.com/labs/toronto-ai/GEN3C/).

This tool shifts the camera viewpoint of egocentric images to create augmented training data. It is used to generate viewpoint-diverse versions of human demonstration data for humanoid robot learning.

## Pipeline

1. **MoGe** - Monocular depth prediction
2. **Cache3D** - 3D point cloud warping to new viewpoint
3. **Stable Diffusion Inpainting** - Fill disoccluded regions (holes)

## Dependencies

```bash
pip install torch numpy opencv-python h5py pillow tqdm einops warp-lang diffusers transformers moge-model psutil
```

## Usage

### Single H5 file
```bash
python viewport_transform_batch_h5.py \
    --h5_file /path/to/input.h5 \
    --image_key "observation_image_left" \
    --trajectory "down" \
    --movement_distance 0.07 \
    --output_dir ./output
```

### Directory of H5 files (multi-GPU)
```bash
python viewport_transform_batch_h5.py \
    --h5_dir /path/to/h5_directory \
    --batch_size 32 \
    --trajectory "down" \
    --movement_distance 0.07 \
    --num_gpus 4 \
    --output_dir /path/to/output
```

### Parallel batch processing

For processing multiple batches in parallel across multiple GPUs, you can run separate processes with different GPU assignments:

```bash
# GPU 0,1,2,3 process batch_000
CUDA_VISIBLE_DEVICES=0,1,2,3 python viewport_transform_batch_h5.py \
    --h5_dir /path/to/data/batch_000 \
    --batch_size 32 \
    --trajectory "down" \
    --movement_distance 0.07 \
    --num_gpus 4 \
    --output_dir /path/to/output/batch_000 &

# GPU 4,5,6,7 process batch_001
CUDA_VISIBLE_DEVICES=4,5,6,7 python viewport_transform_batch_h5.py \
    --h5_dir /path/to/data/batch_001 \
    --batch_size 32 \
    --trajectory "down" \
    --movement_distance 0.07 \
    --num_gpus 4 \
    --output_dir /path/to/output/batch_001 &

# Wait for all tasks to complete
wait
```

### Key Arguments

| Argument | Description | Default |
|---|---|---|
| `--h5_file` / `--h5_dir` | Input H5 file or directory | - |
| `--image_key` | Key for image data in HDF5 | `observation_image_left` |
| `--trajectory` | Camera direction: `left`, `right`, `up`, `down`, `forward`, `backward` | `down` |
| `--movement_distance` | Camera movement distance | `0.1` |
| `--movement_distance_noise` | Random perturbation per sample | `0.02` |
| `--batch_size` | Frames per batch | `1` |
| `--num_gpus` | Number of GPUs | `1` |
| `--sd_model` | SD Inpainting model | `stabilityai/stable-diffusion-2-inpainting` |
| `--save_h5` | Save as H5 (replacing original images) | `false` |

## Acknowledgement

The 3D warping code (Cache3D, camera utilities, forward warping) is adapted from [NVIDIA Cosmos](https://github.com/NVIDIA/Cosmos) under the Apache 2.0 License.