# RefineAnything
**Multimodal Region-Specific Refinement for Perfect Local Details**
RefineAnything targets **region-specific image refinement**: given an input image and a user-specified region (e.g., scribble mask or bounding box), it restores fine-grained details—text, logos, thin structures—while keeping **all non-edited pixels unchanged**. It supports both **reference-based** and **reference-free** refinement.

---
## News
- **2026-04-21** — **Environment pinning update.** For best results (and to avoid color shifts), please use **exactly** the versions pinned in `requirement.txt`: `diffusers==0.36.0`, `transformers==4.55.0`, `safetensors==0.5.3`, `peft==0.17.0`. See [Environment Notice](#environment-notice) below for a visual comparison.
- **2026-04-21** — **Hugging Face Space environment fixed.** The online demo now runs on the correct dependency versions, so refinement results are noticeably better.
- **2026-04-14** — Community ComfyUI integration by [@smthemex](https://github.com/smthemex): [ComfyUI_RefineAnything](https://github.com/smthemex/ComfyUI_RefineAnything). Thanks for the great work!
- **2026-04-14** — Local Gradio demo (`app.py`) is available for interactive testing.
- **2026-04-12** — Hugging Face Space demo is live.
- **2026-04-09** — Checkpoint released on Hugging Face.
- **2026-04-09** — Release inference scripts.
- **2026-04-08** — Documentation skeleton added; **code release coming this month** (inference scripts, environment, and checkpoints will be linked here).
- **TBD** — Training and evaluation resources will be announced once finalized.
---
## Highlights
- **Region-accurate refinement** — Explicit region cues (scribbles or boxes) steer edits to the target area.
- **Reference-based and reference-free** — Optional reference image for guided local detail recovery.
- **Strict background preservation** — Edits stay inside the target region; training emphasizes seamless boundaries.
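
One way to check the background-preservation claim on your own outputs is to diff the input and the result outside the mask. The sketch below assumes Pillow is installed and that all three images share the same size; the helper name `changes_outside_mask` is hypothetical and not part of this repository.

```python
from PIL import Image, ImageChops


def changes_outside_mask(before, after, mask):
    """Return the bounding box of pixels that differ outside the mask, or None.

    Assumes `before`, `after`, and `mask` have identical dimensions and that
    any non-black mask pixel belongs to the edited region.
    """
    # Binarize the mask so soft edges count as "inside the edited region".
    region = mask.convert("L").point(lambda v: 255 if v > 0 else 0)
    # Per-channel absolute difference between input and output.
    diff = ImageChops.difference(before.convert("RGB"), after.convert("RGB"))
    # Black out the difference inside the edited region; keep it elsewhere.
    black = Image.new("RGB", diff.size)
    outside = Image.composite(black, diff, region)
    # getbbox() is None when every remaining pixel is zero.
    return outside.getbbox()
```

A return value of `None` means no pixel outside the mask changed; otherwise the box tells you where to look.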
---
## Comparisons


---
## Installation
```bash
pip install -r requirement.txt
```
> **Important — pin these versions exactly.** RefineAnything is sensitive to small numerical differences in the underlying libraries. Please install **exactly** the versions below; using newer or older releases can cause visible artifacts such as color shifts in the refined region.
>
> ```
> diffusers==0.36.0
> transformers==4.55.0
> safetensors==0.5.3
> peft==0.17.0
> ```
---
## Environment Notice
We have observed that mismatched versions of `diffusers` / `transformers` / `safetensors` / `peft` can introduce **color shifts** in the refined region, even when everything else is identical. The example below uses the prompt *"remove the hand"*:
*Figure: input (masked region = hand), output in the correct environment, and output in the wrong environment (color shift).*
If your output shows a mild color/tone mismatch inside the mask while the rest of the image looks fine, the first thing to check is your package versions.
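
A quick way to run that check is to compare the installed versions against the pins. This is a minimal sketch using only the standard library (`importlib.metadata`); the helper name `find_mismatches` is hypothetical, and the version numbers are copied from the pinned list in this README.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in this README.
PINNED = {
    "diffusers": "0.36.0",
    "transformers": "4.55.0",
    "safetensors": "0.5.3",
    "peft": "0.17.0",
}


def find_mismatches(pinned, get_version=version):
    """Map each mismatched package to (installed_or_None, expected)."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (installed, wanted)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINNED)
    for name, (installed, wanted) in problems.items():
        print(f"{name}: installed {installed or 'nothing'}, expected {wanted}")
    if not problems:
        print("All pinned packages match.")
```

Run it inside the same virtual environment you use for inference, since the check only sees packages visible to the current interpreter.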
---
## Quick Start
**Three** arguments are required to run RefineAnything; `--ref` is optional:
| Argument | Description |
|----------|-------------|
| `--input` | Source image |
| `--mask` | Binary mask (white = region to refine) |
| `--prompt` | What to refine |
| `--ref` | *(optional)* Reference image for guided refinement |
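
If you only have a bounding box, a mask in the format described above (white = region to refine) can be drawn with Pillow. This is a sketch under assumptions: the helper name `box_to_mask` is hypothetical, and making the mask the same size as the input image is assumed here, not documented by the script.

```python
from PIL import Image, ImageDraw


def box_to_mask(image_size, box):
    """Return a single-channel mask: 255 (white) inside `box`, 0 (black) elsewhere.

    `image_size` is (width, height); `box` is (left, top, right, bottom) in pixels.
    """
    mask = Image.new("L", image_size, 0)  # start fully black
    ImageDraw.Draw(mask).rectangle(box, fill=255)  # paint the box white
    return mask


# Example usage (paths and coordinates are placeholders):
#   size = Image.open("src/input1.png").size
#   box_to_mask(size, (120, 80, 360, 240)).save("my_mask.png")
```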
---
### Demo 1 — Reference-based Logo Refinement
Refine a blurry logo on a pillow using a reference image.
```bash
python scripts/fast_inference.py \
--input src/input1.png \
--mask src/mask1.png \
--prompt "Refine the LOGO." \
--ref src/ref1.png \
--output output/demo1.png
```
*Figure: input, reference, and output for the prompt "Refine the LOGO."*
---
### Demo 2 — Reference-free Text Refinement
Refine blurry Chinese text on a building sign — no reference image needed.
```bash
python scripts/fast_inference.py \
--input src/input2.png \
--mask src/mask2.png \
--prompt "refine the text '鼎好商城'" \
--output output/demo2.png
```
*Figure: input and output for the prompt "refine the text '鼎好商城'".*
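
The two demos differ only in whether `--ref` is passed, so they are easy to drive from Python when you have many images. The wrapper below is a sketch: `build_command` and `run_refinement` are hypothetical names, and it uses only the flags shown in the commands above.

```python
import subprocess
import sys


def build_command(input_path, mask_path, prompt, output_path, ref_path=None,
                  script="scripts/fast_inference.py"):
    """Build the argument list for one fast_inference.py call."""
    cmd = [sys.executable, script,
           "--input", input_path,
           "--mask", mask_path,
           "--prompt", prompt]
    if ref_path is not None:
        # Reference-based refinement: add the optional reference image.
        cmd += ["--ref", ref_path]
    cmd += ["--output", output_path]
    return cmd


def run_refinement(*args, **kwargs):
    """Run one refinement; raises CalledProcessError if the script fails."""
    subprocess.run(build_command(*args, **kwargs), check=True)
```

Passing the prompt as a list element rather than through a shell string means quotes and non-ASCII text such as `'鼎好商城'` need no extra escaping.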
---
## Local Gradio Demo
We also provide a Gradio-based web UI for interactive testing. You can brush regions, upload reference images, and adjust all inference parameters in the browser.
```bash
python app.py
```
Then open `http://localhost:7860` in your browser. The app will automatically download the base model (`Qwen/Qwen-Image-Edit-2511`) and the RefineAnything LoRA from Hugging Face on first launch.
You can specify a custom base model path via the `MODEL_DIR` environment variable:
```bash
MODEL_DIR=/path/to/local/Qwen-Image-Edit-2511 python app.py
```
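
For readers wiring the same convention into their own scripts, an override like this usually amounts to an environment lookup with a fallback. The sketch below is an assumption about the pattern, not a copy of what `app.py` does; `resolve_base_model` is a hypothetical name.

```python
import os

# Default base model named in this README.
DEFAULT_BASE_MODEL = "Qwen/Qwen-Image-Edit-2511"


def resolve_base_model(env=None):
    """Return MODEL_DIR if it is set and non-empty, else the default model ID."""
    env = os.environ if env is None else env
    return env.get("MODEL_DIR") or DEFAULT_BASE_MODEL
```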
**Features of the Gradio demo:**
- **Brush-to-select**: paint directly on the source image to define the refinement region.
- **Optional reference image**: upload a second image and optionally brush to crop a specific reference area.
- **Focus crop**: automatically crops and zooms into the edit region for higher detail fidelity, then composites back seamlessly.
- **Lightning LoRA**: one-click toggle for faster inference with fewer steps.
- **Before / After slider**: instantly compare input and output.
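
The focus-crop idea in the list above can be pictured as: take the tight box around the brushed region, pad it with some context, refine the crop, then paste it back. The helper below sketches only the box arithmetic; `expand_box` and its `margin` default are hypothetical and not taken from `app.py`.

```python
def expand_box(box, image_size, margin=0.25):
    """Pad a (left, top, right, bottom) box and clamp it to the image.

    Each side grows by `margin` times the box width or height, so the crop
    keeps some surrounding context for the model to blend against.
    """
    left, top, right, bottom = box
    width, height = image_size
    pad_x = int((right - left) * margin)
    pad_y = int((bottom - top) * margin)
    return (
        max(0, left - pad_x),
        max(0, top - pad_y),
        min(width, right + pad_x),
        min(height, bottom + pad_y),
    )
```

With Pillow, the tight box of a mask is available from `mask.getbbox()`, and the padded result can be passed to `image.crop(...)`.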
---
## Citation
If you use this repository, please cite:
```bibtex
@article{zhou2026refineanything,
title={RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details},
author={Zhou, Dewei and Li, You and Yang, Zongxin and Yang, Yi},
journal={arXiv preprint arXiv:2604.06870},
year={2026}
}
```
---
## Acknowledgements and License
RefineAnything builds on ideas and components from the broader diffusion and multimodal ecosystem (including **Qwen2.5-VL**, **Qwen-Image**, and latent diffusion with **VAE** + **MMDiT**). Base model weights and API terms are subject to their respective licenses—**verify compliance before redistributing checkpoints or derived weights**.
Repository **code license**: *TBD* (e.g., Apache-2.0 or MIT). A `LICENSE` file will be added once the license is chosen.