# RefineAnything **Multimodal Region-Specific Refinement for Perfect Local Details**

RefineAnything targets **region-specific image refinement**: given an input image and a user-specified region (e.g., scribble mask or bounding box), it restores fine-grained details—text, logos, thin structures—while keeping **all non-edited pixels unchanged**. It supports both **reference-based** and **reference-free** refinement. ![Teaser](docs/static/teaser.png) --- ## News - **2026-04-21** — **Environment pinning update.** For best results (and to avoid color shifts), please use **exactly** the versions pinned in `requirement.txt`: `diffusers==0.36.0`, `transformers==4.55.0`, `safetensors==0.5.3`, `peft==0.17.0`. See [Environment Notice](#environment-notice) below for a visual comparison. - **2026-04-21** — **Hugging Face Space environment fixed.** The online demo now runs on the correct dependency versions, so refinement results are noticeably better: . - **2026-04-14** — Community ComfyUI integration by [@smthemex](https://github.com/smthemex): [ComfyUI_RefineAnything](https://github.com/smthemex/ComfyUI_RefineAnything). Thanks for the great work! - **2026-04-14** — Local Gradio demo (`app.py`) is available for interactive testing. - **2026-04-12** — Hugging Face Space demo is live: . - **2026-04-09** — Checkpoint released on Hugging Face: . - **2026-04-09** — Release inference scripts. - **2026-04-08** — Documentation skeleton added; **code release coming this month** (inference scripts, environment, and checkpoints will be linked here). - **TBD** — Checkpoints and training/evaluation resources will be announced once finalized. --- ## Highlights - **Region-accurate refinement** — Explicit region cues (scribbles or boxes) steer edits to the target area. - **Reference-based and reference-free** — Optional reference image for guided local detail recovery. - **Strict background preservation** — Edits stay inside the target region; training emphasizes seamless boundaries. --- ## Comparisons ![Reference-free qualitative comparisons](docs/static/qualitative_free.png) ![Reference-based qualitative comparisons](docs/static/qualitative_reference.png) --- ## Installation ```bash pip install -r requirement.txt ``` > **Important — pin these versions exactly.** RefineAnything is sensitive to small numerical differences in the underlying libraries. Please install **exactly** the versions below; using newer or older releases can cause visible artifacts such as color shifts in the refined region. > > ``` > diffusers==0.36.0 > transformers==4.55.0 > safetensors==0.5.3 > peft==0.17.0 > ``` --- ## Environment Notice We have observed that mismatched versions of `diffusers` / `transformers` / `safetensors` / `peft` can introduce **color shifts** in the refined region, even when everything else is identical. The example below uses the prompt *"remove the hand"*:

Input (masked region = hand)	Correct environment	Wrong environment (color shift)

If your output shows a mild color/tone mismatch inside the mask while the rest of the image looks fine, the first thing to check is your package versions. --- ## Quick Start Only **three** things are required to run RefineAnything: | Argument | Description | |----------|-------------| | `--input` | Source image | | `--mask` | Binary mask (white = region to refine) | | `--prompt` | What to refine | | `--ref` | *(optional)* Reference image for guided refinement | --- ### Demo 1 — Reference-based Logo Refinement Refine a blurry logo on a pillow using a reference image. ```bash python scripts/fast_inference.py \ --input src/input1.png \ --mask src/mask1.png \ --prompt "Refine the LOGO." \ --ref src/ref1.png \ --output output/demo1.png ```

Input	Reference	Prompt
		"Refine the LOGO."
Output

--- ### Demo 2 — Reference-free Text Refinement Refine blurry Chinese text on a building sign — no reference image needed. ```bash python scripts/fast_inference.py \ --input src/input2.png \ --mask src/mask2.png \ --prompt "refine the text '鼎好商城'" \ --output output/demo2.png ```

Input	Prompt
	"refine the text '鼎好商城'"
Output

--- ## Local Gradio Demo We also provide a Gradio-based web UI for interactive testing. You can brush regions, upload reference images, and adjust all inference parameters in the browser. ```bash python app.py ``` Then open `http://localhost:7860` in your browser. The app will automatically download the base model (`Qwen/Qwen-Image-Edit-2511`) and the RefineAnything LoRA from Hugging Face on first launch. You can specify a custom base model path via the `MODEL_DIR` environment variable: ```bash MODEL_DIR=/path/to/local/Qwen-Image-Edit-2511 python app.py ``` **Features of the Gradio demo:** - **Brush-to-select**: paint directly on the source image to define the refinement region. - **Optional reference image**: upload a second image and optionally brush to crop a specific reference area. - **Focus crop**: automatically crops and zooms into the edit region for higher detail fidelity, then composites back seamlessly. - **Lightning LoRA**: one-click toggle for faster inference with fewer steps. - **Before / After slider**: instantly compare input and output. --- ## Citation If you use this repository, please cite: ```bibtex @article{zhou2026refineanything, title={RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details}, author={Zhou, Dewei and Li, You and Yang, Zongxin and Yang, Yi}, journal={arXiv preprint arXiv:2604.06870}, year={2026} } ``` --- ## Acknowledgements and License RefineAnything builds on ideas and components from the broader diffusion and multimodal ecosystem (including **Qwen2.5-VL**, **Qwen-Image**, and latent diffusion with **VAE** + **MMDiT**). Base model weights and API terms are subject to their respective licenses—**verify compliance before redistributing checkpoints or derived weights**. Repository **code license**: *TBD* (e.g., Apache-2.0 or MIT)—set `LICENSE` when you open-source the implementation.