# Docker Image Releases for Intel Arc B60 Pro GPU

## LLM-Scaler-vLLM

### Latest Beta Release

* `intel/llm-scaler-vllm:0.14.0-b8` [03/02/2026]:
  - Upgrades: vLLM to 0.14.0 and PyTorch to 2.10; oneAPI uplifted to 2025.3.2 (hotfix) with LTS support on UR adapter v2; oneCCL upgraded to 2021.15.7.8.
  - INT4 oneDNN optimizations included, delivering up to 25% higher throughput vs. the previous release.
  - Bug fixes.
  - Added support for Qwen3-VL-Reranker-2B/8B.
  - Added support for Qwen3-VL-Embedding-2B/8B.
  - Added support for GLM-4.7-Flash.
  - Added support for Ministral models.
  - Added support for DeepSeek-OCR-2.
  - Added support for Qwen3-Coder-Next.
  - Fixed an InternVL issue.
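For orientation, a minimal sketch of pulling and running the latest image is shown below. The mount path, shared-memory size, and model name are illustrative assumptions rather than values from these notes; `--tensor-parallel-size` and `--async-scheduling` are standard vLLM options (async scheduling is noted in the 0.10.0-b1 entry below).

```bash
# Pull the latest beta image (tag as listed above).
docker pull intel/llm-scaler-vllm:0.14.0-b8

# Launch a container with GPU access. The mount point /llm/models and the
# shm size are assumptions; adjust both for your host.
docker run -td \
  --net=host \
  --device=/dev/dri \
  --shm-size=16g \
  -v /path/to/models:/llm/models \
  --name=llm-scaler-vllm \
  intel/llm-scaler-vllm:0.14.0-b8

# Inside the container, a model can then be served with vLLM's standard CLI,
# e.g. across four GPUs (the model path is a placeholder):
vllm serve /llm/models/Qwen3-30B-A3B \
  --tensor-parallel-size 4 \
  --async-scheduling
```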
### Previous Releases

* `intel/llm-scaler-vllm:1.3` [01/30/2026]:
  - Same image as `intel/llm-scaler-vllm:0.11.1-b7`.
* `intel/llm-scaler-vllm:0.11.1-b7` [01/16/2026]:
  - Upgrades: vLLM to 0.11.1 and PyTorch to 2.9; oneAPI upgraded to 2025.2.2 (hotfix); oneCCL upgraded to 2021.15.7.6.
  - Eight new models supported: Qwen3-Next-80B-A3B-Instruct, Qwen3-Next-80B-A3B-Thinking, InternVL3.5-30B-A3B, DeepSeek-OCR, PaddleOCR-VL, Seed-OSS-36B-Instruct, Qwen3-30B-A3B-Instruct-2507, and openai/whisper-large-v3.
  - Key bug fixes for timeout/accuracy issues found in long-duration stress runs.
  - Key bug fixes for a communication accuracy issue in long-run scenarios and a sub-communicator hang on the oneCCL side.
  - vLLM 0.11.1 new features: CPU KV-cache offload; speculative decoding with two more methods (Medusa, suffix); experimental FP8 KV cache; expert parallelism supported in TP+EP and DP+EP scenarios.
  - Bug fixes.
  - Supported sym_int4 for Qwen3-30B-A3B on TP 4/8.
  - Supported sym_int4 for Qwen3-235B-A22B on TP 16.
  - Added support for the PaddleOCR model.
  - Added support for GLM-4.6v-Flash.
  - Fixed crash errors with the 2DP + 4TP configuration.
  - Fixed abnormal output observed during JMeter stress testing.
  - Fixed UR_ERROR_DEVICE_LOST errors triggered by frequent preemption under high load.
  - Fixed output errors for InternVL-38B.
  - Refined profile_run logic to provide more GPU blocks by default.
* `intel/llm-scaler-vllm:1.2` [12/11/2025]:
  - Same image as `intel/llm-scaler-vllm:0.10.2-b6`.
* `intel/llm-scaler-vllm:0.10.2-b6` [11/25/2025]:
  - MoE INT4 support for Qwen3-30B-A3B
  - bpe-qwen tokenizer support
  - Enable Qwen3-VL dense/MoE models
  - Enable Qwen3-Omni models
  - MinerU 2.5 support
  - Enable Whisper transcription models
  - Fix MiniCPM-V-4_5 OOM issue and output error
  - Enable ERNIE-4.5-VL models
  - Enable Glyph-based GLM-4.1V-9B-Base
* `intel/llm-scaler-vllm:0.10.2-b5` [11/04/2025]:
  - Support gpt-oss models
* `intel/llm-scaler-vllm:1.1-preview` [09/29/2025]:
  - Same image as `intel/llm-scaler-vllm:0.10.0-b2`.
* `intel/llm-scaler-vllm:0.10.0-b3` [09/23/2025]:
  - Support the Seed-OSS model
  - Support MinerU
  - Support MiniCPM-V-4_5
  - Fix internvl_3_5 and deepseek-v2-lite errors
* `intel/llm-scaler-vllm:0.10.0-b2` [09/02/2025]:
  - Bug fix for sym_int4 online quantization on multi-modal models
* `intel/llm-scaler-vllm:0.10.0-b1` [08/29/2025]:
  - Upgrade vLLM to 0.10.0
  - Support async scheduling via the `--async-scheduling` option
  - Switch to the V1 engine for embedding/reranker models
  - Support pipeline parallelism with the mp/ray backends
  - Support InternVL3-8B
  - Support MiniCPM-V-4
  - Support InternVL3_5-8B
* `intel/llm-scaler-vllm:0.9.0-b3` [08/21/2025]:
  - Support Whisper
  - Support GLM-4.5-Air
  - Support dots.ocr
  - Support GLM-4.1V-9B-Thinking for image input
  - Optimize vLLM memory usage by updating profile_run logic
  - Enable/optimize pipeline parallelism with the Ray backend
* `intel/llm-scaler-vllm:1.0` [08/10/2025]:
  - Same image as `intel/llm-scaler-vllm:0.2.0-b2`.
* `intel/llm-scaler-vllm:0.2.0-b2` [07/25/2025]:
  - Support by-layer online quantization to reduce required GPU memory
  - Support embedding and rerank models
  - Enhance support for multi-modal models
  - Auto-detect maximum model length
  - Support data parallelism
  - Support pipeline parallelism (experimental)
  - Support torch.compile (experimental)
  - Support speculative decoding (experimental)
  - Performance improvements
  - Bug fixes

## LLM-Scaler-Omni

### Latest Beta Release

* `intel/llm-scaler-omni:0.1.0-b5` [01/16/2026]:
  - Core upgrades:
    - Upgraded to Python 3.12 and PyTorch 2.9 for improved performance and compatibility.
  - ComfyUI:
    - Fixed a stochastic rounding issue on XPU, resolving the LoRA black-screen output problem. LoRA workflows are now supported (e.g., Z-Image-Turbo, Qwen-Image, Qwen-Image-Edit).
    - Added support for new models and workflows: Qwen-Image-Layered, Qwen-Image-Edit-2511, Qwen-Image-2512, HY-Motion, and more.
    - Added support for ComfyUI-GGUF, enabling GGUF models (e.g., FLUX.2-dev Q4_0) with reduced VRAM usage.
    - Fixed an image format issue in the Hunyuan3D-2.1 workflow.
    - Refined documentation for improved clarity.
    - Added LTX2 support on XPU.
    - Updated the Windows environment setup script.
  - SGLang Diffusion:
    - Added support for CacheDiT.
    - Added tensor parallelism (TP) support for selected models with better performance (e.g., Z-Image-Turbo).
    - Added SGLD ComfyUI custom-node support, allowing SGLang Diffusion to serve as a backend for ComfyUI image generation workflows.
  - Standalone examples:
    - Added support for HY-WorldPlay.
    - Added audio models to the standalone examples.
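The Omni image can be launched the same way; a rough sketch follows. The entrypoint and exposed services (ComfyUI, Xinference, SGLang Diffusion) depend on the image itself, so treat the port as an assumption: ComfyUI conventionally listens on 8188.

```bash
# Hypothetical launch of the Omni image; the mount path and shm size are
# assumptions, as in the vLLM example above.
docker run -td \
  --net=host \
  --device=/dev/dri \
  --shm-size=16g \
  -v /path/to/models:/llm/models \
  --name=llm-scaler-omni \
  intel/llm-scaler-omni:0.1.0-b5

# With --net=host, ComfyUI's web UI would then typically be reachable at
# http://localhost:8188 once started inside the container.
```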
### Previous Releases

* `intel/llm-scaler-omni:0.1.0-b4` [12/10/2025]:
  - More workflow support:
    - Z-Image-Turbo
    - Hunyuan-Video-1.5 T2V/I2V with multi-XPU support
  - Initial support for SGLang Diffusion, with a ~10% performance improvement over ComfyUI in the single-B60 scenario.
* `intel/llm-scaler-omni:0.1.0-b3` [11/19/2025]:
  - More workflow support:
    - Hunyuan3D-2.1
    - ControlNet on Stable Diffusion 3.5 and FLUX.1
    - Multi-XPU support for Wan 2.2 I2V 14B rapid AIO
    - AnimateDiff-Lightning
  - Added Windows installation support.
* `intel/llm-scaler-omni:0.1.0-b2` [10/17/2025]:
  - Fixes:
    - Fixed a ComfyUI interpolate issue.
    - Fixed a Xinference XPU index selection issue.
  - More workflow support:
    - ComfyUI:
      - Wan2.2-Animate-14B basic workflow
      - Qwen-Image-Edit-2509 workflow
      - VoxCPM workflow
    - Xinference:
      - Kokoro-82M-v1.1-zh
* `intel/llm-scaler-omni:0.1.0-b1` [09/29/2025]:
  - Integrated ComfyUI on XPU and provided sample workflows for:
    - Wan2.2 TI2V 5B
    - Wan2.2 T2V 14B (multi-XPU supported)
    - FLUX.1 dev
    - FLUX.1 Kontext dev
    - Stable Diffusion 3.5 Large
    - Qwen-Image and Qwen-Image-Edit
  - Added support for xDiT, YunChang, and Raylight usage on XPU.
  - Integrated Xinference with OpenAI-compatible APIs for:
    - Kokoro-82M
    - Whisper Large v3
    - Stable Diffusion 3.5 Medium
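Since Xinference is integrated with OpenAI-compatible APIs, a transcription request against the Whisper model could look like the sketch below; the port (Xinference's conventional default, 9997), endpoint path, and model name are assumptions for illustration.

```bash
# Hypothetical request against the OpenAI-compatible transcription endpoint;
# host, port, model name, and audio file are placeholders.
curl http://localhost:9997/v1/audio/transcriptions \
  -F model=whisper-large-v3 \
  -F file=@sample.wav
```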