## 🔥 1. Links to all Sana `.pth` checkpoints and diffusers safetensors

### [SANA](https://arxiv.org/abs/2410.10629)

| Model | Reso | pth link | diffusers | Precision | Description |
|----------------------|--------|----------|-----------|---------------|----------------|
| Sana-0.6B | 512px | [Sana_600M_512px](https://huggingface.co/Efficient-Large-Model/Sana_600M_512px) | [Efficient-Large-Model/Sana_600M_512px_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_600M_512px_diffusers) | fp16/fp32 | Multi-Language |
| Sana-0.6B | 1024px | [Sana_600M_1024px](https://huggingface.co/Efficient-Large-Model/Sana_600M_1024px) | [Efficient-Large-Model/Sana_600M_1024px_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_600M_1024px_diffusers) | fp16/fp32 | Multi-Language |
| Sana-1.6B | 512px | [Sana_1600M_512px](https://huggingface.co/Efficient-Large-Model/Sana_1600M_512px) | [Efficient-Large-Model/Sana_1600M_512px_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_1600M_512px_diffusers) | fp16/fp32 | - |
| Sana-1.6B | 512px | [Sana_1600M_512px_MultiLing](https://huggingface.co/Efficient-Large-Model/Sana_1600M_512px_MultiLing) | [Efficient-Large-Model/Sana_1600M_512px_MultiLing_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_1600M_512px_MultiLing_diffusers) | fp16/fp32 | Multi-Language |
| Sana-1.6B | 1024px | [Sana_1600M_1024px](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px) | [Efficient-Large-Model/Sana_1600M_1024px_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_diffusers) | fp16/fp32 | - |
| Sana-1.6B | 1024px | [Sana_1600M_1024px_MultiLing](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_MultiLing) | [Efficient-Large-Model/Sana_1600M_1024px_MultiLing_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_MultiLing_diffusers) | fp16/fp32 | Multi-Language |
| Sana-1.6B | 1024px | [Sana_1600M_1024px_BF16](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_BF16) | [Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers) | **bf16**/fp32 | Multi-Language |
| Sana-1.6B-int4 | 1024px | - | [mit-han-lab/svdq-int4-sana-1600m](https://huggingface.co/mit-han-lab/svdq-int4-sana-1600m) | **int4** | Multi-Language |
| Sana-1.6B | 2Kpx | [Sana_1600M_2Kpx_BF16](https://huggingface.co/Efficient-Large-Model/Sana_1600M_2Kpx_BF16) | [Efficient-Large-Model/Sana_1600M_2Kpx_BF16_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_1600M_2Kpx_BF16_diffusers) | **bf16**/fp32 | Multi-Language |
| Sana-1.6B | 4Kpx | [Sana_1600M_4Kpx_BF16](https://huggingface.co/Efficient-Large-Model/Sana_1600M_4Kpx_BF16) | [Efficient-Large-Model/Sana_1600M_4Kpx_BF16_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_1600M_4Kpx_BF16_diffusers) | **bf16**/fp32 | Multi-Language |
| ControlNet | | | | | |
| Sana-1.6B-ControlNet | 1Kpx | [Sana_1600M_1024px_BF16_ControlNet_HED](https://huggingface.co/Efficient-Large-Model/Sana_1600M_1024px_BF16_ControlNet_HED) | Coming soon | **bf16**/fp32 | Multi-Language |
| Sana-0.6B-ControlNet | 1Kpx | [Sana_600M_1024px_ControlNet_HED](https://huggingface.co/Efficient-Large-Model/Sana_600M_1024px_ControlNet_HED) | Coming soon | fp16/fp32 | - |

______________________________________________________________________

### [SANA-1.5](https://arxiv.org/abs/2501.18427)

| Model | Reso | pth link | diffusers | Precision | Description |
|--------------|--------|----------|-----------|-----------|----------------|
| SANA1.5-4.8B | 1024px | [SANA1.5_4.8B_1024px](https://huggingface.co/Efficient-Large-Model/SANA1.5_4.8B_1024px) | [Efficient-Large-Model/SANA1.5_4.8B_1024px_diffusers](https://huggingface.co/Efficient-Large-Model/SANA1.5_4.8B_1024px_diffusers) | bf16 | Multi-Language |
| SANA1.5-1.6B | 1024px | [SANA1.5_1.6B_1024px](https://huggingface.co/Efficient-Large-Model/SANA1.5_1.6B_1024px) | [Efficient-Large-Model/SANA1.5_1.6B_1024px_diffusers](https://huggingface.co/Efficient-Large-Model/SANA1.5_1.6B_1024px_diffusers) | bf16 | Multi-Language |

______________________________________________________________________

### [SANA-Sprint](https://arxiv.org/pdf/2503.09641)

| Model | Reso | pth link | diffusers | Precision | Description |
|------------------|--------|----------|-----------|-----------|----------------|
| Sana-Sprint-0.6B | 1024px | [Sana_Sprint_0.6B_1024px](https://huggingface.co/Efficient-Large-Model/Sana_Sprint_0.6B_1024px) | [Efficient-Large-Model/Sana_Sprint_0.6B_1024px_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_Sprint_0.6B_1024px_diffusers) | bf16 | Multi-Language |
| Sana-Sprint-1.6B | 1024px | [Sana_Sprint_1.6B_1024px](https://huggingface.co/Efficient-Large-Model/Sana_Sprint_1.6B_1024px) | [Efficient-Large-Model/Sana_Sprint_1.6B_1024px_diffusers](https://huggingface.co/Efficient-Large-Model/Sana_Sprint_1.6B_1024px_diffusers) | bf16 | Multi-Language |

______________________________________________________________________

### [SANA-Video](https://arxiv.org/pdf/2509.24695)

| Model | Reso | pth link | diffusers | Precision | Description |
|--------------------------------|------|----------|-----------|-----------|----------------|
| Sana-Video-2B | 480p | [Sana-Video_2B_480p](https://huggingface.co/Efficient-Large-Model/Sana-Video_2B_480p) | [Efficient-Large-Model/Sana-Video_2B_480p_diffusers](https://huggingface.co/Efficient-Large-Model/Sana-Video_2B_480p_diffusers) | bf16 | 5s pre-trained model |
| Sana-Video-2B | 720p | [Sana-Video_2B_720p](https://huggingface.co/Efficient-Large-Model/Sana-Video_2B_720p) | [Efficient-Large-Model/SANA-Video_2B_720p_diffusers](https://huggingface.co/Efficient-Large-Model/SANA-Video_2B_720p_diffusers) | bf16 | 5s 720p model (LTX2 VAE) |
| LongSANA-Video-2B | 480p | [SANA-Video_2B_480p_LongLive](https://huggingface.co/Efficient-Large-Model/SANA-Video_2B_480p_LongLive) | [Efficient-Large-Model/SANA-Video_2B_480p_LongLive_diffusers](https://huggingface.co/Efficient-Large-Model/SANA-Video_2B_480p_LongLive_diffusers) | bf16 | 27 FPS minute-length model |
| LongSANA-Video-2B-ODE-Init | 480p | [LongSANA_2B_480p_ode](https://huggingface.co/Efficient-Large-Model/LongSANA_2B_480p_ode) | - | bf16 | LongSANA first-step model, initialized from ODE trajectories |
| LongSANA-Video-2B-Self-Forcing | 480p | [LongSANA_2B_480p_self_forcing](https://huggingface.co/Efficient-Large-Model/LongSANA_2B_480p_self_forcing) | - | bf16 | LongSANA second-step model, trained with Self-Forcing |

______________________________________________________________________

## ❗ 2. Make sure to use the correct precision (fp16/bf16/fp32) for training and inference

### We provide two samples that use fp16 and bf16 weights, respectively
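The Precision column in the tables above determines which `variant` and `torch_dtype` to pass. As a minimal sketch, assuming only the naming patterns used in the samples of this section, a hypothetical helper `pick_precision` (not part of Sana or diffusers) maps a diffusers repo id to those two arguments. For the bf16-only families (SANA-1.5, SANA-Sprint, SANA-Video) this page does not say whether variant files exist, so the sketch leaves `variant` unset for them; check the repo files before passing one.

```python
from typing import Optional, Tuple


def pick_precision(repo_id: str) -> Tuple[Optional[str], str]:
    """Hypothetical helper: return (variant, torch dtype name) for a diffusers repo id.

    Only encodes the naming patterns shown in the tables and samples on this page.
    """
    name = repo_id.split("/")[-1]
    if "BF16" in name:
        # e.g. Sana_1600M_1024px_BF16_diffusers, Sana_1600M_4Kpx_BF16_diffusers
        return "bf16", "bfloat16"
    if name.startswith(("Sana_600M_", "Sana_1600M_")):
        # fp16/fp32 rows of the SANA table, loaded with variant="fp16"
        return "fp16", "float16"
    # Assumption: bf16-only families (SANA-1.5 / Sprint / Video) without a variant file
    return None, "bfloat16"


if __name__ == "__main__":
    for repo in (
        "Efficient-Large-Model/Sana_1600M_1024px_diffusers",
        "Efficient-Large-Model/Sana_1600M_4Kpx_BF16_diffusers",
        "Efficient-Large-Model/SANA1.5_4.8B_1024px_diffusers",
    ):
        print(repo, pick_precision(repo))
```

With torch installed, the dtype name converts via `getattr(torch, dtype_name)`, so the result can be passed as `variant=variant, torch_dtype=getattr(torch, dtype_name)` to `from_pretrained`.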
❗️Make sure to set `variant` and `torch_dtype` in diffusers pipelines to the desired precision.

#### 1) For fp16 models

```python
# run `pip install git+https://github.com/huggingface/diffusers` before using Sana in diffusers
import torch
from diffusers import SanaPipeline

pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",
    variant="fp16",
    torch_dtype=torch.float16,
)
pipe.to("cuda")

# Cast the VAE and text encoder to bf16 (the transformer stays in fp16)
pipe.vae.to(torch.bfloat16)
pipe.text_encoder.to(torch.bfloat16)

prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    guidance_scale=5.0,
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]

image.save("sana.png")
```

#### 2) For bf16 models

```python
# run `pip install git+https://github.com/huggingface/diffusers` before using Sana in diffusers
import torch
from diffusers import SanaPAGPipeline

pipe = SanaPAGPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers",
    variant="bf16",
    torch_dtype=torch.bfloat16,
    pag_applied_layers="transformer_blocks.8",  # apply perturbed-attention guidance to block 8
)
pipe.to("cuda")

pipe.text_encoder.to(torch.bfloat16)
pipe.vae.to(torch.bfloat16)

prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
image = pipe(
    prompt=prompt,
    guidance_scale=5.0,
    pag_scale=2.0,
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]

image.save("sana.png")
```

## ❗ 3. 2K & 4K models

4K models need VAE tiling to avoid out-of-memory (OOM) errors (a 16 GB GPU is recommended).

```python
# run `pip install git+https://github.com/huggingface/diffusers` before using Sana in diffusers
import torch
from diffusers import SanaPipeline

# 2K model: Efficient-Large-Model/Sana_1600M_2Kpx_BF16_diffusers
# 4K model: Efficient-Large-Model/Sana_1600M_4Kpx_BF16_diffusers
pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_4Kpx_BF16_diffusers",
    variant="bf16",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

pipe.vae.to(torch.bfloat16)
pipe.text_encoder.to(torch.bfloat16)

# The 4K model has sample_size 128: enable VAE tiling to avoid OOM when decoding
# 4096x4096 images. Feel free to adjust the tile size.
if pipe.transformer.config.sample_size == 128:
    pipe.vae.enable_tiling(
        tile_sample_min_height=1024,
        tile_sample_min_width=1024,
        tile_sample_stride_height=896,
        tile_sample_stride_width=896,
    )

prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
image = pipe(
    prompt=prompt,
    height=4096,
    width=4096,
    guidance_scale=5.0,
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]

image.save("sana_4K.png")
```

## ❗ 4. int4 inference

This int4 model is quantized with [SVDQuant-Nunchaku](https://github.com/mit-han-lab/nunchaku). First follow the [installation guide](https://github.com/mit-han-lab/nunchaku?tab=readme-ov-file#installation) for the Nunchaku engine; then you can use the following code snippet to run inference with the int4 Sana model.

Here we show the snippet for `SanaPipeline`. For `SanaPAGPipeline`, please refer to the [SanaPAGPipeline example](https://github.com/mit-han-lab/nunchaku/blob/main/examples/sana_1600m_pag.py).
```python
import torch
from diffusers import SanaPipeline

from nunchaku.models.transformer_sana import NunchakuSanaTransformer2DModel

# Load the SVDQuant int4 transformer and plug it into the bf16 Sana pipeline
transformer = NunchakuSanaTransformer2DModel.from_pretrained("mit-han-lab/svdq-int4-sana-1600m")
pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers",
    transformer=transformer,
    variant="bf16",
    torch_dtype=torch.bfloat16,
).to("cuda")

pipe.text_encoder.to(torch.bfloat16)
pipe.vae.to(torch.bfloat16)

image = pipe(
    prompt="A cute 🐼 eating 🎋, ink drawing style",
    height=1024,
    width=1024,
    guidance_scale=4.5,
    num_inference_steps=20,
    generator=torch.Generator().manual_seed(42),
).images[0]
image.save("sana_1600m.png")
```

## 🔧 5. Convert `.pth` to diffusers `.safetensors`

```bash
python tools/convert_scripts/convert_sana_to_diffusers.py \
    --orig_ckpt_path Efficient-Large-Model/Sana_1600M_1024px_BF16/checkpoints/Sana_1600M_1024px_BF16.pth \
    --model_type SanaMS_1600M_P1_D20 \
    --dtype bf16 \
    --dump_path output/Sana_1600M_1024px_BF16_diffusers \
    --save_full_pipeline
```
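To convert several checkpoints in one go, the conversion command can be assembled programmatically. The sketch below is a hypothetical wrapper (`build_convert_cmd` is not part of the Sana repo) that uses only the flags shown in the command above and assumes it is run from the repository root. The `--model_type` for checkpoints other than `SanaMS_1600M_P1_D20` is not listed here and should be taken from the conversion script rather than guessed.

```python
import shlex
import subprocess
import sys
from typing import List

# Path of the conversion script, relative to the Sana repository root (as in the command above)
SCRIPT = "tools/convert_scripts/convert_sana_to_diffusers.py"


def build_convert_cmd(
    ckpt_path: str,
    model_type: str,
    dtype: str,
    dump_path: str,
    save_full_pipeline: bool = True,
) -> List[str]:
    """Hypothetical helper: build the argv for convert_sana_to_diffusers.py."""
    cmd = [
        sys.executable,  # the current Python interpreter instead of a bare "python"
        SCRIPT,
        "--orig_ckpt_path", ckpt_path,
        "--model_type", model_type,
        "--dtype", dtype,
        "--dump_path", dump_path,
    ]
    if save_full_pipeline:
        cmd.append("--save_full_pipeline")
    return cmd


if __name__ == "__main__":
    cmd = build_convert_cmd(
        "Efficient-Large-Model/Sana_1600M_1024px_BF16/checkpoints/Sana_1600M_1024px_BF16.pth",
        model_type="SanaMS_1600M_P1_D20",
        dtype="bf16",
        dump_path="output/Sana_1600M_1024px_BF16_diffusers",
    )
    print(shlex.join(cmd))  # preview the exact command line
    if "--run" in sys.argv:  # only launch the conversion when explicitly requested
        subprocess.run(cmd, check=True)
```

Keeping the command as an argument list (rather than a shell string) avoids quoting issues with paths, and `shlex.join` gives a copy-pasteable preview before anything runs.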