# LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models
This repository is the official PyTorch implementation of [LaVie](https://arxiv.org/abs/2309.15103).
**LaVie** is a Text-to-Video (T2V) generation framework and the main part of the video generation system [Vchitect](http://vchitect.intern-ai.org.cn/). You can also check out our fine-tuned Image-to-Video (I2V) model [SEINE](https://github.com/Vchitect/SEINE).
[![arXiv](https://img.shields.io/badge/arXiv-2309.15103-b31b1b.svg)](https://arxiv.org/abs/2309.15103)
[![Project Page](https://img.shields.io/badge/Project-Website-green)](https://vchitect.github.io/LaVie-project/)
[![Replicate](https://replicate.com/cjwbw/lavie/badge)](https://replicate.com/cjwbw/lavie)
[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-yellow)](https://huggingface.co/spaces/Vchitect/LaVie)
[![Open in OpenXLab](https://cdn-static.openxlab.org.cn/app-center/openxlab_app.svg)](https://openxlab.org.cn/apps/detail/houshaowei/LaVie)
[![Hits](https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2FVchitect%2FLaVie%2F&count_bg=%23368ED7&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=Visitors&edge_flat=false)](https://hits.seeyoufarm.com)
## Installation
```
conda env create -f environment.yml
conda activate lavie
```
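To confirm the environment is ready, here is a quick sanity check (an illustrative snippet, not part of the repo; LaVie inference expects a CUDA-capable GPU):
```
import torch

# Verify that PyTorch imports and a CUDA device is visible.
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```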
## Download Pre-Trained Models
Download the pre-trained [LaVie models](https://huggingface.co/YaohuiW/LaVie/tree/main), [Stable Diffusion 1.4](https://huggingface.co/CompVis/stable-diffusion-v1-4/tree/main), and [stable-diffusion-x4-upscaler](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler/tree/main) to `./pretrained_models`. The directory should then look like this:
```
├── pretrained_models
│   ├── lavie_base.pt
│   ├── lavie_interpolation.pt
│   ├── lavie_vsr.pt
│   ├── stable-diffusion-v1-4
│   │   ├── ...
│   └── stable-diffusion-x4-upscaler
│       ├── ...
```
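Alternatively, the downloads can be scripted with `huggingface_hub` (an illustrative sketch; the repo IDs are taken from the links above, and `local_dir` mirrors the layout shown):
```
from huggingface_hub import hf_hub_download, snapshot_download

# Single-file LaVie checkpoints
for name in ["lavie_base.pt", "lavie_interpolation.pt", "lavie_vsr.pt"]:
    hf_hub_download(repo_id="YaohuiW/LaVie", filename=name,
                    local_dir="./pretrained_models")

# Full model repositories for Stable Diffusion 1.4 and the x4 upscaler
snapshot_download(repo_id="CompVis/stable-diffusion-v1-4",
                  local_dir="./pretrained_models/stable-diffusion-v1-4")
snapshot_download(repo_id="stabilityai/stable-diffusion-x4-upscaler",
                  local_dir="./pretrained_models/stable-diffusion-x4-upscaler")
```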
Gallery (example prompts; the corresponding videos are on the project page):

- two teddy bears playing poker under water, highly detailed, oil painting style
- a teddy bear skateboarding under water, highly detailed
- a cat reading a book on the table, Van Gogh style
- a cute raccoon playing guitar in the park at sunrise, oil painting style
- a teddy bear walking in the park at sunrise, oil painting style
- a teddy bear reading a book near a small river, oil painting style
- Elon Musk in a space suit standing besides a rocket, high quality
- a teddy bear in a suit having dinner in a well-decorated house
- Iron Man flying in the sky, 4k, high quality
Feel free to try different prompts, and share with us which one you like the most!
## Inference
Inference consists of three steps: **Base T2V**, **Video Interpolation**, and **Video Super-Resolution**. We provide several options for generating videos:

|Option|Step1|Step2|Step3|Resolution|Length (frames)|
|------|-----|-----|-----|----------|---------------|
|option1| ✔ | | | 320x512 | 16 |
|option2| ✔ | ✔ | | 320x512 | 61 |
|option3| ✔ | | ✔ | 1280x2048| 16 |
|option4| ✔ | ✔ | ✔ | 1280x2048| 61 |
Feel free to try different options :)
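The steps can also be chained in a single script; below is a minimal sketch for option 2, reusing the commands from Step 1 and Step 2 (run from the repository root):
```
import subprocess

# Option 2: base T2V generation followed by interpolation.
# Each call runs the command documented in the corresponding step.
subprocess.run(["python", "pipelines/sample.py", "--config", "configs/sample.yaml"],
               cwd="base", check=True)
subprocess.run(["python", "sample.py", "--config", "configs/sample.yaml"],
               cwd="interpolation", check=True)
```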
### Step1. Base T2V
Run the following command to generate videos with the base T2V model.
```
cd base
python pipelines/sample.py --config configs/sample.yaml
```
Inference arguments in **configs/sample.yaml**:
- **ckpt_path:** Path to the downloaded LaVie base model, default is `../pretrained_models/lavie_base.pt`
- **pretrained_models:** Path to the downloaded SD1.4, default is `../pretrained_models`
- **output_folder:** Path to save generated results, default is `../res/base`
- **seed:** Seed to be used, `None` for random generation
- **sample_method:** Scheduler to use, default is `ddpm`, options are `ddpm`, `ddim` and `eulerdiscrete`
- **guidance_scale:** CFG scale to use, default is `7.5`
- **num_sampling_steps:** Denoising steps, default is `50`
- **text_prompt:** Prompt for generation
The following results were generated with: seed `400`, sample_method `ddpm`, guidance_scale `7.0`, num_sampling_steps `50` (you may obtain different results on a different device).
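The device-to-device variation comes from non-deterministic GPU kernels; for reference, this is how seeding is typically pinned down in PyTorch (an illustrative sketch, not the repository's exact code):
```
import torch

# A fixed seed makes sampling repeatable on one machine, but kernel
# selection can still differ across GPUs, hence the caveat above.
torch.manual_seed(400)
torch.cuda.manual_seed_all(400)
torch.backends.cudnn.deterministic = True  # trade speed for determinism
torch.backends.cudnn.benchmark = False
```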
Example prompts:

- a Corgi walking in the park at sunrise, oil painting style
- a panda taking a selfie, 2k, high quality
- a polar bear playing drum kit in NYC Times Square, 4k, high resolution
- a shark swimming in clear Carribean ocean, 2k, high quality
- a teddy bear walking on the street, 2k, high quality
- jungle, river, at sunset, ultra quality
### Step2 (optional). Video Interpolation
Run the following command to perform video interpolation.
```
cd interpolation
python sample.py --config configs/sample.yaml
```
The default input video path is `./res/base`, and results are saved under `./res/interpolation`. In `configs/sample.yaml`, you can change the default `input_folder` to `YOUR_INPUT_FOLDER`. Input videos should be named `prompt1.mp4`, `prompt2.mp4`, ... and placed under `YOUR_INPUT_FOLDER`; running the script processes all videos in `input_folder`.
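For reference, a short script to check that an input folder follows the expected naming convention (illustrative; `input_folder` mirrors the config field of the same name):
```
import glob
import os

# List the .mp4 files the interpolation step would pick up.
input_folder = "./res/base"  # or YOUR_INPUT_FOLDER from configs/sample.yaml
videos = sorted(glob.glob(os.path.join(input_folder, "*.mp4")))
for path in videos:
    print("found input video:", path)
```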