Overview
Pusa introduces a paradigm shift in video diffusion modeling through frame-level noise control, assigning each frame its own timestep (giving the model thousands of timesteps rather than the conventional one thousand) and departing from approaches that share a single scalar timestep across the whole clip. This shift was first presented in our FVDM paper. Leveraging this architecture, Pusa seamlessly supports diverse video generation tasks (Text/Image/Video-to-Video) while maintaining strong motion fidelity and prompt adherence through our refined base model adaptations. Pusa-V0.5 is an early preview built on Mochi1-Preview. We are open-sourcing this work to foster community collaboration, improve the methodology, and expand its capabilities.
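The sketch below illustrates the core idea of frame-level noise control under simple assumptions; it is not the Pusa implementation, and all names (schedule, shapes, variables) are hypothetical. It contrasts noising all frames with one shared scalar timestep against drawing an independent timestep per frame.

```python
# Minimal sketch (assumed DDPM-style noising, toy shapes) -- not the actual Pusa code.
import torch

num_frames, channels, height, width = 16, 4, 32, 32
num_train_timesteps = 1000

# Toy linear beta schedule and cumulative alpha products.
betas = torch.linspace(1e-4, 2e-2, num_train_timesteps)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

latents = torch.randn(num_frames, channels, height, width)  # clean video latents
noise = torch.randn_like(latents)

# Conventional video diffusion: one scalar timestep shared by every frame.
t_scalar = torch.randint(0, num_train_timesteps, (1,))
a = alphas_cumprod[t_scalar].view(1, 1, 1, 1)
noisy_shared = a.sqrt() * latents + (1 - a).sqrt() * noise

# Frame-level noise control: an independent timestep per frame (a length-num_frames
# vector), so each frame can sit at a different point on the noise schedule.
t_per_frame = torch.randint(0, num_train_timesteps, (num_frames,))
a_f = alphas_cumprod[t_per_frame].view(num_frames, 1, 1, 1)
noisy_per_frame = a_f.sqrt() * latents + (1 - a_f).sqrt() * noise

print(t_per_frame)  # e.g. tensor([412, 7, 981, ...]) -- each frame at its own noise level
```

Because timesteps vary per frame, conditioning frames (e.g. a given first frame for image-to-video) can simply be kept at low noise while the rest are denoised, which is what makes a single model cover text-, image-, and video-to-video settings.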
Model: https://huggingface.co/RaphaelLiu/Pusa-V0.5
Code: https://github.com/Yaofang-Liu/Pusa-VidGen
Training Toolkit: https://github.com/Yaofang-Liu/Mochi-Full-Finetuner
Dataset: https://huggingface.co/datasets/RaphaelLiu/PusaV0.5_Training