[Looking for community contribution] support Wan 2.2 S2V: an audio-driven cinematic video generation model · huggingface/diffusers#12257

(4 comments) (1 reaction) (0 assignees) Python (4,562 forks)

Good second issue, contributions-welcome, help wanted

Repository metrics

Stars: (22,190 stars)
PR merge metrics: (Avg merge 14d 10h) (101 merged PRs in 30d)

Description

We're super excited about the Wan 2.2 S2V (Speech-to-Video) model and want to get it integrated into Diffusers! This would be an amazing addition, and we're looking for experienced community contributors to help make this happen.

Project Page: https://humanaigc.github.io/wan-s2v-webpage/
Source Code: https://github.com/Wan-Video/Wan2.2#run-speech-to-video-generation
Model Weights: https://huggingface.co/Wan-AI/Wan2.2-S2V-14B
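For contributors who want to inspect the original checkpoint before porting it, here is a minimal sketch of fetching the weights listed above. It assumes the `huggingface_hub` package is installed; the function name `download_s2v_weights` and the default local directory are hypothetical and chosen only for illustration.

```python
from pathlib import Path

# Repository id taken from the Model Weights link above.
REPO_ID = "Wan-AI/Wan2.2-S2V-14B"


def download_s2v_weights(local_dir: str = "./Wan2.2-S2V-14B") -> Path:
    """Download the checkpoint files from the Hugging Face Hub into local_dir.

    Hypothetical helper for illustration; not part of Diffusers or Wan2.2.
    """
    # Imported lazily so this module can be imported without huggingface_hub.
    from huggingface_hub import snapshot_download

    # snapshot_download returns the path of the folder holding the files.
    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)
```

Calling `download_s2v_weights()` triggers a large download, so it is worth checking available disk space first. The source code link above remains the reference for how the original repository loads and runs these files.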

This is a priority for us, so we will try to review quickly and actively collaborate with you throughout the process :)

Contributor guide

Research direction: Study the Wan2.2 S2V model architecture and codebase, then implement a pipeline in Diffusers following existing patterns for video generation models.
Tech stack: python, pytorch
Domain: machine learning, ai
Issue type: Feature
Difficulty: 4
Estimated time: 3-5 days
Activity status: Active
Clarity: Mostly clear
Prerequisites: Python, PyTorch
Newbie friendliness: 30
