Phenaki is an AI model that generates videos, potentially several minutes long, directly from text prompts. It can also generate video from a still image combined with a prompt. Its video encoder-decoder outperforms all per-frame baselines currently used in the literature, both in spatio-temporal quality and in the number of tokens needed per video. To generate video tokens from text, Phenaki uses a bidirectional masked transformer conditioned on pre-computed text tokens. The generated video tokens are then de-tokenized to produce the actual video.
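The text-to-token step can be pictured with a small toy sketch. It assumes the masked transformer is sampled by iterative parallel decoding, where every video token starts masked and the most confident predictions are filled in over a few steps; the description above does not confirm this. The names `toy_predict` and `generate_video_tokens`, the cosine schedule, and all sizes are hypothetical, and the "model" is a random stand-in rather than Phenaki's network.

```python
import math
import random

MASK = -1  # sentinel for a video token that has not been generated yet


def toy_predict(tokens, text_tokens, vocab_size, rng):
    """Hypothetical stand-in for the transformer.

    Returns {position: (token, confidence)} for every masked position.
    A real model would attend to the text tokens and to all video positions.
    """
    proposals = {}
    for pos, tok in enumerate(tokens):
        if tok == MASK:
            token = (sum(text_tokens) + pos + rng.randrange(vocab_size)) % vocab_size
            proposals[pos] = (token, rng.random())
    return proposals


def generate_video_tokens(text_tokens, num_tokens=16, vocab_size=8, steps=4, seed=0):
    """Fill a fully masked token sequence over a fixed number of steps."""
    rng = random.Random(seed)
    tokens = [MASK] * num_tokens
    for step in range(1, steps + 1):
        proposals = toy_predict(tokens, text_tokens, vocab_size, rng)
        if not proposals:
            break
        # Assumed cosine schedule: how many tokens stay masked after this step.
        keep_masked = int(num_tokens * math.cos(math.pi / 2 * step / steps))
        num_to_fill = max(1, len(proposals) - keep_masked)
        # Commit the most confident proposals; the rest stay masked for now.
        ranked = sorted(proposals.items(), key=lambda kv: kv[1][1], reverse=True)
        for pos, (token, _confidence) in ranked[:num_to_fill]:
            tokens[pos] = token
    return tokens


if __name__ == "__main__":
    video_tokens = generate_video_tokens(text_tokens=[3, 1, 4])
    # In a full pipeline these tokens would go to the video decoder (de-tokenizer).
    print(video_tokens)
```

The point of the sketch is the control flow: all positions are predicted in parallel at each step and only a growing subset is committed, which is what makes bidirectional conditioning possible. The de-tokenization stage is omitted because it depends on the learned video decoder.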