Welcome to StunoVideo, our text-to-video model.
StunoVideo can generate videos up to ten minutes long while maintaining visual quality and adhering to the user’s prompt.
The StunoVideo model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.
The model has a deep understanding of language, enabling it to accurately interpret prompts and generate compelling characters that express vibrant emotions.
StunoVideo can generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background,
as in the examples below.
StunoVideo can also create multiple shots within a single generated video that accurately persist characters and visual style.
Be careful, though: our current model has room for improvement and may produce somewhat fuzzy results.
It may struggle to simulate the physics of a complex scene, and may not comprehend specific instances of cause and effect.
StunoVideo often confuses spatial details included in a prompt, such as left and right or top and bottom, and may struggle with precise
descriptions of events that unfold over time, like specific camera trajectories.