Today, OpenAI announced Sora, a text-to-video model. In OpenAI’s own words:
> Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.
>
> The model has a deep understanding of language, enabling it to accurately interpret prompts and generate compelling characters that express vibrant emotions. Sora can also create multiple shots within a single generated video that accurately persist characters and visual style.
This model’s capabilities are hard to believe. Several examples on the explainer page would have fooled me had I encountered them without context.
And this is just Sora’s first release, meaning it is the worst the model will ever be.