Key takeaways:
- Researchers from Google DeepMind, the University of Michigan, and Brown University have developed "Motion Prompting," a new method for controlling video generation using specific motion trajectories.
- The technique uses "motion prompts," a flexible representation of movement that can be sparse or dense, to guide a pre-trained video diffusion model.
- A key innovation is "motion prompt expansion," which translates high-level user requests, such as mouse drags, into the detailed motion instructions the model needs.
- A single, unified model can perform a wide range of tasks, including precise object and camera control, motion transfer from one video to another, and interactive image editing, without needing to be retrained for each specific capability.
As generative AI continues to evolve, gaining precise control over video creation remains a critical obstacle to its widespread adoption in markets such as advertising, filmmaking, and interactive entertainment. While text prompts have been the primary method of control, they often fall short when specifying the nuanced, dynamic movements that make video compelling. A new paper, presented and highlighted at CVPR 2025 by Google DeepMind, the University of Michigan, and Brown University, introduces a groundbreaking solution called "Motion Prompting," which offers an unprecedented level of control by allowing users to direct the action in a video using motion trajectories.
This approach moves beyond the limitations of text, which struggles to describe complex movements with precision. For example, a prompt like "a bear quickly turns its head" is open to countless interpretations. How fast is "quickly"? What is the exact path of the head's movement? Motion prompting addresses this by letting creators define the motion itself, opening the door to more expressive and intentional video content.
Introducing motion prompts
At the heart of this research is the concept of a "motion prompt." The researchers identified that spatio-temporally sparse or dense motion trajectories, essentially tracking the movement of points over time, are an ideal way to represent any kind of motion. This flexible format can capture anything from the subtle flutter of hair to complex camera movements.
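To make the representation concrete, here is a minimal sketch in plain NumPy of storing a motion prompt as point tracks. The array shapes and the `visible` mask are illustrative assumptions, not the paper's exact format.

```python
import numpy as np

# A motion prompt can be thought of as a set of point tracks: for each of
# N tracked points, an (x, y) position in every frame, plus a visibility
# flag for frames where a point is occluded or simply unspecified.
num_frames, num_points = 16, 8

# positions[t, i] = (x, y) pixel coordinates of point i at frame t
positions = np.zeros((num_frames, num_points, 2), dtype=np.float32)

# visible[t, i] = whether point i is specified/visible at frame t
visible = np.ones((num_frames, num_points), dtype=bool)

# Example of a *sparse* prompt: drag a single point to the right while
# leaving the other tracks unspecified (masked out).
visible[:, 1:] = False
positions[:, 0, 0] = np.linspace(40, 100, num_frames)  # x moves right
positions[:, 0, 1] = 64                                 # y stays fixed
```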
To enable this, the team trained a ControlNet adapter on top of a powerful pre-trained video diffusion model called Lumiere. The ControlNet was trained on a massive internal dataset of 2.2 million videos, each with detailed motion tracks extracted by an algorithm called BootsTAP. This diverse training allows the model to understand and generate a wide range of motion without specialized engineering for each task.
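One plausible way such tracks could be turned into a conditioning signal for a ControlNet-style adapter is to rasterize them into per-frame spatial maps fed alongside the video latents. The sketch below is an assumption for illustration; the paper's actual track encoding may differ.

```python
import numpy as np

def rasterize_tracks(positions, visible, height=128, width=128):
    """Turn point tracks into a per-frame conditioning volume.

    positions: (T, N, 2) float array of (x, y) pixel coordinates
    visible:   (T, N) boolean visibility mask
    Returns a (T, H, W) array with 1.0 wherever a tracked point lies.
    """
    T, N, _ = positions.shape
    cond = np.zeros((T, height, width), dtype=np.float32)
    for t in range(T):
        for i in range(N):
            if not visible[t, i]:
                continue
            x, y = positions[t, i]
            xi, yi = int(round(x)), int(round(y))
            if 0 <= xi < width and 0 <= yi < height:
                cond[t, yi, xi] = 1.0  # mark the tracked point
    return cond

# The resulting volume would be passed to a ControlNet-style adapter
# as an extra input next to the noisy video latents.
```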
From simple clicks to complex scenes: motion prompt expansion
Since specifying every point of motion for a complex scene would be impractical for a user, the researchers developed a process they call "motion prompt expansion." This system translates simple, high-level user inputs into the detailed, semi-dense motion prompts the model needs.
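As a rough illustration of the idea, and not the paper's actual expansion algorithm, a single mouse drag could be expanded into a semi-dense set of parallel tracks like this:

```python
import numpy as np

def expand_mouse_drag(start, end, num_frames=16, radius=10, points_per_axis=5):
    """Expand one mouse drag into a semi-dense set of parallel tracks.

    start, end: (x, y) endpoints of the drag in the first frame
    Returns positions of shape (num_frames, points_per_axis**2, 2).
    Illustrative stand-in for motion prompt expansion, not the real method.
    """
    start, end = np.asarray(start, float), np.asarray(end, float)
    # Seed a small grid of points around the clicked location.
    offsets = np.linspace(-radius, radius, points_per_axis)
    grid = np.stack(np.meshgrid(offsets, offsets), axis=-1).reshape(-1, 2)
    seeds = start + grid                                     # (P, 2)
    # Move every seeded point along the drag direction over time.
    alphas = np.linspace(0.0, 1.0, num_frames)[:, None, None]
    displacement = (end - start)[None, None, :]
    return seeds[None, :, :] + alphas * displacement         # (T, P, 2)

tracks = expand_mouse_drag(start=(60, 64), end=(100, 64))
```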
This enables a variety of intuitive applications:
"Interacting" with an image: A user can simply click and drag their mouse over an object in a still image to make it move. For example, a user could drag a parrot's head to turn it, or "play" with a person's hair, and the model generates a realistic video of that action. Interestingly, this process revealed emergent behaviors, where the model produced physically plausible motion, such as sand scattering realistically when "pushed" by the cursor.
Object and camera control: By interpreting mouse movements as instructions to manipulate a geometric primitive (like an invisible sphere), users can achieve fine-grained control, such as precisely rotating a cat's head. Likewise, the system can generate sophisticated camera movements, such as orbiting around a scene, by estimating the scene's depth from the first frame and projecting a desired camera path onto it (see the sketch after this list). The model can even combine these prompts to control an object and the camera simultaneously.
Motion transfer: This technique makes it possible to apply the motion from a source video to a completely different subject in a static image. For example, the researchers demonstrated transferring a person's head movements onto a macaque, effectively "puppeteering" the animal.
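To give a sense of the depth-based camera control described above, here is a minimal sketch, assuming a pinhole camera with known intrinsics and a strictly positive first-frame depth map, of lifting pixels to 3D and re-projecting them along an orbiting camera path. It illustrates the general idea rather than the paper's implementation.

```python
import numpy as np

def orbit_tracks(depth, fx, fy, cx, cy, num_frames=16, max_angle=0.3):
    """Project first-frame pixels along an orbiting camera path.

    depth: (H, W) depth map estimated for the first frame (assumed > 0)
    fx, fy, cx, cy: pinhole camera intrinsics (assumed known)
    max_angle: total yaw rotation in radians over the clip
    Returns (num_frames, H*W, 2) pixel trajectories usable as a motion prompt.
    """
    H, W = depth.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Unproject pixels to 3D camera coordinates using the depth map.
    X = (xs - cx) * depth / fx
    Y = (ys - cy) * depth / fy
    pts = np.stack([X, Y, depth], axis=-1).reshape(-1, 3)    # (H*W, 3)

    tracks = []
    for t in range(num_frames):
        angle = max_angle * t / (num_frames - 1)
        c, s = np.cos(angle), np.sin(angle)
        R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])     # yaw rotation
        rotated = pts @ R.T
        # Re-project the rotated points back to pixel coordinates.
        u = fx * rotated[:, 0] / rotated[:, 2] + cx
        v = fy * rotated[:, 1] / rotated[:, 2] + cy
        tracks.append(np.stack([u, v], axis=-1))
    return np.stack(tracks)                                   # (T, H*W, 2)
```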
Putting it to the test
The team carried out extensive quantitative evaluations and human studies to validate their approach, comparing it to recent models such as Image Conductor and DragAnything. Across nearly all metrics, including image quality (PSNR, SSIM) and motion accuracy (EPE), their model outperformed the baselines.
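For reference, End-Point Error (EPE), the motion accuracy metric mentioned above, is commonly computed as the mean Euclidean distance between predicted and ground-truth track positions; a minimal sketch:

```python
import numpy as np

def end_point_error(pred_tracks, gt_tracks, visible=None):
    """Mean Euclidean distance between predicted and ground-truth tracks.

    pred_tracks, gt_tracks: (T, N, 2) point positions in pixels
    visible: optional (T, N) mask restricting the average to visible points
    """
    err = np.linalg.norm(pred_tracks - gt_tracks, axis=-1)  # (T, N)
    if visible is not None:
        return float(err[visible].mean())
    return float(err.mean())
```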
A human study also confirmed these results. When asked to choose between videos generated by Motion Prompting and other methods, participants consistently preferred the new model's results, citing better adherence to the motion commands, more realistic motion, and higher overall visual quality.
Limitations and future directions
The researchers are transparent about the system's current limitations. The model can sometimes produce unnatural results, such as stretching an object abnormally if parts of it are mistakenly "locked" to the background. However, they suggest that these very failures can serve as a valuable tool for probing the underlying video model and identifying weaknesses in its "understanding" of the physical world.
This research represents a significant step toward truly interactive and controllable generative video models. By focusing on the fundamental element of motion, the team has unlocked a versatile and powerful tool that could one day become a standard for professionals and creatives seeking to harness the full potential of AI in video production.
Check out the Paper and Project page. All credit for this research goes to the researchers of this project.
