In recent years, machine learning-based models have made significant advances in autonomously generating many types of content. These advances have opened new possibilities for the filmmaking industry and for producing training data for robotics algorithms. While existing models can generate realistic or artistic images from text descriptions, building AI that can generate moving human figures in 3D scenes from natural-language instructions has remained challenging. A recent paper by researchers at the Beijing Institute of Technology, BIGAI, and Peking University introduces a promising new framework that aims to address this challenge.
The new framework builds upon a generative model called HUMANISE, which was introduced a few years earlier. Its goal is to improve the model's ability to generalize across tasks, such as generating realistic motions in response to human prompts. The framework consists of two stages: an Affordance Diffusion Model (ADM) that predicts an affordance map, and an Affordance-to-Motion Diffusion Model (AMDM) that generates human motion from the description and the predicted affordance map. This two-stage design links 3D scene grounding with conditional motion generation, improving the model's overall performance.
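To make the two-stage structure concrete, here is a minimal sketch of how a pipeline with this shape could be wired together. The module names, dimensions, and simple MLP stand-ins are illustrative assumptions, not the architecture from the paper, which uses diffusion models conditioned on the text description and the 3D scene.

```python
import torch
import torch.nn as nn

class AffordanceModel(nn.Module):
    """Stage 1 (illustrative stand-in for the ADM): predict a per-point
    affordance value from a scene point cloud and a text embedding."""
    def __init__(self, scene_dim=3, text_dim=512, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(scene_dim + text_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # one affordance value per scene point
        )

    def forward(self, scene_points, text_emb):
        # scene_points: (N, 3), text_emb: (text_dim,)
        text = text_emb.expand(scene_points.shape[0], -1)
        return self.net(torch.cat([scene_points, text], dim=-1)).squeeze(-1)

class AffordanceToMotionModel(nn.Module):
    """Stage 2 (illustrative stand-in for the AMDM): generate a motion
    sequence conditioned on the text prompt and the stage-1 affordance map."""
    def __init__(self, text_dim=512, num_points=1024, frames=60, joints=22):
        super().__init__()
        self.frames, self.joints = frames, joints
        self.net = nn.Sequential(
            nn.Linear(text_dim + num_points, 1024), nn.ReLU(),
            nn.Linear(1024, frames * joints * 3),
        )

    def forward(self, affordance_map, text_emb):
        x = torch.cat([affordance_map, text_emb], dim=-1)
        return self.net(x).view(self.frames, self.joints, 3)

# Wiring the two stages together with random placeholder inputs.
scene = torch.rand(1024, 3)   # scene point cloud
prompt = torch.rand(512)      # pre-computed text embedding
affordance = AffordanceModel()(scene, prompt)
motion = AffordanceToMotionModel()(affordance, prompt)
print(motion.shape)  # (60, 22, 3): frames x joints x xyz
```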
One of the main advantages of this new framework is its ability to clearly delineate the scene region referred to by a user's description or prompt. This strengthens its 3D grounding capabilities, allowing it to produce realistic motions even with limited training data. The framework also leverages affordance maps derived from the distance field between human skeleton joints and scene surfaces, providing a deeper understanding of the relationship between scenes and motions. By incorporating these scene affordance representations, the model demonstrates improved language-guided human motion generation in 3D scenes.
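As a rough illustration of what such a distance-based affordance representation can look like, the snippet below computes, for every point in a scene point cloud, its distance to the nearest human skeleton joint and turns that distance field into a normalized per-point affordance score. The array shapes, joint count, and the exponential falloff are assumptions for this example, not the exact formulation used in the paper.

```python
import numpy as np

def affordance_from_distance_field(scene_points, joint_positions, scale=0.5):
    """Illustrative affordance map: for each scene point, the distance to the
    closest skeleton joint, mapped to [0, 1] so that nearby surfaces score high.

    scene_points:    (N, 3) array of scene surface points
    joint_positions: (J, 3) array of human skeleton joint positions
    """
    # Pairwise distances between every scene point and every joint: (N, J)
    diffs = scene_points[:, None, :] - joint_positions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)

    # Distance field: nearest-joint distance per scene point, shape (N,)
    nearest = dists.min(axis=1)

    # Exponential falloff (assumed here) converts distance into an affordance score
    return np.exp(-nearest / scale)

# Toy example: random scene points and a cluster of 22 joints near the center
scene = np.random.rand(2048, 3) * 4.0       # points in a 4 m x 4 m x 4 m region
joints = np.random.rand(22, 3) * 0.5 + 1.5  # 22 skeleton joint positions
affordance_map = affordance_from_distance_field(scene, joints)
print(affordance_map.shape, affordance_map.min(), affordance_map.max())
```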
Potential Applications
The study highlights the potential of conditional motion generation models that integrate scene affordance representations. The team hopes that their model and approach will inspire further innovation in the generative AI research community. The framework could have significant real-world applications, such as producing animated films with AI or generating synthetic training data for robotics. The researchers plan to further refine the model and to address data scarcity through improved collection and annotation strategies for human-scene interaction data.
The new framework developed by researchers at BIGAI and Peking University represents a notable advance in generative AI, particularly for filmmaking and robotics. By incorporating scene affordance representations and a two-stage design, it demonstrates promising results in generating language-guided human motion in 3D scenes. This work could change how content is created in both the entertainment industry and robotics applications, and the researchers' future efforts will continue to push the boundaries of generative AI while addressing the scarcity of human-scene interaction data.