Generative AI has long since moved beyond still images, with terrifyingly realistic animations and videos now emerging from simple text prompts every day. Now, researchers from Meta and the University of Oxford have unveiled a powerful new tool capable of transforming a single image into a detailed 3D render.
VFusion3D sidesteps the issue of a “limited availability” of 3D data to create truly impressive models, and web users are already speculating that it has the potential to transform the process of character design.
Titled ‘VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models’, a paper by researchers Junlin Han, Filippos Kokkinos and Philip Torr describes how the new model is capable of “building scalable 3D generative models utilising pre-trained video diffusion models”.
According to VentureBeat, the team “fine-tuned an existing video AI model to produce multi-view video sequences”, teaching it to depict objects from multiple angles. And the results speak for themselves – the paper includes several examples of still images transformed into 3D objects, with the AI filling in the gaps with remarkable precision.
“The primary obstacle in developing foundation 3D generative models is the limited availability of 3D data. Unlike images, texts, or videos, 3D data are not readily accessible and are difficult to acquire. This results in a significant disparity in scale compared to the vast quantities of other types of data,” reads the introduction to the paper. “To address this issue, we propose using a video diffusion model, trained with extensive volumes of text, images, and videos, as a knowledge source for 3D data. By unlocking its multi-view generative capabilities through fine-tuning, we generate a large-scale synthetic multi-view dataset to train a feed-forward 3D generative model. The proposed model, VFusion3D, trained on nearly 3M synthetic multi-view data, can generate a 3D asset from a single image in seconds and achieves superior performance when compared to current SOTA feed-forward 3D generative models, with users preferring our results over 90% of the time.”
And you can even try out VFusion3D yourself. A publicly available demo on Hugging Face lets you upload your own images, or choose from a selection of preset examples including Pikachu and Baby Yoda.
From the terrifyingly realistic viral Flux images to those AI gymnastics videos, AI-generated content is either getting more realistic or more terrifying – and often both. But hey, don’t worry, recent reports suggest AI generators could end up bringing about their own demise.