IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation

Summary: IM-3D generates high-quality 3D assets from a single image without score distillation sampling (SDS) by fine-tuning video diffusion models.

Interactive Results

Explore reconstruction results (Gaussian Splats) below.

Method

Our model starts from an input image (e.g., one generated by a text-to-image (T2I) model). The image is fed into an image-to-video diffusion model, which generates a turntable-like video of the object. The video frames are then passed to 3D Gaussian Splatting, which reconstructs the 3D object directly, using image-based losses for robustness. Optionally, the reconstructed object is rendered and the renders are fed back to the video diffusion model, repeating the generate-and-reconstruct cycle for refinement.
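The control flow of this loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the IM-3D implementation: the names `iterative_reconstruction`, `generate_video`, `fit_3d`, and `render_views` are hypothetical, and the toy stand-ins replace images with single numbers so the example runs without any model. In the real system, the video generator would be the fine-tuned video diffusion model and the fitting step would be 3D Gaussian Splatting.

```python
from statistics import mean
from typing import Callable, List, Optional

Frame = float        # toy stand-in for one image (assumption: a single number)
Video = List[Frame]  # toy stand-in for a turntable video


def iterative_reconstruction(
    input_image: Frame,
    generate_video: Callable[[Frame, Optional[Video]], Video],
    fit_3d: Callable[[Video], float],
    render_views: Callable[[float], Video],
    num_refinements: int = 2,
) -> float:
    """Generate views, reconstruct, then optionally render and regenerate."""
    # First pass: the video is conditioned on the input image only.
    video = generate_video(input_image, None)
    model = fit_3d(video)
    # Optional refinement: feed renders of the current 3D model back in.
    for _ in range(num_refinements):
        renders = render_views(model)
        video = generate_video(input_image, renders)
        model = fit_3d(video)
    return model


# Toy stand-ins (hypothetical) so the sketch is runnable end to end.
def toy_generate_video(image: Frame, renders: Optional[Video]) -> Video:
    if renders is None:
        # Pretend the first video has view-dependent inconsistencies.
        return [image + offset for offset in (0.4, -0.2, 0.2, 0.0)]
    # Pretend regeneration pulls each render back toward the input image.
    return [0.5 * (r + image) for r in renders]


def toy_fit_3d(video: Video) -> float:
    # Pretend "reconstruction" is one value that best explains all frames.
    return mean(video)


def toy_render_views(model: float, num_views: int = 4) -> Video:
    # Rendering a single 3D model yields mutually consistent views.
    return [model] * num_views


if __name__ == "__main__":
    coarse = iterative_reconstruction(
        1.0, toy_generate_video, toy_fit_3d, toy_render_views, num_refinements=0
    )
    refined = iterative_reconstruction(
        1.0, toy_generate_video, toy_fit_3d, toy_render_views, num_refinements=2
    )
    print(f"error without refinement: {abs(coarse - 1.0):.3f}")
    print(f"error with refinement:    {abs(refined - 1.0):.3f}")
```

Passing the three stages as callables keeps the loop independent of any particular generator or reconstruction backend. In this toy setting, each refinement round halves the gap between the reconstruction and the input; that behaviour is a property of the stand-ins and says nothing quantitative about the real method.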

Human Evaluation

We conduct a human evaluation of IM-3D against state-of-the-art Image-to-3D and Text-to-3D methods. Human raters prefer IM-3D to all competitors with regard to both generation quality and faithfulness, often by a large margin.