Tencent Unveils InstantMesh, an AI Model That Turns Static Images Into 3D Objects

Apr 18, 2024



Tencent has released a new artificial intelligence (AI) model, named InstantMesh, that can generate 3D objects from a single still image. The new model builds on the Instant3D framework and combines a multiview diffusion model with a sparse-view reconstruction model based on the large reconstruction model (LRM) architecture. Tencent has also made InstantMesh open source and offers a demo app where enthusiasts can test its capabilities and create and export 3D renders.

The company has published a pre-print of its research paper on arXiv. Notably, arXiv does not peer-review submissions, so it is difficult to verify how rigorously the model has been evaluated. However, the company has already released the model as open source on Hugging Face, so developers can test its performance themselves. For enthusiasts, a demo app is also available where they can upload a photo and watch it turn into a 3D render. We at Gadgets 360 tested the platform and found that renders were created in under 10 seconds, as the company claims. However, the renders themselves were of fairly low quality. An X (formerly known as Twitter) user also posted a video showing how they used the AI model.

Talking about the technology behind the AI model, the company uses two different architectures: a multiview diffusion model and an LRM. The former takes the input image and generates views of the object from angles that are not visible in it, while the LRM uses these views to reconstruct a 3D object that can be viewed from any angle in a 3D environment.

According to Tencent, InstantMesh solves the Janus problem in 3D generation. The Janus problem arises when a model has to "imagine" the unseen sides of a reference object: it can end up repeating the object's canonical (usually front) view on several sides, producing an object with multiple faces rather than a single coherent 3D object. The company addresses the problem by using a novel-view generator fine-tuned from Stable Diffusion.

The research paper also shares benchmark scores comparing InstantMesh with various existing models, including Stability AI's recently launched Stable Video 3D (SV3D). Based on the scores, InstantMesh outperformed SV3D on the Google Scanned Objects (GSO) and OmniObject3D (Omni3D) benchmarks. SV3D scored better on some metrics in the Omni3D benchmark, namely those measuring how faithfully the output matches the reference, but Tencent said this was intentional. "We argue that perceptual quality is more important than faithfulness, because 'true novel views' must be unknown and there must be multiple possibilities given a single image as reference," the company explained.
