XRender News Center

The TSAIL team led by Professor Zhu Jun from the Department of Computer Science, Tsinghua University, recently published the paper "ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation". The paper introduces a new technique, ProlificDreamer, which can generate high-fidelity 3D content from text alone.

ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation

The ProlificDreamer algorithm brings significant advances in the field of Text-to-3D. With ProlificDreamer, entering the text "a pineapple" will produce a very realistic and high-definition 3D pineapple like the example below.

a pineapple

What about a slightly more difficult prompt, such as "Michelangelo style statue of dog reading news on a cellphone"? Not a problem.

Michelangelo style statue of dog reading news on a cellphone

In the fields of digital creation and virtual reality, Text-to-3D technology has important value and wide application potential. This technology can generate concrete 3D models from simple text descriptions, providing powerful tools for designers, game developers and digital artists.

However, in order to generate accurate 3D models from text, traditional methods require large datasets of labeled 3D models. These datasets need to contain many different types and styles of 3D models, and each model needs to be associated with a corresponding textual description. Creating such datasets requires a lot of time and human resources, and no large-scale datasets are currently available.

DreamFusion, proposed by Google, was the first to achieve open-domain text-to-3D synthesis without any 3D data, by using a pre-trained 2D text-to-image diffusion model. However, results generated with its Score Distillation Sampling (SDS) algorithm suffer from serious problems such as oversaturation, oversmoothing, and lack of detail. High-quality 3D content generation therefore remains a difficult frontier problem.

The ProlificDreamer paper proposes the Variational Score Distillation (VSD) algorithm, which reformulates the text-to-3D problem from the perspective of Bayesian modeling and variational inference. Specifically, VSD treats the 3D parameters as a random variable with a probability distribution, and minimizes the divergence between the distribution of its rendered 2D images and the distribution defined by a pre-trained 2D diffusion model. The paper shows that the VSD updates approximate sampling from this 3D distribution, which addresses the oversaturation, oversmoothing, and lack of diversity seen with the SDS algorithm proposed by DreamFusion. In addition, SDS often requires a large classifier-free guidance (CFG) weight (CFG = 100), while the paper presents VSD as the first such algorithm that works with a normal CFG weight (CFG = 7.5).
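To make the CFG numbers above concrete, here is a minimal sketch of how a classifier-free guidance weight is commonly applied to a diffusion model's noise predictions. The function name `cfg_combine` and the toy vectors are illustrative assumptions, not code from the paper. Some papers write the weight as `1 + w` instead, so which convention matches the quoted values is also an assumption here.

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, scale):
    """Extrapolate from the unconditional prediction toward the conditional one.

    scale = 1 returns the conditional prediction unchanged; larger values
    push further in the direction of the text condition.
    """
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Toy noise predictions (hypothetical values, for illustration only).
eps_uncond = np.array([0.0, 1.0])
eps_cond = np.array([0.1, 0.9])

normal = cfg_combine(eps_uncond, eps_cond, 7.5)    # weight reported for VSD
large = cfg_combine(eps_uncond, eps_cond, 100.0)   # weight reported for SDS
print(normal, large)
```

With a weight of 100, a small difference between the two predictions is amplified far beyond either one, which is one plausible intuition for the oversaturated results associated with very large guidance weights.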

A distribution exists over 3D objects given a valid prompt

Unlike previous methods, ProlificDreamer does not simply optimize a single 3D object, but optimizes the probability distribution corresponding to the 3D object. In general, given a valid text input, there exists a probability distribution covering all possible 3D objects described by the text.

Underlying distribution of rendered images

Specifically, the algorithm flow chart of VSD is shown below. The iterative update of the 3D object uses two models: a pre-trained 2D diffusion model (such as Stable Diffusion), and a LoRA (low-rank adaptation) of that pre-trained model. The LoRA estimates the score function of the 2D image distribution induced by the current 3D object, and this estimate is in turn used to update the 3D object. In effect, the algorithm simulates a Wasserstein gradient flow, and the distribution it converges to is guaranteed to minimize the KL divergence from the pre-trained 2D diffusion model.
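For readers unfamiliar with LoRA, the following is a minimal NumPy sketch of the general idea: a frozen weight matrix plus a trainable low-rank correction. The class name `LoRALinear`, the rank, and the scaling `alpha / rank` follow common LoRA conventions and are assumptions here; this is not the configuration used in ProlificDreamer.

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer W with a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                   # frozen pre-trained weight
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))    # trainable, small random init
        self.B = np.zeros((out_dim, rank))                     # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        # Base output plus the low-rank correction; x has shape (batch, in_dim).
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def num_trainable(self):
        return self.A.size + self.B.size

W = np.ones((64, 32))
layer = LoRALinear(W, rank=4)
x = np.random.default_rng(1).normal(size=(5, 32))
print(layer(x).shape, layer.num_trainable(), W.size)
```

Because `B` starts at zero, the adapted layer initially reproduces the pre-trained layer exactly, and only the small matrices `A` and `B` need to be trained.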

Overview of VSD

First, randomly sample the 3D parameters \(\theta\) from the current distribution, together with a camera pose \(c\).

Then render the corresponding 2D image \(x_0 = g(\theta, c)\) with a differentiable renderer.

Then update the parameter \(\theta\) of the 3D representation as \(\theta - \eta_1 \mathbb E_{t, \epsilon, c} \left[\omega(t) \left(\epsilon_{\mathrm{pretrain}}(x_t, t, y) - \epsilon_\phi (x_t, t, c, y)\right) \frac{\partial g(\theta, c)}{\partial \theta}\right]\)

Finally, update the LoRA parameters \(\phi\), which represent the current distribution of rendered images, as \(\phi - \eta_2 \nabla_\phi \mathbb E_{t, \epsilon} ||\epsilon_\phi(x_t, t, c, y ) - \epsilon||_2^2\), so that \(\epsilon_\phi\) tracks the score of the current distribution \(q_t^{\mu_\tau}\) more closely. Repeat these steps until convergence.
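The four steps above can be sketched on a one-dimensional toy problem. Everything here is a simplifying assumption made for illustration and is not the paper's implementation: the "3D parameters" are scalar particles, the renderer is the identity \(g(\theta, c) = \theta\) with no camera or prompt, \(\omega(t) = 1\), the noising is \(x_t = \alpha_t x_0 + \sigma_t \epsilon\) with a cosine schedule, and the "pre-trained model" is the exact noise predictor of a Gaussian. The LoRA gradient step is replaced by a closed-form least-squares fit of a linear noise predictor, which minimizes the same squared-error objective on the current batch. All function names are hypothetical.

```python
import numpy as np

def schedule(t):
    """Assumed variance-preserving schedule: x_t = alpha * x_0 + sigma * eps."""
    return np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)

def eps_pretrain(x_t, t, mean=1.0, std=0.5):
    """Exact noise prediction when the 'pre-trained' data distribution is N(mean, std^2)."""
    alpha, sigma = schedule(t)
    return sigma * (x_t - alpha * mean) / (alpha**2 * std**2 + sigma**2)

def run_toy(use_vsd=True, n_particles=256, n_steps=2000, eta1=0.05, seed=0):
    rng = np.random.default_rng(seed)
    # Particles standing in for samples from the distribution over 3D parameters.
    theta = rng.normal(-1.0, 0.3, size=n_particles)
    for _ in range(n_steps):
        t = rng.uniform(0.05, 0.95)
        alpha, sigma = schedule(t)
        x0 = theta                            # steps 1-2: identity "render", dg/dtheta = 1
        eps = rng.normal(size=n_particles)
        x_t = alpha * x0 + sigma * eps
        if use_vsd:
            # Stand-in for step 4: fit eps_phi(x_t) = a * x_t + b by least squares,
            # i.e. minimise the mean of (eps_phi(x_t) - eps)^2 on this batch.
            a, b = np.polyfit(x_t, eps, 1)
            eps_phi = a * x_t + b
        else:
            eps_phi = eps                     # SDS: subtract the sampled noise itself
        # Step 3 with omega(t) = 1.
        theta = theta - eta1 * (eps_pretrain(x_t, t) - eps_phi)
    return theta

vsd_particles = run_toy(use_vsd=True)
sds_particles = run_toy(use_vsd=False)
print("VSD mean/std:", vsd_particles.mean(), vsd_particles.std())
print("SDS mean/std:", sds_particles.mean(), sds_particles.std())
```

In this toy setting both variants should move the particles toward the target mean, while the VSD particles should remain more spread out than the SDS particles, mirroring the diversity argument in the text. How well this carries over to real 3D scenes is not something a sketch of this kind can establish.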

Comparison of updated formulas for SDS/SJC and VSD
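For reference, the two update directions can be written side by side in the notation of the steps above. The SDS form is restated here in this article's symbols, and \(x_t = \alpha_t x_0 + \sigma_t \epsilon\) is assumed to be the noised version of the rendered image \(x_0 = g(\theta, c)\):

\[
\nabla_\theta \mathcal{L}_{\mathrm{SDS}} = \mathbb{E}_{t, \epsilon, c}\left[\omega(t)\left(\epsilon_{\mathrm{pretrain}}(x_t, t, y) - \epsilon\right)\frac{\partial g(\theta, c)}{\partial \theta}\right]
\]

\[
\nabla_\theta \mathcal{L}_{\mathrm{VSD}} = \mathbb{E}_{t, \epsilon, c}\left[\omega(t)\left(\epsilon_{\mathrm{pretrain}}(x_t, t, y) - \epsilon_\phi(x_t, t, c, y)\right)\frac{\partial g(\theta, c)}{\partial \theta}\right]
\]

The only difference is the subtracted term: SDS subtracts the sampled Gaussian noise \(\epsilon\), whereas VSD subtracts the LoRA prediction \(\epsilon_\phi\), which depends on the current distribution of rendered images.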

ProlificDreamer can generate “meticulously detailed and photo-realistic 3D textured meshes”, “high rendering resolution (i.e., 512 × 512) and high-fidelity NeRF with rich structures and complex effects”, and “diverse and semantically correct 3D scenes given the same text”.

1. ProlificDreamer can generate meticulously detailed and photo-realistic 3D textured meshes.

2. ProlificDreamer can generate high rendering resolution (i.e., 512 × 512) and high-fidelity NeRF with rich structures and complex effects.

3. ProlificDreamer can generate diverse and semantically correct 3D scenes given the same text.

The paper includes several examples comparing ProlificDreamer's results with baseline methods. More details are available at the links below.

Paper: https://arxiv.org/abs/2305.16213

Project: https://ml.cs.tsinghua.edu.cn/prolificdreamer/

