Soon after the submission of the paper “Adding Conditional Control to Text-to-Image Diffusion Models” by Lvmin Zhang and Maneesh Agrawala of Stanford University, the trained model, built on Stable Diffusion, became available for download on GitHub.
The paper, submitted to arXiv (hosted by Cornell University) on 10 February, presents to the public a “neural network structure, ControlNet, to control pretrained large diffusion models to support additional input conditions”. According to the paper, the ControlNet architecture lets large image diffusion models (such as Stable Diffusion) learn task-specific input conditions and generate finely controlled results based on the prompt and images supplied by users.
Put simply, ControlNet steers a pre-trained large model through additional input conditions, trained end-to-end, and the currently released Stable Diffusion variants are built exactly this way.
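The key trick described in the paper is the “zero convolution”: a trainable copy of a network block is attached to the frozen pretrained block through 1×1 convolutions whose weights and biases are initialized to zero, so training can begin without disturbing the pretrained model. The sketch below illustrates that property with plain numpy; all names and the toy block are illustrative assumptions, not the authors' code.

```python
# Hedged numpy sketch of ControlNet's zero-convolution idea.
# Names (conv1x1, frozen_block, etc.) are illustrative, not from the paper's code.
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w, b):
    # x: (C_in, H, W); w: (C_out, C_in); b: (C_out,)
    return np.einsum("oc,chw->ohw", w, x) + b[:, None, None]

def frozen_block(x, w, b):
    # Stand-in for a locked pretrained block (1x1 conv + ReLU).
    return np.maximum(conv1x1(x, w, b), 0.0)

C, H, W = 4, 8, 8
x = rng.normal(size=(C, H, W))   # feature map entering the block
c = rng.normal(size=(C, H, W))   # extra condition (e.g. an edge map)

w_f = rng.normal(size=(C, C)); b_f = rng.normal(size=C)  # frozen weights
w_t = w_f.copy(); b_t = b_f.copy()                       # trainable copy

# Zero convolutions: weights AND biases start at exactly zero.
wz1 = np.zeros((C, C)); bz1 = np.zeros(C)
wz2 = np.zeros((C, C)); bz2 = np.zeros(C)

# ControlNet forward pass: y = F(x) + Z2( F_copy( x + Z1(c) ) )
y_base = frozen_block(x, w_f, b_f)
y_ctrl = y_base + conv1x1(
    frozen_block(x + conv1x1(c, wz1, bz1), w_t, b_t), wz2, bz2
)

# At initialization the control branch contributes exactly zero,
# so the pretrained model's behaviour is untouched.
print(np.allclose(y_base, y_ctrl))
```

Because both zero convolutions output zeros before training, the controlled output equals the original output at step zero; gradients still flow into the trainable copy, so the condition is learned without ever corrupting the frozen backbone.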
By introducing a method that enables image diffusion models like Stable Diffusion to use additional input conditions, ControlNet offers an efficient way to tell an AI model which parts of an input image to keep, bringing an unprecedented level of control compared with earlier text-to-image models.
In the code shared on GitHub, the creators list a number of conditions over which ControlNet offers fine control on top of Stable Diffusion, including:
Anime Line Drawing