This paper presents ControlNet, an end-to-end neural network architecture that controls large image diffusion models (like Stable Diffusion) to learn task-specific input conditions. The ControlNet clones the weights of a large diffusion model into a "trainable copy" and a "locked copy": the locked copy preserves the network capability learned from billions of images, while the trainable copy is trained on task-specific datasets to learn the conditional control.
In other words, the locked copy preserves the network capability learned from billions of images, while the trainable copy is trained on task-specific datasets to learn the conditional control.
The trainable and locked neural network blocks are connected with a unique type of convolution layer called "zero convolution", where the convolution weights progressively grow from zeros to optimized parameters in a learned manner. Since the production-ready weights are preserved, the training is robust on datasets of different scales. Since the zero convolution does not add new noise to deep features, the training is as fast as fine-tuning a diffusion model, compared to training new layers from scratch.
The trainable and locked neural network blocks are connected by a layer called zero convolution, whose weights are raised progressively from zero to optimized parameters. Interestingly, it adds no new noise to the deep features.
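As a concrete illustration, a zero convolution is just a 1×1 convolution whose weight and bias start at exactly zero; a minimal PyTorch sketch (the helper name `zero_conv` is ours, not from the paper):

```python
import torch
import torch.nn as nn

def zero_conv(channels: int) -> nn.Conv2d:
    """1x1 convolution whose weight and bias are initialized to zero."""
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv

z = zero_conv(64)
x = torch.randn(1, 64, 32, 32)
print(z(x).abs().max())  # tensor(0.) -- the layer is a no-op at step 0
```

At initialization the layer outputs all zeros, so it contributes nothing to the pretrained network; its weight gradients are still nonzero (they depend on the input features), so the parameters can grow during training.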
HyperNetwork originates from a natural language processing method [14] that trains a small recurrent neural network to influence the weights of a larger one.
HyperNetwork is an NLP technique: a small recurrent model is trained to influence a larger model. This approach is similar in spirit to ControlNet.
ControlNet is a neural network architecture that can enhance pretrained image diffusion models with task-specific conditions.
ControlNet manipulates the input conditions of neural network blocks so as to further control the overall behavior of an entire neural network.
ControlNet is specialized for task-specific conditions. By receiving a condition at each block, it can steer the overall behavior of the entire model.
A neural network block F(·; Θ) with a set of parameters Θ transforms an input feature map x into another feature map y with

y = F(x; Θ),

and this procedure is visualized in Fig. 2-(a).
We lock all parameters in Θ and then clone it into a trainable copy Θc. The copied Θc is trained with an external condition vector c. In this paper, we call the original and new parameters “locked copy” and “trainable copy”.
Rather than training the original weights directly, they are cloned: this avoids overfitting when the dataset is small and preserves the quality of large models already trained on billions of images. The original parameters Θ are frozen (the "locked copy"), and the clone trained together with the condition c is denoted Θc (the "trainable copy").
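A minimal sketch of this locked/trainable split, assuming a generic pretrained PyTorch module (the function name is ours):

```python
import copy
import torch.nn as nn

def make_locked_and_trainable(block: nn.Module):
    """Freeze the pretrained block (locked copy, Θ) and clone it (trainable copy, Θc)."""
    for p in block.parameters():
        p.requires_grad_(False)       # lock Θ: preserved exactly as trained
    trainable = copy.deepcopy(block)  # Θc starts from the pretrained weights
    for p in trainable.parameters():
        p.requires_grad_(True)        # only Θc receives gradient updates
    return block, trainable
```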
The neural network blocks are connected by a unique type of convolution layer called "zero convolution". We denote the zero convolution operation as Z(·; ·) and use two instances of parameters {Θz1, Θz2} to compose the ControlNet structure with

yc = F(x; Θ) + Z(F(x + Z(c; Θz1); Θc); Θz2).
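Putting the pieces together, here is a sketch of a ControlNet-style block following the equation above, under the assumption that the condition c has already been mapped to the same shape as x (the class and helper names are ours):

```python
import torch
import torch.nn as nn

def zero_conv(channels: int) -> nn.Conv2d:
    # 1x1 conv with weight and bias initialized to zero (as sketched earlier)
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv

class ControlNetBlock(nn.Module):
    """Computes yc = F(x; Θ) + Z(F(x + Z(c; Θz1); Θc); Θz2)."""

    def __init__(self, locked: nn.Module, trainable: nn.Module, channels: int):
        super().__init__()
        self.locked = locked           # F(·; Θ), parameters frozen
        self.trainable = trainable     # F(·; Θc), cloned and trained
        self.z1 = zero_conv(channels)  # Z(·; Θz1), applied to the condition
        self.z2 = zero_conv(channels)  # Z(·; Θz2), applied to the control branch

    def forward(self, x: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
        y = self.locked(x)                        # original, untouched path
        control = self.trainable(x + self.z1(c))  # condition-injected path
        # Both zero convs output 0 at initialization, so yc == y before
        # any training step: the pretrained behavior is fully preserved.
        return y + self.z2(control)
```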