LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention

However, the further prevalence of instruction-following models is largely impeded by the closed-source restriction and high development costs.

Problem statement: the wider adoption of models like ChatGPT is heavily hindered by high development costs and closed-source restrictions.

In this paper, we introduce LLaMA-Adapter, an efficient fine-tuning method that adapts LLaMA into a well-performed instruction-following model.

The authors propose LLaMA-Adapter, an efficient fine-tuning method that turns LLaMA into a well-performing instruction-following model.

1. LLaMA-Adapter

Learnable Adaption Prompts


Given 52K instruction-to-output data [48] and a pretrained LLaMA [42] with an N-layer transformer, we adopt a set of learnable adaption prompts for instruction-following fine-tuning.

Given the 52K instruction data and the pretrained N-layer transformer LLaMA, the authors apply a set of learnable adaption prompts for instruction-following fine-tuning.

Note that we insert the prompts into the topmost L layers of the transformer (L ≤ N).

The adaption prompts are defined as $\{P_l\}_{l=1}^{L}$ for L transformer layers, where each $P_l ∈ R^{K×C}$ has prompt length K and C equals the transformer's feature dimension. Note that the authors insert the prompts only into the topmost L layers of the transformer (L ≤ N), which lets them tune higher-level semantic representations.

Taking the l-th inserted layer as an example, we denote the M-length word tokens as $T_l ∈ R^{M×C}$. Then, the adaption prompt is concatenated with $T_l$ along the token dimension as prefix, formulated as

$[P_l; T_l] ∈ R^{(K+M)×C}$

At the l-th inserted layer, the M-length word tokens are denoted $T_l$. The adaption prompt $P_l$ is then concatenated with $T_l$ along the token dimension as a prefix, as above; the resulting (K+M)-length sequence is the adaption prompt concatenated with the word tokens.
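The prefix concatenation above can be sketched in a few lines of NumPy. The sizes K, M, C below are hypothetical placeholders for illustration, not the paper's actual configuration:

```python
import numpy as np

# Hypothetical sizes (not from the paper's config):
# K = prompt length, M = number of word tokens, C = feature dimension
K, M, C = 10, 32, 64

P_l = np.random.randn(K, C)  # learnable adaption prompt at layer l
T_l = np.random.randn(M, C)  # M-length word tokens at layer l

# Prepend the prompt along the token dimension: [P_l; T_l] ∈ R^{(K+M)×C}
tokens = np.concatenate([P_l, T_l], axis=0)
print(tokens.shape)  # (42, 64), i.e. (K + M, C)
```

The word tokens are unchanged; the prompt simply extends the sequence as a learnable prefix that later layers can attend to.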

Zero-init Attention


If the adaption prompts are randomly initialized, they might bring disturbance to the word tokens at the beginning of training, which harms the fine-tuning stability and effectiveness.

If the adaption prompts are randomly initialized, they may disturb the word tokens early in training, harming fine-tuning stability and effectiveness.