<aside> 💡 Typical “hard” prompts are made from interpretable words and tokens, and must be hand-crafted by humans. There are also “soft” prompts, which consist of continuous feature vectors.
</aside>
Hard prompt discovery is a specialized alchemy, with many good prompts being discovered by trial and error, or sheer intuition. Then there are soft prompts, which consist of continuous-valued language embeddings that do not correspond to any human-readable tokens. Soft prompt discovery is a mathematical science; gradient-based optimizers and large curated datasets are used to generate highly performant prompts for specialized tasks.
In doing so, we unlock the ability to learn hard prompts via optimization. Learned hard prompts combine the ease and automation of soft prompts with the portability, flexibility, and simplicity of hard prompts.
Yes, that is correct. "Hard" prompts are typically made up of discrete and interpretable words or tokens, such as natural language queries or specific input-output pairs, that can be easily understood and processed by the AI system. In contrast, "soft" prompts are often represented by continuous feature vectors, which can capture more abstract and complex information about the input data. Soft prompts are often used in tasks such as image or speech recognition, where the input data is represented as a high-dimensional feature vector that captures various aspects of the input signal. Soft prompts can be learned automatically from large datasets, without the need for explicit human annotation or supervision. — ChatGPT
Learning Hard Prompts.
The process requires the following inputs: a frozen model $\theta$, and a sequence of learnable embeddings $P = [e_1, \ldots, e_M]$, $e_i \in \mathbb{R}^d$, where $M$ is the number of “tokens” worth of vectors to optimize and $d$ is the dimension of the embeddings. Additionally, we employ an objective function $\mathcal{L}$.
The discreteness of the token space is realized using a projection function, $\text{Proj}_E$, that takes the individual embedding vectors $e_i$ in the prompt and projects them onto their nearest neighbor in the model's token embedding matrix $E$.
Formally, to learn a hard prompt, we minimize the following risk by measuring the performance of $P'$ on the task data: $R(P') = \mathbb{E}_D\big[\mathcal{L}(\theta(B(P', X)), Y)\big]$.
In other words, to learn a hard prompt, the projected prompt is broadcast with the input data $X$, embedded, and fed through the model; the model's output is compared against $Y$ by the loss, and the prompt is updated in the direction that minimizes that loss.
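As a rough illustration (not the authors' released code), the projection $\text{Proj}_E$ can be written as a nearest-neighbor lookup into the frozen token embedding matrix. The helper name `project_to_vocab` and the use of Euclidean distance are assumptions made for this sketch:

```python
import torch

def project_to_vocab(prompt_embeds, embedding_matrix):
    # prompt_embeds:    (M, d) continuous soft-prompt vectors P
    # embedding_matrix: (V, d) frozen vocabulary embeddings E
    dists = torch.cdist(prompt_embeds, embedding_matrix)  # (M, V) pairwise distances
    token_ids = dists.argmin(dim=-1)                       # nearest vocabulary entry per vector
    projected = embedding_matrix[token_ids]                # P' = Proj_E(P), a readable hard prompt
    return projected, token_ids
```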
Our Method
We propose a simple but efficient gradient-based discrete optimization algorithm that combines the advantages of the baseline discrete optimization methods and soft prompt optimization. The steps of our scheme, which we call PEZ, are concretely defined in Algorithm 1.
The method maintains continuous iterates, which in our applications correspond to a soft prompt. During each forward pass, we first project the current embeddings $P$ onto their nearest neighbors $P'$ before calculating the gradient. Then, using the gradient of the discrete vectors $P'$, we update the continuous/soft iterate $P$.
PEZ is thus an optimization scheme that combines the advantages of hard and soft prompts while always maintaining a corresponding soft prompt. During each forward pass, $P$ is projected to $P'$, the gradient is computed at $P'$, and the continuous soft iterate $P$ is then updated by subtracting that gradient scaled by the learning rate, as in the sketch below.
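A minimal sketch of one such update, reusing the hypothetical `project_to_vocab` helper above and a generic `loss_fn` standing in for $\mathcal{L}(\theta(B(P', X)), Y)$ (an illustration under those assumptions, not the reference PEZ implementation):

```python
import torch

def pez_step(prompt_embeds, embedding_matrix, loss_fn, lr=0.1):
    # 1. Project the continuous prompt P onto discrete token embeddings P'
    projected, token_ids = project_to_vocab(prompt_embeds, embedding_matrix)
    projected = projected.detach().requires_grad_(True)

    # 2. Evaluate the task loss on the projected (hard) prompt
    loss = loss_fn(projected)
    loss.backward()

    # 3. Apply the gradient taken at P' to the continuous iterate P
    with torch.no_grad():
        prompt_embeds -= lr * projected.grad
    return loss.item(), token_ids
```

After the final step, the projected `token_ids` decode to the learned hard prompt.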
With multimodal vision-language models like CLIP (Radford et al., 2021), we can use PEZ to discover captions which describe one or more target images.
We can also discover prompts using these pre-trained text encoders that are directly relevant for downstream diffusion models.
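For instance, a CLIP-style objective could serve as the `loss_fn` in the update above. In this sketch, `encode_text_from_embeds` is an assumed hook that runs CLIP's text encoder directly on prompt embeddings (the real interface depends on the CLIP implementation used), and `image_features` are precomputed, L2-normalized CLIP features of the target image(s):

```python
import torch.nn.functional as F

def clip_caption_loss(projected_prompt, image_features, encode_text_from_embeds):
    # image_features: (N, d) pre-normalized CLIP features of the target image(s)
    text_features = F.normalize(encode_text_from_embeds(projected_prompt), dim=-1)
    # Higher cosine similarity to the target image(s) -> lower loss
    return (1 - (text_features * image_features).sum(dim=-1)).mean()
```

Minimizing this loss with the PEZ updates above drives the discrete prompt toward tokens whose CLIP text features match the target images, i.e. toward a human-readable caption.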