ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation

Despite the progress, existing self-supervised pre-trained [30] generators are far from perfect. A primary challenge lies in aligning models with human preference, as the pre-training distribution is noisy and differs from the actual user-prompt distributions.

Problem statement: aligning models with human preference is the main challenge. The pre-training distribution is noisy and differs from actual user-prompt distributions.

In natural language processing, researchers have employed reinforcement learning from human feedback (RLHF) [49, 33, 36] to guide large language models [6, 7, 58, 44, 57] towards human preferences and values.

In NLP, reinforcement learning from human feedback (RLHF) is used to guide large language models toward human preferences and values.

1. ImageReward


Prompt Selection and Image Collection

Training a human preference RM requires a diverse prompt distribution that covers and represents users' authentic usage. In ImageReward, we derive prompts and model outputs from DiffusionDB [52], an open-source dataset consisting of millions of prompts and the images Stable Diffusion [41] generated for them, drawn from real human usage.
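Below is a minimal sketch of pulling such prompt–image pairs from DiffusionDB through the Hugging Face `datasets` hub; the dataset is published as `poloclub/diffusiondb`, but the subset name and column names reflect its dataset card as I understand it, so verify them before relying on this.

```python
# Minimal sketch: sample prompt–image pairs from DiffusionDB.
# "2m_random_1k" is one of the published subsets (a random 1k sample of the
# 2M split). Recent `datasets` versions may require trust_remote_code=True
# because DiffusionDB ships a loading script.
from datasets import load_dataset

dataset = load_dataset("poloclub/diffusiondb", "2m_random_1k",
                       split="train", trust_remote_code=True)

# Each record pairs a real user prompt with the Stable Diffusion output
# and the sampling settings used to generate it.
sample = dataset[0]
print(sample["prompt"])               # raw user-written prompt
print(sample["image"].size)           # PIL image generated by Stable Diffusion
print(sample["cfg"], sample["step"])  # guidance scale and sampling steps
```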

To ensure the diversity and representativeness of the topic distribution in the selected prompts, we employ a graph-based selection algorithm based on prompt similarities produced by language models [50, 40, 48].

A human preference RM needs a diverse prompt distribution. The authors take prompts and model outputs from DiffusionDB, an open-source dataset of prompts and the images Stable Diffusion generated for them. For diversity and representativeness, they use a graph-based (e.g., kNN-style) selection algorithm over prompt similarities.
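The section does not spell out the selection procedure, but one plausible graph-based diversity heuristic looks like the sketch below: embed prompts with a sentence encoder, connect each prompt to its k nearest neighbors, and greedily pick prompts that cover the most not-yet-covered neighbors. The encoder name and the greedy max-coverage rule are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

def select_diverse_prompts(prompts, n_select, k=10):
    # Embed prompts; "all-MiniLM-L6-v2" is an assumed encoder choice.
    model = SentenceTransformer("all-MiniLM-L6-v2")
    emb = model.encode(prompts, normalize_embeddings=True)
    sim = emb @ emb.T  # cosine similarity matrix

    # kNN graph: each prompt links to its k most similar prompts
    # (column 0 of the argsort is the prompt itself, so skip it).
    neighbors = np.argsort(-sim, axis=1)[:, 1 : k + 1]

    covered = np.zeros(len(prompts), dtype=bool)
    picked = np.zeros(len(prompts), dtype=bool)
    selected = []
    for _ in range(min(n_select, len(prompts))):
        # Greedy max-coverage: pick the prompt whose neighborhood adds
        # the most uncovered nodes, keeping the selection spread out.
        gains = np.array([
            -1 if picked[i] else np.sum(~covered[nbrs]) + (not covered[i])
            for i, nbrs in enumerate(neighbors)
        ])
        best = int(np.argmax(gains))
        selected.append(prompts[best])
        picked[best] = True
        covered[best] = True
        covered[neighbors[best]] = True
    return selected
```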

Human Annotation Design


Prompt Annotation.

Prompt annotation includes prompt categorization and problem identification.

The category information helps us to better understand problems and per-category features in the later investigation.

Prompt annotation covers prompt categorization and problem identification. The category information helps the authors better understand problems and per-category features later on.

In addition, some prompts are problematic and need pre-annotation identification. For example, some are identified as ambiguous and unclear (e.g., "a brand new medium", "low quality", etc). Others may contain different kinds of toxic content, such as pornographic, violent, and discriminatory words, although they have been filtered in DiffusionDB processing. Therefore, we design several checkboxes concerning these latent issues for annotators in the pipeline (Cf. Appendix B).

Additionally, certain prompts are problematic and need to be identified before annotation. Unclear or toxic prompts can slip through even though DiffusionDB processing filters for them, so the authors add checkboxes to the pipeline that let annotators flag these latent issues.
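As a concrete illustration, the checkbox design could be modeled as a simple record per prompt. The flag names below are inferred from the issues listed above (ambiguity plus pornographic, violent, and discriminatory content) and are assumptions, not the paper's actual schema; Appendix B of the paper has the real design.

```python
from dataclasses import dataclass

@dataclass
class PromptAnnotation:
    # Assumed schema, inferred from the issues described in the notes.
    prompt: str
    category: str                        # per-prompt topic category
    is_ambiguous: bool = False           # e.g. "a brand new medium", "low quality"
    has_pornographic_content: bool = False
    has_violent_content: bool = False
    has_discriminatory_content: bool = False

    def is_problematic(self) -> bool:
        """A prompt needs review if any checkbox is ticked."""
        return any([
            self.is_ambiguous,
            self.has_pornographic_content,
            self.has_violent_content,
            self.has_discriminatory_content,
        ])
```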