By integrating knowledge and skills from different domains, Open-domain Model Synthesis (OMS) holds the potential to drive the development of artificial general intelligence (AGI), enabling AI to solve a diverse array of problems and tasks. While current research in this field has made some preliminary attempts, several notable challenges remain to be addressed: 1) Extensibility: several existing works employ a fixed number of models, such as WebGPT [18] and ToolFormer [32], making it difficult to expand their capabilities; 2) Nonlinear Task Planning: the majority of current research is limited to tasks with linear task-planning solutions [37, 11], meaning that each sub-task must be completed before the next sub-task can start. However, linear planning of models may not suffice for solving complicated tasks; moreover, many tasks involve multiple multi-modal inputs; 3) Quantitative Evaluation: many existing works, such as HuggingGPT [33], provide only qualitative results. This makes it difficult to assess the planning capabilities of LLMs and to determine whether the employed strategies are optimal.
Note: Open-domain Model Synthesis (OMS) has the potential to drive the development of artificial general intelligence (AGI), but problems remain. (1) Extensibility: existing works employ a fixed number of models, so it is hard to attempt anything beyond those models' capabilities. (2) Nonlinear Task Planning: current work is limited to linear task planning, i.e., each sub-task can start only after the previous sub-task is completed; this is insufficient for complicated tasks, and many tasks additionally involve multiple multi-modal inputs. (3) Quantitative Evaluation: many works, such as HuggingGPT, provide only qualitative results, which makes it hard to evaluate the planning ability of LLMs and to decide whether the employed strategies are optimal.
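The distinction between linear and nonlinear task planning can be made concrete: a linear plan is a chain in which each sub-task waits on exactly one predecessor, while a nonlinear plan is a DAG in which a sub-task (e.g., VQA) may depend on several independent branches. A minimal sketch using Python's standard-library topological sorter (the task names here are illustrative, not taken from the benchmark):

```python
from graphlib import TopologicalSorter

# Linear plan: a strict chain -- each step waits on exactly one predecessor.
linear_plan = {
    "deblur": set(),
    "denoise": {"deblur"},
    "colorize": {"denoise"},
}

# Nonlinear plan: a DAG -- "vqa" needs BOTH the restored image and the
# processed question, so the two branches must be completed first but
# are independent of each other.
nonlinear_plan = {
    "deblur": set(),        # image branch
    "fill_mask": set(),     # text branch
    "vqa": {"deblur", "fill_mask"},
}

def execution_order(plan):
    """Return one valid execution order for a plan given as
    {task: set(of prerequisite tasks)}."""
    return list(TopologicalSorter(plan).static_order())

print(execution_order(linear_plan))  # ['deblur', 'denoise', 'colorize']
print(execution_order(nonlinear_plan))
```

A linear plan has exactly one valid ordering; the nonlinear plan admits several, which is precisely what makes planning (and evaluating planners) harder.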
In order to mitigate the above limitations, we develop a platform that encompasses a diverse array of domain-specific expert models and intricate multi-step tasks with single or multiple multi-modal inputs, supported by corresponding datasets.
Note: to mitigate the above limitations, the authors develop a platform encompassing many domain-specific expert models and intricate multi-step tasks with single or multiple multi-modal inputs, supported by corresponding datasets.
We employ the data augmentation techniques discussed above to augment the raw datasets. Specifically, for tasks with image inputs, we can choose one or more techniques from the image augmentation method set {Gaussian Blur, Gaussian Noise, Grayscale, Low Resolution} to generate a compositionally augmented image input, which necessitates a multi-step image restoration process for recovery. Similarly, for tasks with text inputs or outputs, we choose one or more from {Translation, Word Mask} to generate a compositionally augmented text input or output. Furthermore, Visual Question Answering (VQA) and Question Answering (QA) are tasks with multiple multi-modal inputs, yielding natural tasks that cannot be solved with linear task-planning solutions. Lastly, we integrate both aspects to construct complex, multi-step tasks. In total, we generate 185 complex multi-step tasks, with 117 tasks featuring a linear task structure and the remaining 68 tasks exhibiting a non-linear task structure.
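The compositional augmentation described above amounts to sampling a subset of corruption operators and applying them in sequence, so that recovery requires one restoration step per operator, in reverse order. A minimal pure-Python sketch of the image side (the operator implementations and the toy image are illustrative stand-ins, not the platform's actual augmentation code):

```python
import random

def gaussian_noise(img, sigma=10.0, rng=None):
    """Add clipped Gaussian noise to a grayscale image (list of rows)."""
    rng = rng or random.Random(0)
    return [[min(255, max(0, int(p + rng.gauss(0, sigma)))) for p in row]
            for row in img]

def low_resolution(img, factor=2):
    """Downsample by keeping every `factor`-th pixel in each dimension."""
    return [row[::factor] for row in img[::factor]]

def compose(img, ops):
    """Apply a sampled sequence of corruption operators in order; a
    matching restoration pipeline must undo them in reverse order."""
    for op in ops:
        img = op(img)
    return img

img = [[i * 16 + j for j in range(16)] for i in range(16)]  # toy 16x16 image
corrupted = compose(img, [gaussian_noise, low_resolution])
print(len(corrupted), len(corrupted[0]))  # 8 8
```

The same pattern applies on the text side, with operators drawn from {Translation, Word Mask} instead.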
A selection of task samples, along with their corresponding input and output data samples, can be found in Table 4. For illustration, consider the third row of Table 4, which represents a machine translation domain task (i.e., translating from English to German).
Note: each task comes with corresponding input and output data samples, as shown in Table 4. One or more operations (e.g., Word Mask, Blur) can be selected and composed, and this is how the multi-step tasks are constructed and solved. (Details are not given here.)
Given that OpenAGI comprises a diverse range of domain tasks with multi-modal data, we classify them according to domain tasks as well as input and output types. We then assess their performance using the following three metrics:
Note: a suitable score is assigned to each kind of task, as described above.
One potential method to improve the capabilities of LLMs is to incorporate reinforcement learning (RL) techniques. By leveraging the strengths of RL, LLMs can gain additional insights from trial-and-error experiences, leading to more robust and adaptive models, especially in situations where labeled data is scarce or when tasks involve physical interactions. In this work, we propose Reinforcement Learning from Task Feedback (RLTF), which utilizes task feedback to supply more information that guides the learning direction of LLMs, resulting in improved and more efficient strategies.
Note: RL is combined with LLMs to improve their capabilities. The authors propose Reinforcement Learning from Task Feedback (RLTF), which uses task feedback to supply additional information that guides the learning direction of the LLM, yielding improved and more efficient strategies.
In the setup of RLTF, the environment is the proposed OpenAGI platform and the agent is the LLM L parameterized with Φ. The solution s generated by the LLM can be seen as a set of instructions that solve the input task t and can be executed on the corresponding augmented dataset Dt. We can use the performance (provided in Sec. 3.4) on that dataset as the reward signal R and use reinforcement learning to fine-tune the LLM. More concretely, to find the optimal solution, we require the LLM to maximize its expected reward on the training set Ttrain, represented by J(Φ):
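The objective introduced by the colon above is the standard expected-reward formulation; one plausible reconstruction consistent with the symbols defined here (the exact form in the paper may differ):

```latex
J(\Phi) \;=\; \mathbb{E}_{t \sim \mathcal{T}_{\text{train}}}\!\left[\, \mathbb{E}_{s \sim L_{\Phi}(t)}\big[\, R(s, \mathcal{D}_t) \,\big] \right]
```

That is, the LLM's parameters Φ are tuned so that solutions sampled for training tasks achieve high task performance on the corresponding augmented datasets.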
Note: briefly, the LLM L is parameterized by Φ. The solution s generated by the LLM solves the input task t and is executed on the corresponding augmented dataset Dt. The performance on that dataset serves as the reward signal R, and reinforcement learning is used to fine-tune the LLM. More precisely, the expected reward on the training set is maximized, following the formula above.
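The optimization idea can be sketched in miniature: treat the "policy" as a softmax over a handful of candidate solutions, use the task metric as the reward R, and ascend the gradient of the expected reward. This toy uses the exact expected gradient of a softmax policy instead of sampling, for determinism; the candidate plans and reward values are made up for illustration and are not from the paper:

```python
import math

# Toy stand-in for RLTF: a softmax "policy" over candidate plans,
# with the task metric on the augmented dataset playing the reward R.
CANDIDATES = ["deblur->denoise", "denoise->deblur", "colorize_only"]
REWARD = [1.0, 0.4, 0.1]  # hypothetical R(s, Dt) per candidate solution s

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def train(steps=2000, lr=0.1):
    """Gradient ascent on J(phi) = sum_i p_i * R_i.  For a softmax
    policy, dJ/dphi_j = p_j * (R_j - E[R])."""
    phi = [0.0] * len(CANDIDATES)
    for _ in range(steps):
        p = softmax(phi)
        expected_r = sum(pi * ri for pi, ri in zip(p, REWARD))
        phi = [phi_j + lr * p_j * (r_j - expected_r)
               for phi_j, p_j, r_j in zip(phi, p, REWARD)]
    return phi

phi = train()
p = softmax(phi)
print(CANDIDATES[p.index(max(p))])  # -> deblur->denoise
```

In RLTF proper, the "policy" is the full LLM generating instruction sequences, and gradients are estimated from sampled solutions rather than computed exactly; this sketch only shows why task feedback as a reward steers the planner toward higher-scoring solutions.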