Summary


While much progress has been made on vision and language encoders, computer vision includes a wide range of problems beyond this scope, and for many of these, abundant training data does not exist.

Problem statement: language and vision encoders have advanced a great deal, but computer vision covers many problems outside that scope, and for many of them abundant training data does not exist.

<aside> 💡 These “foundation models” [8] can generalize to tasks and data distributions beyond those seen during training. In short, a foundation model generalizes to tasks and data distributions it did not see during training (zero-shot).

</aside>

In this work, our goal is to build a foundation model for image segmentation. That is, we seek to develop a promptable model and pre-train it on a broad dataset using a task that enables powerful generalization.

The authors' goal is to build a foundation model for image segmentation. They develop a promptable model and pre-train it on a broad dataset, using a task that enables powerful generalization.

We start by defining a promptable segmentation task that is general enough to provide a powerful pretraining objective and to enable a wide range of downstream applications. This task requires a model that supports flexible prompting and can output segmentation masks in real-time when prompted to allow for interactive use. To train our model, we need a diverse, large-scale source of data.

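The authors define a promptable segmentation task that is general enough to serve as a powerful pretraining objective and to cover a wide range of downstream applications. The task calls for a model that supports flexible prompting and outputs segmentation masks in real time when prompted, so that it can be used interactively. Training such a model requires a diverse, large-scale source of data.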

Task

we propose the promptable segmentation task, where the goal is to return a valid segmentation mask given any segmentation prompt (see Fig. 1a).

The goal is to return a valid segmentation mask for any segmentation prompt.
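The task contract above (any prompt in, a valid mask out) can be sketched as a small interface. This is a minimal illustration, not the paper's model: `Prompt`, `segment`, and the geometric rules inside it are hypothetical names and stand-ins introduced here, and the sketch only shows that every supported prompt yields a mask with the same size as the image.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Mask = List[List[bool]]  # mask[y][x] is True where the pixel belongs to the object


@dataclass
class Prompt:
    """Hypothetical prompt container: a foreground point, a box, or both."""
    point: Optional[Tuple[int, int]] = None          # (x, y)
    box: Optional[Tuple[int, int, int, int]] = None  # (x0, y0, x1, y1), inclusive


def segment(height: int, width: int, prompt: Prompt) -> Mask:
    """Toy stand-in for a promptable segmenter (not a learned model).

    A box prompt selects the pixels inside the box; a point prompt selects
    the 3x3 patch around the point. If both are given, the box wins.
    """
    if prompt.point is None and prompt.box is None:
        raise ValueError("prompt must contain a point or a box")
    if prompt.box is not None:
        x0, y0, x1, y1 = prompt.box
    else:
        px, py = prompt.point
        x0, y0, x1, y1 = px - 1, py - 1, px + 1, py + 1
    return [[x0 <= x <= x1 and y0 <= y <= y1 for x in range(width)]
            for y in range(height)]


# Two different prompt types on the same 6x8 "image" both return a full-size mask.
mask_from_point = segment(6, 8, Prompt(point=(2, 2)))
mask_from_box = segment(6, 8, Prompt(box=(1, 1, 4, 3)))
```

A real model would replace the geometric rules with learned predictions; the point of the sketch is only the shape of the input and output.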

Model

In particular, the model must support flexible prompts, needs to compute masks in amortized real-time to allow interactive use, and must be ambiguity-aware.

The model must support flexible prompts, compute masks in amortized real time so that it can be used interactively (in other words, the per-prompt computation must be lightweight), and be *ambiguity-aware*.
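One way to read "amortized real-time" and "ambiguity-aware" together is sketched below. It assumes that the expensive work is done once per image and cached, and that an ambiguous prompt returns several scored candidates instead of one mask. The class `PromptableSegmenter`, its methods, the candidate labels, and the scores are all hypothetical placeholders, not the API of any real library.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]


class PromptableSegmenter:
    """Hypothetical wrapper; names and values are illustrative only."""

    def __init__(self) -> None:
        self.encoder_calls = 0
        self._embeddings: Dict[str, int] = {}

    def _encode_image(self, image_id: str) -> int:
        # Stand-in for the expensive step (a real model would run a large network).
        self.encoder_calls += 1
        return len(image_id)

    def set_image(self, image_id: str) -> None:
        # Amortization: pay the expensive cost once per image, then cache it.
        if image_id not in self._embeddings:
            self._embeddings[image_id] = self._encode_image(image_id)

    def predict(self, image_id: str, point: Point) -> List[Tuple[str, float]]:
        # Cheap per-prompt step: it only reads the cached embedding.
        if image_id not in self._embeddings:
            raise KeyError("call set_image() before predict()")
        _ = self._embeddings[image_id]
        # Ambiguity-aware output: several candidates with placeholder scores,
        # sorted so the most confident one comes first.
        candidates = [("subpart", 0.6), ("part", 0.7), ("whole", 0.9)]
        return sorted(candidates, key=lambda c: c[1], reverse=True)


model = PromptableSegmenter()
model.set_image("photo.jpg")
for click in [(10, 20), (30, 40), (50, 60)]:
    results = model.predict("photo.jpg", click)
```

After three prompts on the same image, `model.encoder_calls` is still 1: the per-image cost is spread over every prompt that follows, which is what keeps each interaction fast.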

Data engine

Our solution is to build a “data engine”, i.e., we co-develop our model with model-in-the-loop dataset annotation (see Fig. 1c). Our data engine has three stages: assisted-manual, semi-automatic, and fully automatic.

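The aim is strong generalization to new data distributions. Training data would normally be collected from the internet, but segmentation masks are not naturally abundant there, so the authors devised a different approach: the data engine. The data engine proceeds in three stages.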
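The model-in-the-loop idea can be sketched as a loop over the three named stages. This is a rough illustration under stated assumptions: the function `run_data_engine`, the string records, and the "retrain after each stage" schedule are hypothetical, and the split of human versus model effort inside each stage is deliberately not modeled.

```python
from enum import Enum
from typing import List


class Stage(Enum):
    ASSISTED_MANUAL = "assisted-manual"
    SEMI_AUTOMATIC = "semi-automatic"
    FULLY_AUTOMATIC = "fully automatic"


def run_data_engine(images_per_stage: int) -> List[str]:
    """Toy loop: each stage adds annotations, then the model is 'retrained'."""
    dataset: List[str] = []
    model_version = 0
    for stage in Stage:  # Enum iteration follows definition order
        for i in range(images_per_stage):
            # Placeholder for one annotated image produced with the current model.
            dataset.append(f"{stage.value}:v{model_version}:image{i}")
        # Stand-in for retraining on the enlarged dataset (assumed schedule).
        model_version += 1
    return dataset


log = run_data_engine(2)
```

Each record in `log` notes which stage and which model version produced it, which makes the co-development of model and dataset visible: later stages are annotated with a model trained on the output of earlier ones.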

1. Segment Anything Task


Task