
SyncOOD: Increasing the Robustness of OOD Object Detection with Automated Data | ECCV'24

Published: 2024-11-06

This article is a summary of the key points of a published paper, not a direct translation, and is intended for scholarly exchange. Please contact the account owner about any infringement issue so that it can be removed.

Paper: Can OOD Object Detectors Learn from Foundation Models?

  • Paper address: /abs/2409.05162
  • Code: /CVMI-Lab/SyncOOD

Innovations


  • Investigates and uncovers the potential of text-to-image generative models, trained on large-scale open-set data, to synthesize OOD objects for the object detection task.
  • Introduces an automated data curation pipeline that synthesizes controllable, annotated scene-level OOD images for OOD object detection. The pipeline uses large language models (LLMs) for novel object discovery and visual foundation models for data annotation and filtering.
  • Finds that keeping the image context consistent between ID and OOD data and obtaining accurate OOD bounding-box annotations for the synthesized data are critical to their effectiveness in OOD object detection.
  • Comprehensive experiments on multiple benchmarks demonstrate the effectiveness of the method, which significantly outperforms existing state-of-the-art methods while using minimal synthetic data.

Overview


Out-of-distribution (OOD) object detection is a challenging task due to the lack of open-set OOD data. Inspired by recent advances in text-to-image generative models such as Stable Diffusion, the paper studies the potential of generative models trained on large-scale open-set data to synthesize OOD samples and thereby enhance OOD object detection.

The paper presents SyncOOD, a simple data curation method. It harnesses the power of large foundation models to automatically extract meaningful OOD data from text-to-image generative models, giving the detector access to the open-world knowledge embedded in off-the-shelf foundation models. The synthesized OOD samples are then used to train a lightweight, plug-and-play OOD detector, effectively optimizing the in-distribution (ID)/OOD decision boundary.

Extensive experiments on multiple benchmarks show that SyncOOD significantly outperforms existing methods, establishing new state-of-the-art performance with minimal use of synthetic data.

SyncOOD


The anomaly synthesis pipeline consists of two parts:

  1. Synthesizing a set of effective, photo-realistic scene-level OOD images, denoted \(\mathcal{D}_{\text{edit}} = \left\{(\textbf{x}^{\text{edit}}, \textbf{b}^{\text{edit}})\right\}\), where each image \(\textbf{x}^{\text{edit}}\) contains a novel object together with its corresponding bounding box \(\textbf{b}^{\text{edit}}\). This is achieved through fully automated region-level editing of images from \(\mathcal{D}_{\text{id}}\).
  2. Selecting effective synthetic data to provide pseudo-OOD supervision for training the OOD object detector, used jointly with the ID samples from the training set.

Synthesizing new semantic objects

  • Imagining novel-concept objects from in-distribution objects

As shown in Fig. (a), starting from the ID labels \(\mathcal{Y}_{\text{id}}\) of the training set \(\mathcal{D}_{\text{id}}\), the broad knowledge and reasoning ability of a large language model (LLM, e.g. GPT-4) is used to imagine, for each ID object label, a set of novel objects \(\mathcal{Y}_{\text{novel}}\) that are visually similar and contextually compatible, while keeping the imagined objects semantically separable from the ID objects. Prompts containing in-context examples ask the LLM to propose plausible novel objects that could replace existing ID objects, which grounds the imagination in the context of the ID objects.
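The novel-object imagination step can be sketched as prompt construction for an LLM. The paper does not publish its exact prompt, so the template and in-context examples below are illustrative assumptions; only the query structure (visually similar, context-compatible, semantically distinct from all ID labels) follows the description above.

```python
# Sketch of the LLM-based novel-object imagination step (Fig. (a)).
# The prompt template and example pairs are assumptions for illustration,
# not the exact prompt used by SyncOOD.

def build_novel_object_prompt(id_label: str, all_id_labels: list,
                              n_candidates: int = 5) -> str:
    """Build a prompt asking an LLM (e.g. GPT-4) for novel objects that are
    visually similar and context-compatible with `id_label`, yet semantically
    distinct from every in-distribution label."""
    examples = (
        "Example: ID object 'horse' -> novel objects: camel, donkey, alpaca\n"
        "Example: ID object 'car'   -> novel objects: golf cart, rickshaw, snowmobile\n"
    )
    return (
        f"{examples}"
        f"ID object: '{id_label}'. Propose {n_candidates} novel objects that "
        f"could plausibly replace it in the same scene (similar size and "
        f"context), but are NOT any of: {', '.join(all_id_labels)}.\n"
        f"Answer with a comma-separated list."
    )

prompt = build_novel_object_prompt("dog", ["dog", "cat", "person"])
# The prompt would then be sent to the LLM (e.g. via a chat-completion API)
# and the returned comma-separated list parsed into Y_novel.
```
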

  • Editing objects in a specified region

To place a new concept \(y_j \in \textbf{y}^{\text{novel}}_i\) into an existing image, an existing ID object labeled \(y_i^{\text{id}}\) is replaced in place, rather than finding a new location or generating an image from scratch. Because the surrounding context is preserved, contextual compatibility is ensured and disturbance to the scene context is avoided.

As shown in Fig. (b), Stable-Diffusion-Inpainting (SDI) performs region-level editing on the ID image to obtain an edited image \(\textbf{x}^{\text{edit}}\) containing the new object:

\[\begin{equation} \textbf{x}^{\text{edit}}=\text{SDI}(\textbf{x}^{\text{id}},\textbf{b}^{\text{id}},\textbf{y}^{\text{novel}}). \label{eq:sdi} \end{equation} \]
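The region-level edit needs a binary mask covering \(\textbf{b}^{\text{id}}\). A minimal sketch follows: the mask rasterization is executable, while the commented `diffusers` call shows how it could feed Stable-Diffusion-Inpainting. The model id and prompt template there are assumptions, not the paper's exact settings.

```python
import numpy as np

def box_to_mask(height: int, width: int, box: tuple) -> np.ndarray:
    """Rasterize a bounding box b_id into a binary inpainting mask:
    1 inside the region to repaint, 0 elsewhere."""
    x0, y0, x1, y1 = box
    mask = np.zeros((height, width), dtype=np.uint8)
    mask[y0:y1, x0:x1] = 1
    return mask

mask = box_to_mask(512, 512, (100, 120, 300, 340))

# With the mask in hand, the edit via Stable-Diffusion-Inpainting could look
# like the following (not executed here; requires GPU and model weights, and
# the checkpoint name is an assumption):
#
# from diffusers import StableDiffusionInpaintPipeline
# pipe = StableDiffusionInpaintPipeline.from_pretrained(
#     "stabilityai/stable-diffusion-2-inpainting")
# x_edit = pipe(prompt=f"a photo of a {y_novel}",
#               image=x_id_pil, mask_image=mask_pil).images[0]
```
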

  • Refining the bounding boxes of new objects

Due to the randomness of the diffusion model, attributes of the edited object such as quality, size, and position may not match the original object box. To solve this, as shown in Fig. (c), a simple and efficient SAM-based refiner is designed to obtain an accurate bounding box for the new object.

Using the padded \(\textbf{b}^{\text{id}}\) of the edited region as a prompt, SAM outputs the highest-confidence instance mask \(\textbf{m}^{\text{SAM}}\) for the new object in that region:

\[\begin{equation} \textbf{m}^{\text{SAM}}=\text{SAM}(\textbf{x}^{\text{edit}};\text{padding}(\textbf{b}^{\text{id}}, e)), \label{eq:sam} \end{equation} \]
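The \(\text{padding}(\textbf{b}^{\text{id}}, e)\) step can be sketched as a small helper that expands the box and clips it to image bounds. The margin value and the `segment-anything` `SamPredictor` usage in the comment are illustrative assumptions.

```python
def pad_box(box: tuple, e: int, height: int, width: int) -> tuple:
    """Expand b_id by margin e on each side (the padding(b_id, e) in the
    equation above), clipped to image bounds, to form the SAM box prompt."""
    x0, y0, x1, y1 = box
    return (max(0, x0 - e), max(0, y0 - e),
            min(width, x1 + e), min(height, y1 + e))

box_prompt = pad_box((100, 120, 300, 340), e=20, height=512, width=512)
# -> (80, 100, 320, 360)

# SAM (segment-anything's SamPredictor) would then be queried with this box,
# keeping the highest-confidence mask as m_SAM:
#
# masks, scores, _ = predictor.predict(box=np.array(box_prompt),
#                                      multimask_output=True)
# m_sam = masks[scores.argmax()]
```
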

The obtained mask \(\textbf{m}^{\text{SAM}}\) is converted into a bounding box \(\textbf{b}^{\text{SAM}}\), and the intersection-over-union (IoU) between \(\textbf{b}^{\text{SAM}}\) and the corresponding \(\textbf{b}^{\text{id}}\) is computed to filter out new objects whose scale changed too much:

\[\begin{equation} \left\{\textbf{b}^{\text{edit}}\right\}=\left\{\left.\textbf{b}^{\text{SAM}}\middle|\right.\text{IoU}(\textbf{b}^{\text{SAM}},\textbf{b}^{\text{id}})>\gamma\right\}, \label{eq:iou} \end{equation} \]
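The mask-to-box conversion and IoU filter above can be sketched in a few lines of NumPy; the threshold value \(\gamma\) is a hyperparameter, and 0.5 below is only a placeholder.

```python
import numpy as np

def mask_to_box(mask: np.ndarray) -> tuple:
    """Convert a binary instance mask m_SAM into a tight (x0, y0, x1, y1)
    bounding box b_SAM."""
    ys, xs = np.nonzero(mask)
    return (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)

def iou(a: tuple, b: tuple) -> float:
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def filter_boxes(sam_boxes: list, id_box: tuple, gamma: float = 0.5) -> list:
    """Keep only b_SAM whose IoU with b_id exceeds gamma, discarding edits
    whose scale drifted too far from the original object."""
    return [b for b in sam_boxes if iou(b, id_box) > gamma]

m = np.zeros((10, 10), dtype=np.uint8)
m[2:5, 3:8] = 1
b_sam = mask_to_box(m)  # tight box around the mask
```
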

Discovering hard OOD samples and model training

  • Mining Hard OOD Objects with High Visual Similarities for Training

New objects that the object detector is most likely to confuse with the original ID objects are regarded as the most effective. Therefore, based on pairwise similarity in the latent space of the pretrained object detector, the synthesized OOD samples most confusable with ID objects are identified.

For an off-the-shelf object detector \(\mathcal{F}_\text{det}\), latent features \(\textbf{z}^{\text{edit}}\) and \(\textbf{z}^{\text{id}}\) are extracted for each pair and filtered by similarity to provide pseudo-OOD supervision:

\[\begin{equation} \textbf{z}^{\text{edit}},\textbf{z}^{\text{id}}=\mathcal{F}_\text{det}(\textbf{b}^{\text{edit}};\textbf{x}^{\text{edit}}),\mathcal{F}_\text{det}(\textbf{b}^{\text{id}};\textbf{x}^{\text{id}}). \label{eq:extract} \end{equation} \]

\[\begin{equation} \left\{\textbf{z}^{\text{ood}}\right\}=\left\{\left.\textbf{z}^{\text{edit}}\middle|\right.\epsilon_{\textit{low}}<\text{sim}(\textbf{z}^{\text{edit}},\textbf{z}^{\text{id}})<\epsilon_{\textit{up}}\right\}, \label{eq:sim} \end{equation} \]
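The band-pass similarity filter above can be sketched as follows. The source does not specify the similarity function, so cosine similarity is an assumption here, and the threshold values \(\epsilon_{low}\), \(\epsilon_{up}\) are placeholders.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity; the paper's sim() is unspecified, this is a guess."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def mine_hard_ood(z_edit_list, z_id_list, eps_low=0.3, eps_up=0.9):
    """Keep edited-object features whose similarity to the paired ID feature
    lies in (eps_low, eps_up): similar enough to be confusing with the ID
    object, but not so similar as to be a near-duplicate of it."""
    return [z_e for z_e, z_i in zip(z_edit_list, z_id_list)
            if eps_low < cosine_sim(z_e, z_i) < eps_up]
```
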

  • Optimizing the ID/OOD decision boundary with synthetic samples

Given the ID and synthesized OOD objects, a lightweight multilayer perceptron (MLP) \(\mathcal{F}_\text{ood}\) serves as the OOD detector and is trained with a binary loss:

\[\begin{equation} \mathcal{L}_\text{ood}=\mathbb{E}_{\textbf{z}\sim\textbf{z}^{\text{id}}}\left[-\log\frac{1}{1 + e^{-\mathcal{F}_\text{ood}(\textbf{z})}}\right]+\mathbb{E}_{\textbf{z}\sim\textbf{z}^{\text{ood}}}\left[-\log\frac{e^{-\mathcal{F}_\text{ood}(\textbf{z})}}{1+e^{-\mathcal{F}_\text{ood}(\textbf{z})}} \right]. \label{eq:optim} \end{equation} \]
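This objective is a standard binary logistic loss: \(\mathcal{F}_\text{ood}\) outputs a scalar logit, ID features are pushed toward label 1 and synthetic OOD features toward label 0. A minimal NumPy sketch of the loss (the MLP itself is omitted; in practice it would be a small trainable head):

```python
import numpy as np

def ood_binary_loss(scores_id: np.ndarray, scores_ood: np.ndarray) -> float:
    """Binary logistic loss of the equation above, where scores are the
    scalar logits F_ood(z) for ID and synthesized-OOD features."""
    sig = lambda s: 1.0 / (1.0 + np.exp(-s))
    loss_id = -np.log(sig(scores_id)).mean()          # E_{z~z_id}[-log sigma(F(z))]
    loss_ood = -np.log(1.0 - sig(scores_ood)).mean()  # E_{z~z_ood}[-log(1 - sigma(F(z)))]
    return float(loss_id + loss_ood)
```

A well-separated detector (large positive logits on ID, large negative logits on OOD) drives this loss toward zero, which is exactly how the synthetic samples tighten the ID/OOD decision boundary.
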

Main experiments




For more content, follow the WeChat public account [Xiaofei's Algorithm Engineering Notes].
