This article distills the core ideas of a published paper rather than translating it directly, and is intended for academic exchange only. If there is any infringement, please contact the account owner to have it removed.
Paper: Can OOD Object Detectors Learn from Foundation Models?
- Paper Address:/abs/2409.05162
- Paper Code:/CVMI-Lab/SyncOOD
Contributions
- Investigates and unlocks the potential of text-to-image generative models trained on large-scale open-set data for synthesizing OOD objects in object detection tasks.
- Introduces an automated data curation pipeline for obtaining controllable, annotated, scene-level synthetic OOD images for OOD object detection. The pipeline uses large language models (LLMs) for novel object discovery and visual foundation models for data annotation and filtering.
- Finds that keeping the image context consistent between ID and OOD samples and obtaining more accurate OOD bounding-box annotations are both critical to the effectiveness of synthetic data in OOD object detection.
- Comprehensive experiments on multiple benchmarks demonstrate the effectiveness of the method, which significantly outperforms existing state-of-the-art methods while using minimal synthetic data.
Content overview
Out-of-distribution (OOD) object detection is a challenging task because open-set OOD data is lacking. Inspired by recent advances in text-to-image generative models such as Stable Diffusion, the paper investigates whether generative models trained on large-scale open-set data can synthesize OOD samples and thereby enhance OOD object detection.
The paper presents SyncOOD, a simple data curation method. It leverages the capabilities of large foundation models to automatically extract meaningful OOD data from text-to-image generative models, giving the model access to the open-world knowledge encapsulated in off-the-shelf foundation models. The synthesized OOD samples are then used to augment the training of a lightweight, plug-and-play OOD detector, effectively optimizing the in-distribution (ID)/OOD decision boundary.
Extensive experiments on multiple benchmarks show that SyncOOD significantly outperforms existing methods, establishing new state-of-the-art performance with minimal use of synthetic data.
SyncOOD
The anomaly synthesis pipeline consists of two parts:
- Synthesizing a set of effective, photo-realistic, scene-level OOD images \(\textbf{x}^{\text{edit}}\), denoted \(\mathcal{D}_{\text{edit}} = \left\{(\textbf{x}^{\text{edit}}, \textbf{b}^{\text{edit}})\right\}\), where each image contains novel objects and their corresponding annotation boxes \(\textbf{b}^{\text{edit}}\). This is done by fully automated region-level editing of images from \(\mathcal{D}_{\text{id}}\).
- Selecting and using effective synthetic data to provide pseudo-OOD supervision for training the OOD object detector, together with the ID samples from the training set.
Synthesizing new semantic objects
- Imagining novel concept objects from in-distribution objects
As shown in Fig. (a), starting from the ID labels \(\mathcal{Y}_{\text{id}}\) of the training set \(\mathcal{D}_{\text{id}}\), the extensive knowledge and reasoning capabilities of a large language model (LLM, e.g. GPT-4) are used to check visual similarity and contextual compatibility. For each ID object label, the LLM imagines a set of novel objects, denoted \(\mathcal{Y}_{\text{novel}}\), while maintaining semantic separability between the imagined objects and the ID objects. In this way, ID objects are associated with novel ones, and prompts containing in-context examples help conceptualize possible novel objects that can replace the existing ID objects.
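The prompting step described above can be sketched roughly as follows. This is a minimal illustration only: the prompt wording, the function names `build_novel_object_prompt` and `parse_novel_objects`, and the comma-separated reply format are all assumptions, not the prompt used in the paper, and the actual LLM call is left out.

```python
from typing import Dict, List


def build_novel_object_prompt(id_label: str, id_labels: List[str],
                              examples: Dict[str, List[str]], k: int = 3) -> str:
    """Assemble a text prompt asking an LLM for novel objects that could
    replace `id_label`. The wording is hypothetical."""
    lines = [
        "You propose novel objects that could replace a given object in a photo.",
        "Each proposal must look similar to the given object and fit the same scene,",
        "but must not belong to any of these known categories: " + ", ".join(id_labels) + ".",
        "",
    ]
    # In-context examples: ID label -> previously accepted novel objects.
    for label, novel in examples.items():
        lines.append(f"Object: {label}")
        lines.append("Novel objects: " + ", ".join(novel))
    lines.append(f"Object: {id_label}")
    lines.append(f"Novel objects (list {k}):")
    return "\n".join(lines)


def parse_novel_objects(reply: str, id_labels: List[str]) -> List[str]:
    """Parse a comma-separated reply, dropping duplicates and any ID label
    so that the novel set stays separate from the ID label set."""
    known = {label.lower() for label in id_labels}
    kept: List[str] = []
    for candidate in (c.strip().lower() for c in reply.split(",")):
        if candidate and candidate not in known and candidate not in kept:
            kept.append(candidate)
    return kept
```

Filtering the reply against the ID label set is one simple way to enforce the separability requirement after the fact; the LLM is still expected to do most of that reasoning itself.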
- Editing objects within designated regions
To generate an image containing a novel concept \(y_j \in \textbf{y}^{\text{novel}}_i\), the method replaces an existing ID object labeled \(y_i^{\text{id}}\) in an existing image, rather than finding a new location or generating an image from scratch. This ensures contextual compatibility and, because the scene context is preserved, eliminates interference from context changes.
As shown in Fig. (b), Stable-Diffusion-Inpainting is used to edit the ID image at the region level, yielding an edited image \(\textbf{x}^{\text{edit}}\) that contains the novel object.
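Region-level editing needs two inputs besides the image: a mask marking the region to repaint and a text prompt naming the novel object. The sketch below only prepares those inputs in plain Python; the inpainting call itself is omitted. The helper names, the prompt template, and the "1 means repaint" mask convention are assumptions for illustration.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in pixels; the max coordinates are exclusive.
Box = Tuple[int, int, int, int]


def box_to_mask(width: int, height: int, box: Box) -> List[List[int]]:
    """Return a height x width binary mask with 1 inside the ID object's box
    (the region to be repainted) and 0 elsewhere (context to preserve)."""
    x0, y0, x1, y1 = box
    # Clip the box to the image bounds.
    x0, y0 = max(0, x0), max(0, y0)
    x1, y1 = min(width, x1), min(height, y1)
    return [
        [1 if (x0 <= x < x1 and y0 <= y < y1) else 0 for x in range(width)]
        for y in range(height)
    ]


def editing_prompt(novel_label: str) -> str:
    """Hypothetical text prompt for the inpainting model."""
    return f"a photo of a {novel_label}"
```

Because only the pixels inside the box are marked for repainting, the rest of the scene is left untouched, which is what keeps the context consistent between the ID image and its edited counterpart.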
- Refining annotation boxes for novel objects
Due to the randomness of diffusion models, attributes of the edited object such as quality, size, and position may not match the original object box. To address this, as shown in Fig. (c), an efficient and effective refiner based on SAM is designed to obtain an accurate bounding box for the novel object.
The inpainted region, expanded outward from \(\textbf{b}^{\text{id}}\), is used as the prompt, and SAM outputs the highest-confidence instance mask \(\textbf{m}^{\text{SAM}}\) for the novel object in that region.
The resulting mask \(\textbf{m}^{\text{SAM}}\) is converted to a bounding box \(\textbf{b}^{\text{SAM}}\), and the intersection over union (IoU) between \(\textbf{b}^{\text{SAM}}\) and the corresponding \(\textbf{b}^{\text{id}}\) is computed to filter out novel objects whose scale changes too much.
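The box refinement and filtering logic can be sketched as below, with the SAM call itself left out. The mask is assumed to be a binary grid already returned by the segmenter; the expansion ratio and the IoU threshold of 0.5 are placeholder values, not numbers from the paper.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def expand_box(box: Box, ratio: float, width: int, height: int) -> Box:
    """Grow the ID box by `ratio` of its size on every side, clipped to the image."""
    x0, y0, x1, y1 = box
    dx, dy = (x1 - x0) * ratio, (y1 - y0) * ratio
    return (max(0.0, x0 - dx), max(0.0, y0 - dy),
            min(float(width), x1 + dx), min(float(height), y1 + dy))


def mask_to_box(mask: List[List[int]]) -> Optional[Box]:
    """Tightest box around the nonzero pixels of a binary mask (max exclusive)."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return None  # empty mask: no object was segmented
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def keep_edit(b_sam: Optional[Box], b_id: Box, threshold: float = 0.5) -> bool:
    """Keep an edited object only if its refined box still overlaps the
    original ID box enough. The threshold is a hypothetical value."""
    return b_sam is not None and iou(b_sam, b_id) >= threshold
```

A low IoU means the generated object ended up much larger, smaller, or displaced relative to the object it replaced, so the sample is discarded rather than relabeled.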
Discovering difficult OOD samples and model training
- Mining hard OOD objects with high visual similarity for training
The novel objects most likely to be confused with the original ID objects by the object detector are considered the most effective. Therefore, based on pairwise similarity in the latent space of the pretrained object detector, the synthetic OOD samples most easily confused with ID objects are identified.
For an off-the-shelf object detector \(\mathcal{F}_\text{det}\), latent features \(\textbf{z}^{\text{edit}}\) and \(\textbf{z}^{\text{id}}\) are extracted for each pair, and the pairs are filtered by similarity to provide pseudo-OOD supervision.
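One possible form of this similarity filter is sketched below. It assumes cosine similarity as the measure and a pair of thresholds `min_sim` and `max_sim`; both the choice of measure and the threshold values are illustrative assumptions, and the features are taken as plain lists of floats already extracted by the detector.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def mine_hard_ood(pairs: List[Tuple[Sequence[float], Sequence[float]]],
                  min_sim: float = 0.5, max_sim: float = 1.0) -> List[int]:
    """Return indices of (z_edit, z_id) pairs whose similarity lies in
    [min_sim, max_sim]. High similarity marks edits the detector is likely
    to confuse with the ID object; an upper bound below 1.0 can drop edits
    that are nearly indistinguishable from the original."""
    return [
        i for i, (z_edit, z_id) in enumerate(pairs)
        if min_sim <= cosine_similarity(z_edit, z_id) <= max_sim
    ]
```

For example, with `pairs = [([1, 0], [1, 0]), ([1, 0], [0, 1]), ([1, 1], [1, 0])]`, calling `mine_hard_ood(pairs, min_sim=0.5, max_sim=0.9)` keeps only the third pair: the first is identical to its ID feature and the second is orthogonal to it.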
- Optimizing the ID/OOD decision boundary with synthetic samples
Once the ID and synthetic OOD objects are obtained, a lightweight multilayer perceptron (MLP) \(\mathcal{F}_\text{ood}\) is trained as the OOD detector, optimized with a binary loss.
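As a rough illustration of a binary loss of this kind, the sketch below computes a standard binary cross-entropy over the scalar outputs (logits) of the OOD detector. Treating ID as the positive class, averaging each group separately, and summing the two terms are assumptions made for the example; the MLP itself is not implemented here.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def binary_ood_loss(id_logits: Sequence[float], ood_logits: Sequence[float]) -> float:
    """Binary cross-entropy over detector logits.

    ID features are given target 1:   -log(sigmoid(s))     = softplus(-s)
    Synthetic OOD features target 0:  -log(1 - sigmoid(s)) = softplus(s)
    """
    id_term = sum(softplus(-s) for s in id_logits) / len(id_logits)
    ood_term = sum(softplus(s) for s in ood_logits) / len(ood_logits)
    return id_term + ood_term
```

The loss falls as ID logits become more positive and synthetic OOD logits more negative, which is what pushes the decision boundary between the two groups.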
Main experiments
For more content, follow the WeChat official account [Xiaofei's Algorithm Engineering Notes].