The paper analyzes existing novel category discovery and localization (NCDL) methods and identifies the core problem: object detectors tend to be biased toward known objects and to ignore unknown ones. To address this, the paper proposes a Debiased Region Mining (DRM) method that combines a class-agnostic RPN and a class-aware RPN in a complementary way for object localization, applies semi-supervised contrastive learning on unlabeled data to improve the representation network, and uses a simple and efficient mini-batch K-means clustering method for novel class discovery.

Source: Xiaofei's Algorithmic Engineering Notes public account
Paper: Debiased Novel Category Discovering and Localization
- Paper address: /abs/2402.18821
Introduction
Existing object detection methods are trained and evaluated on a closed dataset of fixed categories, whereas in real scenarios object detectors must face both known and potentially unknown objects. After training, the model does not recognize any object that was not seen during training, and either treats an unknown object as background or misclassifies it as a known category. In contrast, humans have the ability to perceive, discover, and recognize unknown new objects. The novel category discovery (Novel Category Discovery, NCD) problem has therefore attracted a lot of attention: detecting known objects while also discovering new categories in an unsupervised manner.
The vast majority of NCD methods first perform a pre-training step on the labeled dataset and then process the unlabeled data. While effective, most methods use only known objects and classes for pre-training and localization, which introduces two types of bias: the biased feature representation introduced by detection heads trained on closed sets, and the localization bias caused by an RPN trained only on the labeled closed set.
To solve the above problems, the paper proposes a debiased NCD method that mitigates bias in both feature representation and object localization:
- A semi-supervised contrastive learning method is introduced so that the model learns similar features for similar instances, helping it distinguish unknown-class objects from known-class objects.
- A dual-RPN strategy is proposed to detect target objects in an image. One RPN is class-aware and is designed to obtain accurate localization for known classes; the other RPN is class-agnostic and is designed to localize unlabeled target objects.
The contribution of the paper can be summarized as follows:
- Revisiting the problem of novel category discovery in the open world and examining the bias problem in existing methods.
- Using dual region proposal networks to obtain good region proposals, which efficiently finds all target objects in an image and localizes them better.
- Designing a semi-supervised instance-level contrastive learning method to obtain better feature representations, so that the model exploits unlabeled image information to learn image features.
- Extensive experimental results show that the paper's method outperforms other baseline methods.
Framework Details
Overview
The overall structure is shown in Figure 2:
- The feature extractor is optimized with semi-supervised contrastive learning to learn more general feature representations.
- The dual-RPN module generates different sets of boxes, and ROI pooling is then used to pool features as the final proposal input.
- Instances with similar features are grouped together by clustering so that different unknown classes can be discovered.
Debiased Region Mining
In practice, the paper observes two failure modes of the RPN:
- When encountering unannotated images, the model tends to categorize them as background without locating any objects.
- When the model recognizes an unknown object, it incorrectly classifies it as a known object with high confidence.
In Faster R-CNN, the object localizer serves the classification head, extracting the known classes of interest to the model. This leads to a bias toward recognizing known objects, severely affecting the generality of the model.
Figure 3 shows the localization performance of three kinds of RPN:
- The first is the class-aware RPN: its proposals show higher confidence on known VOC objects, which improves proposal quality. However, proposals of average confidence tend to cluster together and usually contain only a portion of the target objects, so generalization to other objects is limited.
- The second is the class-agnostic RPN: the category head is removed and the network learns only objectness to generate proposals. Although proposal generalization improves over the baseline, localization accuracy on VOC categories is still not optimal and many proposals still exhibit clustering.
- The third is the merging method proposed in the paper: reliable boxes are selected from the two sets, the confidence of each box is rescaled, and the proposals are harmonized with NMS. The method significantly improves proposal quality, extracting more target objects without affecting accuracy on known VOC categories, and effectively solves the proposal clustering problem.
The paper argues that the realistic NCDL problem should be more consistent with open-world object detection, and the object extractor should not be limited by the classification head. Therefore, the paper introduces into Faster R-CNN an additional class-agnostic RPN that generates more general object scores and retrieves more objects. This RPN replaces class-related losses with class-agnostic ones, estimating proposals only by objectness:
- In the RPN, centerness regression is used instead of the classification loss.
- In the ROI head, IoU regression is used instead of the classification loss.
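For illustration, the centerness target (assuming the FCOS-style definition, which this kind of RPN regression commonly uses) measures how close a location is to the center of its assigned box; a minimal sketch:

```python
import numpy as np

def centerness_target(l, t, r, b):
    """FCOS-style centerness for a location whose distances to the
    left/top/right/bottom edges of its assigned box are l, t, r, b.
    Equals 1 at the box center and decays toward the edges."""
    return float(np.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b))))

print(centerness_target(5, 5, 5, 5))  # exact center of a 10x10 box -> 1.0
print(centerness_target(2, 2, 8, 8))  # off-center location -> 0.25
```

The class-agnostic RPN then regresses this target instead of predicting a class probability, so its score reflects "objectness" rather than membership in a known class.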
Analyzing the reliability of the two sets of boxes produced by the different RPNs shows that they have different distributions over confidence intervals, indicating different strengths and weaknesses. The paper therefore proposes the Debiased Region Mining (DRM) approach, which obtains two different sets of boxes from the class-aware RPN and the class-agnostic RPN. Boxes from the class-aware RPN have high accuracy on known classes but generalize poorly and perform badly on unknown classes. Boxes from the class-agnostic RPN may not match the former on known classes but generalize better to unknown classes. Combining the two sets yields a new set of boxes that has the advantages of both.
Suppose the confidence scores of the two sets of boxes are \(\lambda_{1}\) and \(\lambda_{2}\), following two different distributions \(\Phi_{1}\) and \(\Phi_{2}\). The two distributions need to be mapped to a unified \(\Phi\) to remove the gap between the different box-generation methods. To keep high-confidence boxes and filter out boxes with very low confidence, thresholds \(\alpha_{i},\beta_{i}\ (i=1,2)\) are set to filter the confidences. The two filtered sets of boxes are then merged, and NMS removes the redundant boxes to obtain the fused result.
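A minimal NumPy sketch of this merging step, assuming simple min-max rescaling as the mapping to the unified distribution and greedy NMS (the paper's actual mapping and the thresholds \(\alpha_{i},\beta_{i}\) may differ; the single `lo` cutoff here just stands in for the low-confidence filter):

```python
import numpy as np

def nms(boxes, scores, iou_thr=0.5):
    """Greedy NMS: boxes (N,4) as [x1,y1,x2,y2]; returns kept indices."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        if order.size == 1:
            break
        rest = order[1:]
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thr]  # drop boxes overlapping the kept one
    return keep

def drm_merge(boxes1, scores1, boxes2, scores2, lo=0.1, iou_thr=0.5):
    """Rescale both score sets to [0,1], filter very low scores, NMS-merge."""
    def rescale(s):
        return (s - s.min()) / (s.max() - s.min() + 1e-8)
    s1, s2 = rescale(scores1), rescale(scores2)
    m1, m2 = s1 >= lo, s2 >= lo          # keep only confident-enough boxes
    boxes = np.concatenate([boxes1[m1], boxes2[m2]])
    scores = np.concatenate([s1[m1], s2[m2]])
    keep = nms(boxes, scores, iou_thr)
    return boxes[keep], scores[keep]
```

Boxes that both RPNs agree on survive as a single high-score proposal, while unique boxes from either RPN are kept as long as their rescaled confidence clears the cutoff.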
Semi-supervised Contrastive Finetuning
After obtaining the boxes, an instance-level semi-supervised contrastive learning approach is used to extract more general and expressive features.
First, images in the VOC dataset are cropped into image patches according to the ground-truth boxes, forming the labeled set \(B_{\mathcal{L}}\). Then, proposals are generated on the COCO validation set and cropped into patches to form the unlabeled set \(B_{\mathcal{U}}\). After that, each image patch \(\mathbf{x}\) is randomly augmented to generate two different views \(\mathbf{x}^{\prime}\). The unsupervised contrastive loss is computed in the standard InfoNCE form:

\[
\mathcal{L}^{u}_{i}=-\log\frac{\exp(\mathbf{z}_{i}\cdot\mathbf{z}^{\prime}_{i}/\tau)}{\sum_{j\neq i}\exp(\mathbf{z}_{i}\cdot\mathbf{z}_{j}/\tau)}
\]
where \(\mathbf{z},\mathbf{z}^{\prime}\) are the corresponding representations and \(\tau\) is a temperature hyperparameter.
For labeled image patches, the labels can be used to form a supervised contrastive loss:

\[
\mathcal{L}^{s}_{i}=-\frac{1}{|\mathcal{N}(i)|}\sum_{p\in\mathcal{N}(i)}\log\frac{\exp(\mathbf{z}_{i}\cdot\mathbf{z}_{p}/\tau)}{\sum_{j\neq i}\exp(\mathbf{z}_{i}\cdot\mathbf{z}_{j}/\tau)}
\]
where \(\mathcal{N}(i)\) denotes the set of indices with the same label as \(\mathbf{x}_{i}\).
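Both losses can be sketched in NumPy, assuming the standard InfoNCE and supervised-contrastive (SupCon) forms over L2-normalized embeddings; the paper's exact normalization may differ:

```python
import numpy as np

def info_nce(z, z_prime, tau=0.07):
    """Unsupervised contrastive loss.
    z, z_prime: (N, D) L2-normalized embeddings of two views of N patches."""
    n = z.shape[0]
    feats = np.concatenate([z, z_prime])            # (2N, D)
    sim = feats @ feats.T / tau                     # pairwise similarities
    np.fill_diagonal(sim, -np.inf)                  # exclude self-pairs
    # the positive of sample i is its other view, at index (i+N) mod 2N
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

def sup_con(z, labels, tau=0.07):
    """Supervised contrastive loss: same-label samples are positives N(i)."""
    n = z.shape[0]
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    loss, count = 0.0, 0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if pos:                                     # average over N(i)
            loss -= log_prob[i, pos].mean()
            count += 1
    return loss / max(count, 1)
```

With aligned views (or consistent labels) the positives have high similarity and the loss is small; shuffling the pairing drives it up, which is what pushes similar instances together in feature space.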
Finally, the total loss is constructed by combining the unsupervised and supervised contrastive losses, and it is used to supervise the training of the feature extractor.
Clustering
After completing contrastive learning for unknown categories of objects, the model performs cluster analysis on the obtained features and aggregates unknown instances with similar features into clusters.
A method similar to K-means is used for clustering, with two modifications:
- An over-clustering strategy is adopted: forcing another, finer-grained partition of the unlabeled data by increasing K (the estimated number of clusters) improves clustering purity and feature quality. Over-clustering reduces the involvement of supervision by letting the neural network decide how to divide the data, which is effective when the data is noisy or when intermediate classes would otherwise be randomly assigned to neighboring classes.
- Standard K-means is very time-consuming in the novel category discovery task, so Mini-batch K-means (a K-means optimization for large-scale data) is employed instead. A subset of the data is randomly sampled during training to reduce computation time while still optimizing the objective function.
The main steps of the clustering algorithm are as follows:
- Extract a subset of the training data and use K-means to construct K cluster centers.
- Sample data from the training set, feed it to the model, and assign each sample to the nearest cluster center.
- Update the cluster center of each cluster.
- Repeat steps 2 and 3 until the clustering center is stable or the maximum number of iterations is reached.
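The steps above can be sketched as a toy mini-batch K-means in NumPy (in practice scikit-learn's `MiniBatchKMeans` implements this; the stopping criterion here is simply a fixed iteration budget):

```python
import numpy as np

def mini_batch_kmeans(X, k, batch_size=64, n_iters=100, seed=0, init=None):
    """Toy mini-batch K-means.
    X: (N, D) instance features; k: number of clusters (inflated when
    over-clustering); init: optional (k, D) initial centers."""
    rng = np.random.default_rng(seed)
    # step 1: initialize k centers (here: a random subset of the data)
    if init is None:
        centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    else:
        centers = init.astype(float).copy()
    counts = np.zeros(k)  # how many samples each center has absorbed
    for _ in range(n_iters):
        # step 2: sample a mini-batch and assign it to the nearest centers
        batch = X[rng.choice(len(X), size=min(batch_size, len(X)), replace=False)]
        d = ((batch[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(axis=1)
        # step 3: move each center toward its samples (per-center step size)
        for j, x in zip(assign, batch):
            counts[j] += 1
            centers[j] += (x - centers[j]) / counts[j]
    # step 4 is the loop bound; a fuller version would also test stability
    return centers
```

The per-center step size `1/counts[j]` makes each center the running mean of the samples assigned to it, so later batches perturb the centers less and less, mirroring the "repeat until stable" criterion.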
Experiments
If this article helped you, please give it a like or share it. For more content, follow the WeChat public account [Xiaofei's Algorithmic Engineering Notes].