Paper discussion: Online Learning via Memory: Retrieval-Augmented Detector Adaptation
- Paper address: https://arxiv.org/abs/2409.10716
Innovation points
Proposes an innovative online learning framework, retrieval-augmented classification (RAC), which enhances the classification process through retrieval and has the following advantages over traditional offline training/fine-tuning approaches:
- Online and continuous learning capabilities.
- Minimal labeling requirements.
- No training computation is required to adapt to the new visual domain.
Content overview
Object detectors have evolved from closed-set to open-world models, but applying these models to new domains often yields poor detection performance. To this end, the paper proposes a novel approach that adapts any off-the-shelf object detection model to new domains online, without retraining the detector.
Inspired by how humans quickly learn new concepts by recalling memories, the paper lets the detector look up similar object concepts from memory at test time. This is realized through a retrieval-augmented classification (RAC) module paired with a memory bank that can be flexibly updated with new domain knowledge.
Experiments were performed on a variety of off-the-shelf open-set and closed-set detectors. Using only a small memory bank (e.g., 10 images per class) and no training, RAC significantly outperforms the baselines and excels at adapting detectors to new domains.
Retrieval-augmented detector adaptation
The online learning framework consists of the following main modules:
- An online-updatable memory bank containing labeled target-domain images, used to adapt to new concepts online.
- An object (foreground) proposal model taken off the shelf: an open-world detector, any detector trained on similar-domain data with a different ontology, or a simple region proposal network (RPN).
- A context retrieval module that associates the inference image with image contexts in the memory bank.
- An instance retrieval module that associates each proposed object instance with instances in the retrieved similar contexts.
For a query image, context-level RAC first selects similar context images from the memory bank. Then, for each object proposal in the query image, instance-level RAC performs instance matching against the selected context images. Finally, each proposal is assigned a category by voting over the retrieved instances.
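This flow can be summarized in a short sketch. It is a minimal reconstruction of the described pipeline, not the authors' code; the `encoder.embed_image`/`encoder.embed_region` methods, the memory-bank layout, and the similarity-weighted voting are assumptions.

```python
import numpy as np
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two 1-D feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rac_inference(query_image, proposals, memory_bank, encoder, k=50):
    """Two-stage retrieval-augmented classification (sketch).

    memory_bank: list of (image_embedding, [(instance_embedding, label), ...])
    encoder: any strong pre-trained feature extractor (e.g., CLIP).
    """
    # Stage 1: context retrieval -- keep the k memory images whose global
    # embedding is most similar to the query image.
    q_img = encoder.embed_image(query_image)
    contexts = sorted(memory_bank, key=lambda m: -cosine(q_img, m[0]))[:k]

    # Stage 2: instance retrieval -- match each proposal only against the
    # labeled instances inside the retrieved contexts, then vote.
    labels = []
    for box in proposals:
        q_box = encoder.embed_region(query_image, box)
        votes = Counter()
        for _, instances in contexts:
            for inst_emb, label in instances:
                votes[label] += cosine(q_box, inst_emb)
        labels.append(votes.most_common(1)[0][0] if votes else None)
    return labels
```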
Object (foreground) proposal model
A pre-trained detector is used as the object proposal network to handle the localization subtask, allowing the framework to focus on the new-concept classification subtask.
The proposal network can take many forms: an off-the-shelf open-set detector, a detector trained on a different dataset (i.e., with a different ontology), or a simple region proposal network (RPN), as long as it provides meaningful foreground proposals. Even a binary RPN with no semantic capability can thereby be given classification ability.
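Since the framework only consumes foreground boxes, any proposal source fits behind a narrow interface. A sketch of that decoupling (the `ProposalModel` protocol and all names below are illustrative, not from the paper):

```python
from typing import Protocol, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

class ProposalModel(Protocol):
    """Any source of foreground boxes qualifies: an open-world detector,
    a detector trained on a different ontology, or a binary RPN."""
    def propose(self, image) -> Sequence[Box]: ...

def detect_with_rac(image, proposer: ProposalModel, classify):
    # Localization comes from the off-the-shelf proposer; semantics come
    # from retrieval-augmented classification against the memory bank.
    boxes = proposer.propose(image)
    return list(zip(boxes, classify(image, boxes)))
```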
Memory bank
RAC needs only a minimal amount of data to build the memory bank, e.g., 10 images per category, which end users can easily label in an online learning setting. To build an effective memory bank, the paper proposes an unsupervised image selection method that uses image-level feature clustering to maximize coverage while minimizing annotation effort.
- Unsupervised seed image clustering
A powerful image feature backbone (e.g., CLIP) extracts embeddings from the unlabeled target-domain images, which are then clustered (e.g., with k-means) into as many clusters as the user's labeling budget allows. The center image of each cluster is labeled by the user, yielding a diverse and representative set of scenes. With only 10 labeled images per category, the method achieves good detection performance.
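Seed selection then reduces to clustering image embeddings and labeling the image nearest each cluster center. A minimal sketch with scikit-learn's k-means, assuming CLIP image embeddings have already been computed:

```python
import numpy as np
from sklearn.cluster import KMeans

def select_seed_images(embeddings: np.ndarray, budget: int) -> list[int]:
    """Pick `budget` diverse images to label from the unlabeled target set.

    embeddings: (N, D) image-level features, e.g., from a CLIP backbone.
    Returns indices of the images closest to each cluster center.
    """
    km = KMeans(n_clusters=budget, n_init=10, random_state=0).fit(embeddings)
    # The image nearest each center is that cluster's representative scene.
    return [int(np.argmin(np.linalg.norm(embeddings - c, axis=1)))
            for c in km.cluster_centers_]
```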
Retrieval-augmented classification (RAC) module
With labeled seed objects and images stored in the memory bank, the retrieval-augmentation module gives the object detector new semantic classification capabilities by matching the detector's proposals against the seed objects.
A major challenge in object matching is that the target domain contains objects of different classes with similar appearance. To resolve these confusions, the paper builds a multi-stage context-matching process. The first stage, context retrieval, narrows the search by filtering out irrelevant scenes (e.g., filtering out maritime ship scenes). The second stage, instance retrieval, is performed only on the context-matched images. By considering both instance appearance and context, the method minimizes classification confusion and improves retrieval accuracy.
Retrieval augmentation requires a powerful feature extractor, but it does not need to be trained on the target domain to achieve good semantic classification accuracy. Any strong pre-trained feature extractor, such as DINOv2 or CLIP, can be used either training-free or fine-tuned on the provided memory bank for best performance.
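As an illustration, both backbones are available off the shelf. A sketch using the public open_clip and torch hub releases (the specific model variants are our choice, not specified by the paper):

```python
import torch
import open_clip

# Off-the-shelf CLIP image encoder, used training-free.
clip_model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai")
clip_model.eval()

# Alternatively, a DINOv2 backbone from torch hub.
dinov2 = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")
dinov2.eval()

@torch.no_grad()
def embed_image(image_tensor):
    # image_tensor: a preprocessed (1, 3, H, W) batch.
    feats = clip_model.encode_image(image_tensor)
    return feats / feats.norm(dim=-1, keepdim=True)  # L2-normalize
```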
Specifically, the first stage performs image-level semantic matching: an off-the-shelf CLIP model extracts image-level features, and similarity is computed between the query image and the memory-bank images. The second stage performs instance-level matching: the top-k images (k = 20, 50, 100) are selected from the image-level matching results, and an off-the-shelf or fine-tuned CLIP model extracts bounding-box-level features to compute instance similarities within those top-k images. The final instance classification thus combines bounding-box-level matching with global context matching, effectively reducing appearance-induced confusion.
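In vectorized form, both stages are just cosine similarities over L2-normalized features. A numpy sketch, where summing instance similarities per label as the vote is one plausible reading of the combination step:

```python
import numpy as np

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def two_stage_classify(q_img, q_boxes, mem_imgs, mem_boxes,
                       mem_box_img, mem_box_label, k=50):
    """q_img: (D,) query image feature; q_boxes: (P, D) proposal features.
    mem_imgs: (M, D); mem_boxes: (B, D); mem_box_img: (B,) index of the
    memory image owning each instance; mem_box_label: (B,) class labels."""
    # Stage 1: image-level matching, keep only the top-k contexts.
    img_sim = l2norm(mem_imgs) @ l2norm(q_img)
    topk = np.argsort(-img_sim)[:k]
    keep = np.isin(mem_box_img, topk)

    # Stage 2: instance-level matching inside the retrieved contexts.
    sims = l2norm(q_boxes) @ l2norm(mem_boxes[keep]).T       # (P, B')
    labels = np.asarray(mem_box_label)[keep]
    results = []
    for row in sims:
        scores = {}
        for s, lab in zip(row, labels):
            scores[lab] = scores.get(lab, 0.0) + float(s)    # vote by label
        results.append(max(scores, key=scores.get) if scores else None)
    return results
```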
Main experiments