Paper discussion: Training-Free Model Merging for Multi-target Domain Adaptation
- Paper address: https://arxiv.org/abs/2407.13771
- Code: /ModelMerging
Innovations
- A systematic exploration of mode connectivity in domain-adapted scene understanding models, revealing the conditions under which model merging is effective.
- A model merging technique for multi-target domain adaptation, consisting of parameter merging and buffer merging, which can be applied to models produced by any single-target domain adaptation method.
- Performance comparable to training on the combined datasets of all targets, achieved even when access to the data is restricted.
Content overview
The paper investigates multi-target domain adaptation (MTDA) of scene understanding models. While previous approaches have achieved promising results through inter-domain consistency losses, they typically assume unrealistic simultaneous access to images from all target domains, ignoring issues such as data transfer bandwidth limits and data privacy. In light of these challenges, the paper asks: how can models independently adapted to different domains be merged without direct access to the training data?
The solution consists of two parts: merging model parameters and merging model buffers (i.e., the statistics of the normalization layers). For parameter merging, an empirical analysis of mode connectivity shows, somewhat unexpectedly, that linear merging is sufficient for separate models trained from the same pre-trained backbone weights. For buffer merging, a Gaussian prior is used to model the real-world distribution, and new statistics are estimated from the buffers of the separately trained models.
The paper's approach is simple yet effective, achieving performance comparable to the baseline trained on the combined data while eliminating the need to access the training data.
Method
Previous methods rely on the unrealistic assumption that all target-domain images can be accessed simultaneously during the adaptation phase. In contrast, the paper's pipeline consists of two distinct phases:
- Single-target domain adaptation phase, where a model adapted to each target domain is trained separately. The state-of-the-art unsupervised domain adaptation method HRDA is simply used, with various backbone architectures such as ResNet and Vision Transformer.
- Model merging phase (the main focus), where these adapted models are merged into a single robust model without accessing any training data. The method handles two kinds of model state (see the sketch after this list): the parameters (i.e., the weights and biases of the learnable layers) and the buffers (i.e., the running statistics of the normalization layers).
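To make the parameter/buffer distinction concrete, here is a minimal PyTorch illustration (a toy model for exposition only, not the paper's code or architecture):

```python
import torch.nn as nn

# A toy network standing in for an adapted segmentation backbone:
# one learnable conv layer followed by a batch-norm layer.
model = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.BatchNorm2d(8))

# Parameters: weights and biases of learnable layers (handled by parameter merging).
for name, p in model.named_parameters():
    print("parameter:", name, tuple(p.shape))

# Buffers: running statistics of normalization layers (handled by buffer merging).
for name, b in model.named_buffers():
    print("buffer:   ", name, tuple(b.shape))
```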
Parameter merging
Through comparative experiments, the paper finds that when starting from the same pre-trained weights, the domain adaptation models can adapt to diverse target domains while remaining linearly mode-connected in parameter space. Thus, a simple midpoint merge of the adapted models yields a model that is robust in both domains.
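A minimal sketch of this midpoint merge, assuming two adapted models with identical architectures fine-tuned from the same pre-trained initialization (`merge_parameters` is an illustrative helper, not the authors' released code):

```python
import torch

def merge_parameters(state_a: dict, state_b: dict, alpha: float = 0.5) -> dict:
    """Linearly interpolate two state dicts; alpha = 0.5 gives the midpoint merge."""
    merged = {}
    for name, w_a in state_a.items():
        w_b = state_b[name]
        if torch.is_floating_point(w_a):
            merged[name] = (1.0 - alpha) * w_a + alpha * w_b
        else:
            # Integer entries (e.g. BN num_batches_tracked) are not interpolated;
            # keep one copy here and let the buffer-merging step handle them.
            merged[name] = w_a.clone()
    return merged

# Usage with two single-target adapted models:
# merged_state = merge_parameters(model_a.state_dict(), model_b.state_dict())
# merged_model.load_state_dict(merged_state)
```

Note that BN running means and variances are also float tensors and would be averaged here; the paper instead replaces them with the Gaussian-prior estimate described in the buffer merging section below.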
Buffer merging
The buffers, i.e., the running means and variances of the batch normalization (BN) layers, are closely tied to the domain, since they encapsulate domain-specific statistics. While existing methods mainly deal with merging models trained on different subsets of the same domain, the paper studies merging two models trained on completely different target domains, which makes buffer merging less straightforward.
The BN layer was introduced to mitigate internal covariate shift, where the mean and variance of the inputs change as they pass through the learnable layers. In this context, the basic consideration is that the subsequent learnable layers expect the output of the merged BN layer to follow a normal distribution. Since the BN layer preserves the inductive bias that its inputs follow a Gaussian prior, the merged statistics can be estimated from the means \(\boldsymbol{\mu}^{(i)}\) and variances \([\boldsymbol{\sigma}^{(i)}]^2\) obtained from the buffers \(\mathbf{\Gamma}_A\) and \(\mathbf{\Gamma}_B\). Concretely, each buffer provides the mean and variance of a set of data points drawn from that Gaussian prior; these values, together with the sizes of the two sets, are jointly used to estimate the parameters of the merged distribution.
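Under this reading, writing \(n^{(A)}\) and \(n^{(B)}\) for the numbers of batches tracked by the two models, the pooled estimate amounts to the following (a hedged reconstruction consistent with the weighted averaging described next, not a formula quoted from the paper; the paper's variance estimate may additionally account for the gap between the two means):

\[
\boldsymbol{\mu} = \frac{n^{(A)}\,\boldsymbol{\mu}^{(A)} + n^{(B)}\,\boldsymbol{\mu}^{(B)}}{n^{(A)} + n^{(B)}}, \qquad
\boldsymbol{\sigma}^{2} = \frac{n^{(A)}\,[\boldsymbol{\sigma}^{(A)}]^{2} + n^{(B)}\,[\boldsymbol{\sigma}^{(B)}]^{2}}{n^{(A)} + n^{(B)}}
\]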
When extending the merging method to \(m\) (\(m \geq 2\)) Gaussian distributions, the merged statistics are computed as the averages of the means \(\boldsymbol{\mu}^{(i)}\) and of the variances \([\boldsymbol{\sigma}^{(i)}]^2\), weighted by the numbers of tracked batches \(n^{(i)}\).
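A minimal PyTorch sketch of this weighted buffer merge for \(m\) models (an illustrative reading of the averaging above; `merge_bn_buffers` is a hypothetical helper, not the authors' implementation), writing the merged statistics into the first model:

```python
import torch
import torch.nn as nn

def merge_bn_buffers(models: list) -> nn.Module:
    """Merge BatchNorm running statistics of m adapted models (identical
    architectures) into models[0], weighting each model's statistics by its
    number of tracked batches n^(i)."""
    bn_per_model = [
        [m for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
        for model in models
    ]
    for layers in zip(*bn_per_model):          # corresponding BN layers across models
        n = torch.stack([bn.num_batches_tracked.float() for bn in layers])
        w = n / n.sum()                        # weights n^(i) / sum_j n^(j)
        mean = sum(w_i * bn.running_mean for w_i, bn in zip(w, layers))
        var = sum(w_i * bn.running_var for w_i, bn in zip(w, layers))
        target = layers[0]                     # BN layer of models[0]
        target.running_mean.copy_(mean)
        target.running_var.copy_(var)
        target.num_batches_tracked.copy_(n.sum().long())
    return models[0]
```

In practice, these merged buffers would be combined with the midpoint-merged parameters from the previous section to form the final multi-target model.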
Main experiments
If this article is helpful to you, please give it a like or a "Wow"~
For more content, please follow the WeChat official account [Xiaofei's Algorithm Engineering Notes].