
FALCON: Breaking Boundaries, Unsupervised Fine-Grained Category Inference from Coarse-Grained Labels, Open Source | ICML'24


In many practical applications, it is easier to obtain coarse-grained labels than fine-grained labels that reflect subtle differences between categories. However, existing methods cannot use coarse labels to infer fine-grained labels in an unsupervised manner. To fill this gap, the paper proposes FALCON, a method that discovers fine-grained categories from coarsely labeled data without any fine-grained supervision. FALCON also infers the unknown relationships between fine-grained and coarse-grained categories. In addition, FALCON is a modular approach that can efficiently learn from multiple datasets labeled with different strategies. It is evaluated on eight image classification tasks and a single-cell classification task; on the tieredImageNet dataset with over 600 fine-grained categories, FALCON exceeds the best baseline by 22%.

Paper: Fine-grained Classes and How to Find Them

  • Paper Address:/abs/2406.11070
  • Code Address:/mlbio-epfl/falcon

Introduction


  Machine learning excels in domains with large amounts of precisely labeled data. While coarse-grained labels are often abundant and easy to obtain, precise annotation of fine-grained labels is challenging because of the subtle differences between categories and the small number of distinguishing features. Thus, in many domains, obtaining fine-grained labels requires domain expertise and tedious manual effort. For example, B cells and T cells can be easily distinguished, but separating very fine-grained cell subtypes such as CD4+ T cells and CD8+ T cells requires identifying a very small number of specific markers. Automating the tedious task of obtaining fine-grained labels therefore calls for machine learning methods that can distinguish the subtle differences between fine-grained categories.

  Previous research has shown that coarse-grained labels can be used to learn fine-grained categories more efficiently. Weakly supervised classification methods use coarse-grained labels as a form of weak supervision to improve fine-grained classification performance. More recently, few-shot learning methods have emerged that train on a set of coarse-grained categories and then adapt to fine-grained classification with only a few labeled samples per category. However, all of these methods require a predefined set of fine-grained categories as well as access to a small set of labeled samples for them.

  In this work, the paper proposes FALCON (Fine grAined Labels from COarse supervisioN), a method that discovers fine-grained categories in a coarsely labeled dataset without any fine-grained supervision. The key insight behind FALCON is that fine-grained predictions can be recovered from coarse-grained supervision by exploiting the relationships between coarse-grained and fine-grained categories. Based on this insight, FALCON develops a specialized optimization procedure that alternates between inferring the unknown relationships between coarse- and fine-grained categories and training the fine-grained classifier. The relationships between coarse- and fine-grained categories are inferred by solving a discrete optimization problem, while the fine-grained classifier is trained using coarse-grained supervision and fine-grained pseudo-labels. Furthermore, FALCON can seamlessly adapt to and exploit multiple datasets whose incompatible coarse-grained categories relabel the same underlying fine-grained classes.

  FALCON was compared with alternative baseline methods on eight image classification datasets as well as a single-cell dataset from the biological domain. The experimental results show that FALCON efficiently discovers fine-grained categories without fine-grained supervision and consistently outperforms the baseline methods on both image and single-cell data. For example, on the tieredImageNet dataset with 608 fine-grained categories, FALCON improves over the best baseline by 22%. Furthermore, when trained on multiple datasets with different coarse-grained categories, FALCON effectively reuses the different annotation strategies to improve its performance.

Fine-grained Class Discovery


Problem setup

  Let \(\mathcal{X}\) be the sample space and \(\mathcal{Y}_C\) a set of \(K_C\) coarse-grained categories. Suppose a coarse-grained labeled dataset \(\mathcal{D}=\{(\mathbf{x}^i,y_c^i)\}_{i=1}^N\) is given, where \(\mathbf{x}^i\in\mathcal{X}\) and \(y_c^i\in\mathcal{Y}_C\). In addition, each sample \(\mathbf{x}\in\mathcal{D}\) is associated with a fine-grained category \(y_\text{f}\) from an unknown set of fine-grained categories \(\mathcal{Y}_F\). Each fine-grained category \(y_\text{f}\in\mathcal{Y}_F\) is assumed to be associated with a single coarse-grained category \(y_c\in\mathcal{Y}_C\), i.e., it has a unique coarse-grained parent. The number of fine-grained categories \(K_F = |\mathcal{Y}_F|\) is larger than \(K_C\) and can be known in advance or estimated. Given the coarse-grained labeled dataset \(\mathcal{D}\), the goal is to discover the set of fine-grained categories \(\mathcal{Y}_F\), that is, to recover the fine-grained labeling \(\tau_F: \mathcal{X} \rightarrow \mathcal{Y}_F\) using only the supervision of the coarsely labeled dataset.

Parameterizing the Fine-grained Class Discovery

  One of the key observations in FALCON is that combining fine-grained predictions with the category relations produces coarse-grained predictions. Thus, the category relations can be used to link fine-grained predictions to coarse-grained labels.

  The fine-grained labeling \(\tau_F\) is modeled with a probabilistic classifier \(f_\theta: \mathcal{X} \rightarrow \Delta^{K_F-1}\) that maps inputs to the (\(K_F-1\))-dimensional probability simplex \(\Delta^{K_F-1}\) (each point of which represents a probability distribution over a finite number of mutually exclusive events, here the fine-grained categories). Taking the argmax of the classifier's fine-grained prediction \(\mathbf{p}_\text{f}\) then yields the sample's fine-grained category in \(\mathcal{Y}_F\):

\[\begin{equation} \tau_F(\mathbf{x}) = \text{argmax}_i \, \mathbf{p}_\text{f}^i, \quad \text{where} \quad \mathbf{p}_\text{f} = f_\theta(\mathbf{x}). \end{equation} \]

  Here, \(\theta \in \mathbb{R}^d\) are the parameters of the fine-grained classifier and \(\mathbf{p}_\text{f}\) is a point on \(\Delta^{K_F-1}\).

  Coarse-grained predictions \(\mathbf{p}_\text{c}\) are obtained from the fine-grained prediction \(\mathbf{p}_\text{f}\) and the category relations \(\mathbf{M}\):

\[\begin{equation} \mathbf{p}_\text{c} = \mathbf{M}^T \mathbf{p}_\text{f}, \end{equation} \]

  where \(\mathbf{p}_\text{c}\) is a point on the (\(K_C-1\))-dimensional probability simplex \(\Delta^{K_C-1}\) and \(\mathbf{M} \in \{0,1\}^{K_F \times K_C}\) is a binary matrix that describes the relationship between fine-grained and coarse-grained categories. Specifically, element \(\mathbf{M}_{ij}\) equals 1 if the \(i\)-th fine-grained category is associated with the \(j\)-th coarse-grained category, and 0 otherwise. Since each fine-grained category is associated with exactly one coarse-grained category, each row of \(\mathbf{M}\) sums to 1. Thus, \(\mathbf{M}\) is the adjacency matrix of an undirected bipartite graph modeling the relationships between coarse- and fine-grained categories.
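  To make the mapping concrete, here is a minimal NumPy sketch (toy numbers, not from the paper) showing how a fine-grained prediction \(\mathbf{p}_\text{f}\) and the relation matrix \(\mathbf{M}\) yield a coarse-grained prediction \(\mathbf{p}_\text{c} = \mathbf{M}^T \mathbf{p}_\text{f}\):

```python
import numpy as np

# Toy setup: 5 fine-grained classes grouped into 2 coarse classes.
M = np.array([[1, 0],   # fine class 0 -> coarse class 0
              [1, 0],   # fine class 1 -> coarse class 0
              [0, 1],   # fine class 2 -> coarse class 1
              [0, 1],   # fine class 3 -> coarse class 1
              [0, 1]])  # fine class 4 -> coarse class 1

p_f = np.array([0.05, 0.60, 0.10, 0.20, 0.05])  # fine-grained prediction on the simplex
p_c = M.T @ p_f                                  # coarse-grained prediction: M^T p_f
fine_label = p_f.argmax()                        # argmax gives the fine-grained class

print(p_c)        # [0.65 0.35] -- still a valid distribution over coarse classes
print(fine_label) # 1
```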

  FALCON learns the fine-grained classifier and the category relations simultaneously from coarse-grained supervision, using a cross-entropy (CE) objective over the parameters \(\theta\) and the relations \(\mathbf{M}\):

\[\begin{equation} \label{eq:joint_objective} \mathcal{L}_\text{coarse}(\theta, \mathbf{M}|\mathcal{D}) = \frac{1}{|\mathcal{D}|}\sum_{(\mathbf{x}, y_c) \in \mathcal{D}} \text{CE}(\mathbf{M}^Tf_\theta(\mathbf{x}), y_c). \end{equation} \]
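  A minimal PyTorch sketch of this coarse-grained objective for a batch (tensor shapes and the helper name are assumptions for illustration, not the paper's code):

```python
import torch
import torch.nn.functional as F

def coarse_ce_loss(p_f, y_c, M):
    """CE between the induced coarse prediction M^T p_f and the coarse label y_c.
    p_f: (B, K_F)   fine-grained predictions on the simplex
    y_c: (B,)       coarse labels
    M:   (K_F, K_C) binary relation matrix
    """
    p_c = p_f @ M.float()                                    # per-sample M^T p_f
    return F.nll_loss(torch.log(p_c.clamp_min(1e-8)), y_c)
```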

  Jointly optimizing the discrete category relations \(\mathbf{M}\) and the continuous classifier parameters \(\theta\) can be unstable and computationally expensive. To avoid these problems, the paper extends the objective and optimizes the parameters \(\theta\) and the category relations \(\mathbf{M}\) in an alternating fashion.

  The alternating optimization procedure of FALCON is shown in Figure 1 and proceeds as follows:

  1. Given the category relations \(\mathbf{M}\), train the fine-grained classifier parameterized by \(\theta\).
  2. Infer the category relations \(\mathbf{M}\) from the classifier's fine-grained predictions and the coarse-grained labels.
  3. Repeat these two steps for a predefined number of rounds (a structural sketch of the loop is given below).
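  For orientation, here is a minimal structural sketch of this loop in Python. The two step functions are injected as callables because they stand for the training and inference procedures detailed in the next subsections; this is not FALCON's actual implementation.

```python
# Structural sketch of FALCON's alternating optimization (not the official code).
# `train_classifier` and `infer_relations` are placeholders for the two steps
# described in the following subsections.
def alternate_optimization(dataset, M_init, theta_init,
                           train_classifier, infer_relations, num_rounds=10):
    M, theta = M_init, theta_init
    for _ in range(num_rounds):
        theta = train_classifier(theta, M, dataset)   # step 1: fit f_theta with M fixed
        M = infer_relations(theta, dataset)           # step 2: re-estimate M with theta fixed
    return theta, M
```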

Training Fine-grained Classifier

  With the category relations \(\mathbf{M}\) fixed, \(\mathcal{L}_\text{coarse}(\theta, \mathbf{M}|\mathcal{D})\) becomes \(\mathcal{L}_\text{coarse}(\theta|\mathbf{M}, \mathcal{D})\). However, training a fine-grained classifier with coarse-grained labels alone cannot separate the fine-grained categories within a coarse-grained category. To overcome this problem, FALCON introduces additional objectives that encourage locally consistent and confident fine-grained predictions, leading to better separation of fine-grained categories within each coarse-grained category.

  • Consistent and confident fine-grained predictions

  Given the nearest neighbors of an input, consistent fine-grained predictions are encouraged by maximizing the dot product between the prediction for the input sample and the predictions for its neighbors. The corresponding loss \(\mathcal{L}_\text{NN}\) is the negative log geometric mean of these dot products:

\[\begin{equation} \label{eq:nn_loss} \mathcal{L}_\text{NN}(\theta|\mathcal{D}) = \frac{-1}{N L}\sum_{(\mathbf{x},y_c) \in \mathcal{D}} \sum_{\hat{\mathbf{x}} \in \mathcal{N}(\mathbf{x}, y_c)} \ln (f_{\theta_\text{EMA}}(\hat{\mathbf{x}})^T f_\theta(\mathbf{x})), \end{equation} \]

  where \(\mathcal{N}(\mathbf{x}, y_c)\) denotes the set of nearest neighbors of sample \(\mathbf{x}\) within the same coarse-grained category \(y_c\), \(\hat{\mathbf{x}}\) is an element of \(\mathcal{N}(\mathbf{x}, y_c)\), and \(L = |\mathcal{N}(\mathbf{x}, y_c)|\). The parameters \(\theta_\text{EMA}\) are the exponential moving average of the parameters \(\theta\) over the training iterations:

\[\begin{equation} \theta_\text{EMA}^t = \gamma \theta_\text{EMA}^{t-1} + (1-\gamma) \theta^t, \end{equation} \]

  where \(\gamma\) is a hyperparameter and \(t\) denotes the training iteration. Unlike previous work, nearest neighbors are retrieved from within the same coarse-grained category, and the neighbor predictions are computed with the EMA parameters.
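  A minimal PyTorch sketch of \(\mathcal{L}_\text{NN}\) and the EMA update; tensor shapes and helper names are assumptions for illustration, not the paper's code.

```python
import copy
import torch
import torch.nn as nn

def nn_consistency_loss(p_f, p_f_ema_neighbors):
    """L_NN: negative mean log dot product between a sample's prediction (current
    parameters) and its neighbors' predictions (EMA parameters).
    p_f:               (B, K_F)    predictions f_theta(x)
    p_f_ema_neighbors: (B, L, K_F) predictions f_theta_EMA(x_hat) for L neighbors
    """
    dots = torch.einsum('bk,blk->bl', p_f, p_f_ema_neighbors)   # (B, L)
    return -torch.log(dots.clamp_min(1e-8)).mean()              # averages over B and L

@torch.no_grad()
def update_ema(model, ema_model, gamma=0.999):
    """theta_EMA^t = gamma * theta_EMA^(t-1) + (1 - gamma) * theta^t, per parameter."""
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(gamma).add_(p, alpha=1.0 - gamma)

# toy usage
model = nn.Linear(16, 8)
ema_model = copy.deepcopy(model)
update_ema(model, ema_model)
```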

  The loss \(\mathcal{L}_\text{NN}\) ensures consistency of fine-grained predictions between neighboring samples. However, consistent predictions can still be ambiguous, which prevents the formation of well-separated fine-grained categories. Therefore, minimizing the cross-entropy between the fine-grained predictions and a target distribution \(q\) encourages more confident assignments of samples to fine-grained categories:

\[\begin{equation} \label{eq:conf_loss} \mathcal{L}_\text{conf}(\theta|\mathbf{M}, \mathcal{D}) = \frac{1}{N}\sum_{(\mathbf{x}, y_c) \, \in \, \mathcal{D}} \text{CE}(q_{\theta_\text{EMA}}(\mathbf{x}, y_c), f_\theta(\mathbf{x})). \end{equation} \]

  The fine-grained target distribution \(q\) uses the information in the coarse-grained label \(y_c\) to sharpen the distribution over fine-grained categories. Using the category relations \(\mathbf{M}\) and the parameters \(\theta_\text{EMA}\), the target distribution \(q\) is defined as:

\[\begin{equation} \label{eq:q_target} q_{\theta_\text{EMA}}(\mathbf{x},y_c) := \begin{cases} \frac{\exp(\mathbf{s}^{y_\text{f}} / T)}{Z}, & \text{if } \mathbf{M}_{y_\text{f},y_c} = 1\\ 0, & \text{otherwise}, \end{cases} \end{equation} \]

  where \(T\) is a scalar temperature hyperparameter and \(\mathbf{s}\) denotes the logits of the fine-grained classifier. The scalar \(Z\) is a normalization constant defined as \(Z=\sum_{i=1}^{K_F} \mathbf{M}_{i, y_c} \exp( \mathbf{s}^i / T )\).
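  Below is a minimal PyTorch sketch of how this target distribution \(q\) can be computed from the EMA logits, the coarse label, and \(\mathbf{M}\); shapes, names, and the default temperature are illustrative assumptions.

```python
import torch

def target_distribution(logits_ema, y_c, M, T=0.1):
    """q: temperature-scaled softmax over the EMA logits, restricted to the fine
    classes whose coarse parent is y_c (zero probability elsewhere).
    logits_ema: (B, K_F)   fine-grained logits s from the EMA classifier
    y_c:        (B,)       coarse-grained labels
    M:          (K_F, K_C) binary fine-to-coarse relation matrix
    """
    mask = M[:, y_c].T.bool()                                    # (B, K_F)
    scaled = (logits_ema / T).masked_fill(~mask, float('-inf'))
    return torch.softmax(scaled, dim=1)                          # normalizer Z handled by softmax

# toy usage: 3 fine classes, 2 coarse classes
M = torch.tensor([[1, 0], [1, 0], [0, 1]])
q = target_distribution(torch.randn(4, 3), torch.tensor([0, 1, 0, 1]), M)
```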

  The introduced target distribution \(q\) and the nearest-neighbor fine-grained predictions can be viewed as a form of pseudo-labeling, as shown in Figure 1 (left). The losses \(\mathcal{L}_\text{NN}\) and \(\mathcal{L}_\text{conf}\) are combined into the joint fine-grained loss \(\mathcal{L}_\text{fine}\):

\[\begin{equation} \label{eq:cons} \mathcal{L}_\text{fine}(\theta|\mathbf{M},\mathcal{D}) = \mathcal{L}_\text{NN}(\theta|\mathcal{D}) + \mathcal{L}_\text{conf}(\theta|\mathbf{M},\mathcal{D}) \end{equation} \]

  • Regularization

  To avoid degenerate solutions, training is further stabilized by a maximum-entropy regularizer \(\mathcal{L}_\text{reg}\), which is commonly used in clustering-related tasks:

\[\begin{equation} \label{eq:reg_ent} \mathcal{L}_\text{reg}(\theta|\mathcal{D}) = \ln K_F + \sum_{i=1}^{K_F} \overline{\mathbf{p}}_\text{f}^i \ln \overline{\mathbf{p}}_\text{f}^i, \,\, \overline{\mathbf{p}}_\text{f} = \frac{1}{N} \sum_{\mathbf{x} \in \mathcal{D}} f_\theta(\mathbf{x}). \end{equation} \]

  The loss \(\mathcal{L}_\text{reg}\) helps avoid degenerate solutions that assign all samples to the same fine-grained category.
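  A one-function sketch of \(\mathcal{L}_\text{reg}\), computed over a mini-batch as an assumption (the formula above averages over the whole dataset):

```python
import math
import torch

def entropy_regularizer(p_f_batch):
    """L_reg = ln K_F + sum_i pbar_i ln pbar_i, with pbar the mean prediction.
    Zero when the average prediction is uniform, large when predictions collapse."""
    K_F = p_f_batch.shape[1]
    p_bar = p_f_batch.mean(dim=0).clamp_min(1e-8)
    return math.log(K_F) + (p_bar * p_bar.log()).sum()
```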

  • Total loss of the fine-grained classifier

  Putting it all together, FALCON trains the fine-grained classifier by optimizing the following objective:

\[\begin{equation} \label{eq:final_cls} \underset{\theta \, \in \, \mathbb{R}^d}{\text{min}} \left\{ \mathcal{L}(\theta|\mathbf{M}, \mathcal{D}) = \lambda_1 \mathcal{L}_\text{coarse} + \lambda_2 \mathcal{L}_\text{fine} + \lambda_3 \mathcal{L}_\text{reg} \right\}, \end{equation} \]

  where \(\lambda_1, \lambda_2\) and \(\lambda_3\) are weighting hyperparameters (a trivial sketch of this weighted sum is given below). Using the predictions of the fine-grained classifier, FALCON then learns the relations between fine-grained and coarse-grained categories.
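  The total classifier objective is simply a weighted sum of the pieces sketched above; the default weights here are placeholders, not the paper's tuned values.

```python
def falcon_classifier_loss(l_coarse, l_nn, l_conf, l_reg,
                           lambda1=1.0, lambda2=1.0, lambda3=1.0):
    """Total classifier loss: lambda1*L_coarse + lambda2*(L_NN + L_conf) + lambda3*L_reg."""
    return lambda1 * l_coarse + lambda2 * (l_nn + l_conf) + lambda3 * l_reg
```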

Inferring Class Relationships

  Given the fine-grained classifier \(f_\theta\), finding the optimal \(\mathbf{M}\) requires discrete optimization over all possible class relations. The main difficulty is that the objective is a nonlinear function of \(\mathbf{M}\) and is expensive to evaluate because of the large dataset size \(N\) (\(K_C < K_F \ll N\)). Moreover, discrete optimization solvers require multiple evaluations of the objective and are only applicable to specific problem classes, such as linear objectives. To overcome these issues, FALCON uses an approximation of the objective function that enables efficient inference of the category relations.

  • Approximated coarse-grained supervision

  First, the parameters \(\theta\) of the fine-grained classifier are fixed and the coarse-grained loss \(\mathcal{L}_\text{coarse}\) is re-expressed in matrix form:

\[\begin{equation} \label{eq:cls_matrix_form} \mathcal{L}_\text{coarse}(\mathbf{M}|\theta, \mathcal{D}) = - \frac{1}{N} \text{tr}(\mathbf{Y}_{oh}^T \ln(\mathbf{P}\mathbf{M})), \end{equation} \]

  where \(\mathbf{Y}_{oh} \in \{0, 1\}^{N\times K_C}\) is the matrix of coarse-grained labels represented as one-hot vectors, and \(\mathbf{P} \in [0, 1]^{N\times K_F}\) is the matrix whose rows are the fine-grained predictions. The logarithm is applied element-wise, and \(\text{tr}(\cdot)\) is the trace operator (the sum of the matrix diagonal).

  To overcome the challenges discussed above, the loss \(\mathcal{L}_\text{coarse}\) is approximated with a Taylor expansion and reformulated in a computationally efficient way:

\[\begin{equation} \label{eq:linear_coarse_cls} \mathcal{L}_\text{coarse}^\text{lin}(\mathbf{M}|\theta, D) = - \frac{1}{N} \text{tr}(\mathbf{Y}_{oh}^T\mathbf{P} \mathbf{M}). \end{equation} \]

  The cost matrix \(\mathbf{C} = \mathbf{Y}_{oh}^T \mathbf{P} \in \mathbb{R}^{K_C \times K_F}_+\) effectively encodes the strength of the connections between coarse and fine classes: each element \(\mathbf{C}_{ij}\) is proportional to the number of samples of fine class \(j\) assigned to coarse class \(i\). Thus, the optimal solution of the above formulation retains only the strongest connections between coarse and fine classes. Note that the new objective can be evaluated much more efficiently than the original one because the matrix \(\mathbf{Y}_{oh}^T\mathbf{P}\) can be precomputed.
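  A small NumPy sketch of the precomputed cost matrix and the linearized loss (shapes as defined above; function names are illustrative):

```python
import numpy as np

def cost_matrix(P, y_c, K_C):
    """C = Y_oh^T P, shape (K_C, K_F); C[i, j] grows with the predicted mass of
    fine class j among samples whose coarse label is i. Can be precomputed once."""
    Y_oh = np.eye(K_C)[y_c]            # (N, K_C) one-hot coarse labels
    return Y_oh.T @ P                  # (K_C, K_F)

def linearized_coarse_loss(C, M, N):
    """L_coarse^lin(M) = -tr(Y_oh^T P M) / N = -tr(C M) / N."""
    return -np.trace(C @ M) / N
```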

  • Regularization

  Computing the optimal solution of \(\mathcal{L}_\text{coarse}^\text{lin}\) alone may lead to a severely unbalanced allocation of fine-grained classes across coarse-grained classes. Therefore, an additional regularization term is introduced that penalizes skewed allocations of fine-grained classes across coarse-grained classes:

\[\begin{equation} \label{eq:M_bal} \mathcal{L}_\text{bal}(\mathbf{M}) = \frac{1}{K_C} \text{tr}(\mathbf{M}^T\boldsymbol{1}_{K_F}\boldsymbol{1}_{K_F}^T\mathbf{M}) - \frac{K_F^2}{K_C^2}, \end{equation} \]

  where \(\boldsymbol{1}_{K_F}\) denotes the \(K_F\)-dimensional column vector of ones. Thus, \(\mathbf{M}^T\boldsymbol{1}_{K_F}\) is a \(K_C\)-dimensional vector whose entries are the numbers of fine-grained classes associated with each coarse-grained class. The constant \(K_F^2/K_C^2\) offsets the loss so that it is zero for a perfectly balanced allocation.
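  And a one-function sketch of the balance penalty:

```python
import numpy as np

def balance_penalty(M, K_C):
    """L_bal = (1/K_C) * sum_i (#fine classes of coarse class i)^2 - K_F^2 / K_C^2.
    Zero for a perfectly balanced assignment, positive otherwise."""
    K_F = M.shape[0]
    counts = M.sum(axis=0)             # M^T 1: number of fine classes per coarse class
    return (counts ** 2).sum() / K_C - (K_F ** 2) / (K_C ** 2)
```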

  • Total loss for inferring class relationships

  FALCON recovers the relations \(\mathbf{M}\) between fine-grained and coarse-grained classes by solving the following optimization problem:

\[\begin{equation} \label{eq:objective_M} \underset{\mathbf{M} \, \in \, \mathcal{M}}{\text{min}} \left\{ \mathcal{L}(\mathbf{M}|\theta, \mathcal{D}) = \mathcal{L}_\text{coarse}^\text{lin}(\mathbf{M}|\theta, D) + \lambda_M \mathcal{L}_\text{bal}(\mathbf{M}) \right\}, \end{equation} \]

  where \(\lambda_M\) is a hyperparameter controlling the influence of \(\mathcal{L}_\text{bal}\). The set \(\mathcal{M}\) contains all feasible category relations:

\[\begin{align} \mathcal{M} = \{& \mathbf{M} \in \{0, 1\}^{K_F \times K_C} \, |\, \nonumber \\ & \mathbf{M}\boldsymbol{1}_{K_C} = \boldsymbol{1}_{K_F}, \mathbf{M}^T\boldsymbol{1}_{K_F} \geq \boldsymbol{1}_{K_C} \}. \end{align} \]

  The optimization is an integer quadratic program with linear constraints that involves only \(K_F\cdot K_C\) binary variables. Thus, even though the problem is NP-hard in general, its solution can be computed quickly with modern solvers and hardware. Experiments show that FALCON scales to real datasets containing hundreds of fine-grained categories.
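  The paper solves this integer program exactly (with Gurobi, as noted in the implementation details). Purely as a rough illustration, the greedy heuristic below respects the same constraints and trades affinity against the balance penalty; it is an assumption for exposition, not the paper's solver.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def infer_relations_greedy(C, N, lambda_M=0.1):
    """Greedy approximation of: min_M -tr(C M)/N + lambda_M * L_bal(M),
    with each fine class assigned to exactly one coarse class and each coarse
    class receiving at least one fine class. C has shape (K_C, K_F)."""
    K_C, K_F = C.shape
    M = np.zeros((K_F, K_C), dtype=int)
    counts = np.zeros(K_C)

    # Cover the ">= 1 fine class per coarse class" constraint with the best
    # one-to-one matching between coarse and fine classes.
    rows, cols = linear_sum_assignment(-C)              # maximizes sum of C[i, j]
    for i, j in zip(rows, cols):
        M[j, i] = 1
        counts[i] += 1

    # Assign the remaining fine classes greedily, trading affinity against balance.
    for j in range(K_F):
        if M[j].sum() == 0:
            delta = -C[:, j] / N + lambda_M * (2 * counts + 1) / K_C
            i = int(np.argmin(delta))
            M[j, i] = 1
            counts[i] += 1
    return M
```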

Training on Multiple Datasets

  Fine-grained categories can be grouped into coarse-grained categories in different ways. For example, animals can be grouped by diet (carnivores vs. omnivores), body size (small vs. large), or taxonomy (Canis lupus vs. Canis familiaris). As a result, datasets often carry different coarse labels even though they contain instances of the same fine-grained categories. FALCON can be seamlessly applied to training on multiple datasets with different coarse-grained labels.

  Specifically, let \(\mathcal{D}_l = \{(\mathbf{x}^i, y_c^i)\}_{i=1}^{N_l}\) be a dataset with \(\mathbf{x}^i \in \mathcal{X}\) and \(y_c^i \in \mathcal{Y}_C^l\), where \(\mathcal{Y}_C^l\) is a dataset-specific set of coarse-grained categories. Assume that the samples of every dataset \(\mathcal{D}_l\) are associated with a shared set of fine-grained categories \(\mathcal{Y}_F\). The samples from the \(D\) individual datasets are merged into a combined dataset \(\mathcal{D}_\text{all}\):

\[\begin{equation} \mathcal{D}_\text{all} = \cup_{l=1}^D \{ (\mathbf{x}, y, l) \, | \, (\mathbf{x}, y) \in \mathcal{D}_l \}. \end{equation} \]

  Each data point in \(\mathcal{D}_\text{all}\) is a triple consisting of an input, a coarse-grained label, and the index of the sample's source dataset. The coarse-grained loss is extended with \(D\) dataset-specific mappings \(\mathbf{M}_l\):

\[\begin{equation} \mathcal{L}_\text{coarse}(\theta, \mathbf{M}_1, \dots, \mathbf{M}_D | \mathcal{D}_\text{all}) =\\ \frac{1}{|\mathcal{D}_\text{all}|}\sum_{(\mathbf{x}, y_c, l) \in \mathcal{D}_\text{all}} \text{CE}(\mathbf{M}_l^Tf_\theta(\mathbf{x}), y_c) . \end{equation} \]

  Therefore, integrating multiple datasets into the FALCON framework only requires inferring \(D\) dataset-specific category relations \(\mathbf{M}_l\). As in the single-dataset case, FALCON infers the dataset-specific category relations by solving Eq. 14. The \(D\) discrete optimization problems are independent of each other and can be solved in parallel.
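  A minimal PyTorch sketch of the multi-dataset coarse loss above, where each sample in the merged batch uses the relation matrix of its source dataset (shapes and names are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def multi_dataset_coarse_loss(p_f, y_c, dataset_idx, M_list):
    """Coarse CE over a merged batch with per-dataset relation matrices.
    p_f:         (B, K_F) fine-grained predictions
    y_c:         (B,)     coarse labels in each sample's own label space
    dataset_idx: (B,)     index l of the source dataset
    M_list:      list of (K_F, K_C_l) binary relation matrices
    """
    losses = []
    for l, M in enumerate(M_list):
        sel = dataset_idx == l
        if sel.any():
            p_c = p_f[sel] @ M.float()
            losses.append(F.nll_loss(torch.log(p_c.clamp_min(1e-8)), y_c[sel],
                                     reduction='sum'))
    return torch.stack(losses).sum() / p_f.shape[0]
```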

Experimental Setup


Datasets & Metrics

  • Datasets

  FALCON is evaluated on eight image classification datasets: Living17, Nonliving26, Entity30, Entity13, tieredImageNet, CIFAR100, CIFAR-SI and CIFAR68. The Living17, Nonliving26, Entity30 and Entity13 datasets come from the BREEDS benchmark.

  1. For the tieredImageNet dataset, the train, validation and test class splits are merged into a single dataset containing \(608\) fine-grained categories and \(34\) coarse-grained categories.
  2. For the CIFAR100 dataset, the original labels with \(20\) coarse-grained categories and \(100\) fine-grained categories are used. In the original CIFAR100 dataset, every coarse-grained category has the same number of fine-grained categories, and every fine-grained category has the same number of samples. Therefore, two additional unbalanced versions of the CIFAR100 dataset are constructed, named CIFAR68 and CIFAR-SI.
  3. For the CIFAR68 dataset, \(32\) fine-grained categories are removed from the original dataset to unbalance the number of fine-grained categories within the coarse-grained categories.
  4. For the CIFAR-SI dataset, up to \(70\%\) of the samples are removed from each fine-grained category, resulting in an unbalanced sample distribution.

  In addition, to demonstrate the broad applicability of FALCON, a single-cell RNA sequencing dataset from the biological domain is considered: a PBMC dataset collected from blood samples of COVID-19 patients. The task is to classify cells into fine-grained cell subtypes given coarse-grained cell types, and the method is evaluated against the true cell subtypes corresponding to the fine-grained labels. The PBMC dataset is extremely unbalanced (Gini coefficient greater than 0.5). Performance on the single-cell data is evaluated in the transductive setting.

  An overview of all considered datasets is given in Table 1. Abbreviations: L17 stands for Living17, N26 for Nonliving26, E30 for Entity30, E13 for Entity13, C100 for CIFAR100, C68 for CIFAR68, CSI for CIFAR with sample imbalance, tIN for tieredImageNet, and PB for PBMC.

  • Metrics

  FALCON and the baseline models are trained without fine-grained ground-truth labels. Therefore, fine-grained clustering accuracy is reported as the evaluation metric:

\[\begin{equation} \text{Acc} = \underset{p \in \mathcal{P}(\mathcal{Y}_\text{f})}{\max} \frac{1}{|D|} \sum_{i=1}^{|D|} \, \mathbb{1}\left[ y_\text{f}^i = p(\hat{y}^i_\text{f}) \right]. \end{equation} \]

  Here, \(\mathcal{P}(\mathcal{Y}_\text{f})\) is the set of all permutations of the fine-grained category labels. In practice, this metric can be computed efficiently with the Hungarian algorithm (a small sketch is given below). In addition, the adjusted Rand index (ARI) is reported. Since FALCON also learns the category relations, the graph edit distance (GED) is used to report the difference between the learned label relations and the ground-truth graph; it counts the number of nodes and edges that must be added or removed to match the target graph.
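  A short sketch of computing the clustering accuracy with the Hungarian algorithm (via scipy's `linear_sum_assignment`); it assumes ground-truth and predicted labels are integer arrays of equal length.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred, K_F):
    """Best accuracy over all permutations of the predicted fine-grained labels."""
    confusion = np.zeros((K_F, K_F), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        confusion[t, p] += 1
    rows, cols = linear_sum_assignment(-confusion)    # maximal matching of counts
    return confusion[rows, cols].sum() / len(y_true)
```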

Baselines

  Since there are no methods specifically designed for fine-grained category discovery under coarse-grained supervision, FALCON is compared with methods that can be adapted to this setting, including clustering methods adapted for fine-grained category discovery and cross-granularity few-shot methods.

  SCAN is a deep clustering method that can be directly applied to fine-grained category discovery by clustering the data. However, SCAN cannot use the coarse-grained category information during training. Therefore, SCAN is improved by additionally enforcing consistent predictions between neighbors within the same coarse-grained category, which allows it to exploit coarse-grained supervision. This baseline is referred to as SCAN-C.

  The paper further compares against cross-granularity few-shot learning methods as baselines. ANCOR is a cross-granularity few-shot method that learns a fine-grained representation space, so K-means clustering is run on the extracted features to recover fine-grained predictions; SNCA is adapted in the same way. SCGM is a few-shot method that can be applied directly to fine-grained category discovery, as it produces fine-grained predictions.

  The paper also includes GEORGE, which optimizes a coarse classification objective with distributional robustness. GEORGE only learns a fine-grained representation space, so K-means is again run on the features to recover fine-grained predictions.

  Finally, empirical risk minimization (ERM) with fine-grained labels is used to estimate an upper bound on performance.

Implementation Details

  ResNet18 is used as the backbone for the small images of the CIFAR datasets, and ResNet50 for the remaining five image datasets. All methods (FALCON and all baselines) are initialized with the self-supervised pre-training method MoCoV3, and all model parameters are updated during training. Weakly augmented inputs are paired with \(\theta_\text{EMA}\), while strongly augmented inputs are paired with \(\theta\). Nearest neighbors are retrieved using distances between self-supervised feature representations. Hyperparameter search is performed on the CIFAR100 dataset with the TPE algorithm in Optuna, and the discrete optimization problems are solved with Gurobi.

  For the single-cell data, a randomly initialized MLP with 4 linear layers and ReLU activations is used. Nearest neighbors are retrieved by computing distances over the top 2k highly variable genes.

Experimental Evaluation




If this article was helpful to you, please give it a like or a "looking" ~~
For more content, please follow the WeChat official account [Xiaofei's Algorithm Engineering Notes].

work-life balance.