Context-Aware Meta-Learning
Authors: Christopher Fifty, Dennis Duan, Ronald G. Junkins, Ehsan Amid, Jure Leskovec, Christopher Ré, Sebastian Thrun
What
This paper introduces Context-Aware Meta-Learning (CAML), a novel meta-learning algorithm for few-shot image classification that draws inspiration from in-context learning in Large Language Models (LLMs) to learn new visual concepts during inference without fine-tuning.
Why
This paper is important because it addresses the limitations of existing visual meta-learning algorithms, which are either slow, because they require fine-tuning at inference, or generalize poorly to unseen tasks. CAML offers a path toward real-time, generalizable few-shot image classification, potentially unlocking new computer vision applications in the way that in-context learning in LLMs did for natural language processing.
How
The authors propose CAML, which combines a frozen pre-trained feature extractor, an Equal Length and Maximally Equiangular Set (ELMES) class encoder, and a non-causal sequence model. Each support image is encoded and paired with the ELMES embedding of its label, the query image is appended, and the resulting sequence is processed by the non-causal sequence model to predict the query image's label. CAML is pre-trained on a diverse collection of few-shot image classification tasks, so it requires neither meta-training on the target benchmark nor fine-tuning at inference. The authors show theoretically that an ELMES class encoder maximizes the model's ability to identify classes within the support set. They evaluate CAML on 11 few-shot image classification benchmarks against existing meta-learning methods in a universal setting, in which a single model must generalize to new benchmarks without being trained on them.
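The description above maps naturally onto a short forward pass. The sketch below is an illustrative reconstruction, not the authors' implementation: the names (simplex_elmes, CAMLSketch, frozen_encoder), the use of a standard Transformer encoder as the non-causal sequence model, and the learnable "unknown" label vector attached to the query are assumptions made for clarity; the simplex equiangular tight frame is one standard way to build an equal-length, maximally equiangular set of label vectors.

```python
# Illustrative sketch of CAML's forward pass (assumed structure, not the authors' code).
import torch
import torch.nn as nn


def simplex_elmes(num_classes: int, dim: int) -> torch.Tensor:
    """One standard construction of an Equal Length and Maximally Equiangular Set:
    a simplex equiangular tight frame, zero-padded to the label-embedding dimension
    (requires dim >= num_classes)."""
    k = num_classes
    etf = (k / (k - 1)) ** 0.5 * (torch.eye(k) - torch.full((k, k), 1.0 / k))
    return torch.cat([etf, torch.zeros(k, dim - k)], dim=1)  # (k, dim), unit-norm rows


class CAMLSketch(nn.Module):
    def __init__(self, frozen_encoder: nn.Module, feat_dim: int,
                 label_dim: int, max_classes: int, num_layers: int = 8):
        super().__init__()
        # Frozen pre-trained feature extractor, assumed to map images to (N, feat_dim).
        self.encoder = frozen_encoder.eval()
        for p in self.encoder.parameters():
            p.requires_grad_(False)
        # Fixed ELMES label embeddings; the "unknown" query label vector is assumed learnable.
        self.register_buffer("label_embed", simplex_elmes(max_classes, label_dim))
        self.unknown = nn.Parameter(torch.zeros(label_dim))
        # Non-causal sequence model: a bidirectional Transformer encoder (no causal mask).
        # Note: feat_dim + label_dim must be divisible by nhead.
        layer = nn.TransformerEncoderLayer(d_model=feat_dim + label_dim,
                                           nhead=8, batch_first=True)
        self.seq_model = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(feat_dim + label_dim, max_classes)

    def forward(self, support_x, support_y, query_x):
        # support_x: (n, C, H, W), support_y: (n,), query_x: (1, C, H, W)
        with torch.no_grad():
            s_feat = self.encoder(support_x)                      # (n, feat_dim)
            q_feat = self.encoder(query_x)                        # (1, feat_dim)
        s_tok = torch.cat([s_feat, self.label_embed[support_y]], dim=-1)
        q_tok = torch.cat([q_feat, self.unknown.expand(q_feat.size(0), -1)], dim=-1)
        seq = torch.cat([s_tok, q_tok], dim=0).unsqueeze(0)       # (1, n+1, d)
        out = self.seq_model(seq)                                 # full (non-causal) attention
        return self.head(out[:, -1])                              # class logits for the query
```

In this sketch, pre-training would optimize the sequence model, the classification head, and the unknown-label vector with a cross-entropy loss over many sampled few-shot tasks, while the feature extractor and the ELMES label embeddings stay fixed.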
Result
CAML achieves state-of-the-art performance in universal meta-learning, outperforming the other baselines on 14 out of 22 evaluation settings. Remarkably, it performs comparably to P>M>F, the current best meta-learning algorithm, on 8 out of 11 benchmarks, even though P>M>F is meta-trained on the specific benchmark datasets. This suggests that visual in-context learning at inference can be as effective as meta-training on in-domain data. The paper also includes analysis showing that CAML dynamically updates its representations based on the joint context of the query and support set, which helps it perform well across diverse tasks.
Limitations & Future Work
The paper acknowledges limitations in handling highly out-of-distribution images and varying image resolutions; future work could focus on improving robustness in these areas. Additionally, the current implementation must know the maximum number of classes at pre-training time, since the ELMES label encoder is constructed for a fixed number of classes. Exploring ways to relax this constraint and support more flexible class handling at inference would be beneficial.
Abstract
Large Language Models like ChatGPT demonstrate a remarkable capacity to learn new concepts during inference without any fine-tuning. However, visual models trained to detect new objects during inference have been unable to replicate this ability, and instead either perform poorly or require meta-training and/or fine-tuning on similar objects. In this work, we propose a meta-learning algorithm that emulates Large Language Models by learning new visual concepts during inference without fine-tuning. Our approach leverages a frozen pre-trained feature extractor, and analogous to in-context learning, recasts visual meta-learning as sequence modeling over datapoints with known labels and a test datapoint with an unknown label. On 8 out of 11 meta-learning benchmarks, our approach — without meta-training or fine-tuning — exceeds or matches the state-of-the-art algorithm, P>M>F, which is meta-trained on these benchmarks. Our code is available at https://github.com/cfifty/CAML.