ReFT: Representation Finetuning for Language Models
Authors: Zhengxuan Wu, Aryaman Arora, Zheng Wang, Atticus Geiger, Dan Jurafsky, Christopher D. Manning, Christopher Potts
What
This paper introduces ReFT (Representation Finetuning), a parameter-efficient fine-tuning method that adapts a frozen model by learning interventions on its hidden representations rather than updating its weights. It is more parameter-efficient than weight-based methods such as LoRA and achieves state-of-the-art performance on several NLP tasks.
Why
The paper challenges the prevailing focus on weight-based PEFT methods. It proposes a more efficient and interpretable approach that exploits the rich semantic information encoded in model representations, which opens up new possibilities for controlling and understanding large language models.
How
The authors develop ReFT, a method that learns low-rank interventions on model representations, inspired by causal abstraction and distributed interchange interventions. They evaluate ReFT in four benchmark areas (commonsense reasoning, arithmetic reasoning, instruction-following, and natural language understanding) and compare its performance and parameter efficiency against existing PEFT methods such as LoRA, Adapters, and Prefix-tuning.
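As a rough illustration of what a low-rank intervention on a hidden representation can look like, the sketch below implements the LoReFT update Φ(h) = h + Rᵀ(Wh + b − Rh), where R is an r×d projection with orthonormal rows and W, b define a learned linear map into the same r-dimensional subspace. The formula is taken from the paper's definition of LoReFT, not from the summary above. The code uses plain Python lists rather than a tensor library, omits training entirely (parameters are random), and the helper names matvec, orthonormal_rows, and loreft are invented for this sketch; they are not part of pyreft.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def orthonormal_rows(r, d, seed=0):
    """Build r orthonormal rows of length d via Gram-Schmidt on Gaussian vectors."""
    rng = random.Random(seed)
    rows = []
    for _ in range(r):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for u in rows:
            dot = sum(a * b for a, b in zip(u, v))
            v = [a - dot * b for a, b in zip(v, u)]  # remove the component along u
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return rows


def loreft(h, R, W, b):
    """Apply h + R^T (W h + b - R h) to a single hidden vector h."""
    current = matvec(R, h)                                   # what the subspace holds now
    target = [wh + bi for wh, bi in zip(matvec(W, h), b)]    # what it should hold
    delta = [t - c for t, c in zip(target, current)]         # length-r edit
    # Map the r-dimensional edit back into the d-dimensional space (R^T delta).
    update = [sum(R[k][j] * delta[k] for k in range(len(R))) for j in range(len(h))]
    return [hj + uj for hj, uj in zip(h, update)]
```

Because the rows of R are orthonormal, projecting the output back onto R gives exactly Wh + b: the intervention overwrites an r-dimensional subspace of the representation and leaves the orthogonal complement untouched. In a real setup the base model stays frozen and only R, W, and b would be trained.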
Result
ReFT outperforms previous PEFT methods on commonsense reasoning, instruction-following, and natural language understanding benchmarks, achieving state-of-the-art results while using 10 to 50 times fewer parameters than LoRA. On arithmetic reasoning it surpasses Prefix-tuning, although the authors note that more effective interventions for this task remain future work. The paper also explores the memorization capabilities of ReFT, showing that a single low-rank intervention can store a surprisingly large amount of information, and provides evidence for the superposition of token identities in model representations.
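To make the parameter comparison concrete, here is a back-of-the-envelope count, offered only as a sketch. It assumes a LoReFT intervention holds R (r×d), W (r×d), and b (r), and that a LoRA adapter on one weight matrix holds two factors of shapes r×d_in and d_out×r. The configuration values (hidden size 4096, 32 layers, one intervention per layer, LoRA on four square matrices per layer, ranks 8 and 32) are hypothetical choices for illustration and are not the settings reported in the paper.

```python
def loreft_param_count(d_model, rank, n_interventions):
    """Parameters in n LoReFT interventions: R and W are rank x d_model, b has rank entries."""
    return n_interventions * (2 * rank * d_model + rank)


def lora_param_count(d_in, d_out, rank, n_matrices):
    """Parameters in LoRA adapters: one rank x d_in and one d_out x rank factor per matrix."""
    return n_matrices * rank * (d_in + d_out)


# Hypothetical configuration, chosen only to show the arithmetic.
D_MODEL = 4096
N_LAYERS = 32

reft_total = loreft_param_count(D_MODEL, rank=8, n_interventions=N_LAYERS)
lora_total = lora_param_count(D_MODEL, D_MODEL, rank=32, n_matrices=4 * N_LAYERS)

print(reft_total)               # 2097408
print(lora_total)               # 33554432
print(lora_total / reft_total)  # roughly 16
```

Under these assumed settings the LoRA configuration has about 16 times as many trainable parameters. The actual ratio depends on the ranks, on how many layers and positions are intervened on, and on which weight matrices LoRA adapts.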
Limitations and Future Work
The authors acknowledge limitations in terms of model diversity, primarily exploring LLaMA-family models. Future work could investigate ReFT’s effectiveness on other model families like Mistral or GPT. Further exploration of ReFT’s design space, including automating the hyperparameter search and developing more effective interventions for specific tasks like arithmetic reasoning, is also suggested. Additionally, the authors highlight the need for more robust evaluation practices in PEFT research, advocating for benchmarks that prevent test-set hill-climbing and allow for fair comparisons.
Abstract
Parameter-efficient fine-tuning (PEFT) methods seek to adapt large models via updates to a small number of weights. However, much prior interpretability work has shown that representations encode rich semantic information, suggesting that editing representations might be a more powerful alternative. Here, we pursue this hypothesis by developing a family of Representation Finetuning (ReFT) methods. ReFT methods operate on a frozen base model and learn task-specific interventions on hidden representations. We define a strong instance of the ReFT family, Low-rank Linear Subspace ReFT (LoReFT). LoReFT is a drop-in replacement for existing PEFTs and learns interventions that are 10x-50x more parameter-efficient than prior state-of-the-art PEFTs. We showcase LoReFT on eight commonsense reasoning tasks, four arithmetic reasoning tasks, Alpaca-Eval v1.0, and GLUE. In all these evaluations, LoReFT delivers the best balance of efficiency and performance, and almost always outperforms state-of-the-art PEFTs. We release a generic ReFT training library publicly at https://github.com/stanfordnlp/pyreft.