Advancing Parameter Efficiency in Fine-tuning via Representation Editing

Authors: Muling Wu, Wenhao Liu, Xiaohua Wang, Tianlong Li, Changze Lv, Zixuan Ling, Jianhao Zhu, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang

What

This paper introduces Representation EDiting (RED), a novel parameter-efficient fine-tuning (PEFT) method that scales and biases representations at each layer of a pre-trained language model to adapt it to downstream tasks.
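
In code terms, the edit at each layer is an elementwise affine transformation of the hidden states, with the base model kept frozen. The following PyTorch sketch illustrates the idea under stated assumptions (the module and parameter names, the identity initialization, and the way the edit is attached are illustrative, not the authors' released implementation):

    import torch
    import torch.nn as nn

    class REDEdit(nn.Module):
        """Per-layer representation edit: an elementwise scale and bias.

        Minimal sketch of the idea described above; names and
        initialization are assumptions, not the authors' code.
        """
        def __init__(self, hidden_size: int):
            super().__init__()
            # Start as the identity edit so training begins from the
            # frozen model's original behavior.
            self.scale = nn.Parameter(torch.ones(hidden_size))
            self.bias = nn.Parameter(torch.zeros(hidden_size))

        def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
            # hidden_states: (batch, seq_len, hidden_size),
            # e.g. the output of a layer's FFN sub-layer.
            return hidden_states * self.scale + self.bias

    # Usage sketch: attach one edit module per transformer layer, freeze the
    # base model, and train only the scale/bias vectors.
    hidden_size, num_layers = 4096, 32  # Llama-2 7B dimensions
    edits = nn.ModuleList([REDEdit(hidden_size) for _ in range(num_layers)])
    h = torch.randn(2, 16, hidden_size)  # dummy hidden states from one layer
    h_edited = edits[0](h)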

Why

The paper addresses the limitations of existing PEFT methods in hyperparameter selection (e.g., choosing the rank of LoRA or Adapter modules, or the length of soft prompts) and in parameter efficiency. It proposes RED as a more efficient and effective way to fine-tune large language models, substantially reducing the number of trainable parameters while achieving comparable or superior performance.

How

The authors evaluate RED on language models of varying architectures and scales (RoBERTa, GPT-2, T5, Llama-2) and a range of NLP tasks (the GLUE benchmark, the E2E NLG Challenge, UltraFeedback, the Open LLM Leaderboard, AlpacaEval, and MT-Bench). They compare RED against several baselines, including full fine-tuning, Adapter, LoRA, BitFit, and Prompt Tuning, and conduct ablation studies to analyze the contribution of RED's components, such as the type and position of the ‘edit vectors’.

Result

RED consistently matches or exceeds the performance of other PEFT methods while training significantly fewer parameters. On Llama-2 7B, for instance, RED requires 25,700 times fewer trainable parameters than full fine-tuning and 32 times fewer than LoRA while achieving comparable or even better results across the benchmarks. Ablation studies show that both the scaling and bias vectors contribute to RED’s performance, and that editing representations after the FFN sub-layer is the most effective placement.
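
As a rough sanity check on these counts (the hidden size of 4096 and the 32 layers are Llama-2 7B's published dimensions; the total parameter count and the LoRA configuration behind the 32x figure are taken from the paper rather than re-derived here), and assuming RED trains one scale and one bias vector per layer as described above:

    # Back-of-envelope trainable-parameter count for RED on Llama-2 7B.
    hidden_size, num_layers = 4096, 32
    red_params = 2 * hidden_size * num_layers   # 262,144 ≈ 0.26M trainable parameters
    full_params = 6.74e9                        # total Llama-2 7B parameters
    print(round(full_params / red_params))      # ≈ 25,711, i.e. the ~25,700x reduction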

Limitations and Future Work

The authors acknowledge that RED’s applicability to other modalities, such as computer vision and speech recognition, needs further investigation. They also plan to explore RED in few-shot learning scenarios to improve its data efficiency.

Abstract

Parameter Efficient Fine-Tuning (PEFT) has gained significant attention for its ability to achieve competitive results while updating only a small subset of trainable parameters. Despite the promising performance of current PEFT methods, they present challenges in hyperparameter selection, such as determining the rank of LoRA or Adapter, or specifying the length of soft prompts. In addressing these challenges, we propose a novel approach to fine-tuning neural models, termed Representation EDiting (RED), which scales and biases the representation produced at each layer. RED substantially reduces the number of trainable parameters by a factor of 25,700 compared to full parameter fine-tuning, and by a factor of 32 compared to LoRA. Remarkably, RED achieves comparable or superior results to full parameter fine-tuning and other PEFT methods. Extensive experiments were conducted across models of varying architectures and scales, including RoBERTa, GPT-2, T5, and Llama-2, and the results demonstrate the efficiency and efficacy of RED, positioning it as a promising PEFT approach for large neural models.