The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
Authors: Pratyusha Sharma, Jordan T. Ash, Dipendra Misra
What
This paper introduces LAyer-SElective Rank reduction (LASER), a technique for improving the performance of Large Language Models (LLMs) by selectively removing higher-order components from weight matrices in specific layers.
Why
The paper is important because it challenges the conventional belief that larger models always perform better. It demonstrates a simple yet effective method to enhance LLM accuracy on various NLP and even reinforcement learning tasks without requiring additional training data or parameters.
How
The authors apply LASER by using Singular Value Decomposition (SVD) to identify and remove higher-order components from specific weight matrices of pre-trained LLMs. They experiment with different layers and reduction percentages, evaluating the impact on accuracy and other metrics across various datasets and LLM architectures.
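As a rough illustration of the intervention described above, the sketch below applies an SVD-based low-rank approximation to a single weight matrix. This is a minimal sketch assuming PyTorch; the matrix `W`, the helper `low_rank_approx`, and the retained-rank fraction `rho` are illustrative stand-ins, not the authors' released implementation or their exact settings.

```python
import torch

def low_rank_approx(W: torch.Tensor, rho: float) -> torch.Tensor:
    """Return a rank-reduced copy of W, keeping the top rho fraction of singular values."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    k = max(1, int(rho * S.shape[0]))        # number of singular values to keep
    return U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]

# Example: reduce a stand-in for one layer's MLP weight matrix to 1% of its rank.
W = torch.randn(4096, 4096)                  # placeholder for a pre-trained weight
W_reduced = low_rank_approx(W, rho=0.01)
```

In the paper's framing, the search over which layer, which matrix type (e.g., MLP input or output projection), and how much rank to retain is what makes the method "layer-selective"; the sketch above only shows the rank-reduction step itself.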
Result
LASER significantly improves accuracy on several NLP tasks, especially those involving less frequent information in the training data. For instance, GPT-J’s accuracy on the CounterFact dataset increased from 13.3% to 24.1%. The technique also enhances robustness to paraphrases. Notably, LASER even benefits a Decision Transformer agent in a Sokoban environment, hinting at broader applicability beyond NLP.
Limitations and Future Work
The authors acknowledge limitations and outline future work: (1) understanding why higher-order components accumulate noisy answers during training, (2) investigating how model architecture affects LASER's effectiveness, and (3) explaining why pruning later MLP layers is particularly beneficial. Further research is also needed to explore alternative pruning methods and to analyze LASER's impact on language modeling and fluency in detail.
Abstract
Transformer-based Large Language Models (LLMs) have become a fixture in modern machine learning. Correspondingly, significant resources are allocated towards research that aims to further advance this technology, typically resulting in models of increasing size that are trained on increasing amounts of data. This work, however, demonstrates the surprising result that it is often possible to significantly improve the performance of LLMs by selectively removing higher-order components of their weight matrices. This simple intervention, which we call LAyer-SElective Rank reduction (LASER), can be done on a model after training has completed, and requires no additional parameters or data. We show extensive experiments demonstrating the generality of this finding across language models and datasets, and provide in-depth analyses offering insights into both when LASER is effective and the mechanism by which it operates.