Large Language Models: A Survey
Authors: Shervin Minaee, Tomas Mikolov, Narjes Nikzad, Meysam Chenaghlu, Richard Socher, Xavier Amatriain, Jianfeng Gao
What
This paper presents a survey of Large Language Models (LLMs), covering their evolution from early neural language models, prominent LLM families (GPT, LLaMA, PaLM), techniques for building and augmenting LLMs, popular datasets and benchmarks, and an overview of performance comparisons.
Why
This paper is important due to the rapid evolution and increasing influence of LLMs in various domains. It provides a comprehensive overview of LLM advancements, techniques, and challenges, serving as a valuable resource for researchers and practitioners seeking to understand and utilize LLMs effectively.
How
The paper conducts a literature review, summarizing key findings and advancements in the field of LLMs. It analyzes prominent LLM architectures, pre-training methods, fine-tuning and alignment techniques, and prompt engineering strategies. Additionally, it reviews popular datasets and benchmarks used for LLM evaluation, comparing the performance of notable models.
Result
The survey highlights the impressive performance of LLMs across a range of NLP tasks, including commonsense reasoning, code generation, and question answering. It showcases the benefits of prompt engineering techniques such as Chain of Thought (CoT), as well as augmentation approaches such as Retrieval Augmented Generation (RAG) and the use of external tools to extend LLM functionality. The paper also emphasizes the importance of addressing challenges such as hallucination, ethical concerns, and the need for smaller, more efficient models.
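To make the techniques named above concrete, here is a minimal sketch combining RAG with a CoT-style instruction. The retrieve and generate functions are hypothetical placeholders, not APIs described in the survey; a real system would substitute its own retrieval index and LLM client.

    # Minimal sketch: Retrieval Augmented Generation with a Chain-of-Thought prompt.
    # `retrieve` and `generate` are hypothetical stand-ins for a vector-store
    # lookup and an LLM completion call, respectively.

    def retrieve(query: str, k: int = 3) -> list[str]:
        """Return the k passages most relevant to the query (placeholder)."""
        raise NotImplementedError("plug in a vector store or search index here")

    def generate(prompt: str) -> str:
        """Call an LLM and return its completion (placeholder)."""
        raise NotImplementedError("plug in an LLM client here")

    def answer(question: str) -> str:
        # RAG: ground the model in retrieved passages rather than parametric memory alone.
        context = "\n".join(retrieve(question))
        # CoT: ask the model to reason step by step before stating a final answer.
        prompt = (
            f"Context:\n{context}\n\n"
            f"Question: {question}\n"
            "Think step by step, then state the final answer."
        )
        return generate(prompt)

The same pattern extends to external tools: instead of only retrieving text, the model's intermediate output can be parsed into tool calls (e.g., a calculator or search API) whose results are appended to the context before the final generation step.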
Limitations and Future Directions
The paper identifies several challenges and future research directions for LLMs, including the development of smaller and more efficient models, exploring new post-attention architectural paradigms, enhancing multi-modal capabilities, improving LLM usage and augmentation techniques, and addressing security and ethical concerns. It emphasizes the need for continued research in these areas to unlock the full potential of LLMs while mitigating their limitations.
Abstract
Large Language Models (LLMs) have drawn a great deal of attention since the release of ChatGPT in November 2022, owing to their strong performance on a wide range of natural language tasks. LLMs acquire their general-purpose language understanding and generation abilities by training billions of parameters on massive amounts of text data, as predicted by scaling laws (Kaplan et al., 2020; Hoffmann et al., 2022). The research area of LLMs, while very recent, is evolving rapidly in many different ways. In this paper, we review some of the most prominent LLMs, including three popular LLM families (GPT, LLaMA, PaLM), and discuss their characteristics, contributions, and limitations. We also give an overview of techniques developed to build and augment LLMs. We then survey popular datasets prepared for LLM training, fine-tuning, and evaluation; review widely used LLM evaluation metrics; and compare the performance of several popular LLMs on a set of representative benchmarks. Finally, we conclude the paper by discussing open challenges and future research directions.
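For reference, the compute-optimal scaling law of Hoffmann et al. (2022) cited above is usually written as a parametric fit of the training loss; the form below restates it, with the fitted exponents taken from that paper rather than from this survey:

    L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}

where N is the number of model parameters, D the number of training tokens, E the irreducible loss, and A, B, \alpha, \beta fitted constants (Hoffmann et al. report \alpha \approx 0.34 and \beta \approx 0.28), implying that parameters and training tokens should be scaled up roughly in proportion for a fixed compute budget.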