Reinforcement Learning for Generative AI: A Survey
Authors: Yuanjiang Cao, Quan Z. Sheng, Julian McAuley, Lina Yao
What
This paper presents a comprehensive survey of how reinforcement learning (RL) is used in generative AI, analyzing its benefits, challenges, and applications across various domains.
Why
This survey is important because it provides a structured overview of a rapidly developing field that bridges reinforcement learning and generative AI, helping both newcomers and experienced researchers understand current progress and future directions.
How
The authors reviewed a wide range of papers published in top conferences and journals, categorizing them based on how RL is used in generative tasks. They focused on applications involving sequential data generation, such as text, code, and molecules.
Result
The survey highlights that RL is beneficial for optimizing non-differentiable objectives, introducing new training signals, improving sampling in energy-based models, and automating neural architecture search. The authors also identify open challenges: peaked (low-diversity) output distributions, the exploration-exploitation trade-off, sparse rewards, long-term credit assignment, and generalization.
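To make the first of these benefits concrete, here is a minimal sketch (ours, not the paper's) of the REINFORCE score-function estimator training a toy autoregressive generator against a hand-designed, non-differentiable reward. The tiny GRU model, vocabulary size, and the rule_based_reward function are all illustrative assumptions, not anything specified in the survey.

import torch
import torch.nn as nn

VOCAB, HIDDEN, SEQ_LEN = 16, 32, 8

class TinyGenerator(nn.Module):
    # Toy autoregressive generator: a GRU cell over token embeddings.
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.cell = nn.GRUCell(HIDDEN, HIDDEN)
        self.head = nn.Linear(HIDDEN, VOCAB)

    def sample(self):
        # Sample one sequence, returning its tokens and total log-probability.
        h = torch.zeros(1, HIDDEN)
        tok = torch.zeros(1, dtype=torch.long)  # start token
        tokens, log_probs = [], []
        for _ in range(SEQ_LEN):
            h = self.cell(self.embed(tok), h)
            dist = torch.distributions.Categorical(logits=self.head(h))
            tok = dist.sample()
            tokens.append(tok.item())
            log_probs.append(dist.log_prob(tok))
        return tokens, torch.stack(log_probs).sum()

def rule_based_reward(tokens):
    # Non-differentiable, hand-designed objective: fraction of even tokens,
    # a stand-in for e.g. a validity check on generated molecules.
    return sum(t % 2 == 0 for t in tokens) / len(tokens)

gen = TinyGenerator()
opt = torch.optim.Adam(gen.parameters(), lr=1e-2)
baseline = 0.0  # running-mean baseline to reduce gradient variance
for step in range(200):
    tokens, log_prob = gen.sample()
    reward = rule_based_reward(tokens)
    baseline = 0.9 * baseline + 0.1 * reward
    # REINFORCE: the reward never needs a gradient; only log p(sequence) does.
    loss = -(reward - baseline) * log_prob
    opt.zero_grad()
    loss.backward()
    opt.step()

The key point is that reward is a plain Python float: the gradient flows only through the sequence's log-probability, which is why RL can optimize objectives that maximum likelihood cannot touch.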
Limitations & Future Directions
The paper points out several future research avenues: reward function design for multi-objective optimization, model enhancement and control with RL, more sophisticated human preference modeling, improved sample efficiency and generalization, the adoption of novel RL algorithms, and a better understanding of the implications of LLMs and foundation models.
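As a hypothetical illustration of the first avenue (ours, not the paper's), multi-objective reward design must collapse several desiderata into the single scalar an RL algorithm consumes; all three scoring functions below are toy stand-ins.

def fluency_score(text):
    # Stand-in scorer: prefer outputs of roughly ten words.
    return max(0.0, 1.0 - abs(len(text.split()) - 10) / 10.0)

def safety_score(text):
    # Stand-in scorer: a toy blocklist check.
    return 0.0 if "unsafe" in text else 1.0

def novelty_score(text):
    # Stand-in scorer: reward vocabulary diversity.
    words = text.split()
    return len(set(words)) / max(len(words), 1)

def multi_objective_reward(text, weights=(0.5, 0.3, 0.2)):
    # Linear scalarization: one simple way to combine objectives; choosing
    # the weights (and whether a linear combination suffices at all) is
    # precisely the open design problem the survey raises.
    scores = (fluency_score(text), safety_score(text), novelty_score(text))
    return sum(w * s for w, s in zip(weights, scores))

print(multi_objective_reward("a short, safe, and fairly varied sample sentence"))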
Abstract
Deep generative AI has long been an essential topic in the machine learning community, with impact on application areas such as text generation and computer vision. The dominant paradigm for training a generative model is maximum likelihood estimation, which pushes the learner to capture and approximate the target data distribution by decreasing the divergence between the model distribution and the target distribution. This formulation successfully establishes the objective of generative tasks, but it cannot satisfy all the requirements a user might expect from a generative model. Reinforcement learning, a competitive option for injecting new training signals through new objectives, has demonstrated its power and flexibility to incorporate human inductive biases from multiple angles, such as adversarial learning, hand-designed rules, and learned reward models, to build performant models. Reinforcement learning has thereby become a trending research field and has stretched the limits of generative AI in both model design and application, making a comprehensive review of recent advances timely. Although several surveys of individual application areas have appeared recently, this survey aims to provide a high-level review that spans a range of application areas. We provide a rigorous taxonomy of the area and broad coverage of models and applications, and we notably also survey the fast-developing large language model area. We conclude the survey by presenting potential directions that might address the limitations of current models and expand the frontiers of generative AI.
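The abstract's two training formulations can be written out explicitly. The identities below are standard textbook forms (our addition, not equations reproduced from the paper), with the KL-penalized RL objective shown in its common RLHF instantiation:

% Maximum likelihood estimation is equivalent (up to a constant entropy
% term) to minimizing the forward KL divergence from data to model:
\max_\theta \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log p_\theta(x)\right]
\;\Longleftrightarrow\;
\min_\theta \; \mathrm{KL}\!\left(p_{\mathrm{data}} \,\|\, p_\theta\right)

% RL instead maximizes expected reward over the model's own samples; a KL
% penalty toward a reference policy \pi_{\mathrm{ref}} (as in RLHF) is
% commonly added to keep the learned distribution from becoming overly peaked:
\max_\theta \; \mathbb{E}_{x \sim p_\theta}\!\left[r(x)\right]
\;-\; \beta\,\mathrm{KL}\!\left(p_\theta \,\|\, \pi_{\mathrm{ref}}\right)

The contrast makes the survey's framing concrete: MLE can only match the data distribution, whereas the reward r(x) in the second objective is a free slot for adversarial critics, hand-designed rules, or learned preference models.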