Compositional Generative Modeling: A Single Model is Not All You Need

Authors: Yilun Du, Leslie Kaelbling

What

This paper argues for a compositional approach to generative modeling: rather than relying solely on large monolithic models, large generative systems should be constructed by composing smaller generative models together.

Why

This paper is important because it addresses limitations of current large generative models, such as poor compositionality, data inefficiency, and difficulty in adaptation. The proposed compositional approach offers a more scalable, data-efficient, and generalizable alternative.

How

The authors present a theoretical framework for compositional generative modeling and illustrate its benefits across domains including image synthesis, trajectory modeling, and planning. They demonstrate how composing simpler models can represent complex distributions more effectively, generalize to regions of the data distribution unseen at training time, and enable the construction of new generative models for unseen tasks. They also discuss methods for discovering compositional components from data.
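
As a concrete illustration of this framework, consider product composition: a distribution p(x) ∝ p_a(x) p_b(x) can be sampled by running Langevin dynamics on the sum of the two models' energies. The sketch below is a minimal, hypothetical PyTorch example: the analytic Gaussian energies, the `langevin_sample` helper, and the hyperparameters (`step_size`, `n_steps`) are illustrative stand-ins, not the paper's implementation.

```python
import torch

def energy_a(x):
    # Energy of a unit Gaussian centered at -1 (stand-in for a learned model).
    return 0.5 * ((x + 1.0) ** 2).sum(dim=-1)

def energy_b(x):
    # Energy of a unit Gaussian centered at +1 (stand-in for a second model).
    return 0.5 * ((x - 1.0) ** 2).sum(dim=-1)

def langevin_sample(energies, n_samples=512, dim=2, n_steps=200, step_size=0.01):
    # Langevin dynamics on the summed energy: draws approximate samples from
    # the product distribution p(x) ∝ exp(-sum_i E_i(x)).
    x = torch.randn(n_samples, dim, requires_grad=True)
    for _ in range(n_steps):
        total_energy = sum(E(x) for E in energies).sum()
        grad, = torch.autograd.grad(total_energy, x)
        noise = torch.randn_like(x)
        x = x - step_size * grad + (2 * step_size) ** 0.5 * noise
        x = x.detach().requires_grad_(True)
    return x.detach()

samples = langevin_sample([energy_a, energy_b])
print(samples.mean(dim=0))  # concentrates near 0, the overlap of the two modes
```

Because energies simply add, a new model or constraint can be dropped into the `energies` list without any retraining, which is the core of the compositional argument.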

Result

The paper shows that compositional models are more data-efficient, generalize better to unseen data, and can be composed to solve new tasks. For example, composing models trained on different subsets of data allows for generating hybrid scenes with elements from each subset. Additionally, the paper demonstrates how compositional models can be used for planning, constraint satisfaction, and style adaptation in video generation.
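
The constraint-satisfaction use case reduces to the same mechanism: a constraint is just another (possibly hand-written) energy term composed with a learned model. Reusing the hypothetical `langevin_sample` and `energy_a` from the sketch above, one illustrative possibility:

```python
def constraint_energy(x, radius=0.5, weight=50.0):
    # Hypothetical constraint term: softly penalize samples outside a
    # radius-0.5 ball around the origin. Composing it with energy_a biases
    # sampling toward the feasible region without retraining energy_a.
    violation = (x.norm(dim=-1) - radius).clamp(min=0.0)
    return weight * violation ** 2

constrained = langevin_sample([energy_a, constraint_energy])
print(constrained.norm(dim=-1).max())  # most samples now lie near or inside the ball
```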

Limitations & Future Work

The paper acknowledges that compositional sampling is difficult to implement with common generative model parameterizations and suggests Energy-Based Models (EBMs) as a remedy. Future work includes developing efficient methods for sampling from joint distributions, discovering compositional structures, and dynamically adapting the structure of generative models under distribution shift.
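
One way to read the parameterization issue: a diffusion model exposes only a per-timestep noise (score) prediction, and summing two models' predictions approximates the product distribution's score only loosely at intermediate noise levels, whereas an energy-based view permits MCMC correction at each level. The sketch below shows the naive score-summing reverse step, with a comment marking where such a correction would go; the DDPM-style update, the model signatures `model_a(x, t)` / `model_b(x, t)`, and the schedule tensors are all assumptions for illustration, not the paper's algorithm.

```python
import torch

@torch.no_grad()
def composed_reverse_step(x_t, t, model_a, model_b, betas, alphas, alphas_bar):
    # Naive composition: sum the two noise predictions (equivalently, sum the
    # implied scores). This only approximates sampling from the product of the
    # two learned distributions.
    eps = model_a(x_t, t) + model_b(x_t, t)
    mean = (x_t - betas[t] / torch.sqrt(1 - alphas_bar[t]) * eps) / torch.sqrt(alphas[t])
    # An EBM treatment would insert a few Langevin correction steps here, on
    # the composed energy at noise level t, before continuing the reverse chain.
    if t > 0:
        mean = mean + torch.sqrt(betas[t]) * torch.randn_like(x_t)
    return mean
```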

Abstract

Large monolithic generative models trained on massive amounts of data have become an increasingly dominant approach in AI research. In this paper, we argue that we should instead construct large generative systems by composing smaller generative models together. We show how such a compositional generative approach enables us to learn distributions in a more data-efficient manner, enabling generalization to parts of the data distribution unseen at training time. We further show how this enables us to program and construct new generative models for tasks completely unseen at training. Finally, we show that in many cases, we can discover separate compositional components from data.