Model Lakes

Authors: Koyena Pal, David Bau, Renée J. Miller

What

This paper introduces the concept of “model lakes” as a way to manage and understand the growing number of deep learning models, drawing parallels to data lakes in the data management field.

Why

This paper matters because finding, understanding, and comparing deep learning models is difficult when practitioners must rely on manual documentation that is often incomplete or unreliable. It proposes model lakes, inspired by data lakes, as a potential solution to these challenges.

How

This is a vision paper: it draws analogies from the data management literature, particularly work on data lakes, and proposes a roadmap for future research in model management. It does not perform any experiments.

Result

As a vision paper, it reports no experimental results. Instead, it proposes a model lake framework, outlines key challenges (content-based model search, related model search, documentation verification, data citation, provenance, and version control), and discusses potential approaches inspired by data lake solutions from data management.
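To make "content-based model search" concrete, here is a minimal sketch of one possible reading: rank the models in a lake by how similarly they behave on a shared set of probe inputs, instead of by their documentation. This is an illustration only, not a method from the paper. The names `behavior_signature` and `search_lake`, the toy lambda "models", and the choice of cosine similarity are all assumptions made for the example.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-in for a trained model: any function from float to float.
Model = Callable[[float], float]


def behavior_signature(model: Model, probes: Sequence[float]) -> List[float]:
    """Describe a model by its outputs on a fixed set of probe inputs."""
    return [model(x) for x in probes]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_lake(
    lake: Dict[str, Model],
    query: Model,
    probes: Sequence[float],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the top_k models whose behavior on the probes is closest to the query's."""
    query_sig = behavior_signature(query, probes)
    scored = [
        (name, cosine_similarity(query_sig, behavior_signature(model, probes)))
        for name, model in lake.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy "lake" of three models; no metadata is consulted during the search.
probes = [-2.0, -1.0, 0.5, 1.0, 2.0]
lake = {
    "doubler": lambda x: 2.0 * x,
    "negator": lambda x: -x,
    "squarer": lambda x: x * x,
}
query = lambda x: 3.0 * x  # example model to search by
results = search_lake(lake, query, probes)
```

In this toy setting the search uses only model behavior, so it would still work if the lake's documentation were missing or wrong. A real model lake would need far richer signatures (for example weights, activations, or outputs on benchmark data), which is among the open questions the paper raises.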

Limitations and Future Work

The authors identify limitations in current model management practices, including reliance on incomplete metadata and manual documentation. They propose future work on content-based model search, automated documentation verification, data citation for models, model provenance tracking, and model version management, emphasizing the need for standardized benchmarks and evaluation metrics.

Abstract

Given a set of deep learning models, it can be hard to find models appropriate to a task, understand the models, and characterize how models are different one from another. Currently, practitioners rely on manually-written documentation to understand and choose models. However, not all models have complete and reliable documentation. As the number of machine learning models increases, this issue of finding, differentiating, and understanding models is becoming more crucial. Inspired from research on data lakes, we introduce and define the concept of model lakes. We discuss fundamental research challenges in the management of large models. And we discuss what principled data management techniques can be brought to bear on the study of large model management.