Spiking-Diffusion: Vector Quantized Discrete Diffusion Model with Spiking Neural Networks
Authors: Mingxuan Liu, Jie Gan, Rui Wen, Tao Li, Yongli Chen, Hong Chen
What
This paper introduces Spiking-Diffusion, an image generation model built entirely from spiking neural network (SNN) layers, targeting both energy efficiency and biological plausibility.
Why
This paper is significant because it is the first to successfully implement a diffusion model entirely using SNN layers, opening up new possibilities for energy-efficient and brain-inspired image generation. Previous SNN-based generative models faced limitations in quality and capacity, making this a notable advancement in the field.
How
The authors develop Spiking-Diffusion in two stages:
1) VQ-SVAE: a vector quantized spiking variational autoencoder that learns a discrete latent representation of images. Image features are encoded using both the spike firing rate (SFR) and the postsynaptic potential (PSP), and an adaptive spike generator (ASG) converts the quantized embeddings back into spike trains for the decoder.
2) SDID: a spiking diffusion image decoder trained in the discrete latent space. An absorbing-state diffusion process gradually replaces the discrete latent codes with a mask token; the SDID learns to reverse this corruption, denoising the latents so the VQ-SVAE decoder can reconstruct the image. A minimal sketch of both mechanisms follows this list.
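As a concrete illustration of the two mechanisms above, here is a minimal PyTorch-style sketch, assuming rate-based encoding, a first-order synaptic model for the PSP, and a linear masking schedule. All names, shapes, and hyperparameters (T_STEPS, K, MASK_ID, tau) are illustrative assumptions, not the authors' implementation.

```python
import torch

T_STEPS = 16   # SNN simulation time steps (assumed)
K = 128        # codebook size (assumed)
MASK_ID = K    # extra absorbing index used as the [MASK] token (assumed)

def sfr_encode(spikes: torch.Tensor) -> torch.Tensor:
    """Spike firing rate (SFR): mean spike count over time.
    spikes: binary tensor of shape (T, B, D, H, W)."""
    return spikes.float().mean(dim=0)                        # (B, D, H, W)

def psp_encode(spikes: torch.Tensor, tau: float = 2.0) -> torch.Tensor:
    """Postsynaptic potential (PSP): leaky temporal accumulation of
    spikes (assumed first-order synaptic model)."""
    psp = torch.zeros_like(spikes[0], dtype=torch.float)
    for s in spikes:
        psp = (1.0 - 1.0 / tau) * psp + (1.0 / tau) * s.float()
    return psp                                               # (B, D, H, W)

def quantize(features: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Nearest-neighbour vector quantization of (B, D, H, W) features
    against a (K, D) codebook; returns integer indices of shape (B, H, W)."""
    b, d, h, w = features.shape
    flat = features.permute(0, 2, 3, 1).reshape(-1, d)       # (B*H*W, D)
    idx = torch.cdist(flat, codebook).argmin(dim=1)          # nearest code
    return idx.view(b, h, w)

def mask_latents(codes: torch.Tensor, t: torch.Tensor, n_steps: int) -> torch.Tensor:
    """Absorbing-state forward process: at diffusion step t, each latent
    code is independently replaced by MASK_ID with probability t / n_steps."""
    p = (t.float() / n_steps).view(-1, 1, 1)                 # (B, 1, 1)
    masked = torch.rand(codes.shape, device=codes.device) < p
    return torch.where(masked, torch.full_like(codes, MASK_ID), codes)
```

Training the SDID then typically reduces to predicting the original codes at the masked positions, e.g. with a cross-entropy loss over those positions.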
Result
Spiking-Diffusion outperforms the previous state-of-the-art SNN-based generative model (FSVAE) on MNIST, FMNIST, KMNIST, Letters, and CIFAR-10. It achieves lower reconstruction error (MSE), higher structural similarity (SSIM), and better generated-image quality (lower FID and KID).
LF
The paper acknowledges that training larger-scale SNN generative models remains to be explored in future work, which suggests scaling up the model and evaluating on more complex datasets to further validate and improve Spiking-Diffusion's capabilities.
Abstract
Spiking neural networks (SNNs) have tremendous potential for energy-efficient neuromorphic chips due to their binary, event-driven architecture. SNNs have primarily been used for classification tasks, with limited exploration of image generation. To fill this gap, we propose Spiking-Diffusion, a model based on the vector quantized discrete diffusion model. First, we develop a vector quantized variational autoencoder with SNNs (VQ-SVAE) to learn a discrete latent space for images. In VQ-SVAE, image features are encoded using both the spike firing rate and the postsynaptic potential, and an adaptive spike generator is designed to restore embedding features in the form of spike trains. Next, we perform absorbing-state diffusion in the discrete latent space and construct a spiking diffusion image decoder (SDID) with SNNs to denoise the image. Our work is the first to build a diffusion model entirely from SNN layers. Experimental results on MNIST, FMNIST, KMNIST, Letters, and CIFAR-10 demonstrate that Spiking-Diffusion outperforms the existing SNN-based generation model, achieving FIDs of 37.50, 91.98, 59.23, 67.41, and 120.5 on these datasets respectively, reductions of 58.60%, 18.75%, 64.51%, 29.75%, and 44.88% compared with the state-of-the-art. Our code will be available at https://github.com/Arktis2022/Spiking-Diffusion.
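To complement the forward-masking sketch in the How section, here is a hedged sketch of how generation-time sampling in an absorbing-state discrete diffusion typically proceeds: start from an all-mask latent grid and iteratively let the denoiser fill in masked positions. `sdid` is a stand-in for the paper's spiking diffusion image decoder, assumed to output per-position logits over the K real codes; the 1/t unmasking schedule and all shapes are illustrative assumptions.

```python
import torch

@torch.no_grad()
def sample_latents(sdid, batch=1, h=8, w=8, K=128, n_steps=100, device="cpu"):
    """Reverse absorbing-state diffusion: begin fully masked, then at each
    step reveal a fraction of the still-masked positions using the model's
    predicted code distribution (illustrative schedule)."""
    MASK_ID = K
    codes = torch.full((batch, h, w), MASK_ID, dtype=torch.long, device=device)
    for t in reversed(range(1, n_steps + 1)):
        logits = sdid(codes, t)                               # (B, H, W, K), assumed
        pred = torch.distributions.Categorical(logits=logits).sample()
        still_masked = codes == MASK_ID
        # On average, reveal 1/t of the remaining masked positions.
        reveal = still_masked & (torch.rand(codes.shape, device=device) < 1.0 / t)
        codes[reveal] = pred[reveal]
    return codes  # decode with the VQ-SVAE decoder (spike trains via the ASG)
```

Passing the sampled indices through the codebook and the ASG-driven SNN decoder would then yield the generated image, following the pipeline described above.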