🪴 Quartz 4.0
Tag: diffusion_model
198 items with this tag.
Jun 18, 2024
UniFL Improve Stable Diffusion via Unified Feedback Learning
diffusion_model
feedback_learning
acceleration
aesthetic
quality
inference
text-to-image
perceptual_loss
adversarial_training
Jun 18, 2024
A Training-Free Plug-and-Play Watermark Framework for Stable Diffusion
diffusion_model
stable-diffusion
watermarking
training-free
plug-and-play
aigc
image_generation
robustness
latent_space
Jun 18, 2024
MoMA Multimodal LLM Adapter for Fast Personalized Image Generation
diffusion_model
mllm
personalization
image_generation
open-vocabulary
tuning-free
image-to-image
self-attention
Jun 18, 2024
SwapAnything Enabling Arbitrary Object Swapping in Personalized Visual Editing
diffusion_model
image_editing
object_swapping
personalized_editing
appearance_adaptation
context_preservation
text-based_editing
Jun 18, 2024
Finding Visual Task Vectors
diffusion_model
visual_prompting
in-context_learning
analysis
task_vectors
zero-shot
mae
vqgan
attention
reinforce
Jun 18, 2024
LLM2Vec Large Language Models Are Secretly Powerful Text Encoders
diffusion_model
llm
analysis
text_embedding
contrastive_learning
Jun 18, 2024
DiffHarmony Latent Diffusion Model Meets Image Harmonization
diffusion_model
image_harmonization
stable_diffusion
image_generation
refinement
vae
image_distortion
high_resolution
Jun 18, 2024
Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs
diffusion_model
llm
analysis
video
vqa
interpretability
Jun 18, 2024
CAT Contrastive Adapter Training for Personalized Image Generation
diffusion_model
adapter
lora
dreambooth
personalization
image_generation
contrastive_learning
knowledge_preservation
Jun 18, 2024
Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models
diffusion_model
image_generation
classifier-free_guidance
sampling
fid
imagenet
stable-diffusion-xl
Jun 18, 2024
View Selection for 3D Captioning via Diffusion Ranking
diffusion_model
llm
3d
captioning
hallucination
view_selection
dataset
objaverse
gpt4-vision
visual_question_answering
Jun 18, 2024
Connecting NeRFs, Images, and Text
nerf
diffusion_model
gan
analysis
3d
multimodal
retrieval
zero-shot
representation_learning
Jun 18, 2024
Probing the 3D Awareness of Visual Foundation Models
3d
analysis
depth_estimation
surface_normal
correspondence
vision_transformer
diffusion_model
self_supervised_learning
vision_language_model
Jun 18, 2024
Dynamic Typography Bringing Text to Life via Video Diffusion Prior
diffusion_model
animation
text-to-video
kinetic-typography
svg
interpretability
Jun 18, 2024
Lazy Diffusion Transformer for Interactive Image Editing
diffusion_model
transformer
inpainting
image_editing
interactive
context_encoding
latent_space
efficiency
poisson_blending
Jun 18, 2024
Analysis of Classifier-Free Guidance Weight Schedulers
diffusion_model
cfg
analysis
image_generation
text-to-image
fid
inception_score
clip-score
diversity
Jun 18, 2024
GLoD Composing Global Contexts and Local Details in Image Generation
diffusion_model
text-to-image-generation
controllable-image-synthesis
global-context
local-detail
layer-composition
training-free
Jun 18, 2024
CatLIP CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
diffusion_model
analysis
image_classification
multi-label_classification
semantic_segmentation
object_detection
weakly_supervised_learning
pre-training
vision_transformer
data_efficiency
web-scale_data
Jun 18, 2024
A Survey on Vision Mamba Models, Applications and Challenges
diffusion_model
llm
analysis
literature_review
3d
motion
video
interpretability
vision_transformer
state_space_model
Jun 18, 2024
Stylus Automatic Adapter Selection for Diffusion Models
diffusion_model
adapter
llm
analysis
image_generation
retrieval
lora
Jun 18, 2024
Espresso Robust Concept Filtering in Text-to-Image Models
diffusion_model
tti
clip
analysis
adversarial_attack
interpretability
robustness
concept_filtering
safety
Jun 18, 2024
Visual Fact Checker Enabling High-Fidelity Detailed Caption Generation
diffusion_model
llm
captioning
2d
3d
hallucination
vqa
object_detection
analysis
Jun 18, 2024
KAN Kolmogorov-Arnold Networks
diffusion_model
analysis
interpretability
neural_scaling_law
pde
scientific_discovery
symbolic_regression
Jun 18, 2024
Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models
diffusion_model
reward
drtune
stable-diffusion
image_generation
optimization
deep_learning
text-to-image
Jun 18, 2024
On Mechanistic Knowledge Localization in Text-to-Image Generative Models
diffusion_model
analysis
interpretability
text-to-image
model_editing
Jun 18, 2024
Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance
diffusion_model
text-to-image
image_synthesis
subject-driven
classifier-free_guidance
dreambooth
textual_inversion
Jun 18, 2024
LocInv Localization-aware Inversion for Text-Guided Image Editing
diffusion_model
image_editing
text-guided
cross-attention
localization
segmentation
bounding_box
stable_diffusion
attribute_editing
word-swap
Jun 18, 2024
Customizing Text-to-Image Models with a Single Image Pair
diffusion_model
gan
customization
style_transfer
image_generation
lora
orthogonal
disentanglement
style_guidance
Jun 18, 2024
U-DiTs Downsample Tokens in U-Shaped Diffusion Transformers
diffusion_model
transformer
u-net
image_generation
latent_space
self-attention
downsampling
computational_efficiency
Jun 18, 2024
Video Diffusion Models A Survey
diffusion_model
video
generation
editing
survey
temporal_dynamics
latent_diffusion_model
unet
attention_mechanism
transformer
Jun 18, 2024
Inf-DiT Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer
diffusion_model
transformer
super-resolution
image_generation
ultra-high-resolution
memory_efficient
uniba
inf-dit
clip
Jun 18, 2024
Variational Schrödinger Diffusion Models
diffusion_model
optimal_transport
variational_inference
stochastic_approximation
schrodinger_bridge
simulation-free
image_generation
time_series_forecasting
Jun 18, 2024
A Survey on Personalized Content Synthesis with Diffusion Models
diffusion_model
personalized_content_synthesis
image_generation
optimization
learning_based
attention_mechanism
mask-guided
data_augmentation
regularization
object_generation
face_synthesis
style_personalization
video
3d
Jun 18, 2024
MasterWeaver Taming Editability and Identity for Personalized Text-to-Image Generation
diffusion_model
personalized_text-to_image_generation
identity_preservation
editability
face_editing
cross_attention
Jun 18, 2024
Could It Be Generated Towards Practical Analysis of Memorization in Text-To-Image Diffusion Models
diffusion_model
memorization
analysis
text-to-image
security
privacy
copyright
inversion
Jun 18, 2024
Distilling Diffusion Models into Conditional GANs
diffusion_model
gan
distillation
image_generation
text-to-image
perceptual_loss
latent_space
one-step_generation
inference_speed
Jun 18, 2024
Controllable Image Generation With Composed Parallel Token Prediction
diffusion_model
gan
vq-vae
vq-gan
analysis
image_generation
compositionality
discrete_models
parallel_token_prediction
controllable_generation
Jun 18, 2024
Compositional Text-to-Image Generation with Dense Blob Representations
diffusion_model
llm
compositional-image-generation
layout-guided-generation
blob-representation
masked-cross-attention
zero-shot-generation
image-editing
Jun 18, 2024
FIFO-Diffusion Generating Infinite Videos from Text without Training
diffusion_model
video
text-to-video
long_video_generation
analysis
Jun 18, 2024
Recovering the Pre-Fine-Tuning Weights of Generative Models
diffusion_model
llm
analysis
adversarial_attack
interpretability
lora
fine-tuning
model_security
weight_recovery
Jun 18, 2024
Make a Cheap Scaling A Self-Cascade Diffusion Model for Higher-Resolution Adaptation
diffusion_model
image_generation
video_generation
high_resolution
adaptation
efficiency
self-cascade
Jun 18, 2024
Speculative Streaming Fast LLM Inference without Auxiliary Models
llm
diffusion_model
analysis
speculative_decoding
inference
resource_constrained
Jun 18, 2024
Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
diffusion_model
llm
hallucination
alignment
vllm
image_captioning
reasoning
preference_tuning
dpo
Jun 18, 2024
Direct Consistency Optimization for Compositional Text-to-Image Personalization
diffusion_model
t2i
personalization
fine-tuning
compositionality
image_generation
dreambooth
lora
consistency
reward_guidance
Jun 18, 2024
LoRA+ Efficient Low Rank Adaptation of Large Models
diffusion_model
llm
analysis
finetuning
lora
optimization
Jun 18, 2024
SDXL-Lightning Progressive Adversarial Diffusion Distillation
diffusion_model
gan
distillation
text-to-image
adversarial_training
image_generation
sdxl
Jun 18, 2024
Consolidating Attention Features for Multi-view Image Editing
diffusion_model
nerf
3d
multi-view
image_editing
geometric_editing
consistency
self-attention
Jun 18, 2024
Training Neural Networks from Scratch with Parallel Low-Rank Adapters
diffusion_model
llm
analysis
3d
motion
video
interpretability
Jun 18, 2024
Transparent Image Layer Diffusion using Latent Transparency
diffusion_model
transparent_image_generation
layered_content_generation
latent_space
human-in-the-loop
image_synthesis
Jun 18, 2024
Sora A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
diffusion_model
llm
analysis
video
sora
Jun 18, 2024
PixArt-Σ Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
diffusion_model
dit
t2i
4k
text-to-image
high-resolution
efficient-training
token-compression
weak-to-strong-training
Jun 18, 2024
ELLA Equip Diffusion Models with LLM for Enhanced Semantic Alignment
diffusion_model
llm
text-to-image
semantic_alignment
dense_prompt
timestep-aware
benchmark
analysis
Jun 18, 2024
ORPO Monolithic Preference Optimization without Reference Model
diffusion_model
llm
analysis
preference_alignment
sft
instruction_following
rlhf
dpo
Jun 18, 2024
Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
diffusion_model
text-to-image
language_model
vision_model
lora
adapter
image_generation
semantic_understanding
image_quality
Jun 18, 2024
Reward Guided Latent Consistency Distillation
diffusion_model
consistency_distillation
text-to-image
reward_model
image_generation
inference_acceleration
latent_space
Jun 18, 2024
Graph Neural Networks for Learning Equivariant Representations of Neural Networks
diffusion_model
gan
analysis
3d
interpretability
neural_network
graph_neural_network
transformer
representation_learning
permutation_symmetry
implicit_neural_representation
generalization
learning_to_optimize
Jun 18, 2024
You Only Sample Once Taming One-Step Text-To-Image Synthesis by Self-Cooperative Diffusion GANs
diffusion_model
gan
image_synthesis
one-step_generation
text-to-image
lora
self-cooperative_learning
latent_perceptual_loss
latent_discriminator
Jun 18, 2024
FouriScale A Frequency Perspective on Training-Free High-Resolution Image Synthesis
diffusion_model
image_synthesis
high_resolution
training-free
frequency_domain
convolutional_neural_networks
generative_models
Jun 18, 2024
When Do We Not Need Larger Vision Models
diffusion_model
llm
analysis
3d
motion
video
interpretability
Jun 18, 2024
Evolutionary Optimization of Model Merging Recipes
diffusion_model
llm
analysis
evolutionary_algorithm
model_merging
japanese
multi-modal
vlm
Jun 18, 2024
Editing Massive Concepts in Text-to-Image Diffusion Models
diffusion_model
concept_editing
text-to-image
model_editing
large_scale
interpretability
Jun 18, 2024
Implicit Style-Content Separation using B-LoRA
diffusion_model
lora
image_stylization
style_transfer
text_guided_image_editing
analysis
sdxl
Jun 18, 2024
MyVLM Personalizing VLMs for User-Specific Queries
diffusion_model
llm
analysis
personalization
image_captioning
visual_question_answering
referring_expression_comprehension
Jun 18, 2024
ReNoise Real Image Inversion Through Iterative Noising
diffusion_model
image_editing
inversion
few-step_models
analysis
ddim
sdxl-turbo
lcm
Jun 18, 2024
Long-CLIP Unlocking the Long-Text Capability of CLIP
diffusion_model
clip
analysis
image_retrieval
text-to-image_generation
interpretability
Jun 18, 2024
SDXS Real-Time One-Step Latent Diffusion Models with Image Conditions
diffusion_model
knowledge_distillation
one-step_training
real-time_inference
text-to-image
image-to-image
controlnet
latency_optimization
Jun 18, 2024
Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance
diffusion_model
guidance
self-attention
unconditional_generation
image_restoration
controlnet
sample_quality
pag
Jun 18, 2024
Improving Text-to-Image Consistency via Automatic Prompt Optimization
diffusion_model
llm
analysis
text-to-image
interpretability
prompt_engineering
consistency
Jun 18, 2024
Tutorial on Diffusion Models for Imaging and Vision
diffusion_model
vae
ddpm
smld
sde
analysis
tutorial
image_generation
denoising
Jun 18, 2024
Attention Calibration for Disentangled Text-to-Image Personalization
diffusion_model
image_generation
personalization
attention_mechanism
disentanglement
text-to-image
inpainting
lora
Jun 18, 2024
TextCraftor Your Text Encoder Can be Image Quality Controller
diffusion_model
text-to-image
image_generation
text_encoder
fine-tuning
reward_function
controllable_generation
Jun 18, 2024
Capability-aware Prompt Reformulation Learning for Text-to-Image Generation
diffusion_model
text-to-image-generation
prompt-reformulation
analysis
log-analysis
Jun 18, 2024
TTD Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias
diffusion_model
clip
analysis
segmentation
open-vocabulary
image-text-alignment
self-distillation
bias
Jun 18, 2024
Iterated Learning Improves Compositionality in Large Vision-Language Models
diffusion_model
llm
analysis
interpretability
Jun 18, 2024
Mixture-of-Depths Dynamically allocating compute in transformer-based language models
diffusion_model
llm
analysis
conditional_computation
transformer
efficiency
mixture-of-experts
routing
autoregressive_sampling
long-term_memory
Jun 18, 2024
LP++ A Surprisingly Strong Linear Probe for Few-Shot CLIP
diffusion_model
llm
analysis
few-shot-learning
clip
optimization
black-box
linear_probe
image_classification
Jun 18, 2024
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models
diffusion_model
cross-attention
inference
efficiency
analysis
text-to-image
Jun 18, 2024
On the Scalability of Diffusion-based Text-to-Image Generation
diffusion_model
gan
analysis
text-to-image
unet
transformer
scaling_law
dataset
caption
efficiency
Jun 18, 2024
ReFT Representation Finetuning for Language Models
diffusion_model
llm
analysis
interpretability
Jun 18, 2024
LCM-Lookahead for Encoder-based Text-to-Image Personalization
diffusion_model
personalization
face_generation
lcm
analysis
attention_mechanism
image_generation
Jun 18, 2024
Robust Concept Erasure Using Task Vectors
diffusion_model
gan
adversarial_attack
interpretability
text-to-image
concept_erasure
safety
Jun 18, 2024
RL for Consistency Models Faster Reward Guided Text-to-Image Generation
diffusion_model
consistency_model
rl
text-to-image
inference
optimization
aesthetic
image_generation
Jun 18, 2024
Concept Weaver Enabling Multi-Concept Fusion in Text-to-Image Models
diffusion_model
text-to-image
image_generation
multi-concept
personalization
concept-fusion
lora
clip
Jun 18, 2024
Dynamic Prompt Optimizing for Text-to-Image Generation
diffusion_model
text-to-image
prompt_engineering
reinforcement_learning
aesthetic_quality
semantic_consistency
user_preference
Jun 18, 2024
MagicTime Time-lapse Video Generation Models as Metamorphic Simulators
diffusion_model
video
generation
time-lapse
metamorphic
physics
dataset
magictime
chronomagic
Jun 18, 2024
Mask-ControlNet Higher-Quality Image Generation with An Additional Mask Prompt
diffusion_model
image_generation
object_reconstruction
mask
controllability
foreground-background
fidelity
Jun 18, 2024
Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance
diffusion_model
text-to-image
cfg
semantic_segmentation
attention_map
image_quality
Jun 18, 2024
Exponentially Faster Language Modelling
llm
bert
diffusion_model
analysis
performance
optimization
conditional_computation
Jun 18, 2024
An Image is Worth Multiple Words Multi-attribute Inversion for Constrained Text-to-Image Synthesis
diffusion_model
inversion
text-to-image
attribute-guided
disentanglement
analysis
image_synthesis
reference_image
Jun 18, 2024
Concept Sliders LoRA Adaptors for Precise Control in Diffusion Models
diffusion_model
lora
analysis
image_editing
gan
stylegan
interpretability
3d
concept_sliders
Jun 18, 2024
NeuroPrompts An Adaptive Framework to Optimize Prompts for Text-to-Image Generation
diffusion_model
prompt_engineering
text-to-image
image_generation
aesthetic_quality
constrained_decoding
reinforcement_learning
ppo
neurologic
stable_diffusion
pickscore
Jun 18, 2024
Toward effective protection against diffusion based mimicry through score distillation
diffusion_model
ldm
adversarial_attack
image_protection
mimicry
sds
semantic_loss
Jun 18, 2024
Diffusion Model Alignment Using Direct Preference Optimization
diffusion_model
dpo
alignment
human_preference
image_generation
ai_feedback
stable_diffusion
sdxl
Jun 18, 2024
MetaCloak Preventing Unauthorized Subject-driven Text-to-image Diffusion-based Synthesis via Meta-learning
diffusion_model
gan
adversarial_attack
interpretability
data_protection
privacy
dreambooth
poisoning_attack
Jun 18, 2024
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
diffusion_model
rlhf
dpo
image_generation
human_feedback
image_quality
safety
prompt-image_alignment
Jun 18, 2024
ZipLoRA Any Subject in Any Style by Effectively Merging LoRAs
diffusion_model
lora
stylization
personalization
image_generation
text-to-image
sdxl
dreambooth
styledrop
Jun 18, 2024
Lego Learning to Disentangle and Invert Concepts Beyond Object Appearance in Text-to-Image Diffusion Models
diffusion_model
textual_inversion
concept_learning
image_generation
disentanglement
contrastive_learning
Jun 18, 2024
ACT-Diffusion Efficient Adversarial Consistency Training for One-step Diffusion Models
diffusion_model
gan
image_generation
consistency_training
adversarial_training
fast_sampling
resource_efficiency
Jun 18, 2024
Stable Video Diffusion Scaling Latent Video Diffusion Models to Large Datasets
diffusion_model
video
text-to-video
image-to-video
3d
motion
multi-view
data_curation
Jun 18, 2024
Enhancing Diffusion Models with Text-Encoder Reinforcement Learning
diffusion_model
text-to-image
reinforcement_learning
lora
text-image_alignment
image_quality
gpt-4v
face_generation
hand_generation
Jun 18, 2024
Self-correcting LLM-controlled Diffusion Models
diffusion_model
llm
image_generation
image_editing
object_detection
self-correction
closed-loop
Jun 18, 2024
DemoFusion Democratising High-Resolution Image Generation With No $$$
diffusion_model
image_generation
high_resolution
sdxl
progressive_upscaling
skip_residual
dilated_sampling
Jun 18, 2024
Ranni Taming Text-to-Image Diffusion for Accurate Instruction Following
diffusion_model
llm
text-to-image
controllable_generation
semantic_panel
interactive_editing
chat-based_generation
Jun 18, 2024
Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer
diffusion_model
motion
video
text-to-video
motion_transfer
zero-shot
Jun 18, 2024
Adversarial Diffusion Distillation
diffusion_model
gan
distillation
image_generation
real-time
adversarial_training
score_distillation
Jun 18, 2024
DreamPropeller Supercharge Text-to-3D Generation with Parallel Sampling
diffusion_model
3d
acceleration
score_distillation
nerf
gaussian_splatting
sds
vsd
Jun 18, 2024
PEA-Diffusion Parameter-Efficient Adapter with Knowledge Distillation in non-English Text-to-Image Generation
diffusion_model
language_transfer
knowledge_distillation
multilingual
text-to-image
culture-specific
adapter
parameter-efficient
Jun 18, 2024
HiPA Enabling One-Step Text-to-Image Diffusion Models via High-Frequency-Promoting Adaptation
diffusion_model
text-to-image
one-step-generation
high-frequency
parameter-efficient
low-rank-adaptation
image-editing
inpainting
super-resolution
Jun 18, 2024
Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing
diffusion_model
image_editing
text-guided_synthesis
contrastive_learning
structure_preservation
latent_diffusion_model
nerf
zero-shot
unsupervised_learning
Jun 18, 2024
One-step Diffusion with Distribution Matching Distillation
diffusion_model
distillation
image_generation
text-to-image
one-step
kl_divergence
score_matching
Jun 18, 2024
VideoBooth Diffusion-based Video Generation with Image Prompts
diffusion_model
video
generation
image_prompt
customized_content_creation
attention_mechanism
Jun 18, 2024
Sequential Modeling Enables Scalable Learning for Large Vision Models
diffusion_model
llm
analysis
3d
motion
video
interpretability
Jun 18, 2024
GIVT Generative Infinite-Vocabulary Transformers
diffusion_model
gan
vae
transformer
image_generation
representation_learning
panoptic_segmentation
depth_estimation
gmm
classifier-free_guidance
Jun 18, 2024
Style Aligned Image Generation via Shared Attention
diffusion_model
style_transfer
image_generation
attention_mechanism
text-to-image
zero-shot
consistency
adain
controlnet
multidiffusion
Jun 18, 2024
DiffiT Diffusion Vision Transformers for Image Generation
diffusion_model
vit
image_generation
tmsa
self-attention
latent_space
image_space
fid
parameter_efficiency
Jun 18, 2024
FaceStudio Put Your Face Everywhere in Seconds
diffusion_model
image_synthesis
identity_preserving
hybrid_guidance
text-to-image
multi-identity
tuning-free
face_recognition
novel_view_synthesis
Jun 18, 2024
Return of Unconditional Generation A Self-supervised Representation Generation Method
diffusion_model
gan
unconditional-generation
self-supervised-representation
image-generation
representation-learning
Jun 18, 2024
Smooth Diffusion Crafting Smooth Latent Spaces in Diffusion Models
diffusion_model
text-to-image
latent_space
smoothness
image_interpolation
image_inversion
image_editing
lora
Jun 18, 2024
Localized Symbolic Knowledge Distillation for Visual Commonsense Models
diffusion_model
llm
analysis
3d
video
interpretability
Jun 18, 2024
SwiftBrush One-Step Text-to-Image Diffusion Model with Variational Score Distillation
diffusion_model
distillation
text-to-image
one-step-generation
image-free
gan
nerf
sds
vsd
lora
Jun 18, 2024
Sherpa3D Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior
diffusion_model
3d
text-to-3d
multi-view-consistency
generative_model
score_distillation_sampling
Jun 18, 2024
CAD Photorealistic 3D Generation via Adversarial Distillation
diffusion_model
gan
3d
single-view-reconstruction
photorealistic
adversarial_distillation
Jun 18, 2024
DiffMorpher Unleashing the Capability of Diffusion Models for Image Morphing
diffusion_model
image_morphing
lora
attention_mechanism
smooth_interpolation
stable-diffusion
ddim
adain
Jun 18, 2024
A Picture is Worth More Than 77 Text Tokens Evaluating CLIP-Style Models on Dense Captions
diffusion_model
llm
analysis
3d
adversarial_attack
interpretability
Jun 18, 2024
DiffusionLight Light Probes for Free by Painting a Chrome Ball
diffusion_model
light_estimation
hdr
inpainting
lora
environment_map
generalization
in-the-wild
Jun 18, 2024
Vision-Language Models as a Source of Rewards
diffusion_model
llm
rl
vision-language-model
reward-function
clip
playhouse
androidenv
prompt-engineering
Jun 18, 2024
Your Student is Better Than Expected Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models
diffusion_model
gan
analysis
image_generation
knowledge_distillation
text-to-image
Jun 18, 2024
Generative Multimodal Models are In-Context Learners
diffusion_model
llm
analysis
3d
motion
video
interpretability
Jun 18, 2024
V* Guided Visual Search as a Core Mechanism in Multimodal LLMs
diffusion_model
llm
analysis
3d
video
interpretability
Jun 18, 2024
Diffusion Model with Perceptual Loss
diffusion_model
perceptual_loss
image_generation
unconditional_generation
classifier-free_guidance
Jun 18, 2024
Score Distillation Sampling with Learned Manifold Corrective
diffusion_model
analysis
image_synthesis
image_editing
3d
text-to-3d
optimization
loss_function
denoising
Jun 18, 2024
Eyes Wide Shut Exploring the Visual Shortcomings of Multimodal LLMs
diffusion_model
llm
analysis
benchmark
visual-grounding
clip
self-supervised-learning
multimodal
vision-and-language
representation-learning
Jun 18, 2024
InstantID Zero-shot Identity-Preserving Generation in Seconds
diffusion_model
identity_preserving
image_generation
face_embedding
controlnet
plug-and-play
single-shot
high-fidelity
image_synthesis
Jun 18, 2024
Benchmarking the Robustness of Image Watermarks
diffusion_model
watermark
analysis
adversarial_attack
benchmark
image_quality
robustness
Jun 18, 2024
SiT Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers
diffusion_model
gan
interpolant
analysis
image_generation
transformer
sde
ode
Jun 18, 2024
Edit One for All Interactive Batch Image Editing
diffusion_model
gan
image_editing
stylegan
batch_processing
Jun 18, 2024
West-of-N Synthetic Preference Generation for Improved Reward Modeling
diffusion_model
llm
rlhf
preference_modeling
synthetic_data
self_training
best-of-n
reward_modeling
Jun 18, 2024
Lumiere A Space-Time Diffusion Model for Video Generation
diffusion_model
video
motion
text-to-video
video_generation
image-to-video
video_inpainting
stylized_generation
Jun 18, 2024
Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding
diffusion_model
one-shot
fine-tuning
text-to-image
prototypical_embedding
object-driven
fidelity
generalization
Jun 18, 2024
AEROBLADE Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error
diffusion_model
ldm
analysis
adversarial_attack
interpretability
detection
disinformation
autoencoder
Jun 18, 2024
Compositional Generative Modeling A Single Model is Not All You Need
generative_modeling
modularity
compositionality
ebm
diffusion_model
analysis
video
image
planning
Jun 18, 2024
Can MLLMs Perform Text-to-Image In-Context Learning
diffusion_model
llm
mllm
analysis
benchmark
dataset
image_generation
in-context-learning
multimodality
prompt_engineering
Jun 18, 2024
Inversion-by-Inversion Exemplar-based Sketch-to-Photo Synthesis via Stochastic Differential Equations without Training
diffusion_model
sde
sketch-to-photo
exemplar-based
image_synthesis
shape_control
appearance_control
energy_function
Jun 18, 2024
Boosting Multi-modal Model Performance with Adaptive Gradient Modulation
diffusion_model
analysis
multi-modal-learning
modality-competition
gradient-modulation
shapley-value
fusion-strategies
Jun 18, 2024
StyleDiffusion Controllable Disentangled Style Transfer via Diffusion Models
diffusion_model
style_transfer
disentanglement
clip
analysis
image_manipulation
photorealistic
multi-modal
Jun 18, 2024
CoDeF Content Deformation Fields for Temporally Consistent Video Processing
diffusion_model
video
motion
video_editing
representation_learning
temporal_consistency
Jun 18, 2024
DragNUWA Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory
diffusion_model
video
motion
controllable_generation
trajectory
open-domain
Jun 18, 2024
ALIP Adaptive Language-Image Pre-training with Synthetic Caption
diffusion_model
llm
analysis
image-text-retrieval
contrastive_learning
pre-training
noise_alleviation
Jun 18, 2024
Watch Your Steps Local Image and Scene Editing by Text Instructions
diffusion_model
image_editing
3d
nerf
relevance_map
text-guided
scene_editing
localization
Jun 18, 2024
RLIPv2 Fast Scaling of Relational Language-Image Pre-training
diffusion_model
llm
analysis
3d
motion
video
interpretability
Jun 18, 2024
DUAW Data-free Universal Adversarial Watermark against Stable Diffusion Customization
diffusion_model
adversarial_watermark
copyright_protection
stable_diffusion
data-free
vae
llm
image_generation
Jun 18, 2024
AltDiffusion A Multilingual Text-to-Image Diffusion Model
diffusion_model
multilingual
text-to-image
culture-specific
knowledge_distillation
Jun 18, 2024
Spiking-Diffusion Vector Quantized Discrete Diffusion Model with Spiking Neural Networks
diffusion_model
gan
snn
image_generation
vq-vae
neuromorphic
energy_efficient
biological_plausibility
Jun 18, 2024
Backdooring Textual Inversion for Concept Censorship
diffusion_model
textual_inversion
backdoor_attack
concept_censorship
aigc
misinformation
ethics
Jun 18, 2024
Diffusion Model as Representation Learner
diffusion_model
representation_learning
knowledge_distillation
semantic_segmentation
image_classification
landmark_detection
reinforcement_learning
analysis
Jun 18, 2024
APLA Additional Perturbation for Latent Noise with Adversarial Training Enables Consistency
diffusion_model
video
generation
t2v
consistency
transformer
adversarial_training
Jun 18, 2024
Reinforcement Learning for Generative AI A Survey
reinforcement_learning
generative_ai
survey
text_generation
code_generation
molecule_design
natural_language_processing
computer_vision
neural_architecture_search
diffusion_model
Jun 18, 2024
Unified Concept Editing in Diffusion Models
diffusion_model
gan
analysis
adversarial_attack
interpretability
debias
erasure
moderation
Jun 18, 2024
Elucidating the Exposure Bias in Diffusion Models
diffusion_model
exposure_bias
sampling
fid
analysis
training-free
image_generation
adm
ddim
ddpm
edm
ldm
dit
pfgm++
Jun 18, 2024
MVDream Multi-view Diffusion for 3D Generation
diffusion_model
3d
text-to-3d
multi-view
consistency
dreambooth
score-distillation-sampling
nerf
generative_model
Jun 18, 2024
Any-Size-Diffusion Toward Efficient Text-Driven Synthesis for Any-Size HD Images
diffusion_model
text-to-image
image_synthesis
super-resolution
compositionality
tiled_diffusion
Jun 18, 2024
FIND A Function Description Benchmark for Evaluating Interpretability Methods
diffusion_model
llm
analysis
interpretability
Jun 18, 2024
Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis
diffusion_model
gan
text-to-image
image_synthesis
sparse_moe
attention
latent_space
open-vocabulary
Jun 18, 2024
MoEController Instruction-based Arbitrary Image Manipulation with Mixture-of-Expert Controllers
diffusion_model
image_manipulation
llm
moe
controlnet
chatgpt
global_editing
local_editing
Jun 18, 2024
PhotoVerse Tuning-Free Image Customization with Text-to-Image Diffusion Models
diffusion_model
text-to-image
personalization
identity_preservation
fast_generation
single_image
dual-branch_conditioning
adapter
facial_identity_loss
image_editing
stylization
Jun 18, 2024
Mitigate Replication and Copying in Diffusion Models with Generalized Caption and Dual Fusion Enhancement
diffusion_model
privacy
data_replication
llm
caption_generation
generality
fusion
stable-diffusion
Jun 18, 2024
Generative Image Dynamics
diffusion_model
motion
video
analysis
3d
Jun 18, 2024
Viewpoint Textual Inversion Unleashing Novel View Synthesis with Pretrained 2D Diffusion Models
diffusion_model
novel_view_synthesis
textual_inversion
3d
single-view
viewpoint_control
stable-diffusion
Jun 18, 2024
On Model Explanations with Transferable Neural Pathways
diffusion_model
analysis
interpretability
neural_pathway
Jun 18, 2024
FreeU Free Lunch in Diffusion U-Net
diffusion_model
u-net
image_generation
video_generation
sample_quality
denoising
freeu
Jun 18, 2024
TinyCLIP CLIP Distillation via Affinity Mimicking and Weight Inheritance
diffusion_model
llm
analysis
3d
video
interpretability
Jun 18, 2024
Generative Escher Meshes
diffusion_model
2d
tiling
mesh
generative
text-guided
ote
sds
Jun 18, 2024
Demystifying CLIP Data
diffusion_model
clip
analysis
data_curation
image_text
zero_shot
Jun 18, 2024
Directly Fine-Tuning Diffusion Models on Differentiable Rewards
diffusion_model
reward_learning
fine-tuning
human_preference
aesthetic
lora
gradient_checkpointing
image_generation
adversarial_example
interpretability
Jun 18, 2024
PixArt-$α$ Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
diffusion_model
t2i
transformer
image_generation
efficient_training
llava
sam
controlnet
dreambooth
Jun 18, 2024
Direct Inversion Boosting Diffusion-based Editing with 3 Lines of Code
diffusion_model
image_editing
inversion
benchmark
content_preservation
edit_fidelity
Jun 18, 2024
Kandinsky an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion
diffusion_model
text-to-image
image_generation
image_prior
latent_diffusion
movq
clip
fid
open-source
web_application
telegram_bot
Jun 18, 2024
Aligning Text-to-Image Diffusion Models with Reward Backpropagation
diffusion_model
alignment
image_generation
reward_function
backpropagation
lora
gradient_checkpointing
text-to-image
human_evaluation
generalization
Jun 18, 2024
Improving Adversarial Attacks on Latent Diffusion Model
diffusion_model
adversarial_attack
interpretability
ldm
few-shot-generation
image_generation
Jun 18, 2024
No Token Left Behind Efficient Vision Transformer via Dynamic Token Idling
diffusion_model
llm
analysis
3d
motion
video
interpretability
Jun 18, 2024
NEFTune Noisy Embeddings Improve Instruction Finetuning
diffusion_model
llm
analysis
instruction_finetuning
overfitting
regularization
embedding
conversational_ai
Jun 18, 2024
Interpreting CLIP's Image Representation via Text-Based Decomposition
diffusion_model
llm
analysis
interpretability
attention
clip
vit
zero-shot-learning
segmentation
spurious-correlations
Jun 18, 2024
State of the Art on Diffusion Models for Visual Computing
diffusion_model
gan
analysis
literature_review
2d
3d
motion
video
4d
text-to-image
text-to-video
Jun 18, 2024
ScaleCrafter Tuning-free Higher-Resolution Visual Generation with Diffusion Models
diffusion_model
high_resolution
image_synthesis
re-dilation
convolution
perception_field
text-to-image
text-to-video
stable-diffusion
Jun 18, 2024
Context-Aware Meta-Learning
diffusion_model
llm
analysis
few-shot-learning
image-classification
meta-learning
in-context-learning
universal-meta-learning
Jun 18, 2024
To Generate or Not Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now
diffusion_model
adversarial_attack
interpretability
safety
unlearning
machine_unlearning
robustness
text-to-image
image_generation
Jun 18, 2024
Quality Diversity through Human Feedback
diffusion_model
analysis
3d
motion
interpretability
quality_diversity
human_feedback
contrastive_learning
latent_space
image_generation
Jun 18, 2024
An Image is Worth Multiple Words Learning Object Level Concepts using Multi-Concept Prompt Learning
diffusion_model
textual_inversion
prompt_learning
multi-concept
object-level
attention_mechanism
contrastive_learning
image_generation
image_editing
disentanglement
Jun 18, 2024
On the Language Encoder of Contrastive Cross-modal Models
diffusion_model
analysis
llm
audio
video
Jun 18, 2024
Localizing and Editing Knowledge in Text-to-Image Generative Models
diffusion_model
analysis
interpretability
text-to-image
stable-diffusion
causal-mediation-analysis
model-editing
Jun 18, 2024
MAS Multi-view Ancestral Sampling for 3D motion generation using 2D diffusion
diffusion_model
3d
motion
video
analysis
motion_generation
Jun 18, 2024
Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution
diffusion_model
llm
analysis
language_modeling
text_generation
Jun 18, 2024
Idempotent Generative Network
diffusion_model
gan
generative_model
idempotence
image_generation
latent_space
projection
out-of-distribution
Jun 18, 2024
Cross-Image Attention for Zero-Shot Appearance Transfer
diffusion_model
appearance_transfer
semantic_correspondence
zero-shot
image_manipulation
self-attention
denoising_diffusion_model
Jun 18, 2024
Instruct Me More Random Prompting for Visual In-Context Learning
diffusion_model
in-context-learning
visual-prompting
foreground-segmentation
object-detection
parameter-efficient-transfer-learning
domain-shift
mae-vqgan
Jun 18, 2024
UFOGen You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
diffusion_model
gan
text-to-image
one-step-generation
image-to-image
controllable-generation
Jun 18, 2024
The Chosen One Consistent Characters in Text-to-Image Diffusion Models
diffusion_model
consistent_character
personalization
text-to-image
clustering
analysis
user_study
sdxl
dinov2
Jun 18, 2024
High-fidelity Person-centric Subject-to-Image Synthesis
diffusion_model
image_generation
subject-driven
person-centric
saliency
collaborative_generation