Explorer
A General Theoretical Paradigm to Understand Learning from Human Preferences
A Minimaximalist Approach to Reinforcement Learning from Human Feedback
A Picture is Worth More Than 77 Text Tokens Evaluating CLIP-Style Models on Dense Captions
A Review of Adversarial Attacks in Computer Vision
A Survey on Personalized Content Synthesis with Diffusion Models
A Survey on Vision Mamba Models, Applications and Challenges
A Training-Free Plug-and-Play Watermark Framework for Stable Diffusion
Accelerating the Global Aggregation of Local Explanations
ACT-Diffusion Efficient Adversarial Consistency Training for One-step Diffusion Models
Advancing Parameter Efficiency in Fine-tuning via Representation Editing
Adversarial Diffusion Distillation
AEROBLADE Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error
Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
Aligning Text-to-Image Diffusion Models with Reward Backpropagation
ALIP Adaptive Language-Image Pre-training with Synthetic Caption
AltDiffusion A Multilingual Text-to-Image Diffusion Model
An Image is Worth Multiple Words Learning Object Level Concepts using Multi-Concept Prompt Learning
An Image is Worth Multiple Words Multi-attribute Inversion for Constrained Text-to-Image Synthesis
Analysis of Classifier-Free Guidance Weight Schedulers
Any-Size-Diffusion Toward Efficient Text-Driven Synthesis for Any-Size HD Images
APLA Additional Perturbation for Latent Noise with Adversarial Training Enables Consistency
Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models
Asymmetry in Low-Rank Adapters of Foundation Models
Attention Calibration for Disentangled Text-to-Image Personalization
Backdooring Textual Inversion for Concept Censorship
Benchmarking the Robustness of Image Watermarks
Boosting Multi-modal Model Performance with Adaptive Gradient Modulation
Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
CAD Photorealistic 3D Generation via Adversarial Distillation
Can MLLMs Perform Text-to-Image In-Context Learning
Capability-aware Prompt Reformulation Learning for Text-to-Image Generation
CAT Contrastive Adapter Training for Personalized Image Generation
CatLIP CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
CoDeF Content Deformation Fields for Temporally Consistent Video Processing
Compositional Generative Modeling A Single Model is Not All You Need
Compositional Text-to-Image Generation with Dense Blob Representations
Concept Sliders LoRA Adaptors for Precise Control in Diffusion Models
Concept Weaver Enabling Multi-Concept Fusion in Text-to-Image Models
Connecting NeRFs, Images, and Text
Consolidating Attention Features for Multi-view Image Editing
Context-Aware Meta-Learning
Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing
Controllable Image Generation With Composed Parallel Token Prediction
Could It Be Generated Towards Practical Analysis of Memorization in Text-To-Image Diffusion Models
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models
Cross-Image Attention for Zero-Shot Appearance Transfer
Customizing Text-to-Image Models with a Single Image Pair
Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models
DemoFusion Democratising High-Resolution Image Generation With No $$$
Demystifying CLIP Data
DiffHarmony Latent Diffusion Model Meets Image Harmonization
DiffiT Diffusion Vision Transformers for Image Generation
DiffMorpher Unleashing the Capability of Diffusion Models for Image Morphing
Diffusion Model Alignment Using Direct Preference Optimization
Diffusion Model as Representation Learner
Diffusion Model with Perceptual Loss
DiffusionLight Light Probes for Free by Painting a Chrome Ball
Direct Consistency Optimization for Compositional Text-to-Image Personalization
Direct Inversion Boosting Diffusion-based Editing with 3 Lines of Code
Directly Fine-Tuning Diffusion Models on Differentiable Rewards
Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution
Distilling Diffusion Models into Conditional GANs
DragNUWA Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory
DreamPropeller Supercharge Text-to-3D Generation with Parallel Sampling
DUAW Data-free Universal Adversarial Watermark against Stable Diffusion Customization
Dynamic Prompt Optimizing for Text-to-Image Generation
Dynamic Typography Bringing Text to Life via Video Diffusion Prior
Edit One for All Interactive Batch Image Editing
Editing Massive Concepts in Text-to-Image Diffusion Models
ELLA Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Elucidating the Exposure Bias in Diffusion Models
Enhancing Diffusion Models with Text-Encoder Reinforcement Learning
Espresso Robust Concept Filtering in Text-to-Image Models
Evolutionary Optimization of Model Merging Recipes
Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis
Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs) A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
Exponentially Faster Language Modelling
Eyes Wide Shut Exploring the Visual Shortcomings of Multimodal LLMs
FaceStudio Put Your Face Everywhere in Seconds
FIFO-Diffusion Generating Infinite Videos from Text without Training
FIND A Function Description Benchmark for Evaluating Interpretability Methods
Finding Visual Task Vectors
Fine-tuning CLIP Text Encoders with Two-step Paraphrasing
First Tragedy, then Parse History Repeats Itself in the New Era of Large Language Models
FouriScale A Frequency Perspective on Training-Free High-Resolution Image Synthesis
FreeU Free Lunch in Diffusion U-Net
Future Lens Anticipating Subsequent Tokens from a Single Hidden State
Generative Escher Meshes
Generative Image Dynamics
Generative Multimodal Models are In-Context Learners
GIVT Generative Infinite-Vocabulary Transformers
GLoD Composing Global Contexts and Local Details in Image Generation
Graph Neural Networks for Learning Equivariant Representations of Neural Networks
Griffin Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
High-fidelity Person-centric Subject-to-Image Synthesis
HiPA Enabling One-Step Text-to-Image Diffusion Models via High-Frequency-Promoting Adaptation
Idempotent Generative Network
Implicit Style-Content Separation using B-LoRA
Improving Adversarial Attacks on Latent Diffusion Model
Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance
Improving Text-to-Image Consistency via Automatic Prompt Optimization
Inf-DiT Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer
InstantID Zero-shot Identity-Preserving Generation in Seconds
Instruct Me More Random Prompting for Visual In-Context Learning
Interpreting CLIP's Image Representation via Text-Based Decomposition
Inversion-by-Inversion Exemplar-based Sketch-to-Photo Synthesis via Stochastic Differential Equations without Training
Iterated Learning Improves Compositionality in Large Vision-Language Models
KAN Kolmogorov-Arnold Networks
Kandinsky an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion
Large Language Models A Survey
Lazy Diffusion Transformer for Interactive Image Editing
LCM-Lookahead for Encoder-based Text-to-Image Personalization
Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs
Lego Learning to Disentangle and Invert Concepts Beyond Object Appearance in Text-to-Image Diffusion Models
Linearity of Relation Decoding in Transformer Language Models
LLM2Vec Large Language Models Are Secretly Powerful Text Encoders
Localized Symbolic Knowledge Distillation for Visual Commonsense Models
Localizing and Editing Knowledge in Text-to-Image Generative Models
LocInv Localization-aware Inversion for Text-Guided Image Editing
Long-CLIP Unlocking the Long-Text Capability of CLIP
LoRA+ Efficient Low Rank Adaptation of Large Models
LP++ A Surprisingly Strong Linear Probe for Few-Shot CLIP
Lumiere A Space-Time Diffusion Model for Video Generation
MagicTime Time-lapse Video Generation Models as Metamorphic Simulators
Make a Cheap Scaling A Self-Cascade Diffusion Model for Higher-Resolution Adaptation
MAS Multi-view Ancestral Sampling for 3D motion generation using 2D diffusion
Mask-ControlNet Higher-Quality Image Generation with An Additional Mask Prompt
MasterWeaver Taming Editability and Identity for Personalized Text-to-Image Generation
MetaCloak Preventing Unauthorized Subject-driven Text-to-image Diffusion-based Synthesis via Meta-learning
Mismatch Quest Visual and Textual Feedback for Image-Text Misalignment
Mitigate Replication and Copying in Diffusion Models with Generalized Caption and Dual Fusion Enhancement
Mixture-of-Depths Dynamically allocating compute in transformer-based language models
Model Inversion Attack via Dynamic Memory Learning
Model Lakes
MoEController Instruction-based Arbitrary Image Manipulation with Mixture-of-Expert Controllers
MoMA Multimodal LLM Adapter for Fast Personalized Image Generation
MVDream Multi-view Diffusion for 3D Generation
MyVLM Personalizing VLMs for User-Specific Queries
NEFTune Noisy Embeddings Improve Instruction Finetuning
NeuroPrompts An Adaptive Framework to Optimize Prompts for Text-to-Image Generation
No Token Left Behind Efficient Vision Transformer via Dynamic Token Idling
Object Recognition as Next Token Prediction
Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding
On Mechanistic Knowledge Localization in Text-to-Image Generative Models
On Model Explanations with Transferable Neural Pathways
On the Language Encoder of Contrastive Cross-modal Models
On the Scalability of Diffusion-based Text-to-Image Generation
One-step Diffusion with Distribution Matching Distillation
ORPO Monolithic Preference Optimization without Reference Model
Parameter-Efficient Fine-Tuning Methods for Pretrained Language Models A Critical Review and Assessment
PEA-Diffusion Parameter-Efficient Adapter with Knowledge Distillation in non-English Text-to-Image Generation
Perspectives on the State and Future of Deep Learning - 2023
PhotoVerse Tuning-Free Image Customization with Text-to-Image Diffusion Models
PixArt-α Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
PixArt-Σ Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
Privacy Backdoors Enhancing Membership Inference through Poisoning Pre-trained Models
Probing the 3D Awareness of Visual Foundation Models
Prompt Switch Efficient CLIP Adaptation for Text-Video Retrieval
Quality Diversity through Human Feedback
Ranni Taming Text-to-Image Diffusion for Accurate Instruction Following
Recovering the Pre-Fine-Tuning Weights of Generative Models
ReFT Representation Finetuning for Language Models
Reinforcement Learning for Generative AI A Survey
ReNoise Real Image Inversion Through Iterative Noising
Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance
Return of Unconditional Generation A Self-supervised Representation Generation Method
Reward Guided Latent Consistency Distillation
RL for Consistency Models Faster Reward Guided Text-to-Image Generation
RLIPv2 Fast Scaling of Relational Language-Image Pre-training
Robust Concept Erasure Using Task Vectors
Scalable Extraction of Training Data from (Production) Language Models
ScaleCrafter Tuning-free Higher-Resolution Visual Generation with Diffusion Models
Score Distillation Sampling with Learned Manifold Corrective
SDXL-Lightning Progressive Adversarial Diffusion Distillation
SDXS Real-Time One-Step Latent Diffusion Models with Image Conditions
Self-correcting LLM-controlled Diffusion Models
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance
Self-Rewarding Language Models
Sequential Modeling Enables Scalable Learning for Large Vision Models
Sherpa3D Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior
SiT Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers
Smooth Diffusion Crafting Smooth Latent Spaces in Diffusion Models
Sora A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer
Speculative Streaming Fast LLM Inference without Auxiliary Models
Spiking-Diffusion Vector Quantized Discrete Diffusion Model with Spiking Neural Networks
Stable Video Diffusion Scaling Latent Video Diffusion Models to Large Datasets
State of the Art on Diffusion Models for Visual Computing
Stealing Part of a Production Language Model
Style Aligned Image Generation via Shared Attention
StyleDiffusion Controllable Disentangled Style Transfer via Diffusion Models
Stylus Automatic Adapter Selection for Diffusion Models
SwapAnything Enabling Arbitrary Object Swapping in Personalized Visual Editing
SwiftBrush One-Step Text-to-Image Diffusion Model with Variational Score Distillation
Testing Language Model Agents Safely in the Wild
TextCraftor Your Text Encoder Can be Image Quality Controller
The Chosen One Consistent Characters in Text-to-Image Diffusion Models
The Expressive Power of Low-Rank Adaptation
The Platonic Representation Hypothesis
The Truth is in There Improving Reasoning in Language Models with Layer-Selective Rank Reduction
TinyCLIP CLIP Distillation via Affinity Mimicking and Weight Inheritance
To Generate or Not Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now
Toward effective protection against diffusion based mimicry through score distillation
Training Neural Networks from Scratch with Parallel Low-Rank Adapters
Transparent Image Layer Diffusion using Latent Transparency
TTD Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias
Tutorial on Diffusion Models for Imaging and Vision
U-DiTs Downsample Tokens in U-Shaped Diffusion Transformers
UFOGen You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
Unified Concept Editing in Diffusion Models
UniFL Improve Stable Diffusion via Unified Feedback Learning
Using Captum to Explain Generative Language Models
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
V* Guided Visual Search as a Core Mechanism in Multimodal LLMs
Variational Schrödinger Diffusion Models
Video Diffusion Models A Survey
VideoBooth Diffusion-based Video Generation with Image Prompts
View Selection for 3D Captioning via Diffusion Ranking
Viewpoint Textual Inversion Unleashing Novel View Synthesis with Pretrained 2D Diffusion Models
Vision Mamba A Comprehensive Survey and Taxonomy
Vision-Language Models as a Source of Rewards
Visual Fact Checker Enabling High-Fidelity Detailed Caption Generation
Watch Your Steps Local Image and Scene Editing by Text Instructions
West-of-N Synthetic Preference Generation for Improved Reward Modeling
What do we learn from inverting CLIP models
When Do We Not Need Larger Vision Models
WWW A Unified Framework for Explaining What, Where and Why of Neural Networks by Interpretation of Neuron Concepts
xLSTM Extended Long Short-Term Memory
You Only Sample Once Taming One-Step Text-To-Image Synthesis by Self-Cooperative Diffusion GANs
Your Student is Better Than Expected Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models
ZipLoRA Any Subject in Any Style by Effectively Merging LoRAs
Tag: analysis
106 items with this tag.
Jun 18, 2024
Finding Visual Task Vectors
diffusion_model
visual_prompting
in-context_learning
analysis
task_vectors
zero-shot
mae
vqgan
attention
reinforce
Jun 18, 2024
LLM2Vec Large Language Models Are Secretly Powerful Text Encoders
diffusion_model
llm
analysis
text_embedding
contrastive_learning
Jun 18, 2024
Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs
diffusion_model
llm
analysis
video
vqa
interpretability
Jun 18, 2024
Connecting NeRFs, Images, and Text
nerf
diffusion_model
gan
analysis
3d
multimodal
retrieval
zero-shot
representation_learning
Jun 18, 2024
Probing the 3D Awareness of Visual Foundation Models
3d
analysis
depth_estimation
surface_normal
correspondence
vision_transformer
diffusion_model
self_supervised_learning
vision_language_model
Jun 18, 2024
Analysis of Classifier-Free Guidance Weight Schedulers
diffusion_model
cfg
analysis
image_generation
text-to-image
fid
inception_score
clip-score
diversity
Jun 18, 2024
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
llm
analysis
fine-tuning
preference_learning
reinforcement_learning
contrastive_learning
on-policy
negative_gradient
mode-seeking
Jun 18, 2024
CatLIP CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
diffusion_model
analysis
image_classification
multi-label_classification
semantic_segmentation
object_detection
weakly_supervised_learning
pre-training
vision_transformer
data_efficiency
web-scale_data
Jun 18, 2024
A Survey on Vision Mamba Models, Applications and Challenges
diffusion_model
llm
analysis
literature_review
3d
motion
video
interpretability
vision_transformer
state_space_model
Jun 18, 2024
Stylus Automatic Adapter Selection for Diffusion Models
diffusion_model
adapter
llm
analysis
image_generation
retrieval
lora
Jun 18, 2024
Espresso Robust Concept Filtering in Text-to-Image Models
diffusion_model
tti
clip
analysis
adversarial_attack
interpretability
robustness
concept_filtering
safety
Jun 18, 2024
Visual Fact Checker Enabling High-Fidelity Detailed Caption Generation
diffusion_model
llm
captioning
2d
3d
hallucination
vqa
object_detection
analysis
Jun 18, 2024
KAN Kolmogorov-Arnold Networks
diffusion_model
analysis
interpretability
neural_scaling_law
pde
scientific_discovery
symbolic_regression
Jun 18, 2024
On Mechanistic Knowledge Localization in Text-to-Image Generative Models
diffusion_model
analysis
interpretability
text-to-image
model_editing
Jun 18, 2024
xLSTM Extended Long Short-Term Memory
lstm
language_model
llm
rnn
transformer
state_space_model
gating
memory
analysis
scaling_law
sequence_length_extrapolation
Jun 18, 2024
Could It Be Generated Towards Practical Analysis of Memorization in Text-To-Image Diffusion Models
diffusion_model
memorization
analysis
text-to-image
security
privacy
copyright
inversion
Jun 18, 2024
Controllable Image Generation With Composed Parallel Token Prediction
diffusion_model
gan
vq-vae
vq-gan
analysis
image_generation
compositionality
discrete_models
parallel_token_prediction
controllable_generation
Jun 18, 2024
The Platonic Representation Hypothesis
representation
convergence
multimodality
vision
language
scaling
analysis
platonic_representation
Jun 18, 2024
FIFO-Diffusion Generating Infinite Videos from Text without Training
diffusion_model
video
text-to-video
long_video_generation
analysis
Jun 18, 2024
Recovering the Pre-Fine-Tuning Weights of Generative Models
diffusion_model
llm
analysis
adversarial_attack
interpretability
lora
fine-tuning
model_security
weight_recovery
Jun 18, 2024
Speculative Streaming Fast LLM Inference without Auxiliary Models
llm
diffusion_model
analysis
speculative_decoding
inference
resource_constrained
Jun 18, 2024
LoRA+ Efficient Low Rank Adaptation of Large Models
diffusion_model
llm
analysis
finetuning
lora
optimization
Jun 18, 2024
Training Neural Networks from Scratch with Parallel Low-Rank Adapters
diffusion_model
llm
analysis
3d
motion
video
interpretability
Jun 18, 2024
Asymmetry in Low-Rank Adapters of Foundation Models
llm
lora
peft
fine-tuning
analysis
generalization
parameter_efficiency
text_generation
text_classification
image_classification
Jun 18, 2024
Sora A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
diffusion_model
llm
analysis
video
sora
Jun 18, 2024
WWW A Unified Framework for Explaining What, Where and Why of Neural Networks by Interpretation of Neuron Concepts
interpretability
explanation
neural-network
concept-discovery
shapley-value
neuron-activation-map
heatmap
uncertainty
analysis
Jun 18, 2024
Model Lakes
model_lake
model_management
model_search
model_provenance
model_versioning
analysis
literature_review
Jun 18, 2024
What do we learn from inverting CLIP models
clip
analysis
nsfw
gender-bias
model-inversion
interpretability
Jun 18, 2024
ELLA Equip Diffusion Models with LLM for Enhanced Semantic Alignment
diffusion_model
llm
text-to-image
semantic_alignment
dense_prompt
timestep-aware
benchmark
analysis
Jun 18, 2024
ORPO Monolithic Preference Optimization without Reference Model
diffusion_model
llm
analysis
preference_alignment
sft
instruction_following
rlhf
dpo
Jun 18, 2024
Graph Neural Networks for Learning Equivariant Representations of Neural Networks
diffusion_model
gan
analysis
3d
interpretability
neural_network
graph_neural_network
transformer
representation_learning
permutation_symmetry
implicit_neural_representation
generalization
learning_to_optimize
Jun 18, 2024
When Do We Not Need Larger Vision Models
diffusion_model
llm
analysis
3d
motion
video
interpretability
Jun 18, 2024
Evolutionary Optimization of Model Merging Recipes
diffusion_model
llm
analysis
evolutionary_algorithm
model_merging
japanese
multi-modal
vlm
Jun 18, 2024
Implicit Style-Content Separation using B-LoRA
diffusion_model
lora
image_stylization
style_transfer
text_guided_image_editing
analysis
sdxl
Jun 18, 2024
MyVLM Personalizing VLMs for User-Specific Queries
diffusion_model
llm
analysis
personalization
image_captioning
visual_question_answering
referring_expression_comprehension
Jun 18, 2024
ReNoise Real Image Inversion Through Iterative Noising
diffusion_model
image_editing
inversion
few-step_models
analysis
ddim
sdxl-turbo
lcm
Jun 18, 2024
Long-CLIP Unlocking the Long-Text Capability of CLIP
diffusion_model
clip
analysis
image_retrieval
text-to-image_generation
interpretability
Jun 18, 2024
Improving Text-to-Image Consistency via Automatic Prompt Optimization
diffusion_model
llm
analysis
text-to-image
interpretability
prompt_engineering
consistency
Jun 18, 2024
Tutorial on Diffusion Models for Imaging and Vision
diffusion_model
vae
ddpm
smld
sde
analysis
tutorial
image_generation
denoising
Jun 18, 2024
Capability-aware Prompt Reformulation Learning for Text-to-Image Generation
diffusion_model
text-to-image-generation
prompt-reformulation
analysis
log-analysis
Jun 18, 2024
TTD Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias
diffusion_model
clip
analysis
segmentation
open-vocabulary
image-text-alignment
self-distillation
bias
Jun 18, 2024
Privacy Backdoors Enhancing Membership Inference through Poisoning Pre-trained Models
privacy
backdoor_attack
membership_inference
poisoning
pre-trained_model
fine-tuning
clip
llm
analysis
Jun 18, 2024
Iterated Learning Improves Compositionality in Large Vision-Language Models
diffusion_model
llm
analysis
interpretability
Jun 18, 2024
Mixture-of-Depths Dynamically allocating compute in transformer-based language models
diffusion_model
llm
analysis
conditional_computation
transformer
efficiency
mixture-of-experts
routing
autoregressive_sampling
long-term_memory
Jun 18, 2024
LP++ A Surprisingly Strong Linear Probe for Few-Shot CLIP
diffusion_model
llm
analysis
few-shot-learning
clip
optimization
black-box
linear_probe
image_classification
Jun 18, 2024
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models
diffusion_model
cross-attention
inference
efficiency
analysis
text-to-image
Jun 18, 2024
On the Scalability of Diffusion-based Text-to-Image Generation
diffusion_model
gan
analysis
text-to-image
unet
transformer
scaling_law
dataset
caption
efficiency
Jun 18, 2024
ReFT Representation Finetuning for Language Models
diffusion_model
llm
analysis
interpretability
Jun 18, 2024
LCM-Lookahead for Encoder-based Text-to-Image Personalization
diffusion_model
personalization
face_generation
lcm
analysis
attention_mechanism
image_generation
Jun 18, 2024
Exponentially Faster Language Modelling
llm
bert
diffusion_model
analysis
performance
optimization
conditional_computation
Jun 18, 2024
An Image is Worth Multiple Words Multi-attribute Inversion for Constrained Text-to-Image Synthesis
diffusion_model
inversion
text-to-image
attribute-guided
disentanglement
analysis
image_synthesis
reference_image
Jun 18, 2024
Concept Sliders LoRA Adaptors for Precise Control in Diffusion Models
diffusion_model
lora
analysis
image_editing
gan
stylegan
interpretability
3d
concept_sliders
Jun 18, 2024
Scalable Extraction of Training Data from (Production) Language Models
llm
analysis
memorization
privacy
data_extraction
alignment
chatgpt
divergence_attack
suffix_array
good-turing_estimator
pii
Jun 18, 2024
Sequential Modeling Enables Scalable Learning for Large Vision Models
diffusion_model
llm
analysis
3d
motion
video
interpretability
Jun 18, 2024
Mismatch Quest Visual and Textual Feedback for Image-Text Misalignment
image-text-alignment
llm
visual-grounding
misalignment-explanation
dataset
analysis
evaluation
Jun 18, 2024
Localized Symbolic Knowledge Distillation for Visual Commonsense Models
diffusion_model
llm
analysis
3d
video
interpretability
Jun 18, 2024
Using Captum to Explain Generative Language Models
llm
analysis
interpretability
explainability
attribution
perturbation-based-methods
gradient-based-methods
open-source
captum
Jun 18, 2024
Accelerating the Global Aggregation of Local Explanations
analysis
interpretability
local-explanation
global-explanation
anchor-algorithm
text-classification
runtime-optimization
anytime-algorithm
Jun 18, 2024
A Picture is Worth More Than 77 Text Tokens Evaluating CLIP-Style Models on Dense Captions
diffusion_model
llm
analysis
3d
adversarial_attack
interpretability
Jun 18, 2024
Perspectives on the State and Future of Deep Learning - 2023
analysis
llm
interpretability
benchmarking
deep_learning
transformers
future_of_ai
Jun 18, 2024
Your Student is Better Than Expected Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models
diffusion_model
gan
analysis
image_generation
knowledge_distillation
text-to-image
Jun 18, 2024
Parameter-Efficient Fine-Tuning Methods for Pretrained Language Models A Critical Review and Assessment
peft
llm
fine-tuning
parameter_efficiency
memory_efficiency
adapter
lora
prompt-tuning
prefix-tuning
analysis
literature_review
nlu
mt
nlg
Jun 18, 2024
Generative Multimodal Models are In-Context Learners
diffusion_model
llm
analysis
3d
motion
video
interpretability
Jun 18, 2024
The Truth is in There Improving Reasoning in Language Models with Layer-Selective Rank Reduction
llm
analysis
svd
pruning
rank_reduction
question_answering
factuality
decision_transformer
reinforcement_learning
Jun 18, 2024
V* Guided Visual Search as a Core Mechanism in Multimodal LLMs
diffusion_model
llm
analysis
3d
video
interpretability
Jun 18, 2024
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
llm
fine-tuning
self-play
sft
dpo
analysis
benchmark
Jun 18, 2024
A Minimaximalist Approach to Reinforcement Learning from Human Feedback
imitation-learning
offline-learning
sample-complexity
dataset-composition
function-class-complexity
covering-number
analysis
Jun 18, 2024
Score Distillation Sampling with Learned Manifold Corrective
diffusion_model
analysis
image_synthesis
image_editing
3d
text-to-3d
optimization
loss_function
denoising
Jun 18, 2024
Eyes Wide Shut Exploring the Visual Shortcomings of Multimodal LLMs
diffusion_model
llm
analysis
benchmark
visual-grounding
clip
self-supervised-learning
multimodal
vision-and-language
representation-learning
Jun 18, 2024
Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs) A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
mllm
llm
multimodal_reasoning
instruction_tuning
in-context_learning
analysis
literature_review
embodied_ai
tool_usage
Jun 18, 2024
Benchmarking the Robustness of Image Watermarks
diffusion_model
watermark
analysis
adversarial_attack
benchmark
image_quality
robustness
Jun 18, 2024
SiT Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers
diffusion_model
gan
interpolant
analysis
image_generation
transformer
sde
ode
Jun 18, 2024
AEROBLADE Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error
diffusion_model
ldm
analysis
adversarial_attack
interpretability
detection
disinformation
autoencoder
Jun 18, 2024
Compositional Generative Modeling A Single Model is Not All You Need
generative_modeling
modularity
compositionality
ebm
diffusion_model
analysis
video
image
planning
Jun 18, 2024
Can MLLMs Perform Text-to-Image In-Context Learning
diffusion_model
llm
mllm
analysis
benchmark
dataset
image_generation
in-context-learning
multimodality
prompt_engineering
Jun 18, 2024
Large Language Models A Survey
llm
survey
gpt
llama
palm
transformer
pre-training
fine-tuning
alignment
prompt_engineering
rag
hallucination
ethical_ai
multi-modal
analysis
literature_review
code_generation
reasoning
Jun 18, 2024
A Review of Adversarial Attacks in Computer Vision
adversarial_attack
computer_vision
image_classification
object_detection
semantic_segmentation
white-box_attack
black-box_attack
transfer_attack
universal_adversarial_perturbation
analysis
literature_review
Jun 18, 2024
Boosting Multi-modal Model Performance with Adaptive Gradient Modulation
diffusion_model
analysis
multi-modal-learning
modality-competition
gradient-modulation
shapley-value
fusion-strategies
Jun 18, 2024
StyleDiffusion Controllable Disentangled Style Transfer via Diffusion Models
diffusion_model
style_transfer
disentanglement
clip
analysis
image_manipulation
photorealistic
multi-modal
Jun 18, 2024
ALIP Adaptive Language-Image Pre-training with Synthetic Caption
diffusion_model
llm
analysis
image-text-retrieval
contrastive_learning
pre-training
noise_alleviation
Jun 18, 2024
Linearity of Relation Decoding in Transformer Language Models
llm
analysis
interpretability
knowledge_representation
relation
linear_transformation
Jun 18, 2024
RLIPv2 Fast Scaling of Relational Language-Image Pre-training
diffusion_model
llm
analysis
3d
motion
video
interpretability
Jun 18, 2024
Diffusion Model as Representation Learner
diffusion_model
representation_learning
knowledge_distillation
semantic_segmentation
image_classification
landmark_detection
reinforcement_learning
analysis
Jun 18, 2024
Unified Concept Editing in Diffusion Models
diffusion_model
gan
analysis
adversarial_attack
interpretability
debias
erasure
moderation
Jun 18, 2024
Elucidating the Exposure Bias in Diffusion Models
diffusion_model
exposure_bias
sampling
fid
analysis
training-free
image_generation
adm
ddim
ddpm
edm
ldm
dit
pfgm++
Jun 18, 2024
FIND A Function Description Benchmark for Evaluating Interpretability Methods
diffusion_model
llm
analysis
interpretability
Jun 18, 2024
Generative Image Dynamics
diffusion_model
motion
video
analysis
3d
Jun 18, 2024
On Model Explanations with Transferable Neural Pathways
diffusion_model
analysis
interpretability
neural_pathway
Jun 18, 2024
TinyCLIP CLIP Distillation via Affinity Mimicking and Weight Inheritance
diffusion_model
llm
analysis
3d
video
interpretability
Jun 18, 2024
Demystifying CLIP Data
diffusion_model
clip
analysis
data_curation
image_text
zero_shot
Jun 18, 2024
No Token Left Behind Efficient Vision Transformer via Dynamic Token Idling
diffusion_model
llm
analysis
3d
motion
video
interpretability
Jun 18, 2024
NEFTune Noisy Embeddings Improve Instruction Finetuning
diffusion_model
llm
analysis
instruction_finetuning
overfitting
regularization
embedding
conversational_ai
Jun 18, 2024
Interpreting CLIP's Image Representation via Text-Based Decomposition
diffusion_model
llm
analysis
interpretability
attention
clip
vit
zero-shot-learning
segmentation
spurious-correlations
Jun 18, 2024
State of the Art on Diffusion Models for Visual Computing
diffusion_model
gan
analysis
literature_review
2d
3d
motion
video
4d
text-to-image
text-to-video
Jun 18, 2024
Context-Aware Meta-Learning
diffusion_model
llm
analysis
few-shot-learning
image-classification
meta-learning
in-context-learning
universal-meta-learning
Jun 18, 2024
A General Theoretical Paradigm to Understand Learning from Human Preferences
rlhf
dpo
llm
analysis
preference-learning
overfitting
regularization
bandit
optimization
Jun 18, 2024
Quality Diversity through Human Feedback
diffusion_model
analysis
3d
motion
interpretability
quality_diversity
human_feedback
contrastive_learning
latent_space
image_generation
Jun 18, 2024
On the Language Encoder of Contrastive Cross-modal Models
diffusion_model
analysis
llm
audio
video
Jun 18, 2024
Localizing and Editing Knowledge in Text-to-Image Generative Models
diffusion_model
analysis
interpretability
text-to-image
stable-diffusion
causal-mediation-analysis
model-editing
Jun 18, 2024
MAS Multi-view Ancestral Sampling for 3D motion generation using 2D diffusion
diffusion_model
3d
motion
video
analysis
motion_generation
Jun 18, 2024
Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution
diffusion_model
llm
analysis
language_modeling
text_generation
Jun 18, 2024
The Expressive Power of Low-Rank Adaptation
lora
fine-tuning
fnn
tfn
analysis
expressive_power
approximation_error
Jun 18, 2024
Future Lens Anticipating Subsequent Tokens from a Single Hidden State
llm
analysis
interpretability
transformer
hidden_state
causal_intervention
Jun 18, 2024
First Tragedy, then Parse History Repeats Itself in the New Era of Large Language Models
llm
analysis
literature_review
machine_translation
evaluation
data_scarcity
hardware
future_work
Jun 18, 2024
The Chosen One Consistent Characters in Text-to-Image Diffusion Models
diffusion_model
consistent_character
personalization
text-to-image
clustering
analysis
user_study
sdxl
dinov2
Jun 18, 2024
Testing Language Model Agents Safely in the Wild
llm
analysis
safety
autonomous_agent
testing