Multimodal Learning Relevance: 8/10

PersonalQ: Select, Quantize, and Serve Personalized Diffusion Models for Efficient Inference

Qirui Wang, Qi Guo, Yiding Sun, Junkai Yang, Dongxu Zhang, Shanmin Pang, Qing Guo
arXiv: 2603.22943v1 Published: 2026-03-24 Updated: 2026-03-24

AI Summary

PersonalQ proposes a unified framework that improves inference efficiency by selecting, quantizing, and serving personalized diffusion models.

Key Contributions

  • An intent-aligned checkpoint selection method that resolves ambiguous natural-language requests
  • Trigger-Aware Quantization (TAQ) for efficient mixed-precision inference
  • Scalable serving of personalized diffusion models without sacrificing generation quality

Methodology

Checkpoint selection is performed via intent-aware hybrid retrieval combined with LLM-based reranking, and TAQ applies trigger-aware mixed-precision quantization to cross-attention.
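The selection stage can be illustrated with a minimal sketch. This is not the paper's implementation: the embeddings, checkpoint metadata, scoring weights, and function names below are all illustrative assumptions, and the LLM reranking step is reduced to a top-k shortlist.

```python
import math

def cosine(a, b):
    # Dense similarity between a query embedding and a checkpoint embedding.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def lexical_overlap(query, text):
    # Sparse/lexical signal: fraction of query words found in the description.
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def hybrid_select(query, query_emb, checkpoints, alpha=0.5, top_k=2):
    """Score checkpoints by a weighted mix of dense and lexical similarity."""
    scored = []
    for ckpt in checkpoints:
        dense = cosine(query_emb, ckpt["emb"])
        lex = lexical_overlap(query, ckpt["description"])
        scored.append((alpha * dense + (1 - alpha) * lex, ckpt))
    scored.sort(key=lambda s: s[0], reverse=True)
    # In PersonalQ these top-k candidates would go to an LLM reranker,
    # with a clarification question if several intents remain plausible.
    return [c for _, c in scored[:top_k]]

# Toy repository of concept checkpoints (hypothetical data).
checkpoints = [
    {"name": "sks_dog", "description": "a sks dog concept checkpoint",
     "emb": [0.9, 0.1, 0.0], "trigger": "sks dog"},
    {"name": "xyz_cat", "description": "a xyz cat concept checkpoint",
     "emb": [0.1, 0.9, 0.0], "trigger": "xyz cat"},
]

candidates = hybrid_select("my dog on a beach", [0.8, 0.2, 0.1], checkpoints)
# The prompt is then rewritten with the selected checkpoint's canonical trigger.
rewritten = "a photo of " + candidates[0]["trigger"] + " on a beach"
```

The key design point mirrored here is that the final prompt rewrite inserts the chosen checkpoint's canonical trigger token, which is the same signal TAQ later relies on.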

Original Abstract

Personalized text-to-image generation lets users fine-tune diffusion models into repositories of concept-specific checkpoints, but serving these repositories efficiently is difficult for two reasons: natural-language requests are often ambiguous and can be misrouted to visually similar checkpoints, and standard post-training quantization can distort the fragile representations that encode personalized concepts. We present PersonalQ, a unified framework that connects checkpoint selection and quantization through a shared signal -- the checkpoint's trigger token. Check-in performs intent-aligned selection by combining intent-aware hybrid retrieval with LLM-based reranking over checkpoint context and asks a brief clarification question only when multiple intents remain plausible; it then rewrites the prompt by inserting the selected checkpoint's canonical trigger. Complementing this, Trigger-Aware Quantization (TAQ) applies trigger-aware mixed precision in cross-attention, preserving trigger-conditioned key/value rows (and their attention weights) while aggressively quantizing the remaining pathways for memory-efficient inference. Experiments show that PersonalQ improves intent alignment over retrieval and reranking baselines, while TAQ consistently offers a stronger compression-quality trade-off than prior diffusion PTQ methods, enabling scalable serving of personalized checkpoints without sacrificing fidelity.
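The mixed-precision idea behind TAQ can be sketched as follows. This is a simplified illustration, not the paper's exact algorithm: key/value rows at trigger-token positions are kept in full precision, while all other rows are quantized to int8 with a per-row scale. The function names and the per-row quantization scheme are assumptions.

```python
def quantize_row(row):
    # Symmetric per-row int8 quantization: scale maps the max |value| to 127.
    scale = max(abs(v) for v in row) / 127 or 1.0
    return [round(v / scale) for v in row], scale

def dequantize_row(q, scale):
    return [v * scale for v in q]

def taq_rows(matrix, trigger_positions):
    """Trigger-aware mixed precision for cross-attention K/V rows:
    rows conditioned on trigger tokens stay in full precision,
    remaining rows are aggressively quantized."""
    stored = []
    for i, row in enumerate(matrix):
        if i in trigger_positions:
            stored.append(("fp", row))                  # preserved exactly
        else:
            stored.append(("int8", quantize_row(row)))  # (values, scale)
    return stored

def reconstruct(stored):
    return [payload if kind == "fp" else dequantize_row(*payload)
            for kind, payload in stored]

# Toy key matrix: one row per text token; position 0 is the trigger token.
keys = [[0.5, -0.25], [1.0, 0.0], [0.1, 0.2]]
stored = taq_rows(keys, trigger_positions={0})
recovered = reconstruct(stored)
```

The trade-off this sketch captures is that the fragile trigger-conditioned pathways incur no quantization error at all, while the bulk of the weights still benefit from int8 compression.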

Tags

Diffusion Models Personalization Quantization Efficient Inference

arXiv Category

cs.AI