PersonalQ: Select, Quantize, and Serve Personalized Diffusion Models for Efficient Inference
AI Summary
PersonalQ proposes a unified framework that improves inference efficiency by selecting, quantizing, and serving personalized diffusion models.
Key Contributions
- Proposes an intent-aligned checkpoint selection method that improves how user requests are matched to checkpoints
- Proposes Trigger-Aware Quantization (TAQ) for memory-efficient inference
- Enables scalable serving of personalized diffusion models while preserving generation quality
Methodology
Checkpoint selection combines intent-aware hybrid retrieval with LLM-based reranking over checkpoint context; TAQ then applies trigger-aware mixed-precision quantization to the cross-attention layers.
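The selection stage can be illustrated with a minimal sketch. All names (`hybrid_score`, `select_checkpoint`, the mixing weight `alpha`) are hypothetical, not from the paper: dense-embedding similarity is blended with keyword overlap, and the top candidates would then be passed to an LLM reranker over checkpoint context.

```python
import numpy as np

def hybrid_score(dense_sim, keyword_sim, alpha=0.6):
    """Blend dense similarity with keyword overlap.
    alpha is an illustrative mixing weight, not from the paper."""
    return alpha * dense_sim + (1 - alpha) * keyword_sim

def select_checkpoint(query_vec, query_terms, checkpoints, top_k=3):
    """Rank checkpoints by hybrid score; the top_k survivors would
    then go to an LLM-based reranker in the actual pipeline."""
    scored = []
    for name, (vec, terms) in checkpoints.items():
        dense = float(query_vec @ vec /
                      (np.linalg.norm(query_vec) * np.linalg.norm(vec)))
        kw = len(query_terms & terms) / max(len(query_terms), 1)
        scored.append((hybrid_score(dense, kw), name))
    return [n for _, n in sorted(scored, reverse=True)[:top_k]]

# Toy repository of two checkpoints: (embedding, trigger/keyword set)
checkpoints = {
    "corgi_v2": (np.array([0.9, 0.1]), {"corgi", "dog"}),
    "cat_v1":   (np.array([0.0, 1.0]), {"cat"}),
}
best = select_checkpoint(np.array([1.0, 0.0]), {"corgi", "beach"},
                         checkpoints, top_k=1)
```

After selection, the prompt would be rewritten to insert the chosen checkpoint's canonical trigger token, which is the same signal TAQ later relies on.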
Original Abstract
Personalized text-to-image generation lets users fine-tune diffusion models into repositories of concept-specific checkpoints, but serving these repositories efficiently is difficult for two reasons: natural-language requests are often ambiguous and can be misrouted to visually similar checkpoints, and standard post-training quantization can distort the fragile representations that encode personalized concepts. We present PersonalQ, a unified framework that connects checkpoint selection and quantization through a shared signal -- the checkpoint's trigger token. Check-in performs intent-aligned selection by combining intent-aware hybrid retrieval with LLM-based reranking over checkpoint context and asks a brief clarification question only when multiple intents remain plausible; it then rewrites the prompt by inserting the selected checkpoint's canonical trigger. Complementing this, Trigger-Aware Quantization (TAQ) applies trigger-aware mixed precision in cross-attention, preserving trigger-conditioned key/value rows (and their attention weights) while aggressively quantizing the remaining pathways for memory-efficient inference. Experiments show that PersonalQ improves intent alignment over retrieval and reranking baselines, while TAQ consistently offers a stronger compression-quality trade-off than prior diffusion PTQ methods, enabling scalable serving of personalized checkpoints without sacrificing fidelity.
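The trigger-aware mixed-precision idea in TAQ can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: rows of a cross-attention key/value matrix conditioned on the trigger token are kept in full precision, while the remaining rows are quantized to int8 with per-row symmetric scales.

```python
import numpy as np

def quantize_rows_int8(m):
    """Symmetric per-row int8 quantization: returns (q, scales)."""
    scales = np.abs(m).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0          # avoid divide-by-zero on all-zero rows
    q = np.clip(np.round(m / scales), -127, 127).astype(np.int8)
    return q, scales

def taq_quantize_kv(kv, trigger_rows):
    """Trigger-aware mixed precision (illustrative sketch):
    keep trigger-conditioned K/V rows in full precision and
    quantize the remaining rows to int8, dequantizing for use."""
    out = kv.astype(np.float32).copy()
    mask = np.ones(kv.shape[0], dtype=bool)
    mask[trigger_rows] = False          # trigger rows stay untouched
    q, s = quantize_rows_int8(kv[mask])
    out[mask] = q.astype(np.float32) * s  # lossy pathway for the rest
    return out

np.random.seed(0)
kv = np.random.randn(8, 16).astype(np.float32)
deq = taq_quantize_kv(kv, trigger_rows=[2])
# row 2 (the trigger-conditioned row) is preserved bit-exactly;
# the other rows carry only small int8 quantization error
```

The design choice this illustrates: the trigger token is the shared signal between selection and quantization, so the representations that encode the personalized concept are spared the aggressive compression applied everywhere else.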