LLM Memory & RAG relevance: 6/10

Depth-Structured Music Recurrence: Budgeted Recurrent Attention for Full-Piece Symbolic Music Modeling

Yungang Yi
arXiv: 2602.19816v1 · Published: 2026-02-23 · Updated: 2026-02-23

AI Summary

The DSMR model uses a layer-wise memory schedule to model long musical sequences under constrained computational resources.

Main Contributions

  • Proposes the Depth-Structured Music Recurrence (DSMR) model
  • Designs a layer-wise memory-horizon schedule that budgets recurrent state across depth
  • Validates the effectiveness of DSMR for long-sequence symbolic music modeling

Methodology

Builds a recurrent Transformer that extends its context by managing KV states on a per-layer basis across segments, and analyzes the resulting trade-offs experimentally.
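The segment-level recurrence described above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: token ids stand in for KV states, attention itself is elided, and the function name and signature are assumptions. It shows the core bookkeeping, i.e. one left-to-right pass where each layer carries forward at most `horizons[l]` items of cross-segment history (treated as detached, so no gradients would flow into past segments).

```python
def recurrent_pass(tokens, segment_len, horizons):
    """Single left-to-right pass with per-layer cross-segment memory.

    `horizons[l]` caps how many past entries layer l keeps across
    segments; real KV tensors are replaced by token ids for clarity.
    """
    num_layers = len(horizons)
    memory = [[] for _ in range(num_layers)]  # per-layer cross-segment history
    for start in range(0, len(tokens), segment_len):
        segment = tokens[start:start + segment_len]
        for l in range(num_layers):
            context = memory[l] + segment       # attend over memory + current segment
            # ... attention over `context` would go here ...
            memory[l] = context[-horizons[l]:]  # truncate to this layer's horizon
    return memory
```

With `horizons = [6, 2]`, the first layer retains a six-token receptive field across segment boundaries while the second keeps only two, giving the depth-dependent temporal receptive fields the summary refers to.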

Original Abstract

Long-context modeling is essential for symbolic music generation, since motif repetition and developmental variation can span thousands of musical events. However, practical composition and performance workflows frequently rely on resource-limited devices (e.g., electronic instruments and portable computers), making heavy memory and attention computation difficult to deploy. We introduce Depth-Structured Music Recurrence (DSMR), a recurrent long-context Transformer for full-piece symbolic music modeling that extends context beyond fixed-length excerpts via segment-level recurrence with detached cross-segment states, featuring a layer-wise memory-horizon schedule that budgets recurrent KV states across depth. DSMR is trained in a single left-to-right pass over each complete composition, akin to how a musician experiences it from beginning to end, while carrying recurrent cross-segment states forward. Within this recurrent framework, we systematically study how depth-wise horizon allocations affect optimization, best-checkpoint perplexity, and efficiency. By allocating different history-window lengths across layers while keeping the total recurrent-state budget fixed, DSMR creates depth-dependent temporal receptive fields within a recurrent attention stack without reducing compute depth. Our main instantiation is a two-scale DSMR schedule that allocates long history windows to lower layers and a uniform short window to the remaining layers. Experiments on the piano performance dataset MAESTRO demonstrate that two-scale DSMR provides a practical quality--efficiency recipe for full-length long-context symbolic music modeling with recurrent attention under limited computational resources.
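The abstract's main instantiation, a two-scale schedule that gives lower layers long history windows and the remaining layers a uniform short window under a fixed total recurrent-state budget, can be sketched as a small allocation helper. The function name and parameters are hypothetical; the paper does not specify this API.

```python
def two_scale_schedule(num_layers, total_budget, num_long_layers, long_window):
    """Allocate per-layer memory horizons under a fixed total budget.

    The first `num_long_layers` layers each get `long_window` entries of
    cross-segment history; the leftover budget is split uniformly across
    the remaining (upper) layers.
    """
    remaining = total_budget - num_long_layers * long_window
    assert remaining >= 0, "long windows exceed the total budget"
    num_short_layers = num_layers - num_long_layers
    short_window = remaining // num_short_layers
    return [long_window] * num_long_layers + [short_window] * num_short_layers
```

For example, with 12 layers, a total budget of 4096 states, and two lower layers at a 1024-token horizon, the ten upper layers each receive a uniform 204-token window, so the total stays within budget while compute depth is unchanged.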

Tags

Music Generation · Long-Sequence Modeling · Transformer · Recurrent Networks · Resource Efficiency

arXiv Categories

cs.SD cs.AI cs.LG