ULTRA: Urdu Language Transformer-based Recommendation Architecture
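AI Summary
ULTRA is an adaptive semantic recommendation framework for Urdu, a low-resource language. It improves recommendation quality through dual embeddings and query-length-aware routing.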
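Key Contributions
- Proposes ULTRA, a Transformer-based recommendation architecture for Urdu
- Introduces a dual-embedding architecture with a query-length-aware routing mechanism
- Validates the approach on a large-scale Urdu news corpus, reporting gains in precision above 90% over single-pipeline baselines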
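Methodology
ULTRA uses Transformer embeddings and optimized pooling strategies, and dynamically selects either a headline-level or a full-text-level semantic pipeline for recommendation, depending on query length.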
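The routing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the source does not say how query length is measured or what threshold is used, so the whitespace token count, the function name `route_query`, and the default threshold of 5 tokens are all hypothetical.

```python
# Hypothetical threshold: the source only says the decision is "threshold-driven".
SHORT_QUERY_MAX_TOKENS = 5


def route_query(query: str, threshold: int = SHORT_QUERY_MAX_TOKENS) -> str:
    """Pick a semantic pipeline from the query length.

    Returns "headline" for short, intent-focused queries and "content"
    for longer, context-rich ones. Length is measured here as the number
    of whitespace-separated tokens, which is an assumption.
    """
    n_tokens = len(query.split())
    return "headline" if n_tokens <= threshold else "content"


if __name__ == "__main__":
    # A two-word query goes to the headline-level pipeline.
    print(route_query("کرکٹ میچ"))
    # A longer query goes to the full-content pipeline.
    print(route_query("one two three four five six seven"))
```

In a full system, each returned label would select a separate embedding index (one built from headlines, one from article bodies), which is what the dual-embedding design implies.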
Original Abstract
Urdu, as a low-resource language, lacks effective semantic content recommendation systems, particularly in the domain of personalized news retrieval. Existing approaches largely rely on lexical matching or language-agnostic techniques, which struggle to capture semantic intent and perform poorly under varying query lengths and information needs. This limitation results in reduced relevance and adaptability in Urdu content recommendation. We propose ULTRA (Urdu Language Transformer-based Recommendation Architecture), an adaptive semantic recommendation framework designed to address these challenges. ULTRA introduces a dual-embedding architecture with a query-length-aware routing mechanism that dynamically distinguishes between short, intent-focused queries and longer, context-rich queries. Based on a threshold-driven decision process, user queries are routed to specialized semantic pipelines optimized for either title/headline-level or full-content/document-level representations, ensuring appropriate semantic granularity during retrieval. The proposed system leverages transformer-based embeddings and optimized pooling strategies to move beyond surface-level keyword matching and enable context-aware similarity search. Extensive experiments conducted on a large-scale Urdu news corpus demonstrate that the proposed architecture consistently improves recommendation relevance across diverse query types. Results show gains in precision above 90% compared to single-pipeline baselines, highlighting the effectiveness of query-adaptive semantic alignment for low-resource languages. The findings establish ULTRA as a robust and generalizable content recommendation architecture, offering practical design insights for semantic retrieval systems in low-resource language settings.
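As a rough illustration of the pooling and similarity-search stage mentioned in the abstract, the sketch below uses masked mean pooling and cosine similarity in plain Python. Both are common swapped-in choices: the abstract does not specify which pooling strategy or similarity measure ULTRA uses, and the helper names (`mean_pool`, `cosine`, `top_k`) and toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def mean_pool(token_vectors: Sequence[Sequence[float]],
              mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose mask entry is non-zero (padding is skipped)."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    if count == 0:
        return total  # all-zero vector when nothing is unmasked
    return [t / count for t in total]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          index: Dict[str, Sequence[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Rank indexed items by cosine similarity to the query, best first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-dimensional "token embeddings"; the last token is padding.
    query_vec = mean_pool([[1.0, 0.0], [3.0, 2.0], [9.0, 9.0]], [1, 1, 0])
    toy_index = {"doc_a": [1.0, 0.5], "doc_b": [0.0, 1.0]}
    print(top_k(query_vec, toy_index, k=1))
```

In practice the token vectors would come from a transformer encoder, and the index searched would be the headline-level or document-level one chosen by the routing step.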