TrajTok: Learning Trajectory Tokens enables better Video Understanding
AI Summary
This paper proposes TrajTok, a video tokenizer that dynamically segments videos into trajectories through joint training with downstream models, improving both video understanding performance and efficiency.
Key Contributions
- Proposes TrajTok, an end-to-end video tokenizer module trained jointly with downstream tasks.
- TrajTok extracts spatio-temporal trajectories via implicit clustering, without any external segmentation or tracking pipeline.
- TrajTok performs strongly on video classification, retrieval, and long-video reasoning.
Methodology
TrajTok uses a unified segmenter that performs implicit clustering over space and time to directly produce object trajectories, and is trained jointly with the downstream video model.
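To make the core idea concrete, here is a minimal sketch of implicit space-time clustering using soft k-means as a stand-in for TrajTok's learned segmenter. This is not the authors' architecture: the function name, the choice of soft k-means, and all parameters (`num_tokens`, `iters`) are illustrative assumptions. The point it demonstrates is the one from the abstract: the number of output tokens is fixed by the number of clusters, not by video duration.

```python
import numpy as np

def trajectory_tokenize(video_feats, num_tokens=8, iters=5, seed=0):
    """Hypothetical sketch: soft k-means over all space-time pixel
    features, pooling each cluster into one 'trajectory token'.
    Output size depends on num_tokens, not on video length T."""
    T, H, W, D = video_feats.shape
    x = video_feats.reshape(-1, D)          # (T*H*W, D) space-time pixels
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), num_tokens, replace=False)]
    for _ in range(iters):
        # soft assignment of every pixel to every trajectory slot
        logits = -((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        w = np.exp(logits - logits.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)   # (N, num_tokens)
        # re-estimate each slot as the weighted mean of its pixels
        centers = (w.T @ x) / (w.sum(axis=0)[:, None] + 1e-8)
    return centers                          # (num_tokens, D) tokens

# A 4-frame and a 32-frame clip yield the same token count.
tokens_short = trajectory_tokenize(np.random.rand(4, 8, 8, 16))
tokens_long = trajectory_tokenize(np.random.rand(32, 8, 8, 16))
```

In the actual method the clustering is a learned module trained end-to-end with the downstream objective, so token granularity adapts to semantic complexity rather than being a fixed hyperparameter as in this sketch.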
Original Abstract
Tokenization in video models, typically through patchification, generates an excessive and redundant number of tokens. This severely limits video efficiency and scalability. While recent trajectory-based tokenizers offer a promising solution by decoupling video duration from token count, they rely on complex external segmentation and tracking pipelines that are slow and task-agnostic. We propose TrajTok, an end-to-end video tokenizer module that is fully integrated and co-trained with video models for a downstream objective, dynamically adapting its token granularity to semantic complexity, independent of video duration. TrajTok contains a unified segmenter that performs implicit clustering over pixels in both space and time to directly produce object trajectories in a single forward pass. By prioritizing downstream adaptability over pixel-perfect segmentation fidelity, TrajTok is lightweight and efficient, yet empirically improves video understanding performance. With TrajTok, we implement a video CLIP model trained from scratch (TrajViT2). It achieves the best accuracy at scale across both classification and retrieval benchmarks, while maintaining efficiency comparable to the best token-merging methods. TrajTok also proves to be a versatile component beyond its role as a tokenizer. We show that it can be seamlessly integrated as either a probing head for pretrained visual features (TrajAdapter) or an alignment connector in vision-language models (TrajVLM) with especially strong performance in long-video reasoning.