Multimodal Learning relevance: 7/10

Self-Supervised Learning as Discrete Communication

Kawtar Zaher, Ilyass Moummad, Olivier Buisson, Alexis Joly
arXiv: 2602.09764v1 Published: 2026-02-10 Updated: 2026-02-10

AI Summary

Proposes a self-supervised learning method based on discrete communication, learning structured visual representations through binary codes.

Key Contributions

  • Frames self-supervised learning as a discrete communication process between teacher and student networks
  • Proposes a coding-rate regularization term that encourages effective use of the constrained channel, promoting structured representations
  • Demonstrates the method's effectiveness on image classification, retrieval, and dense visual prediction tasks

Methodology

In a teacher-student setup, the student predicts the binary messages produced by the teacher, trained with an element-wise binary cross-entropy loss under a coding-rate regularization constraint.
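The core training signal described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the teacher thresholds its outputs into a binary message, and the student is scored with an element-wise binary cross-entropy against those bits. All names (`teacher_logits`, `student_logits`, `message`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16  # assumed channel capacity: number of binary message bits

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Teacher produces a multi-label binary message by thresholding its outputs.
teacher_logits = rng.normal(size=dim)
message = (sigmoid(teacher_logits) > 0.5).astype(float)  # bits in {0, 1}

# Student outputs per-bit probabilities for the same input view.
student_logits = rng.normal(size=dim)
probs = sigmoid(student_logits)

# Element-wise binary cross-entropy between student probabilities and the
# teacher's message: the "discrete agreement" objective.
eps = 1e-12
bce = -np.mean(message * np.log(probs + eps)
               + (1 - message) * np.log(1 - probs + eps))
print(bce > 0.0)  # a positive loss unless the student matches every bit
```

In practice both networks would share an input (two views of the same image) and the teacher would typically be a momentum or stop-gradient copy of the student, as is common in teacher-student SSL; those details are omitted here.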

Original Abstract

Most self-supervised learning (SSL) methods learn continuous visual representations by aligning different views of the same input, offering limited control over how information is structured across representation dimensions. In this work, we frame visual self-supervised learning as a discrete communication process between a teacher and a student network, where semantic information is transmitted through a fixed-capacity binary channel. Rather than aligning continuous features, the student predicts multi-label binary messages produced by the teacher. Discrete agreement is enforced through an element-wise binary cross-entropy objective, while a coding-rate regularization term encourages effective utilization of the constrained channel, promoting structured representations. We further show that periodically reinitializing the projection head strengthens this effect by encouraging embeddings that remain predictive across multiple discrete encodings. Extensive experiments demonstrate consistent improvements over continuous agreement baselines on image classification, retrieval, and dense visual prediction tasks, as well as under domain shift through self-supervised adaptation. Beyond backbone representations, we analyze the learned binary codes and show that they form a compact and informative discrete language, capturing semantic factors reusable across classes.
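The abstract's coding-rate regularizer can be illustrated with the rate-distortion-style term R(Z) = ½ logdet(I + d/(n·ε²) · ZZᵀ), familiar from coding-rate/rate-reduction literature. The paper's exact formulation may differ; this hedged sketch only shows why such a term rewards embeddings that spread information across dimensions rather than collapsing.

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """Coding rate of embeddings Z with shape (d, n): d dims, n samples.

    R(Z) = 1/2 * logdet(I + d / (n * eps^2) * Z @ Z.T)
    Higher when features occupy more dimensions (higher effective rank).
    """
    d, n = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps**2)) * Z @ Z.T)
    return 0.5 * logdet

rng = np.random.default_rng(0)
spread = rng.normal(size=(8, 64))                       # full-rank features
collapsed = np.ones((8, 1)) @ rng.normal(size=(1, 64))  # rank-1 collapse
print(coding_rate(spread) > coding_rate(collapsed))     # prints True
```

Maximizing such a term alongside the binary cross-entropy objective pushes the constrained binary channel to be used effectively instead of letting all bits encode the same factor.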

Tags

Self-Supervised Learning  Discrete Representations  Representation Learning  Binary Codes

arXiv Categories

cs.CV cs.IR cs.LG