arXiv Paper Digest

Snapshot: 20260206_0343

Reinforced Attention Learning

Authors: Bangzheng Li, Jianmo Ni, Chen Qu, Ian Miao, Liu Yang, Xingyu Fu, Muhao Chen, Derek Zhiyuan Cheng

First: 2026-02-04T18:59:52+00:00 · Latest: 2026-02-04T18:59:52+00:00

Abstract

Post-training with Reinforcement Learning (RL) has substantially improved reasoning in Large Language Models (LLMs) via test-time scaling. However, extending this paradigm to Multimodal LLMs (MLLMs) through verbose rationales yields limited gains for perception and can even degrade performance. We propose Reinforced Attention Learning (RAL), a policy-gradient framework that directly optimizes internal attention distributions rather than output token sequences. By shifting optimization from what to generate to where to attend, RAL promotes effective information allocation and improved grounding in complex multimodal inputs. Experiments across diverse image and video benchmarks show consistent gains over GRPO and other baselines. We further introduce On-Policy Attention Distillation, demonstrating that transferring latent attention behaviors yields stronger cross-modal alignment than standard knowledge distillation. Our results position attention policies as a principled and general alternative for multimodal post-training.
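
To make "optimizing where to attend" concrete, here is a toy sketch of a policy-gradient (REINFORCE) update applied to a categorical attention distribution. It is not RAL's actual objective, which the abstract does not specify. The function names, the single-target reward, the moving-average baseline, and all hyperparameters are illustrative assumptions.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_attention(num_regions=4, target=2, steps=500, lr=0.5, seed=0):
    """Toy REINFORCE over attention logits (hypothetical setup, not the paper's).

    The "policy" is a softmax over input regions. Sampling a region stands in
    for attending to it, and the reward is 1.0 only when the sampled region
    is `target`.
    """
    rng = random.Random(seed)
    logits = [0.0] * num_regions
    baseline = 0.0  # moving average of rewards, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        region = rng.choices(range(num_regions), weights=probs)[0]
        reward = 1.0 if region == target else 0.0
        advantage = reward - baseline
        # d log pi(region) / d logit_i = 1[i == region] - probs[i]
        for i in range(num_regions):
            grad = (1.0 if i == region else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
        baseline += 0.1 * (reward - baseline)
    return softmax(logits)


if __name__ == "__main__":
    print(reinforce_attention())
```

In this toy the gradient flows into the attention logits themselves rather than into output-token probabilities, which is the distinction the abstract draws. A real MLLM would need a reward derived from task performance and a much richer attention parameterization.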

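Title and abstract (translated from Chinese)

Title: Reinforced Attention Learning

Through test-time scaling, post-training with reinforcement learning (RL) has markedly improved the reasoning of large language models (LLMs). Extending this paradigm to multimodal LLMs (MLLMs) through verbose rationales, however, brings only limited gains on perception and can even reduce performance. We propose Reinforced Attention Learning (RAL), a policy-gradient framework that directly optimizes internal attention distributions rather than output token sequences. By moving optimization from what to generate to where to attend, RAL promotes effective information allocation and better grounding in complex multimodal inputs. Experiments on a range of image and video benchmarks show consistent improvements over GRPO and other baselines. We further introduce On-Policy Attention Distillation, showing that transferring latent attention behaviors yields stronger cross-modal alignment than standard knowledge distillation. Our results position attention policies as a principled and general alternative for multimodal post-training.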

Protein Autoregressive Modeling via Multiscale Structure Generation

Authors: Yanru Qu, Cheng-Yen Hsieh, Zaixiang Zheng, Ge Liu, Quanquan Gu

First: 2026-02-04T18:59:49+00:00 · Latest: 2026-02-04T18:59:49+00:00

Comments: ByteDance Seed Tech Report; Page: https://par-protein.github.io/

Abstract

We present protein autoregressive modeling (PAR), the first multi-scale autoregressive framework for protein backbone generation via coarse-to-fine next-scale prediction. Using the hierarchical nature of proteins, PAR generates structures in a process akin to sculpting a statue, forming a coarse topology and then refining structural details over scales. To achieve this, PAR consists of three key components: (i) multi-scale downsampling operations that represent protein structures across multiple scales during training; (ii) an autoregressive transformer that encodes multi-scale information and produces conditional embeddings to guide structure generation; (iii) a flow-based backbone decoder that generates backbone atoms conditioned on these embeddings. Moreover, autoregressive models suffer from exposure bias, caused by the mismatch between the training and generation procedures, which substantially degrades structure generation quality. We effectively alleviate this issue by adopting noisy context learning and scheduled sampling, enabling robust backbone generation. Notably, PAR exhibits strong zero-shot generalization, supporting flexible human-prompted conditional generation and motif scaffolding without requiring fine-tuning. On the unconditional generation benchmark, PAR effectively learns protein distributions, produces backbones of high design quality, and exhibits favorable scaling behavior. Together, these properties establish PAR as a promising framework for protein structure generation.
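
As a rough illustration of component (i), the sketch below builds a coarse-to-fine pyramid from one backbone trace by average-pooling consecutive 3D coordinates. The abstract does not say which downsampling operator PAR uses, so the pooling rule, the scale factors, and the names `downsample` and `multiscale_pyramid` are all assumptions made for illustration.

```python
def downsample(coords, factor):
    """Average consecutive groups of `factor` 3D points.

    Assumed operator for illustration only; the final group may be shorter
    than `factor` when the length is not divisible by it.
    """
    pooled = []
    for start in range(0, len(coords), factor):
        group = coords[start:start + factor]
        pooled.append(
            tuple(sum(p[axis] for p in group) / len(group) for axis in range(3))
        )
    return pooled


def multiscale_pyramid(coords, factors=(4, 2, 1)):
    """Return views of one trace ordered from coarsest to finest.

    `factors` is a hypothetical schedule; a factor of 1 keeps full resolution.
    """
    return [downsample(coords, f) for f in factors]


if __name__ == "__main__":
    # Hypothetical 8-residue trace laid out along the x axis.
    trace = [(float(i), 0.0, 0.0) for i in range(8)]
    for scale in multiscale_pyramid(trace):
        print(len(scale), scale[0])
```

Under this assumption, a next-scale model would predict each finer view conditioned on the coarser ones, which matches the coarse-topology-then-detail ordering described in the abstract.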

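Title and abstract (translated from Chinese)

Title: Protein Autoregressive Modeling via Multiscale Structure Generation

We present protein autoregressive modeling (PAR), the first multi-scale autoregressive framework for protein backbone generation via coarse-to-fine next-scale prediction. Drawing on the hierarchical nature of proteins, PAR generates structures much as a sculptor carves a statue, first forming a coarse topology and then progressively refining structural details. To this end, PAR has three key components: (i) multi-scale downsampling operations that represent protein structures at different scales during training; (ii) an autoregressive transformer that encodes multi-scale information and produces conditional embeddings to guide structure generation; (iii) a flow-based backbone decoder that generates backbone atoms from these embeddings. In addition, autoregressive models suffer from exposure bias because the training and generation procedures do not match, which markedly lowers structure generation quality. We alleviate this with noisy context learning and scheduled sampling, making backbone generation more robust. Notably, PAR shows strong zero-shot generalization, supporting flexible human-prompted conditional generation and motif scaffolding without fine-tuning. On the unconditional generation benchmark, PAR effectively learns protein distributions, produces high-quality backbones, and shows favorable scaling behavior. These properties make PAR a promising framework for protein structure generation.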

Summary

The research introduces protein autoregressive modeling (PAR), a multi-scale autoregressive framework for protein backbone generation. PAR uses coarse-to-fine next-scale prediction and consists of multi-scale downsampling, an autoregressive transformer, and a flow-based backbone decoder. It mitigates exposure bias through noisy context learning and scheduled sampling, which improves structure generation quality. PAR shows strong zero-shot generalization, supporting conditional generation and motif scaffolding without fine-tuning. On the unconditional generation benchmark it produces backbones of high design quality and exhibits favorable scaling behavior.

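The work introduces protein autoregressive modeling (PAR), a multi-scale autoregressive framework for protein backbone generation. PAR uses coarse-to-fine, scale-by-scale prediction over a hierarchical representation of protein structure. It comprises multi-scale downsampling, an autoregressive transformer, and a flow-based backbone decoder. To address exposure bias, PAR adopts noisy context learning and scheduled sampling. PAR shows strong zero-shot generalization, supporting flexible conditional generation and motif scaffolding without fine-tuning. On benchmarks, PAR effectively learns protein distributions and produces high-quality backbones, with favorable scaling behavior.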

Contrastive Continual Learning for Model Adaptability in Internet of Things

Authors: Ajesh Koyatan Chathoth

First: 2026-02-04T18:59:14+00:00 · Latest: 2026-02-04T18:59:14+00:00