arXiv Paper Digest

Snapshot: 20260224_0400

Going Down Memory Lane: Scaling Tokens for Video Stream Understanding with Dynamic KV-Cache Memory

Authors: Vatsal Agarwal, Saksham Suri, Matthew Gwilliam, Pulkit Kumar, Abhinav Shrivastava

First: 2026-02-20T18:59:50+00:00 · Latest: 2026-02-20T18:59:50+00:00

Comments: Project page: see https://vatsalag99.github.io/memstream/

Abstract

Streaming video understanding requires models to robustly encode, store, and retrieve information from a continuous video stream to support accurate video question answering (VQA). Existing state-of-the-art approaches rely on key-value caching to accumulate frame-level information over time, but use a limited number of tokens per frame, leading to the loss of fine-grained visual details. In this work, we propose scaling the token budget to enable more granular spatiotemporal understanding and reasoning. First, we find that current methods are ill-equipped to handle dense streams: their feature encoding causes query-frame similarity scores to increase over time, biasing retrieval toward later frames. To address this, we introduce an adaptive selection strategy that reduces token redundancy while preserving local spatiotemporal information. We further propose a training-free retrieval mixture-of-experts that leverages external models to better identify relevant frames. Our method, MemStream, achieves +8.0% on CG-Bench, +8.5% on LVBench, and +2.4% on VideoMME (Long) over ReKV with Qwen2.5-VL-7B.

SARAH: Spatially Aware Real-time Agentic Humans

Authors: Evonne Ng, Siwei Zhang, Zhang Chen, Michael Zollhoefer, Alexander Richard

First: 2026-02-20T18:59:35+00:00 · Latest: 2026-02-20T18:59:35+00:00

Comments: Project page: https://evonneng.github.io/sarah/

Abstract

As embodied agents become central to VR, telepresence, and digital human applications, their motion must go beyond speech-aligned gestures: agents should turn toward users, respond to their movement, and maintain natural gaze. Current methods lack this spatial awareness. We close this gap with the first real-time, fully causal method for spatially-aware conversational motion, deployable on a streaming VR headset. Given a user's position and dyadic audio, our approach produces full-body motion that aligns gestures with speech while orienting the agent according to the user. Our architecture combines a causal transformer-based VAE with interleaved latent tokens for streaming inference and a flow matching model conditioned on user trajectory and audio. To support varying gaze preferences, we introduce a gaze scoring mechanism with classifier-free guidance to decouple learning from control: the model captures natural spatial alignment from data, while users can adjust eye contact intensity at inference time. On the Embody 3D dataset, our method achieves state-of-the-art motion quality at over 300 FPS -- 3x faster than non-causal baselines -- while capturing the subtle spatial dynamics of natural conversation. We validate our approach on a live VR system, bringing spatially-aware conversational agents to real-time deployment. Please see https://evonneng.github.io/sarah/ for details.

Summary

The work adds spatial awareness to the motion of embodied agents for VR and telepresence, so that agents turn toward users, respond to their movement, and maintain natural gaze. The method pairs a causal transformer-based VAE with interleaved latent tokens for streaming inference and a flow matching model conditioned on user trajectory and audio; a gaze scoring mechanism with classifier-free guidance lets users adjust eye contact intensity at inference time. On the Embody 3D dataset, the approach achieves state-of-the-art motion quality at over 300 FPS, and it is validated on a live VR system.

VIRAASAT: Traversing Novel Paths for Indian Cultural Reasoning

Authors: Harshul Raj Surana, Arijit Maji, Aryan Vats, Akash Ghosh, Sriparna Saha, Amit Sheth

First: 2026-02-20T18:53:07+00:00 · Latest: 2026-02-20T18:53:07+00:00

Abstract

Large Language Models (LLMs) have made significant progress in reasoning tasks across various domains such as mathematics and coding. However, their performance deteriorates in tasks requiring rich socio-cultural knowledge and diverse local contexts, particularly those involving Indian culture. Existing cultural benchmarks (i) are manually crafted, (ii) contain single-hop questions testing factual recall, and (iii) are prohibitively costly to scale, leaving this deficiency largely unmeasured. To address this, we introduce VIRAASAT, a novel, semi-automated multi-hop approach for generating a culture-specific multi-hop question-answering dataset for Indian culture. VIRAASAT leverages a Knowledge Graph comprising more than 700 expert-curated cultural artifacts, covering 13 key attributes of Indian culture (history, festivals, etc.). VIRAASAT spans all 28 states and 8 Union Territories, yielding more than 3,200 multi-hop questions that necessitate chained cultural reasoning. We evaluate current State-of-the-Art (SOTA) LLMs on VIRAASAT and identify key limitations in reasoning wherein fine-tuning on Chain-of-Thought (CoT) traces fails to ground and synthesize low-probability facts. To bridge this gap, we propose a novel framework named Symbolic Chain-of-Manipulation (SCoM). Adapting the Chain-of-Manipulation paradigm, we train the model to simulate atomic Knowledge Graph manipulations internally. SCoM teaches the model to reliably traverse the topological structure of the graph. Experiments on Supervised Fine-Tuning (SFT) demonstrate that SCoM outperforms standard CoT baselines by up to 20%. We release the VIRAASAT dataset along with our findings, laying a strong foundation towards building Culturally Aware Reasoning Models.

Summary

The paper introduces VIRAASAT, a multi-hop question-answering dataset for Indian cultural reasoning that addresses the limitations of existing cultural benchmarks. A semi-automated pipeline over a knowledge graph of more than 700 expert-curated cultural artifacts generates more than 3,200 multi-hop questions. Evaluations show that current SOTA LLMs struggle with chained cultural reasoning. The authors propose SCoM, a framework that trains models to simulate Knowledge Graph manipulations internally and outperforms standard Chain-of-Thought baselines by up to 20% in supervised fine-tuning experiments. The dataset and findings are released.

RVR: Retrieve-Verify-Retrieve for Comprehensive Question Answering

Authors: Deniz Qian, Hung-Ting Chen, Eunsol Choi

First: 2026-02-20T18:48:05+00:00 · Latest: 2026-02-20T18:48:05+00:00

Comments: 18 pages, 12 figures, 12 tables

Abstract

Comprehensively retrieving diverse documents is crucial to address queries that admit a wide range of valid answers. We introduce retrieve-verify-retrieve (RVR), a multi-round retrieval framework designed to maximize answer coverage. Initially, a retriever takes the original query and returns a candidate document set, followed by a verifier that identifies a high-quality subset. For subsequent rounds, the query is augmented with previously verified documents to uncover answers that are not yet covered in previous rounds. RVR is effective even with off-the-shelf retrievers, and fine-tuning retrievers for our inference procedure brings further gains. Our method outperforms baselines, including agentic search approaches, achieving at least 10% relative and 3% absolute gain in complete recall percentage on a multi-answer retrieval dataset (QAMPARI). We also see consistent gains on two out-of-domain datasets (QUEST and WebQuestionsSP) across different base retrievers. Our work presents a promising iterative approach for comprehensive answer recall leveraging a verifier and adapting retrievers to a new inference scenario.

Summary

The paper introduces RVR, a multi-round retrieval framework that aims to maximize answer coverage for queries with many valid answers. A retriever returns candidate documents, a verifier keeps a high-quality subset, and in later rounds the query is augmented with the verified documents to uncover answers not yet covered. RVR outperforms baselines, including agentic search approaches, with at least 10% relative and 3% absolute gain in complete recall on QAMPARI, and shows consistent gains on QUEST and WebQuestionsSP across different base retrievers.
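
The retrieve-verify-retrieve loop described in this entry can be sketched roughly as follows. This is a minimal toy illustration, not the authors' implementation: the word-overlap retriever, the penalty on words already seen in verified documents, and the names `retrieve`, `rvr`, and `verify` are all assumptions made for the example.

```python
from typing import Callable, List, Set


def tokens(text: str) -> Set[str]:
    """Lower-cased whitespace tokens of a string."""
    return set(text.lower().split())


def retrieve(query: str, verified: List[str], corpus: List[str], k: int) -> List[str]:
    """Toy retriever (assumption): score by word overlap with the query and
    penalise non-query words that already appear in verified documents, as a
    stand-in for a retriever conditioned on the augmented query."""
    q = tokens(query)
    seen: Set[str] = set().union(*(tokens(d) for d in verified))

    def score(doc: str) -> float:
        t = tokens(doc)
        return len(t & q) - 0.5 * len((t - q) & seen)

    candidates = [d for d in corpus if d not in verified]
    return sorted(candidates, key=score, reverse=True)[:k]


def rvr(query: str, corpus: List[str], verify: Callable[[str, str], bool],
        rounds: int = 2, k: int = 2) -> List[str]:
    """Alternate retrieval and verification; verified documents from earlier
    rounds are fed back into the next retrieval call."""
    verified: List[str] = []
    for _ in range(rounds):
        candidates = retrieve(query, verified, corpus, k)
        verified.extend(d for d in candidates if verify(query, d))
    return verified


corpus = [
    "paris hosted the olympics in 1900",
    "paris hosted the olympics in 1924",
    "paris hosted the olympics in 2024",
    "london hosted the olympics in 1908",
    "tokyo hosted the olympics in 1964",
    "bananas are yellow",
]
# Hypothetical verifier: accept any document that mentions the olympics.
found = rvr("which cities hosted the olympics", corpus,
            verify=lambda q, d: "olympics" in tokens(d))
```

In this toy corpus the second round demotes the third Paris document because most of its non-query words were already covered, so the loop surfaces London and Tokyo instead. A real system would replace both the scorer and the verifier with learned models.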

CapNav: Benchmarking Vision Language Models on Capability-conditioned Indoor Navigation

Authors: Xia Su, Ruiqi Chen, Benlin Liu, Jingwei Ma, Zonglin Di, Ranjay Krishna, Jon Froehlich

First: 2026-02-20T18:46:27+00:00 · Latest: 2026-02-20T18:46:27+00:00

Abstract

Vision-Language Models (VLMs) have shown remarkable progress in Vision-Language Navigation (VLN), offering new possibilities for navigation decision-making that could benefit both robotic platforms and human users. However, real-world navigation is inherently conditioned by the agent's mobility constraints. For example, a sweeping robot cannot traverse stairs, while a quadruped can. We introduce Capability-Conditioned Navigation (CapNav), a benchmark designed to evaluate how well VLMs can navigate complex indoor spaces given an agent's specific physical and operational capabilities. CapNav defines five representative human and robot agents, each described with physical dimensions, mobility capabilities, and environmental interaction abilities. CapNav provides 45 real-world indoor scenes, 473 navigation tasks, and 2365 QA pairs to test if VLMs can traverse indoor environments based on agent capabilities. We evaluate 13 modern VLMs and find that current VLMs' navigation performance drops sharply as mobility constraints tighten, and that even state-of-the-art models struggle with obstacle types that require reasoning on spatial dimensions. We conclude by discussing the implications for capability-aware navigation and the opportunities for advancing embodied spatial reasoning in future VLMs. The benchmark is available at https://github.com/makeabilitylab/CapNav

Summary

The study introduces CapNav, a benchmark that evaluates Vision-Language Models (VLMs) on indoor navigation conditioned on an agent's physical and operational capabilities. It defines five human and robot agents with specific dimensions, mobility, and interaction abilities, and provides 45 real-world indoor scenes, 473 navigation tasks, and 2365 QA pairs. An evaluation of 13 modern VLMs shows that performance drops sharply as mobility constraints tighten, and that even state-of-the-art models struggle with obstacles that require reasoning about spatial dimensions.

SPQ: An Ensemble Technique for Large Language Model Compression

Authors: Jiamin Yao, Eren Gultepe

First: 2026-02-20T18:44:16+00:00 · Latest: 2026-02-20T18:44:16+00:00

Comments: Accepted to LREC 2026 Main Conference

Abstract

This study presents an ensemble technique, SPQ (SVD-Pruning-Quantization), for large language model (LLM) compression that combines variance-retained singular value decomposition (SVD), activation-based pruning, and post-training linear quantization. Each component targets a different source of inefficiency: i) pruning removes redundant neurons in MLP layers, ii) SVD reduces attention projections into compact low-rank factors, and iii) 8-bit quantization uniformly compresses all linear layers. At matched compression ratios, SPQ outperforms individual methods (SVD-only, pruning-only, or quantization-only) in perplexity, demonstrating the benefit of combining complementary techniques. Applied to LLaMA-2-7B, SPQ achieves up to 75% memory reduction while maintaining or improving perplexity (e.g., WikiText-2 5.47 to 4.91) and preserving accuracy on downstream benchmarks such as C4, TruthfulQA, and GSM8K. Compared to strong baselines like GPTQ and SparseGPT, SPQ offers competitive perplexity and accuracy while using less memory (6.86 GB vs. 7.16 GB for GPTQ). Moreover, SPQ improves inference throughput over GPTQ, achieving up to a 1.9x speedup, which further enhances its practicality for real-world deployment. SPQ's robust, layer-aware combination of complementary compression techniques may enable practical deployment of LLMs in memory-constrained environments. Code is available at: https://github.com/JiaminYao/SPQ_LLM_Compression/

Summary

The study introduces SPQ (SVD-Pruning-Quantization), an ensemble technique for compressing large language models (LLMs) that combines variance-retained singular value decomposition (SVD), activation-based pruning, and post-training linear quantization. At matched compression ratios it outperforms each individual method in perplexity, and on LLaMA-2-7B it achieves up to 75% memory reduction while maintaining or improving perplexity and preserving accuracy on downstream benchmarks. Compared with strong baselines such as GPTQ and SparseGPT, SPQ offers competitive perplexity and accuracy with less memory, and it improves inference throughput over GPTQ by up to 1.9x.
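
To make the three SPQ ingredients concrete, here is a small NumPy sketch of each step applied to a single weight matrix. It is an illustration under stated assumptions, not the released SPQ code: the function names, the mean-absolute-activation importance score, and the symmetric per-tensor int8 scheme are choices made for this example.

```python
import numpy as np


def svd_low_rank(W: np.ndarray, variance_keep: float = 0.9):
    """Variance-retained SVD: keep the smallest rank whose squared singular
    values cover `variance_keep` of the total. Returns A, B with W ~ A @ B."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    energy = np.cumsum(S ** 2) / np.sum(S ** 2)
    r = min(int(np.searchsorted(energy, variance_keep)) + 1, S.size)
    return U[:, :r] * S[:r], Vt[:r]


def prune_neurons(W_in: np.ndarray, W_out: np.ndarray,
                  activations: np.ndarray, keep_ratio: float = 0.5):
    """Activation-based pruning of MLP hidden units (assumed score: mean
    absolute activation). W_in is (hidden, d_in), W_out is (d_out, hidden),
    activations is (samples, hidden)."""
    importance = np.abs(activations).mean(axis=0)
    n_keep = max(1, int(round(keep_ratio * importance.size)))
    keep = np.sort(np.argsort(importance)[-n_keep:])
    return W_in[keep], W_out[:, keep]


def quantize_int8(W: np.ndarray):
    """Symmetric linear 8-bit quantization with one scale per tensor."""
    scale = float(np.abs(W).max()) / 127.0
    if scale == 0.0:
        scale = 1.0
    q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map int8 codes back to floating point."""
    return q.astype(np.float64) * scale
```

In the abstract's description, SVD is applied to attention projections, pruning to MLP layers, and quantization to all linear layers; how the released code chooses ranks, importance scores, and quantization granularity may differ from this sketch.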

Wink: Recovering from Misbehaviors in Coding Agents

Authors: Rahul Nanda, Chandra Maddila, Smriti Jha, Euna Mehnaz Khan, Matteo Paltenghi, Satish Chandra

First: 2026-02-19T03:15:00+00:00 · Latest: 2026-02-20T18:13:08+00:00

Abstract

Autonomous coding agents, powered by large language models (LLMs), are increasingly being adopted in the software industry to automate complex engineering tasks. However, these agents are prone to a wide range of misbehaviors, such as deviating from the user's instructions, getting stuck in repetitive loops, or failing to use tools correctly. These failures disrupt the development workflow and often require resource-intensive manual intervention. In this paper, we present a system for automatically recovering from agentic misbehaviors at scale. We first introduce a taxonomy of misbehaviors grounded in an analysis of production traffic, identifying three primary categories: Specification Drift, Reasoning Problems, and Tool Call Failures, which we find occur in about 30% of all agent trajectories. To address these issues, we developed a lightweight, asynchronous self-intervention system named Wink. Wink observes agent trajectories and provides targeted course-correction guidance to nudge the agent back to a productive path. We evaluated our system on over 10,000 real world agent trajectories and found that it successfully resolves 90% of the misbehaviors that require a single intervention. Furthermore, a live A/B test in our production environment demonstrated that our system leads to a statistically significant reduction in Tool Call Failures, Tokens per Session and Engineer Interventions per Session. We present our experience designing and deploying this system, offering insights into the challenges of building resilient agentic systems at scale.

Summary

The paper addresses misbehaviors in LLM-powered autonomous coding agents, which disrupt development workflows. It introduces Wink, a lightweight, asynchronous self-intervention system that observes agent trajectories and provides targeted course-correction guidance. Evaluated on over 10,000 real-world trajectories, Wink resolves 90% of the misbehaviors that require a single intervention, and a live A/B test shows statistically significant reductions in Tool Call Failures, Tokens per Session, and Engineer Interventions per Session.

xLSTM Scaling Laws: Competitive Performance with Linear Time-Complexity

Authors: Maximilian Beck, Kajetan Schweighofer, Sebastian Böck, Sebastian Lehner, Sepp Hochreiter

Venue: ICLR 2026

First: 2025-10-02T17:14:34+00:00 · Latest: 2026-02-20T18:12:16+00:00

Comments: Accepted at ICLR 2026. Code and data available at https://github.com/NX-AI/xlstm_scaling_laws

Abstract

Scaling laws play a central role in the success of Large Language Models (LLMs), enabling the prediction of model performance relative to compute budgets prior to training. While Transformers have been the dominant architecture, recent alternatives such as xLSTM offer linear complexity with respect to context length while remaining competitive in the billion-parameter regime. We conduct a comparative investigation on the scaling behavior of Transformers and xLSTM along the following lines, providing insights to guide future model design and deployment. First, we study the scaling behavior for xLSTM in compute-optimal and over-training regimes using both IsoFLOP and parametric fit approaches on a wide range of model sizes (80M-7B) and number of training tokens (2B-2T). Second, we examine the dependence of optimal model sizes on context length, a pivotal aspect that was largely ignored in previous work. Finally, we analyze inference-time scaling characteristics. Our findings reveal that in typical LLM training and inference scenarios, xLSTM scales favorably compared to Transformers. Notably, xLSTM models consistently Pareto-dominate Transformer models, delivering lower cross-entropy loss for the same compute budget.

Summary

The study compares the scaling behavior of xLSTM and Transformers in compute-optimal and over-training regimes, across model sizes from 80M to 7B parameters and 2B to 2T training tokens. It finds that xLSTM models deliver lower cross-entropy loss than Transformer models for the same compute budget and scale favorably in typical LLM training and inference scenarios. It also examines how optimal model size depends on context length.
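
As background for the "parametric fit" approach mentioned in the abstract, the sketch below uses the widely used form L(N, D) = E + A / N^alpha + B / D^beta and derives the compute-optimal split in closed form. It rests on assumptions that may not match the paper: the functional form, the constraint C = 6 * N * D (a common Transformer approximation, and xLSTM FLOP counts differ), and the constants, which are placeholders rather than fitted values.

```python
def loss(N: float, D: float, E: float, A: float, B: float,
         alpha: float, beta: float) -> float:
    """Parametric loss in parameters N and training tokens D (assumed form)."""
    return E + A / N ** alpha + B / D ** beta


def compute_optimal(C: float, A: float, B: float, alpha: float, beta: float,
                    flops_per_param_token: float = 6.0):
    """Minimise the loss subject to C = flops_per_param_token * N * D.

    Setting the derivative with respect to N to zero gives
    N_opt = (alpha*A / (beta*B))^(1/(alpha+beta)) * (C/k)^(beta/(alpha+beta)).
    """
    budget = C / flops_per_param_token  # equals N * D
    G = (alpha * A / (beta * B)) ** (1.0 / (alpha + beta))
    N_opt = G * budget ** (beta / (alpha + beta))
    D_opt = budget / N_opt
    return N_opt, D_opt


# Placeholder constants for illustration only; they are not taken from the paper.
PARAMS = dict(E=1.7, A=400.0, B=410.0, alpha=0.34, beta=0.28)
N_opt, D_opt = compute_optimal(
    1e21, PARAMS["A"], PARAMS["B"], PARAMS["alpha"], PARAMS["beta"])
```

Moving parameters toward tokens, or the reverse, at fixed compute can only raise the loss under this model, which is what "compute-optimal" means here. The IsoFLOP approach mentioned in the abstract instead locates the minimum empirically at each compute level.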

Exploiting Completeness Perception with Diffusion Transformer for Unified 3D MRI Synthesis

Authors: Junkai Liu, Nay Aung, Theodoros N. Arvanitis, Joao A. C. Lima, Steffen E. Petersen, Daniel C. Alexander, Le Zhang

First: 2026-02-20T18:05:39+00:00 · Latest: 2026-02-20T18:05:39+00:00

Abstract

Missing data problems, such as missing modalities in multi-modal brain MRI and missing slices in cardiac MRI, pose significant challenges in clinical practice. Existing methods rely on external guidance to supply detailed missing state for instructing generative models to synthesize missing MRIs. However, manual indicators are not always available or reliable in real-world scenarios due to the unpredictable nature of clinical environments. Moreover, these explicit masks are not informative enough to provide guidance for improving semantic consistency. In this work, we argue that generative models should infer and recognize missing states in a self-perceptive manner, enabling them to better capture subtle anatomical and pathological variations. Towards this goal, we propose CoPeDiT, a general-purpose latent diffusion model equipped with completeness perception for unified synthesis of 3D MRIs. Specifically, we incorporate dedicated pretext tasks into our tokenizer, CoPeVAE, empowering it to learn completeness-aware discriminative prompts, and design MDiT3D, a specialized diffusion transformer architecture for 3D MRI synthesis, that effectively uses the learned prompts as guidance to enhance semantic consistency in 3D space. Comprehensive evaluations on three large-scale MRI datasets demonstrate that CoPeDiT significantly outperforms state-of-the-art methods, achieving superior robustness, generalizability, and flexibility. The code is available at https://github.com/JK-Liu7/CoPeDiT .

Summary

The paper addresses missing data in MRI, such as missing modalities in multi-modal brain MRI and missing slices in cardiac MRI, by proposing CoPeDiT, a latent diffusion model with completeness perception that synthesizes 3D MRIs without external guidance about what is missing. Its tokenizer, CoPeVAE, uses dedicated pretext tasks to learn completeness-aware prompts, and a specialized diffusion transformer, MDiT3D, uses these prompts to improve semantic consistency. Evaluations on three large-scale MRI datasets show that CoPeDiT outperforms existing methods in robustness, generalizability, and flexibility.

ConformalNL2LTL: Translating Natural Language Instructions into Temporal Logic Formulas with Conformal Correctness Guarantees

Authors: David Smith Sundarsingh, Jun Wang, Jyotirmoy V. Deshmukh, Yiannis Kantaros

First: 2025-04-22T20:32:34+00:00 · Latest: 2026-02-20T17:50:01+00:00

Abstract

Linear Temporal Logic (LTL) is a widely used task specification language for autonomous systems. To mitigate the significant manual effort and expertise required to define LTL-encoded tasks, several methods have been proposed for translating Natural Language (NL) instructions into LTL formulas, which, however, lack correctness guarantees. To address this, we propose a new NL-to-LTL translation method, called ConformalNL2LTL that achieves user-defined translation success rates on unseen NL commands. Our method constructs LTL formulas iteratively by solving a sequence of open-vocabulary question-answering (QA) problems using large language models (LLMs). These QA tasks are handled collaboratively by a primary and an auxiliary model. The primary model answers each QA instance while quantifying uncertainty via conformal prediction; when it is insufficiently certain according to user-defined confidence thresholds, it requests assistance from the auxiliary model and, if necessary, from the user. We demonstrate theoretically and empirically that ConformalNL2LTL achieves the desired translation accuracy while minimizing user intervention.

Summary

The paper targets translation of natural language instructions into Linear Temporal Logic (LTL) formulas with correctness guarantees for autonomous systems. ConformalNL2LTL builds LTL formulas iteratively by solving a sequence of open-vocabulary QA problems with large language models. A primary model answers each question and quantifies its uncertainty via conformal prediction; when it is not certain enough, it asks an auxiliary model and, if necessary, the user. Theory and experiments indicate that the method achieves user-defined translation success rates while minimizing user intervention.
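
The "answer if certain, otherwise ask for help" behaviour can be illustrated with standard split conformal prediction. The sketch below is a generic textbook construction, not the ConformalNL2LTL procedure itself: the nonconformity score (one minus the probability of the true answer), the function names, and the toy probabilities are assumptions for the example.

```python
import math
from typing import Dict, List, Set, Tuple


def conformal_threshold(cal_true_probs: List[float], alpha: float) -> float:
    """Calibrate on held-out questions. The score is 1 - p(true answer); the
    threshold is the ceil((n + 1) * (1 - alpha))-th smallest score."""
    scores = sorted(1.0 - p for p in cal_true_probs)
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return 1.0  # too few calibration points: every answer stays in the set
    return scores[rank - 1]


def prediction_set(probs: Dict[str, float], qhat: float) -> Set[str]:
    """All candidate answers whose score falls within the threshold."""
    return {ans for ans, p in probs.items() if 1.0 - p <= qhat}


def answer_or_ask(probs: Dict[str, float], qhat: float) -> Tuple[str, object]:
    """Answer only when the prediction set is a singleton; otherwise defer
    (in the paper's setting, to an auxiliary model and then the user)."""
    s = prediction_set(probs, qhat)
    if len(s) == 1:
        return ("answer", next(iter(s)))
    return ("ask_for_help", sorted(s))


# Toy calibration data: model probability assigned to the correct answer.
qhat = conformal_threshold([0.9, 0.8, 0.6, 0.7, 0.4, 0.95, 0.5, 0.85, 0.35], alpha=0.2)
confident = answer_or_ask({"F": 0.9, "G": 0.1}, qhat)
uncertain = answer_or_ask({"F": 0.45, "G": 0.45, "U": 0.1}, qhat)
```

Under the usual exchangeability assumption, sets built this way contain the correct answer with probability at least 1 - alpha, which is the kind of user-defined guarantee the summary refers to.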

Adaptive GR(1) Specification Repair for Liveness-Preserving Shielding in Reinforcement Learning

Authors: Tiberiu-Andrei Georgescu, Alexander W. Goodall, Dalal Alrajeh, Francesco Belardinelli, Sebastian Uchitel

First: 2025-11-04T14:27:28+00:00 · Latest: 2026-02-20T17:44:58+00:00

Abstract

Shielding is widely used to enforce safety in reinforcement learning (RL), ensuring that an agent's actions remain compliant with formal specifications. Classical shielding approaches, however, are often static, in the sense that they assume fixed logical specifications and hand-crafted abstractions. While these static shields provide safety under nominal assumptions, they fail to adapt when environment assumptions are violated. In this paper, we develop an adaptive shielding framework based on Generalized Reactivity of rank 1 (GR(1)) specifications, a tractable and expressive fragment of Linear Temporal Logic (LTL) that captures both safety and liveness properties. Our method detects environment assumption violations at runtime and employs Inductive Logic Programming (ILP) to automatically repair GR(1) specifications online, in a systematic and interpretable way. As a result, the shield evolves gracefully, keeping liveness achievable and weakening goals minimally and only when necessary. We consider two case studies, Minepump and Atari Seaquest, showing that (i) static symbolic controllers are often severely suboptimal when optimizing for auxiliary rewards, and (ii) RL agents equipped with our adaptive shield maintain near-optimal reward and perfect logical compliance compared with static shields.

Summary

The paper addresses the limitations of static shielding in reinforcement learning with an adaptive framework based on GR(1) specifications. The method detects environment assumption violations at runtime and uses Inductive Logic Programming to repair the specifications online, keeping liveness achievable and weakening goals only when necessary. In the Minepump and Atari Seaquest case studies, static symbolic controllers are often severely suboptimal when optimizing for auxiliary rewards, while RL agents with the adaptive shield maintain near-optimal reward and perfect logical compliance.
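
For readers new to shielding, the sketch below shows only the basic static idea: an action proposed by the RL agent is passed through if the specification allows it and replaced otherwise. The Minepump-style rule, the state fields, and the function names are hypothetical, and the paper's actual contribution (runtime violation detection and ILP-based GR(1) repair) is not modelled here.

```python
from typing import Callable, Dict, Set

State = Dict[str, bool]


def safe_actions(state: State) -> Set[str]:
    """Toy safety rule (assumption): never run the pump while methane is present."""
    if state["methane"]:
        return {"pump_off"}
    return {"pump_on", "pump_off"}


def shield(state: State, proposed: str,
           allowed_fn: Callable[[State], Set[str]] = safe_actions) -> str:
    """Pass the agent's action through if it is allowed; otherwise substitute
    a deterministic allowed action."""
    allowed = allowed_fn(state)
    if proposed in allowed:
        return proposed
    return sorted(allowed)[0]


overridden = shield({"methane": True, "high_water": True}, "pump_on")
passed = shield({"methane": False, "high_water": True}, "pump_on")
```

A static shield like this one keeps the rule fixed forever. The adaptive framework in the paper instead revises the specification when the environment stops behaving as assumed, so that liveness goals remain achievable.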

Zero-shot Interactive Perception

Authors: Venkatesh Sripada, Frank Guerin, Amir Ghalamzan

First: 2026-02-20T17:30:25+00:00 · Latest: 2026-02-20T17:30:25+00:00

Comments: Original manuscript submitted on April 24, 2025. Timestamped and publicly available on OpenReview: https://openreview.net/forum?id=7MhpFcr5Nx

Abstract

Interactive perception (IP) enables robots to extract hidden information in their workspace and execute manipulation plans by physically interacting with objects and altering the state of the environment -- crucial for resolving occlusions and ambiguity in complex, partially observable scenarios. We present Zero-Shot IP (ZS-IP), a novel framework that couples multi-strategy manipulation (pushing and grasping) with a memory-driven Vision Language Model (VLM) to guide robotic interactions and resolve semantic queries. ZS-IP integrates three key components: (1) an Enhanced Observation (EO) module that augments the VLM's visual perception with both conventional keypoints and our proposed pushlines -- a novel 2D visual augmentation tailored to pushing actions, (2) a memory-guided action module that reinforces semantic reasoning through context lookup, and (3) a robotic controller that executes pushing, pulling, or grasping based on VLM output. Unlike grid-based augmentations optimized for pick-and-place, pushlines capture affordances for contact-rich actions, substantially improving pushing performance. We evaluate ZS-IP on a 7-DOF Franka Panda arm across diverse scenes with varying occlusions and task complexities. Our experiments demonstrate that ZS-IP outperforms passive and viewpoint-based perception techniques such as Mark-Based Visual Prompting (MOKA), particularly in pushing tasks, while preserving the integrity of non-target elements.

中文标题/摘要

标题：零样本交互感知

交互感知（IP）使机器人能够在其工作空间中提取隐藏信息并通过物理交互物体和改变环境状态来执行操作计划——这对于解决复杂、部分可观测场景中的遮挡和模糊至关重要。我们提出了零样本IP（ZS-IP），这是一种新颖的框架，将多策略操作（推和抓取）与基于记忆的视觉语言模型（VLM）结合，以指导机器人交互并解决语义查询。ZS-IP 结合了三个关键组件：（1）增强观察（EO）模块，该模块通过结合传统的关键点和我们提出的推线——一种针对推操作定制的新型2D视觉增强，来增强VLM的视觉感知，（2）基于记忆的动作模块，该模块通过上下文查找强化语义推理，以及（3）一个基于VLM输出执行推、拉或抓取的机器人控制器。与针对拾取和放置优化的基于网格的增强不同，推线捕捉接触丰富的操作的利用机会，显著提高了推操作的性能。我们在具有不同遮挡和任务复杂度的7-DOF Franka Panda手臂上评估了ZS-IP。我们的实验表明，ZS-IP 在推操作任务中优于被动和视角基于的感知技术，如基于标记的视觉提示（MOKA），同时保持非目标元素的完整性。

Summary / 总结

Zero-Shot Interactive Perception (ZS-IP) is a framework that combines multi-strategy manipulation with a memory-driven Vision Language Model to resolve occlusions and ambiguities in complex scenarios. It includes an Enhanced Observation module that enhances visual perception with pushlines, a memory-guided action module for semantic reasoning, and a robotic controller for executing actions. ZS-IP outperforms passive and viewpoint-based perception techniques, especially in pushing tasks, while maintaining the integrity of non-target elements.

研究旨在开发零样本交互感知（ZS-IP），使机器人能够与物体互动并解决复杂场景中的遮挡问题。ZS-IP 结合了基于记忆的视觉语言模型和多策略操作（推和抓），以及一个增强观察模块，该模块使用推线以提高推操作的性能。实验表明，ZS-IP 在推操作中优于被动和视点基的感知技术如 MOKA，同时保持非目标元素的完整性。

"How Do I ...?": Procedural Questions Predominate Student-LLM Chatbot Conversations

Authors: Alexandra Neagu, Marcus Messer, Peter Johnson, Rhodri Nelson

First: 2026-02-20T17:27:41+00:00 · Latest: 2026-02-20T17:27:41+00:00

Comments: 14 pages, 2 figures

Abstract

Providing scaffolding through educational chatbots built on Large Language Models (LLMs) has potential risks and benefits that remain an open area of research. When students navigate impasses, they ask for help by formulating impasse-driven questions. Within interactions with LLM chatbots, such questions shape the user prompts and drive the pedagogical effectiveness of the chatbot's response. This paper focuses on such student questions from two datasets of distinct learning contexts: formative self-study and summative assessed coursework. We analysed 6,113 messages from both learning contexts, using 11 different LLMs and three human raters to classify student questions using four existing schemas. On the feasibility of using LLMs as raters, results showed moderate-to-good inter-rater reliability, with higher consistency than human raters. The data showed that 'procedural' questions predominated in both learning contexts, but more so when students prepare for summative assessment. These results provide a basis on which to use LLMs for classification of student questions. However, we identify clear limitations in both the ability to classify with schemas and the value of doing so: schemas are limited and thus struggle to accommodate the semantic richness of composite prompts, offering only partial understanding of the wider risks and benefits of chatbot integration. In the future, we recommend an analysis approach that captures the nuanced, multi-turn nature of conversation, for example, by applying methods from conversation analysis in discursive psychology.

中文标题/摘要

标题："如何...?": 学生对LLM聊天机器人的问题以程序性问题为主导

基于大型语言模型（LLM）构建的教育聊天机器人提供了支架，但其潜在风险和益处仍是一个开放的研究领域。当学生遇到瓶颈时，他们会通过提出瓶颈驱动的问题寻求帮助。在与LLM聊天机器人的互动中，这些问题塑造了用户的提示并推动了聊天机器人回应的教育有效性。本文专注于来自两种不同学习背景的数据集中的学生问题：形成性自我学习和总结性评估作业。我们分析了来自两个学习背景的6,113条消息，使用11种不同的LLM和三名人类评估者对学生的提问进行了分类，使用了四种现有的分类方案。关于使用LLM作为评估者的技术可行性，结果显示评估者间的一致性较好，高于人类评估者。数据显示，在两种学习背景下，程序性问题占主导地位，但在学生准备总结性评估时更为明显。这些结果为使用LLM对学生的提问进行分类提供了基础。然而，我们指出了使用分类方案进行分类的局限性及其价值：分类方案有限，难以容纳复合提示的语义丰富性，仅提供对聊天机器人整合的更广泛风险和益处的有限理解。未来，我们建议采用一种捕捉对话多轮次细微差别的分析方法，例如，通过话语心理学中的对话分析方法。

Summary / 总结

This paper examines student questions in interactions with LLM chatbots in formative self-study and summative assessed coursework, focusing on the predominance of procedural questions. Using 11 LLMs and three human raters, the study found moderate-to-good inter-rater reliability among the LLM raters, with higher consistency than the human raters, and found that procedural questions predominated in both contexts, more so when students were preparing for summative assessment. However, existing schemas struggle to capture the semantic richness of composite prompts, limiting their utility. The research suggests a need for more nuanced analysis methods, such as those from conversation analysis in discursive psychology, to better understand the risks and benefits of chatbot integration in education.

该研究分析了学生与大型语言模型（LLM）聊天机器人在形成性自学和总结性评估作业中的互动，重点关注程序性问题的主导地位。研究使用11个LLM和三名人工评分者，发现LLM评分者之间的一致性为中等至良好，且高于人工评分者；程序性问题在两种学习情境中均占主导，在准备总结性评估时更为突出。然而，现有的分类方案难以捕捉复合提示的语义丰富性，限制了其应用价值。研究建议未来应采用更细致的分析方法，例如话语心理学中的对话分析方法，以更好地理解聊天机器人在教育中的风险和益处。

Quantum Maximum Likelihood Prediction via Hilbert Space Embeddings

Authors: Sreejith Sreekumar, Nir Weinberger

First: 2026-02-20T17:16:38+00:00 · Latest: 2026-02-20T17:16:38+00:00

Comments: 32+4 pages, 1 figure

Abstract

Recent works have proposed various explanations for the ability of modern large language models (LLMs) to perform in-context prediction. We propose an alternative conceptual viewpoint from an information-geometric and statistical perspective. Motivated by Bach [2023], we model training as learning an embedding of probability distributions into the space of quantum density operators, and in-context learning as maximum-likelihood prediction over a specified class of quantum models. We provide an interpretation of this predictor in terms of the quantum reverse information projection and the quantum Pythagorean theorem when the class of quantum models is sufficiently expressive. We further derive non-asymptotic performance guarantees in terms of convergence rates and concentration inequalities, both in trace norm and quantum relative entropy. Our approach provides a unified framework to handle both classical and quantum LLMs.

中文标题/摘要

标题：基于希尔伯特空间嵌入的量子最大似然预测

近期的研究提出了各种解释现代大型语言模型（LLMs）进行上下文内预测的能力。我们从信息几何和统计的角度提出了一种替代的概念视角。受Bach[2023]的启发，我们将训练视为将概率分布嵌入到量子密度算子的空间中学习，而上下文内学习则是在指定的量子模型类中进行最大似然预测。我们通过量子反向信息投影和量子毕达哥拉斯定理对这种预测器进行了解释，当量子模型类足够表达能力强时。我们进一步在迹范数和量子相对熵中推导了非渐近性能保证，包括收敛速率和集中不等式。我们的方法提供了一个统一的框架来处理经典和量子LLMs。

Summary / 总结

This paper proposes a new perspective on the ability of large language models to perform in-context prediction by modeling training as learning an embedding of probability distributions into the space of quantum density operators. The method involves maximum-likelihood prediction over a specified class of quantum models, which is interpreted using quantum reverse information projection and the quantum Pythagorean theorem. Key findings include non-asymptotic performance guarantees in terms of convergence rates and concentration inequalities in both trace norm and quantum relative entropy, providing a unified framework for classical and quantum LLMs.

本文提出了一种新的观点，将大型语言模型的能力视为通过量子密度算子学习概率分布的嵌入，并将上下文学习视为在量子模型类中进行最大似然预测。作者利用量子信息理论的概念对预测器进行了解释，并推导出了非渐近性能保证，包括迹范数和量子相对熵中的收敛速率和集中不等式。

Visual Planning: Let's Think Only with Images

Authors: Yi Xu, Chengzu Li, Han Zhou, Xingchen Wan, Caiqi Zhang, Anna Korhonen, Ivan Vulić

Venue: ICLR 2026 Oral

First: 2025-05-16T16:17:22+00:00 · Latest: 2026-02-20T17:09:35+00:00

Comments: ICLR 2026 (Oral)

Abstract

Recent advancements in Large Language Models (LLMs) and their multimodal extensions (MLLMs) have substantially enhanced machine reasoning across diverse tasks. However, these models predominantly rely on pure text as the medium for both expressing and structuring reasoning, even when visual information is present. In this work, we argue that language may not always be the most natural or effective modality for reasoning, particularly in tasks involving spatial and geometrical information. Motivated by this, we propose a new paradigm, Visual Planning, which enables planning through purely visual representations for these "vision-first" tasks, as a supplementary channel to language-based reasoning. In this paradigm, planning is executed via sequences of images that encode step-by-step inference in the visual domain, akin to how humans sketch or visualize future actions. We introduce a novel reinforcement learning framework, Visual Planning via Reinforcement Learning (VPRL), empowered by GRPO for post-training large vision models, leading to substantial improvements in planning in a selection of representative visual navigation tasks, FrozenLake, Maze, and MiniBehavior. Our visual planning paradigm outperforms all other planning variants that conduct reasoning in the text-only space. Our results establish Visual Planning as a viable and promising supplement to language-based reasoning, opening new avenues for tasks that benefit from intuitive, image-based inference.

中文标题/摘要

标题：视觉规划：仅用图像思考

近期大型语言模型（LLMs）及其多模态扩展（MLLMs）在多种任务中的机器推理能力有了显著提升。然而，这些模型主要依赖纯文本作为表达和结构化推理的媒介，即使存在视觉信息也是如此。在本文中，我们提出，对于涉及空间和几何信息的任务，语言可能不是最自然或有效的推理模态。受此启发，我们提出了一种新的范式——视觉规划，该范式通过纯视觉表示进行规划，作为语言推理的补充渠道，特别适用于“视觉优先”任务。在该范式中，规划通过一系列图像序列执行，这些图像序列编码了视觉域中的逐步推理，类似于人类如何草图或可视化未来动作。我们引入了一种新的强化学习框架——基于GRPO的强化学习视觉规划（VPRL），在多个代表性视觉导航任务（FrozenLake、迷宫和MiniBehavior）中，该框架显著提高了规划性能。我们的视觉规划范式优于所有仅在文本空间中进行推理的规划变体。我们的结果表明，视觉规划是语言推理的一种可行且有前景的补充，为那些受益于直观、基于图像的推理的任务开辟了新的途径。

Summary / 总结

This work proposes Visual Planning, a new paradigm that enables planning through purely visual representations for tasks involving spatial and geometrical information. The authors introduce a reinforcement learning framework, Visual Planning via Reinforcement Learning (VPRL), which leads to improved performance in visual navigation tasks such as FrozenLake, Maze, and MiniBehavior. Visual Planning outperforms text-only reasoning methods and establishes itself as a promising supplement to language-based reasoning.

本文提出了视觉规划这一新范式，通过纯视觉表示进行推理，特别适用于涉及空间和几何信息的任务。作者提出了一种强化学习框架，视觉规划通过强化学习（VPRL），在FrozenLake、迷宫和MiniBehavior等视觉导航任务中提高了规划性能。结果表明，视觉规划优于仅基于文本的推理方法，表明它作为语言推理的补充，在需要直观图像推理的任务中具有潜力。

Validating Political Position Predictions of Arguments

Authors: Jordan Robinson, Angus R. Williams, Katie Atkinson, Anthony G. Cohn

First: 2026-02-20T17:03:44+00:00 · Latest: 2026-02-20T17:03:44+00:00

Comments: 13 pages, 6 figures, 6 tables. Under review

Abstract

Real-world knowledge representation often requires capturing subjective, continuous attributes -- such as political positions -- that conflict with pairwise validation, the widely accepted gold standard for human evaluation. We address this challenge through a dual-scale validation framework applied to political stance prediction in argumentative discourse, combining pointwise and pairwise human annotation. Using 22 language models, we construct a large-scale knowledge base of political position predictions for 23,228 arguments drawn from 30 debates that appeared on the UK political television programme Question Time. Pointwise evaluation shows moderate human-model agreement (Krippendorff's $α=0.578$), reflecting intrinsic subjectivity, while pairwise validation reveals substantially stronger alignment between human- and model-derived rankings ($α=0.86$ for the best model). This work contributes: (i) a practical validation methodology for subjective continuous knowledge that balances scalability with reliability; (ii) a validated structured argumentation knowledge base enabling graph-based reasoning and retrieval-augmented generation in political domains; and (iii) evidence that ordinal structure can be extracted from pointwise language model predictions on inherently subjective real-world discourse, advancing knowledge representation capabilities for domains where traditional symbolic or categorical approaches are insufficient.

中文标题/摘要

标题：验证论据中的政治立场预测

现实世界的知识表示经常需要捕捉主观的连续属性（如政治立场），这与被广泛接受为人类评估黄金标准的成对验证相冲突。我们通过应用于论辩话语中政治立场预测的双尺度验证框架来应对这一挑战，该框架结合了逐点和成对的人工标注。使用22种语言模型，我们构建了一个包含23,228个论据的政治立场预测大规模知识库，这些论据来自英国政治电视节目《Question Time》上的30场辩论。逐点评估显示中等程度的人类-模型一致性（Krippendorff的α=0.578），反映了内在的主观性，而成对验证揭示了人类排名与模型排名之间显著更强的一致性（最佳模型的α=0.86）。这项工作贡献了：(i) 一种兼顾可扩展性和可靠性的主观连续知识验证方法；(ii) 一个经过验证的结构化论辩知识库，支持政治领域中的图推理和检索增强生成；以及(iii) 证据表明，可以从语言模型对本质上主观的现实世界话语所作的逐点预测中提取出序数结构，从而推进在传统符号或分类方法不足的领域中的知识表示能力。

Summary / 总结

This study addresses the challenge of validating continuous subjective attributes like political positions in knowledge representation by proposing a dual-scale validation framework combining pointwise and pairwise human annotation. Using 22 language models, the research constructs a large-scale knowledge base for 23,228 arguments from 30 UK political debates. Pointwise evaluation shows moderate human-model agreement, while pairwise validation reveals stronger alignment, indicating that ordinal structure can be extracted from language models for inherently subjective discourse, advancing knowledge representation in political domains.

该研究通过提出结合逐点和成对人工标注的双尺度验证框架，解决了在论辩话语中验证政治立场预测的挑战。研究人员使用22个语言模型构建了一个包含23,228个来自30场英国政治辩论的论点的大规模知识库。逐点评估显示了中等的人类-模型一致性（Krippendorff's α=0.578），而成对验证则显示了更强的一致性（最佳模型的α=0.86），表明可以从主观的现实世界话语中提取出序数结构，从而推进知识表示能力。

Vichara: Appellate Judgment Prediction and Explanation for the Indian Judicial System

Authors: Pavithra PM Nair, Preethu Rose Anish

Venue: AAAI 2026

First: 2026-02-20T16:57:44+00:00 · Latest: 2026-02-20T16:57:44+00:00

Abstract

In jurisdictions like India, where courts face an extensive backlog of cases, artificial intelligence offers transformative potential for legal judgment prediction. A critical subset of this backlog comprises appellate cases, which are formal decisions issued by higher courts reviewing the rulings of lower courts. To this end, we present Vichara, a novel framework tailored to the Indian judicial system that predicts and explains appellate judgments. Vichara processes English-language appellate case proceeding documents and decomposes them into decision points. Decision points are discrete legal determinations that encapsulate the legal issue, deciding authority, outcome, reasoning, and temporal context. The structured representation isolates the core determinations and their context, enabling accurate predictions and interpretable explanations. Vichara's explanations follow a structured format inspired by the IRAC (Issue-Rule-Application-Conclusion) framework and adapted for Indian legal reasoning. This enhances interpretability, allowing legal professionals to assess the soundness of predictions efficiently. We evaluate Vichara on two datasets, PredEx and the expert-annotated subset of the Indian Legal Documents Corpus (ILDC_expert), using four large language models: GPT-4o mini, Llama-3.1-8B, Mistral-7B, and Qwen2.5-7B. Vichara surpasses existing judgment prediction benchmarks on both datasets, with GPT-4o mini achieving the highest performance (F1: 81.5 on PredEx, 80.3 on ILDC_expert), followed by Llama-3.1-8B. Human evaluation of the generated explanations across Clarity, Linking, and Usefulness metrics highlights GPT-4o mini's superior interpretability.

中文标题/摘要

标题：Vichara：印度司法系统中的上诉判决预测与解释

在印度等司法系统中，由于案件积压严重，人工智能为法律判决预测提供了变革性的潜力。上诉案件是更高法院对下级法院判决进行正式审查的正式决定，构成了这一积压案件的一部分。为此，我们提出了Vichara，一种针对印度司法系统的新型框架，用于预测和解释上诉判决。Vichara 处理英文上诉案件程序文件，并将其分解为决策点。决策点是包含法律问题、决定依据、结果、推理和时间背景的离散法律判断。结构化的表示形式隔离了核心判断及其背景，从而实现准确的预测和可解释的解释。Vichara 的解释遵循由IRAC（问题-规则-应用-结论）框架启发并适应印度法律推理的结构化格式。这增强了可解释性，使法律专业人士能够高效地评估预测的合理性。我们使用四个大型语言模型——GPT-4o mini、Llama-3.1-8B、Mistral-7B 和 Qwen2.5-7B，在两个数据集PredEx 和印度法律文件语料库的专家注释子集（ILDC_expert）上评估了Vichara。Vichara 在两个数据集上的表现均优于现有判决预测基准，GPT-4o mini 在 PredEx 上的性能最高（F1: 81.5，在 ILDC_expert 上为 80.3），其次是 Llama-3.1-8B。生成的解释在清晰度、关联性和有用性方面的评估结果显示，GPT-4o mini 的可解释性更优。

Summary / 总结

Vichara is a framework designed to predict and explain appellate judgments in the Indian judicial system. It processes appellate case documents, decomposing them into decision points that include legal issues, outcomes, and reasoning. Vichara uses a structured format inspired by the IRAC framework to provide interpretable explanations. Evaluated on two datasets, Vichara outperforms existing benchmarks, with GPT-4o mini achieving the highest F1 scores of 81.5 and 80.3 on PredEx and ILDC_expert, respectively. Human evaluation also indicates superior interpretability of GPT-4o mini's explanations.

Vichara 是一个旨在预测和解释印度司法系统中上诉判决的框架。它处理上诉案件文件，将其分解为包括法律问题、结果和推理的决策点。Vichara 使用基于 IRAC 框架的结构化格式生成可解释的解释。在两个数据集上的评估表明，Vichara 超过了现有基准，GPT-4o mini 在 PredEx 和 ILDC_expert 上分别取得了最高的 F1 分数 81.5 和 80.3。人类评估还显示 GPT-4o mini 在清晰度、关联性和实用性方面的解释更优。
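
The decision-point representation described in the abstract above can be pictured as a small record type. The following is a minimal Python sketch, assuming one field per element the abstract lists (legal issue, deciding authority, outcome, reasoning, temporal context). The class name, field names, types, and the `render_for_prompt` helper are hypothetical illustrations and are not taken from Vichara's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionPoint:
    """One discrete legal determination extracted from an appellate document.

    Field names and types are illustrative assumptions, not Vichara's schema.
    """
    issue: str               # the legal question being decided
    deciding_authority: str  # the court or body that made the determination
    outcome: str             # what was decided
    reasoning: str           # why it was decided that way
    temporal_context: str    # where the determination sits in the case history


def render_for_prompt(points: list[DecisionPoint]) -> str:
    """Render decision points as numbered lines (hypothetical helper)."""
    lines = []
    for i, p in enumerate(points, start=1):
        lines.append(
            f"{i}. [{p.temporal_context}] {p.deciding_authority}: "
            f"{p.issue} -> {p.outcome} (reasoning: {p.reasoning})"
        )
    return "\n".join(lines)
```

Keeping each determination in its own record is one way to isolate the core determinations from the surrounding procedural text, which is the property the abstract credits for interpretable explanations.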

Communication-Corruption Coupling and Verification in Cooperative Multi-Objective Bandits

Authors: Ming Shi

First: 2026-01-17T06:13:52+00:00 · Latest: 2026-02-20T16:40:01+00:00

Abstract

We study cooperative stochastic multi-armed bandits with vector-valued rewards under adversarial corruption and limited verification. In each of $T$ rounds, each of $N$ agents selects an arm, the environment generates a clean reward vector, and an adversary perturbs the observed feedback subject to a global corruption budget $Γ$. Performance is measured by team regret under a coordinate-wise nondecreasing, $L$-Lipschitz scalarization $φ$, covering linear, Chebyshev, and smooth monotone utilities. Our main contribution is a communication-corruption coupling: we show that a fixed environment-side budget $Γ$ can translate into an effective corruption level ranging from $Γ$ to $NΓ$, depending on whether agents share raw samples, sufficient statistics, or only arm recommendations. We formalize this via a protocol-induced multiplicity functional and prove regret bounds parameterized by the resulting effective corruption. As corollaries, raw-sample sharing can suffer an $N$-fold larger additive corruption penalty, whereas summary sharing and recommendation-only sharing preserve an unamplified $O(Γ)$ term and achieve centralized-rate team regret. We further establish information-theoretic limits, including an unavoidable additive $Ω(Γ)$ penalty and a high-corruption regime $Γ=Θ(NT)$ where sublinear regret is impossible without clean information. Finally, we characterize how a global budget $ν$ of verified observations restores learnability. That is, verification is necessary in the high-corruption regime, and sufficient once it crosses the identification threshold, with certified sharing enabling the team's regret to become independent of $Γ$.

中文标题/摘要

标题：通信-腐败耦合与合作多目标多臂老虎机中的验证

我们研究在存在敌对腐败和有限验证条件下、奖励为向量形式的合作随机多臂老虎机问题。在T轮中的每一轮，N个代理各自选择一个臂，环境生成一个干净的奖励向量，而一个敌手在全局腐败预算Γ的限制下篡改观察反馈。性能通过坐标单调非递减、L-Lipschitz标量化函数φ下的团队遗憾来衡量，该函数涵盖了线性、切比雪夫和光滑单调效用。我们的主要贡献是通信-腐败耦合：我们证明了一个固定的环境侧预算Γ可以转化为从Γ到NΓ的有效腐败水平，这取决于代理是共享原始样本、充分统计量，还是仅共享臂推荐。我们通过协议诱导的多重性泛函形式化了这一点，并证明了由所得有效腐败参数化的遗憾界。作为推论，原始样本共享可能会遭受N倍更大的附加腐败惩罚，而汇总共享和仅推荐共享则保持未放大的O(Γ)项，并实现中心化速率的团队遗憾。我们还建立了信息论极限，包括不可避免的附加Ω(Γ)惩罚，以及高腐败区域Γ=Θ(NT)，在该区域中没有干净信息就不可能实现亚线性遗憾。最后，我们刻画了全局验证观察预算ν如何恢复可学习性。也就是说，在高腐败区域验证是必要的，而一旦超过识别阈值，验证就是充分的，认证共享使团队的遗憾与Γ无关。

Summary / 总结

This paper investigates cooperative multi-armed bandits with vector-valued rewards under adversarial corruption and limited verification. It introduces a communication-corruption coupling, showing that the effective corruption level can range from the actual corruption budget Γ to NΓ based on the level of communication between agents. The study proves regret bounds parameterized by the effective corruption and establishes that raw-sample sharing can suffer an N-fold larger additive corruption penalty, while summary sharing and recommendation-only sharing preserve an unamplified O(Γ) term and achieve centralized-rate team regret. The paper also sets information-theoretic limits, indicating an unavoidable additive Ω(Γ) penalty and a high-corruption regime where sublinear regret is impossible without clean information.

论文研究了在敌对干扰和有限验证下的合作多臂老虎机问题，具有向量奖励。引入了通信-干扰耦合的概念，显示了基于信息共享方式的有效干扰水平变化。原始样本共享会遭受$N$倍更大的附加干扰惩罚，而汇总共享和仅推荐共享则保持未放大的$O(Γ)$项，并实现集中率团队遗憾。研究还建立了信息论极限，表明存在不可避免的附加$Ω(Γ)$惩罚，并且在高干扰环境下，没有干净信息就不可能实现亚线性遗憾。
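
Two quantities in the abstract above lend themselves to a small illustration: the scalarization φ that turns a reward vector into a single utility, and the worst-case effective corruption under each sharing protocol. The sketch below is hedged throughout. The function names, the protocol labels, and the max-min form used for the Chebyshev utility are assumptions made for illustration, and the multipliers encode only the corollaries the abstract states (raw-sample sharing up to N times Γ, summary and recommendation-only sharing unamplified), not the paper's multiplicity functional.

```python
from typing import Sequence


def linear_scalarization(reward: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum; nondecreasing in each coordinate when weights are >= 0."""
    return sum(w * r for w, r in zip(weights, reward))


def chebyshev_scalarization(reward: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted worst coordinate (one common max-min form, assumed here)."""
    return min(w * r for w, r in zip(weights, reward))


def worst_case_effective_corruption(protocol: str, gamma: float, n_agents: int) -> float:
    """Worst-case effective corruption implied by the abstract's corollaries.

    Protocol labels are hypothetical names for the three sharing modes.
    """
    multiplier = {
        "raw_samples": n_agents,  # corrupted samples are reused by all N agents
        "summary": 1,             # sufficient statistics: unamplified
        "recommendation": 1,      # arm recommendations only: unamplified
    }
    if protocol not in multiplier:
        raise ValueError(f"unknown protocol: {protocol!r}")
    return multiplier[protocol] * gamma
```

For example, with a budget of 10 and 4 agents, `worst_case_effective_corruption("raw_samples", 10.0, 4)` gives 40.0, while the other two protocols stay at 10.0.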

Predicting Contextual Informativeness for Vocabulary Learning using Deep Learning

Authors: Tao Wu, Adam Kapelner

First: 2026-02-20T16:32:14+00:00 · Latest: 2026-02-20T16:32:14+00:00

Comments: 8 pages, 3 figures, 4 tables

Abstract

We describe a modern deep learning system that automatically identifies informative contextual examples ("contexts") for first language vocabulary instruction for high school students. Our paper compares three modeling approaches: (i) an unsupervised similarity-based strategy using MPNet's uniformly contextualized embeddings, (ii) a supervised framework built on instruction-aware, fine-tuned Qwen3 embeddings with a nonlinear regression head, and (iii) model (ii) plus handcrafted context features. We introduce a novel metric called the Retention Competency Curve to visualize trade-offs between the discarded proportion of good contexts and the "good-to-bad" contexts ratio, providing a compact, unified lens on model performance. Model (iii) delivers the most dramatic gains, reaching a good-to-bad ratio of 440 while throwing out only 70% of the good contexts. In summary, we demonstrate that a modern embedding model on a neural network architecture, when guided by human supervision, results in a low-cost, large supply of near-perfect contexts for teaching vocabulary for a variety of target words.

中文标题/摘要

标题：使用深度学习预测词汇学习的上下文信息量

我们描述了一个现代深度学习系统，该系统能够自动识别对高中生第一语言词汇教学具有信息量的上下文示例（"上下文"）。我们的论文比较了三种建模方法：（i）一种基于MPNet均匀上下文化嵌入的无监督相似性策略，（ii）一种基于指令感知、微调Qwen3嵌入的监督框架，带有非线性回归头，（iii）方法（ii）加上手工制作的上下文特征。我们引入了一种新的度量标准，称为保留能力曲线，用于可视化丢弃的良好上下文比例与"良好到不良"上下文比例之间的权衡，提供了一个紧凑的统一视角来评估模型性能。方法（iii）带来了最大的性能提升，良好到不良上下文比率为440，同时仅丢弃了70%的良好上下文。总之，我们证明了在神经网络架构上的现代嵌入模型，在人类监督的指导下，可以产生大量近乎完美的上下文，用于教授各种目标词汇。

Summary / 总结

The research aims to identify informative contextual examples for high school vocabulary instruction using deep learning. Three modeling approaches were compared: an unsupervised similarity-based strategy, a supervised framework, and a supervised framework with additional handcrafted context features. A novel metric, the Retention Competency Curve, was introduced to evaluate model performance. The third approach, incorporating handcrafted context features, achieved the best results with a good-to-bad ratio of 440 while discarding only 70% of good contexts. This demonstrates the effectiveness of human-guided deep learning models in providing high-quality contexts for vocabulary teaching.

研究旨在使用深度学习自动识别高中词汇教学中的信息性上下文示例。三种建模方法进行了比较：无监督的相似性策略、监督框架以及结合手工特征的监督框架。引入了一种新的评价指标——保留能力曲线，以评估模型性能。第三种方法结合了手工特征，实现了440的好坏比，同时仅丢弃70%的好上下文。这表明，在人类监督下使用现代嵌入模型和神经网络架构，可以提供高质量的上下文，用于多种目标词汇的教学。
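
The abstract above describes the Retention Competency Curve only as a trade-off between the discarded proportion of good contexts and the good-to-bad ratio. One plausible reading, sketched below, is to sweep a score threshold, keep every context scoring at or above it, and record both quantities at each threshold. The function name, the thresholding rule, and the binary good/bad labels are assumptions of this sketch; the paper's exact definition may differ.

```python
from typing import Sequence


def retention_competency_points(
    scores: Sequence[float],
    is_good: Sequence[bool],
    thresholds: Sequence[float],
) -> list[tuple[float, float, float]]:
    """Return (threshold, discarded share of good contexts, good-to-bad ratio).

    Assumed reading of the metric: a context is kept when score >= threshold.
    """
    total_good = sum(is_good)
    if total_good == 0:
        raise ValueError("need at least one good context")
    points = []
    for t in thresholds:
        kept = [good for score, good in zip(scores, is_good) if score >= t]
        kept_good = sum(kept)
        kept_bad = len(kept) - kept_good
        discarded_good = 1 - kept_good / total_good
        # No bad contexts kept: the ratio is unbounded.
        ratio = kept_good / kept_bad if kept_bad else float("inf")
        points.append((t, discarded_good, ratio))
    return points
```

Under this reading, the reported operating point (ratio of 440 at 70% of good contexts discarded) would be one point on the curve, reached at a fairly strict threshold.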

Probe-then-Commit Multi-Objective Bandits: Theoretical Benefits of Limited Multi-Arm Feedback

Authors: Ming Shi

First: 2026-02-03T06:44:00+00:00 · Latest: 2026-02-20T16:27:12+00:00

Abstract

We study an online resource-selection problem motivated by multi-radio access selection and mobile edge computing offloading. In each round, an agent chooses among $K$ candidate links/servers (arms) whose performance is a stochastic $d$-dimensional vector (e.g., throughput, latency, energy, reliability). The key interaction is probe-then-commit (PtC): the agent may probe up to $q>1$ candidates via control-plane measurements to observe their vector outcomes, but must execute exactly one candidate in the data plane. This limited multi-arm feedback regime strictly interpolates between classical bandits ($q=1$) and full-information experts ($q=K$), yet existing multi-objective learning theory largely focuses on these extremes. We develop PtC-P-UCB, an optimistic probe-then-commit algorithm whose technical core is frontier-aware probing under uncertainty in a Pareto mode, e.g., it selects the $q$ probes by approximately maximizing a hypervolume-inspired frontier-coverage potential and commits by marginal hypervolume gain to directly expand the attained Pareto region. We prove a dominated-hypervolume frontier error of $\tilde{O}(K_P d/\sqrt{qT})$, where $K_P$ is the Pareto-frontier size and $T$ is the horizon, and scalarized regret $\tilde{O}(L_φ d\sqrt{(K/q)T})$, where $φ$ is the scalarizer. These quantify a transparent $1/\sqrt{q}$ acceleration from limited probing. We further extend to multi-modal probing: each probe returns $M$ modalities (e.g., CSI, queue, compute telemetry), and uncertainty fusion yields variance-adaptive versions of the above bounds via an effective noise scale.

中文标题/摘要

标题：探查后再承诺的多目标多臂老虎机：有限多臂反馈的理论优势

我们研究了一个在线资源选择问题，受多无线电接入选择和移动边缘计算卸载的启发。在每一轮中，代理在$K$个候选链路/服务器（臂）中选择一个，其性能是一个随机的$d$维向量（例如，吞吐量、延迟、能耗、可靠性）。关键交互是“探查后再承诺（PtC）”：代理可以通过控制面测量探查多达$q>1$个候选臂以观察它们的向量结果，但在数据面必须执行一个候选臂。这种有限多臂反馈机制严格介于经典的多臂老虎机（$q=1$）和完全信息专家（$q=K$）之间，但现有的多目标学习理论主要集中在这些极端情况。我们开发了PtC-P-UCB算法，这是一种乐观的探查后再承诺算法，其技术核心是在帕累托模式下的不确定性下的前沿感知探查，例如，它通过近似最大化一种基于前沿覆盖潜力的超体积启发式来选择$q$个探查臂，并通过边际超体积增益来承诺，直接扩展已获得的帕累托区域。我们证明了主导超体积前沿误差为$\tilde{O} (K_P d/\sqrt{qT})$，其中$K_P$是帕累托前沿的大小，$T$是时间范围，以及标量化后悔$\tilde{O} (L_φd\sqrt{(K/q)T})$，其中$φ$是标量化器。这些量化了有限探查带来的透明的$1/\sqrt{q}$加速。我们进一步扩展到“多模态探查”：每个探查返回$M$种模态（例如，CSI、队列、计算遥测），不确定性融合通过有效噪声尺度产生上述界限的方差自适应版本。
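
The $1/\sqrt{q}$ acceleration mentioned in the abstract above follows directly from the stated scalarized-regret bound. The short worked ratio below is a sketch that keeps only the leading term and assumes the constants and logarithmic factors hidden by $\tilde{O}$ do not depend on $q$; $R(q)$ is notation introduced here for that leading term.

```latex
% Leading term of the scalarized-regret bound, constants and log factors dropped
% (an assumption of this sketch, not a statement from the paper).
R(q) \;\approx\; L_{\varphi}\, d \,\sqrt{\frac{K}{q}\,T}
\qquad\Longrightarrow\qquad
\frac{R(q)}{R(1)} \;=\; \frac{1}{\sqrt{q}} .

% Example: probing q = 4 candidates per round halves the leading term relative
% to q = 1, and q = K removes the dependence on K:
R(4) \;\approx\; \tfrac{1}{2}\, R(1),
\qquad
R(K) \;\approx\; L_{\varphi}\, d\, \sqrt{T} .
```

The same substitution applied to the frontier-error bound $\tilde{O}(K_P d/\sqrt{qT})$ gives the identical $1/\sqrt{q}$ factor.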

FATE: A Formal Benchmark Series for Frontier Algebra of Multiple Difficulty Levels

Authors: Jiedong Jiang, Wanyi He, Yuefeng Wang, Guoxiong Gao, Yongle Hu, Jingting Wang, Nailing Guan, Peihao Wu, Chunbo Dai, Liang Xiao, Bin Dong

First: 2025-11-04T03:25:17+00:00 · Latest: 2026-02-20T16:21:35+00:00

Abstract

Recent advances in large language models (LLMs) have demonstrated impressive capabilities in formal theorem proving, particularly on contest-based mathematical benchmarks like the IMO. However, these contests do not reflect the depth, breadth, and abstraction of modern mathematical research. To bridge this gap, we introduce FATE (Formal Algebra Theorem Evaluation), a new benchmark series in formal algebra designed to chart a course toward advanced mathematical reasoning. We present two new components, FATE-H and FATE-X, each with 100 problems in abstract and commutative algebra. The FATE series spans a difficulty spectrum from undergraduate exercises to problems exceeding PhD qualifying exams. Notably, FATE-X is the first formal benchmark to surpass both PhD-level exam difficulty and the coverage of the Mathlib library. Our evaluations of state-of-the-art LLM provers on this new benchmark reveal a stark performance gap compared to contest math: the best model achieves only 3% (pass@64) accuracy on FATE-H and 0% on FATE-X. Our two-stage evaluation reveals that models' natural-language reasoning is notably more accurate than their ability to formalize this reasoning. We systematically classify the common errors that arise during this formalization process. Furthermore, a comparative study shows that a specialized prover can exhibit less effective reflection than general-purpose models, reducing its accuracy at the natural-language stage. We believe FATE provides a robust and challenging benchmark that establishes essential checkpoints on the path toward research-level formal mathematical reasoning.

中文标题/摘要

标题：FATE：多难度层次形式代数基准系列

近年来，大型语言模型（LLMs）在形式定理证明方面展示了令人印象深刻的性能，特别是在IMO等竞赛数学基准测试中。然而，这些竞赛并未反映现代数学研究的深度、广度和抽象性。为弥合这一差距，我们引入了FATE（形式代数定理评估），这是一个新的形式代数基准系列，旨在引导高级数学推理的发展。我们提出了两个新组件，FATE-H和FATE-X，每个都包含100个抽象和交换代数问题。FATE系列涵盖了从本科练习到超出博士资格考试难度的问题范围。值得注意的是，FATE-X是第一个超越博士水平考试难度和Mathlib库覆盖范围的形式基准。我们对最新一代LLM证明器在这项新基准上的评估显示，与竞赛数学相比，性能差距明显：最佳模型在FATE-H上的准确率仅为3%（pass@64），在FATE-X上为0%。我们的两阶段评估表明，模型的自然语言推理比其形式化推理的能力更为准确。我们系统地分类了在这一形式化过程中出现的常见错误。此外，一项比较研究显示，专门的证明器在自然语言阶段的反思效果可能不如通用模型，从而降低了其准确性。我们相信FATE提供了一个稳健且具有挑战性的基准，为通往研究级形式数学推理的道路上设立了必要的检查点。

Summary / 总结

The research introduces FATE, a benchmark series in formal algebra for evaluating large language models (LLMs) on advanced mathematical reasoning. It adds two components, FATE-H and FATE-X, each with 100 problems in abstract and commutative algebra; the series as a whole ranges from undergraduate exercises to problems exceeding PhD qualifying exams. State-of-the-art LLM provers perform poorly: the best model reaches only 3% (pass@64) on FATE-H and 0% on FATE-X. A two-stage evaluation shows that models' natural-language reasoning is notably more accurate than their formalization of it, and a comparative study finds that a specialized prover can reflect less effectively than general-purpose models, which lowers its accuracy at the natural-language stage.

FATE 是一个新的形式代数基准系列，旨在评估大型语言模型（LLMs）在高级数学推理方面的能力。它新增两个部分，FATE-H 和 FATE-X，每个部分包含 100 个抽象代数与交换代数问题；整个系列的难度从本科练习到超出博士资格考试。评估结果显示，最先进的 LLM 证明器表现不佳：最佳模型在 FATE-H 上仅达到 3%（pass@64）的准确率，在 FATE-X 上为 0%。两阶段评估表明，模型的自然语言推理明显比其形式化能力更准确；比较研究还显示，专门的证明器的反思效果可能不如通用模型，从而降低其在自然语言阶段的准确率。
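
The abstract above reports accuracy as pass@64 without saying how it is computed. A widely used choice is the unbiased pass@k estimator popularized by code-generation benchmarks: with n samples per problem of which c are correct, pass@k = 1 - C(n-c, k) / C(n, k). The sketch below implements that estimator; whether FATE uses it, or simply checks whether any of 64 attempts succeeds, is an assumption left open here.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples with c correct ones.

    This is the standard estimator, not necessarily the one used for FATE.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

When n equals k, as with 64 samples scored at pass@64, the estimator reduces to 1.0 if at least one sample is correct and 0.0 otherwise, so a benchmark-level 3% would mean roughly 3 of 100 problems had at least one verified proof.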

GRPO is Secretly a Process Reward Model

Authors: Michael Sullivan, Alexander Koller

Venue: ICML 2026

First: 2025-09-25T13:40:36+00:00 · Latest: 2026-02-20T16:20:57+00:00

Comments: 15 pages, 7 figures; under review at ICML 2026

Abstract

Process reward models (PRMs) allow for fine-grained credit assignment in reinforcement learning (RL), and seemingly contrast with outcome reward models (ORMs), which assign a single reward to an entire trajectory. However, we provide theoretical proof in this work that the Group Relative Policy Optimization (GRPO) RL algorithm equipped with an ORM is in fact equivalent to a PRM-aware RL objective equipped with a non-trivial, Monte-Carlo-based PRM (given mild assumptions). Leveraging the framework of GRPO-as-a-PRM, we identify a flaw in the GRPO objective that interacts with imbalanced process steps and rewards to hinder both exploration and exploitation (under different conditions). We propose a simple modification to the algorithm to mitigate this defect ($λ$-GRPO), and show that LLMs tuned with $λ$-GRPO outperform LLMs tuned with standard GRPO on downstream reasoning tasks, and reach peak performance more rapidly. These results show that we can leverage the hidden, built-in PRM structure within the vanilla GRPO algorithm to boost model performance without employing an explicit PRM, and with a negligible impact on training time and cost.

中文标题/摘要

标题：GRPO实际上是过程奖励模型

过程奖励模型（PRMs）在强化学习（RL）中允许精细的信用分配，并似乎与结果奖励模型（ORMs）相对立，后者为整个轨迹分配单一奖励。然而，我们在本文中提供了理论证明，表明在轻度假设下，配备ORM的Group Relative Policy Optimization（GRPO）RL算法实际上等同于一个配备非平凡、基于蒙特卡洛的PRM的PRM意识RL目标。利用GRPO-as-a-PRM的框架，我们识别出GRPO目标中的一个缺陷，该缺陷与过程步骤和奖励的不平衡相互作用，从而阻碍了探索和利用（在不同条件下）。我们提出了一种简单的算法修改（$λ$-GRPO）来缓解这一缺陷，并展示了使用$λ$-GRPO调优的LLMs在下游推理任务中优于使用标准GRPO调优的LLMs，并且达到最佳性能的速度更快。这些结果表明，我们可以利用vanilla GRPO算法中隐含的内置PRM结构来提升模型性能，而无需使用显式的PRM，并且对训练时间和成本的影响微乎其微。
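
For readers unfamiliar with GRPO, the outcome-level signal the abstract above refers to is usually a group-normalized reward: several completions are sampled for one prompt, each receives a single scalar reward, and each completion's advantage is its reward standardized within the group. The sketch below shows only that standard step. The function name, the `eps` stabilizer, and the use of the population standard deviation are assumptions, and neither the paper's PRM equivalence nor its $λ$-GRPO modification is reproduced here.

```python
from statistics import mean, pstdev
from typing import Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> list[float]:
    """Standardize outcome rewards within one group of sampled completions.

    In vanilla GRPO every token of a completion shares that completion's advantage.
    """
    if not rewards:
        raise ValueError("need at least one reward")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]
```

With rewards `[1.0, 0.0, 0.0, 1.0]` the advantages come out close to `[1, -1, -1, 1]`, and a group in which every completion gets the same reward yields all-zero advantages and therefore no learning signal.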

Robo-Saber: Generating and Simulating Virtual Reality Players

Authors: Nam Hee Kim, Jingjing May Liu, Jaakko Lehtinen, Perttu Hämäläinen, James F. O'Brien, Xue Bin Peng

First: 2026-02-20T16:19:19+00:00 · Latest: 2026-02-20T16:19:19+00:00

Comments: 13 pages, 15 figures. Accepted to Eurographics 2026. Project page: https://robo-saber.github.io/

Abstract

We present the first motion generation system for playtesting virtual reality (VR) games. Our player model generates VR headset and handheld controller movements from in-game object arrangements, guided by style exemplars and aligned to maximize simulated gameplay score. We train on the large BOXRR-23 dataset and apply our framework on the popular VR game Beat Saber. The resulting model Robo-Saber produces skilled gameplay and captures diverse player behaviors, mirroring the skill levels and movement patterns specified by input style exemplars. Robo-Saber demonstrates promise in synthesizing rich gameplay data for predictive applications and enabling a physics-based whole-body VR playtesting agent.

中文标题/摘要

标题：Robo-Saber：生成和模拟虚拟现实玩家

我们提出了首个用于虚拟现实（VR）游戏试玩测试的运动生成系统。我们的玩家模型根据游戏中的对象排列生成VR头显和手持控制器的动作，以风格样例为引导，并经过对齐以最大化模拟游戏得分。我们使用大型BOXRR-23数据集进行训练，并将我们的框架应用于流行的VR游戏Beat Saber。生成的模型Robo-Saber能够产生熟练的游戏表现，并捕捉多样化的玩家行为，反映输入风格样例所指定的技能水平和动作模式。Robo-Saber展示了在预测应用中合成丰富游戏数据以及实现基于物理的全身VR试玩测试代理的潜力。

Summary / 总结

The research aims to develop a motion generation system for playtesting VR games, specifically for the game Beat Saber. The system, Robo-Saber, generates VR player movements based on in-game object arrangements and style exemplars, optimizing gameplay scores. Key findings include the production of skilled gameplay and the ability to capture diverse player behaviors, aligning with specified input styles. This system shows potential for generating rich gameplay data and enabling physics-based VR playtesting agents.

研究旨在开发一种用于测试VR游戏的运动生成系统，特别是针对Beat Saber游戏。该系统Robo-Saber根据游戏中的物体排列和风格示例生成VR玩家动作，并优化游戏得分。主要发现包括生成高技能的游戏玩法和捕捉多样化的玩家行为，与输入的风格示例相匹配。该系统展示了生成丰富游戏数据和启用基于物理的全身VR测试代理的潜力。

View Invariant Learning for Vision-Language Navigation in Continuous Environments

Authors: Josh Qixuan Sun, Huaiyuan Weng, Xiaoying Xing, Chul Min Yeum, Mark Crowley

First: 2025-07-05T18:04:35+00:00 · Latest: 2026-02-20T16:14:13+00:00

Comments: This paper is accepted to RA-L 2026

Abstract

Vision-Language Navigation in Continuous Environments (VLNCE), where an agent follows instructions and moves freely to reach a destination, is a key research problem in embodied AI. However, most existing approaches are sensitive to viewpoint changes, i.e. variations in camera height and viewing angle. Here we introduce a more general scenario, V$^2$-VLNCE (VLNCE with Varied Viewpoints) and propose a view-invariant post-training framework, called VIL (View Invariant Learning), that makes existing navigation policies more robust to changes in camera viewpoint. VIL employs a contrastive learning framework to learn sparse and view-invariant features. We also introduce a teacher-student framework for the Waypoint Predictor Module, a standard part of VLNCE baselines, where a view-dependent teacher model distills knowledge into a view-invariant student model. We employ an end-to-end training paradigm to jointly optimize these components. Empirical results show that our method outperforms state-of-the-art approaches on V$^2$-VLNCE by 8-15% measured on Success Rate for two standard benchmark datasets R2R-CE and RxR-CE. Evaluation of VIL in standard VLNCE settings shows that despite being trained for varied viewpoints, VIL often still improves performance. On the harder RxR-CE dataset, our method also achieved state-of-the-art performance across all metrics. This suggests that adding VIL does not diminish the standard viewpoint performance and can serve as a plug-and-play post-training method. We further evaluate VIL for simulated camera placements derived from real robot configurations (e.g. Stretch RE-1, LoCoBot), showing consistent improvements of performance. Finally, we present a proof-of-concept real-robot evaluation in two physical environments using a panoramic RGB sensor combined with LiDAR. The code is available at https://github.com/realjoshqsun/V2-VLNCE.

中文标题/摘要

标题：连续环境中视觉语言导航的视点不变学习

连续环境中的视觉语言导航（VLNCE）要求智能体遵循指令并自由移动以到达目的地，是具身智能中的关键研究问题。然而，大多数现有方法对视点变化敏感，即相机高度和视角的变化。在这里，我们引入了一个更通用的场景V$^2$-VLNCE（具有变化视点的VLNCE），并提出了一种视点不变的后训练框架，称为VIL（视点不变学习），使现有的导航策略对相机视点的变化更具鲁棒性。VIL使用对比学习框架来学习稀疏且视点不变的特征。我们还为航点预测模块（VLNCE基线中的标准组成部分）引入了教师-学生框架，其中视点依赖的教师模型将知识蒸馏到视点不变的学生模型中。我们采用端到端的训练范式来联合优化这些组件。实验结果表明，在V$^2$-VLNCE设置下，我们的方法在两个标准基准数据集R2R-CE和RxR-CE上以成功率衡量比最先进的方法高出8-15%。在标准VLNCE设置中的评估表明，尽管VIL是为变化视点训练的，但它通常仍然能提高性能。在更难的RxR-CE数据集上，我们的方法还在所有指标上达到了最先进的性能。这表明添加VIL不会削弱标准视点下的性能，可以作为即插即用的后训练方法。我们进一步在根据真实机器人配置（例如Stretch RE-1、LoCoBot）得出的模拟相机位置上评估了VIL，结果显示性能得到一致提升。最后，我们使用全景RGB传感器结合LiDAR，在两个物理环境中进行了概念验证性的真实机器人评估。代码可在 https://github.com/realjoshqsun/V2-VLNCE 获取。

Summary / 总结

The research addresses viewpoint sensitivity in Vision-Language Navigation in Continuous Environments (VLNCE) by introducing the V$^2$-VLNCE setting (VLNCE with Varied Viewpoints) and a view-invariant post-training framework called VIL. VIL uses contrastive learning to learn sparse, view-invariant features and adds a teacher-student scheme for the Waypoint Predictor Module, in which a view-dependent teacher distills knowledge into a view-invariant student. On V$^2$-VLNCE, VIL outperforms state-of-the-art approaches by 8-15% in Success Rate on R2R-CE and RxR-CE. In the standard VLNCE setting it often still improves performance, and on the harder RxR-CE dataset it achieves state-of-the-art results across all metrics. VIL also shows consistent improvements for simulated camera placements derived from real robot configurations, and the authors report a proof-of-concept real-robot evaluation in two physical environments.

研究通过引入V$^2$-VLNCE和提出一种名为VIL的视点不变后训练框架来解决视点敏感性问题。VIL使用对比学习框架学习稀疏且视点不变的特征，并包含一个教师-学生框架用于航点预测模块。实验结果表明，VIL在R2R-CE和RxR-CE数据集上的成功率方面优于现有最佳方法8-15%，并且在更难的RxR-CE数据集上也实现了所有指标的最佳性能。此外，VIL在模拟真实机器人配置和物理环境中的真实机器人评估中均表现出一致的性能提升。

VeriSoftBench: Repository-Scale Formal Verification Benchmarks for Lean

Authors: Yutong Xin, Qiaochu Chen, Greg Durrett, Işil Dillig

First: 2026-02-20T16:05:06+00:00 · Latest: 2026-02-20T16:05:06+00:00

Abstract

Large language models have achieved striking results in interactive theorem proving, particularly in Lean. However, most benchmarks for LLM-based proof automation are drawn from mathematics in the Mathlib ecosystem, whereas proofs in software verification are developed inside definition-rich codebases with substantial project-specific libraries. We introduce VeriSoftBench, a benchmark of 500 Lean 4 proof obligations drawn from open-source formal-methods developments and packaged to preserve realistic repository context and cross-file dependencies. Our evaluation of frontier LLMs and specialized provers yields three observations. First, provers tuned for Mathlib-style mathematics transfer poorly to this repository-centric setting. Second, success is strongly correlated with transitive repository dependence: tasks whose proofs draw on large, multi-hop dependency closures are less likely to be solved. Third, providing curated context restricted to a proof's dependency closure improves performance relative to exposing the full repository, but nevertheless leaves substantial room for improvement. Our benchmark and evaluation suite are released at https://github.com/utopia-group/VeriSoftBench.

中文标题/摘要

标题：VeriSoftBench：面向Lean的仓库级形式化验证基准

大型语言模型在交互式定理证明方面取得了令人瞩目的成果，特别是在Lean中。然而，大多数基于LLM的证明自动化基准取自Mathlib生态系统中的数学，而软件验证中的证明则是在定义丰富、包含大量项目特定库的代码库中开发的。我们提出了VeriSoftBench，这是一个包含500个Lean 4证明义务的基准，这些证明义务取自开源形式化方法项目，并经过打包以保留真实的仓库上下文和跨文件依赖关系。我们对前沿LLM和专用证明器的评估得出三点观察。首先，针对Mathlib风格数学调优的证明器在这种以仓库为中心的设置中迁移效果不佳。其次，成功率与传递性仓库依赖密切相关：证明依赖于庞大、多跳依赖闭包的任务更不容易被解决。第三，提供仅限于证明依赖闭包的精选上下文，相比暴露整个仓库能够提高性能，但仍然留有很大的改进空间。我们的基准和评估套件发布于 https://github.com/utopia-group/VeriSoftBench。

On the Semantic and Syntactic Information Encoded in Proto-Tokens for One-Step Text Reconstruction

Authors: Ivan Bondarenko, Egor Palkin, Fedor Tikunov

First: 2026-02-20T15:54:10+00:00 · Latest: 2026-02-20T15:54:10+00:00

Abstract

Autoregressive large language models (LLMs) generate text token-by-token, requiring n forward passes to produce a sequence of length n. Recent work, Exploring the Latent Capacity of LLMs for One-Step Text Reconstruction (Mezentsev and Oseledets), shows that frozen LLMs can reconstruct hundreds of tokens from only two learned proto-tokens in a single forward pass, suggesting a path beyond the autoregressive paradigm. In this paper, we study what information these proto-tokens encode and how they behave under reconstruction and controlled constraints. We perform a series of experiments aimed at disentangling semantic and syntactic content in the two proto-tokens, analyzing stability properties of the e-token, and visualizing attention patterns to the e-token during reconstruction. Finally, we test two regularization schemes for "imposing" semantic structure on the e-token using teacher embeddings, including an anchor-based loss and a relational distillation objective. Our results indicate that the m-token tends to capture semantic information more strongly than the e-token under standard optimization; anchor-based constraints trade off sharply with reconstruction accuracy; and relational distillation can transfer batch-level semantic relations into the proto-token space without sacrificing reconstruction quality, supporting the feasibility of future non-autoregressive seq2seq systems that predict proto-tokens as an intermediate representation.

中文标题/摘要

标题：关于原型标记中用于一步文本重构的语义和句法信息编码

自回归大型语言模型（LLMs）逐个生成文本，需要n次前向传递来生成长度为n的序列。近期研究《探索LLMs在一步文本重构中的潜在能力》（Mezentsev和Oseledets）表明，冻结的LLMs仅需一次前向传递即可从两个学习到的原型标记中重构数百个标记，这暗示了一种超越自回归范式的路径。在本文中，我们研究这些原型标记中编码了什么信息，以及在重构和控制约束下它们的行为。我们进行了一系列实验，旨在解开两个原型标记中的语义和句法内容，分析e标记的稳定性属性，并可视化重构过程中对e标记的注意力模式。最后，我们测试了两种正则化方案，以“强加”语义结构于e标记上，包括基于锚点的损失和关系蒸馏目标。我们的结果表明，在标准优化下，m标记倾向于比e标记更强烈地捕捉语义信息；基于锚点的约束与重构准确性之间存在显著的权衡；关系蒸馏可以在不牺牲重构质量的情况下将批次级语义关系转移到原型标记空间，支持未来非自回归seq2seq系统预测原型标记作为中间表示的可行性。

Analyzing and Improving Chain-of-Thought Monitorability Through Information Theory

Authors: Usman Anwar, Tim Bakker, Dana Kianfar, Cristina Pinneri, Christos Louizos

First: 2026-02-20T15:50:30+00:00 · Latest: 2026-02-20T15:50:30+00:00

Comments: First two authors contributed equally

Abstract

Chain-of-thought (CoT) monitors are LLM-based systems that analyze reasoning traces to detect when outputs may exhibit attributes of interest, such as test-hacking behavior during code generation. In this paper, we use information-theoretic analysis to show that non-zero mutual information between CoT and output is a necessary but not sufficient condition for CoT monitorability. We identify two sources of approximation error that may undermine the performance of CoT monitors in practice: information gap, which measures the extent to which the monitor can extract the information available in CoT, and elicitation error, which measures the extent to which the monitor approximates the optimal monitoring function. We further demonstrate that CoT monitorability can be systematically improved through targeted training objectives. To this end, we propose two complementary approaches: (a) an oracle-based method that directly rewards the monitored model for producing CoTs that maximize monitor accuracy, and (b) a more practical, label-free approach that maximizes conditional mutual information between outputs and CoTs. Across multiple different environments, we show both methods significantly improve monitor accuracy while preventing CoT degeneration even when training against a monitor, thereby mitigating reward hacking when the task reward is imperfectly specified.

中文标题/摘要

标题：通过信息论分析并改进思维链的可监控性

思维链（CoT）监控器是基于LLM的系统，通过分析推理轨迹来检测输出何时可能表现出受关注的属性，例如代码生成过程中的测试作弊（test-hacking）行为。本文使用信息论分析表明，CoT与输出之间存在非零互信息是CoT可监控性的必要但不充分条件。我们识别出两种可能在实践中削弱CoT监控器性能的近似误差来源：信息差距，衡量监控器能在多大程度上提取CoT中可用的信息；以及激发误差，衡量监控器对最优监控函数的近似程度。我们进一步证明，可以通过有针对性的训练目标系统地提高CoT可监控性。为此，我们提出了两种互补的方法：(a) 一种基于Oracle的方法，直接奖励被监控模型生成能使监控器准确率最大化的CoT；(b) 一种更实用的、无需标签的方法，最大化输出与CoT之间的条件互信息。在多个不同环境中，我们展示了这两种方法都能显著提高监控器准确率，并且即使在针对监控器进行训练时也能防止CoT退化，从而在任务奖励设定不完善时减轻奖励作弊（reward hacking）问题。

Summary / 总结

This paper investigates the limitations of chain-of-thought (CoT) monitors in detecting outputs with specific attributes, such as test-hacking behavior, by using information-theoretic analysis. It identifies two sources of approximation error—information gap and elicitation error—and proposes two methods to improve CoT monitorability: an oracle-based method that rewards the model for producing CoTs that maximize monitor accuracy, and a label-free approach that maximizes conditional mutual information between outputs and CoTs. The methods significantly enhance monitor accuracy without causing CoT degeneration, even when the model is trained against the monitor.

本文使用信息论研究了思维链（CoT）监控有效的条件。它指出CoT与输出之间的非零互信息是CoT可监控性的必要但不充分条件，并识别出两种近似误差来源：信息差距和激发误差。作者提出了两种提高CoT可监控性的方法：一种是基于Oracle的方法，奖励被监控模型生成能使监控器准确率最大化的CoT；另一种是无标签方法，最大化输出与CoT之间的条件互信息。实验结果显示，这些方法在多种环境中显著提高了监控器准确率，同时防止了CoT退化，从而在任务奖励设定不完善时减轻奖励作弊问题。
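
As a rough illustration of the necessary condition stated in the abstract (non-zero mutual information between CoT and output), the sketch below computes a plug-in estimate of mutual information from discrete samples. It is not the paper's method: the function name `mutual_information`, the reduction of a CoT to a single discrete feature, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate of I(X; Y) in bits from a list of (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)              # counts of each (x, y) pair
    px = Counter(x for x, _ in pairs)   # marginal counts of x
    py = Counter(y for _, y in pairs)   # marginal counts of y
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y) rewritten in terms of raw counts
        ratio = count * n / (px[x] * py[y])
        mi += p_xy * math.log2(ratio)
    return mi


# Hypothetical toy data: x is a discrete CoT feature, y is the output attribute.
# The CoT feature is independent of the attribute: zero bits, nothing to monitor.
independent = [(c, y) for c in ("a", "b") for y in (0, 1)]
# The CoT feature fully determines the attribute: one bit of information.
informative = [("hack", 1), ("hack", 1), ("clean", 0), ("clean", 0)]

print(mutual_information(independent))
print(mutual_information(informative))
```

When the estimate is zero, as in the first toy dataset, no function of the CoT feature can predict the attribute better than its base rate. A positive value only says the information is present, not that a given monitor extracts it, which is the gap the abstract attributes to information gap and elicitation error.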

CAIMAN: Causal Action Influence Detection for Sample-efficient Loco-manipulation

Authors: Yuanchen Yuan, Jin Cheng, Núria Armengol Urpí, Stelian Coros

First: 2025-02-02T16:16:53+00:00 · Latest: 2026-02-20T15:50:11+00:00

Abstract

Enabling legged robots to perform non-prehensile loco-manipulation is crucial for enhancing their versatility. Learning behaviors such as whole-body object pushing often requires sophisticated planning strategies or extensive task-specific reward shaping, especially in unstructured environments. In this work, we present CAIMAN, a practical reinforcement learning framework that encourages the agent to gain control over other entities in the environment. CAIMAN leverages causal action influence as an intrinsic motivation objective, allowing legged robots to efficiently acquire object pushing skills even under sparse task rewards. We employ a hierarchical control strategy, combining a low-level locomotion module with a high-level policy that generates task-relevant velocity commands and is trained to maximize the intrinsic reward. To estimate causal action influence, we learn the dynamics of the environment by integrating a kinematic prior with data collected during training. We empirically demonstrate CAIMAN's superior sample efficiency and adaptability to diverse scenarios in simulation, as well as its successful transfer to real-world systems without further fine-tuning. A video demo is available at https://www.youtube.com/watch?v=dNyvT04Cqaw.

中文标题/摘要

标题：CAIMAN：面向样本高效移动操作的因果动作影响检测

使腿足机器人能够执行非抓握式移动操作对于提高其通用性至关重要。学习全身推动物体等行为通常需要复杂的规划策略或大量针对特定任务的奖励塑造，特别是在非结构化环境中。在本工作中，我们提出了CAIMAN，这是一种实用的强化学习框架，鼓励智能体获得对环境中其他实体的控制。CAIMAN利用因果动作影响作为内在动机目标，使腿足机器人即使在稀疏的任务奖励下也能高效地习得推动物体的技能。我们采用分层控制策略，将低层运动模块与高层策略相结合，后者生成与任务相关的速度指令，并以最大化内在奖励为目标进行训练。为了估计因果动作影响，我们通过结合运动学先验和训练期间收集的数据来学习环境动力学。我们在仿真中实证展示了CAIMAN出色的样本效率和对多种场景的适应性，并且无需进一步微调即可成功迁移到真实系统中。视频演示可在 https://www.youtube.com/watch?v=dNyvT04Cqaw 获取。

Summary / 总结

CAIMAN is a reinforcement learning framework designed to enable legged robots to perform non-prehensile loco-manipulation tasks, such as whole-body object pushing, by encouraging the robot to gain control over other entities in the environment. It uses causal action influence as an intrinsic motivation objective, allowing the robot to learn these skills efficiently even with sparse task rewards. The framework employs a hierarchical control strategy that pairs a low-level locomotion module with a high-level policy issuing velocity commands, and it estimates causal action influence from environment dynamics learned by integrating a kinematic prior with data collected during training. The authors report superior sample efficiency and adaptability across diverse simulated scenarios, as well as successful transfer to real-world systems without further fine-tuning.

CAIMAN 是一种强化学习框架，旨在通过利用因果动作影响作为内在动机来使腿足机器人执行非抓取式移动操作任务，如全身物体推移。该框架采用层次控制策略，结合低级运动模块和生成任务相关速度命令的高级策略。CAIMAN 在多种模拟场景中展示了出色的样本效率和适应性，并成功地转移到了实际系统中而无需进一步微调。

Diffusing to Coordinate: Efficient Online Multi-Agent Diffusion Policies

Authors: Zhuoran Li, Hai Zhong, Xun Wang, Qingxin Xia, Lihua Zhang, Longbo Huang

First: 2026-02-20T15:38:02+00:00 · Latest: 2026-02-20T15:38:02+00:00

Abstract

Online Multi-Agent Reinforcement Learning (MARL) is a prominent framework for efficient agent coordination. Crucially, enhancing policy expressiveness is pivotal for achieving superior performance. Diffusion-based generative models are well-positioned to meet this demand, having demonstrated remarkable expressiveness and multimodal representation in image generation and offline settings. Yet, their potential in online MARL remains largely under-explored. A major obstacle is that the intractable likelihoods of diffusion models impede entropy-based exploration and coordination. To tackle this challenge, we propose one of the first Online off-policy MARL frameworks using Diffusion policies (OMAD) to orchestrate coordination. Our key innovation is a relaxed policy objective that maximizes scaled joint entropy, facilitating effective exploration without relying on tractable likelihood. Complementing this, within the centralized training with decentralized execution (CTDE) paradigm, we employ a joint distributional value function to optimize decentralized diffusion policies. It leverages tractable entropy-augmented targets to guide the simultaneous updates of diffusion policies, thereby ensuring stable coordination. Extensive evaluations on MPE and MAMuJoCo establish our method as the new state-of-the-art across 10 diverse tasks, demonstrating a remarkable 2.5× to 5× improvement in sample efficiency.

中文标题/摘要

标题：扩散以协调：高效的在线多智能体扩散策略

在线多智能体强化学习（MARL）是实现智能体高效协调的重要框架。增强策略表达能力对于取得卓越性能至关重要。基于扩散的生成模型已在图像生成和离线设置中展示了显著的表达能力和多模态表示，因而很适合满足这一需求，但它们在在线MARL中的潜力在很大程度上尚未得到探索。主要障碍在于扩散模型的似然难以计算，阻碍了基于熵的探索和协调。为解决这一挑战，我们提出了首批使用扩散策略的在线离策略MARL框架之一（OMAD）来实现协调。我们的关键创新是一个最大化缩放联合熵的松弛策略目标，它无需依赖可计算的似然即可实现有效探索。与此相辅，在集中训练分散执行（CTDE）范式下，我们使用联合分布式价值函数来优化分散的扩散策略。它利用可计算的熵增强目标来指导各扩散策略的同步更新，从而确保协调的稳定性。在MPE和MAMuJoCo上的广泛评估表明，我们的方法在10个不同任务上达到了新的最先进水平，样本效率提升了2.5倍至5倍。

Summary / 总结

The research aims to enhance policy expressiveness in online Multi-Agent Reinforcement Learning (MARL) to improve agent coordination. The proposed OMAD framework uses diffusion policies and a relaxed policy objective that maximizes scaled joint entropy, which enables exploration without tractable likelihoods. Within the centralized training with decentralized execution (CTDE) paradigm, a joint distributional value function optimizes the decentralized diffusion policies. On MPE and MAMuJoCo, the method is reported as state-of-the-art across 10 diverse tasks, with a 2.5× to 5× improvement in sample efficiency.

论文旨在通过增强在线多智能体强化学习（MARL）中的策略表达性来提高智能体间的协调能力。提出了一种名为OMAD的在线离策略MARL框架，使用了扩散策略。关键创新在于通过最大化缩放后的联合熵来优化松弛的策略目标，从而在无需依赖可计算似然性的前提下实现有效的探索。OMAD在集中训练分散执行（CTDE）框架下使用联合分布价值函数来优化分散的扩散策略，确保协调的稳定性。实验结果表明，OMAD在MPE和MAMuJoCo上的样本效率提高了2.5到5倍。

Incomplete Multi-view Clustering via Hierarchical Semantic Alignment and Cooperative Completion

Authors: Xiaojian Ding, Lin Zhao, Xian Li, Xiaoying Zhu

Venue: NeurIPS 2025

First: 2025-10-14T02:58:10+00:00 · Latest: 2026-02-20T15:30:47+00:00

Comments: 13 pages, conference paper. Accepted to the Thirty-ninth Conference on Neural Information Processing Systems (NeurIPS 2025)

Abstract

Incomplete multi-view data, where certain views are entirely missing for some samples, poses significant challenges for traditional multi-view clustering methods. Existing deep incomplete multi-view clustering approaches often rely on static fusion strategies or two-stage pipelines, leading to suboptimal fusion results and error propagation issues. To address these limitations, this paper proposes a novel incomplete multi-view clustering framework based on Hierarchical Semantic Alignment and Cooperative Completion (HSACC). HSACC achieves robust cross-view fusion through a dual-level semantic space design. In the low-level semantic space, consistency alignment is ensured by maximizing mutual information across views. In the high-level semantic space, adaptive view weights are dynamically assigned based on the distributional affinity between individual views and an initial fused representation, followed by weighted fusion to generate a unified global representation. Additionally, HSACC implicitly recovers missing views by projecting aligned latent representations into high-dimensional semantic spaces and jointly optimizes reconstruction and clustering objectives, enabling cooperative learning of completion and clustering. Experimental results demonstrate that HSACC significantly outperforms state-of-the-art methods on five benchmark datasets. Ablation studies validate the effectiveness of the hierarchical alignment and dynamic weighting mechanisms, while parameter analysis confirms the model's robustness to hyperparameter variations. The code is available at https://github.com/XiaojianDing/2025-NeurIPS-HSACC.

中文标题/摘要

标题：基于层次语义对齐和协同完成的不完全多视图聚类

不完全多视图数据，其中某些视图对某些样本完全缺失，给传统的多视图聚类方法带来了重大挑战。现有的深度不完全多视图聚类方法通常依赖于静态融合策略或两阶段管道，导致融合结果次优和错误传播问题。为了解决这些局限性，本文提出了一种基于层次语义对齐和协同完成（HSACC）的新型不完全多视图聚类框架。HSACC通过双层语义空间设计实现了稳健的跨视图融合。在低层语义空间中，通过最大化视图间的互信息来确保一致性对齐。在高层语义空间中，根据各个视图与初始融合表示之间的分布亲和性动态分配视图权重，然后加权融合生成统一的全局表示。此外，HSACC通过将对齐的潜在表示投影到高维语义空间并联合优化重构和聚类目标，隐式恢复缺失视图，从而实现完成和聚类的协同学习。实验结果表明，HSACC在五个基准数据集上显著优于现有最先进的方法。消融研究验证了层次对齐和动态加权机制的有效性，而参数分析证实了该模型对超参数变化的鲁棒性。代码可在https://github.com/XiaojianDing/2025-NeurIPS-HSACC 获取。

Summary / 总结

This paper addresses the challenge of clustering incomplete multi-view data by proposing a novel framework called Hierarchical Semantic Alignment and Cooperative Completion (HSACC). HSACC uses a dual-level semantic space design to ensure consistency alignment and adaptive view weighting, and it implicitly recovers missing views through joint optimization of reconstruction and clustering objectives. Experimental results show that HSACC outperforms existing methods on five benchmark datasets, and ablation studies confirm the effectiveness of the hierarchical alignment and dynamic weighting mechanisms.

本文提出了一种新的框架Hierarchical Semantic Alignment and Cooperative Completion (HSACC)，以解决不完整多视图数据的聚类问题。HSACC 使用双层语义空间设计，在低层确保一致性对齐，并在高层动态分配适应性视图权重，从而实现稳健的跨视图融合。此外，通过联合优化重建和聚类目标隐式恢复缺失视图。实验结果表明，HSACC 在五个基准数据集上优于现有方法，并且消融研究和参数分析进一步验证了其有效性和鲁棒性。
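
The high-level fusion step described in the abstract (view weights assigned from the affinity between each view and an initial fused representation, followed by weighted fusion) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the HSACC implementation: the abstract does not specify the affinity measure, so negative squared Euclidean distance is used as a stand-in, the initial fused representation is taken to be the plain mean of the views, and the names `fuse_views` and `temperature` are hypothetical. Missing views are not handled here.

```python
import math


def fuse_views(views, temperature=1.0):
    """Weight each view by its affinity to an initial fused vector, then fuse.

    views: list of equal-length lists of floats, one embedding per view.
    Returns (weights, fused).
    """
    dim = len(views[0])
    k = len(views)
    # Initial fused representation: plain average of the views (assumption).
    initial = [sum(v[d] for v in views) / k for d in range(dim)]
    # Affinity stand-in: negative squared distance to the initial fusion.
    affinity = [-sum((v[d] - initial[d]) ** 2 for d in range(dim)) for v in views]
    # Numerically stable softmax over the affinities gives the view weights.
    top = max(affinity)
    exps = [math.exp((a - top) / temperature) for a in affinity]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted fusion into a single global representation.
    fused = [sum(w * v[d] for w, v in zip(weights, views)) for d in range(dim)]
    return weights, fused


# Hypothetical toy input: two views that agree and one outlier view.
weights, fused = fuse_views([[1.0, 0.0], [1.0, 0.0], [0.0, 5.0]])
print(weights)
print(fused)
```

In this toy input the two agreeing views receive equal weights that are larger than the outlier's, so the fused vector is pulled toward them. A static average would instead give all three views the same influence.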
