arXiv Paper Digest

Snapshot: 20260219_0354

Operationalising the Superficial Alignment Hypothesis via Task Complexity

Authors: Tomás Vergara-Browne, Darshan Patil, Ivan Titov, Siva Reddy, Tiago Pimentel, Marius Mosbach

First: 2026-02-17T18:59:39+00:00 · Latest: 2026-02-17T18:59:39+00:00

Abstract

The superficial alignment hypothesis (SAH) posits that large language models learn most of their knowledge during pre-training, and that post-training merely surfaces this knowledge. The SAH, however, lacks a precise definition, which has led to (i) different and seemingly orthogonal arguments supporting it, and (ii) important critiques to it. We propose a new metric called task complexity: the length of the shortest program that achieves a target performance on a task. In this framework, the SAH simply claims that pre-trained models drastically reduce the complexity of achieving high performance on many tasks. Our definition unifies prior arguments supporting the SAH, interpreting them as different strategies to find such short programs. Experimentally, we estimate the task complexity of mathematical reasoning, machine translation, and instruction following; we then show that these complexities can be remarkably low when conditioned on a pre-trained model. Further, we find that pre-training enables access to strong performances on our tasks, but it can require programs of gigabytes of length to access them. Post-training, on the other hand, collapses the complexity of reaching this same performance by several orders of magnitude. Overall, our results highlight that task adaptation often requires surprisingly little information -- often just a few kilobytes.
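
The definition above can be written in symbols. This is only a sketch in assumed notation: the symbols $T$, $\delta$, $\theta$, $p$, $|p|$, and $\mathrm{perf}_T$ are illustrative and are not taken from the paper.

```latex
% Assumed notation: T is a task, \delta a target performance level,
% p a program of length |p| (e.g. in bytes), and \theta a pre-trained model.
% p(\theta) denotes a program that is given access to \theta.
\[
  C_T(\delta) = \min_{p \,:\, \mathrm{perf}_T(p) \ge \delta} |p|,
  \qquad
  C_T(\delta \mid \theta) = \min_{p \,:\, \mathrm{perf}_T(p(\theta)) \ge \delta} |p|
\]
% Under this reading, the SAH claims that
% C_T(\delta \mid \theta) \ll C_T(\delta) for many tasks T.
```

Read this way, the kilobyte and gigabyte figures in the abstract are estimates of the conditional quantity, taken with respect to a pre-trained or a post-trained model.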

Summary

The study operationalizes the superficial alignment hypothesis (SAH) by defining task complexity as the length of the shortest program that achieves a target performance on a task. Conditioned on a pre-trained model, the estimated complexity of mathematical reasoning, machine translation, and instruction following can be remarkably low, although reaching strong performance from pre-training alone can still require programs that are gigabytes long. Post-training collapses that complexity by several orders of magnitude, so task adaptation often needs only a few kilobytes of information.

Dex4D: Task-Agnostic Point Track Policy for Sim-to-Real Dexterous Manipulation

Authors: Yuxuan Kuang, Sungjae Park, Katerina Fragkiadaki, Shubham Tulsiani

First: 2026-02-17T18:59:31+00:00 · Latest: 2026-02-17T18:59:31+00:00

Comments: Project page: https://dex4d.github.io/

Abstract

Learning generalist policies capable of accomplishing a plethora of everyday tasks remains an open challenge in dexterous manipulation. In particular, collecting large-scale manipulation data via real-world teleoperation is expensive and difficult to scale. While learning in simulation provides a feasible alternative, designing multiple task-specific environments and rewards for training is similarly challenging. We propose Dex4D, a framework that instead leverages simulation for learning task-agnostic dexterous skills that can be flexibly recomposed to perform diverse real-world manipulation tasks. Specifically, Dex4D learns a domain-agnostic 3D point track conditioned policy capable of manipulating any object to any desired pose. We train this 'Anypose-to-Anypose' policy in simulation across thousands of objects with diverse pose configurations, covering a broad space of robot-object interactions that can be composed at test time. At deployment, this policy can be zero-shot transferred to real-world tasks without finetuning, simply by prompting it with desired object-centric point tracks extracted from generated videos. During execution, Dex4D uses online point tracking for closed-loop perception and control. Extensive experiments in simulation and on real robots show that our method enables zero-shot deployment for diverse dexterous manipulation tasks and yields consistent improvements over prior baselines. Furthermore, we demonstrate strong generalization to novel objects, scene layouts, backgrounds, and trajectories, highlighting the robustness and scalability of the proposed framework.

Summary

Dex4D is a framework that learns task-agnostic dexterous manipulation skills in simulation and recomposes them for diverse real-world tasks. It trains a 3D point-track-conditioned policy, across thousands of objects with diverse pose configurations, to move any object to any desired pose. At deployment, the policy transfers zero-shot to real-world tasks, without finetuning, when prompted with desired object-centric point tracks extracted from generated videos. Experiments in simulation and on real robots show consistent improvements over prior baselines and strong generalization to novel objects, scene layouts, backgrounds, and trajectories.

Hunt Globally: Wide Search AI Agents for Drug Asset Scouting in Investing, Business Development, and Competitive Intelligence

Authors: Alisa Vinogradova, Vlad Vinogradov, Luba Greenwood, Ilya Yasny, Dmitry Kobyzev, Shoman Kasbekar, Kong Nguyen, Dmitrii Radkevich, Roman Doronin, Andrey Doronichev

First: 2026-02-16T18:57:49+00:00 · Latest: 2026-02-17T18:58:56+00:00