arXiv Paper Digest

Alterbute: Editing Intrinsic Attributes of Objects in Images

Authors: Tal Reiss, Daniel Winter, Matan Cohen, Alex Rav-Acha, Yael Pritch, Ariel Shamir, Yedid Hoshen

First: 2026-01-15T18:59:53+00:00 · Latest: 2026-01-15T18:59:53+00:00

Comments: Project page is available at https://talreiss.github.io/alterbute/

Abstract

We introduce Alterbute, a diffusion-based method for editing an object's intrinsic attributes in an image. We allow changing color, texture, material, and even the shape of an object, while preserving its perceived identity and scene context. Existing approaches either rely on unsupervised priors that often fail to preserve identity or use overly restrictive supervision that prevents meaningful intrinsic variations. Our method relies on: (i) a relaxed training objective that allows the model to change both intrinsic and extrinsic attributes conditioned on an identity reference image, a textual prompt describing the target intrinsic attributes, and a background image and object mask defining the extrinsic context. At inference, we restrict extrinsic changes by reusing the original background and object mask, thereby ensuring that only the desired intrinsic attributes are altered; (ii) Visual Named Entities (VNEs) - fine-grained visual identity categories (e.g., "Porsche 911 Carrera") that group objects sharing identity-defining features while allowing variation in intrinsic attributes. We use a vision-language model to automatically extract VNE labels and intrinsic attribute descriptions from a large public image dataset, enabling scalable, identity-preserving supervision. Alterbute outperforms existing methods on identity-preserving object intrinsic attribute editing.

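Title / Abstract (translated from Chinese)

Title: Alterbute: Editing Intrinsic Attributes of Objects in Images

We introduce Alterbute, a diffusion-based method for editing the intrinsic attributes of an object in an image. It can change an object's color, texture, material, and even shape while preserving its perceived identity and the scene context. Existing methods either rely on unsupervised priors that often fail to preserve identity, or use overly strict supervision that rules out meaningful intrinsic variation. Our method relies on: (i) a relaxed training objective that lets the model change both intrinsic and extrinsic attributes, conditioned on an identity reference image, a text prompt describing the target intrinsic attributes, and a background image and object mask that define the extrinsic context; (ii) Visual Named Entities (VNEs), fine-grained visual identity categories (e.g., "Porsche 911 Carrera") that group objects sharing identity-defining features while allowing their intrinsic attributes to vary. We use a vision-language model to automatically extract VNE labels and intrinsic attribute descriptions from a large public image dataset, enabling scalable, identity-preserving supervision. Alterbute outperforms existing methods on identity-preserving editing of object intrinsic attributes.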

Summary

Alterbute is a diffusion-based method for editing an object's intrinsic attributes in an image (color, texture, material, and even shape) while preserving its identity and scene context. It is trained with a relaxed objective that lets both intrinsic and extrinsic attributes change, supervised by Visual Named Entities (VNEs), fine-grained identity categories extracted automatically by a vision-language model. At inference, reusing the original background and object mask restricts the edit to the intrinsic attributes. The authors report that it outperforms existing methods on identity-preserving intrinsic attribute editing.

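Alterbute is a diffusion-based method for editing the intrinsic attributes of an object in an image, such as color, texture, material, and shape, while preserving the object's identity and the scene context. It uses a relaxed training objective and Visual Named Entities (VNEs) to let intrinsic attributes change while keeping extrinsic attributes consistent. It outperforms existing methods at intrinsic attribute editing that keeps object identity intact.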
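
The abstract's two ingredients can be pictured with a small data-handling sketch. It is only one reading of the abstract, not the authors' pipeline: the `Record` and `Conditioning` types, their field names, and both helper functions are hypothetical, and images, backgrounds, and masks are stood in for by plain strings. Training examples pair two different images that share a VNE label, so the reference supplies identity while the prompt, background, and mask come from the target. At inference, the source image's own background and mask are reused and only the prompt changes.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Record:
    """One annotated object (hypothetical fields; strings stand in for images)."""
    image_id: str
    vne: str          # Visual Named Entity label, e.g. "Porsche 911 Carrera"
    attributes: str   # text describing the intrinsic attributes
    background: str   # placeholder for the background image
    mask: str         # placeholder for the object mask


@dataclass(frozen=True)
class Conditioning:
    identity_ref: str  # image that supplies the identity
    prompt: str        # target intrinsic attributes
    background: str    # extrinsic context
    mask: str          # extrinsic context


def training_examples(records):
    """Pair each record with every other record that shares its VNE label."""
    by_vne = defaultdict(list)
    for rec in records:
        by_vne[rec.vne].append(rec)
    examples = []
    for group in by_vne.values():
        # Ordered pairs: each image serves once as reference, once as target.
        for ref, target in permutations(group, 2):
            cond = Conditioning(ref.image_id, target.attributes,
                                target.background, target.mask)
            examples.append((cond, target.image_id))
    return examples


def inference_conditioning(source, target_prompt):
    """Reuse the source's own background and mask; only the prompt changes."""
    return Conditioning(source.image_id, target_prompt,
                        source.background, source.mask)


records = [
    Record("a", "Porsche 911 Carrera", "red glossy paint", "bg_a", "mask_a"),
    Record("b", "Porsche 911 Carrera", "matte black paint", "bg_b", "mask_b"),
    Record("c", "Eames Lounge Chair", "brown leather", "bg_c", "mask_c"),
]
examples = training_examples(records)
edit = inference_conditioning(records[0], "matte black paint")
```

Record "c" yields no training example here because no other record shares its label, which illustrates why the grouping depends on a large labeled dataset.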

MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching

Authors: Changle Qu, Sunhao Dai, Hengyi Cai, Jun Xu, Shuaiqiang Wang, Dawei Yin

First: 2026-01-15T18:59:23+00:00 · Latest: 2026-01-15T18:59:23+00:00

Abstract

Tool-Integrated Reasoning (TIR) empowers large language models (LLMs) to tackle complex tasks by interleaving reasoning steps with external tool interactions. However, existing reinforcement learning methods typically rely on outcome- or trajectory-level rewards, assigning uniform advantages to all steps within a trajectory. This coarse-grained credit assignment fails to distinguish effective tool calls from redundant or erroneous ones, particularly in long-horizon multi-turn scenarios. To address this, we propose MatchTIR, a framework that introduces fine-grained supervision via bipartite matching-based turn-level reward assignment and dual-level advantage estimation. Specifically, we formulate credit assignment as a bipartite matching problem between predicted and ground-truth traces, utilizing two assignment strategies to derive dense turn-level rewards. Furthermore, to balance local step precision with global task success, we introduce a dual-level advantage estimation scheme that integrates turn-level and trajectory-level signals, assigning distinct advantage values to individual interaction turns. Extensive experiments on three benchmarks demonstrate the superiority of MatchTIR. Notably, our 4B model surpasses the majority of 8B competitors, particularly in long-horizon and multi-turn tasks. Our code is available at https://github.com/quchangle1/MatchTIR.
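
To make the matching idea concrete, here is a minimal sketch of turn-level rewards derived from a one-to-one matching between predicted and ground-truth tool calls. It is an illustration under stated assumptions, not the MatchTIR implementation: the `call_similarity` scoring rule, the dictionary layout of a tool call, and the function names are all hypothetical, and the abstract mentions two assignment strategies without detailing them, so only a hard one-to-one assignment is shown. The matching is found by brute force over permutations, which is practical only for short traces.

```python
from itertools import permutations


def call_similarity(pred, gold):
    """Hypothetical score in [0, 1] between a predicted and a gold tool call.

    0 if the tool names differ; otherwise 0.5 for the name plus 0.5 times
    the fraction of gold arguments the prediction reproduces exactly.
    """
    if pred["tool"] != gold["tool"]:
        return 0.0
    if not gold["args"]:
        return 1.0
    hits = sum(pred["args"].get(k) == v for k, v in gold["args"].items())
    return 0.5 + 0.5 * hits / len(gold["args"])


def turn_rewards(pred_calls, gold_calls):
    """Return one reward per predicted turn from a maximum-weight matching.

    The shorter side is padded with zero-similarity dummies so that every
    permutation describes a complete one-to-one assignment.
    """
    n = max(len(pred_calls), len(gold_calls))

    def sim(i, j):
        if i >= len(pred_calls) or j >= len(gold_calls):
            return 0.0  # dummy partner: the turn is effectively unmatched
        return call_similarity(pred_calls[i], gold_calls[j])

    # Brute force: keep the assignment with the highest total similarity.
    best = max(permutations(range(n)),
               key=lambda perm: sum(sim(i, perm[i]) for i in range(n)))
    return [sim(i, best[i]) for i in range(len(pred_calls))]


gold = [
    {"tool": "search", "args": {"q": "x"}},
    {"tool": "calc", "args": {"expr": "1+1"}},
]
pred = [
    {"tool": "search", "args": {"q": "x"}},     # correct call
    {"tool": "search", "args": {"q": "x"}},     # redundant repeat
    {"tool": "calc", "args": {"expr": "2+2"}},  # right tool, wrong argument
]
rewards = turn_rewards(pred, gold)
print(rewards)  # [1.0, 0.0, 0.5]
```

Because each ground-truth call can be matched only once, the redundant second search receives no reward, which is the distinction a uniform trajectory-level reward cannot make. For longer traces, a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` would replace the permutation loop.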

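Title / Abstract (translated from Chinese)

Title: MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching

Tool-Integrated Reasoning (TIR) lets large language models (LLMs) handle complex tasks by interleaving reasoning steps with external tool interactions. However, existing reinforcement learning methods typically rely on outcome-level or trajectory-level rewards and assign the same advantage to every step in a trajectory. This coarse-grained credit assignment cannot distinguish effective tool calls from redundant or erroneous ones, especially in long-horizon, multi-turn scenarios. To address this, we propose MatchTIR, a framework that introduces fine-grained supervision through turn-level reward assignment based on bipartite matching and through dual-level advantage estimation. Specifically, we formulate credit assignment as a bipartite matching problem between predicted and ground-truth traces and use two assignment strategies to derive dense turn-level rewards. In addition, to balance local step precision with global task success, we introduce a dual-level advantage estimation scheme that combines turn-level and trajectory-level signals and assigns a distinct advantage value to each interaction turn. Extensive experiments on three benchmarks demonstrate the advantages of MatchTIR. Notably, our 4B model outperforms most 8B competitors, especially on long-horizon, multi-turn tasks. Our code is available at https://github.com/quchangle1/MatchTIR.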

Summary

MatchTIR improves reinforcement learning for Tool-Integrated Reasoning (TIR) by providing fine-grained supervision through bipartite matching-based turn-level reward assignment and dual-level advantage estimation. This lets training distinguish effective tool calls from redundant or erroneous ones, especially in long-horizon multi-turn scenarios. Experiments on three benchmarks show that the 4B MatchTIR model surpasses most 8B competitors, particularly in long-horizon and multi-turn tasks.

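MatchTIR addresses coarse-grained credit assignment in tool-integrated reasoning by proposing a fine-grained supervision framework. It uses bipartite matching to assign turn-level rewards and introduces a dual-level advantage estimation scheme to balance local precision against global success. Experiments show that MatchTIR outperforms most competitors on three benchmarks, especially on long-horizon and multi-turn tasks.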
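
The dual-level idea can be sketched as follows. The abstract does not give the formula, so this is a hypothetical blend rather than the paper's estimator: it assumes a group of trajectories sampled for the same task, standardizes outcome rewards across the group and turn rewards across all turns in the group, and mixes the two with an assumed `weight` parameter. The function names are likewise illustrative.

```python
from statistics import mean, pstdev

EPS = 1e-8  # guards against division by zero when all rewards are equal


def standardize(values):
    """Shift to zero mean and scale to (approximately) unit variance."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / (sigma + EPS) for v in values]


def dual_level_advantages(outcome_rewards, turn_rewards, weight=0.5):
    """Blend a trajectory-level and a turn-level signal (hypothetical rule).

    outcome_rewards: one float per trajectory in the group.
    turn_rewards: for each trajectory, a list with one float per turn.
    Returns advantages with the same nested shape as turn_rewards.
    """
    traj_adv = standardize(outcome_rewards)
    # Turn rewards are standardized over every turn of every trajectory.
    flat = [r for turns in turn_rewards for r in turns]
    mu, sigma = mean(flat), pstdev(flat)
    return [
        [(1 - weight) * a_traj + weight * (r - mu) / (sigma + EPS)
         for r in turns]
        for a_traj, turns in zip(traj_adv, turn_rewards)
    ]


# Two sampled trajectories: the first succeeds, the second fails.
outcomes = [1.0, 0.0]
per_turn = [[1.0, 0.0, 0.5], [0.0, 0.0]]
adv = dual_level_advantages(outcomes, per_turn)
```

In this example the three turns of the successful trajectory receive three different advantages, and a zero-reward turn in the successful trajectory still scores higher than a zero-reward turn in the failed one, so both levels of signal are visible in the result.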

From One-to-One to Many-to-Many: Dynamic Cross-Layer Injection for Deep Vision-Language Fusion

Authors: Cheng Chen, Yuyu Guo, Pengpeng Zeng, Jingkuan Song, Peng Di, Hang Yu, Lianli Gao

First: 2026-01-15T18:59:10+00:00 · Latest: 2026-01-15T18:59:10+00:00