arXiv Paper Digest

Geometry of Reason: Spectral Signatures of Valid Mathematical Reasoning

Authors: Valentin Noël

First: 2026-01-02T18:49:37+00:00 · Latest: 2026-01-02T18:49:37+00:00

Comments: 58 pages, 19 figures, Under Review

Abstract

We present a training-free method for detecting valid mathematical reasoning in large language models through spectral analysis of attention patterns. By treating attention matrices as adjacency matrices of dynamic graphs over tokens, we extract four interpretable spectral diagnostics, the Fiedler value (algebraic connectivity), high-frequency energy ratio (HFER), graph signal smoothness, and spectral entropy, that exhibit statistically significant differences between valid and invalid mathematical proofs. Experiments across seven transformer models from four independent architectural families (Meta Llama, Alibaba Qwen, Microsoft Phi, and Mistral AI) demonstrate that this spectral signature produces effect sizes up to Cohen's $d = 3.30$ ($p < 10^{-116}$), enabling 85.0--95.6\% classification accuracy under rigorous evaluation, with calibrated thresholds reaching 93--95\% on the full dataset. The method requires no training data, fine-tuning, or learned classifiers: a single threshold on a spectral metric suffices for high accuracy. Through systematic label correction, we discover that the spectral method detects logical coherence rather than compiler acceptance, identifying mathematically valid proofs that formal verifiers reject due to technical failures. We further identify an architectural dependency: Mistral-7B's Sliding Window Attention shifts the discriminative signal from HFER to late-layer Smoothness ($d = 2.09$, $p_{\text{MW}} = 1.16 \times 10^{-48}$), revealing that attention mechanism design affects which spectral features capture reasoning validity. These findings establish spectral graph analysis as a principled framework for reasoning verification with immediate applications to hallucination detection and AI safety monitoring.
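The abstract names the four diagnostics without giving their formulas. As a rough illustration only, the sketch below computes them under common graph-signal-processing definitions, which may differ from the paper's. Everything here is an assumption: the function name `spectral_diagnostics`, the `hf_fraction` cutoff, symmetrizing the attention matrix, using the combinatorial Laplacian, and using one scalar per token as the graph signal.

```python
import numpy as np

def spectral_diagnostics(attn, signal, hf_fraction=0.5):
    """Illustrative spectral diagnostics for one attention matrix.

    attn:   (n, n) attention weights, read as a weighted graph over tokens.
    signal: (n,) one scalar per token (hypothetical choice of graph signal).
    hf_fraction: share of the spectrum counted as "high frequency" (assumed).
    """
    attn = np.asarray(attn, dtype=float)
    signal = np.asarray(signal, dtype=float)

    # Symmetrize so the graph is undirected, and drop self-loops (assumption).
    w = 0.5 * (attn + attn.T)
    np.fill_diagonal(w, 0.0)

    # Combinatorial graph Laplacian L = D - W.
    lap = np.diag(w.sum(axis=1)) - w

    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(lap)

    # Graph Fourier transform of the signal and its energy per frequency.
    coeffs = eigvecs.T @ signal
    energy = coeffs ** 2
    total = energy.sum()
    if total == 0:
        raise ValueError("signal has zero energy")

    # Fiedler value: second-smallest Laplacian eigenvalue.
    fiedler = float(eigvals[1])

    # HFER: energy in the top `hf_fraction` of frequencies over total energy.
    cutoff = int(len(eigvals) * (1.0 - hf_fraction))
    hfer = float(energy[cutoff:].sum() / total)

    # Smoothness as a Rayleigh quotient x^T L x / x^T x (lower = smoother).
    smoothness = float(signal @ lap @ signal) / float(signal @ signal)

    # Spectral entropy: Shannon entropy of the normalized energy distribution.
    p = energy / total
    p = p[p > 0]
    entropy = float(-np.sum(p * np.log(p)))

    return {"fiedler": fiedler, "hfer": hfer,
            "smoothness": smoothness, "spectral_entropy": entropy}
```

In practice `attn` would come from one head or a head-averaged layer of the model being inspected; which layers, heads, and token signals the paper actually uses is not stated in the abstract.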

Summary

The study introduces a training-free method for detecting valid mathematical reasoning in large language models from the spectral properties of their attention patterns. Treating attention matrices as adjacency matrices of dynamic graphs over tokens, it extracts four spectral diagnostics (Fiedler value, high-frequency energy ratio (HFER), graph signal smoothness, and spectral entropy) that differ significantly between valid and invalid proofs. Across seven transformer models from four architectural families, a single threshold on one spectral metric yields 85.0–95.6% classification accuracy under rigorous evaluation, and calibrated thresholds reach 93–95% on the full dataset. The signal tracks logical coherence rather than compiler acceptance, and it is architecture-dependent: with Mistral-7B's Sliding Window Attention, the discriminative feature shifts from HFER to late-layer smoothness.
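The two quantities reported here, an effect size (Cohen's d) and the accuracy of a single-threshold rule, can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code: the function names, the pooled-standard-deviation form of d, and the `valid_below` direction flag are all assumptions.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation (one common definition)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    pooled_var = ((len(a) - 1) * a.var(ddof=1) +
                  (len(b) - 1) * b.var(ddof=1)) / (len(a) + len(b) - 2)
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))

def threshold_accuracy(scores, labels, threshold, valid_below=True):
    """Accuracy of a one-threshold rule on a single spectral metric.

    labels: True for valid proofs, False for invalid ones.
    valid_below: predict "valid" when score < threshold. Which side counts
    as valid depends on the metric, so it is left as a flag (assumption).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pred = scores < threshold if valid_below else scores > threshold
    return float(np.mean(pred == labels))
```

Given per-proof metric values for the valid and invalid groups, `cohens_d(valid_scores, invalid_scores)` measures how far apart the groups sit, and `threshold_accuracy` scores a candidate cutoff. Calibrating the cutoff on the same data it is evaluated on would inflate the accuracy, which may be why the two accuracy ranges are reported separately.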

Adapting Natural Language Processing Models Across Jurisdictions: A pilot Study in Canadian Cancer Registries

Authors: Jonathan Simkin, Lovedeep Gondara, Zeeshan Rizvi, Gregory Doyle, Jeff Dowden, Dan Bond, Desmond Martin, Raymond Ng

First: 2026-01-02T18:46:19+00:00 · Latest: 2026-01-02T18:46:19+00:00