Formal proof search takes the lead while autonomous agents gain source-level self-evolution

Today’s literature marks a shift toward operational maturity, focusing on the infrastructure of agentic systems and the practical application of LLMs to hard research problems. We are seeing a move away from pure prompting toward structural modifications of agent code and standardized, cross-platform tool definitions.

Advancing Mathematics Research with AI-Driven Formal Proof Search

Tsoukalas et al. · [abs] [pdf]

This work demonstrates an LLM-based agent that resolves open mathematical conjectures by generating machine-checked proofs in Lean. The authors report resolving 9 of 353 open Erdős problems and 44 of 492 OEIS conjectures, moving beyond toy benchmarks into active research contributions.

↳ This suggests that LLMs paired with formal verification have become a viable, albeit costly, assistant for research-level mathematics.
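
For readers who have not seen Lean, here is what "a proof in Lean" means in practice. The statement below is a toy example chosen for illustration and is not taken from the paper; the theorem name toy_add_comm is made up. The point is only that the proof term is checked by the Lean kernel, so an agent's output is either accepted or rejected with no room for a plausible-sounding but wrong argument.

```lean
-- Toy statement (not from the paper): addition on natural numbers commutes.
-- `Nat.add_comm` is a lemma from Lean 4's core library.
theorem toy_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

A proof-search agent would be producing terms or tactic scripts like the last line, at far greater length, and letting the checker decide whether they count.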

Formal Methods Mathematics LLMs

MOSS: Self-Evolution through Source-Level Rewriting in Autonomous Agent Systems

Cai et al. · [abs] [pdf]

MOSS pushes agent self-improvement past prompt or skill-file editing by letting the agent modify its own source code, including routing and state-management logic. This targets structural failure modes that cannot be fixed through text-mutable artifacts alone.

↳ It represents a shift toward riskier but potentially more capable recursive self-improvement, one that treats the agent’s core harness as dynamic rather than immutable.

Agents Software Engineering

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Hatamizadeh et al. · [abs] [pdf]

This paper refines linear attention models by decoupling the gate mechanisms for erasure and writing in the recurrent state. The result is a more stable architecture that prevents the memory-scrambling issues common in standard Delta-rule linear attention.

↳ Essential reading for those building or optimizing long-context recurrent-style transformers where state management is the primary bottleneck.
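
To make "decoupling erase and write" concrete, here is a minimal sketch of a delta-rule state update with two separate gates. This is an assumption about what decoupling could look like, not the paper's actual formulation: the update S <- S (I - erase * k k^T) + write * v k^T, the function names, and the scalar gates are all illustrative. Setting erase equal to write recovers the classic coupled delta rule.

```python
def delta_step(S, k, v, erase, write):
    """One recurrent update: S <- S (I - erase * k k^T) + write * v k^T.

    S is a d_v x d_k matrix (list of rows), k is assumed unit-norm.
    `erase` and `write` are hypothetical scalar gates in [0, 1].
    """
    # What the state currently returns for key k, i.e. S k.
    read = [sum(row[j] * k[j] for j in range(len(k))) for row in S]
    return [
        [S[i][j] + (write * v[i] - erase * read[i]) * k[j] for j in range(len(k))]
        for i in range(len(S))
    ]


def read_out(S, q):
    """Query the state with q, i.e. compute S q."""
    return [sum(row[j] * q[j] for j in range(len(q))) for row in S]


# Usage: store a value, then clear it without writing anything new.
S = [[0.0, 0.0], [0.0, 0.0]]
k = [1.0, 0.0]
S = delta_step(S, k, [2.0, 3.0], erase=1.0, write=1.0)  # store [2, 3] under k
S_cleared = delta_step(S, k, [5.0, 5.0], erase=1.0, write=0.0)  # erase only
```

With a single shared gate, the erase-only step above is not expressible: clearing a slot and writing to it always happen with the same strength.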

Transformers Linear Attention Architecture

LCGuard: Latent Communication Guard for Safe KV Sharing in Multi-Agent Systems

Asif et al. · [abs] [pdf]

LCGuard identifies a security vulnerability in multi-agent systems where agents share KV caches to save time and, in doing so, accidentally leak sensitive intermediate states. The authors introduce a guardrail mechanism that sanitizes these latent representations before another agent consumes them.

↳ As multi-agent collaboration becomes the standard, raw KV sharing creates a large, poorly understood attack surface that we are only now beginning to guard.

Security Multi-Agent Systems KV Cache

HarnessAPI: A Skill-First Framework for Unified Streaming APIs and MCP Tools

Jose, E. · [abs] [pdf]

HarnessAPI provides a unified framework to define tools that work simultaneously as standard HTTP REST endpoints and MCP-compliant agent tools. It uses Pydantic schemas as a single source of truth to prevent the common drift between production API documentation and agent tool definitions.

↳ A pragmatic piece of glue code that solves the immediate pain of maintaining two disparate tool definitions for humans and agents.
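
The single-source-of-truth idea is easy to sketch. The example below is not HarnessAPI's API: it uses a stdlib dataclass as a stand-in for a Pydantic model so that it runs without dependencies, and SearchInput, json_schema, as_mcp_tool, as_rest_route, and the /tools/ path are all hypothetical names. The only external fact relied on is that an MCP tool definition carries name, description, and inputSchema fields.

```python
from dataclasses import MISSING, dataclass, fields

# Mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


@dataclass
class SearchInput:
    """Hypothetical tool input; a Pydantic model would play this role."""
    query: str
    limit: int = 10


def json_schema(model):
    """Derive a small JSON Schema object from a dataclass definition."""
    props, required = {}, []
    for f in fields(model):
        props[f.name] = {"type": _JSON_TYPES[f.type]}
        # A field with no default value is required.
        if f.default is MISSING and f.default_factory is MISSING:
            required.append(f.name)
    return {"type": "object", "properties": props, "required": required}


def as_mcp_tool(name, description, model):
    """MCP-style tool definition built from the shared schema."""
    return {"name": name, "description": description, "inputSchema": json_schema(model)}


def as_rest_route(name, model):
    """Hypothetical REST route description built from the same schema."""
    return {"method": "POST", "path": f"/tools/{name}", "requestBody": json_schema(model)}


tool = as_mcp_tool("search", "Search the index.", SearchInput)
route = as_rest_route("search", SearchInput)
```

Because both definitions are generated from SearchInput, adding or renaming a field changes the HTTP contract and the agent-facing tool in the same commit, which is the drift the framework is meant to prevent.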

Tool Use Infrastructure

Deep Reinforcement Learning for Flexible Job Shop Scheduling with Random Job Arrivals

Tang et al. · [abs] [pdf]

The authors apply PPO to the Flexible Job Shop Scheduling problem, specifically targeting stochastic job arrivals that traditional MILP solvers struggle to handle in real time. They report superior performance in minimizing completion times in dynamic, unpredictable manufacturing environments.

↳ A solid example of DRL successfully replacing computationally expensive solvers in high-stakes operations research tasks.
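
To see the shape of the decision problem, here is a heavily simplified sketch: jobs arrive at random times, each job is a single operation that can run on any machine with a machine-dependent duration, and a policy picks the machine at arrival. None of this comes from the paper. The simulator, its parameters, and both policies are illustrative assumptions; a trained PPO policy would take the place of the hand-written rules.

```python
import random


def simulate(policy, n_jobs=20, n_machines=3, arrival_rate=1.0, seed=0):
    """Run one episode and return the makespan (time the last job finishes)."""
    rng = random.Random(seed)
    free_at = [0.0] * n_machines  # when each machine next becomes idle
    t = 0.0
    makespan = 0.0
    for _ in range(n_jobs):
        t += rng.expovariate(arrival_rate)  # random inter-arrival time
        # Processing time of this job on each candidate machine.
        proc = [rng.uniform(1.0, 5.0) for _ in range(n_machines)]
        m = policy(t, free_at, proc)  # the scheduling decision
        start = max(t, free_at[m])
        free_at[m] = start + proc[m]
        makespan = max(makespan, free_at[m])
    return makespan


def earliest_finish(t, free_at, proc):
    """Greedy rule: pick the machine on which this job would finish first."""
    return min(range(len(proc)), key=lambda m: max(t, free_at[m]) + proc[m])


def always_first(t, free_at, proc):
    """Naive baseline: send every job to machine 0."""
    return 0


greedy_makespan = simulate(earliest_finish)
naive_makespan = simulate(always_first)
```

The policy only sees the current time, machine availability, and the new job's durations, which is why offline solvers that need the full job list in advance fit this setting poorly.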

DRL Operations Research

Keep your KV caches clean and your agents in their containers. See you tomorrow.