Today’s literature marks a shift toward operational maturity, focusing on the infrastructure of agentic systems and the practical application of LLMs to hard research problems. We are seeing a move away from pure prompting toward structural modifications of agent code and standardized, cross-platform tool definitions.
Advancing Mathematics Research with AI-Driven Formal Proof Search
This work demonstrates an LLM-based agent that resolves open mathematical conjectures by generating machine-checked proofs in Lean. The researchers resolved 9 of 353 open Erdős problems and 44 of 492 OEIS conjectures (roughly 2.5% and 8.9%), moving beyond toy benchmarks into active research contributions.
↳ This suggests that LLMs combined with formal verification have crossed the threshold into a viable, albeit costly, assistant for professional-grade research.
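For readers new to Lean, here is a toy illustration of the artifact such an agent has to produce: a formal statement plus a proof that Lean's kernel checks mechanically, so a wrong proof simply fails to compile. Both theorems are illustrative toys, not problems from the paper, and assume a recent Lean 4 toolchain without Mathlib.

```lean
-- Toy goal closed by citing an existing core lemma.
theorem toy_add_comm (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b

-- Toy goal closed by a decision procedure for linear arithmetic.
theorem toy_linear (n : Nat) (h : n > 3) : 2 * n > 6 := by
  omega
```

The research-grade version of this loop differs mainly in scale: the agent must find statements and proof steps that no existing lemma or tactic closes directly.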
MOSS: Self-Evolution through Source-Level Rewriting in Autonomous Agent Systems
MOSS pushes agent self-improvement past simple prompt or skill-file editing by allowing the agent to modify its own source code, including routing and state management logic. This addresses structural failure modes that are impossible to resolve through text-mutable artifacts alone.
↳ It represents a riskier but potentially far more capable form of recursive self-improvement, one that treats the agent’s core harness as mutable rather than fixed.
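The summary doesn't describe MOSS's actual rewrite mechanism, so what follows is only a minimal sketch of the general pattern. It assumes that the mutable unit is a routing function held as source text, and that a rewrite is adopted only if it compiles and passes regression checks. The router, the test cases, and every name below are hypothetical.

```python
import textwrap
from typing import Callable

# Hypothetical router kept as source text so the agent can rewrite it.
ROUTER_V1 = textwrap.dedent('''
    def route(task: str) -> str:
        # Original rule: send every task to the generalist worker.
        return "generalist"
''')

# Hypothetical rewrite that adds a structural branch for code tasks.
ROUTER_V2 = textwrap.dedent('''
    def route(task: str) -> str:
        # Rewritten rule: code-related tasks go to a specialist worker.
        return "coder" if ("test" in task or "bug" in task) else "generalist"
''')

# Behaviour any accepted rewrite must achieve (hypothetical cases).
REGRESSION_CASES = [
    ("fix the failing unit test", "coder"),
    ("summarize this memo", "generalist"),
]


def load_router(source: str) -> Callable[[str], str]:
    """Compile router source into a callable. A real system would sandbox this."""
    namespace: dict = {}
    exec(compile(source, "<router>", "exec"), namespace)
    return namespace["route"]  # KeyError if the source defines no `route`


def passes_regression(route: Callable[[str], str]) -> bool:
    return all(route(task) == expected for task, expected in REGRESSION_CASES)


class SelfRewritingAgent:
    def __init__(self, source: str) -> None:
        self.source = source
        self.route = load_router(source)

    def propose_rewrite(self, new_source: str) -> bool:
        """Adopt new router source only if it compiles, loads, and passes regression."""
        try:
            candidate = load_router(new_source)
            accepted = passes_regression(candidate)
        except Exception:
            accepted = False  # syntax errors, missing `route`, runtime errors: keep old code
        if accepted:
            self.source, self.route = new_source, candidate
        return accepted


agent = SelfRewritingAgent(ROUTER_V1)
print(agent.route("fix the failing unit test"))  # generalist
print(agent.propose_rewrite("def route(task:"))   # False: does not compile
print(agent.propose_rewrite(ROUTER_V2))           # True
print(agent.route("fix the failing unit test"))  # coder
```

The compile-and-verify gate is what separates this from blind self-modification. Without it, a single bad rewrite would replace working routing logic.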
Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention
This paper refines linear attention models by giving the erase and write operations on the recurrent state separate gates. The result is a more stable architecture designed to avoid the memory-scrambling issues common in standard delta-rule linear attention.
↳ Essential reading for those building or optimizing long-context recurrent-style transformers where state management is the primary bottleneck.
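For context, the original Gated DeltaNet update uses a single strength β for both erasing and writing: S_t = α_t S_{t-1}(I − β_t k_t k_tᵀ) + β_t v_t k_tᵀ. Below is a minimal sketch of what decoupling could look like. It assumes the decoupling amounts to separate scalar `erase` and `write` strengths, and the exact parameterization in Gated DeltaNet-2 may differ. Function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def read(S: Matrix, k: Vector) -> Vector:
    """Retrieve the value stored under key k: S @ k (S is d_v x d_k)."""
    return [sum(s * kj for s, kj in zip(row, k)) for row in S]


def decoupled_delta_step(S: Matrix, k: Vector, v: Vector,
                         alpha: float, erase: float, write: float) -> Matrix:
    """S_new = alpha * (S - erase * (S k) k^T) + write * v k^T.

    With erase == write == beta this equals the gated delta rule
    S_new = alpha * S (I - beta k k^T) + beta v k^T.
    """
    Sk = read(S, k)
    return [[alpha * (S[i][j] - erase * Sk[i] * k[j]) + write * v[i] * k[j]
             for j in range(len(k))]
            for i in range(len(v))]


k = [1.0, 0.0]                      # one unit-norm key, reused
v_old, v_new = [1.0, 0.0], [0.0, 1.0]
S0 = [[0.0, 0.0], [0.0, 0.0]]
S1 = decoupled_delta_step(S0, k, v_old, alpha=1.0, erase=1.0, write=1.0)

# Full erase + full write: the old value is replaced cleanly.
print(read(decoupled_delta_step(S1, k, v_new, 1.0, 1.0, 1.0), k))  # [0.0, 1.0]
# No erase (plain additive linear attention): old and new values get mixed.
print(read(decoupled_delta_step(S1, k, v_new, 1.0, 0.0, 1.0), k))  # [1.0, 1.0]
# Decoupled: clear the slot fully but write the new value at half strength.
print(read(decoupled_delta_step(S1, k, v_new, 1.0, 1.0, 0.5), k))  # [0.0, 0.5]
```

The middle case shows the "scrambling" in miniature: without erasure, a read under the same key returns a blend of the old and new values.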
LCGuard: Latent Communication Guard for Safe KV Sharing in Multi-Agent Systems
LCGuard identifies a security vulnerability in multi-agent systems where agents share KV caches to avoid recomputing context, which can unintentionally leak sensitive intermediate states. The authors introduce a guardrail mechanism that sanitizes these latent representations before another agent consumes them.
↳ As multi-agent collaboration becomes more common, raw KV sharing creates a large and poorly understood attack surface that the field is only now beginning to address.
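The summary doesn't specify how LCGuard sanitizes the cache, so this is not its mechanism. Instead, it is a minimal sketch of one simple way to scrub latents: orthogonally projecting out a single direction assumed to encode sensitive content, in the spirit of linear concept erasure. The `sensitive_direction` would in practice have to be estimated (for example, with a linear probe). That estimation step, whether to scrub keys as well as values, and all names here are assumptions.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def erase_direction(vectors: List[Vector], direction: Vector) -> List[Vector]:
    """Orthogonally project each vector onto the complement of `direction`."""
    norm = math.sqrt(dot(direction, direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    u = [x / norm for x in direction]
    out = []
    for vec in vectors:
        c = dot(vec, u)  # component along the (hypothetically) sensitive direction
        out.append([x - c * ui for x, ui in zip(vec, u)])
    return out


def sanitize_kv(keys: List[Vector], values: List[Vector],
                sensitive_direction: Vector) -> Tuple[List[Vector], List[Vector]]:
    """Scrub cached keys and values before another agent reuses them."""
    return (erase_direction(keys, sensitive_direction),
            erase_direction(values, sensitive_direction))


keys = [[1.0, 2.0], [0.5, -1.0]]
values = [[3.0, 4.0], [1.0, -2.0]]
safe_keys, safe_values = sanitize_kv(keys, values, [0.0, 2.0])
print(safe_values)  # [[3.0, 0.0], [1.0, 0.0]]
```

A single linear projection is a weak guarantee: information can survive nonlinearly or spread across other directions. This is part of why the attack surface above is hard to close.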
HarnessAPI: A Skill-First Framework for Unified Streaming APIs and MCP Tools
HarnessAPI provides a unified framework to define tools that work simultaneously as standard HTTP REST endpoints and MCP-compliant agent tools. It uses Pydantic schemas as a single source of truth to prevent the common drift between production API documentation and agent tool definitions.
↳ A pragmatic piece of glue code that solves the immediate pain of maintaining two disparate tool definitions for humans and agents.
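HarnessAPI's own interface isn't shown in the summary, so the sketch below doesn't use it. It is a dependency-free illustration of the single-source-of-truth idea: one schema feeds both an OpenAPI request body and an MCP tool descriptor (`name`, `description`, `inputSchema`). A standard-library dataclass stands in for the Pydantic model, and a deliberately tiny type mapper stands in for Pydantic's schema generation. The helper names and the `SearchRequest` tool are hypothetical.

```python
import json
from dataclasses import MISSING, dataclass, fields
from typing import Any, Dict, get_type_hints

# Minimal Python-to-JSON-Schema type map; Pydantic covers far more cases.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def json_schema(model: type) -> Dict[str, Any]:
    """Derive a JSON Schema object from a dataclass (stand-in for model_json_schema())."""
    hints = get_type_hints(model)
    properties = {f.name: {"type": _JSON_TYPES[hints[f.name]]} for f in fields(model)}
    required = [f.name for f in fields(model)
                if f.default is MISSING and f.default_factory is MISSING]
    return {"type": "object", "properties": properties, "required": required}


def openapi_path(path: str, model: type, summary: str) -> Dict[str, Any]:
    """REST view: an OpenAPI path item whose JSON request body uses the shared schema."""
    return {path: {"post": {
        "summary": summary,
        "requestBody": {"required": True,
                        "content": {"application/json": {"schema": json_schema(model)}}},
    }}}


def mcp_tool(name: str, model: type, description: str) -> Dict[str, Any]:
    """Agent view: an MCP tool descriptor built from the same schema."""
    return {"name": name, "description": description, "inputSchema": json_schema(model)}


@dataclass
class SearchRequest:  # hypothetical tool input
    query: str
    limit: int = 10


rest = openapi_path("/search", SearchRequest, "Search documents")
tool = mcp_tool("search", SearchRequest, "Search documents")
print(json.dumps(tool["inputSchema"]))
# {"type": "object", "properties": {"query": {"type": "string"}, "limit": {"type": "integer"}}, "required": ["query"]}
```

Because both views call the same `json_schema`, adding a field to `SearchRequest` updates the REST docs and the agent tool together. This is the drift the framework is meant to prevent.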
Deep Reinforcement Learning for Flexible Job Shop Scheduling with Random Job Arrivals
The authors apply PPO to the Flexible Job Shop Scheduling Problem (FJSP), specifically targeting stochastic job arrivals that traditional MILP solvers struggle to handle in real time. They report lower completion times than their compared methods in dynamic, unpredictable manufacturing environments.
↳ A solid example of DRL successfully replacing computationally expensive heuristics in high-stakes operational research tasks.
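The summary names PPO but not the paper's state, action, or reward design. As a reference point, here is a minimal sketch of PPO's clipped surrogate objective. It assumes that each action is a dispatching decision and that advantages come from a completion-time-based reward; that framing and all names are hypothetical.

```python
import math
from typing import Sequence


def ppo_clip_objective(new_logp: Sequence[float], old_logp: Sequence[float],
                       advantages: Sequence[float], eps: float = 0.2) -> float:
    """Batch mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A), to be maximized.

    r = pi_new(a|s) / pi_old(a|s) for each sampled action; here an action is a
    hypothetical dispatching decision such as "run operation O on machine M".
    """
    if not (len(new_logp) == len(old_logp) == len(advantages)) or not advantages:
        raise ValueError("need equally sized, non-empty batches")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)  # pessimistic of the two estimates
    return total / len(advantages)


# Two decisions: one the new policy favours more (good outcome, A > 0),
# one it favours less (bad outcome, A < 0). Clipping limits how much
# either change is rewarded.
obj = ppo_clip_objective(new_logp=[math.log(1.5), math.log(0.5)],
                         old_logp=[0.0, 0.0],
                         advantages=[1.0, -1.0])
print(round(obj, 6))  # 0.2
```

The clipping is what makes PPO attractive for a setting like this one: each update stays close to the policy that collected the data, which helps when arrivals make episodes noisy.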
📈 Patterns
The industry is moving toward ‘agentic hardening’: securing latent communication channels and standardizing tool-calling interfaces, even as agents gain the ability to rewrite their own foundational code.
Keep your KV caches clean and your agents in their containers. See you tomorrow.
