Formalizing Agent Evolution and Decoding Scaling Laws

Today’s batch points to more rigorous infrastructure for AI agents: systematic skill optimization and memory auditing in place of ad hoc prompting. There is also a compelling attempt to ground scaling laws in information theory rather than in empirical curve-fitting alone.

SkillOpt: Executive Strategy for Self-Evolving Agent Skills

Yang et al. · [abs] [pdf]

SkillOpt treats agent skills as a trainable external state rather than ephemeral prompts, using a dedicated optimizer model to perform controlled add/delete/replace edits on skills based on rollout performance. This frames skill evolution as a systematic optimization process rather than heuristic self-revision. The authors report that agent performance improves over multiple iterations when procedural knowledge is kept structured and version-controlled.

↳ If you are building agents that need to get better at repetitive tasks, moving to explicit, optimized skill libraries is the next logical step beyond monolithic fine-tuning.
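
To make the add/delete/replace loop concrete, here is a minimal sketch of what "skills as external state" could look like. It is not the paper's implementation: the names `SkillEdit`, `apply_edit`, and `optimize_skills` are hypothetical, `propose` stands in for the optimizer model, `evaluate` stands in for rollout scoring, and the rule of keeping an edit only when the score improves is my own assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Skills = Dict[str, str]  # skill name -> procedural text


@dataclass(frozen=True)
class SkillEdit:
    """One controlled edit to the skill library (hypothetical structure)."""
    op: str                      # "add", "delete", or "replace"
    name: str
    text: Optional[str] = None   # unused for "delete"


def apply_edit(skills: Skills, edit: SkillEdit) -> Skills:
    """Return a new skill library with one edit applied; the input is not mutated."""
    updated = dict(skills)
    if edit.op == "add" and edit.name not in updated and edit.text is not None:
        updated[edit.name] = edit.text
    elif edit.op == "replace" and edit.name in updated and edit.text is not None:
        updated[edit.name] = edit.text
    elif edit.op == "delete" and edit.name in updated:
        del updated[edit.name]
    else:
        raise ValueError(f"invalid edit: {edit}")
    return updated


def optimize_skills(
    skills: Skills,
    propose: Callable[[Skills, float], SkillEdit],  # stand-in for the optimizer model
    evaluate: Callable[[Skills], float],            # stand-in for rollout scoring
    steps: int,
) -> List[Skills]:
    """Greedy loop: keep an edit only if the score improves (an assumption).

    Returns every accepted version, oldest first, so any version can be restored.
    """
    history: List[Skills] = [dict(skills)]
    best = evaluate(history[-1])
    for _ in range(steps):
        candidate = apply_edit(history[-1], propose(history[-1], best))
        score = evaluate(candidate)
        if score > best:
            history.append(candidate)
            best = score
    return history
```

Returning the full history rather than only the final library is what makes the state "version-controlled" in this sketch: a regression can be rolled back by picking an earlier entry.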

Agents Optimization Skills

LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

Ouyang et al. · [abs] [pdf]

This work introduces the Shannon Scaling Law, mapping LLM training to information transmission over a noisy channel to explain non-monotonic phenomena like catastrophic overtraining. By treating model parameters as channel bandwidth and tokens as signal power, the authors provide a theoretical basis for performance degradation that standard power laws miss. It offers a more robust framework for predicting when adding compute becomes counterproductive.

↳ A rare piece of theory that actually explains real-world engineering headaches like quantization-induced degradation and compute-to-data scaling limits.
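
For readers who want the underlying formula: the classical Shannon-Hartley theorem gives channel capacity as C = B * log2(1 + S/N). The sketch below computes it and then shows how capacity can peak and fall if noise grows faster than signal. Only `shannon_capacity` is the textbook formula. `toy_capacity_curve` and its quadratic noise term are my own illustration, not the paper's Shannon Scaling Law, and the digest does not say how the authors model noise.

```python
import math


def shannon_capacity(bandwidth: float, signal_power: float, noise_power: float) -> float:
    """Shannon-Hartley capacity: C = B * log2(1 + S / N)."""
    if bandwidth <= 0 or noise_power <= 0 or signal_power < 0:
        raise ValueError("bandwidth and noise must be positive, signal non-negative")
    return bandwidth * math.log2(1.0 + signal_power / noise_power)


def toy_capacity_curve(params: float, tokens: float, knee: float) -> float:
    """Hypothetical analogy only: parameters play bandwidth, tokens play signal.

    Noise is assumed to grow as 1 + (tokens / knee)^2, chosen purely so that
    the signal-to-noise ratio peaks at tokens == knee and declines afterwards.
    """
    noise = 1.0 + (tokens / knee) ** 2
    return shannon_capacity(params, tokens, noise)
```

Under this toy noise model, `toy_capacity_curve(1.0, t, knee=100.0)` is larger at `t = 100` than at either `t = 10` or `t = 1000`, which is the kind of non-monotonic shape a plain power law cannot produce.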

Theory Scaling Laws Information Theory

Agentic Proving for Program Verification

Sosso et al. · [abs] [pdf]

The authors apply an agentic approach using Claude Code to the CLEVER benchmark for Lean 4 program verification. They achieve a 98.8% specification generation rate and 87.5% success in verifying implementations against ground-truth specs. The results suggest that agentic workflows are becoming reliable enough for formal, logic-heavy environments.

↳ Formal verification is the ultimate stress test for agents; these numbers suggest we are reaching a point where LLMs can serve as high-quality assistants for software engineers working in safety-critical domains.

Agents Formal Verification Coding

MemAudit: Post-hoc Auditing of Poisoned Agent Memory via Causal Attribution and Structural Anomaly Detection

Tan et al. · [abs] [pdf]

MemAudit provides a framework to detect and isolate malicious memory injections in LLM agents using causal attribution. Rather than relying on simple prompt filtering, it analyzes the agent’s memory bank to identify which specific records are steering problematic behavior. This is a necessary evolution in securing RAG and persistent-memory agent architectures.

↳ Security in agentic systems is currently the Wild West; post-hoc auditability is essential for any enterprise deployment involving long-term memory.
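
One simple way to picture "which records are steering the behavior" is leave-one-out ablation: drop each memory record, rerun the agent, and see how much the harmful behavior changes. The sketch below does exactly that. It is a generic attribution technique I am substituting for illustration, not MemAudit's actual method, and it does not cover the structural anomaly detection part. `run_agent` and `harm_score` are hypothetical callables you would supply.

```python
from typing import Callable, Dict, List, Sequence


def attribute_records(
    memory: Sequence[str],
    run_agent: Callable[[List[str]], str],   # hypothetical: memory -> agent output
    harm_score: Callable[[str], float],      # hypothetical: output -> harm in [0, 1]
) -> Dict[int, float]:
    """Leave-one-out effect of each record on the harm score.

    effect[i] = harm(with all records) - harm(with record i removed).
    A large positive value means record i is pushing the agent toward harm.
    """
    records = list(memory)
    baseline = harm_score(run_agent(records))
    effects: Dict[int, float] = {}
    for i in range(len(records)):
        ablated = records[:i] + records[i + 1:]
        effects[i] = baseline - harm_score(run_agent(ablated))
    return effects


def flag_records(effects: Dict[int, float], threshold: float) -> List[int]:
    """Indices whose removal lowers harm by at least `threshold`."""
    return sorted(i for i, effect in effects.items() if effect >= threshold)
```

Leave-one-out costs one agent run per record and misses records that only cause harm in combination, so treat it as a baseline for intuition rather than a complete audit.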

Security Agents Memory

CVSearch: Empowering Multimodal LLMs with Cognitive Visual Search for High-Resolution Image Perception

Li et al. · [abs] [pdf]

CVSearch proposes an adaptive, training-free ‘Assess-then-Search’ workflow to handle high-resolution inputs in MLLMs. By dynamically deciding when to use visual experts versus scanning, it avoids the typical trade-off between computational redundancy and semantic fragmentation. It is an efficient, plug-and-play solution for models struggling with dense visual detail.

↳ Moving high-res perception from ‘just throw more tokens at it’ to a selective search paradigm is vital for efficiency in embodied AI.
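
The control flow behind an assess-then-search loop can be sketched in a few lines. This is my reading of the summary, not CVSearch's code: `is_answerable` stands in for the model's self-assessment, `locate` stands in for a visual expert that may return a region, and the fallback to a fixed tile grid is an assumption about what "scanning" means.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def grid_tiles(width: int, height: int, tile: int) -> List[Box]:
    """Cover the image with non-overlapping tiles, clipped at the borders."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in range(0, height, tile)
        for x in range(0, width, tile)
    ]


def regions_to_inspect(
    width: int,
    height: int,
    is_answerable: Callable[[Box], bool],    # hypothetical self-assessment
    locate: Callable[[Box], Optional[Box]],  # hypothetical visual expert
    tile: int = 512,
) -> List[Box]:
    """Decide which regions the model should look at.

    1. Assess: if the full view is enough, search nothing further.
    2. Expert: otherwise, ask the expert for one targeted region.
    3. Scan: if the expert finds nothing, fall back to a full tile scan.
    """
    full: Box = (0, 0, width, height)
    if is_answerable(full):
        return [full]
    hit = locate(full)
    if hit is not None:
        return [hit]
    return grid_tiles(width, height, tile)
```

The efficiency argument is visible in the return values: the two early exits hand back a single region, and only the last branch pays for every tile.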

Vision-Language Efficiency

Back to the terminal. The theory is nice, but I’m looking forward to seeing if SkillOpt holds up in production environments.