Today’s batch highlights a shift toward rigorous infrastructure for AI agents, from simple prompting to systematic skill optimization and memory auditing. We also see a compelling attempt to ground scaling laws in information theory rather than empirical curve-fitting.
SkillOpt: Executive Strategy for Self-Evolving Agent Skills
SkillOpt treats agent skills as a trainable external state rather than ephemeral prompts, using a dedicated optimizer model to perform controlled add/delete/replace edits on skills based on rollout performance. This frames skill evolution as a systematic optimization process rather than heuristic self-revision. It successfully improves agent performance over multiple iterations by maintaining structured, version-controlled procedural knowledge.
↳ If you are building agents that need to get better at repetitive tasks, moving to explicit, optimized skill libraries is the next logical step beyond monolithic fine-tuning.
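To make the add/delete/replace loop concrete, here is a minimal sketch in Python. It is not SkillOpt's implementation: `SkillEdit`, `apply_edit`, `optimize_skills`, and the `propose`/`evaluate` callables are hypothetical names, and the "keep an edit only if the rollout score improves" rule is an assumption about what "controlled" edits could mean.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class SkillEdit:
    """One proposed change to the skill library (hypothetical schema)."""
    op: str                      # "add", "delete", or "replace"
    name: str                    # skill identifier
    text: Optional[str] = None   # skill body, required for add/replace


def apply_edit(skills: Dict[str, str], edit: SkillEdit) -> Dict[str, str]:
    """Return a new library with one edit applied; the input is not mutated."""
    updated = dict(skills)
    if edit.op == "add":
        if edit.name in updated or edit.text is None:
            raise ValueError(f"cannot add skill {edit.name!r}")
        updated[edit.name] = edit.text
    elif edit.op == "replace":
        if edit.name not in updated or edit.text is None:
            raise ValueError(f"cannot replace skill {edit.name!r}")
        updated[edit.name] = edit.text
    elif edit.op == "delete":
        if edit.name not in updated:
            raise ValueError(f"cannot delete skill {edit.name!r}")
        del updated[edit.name]
    else:
        raise ValueError(f"unknown op {edit.op!r}")
    return updated


def optimize_skills(
    skills: Dict[str, str],
    propose: Callable[[Dict[str, str], float], SkillEdit],  # stands in for the optimizer model
    evaluate: Callable[[Dict[str, str]], float],            # stands in for rollout scoring
    iterations: int = 3,
) -> List[Dict[str, str]]:
    """Greedy loop: accept an edit only if it raises the score (assumed rule).

    Returns every accepted version, oldest first, so earlier libraries can be restored.
    """
    history = [dict(skills)]
    best_score = evaluate(history[-1])
    for _ in range(iterations):
        edit = propose(history[-1], best_score)
        candidate = apply_edit(history[-1], edit)
        score = evaluate(candidate)
        if score > best_score:
            history.append(candidate)
            best_score = score
    return history
```

In a real system `propose` would call the optimizer model and `evaluate` would run agent rollouts; the returned history plays the role of the version-controlled skill state.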
LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws
This work introduces the Shannon Scaling Law, which models LLM training as information transmission over a noisy channel in order to explain non-monotonic phenomena such as catastrophic overtraining. By treating model parameters as channel bandwidth and tokens as signal power, the authors give a theoretical account of performance degradation that standard power laws miss. The payoff, if it holds, is a principled way to predict when adding compute becomes counterproductive.
↳ A rare piece of theory that actually explains real-world engineering headaches like quantization-induced degradation and compute-to-data scaling limits.
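For reference, the classical result behind the channel analogy is the Shannon-Hartley theorem. The paper's own law is not reproduced in this digest; reading $B$ as parameter count and $S$ as token budget follows the summary above, and anything beyond that is an assumption on my part.

```latex
C = B \log_2\!\left(1 + \frac{S}{N}\right)
```

Here $C$ is channel capacity in bits per second, $B$ is bandwidth in hertz, and $S/N$ is the signal-to-noise power ratio. For a fixed noise power $N$, $C$ only grows with $B$ and $S$, so the non-monotonic behavior presumably comes from how the paper models the noise term, which this summary does not spell out.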
Agentic Proving for Program Verification
The authors apply an agentic approach built on Claude Code to the CLEVER benchmark for Lean 4 program verification. They report a 98.8% specification generation rate and an 87.5% success rate in verifying implementations against ground-truth specs. The results suggest that agentic workflows are becoming reliable enough for formal, logic-heavy environments.
↳ Formal verification is the ultimate stress test for agents; these numbers suggest we are reaching a point where LLMs can serve as high-quality assistants for software engineers working in safety-critical domains.
MemAudit: Post-hoc Auditing of Poisoned Agent Memory via Causal Attribution and Structural Anomaly Detection
MemAudit provides a framework to detect and isolate malicious memory injections in LLM agents using causal attribution. Rather than relying on simple prompt filtering, it analyzes the agent’s memory bank to identify which specific records are steering problematic behavior. This is a necessary evolution in securing RAG and persistent-memory agent architectures.
↳ Security in agentic systems is currently the Wild West; post-hoc auditability is essential for any enterprise deployment involving long-term memory.
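The idea of asking which records are steering the behavior can be illustrated with plain leave-one-out ablation. This is a stand-in technique, not MemAudit's attribution method (which the summary does not detail); `harm_score`, `leave_one_out_attribution`, and `flag_records` are hypothetical names.

```python
from typing import Callable, List, Sequence


def leave_one_out_attribution(
    memory: Sequence[str],
    harm_score: Callable[[Sequence[str]], float],  # hypothetical: higher means worse behavior
) -> List[float]:
    """Effect of each record: how much the harm score drops when it is removed."""
    baseline = harm_score(memory)
    effects = []
    for i in range(len(memory)):
        ablated = [rec for j, rec in enumerate(memory) if j != i]
        effects.append(baseline - harm_score(ablated))
    return effects


def flag_records(
    memory: Sequence[str],
    harm_score: Callable[[Sequence[str]], float],
    threshold: float,
) -> List[int]:
    """Indices of records whose removal lowers the harm score by more than threshold."""
    effects = leave_one_out_attribution(memory, harm_score)
    return [i for i, effect in enumerate(effects) if effect > threshold]
```

Leave-one-out needs one extra agent run per record and misses records that only cause harm in combination, which is one reason a dedicated auditing method is worth having.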
CVSearch: Empowering Multimodal LLMs with Cognitive Visual Search for High-Resolution Image Perception
CVSearch proposes an adaptive, training-free ‘Assess-then-Search’ workflow to handle high-resolution inputs in MLLMs. By dynamically deciding when to use visual experts versus scanning, it avoids the typical trade-off between computational redundancy and semantic fragmentation. It is an efficient, plug-and-play solution for models struggling with dense visual detail.
↳ Moving high-res perception from ‘just throw more tokens at it’ to a selective search paradigm is vital for efficiency in embodied AI.
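A rough sketch of what an assess-then-search control flow could look like, in Python. The digest does not describe CVSearch's actual decision rules, so the ordering here (answer from the global view if possible, else try a visual expert, else scan tiles) is an assumption, and every name (`can_answer_globally`, `expert_box`, `tile_score`) is a hypothetical placeholder for a model call.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def tile_boxes(width: int, height: int, tile: int) -> List[Box]:
    """Cover the image with non-overlapping tiles, row by row; edge tiles are clipped."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in range(0, height, tile)
        for x in range(0, width, tile)
    ]


def assess_then_search(
    width: int,
    height: int,
    can_answer_globally: Callable[[], bool],   # assess step on the downsampled image
    expert_box: Callable[[], Optional[Box]],   # e.g. a detector; None if it finds nothing
    tile_score: Callable[[Box], float],        # relevance of a tile to the question
    tile: int = 512,
) -> Optional[Box]:
    """Return the region to zoom into, or None if no zoom is needed."""
    if can_answer_globally():
        return None                            # skip the search entirely
    box = expert_box()
    if box is not None:
        return box                             # cheap path: the expert localized the target
    # Fallback: exhaustive scan, keep the highest-scoring tile.
    return max(tile_boxes(width, height, tile), key=tile_score)
```

The efficiency argument lives in the early returns: the expensive tile scan only runs when both cheaper paths fail.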
📈 Patterns
The field is maturing from ‘can it do it’ to ‘how can we optimize and secure it,’ with a clear emphasis on treating agent components as modular, auditable, and theoretically grounded systems.
Back to the terminal. The theory is nice, but I’m looking forward to seeing if SkillOpt holds up in production environments.
