Today’s batch highlights a clear shift in focus: from simple LLM prompting to sophisticated multi-agent orchestration and the mechanistic understanding of model internals. We are moving beyond general-purpose models toward domain-specific toolkits and self-evolving agent architectures.
MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems
MASPO introduces an iterative framework to jointly refine prompts across multiple agents to align individual roles with system-level goals. By treating prompt optimization as a joint problem rather than an isolated task, the authors mitigate the misalignment that typically plagues complex multi-agent cooperation.
↳ This is a necessary step for moving multi-agent systems from fragile prototypes to reliable, orchestrated production workflows.
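To make "joint rather than isolated" concrete, here is a minimal sketch of optimizing prompts against a single system-level score. It is not MASPO's algorithm: the coordinate-wise search, the agent names, the candidate prompts, and toy_system_score are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, Dict, List

Prompts = Dict[str, str]  # agent name -> prompt text


def joint_optimize(
    candidates: Dict[str, List[str]],
    system_score: Callable[[Prompts], float],
    max_rounds: int = 5,
) -> Prompts:
    """Coordinate-wise search: change one agent's prompt at a time, but
    always judge the change by the score of the whole system."""
    current = {agent: options[0] for agent, options in candidates.items()}
    best = system_score(current)
    for _ in range(max_rounds):
        improved = False
        for agent, options in candidates.items():
            for option in options:
                trial = {**current, agent: option}
                score = system_score(trial)
                if score > best:
                    current, best, improved = trial, score, True
        if not improved:  # no single-prompt change helps any more
            break
    return current


def toy_system_score(prompts: Prompts) -> float:
    """Hypothetical metric: the system only works when the planner's output
    format matches what the executor expects."""
    planner_json = "JSON" in prompts["planner"]
    executor_json = "JSON" in prompts["executor"]
    return 1.0 if planner_json == executor_json else 0.0


# Hypothetical candidate prompts; the first entry is each agent's starting prompt.
CANDIDATES = {
    "planner": ["Write the plan as prose.", "Write the plan as JSON."],
    "executor": ["Parse the plan as JSON.", "Read the plan as prose."],
}

if __name__ == "__main__":
    print(joint_optimize(CANDIDATES, toy_system_score))
```

Each starting prompt is reasonable on its own, yet the pair scores zero because the formats disagree. Only a system-level score exposes that kind of mismatch.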
AI Co-Mathematician: Accelerating Mathematicians with Agentic AI
This work presents an agentic workbench designed for the messy, iterative reality of professional mathematical research. It manages state, tracks failed hypotheses, and integrates computational tools, effectively acting as an asynchronous partner rather than a simple code generator.
↳ It sets a high bar for domain-specific agents, showing how to structure interaction loops for open-ended creative tasks.
The Structural Origin of Attention Sink: Variance Discrepancy, Super Neurons, and Dimension Disparity
The authors provide a rigorous mechanistic explanation for the ‘attention sink’ phenomenon, tracing it to variance discrepancies in value aggregation and the influence of ‘super neurons’ in FFN layers. By mapping this to specific architectural components, they demystify one of the most persistent quirks of transformer inference.
↳ Understanding these architectural ‘sinks’ is critical for building more efficient, stable long-context models.
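For readers new to the term, the sketch below measures the symptom rather than the paper's proposed cause: how much causal attention mass lands on the first token. The function names and the toy score matrices are assumptions for illustration, with no real model involved.

```python
import math
from typing import List

Matrix = List[List[float]]


def causal_softmax(scores: Matrix) -> Matrix:
    """Row-wise softmax where query i may only attend to keys 0..i."""
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]
        peak = max(visible)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        weights.append([e / total for e in exps] + [0.0] * (len(row) - i - 1))
    return weights


def sink_mass(weights: Matrix, sink_index: int = 0, skip: int = 1) -> float:
    """Average attention that queries place on the sink position.

    The first `skip` queries are ignored because query 0 can only attend
    to itself, which would inflate the average.
    """
    rows = weights[skip:]
    return sum(row[sink_index] for row in rows) / len(rows)


# Toy scores: every query gives key 0 a large logit (sink-like pattern).
sinky = [[4.0 if j == 0 else 0.0 for j in range(6)] for _ in range(6)]
# Baseline: all logits equal, so attention is spread evenly over visible keys.
uniform = [[0.0] * 6 for _ in range(6)]

if __name__ == "__main__":
    print("sink-like:", sink_mass(causal_softmax(sinky)))
    print("uniform:  ", sink_mass(causal_softmax(uniform)))
```

Applied to real attention maps, the same statistic would be computed per head and per layer. Here it only contrasts a sink-like pattern with an even one.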
Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key
The authors propose ScaleLogic, a synthetic benchmark to isolate proof-planning depth and logical expressiveness. They demonstrate that reinforcement learning performance is highly sensitive to the interaction between reasoning depth and the complexity of the underlying logic.
↳ This helps clarify the limits of current RL training regimes and suggests that reasoning scaling isn’t just about ‘more data’ but about task structure.
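As a rough picture of what a depth-controlled synthetic task can look like, here is a toy generator with a single depth knob and a forward-chaining checker. This is not ScaleLogic: the rule format and function names are hypothetical, and the expressiveness axis the paper studies is not modeled at all.

```python
from typing import List, Set, Tuple

Rule = Tuple[Tuple[str, ...], str]  # (premises, conclusion)


def make_chain_problem(depth: int) -> Tuple[Set[str], List[Rule], str]:
    """Build a problem whose goal needs exactly `depth` inference steps."""
    facts = {"p0"}
    rules = [((f"p{i}",), f"p{i + 1}") for i in range(depth)]
    return facts, rules, f"p{depth}"


def proof_depth(facts: Set[str], rules: List[Rule], goal: str) -> int:
    """Forward-chain in rounds; return how many rounds reach the goal,
    or -1 if the goal cannot be derived."""
    known = set(facts)
    rounds = 0
    while goal not in known:
        new = {
            conclusion
            for premises, conclusion in rules
            if conclusion not in known and all(p in known for p in premises)
        }
        if not new:
            return -1
        known |= new
        rounds += 1
    return rounds


if __name__ == "__main__":
    for d in (1, 4, 8):
        print(d, proof_depth(*make_chain_problem(d)))
```

Because the depth is set by construction, a drop in accuracy at larger depths can be attributed to planning depth rather than to anything else in the data.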
SkillOS: Learning Skill Curation for Self-Evolving Agents
SkillOS addresses the bottleneck of agent stagnation by automating the curation of reusable skills from past interactions. It moves beyond short-horizon learning by training a meta-policy to distill experience into a persistent, evolvable skill library.
↳ This is a foundational concept for long-lived agents that need to compound performance over time without human intervention.
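A persistent skill library can be pictured as a small data structure that records outcomes and prunes what does not work. The sketch below is an assumption-laden illustration: SkillOS trains a meta-policy to do the curation, whereas this version substitutes a hand-written threshold rule, and every class, method, and skill name here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Skill:
    name: str
    steps: List[str]
    uses: int = 0
    successes: int = 0

    @property
    def success_rate(self) -> float:
        return self.successes / self.uses if self.uses else 0.0


@dataclass
class SkillLibrary:
    skills: Dict[str, Skill] = field(default_factory=dict)

    def record(self, name: str, steps: List[str], succeeded: bool) -> None:
        """Store a skill on first sight, then update its usage statistics."""
        skill = self.skills.setdefault(name, Skill(name, list(steps)))
        skill.uses += 1
        skill.successes += int(succeeded)

    def curate(self, min_uses: int = 2, min_rate: float = 0.5) -> List[str]:
        """Stand-in for a learned curation policy: drop skills that have been
        tried at least `min_uses` times and succeed less than `min_rate`."""
        removed = [
            name
            for name, skill in self.skills.items()
            if skill.uses >= min_uses and skill.success_rate < min_rate
        ]
        for name in removed:
            del self.skills[name]
        return removed


if __name__ == "__main__":
    library = SkillLibrary()
    for _ in range(3):
        library.record("login", ["open page", "fill form", "submit"], True)
    for _ in range(2):
        library.record("blind_click", ["click first button"], False)
    print("pruned:", library.curate())
    print("kept:  ", sorted(library.skills))
```

The library outlives any single episode, so what survives curation is available to every later task.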
BAMI: Training-Free Bias Mitigation in GUI Grounding
BAMI identifies that GUI grounding failures are largely driven by precision bias at high resolutions and ambiguity in dense interfaces. The authors propose a training-free inference method that dynamically adjusts predictions based on masked attribution to resolve these biases.
↳ This is a practical, compute-efficient fix for the vision-language models behind GUI agents, and it avoids the need for large-scale retraining.
📈 Patterns
Today’s papers point away from ‘model-centric’ progress and toward ‘system-centric’ design, where interpretability and modular agent architectures are treated as production requirements.
Keep your prompts tight and your weights interpreted. See you tomorrow.
