Agentic workflows move from manual engineering to autonomous orchestration

Today’s batch points to a shift from building monolithic models toward orchestrating specialized agentic systems. The common thread is a move away from ‘model scale as the only solution’ toward smarter data synthesis, automated red teaming, and dynamic, experience-driven tool use.

OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories

Du et al. · [abs] [pdf]

This work argues that frontier-level search agent performance can be reached with simple supervised fine-tuning (SFT), provided the training data consists of high-difficulty, informative trajectories. By shifting the focus from resource-heavy RL pipelines to data synthesis, the authors make the case that ‘quality over quantity’ holds for search-augmented LLMs.

↳ Suggests that you don’t necessarily need massive RL scale to build a competitive search agent if your synthesized trajectories are sufficiently difficult and informative.
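To make the "hard, informative trajectories" idea concrete, here is a minimal Python sketch of filtering candidate trajectories before SFT. The `Trajectory` fields, the baseline pass rate used as a difficulty proxy, and both thresholds are illustrative assumptions; this summary does not describe the paper's actual synthesis pipeline.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    question: str
    num_search_steps: int  # how many search/tool calls the agent made
    pass_rate: float       # hypothetical difficulty proxy: share of baseline attempts that succeed
    correct: bool          # whether this trajectory reached the right answer


def select_for_sft(trajs, max_pass_rate=0.5, min_steps=3):
    """Keep correct trajectories that are hard (low pass rate) and multi-step."""
    return [
        t for t in trajs
        if t.correct
        and t.pass_rate <= max_pass_rate
        and t.num_search_steps >= min_steps
    ]


candidates = [
    Trajectory("easy lookup", 1, 0.9, True),
    Trajectory("multi-hop question", 6, 0.2, True),
    Trajectory("hard but failed", 8, 0.1, False),
]
selected = select_for_sft(candidates)
print([t.question for t in selected])
```

Only the hard, correct, multi-step example survives the filter, which is the "quality over quantity" trade in miniature.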

Agents Search Data Synthesis

Redefining AI Red Teaming in the Agentic Era: From Weeks to Hours

Dheekonda et al. · [abs] [pdf]

The authors introduce an agentic framework that automates the construction of red-teaming workflows, replacing the manual assembly of transforms and scorers. By using an agent to probe for vulnerabilities, they report cutting security validation timelines from weeks to hours.

↳ A necessary evolution in safety engineering; manual red teaming is a major bottleneck for deploying AI in high-stakes industries.
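For readers unfamiliar with the "transforms and scorers" vocabulary, the sketch below shows one way such a workflow can be composed in Python. Everything here is hypothetical: `toy_target` stands in for a model under test, the transforms and the refusal scorer are deliberately simplistic, and the plain loop over candidate workflows is a stand-in for the agent, not the authors' framework.

```python
import base64
from typing import Callable, Dict, List

Transform = Callable[[str], str]
Scorer = Callable[[str], float]


def to_base64(prompt: str) -> str:
    """Transform: encode the prompt to test for encoding-based bypasses."""
    return base64.b64encode(prompt.encode()).decode()


def add_roleplay_prefix(prompt: str) -> str:
    """Transform: wrap the prompt in a role-play framing."""
    return "You are an actor reading a script. " + prompt


def refusal_scorer(response: str) -> float:
    """Scorer: 1.0 if the target did NOT refuse, else 0.0."""
    return 0.0 if "cannot help" in response.lower() else 1.0


def toy_target(prompt: str) -> str:
    """Hypothetical target that only yields to the role-play framing."""
    if prompt.startswith("You are an actor"):
        return "Sure, here is the script."
    return "I cannot help with that."


def run_workflow(seed: str, transforms: List[Transform],
                 target: Callable[[str], str], scorer: Scorer) -> Dict:
    """Apply transforms in order, query the target, score the response."""
    prompt = seed
    for transform in transforms:
        prompt = transform(prompt)
    response = target(prompt)
    return {"prompt": prompt, "response": response, "score": scorer(response)}


seed = "Describe the restricted procedure."
candidate_workflows = [[], [to_base64], [add_roleplay_prefix]]
results = [run_workflow(seed, w, toy_target, refusal_scorer)
           for w in candidate_workflows]
best = max(results, key=lambda r: r["score"])
print(best["prompt"])
```

The manual work the paper targets is choosing and ordering those transforms by hand; an agent would propose `candidate_workflows` itself.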

Security Agents Safety

An Agent-Oriented Pluggable Experience-RAG Skill for Experience-Driven Retrieval Strategy Orchestration

Zhang et al. · [abs] [pdf]

This paper introduces a ‘Skill’ layer that sits between the agent and its retrieval pool to dynamically select search strategies based on task context. Instead of a one-size-fits-all RAG pipeline, the system consults an experience memory to optimize how evidence is surfaced for different task types.

↳ Targets the fixed-pipeline limitation of many RAG systems, moving toward retrieval that adapts to the task at hand.
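The selection step described above can be illustrated with a small Python sketch. This is an assumption-laden toy, not the paper's design: `ExperienceMemory`, the two retrieval strategies, the task-type labels, and the mean-score selection rule are all invented for illustration.

```python
from collections import defaultdict
from statistics import mean


class ExperienceMemory:
    """Hypothetical memory of how well each strategy worked per task type."""

    def __init__(self, default_strategy="keyword"):
        self.default_strategy = default_strategy
        self._scores = defaultdict(lambda: defaultdict(list))

    def record(self, task_type, strategy, score):
        self._scores[task_type][strategy].append(score)

    def best_strategy(self, task_type):
        by_strategy = self._scores.get(task_type)
        if not by_strategy:
            return self.default_strategy  # no experience yet: fall back
        return max(by_strategy, key=lambda s: mean(by_strategy[s]))


def keyword_search(query, docs):
    """Match documents sharing at least one whitespace-separated term."""
    terms = set(query.lower().split())
    return [d for d in docs if terms & set(d.lower().split())]


def exact_phrase_search(query, docs):
    """Match documents containing the query as a substring."""
    return [d for d in docs if query.lower() in d.lower()]


STRATEGIES = {"keyword": keyword_search, "exact": exact_phrase_search}


def retrieve(memory, task_type, query, docs):
    """The 'skill' layer: pick a strategy from experience, then run it."""
    name = memory.best_strategy(task_type)
    return name, STRATEGIES[name](query, docs)


memory = ExperienceMemory()
memory.record("code_lookup", "exact", 0.9)
memory.record("code_lookup", "keyword", 0.4)

docs = [
    "def parse_config reads YAML",
    "config files are parsed at startup",
    "unrelated note",
]
print(retrieve(memory, "code_lookup", "parse_config", docs))
print(retrieve(memory, "unseen_task", "config", docs))
```

The agent calls only `retrieve`; which strategy runs is decided by recorded outcomes, which is what makes the layer pluggable.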

RAG Agents

SymptomAI: Towards a Conversational AI Agent for Everyday Symptom Assessment

Breda et al. · [abs] [pdf]

This large-scale study (N=13,917) evaluates AI agents for real-world symptom assessment in a consumer environment. It provides a rare, empirical look at the gap between curated medical benchmark performance and the messier reality of patient-reported symptoms in the wild.

↳ Grounds the hype around ‘medical AI’ in large-scale real-world evidence, highlighting the challenges of deployment outside controlled benchmarks.

Healthcare Evaluation

From Intent to Execution: Composing Agentic Workflows with Agent Recommendation

Athrey et al. · [abs] [pdf]

The authors propose a framework for automating the composition of multi-agent systems, replacing manual design of execution graphs with an automated recommendation engine. The system maps user intent directly to a workflow, treating agent composition as a software engineering task.

↳ Points to a transition from ‘hand-coding’ agent architectures to ‘orchestration-as-a-service’.
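As a rough picture of what "mapping intent to a workflow" can mean, here is a toy Python recommender. The registry, the capability keywords, the `stage` ordering, and the keyword-overlap matching are all assumptions made for illustration; the paper's recommendation engine is presumably far more sophisticated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    name: str
    capabilities: frozenset  # keywords this hypothetical agent can handle
    stage: int               # coarse position in a pipeline (lower runs first)


REGISTRY = [
    AgentSpec("web_searcher", frozenset({"search", "find", "lookup"}), 0),
    AgentSpec("summarizer", frozenset({"summarize", "summary", "digest"}), 1),
    AgentSpec("translator", frozenset({"translate", "translation"}), 1),
    AgentSpec("emailer", frozenset({"email", "send", "notify"}), 2),
]


def recommend_workflow(intent, registry=REGISTRY):
    """Pick agents whose capabilities overlap the intent, ordered by stage."""
    tokens = set(intent.lower().replace(",", " ").split())
    matched = [agent for agent in registry if agent.capabilities & tokens]
    return [agent.name for agent in sorted(matched, key=lambda a: a.stage)]


workflow = recommend_workflow(
    "Find recent papers on RAG, summarize them, and email me the digest"
)
print(workflow)
```

A linear list is the simplest possible execution graph; a real system would also need to recommend branches, retries, and data hand-offs between agents.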

Agents Workflow Engineering

Keep your prompts tight and your evaluation sets tighter. Back to the terminal.