When organizations invest in AI agents, they typically face a crucial question: How do we make these systems better over time?
The instinctive answer—"upgrade the AI model"—often leads teams down an expensive path that delivers diminishing returns. Recent research from Stanford, UIUC, and other leading AI institutions points to a more strategic alternative: one approach achieves comparable results with 70× less data and 33× less training time.
This isn't an incremental improvement. It's a fundamentally different cost structure that changes how organizations should think about AI agent investments.
The Real-World Challenge
Consider a scenario we see frequently in our AI Design Sprint™ workshops:
A financial services team spends months fine-tuning their document processing agent, optimizing the underlying AI model end-to-end. After investing significant resources—150,000+ training examples and substantial engineering cycles—they discover the retrieval system (how the agent finds relevant information) was the actual bottleneck.
A focused optimization of the retrieval component—achievable in weeks with a fraction of the data—would have delivered most of the improvement.
The model was never the problem.
This pattern repeats across industries. Teams reach for model fine-tuning when simpler, more targeted interventions would deliver better results faster.
The Four Ways to Improve AI Agents
New research introduces a unified framework that organizes agent improvement into four distinct approaches. Each has dramatically different implications for cost, flexibility, and long-term maintenance.
Understanding these options helps organizations make smarter investment decisions and avoid the "fine-tune everything" trap.
Approach 1: Train the Agent on Tool Success
What it means: You improve the AI model itself, using signals from whether tools executed successfully as your guide.
When it works well: Your agent performs specific, well-defined tasks with clear success criteria. Writing database queries (did the query run?), generating code (did it execute?), making API calls (did you get a valid response?).
The tradeoff: You're still modifying the core AI model, which creates tight coupling to specific model versions. When foundation models update—which happens constantly—you may need to retrain.
Best for: Organizations with strong technical teams who need agents to master specific, verifiable workflows.
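A minimal sketch of what this signal can look like in practice, assuming a SQL-writing agent and a binary "did the query run" criterion. The function, table, and database names are illustrative, not taken from the research:

```python
import sqlite3

def tool_success_reward(generated_sql: str, db_path: str) -> float:
    """Return 1.0 if the generated query executes without error, else 0.0."""
    try:
        with sqlite3.connect(db_path) as conn:
            conn.execute(generated_sql)
        return 1.0
    except sqlite3.Error:
        return 0.0

# This reward would feed whatever fine-tuning loop you use (for example,
# rejection sampling or RL), nudging the model toward queries that actually run.
print(tool_success_reward("SELECT name FROM customers LIMIT 5;", "crm.db"))
```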
Approach 2: Train the Agent on Overall Output Quality
What it means: You improve the AI model based on holistic output quality rather than individual tool success.
When it works well: Complex tasks where success depends on coordinating multiple tools and capabilities—like research tasks requiring web search, database queries, and document analysis working together.
The tradeoff: This approach requires substantial data. Research shows representative systems need approximately 170,000 training samples to achieve strong performance. At typical data engineering costs, that's a significant investment before you see results.
Best for: High-stakes applications where getting the overall answer right matters more than optimizing individual components, and where you have the resources to support large-scale data operations.
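A minimal sketch of an outcome-level reward, assuming every training sample comes with a trusted reference answer. The token-overlap F1 used here is a common stand-in for this kind of scoring, not a reproduction of any specific system:

```python
from collections import Counter

def outcome_reward(final_answer: str, reference: str) -> float:
    """Token-overlap F1 between the agent's final answer and a reference."""
    pred, gold = final_answer.lower().split(), reference.lower().split()
    if not pred or not gold:
        return 0.0
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

# The reward looks only at the end result, not at which tools were called,
# which is also why this approach needs so many labeled examples.
print(outcome_reward("The merger closed in Q3 2021", "Q3 2021"))
```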
Approach 3: Improve the Tools Independently
What it means: Instead of training the agent, you optimize the tools it uses—retrieval systems, code execution environments, specialized analyzers—independent of any specific agent.
When it works well: You're building shared infrastructure that serves multiple AI applications. A better document retriever benefits every agent that searches documents. A better code analyzer helps any system that reviews code.
The tradeoff: Tools optimized in isolation can't adapt to the specific needs of any particular agent. A retrieval system that scores well on general benchmarks might underperform for your unique use case.
Best for: Organizations building horizontal AI infrastructure—shared capabilities that multiple teams and products will use.
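A minimal sketch of tool-level evaluation, where the retriever is scored on its own benchmark with no agent in the loop. The document IDs and the recall@k metric are generic placeholders:

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int = 5) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
    return hits / max(len(relevant_ids), 1)

# A retriever tuned only against benchmarks like this can still miss what a
# specific agent needs, which is the tradeoff noted above.
print(recall_at_k(["d3", "d7", "d1"], {"d1", "d9"}, k=3))
```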
Approach 4: Improve Tools Based on Agent Feedback
What it means: You train the tools rather than the agent, but you use the agent's behavior as your guide. The AI model essentially teaches the peripheral tools how to better serve its needs.
This is where the research becomes most compelling.
The s3 system demonstrates this approach. Instead of training an entire agent to search better, the s3 researchers trained a small "searcher" component to generate effective queries for a frozen reasoning model.
The results:
- 70× less data required: 2,400 training samples versus 170,000
- 33× faster training time
- Better generalization: 76.6% accuracy on new domains versus 71.8% for the full-model approach
The business implications are concrete. At typical data engineering costs, this approach costs roughly $1,200 versus $85,000 for the data-intensive alternative. That's the difference between a quick experiment your team runs next sprint and a quarterly budget conversation with leadership.
Best for: Organizations that want to add capabilities to existing agent systems without touching the core AI model—exactly the scenario most businesses face.
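A structural sketch of what this looks like in code, in the spirit of s3: only a small searcher is trainable, the reasoning model stays frozen, and the searcher is rewarded for the improvement it adds over naive retrieval. The classes, method names, and reward shape are illustrative assumptions, not the published implementation:

```python
from dataclasses import dataclass

@dataclass
class Example:
    question: str
    reference_answer: str

def train_searcher(searcher, frozen_llm, retriever, examples, answer_score):
    """Train only the small searcher; the reasoning model is never updated."""
    for ex in examples:
        # 1. The trainable searcher writes the search query.
        query = searcher.generate_query(ex.question)
        docs = retriever.search(query)

        # 2. The frozen reasoning model answers using those documents.
        answer = frozen_llm.answer(ex.question, docs)

        # 3. Baseline: the same frozen model with naive retrieval
        #    (the question passed to the retriever verbatim).
        baseline_answer = frozen_llm.answer(
            ex.question, retriever.search(ex.question)
        )

        # 4. Reward the searcher only for the gain it created over the baseline,
        #    then apply whatever policy-update rule you use (e.g. policy gradient).
        reward = (answer_score(answer, ex.reference_answer)
                  - answer_score(baseline_answer, ex.reference_answer))
        searcher.update(query, reward)
```

Because only the searcher's parameters change, the frozen reasoning model can later be swapped for a newer foundation model without retraining everything from scratch.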
The Peripheral Adaptation Principle
This pattern is significant enough to warrant a name: Train the edges, freeze the core.
Rather than continuously modifying expensive, general-purpose AI models, invest in the specialized tools that surround them: retrieval systems, memory components, domain-specific adapters.
This principle aligns directly with how we approach AI implementation at Magnetiz.ai. The highest-ROI improvements often come from optimizing the workflow—the tools, data flows, and integration points—rather than chasing the latest model upgrade.
What This Means for Your AI Strategy
For organizations evaluating AI agent investments, this framework offers practical guidance:
Start with the workflow, not the model. When an AI agent underperforms, ask: Is the problem in the reasoning, or in the tools the agent uses? Document retrieval, data access, output formatting—these peripheral components are often the actual bottleneck.
Match your approach to your constraints. Resource-constrained teams (most organizations) should favor Approach 4's peripheral adaptation. The 70× data efficiency isn't marginal—it's the difference between experimentation and paralysis.
Build modular infrastructure. Treat retrieval, memory, and specialized tools as first-class components that can be independently optimized. This creates flexibility as your needs evolve.
Plan for the graduation lifecycle. Capabilities you develop for specific agents can become reusable tools for your broader AI ecosystem. Today's custom solution becomes tomorrow's shared infrastructure.
A Decision Framework
Use these questions to pick a starting point:
- Does your agent need to master specific, verifiable workflows (queries, code, API calls)? Train the agent on tool success (Approach 1).
- Does overall answer quality matter most, and can you support large-scale data operations? Train the agent on output quality (Approach 2).
- Are you building shared infrastructure that many teams and products will use? Improve the tools independently (Approach 3).
- Do you need to add capabilities to an existing agent without touching the core model? Improve the tools based on agent feedback (Approach 4).
Getting Started
The teams that master peripheral adaptation—improving the tools around frozen AI models—will iterate faster and generalize better than teams locked into continuous model fine-tuning.
For your next AI improvement initiative, start with this question: Is the actual bottleneck in the AI model's reasoning, or in the tools and data it works with?
For most organizations, the answer points toward peripheral optimization. That's where the highest ROI lives.
Want Help?
The AI Ops Lab helps operations managers identify and capture high-value AI opportunities. Through process mapping, value analysis, and solution design, you'll discover efficiency gains worth $100,000 or more annually.
Apply now to see if you qualify for a one-hour session, where we'll help you map your workflows, calculate the value of automation, and visualize your AI-enabled operations. Limited spots available. Want to catch up on earlier issues? Explore our Resource Hub.
Magnetiz.ai is your AI consultancy. We work with you to develop AI strategies that improve efficiency and deliver a competitive edge.

