Half the production "agents" I see don't need to be agents. They're chains of two API calls dressed up in an agent framework, paying agent latency and agent cost for chain-shaped results.
Here's what an agent actually is. A loop where the model picks the next action from a set of tools, executes it, observes the result, and decides what to do next.
Multiple iterations. Real branching. The system makes its own decisions based on what just happened.
That's the definition. If your "agent" doesn't do that, it's a chain. Or a pipeline.
Calling it an agent is marketing.
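To make the definition concrete, here's a minimal sketch of that loop. `fake_model` and both tools are hypothetical stand-ins for a real LLM API and a real tool set, not any particular framework:

```python
# The loop just described: the model picks an action, the system executes it,
# observes the result, and decides what to do next. All names are illustrative.

def search_docs(query):
    return f"3 results for {query!r}"

def escalate(reason):
    return f"escalated: {reason}"

TOOLS = {"search_docs": search_docs, "escalate": escalate}

def fake_model(context):
    # A real model would choose from TOOLS based on everything seen so far.
    if len(context) == 1:                 # only the user message: gather info
        return "search_docs", "refund policy"
    return "finish", "Here is your answer."

def run_agent(user_input, max_iterations=8):
    context = [("user", user_input)]
    for _ in range(max_iterations):       # the loop is the defining feature
        action, arg = fake_model(context)
        if action == "finish":
            return arg
        observation = TOOLS[action](arg)  # execute the chosen tool
        context.append((action, observation))  # observe, then decide again
    raise RuntimeError("agent hit the iteration cap")
```

If your system never takes the `else` branch of a decision like `fake_model`'s, the loop is decoration.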
The cost of being an agent
Agents pay a tax for their flexibility.
Latency. Every iteration of the loop is at least one model round trip. A 4-iteration agent has 4x the latency of a single call. p95 of 8 seconds is common. p95 of 30 seconds happens.
Cost per call. Same multiplier. And because the model re-reads the full conversation context on every iteration, input cost compounds: it scales with iteration count times a growing prompt.
Debuggability. The execution path is non-deterministic. The same input can produce different traces. Reproducing a bug means logging every step of the loop and replaying it.
Failure modes. Agents fail in ways chains don't. Stuck in a tool-call loop, hallucinated tool arguments, picking the wrong tool for the input. You need eval coverage that catches these specifically.
A chain pays none of those. One call. Deterministic flow.
If your problem doesn't need the agent loop, the agent loop is taxing you for nothing.
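The cost multiplier is worse than the iteration count alone suggests, because each iteration re-reads everything accumulated so far. A back-of-envelope calculation, with illustrative token counts:

```python
# Assumes a 2,000-token starting prompt and 500 tokens of tool output added
# per iteration. Numbers are illustrative, not benchmarks.

def tokens_read_by_chain(prompt_tokens):
    return prompt_tokens                   # one call, one read

def tokens_read_by_agent(prompt_tokens, added_per_step, iterations):
    total, context = 0, prompt_tokens
    for _ in range(iterations):
        total += context                   # every iteration re-reads it all
        context += added_per_step          # tool results grow the context
    return total

ratio = tokens_read_by_agent(2000, 500, iterations=4) / tokens_read_by_chain(2000)
print(ratio)  # 5.5 -> a 4-iteration agent reads 5.5x the input tokens
```

Input cost compounds with context growth, so the bill grows faster than the iteration count.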
When agents earn their cost
Agents are the right answer when at least one of these is true.
Real branching from a large tool set. If the right next action depends on the input in ways you can't enumerate ahead of time, and there are 5+ tools the model might pick from, an agent makes sense. Customer-support agents that can search docs, query orders, or escalate to a human are a fair example.
Long-horizon planning. If the system needs to break a goal into 10+ subtasks and adapt the plan as subtasks succeed or fail, you're past chain territory. Coding agents that read a repo, propose changes, run tests, and iterate are the canonical case.
Open-ended tool use. If the user can plug in their own tools and the system has to figure out how to use them, you need the loop. Function-calling assistants that connect to someone else's API on demand fit here.
If none of those apply, you don't need an agent.
What you probably need instead
For most production AI work, the right shape is a chain with two or three steps.
Step 1: extract or classify with a model call.
Step 2: do the work with code or another model call, depending on the result of step 1.
Step 3: validate or format the output.
That's it. No loop. No tool selection.
Cost is bounded. Latency is bounded. The whole thing fits in a single function you can read in 50 lines.
A chain is honest about what it's doing. An agent that doesn't need to be one is a chain pretending it doesn't know its own shape.
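The three-step shape above can be sketched end to end. `classify` stands in for the step-1 model call, and the templates stand in for step 2's work; every name here is an illustrative assumption:

```python
def classify(message):
    # Step 1: extract or classify (a model call in production)
    return "refund" if "refund" in message.lower() else "general"

def do_work(category, message):
    # Step 2: do the work, branching on step 1's result
    templates = {
        "refund": "We've started your refund. Details: {msg}",
        "general": "Thanks for reaching out. Re: {msg}",
    }
    return templates[category].format(msg=message)

def validate(draft, max_len=500):
    # Step 3: validate or format the output
    if len(draft) > max_len:
        raise ValueError("draft too long")
    return draft

def handle(message):
    return validate(do_work(classify(message), message))
```

No loop, no tool selection, and the worst-case cost is exactly one model call.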
A worked example
The use case: customer wants to draft a reply email based on the incoming message and their company's voice guide.
Agent version (real, I've seen it):
- Tools: read inbox, search voice guide, draft reply, check tone, refine draft, send.
- Loop iterates 6 to 10 times depending on the message.
- p95 latency: 12 seconds.
- Cost per draft: $0.18.
- Failure mode: occasionally calls "send" instead of "draft" because the model gets confused about the goal.
Chain version (rebuild):
- Step 1: pull the relevant voice-guide section based on the email category (deterministic lookup).
- Step 2: single model call with the email + voice-guide section + a fixed prompt template. Output: draft reply.
- Step 3: validate length and check for forbidden phrases (regex).
- p95 latency: 1.2 seconds.
- Cost per draft: $0.012.
- Failure mode: bad voice match if the category lookup is wrong. Easy to test.
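The rebuilt chain fits in one short function. The voice-guide table, forbidden-phrase list, and `call_model` stub below are illustrative assumptions, not the actual system:

```python
import re

VOICE_GUIDE = {
    "billing": "Direct, apologetic, no jargon.",
    "support": "Warm, step-by-step, plain language.",
}

FORBIDDEN = re.compile(r"\b(unfortunately|per our policy)\b", re.IGNORECASE)

def call_model(prompt):
    # Stand-in for the single model call in step 2.
    return "Thanks for flagging this. We've reversed the charge."

def draft_reply(email, category):
    voice = VOICE_GUIDE[category]                    # step 1: deterministic lookup
    prompt = f"Voice: {voice}\nEmail: {email}\nDraft a reply."
    draft = call_model(prompt)                       # step 2: one model call
    if len(draft) > 1200 or FORBIDDEN.search(draft):  # step 3: regex validation
        raise ValueError("draft failed validation")
    return draft
```

There is no "send" tool for the model to reach for, so that failure mode is gone by construction.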
Same product. 10x faster. 15x cheaper. One-tenth the eval surface area.
The chain version doesn't look as cool in a demo. It works better in production.
A decision rule
Use an agent when all three of these are true.
- You can't enumerate the action space ahead of time.
- The decision about what to do next depends on results that don't exist until runtime.
- The cost and latency budget can absorb a 4-10x multiplier.
If any one is false, build a chain.
Most of the time, all three are false.
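The rule compresses to a single boolean, shown here with hypothetical flag names:

```python
# All three conditions must hold; any single False means build a chain.
def should_use_agent(can_enumerate_actions,
                     next_step_depends_on_runtime_results,
                     budget_absorbs_4_to_10x):
    return (not can_enumerate_actions
            and next_step_depends_on_runtime_results
            and budget_absorbs_4_to_10x)

# Typical production case: the action space is enumerable, so it's a chain.
print(should_use_agent(True, True, True))  # False
```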
When in doubt
Build the chain version first. Measure it. If it doesn't work because the problem really does require autonomous decision-making, then upgrade to an agent.
You'll have a chain to compare against, which makes your eval suite easier to write.
The reverse never happens. Nobody builds an agent first and then downgrades to a chain when they discover they didn't need the loop. They keep paying the agent tax forever.