Agent demos are seductive. The planner calls tools, tools return results, the executor synthesizes and responds. The happy path is genuinely impressive. Then you hand it to a user and the first edge case sends it into a retry loop that burns $40 in API calls before timing out.
Failure mode 1: unbounded tool loops
The most common production failure is an agent that calls the same tool repeatedly because it doesn't know when it has enough information. Fix: explicit termination conditions on every loop, a maximum step count enforced at the orchestrator level, and a fallback that escalates to a human rather than retrying.
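The three parts of that fix can be sketched as a small orchestrator loop. This is a minimal illustration under stated assumptions, not a reference implementation: `Step`, `RunResult`, `run_agent`, `next_step`, `call_tool`, and the default limits are all hypothetical names and values chosen for the example, and "escalation" is reduced to returning a status that a real system would route to a person.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class Step:
    """One planner decision (hypothetical shape): a tool call or a proposed answer."""
    tool: str = ""
    args: dict = field(default_factory=dict)
    final_answer: Optional[str] = None


@dataclass
class RunResult:
    status: str               # "done" or "escalated"
    reason: str
    steps_used: int
    answer: Optional[str] = None


History = List[Tuple[Step, Any]]


def run_agent(
    next_step: Callable[[History], Step],   # stands in for the planner/model call
    call_tool: Callable[[str, dict], Any],  # stands in for tool execution
    max_steps: int = 8,                     # assumed budget; tune per task
    max_repeats: int = 2,                   # identical calls allowed before escalating
) -> RunResult:
    history: History = []
    seen: dict = {}
    for step_index in range(1, max_steps + 1):
        step = next_step(history)

        # The agent may propose an answer, but the limits below apply regardless.
        if step.final_answer is not None:
            return RunResult("done", "agent proposed an answer", step_index, step.final_answer)

        # Termination condition: the same tool with the same arguments, too many times.
        key = (step.tool, json.dumps(step.args, sort_keys=True, default=str))
        seen[key] = seen.get(key, 0) + 1
        if seen[key] > max_repeats:
            return RunResult("escalated", "repeated identical tool call", step_index)

        history.append((step, call_tool(step.tool, step.args)))

    # Hard step budget, enforced by the orchestrator rather than by the agent.
    return RunResult("escalated", "step budget exhausted", max_steps)
```

Both exits other than "done" return an `escalated` status instead of retrying, so the caller decides how to hand the run to a human. The repeat check compares exact arguments only; an agent that rephrases the same query on every step would slip past it and be caught by the step budget instead.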
Warning
Never let an agent be the only judge of when it's done. Always build an external stop condition that the orchestrator enforces. Agents are optimistic about their own progress, and every extra step taken on that optimism is another paid API call.