Owner: Agent Owner (Department Champion).
Input: Evaluation passed.
Sub-steps:
- Deploy to a limited population: a small set of named users (e.g., 1–5 accountants, not the entire Finance team).
- Train the pilot users: what the agent does, what it does not do, where it can fail, how to report issues.
- Set explicit pilot success criteria before starting (e.g., "≥ 90% of agent recommendations accepted by humans over 30 days, zero Severity-1 incidents, latency p95 < X seconds").
- Monitor daily during the pilot, not weekly, so anomalies are caught fast.
- Run weekly review meetings with pilot users for 4–6 weeks. Capture qualitative feedback.
- Track the five monitoring signals (framework.md§21):
  - Output distribution shift.
  - Escalation rate (watch for it falling to zero, which is a red flag).
  - Decision audit trails capturing business logic.
  - Cost per execution at the step level.
  - Exception routing patterns.
- Log every incident, near-miss, and false output in the post-mortem file even if no one was harmed.
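
Two of the monitoring signals above, output distribution shift and an escalation rate that falls to zero, lend themselves to an automated daily check. The following is a minimal sketch, not a prescribed implementation: the `DailyStats` record shape, the function names, the use of total variation distance as the shift measure, and the 0.2 threshold are all illustrative assumptions to be replaced with whatever the pilot actually logs.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class DailyStats:
    """One day of pilot activity (hypothetical record shape)."""
    day: str            # ISO date string
    executions: int     # agent runs that day
    escalations: int    # runs the agent handed off to a human
    outputs: list[str]  # one output category per run, e.g. "approve" / "reject"


def total_variation(baseline: list[str], current: list[str]) -> float:
    """Total variation distance between two categorical distributions.

    0.0 means identical category shares, 1.0 means no overlap.
    Returns 0.0 if either sample is empty (nothing to compare).
    """
    if not baseline or not current:
        return 0.0
    b, c = Counter(baseline), Counter(current)
    categories = set(b) | set(c)
    return 0.5 * sum(
        abs(b[k] / len(baseline) - c[k] / len(current)) for k in categories
    )


def daily_alerts(
    baseline_outputs: list[str],
    today: DailyStats,
    shift_threshold: float = 0.2,  # assumed value; tune per agent
) -> list[str]:
    """Return human-readable alerts for the daily monitoring review."""
    alerts = []
    # An agent that ran but never escalated is suspicious, not reassuring.
    if today.executions > 0 and today.escalations == 0:
        alerts.append("escalation rate is zero: confirm the agent still escalates")
    if total_variation(baseline_outputs, today.outputs) > shift_threshold:
        alerts.append("output distribution shifted beyond threshold")
    return alerts
```

In this sketch the baseline would come from the evaluation set or the first days of the pilot. The remaining signals (audit trails, step-level cost, exception routing) depend on the agent's own logging and are not covered here.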
Output / gate criteria: A documented pilot performance report against the success criteria. Decision: promote, iterate, or kill.
Decision branches:
- Pilot KPIs missed → either fix and re-pilot, or kill the agent and document the lessons.
- Pilot KPIs met → go to Step 12.
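
The gate check can be made mechanical once the success criteria are written down. The sketch below encodes the example criteria from the sub-steps (acceptance rate, Severity-1 count, p95 latency) and reports which ones passed. The class and function names are hypothetical, and the 5.0-second latency limit is only a stand-in for the unspecified "X seconds"; on a miss, the choice between fixing and re-piloting or killing the agent stays with the owner.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PilotCriteria:
    """Success criteria fixed before the pilot starts (illustrative defaults)."""
    min_acceptance_rate: float = 0.90
    max_sev1_incidents: int = 0
    max_p95_latency_s: float = 5.0  # stand-in for "X seconds"; set per agent


def p95(latencies: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(latencies)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def evaluate_pilot(
    accepted: int,
    total: int,
    sev1_incidents: int,
    latencies: list[float],
    criteria: PilotCriteria,
) -> tuple[dict[str, bool], str]:
    """Return per-criterion results and the resulting gate outcome."""
    checks = {
        "acceptance_rate": total > 0
        and accepted / total >= criteria.min_acceptance_rate,
        "sev1_incidents": sev1_incidents <= criteria.max_sev1_incidents,
        "latency_p95": bool(latencies)
        and p95(latencies) < criteria.max_p95_latency_s,
    }
    # Missing data counts as a miss: an unmeasured criterion is not a pass.
    outcome = "promote" if all(checks.values()) else "fix_and_repilot_or_kill"
    return checks, outcome
```

The per-criterion dictionary can go directly into the pilot performance report, so the decision is traceable to the criteria that were set before the pilot began.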
Skip-this-step risk: The agent goes to full production with no real-world validation, so its first failure is also its first customer-facing failure.