Owner: Agent Owner + Platform team. Input: Agent in production. Sub-steps: Day-in-day-out:
- Dashboards watched daily for the first 30 days; weekly afterwards.
- Five monitoring signals running continuously (
framework.md§21). - HITL escalation rate watched specifically. A falling escalation rate without a corresponding scope reduction is a flag, not a victory.
- Cost per execution watched at the step level. Token spikes investigated quickly.
- Incidents logged + triaged as they happen. Post-mortems written for anything Severity-2 or above.
- Drift / distribution-shift alarms routed to the Agent Owner and the CoE.
- Quarterly reconciliation against the Agent Card: is the agent still doing what its card says? If scope has drifted, formally update the card.
Output / gate criteria: Ongoing. Incident count, KPI tracking, cost trend, drift indicators all flowing into the registry / dashboards.
Decision branches:
- Severity-1 incident → invoke runbook; possibly kill the agent; do a full post-mortem; feed lessons back into the framework.
- KPI miss for one period → investigate; iterate.
- KPI miss for two consecutive quarters → retirement candidate (Step 16).
Skip-this-step risk: The whole point of governance was monitoring. Without this step, the entire framework reduces to paperwork.