Part B · Step 14

Continuous monitoring

Owner
Agent Owner + Platform team.
Input
Agent in production.

Sub-steps (day in, day out):

  1. Dashboards watched daily for the first 30 days; weekly afterwards.
  2. Five monitoring signals running continuously (framework.md §21).
  3. HITL escalation rate watched specifically. A falling escalation rate without a corresponding scope reduction is a flag, not a victory.
  4. Cost per execution watched at the step level. Token spikes investigated quickly.
  5. Incidents logged + triaged as they happen. Post-mortems written for anything Severity-2 or above.
  6. Drift / distribution-shift alarms routed to the Agent Owner and the CoE.
  7. Quarterly reconciliation against the Agent Card: is the agent still doing what its card says? If scope has drifted, formally update the card.
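
Sub-steps 3 and 4 can be partly automated. The following is a minimal Python sketch, not the framework's own tooling. The record shape (`PeriodStats`), the function names, the 25% relative-drop threshold, and the 2x-median spike factor are all illustrative assumptions and should be tuned per agent.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import median


@dataclass
class PeriodStats:
    """One reporting period for an agent (hypothetical record shape)."""
    executions: int
    escalations: int      # runs handed to a human reviewer (HITL)
    scope_reduced: bool   # True if the agent's scope was narrowed this period


def escalation_rate(p: PeriodStats) -> float:
    """Share of executions escalated to a human; 0.0 when nothing ran."""
    return p.escalations / p.executions if p.executions else 0.0


def escalation_flag(prev: PeriodStats, curr: PeriodStats,
                    min_drop: float = 0.25) -> bool:
    """Flag a relative drop in escalation rate that no scope reduction explains.

    min_drop is an assumed threshold: 0.25 means the rate fell by 25% or more.
    """
    before, after = escalation_rate(prev), escalation_rate(curr)
    if before == 0.0:
        return False  # nothing to fall from
    relative_drop = (before - after) / before
    return relative_drop >= min_drop and not curr.scope_reduced


def token_spikes(history: dict[str, list[int]], latest: dict[str, int],
                 factor: float = 2.0) -> list[str]:
    """Return step names whose latest token count exceeds factor x that step's median."""
    spikes = []
    for step, tokens in latest.items():
        past = history.get(step, [])
        if past and tokens > factor * median(past):
            spikes.append(step)
    return spikes
```

For example, a rate that falls from 80 to 40 escalations per 1,000 executions with no scope change is flagged, while the same fall after a scope reduction is not. A step that suddenly uses 5,200 tokens against a historical median of 2,000 is returned by `token_spikes` for investigation.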

Output / gate criteria: Ongoing. Incident count, KPI tracking, cost trend, drift indicators all flowing into the registry / dashboards.

Decision branches:

  • Severity-1 incident → invoke runbook; possibly kill the agent; do a full post-mortem; feed lessons back into the framework.
  • KPI miss for one period → investigate; iterate.
  • KPI miss for two consecutive quarters → retirement candidate (Step 16).
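
The branches above amount to a small decision table. A hedged Python sketch follows. The function name `decide`, the action strings, and the "continue monitoring" default are assumptions made for illustration. The sketch also assumes that KPI periods are quarters, so a count of two consecutive misses maps to the retirement branch.

```python
from __future__ import annotations


def decide(severity_1_incident: bool, consecutive_kpi_misses: int) -> list[str]:
    """Map monitoring outcomes to the actions listed in the decision branches.

    consecutive_kpi_misses counts back-to-back missed KPI periods
    (assumed here to be quarters).
    """
    actions: list[str] = []
    if severity_1_incident:
        actions.append("invoke runbook")
        actions.append("consider killing the agent")
        actions.append("full post-mortem")
    if consecutive_kpi_misses >= 2:
        actions.append("retirement candidate (Step 16)")
    elif consecutive_kpi_misses == 1:
        actions.append("investigate and iterate")
    # Assumed default when no branch fires: keep monitoring as usual.
    return actions or ["continue monitoring"]
```

The two conditions are checked independently, so an agent with a Severity-1 incident and two consecutive KPI misses gets both the incident actions and the retirement referral.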

Skip-this-step risk: Monitoring is the whole point of governance. Without this step, the entire framework reduces to paperwork.