Every agent — Low, Medium, or High risk — logs the following per execution. No exceptions.
Per-execution log fields:
- Timestamp (start / end)
- Initiating user ID (or system trigger)
- Agent ID + version
- Prompt / input (with PII redaction where applicable)
- Output / response
- Tool calls made, with parameters
- Model + version used
- Token counts (input + output) and computed cost
- Policy checks (which ran, and whether each passed or blocked the action)
- HITL events (approved / rejected / overridden, by whom)
- Latency at each step
- Outcome (success / failure / human-overridden)
- Error state, if any
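
The field list above can be sketched as a single record type. This is a minimal illustration, not a prescribed schema: the class name `ExecutionLog`, every field name, the snake_case outcome values, and the sample values are assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional

# Outcome values follow the list above; the snake_case spellings are an assumption.
OUTCOMES = {"success", "failure", "human_overridden"}


@dataclass
class ExecutionLog:
    """One record per agent execution. All field names are hypothetical."""
    started_at: str            # ISO 8601 timestamp (start)
    ended_at: str              # ISO 8601 timestamp (end)
    initiator: str             # user ID or system trigger name
    agent_id: str
    agent_version: str
    prompt_redacted: str       # input after PII redaction, where applicable
    output: str
    model: str
    model_version: str
    input_tokens: int
    output_tokens: int
    cost: float                # computed cost; currency is left to the deployment
    outcome: str
    tool_calls: list = field(default_factory=list)       # [{"name": ..., "params": {...}}]
    policy_checks: list = field(default_factory=list)    # [{"policy": ..., "result": ...}]
    hitl_events: list = field(default_factory=list)      # [{"action": ..., "by": ...}]
    step_latency_ms: dict = field(default_factory=dict)  # step name -> milliseconds
    error: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject records whose outcome is not one of the three allowed values.
        if self.outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {self.outcome!r}")

    def to_json_line(self) -> str:
        """Serialize to one JSON line, suitable for an append-only log."""
        return json.dumps(asdict(self), sort_keys=True)


# Illustrative record; every value here is made up for the example.
record = ExecutionLog(
    started_at="2025-01-01T09:00:00+00:00",
    ended_at="2025-01-01T09:00:02+00:00",
    initiator="user-123",
    agent_id="invoice-triage",
    agent_version="1.4.0",
    prompt_redacted="Classify invoice [REDACTED]",
    output="category=utilities",
    model="example-model",
    model_version="2025-01",
    input_tokens=120,
    output_tokens=15,
    cost=0.0021,
    outcome="success",
    tool_calls=[{"name": "lookup_vendor", "params": {"vendor_id": "v-9"}}],
    step_latency_ms={"retrieve": 800, "generate": 1200},
)
print(record.to_json_line())
```

Validating the outcome at construction time keeps malformed records out of the log, which matters if the dashboards aggregate on that field.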
Dashboards we keep live:
- Token usage by agent and by department
- Cost by agent and by department
- Failure rate per agent
- Latency p50/p95 per agent
- Adoption (unique users per agent per week)
- HITL escalation rate per agent
- Distribution-shift / drift indicators per agent
- Incident count and severity per agent
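
Several of these dashboards are simple aggregates over the per-execution logs. The sketch below computes failure rate, latency p50/p95, and HITL escalation rate per agent. It rests on assumptions: records are plain dicts with hypothetical keys `agent_id`, `outcome`, `latency_ms`, and `hitl_events`; percentiles use the nearest-rank method; and "escalation rate" is taken to mean the share of executions with at least one HITL event.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(records):
    """Group execution records by agent and compute dashboard metrics."""
    by_agent = defaultdict(list)
    for rec in records:
        by_agent[rec["agent_id"]].append(rec)

    summary = {}
    for agent_id, recs in by_agent.items():
        n = len(recs)
        latencies = [r["latency_ms"] for r in recs]
        summary[agent_id] = {
            "executions": n,
            # Share of executions whose outcome was "failure".
            "failure_rate": sum(r["outcome"] == "failure" for r in recs) / n,
            "latency_p50_ms": percentile(latencies, 50),
            "latency_p95_ms": percentile(latencies, 95),
            # Assumption: an execution counts as escalated if it has any HITL event.
            "hitl_escalation_rate": sum(bool(r["hitl_events"]) for r in recs) / n,
        }
    return summary
```

Other percentile definitions (for example, linear interpolation) give slightly different p95 values on small samples, so whichever method is chosen should be applied consistently across agents.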
Audit log retention: at least 6 months for High-risk agents (EU AI Act Article 19 baseline); longer where sector regulation requires.
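
The retention rule can be expressed as a guard that a purge job consults before deleting a High-risk agent's log. This is a sketch under stated assumptions: the function and constant names are hypothetical, "6 months" is approximated conservatively as 184 days (the longest span six calendar months can cover), and any sector-specific minimum is passed in by the caller rather than looked up.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: 184 days as a conservative day-count stand-in for "6 months".
HIGH_RISK_FLOOR_DAYS = 184


def retention_days(sector_minimum_days: Optional[int] = None) -> int:
    """Retention period in days: the floor, or the sector minimum if longer."""
    return max(HIGH_RISK_FLOOR_DAYS, sector_minimum_days or 0)


def may_purge(ended_at: datetime, now: datetime,
              sector_minimum_days: Optional[int] = None) -> bool:
    """True once a High-risk agent's log record is older than its retention period."""
    return now - ended_at >= timedelta(days=retention_days(sector_minimum_days))


# Illustrative check with made-up dates.
ended = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(may_purge(ended, ended + timedelta(days=100)))
```

Taking the maximum of the two values means a longer sector requirement always wins, and a shorter one never lowers the 6-month floor.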