Purpose
This is the first formal review after an agent enters production at M15. It is scheduled before launch (per template 07 pilot-to-prod checklist item 10) and held 30 days after launch.
It differs from the quarterly review (template 15) in scope and depth:
- 30-day review: This agent only. Validation that the production deployment is healthy and the agent is delivering its promised KPI at production scale.
- Quarterly review: Portfolio-level. ROI, risk re-classification, framework changelog.
This review is also the first opportunity to consider autonomy promotion (Stage 1 → Stage 2) if performance supports it. Most agents stay at Stage 1.
- When you use it: Exactly 30 days after production launch (M15). Single agent.
- Who fills it: CoE Lead + Department Champion co-drive. Builder provides data.
- Time: 60–90 minutes meeting + 30 minutes write-up.
- Output: Signed review doc attached to the registry. Decision: continue / iterate / pause / promote autonomy / retire early.
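Because the review is booked before launch, the review date can be computed mechanically from the launch date. A minimal sketch, assuming a hypothetical helper `review_date` and a fixed offset of 30 calendar days (this template does not say how to handle weekends or holidays):

```python
from datetime import date, timedelta

REVIEW_OFFSET_DAYS = 30  # from this template; counting calendar days is an assumption


def review_date(launch: date, offset_days: int = REVIEW_OFFSET_DAYS) -> date:
    """Return the date on which the 30-day review should be held."""
    return launch + timedelta(days=offset_days)


# Launch date taken from the worked example in this template.
print(review_date(date(2026, 6, 13)))  # 2026-07-13
```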
Worked example (AP Accountant invoice reconciliation)
30-Day Review — finance-invoice-recon v1.0
Launch date: 2026-06-13
Review date: 2026-07-13 (30 days)
Attendees: Mike Chen (Finance Champion), Morteza Moradi (CoE Lead), Sarah Patel (AP accountant, pilot user), Pat Lee (CISO, informed observer)
Author: Morteza Moradi
1. KPI vs target (30-day actuals)
| Metric | Card §12 target | 30-day actual | Status |
|---|---|---|---|
| Time per reconciliation (avg) | 30 sec | 35 sec | ✅ Within tolerance (target was aggressive; 81% reduction vs 3-min baseline, a strong result) |
| Match accuracy (golden-set re-eval at day 25) | ≥ 95% | 96% | ✅ Hit |
| HITL acceptance rate (daily) | ≥ 90% | 92% (steady) | ✅ Hit |
| Auto-approved payments | 0 | 0 | ✅ Hit (as designed — HITL gate working) |
| Latency p95 | < 15s | 13s | ✅ Hit |
| Cost per invoice (mean) | < $0.10 | $0.04 | ✅ Hit (well under) |
Headline: Agent meets or exceeds five of six Card §12 thresholds at 30 days; time per reconciliation (35 sec against a 30 sec target) is within tolerance.
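The Status column can be derived mechanically from target and actual. A minimal sketch, assuming hypothetical names (`Kpi`, `kpi_status`, `reduction_vs_baseline`) and a 20% relative tolerance band. The source does not define what counts as "within tolerance", so that value is illustrative only:

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    name: str
    target: float
    actual: float
    higher_is_better: bool
    tolerance: float = 0.0  # allowed relative miss; 0.20 below is a made-up value


def kpi_status(kpi: Kpi) -> str:
    """Classify one KPI row as 'hit', 'within tolerance', or 'miss'.

    Meeting the target exactly counts as a hit (an assumption).
    """
    if kpi.higher_is_better:
        if kpi.actual >= kpi.target:
            return "hit"
        floor = kpi.target * (1 - kpi.tolerance)
        return "within tolerance" if kpi.actual >= floor else "miss"
    if kpi.actual <= kpi.target:
        return "hit"
    ceiling = kpi.target * (1 + kpi.tolerance)
    return "within tolerance" if kpi.actual <= ceiling else "miss"


def reduction_vs_baseline(baseline: float, actual: float) -> float:
    """Fractional reduction of `actual` relative to `baseline`."""
    return (baseline - actual) / baseline


rows = [
    Kpi("Time per reconciliation (sec)", 30, 35, higher_is_better=False, tolerance=0.20),
    Kpi("Match accuracy", 0.95, 0.96, higher_is_better=True),
    Kpi("Cost per invoice ($)", 0.10, 0.04, higher_is_better=False),
]
for row in rows:
    print(f"{row.name}: {kpi_status(row)}")
```

Keeping the tolerance explicit per metric makes a "within tolerance" verdict auditable instead of a judgment made in the meeting.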
2. Volume + adoption
| Field | Value |
|---|---|
| Invoices processed (30 days) | 482 (matches expected volume of ~480) |
| Unique users | 1 (Sarah Patel — sole pilot user as designed for first 30 days) |
| Total time saved (estimated) | 5.2 hours / week × 4 weeks = ~21 hours |
| Loaded cost saved | 21 hrs × $55/hr = ~$1,155 |
Adoption note: Sarah reports she's used the time to start vendor-terms renegotiation work — a higher-value activity. This was the explicit reallocation goal from the intake form §5.
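The savings figures in the table are simple arithmetic and can be re-derived for any agent. A sketch using the table's own inputs; the function names are hypothetical, and the per-invoice cross-check assumes the 3-minute baseline and 35-second actual from Section 1:

```python
def weekly_estimate(hours_per_week: float, weeks: float, loaded_rate: float) -> tuple[float, float]:
    """Return (hours saved, loaded cost saved) from a weekly estimate."""
    hours = hours_per_week * weeks
    return hours, hours * loaded_rate


def per_invoice_hours(invoices: int, baseline_sec: float, actual_sec: float) -> float:
    """Cross-check: hours saved computed from volume and per-item time."""
    return invoices * (baseline_sec - actual_sec) / 3600


hours, cost = weekly_estimate(5.2, 4, 55)
print(round(hours, 1), round(cost))  # 20.8 1144

cross_check = per_invoice_hours(482, 180, 35)
print(round(cross_check, 1))  # 19.4
```

Unrounded, the weekly estimate gives 20.8 hours and about $1,144; the table rounds to 21 hours first, which yields $1,155. The per-invoice cross-check lands near 19.4 hours, so the two estimates agree to within roughly 10%.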
3. Incident summary
| Severity | Count | Detail |
|---|---|---|
| Sev-1 | 0 | — |
| Sev-2 | 0 | — |
| Sev-3 | 2 | (1) NetSuite API rate-limit hit during quarter-end week 3, agent backed off and recovered automatically. (2) LLM provider transient 503 for 18 minutes, agent paused and resumed. |
| Sev-4 | 4 | Cosmetic — vendor name capitalization mismatches in 4 draft emails. Filed as v1.1 backlog. |
All Sev-3 incidents resolved by the runbook automatically. No post-mortems triggered.
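Incident (1) was absorbed by back-off and retry. The agent's actual retry logic is not shown in this review; the following is a generic sketch of exponential backoff with full jitter, with `TransientError` and `call_with_backoff` as hypothetical names and all delay values as assumptions:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (e.g. HTTP 429 or 503)."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Call fn, retrying on TransientError with exponentially growing waits."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error for escalation
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # full jitter spreads out retries
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the helper testable without real waiting; a runbook could log each retry at that point so Sev-3 events stay visible in the audit trail.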
4. Five monitoring signals — health check
| Signal | Status | Notes |
|---|---|---|
| Output distribution shift | ✅ Stable | PSI vs eval baseline = 0.04 (well under 0.2 threshold) |
| HITL escalation rate | ✅ Stable at 8% | Not falling (which would be a red flag) — within expected band |
| Decision audit trails | ✅ Complete | 100% of executions emit full log fields |
| Cost per execution (step level) | ✅ Stable | LLM step: $0.038 mean (baseline $0.040); parser: $0.001; NetSuite: $0.001 |
| Exception routing | ✅ Working | 3.7% routed to exception (matches design ~4%) |
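The PSI figure in the first row compares the production output distribution against the eval baseline. A minimal sketch of the standard Population Stability Index over pre-binned counts; the bin counts below are made up for illustration and are not this agent's data, and the `eps` floor for empty bins is an assumption:

```python
import math
from typing import Sequence

PSI_ALERT_THRESHOLD = 0.2  # threshold quoted in the table above


def psi(expected: Sequence[float], actual: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index between two binned count vectors."""
    if len(expected) != len(actual):
        raise ValueError("both inputs must use the same bins")
    expected_total = sum(expected)
    actual_total = sum(actual)
    score = 0.0
    for e_count, a_count in zip(expected, actual):
        # Floor each proportion so an empty bin does not produce log(0).
        e_share = max(e_count / expected_total, eps)
        a_share = max(a_count / actual_total, eps)
        score += (a_share - e_share) * math.log(a_share / e_share)
    return score


baseline_bins = [50, 30, 20]    # illustrative eval-time output counts
production_bins = [45, 33, 22]  # illustrative 30-day output counts
value = psi(baseline_bins, production_bins)
print("stable" if value < PSI_ALERT_THRESHOLD else "investigate")  # stable
```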
5. Sarah's qualitative feedback (pilot user)
Direct quotes from her debrief:
- "It saved my mornings. I used to do invoices before my first coffee."
- "The confidence scores are useful — when it's < 80%, I look harder and usually it's the multi-line POs."
- "The cosmetic capitalization thing is annoying but not blocking — vendor name 'apple inc' vs 'Apple Inc' shows up in the draft email."
- "I trust it. But I still want the click-to-approve step. Don't take that away."
Sarah's overall: keep it as-is, fix the cosmetic in v1.1, don't promote autonomy.
6. Autonomy promotion consideration
| Question | Answer |
|---|---|
| Days at Stage 1 | 30 |
| HITL acceptance rate over 30 days | 92% |
| Threshold for Stage 1 → Stage 2 promotion (framework §22) | ≥ 90% over 30 days, no Sev-1 |
| Eligible for Stage 2 promotion? | Yes, by the numbers |
| Decision | Decline promotion at this time. Sarah explicitly prefers Stage 1 ("don't take the click away"). Risk-appetite §3 only authorizes Stage 2 with re-approval — not worth pursuing while Stage 1 is working. Revisit at quarterly. |
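The table separates eligibility (the numeric bar from framework §22) from the decision (which also weighs user preference and re-approval). A sketch of that two-step logic; the names are hypothetical and the order of the deferral checks is an assumption:

```python
from dataclasses import dataclass


@dataclass
class StageRecord:
    days_at_stage: int
    hitl_acceptance_rate: float  # fraction between 0 and 1
    sev1_count: int


def eligible_for_stage2(
    record: StageRecord, min_days: int = 30, min_acceptance: float = 0.90
) -> bool:
    """Numeric bar only: >= 90% acceptance over 30 days and no Sev-1."""
    return (
        record.days_at_stage >= min_days
        and record.hitl_acceptance_rate >= min_acceptance
        and record.sev1_count == 0
    )


def promotion_decision(
    record: StageRecord, user_prefers_current_stage: bool, reapproval_obtained: bool
) -> str:
    """Eligibility is necessary but not sufficient for promotion."""
    if not eligible_for_stage2(record):
        return "not eligible"
    if user_prefers_current_stage:
        return "defer: user preference"
    if not reapproval_obtained:
        return "defer: re-approval required"
    return "promote"


record = StageRecord(days_at_stage=30, hitl_acceptance_rate=0.92, sev1_count=0)
print(eligible_for_stage2(record))  # True
print(promotion_decision(record, user_prefers_current_stage=True, reapproval_obtained=False))
# defer: user preference
```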
7. Open items / v1.1 backlog
| # | Item | Severity | Owner | Target |
|---|---|---|---|---|
| 1 | Fix vendor-name capitalization in draft email body | Sev-4 cosmetic | Builder | v1.1 (next sprint) |
| 2 | Add partial-match heuristic for multi-line POs (Sarah flagged this) | Non-blocking enhancement | Builder | v1.1 |
| 3 | Add weekly automated re-eval against expanded golden set (lesson from eval report §7) | Quality | Builder + Platform | 2026-10-01 |
8. Framework feedback
Lessons from this 30-day review fed back to the framework:
- The 5 monitoring signals (framework §21) are working as designed: none fired because nothing drifted, and the two Sev-3 incidents were transient infrastructure errors that the runbook resolved. Worth noting in the framework changelog that "first 30 days clean" is a real possibility, not just a goal.
- The autonomy progression criteria in framework §22 should be supplemented with "user preference" — if the human user prefers Stage 1, that's a legitimate reason to defer promotion. Logged for framework v1.1.
9. Decision
✅ Continue at Stage 1. No major changes. v1.1 backlog confirmed (items 1–2). Next review: quarterly (Q3 2026, ~2026-09-13).
Sign-off
| Role | Name | Date |
|---|---|---|
| AI CoE Lead | Morteza Moradi | 2026-07-13 |
| Department Champion | Mike Chen | 2026-07-13 |
| Pilot user (informed) | Sarah Patel | 2026-07-13 |
Blank template (copy below for your agent)
# 30-Day Review — [Agent ID] v[X.X]
**Launch date:** [YYYY-MM-DD]
**Review date:** [YYYY-MM-DD]
**Attendees:** [List names + roles]
**Author:** [CoE Lead or Champion]
---
## 1. KPI vs target (30-day actuals)
| Metric | Card §12 target | 30-day actual | Status |
|---|---|---|---|
| | | | |
**Headline:** [One sentence — meeting / missing / mixed.]
## 2. Volume + adoption
| Field | Value |
|---|---|
| [Volume metric] | |
| Unique users | |
| Total time / value saved | |
[Adoption note: any qualitative observations]
## 3. Incident summary
| Severity | Count | Detail |
|---|---|---|
| Sev-1 | | |
| Sev-2 | | |
| Sev-3 | | |
| Sev-4 | | |
[Post-mortem links if any]
## 4. Five monitoring signals — health check
| Signal | Status | Notes |
|---|---|---|
| Output distribution shift | | |
| HITL escalation rate | | |
| Decision audit trails | | |
| Cost per execution (step level) | | |
| Exception routing | | |
## 5. [Pilot user]'s qualitative feedback
[Direct quotes or paraphrased — what does the human user actually think?]
## 6. Autonomy promotion consideration
| Question | Answer |
|---|---|
| Days at current stage | |
| HITL acceptance rate | |
| Threshold for promotion (framework §22) | |
| Eligible for promotion? | |
| **Decision** | |
## 7. Open items / v1.X backlog
| # | Item | Severity | Owner | Target |
|---|---|---|---|---|
| | | | | |
## 8. Framework feedback
[Anything this review revealed about the framework itself — lessons, gaps, refinements]
## 9. Decision
[✅ Continue as-is / ⚠️ Iterate (v1.1 specified) / ⏸ Pause for investigation / ⬆️ Promote autonomy / ⛔ Retire early]
Next review: [Quarterly date]
### Sign-off
| Role | Name | Date |
|---|---|---|
| AI CoE Lead | | |
| Department Champion | | |
| User (informed) | | |
Usage notes
- Schedule before launch. The pilot-to-prod checklist (template 07) has an item confirming that the 30-day review is on the calendar. Don't let launch happen without it.
- The user's qualitative feedback (Section 5) often matters more than the metrics. A statistically successful agent that the user resents is failing.
- Autonomy promotion is rare. Most agents stay at Stage 1. Even when metrics support promotion, the user's preference is a valid reason to defer.
- Don't skip Section 8 (framework feedback). Every review is an opportunity to refine the framework. Lessons compound.
- 30 days is not the end of validation. Quarterly review (template 15) is the next checkpoint. This is just the first.
Common pitfalls
| Pitfall | What it looks like | Fix |
|---|---|---|
| Skipped because "no incidents" | Review canceled; status assumed "fine" | Run it anyway; a no-incident month is when you confirm the baseline holds |
| Metrics presented without context | "Accuracy 96%" with no comparison | Always compare to Card §12 target |
| User feedback skipped | Only CoE Lead + Builder attend | The actual user must be in the room |
| Autonomy promotion rubber-stamped | "Hit 90% acceptance, promote to Stage 2" | Get explicit re-approval from the risk-appetite signers if the appetite document requires it |
| Backlog items orphaned | v1.1 items listed but no owner | Owner + target on every item |
| Framework feedback empty | Section 8 left blank | Force the question; even "framework worked as designed" is a valid answer |
Framework cross-references
- framework.md §11.2 (per-agent lifecycle: 30-day post-launch check)
- framework.md §21 (5 monitoring signals: first formal check)
- framework.md §22 (autonomy progression: first promotion consideration)
- framework.md §29 (ROI tracking: 30-day data point)
- framework.md §22.1 EU AI Act Article 72 (post-market monitoring starts here)
- workflows.md Step 13 (Production launch: schedule the 30-day review)
- workflows.html → In Action view → node M15 → M16 transition (post-launch monitoring)