Purpose

This is the first formal review after an agent enters production at M15. It is scheduled before launch (per template 07, pilot-to-prod checklist item 10) and held 30 days after launch.

Different from the quarterly review (template 15) in scope and depth:

30-day review: This agent only. Validation that the production deployment is healthy and the agent is delivering its promised KPI at production scale.
Quarterly review: Portfolio-level. ROI, risk re-classification, framework changelog.

This review is also the first opportunity to consider autonomy promotion (Stage 1 → Stage 2) if performance supports it. Most agents stay at Stage 1.

When you use it: Exactly 30 days after production launch (M15). Single agent.
Who fills it: CoE Lead + Department Champion co-drive. Builder provides data.
Time: 60–90 minute meeting + 30 minute write-up.
Output: Signed review doc attached to the registry. Decision: continue / iterate / pause / promote autonomy / retire early.
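
As a small illustration of the "exactly 30 days after launch" rule, the review date can be computed when the launch is scheduled. This is a minimal sketch: the function name `review_date` and the constant `REVIEW_OFFSET_DAYS` are hypothetical and not part of the framework.

```python
from datetime import date, timedelta

# Assumption: the review is held a fixed 30 calendar days after launch.
REVIEW_OFFSET_DAYS = 30


def review_date(launch: date) -> date:
    """Return the calendar date of the 30-day review for a given launch date."""
    return launch + timedelta(days=REVIEW_OFFSET_DAYS)


# Launch date taken from the worked example in this template.
print(review_date(date(2026, 6, 13)))  # 2026-07-13
```

Computing the date up front makes it easy to put the review on the calendar before launch, as the pilot-to-prod checklist requires.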

Worked example (AP Accountant invoice reconciliation)

30-Day Review — `finance-invoice-recon` v1.0

Launch date: 2026-06-13
Review date: 2026-07-13 (30 days)
Attendees: Mike Chen (Finance Champion), Morteza Moradi (CoE Lead), Sarah Patel (AP accountant, pilot user), Pat Lee (CISO, informed observer)
Author: Morteza Moradi

1. KPI vs target (30-day actuals)

Metric	Card §12 target	30-day actual	Status
Time per reconciliation (avg)	30 sec	35 sec	✅ Within tolerance (target was aggressive; ~81% reduction vs 3-min baseline, which is strong)
Match accuracy (golden-set re-eval at day 25)	≥ 95%	96%	✅ Hit
HITL acceptance rate (daily)	≥ 90%	92% (steady)	✅ Hit
Auto-approved payments	0	0	✅ Hit (as designed — HITL gate working)
Latency p95	< 15s	13s	✅ Hit
Cost per invoice (mean)	< $0.10	$0.04	✅ Hit (well under)

Headline: At 30 days the agent meets or beats every Card §12 threshold except time per reconciliation, which is within tolerance.
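
The Status column above can be reproduced with a simple comparison of each actual against its target. The sketch below is illustrative only: the `Kpi` class, the `status` function, and the 20% tolerance band are assumptions, since the template does not define how wide "within tolerance" is. Strict targets such as "< 15s" are treated as "at or below" for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    name: str
    target: float
    actual: float
    higher_is_better: bool


# Assumption: missing the target by up to 20% still counts as "within tolerance".
TOLERANCE = 0.20


def status(kpi: Kpi, tolerance: float = TOLERANCE) -> str:
    """Classify a KPI as 'hit', 'within tolerance', or 'miss'."""
    if kpi.higher_is_better:
        gap = kpi.target - kpi.actual
    else:
        gap = kpi.actual - kpi.target
    if gap <= 0:
        return "hit"
    if kpi.target == 0:
        # A zero target (e.g. auto-approved payments) has no tolerance band.
        return "miss"
    return "within tolerance" if gap / kpi.target <= tolerance else "miss"


# Values copied from the worked example table.
kpis = [
    Kpi("Time per reconciliation (sec)", 30, 35, higher_is_better=False),
    Kpi("Match accuracy (%)", 95, 96, higher_is_better=True),
    Kpi("HITL acceptance rate (%)", 90, 92, higher_is_better=True),
    Kpi("Auto-approved payments", 0, 0, higher_is_better=False),
    Kpi("Latency p95 (sec)", 15, 13, higher_is_better=False),
    Kpi("Cost per invoice (USD)", 0.10, 0.04, higher_is_better=False),
]

for kpi in kpis:
    print(f"{kpi.name}: {status(kpi)}")
```

Under these assumptions, only time per reconciliation lands in the tolerance band; every other metric is a hit.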

2. Volume + adoption

Field	Value
Invoices processed (30 days)	482 (matches expected volume of ~480)
Unique users	1 (Sarah Patel — sole pilot user as designed for first 30 days)
Total time saved (estimated)	5.2 hours / week × 4 weeks = ~21 hours
Loaded cost saved	21 hrs × $55/hr = ~$1,155

Adoption note: Sarah reports she's used the time to start vendor-terms renegotiation work — a higher-value activity. This was the explicit reallocation goal from the intake form §5.
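
The savings figures in the table above follow from two multiplications. A minimal sketch of that arithmetic, using the table's own inputs (the function name `savings` is hypothetical):

```python
def savings(hours_per_week: float, weeks: int, loaded_rate_usd: float) -> tuple[int, float]:
    """Return (hours saved rounded to the nearest hour, loaded cost saved in USD)."""
    hours = round(hours_per_week * weeks)  # 5.2 * 4 = 20.8, reported as ~21
    return hours, hours * loaded_rate_usd


hours, cost = savings(hours_per_week=5.2, weeks=4, loaded_rate_usd=55)
print(f"~{hours} hours saved, ~${cost:,.0f} loaded cost saved")
```

Note that the worked example rounds to 21 hours before applying the hourly rate; multiplying the unrounded 20.8 hours by $55 gives about $1,144 instead of $1,155. Either is defensible as long as the review states which one it uses.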

3. Incident summary

Severity	Count	Detail
Sev-1	0	—
Sev-2	0	—
Sev-3	2	(1) NetSuite API rate limit hit during quarter-end (week 3); agent backed off and recovered automatically. (2) LLM provider returned transient 503s for 18 minutes; agent paused and resumed.
Sev-4	4	Cosmetic — vendor name capitalization mismatches in 4 draft emails. Filed as v1.1 backlog.

All Sev-3 incidents resolved by the runbook automatically. No post-mortems triggered.

4. Five monitoring signals — health check

Signal	Status	Notes
Output distribution shift	✅ Stable	PSI vs eval baseline = 0.04 (well under 0.2 threshold)
HITL escalation rate	✅ Stable at 8%	Not falling (which would be a red flag) — within expected band
Decision audit trails	✅ Complete	100% of executions emit full log fields
Cost per execution (step level)	✅ Stable	LLM step: $0.038 mean (baseline $0.040); parser: $0.001; NetSuite: $0.001
Exception routing	✅ Working	3.7% routed to exception (matches design ~4%)
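
For the output distribution shift signal, the Population Stability Index (PSI) compares the share of outputs in each bin against the eval baseline. The sketch below uses the standard PSI formula; the bin definitions and sample proportions are hypothetical, and only the 0.2 threshold comes from the table above.

```python
import math

PSI_THRESHOLD = 0.2  # threshold quoted in the monitoring table


def psi(expected: list[float], actual: list[float], eps: float = 1e-6) -> float:
    """Population Stability Index between two binned distributions of proportions."""
    if len(expected) != len(actual):
        raise ValueError("distributions must have the same number of bins")
    total = 0.0
    for e, a in zip(expected, actual):
        e = max(e, eps)  # guard against log(0) and division by zero on empty bins
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


# Hypothetical bins: clean match / partial match / routed to exception.
baseline = [0.70, 0.26, 0.04]
current = [0.72, 0.243, 0.037]

score = psi(baseline, current)
print(f"PSI = {score:.4f}, shifted = {score >= PSI_THRESHOLD}")
```

Identical distributions give a PSI of zero, and the value grows as production output drifts away from the baseline.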

5. Sarah's qualitative feedback (pilot user)

Direct quotes from her debrief:

"It saved my mornings. I used to do invoices before my first coffee."
"The confidence scores are useful — when it's < 80%, I look harder and usually it's the multi-line POs."
"The cosmetic capitalization thing is annoying but not blocking — vendor name 'apple inc' vs 'Apple Inc' shows up in the draft email."
"I trust it. But I still want the click-to-approve step. Don't take that away."

Sarah's overall: keep it as-is, fix the cosmetic in v1.1, don't promote autonomy.

6. Autonomy promotion consideration

Question	Answer
Days at Stage 1	30
HITL acceptance rate over 30 days	92%
Threshold for Stage 1 → Stage 2 promotion (framework §22)	≥ 90% over 30 days, no Sev-1
Eligible for Stage 2 promotion?	Yes, by the numbers
Decision	Decline promotion at this time. Sarah explicitly prefers Stage 1 ("don't take the click away"). Risk-appetite §3 only authorizes Stage 2 with re-approval — not worth pursuing while Stage 1 is working. Revisit at quarterly.
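
The reasoning in this table separates "eligible by the numbers" from the actual decision. A minimal sketch of that two-step logic follows. The thresholds (at least 90% acceptance over 30 days, no Sev-1) are the ones quoted above; the class, field names, and return strings are hypothetical, and treating user preference as a gate reflects this review's judgment rather than a rule confirmed in framework §22.

```python
from dataclasses import dataclass


@dataclass
class StageRecord:
    days_at_stage: int
    hitl_acceptance_rate: float  # fraction, e.g. 0.92 for 92%
    sev1_count: int
    user_prefers_current_stage: bool
    reapproval_obtained: bool  # assumed to mean risk-appetite signers re-approved


def eligible_by_numbers(r: StageRecord) -> bool:
    """Quantitative check: >= 90% acceptance over 30 days and no Sev-1."""
    return r.days_at_stage >= 30 and r.hitl_acceptance_rate >= 0.90 and r.sev1_count == 0


def promotion_decision(r: StageRecord) -> str:
    """Combine the numeric check with user preference and re-approval."""
    if not eligible_by_numbers(r):
        return "not eligible"
    if r.user_prefers_current_stage:
        return "decline: user prefers current stage"
    if not r.reapproval_obtained:
        return "defer: re-approval required"
    return "promote"


# Values from the worked example.
record = StageRecord(30, 0.92, 0, user_prefers_current_stage=True, reapproval_obtained=False)
print(eligible_by_numbers(record), "/", promotion_decision(record))
```

Keeping the numeric check separate makes it explicit that passing the threshold is necessary but not sufficient for promotion.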

7. Open items / v1.1 backlog

#	Item	Severity	Owner	Target
1	Fix vendor-name capitalization in draft email body	Sev-4 cosmetic	Builder	v1.1 (next sprint)
2	Add partial-match heuristic for multi-line POs (Sarah flagged this)	Non-blocking enhancement	Builder	v1.1
3	Add weekly automated re-eval against expanded golden set (lesson from eval report §7)	Quality	Builder + Platform	2026-10-01

8. Framework feedback

Lessons from this 30-day review fed back to the framework:

The 5 monitoring signals (framework §21) are working as designed: they raised no alerts because nothing went wrong beyond two self-recovering Sev-3 incidents. Worth noting in the framework changelog that "first 30 days clean" is a real possibility, not just a goal.
The autonomy progression criteria in framework §22 should be supplemented with "user preference" — if the human user prefers Stage 1, that's a legitimate reason to defer promotion. Logged for framework v1.1.

9. Decision

✅ Continue at Stage 1. No major changes. v1.1 backlog confirmed (items 1–2). Next review: quarterly (Q3 2026, ~2026-09-13).

Sign-off

Role	Name	Date
AI CoE Lead	Morteza Moradi	2026-07-13
Department Champion	Mike Chen	2026-07-13
Pilot user (informed)	Sarah Patel	2026-07-13

Blank template (copy below for your agent)

# 30-Day Review — [Agent ID] v[X.X]

**Launch date:** [YYYY-MM-DD]
**Review date:** [YYYY-MM-DD]
**Attendees:** [List names + roles]
**Author:** [CoE Lead or Champion]

---

## 1. KPI vs target (30-day actuals)

| Metric | Card §12 target | 30-day actual | Status |
|---|---|---|---|
| | | | |

**Headline:** [One sentence — meeting / missing / mixed.]

## 2. Volume + adoption

| Field | Value |
|---|---|
| [Volume metric] | |
| Unique users | |
| Total time / value saved | |

[Adoption note: any qualitative observations]

## 3. Incident summary

| Severity | Count | Detail |
|---|---|---|
| Sev-1 | | |
| Sev-2 | | |
| Sev-3 | | |
| Sev-4 | | |

[Post-mortem links if any]

## 4. Five monitoring signals — health check

| Signal | Status | Notes |
|---|---|---|
| Output distribution shift | | |
| HITL escalation rate | | |
| Decision audit trails | | |
| Cost per execution (step level) | | |
| Exception routing | | |

## 5. [Pilot user]'s qualitative feedback

[Direct quotes or paraphrased — what does the human user actually think?]

## 6. Autonomy promotion consideration

| Question | Answer |
|---|---|
| Days at current stage | |
| HITL acceptance rate | |
| Threshold for promotion (framework §22) | |
| Eligible for promotion? | |
| **Decision** | |

## 7. Open items / v1.X backlog

| # | Item | Severity | Owner | Target |
|---|---|---|---|---|
| | | | | |

## 8. Framework feedback

[Anything this review revealed about the framework itself — lessons, gaps, refinements]

## 9. Decision

[✅ Continue as-is / ⚠️ Iterate (v1.1 specified) / ⏸ Pause for investigation / ⬆️ Promote autonomy / ⛔ Retire early]

Next review: [Quarterly date]

### Sign-off

| Role | Name | Date |
|---|---|---|
| AI CoE Lead | | |
| Department Champion | | |
| User (informed) | | |

Usage notes

Schedule before launch. The pilot-to-prod gate (template 07) includes a checklist item (item 10) confirming that the 30-day review is on the calendar. Don't let launch happen without it.
The user's qualitative feedback (Section 5) often matters more than the metrics. A statistically successful agent that the user resents is failing.
Autonomy promotion is rare. Most agents stay at Stage 1. Even when metrics support promotion, the user's preference is a valid reason to defer.
Don't skip Section 8 (framework feedback). Every review is an opportunity to refine the framework. Lessons compound.
30 days is not the end of validation. Quarterly review (template 15) is the next checkpoint. This is just the first.

Common pitfalls

Pitfall	What it looks like	Fix
Skipped because "no incidents"	Review canceled; status assumed "fine"	Run it anyway — no-incident weeks are when you confirm baseline holds
Metrics presented without context	"Accuracy 96%" with no comparison	Always compare to Card §12 target
User feedback skipped	Only CoE Lead + Builder attend	The actual user must be in the room
Autonomy promotion rubber-stamped	"Hit 90% acceptance, promote to Stage 2"	Get explicit re-approval from the risk-appetite signers if the appetite document requires it
Backlog items orphaned	v1.1 items listed but no owner	Owner + target on every item
Framework feedback empty	Section 8 left blank	Force the question; even "framework worked as designed" is a valid answer
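
The "Backlog items orphaned" pitfall is easy to check mechanically before sign-off. The sketch below assumes the Section 7 backlog has been loaded as a list of dictionaries with `item`, `owner`, and `target` keys; that structure and the function name `orphaned_items` are hypothetical.

```python
def orphaned_items(backlog: list[dict]) -> list[str]:
    """Return the descriptions of backlog items that lack an owner or a target."""
    orphans = []
    for entry in backlog:
        owner = str(entry.get("owner") or "").strip()
        target = str(entry.get("target") or "").strip()
        if not owner or not target:
            orphans.append(entry["item"])
    return orphans


# First two rows mirror the worked example; the third is a made-up orphan.
backlog = [
    {"item": "Fix vendor-name capitalization", "owner": "Builder", "target": "v1.1"},
    {"item": "Partial-match heuristic for multi-line POs", "owner": "Builder", "target": "v1.1"},
    {"item": "Hypothetical unowned item", "owner": "", "target": "v1.1"},
]

print(orphaned_items(backlog))  # ['Hypothetical unowned item']
```

An empty result means every item has both an owner and a target, which is the condition the Fix column asks for.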

Framework cross-references

framework.md §11.2 (per-agent lifecycle — 30-day post-launch check)
framework.md §21 (5 monitoring signals — first formal check)
framework.md §22 (autonomy progression — first promotion consideration)
framework.md §29 (ROI tracking — 30-day data point)
framework.md §22.1 (EU AI Act Article 72: post-market monitoring starts here)
workflows.md Step 13 (Production launch — schedule the 30-day review)
workflows.html → In Action view → node M15 → M16 transition (post-launch monitoring)