Template 14 — 30-Day Post-Launch Review

ID: 14-30-day-review
Version: 1
Last revised: 2026-05-14
Owner: AI CoE Lead + Department Champion (co-drive) · Agent Builder (provides data)

Purpose

The first formal review after an agent enters production at M15. Scheduled before launch (per template 07 pilot-to-prod checklist item 10) and held 30 days later.

This review differs from the quarterly review (template 15) in scope and depth:

  • 30-day review: This agent only. Validation that the production deployment is healthy and the agent is delivering its promised KPI at production scale.
  • Quarterly review: Portfolio-level. ROI, risk re-classification, framework changelog.

This review is also the first opportunity to consider autonomy promotion (Stage 1 → Stage 2) if performance supports it. Most agents stay at Stage 1.

  • When you use it: Exactly 30 days after production launch (M15). Single agent.
  • Who fills it: CoE Lead + Department Champion co-drive. Builder provides data.
  • Time: 60–90 minutes meeting + 30 minutes write-up.
  • Output: Signed review doc attached to the registry. Decision: continue / iterate / pause / promote autonomy / retire early.
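The "exactly 30 days after launch" rule is plain calendar arithmetic. A minimal sketch, assuming calendar days rather than business days; the function name `review_date` is hypothetical and not part of the framework:

```python
from datetime import date, timedelta


def review_date(launch: date, days: int = 30) -> date:
    """Return the date that falls `days` calendar days after launch."""
    return launch + timedelta(days=days)


# Launch date taken from the worked example below.
print(review_date(date(2026, 6, 13)))  # 2026-07-13
```

If the computed date lands on a weekend or holiday, the team still has to choose a nearby working day; the sketch does not handle that.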

Worked example (AP Accountant invoice reconciliation)

30-Day Review — finance-invoice-recon v1.0

Launch date: 2026-06-13
Review date: 2026-07-13 (30 days)
Attendees: Mike Chen (Finance Champion), Morteza Moradi (CoE Lead), Sarah Patel (AP accountant, pilot user), Pat Lee (CISO, informed observer)
Author: Morteza Moradi


1. KPI vs target (30-day actuals)

| Metric | Card §12 target | 30-day actual | Status |
|---|---|---|---|
| Time per reconciliation (avg) | 30 sec | 35 sec | ✅ Within tolerance (target was aggressive; about 81% reduction vs 3-min baseline, which is strong) |
| Match accuracy (golden-set re-eval at day 25) | ≥ 95% | 96% | ✅ Hit |
| HITL acceptance rate (daily) | ≥ 90% | 92% (steady) | ✅ Hit |
| Auto-approved payments | 0 | 0 | ✅ Hit (as designed; HITL gate working) |
| Latency p95 | < 15s | 13s | ✅ Hit |
| Cost per invoice (mean) | < $0.10 | $0.04 | ✅ Hit (well under) |

Headline: Agent is meeting or exceeding every Card §12 threshold at 30 days except time per reconciliation, which is 5 sec over target but within tolerance.
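The status column can be derived mechanically from target and actual. A minimal sketch, assuming a relative tolerance band of 20% around each target: that band, the `Kpi` class, and `kpi_status` are all hypothetical, since the review does not state the tolerance it applied. Strict targets such as "< 15s" are treated as "at or below" for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    name: str
    target: float
    actual: float
    higher_is_better: bool


def kpi_status(kpi: Kpi, tolerance: float = 0.20) -> str:
    """Return 'hit', 'within tolerance', or 'miss' for one KPI.

    `tolerance` is an assumed relative band around the target;
    the source review does not state the band it used.
    """
    if kpi.higher_is_better:
        gap = kpi.target - kpi.actual
    else:
        gap = kpi.actual - kpi.target
    if gap <= 0:
        return "hit"
    if kpi.target == 0:
        return "miss"  # no relative band exists around a zero target
    return "within tolerance" if gap / abs(kpi.target) <= tolerance else "miss"


# Values from the worked example's KPI table.
review_kpis = [
    Kpi("Time per reconciliation (s)", 30, 35, higher_is_better=False),
    Kpi("Match accuracy", 0.95, 0.96, higher_is_better=True),
    Kpi("HITL acceptance rate", 0.90, 0.92, higher_is_better=True),
    Kpi("Auto-approved payments", 0, 0, higher_is_better=False),
    Kpi("Latency p95 (s)", 15, 13, higher_is_better=False),
    Kpi("Cost per invoice ($)", 0.10, 0.04, higher_is_better=False),
]

for kpi in review_kpis:
    print(f"{kpi.name}: {kpi_status(kpi)}")
```

Under the assumed 20% band, only time per reconciliation is "within tolerance"; the other five metrics are hits.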

2. Volume + adoption

| Field | Value |
|---|---|
| Invoices processed (30 days) | 482 (matches expected volume of ~480) |
| Unique users | 1 (Sarah Patel, sole pilot user as designed for first 30 days) |
| Total time saved (estimated) | 5.2 hours / week × 4 weeks = ~21 hours |
| Loaded cost saved | 21 hrs × $55/hr = ~$1,155 |

Adoption note: Sarah reports she's used the time to start vendor-terms renegotiation work — a higher-value activity. This was the explicit reallocation goal from the intake form §5.
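The savings figures above can be reproduced, and sanity-checked against per-invoice timings, with a few lines of arithmetic. This is a sketch only: the function names are hypothetical, and the cross-check assumes the 3-minute baseline and 35-second average from Section 1 apply uniformly to all 482 invoices.

```python
def hours_saved_weekly(hours_per_week: float, weeks: int) -> int:
    """Total hours saved, rounded to whole hours as in the review table."""
    return round(hours_per_week * weeks)


def loaded_cost_saved(hours: float, hourly_rate: float) -> float:
    """Loaded cost saved in dollars."""
    return hours * hourly_rate


def hours_saved_per_invoice(invoices: int, baseline_s: float, actual_s: float) -> float:
    """Cross-check: hours saved implied by per-invoice timings."""
    return invoices * (baseline_s - actual_s) / 3600


hours = hours_saved_weekly(5.2, 4)        # 20.8 rounds to 21
cost = loaded_cost_saved(hours, 55.0)     # 21 * 55 = 1155.0
cross_check = hours_saved_per_invoice(482, 180, 35)

print(hours, cost, round(cross_check, 1))  # 21 1155.0 19.4
```

The two estimates (about 21 hours and about 19.4 hours) are close but not identical, which is expected when one is based on a weekly self-report and the other on average timings.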

3. Incident summary

| Severity | Count | Detail |
|---|---|---|
| Sev-1 | 0 | |
| Sev-2 | 0 | |
| Sev-3 | 2 | (1) NetSuite API rate-limit hit during quarter-end week 3, agent backed off and recovered automatically. (2) LLM provider transient 503 for 18 minutes, agent paused and resumed. |
| Sev-4 | 4 | Cosmetic: vendor name capitalization mismatches in 4 draft emails. Filed as v1.1 backlog. |

All Sev-3 incidents resolved by the runbook automatically. No post-mortems triggered.

4. Five monitoring signals — health check

| Signal | Status | Notes |
|---|---|---|
| Output distribution shift | ✅ Stable | PSI vs eval baseline = 0.04 (well under 0.2 threshold) |
| HITL escalation rate | ✅ Stable at 8% | Not falling (which would be a red flag); within expected band |
| Decision audit trails | ✅ Complete | 100% of executions emit full log fields |
| Cost per execution (step level) | ✅ Stable | LLM step: $0.038 mean (baseline $0.040); parser: $0.001; NetSuite: $0.001 |
| Exception routing | ✅ Working | 3.7% routed to exception (matches design ~4%) |
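For readers unfamiliar with the PSI figure in the first row, the Population Stability Index compares two binned distributions by summing (actual − expected) × ln(actual / expected) over the bins. A minimal sketch of that standard formula follows; the bin proportions are made-up illustrative values, not data from this review, and the review does not say which output field or binning the team used.

```python
import math


def psi(expected: list[float], actual: list[float], eps: float = 1e-6) -> float:
    """Population Stability Index over two lists of bin proportions.

    `expected` is the baseline (e.g. the eval set), `actual` is production.
    Proportions are floored at `eps` to avoid log(0) on empty bins.
    """
    if len(expected) != len(actual):
        raise ValueError("both distributions must use the same bins")
    total = 0.0
    for e, a in zip(expected, actual):
        e = max(e, eps)
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


# Hypothetical bin proportions for illustration only.
baseline_bins = [0.10, 0.20, 0.30, 0.40]
production_bins = [0.12, 0.22, 0.28, 0.38]

score = psi(baseline_bins, production_bins)
print(f"PSI = {score:.4f}, stable = {score < 0.2}")
```

The 0.2 cutoff in the last line is the threshold quoted in the table; identical distributions give a PSI of exactly 0.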

5. Sarah's qualitative feedback (pilot user)

Direct quotes from her debrief:

  • "It saved my mornings. I used to do invoices before my first coffee."
  • "The confidence scores are useful — when it's < 80%, I look harder and usually it's the multi-line POs."
  • "The cosmetic capitalization thing is annoying but not blocking — vendor name 'apple inc' vs 'Apple Inc' shows up in the draft email."
  • "I trust it. But I still want the click-to-approve step. Don't take that away."

Sarah's overall: keep it as-is, fix the cosmetic in v1.1, don't promote autonomy.

6. Autonomy promotion consideration

| Question | Answer |
|---|---|
| Days at Stage 1 | 30 |
| HITL acceptance rate over 30 days | 92% |
| Threshold for Stage 1 → Stage 2 promotion (framework §22) | ≥ 90% over 30 days, no Sev-1 |
| Eligible for Stage 2 promotion? | Yes, by the numbers |
| Decision | Decline promotion at this time. Sarah explicitly prefers Stage 1 ("don't take the click away"). Risk-appetite §3 only authorizes Stage 2 with re-approval, which is not worth pursuing while Stage 1 is working. Revisit at quarterly. |
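The reasoning in this table separates two questions: whether the agent is eligible by the numbers, and whether promotion should actually be pursued. A minimal sketch of that two-step logic, assuming the thresholds quoted above (≥ 90% acceptance over 30 days, no Sev-1); the class, function names, and return strings are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class PromotionInputs:
    days_at_stage: int
    hitl_acceptance_rate: float  # fraction, e.g. 0.92
    sev1_count: int
    user_prefers_current_stage: bool


def eligible_by_numbers(
    p: PromotionInputs, min_days: int = 30, min_acceptance: float = 0.90
) -> bool:
    """Numeric gate only: time at stage, acceptance rate, and no Sev-1."""
    return (
        p.days_at_stage >= min_days
        and p.hitl_acceptance_rate >= min_acceptance
        and p.sev1_count == 0
    )


def promotion_decision(p: PromotionInputs) -> str:
    """Combine the numeric gate with the user's stated preference."""
    if not eligible_by_numbers(p):
        return "not eligible"
    if p.user_prefers_current_stage:
        return "eligible, deferred (user preference)"
    return "eligible, seek risk-appetite re-approval"


# Values from the worked example.
example = PromotionInputs(30, 0.92, 0, user_prefers_current_stage=True)
print(promotion_decision(example))  # eligible, deferred (user preference)
```

Even the final branch only recommends seeking re-approval; the sketch never promotes on its own, in line with the note that Stage 2 requires re-approval.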

7. Open items / v1.1 backlog

| # | Item | Severity | Owner | Target |
|---|---|---|---|---|
| 1 | Fix vendor-name capitalization in draft email body | Sev-4 cosmetic | Builder | v1.1 (next sprint) |
| 2 | Add partial-match heuristic for multi-line POs (Sarah flagged this) | Non-blocking enhancement | Builder | v1.1 |
| 3 | Add weekly automated re-eval against expanded golden set (lesson from eval report §7) | Quality | Builder + Platform | 2026-10-01 |

8. Framework feedback

Lessons from this 30-day review fed back to the framework:

  • The 5 monitoring signals (framework §21) are working as designed — caught nothing because nothing went wrong. Worth noting in the framework changelog that "first 30 days clean" is a real possibility, not just a goal.
  • The autonomy progression criteria in framework §22 should be supplemented with "user preference" — if the human user prefers Stage 1, that's a legitimate reason to defer promotion. Logged for framework v1.1.

9. Decision

Continue at Stage 1. No major changes. v1.1 backlog confirmed (items 1–2). Next review: quarterly (Q3 2026, ~2026-09-13).

Sign-off

| Role | Name | Date |
|---|---|---|
| AI CoE Lead | Morteza Moradi | 2026-07-13 |
| Department Champion | Mike Chen | 2026-07-13 |
| Pilot user (informed) | Sarah Patel | 2026-07-13 |

Blank template (copy below for your agent)

# 30-Day Review — [Agent ID] v[X.X]

**Launch date:** [YYYY-MM-DD]
**Review date:** [YYYY-MM-DD]
**Attendees:** [List names + roles]
**Author:** [CoE Lead or Champion]

---

## 1. KPI vs target (30-day actuals)

| Metric | Card §12 target | 30-day actual | Status |
|---|---|---|---|
| | | | |

**Headline:** [One sentence — meeting / missing / mixed.]

## 2. Volume + adoption

| Field | Value |
|---|---|
| [Volume metric] | |
| Unique users | |
| Total time / value saved | |

[Adoption note: any qualitative observations]

## 3. Incident summary

| Severity | Count | Detail |
|---|---|---|
| Sev-1 | | |
| Sev-2 | | |
| Sev-3 | | |
| Sev-4 | | |

[Post-mortem links if any]

## 4. Five monitoring signals — health check

| Signal | Status | Notes |
|---|---|---|
| Output distribution shift | | |
| HITL escalation rate | | |
| Decision audit trails | | |
| Cost per execution (step level) | | |
| Exception routing | | |

## 5. [Pilot user]'s qualitative feedback

[Direct quotes or paraphrased — what does the human user actually think?]

## 6. Autonomy promotion consideration

| Question | Answer |
|---|---|
| Days at current stage | |
| HITL acceptance rate | |
| Threshold for promotion (framework §22) | |
| Eligible for promotion? | |
| **Decision** | |

## 7. Open items / v1.X backlog

| # | Item | Severity | Owner | Target |
|---|---|---|---|---|
| | | | | |

## 8. Framework feedback

[Anything this review revealed about the framework itself — lessons, gaps, refinements]

## 9. Decision

[✅ Continue as-is / ⚠️ Iterate (v1.1 specified) / ⏸ Pause for investigation / ⬆️ Promote autonomy / ⛔ Retire early]

Next review: [Quarterly date]

### Sign-off

| Role | Name | Date |
|---|---|---|
| AI CoE Lead | | |
| Department Champion | | |
| User (informed) | | |

Usage notes

  • Schedule before launch. The pilot-to-prod gate (template 07, checklist item 10) requires that the 30-day review is already on the calendar. Don't let launch happen without it.
  • The user's qualitative feedback (Section 5) often matters more than the metrics. A statistically successful agent that the user resents is failing.
  • Autonomy promotion is rare. Most agents stay at Stage 1. Even when metrics support promotion, the user's preference is a valid reason to defer.
  • Don't skip Section 8 (framework feedback). Every review is an opportunity to refine the framework. Lessons compound.
  • 30 days is not the end of validation. Quarterly review (template 15) is the next checkpoint. This is just the first.

Common pitfalls

| Pitfall | What it looks like | Fix |
|---|---|---|
| Skipped because "no incidents" | Review canceled; status assumed "fine" | Run it anyway; no-incident weeks are when you confirm the baseline holds |
| Metrics presented without context | "Accuracy 96%" with no comparison | Always compare to Card §12 target |
| User feedback skipped | Only CoE Lead + Builder attend | The actual user must be in the room |
| Autonomy promotion rubber-stamped | "Hit 90% acceptance, promote to Stage 2" | Get explicit re-approval from risk-appetite signers if the appetite document requires it |
| Backlog items orphaned | v1.1 items listed but no owner | Owner + target on every item |
| Framework feedback empty | Section 8 left blank | Force the question; even "framework worked as designed" is a valid answer |

Framework cross-references

  • framework.md §11.2 (per-agent lifecycle — 30-day post-launch check)
  • framework.md §21 (5 monitoring signals — first formal check)
  • framework.md §22 (autonomy progression — first promotion consideration)
  • framework.md §29 (ROI tracking — 30-day data point)
  • framework.md §22.1 EU AI Act Article 72 (post-market monitoring starts here)
  • workflows.md Step 13 (Production launch — schedule the 30-day review)
  • workflows.html → In Action view → node M15 → M16 transition (post-launch monitoring)