Purpose
This is the deep-dive review for one specific agent, run every 90 days from production launch. It proves — with numbers, not anecdotes — whether this particular agent is delivering its promised KPI, generating positive ROI, gaining adoption, accumulating bugs, or drifting.
Unlike template 15 (the portfolio-level 2-page exec readout), template 16 is the per-agent forensic record that:
- The Department Champion uses to defend (or kill) their agent
- The CoE Lead uses to spot patterns across the portfolio
- Finance uses to validate the ROI claims that roll up into template 15
- External auditors use as EU AI Act Article 72 post-market monitoring evidence
The 6-month question — "did this agent actually change anything?" — is answered with two of these reports (Q1 + Q2). The trend matters as much as the absolute numbers.
- When you use it: Every 90 days, per production agent, until retirement. The first one is at Q1 = 3 months post-launch; it follows the template 14 30-day review and bridges into the quarterly cadence.
- Who fills it: Department Champion (data owner) + Agent Owner (qualitative). CoE Lead reviews + signs.
- Time: 2–3 hours of data assembly + 60 min review meeting + 30 min writeup.
- Length: 3–5 pages. Long enough to be defensible, short enough to be read.
Worked example (AP Accountant invoice reconciliation — Q2 2026 review, 6 months in)
Per-Agent Quarterly Review — finance-invoice-recon Q2 2026
Quarter: Q2 2026 (Apr 1 – Jun 30)
Review date: 2026-07-10
Author: Mike Chen (Finance Champion) + Sarah Patel (AP Accountant, primary user)
CoE Lead reviewer: Morteza Moradi
Time in production: 6 months (launched 2026-01-13)
§1. Headline (one-sentence summary)
The agent saved 94 accountant-hours in Q2 ($5,170 loaded cost), maintained 96.4% match accuracy with zero Sev-1 incidents, and gained a second user (Tom Riley). Cumulative 6-month ROI: +$9,940 realized, or +$1,940 net positive after build amortization. Recommendation: continue at Stage 1; defer Stage 2 promotion until Q3, when second-user data is available.
§2. KPI trend (quarter over quarter)
| Metric | Card §12 target | Q1 2026 actual | Q2 2026 actual | Δ Q-o-Q | Status |
|---|---|---|---|---|---|
| Time per reconciliation (avg) | 30 sec | 35 sec | 32 sec | ↓ 9% | ✅ Closing in on target |
| Match accuracy (rolling weekly) | ≥ 95% | 96.0% | 96.4% | ↑ 0.4 pt | ✅ Above target |
| HITL acceptance rate (steady) | ≥ 90% | 92% | 93% | ↑ 1 pt | ✅ Healthy, not collapsing |
| Auto-approved payments | 0 | 0 | 0 | flat | ✅ As designed |
| Latency p95 | < 15s | 13s | 11s | ↓ 15% | ✅ Trending better |
| Coverage (% invoices handled, not exception) | n/a (info) | 92% | 95% | ↑ 3 pt | ✅ Reach growing |
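KPI verdict: Four of the five targeted metrics meet target. Time per reconciliation (32 sec) is still 2 sec above its 30-sec target but closed 3 sec this quarter. The trend is positive, or flat by design, on every dimension. No goalpost adjustments needed.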
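The Δ and Status columns can be derived mechanically instead of by hand. The sketch below is a minimal illustration, assuming targets are inclusive bounds (so "< 15s" is treated as ≤ 15s) and that percentage metrics are compared in points; the `KpiRow` class and its field names are hypothetical, not part of the Agent Card schema.

```python
from dataclasses import dataclass


@dataclass
class KpiRow:
    """One row of the §2 table (hypothetical field names)."""
    name: str
    target: float
    prev_q: float
    this_q: float
    higher_is_better: bool
    is_percentage: bool = False  # percentages get point deltas, other metrics get relative %

    def delta(self) -> float:
        """Point change for percentage metrics, relative % change otherwise."""
        if self.is_percentage:
            return self.this_q - self.prev_q
        return (self.this_q - self.prev_q) / self.prev_q * 100

    def hits_target(self) -> bool:
        """True only if this quarter's value meets the Card §12 target outright (inclusive bound)."""
        if self.higher_is_better:
            return self.this_q >= self.target
        return self.this_q <= self.target


rows = [
    KpiRow("Time per reconciliation (sec)", 30, 35, 32, higher_is_better=False),
    KpiRow("Match accuracy (%)", 95, 96.0, 96.4, higher_is_better=True, is_percentage=True),
    KpiRow("Latency p95 (sec)", 15, 13, 11, higher_is_better=False),
]

for row in rows:
    print(f"{row.name}: Δ {row.delta():+.1f}, hits target: {row.hits_target()}")
```

Fed the Q2 numbers, the time-per-reconciliation row reports a miss (32 sec against a 30-sec target) even though its trend is positive, which is exactly the case the "don't move goalposts" rule is meant to surface.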
§3. ROI realized (this quarter + cumulative)
Q2 2026 calculation
Hours saved this quarter:
~127 invoices/week × 13 weeks × 2.5 min saved/invoice
≈ 4,128 minutes saved
≈ 68.8 hours saved (Sarah)
+ 25.2 hours saved (Tom, 4-week ramp from Jun)
= 94.0 hours saved this quarter
Loaded cost saved:
94 hours × $55/hr loaded cost
= $5,170
Add-on value: zero missed matches at month-end (vs ~2% baseline)
estimated avoided correction cost
= ~$1,200/quarter
Costs:
LLM API (Anthropic Claude): $156
Observability (LangSmith share): $87
Infrastructure (n8n self-host): $42
Maintenance (CoE time): $320 (4 hrs at $80)
Total: $605
Q2 net ROI: $5,170 + $1,200 − $605 = $5,765
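The Q2 arithmetic above can be reproduced as a short script so Finance can re-run it with their own inputs. This is a sketch: the constant names are illustrative, Tom's 25.2 hours are taken as given rather than derived, and hours are rounded once to one decimal before pricing, which reproduces the report's 94.0 hours and $5,170.

```python
# Inputs copied from the Q2 calculation (names are illustrative).
INVOICES_PER_WEEK = 127
WEEKS = 13
MIN_SAVED_PER_INVOICE = 2.5
TOM_HOURS = 25.2           # taken as reported, not derived
LOADED_RATE = 55           # $/hr, Finance-confirmed
AVOIDED_CORRECTION = 1200  # $/quarter, illustrative (not GAAP)
COSTS = {
    "llm_api": 156,
    "observability": 87,
    "infrastructure": 42,
    "maintenance": 4 * 80,  # 4 hrs of CoE time at $80
}


def quarterly_net() -> dict:
    sarah_hours = INVOICES_PER_WEEK * WEEKS * MIN_SAVED_PER_INVOICE / 60
    hours = round(sarah_hours + TOM_HOURS, 1)  # round once, before pricing
    labor_value = hours * LOADED_RATE
    total_cost = sum(COSTS.values())
    net = labor_value + AVOIDED_CORRECTION - total_cost
    return {"hours": hours, "labor_value": labor_value, "cost": total_cost, "net": net}


print(quarterly_net())
```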
Cumulative 6-month ROI
| Period | Hours saved | Value | Costs | Net |
|---|---|---|---|---|
| Q1 2026 (Jan–Mar) | 64 hrs | $3,520 + $1,200 = $4,720 | $545 | +$4,175 |
| Q2 2026 (Apr–Jun) | 94 hrs | $5,170 + $1,200 = $6,370 | $605 | +$5,765 |
| Pre-launch ramp (Dec '25) | n/a | n/a | (build cost $16,000 amortized over Year 1) | n/a |
| 6-month cumulative | 158 hrs saved | $11,090 value | $1,150 ops + amortized build | ~$9,940 realized |
(Build cost $16,000 amortized over 12 months → $1,333/month, so 6-month amortization = $8,000.)
Net 6-month ROI: +$9,940 − $8,000 build amortization = +$1,940 net positive over 6 months.
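Year 1 projection: Q3 + Q4 are expected to maintain $5K+/quarter net, which puts Year-1 net at roughly +$20K before build cost ($9,940 + 2 × $5K). The $16,000 build cost is fully amortized by month 12, so Year-1 projected net after build cost is roughly +$4K or more.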
Finance sign-off (template 15 §17b dependency): Validated by Finance on 2026-07-08. Loaded-cost rate of $55/hr confirmed. Avoided-correction-cost estimate marked as illustrative (not GAAP).
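The cumulative table and the build-amortization line can be checked the same way. A minimal sketch, assuming straight-line amortization of the $16,000 build cost over 12 months and using the per-quarter value and cost figures exactly as they appear in the cumulative table; `cumulative_roi` is a hypothetical helper name.

```python
BUILD_COST = 16_000        # $, one-time build cost
AMORTIZATION_MONTHS = 12   # straight-line over Year 1

# (value, ops cost) per quarter, copied from the cumulative ROI table
QUARTERS = {
    "Q1 2026": (4_720, 545),
    "Q2 2026": (6_370, 605),
}


def cumulative_roi(quarters: dict, months_elapsed: int) -> dict:
    value = sum(v for v, _ in quarters.values())
    ops_cost = sum(c for _, c in quarters.values())
    realized = value - ops_cost
    # Multiply before dividing so 6 months comes out as exactly $8,000.
    amortized_build = BUILD_COST * months_elapsed / AMORTIZATION_MONTHS
    return {
        "value": value,
        "ops_cost": ops_cost,
        "realized": realized,
        "amortized_build": amortized_build,
        "net_of_build": realized - amortized_build,
    }


print(cumulative_roi(QUARTERS, months_elapsed=6))
```

Sourcing the quarterly cost inputs from the itemized §7 table, rather than retyping them, keeps §3 and §7 from drifting apart.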
§4. Adoption metrics
| Metric | Q1 2026 | Q2 2026 | Δ | Notes |
|---|---|---|---|---|
| Unique users (active in quarter) | 1 (Sarah) | 2 (Sarah + Tom from Jun) | +1 | Tom onboarded as backup |
| Total executions | 1,560 | 1,651 | +5.8% | Volume slightly up at quarter-end |
| Average executions per user-week | ~120 | ~127 | +5.8% | Steady |
| Time-to-first-use for new user (Tom) | n/a | 14 minutes | — | Loom training was enough |
| Self-reported satisfaction (1–5) | 4.5 (Sarah) | 4.5 (Sarah), 4 (Tom) | flat | Tom wants the v1.1 cosmetic fix |
| Workflow coverage | 92% | 95% | +3pt | More invoice formats handled |
Adoption verdict: Healthy. User base doubled. No churn. Sarah uses it for every invoice without prompting — adoption is "default-on" for AP work.
Depth of use: Sarah has stopped reviewing low-confidence matches one at a time; she batches them at end of day. This is emergent positive behavior. It is not a risk because the HITL gate is still enforced.
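Unique users and per-user-week frequency have to be computed separately from raw volume (see the "Adoption assumed equal to volume" pitfall). A minimal sketch, assuming a hypothetical execution log of `(user_id, date)` pairs and defining the per-user-week denominator as ISO weeks in which that user ran the agent at least once; other denominators, such as enrolled users × calendar weeks, are equally defensible as long as they stay fixed quarter to quarter.

```python
from collections import Counter
from datetime import date


def adoption_metrics(log) -> dict:
    """log: iterable of (user_id, date) pairs, one per agent execution (hypothetical schema)."""
    runs_per_user_week = Counter()
    for user_id, day in log:
        iso_year, iso_week, _ = day.isocalendar()
        runs_per_user_week[(user_id, iso_year, iso_week)] += 1

    total = sum(runs_per_user_week.values())
    active_user_weeks = len(runs_per_user_week)
    return {
        "unique_users": len({user for user, _, _ in runs_per_user_week}),
        "total_executions": total,
        # Divide by active user-weeks rather than calendar weeks so that adding
        # a user does not read as higher per-user usage.
        "executions_per_user_week": total / active_user_weeks if active_user_weeks else 0.0,
    }


example_log = [("sarah", date(2026, 4, 1))] * 3 + [("tom", date(2026, 6, 10))] * 2
print(adoption_metrics(example_log))
```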
§5. The 5 monitoring signals — quarter-over-quarter
| Signal | Q1 status | Q2 status | Trend | Alarms this quarter |
|---|---|---|---|---|
| Output distribution shift (PSI) | Stable, PSI 0.04 | Stable, PSI 0.05 | ↑ 25% relative, still well under 0.2 threshold | 0 |
| HITL escalation rate | 8% steady | 7% steady | ↓ 1pt — NOT collapsing (healthy) | 0 |
| Decision audit trail completeness | 100% | 100% | flat | 0 |
| Cost per execution (step level) | $0.040 mean | $0.038 mean | ↓ 5% | 0 (under cap) |
| Exception routing patterns | 8% exception, all routed correctly | 5% exception, all routed correctly | ↓ — agent handling more cases inline | 0 |
Signal verdict: All five signals stable; nothing concerning. The drift detector did NOT fire this quarter. Its only alarm so far was a one-time alarm from the Q1 Sev-2 incident (covered in §6), which was resolved.
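The drift figure in §5 is a Population Stability Index. A minimal sketch of PSI over categorical outputs, assuming the agent's outputs are bucketed into labels (for example match / partial / exception, which are hypothetical here) and compared against a baseline quarter; continuous scores would need binning first. The 0.2 alarm threshold is the one §5 uses.

```python
import math


def psi(expected_counts: dict, actual_counts: dict, eps: float = 1e-6) -> float:
    """Population Stability Index over categorical output labels.

    PSI = sum over categories of (a - e) * ln(a / e), where e and a are the
    baseline and current proportions. eps floors empty categories so ln() stays finite.
    """
    categories = set(expected_counts) | set(actual_counts)
    e_total = sum(expected_counts.values())
    a_total = sum(actual_counts.values())
    score = 0.0
    for c in categories:
        e = max(expected_counts.get(c, 0) / e_total, eps)
        a = max(actual_counts.get(c, 0) / a_total, eps)
        score += (a - e) * math.log(a / e)
    return score


def psi_status(score: float, threshold: float = 0.2) -> str:
    return "stable" if score < threshold else "alarm"


baseline = {"match": 50, "exception": 50}
current = {"match": 60, "exception": 40}
print(psi(baseline, current), psi_status(psi(baseline, current)))
```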
§6. Bug + incident trend
| Severity | Q1 2026 | Q2 2026 | Δ | Open at end of quarter |
|---|---|---|---|---|
| Sev-1 (customer impact / data exposure / out-of-scope action) | 0 | 0 | flat | 0 |
| Sev-2 (sustained quality issue, near-miss Sev-1, broad disruption) | 1 (NetSuite schema change drift incident, post-mortem ref: pm-2026-02-15.md) | 0 | ↓ | 0 |
| Sev-3 (brief outage, single-case quirk, contained quickly) | 4 (rate-limits, parser edge cases) | 2 (rate-limits) | ↓ 50% | 0 |
| Sev-4 (cosmetic) | 3 (vendor-name capitalization, etc.) | 1 (same cosmetic, deferred to v1.1) | ↓ | 1 (open in backlog) |
Bug verdict: Declining trend. The Q1 Sev-2 was post-mortemed and its action items closed within 14 days; a drift detector was added that now monitors a 1-hour window for sharp quality drops. No recurrence in Q2.
Open backlog: 1 cosmetic item (vendor-name capitalization in draft emails) — slated for v1.1 in Q3.
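The 1-hour drift detector added after the Q1 Sev-2 is not specified further in this report. One way to implement a trailing-window quality alarm is sketched below, assuming each reconciliation receives a correct/incorrect label shortly after it runs (for example from the HITL accept/reject decision). The window, accuracy floor, and minimum sample count are hypothetical defaults, not values from the Agent Card.

```python
from collections import deque


class RollingAccuracyAlarm:
    """Fires when accuracy over the trailing window drops below a floor.

    window_s, floor and min_samples are hypothetical settings.
    """

    def __init__(self, window_s: float = 3600, floor: float = 0.90, min_samples: int = 20):
        self.window_s = window_s
        self.floor = floor
        self.min_samples = min_samples
        self.events = deque()  # (timestamp_s, was_correct), oldest first
        self.correct = 0

    def record(self, ts: float, was_correct: bool) -> bool:
        """Add one graded reconciliation; return True if the alarm should fire."""
        self.events.append((ts, was_correct))
        self.correct += was_correct
        # Evict events that have fallen out of the trailing window.
        while self.events and self.events[0][0] <= ts - self.window_s:
            _, old = self.events.popleft()
            self.correct -= old
        if len(self.events) < self.min_samples:
            return False  # too few samples to call it a sharp drop
        return self.correct / len(self.events) < self.floor
```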
§7. Cost trend
| Cost component | Q1 2026 | Q2 2026 | Δ |
|---|---|---|---|
| LLM API (Anthropic Claude) | $148 | $156 | +5% (volume) |
| Observability (LangSmith allocation) | $87 | $87 | flat |
| Infrastructure (n8n self-host allocation) | $42 | $42 | flat |
| Maintenance (CoE time) | $240 (3 hrs) | $320 (4 hrs) | +33% (Tom onboarding) |
| Total per-quarter cost | $517 | $605 | +17% |
Cost verdict: Costs are growing slower than value (Q1 → Q2: costs +17%, value +35%). Healthy cost-to-value ratio. No cost overruns. No budget escalation needed.
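The cost verdict compares growth rates, which is easy to make explicit. A sketch using the §7 quarterly totals and the §3 quarterly value figures; the `max_cost_ratio` cap that separates "concerning" from the other verdicts is a hypothetical parameter, not a threshold this report defines.

```python
def pct_change(prev: float, curr: float) -> float:
    return (curr - prev) / prev * 100


def cost_verdict(cost_prev, cost_curr, value_prev, value_curr, max_cost_ratio=0.5) -> str:
    """Hypothetical rule: 'concerning' if cost exceeds max_cost_ratio of value,
    otherwise 'healthy' when value grows at least as fast as cost, else 'watch'."""
    if cost_curr / value_curr > max_cost_ratio:
        return "concerning"
    if pct_change(cost_prev, cost_curr) <= pct_change(value_prev, value_curr):
        return "healthy"
    return "watch"


# Q1 -> Q2: quarterly totals from §7, quarterly value from §3
print(round(pct_change(517, 605)), round(pct_change(4_720, 6_370)))
print(cost_verdict(517, 605, 4_720, 6_370))
```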
§8. Autonomy stage history
| Date | Stage | Trigger | Decided by |
|---|---|---|---|
| 2026-01-13 | Stage 1 (Assistive) | Initial launch | CoE Lead + Head of Finance per Agent Card §7 |
| 2026-04-13 (Q1 review) | Stage 1 (no change) | Sarah requested staying at Stage 1; user preference is a valid signal | CoE Lead + Head of Finance |
| 2026-07-10 (Q2 review, this doc) | Stage 1 (defer Stage 2) | Per Card §7 thresholds we'd qualify, but Tom's onboarding window means we want 90 more days of two-user data before promotion. Discussion: Q3 review. | CoE Lead + Head of Finance |
Autonomy verdict: Stay at Stage 1. Re-evaluate Stage 2 at Q3.
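The Q2 decision combines three kinds of evidence (Card §7 thresholds, user preference, and steady-state data with both users), which can be encoded as a gate where every condition must hold. This is a hypothetical sketch: the field names, the two-user and 90-day defaults, and the return strings are illustrative; the real thresholds live in Agent Card §7.

```python
from dataclasses import dataclass


@dataclass
class PromotionEvidence:
    meets_card_thresholds: bool  # every Agent Card §7 threshold met this quarter
    primary_user_agrees: bool    # user preference is a valid signal
    active_users: int
    steady_state_days: int       # days since the user base last changed


def stage2_decision(ev: PromotionEvidence, min_users: int = 2, min_days: int = 90) -> str:
    """Hypothetical Stage 1 -> Stage 2 gate: all conditions must hold, not one metric."""
    if not ev.meets_card_thresholds:
        return "stay"
    if not ev.primary_user_agrees:
        return "stay (user preference)"
    if ev.active_users < min_users or ev.steady_state_days < min_days:
        return "defer (need more steady-state data)"
    return "promote"
```

With roughly 30 days of two-user data at the Q2 review, this gate returns a deferral even though the Card thresholds are met, matching the Q2 outcome.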
§9. User feedback summary
Q2 verbatim feedback (collected via 30-min interviews with each user):
Sarah Patel:
"It's invisible. I don't think about it. I think about all the other AP work I'm doing instead. The capitalization thing still bugs me but I just edit before send."
Tom Riley (4 weeks of use):
"It feels weirdly fast. I worried I'd miss something but the confidence scores help me trust it more. I've kept the Loom recording bookmarked but haven't needed it after week 1."
Sentiment: Both users want continuity. Neither wants to go back to manual. Both want the v1.1 cosmetic fix.
§10. Changes shipped this quarter
| Date | Change | Reason | Impact |
|---|---|---|---|
| 2026-04-08 | v1.0.3 — added retry-with-backoff for NetSuite API | Q1 rate-limit Sev-3 follow-up | 0 rate-limit incidents in Q2 (was 2 in Q1) |
| 2026-05-22 | v1.0.4 — golden set expanded by 50 invoices (multi-line POs + new vendor formats) | Quality investment | Match accuracy ↑ 0.4pt |
| 2026-06-10 | v1.0.5 — agent identity onboarded backup user Tom (Gmail OAuth scope expanded to two mailboxes) | Tom onboarding | Adoption +1 user |
No prompt changes this quarter. Agent Card unchanged. Risk tier unchanged.
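v1.0.3 added retry-with-backoff for the NetSuite API, but the report does not describe the implementation. A common pattern is capped exponential backoff with full jitter, sketched below in a client-agnostic way: `RateLimited` is a hypothetical stand-in for whatever exception the HTTP client raises on an HTTP 429, and the attempt count and delays are illustrative.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for the client's rate-limit (HTTP 429) error."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fn(); on RateLimited, retry with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error instead of dropping the invoice
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))  # full jitter spreads out retries
```

Passing `sleep` in as a parameter lets tests run without real waits; re-raising after the last attempt keeps a persistent rate limit visible as an incident rather than a silent failure.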
§11. Risk re-classification
| Field | Last quarter | This quarter | Material change? |
|---|---|---|---|
| Risk tier | Medium | Medium | No |
| PII driver | Yes (mild — vendor contacts) | Yes — same | No |
| Consequential decisions about people | No | No | No |
| Autonomous behavior | No (HITL on every NetSuite-related action) | No | No |
| EU AI Act exposure | Not Annex III | Not Annex III | No |
| Jurisdictional tags | US + EU | US + EU | No |
Risk verdict: No material change. Continue under current tier + sign-off chain.
§12. Cross-references to other artifacts (audit trail)
- Agent Card v1.0.5: github.com/acme/agents/finance-invoice-recon/AGENT_CARD.md
- Most recent eval report: eval-reports/eval-2026-06-15.md (re-ran after v1.0.4 golden set expansion)
- Q1 2026 quarterly review: [link to Q1 doc]
- 30-day review (Feb 2026): templates/14-30-day-review--finance-invoice-recon--2026-02-13.md
- Post-mortem (Sev-2 Feb 2026): post-mortems/pm-2026-02-15.md
- Runbook current version: templates/09-runbook--finance-invoice-recon--v1.0.5.md
§13. Decision + plan for next quarter
| Item | Decision | Owner | Due |
|---|---|---|---|
| Continue agent in production? | ✅ Yes | Decided by CoE Lead | n/a |
| Promote to Stage 2 autonomy? | ⏸ Defer to Q3 (need 2-user steady-state data) | CoE Lead | Q3 review 2026-10-10 |
| Ship v1.1 with cosmetic fix + partial-match heuristic? | ✅ Yes | Builder | 2026-08-15 |
| Expand to international vendors (EU vendor data)? | ⚠️ Out of scope for v1.x — requires re-triage at M7 (EU AI Act Article 50 transparency check) | CoE Lead | Defer |
| Risk re-classification needed? | ❌ No | — | — |
| Retirement candidate? | ❌ No (ROI positive, KPI hit, no severe incidents) | — | — |
§14. Roll-up to portfolio review (template 15)
The following numbers from this review feed the Q2 portfolio exec readout:
- Quarterly ROI contribution: +$5,765
- Cumulative ROI: +$9,940 (or +$1,940 net of build amortization)
- Sev-1 incidents: 0
- Sev-2 incidents: 0
- Adoption growth: +1 user (now 2 users)
- KPI status: mixed (4 of 5 targets hit; time per reconciliation at 32 sec vs 30-sec target, improving)
- Compliance evidence retained: per Agent Card §11 (6-month LangSmith + S3 Glacier 7-year for SOX adjacency)
Sign-off
| Role | Name | Date |
|---|---|---|
| Department Champion + author | Mike Chen | 2026-07-10 |
| Primary user (informed) | Sarah Patel | 2026-07-10 |
| Backup user (informed) | Tom Riley | 2026-07-10 |
| Finance (ROI validation) | Bob Lin, CFO | 2026-07-08 |
| AI CoE Lead | Morteza Moradi | 2026-07-10 |
| Head of Finance | (department head) | 2026-07-11 |
Status: Quarterly review complete. Continue at Stage 1. Next review: Q3 2026 (2026-10-10).
Blank template (copy below for your agent's quarterly review)
# Per-Agent Quarterly Review — [Agent ID] [Quarter] [Year]
**Quarter:** [Q1 / Q2 / Q3 / Q4] [Year] ([date range])
**Review date:** [YYYY-MM-DD]
**Author:** [Department Champion + primary user]
**CoE Lead reviewer:** [Name]
**Time in production:** [N months]
## §1. Headline (one-sentence summary)
> [One sentence: the agent delivered $X / saved Y hrs / maintained Z accuracy / N users / verdict.]
## §2. KPI trend (quarter over quarter)
| Metric | Card §12 target | Prev Q | This Q | Δ Q-o-Q | Status |
|---|---|---|---|---|---|
| | | | | | |
**KPI verdict:** [hit / miss / mixed]
## §3. ROI realized (this quarter + cumulative)
### This quarter calculation
- Hours saved: [N hrs]
- Loaded cost: [N hrs × $/hr]
- Add-on value: [revenue uplift / error reduction]
- Costs: [LLM + obs + infra + maintenance]
- Net: [+/− $]
### Cumulative ROI table
| Period | Hours saved | Value | Costs | Net |
|---|---|---|---|---|
**Year projection:** [estimate]
**Finance sign-off:** [validated / pending] by [name] on [date]
## §4. Adoption metrics
| Metric | Prev Q | This Q | Δ | Notes |
|---|---|---|---|---|
| Unique users | | | | |
| Total executions | | | | |
| Avg executions per user-week | | | | |
| Time-to-first-use for new users | | | | |
| Self-reported satisfaction | | | | |
| Workflow coverage | | | | |
**Adoption verdict:** [healthy / declining / stalled — with reasoning]
## §5. The 5 monitoring signals — quarter-over-quarter
| Signal | Prev Q | This Q | Trend | Alarms this Q |
|---|---|---|---|---|
| Output distribution shift | | | | |
| HITL escalation rate | | | | |
| Decision audit trail completeness | | | | |
| Cost per execution (step level) | | | | |
| Exception routing patterns | | | | |
**Signal verdict:** [all stable / one concern / multiple concerns — describe]
## §6. Bug + incident trend
| Severity | Prev Q | This Q | Δ | Open at end |
|---|---|---|---|---|
| Sev-1 | | | | |
| Sev-2 | | | | |
| Sev-3 | | | | |
| Sev-4 | | | | |
**Open backlog:** [list]
**Bug verdict:** [improving / stable / degrading]
## §7. Cost trend
| Cost component | Prev Q | This Q | Δ |
|---|---|---|---|
**Cost verdict:** [healthy / watch / concerning — with reasoning]
## §8. Autonomy stage history
| Date | Stage | Trigger | Decided by |
|---|---|---|---|
**Autonomy verdict:** [stay / promote / demote — with reasoning]
## §9. User feedback summary
[Verbatim quotes from primary + backup users + sentiment]
## §10. Changes shipped this quarter
| Date | Change | Reason | Impact |
|---|---|---|---|
## §11. Risk re-classification
| Field | Last Q | This Q | Material change? |
|---|---|---|---|
**Risk verdict:** [no change / re-classify — describe]
## §12. Cross-references to other artifacts
- Agent Card current version: [link]
- Most recent eval report: [link]
- Previous quarterly review: [link]
- Most recent 30-day review: [link]
- Recent post-mortems: [links]
- Runbook current: [link]
## §13. Decision + plan for next quarter
| Item | Decision | Owner | Due |
|---|---|---|---|
| Continue agent? | | | |
| Promote autonomy? | | | |
| Ship updates? | | | |
| Expand scope? | | | |
| Risk re-classify? | | | |
| Retirement candidate? | | | |
## §14. Roll-up to portfolio review
The following numbers feed template 15 (this quarter's exec readout):
- Quarterly ROI contribution: [$]
- Cumulative ROI: [$]
- Sev-1 incidents: [N]
- Sev-2 incidents: [N]
- Adoption growth: [Δ users]
- KPI status: [hit / miss / mixed]
- Compliance evidence retained per Agent Card §11: [confirmed]
## Sign-off
| Role | Name | Date |
|---|---|---|
| Department Champion + author | | |
| Primary user (informed) | | |
| Finance (ROI validation) | | |
| AI CoE Lead | | |
| Head of department | | |
**Status:** [complete / open items / re-review needed]
**Next review:** [date]
Usage notes
- The 6-month question lives here. When the CEO asks "what did our AI investment actually do?" — the answer is two of these reports (Q1 + Q2) per agent, plus a portfolio readout (template 15) at month 6. Numbers, not anecdotes.
- Re-running this quarterly is the discipline. Skip a quarter and the trend data has a gap; skip two and you can't credibly report.
- Finance validates ROI BEFORE the exec readout. §3 has a Finance sign-off line; this matters because the per-agent number rolls up into the company-level exec story.
- User feedback (§9) often matters more than the metrics. A statistically successful agent that the user resents is failing, and the failure shows up in adoption metrics (§4) and qualitative quotes before it shows up in incident count.
- Don't move goalposts. If §2 misses the Card §12 target, the answer is either retire-candidate OR formal scope/KPI revision via re-approval at M8 — not silent re-targeting.
- Autonomy promotion (§8) is the place where Stage 2 / Stage 3 decisions get made. Most agents stay at Stage 1 indefinitely. Promotion requires evidence; this report is the evidence.
- The audit trail in §12 is what survives auditor questions. Keep links current.
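Stale links in §12 are cheap to catch before the review meeting. A minimal sketch for repo-relative artifact paths only, assuming the artifacts live in one checked-out repository; `missing_artifacts` is a hypothetical helper, and external URLs such as the GitHub Agent Card link would need a separate HTTP check.

```python
from pathlib import Path


def missing_artifacts(repo_root: str, relative_paths: list) -> list:
    """Return the §12 cross-reference paths that do not resolve to a file under repo_root."""
    root = Path(repo_root)
    return [p for p in relative_paths if not (root / p).is_file()]


SECTION_12_PATHS = [
    "eval-reports/eval-2026-06-15.md",
    "templates/14-30-day-review--finance-invoice-recon--2026-02-13.md",
    "post-mortems/pm-2026-02-15.md",
    "templates/09-runbook--finance-invoice-recon--v1.0.5.md",
]

if __name__ == "__main__":
    for path in missing_artifacts(".", SECTION_12_PATHS):
        print(f"stale cross-reference: {path}")
```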
Common pitfalls
| Pitfall | What it looks like | Fix |
|---|---|---|
| ROI hand-waved | "Saves a lot of time" with no number | Force the hours-saved × loaded-cost math |
| ROI never validated by Finance | CoE Lead's spreadsheet only | Finance signs §3 before exec readout |
| Adoption assumed equal to volume | "1,650 executions = 1,650 sessions" | Track unique users + frequency separately |
| Drift summarized as "fine" | No PSI/KL-divergence numbers | Section 5 must have numeric trend |
| Bug count without trend | "We had 2 Sev-3 this quarter" | Always Q-o-Q: improving or degrading? |
| Autonomy promoted on metrics alone | "Hit 95%, promote to Stage 2" | User preference + 2-user steady state + Card §7 thresholds, not just one metric |
| User feedback skipped | Section 9 marked "TBD" | Run the 30-min user interview before review meeting |
| Cross-references stale | §12 links 404 | Update at every review |
Framework cross-references
- framework.md §11.2 (per-agent lifecycle: quarterly)
- framework.md §21 (5 monitoring signals; feeds §5)
- framework.md §22 (autonomy progression; §8)
- framework.md §29 (ROI tracking; §3)
- framework.md §22.1 EU AI Act Article 72 (post-market monitoring)
- framework.md §22.2 NIST AI RMF MANAGE (continuous improvement)
- framework.md §22.3 ISO/IEC 42001 Clause 9.1 (monitoring + measurement)
- workflows.md Step 15 (Quarterly review)
- workflows.html → In Action view → node M17 Quarterly review
- Companion template: 15-quarterly-exec-readout.md (this report rolls up into the portfolio readout)