Purpose

This is the deep-dive review for one specific agent, run every 90 days from production launch. It proves — with numbers, not anecdotes — whether this particular agent is delivering its promised KPI, generating positive ROI, gaining adoption, accumulating bugs, or drifting.

This differs from template 15, which is the portfolio-level 2-page exec readout. Template 16 is the per-agent forensic record that:

The Department Champion uses to defend (or kill) their agent
The CoE Lead uses to spot patterns across the portfolio
Finance uses to validate the ROI claims that roll up into template 15
External auditors use as EU AI Act Article 72 post-market monitoring evidence

The 6-month question — "did this agent actually change anything?" — is answered with two of these reports (Q1 + Q2). The trend matters as much as the absolute numbers.

When you use it: Every 90 days, per production agent, until retirement. The first one falls at Q1, 3 months post-launch; it overlaps with the template 14 30-day review and bridges into the quarterly cadence.
Who fills it: Department Champion (data owner) + Agent Owner (qualitative). CoE Lead reviews + signs.
Time: 2–3 hours of data assembly + 60 min review meeting + 30 min writeup.
Length: 3–5 pages. Long enough to be defensible, short enough to be read.

Worked example (AP Accountant invoice reconciliation — Q2 2026 review, 6 months in)

Per-Agent Quarterly Review — `finance-invoice-recon` Q2 2026

Quarter: Q2 2026 (Apr 1 – Jun 30)
Review date: 2026-07-10
Author: Mike Chen (Finance Champion) + Sarah Patel (AP Accountant, primary user)
CoE Lead reviewer: Morteza Moradi
Time in production: 6 months (launched 2026-01-13)

§1. Headline (one-sentence summary)

The agent saved 94 accountant-hours in Q2 ($5,170 loaded cost), maintained 96% match accuracy with zero Sev-1 incidents, and gained a second user (Tom Riley). Cumulative 6-month ROI: +$9,940 realized, or +$1,940 net of build amortization. Recommendation: continue at Stage 1; defer Stage 2 promotion until Q3 with second-user data.

§2. KPI trend (quarter over quarter)

Metric	Card §12 target	Q1 2026 actual	Q2 2026 actual	Δ Q-o-Q	Status
Time per reconciliation (avg)	30 sec	35 sec	32 sec	↓ 9%	✅ Closing in on target
Match accuracy (rolling weekly)	≥ 95%	96.0%	96.4%	↑ 0.4 pt	✅ Above target
HITL acceptance rate (steady)	≥ 90%	92%	93%	↑ 1 pt	✅ Healthy, not collapsing
Auto-approved payments	0	0	0	flat	✅ As designed
Latency p95	< 15s	13s	11s	↓ 15%	✅ Trending better
Coverage (% invoices handled, not exception)	n/a (info)	92%	95%	↑ 3 pt	✅ Reach growing

KPI verdict: Four of five targets hit; time per reconciliation (32 sec) is still 2 sec above its 30 sec target but closing. Trend positive on every dimension. No goalpost adjustments needed.
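The Δ Q-o-Q and Status columns can be derived mechanically rather than by eye. The following is a minimal sketch using the table's values; the `Kpi` class, the function names, and the rule that targets are inclusive bounds are illustrative assumptions, not part of the Agent Card.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    name: str
    target: float
    prev: float
    curr: float
    lower_is_better: bool


def relative_change_pct(prev: float, curr: float) -> float:
    """Quarter-over-quarter change as a percentage of the previous value."""
    return (curr - prev) / prev * 100


def meets_target(kpi: Kpi) -> bool:
    """Assumption: targets are treated as inclusive bounds on the current value."""
    if kpi.lower_is_better:
        return kpi.curr <= kpi.target
    return kpi.curr >= kpi.target


# Values copied from the KPI table above.
kpis = [
    Kpi("Time per reconciliation (sec)", 30, 35, 32, lower_is_better=True),
    Kpi("Match accuracy (%)", 95, 96.0, 96.4, lower_is_better=False),
    Kpi("HITL acceptance rate (%)", 90, 92, 93, lower_is_better=False),
    Kpi("Latency p95 (s)", 15, 13, 11, lower_is_better=True),
]

for kpi in kpis:
    change = relative_change_pct(kpi.prev, kpi.curr)
    status = "target met" if meets_target(kpi) else "target not met"
    print(f"{kpi.name}: {change:+.1f}% Q-o-Q, {status}")
```

For metrics that are already percentages, such as match accuracy, the table reports the difference in points (current minus previous) rather than this relative change.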

§3. ROI realized (this quarter + cumulative)

Q2 2026 calculation

Hours saved this quarter:
  ~127 invoices/week × 13 weeks × 2.5 min saved/invoice
  ≈ 4,128 minutes saved
  ≈ 68.8 hours saved (Sarah)
  + 25.2 hours saved (Tom, 4-week ramp from Jun)
  = 94.0 hours saved this quarter

Loaded cost saved:
  94 hours × $55/hr loaded cost
  = $5,170

Add-on value: zero missed matches at month-end (vs ~2% baseline)
  estimated avoided correction cost
  = ~$1,200/quarter

Costs:
  LLM API (Anthropic Claude):     $156
  Observability (LangSmith share): $87
  Infrastructure (n8n self-host): $42
  Maintenance (CoE time):          $320 (4 hrs at $80)
  Total:                          $605

Q2 net ROI: $5,170 + $1,200 − $605 = $5,765
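The arithmetic above can be reproduced in a few lines so that Finance can re-run it with their own inputs. This is a sketch only: the function names are hypothetical, and Tom's 25.2 hours are taken as given from the calculation above rather than derived.

```python
MINUTES_PER_HOUR = 60


def hours_saved(invoices_per_week: float, weeks: int, minutes_saved_per_invoice: float) -> float:
    """Hours saved by one user over the quarter."""
    return invoices_per_week * weeks * minutes_saved_per_invoice / MINUTES_PER_HOUR


def quarter_net_roi(hours: float, loaded_rate: float, add_on_value: float, costs: dict) -> float:
    """Loaded-cost savings plus add-on value, minus operating costs."""
    return hours * loaded_rate + add_on_value - sum(costs.values())


sarah_hours = round(hours_saved(127, 13, 2.5), 1)  # rounded to one decimal, as in the writeup
tom_hours = 25.2                                   # taken as given from the calculation above
total_hours = round(sarah_hours + tom_hours, 1)

q2_costs = {
    "llm_api": 156,
    "observability": 87,
    "infrastructure": 42,
    "maintenance": 320,
}

net = quarter_net_roi(total_hours, loaded_rate=55, add_on_value=1200, costs=q2_costs)
print(f"Hours saved: {total_hours}, Q2 net ROI: ${net:,.0f}")
```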

Cumulative 6-month ROI

Period	Hours saved	Value	Costs	Net
Q1 2026 (Jan–Mar)	64 hrs	$3,520 + $1,200 = $4,720	$545	+$4,175
Q2 2026 (Apr–Jun)	94 hrs	$5,170 + $1,200 = $6,370	$605	+$5,765
Pre-launch ramp (Dec '25)	—	—	(build cost $16,000 amortized over Year 1)	—
6-month cumulative	158 hrs saved	$11,090 value	$1,150 ops + amortized build	~$9,940 realized

(Build cost $16,000 amortized over 12 months → $1,333/month, so 6-month amortization = $8,000.)

Net 6-month ROI: +$9,940 − $8,000 build amortization = +$1,940 net positive over 6 months.
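The cumulative figure combines realized quarterly value with a straight-line share of the build cost. A minimal sketch of that calculation, using the numbers from the table above (the function name and the dictionary layout are illustrative assumptions):

```python
def amortized_build_cost(build_cost: float, amortization_months: int, months_elapsed: int) -> float:
    """Straight-line share of the build cost charged so far."""
    months = min(months_elapsed, amortization_months)
    return build_cost * months / amortization_months


# Value and operating costs per quarter, as reported in the cumulative table.
quarters = [
    {"period": "Q1 2026", "value": 4720, "costs": 545},
    {"period": "Q2 2026", "value": 6370, "costs": 605},
]

realized = sum(q["value"] - q["costs"] for q in quarters)
build_share = amortized_build_cost(16_000, amortization_months=12, months_elapsed=6)
net_of_build = realized - build_share

print(f"Realized: ${realized:,}")
print(f"Build amortization to date: ${build_share:,.0f}")
print(f"Net of build amortization: ${net_of_build:,.0f}")
```

Reporting both numbers, realized and net of build amortization, keeps the roll-up honest about how much of the build cost is still outstanding.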

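Year 1 projection: Q3 + Q4 expected to maintain $5K+/quarter net of operating costs; the build cost is fully amortized by month 12. Year-1 projected realized ROI (before build cost): +$15K–$20K, which is about −$1K to +$4K once the full $16,000 build cost is deducted.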

Finance sign-off (template 15 §17b dependency): Validated by Finance on 2026-07-08. Loaded-cost rate of $55/hr confirmed. Avoided-correction-cost estimate marked as illustrative (not GAAP).

§4. Adoption metrics

Metric	Q1 2026	Q2 2026	Δ	Notes
Unique users (active in quarter)	1 (Sarah)	2 (Sarah + Tom from Jun)	+1	Tom onboarded as backup
Total executions	1,560	1,651	+5.8%	Volume slightly up at quarter-end
Average executions per user-week	~120	~127	+5.8%	Steady
Time-to-first-use for new user (Tom)	n/a	14 minutes	—	Loom training was enough
Self-reported satisfaction (1–5)	4.5 (Sarah)	4.5 (Sarah), 4 (Tom)	flat	Tom wants the v1.1 cosmetic fix
Workflow coverage	92%	95%	+3pt	More invoice formats handled

Adoption verdict: Healthy. User base doubled. No churn. Sarah uses it for every invoice without prompting — adoption is "default-on" for AP work.

Depth of use: Sarah has stopped reviewing low-confidence matches one at a time; she batches them at end of day. This is emergent positive behavior, and it is not a risk because the HITL gate is still enforced.
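Unique users, total executions, and executions per user-week are separate quantities, and all three can come from one execution log. The sketch below assumes a hypothetical log of `(user, run_date)` pairs; the sample rows are invented for illustration and are not the agent's real data.

```python
from datetime import date


def adoption_metrics(log: list) -> dict:
    """Compute adoption metrics from (user, run_date) execution records."""
    users = {user for user, _ in log}
    # A user-week is one user active in one ISO (year, week).
    user_weeks = {(user, tuple(run_date.isocalendar())[:2]) for user, run_date in log}
    return {
        "unique_users": len(users),
        "total_executions": len(log),
        "executions_per_user_week": len(log) / len(user_weeks),
    }


# Hypothetical sample log, two users across two weeks.
sample_log = [
    ("sarah", date(2026, 6, 1)),
    ("sarah", date(2026, 6, 2)),
    ("sarah", date(2026, 6, 8)),
    ("tom", date(2026, 6, 2)),
    ("tom", date(2026, 6, 9)),
    ("tom", date(2026, 6, 9)),
]

metrics = adoption_metrics(sample_log)
print(metrics)
```

Dividing by active user-weeks rather than calendar weeks means a new user's ramp period does not dilute the per-user figure.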

§5. The 5 monitoring signals — quarter-over-quarter

Signal	Q1 status	Q2 status	Trend	Alarms this quarter
Output distribution shift (PSI)	Stable, PSI 0.04	Stable, PSI 0.05	↑ 25% relative, still well under 0.2 threshold	0
HITL escalation rate	8% steady	7% steady	↓ 1pt — NOT collapsing (healthy)	0
Decision audit trail completeness	100%	100%	flat	0
Cost per execution (step level)	$0.040 mean	$0.038 mean	↓ 5%	0 (under cap)
Exception routing patterns	8% exception, all routed correctly	5% exception, all routed correctly	↓ — agent handling more cases inline	0

Signal verdict: All five signals stable. Nothing concerning. The drift detector did NOT fire this quarter. The one Sev-2 incident in Q1 (covered in §6) caused a one-time alarm, which was resolved.
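The PSI values in the first row come from comparing a baseline output distribution with the current quarter's. A minimal sketch of the standard Population Stability Index formula follows; the bucket counts are hypothetical, the `eps` floor for empty buckets is an assumption, and only the 0.2 alarm threshold comes from the table above.

```python
import math


def psi(expected_counts: list, actual_counts: list, eps: float = 1e-6) -> float:
    """Population Stability Index between two binned distributions."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("Both distributions must use the same bins")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    score = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        # Floor each share at eps so empty buckets do not produce log(0).
        e_share = max(expected / expected_total, eps)
        a_share = max(actual / actual_total, eps)
        score += (a_share - e_share) * math.log(a_share / e_share)
    return score


PSI_ALARM_THRESHOLD = 0.2  # threshold quoted in the signals table

# Hypothetical counts per confidence-score bucket (low, medium, high).
baseline = [100, 300, 600]
this_quarter = [120, 310, 570]

value = psi(baseline, this_quarter)
print(f"PSI = {value:.3f}, alarm = {value >= PSI_ALARM_THRESHOLD}")
```

Identical distributions give a PSI of zero, so the number grows only as the output mix moves away from the baseline.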

§6. Bug + incident trend

Severity	Q1 2026	Q2 2026	Δ	Open at end of quarter
Sev-1 (customer impact / data exposure / out-of-scope action)	0	0	flat	0
Sev-2 (sustained quality issue, near-miss Sev-1, broad disruption)	1 (NetSuite schema change drift incident, post-mortem ref: pm-2026-02-15.md)	0	↓	0
Sev-3 (brief outage, single-case quirk, contained quickly)	4 (rate-limits, parser edge cases)	2 (rate-limits)	↓ 50%	0
Sev-4 (cosmetic)	3 (vendor-name capitalization, etc.)	1 (same cosmetic, deferred to v1.1)	↓	1 (open in backlog)

Bug verdict: Declining trend. The Q1 Sev-2 was post-mortem'd, action items closed within 14 days, drift detector added (now monitoring at 1-hour window for sharp quality drops). No recurrence in Q2.

Open backlog: 1 cosmetic item (vendor-name capitalization in draft emails) — slated for v1.1 in Q3.

§7. Cost trend

Cost component	Q1 2026	Q2 2026	Δ
LLM API (Anthropic Claude)	$148	$156	+5% (volume)
Observability (LangSmith allocation)	$87	$87	flat
Infrastructure (n8n self-host allocation)	$42	$42	flat
Maintenance (CoE time)	$240 (3 hrs)	$320 (4 hrs)	+33% (Tom onboarding)
Total per-quarter cost	$517	$605	+17%

Cost verdict: Costs growing slower than value (value Q1 → Q2: +35%). Healthy cost-to-value ratio. No cost overruns. No budget escalation needed.
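The verdict rests on comparing cost growth with value growth. The sketch below recomputes both from the §7 cost components and the §3 value figures; the rule that "healthy" means costs growing slower than value is an assumption drawn from the verdict wording, and the names are illustrative.

```python
def pct_change(prev: float, curr: float) -> float:
    """Relative change from prev to curr, in percent."""
    return (curr - prev) / prev * 100


# Cost components from the table above.
q1_costs = {"llm_api": 148, "observability": 87, "infrastructure": 42, "maintenance": 240}
q2_costs = {"llm_api": 156, "observability": 87, "infrastructure": 42, "maintenance": 320}

# Quarterly value (loaded-cost savings plus add-on value) from the ROI section.
q1_value, q2_value = 4720, 6370

cost_growth = pct_change(sum(q1_costs.values()), sum(q2_costs.values()))
value_growth = pct_change(q1_value, q2_value)

# Assumed rule: healthy when costs grow more slowly than value.
verdict = "healthy" if cost_growth < value_growth else "watch"
print(f"Costs {cost_growth:+.0f}%, value {value_growth:+.0f}%: {verdict}")
```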

§8. Autonomy stage history

Date	Stage	Trigger	Decided by
2026-01-13	Stage 1 (Assistive)	Initial launch	CoE Lead + Head of Finance per Agent Card §7
2026-04-13 (Q1 review)	Stage 1 (no change)	Sarah requested staying at Stage 1; user preference is valid signal	CoE Lead + Head of Finance
2026-07-10 (Q2 review, this doc)	Stage 1 (defer Stage 2)	Per Card §7 thresholds we'd qualify, but Tom's onboarding window means we want 90 more days of two-user data before promotion. Discussion: Q3 review.	CoE Lead + Head of Finance

Autonomy verdict: Stay at Stage 1. Re-evaluate Stage 2 at Q3.

§9. User feedback summary

Q2 verbatim feedback (collected via 30-min interviews with each user):

Sarah Patel:

"It's invisible. I don't think about it. I think about all the other AP work I'm doing instead. The capitalization thing still bugs me but I just edit before send."

Tom Riley (4 weeks of use):

"It feels weirdly fast. I worried I'd miss something but the confidence scores help me trust it more. I've kept the Loom recording bookmarked but haven't needed it after week 1."

Sentiment: Both users want continuity. Neither wants to go back to manual. Both want the v1.1 cosmetic fix.

§10. Changes shipped this quarter

Date	Change	Reason	Impact
2026-04-08	v1.0.3 — added retry-with-backoff for NetSuite API	Q1 rate-limit Sev-3 follow-up	0 rate-limit incidents in Q2 (was 2 in Q1)
2026-05-22	v1.0.4 — golden set expanded by 50 invoices (multi-line POs + new vendor formats)	Quality investment	Match accuracy ↑ 0.4pt
2026-06-10	v1.0.5 — agent identity onboarded backup user Tom (Gmail OAuth scope expanded to two mailboxes)	Tom onboarding	Adoption +1 user

No prompt changes this quarter. Agent Card unchanged. Risk tier unchanged.

§11. Risk re-classification

Field	Last quarter	This quarter	Material change?
Risk tier	Medium	Medium	No
PII driver	Yes (mild — vendor contacts)	Yes — same	No
Consequential decisions about people	No	No	No
Autonomous behavior	No (HITL on every NetSuite-related action)	No	No
EU AI Act exposure	Not Annex III	Not Annex III	No
Jurisdictional tags	US + EU	US + EU	No

Risk verdict: No material change. Continue under current tier + sign-off chain.

§12. Cross-references to other artifacts (audit trail)

Agent Card v1.0.5: github.com/acme/agents/finance-invoice-recon/AGENT_CARD.md
Most recent eval report: eval-reports/eval-2026-06-15.md (re-ran after v1.0.4 golden set expansion)
Q1 2026 quarterly review: [link to Q1 doc]
30-day review (Feb 2026): templates/14-30-day-review--finance-invoice-recon--2026-02-13.md
Post-mortem (Sev-2 Feb 2026): post-mortems/pm-2026-02-15.md
Runbook current version: templates/09-runbook--finance-invoice-recon--v1.0.5.md

§13. Decision + plan for next quarter

Item	Decision	Owner	Due
Continue agent in production?	✅ Yes	Decided by CoE Lead	n/a
Promote to Stage 2 autonomy?	⏸ Defer to Q3 (need 2-user steady-state data)	CoE Lead	Q3 review 2026-10-10
Ship v1.1 with cosmetic fix + partial-match heuristic?	✅ Yes	Builder	2026-08-15
Expand to international vendors (EU vendor data)?	⚠️ Out of scope for v1.x — requires re-triage at M7 (EU AI Act Article 50 transparency check)	CoE Lead	Defer
Risk re-classification needed?	❌ No	—	—
Retirement candidate?	❌ No (ROI positive, KPI hit, no severe incidents)	—	—

§14. Roll-up to portfolio review (template 15)

The following numbers from this review feed the Q2 portfolio exec readout:

Quarterly ROI contribution: +$5,765
Cumulative ROI: +$9,940 (or +$1,940 net of build amortization)
Sev-1 incidents: 0
Sev-2 incidents: 0
Adoption growth: +1 user (now 2 users)
KPI status: mixed (4 of 5 targets hit; time per reconciliation at 32 sec vs 30 sec target)
Compliance evidence retained: per Agent Card §11 (6-month LangSmith + S3 Glacier 7-year for SOX adjacency)

Sign-off

Role	Name	Date
Department Champion + author	Mike Chen	2026-07-10
Primary user (informed)	Sarah Patel	2026-07-10
Backup user (informed)	Tom Riley	2026-07-10
Finance (ROI validation)	Bob Lin, CFO	2026-07-08
AI CoE Lead	Morteza Moradi	2026-07-10
Head of Finance	(department head)	2026-07-11

Status: Quarterly review complete. Continue at Stage 1. Next review: Q3 2026 (2026-10-10).

Blank template (copy below for your agent's quarterly review)

# Per-Agent Quarterly Review — [Agent ID] [Quarter] [Year]

**Quarter:** [Q1 / Q2 / Q3 / Q4] [Year] ([date range])
**Review date:** [YYYY-MM-DD]
**Author:** [Department Champion + primary user]
**CoE Lead reviewer:** [Name]
**Time in production:** [N months]

## §1. Headline (one-sentence summary)
> [One sentence: the agent delivered $X / saved Y hrs / maintained Z accuracy / N users / verdict.]

## §2. KPI trend (quarter over quarter)

| Metric | Card §12 target | Prev Q | This Q | Δ Q-o-Q | Status |
|---|---|---|---|---|---|
| | | | | | |

**KPI verdict:** [hit / miss / mixed]

## §3. ROI realized (this quarter + cumulative)

### This quarter calculation

- Hours saved: [N hrs]
- Loaded cost: [N hrs × $/hr]
- Add-on value: [revenue uplift / error reduction]
- Costs: [LLM + obs + infra + maintenance]
- Net: [+/− $]

### Cumulative ROI table

| Period | Hours saved | Value | Costs | Net |
|---|---|---|---|---|

**Year projection:** [estimate]
**Finance sign-off:** [validated / pending] by [name] on [date]

## §4. Adoption metrics

| Metric | Prev Q | This Q | Δ | Notes |
|---|---|---|---|---|
| Unique users | | | | |
| Total executions | | | | |
| Avg executions per user-week | | | | |
| Time-to-first-use for new users | | | | |
| Self-reported satisfaction | | | | |
| Workflow coverage | | | | |

**Adoption verdict:** [healthy / declining / stalled — with reasoning]

## §5. The 5 monitoring signals — quarter-over-quarter

| Signal | Prev Q | This Q | Trend | Alarms this Q |
|---|---|---|---|---|
| Output distribution shift | | | | |
| HITL escalation rate | | | | |
| Decision audit trail completeness | | | | |
| Cost per execution (step level) | | | | |
| Exception routing patterns | | | | |

**Signal verdict:** [all stable / one concern / multiple concerns — describe]

## §6. Bug + incident trend

| Severity | Prev Q | This Q | Δ | Open at end |
|---|---|---|---|---|
| Sev-1 | | | | |
| Sev-2 | | | | |
| Sev-3 | | | | |
| Sev-4 | | | | |

**Open backlog:** [list]
**Bug verdict:** [improving / stable / degrading]

## §7. Cost trend

| Cost component | Prev Q | This Q | Δ |
|---|---|---|---|

**Cost verdict:** [healthy / watch / concerning — with reasoning]

## §8. Autonomy stage history

| Date | Stage | Trigger | Decided by |
|---|---|---|---|

**Autonomy verdict:** [stay / promote / demote — with reasoning]

## §9. User feedback summary

[Verbatim quotes from primary + backup users + sentiment]

## §10. Changes shipped this quarter

| Date | Change | Reason | Impact |
|---|---|---|---|

## §11. Risk re-classification

| Field | Last Q | This Q | Material change? |
|---|---|---|---|

**Risk verdict:** [no change / re-classify — describe]

## §12. Cross-references to other artifacts

- Agent Card current version: [link]
- Most recent eval report: [link]
- Previous quarterly review: [link]
- Most recent 30-day review: [link]
- Recent post-mortems: [links]
- Runbook current: [link]

## §13. Decision + plan for next quarter

| Item | Decision | Owner | Due |
|---|---|---|---|
| Continue agent? | | | |
| Promote autonomy? | | | |
| Ship updates? | | | |
| Expand scope? | | | |
| Risk re-classify? | | | |
| Retirement candidate? | | | |

## §14. Roll-up to portfolio review

The following numbers feed template 15 (this quarter's exec readout):
- Quarterly ROI contribution: [$]
- Cumulative ROI: [$]
- Sev-1 incidents: [N]
- Sev-2 incidents: [N]
- Adoption growth: [Δ users]
- KPI status: [hit / miss / mixed]
- Compliance evidence retained per Agent Card §11: [confirmed]

## Sign-off

| Role | Name | Date |
|---|---|---|
| Department Champion + author | | |
| Primary user (informed) | | |
| Finance (ROI validation) | | |
| AI CoE Lead | | |
| Head of department | | |

**Status:** [complete / open items / re-review needed]
**Next review:** [date]

Usage notes

The 6-month question lives here. When the CEO asks "what did our AI investment actually do?" — the answer is two of these reports (Q1 + Q2) per agent, plus a portfolio readout (template 15) at month 6. Numbers, not anecdotes.
Re-running this quarterly is the discipline. Skip a quarter and the trend data has a gap; skip two and you can't credibly report.
Finance validates ROI BEFORE the exec readout. §3 has a Finance sign-off line — this matters because the per-agent number rolls up into the company-level exec story.
User feedback (§9) often matters more than the metrics. A statistically successful agent that the user resents is failing — and the failure shows up in adoption metrics (§4) and qualitative quotes before it shows up in incident count.
Don't move goalposts. If §2 misses the Card §12 target, the answer is either retire-candidate OR formal scope/KPI revision via re-approval at M8 — not silent re-targeting.
Autonomy promotion (§8) is the place where Stage 2 / Stage 3 decisions get made. Most agents stay at Stage 1 indefinitely. Promotion requires evidence; this report is the evidence.
The audit trail in §12 is what survives auditor questions. Keep links current.

Common pitfalls

Pitfall	What it looks like	Fix
ROI hand-waved	"Saves a lot of time" with no number	Force the hours-saved × loaded-cost math
ROI never validated by Finance	CoE Lead's spreadsheet only	Finance signs §3 before exec readout
Adoption assumed equal to volume	"1,650 executions = 1,650 sessions"	Track unique users + frequency separately
Drift summarized as "fine"	No PSI/KL-divergence numbers	Section 5 must have numeric trend
Bug count without trend	"We had 2 Sev-3 this quarter"	Always Q-o-Q: improving or degrading?
Autonomy promoted on metrics alone	"Hit 95%, promote to Stage 2"	User preference + 2-user steady state + Card §7 thresholds, not just one metric
User feedback skipped	Section 9 marked "TBD"	Run the 30-min user interview before review meeting
Cross-references stale	§12 links 404	Update at every review

Framework cross-references

framework.md §11.2 (per-agent lifecycle — quarterly)
framework.md §21 (5 monitoring signals — feeds §5)
framework.md §22 (autonomy progression — §8)
framework.md §29 (ROI tracking — §3)
framework.md §22.1 EU AI Act Article 72 (post-market monitoring)
framework.md §22.2 NIST AI RMF MANAGE (continuous improvement)
framework.md §22.3 ISO/IEC 42001 Clause 9.1 (monitoring + measurement)
workflows.md Step 15 (Quarterly review)
workflows.html → In Action view → node M17 Quarterly review
Companion template: 15-quarterly-exec-readout.md (this report rolls up into the portfolio readout)