Template 16 — Per-Agent Quarterly Review

ID: 16-per-agent-quarterly-review
Version: 1
Last revised: 2026-05-14
Owner: Department Champion + Agent Owner (drive) · AI CoE Lead (signs) · Finance (validates ROI)

Purpose

This is the deep-dive review for one specific agent, run every 90 days from production launch. It proves — with numbers, not anecdotes — whether this particular agent is delivering its promised KPI, generating positive ROI, gaining adoption, accumulating bugs, or drifting.

It differs from template 15, the portfolio-level 2-page exec readout. Template 16 is the per-agent forensic record that:

  • The Department Champion uses to defend (or kill) their agent
  • The CoE Lead uses to spot patterns across the portfolio
  • Finance uses to validate the ROI claims that roll up into template 15
  • External auditors use as EU AI Act Article 72 post-market monitoring evidence

The 6-month question — "did this agent actually change anything?" — is answered with two of these reports (Q1 + Q2). The trend matters as much as the absolute numbers.

  • When you use it: Every 90 days, per production agent, until retirement. The first one is due at Q1 (3 months post-launch); it overlaps with the template 14 30-day review and bridges into the quarterly cadence.
  • Who fills it: Department Champion (data owner) + Agent Owner (qualitative). CoE Lead reviews + signs.
  • Time: 2–3 hours of data assembly + 60 min review meeting + 30 min writeup.
  • Length: 3–5 pages. Long enough to be defensible, short enough to be read.

Worked example (AP Accountant invoice reconciliation — Q2 2026 review, 6 months in)

Per-Agent Quarterly Review — finance-invoice-recon Q2 2026

Quarter: Q2 2026 (Apr 1 – Jun 30)
Review date: 2026-07-10
Author: Mike Chen (Finance Champion) + Sarah Patel (AP Accountant, primary user)
CoE Lead reviewer: Morteza Moradi
Time in production: 6 months (launched 2026-01-13)


§1. Headline (one-sentence summary)

The agent saved 94 accountant-hours in Q2 ($5,170 loaded cost), maintained 96.4% match accuracy with zero Sev-1 incidents, and gained a second user (Tom Riley). Cumulative 6-month ROI: +$9,940 realized, or +$1,940 net of build amortization. Recommendation: continue at Stage 1; defer Stage 2 promotion until Q3, when second-user data is available.

§2. KPI trend (quarter over quarter)

| Metric | Card §12 target | Q1 2026 actual | Q2 2026 actual | Δ Q-o-Q | Status |
|---|---|---|---|---|---|
| Time per reconciliation (avg) | 30 sec | 35 sec | 32 sec | ↓ 9% | ✅ Closing in on target |
| Match accuracy (rolling weekly) | ≥ 95% | 96.0% | 96.4% | ↑ 0.4 pt | ✅ Above target |
| HITL acceptance rate (steady) | ≥ 90% | 92% | 93% | ↑ 1 pt | ✅ Healthy, not collapsing |
| Auto-approved payments | 0 | 0 | 0 | flat | ✅ As designed |
| Latency p95 | < 15s | 13s | 11s | ↓ 15% | ✅ Trending better |
| Coverage (% invoices handled, not exception) | n/a (info) | 92% | 95% | ↑ 3 pt | ✅ Reach growing |

KPI verdict: All targets hit except time per reconciliation, which at 32 sec is still 2 sec above its 30 sec target and closing. Trend positive on every dimension. No goalpost adjustments needed.
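The Δ Q-o-Q and status columns can be reproduced mechanically. The following is a minimal Python sketch, not part of the framework: the `Kpi` class and its method names are hypothetical, and the figures are copied from the table above. Time and latency are reported as relative change, rates as percentage-point change.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    name: str
    target: float
    prev: float           # previous quarter actual
    curr: float           # current quarter actual
    lower_is_better: bool

    def relative_change_pct(self) -> float:
        """Q-o-Q change as a percentage of the previous value."""
        return (self.curr - self.prev) / self.prev * 100

    def point_change(self) -> float:
        """Q-o-Q change in absolute units (e.g. percentage points)."""
        return self.curr - self.prev

    def meets_target(self) -> bool:
        """Compare the current value against the Card target (inclusive)."""
        if self.lower_is_better:
            return self.curr <= self.target
        return self.curr >= self.target


# Hypothetical structure; values are the ones quoted in the KPI table.
kpis = [
    Kpi("Time per reconciliation (sec)", 30, 35, 32, lower_is_better=True),
    Kpi("Match accuracy (%)", 95, 96.0, 96.4, lower_is_better=False),
    Kpi("HITL acceptance rate (%)", 90, 92, 93, lower_is_better=False),
    Kpi("Latency p95 (sec)", 15, 13, 11, lower_is_better=True),
]

for k in kpis:
    status = "at target" if k.meets_target() else "short of target"
    print(f"{k.name}: {k.relative_change_pct():+.1f}% "
          f"({k.point_change():+.1f} units), {status}")
```

Under this comparison the 32 sec reconciliation time registers as short of its 30 sec target, while the other three metrics are at or beyond theirs.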

§3. ROI realized (this quarter + cumulative)

Q2 2026 calculation

Hours saved this quarter:
  ~127 invoices/week × 13 weeks × 2.5 min saved/invoice
  ≈ 4,128 minutes saved
  ≈ 68.8 hours saved (Sarah)
  + 25.2 hours saved (Tom, 4-week ramp from Jun)
  = 94.0 hours saved this quarter

Loaded cost saved:
  94 hours × $55/hr loaded cost
  = $5,170

Add-on value: zero missed matches at month-end (vs ~2% baseline)
  estimated avoided correction cost
  = ~$1,200/quarter

Costs:
  LLM API (Anthropic Claude):     $156
  Observability (LangSmith share): $87
  Infrastructure (n8n self-host): $42
  Maintenance (CoE time):          $320 (4 hrs at $80)
  Total:                          $605

Q2 net ROI: $5,170 + $1,200 − $605 = $5,765
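
The arithmetic above can be kept in a small script so Finance can re-run it with their own rate. This is a sketch under stated assumptions: the function and constant names are hypothetical, and every figure is taken from the calculation above.

```python
# Inputs are the figures quoted in the Q2 calculation; names are illustrative.
INVOICES_PER_WEEK = 127
WEEKS_IN_QUARTER = 13
MINUTES_SAVED_PER_INVOICE = 2.5
LOADED_RATE_USD_PER_HOUR = 55
TOM_HOURS = 25.2                # second user, 4-week ramp
ADD_ON_VALUE_USD = 1200         # estimated avoided correction cost

costs_usd = {
    "llm_api": 156,
    "observability": 87,
    "infrastructure": 42,
    "maintenance": 320,
}


def quarterly_net_roi(hours_saved, loaded_rate, add_on_value, costs):
    """Net ROI = labor value + add-on value - total running costs."""
    return hours_saved * loaded_rate + add_on_value - sum(costs.values())


sarah_hours = INVOICES_PER_WEEK * WEEKS_IN_QUARTER * MINUTES_SAVED_PER_INVOICE / 60
hours_saved = round(sarah_hours + TOM_HOURS, 1)  # rounded to 0.1 hr, as in the report
q2_net = quarterly_net_roi(
    hours_saved, LOADED_RATE_USD_PER_HOUR, ADD_ON_VALUE_USD, costs_usd
)
print(f"Hours saved: {hours_saved}, net ROI: ${q2_net:,.0f}")
```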

Cumulative 6-month ROI

| Period | Hours saved | Value | Costs | Net |
|---|---|---|---|---|
| Q1 2026 (Jan–Mar) | 64 hrs | $3,520 + $1,200 = $4,720 | $545 | +$4,175 |
| Q2 2026 (Apr–Jun) | 94 hrs | $5,170 + $1,200 = $6,370 | $605 | +$5,765 |
| Pre-launch ramp (Dec '25) | | | (build cost $X amortized over Year 1) | |
| 6-month cumulative | 158 hrs saved | $11,090 value | $1,150 ops + amortized build | ~$9,940 realized |

(Build cost $16,000 amortized over 12 months → $1,333/month, so 6-month amortization = $8,000.)

Net 6-month ROI: +$9,940 − $8,000 build amortization = +$1,940 net positive over 6 months.
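
The cumulative figure and the straight-line build amortization can be checked the same way. A minimal sketch, assuming a hypothetical helper `net_of_build`; the quarterly values, the $16,000 build cost, and the 12-month schedule are the ones stated above.

```python
def net_of_build(realized_net, build_cost, months_elapsed, amortization_months=12):
    """Subtract the share of build cost amortized so far (straight-line)."""
    amortized = build_cost * months_elapsed / amortization_months
    return realized_net - amortized


# Value and running costs per quarter, from the cumulative table.
quarterly = {
    "Q1 2026": {"value": 4720, "costs": 545},
    "Q2 2026": {"value": 6370, "costs": 605},
}

realized = sum(q["value"] - q["costs"] for q in quarterly.values())
six_month_net = net_of_build(realized, build_cost=16_000, months_elapsed=6)
print(f"Realized: ${realized:,}, net of build amortization: ${six_month_net:,.0f}")
```

Keeping the realized and net-of-build numbers separate makes it explicit which one is being rolled up into the portfolio readout.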

Year 1 projection: Q3 + Q4 expected to maintain $5K+/quarter net of running costs; the build cost is fully amortized by month 12. Year-1 projected net: roughly +$20K before build cost, or roughly +$4K net of the full $16,000 build cost.

Finance sign-off (template 15 §17b dependency): Validated by Finance on 2026-07-08. Loaded-cost rate of $55/hr confirmed. Avoided-correction-cost estimate marked as illustrative (not GAAP).

§4. Adoption metrics

| Metric | Q1 2026 | Q2 2026 | Δ | Notes |
|---|---|---|---|---|
| Unique users (active in quarter) | 1 (Sarah) | 2 (Sarah + Tom from Jun) | +1 | Tom onboarded as backup |
| Total executions | 1,560 | 1,651 | +5.8% | Volume slightly up at quarter-end |
| Average executions per user-week | ~120 | ~127 | +5.8% | Steady |
| Time-to-first-use for new user (Tom) | n/a | 14 minutes | | Loom training was enough |
| Self-reported satisfaction (1–5) | 4.5 (Sarah) | 4.5 (Sarah), 4 (Tom) | flat | Tom wants the v1.1 cosmetic fix |
| Workflow coverage | 92% | 95% | +3pt | More invoice formats handled |

Adoption verdict: Healthy. User base doubled. No churn. Sarah uses it for every invoice without prompting — adoption is "default-on" for AP work.

Depth of use: Sarah has stopped reviewing low-confidence matches one at a time; she batches them at end of day. This is emergent positive behavior. It is not a risk because the HITL gate is still enforced.

§5. The 5 monitoring signals — quarter-over-quarter

| Signal | Q1 status | Q2 status | Trend | Alarms this quarter |
|---|---|---|---|---|
| Output distribution shift (PSI) | Stable, PSI 0.04 | Stable, PSI 0.05 | ↑ 25% relative, still well under 0.2 threshold | 0 |
| HITL escalation rate | 8% steady | 7% steady | ↓ 1pt, NOT collapsing (healthy) | 0 |
| Decision audit trail completeness | 100% | 100% | flat | 0 |
| Cost per execution (step level) | $0.040 mean | $0.038 mean | ↓ 5% | 0 (under cap) |
| Exception routing patterns | 8% exception, all routed correctly | 5% exception, all routed correctly | ↓ (agent handling more cases inline) | 0 |

Signal verdict: All five signals stable. Nothing concerning. The drift detector did NOT fire this quarter (the one Sev-2 incident in Q1, covered in §6, caused a one-time alarm that has since been resolved).
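
For readers who have not computed the Population Stability Index before, the sketch below uses the standard PSI definition over binned counts: the sum over bins of (actual share − expected share) × ln(actual share / expected share). The function name, the bin counts, and the `eps` floor are illustrative assumptions, not the agent's actual monitoring code or data; only the 0.2 threshold comes from the table above.

```python
import math

PSI_ALARM_THRESHOLD = 0.2  # threshold quoted in the signals table


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index between a baseline and a current histogram."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("histograms must use the same bins")
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Floor the proportions so empty bins do not produce log(0).
        e_p = max(e / e_total, eps)
        a_p = max(a / a_total, eps)
        total += (a_p - e_p) * math.log(a_p / e_p)
    return total


# Made-up counts per confidence-score bucket, for illustration only.
baseline = [50, 150, 300, 500]
current = [60, 160, 310, 470]

score = psi(baseline, current)
print(f"PSI = {score:.4f}, alarm = {score >= PSI_ALARM_THRESHOLD}")
```

Identical histograms give a PSI of exactly 0, and small shifts like the illustrative one stay far below the 0.2 alarm line.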

§6. Bug + incident trend

| Severity | Q1 2026 | Q2 2026 | Δ | Open at end of quarter |
|---|---|---|---|---|
| Sev-1 (customer impact / data exposure / out-of-scope action) | 0 | 0 | flat | 0 |
| Sev-2 (sustained quality issue, near-miss Sev-1, broad disruption) | 1 (NetSuite schema change drift incident, post-mortem ref: pm-2026-02-15.md) | 0 | | 0 |
| Sev-3 (brief outage, single-case quirk, contained quickly) | 4 (rate-limits, parser edge cases) | 2 (rate-limits) | ↓ 50% | 0 |
| Sev-4 (cosmetic) | 3 (vendor-name capitalization, etc.) | 1 (same cosmetic, deferred to v1.1) | | 1 (open in backlog) |

Bug verdict: Declining trend. The Q1 Sev-2 was post-mortem'd, action items closed within 14 days, drift detector added (now monitoring at 1-hour window for sharp quality drops). No recurrence in Q2.

Open backlog: 1 cosmetic item (vendor-name capitalization in draft emails) — slated for v1.1 in Q3.

§7. Cost trend

| Cost component | Q1 2026 | Q2 2026 | Δ |
|---|---|---|---|
| LLM API (Anthropic Claude) | $148 | $156 | +5% (volume) |
| Observability (LangSmith allocation) | $87 | $87 | flat |
| Infrastructure (n8n self-host allocation) | $42 | $42 | flat |
| Maintenance (CoE time) | $240 (3 hrs) | $320 (4 hrs) | +33% (Tom onboarding) |
| Total per-quarter cost | $517 | $605 | +17% |

Cost verdict: Costs growing slower than value (value Q1 → Q2: +35%). Healthy cost-to-value ratio. No cost overruns. No budget escalation needed.
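
The cost-versus-value comparison behind this verdict is a pair of percentage changes. A small sketch, with a hypothetical `pct_change` helper and the figures from the cost table and from the Q1 and Q2 value totals in §3:

```python
def pct_change(prev: float, curr: float) -> float:
    """Quarter-over-quarter change as a percentage of the previous value."""
    return (curr - prev) / prev * 100


# (Q1, Q2) cost per component in USD, from the cost table.
costs = {
    "LLM API": (148, 156),
    "Observability": (87, 87),
    "Infrastructure": (42, 42),
    "Maintenance": (240, 320),
}

total_prev = sum(prev for prev, _ in costs.values())
total_curr = sum(curr for _, curr in costs.values())

cost_growth = pct_change(total_prev, total_curr)
value_growth = pct_change(4720, 6370)  # Q1 and Q2 value from §3

for name, (prev, curr) in costs.items():
    print(f"{name}: {pct_change(prev, curr):+.0f}%")
print(f"Total cost: {cost_growth:+.0f}%, value: {value_growth:+.0f}%")
print("Costs growing slower than value:", cost_growth < value_growth)
```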

§8. Autonomy stage history

| Date | Stage | Trigger | Decided by |
|---|---|---|---|
| 2026-01-13 | Stage 1 (Assistive) | Initial launch | CoE Lead + Head of Finance per Agent Card §7 |
| 2026-04-13 (Q1 review) | Stage 1 (no change) | Sarah requested staying at Stage 1; user preference is valid signal | CoE Lead + Head of Finance |
| 2026-07-10 (Q2 review, this doc) | Stage 1 (defer Stage 2) | Per Card §7 thresholds we'd qualify, but Tom's onboarding window means we want 90 more days of two-user data before promotion. Discussion: Q3 review. | CoE Lead + Head of Finance |

Autonomy verdict: Stay at Stage 1. Re-evaluate Stage 2 at Q3.

§9. User feedback summary

Q2 verbatim feedback (collected via 30-min interviews with each user):

Sarah Patel:

"It's invisible. I don't think about it. I think about all the other AP work I'm doing instead. The capitalization thing still bugs me but I just edit before send."

Tom Riley (4 weeks of use):

"It feels weirdly fast. I worried I'd miss something but the confidence scores help me trust it more. I've kept the Loom recording bookmarked but haven't needed it after week 1."

Sentiment: Both users want continuity. Neither wants to go back to manual. Both want the v1.1 cosmetic fix.

§10. Changes shipped this quarter

| Date | Change | Reason | Impact |
|---|---|---|---|
| 2026-04-08 | v1.0.3: added retry-with-backoff for NetSuite API | Q1 rate-limit Sev-3 follow-up | 0 rate-limit incidents in Q2 (was 2 in Q1) |
| 2026-05-22 | v1.0.4: golden set expanded by 50 invoices (multi-line POs + new vendor formats) | Quality investment | Match accuracy ↑ 0.4pt |
| 2026-06-10 | v1.0.5: agent identity onboarded backup user Tom (Gmail OAuth scope expanded to two mailboxes) | Tom onboarding | Adoption +1 user |

No prompt changes this quarter. Agent Card unchanged. Risk tier unchanged.

§11. Risk re-classification

| Field | Last quarter | This quarter | Material change? |
|---|---|---|---|
| Risk tier | Medium | Medium | No |
| PII driver | Yes (mild, vendor contacts) | Yes, same | No |
| Consequential decisions about people | No | No | No |
| Autonomous behavior | No (HITL on every NetSuite-related action) | No | No |
| EU AI Act exposure | Not Annex III | Not Annex III | No |
| Jurisdictional tags | US + EU | US + EU | No |

Risk verdict: No material change. Continue under current tier + sign-off chain.

§12. Cross-references to other artifacts (audit trail)

  • Agent Card v1.0.5: github.com/acme/agents/finance-invoice-recon/AGENT_CARD.md
  • Most recent eval report: eval-reports/eval-2026-06-15.md (re-ran after v1.0.4 golden set expansion)
  • Q1 2026 quarterly review: [link to Q1 doc]
  • 30-day review (Feb 2026): templates/14-30-day-review--finance-invoice-recon--2026-02-13.md
  • Post-mortem (Sev-2 Feb 2026): post-mortems/pm-2026-02-15.md
  • Runbook current version: templates/09-runbook--finance-invoice-recon--v1.0.5.md

§13. Decision + plan for next quarter

| Item | Decision | Owner | Due |
|---|---|---|---|
| Continue agent in production? | ✅ Yes | Decided by CoE Lead | n/a |
| Promote to Stage 2 autonomy? | ⏸ Defer to Q3 (need 2-user steady-state data) | CoE Lead | Q3 review 2026-10-10 |
| Ship v1.1 with cosmetic fix + partial-match heuristic? | ✅ Yes | Builder | 2026-08-15 |
| Expand to international vendors (EU vendor data)? | ⚠️ Out of scope for v1.x; requires re-triage at M7 (EU AI Act Article 50 transparency check) | CoE Lead | Defer |
| Risk re-classification needed? | ❌ No | | |
| Retirement candidate? | ❌ No (ROI positive, KPI hit, no severe incidents) | | |

§14. Roll-up to portfolio review (template 15)

The following numbers from this review feed the Q2 portfolio exec readout:

  • Quarterly ROI contribution: +$5,765
  • Cumulative ROI: +$9,940 (or +$1,940 net of build amortization)
  • Sev-1 incidents: 0
  • Sev-2 incidents: 0
  • Adoption growth: +1 user (now 2 users)
  • KPI status: hit
  • Compliance evidence retained: per Agent Card §11 (6-month LangSmith + S3 Glacier 7-year for SOX adjacency)

Sign-off

| Role | Name | Date |
|---|---|---|
| Department Champion + author | Mike Chen | 2026-07-10 |
| Primary user (informed) | Sarah Patel | 2026-07-10 |
| Backup user (informed) | Tom Riley | 2026-07-10 |
| Finance (ROI validation) | Bob Lin, CFO | 2026-07-08 |
| AI CoE Lead | Morteza Moradi | 2026-07-10 |
| Head of Finance | (department head) | 2026-07-11 |

Status: Quarterly review complete. Continue at Stage 1. Next review: Q3 2026 (2026-10-10).


Blank template (copy below for your agent's quarterly review)

# Per-Agent Quarterly Review — [Agent ID] [Quarter] [Year]

**Quarter:** [Q1 / Q2 / Q3 / Q4] [Year] ([date range])
**Review date:** [YYYY-MM-DD]
**Author:** [Department Champion + primary user]
**CoE Lead reviewer:** [Name]
**Time in production:** [N months]

## §1. Headline (one-sentence summary)
> [One sentence: the agent delivered $X / saved Y hrs / maintained Z accuracy / N users / verdict.]

## §2. KPI trend (quarter over quarter)

| Metric | Card §12 target | Prev Q | This Q | Δ Q-o-Q | Status |
|---|---|---|---|---|---|
| | | | | | |

**KPI verdict:** [hit / miss / mixed]

## §3. ROI realized (this quarter + cumulative)

### This quarter calculation

- Hours saved: [N hrs]
- Loaded cost: [N hrs × $/hr]
- Add-on value: [revenue uplift / error reduction]
- Costs: [LLM + obs + infra + maintenance]
- Net: [+/− $]


### Cumulative ROI table

| Period | Hours saved | Value | Costs | Net |
|---|---|---|---|---|

**Year projection:** [estimate]
**Finance sign-off:** [validated / pending] by [name] on [date]

## §4. Adoption metrics

| Metric | Prev Q | This Q | Δ | Notes |
|---|---|---|---|---|
| Unique users | | | | |
| Total executions | | | | |
| Avg executions per user-week | | | | |
| Time-to-first-use for new users | | | | |
| Self-reported satisfaction | | | | |
| Workflow coverage | | | | |

**Adoption verdict:** [healthy / declining / stalled — with reasoning]

## §5. The 5 monitoring signals — quarter-over-quarter

| Signal | Prev Q | This Q | Trend | Alarms this Q |
|---|---|---|---|---|
| Output distribution shift | | | | |
| HITL escalation rate | | | | |
| Decision audit trail completeness | | | | |
| Cost per execution (step level) | | | | |
| Exception routing patterns | | | | |

**Signal verdict:** [all stable / one concern / multiple concerns — describe]

## §6. Bug + incident trend

| Severity | Prev Q | This Q | Δ | Open at end |
|---|---|---|---|---|
| Sev-1 | | | | |
| Sev-2 | | | | |
| Sev-3 | | | | |
| Sev-4 | | | | |

**Open backlog:** [list]
**Bug verdict:** [improving / stable / degrading]

## §7. Cost trend

| Cost component | Prev Q | This Q | Δ |
|---|---|---|---|

**Cost verdict:** [healthy / watch / concerning — with reasoning]

## §8. Autonomy stage history

| Date | Stage | Trigger | Decided by |
|---|---|---|---|

**Autonomy verdict:** [stay / promote / demote — with reasoning]

## §9. User feedback summary

[Verbatim quotes from primary + backup users + sentiment]

## §10. Changes shipped this quarter

| Date | Change | Reason | Impact |
|---|---|---|---|

## §11. Risk re-classification

| Field | Last Q | This Q | Material change? |
|---|---|---|---|

**Risk verdict:** [no change / re-classify — describe]

## §12. Cross-references to other artifacts

- Agent Card current version: [link]
- Most recent eval report: [link]
- Previous quarterly review: [link]
- Most recent 30-day review: [link]
- Recent post-mortems: [links]
- Runbook current: [link]

## §13. Decision + plan for next quarter

| Item | Decision | Owner | Due |
|---|---|---|---|
| Continue agent? | | | |
| Promote autonomy? | | | |
| Ship updates? | | | |
| Expand scope? | | | |
| Risk re-classify? | | | |
| Retirement candidate? | | | |

## §14. Roll-up to portfolio review

The following numbers feed template 15 (this quarter's exec readout):
- Quarterly ROI contribution: [$]
- Cumulative ROI: [$]
- Sev-1 incidents: [N]
- Sev-2 incidents: [N]
- Adoption growth: [Δ users]
- KPI status: [hit / miss / mixed]
- Compliance evidence retained per Agent Card §11: [confirmed]

## Sign-off

| Role | Name | Date |
|---|---|---|
| Department Champion + author | | |
| Primary user (informed) | | |
| Finance (ROI validation) | | |
| AI CoE Lead | | |
| Head of department | | |

**Status:** [complete / open items / re-review needed]
**Next review:** [date]

Usage notes

  • The 6-month question lives here. When the CEO asks "what did our AI investment actually do?" — the answer is two of these reports (Q1 + Q2) per agent, plus a portfolio readout (template 15) at month 6. Numbers, not anecdotes.
  • Re-running this quarterly is the discipline. Skip a quarter and the trend data has a gap; skip two and you can't credibly report.
  • Finance validates ROI BEFORE the exec readout. §3 has a Finance sign-off line — this matters because the per-agent number rolls up into the company-level exec story.
  • User feedback (§9) often matters more than the metrics. A statistically successful agent that the user resents is failing — and the failure shows up in adoption metrics (§4) and qualitative quotes before it shows up in incident count.
  • Don't move goalposts. If §2 misses the Card §12 target, the answer is either retire-candidate OR formal scope/KPI revision via re-approval at M8 — not silent re-targeting.
  • Autonomy promotion (§8) is the place where Stage 2 / Stage 3 decisions get made. Most agents stay at Stage 1 indefinitely. Promotion requires evidence; this report is the evidence.
  • The audit trail in §12 is what survives auditor questions. Keep links current.

Common pitfalls

| Pitfall | What it looks like | Fix |
|---|---|---|
| ROI hand-waved | "Saves a lot of time" with no number | Force the hours-saved × loaded-cost math |
| ROI never validated by Finance | CoE Lead's spreadsheet only | Finance signs §3 before exec readout |
| Adoption assumed equal to volume | "1,650 executions = 1,650 sessions" | Track unique users + frequency separately |
| Drift summarized as "fine" | No PSI/KL-divergence numbers | Section 5 must have numeric trend |
| Bug count without trend | "We had 2 Sev-3 this quarter" | Always Q-o-Q: improving or degrading? |
| Autonomy promoted on metrics alone | "Hit 95%, promote to Stage 2" | User preference + 2-user steady state + Card §7 thresholds, not just one metric |
| User feedback skipped | Section 9 marked "TBD" | Run the 30-min user interview before review meeting |
| Cross-references stale | §12 links 404 | Update at every review |
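
The "adoption assumed equal to volume" pitfall is easy to avoid if unique users and frequency are derived separately from the execution log. The sketch below assumes a hypothetical log of (user, date) pairs; the real source would be whatever the observability tool exports, and the entries shown are made up for illustration.

```python
from datetime import date


def adoption_summary(log):
    """Separate reach (unique users) from frequency (executions per user-week)."""
    users = {user for user, _ in log}
    # A user-week is one user active in one ISO (year, week).
    user_weeks = {(user, day.isocalendar()[:2]) for user, day in log}
    per_user_week = len(log) / len(user_weeks) if user_weeks else 0.0
    return {
        "unique_users": len(users),
        "total_executions": len(log),
        "executions_per_user_week": per_user_week,
    }


# Illustrative entries only; not data from the worked example.
log = [
    ("sarah", date(2026, 6, 2)),
    ("sarah", date(2026, 6, 2)),
    ("sarah", date(2026, 6, 3)),
    ("sarah", date(2026, 6, 9)),
    ("sarah", date(2026, 6, 10)),
    ("tom", date(2026, 6, 9)),
]

summary = adoption_summary(log)
print(summary)
```

Six executions here come from two users across three user-weeks, which is a different statement from "six sessions".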

Framework cross-references

  • framework.md §11.2 (per-agent lifecycle — quarterly)
  • framework.md §21 (5 monitoring signals — feeds §5)
  • framework.md §22 (autonomy progression — §8)
  • framework.md §29 (ROI tracking — §3)
  • framework.md §22.1 EU AI Act Article 72 (post-market monitoring)
  • framework.md §22.2 NIST AI RMF MANAGE (continuous improvement)
  • framework.md §22.3 ISO/IEC 42001 Clause 9.1 (monitoring + measurement)
  • workflows.md Step 15 (Quarterly review)
  • workflows.html → In Action view → node M17 Quarterly review
  • Companion template: 15-quarterly-exec-readout.md (this report rolls up into the portfolio readout)