
Template 07 — Pilot-to-Prod Checklist

ID: 07-pilot-to-prod-checklist
Version: 1
Last revised: 2026-05-14
Owner: AI CoE Lead (signs) · Agent Builder (assembles)

Purpose

The 10-item gate between Pilot (M14) and Production (M15) in the In Action roadmap. Last chance to catch a missing on-call, an untested kill switch, or audit retention that wasn't configured. CoE Lead signs — without this signature, production deployment is blocked.

Pilot is rehearsal. Production is the show. The 10 items below are what separates the two.

  • When you use it: Once per agent at end of pilot, before production launch.
  • Who fills it: Agent Builder assembles evidence per item. CoE Lead reviews + signs.
  • Time: 30–60 minutes if pilot was run well; longer if items are missing.
  • Decision: Cleared to launch / Block / Conditional launch with named open items.

Worked example (AP Accountant invoice reconciliation)

Agent: finance-invoice-recon v1.0
Pilot ran: 2026-05-13 to 2026-06-12 (30 days)
Pilot-to-prod review date: 2026-06-12

| # | Item | Status | Evidence |
|---|---|---|---|
| 1 | Pilot success criteria met | ✅ | Pilot success criteria (Agent Card §12 + intake §5): ≥ 90% acceptance, zero Sev-1, latency p95 < 20s, ≥ 4 hrs/week saved. Actuals: 92% acceptance, zero Sev-1, p95 13s, 5 hrs/week saved. |
| 2 | Monitoring dashboards live and being watched | ✅ | LangSmith dashboard finance-invoice-recon active. Datadog widget for cost + latency. Finance Champion checked daily during pilot; CoE Lead reviewed weekly. |
| 3 | On-call defined (named person + pager) | ✅ | Primary: Mike Chen (Finance Champion), PagerDuty schedule finance-ai-primary. Backup: Morteza Moradi (CoE Lead), PagerDuty schedule finance-ai-backup. Weekend coverage rotates via standard engineering on-call. |
| 4 | Runtime guardrails confirmed firing in expected scenarios | ✅ | Tested 2026-06-10: malicious-PDF input was rejected by input validation; out-of-scope tool call (attempted NetSuite write) was blocked by tool allowlist; output schema check rejected a malformed LLM response. All three guardrails fired. |
| 5 | Identity in IdP, credentials rotated at least once during pilot | ✅ | agent-finance-invoice-recon active. NetSuite OAuth rotated 2026-05-27 (day 14 of pilot): agent continued operating, no downtime. Anthropic API key rotated 2026-06-03 (day 21): same result. |
| 6 | Audit log retention configured per tier | ✅ | LangSmith retention set to 6 months for this project. S3 Glacier archive lifecycle for 7-year SOX retention configured. Verified 2026-06-11 by Platform team. |
| 7 | Runbook exists and on-call has read it | ✅ | Runbook at github.com/acme/agents/finance-invoice-recon/runbook.md v1.0. Mike Chen acknowledged reading 2026-06-09 (signed quiz: kill switch location, on-call escalation, top 3 failure modes; passed). |
| 8 | Communication plan ready | ✅ | Finance team Slack post drafted + scheduled for launch day. All-hands email digest entry drafted. Exec sponsor 1-pager prepared for 2026-06-14 review. |
| 9 | Rollback / kill plan tested | ✅ | Drill 2026-06-08: flipped LaunchDarkly flag finance-invoice-recon-enabled to off. In-flight executions are left to complete naturally (none were in flight at the moment of the flip). New triggers queued for manual handling. Sarah (pilot user) reverted to manual process. Tested for 1 hour, then re-enabled. Worked. |
| 10 | CoE Lead signs off | ✅ | All 9 items above are green. Sign-off below. |

Sign-off

| Role | Name | Date | Signature |
|---|---|---|---|
| Agent Builder | Morteza Moradi + Mike Chen | 2026-06-12 | (signed) |
| AI CoE Lead | Morteza Moradi | 2026-06-12 | (signed) |

Decision

Cleared for production launch. Target launch date: 2026-06-13 (the day after sign-off).

Status in registry will move from Pilot → Production at launch. 30-day post-launch review scheduled for 2026-07-13.

Open items

None.


Blank template (copy below for your agent)

# Pilot-to-Prod Checklist — [Agent Name]

**Agent ID:** [agent-dept-slug]
**Agent version:** [X.X]
**Tier:** [Low / Medium / High]
**Pilot ran:** [start date] to [end date]
**Review date:** [YYYY-MM-DD]

| # | Item | Status | Evidence |
|---|---|---|---|
| 1 | Pilot success criteria met (pre-set criteria from Agent Card §12 / intake §5) | [✅ / ⚠️ / ❌] | [Criteria + actuals] |
| 2 | Monitoring dashboards live and being watched | [✅ / ⚠️ / ❌] | [Dashboard link + who watches when] |
| 3 | On-call defined (named person + pager) | [✅ / ⚠️ / ❌] | [Primary + backup + pager schedule] |
| 4 | Runtime guardrails confirmed firing in expected scenarios (input validation / output schema / tool allowlist / kill switch) | [✅ / ⚠️ / ❌] | [Test date + scenarios + results] |
| 5 | Identity in IdP, credentials rotated at least once during pilot | [✅ / ⚠️ / ❌] | [Identity name + rotation dates + result] |
| 6 | Audit log retention configured per tier | [✅ / ⚠️ / ❌] | [Retention period + storage location] |
| 7 | Runbook exists and on-call has read it | [✅ / ⚠️ / ❌] | [Runbook link + on-call acknowledgment + comprehension check] |
| 8 | Communication plan ready | [✅ / ⚠️ / ❌] | [Channels + content + schedule] |
| 9 | Rollback / kill plan tested | [✅ / ⚠️ / ❌] | [Drill date + procedure + result] |
| 10 | CoE Lead signs off | [✅ / ❌] | [See sign-off below] |

## Sign-off

| Role | Name | Date | Signature |
|---|---|---|---|
| Agent Builder | | | |
| AI CoE Lead | | | |

## Decision

[✅ Cleared for production launch / ⚠️ Conditional launch with open items / ❌ Blocked — return to pilot]

Target launch date: [YYYY-MM-DD]
30-day post-launch review: [YYYY-MM-DD]

## Open items (only if conditional or blocked)

| Item # | What's missing | Owner | Due date |
|---|---|---|---|
| | | | |

Per-item guidance

1. Pilot success criteria met

Pass: Numbers from pilot match or exceed pre-set criteria from Agent Card §12 and the intake form (§5 Expected business value).

Fail: Don't move goalposts. If criteria weren't met, either return to pilot for another cycle, redefine criteria (with re-approval at M8), or kill the agent.
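
One way to keep the goalposts fixed is to record the criteria as data before the pilot starts and compare the actuals mechanically at review time. The following is a minimal sketch, not part of the framework: the `Criterion` class, the metric names, and `unmet_criteria` are hypothetical, and the thresholds and actuals are the ones from the worked example above.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    name: str                                # hypothetical metric name
    compare: Callable[[float, float], bool]  # e.g. operator.ge for "at least"
    threshold: float


# Thresholds from the worked example; fix these before the pilot starts.
CRITERIA = [
    Criterion("acceptance_rate_pct", operator.ge, 90),
    Criterion("sev1_incidents", operator.eq, 0),
    Criterion("latency_p95_seconds", operator.lt, 20),
    Criterion("hours_saved_per_week", operator.ge, 4),
]


def unmet_criteria(actuals: dict[str, float]) -> list[str]:
    """Return the names of criteria the actuals fail; an empty list means item 1 passes."""
    return [c.name for c in CRITERIA if not c.compare(actuals[c.name], c.threshold)]


# Actuals from the worked example.
pilot_actuals = {
    "acceptance_rate_pct": 92,
    "sev1_incidents": 0,
    "latency_p95_seconds": 13,
    "hours_saved_per_week": 5,
}
print(unmet_criteria(pilot_actuals))  # []
```

Because the thresholds live in one list, any change to them shows up as a diff, which is the point at which re-approval at M8 would be requested.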

2. Monitoring dashboards live and being watched

Pass: Dashboards exist AND are being actively watched. The "being watched" part is not theoretical — name who looks at them and when.

Fail: "Dashboards exist" but nobody actually checks them. Common pattern. Don't ship to prod.

3. On-call defined

Pass: Primary on-call + backup, both named, both paged via the company's existing paging system (PagerDuty / Opsgenie / etc.). Weekend coverage defined.

Fail: "The team" is on call (i.e., no one). 2am incident = no response.

4. Runtime guardrails confirmed firing

Pass: Each of the runtime guardrails from Agent Card §6 has been tested with an actual triggering input. Logs confirm the guardrail fired and blocked or escalated correctly.

Fail: "We implemented the guardrails" but no proof they work. Run the tests.
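
"Run the tests" can be as small as feeding each guardrail one input that should trigger it and recording whether it fired. The sketch below illustrates this for two of the guardrails, under stated assumptions: the tool names, the output fields, and every function here are hypothetical stand-ins, not the agent's real implementation or any vendor API.

```python
import json

# Hypothetical allowlist: the agent may read from NetSuite but never write.
ALLOWED_TOOLS = {"netsuite.read_invoice", "netsuite.read_purchase_order"}
# Hypothetical output schema: field name -> required type.
REQUIRED_OUTPUT_FIELDS = {"invoice_id": str, "matched": bool}


class GuardrailViolation(Exception):
    """Raised when a guardrail blocks an action."""


def check_tool_call(tool_name: str) -> None:
    """Tool allowlist guardrail: reject any tool not explicitly allowed."""
    if tool_name not in ALLOWED_TOOLS:
        raise GuardrailViolation(f"tool not on allowlist: {tool_name}")


def check_output(raw: str) -> dict:
    """Output schema guardrail: reject non-JSON or mistyped responses."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise GuardrailViolation("output is not valid JSON") from exc
    if not isinstance(data, dict):
        raise GuardrailViolation("output is not a JSON object")
    for field, expected_type in REQUIRED_OUTPUT_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise GuardrailViolation(f"missing or mistyped field: {field}")
    return data


def fired(check, triggering_input) -> bool:
    """True if the guardrail blocked the triggering input."""
    try:
        check(triggering_input)
    except GuardrailViolation:
        return True
    return False


# One triggering input per guardrail; every value should be True.
results = {
    "tool_allowlist": fired(check_tool_call, "netsuite.write_journal_entry"),
    "output_schema": fired(check_output, '{"invoice_id": 42}'),
}
print(results)
```

A result of `False` for any guardrail is a failed item 4. The evidence column should still cite the production logs from the real test, since a harness like this only shows the check logic works in isolation.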

5. Identity + rotation

Pass: Agent identity exists in the IdP. Credentials have been rotated at least once during pilot — proving rotation actually works.

Fail: Credentials never rotated. First time rotation is run in anger, the agent breaks.

6. Audit log retention

Pass: Retention period configured per Agent Card §11. Period matches the tier and any applicable sector regulation.

Fail: Default retention (often 30 days) silently shorter than the agent's tier requires.

7. Runbook + on-call comprehension

Pass: Runbook exists in the source repo, is current, and the named on-call has read it. Comprehension check: can they answer 3 questions without looking?

  • How do you kill the agent in 60 seconds?
  • What are the top 3 failure modes and what do you do for each?
  • Where are the logs?

Fail: Runbook exists but on-call has never seen it. 2am incident = improvisation.

8. Communication plan

Pass: Specific Slack posts, email digests, exec briefings — drafted and scheduled, not just "we'll announce it."

Fail: "We'll figure out comms when we launch." Adoption stalls.

9. Rollback / kill plan tested

Pass: Within the past 30 days, the rollback procedure has been executed end-to-end in a drill. Documented timing. Documented what happens to in-flight work.

Fail: "We have a rollback plan" with no drill. First real rollback turns into chaos.
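
The 30-day window in the pass condition is simple date arithmetic, and checking it explicitly avoids counting a drill from an earlier pilot cycle. A minimal sketch, assuming the window is measured back from the review date; `drill_is_current` is a hypothetical helper and the dates are those of the worked example.

```python
from datetime import date

MAX_DRILL_AGE_DAYS = 30  # "within the past 30 days" from the pass condition


def drill_is_current(drill_date: date, review_date: date) -> bool:
    """True if the drill happened on or before the review date and within the window."""
    age_days = (review_date - drill_date).days
    return 0 <= age_days <= MAX_DRILL_AGE_DAYS


# Worked example: drill on 2026-06-08, review on 2026-06-12 (4 days earlier).
print(drill_is_current(date(2026, 6, 8), date(2026, 6, 12)))  # True
```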

10. CoE Lead signs

Pass: CoE Lead has reviewed items 1–9 and signs. If CoE Lead is also the Builder, they sign in their CoE Lead capacity — and additionally seek peer review from one of the approvers from M8 (e.g., Department Head).

Fail: No sign-off, or sign-off without actual review. Don't.


Decision matrix

| Items 1–9 status | Decision |
|---|---|
| All ✅ | Cleared. Launch on the target date. |
| 1–2 items ⚠️ with documented open items and named owner | Conditional. Launch may proceed with explicit risk acceptance from sponsor. Open items must close within 14 days post-launch. |
| 3+ items ⚠️ or any ❌ | Blocked. Return to pilot or fix items. |
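
The matrix can also be read as a small decision function, which is useful if the registry computes the outcome from the recorded statuses. The sketch below is one possible encoding: `Status` and `decide` are hypothetical names, and the function only counts statuses. It does not verify that open items are documented, that owners are named, or that the sponsor has accepted the risk, all of which a conditional launch still requires.

```python
from enum import Enum


class Status(Enum):
    PASS = "✅"
    WARN = "⚠️"
    FAIL = "❌"


def decide(statuses: dict[int, Status]) -> str:
    """Map the statuses of items 1-9 to Cleared, Conditional, or Blocked."""
    if set(statuses) != set(range(1, 10)):
        raise ValueError("expected a status for each of items 1-9")
    warns = sum(1 for s in statuses.values() if s is Status.WARN)
    fails = sum(1 for s in statuses.values() if s is Status.FAIL)
    if fails > 0 or warns >= 3:
        return "Blocked"
    if warns > 0:  # 1-2 warnings
        return "Conditional"
    return "Cleared"


all_green = {item: Status.PASS for item in range(1, 10)}
print(decide(all_green))                         # Cleared
print(decide({**all_green, 8: Status.WARN}))     # Conditional
print(decide({**all_green, 4: Status.FAIL}))     # Blocked
```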

Usage notes

  • Don't allow this to become a rubber stamp. Items 4, 7, 9 are where rubber-stamping happens most. Force actual evidence.
  • Conditional launch is rare. Most checklist items shouldn't be skippable. If item 4 (guardrails) or item 9 (rollback) is ⚠️, the answer is almost always ❌ — block.
  • Communication plan (item 8) is underrated. Many agents launch, run silently for weeks, and nobody uses them because nobody knew.
  • The 30-day post-launch review recorded alongside the item 10 sign-off is non-negotiable. Put it on the calendar BEFORE launch. Use template 14.
  • The pilot ends here. From this signature forward, the agent is in production — different monitoring cadence, different incident-response expectations.

Common pitfalls

| Pitfall | What it looks like | Fix |
|---|---|---|
| Moving the goalposts | Pilot missed targets; CoE Lead approves anyway with new criteria | Don't. Return to pilot or kill. |
| Dashboards but no observer | Dashboards exist, nobody looks | Name who watches when. Make it real. |
| On-call = "team" | No named individual | One name. One pager. Backup also named. |
| Guardrails untested | "We implemented them, trust us" | Trigger each one. Confirm in logs. |
| Rotation drill never run | "Rotation is configured" | Run a real rotation during pilot. |
| Retention default | 30-day default instead of tier-appropriate | Change before launch. |
| Runbook unread | On-call hasn't seen it | 15-minute walkthrough + comprehension check |
| Communication "we'll figure it out" | No draft message, no schedule | Write the Slack post + email. Schedule them. |
| Rollback never tested | "We have a kill switch" | Drill it. Document timing. |
| Sign-off in a Slack thread | Not in the registry | Record in registry + signed doc, not in a chat |

Framework cross-references

  • framework.md §11.2 (per-agent lifecycle — this is the Pilot → Production gate)
  • framework.md §22 (autonomy progression — pilot performance feeds this)
  • framework.md §24 (observability — Items 2, 6)
  • framework.md §17 (privileged identities — Item 5)
  • framework.md §19 + §20 (guardrails + controls — Item 4)
  • workflows.md Step 12 (Pilot-to-prod gate sub-steps)
  • workflows.html → In Action view → node M15 (Production launch — this checklist is the gate)