Purpose
The 10-item gate between Pilot (M14) and Production (M15) in the In Action roadmap. Last chance to catch a missing on-call, an untested kill switch, or audit retention that wasn't configured. CoE Lead signs — without this signature, production deployment is blocked.
Pilot is rehearsal. Production is the show. The 10 items below are what separates the two.
- When you use it: Once per agent at end of pilot, before production launch.
- Who fills it: Agent Builder assembles evidence per item. CoE Lead reviews + signs.
- Time: 30–60 minutes if pilot was run well; longer if items are missing.
- Decision: Cleared to launch / Block / Conditional launch with named open items.
Worked example (AP Accountant invoice reconciliation)
Agent: finance-invoice-recon v1.0
Pilot ran: 2026-05-13 to 2026-06-12 (30 days)
Pilot-to-prod review date: 2026-06-12
| # | Item | Status | Evidence |
|---|---|---|---|
| 1 | Pilot success criteria met | ✅ | Pilot success criteria (Agent Card §12 + intake §5): ≥ 90% acceptance, zero Sev-1, latency p95 < 20s, ≥ 4 hrs/week saved. Actuals: 92% acceptance, zero Sev-1, p95 13s, 5 hrs/week saved. |
| 2 | Monitoring dashboards live and being watched | ✅ | LangSmith dashboard finance-invoice-recon active. Datadog widget for cost + latency. Finance Champion checked daily during pilot; CoE Lead reviewed weekly. |
| 3 | On-call defined (named person + pager) | ✅ | Primary: Mike Chen (Finance Champion) — PagerDuty schedule finance-ai-primary. Backup: Morteza Moradi (CoE Lead) — finance-ai-backup. Weekend coverage rotates via standard engineering on-call. |
| 4 | Runtime guardrails confirmed firing in expected scenarios | ✅ | Tested 2026-06-10: malicious-PDF input was rejected by input validation; out-of-scope tool call (attempted NetSuite write) was blocked by tool allowlist; output schema check rejected a malformed LLM response. All three guardrails fired. |
| 5 | Identity in IdP, credentials rotated at least once during pilot | ✅ | agent-finance-invoice-recon active. NetSuite OAuth rotated 2026-05-27 (day 14 of pilot) — agent continued operating, no downtime. Anthropic API key rotated 2026-06-03 (day 21) — same. |
| 6 | Audit log retention configured per tier | ✅ | LangSmith retention set to 6 months for this project. S3 Glacier archive lifecycle for 7-year SOX retention configured. Verified 2026-06-11 by Platform team. |
| 7 | Runbook exists and on-call has read it | ✅ | Runbook at github.com/acme/agents/finance-invoice-recon/runbook.md v1.0. Mike Chen acknowledged reading 2026-06-09 (signed quiz: kill switch location, on-call escalation, top 3 failure modes — passed). |
| 8 | Communication plan ready | ✅ | Finance team Slack post drafted + scheduled for launch day. All-hands email digest entry drafted. Exec sponsor 1-pager prepared for 2026-06-14 review. |
| 9 | Rollback / kill plan tested | ✅ | Drill 2026-06-08: flipped LaunchDarkly flag finance-invoice-recon-enabled to off. In-flight executions are allowed to complete naturally; none were in flight at the moment of the flip. New triggers queued for manual handling. Sarah (pilot user) reverted to manual process. Flag stayed off for 1 hour, then was re-enabled. Worked. |
| 10 | CoE Lead signs off | ✅ | All 9 items above are green. Sign-off below. |
Sign-off
| Role | Name | Date | Signature |
|---|---|---|---|
| Agent Builder | Morteza Moradi + Mike Chen | 2026-06-12 | (signed) |
| AI CoE Lead | Morteza Moradi | 2026-06-12 | (signed) |
Decision
✅ Cleared for production launch. Target launch date: 2026-06-13 (the day after sign-off).
Status in registry will move from Pilot → Production at launch. 30-day post-launch review scheduled for 2026-07-13.
Open items
None.
Blank template (copy below for your agent)
# Pilot-to-Prod Checklist — [Agent Name]
**Agent ID:** [agent-dept-slug]
**Agent version:** [X.X]
**Tier:** [Low / Medium / High]
**Pilot ran:** [start date] to [end date]
**Review date:** [YYYY-MM-DD]
| # | Item | Status | Evidence |
|---|---|---|---|
| 1 | Pilot success criteria met (pre-set criteria from Agent Card §12 / intake §5) | [✅ / ⚠️ / ❌] | [Criteria + actuals] |
| 2 | Monitoring dashboards live and being watched | [✅ / ⚠️ / ❌] | [Dashboard link + who watches when] |
| 3 | On-call defined (named person + pager) | [✅ / ⚠️ / ❌] | [Primary + backup + pager schedule] |
| 4 | Runtime guardrails confirmed firing in expected scenarios (input val / output schema / tool allowlist / kill switch) | [✅ / ⚠️ / ❌] | [Test date + scenarios + results] |
| 5 | Identity in IdP, credentials rotated at least once during pilot | [✅ / ⚠️ / ❌] | [Identity name + rotation dates + result] |
| 6 | Audit log retention configured per tier | [✅ / ⚠️ / ❌] | [Retention period + storage location] |
| 7 | Runbook exists and on-call has read it | [✅ / ⚠️ / ❌] | [Runbook link + on-call acknowledgment + comprehension check] |
| 8 | Communication plan ready | [✅ / ⚠️ / ❌] | [Channels + content + schedule] |
| 9 | Rollback / kill plan tested | [✅ / ⚠️ / ❌] | [Drill date + procedure + result] |
| 10 | CoE Lead signs off | [✅ / ❌] | [See sign-off below] |
## Sign-off
| Role | Name | Date | Signature |
|---|---|---|---|
| Agent Builder | | | |
| AI CoE Lead | | | |
## Decision
[✅ Cleared for production launch / ⚠️ Conditional launch with open items / ❌ Blocked — return to pilot]
Target launch date: [YYYY-MM-DD]
30-day post-launch review: [YYYY-MM-DD]
## Open items (only if conditional or blocked)
| Item # | What's missing | Owner | Due date |
|---|---|---|---|
| | | | |
Per-item guidance
1. Pilot success criteria met
Pass: Numbers from pilot match or exceed pre-set criteria from Agent Card §12 and the intake form (§5 Expected business value).
Fail: Don't move goalposts. If criteria weren't met, either return to pilot for another cycle, redefine criteria (with re-approval at M8), or kill the agent.
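The pass rule for item 1 is mechanical enough to script. A minimal sketch, assuming the thresholds from the worked example (≥ 90% acceptance, zero Sev-1, p95 < 20s, ≥ 4 hrs/week saved); the `Criterion` type and the metric names are hypothetical and not part of any Agent Card schema.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    """One pre-set pilot success criterion (hypothetical structure)."""
    metric: str
    compare: Callable[[float, float], bool]  # e.g. operator.ge for "at least"
    threshold: float


# Thresholds copied from the worked example; they are fixed before the pilot starts.
CRITERIA = [
    Criterion("acceptance_rate_pct", operator.ge, 90),
    Criterion("sev1_incidents", operator.eq, 0),
    Criterion("latency_p95_seconds", operator.lt, 20),
    Criterion("hours_saved_per_week", operator.ge, 4),
]


def unmet_criteria(actuals: dict[str, float]) -> list[str]:
    """Return metrics that miss their threshold; a missing metric counts as a miss."""
    return [
        c.metric
        for c in CRITERIA
        if c.metric not in actuals or not c.compare(actuals[c.metric], c.threshold)
    ]


# Actuals from the worked example.
pilot_actuals = {
    "acceptance_rate_pct": 92,
    "sev1_incidents": 0,
    "latency_p95_seconds": 13,
    "hours_saved_per_week": 5,
}
print(unmet_criteria(pilot_actuals))  # an empty list means item 1 passes
```

Keeping the thresholds in one constant that is written down before the pilot makes a moved goalpost show up as a diff rather than as a quiet edit to the evidence column.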
2. Monitoring dashboards live and being watched
Pass: Dashboards exist AND are being actively watched. The "being watched" part is not theoretical — name who looks at them and when.
Fail: "Dashboards exist" but nobody actually checks them. Common pattern. Don't ship to prod.
3. On-call defined
Pass: Primary on-call + backup, both named, both paged via the company's existing paging system (PagerDuty / Opsgenie / etc.). Weekend coverage defined.
Fail: "The team" is on call (i.e., no one). 2am incident = no response.
4. Runtime guardrails confirmed firing
Pass: Each of the runtime guardrails from Agent Card §6 has been tested with an actual triggering input. Logs confirm the guardrail fired and blocked or escalated correctly.
Fail: "We implemented the guardrails" but no proof they work. Run the tests.
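"Run the tests" can be as small as feeding each guardrail one input that should trigger it and recording whether it fired. A minimal sketch covering two of the guardrails from the worked example (tool allowlist and output schema check); the tool names, the required output keys, and every function here are hypothetical stand-ins for whatever the agent actually uses.

```python
import json

# Hypothetical allowlist and output schema; substitute the agent's real ones.
ALLOWED_TOOLS = {"netsuite_read_invoice", "netsuite_read_po"}
REQUIRED_OUTPUT_KEYS = {"invoice_id", "status"}


def tool_call_allowed(tool_name: str) -> bool:
    """Tool allowlist guardrail: anything not explicitly listed is blocked."""
    return tool_name in ALLOWED_TOOLS


def output_is_valid(raw_response: str) -> bool:
    """Output schema guardrail: must be a JSON object containing the required keys."""
    try:
        parsed = json.loads(raw_response)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and REQUIRED_OUTPUT_KEYS.issubset(parsed)


def run_guardrail_drill() -> dict[str, bool]:
    """Feed each guardrail a triggering input; True means the guardrail fired."""
    return {
        # An attempted write is out of scope for a read-only agent.
        "tool_allowlist": not tool_call_allowed("netsuite_write_invoice"),
        # Truncated JSON stands in for a malformed LLM response.
        "output_schema": not output_is_valid('{"invoice_id": "INV-1"'),
    }


results = run_guardrail_drill()
for name, fired in results.items():
    print(f"{name}: {'fired' if fired else 'DID NOT FIRE'}")
```

A drill like this only proves the check logic. The evidence column still needs the production logs showing the guardrail fired against the deployed agent.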
5. Identity + rotation
Pass: Agent identity exists in the IdP. Credentials have been rotated at least once during pilot — proving rotation actually works.
Fail: Credentials never rotated. First time rotation is run in anger, the agent breaks.
6. Audit log retention
Pass: Retention period configured per Agent Card §11. Period matches the tier and any applicable sector regulation.
Fail: Default retention (often 30 days) silently shorter than the agent's tier requires.
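The silent-default failure is easy to catch with a one-line comparison. A minimal sketch; the per-tier minimums below are placeholder values invented for illustration, and the real numbers come from Agent Card §11 and any applicable regulation.

```python
# Placeholder minimums per tier (assumption, not taken from the framework).
REQUIRED_RETENTION_DAYS = {"Low": 90, "Medium": 180, "High": 365}


def retention_shortfall_days(tier: str, configured_days: int) -> int:
    """Days by which configured retention falls short of the tier minimum (0 = OK)."""
    required = REQUIRED_RETENTION_DAYS[tier]  # KeyError on an unknown tier is intentional
    return max(0, required - configured_days)


# A 30-day platform default checked against the placeholder Medium-tier minimum.
shortfall = retention_shortfall_days("Medium", 30)
print(f"Retention shortfall: {shortfall} days")
```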
7. Runbook + on-call comprehension
Pass: The runbook exists in the source repo, is current, and the named on-call has read it. Comprehension check: can they answer 3 questions without looking?
- How do you kill the agent in 60 seconds?
- What are the top 3 failure modes and what do you do for each?
- Where are the logs?
Fail: Runbook exists but on-call has never seen it. 2am incident = improvisation.
8. Communication plan
Pass: Specific Slack posts, email digests, exec briefings — drafted and scheduled, not just "we'll announce it."
Fail: "We'll figure out comms when we launch." Adoption stalls.
9. Rollback / kill plan tested
Pass: Within the past 30 days, the rollback procedure has been executed end-to-end in a drill. Documented timing. Documented what happens to in-flight work.
Fail: "We have a rollback plan" with no drill. First real rollback turns into chaos.
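The "within the past 30 days" condition is simple date arithmetic. A minimal sketch using the dates from the worked example (drill on 2026-06-08, review on 2026-06-12); the function name is hypothetical.

```python
from datetime import date

MAX_DRILL_AGE_DAYS = 30  # "within the past 30 days" from the pass rule


def drill_is_fresh(drill_date: date, review_date: date) -> bool:
    """True if the drill happened on or before the review and no more than 30 days earlier."""
    age_days = (review_date - drill_date).days
    return 0 <= age_days <= MAX_DRILL_AGE_DAYS


# Worked example: the drill ran 4 days before the review.
print(drill_is_fresh(date(2026, 6, 8), date(2026, 6, 12)))
```

The lower bound rejects a drill dated after the review, which usually means the evidence was filled in retroactively.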
10. CoE Lead signs
Pass: CoE Lead has reviewed items 1–9 and signs. If CoE Lead is also the Builder, they sign in their CoE Lead capacity — and additionally seek peer review from one of the approvers from M8 (e.g., Department Head).
Fail: No sign-off, or sign-off without actual review. Don't.
Decision matrix
| Items 1–9 status | Decision |
|---|---|
| All ✅ | Cleared. Launch on the target date. |
| 1–2 items ⚠️ with documented open items and named owner | Conditional. Launch may proceed with explicit risk acceptance from sponsor. Open items must close within 14 days post-launch. |
| 3+ items ⚠️ or any ❌ | Blocked. Return to pilot or fix items. |
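The matrix above reduces to a small function. A minimal sketch, assuming each of items 1–9 carries exactly one of three statuses; the `Status` enum and `decide` function are hypothetical names. The sketch does not verify the extra conditions for a conditional launch (documented open items, named owner, sponsor risk acceptance), which still need a human check.

```python
from enum import Enum


class Status(Enum):
    PASS = "pass"  # ✅
    WARN = "warn"  # ⚠️
    FAIL = "fail"  # ❌


def decide(statuses: dict[int, Status]) -> str:
    """Map the statuses of items 1-9 to Cleared / Conditional / Blocked."""
    if set(statuses) != set(range(1, 10)):
        raise ValueError("statuses must cover exactly items 1 through 9")
    values = list(statuses.values())
    warnings = values.count(Status.WARN)
    if Status.FAIL in values or warnings >= 3:
        return "Blocked"
    if warnings >= 1:
        # 1-2 warnings: conditional, subject to open items + sponsor risk acceptance.
        return "Conditional"
    return "Cleared"


all_green = {item: Status.PASS for item in range(1, 10)}
print(decide(all_green))
print(decide({**all_green, 8: Status.WARN}))
print(decide({**all_green, 9: Status.FAIL}))
```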
Usage notes
- Don't allow this to become a rubber stamp. Items 4, 7, 9 are where rubber-stamping happens most. Force actual evidence.
- Conditional launch is rare. Most checklist items shouldn't be skippable. If item 4 (guardrails) or item 9 (rollback) is ⚠️, the answer is almost always ❌ — block.
- Communication plan (item 8) is underrated. Many agents launch, run silently for weeks, and nobody uses them because nobody knew.
- The 30-day post-launch review scheduled at sign-off (item 10) is non-negotiable. Put it on the calendar BEFORE launch. Use template 14.
- The pilot ends here. From this signature forward, the agent is in production — different monitoring cadence, different incident-response expectations.
Common pitfalls
| Pitfall | What it looks like | Fix |
|---|---|---|
| Moving the goalposts | Pilot missed targets; CoE Lead approves anyway with new criteria | Don't. Return to pilot or kill. |
| Dashboards but no observer | Dashboards exist, nobody looks | Name who watches when. Make it real. |
| On-call = "team" | No named individual | One name. One pager. Backup also named. |
| Guardrails untested | "We implemented them, trust us" | Trigger each one. Confirm in logs. |
| Rotation drill never run | "Rotation is configured" | Run a real rotation during pilot. |
| Retention default | 30-day default instead of tier-appropriate | Change before launch. |
| Runbook unread | On-call hasn't seen it | 15-minute walkthrough + comprehension check |
| Communication "we'll figure it out" | No draft message, no schedule | Write the Slack post + email. Schedule them. |
| Rollback never tested | "We have a kill switch" | Drill it. Document timing. |
| Sign-off in a Slack thread | Not in the registry | Record in registry + signed doc, not in a chat |
Framework cross-references
- framework.md §11.2 (per-agent lifecycle; this is the Pilot → Production gate)
- framework.md §22 (autonomy progression; pilot performance feeds this)
- framework.md §24 (observability: Items 2, 6)
- framework.md §17 (privileged identities: Item 5)
- framework.md §19 + §20 (guardrails + controls: Item 4)
- workflows.md Step 12 (Pilot-to-prod gate sub-steps)
- workflows.html → In Action view → node M15 (Production launch; this checklist is the gate)