Rulix AI — AI Governance Assessment Platform

Purpose

This checklist is the 10-item gate between Pilot (M14) and Production (M15) in the In Action roadmap. It is the last chance to catch a missing on-call, an untested kill switch, or audit retention that was never configured. The CoE Lead signs; without this signature, production deployment is blocked.

Pilot is rehearsal. Production is the show. The 10 items below are what separate the two.

When you use it: Once per agent at end of pilot, before production launch.
Who fills it: Agent Builder assembles evidence per item. CoE Lead reviews + signs.
Time: 30–60 minutes if pilot was run well; longer if items are missing.
Decision: Cleared to launch / Block / Conditional launch with named open items.

Worked example (AP Accountant invoice reconciliation)

Agent: finance-invoice-recon v1.0
Pilot ran: 2026-05-13 to 2026-06-12 (30 days)
Pilot-to-prod review date: 2026-06-12

#	Item	Status	Evidence
1	Pilot success criteria met	✅	Pilot success criteria (Agent Card §12 + intake §5): ≥ 90% acceptance, zero Sev-1, latency p95 < 20s, ≥ 4 hrs/week saved. Actuals: 92% acceptance, zero Sev-1, p95 13s, 5 hrs/week saved.
2	Monitoring dashboards live and being watched	✅	LangSmith dashboard `finance-invoice-recon` active. Datadog widget for cost + latency. Finance Champion checked daily during pilot; CoE Lead reviewed weekly.
3	On-call defined (named person + pager)	✅	Primary: Mike Chen (Finance Champion) — PagerDuty schedule `finance-ai-primary`. Backup: Morteza Moradi (CoE Lead) — `finance-ai-backup`. Weekend coverage rotates via standard engineering on-call.
4	Runtime guardrails confirmed firing in expected scenarios	✅	Tested 2026-06-10: malicious-PDF input was rejected by input validation; out-of-scope tool call (attempted NetSuite write) was blocked by tool allowlist; output schema check rejected a malformed LLM response. All three guardrails fired.
5	Identity in IdP, credentials rotated at least once during pilot	✅	`agent-finance-invoice-recon` active. NetSuite OAuth rotated 2026-05-27 (day 14 of pilot) — agent continued operating, no downtime. Anthropic API key rotated 2026-06-03 (day 21) — same.
6	Audit log retention configured per tier	✅	LangSmith retention set to 6 months for this project. S3 Glacier archive lifecycle for 7-year SOX retention configured. Verified 2026-06-11 by Platform team.
7	Runbook exists and on-call has read it	✅	Runbook at `github.com/acme/agents/finance-invoice-recon/runbook.md` v1.0. Mike Chen acknowledged reading 2026-06-09 (signed quiz: kill switch location, on-call escalation, top 3 failure modes — passed).
8	Communication plan ready	✅	Finance team Slack post drafted + scheduled for launch day. All-hands email digest entry drafted. Exec sponsor 1-pager prepared for 2026-06-14 review.
9	Rollback / kill plan tested	✅	Drill 2026-06-08: flipped LaunchDarkly flag `finance-invoice-recon-enabled` to `off`. By design, in-flight executions run to completion; none were in flight at the moment of the flip. New triggers queued for manual handling. Sarah (pilot user) reverted to the manual process. Flag stayed off for 1 hour, then was re-enabled. Worked.
10	CoE Lead signs off	✅	All 9 items above are green. Sign-off below.

Sign-off

Role	Name	Date	Signature
Agent Builder	Morteza Moradi + Mike Chen	2026-06-12	(signed)
AI CoE Lead	Morteza Moradi	2026-06-12	(signed)

Decision

✅ Cleared for production launch. Target launch date: 2026-06-13 (the day after sign-off).

Status in registry will move from Pilot → Production at launch. 30-day post-launch review scheduled for 2026-07-13.

Open items

None.

Blank template (copy below for your agent)

# Pilot-to-Prod Checklist — [Agent Name]

**Agent ID:** [agent-dept-slug]
**Agent version:** [X.X]
**Tier:** [Low / Medium / High]
**Pilot ran:** [start date] to [end date]
**Review date:** [YYYY-MM-DD]

| # | Item | Status | Evidence |
|---|---|---|---|
| 1 | Pilot success criteria met (pre-set criteria from Agent Card §12 / intake §5) | [✅ / ⚠️ / ❌] | [Criteria + actuals] |
| 2 | Monitoring dashboards live and being watched | [✅ / ⚠️ / ❌] | [Dashboard link + who watches when] |
| 3 | On-call defined (named person + pager) | [✅ / ⚠️ / ❌] | [Primary + backup + pager schedule] |
| 4 | Runtime guardrails confirmed firing in expected scenarios (input val / output schema / tool allowlist / kill switch) | [✅ / ⚠️ / ❌] | [Test date + scenarios + results] |
| 5 | Identity in IdP, credentials rotated at least once during pilot | [✅ / ⚠️ / ❌] | [Identity name + rotation dates + result] |
| 6 | Audit log retention configured per tier | [✅ / ⚠️ / ❌] | [Retention period + storage location] |
| 7 | Runbook exists and on-call has read it | [✅ / ⚠️ / ❌] | [Runbook link + on-call acknowledgment + comprehension check] |
| 8 | Communication plan ready | [✅ / ⚠️ / ❌] | [Channels + content + schedule] |
| 9 | Rollback / kill plan tested | [✅ / ⚠️ / ❌] | [Drill date + procedure + result] |
| 10 | CoE Lead signs off | [✅ / ❌] | [See sign-off below] |

## Sign-off

| Role | Name | Date | Signature |
|---|---|---|---|
| Agent Builder | | | |
| AI CoE Lead | | | |

## Decision

[✅ Cleared for production launch / ⚠️ Conditional launch with open items / ❌ Blocked — return to pilot]

Target launch date: [YYYY-MM-DD]
30-day post-launch review: [YYYY-MM-DD]

## Open items (only if conditional or blocked)

| Item # | What's missing | Owner | Due date |
|---|---|---|---|
| | | | |

Per-item guidance

1. Pilot success criteria met

Pass: Pilot actuals meet or beat the pre-set criteria from Agent Card §12 and the intake form (§5 Expected business value).

Fail: Don't move goalposts. If criteria weren't met, either return to pilot for another cycle, redefine criteria (with re-approval at M8), or kill the agent.
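The check for item 1 can be made mechanical so that nobody re-interprets the targets after the fact. The following is a minimal sketch, assuming each criterion is stored as a comparison plus a target. The `Criterion` type, the metric names, and the `unmet` helper are illustrative and not part of the Agent Card; the numbers are the ones from the worked example.

```python
import operator
from typing import Callable, NamedTuple

class Criterion(NamedTuple):
    name: str
    compare: Callable[[float, float], bool]  # called as compare(actual, target)
    target: float

# Criteria fixed before the pilot started (values from the worked example).
CRITERIA = [
    Criterion("acceptance_pct", operator.ge, 90),
    Criterion("sev1_incidents", operator.eq, 0),
    Criterion("latency_p95_s", operator.lt, 20),
    Criterion("hours_saved_per_week", operator.ge, 4),
]

def unmet(actuals: dict[str, float]) -> list[str]:
    """Return the names of criteria the pilot actuals fail to meet."""
    return [c.name for c in CRITERIA if not c.compare(actuals[c.name], c.target)]

actuals = {
    "acceptance_pct": 92,
    "sev1_incidents": 0,
    "latency_p95_s": 13,
    "hours_saved_per_week": 5,
}
print(unmet(actuals))  # [] means item 1 passes
```

Keeping `CRITERIA` in version control alongside the Agent Card would make any later change to a target visible in review, which is one way to enforce the re-approval requirement.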

2. Monitoring dashboards live and being watched

Pass: Dashboards exist AND are being actively watched. The "being watched" part is not theoretical — name who looks at them and when.

Fail: "Dashboards exist" but nobody actually checks them. Common pattern. Don't ship to prod.

3. On-call defined

Pass: Primary on-call + backup, both named, both paged via the company's existing paging system (PagerDuty / Opsgenie / etc.). Weekend coverage defined.

Fail: "The team" is on call (i.e., no one). 2am incident = no response.

4. Runtime guardrails confirmed firing

Pass: Each of the runtime guardrails from Agent Card §6 has been tested with an actual triggering input. Logs confirm the guardrail fired and blocked or escalated correctly.

Fail: "We implemented the guardrails" but no proof they work. Run the tests.
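As an illustration of "tested with an actual triggering input", here is a hedged sketch of a tool-allowlist guardrail and a drill that deliberately trips it. The tool names, the `guard_tool_call` function, and the log format are hypothetical; a real agent would take its allowlist from Agent Card §6 and write to its actual audit log.

```python
class ToolNotAllowed(Exception):
    """Raised when the agent requests a tool outside its allowlist."""

# Hypothetical tool names; the real list comes from the Agent Card.
ALLOWED_TOOLS = frozenset({"netsuite.read_invoice", "netsuite.read_po"})

def guard_tool_call(tool_name: str, audit_log: list[str]) -> None:
    """Block any tool call not on the allowlist and record the decision."""
    if tool_name not in ALLOWED_TOOLS:
        audit_log.append(f"BLOCKED tool={tool_name}")
        raise ToolNotAllowed(tool_name)
    audit_log.append(f"ALLOWED tool={tool_name}")

def drill_write_is_blocked() -> list[str]:
    """Trigger the guardrail on purpose and return the log lines as evidence."""
    log: list[str] = []
    try:
        guard_tool_call("netsuite.write_invoice", log)
    except ToolNotAllowed:
        return log
    raise AssertionError("guardrail did not fire")

evidence = drill_write_is_blocked()
print(evidence)  # ['BLOCKED tool=netsuite.write_invoice']
```

The drill fails loudly if the guardrail stays silent, and the returned log lines are the kind of evidence the checklist's Evidence column asks for.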

5. Identity + rotation

Pass: Agent identity exists in the IdP. Credentials have been rotated at least once during pilot — proving rotation actually works.

Fail: Credentials never rotated. The first rotation then happens in production, under pressure, and the agent breaks.
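One way to verify this item from rotation records is sketched below, assuming rotation dates are available per credential. The credential names and the `never_rotated` helper are illustrative; the dates are the ones from the worked example.

```python
from datetime import date

def never_rotated(
    rotations: dict[str, list[date]], pilot_start: date, pilot_end: date
) -> list[str]:
    """Return credentials with no rotation inside the pilot window."""
    return sorted(
        name
        for name, dates in rotations.items()
        if not any(pilot_start <= d <= pilot_end for d in dates)
    )

rotations = {
    "netsuite_oauth": [date(2026, 5, 27)],
    "anthropic_api_key": [date(2026, 6, 3)],
}
print(never_rotated(rotations, date(2026, 5, 13), date(2026, 6, 12)))  # []
```

An empty list means every credential was rotated at least once during the pilot. A date alone does not prove the agent kept working afterwards, so the evidence should still note the result of each rotation.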

6. Audit log retention

Pass: Retention period configured per Agent Card §11. Period matches the tier and any applicable sector regulation.

Fail: Default retention (often 30 days) is silently shorter than the agent's tier requires.

7. Runbook + on-call comprehension

Pass: Runbook exists in the source repo, is current, and the named on-call has read it. Comprehension check: can they answer these 3 questions without looking?

How do you kill the agent in 60 seconds?
What are the top 3 failure modes and what do you do for each?
Where are the logs?

Fail: Runbook exists but on-call has never seen it. 2am incident = improvisation.

8. Communication plan

Pass: Specific Slack posts, email digests, exec briefings — drafted and scheduled, not just "we'll announce it."

Fail: "We'll figure out comms when we launch." Adoption stalls.

9. Rollback / kill plan tested

Pass: Within the past 30 days, the rollback procedure has been executed end-to-end in a drill, with the timing and the handling of in-flight work both documented.

Fail: "We have a rollback plan" with no drill. First real rollback turns into chaos.
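The behaviour a drill should demonstrate can be sketched as follows. The `KillSwitch` class is an in-memory stand-in for a feature flag, not the API of any flag service, and `drill_is_fresh` is a hypothetical helper for the 30-day rule; the dates are from the worked example.

```python
from datetime import date, timedelta

class KillSwitch:
    """In-memory stand-in for a feature flag (a real setup reads the flag service)."""

    def __init__(self) -> None:
        self.enabled = True
        self.manual_queue: list[str] = []

    def handle(self, trigger_id: str) -> str:
        # While the flag is off, new triggers are routed to humans.
        if not self.enabled:
            self.manual_queue.append(trigger_id)
            return "queued_for_manual"
        return "run_by_agent"

def drill_is_fresh(drill_date: date, review_date: date, max_age_days: int = 30) -> bool:
    """True if the drill ran no more than max_age_days before the review."""
    return timedelta(0) <= review_date - drill_date <= timedelta(days=max_age_days)

switch = KillSwitch()
switch.enabled = False               # flip the flag off
print(switch.handle("invoice-001"))  # queued_for_manual
switch.enabled = True                # re-enable after the drill
print(switch.handle("invoice-002"))  # run_by_agent
print(drill_is_fresh(date(2026, 6, 8), date(2026, 6, 12)))  # True
```

This sketch covers only new triggers. What happens to work already in flight when the flag flips is a separate design decision and should be exercised and documented in the drill as well.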

10. CoE Lead signs

Pass: CoE Lead has reviewed items 1–9 and signs. If CoE Lead is also the Builder, they sign in their CoE Lead capacity — and additionally seek peer review from one of the approvers from M8 (e.g., Department Head).

Fail: No sign-off, or sign-off without actual review. Don't.

Decision matrix

Items 1–9 status	Decision
All ✅	Cleared. Launch on the target date.
1–2 items ⚠️ with documented open items and named owner	Conditional. Launch may proceed with explicit risk acceptance from sponsor. Open items must close within 14 days post-launch.
3+ items ⚠️ or any ❌	Blocked. Return to pilot or fix items.
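The matrix above can be expressed as a small function, which is useful if the registry computes the decision from recorded statuses. This is a minimal sketch: the `Status` enum and `decide` function are illustrative names, and the function cannot check the non-mechanical conditions for a conditional launch (documented open items, a named owner, sponsor risk acceptance).

```python
from enum import Enum

class Status(Enum):
    GREEN = "green"  # ✅
    WARN = "warn"    # ⚠️
    FAIL = "fail"    # ❌

def decide(items: dict[int, Status]) -> str:
    """Map the statuses of items 1-9 to a decision, following the matrix above."""
    if set(items) != set(range(1, 10)):
        raise ValueError("expected exactly one status for each of items 1-9")
    statuses = list(items.values())
    warns = statuses.count(Status.WARN)
    if Status.FAIL in statuses or warns >= 3:
        return "Blocked"
    if warns >= 1:
        # Still requires documented open items, a named owner, and sponsor
        # risk acceptance; this function cannot verify those.
        return "Conditional"
    return "Cleared"

all_green = {i: Status.GREEN for i in range(1, 10)}
print(decide(all_green))                      # Cleared
print(decide({**all_green, 8: Status.WARN}))  # Conditional
print(decide({**all_green, 4: Status.FAIL}))  # Blocked
```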

Usage notes

Don't allow this to become a rubber stamp. Items 4, 7, 9 are where rubber-stamping happens most. Force actual evidence.
Conditional launch is rare. Most checklist items shouldn't be skippable. If item 4 (guardrails) or item 9 (rollback) is ⚠️, the answer is almost always ❌ — block.
Communication plan (item 8) is underrated. Many agents launch, run silently for weeks, and nobody uses them because nobody knew.
The 30-day post-launch review scheduled at sign-off (item 10) is non-negotiable. Put it on the calendar BEFORE launch. Use template 14.
The pilot ends here. From this signature forward, the agent is in production — different monitoring cadence, different incident-response expectations.

Common pitfalls

Pitfall	What it looks like	Fix
Moving the goalposts	Pilot missed targets; CoE Lead approves anyway with new criteria	Don't. Return to pilot or kill.
Dashboards but no observer	Dashboards exist, nobody looks	Name who watches when. Make it real.
On-call = "team"	No named individual	One name. One pager. Backup also named.
Guardrails untested	"We implemented them, trust us"	Trigger each one. Confirm in logs.
Rotation drill never run	"Rotation is configured"	Run a real rotation during pilot.
Retention default	30-day default instead of tier-appropriate	Change before launch.
Runbook unread	On-call hasn't seen it	15-minute walkthrough + comprehension check
Communication "we'll figure it out"	No draft message, no schedule	Write the Slack post + email. Schedule them.
Rollback never tested	"We have a kill switch"	Drill it. Document timing.
Sign-off in a Slack thread	Not in the registry	Record in registry + signed doc, not in a chat

Framework cross-references

framework.md §11.2 (per-agent lifecycle — this is the Pilot → Production gate)
framework.md §22 (autonomy progression — pilot performance feeds this)
framework.md §24 (observability — Items 2, 6)
framework.md §17 (privileged identities — Item 5)
framework.md §19 + §20 (guardrails + controls — Item 4)
workflows.md Step 12 (Pilot-to-prod gate sub-steps)
workflows.html → In Action view → node M15 (Production launch — this checklist is the gate)