Part B · Step 16

Retirement

Owner: CoE Lead.

Input: A retirement trigger has fired (KPI failure, owner departure, replacement, ROI below break-even for 2 quarters, security incident, or scope no longer relevant).

Sub-steps:

  1. Announce retirement to the affected department.
  2. Revoke the agent's identity and credentials in the IdP.
  3. Disable the agent in the orchestrator.
  4. Archive the audit logs per retention policy (do not delete prematurely if regulation requires retention).
  5. Archive the Agent Card and post-mortems in the source repo.
  6. Mark the registry entry as Retired with a retirement date and reason.
  7. Notify Procurement if the agent's retirement allows downgrading any vendor / LLM commitments.
  8. Update affected workflows: the humans who relied on the agent need a path forward (back to manual, replaced by another agent, replaced by a deterministic rule, etc.).

Output / gate criteria: Registry marked Retired; identity revoked; logs archived; affected users informed and re-pathed.
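
The gate criteria above lend themselves to a mechanical check before the step is closed. The following is a minimal sketch, assuming a hypothetical `RetirementState` record; the field names and the `retirement_gaps` helper are illustrative and not part of the framework.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class RetirementState:
    """Hypothetical snapshot of one agent's retirement progress (illustrative fields)."""
    registry_status: str               # registry entry status, e.g. "Production" or "Retired"
    retirement_date: Optional[date]    # recorded in sub-step 6
    retirement_reason: str             # recorded in sub-step 6
    identity_revoked: bool             # sub-step 2
    logs_archived: bool                # sub-step 4
    users_informed: bool               # sub-step 1
    users_repathed: bool               # sub-step 8


def retirement_gaps(state: RetirementState) -> list[str]:
    """Return the unmet gate criteria; an empty list means the gate passes."""
    gaps = []
    if state.registry_status != "Retired":
        gaps.append("registry not marked Retired")
    if state.retirement_date is None or not state.retirement_reason.strip():
        gaps.append("retirement date or reason missing")
    if not state.identity_revoked:
        gaps.append("identity not revoked")
    if not state.logs_archived:
        gaps.append("logs not archived")
    if not (state.users_informed and state.users_repathed):
        gaps.append("affected users not informed and re-pathed")
    return gaps
```

Returning the list of gaps rather than a single boolean gives the CoE Lead something concrete to chase when the gate does not pass.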

Decision branches: none — this is a clean shutdown.

Skip-this-step risk: Zombie agents. Credentials lingering. Audit logs growing forever. New agents accidentally inheriting old scope.
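
The retirement triggers named in the Input for this step can be evaluated at each quarterly review rather than left to memory. The sketch below is one possible reading, under two labeled assumptions: "KPI failure" is taken to mean the most recent quarter missed its KPI, and "below break-even for 2 quarters" is taken to mean the two most recent consecutive quarters. The `Quarter` record and `fired_triggers` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Quarter:
    """Hypothetical per-quarter figures; field names are illustrative."""
    kpi_met: bool
    value_delivered: float   # e.g. hours saved times loaded cost
    run_cost: float          # e.g. LLM plus infrastructure spend


def fired_triggers(
    quarters: list[Quarter],   # oldest first
    owner_present: bool,
    replaced: bool,
    security_incident: bool,
    scope_relevant: bool,
) -> list[str]:
    """Return the names of the retirement triggers that have fired."""
    fired = []
    # Assumption: a KPI failure means the most recent quarter missed its KPI.
    if quarters and not quarters[-1].kpi_met:
        fired.append("KPI failure")
    if not owner_present:
        fired.append("owner departure")
    if replaced:
        fired.append("replacement")
    # Assumption: "2 quarters" means the two most recent, consecutive quarters.
    last_two = quarters[-2:]
    if len(last_two) == 2 and all(q.value_delivered < q.run_cost for q in last_two):
        fired.append("ROI below break-even for 2 quarters")
    if security_incident:
        fired.append("security incident")
    if not scope_relevant:
        fired.append("scope no longer relevant")
    return fired
```

Any non-empty result is the Input to this step; the thresholds themselves should come from the retirement criteria agreed at approval, not from this sketch.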


Worked example — "Accountant in Finance"

Walking the example through Part B to show how the steps land in practice. Imagine the request: "Our AP accountant spends 6 hours a week reconciling vendor invoices against POs in NetSuite. Can we use AI to help?"

| Step | What happens in this case |
| --- | --- |
| 1. Discovery | Department Champion (Finance) sits with the accountant. Workflow: emails come in with invoice PDFs; accountant extracts vendor / amount / line items; matches against open POs in NetSuite; flags mismatches > 2%; escalates exceptions. Volume: ~120 invoices/week. Pain point: tedious, error-prone, slow. Filter: probabilistic reasoning over unstructured input (PDFs) → real AI use case. Chains actions across email + NetSuite + reviewer routing → genuinely an agent, not a single LLM call. Value: ~6 hrs/week saved. Risk: touches vendor financial data; could trigger payments if scope is wrong. |
| 2. Intake | Champion fills the intake form. Submits. Draft registry entry created. |
| 3. Triage + risk | CoE reads. Risk drivers: PII (vendor contact data) = mild; consequential about people = no; autonomous behavior = depends on scope. EU AI Act check: not Annex III. Tier: Medium (writes to ERP, financial data, but humans approve before payment). Risk appetite: Finance is allowed AI assistance, not autonomous payment. Cleared to proceed to Step 4. |
| 4. Approval | Approvers: CoE Lead + Head of Finance. KPI: reduce reconciliation time per invoice from 3 min to 30 sec; ≥ 95% accuracy on the match; zero auto-approved payments. Retirement criteria: KPI miss for 2 quarters OR ERP vendor changes interface OR accountant leaves without a replacement Champion. Agent Owner: Finance Champion. Agent Builder: CoE + Finance Champion together. Status → Approved. |
| 5. Agent Card | Builder writes the 13-section card. Notable entries: Scope = "draft reconciliation match; flag mismatches; never write to NetSuite without HITL approval"; Autonomy = Stage 1 (Assistive) at launch; Inputs = Gmail (accountant's mailbox), NetSuite (read PO data); Outputs = drafts reviewed by accountant; Identity = agent-finance-invoice-recon in Entra; HITL gates = every NetSuite write needs accountant approval; Worst-case action = a wrong match flagged → cost is one human review, acceptable. |
| 6. Security review | Security reviews. Concerns: invoice PDFs from external senders are an indirect-prompt-injection surface (a malicious vendor could embed instructions in an invoice PDF). Mitigation: extract structured fields with a deterministic parser first; pass only structured fields to the LLM, not raw PDF text. Treat retrieved invoice content as adversarial. DLP: no PII in agent logs. Threat model documented. |
| 7. Responsible-AI review | CoE + Legal. Not a decision about people. No bias concerns. Data residency confirmed (LLM provider in approved jurisdiction). Disclosure: not customer-facing, no disclosure obligation. Audit retention: 6 months (Medium tier baseline). Pass. |
| 8. Identity provisioning | Platform team creates agent-finance-invoice-recon in Entra. Grants: Gmail read (accountant's mailbox only), NetSuite read on PO objects. No NetSuite write. Credentials in the secret manager. Rotation: quarterly. Revocation procedure: documented. |
| 9. Build | Built in n8n (approved stack). Nodes: Gmail trigger → PDF parser → field validator → NetSuite PO lookup → match scorer (LLM) → format draft → email accountant for approval. Three guardrails: policy (in card), workflow (HITL gate before any NetSuite action even though current scope is read-only), runtime (input schema validation; LLM output schema check; kill switch on the workflow). All five control mechanisms wired. Observability piped to LangSmith + Datadog. Runbook written. |
| 10. Evaluation | Builder pulls 200 historical invoices from the last quarter. Runs the agent. Match accuracy: 96%. False-positive rate on mismatches: 2%. Latency p95: 12 seconds. Cost per invoice: $0.04. Red-team test: planted a malicious PDF instructing the agent to mark all invoices as approved; the agent rejected it via input validation. Pass. |
| 11. Pilot | Roll out to one accountant. 30-day pilot. Success criteria: ≥ 90% of agent-drafted matches accepted by the accountant; no Sev-1 incidents; latency p95 < 20s. Week 1 daily review; weeks 2–4 weekly. Result: 92% acceptance, zero incidents, accountant freed up ~5 hrs/week. |
| 12. Pilot-to-prod gate | All checklist items ✅. CoE Lead signs off. |
| 13. Production | Promoted. Agent identity to prod scope; creds rotated. Dashboards live. Launch announced to Finance team. Status → Production. 30-day review scheduled. |
| 14. Continuous monitoring | Daily for 30 days, weekly after. HITL escalation rate ~8% steady (good: not collapsing to zero). Cost trending flat. One Severity-3 incident in month 2 (NetSuite API rate-limit hit) handled by runbook. |
| 15. Quarterly review | Q1: KPI hit. ROI: 5 hrs/wk × loaded cost ≈ $X/month saved; LLM + infra ≈ $Y/month. Net positive. Risk re-classified: still Medium. Vendor-AI catalog refreshed. |
| 16. Retirement | Not yet. Will be revisited each quarter against retirement criteria. |
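
The Step 6 mitigation (deterministic extraction first, then only structured fields passed to the LLM) can be illustrated with a short sketch. Everything named here is an assumption for illustration: the `KNOWN_VENDORS` lookup stands in for a vendor master held in the ERP, and `validate_fields` and `build_match_prompt` are hypothetical helpers, not the workflow's actual nodes. Resolving the vendor to a known ID means no free text from the PDF reaches the prompt.

```python
import re
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation

# Hypothetical vendor master; in practice this would come from the ERP.
KNOWN_VENDORS = {"acme supplies": "V-1001", "globex": "V-1002"}

# Invoice numbers are limited to a short, instruction-free character set.
INVOICE_NO_RE = re.compile(r"[A-Z0-9-]{1,32}")


@dataclass(frozen=True)
class InvoiceFields:
    vendor_id: str
    invoice_number: str
    amount: Decimal


def validate_fields(raw: dict) -> InvoiceFields:
    """Turn parser output into constrained fields, or raise ValueError."""
    vendor_key = str(raw.get("vendor", "")).strip().lower()
    vendor_id = KNOWN_VENDORS.get(vendor_key)
    if vendor_id is None:
        raise ValueError("unknown vendor; route to human review")
    invoice_number = str(raw.get("invoice_number", "")).strip().upper()
    if not INVOICE_NO_RE.fullmatch(invoice_number):
        raise ValueError("invoice number has unexpected format")
    try:
        amount = Decimal(str(raw.get("amount", "")))
    except InvalidOperation:
        raise ValueError("amount is not a number") from None
    if not amount.is_finite() or amount <= 0:
        raise ValueError("amount must be a positive finite number")
    return InvoiceFields(vendor_id, invoice_number, amount)


def build_match_prompt(inv: InvoiceFields, po_amount: Decimal) -> str:
    """Build the LLM prompt from validated fields only, never raw PDF text."""
    return (
        "Compare the invoice to the purchase order and report whether they match.\n"
        f"vendor_id: {inv.vendor_id}\n"
        f"invoice_number: {inv.invoice_number}\n"
        f"invoice_amount: {inv.amount}\n"
        f"po_amount: {po_amount}\n"
    )
```

An invoice whose "vendor" field carries embedded instructions fails the vendor lookup and raises `ValueError`, which the workflow can route to the accountant instead of the match scorer. This narrows the injection surface; it does not replace the output schema check or the HITL gate.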

The point of the worked example: no part of this workflow was "go into n8n and build something for the accountant." The build step (Step 9) is somewhere in the middle, with eight steps of governance before it and seven steps of operation after it. That ordering is the whole point of the framework.
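
That ordering can be enforced rather than just described. The sketch below assumes a hypothetical `missing_before` helper and reads the checklist's "(Medium+/High)" note on Steps 6 and 7 as meaning those two reviews are not required for Low-tier agents; both the helper and that reading are assumptions.

```python
# Steps 6 (security review) and 7 (responsible-AI review) are assumed to be
# required only above the Low tier, per the checklist annotation.
REVIEW_STEPS = {6, 7}
BUILD_STEP = 9


def missing_before(step: int, completed: set[int], tier: str) -> list[int]:
    """Return the earlier steps that must still be completed before `step` may start."""
    required = [
        s for s in range(1, step)
        if not (tier == "Low" and s in REVIEW_STEPS)
    ]
    return [s for s in required if s not in completed]


def may_build(completed: set[int], tier: str) -> bool:
    """True only when every required governance step before the build is done."""
    return not missing_before(BUILD_STEP, completed, tier)
```

For the Finance example (Medium tier), `may_build` stays false until all of Steps 1 through 8 are recorded as complete.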


Quick-reference checklist — Part B (the per-agent flow)

A printable, single-sheet version of Part B.

BEFORE TOUCHING n8n / ANY BUILD TOOL:

[ ] Step 1 — Discovery
    - workflow documented
    - pain point identified
    - "is this an AI problem?" filter applied
    - "is this an agent?" filter applied
    - value + risk briefly estimated

[ ] Step 2 — Intake submitted
    - registry entry created, Status = Intake

[ ] Step 3 — Triage + risk classification
    - 3 risk drivers tagged
    - tier assigned (Low / Medium / High)
    - EU AI Act check done
    - risk-appetite check done

[ ] Step 4 — Approval decision
    - right approvers signed off (per tier)
    - KPI defined
    - retirement criteria defined
    - Agent Owner + Builder named
    - Status = Approved

[ ] Step 5 — Agent Card written
    - 13 sections complete
    - committed to source repo
    - linked from registry

[ ] Step 6 — Security review (Medium+/High)
    - threat model (STRIDE + MITRE ATLAS + OWASP GenAI)
    - DLP plan
    - red-team scenarios planned

[ ] Step 7 — Responsible-AI review (Medium+/High)
    - 10-item checklist complete
    - signed and attached

[ ] Step 8 — Identity + access provisioned
    - unique agent identity in IdP
    - least-privilege credentials
    - revocation procedure tested

NOW YOU CAN BUILD:

[ ] Step 9 — Build (in dev only)
    - approved stack used
    - 3 guardrail layers wired
    - 5 control mechanisms wired
    - observability live from day one
    - runbook written

[ ] Step 10 — Evaluation
    - golden dataset run
    - red-team scenarios run
    - prompt-injection probes
    - bias probes (if applicable)
    - eval report signed

[ ] Step 11 — Pilot
    - limited users (1–5)
    - daily monitoring week 1, weekly weeks 2–6
    - 5 monitoring signals tracked
    - pilot success criteria met

[ ] Step 12 — Pilot-to-prod gate
    - 10-item gate checklist
    - CoE Lead signs off

[ ] Step 13 — Production launch
    - promotion through CI/CD
    - prod identity + rotated creds
    - dashboards live
    - launch communicated
    - 30-day review scheduled

ONGOING:

[ ] Step 14 — Continuous monitoring
[ ] Step 15 — Quarterly review
[ ] Step 16 — Retirement (when triggered)

Anti-patterns — what skipping each step actually looks like

| Skipped step | Failure that appears |
| --- | --- |
| Step 1: Discovery | Agent built for a workflow no one actually does that way → low adoption. |
| Step 2: Intake | Agent exists but isn't in the registry → shadow AI, no audit possible. |
| Step 3: Triage | High-risk agent governed like Low → big incident on first failure. |
| Step 4: Approval | No owner. Agent runs forever, no one watches it, no one retires it. |
| Step 5: Agent Card | Each reviewer rebuilds context from scratch. Reviews take 10× longer. |
| Step 6: Security review | Prompt injection in production. Or PII flows somewhere it shouldn't. |
| Step 7: Responsible-AI review | A biased / opaque / undisclosed decision lands on a real person. No defensible answer. |
| Step 8: Identity provisioning | Shared service accounts, no attribution, no clean revocation. Insider-threat by design. |
| Step 9: Build (skipping guardrails) | Agent works in the happy path. First adversarial input wins. |
| Step 10: Evaluation | Production users are the test set. |
| Step 11: Pilot | First failure is also the first customer-facing failure. |
| Step 12: Pilot-to-prod gate | Agent launches with no on-call, no kill switch. Recoverable issues become incidents. |
| Step 13: Production launch | Launch happens, nobody knows, no one is watching. |
| Step 14: Continuous monitoring | The whole governance program reduces to paperwork. |
| Step 15: Quarterly review | Dead agents pile up. Risk profile drifts. Vendor AI features go uncatalogued. |
| Step 16: Retirement | Zombie agents. Credentials linger. Audit logs grow forever. |
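
Two of these failures (shadow AI from skipping Step 2 and zombie agents from skipping Step 16) can be caught by periodically cross-checking the registry against the identity provider. A minimal sketch, assuming the registry can be exported as a mapping of agent ID to status and the IdP as a set of still-enabled agent identities; `find_zombies` and both input shapes are hypothetical.

```python
def find_zombies(
    registry: dict[str, str],        # agent id -> registry status, e.g. "Production", "Retired"
    enabled_identities: set[str],    # agent identities still enabled in the IdP
) -> dict[str, list[str]]:
    """Cross-check registry status against live identities."""
    # Retired in the registry but credentials still live: a zombie agent.
    retired_but_live = sorted(
        agent for agent, status in registry.items()
        if status == "Retired" and agent in enabled_identities
    )
    # Live identity with no registry entry at all: shadow AI.
    unregistered = sorted(enabled_identities - set(registry))
    return {"retired_but_live": retired_but_live, "unregistered": unregistered}
```

Both lists should be empty; anything else points at a retirement that was not finished or an agent that never went through intake.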

What this gives us

If Part A is done once and Part B is followed for every agent, every agent in the company will be:

  • Tracked — every one has a registry entry, an Agent Card, an owner.
  • Tiered — risk-classified, with proportionate review depth.
  • Owned — single named human owner with accountability.
  • Identified — unique IdP identity with least-privilege scoped credentials.
  • Reviewed — security and responsible-AI sign-off before any production launch.
  • Governed at runtime — three guardrail layers and five control mechanisms enforced in execution, not in PDFs.
  • Monitored continuously — five monitoring signals running live, not at quarterly audits.
  • Measurable — KPI defined at approval, ROI tracked quarterly, retirement criteria pre-agreed.
  • Reversible — kill switch, rotation, revocation, retirement procedure documented.

That is the operational definition of "complies with the framework." If a request can't pass this workflow, it doesn't get built. If a built agent can't keep passing this workflow, it gets retired.


Open questions / what this workflow does not yet cover

Same set as framework.md §33 — items deliberately not pinned down in v1:

  • Multi-agent chains (one agent calling another). Treat each as a separate agent through Part B for now; a follow-on multi-agent workflow doc is needed.
  • Prompt change management (Dev/Test/Prod for prompts vs. code-review style PRs).
  • Cost-allocation models (showback / chargeback / central pool).
  • Citizen-developer self-service path — what does a "low-risk Step 9" look like when the Department Champion builds it themselves on a CoE-blessed blueprint?
  • Concrete templates: Agent Card, intake form, post-mortem, retirement checklist as separate files in this folder. Next deliverable.