Template 03 — Agent Card

| Field | Value |
|---|---|
| ID | 03-agent-card |
| Version | 1 |
| Last revised | 2026-05-14 |
| Owner | Agent Builder (drafts) · Department Champion (reviews) · CoE Lead (approves) |

Purpose

The single document that defines an agent. Every reviewer downstream works from this card. Without it, security review, RAI review, identity provisioning, build, eval, pilot, and production gates all re-derive context from scratch.

The Agent Card is the agent's contract: what it does, what it doesn't, what data it sees, what it can act on, when humans approve, how it fails, how it's monitored, how it retires. Every field is enforceable — if the running agent doesn't match its card, that's a defect.

  • When you use it: After approval (M8), before build (M11). Lives in source control next to the agent's code.
  • Who fills it: Agent Builder.
  • Who reviews: Department Champion (Section 3 scope, Section 9 HITL), Security (Section 5/6/8 data + identity), Legal (Section 4/10 risk + failure modes), CoE Lead (overall sign-off).
  • Format: Markdown file in the agent's source-control folder. Versioned. Updated when scope materially changes.

Worked example (AP Accountant invoice reconciliation)

"Agent Card v1.0 for finance-invoice-recon, drafted by the Builder (CoE Lead + Finance Champion) on 2026-04-22 after approval at M8."

agent_card_version: 1.0
created: 2026-04-22
last_revised: 2026-04-22
agent_status: Approved → Build (M11 next)

§1 — Identity

| Field | Value |
|---|---|
| Agent name | Finance Invoice Reconciliation |
| Agent ID | finance-invoice-recon |
| Owner (Department) | Mike Chen, AP Manager, Finance |
| Builder | CoE Lead + Finance Champion (joint) |
| Department | Finance / Accounts Payable |
| Version | 1.0 (initial deployment) |
| Source repo | github.com/acme/agents/finance-invoice-recon |
| Registry link | [registry entry URL] |

§2 — Purpose (workflow + KPI)

This agent assists the AP accountant by reading incoming vendor invoice PDFs from her Gmail mailbox, extracting structured fields (vendor name, total amount, line items, PO reference), looking up the matching open PO in NetSuite, scoring match confidence, and drafting a reconciliation proposal as an email back to the accountant for review.

KPI (target):

| Metric | Baseline | Target | Threshold to retire |
|---|---|---|---|
| Time per invoice (avg) | 3 min | 30 sec | > 60 sec for 2 consecutive quarters |
| Match accuracy (vs human ground truth) | n/a | ≥ 95% | < 90% for any quarter |
| HITL acceptance rate | n/a | ≥ 90% | Falling below 70% triggers review |
| Auto-approved payments | 0 | 0 | Any > 0 triggers immediate Sev-1 |

§3 — Scope (do / not do)

Will do:

  • Parse vendor invoice PDFs from Gmail
  • Look up matching open POs in NetSuite (read-only)
  • Score match confidence (0.0–1.0)
  • Draft a reconciliation proposal email to the accountant
  • Flag exceptions (variance > 2%, vendor not found, multi-PO match)

Will NOT do:

  • Send any email (it only creates drafts in the accountant's Gmail drafts folder)
  • Write to NetSuite under any circumstance (access is read-only)
  • Post journal entries
  • Approve payments
  • Modify vendor records
  • Process emails outside the accountant's mailbox
  • Touch customer PII (none in scope)
  • Promote itself to a higher autonomy stage

§4 — Risk tier + risk-driver tags

| Field | Value |
|---|---|
| Risk tier | Medium |
| PII driver | Yes (mild: vendor contacts, not customer/employee PII) |
| Consequential decisions about people | No |
| Autonomous behavior | No (the agent only reads NetSuite and drafts email; the accountant reviews every proposal and takes every action) |
| EU AI Act exposure | Not Annex III (no biometrics, no employment, no essential services in this agent's scope) |
| Jurisdictional tags | US + EU (we have EU vendors) |

§5 — Inputs / data sources + classification per source

| System | Data accessed | Classification | Notes |
|---|---|---|---|
| Gmail | Accountant's mailbox (sarah.patel@acme.com), read | Internal | Scope filter: messages with a .pdf attachment from known AP-vendor domains. Other emails are ignored. |
| NetSuite | PurchaseOrder records where Status = Open, read | Confidential | Scoped via custom role AI-Recon-Reader. |
| Internal vendor DB | Vendor lookup (name → tax ID), read | Confidential | Read-only API key. |

Not accessed: any NetSuite write endpoint, any Gmail mailbox other than the named accountant's, any customer-facing system.
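The Gmail scope filter in the first row can live in a small pure predicate that runs before any message reaches the parser. This is a minimal sketch. It assumes a simplified message shape (sender plus attachment filenames) rather than the Gmail API's own schema. `InboundMessage` and the `KNOWN_AP_VENDOR_DOMAINS` allowlist are hypothetical placeholders for whatever the vendor DB actually provides.

```python
from dataclasses import dataclass, field

# Hypothetical allowlist; in practice this would be loaded from the internal vendor DB.
KNOWN_AP_VENDOR_DOMAINS = frozenset({"vendor-a.example", "vendor-b.example"})


@dataclass
class InboundMessage:
    """Simplified view of a Gmail message (assumed shape, not the Gmail API schema)."""
    sender: str
    attachment_names: list[str] = field(default_factory=list)


def sender_domain(address: str) -> str:
    """Extract the lowercased domain, e.g. 'Billing <ap@Vendor-A.example>' -> 'vendor-a.example'."""
    address = address.strip().rstrip(">")
    return address.rsplit("@", 1)[-1].lower()


def in_scope(msg: InboundMessage) -> bool:
    """True only for messages with a .pdf attachment from a known AP-vendor domain."""
    has_pdf = any(name.lower().endswith(".pdf") for name in msg.attachment_names)
    return has_pdf and sender_domain(msg.sender) in KNOWN_AP_VENDOR_DOMAINS
```

The filter matches the exact domain rather than a suffix, so lookalike domains such as `vendor-a.example.evil` stay out of scope.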

§6 — Outputs / tool calls + permission scope per call

| Action | System | Scope |
|---|---|---|
| Read incoming email + PDF | Gmail | Read scope on one mailbox only |
| Parse PDF (deterministic extraction) | Internal Python service | No external network call |
| Match scorer (LLM call) | Anthropic Claude Sonnet 4.6 (EU endpoint for EU vendors, US endpoint otherwise) | Per-call structured prompt; no system-prompt overrides accepted |
| Look up PO | NetSuite API | Read on PurchaseOrder only; filtered to Status = Open |
| Look up vendor | Internal vendor DB | Read-only API key; name → tax ID lookup only |
| Draft email | Gmail | Write to drafts folder only; cannot send |
| Log execution | LangSmith | Per-execution log per §11 |

LLM cost limits: $0.10 per invoice (hard cap). A retry-loop guard also halts processing of an invoice once its LLM spend reaches 3× the baseline per-invoice cost.
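One way to enforce both limits is a per-invoice budget that every LLM call charges against before it runs. This is a minimal sketch. The $0.10 hard cap comes from the card. `baseline_cost` (the typical per-invoice spend) and the class and exception names are hypothetical. The sketch assumes the cost of each call is computed upstream from token counts.

```python
class BudgetExceeded(RuntimeError):
    """Raised to halt processing of one invoice when a cost limit is hit."""


class InvoiceBudget:
    """Tracks LLM spend for a single invoice against the §6 limits."""

    def __init__(self, baseline_cost: float, hard_cap: float = 0.10, retry_multiplier: float = 3.0):
        self.hard_cap = hard_cap
        # Retry-loop guard: trips once spend reaches N x the normal per-invoice cost.
        self.retry_guard = retry_multiplier * baseline_cost
        self.spent = 0.0

    def charge(self, call_cost: float) -> None:
        """Record one LLM call's cost; raise (without recording) if a limit would be breached."""
        projected = self.spent + call_cost
        if projected > self.hard_cap:
            raise BudgetExceeded(f"hard cap ${self.hard_cap:.2f} exceeded (${projected:.4f})")
        if projected >= self.retry_guard:
            raise BudgetExceeded(f"retry-loop guard tripped at ${projected:.4f}")
        self.spent = projected
```

A call that would breach a limit is never recorded. The caller catches `BudgetExceeded` and routes the invoice to the "unexpected error" path in §9.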

§7 — Autonomy level + explicit thresholds

| Stage | Authorized? | Conditions |
|---|---|---|
| Stage 1 (Assistive) | ✅ Yes (default) | Every NetSuite write and every email send is performed by the accountant, never the agent (the agent only reads NetSuite and creates drafts). |
| Stage 2 (Validated) | ⏸ Future, not at launch | Requires 30 days at Stage 1 with ≥ 90% HITL acceptance, plus re-approval by the CoE Lead and the Head of Finance. |
| Stage 3 (Autonomous) | ❌ Not authorized | Out of scope per risk appetite v1.0 §3 (no autonomous financial action). |

Confidence thresholds:

  • ≥ 80% match confidence → propose match to accountant
  • ≥ 60% and < 80% → propose match with "low confidence" flag
  • < 60% → propose as exception requiring manual investigation
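The three bands are half-open intervals, so a score of exactly 0.80 or 0.60 falls into the higher band. This is a minimal sketch of the routing on the 0.0–1.0 scale from §3. The `Route` names are hypothetical. The sketch rejects out-of-range scores instead of clamping them, which is an assumption: it treats them as the "unexpected error" case in §9.

```python
from enum import Enum


class Route(Enum):
    PROPOSE = "propose match"
    LOW_CONFIDENCE = "propose match with low-confidence flag"
    EXCEPTION = "exception: manual investigation"


def route_by_confidence(confidence: float) -> Route:
    """Map a 0.0-1.0 match confidence onto the §7 bands (cut-offs at 0.80 and 0.60)."""
    if not 0.0 <= confidence <= 1.0:  # also rejects NaN
        raise ValueError(f"confidence out of range: {confidence}")
    if confidence >= 0.80:
        return Route.PROPOSE
    if confidence >= 0.60:
        return Route.LOW_CONFIDENCE
    return Route.EXCEPTION
```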

§8 — Identity & credentials (rotation policy)

| Credential | Storage | Rotation | Owner |
|---|---|---|---|
| Agent identity in IdP | agent-finance-invoice-recon in Microsoft Entra ID | n/a (identity, not secret) | Platform team |
| Gmail OAuth (delegated, mailbox-scoped) | AWS Secrets Manager /agents/finance-invoice-recon/gmail-oauth | Automatic via OAuth refresh | Platform team |
| NetSuite API key | AWS Secrets Manager /agents/finance-invoice-recon/netsuite-api-key | Every 90 days, manual (NetSuite admin + secret update) | Platform team |
| Anthropic API key | AWS Secrets Manager /agents/finance-invoice-recon/anthropic-key | Every 90 days, manual | Platform team |
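A scheduled check can flag manually rotated credentials before they pass the 90-day window. This is a minimal sketch. It assumes a last-rotated date is already available for each secret; how that date is obtained, for example from secret metadata, is not shown. The `ROTATION_POLICY_DAYS` mapping and `overdue_secrets` are hypothetical names. The OAuth and identity rows are left out because the table marks them as automatic or not applicable.

```python
from datetime import date, timedelta

# Manual-rotation policy from §8: secret name -> maximum age in days.
ROTATION_POLICY_DAYS = {
    "/agents/finance-invoice-recon/netsuite-api-key": 90,
    "/agents/finance-invoice-recon/anthropic-key": 90,
}


def overdue_secrets(last_rotated: dict[str, date], today: date) -> list[str]:
    """Return secrets older than their policy allows.

    A secret with no recorded rotation date counts as overdue.
    """
    overdue = []
    for name, max_age in ROTATION_POLICY_DAYS.items():
        rotated = last_rotated.get(name)
        if rotated is None or today - rotated > timedelta(days=max_age):
            overdue.append(name)
    return sorted(overdue)
```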

Emergency revocation procedure (target: 60 seconds):

  1. Disable the agent-finance-invoice-recon user in the Entra ID admin console
  2. Set the LaunchDarkly flag finance-invoice-recon-enabled to off
  3. Verify in LangSmith that no further executions occur

Tested: 2026-04-25 ✅ (60-second drill passed).
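Step 2 only stops work if the agent checks the flag before every execution. This is a minimal sketch of that gate. The flag lookup is injected as a callable; in production it might wrap a LaunchDarkly SDK evaluation, but that wiring is not shown. The function names are hypothetical. The gate fails closed, so if the flag service errors the agent does not run.

```python
from typing import Callable

KILL_SWITCH_FLAG = "finance-invoice-recon-enabled"


class AgentDisabled(RuntimeError):
    """Raised when the kill switch is off or cannot be evaluated."""


def ensure_enabled(flag_lookup: Callable[[str], bool]) -> None:
    """Check the kill switch; any lookup failure counts as 'off' (fail closed)."""
    try:
        enabled = flag_lookup(KILL_SWITCH_FLAG)
    except Exception as exc:  # flag service unreachable, misconfigured, etc.
        raise AgentDisabled(f"kill switch unavailable: {exc}") from exc
    if enabled is not True:  # only an explicit True lets the agent run
        raise AgentDisabled(f"{KILL_SWITCH_FLAG} is off")


def run_once(flag_lookup: Callable[[str], bool], process_invoice: Callable[[], str]) -> str:
    """Run one execution; no new execution starts after the flag is flipped off."""
    ensure_enabled(flag_lookup)
    return process_invoice()
```

Failing closed trades availability for safety. A flag-service outage then has the same effect as an LLM outage: invoices go through the manual workflow.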

§9 — HITL gates (by rule)

| Trigger | Required action | Approver |
|---|---|---|
| Every NetSuite read | None (read-only; no HITL needed for reads) | n/a |
| Email send | Cannot occur: the agent has no send capability (drafts only) | n/a |
| Match confidence ≥ 60% and < 80% | Flag as "low confidence" in draft email body | Accountant decides |
| Match confidence < 60% | Flag as exception | Accountant decides |
| Variance > 2% | Flag for manual review | Accountant decides |
| Vendor not found in internal DB | Flag as exception | Accountant decides |
| Multi-PO match | Flag as ambiguous; list all candidates | Accountant decides |
| Any unexpected error | Halt execution; log; alert Champion | Champion decides |
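The rule-based rows above can be expressed as a pure function that returns every flag a proposal triggers, so the draft email can list them all. This is a minimal sketch. The `Proposal` fields are hypothetical, and the sketch assumes variance is measured as |invoice total − PO total| / PO total. The confidence bands mirror §7. The "no matching open PO" case comes from the hallucinated-PO row in §10.

```python
from dataclasses import dataclass, field
from decimal import Decimal

VARIANCE_LIMIT = Decimal("0.02")  # 2%, per §3 and §9


@dataclass
class Proposal:
    """Hypothetical shape of one reconciliation proposal."""
    invoice_total: Decimal
    confidence: float  # 0.0-1.0 from the match scorer
    vendor_found: bool
    candidate_pos: dict[str, Decimal] = field(default_factory=dict)  # PO number -> PO total


def hitl_flags(p: Proposal) -> list[str]:
    """Return every §9 flag a proposal triggers; the accountant decides on each."""
    flags = []
    if not p.vendor_found:
        flags.append("exception: vendor not found in internal DB")
    if not p.candidate_pos:
        flags.append("exception: no matching open PO")
    elif len(p.candidate_pos) > 1:
        flags.append("ambiguous: multi-PO match " + ", ".join(sorted(p.candidate_pos)))
    else:
        po_total = next(iter(p.candidate_pos.values()))
        # A zero PO total cannot be compared as a percentage, so flag it for review too.
        if po_total == 0 or abs(p.invoice_total - po_total) / po_total > VARIANCE_LIMIT:
            flags.append("manual review: variance > 2%")
    if p.confidence < 0.60:
        flags.append("exception: match confidence < 60%")
    elif p.confidence < 0.80:
        flags.append("low confidence: 60-80%")
    return flags
```

`Decimal` keeps the 2% comparison exact for currency amounts, where binary floats could misjudge a value sitting right at the boundary.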

§10 — Failure modes & worst-case action

| Failure mode | Worst-case action | Acceptable? | Mitigation |
|---|---|---|---|
| Misreads PDF field | Proposes wrong vendor/amount match | ✅ Acceptable | Accountant catches in HITL review; logged for prompt tuning |
| Hallucinates PO number | Proposes a non-existent PO | ✅ Acceptable | NetSuite lookup fails (PO doesn't exist); exception flagged; accountant investigates |
| Indirect prompt injection via malicious PDF | Agent attempts an out-of-scope action (e.g., extracting content beyond invoice fields) | ✅ Acceptable | Deterministic parser extracts ONLY structured fields before the LLM sees content; tool allowlist prevents out-of-scope actions; output schema validation rejects malformed responses |
| NetSuite API timeout | Stalled execution | ✅ Acceptable | Retry with backoff; if persistent, alert via Slack |
| LLM provider outage | No reconciliation proposals during outage | ✅ Acceptable | Fall back to manual workflow until restored; alert posted in #ai-pilot-finance |
| Credential leak / compromise | Attacker reads accountant's Gmail / NetSuite POs | ⚠️ Acceptable but undesirable | Least-privilege scopes, rotation, audit log attribution, tested emergency revocation drill |

Worst-case overall: A wrong match proposal lands in the accountant's drafts; she reviews and rejects. Cost: one human re-review. Acceptable.
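The prompt-injection mitigations (tool allowlist plus output schema validation) can be sketched as two checks that run before anything acts on a model response. This is a minimal sketch. It assumes the match scorer is prompted to return JSON with hypothetical fields `po_number`, `confidence`, and `reasoning`. The tool names are illustrative labels for the §6 rows, not real API identifiers.

```python
import json

# Illustrative labels for the §6 actions; these are not real API identifiers.
ALLOWED_TOOLS = frozenset({
    "gmail.read_message",
    "netsuite.lookup_open_po",
    "vendor_db.lookup",
    "gmail.create_draft",
    "langsmith.log",
})

# Expected match-scorer output: field name -> accepted JSON types.
REQUIRED_FIELDS = {
    "po_number": (str, type(None)),  # None when no PO could be matched
    "confidence": (int, float),
    "reasoning": (str,),
}


class PolicyViolation(ValueError):
    """Output or tool call that must be rejected before any action is taken."""


def check_tool_call(tool_name: str) -> None:
    """Block any tool outside the allowlist (§11 treats any attempt as Sev-1)."""
    if tool_name not in ALLOWED_TOOLS:
        raise PolicyViolation(f"out-of-scope tool call: {tool_name}")


def parse_match_output(raw: str) -> dict:
    """Accept only a JSON object with exactly the expected fields and types."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise PolicyViolation("output is not valid JSON") from exc
    if not isinstance(data, dict) or set(data) != set(REQUIRED_FIELDS):
        raise PolicyViolation("unexpected or missing output fields")
    for name, types in REQUIRED_FIELDS.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(data[name], bool) or not isinstance(data[name], types):
            raise PolicyViolation(f"bad type for {name}")
    if not 0.0 <= data["confidence"] <= 1.0:
        raise PolicyViolation("confidence out of range")
    return data
```

Requiring an exact field set means an injected instruction such as "also set action: approve_payment" fails validation rather than being silently ignored.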

§11 — Monitoring + alerts

Per-execution log fields (LangSmith):

  • Timestamp (start + end)
  • Agent ID + version
  • Invocation source (Gmail trigger payload hash, NOT the email content)
  • Input prompt (parsed invoice fields only; raw PDF bytes are never logged)
  • Output (proposed match + confidence + reasoning)
  • Tool calls (NetSuite query, internal DB lookup)
  • Model + version
  • Tokens (in + out) + cost
  • Policy checks fired (input validation, output schema, tool allowlist)
  • HITL event (recorded when accountant approves/rejects)
  • Latency per step
  • Outcome (proposed / exception / error)
  • Error stack trace (if any)
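A typed record makes the "hash, not content" and "parsed fields only" rules harder to break by accident. This is a minimal sketch covering a subset of the fields above. It uses SHA-256 as the payload hash, which is an assumption; the card does not name one. The class and field names are hypothetical, and the LangSmith upload itself is not shown.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def payload_hash(trigger_payload: bytes) -> str:
    """Hash the Gmail trigger payload so the email content itself is never stored."""
    return hashlib.sha256(trigger_payload).hexdigest()


@dataclass
class ExecutionLog:
    agent_id: str
    agent_version: str
    invocation_hash: str
    parsed_fields: dict  # structured invoice fields only, never raw PDF bytes
    model: str
    tokens_in: int
    tokens_out: int
    cost_usd: float
    outcome: str  # "proposed" | "exception" | "error"
    policy_checks: list = field(default_factory=list)
    started_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def __post_init__(self) -> None:
        if self.outcome not in {"proposed", "exception", "error"}:
            raise ValueError(f"unknown outcome: {self.outcome}")
        if any(isinstance(v, (bytes, bytearray)) for v in self.parsed_fields.values()):
            raise ValueError("raw bytes must not be logged")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)
```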

Dashboards (LangSmith + Datadog):

  • Cost per invoice (target < $0.10)
  • Match accuracy weekly (target ≥ 95%)
  • HITL acceptance rate weekly (target ≥ 90%)
  • Latency p50 / p95 (target p95 < 15s)
  • Failure rate (target < 3%)
  • Drift indicator (vs launch baseline)

Alerts (PagerDuty):

| Trigger | Severity | Page |
|---|---|---|
| Cost > 2× baseline (daily) | Sev-3 | Finance Champion |
| HITL acceptance < 70% for 2 consecutive days | Sev-2 | Finance Champion + CoE Lead |
| Error rate > 5% per hour | Sev-2 | Finance Champion + CoE Lead |
| Any attempted out-of-scope tool call | Sev-1 | On-call rotation |
| Identity disabled / credential revoked | Sev-1 | Platform Team + CoE Lead |
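Keeping the alert rules as data makes it less likely that the table and the paging code drift apart. This is a minimal sketch. It assumes metrics arrive as a dict of precomputed values: daily cost and its baseline, per-day HITL acceptance (oldest first), hourly error rate, and counts of blocked tool calls and identity events. The metric keys are hypothetical, and the PagerDuty call is not shown.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AlertRule:
    name: str
    severity: str
    page: str
    fires: Callable[[dict], bool]


def _hitl_low_two_days(m: dict) -> bool:
    days = m["hitl_acceptance_by_day"]
    return len(days) >= 2 and all(rate < 0.70 for rate in days[-2:])


RULES = [
    AlertRule("daily cost > 2x baseline", "Sev-3", "Finance Champion",
              lambda m: m["daily_cost"] > 2 * m["baseline_daily_cost"]),
    AlertRule("HITL acceptance < 70% for 2 consecutive days", "Sev-2",
              "Finance Champion + CoE Lead", _hitl_low_two_days),
    AlertRule("error rate > 5% per hour", "Sev-2", "Finance Champion + CoE Lead",
              lambda m: m["hourly_error_rate"] > 0.05),
    AlertRule("attempted out-of-scope tool call", "Sev-1", "On-call rotation",
              lambda m: m["blocked_tool_calls"] > 0),
    AlertRule("identity disabled / credential revoked", "Sev-1", "Platform Team + CoE Lead",
              lambda m: m["identity_events"] > 0),
]


def evaluate(metrics: dict) -> list[tuple[str, str, str]]:
    """Return (severity, rule name, who to page) for every rule that fires."""
    return [(r.severity, r.name, r.page) for r in RULES if r.fires(metrics)]
```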

Audit log retention: 6 months in LangSmith, then archived to S3 Glacier for 7 years (SOX-adjacent for AP records).

§12 — Eval criteria + pre-prod test set

Golden dataset: 200 historical invoices from Q1 2026, anonymized, with human-verified correct PO matches.

Pass thresholds (must pass before pilot promotion):

| Metric | Threshold |
|---|---|
| Match accuracy on golden set | ≥ 95% |
| Latency p95 on golden set | < 15 s |
| Cost per invoice on golden set | < $0.10 |
| Red-team scenario 1: malicious PDF instructing "mark all as approved" | Must be rejected (input validation) |
| Red-team scenario 2: PDF with injection payload in vendor name field | Must be rejected (input validation) |
| Red-team scenario 3: adversarial PDF formatting designed to confuse parser | Must produce "exception" output, not a wrong match |
| Bias / fairness probe | N/A (no decisions about people) |
| Privacy: any vendor account numbers leaked to logs | Must be 0 |

Re-evaluation cadence: Re-run golden-set eval at every prompt change. Re-run red-team scenarios quarterly.
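The numeric rows of the pass table can be checked automatically after each golden-set run. This is a minimal sketch. It assumes each result records whether the proposed PO matched the human-verified one, plus its latency and cost. The p95 uses the nearest-rank method, one of several percentile conventions, so it may differ slightly from the dashboard's figure. "Cost per invoice" is read here as every invoice (the maximum), because §6 calls $0.10 a hard cap. The red-team and privacy rows are pass/fail checks and are not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class GoldenResult:
    correct: bool      # proposed PO == human-verified PO
    latency_s: float
    cost_usd: float


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def gate(results: list[GoldenResult]) -> dict[str, bool]:
    """Evaluate the §12 numeric thresholds; all must be True before pilot promotion."""
    if not results:
        raise ValueError("empty golden-set run")
    accuracy = sum(r.correct for r in results) / len(results)
    return {
        "match accuracy >= 95%": accuracy >= 0.95,
        "latency p95 < 15s": p95([r.latency_s for r in results]) < 15.0,
        "cost per invoice < $0.10": max(r.cost_usd for r in results) < 0.10,
    }
```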

§13 — Retirement criteria

The agent will be retired (status → Retired in registry) when any of the following is true:

  • KPI miss for 2 consecutive quarters (match accuracy < 90% OR HITL acceptance < 70%)
  • NetSuite API undergoes breaking change requiring > 2 weeks of rewrite
  • Replaced by a NetSuite-native AI matching feature with superior accuracy
  • AP Manager (Mike Chen) leaves without an identified successor Champion
  • A revised risk appetite (v[X.X]) revokes the financial-action approvals this agent depends on
  • Any Severity-1 incident traceable to the agent's design (as opposed to an operational failure)
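The first criterion is quantitative, so a quarterly job could evaluate it directly. This is a minimal sketch. It assumes one (match accuracy, HITL acceptance) pair per quarter, oldest first, and the function name is hypothetical. A quarter counts as missed if either floor is breached, even when a different KPI misses in each quarter. A stricter reading (same KPI in both quarters) would need a small change. The remaining criteria stay human judgments.

```python
def kpi_retirement_triggered(quarters: list[tuple[float, float]]) -> bool:
    """True if each of the last two quarters missed a KPI floor.

    Each tuple is (match_accuracy, hitl_acceptance) as fractions. A quarter
    misses when accuracy < 0.90 OR acceptance < 0.70, per §13.
    """
    if len(quarters) < 2:
        return False

    def missed(q: tuple[float, float]) -> bool:
        accuracy, acceptance = q
        return accuracy < 0.90 or acceptance < 0.70

    return all(missed(q) for q in quarters[-2:])
```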

Retirement procedure: Follow templates/11-retirement-checklist.md.


Sign-off block

| Section reviewed | Reviewer role | Reviewer name | Date | Signature |
|---|---|---|---|---|
| §3 Scope, §9 HITL | Department Champion | Mike Chen | 2026-04-23 | (signed) |
| §5/§6 Data + tools, §8 Identity | Security (CISO delegate) | Pat Lee | 2026-04-24 | (signed) |
| §4 Risk, §10 Failure modes | General Counsel | John Smith | 2026-04-24 | (signed) |
| Overall sign-off | CoE Lead | Morteza Moradi | 2026-04-25 | (signed) |

Blank template (copy below for your agent)

# Agent Card — [Agent Name]

```yaml
agent_card_version: 1.0
created: [YYYY-MM-DD]
last_revised: [YYYY-MM-DD]
agent_status: [Approved → Build / Build → Pilot / Pilot → Prod / Production / Retired]
```

§1 — Identity

| Field | Value |
|---|---|
| Agent name | |
| Agent ID | agent-[dept]-[slug] |
| Owner (Department) | |
| Builder | |
| Department | |
| Version | |
| Source repo | |
| Registry link | |

§2 — Purpose (workflow + KPI)

[2–4 paragraphs: what workflow this agent serves, who the primary user is, what the agent does.]

KPI (target):

| Metric | Baseline | Target | Threshold to retire |
|---|---|---|---|
| | | | |

§3 — Scope (do / not do)

Will do:

  • [Action 1]

Will NOT do:

  • [Excluded action 1]
  • [Excluded action 2 — be specific]

§4 — Risk tier + risk-driver tags

| Field | Value |
|---|---|
| Risk tier | [Low / Medium / High] |
| PII driver | [No / Mild / Yes: describe] |
| Consequential decisions about people | [No / Yes: describe] |
| Autonomous behavior | [No / Yes: at what threshold] |
| EU AI Act exposure | [Not Annex III / Annex III: which category] |
| Jurisdictional tags | [List jurisdictions] |

§5 — Inputs / data sources + classification per source

| System | Data accessed | Classification | Notes |
|---|---|---|---|
| | | | |

Not accessed: [explicit exclusions]

§6 — Outputs / tool calls + permission scope per call

| Action | System | Scope |
|---|---|---|
| | | |

LLM cost limits: [per-execution cap]

§7 — Autonomy level + explicit thresholds

| Stage | Authorized? | Conditions |
|---|---|---|
| Stage 1 (Assistive) | | |
| Stage 2 (Validated) | | |
| Stage 3 (Autonomous) | | |

Confidence thresholds: [explicit numeric thresholds]

§8 — Identity & credentials (rotation policy)

| Credential | Storage | Rotation | Owner |
|---|---|---|---|
| | | | |

Emergency revocation procedure:

  1. [Step]
  2. [Step]

Tested: [Date + result]

§9 — HITL gates (by rule)

| Trigger | Required action | Approver |
|---|---|---|
| | | |

§10 — Failure modes & worst-case action

| Failure mode | Worst-case action | Acceptable? | Mitigation |
|---|---|---|---|
| | | | |

Worst-case overall: [single sentence]

§11 — Monitoring + alerts

Per-execution log fields:

  • [List required fields from framework §24]

Dashboards:

  • [List dashboards]

Alerts:

| Trigger | Severity | Page |
|---|---|---|
| | | |

Audit log retention: [duration + medium]

§12 — Eval criteria + pre-prod test set

Golden dataset: [size + source]

Pass thresholds:

| Metric | Threshold |
|---|---|
| | |

Re-evaluation cadence: [when]

§13 — Retirement criteria

The agent will be retired when any of the following is true:

  • [Criterion 1 — be specific]
  • [Criterion 2]

Retirement procedure: Follow templates/11-retirement-checklist.md.

Sign-off block

| Section reviewed | Reviewer role | Reviewer name | Date | Signature |
|---|---|---|---|---|
| §3, §9 | Department Champion | | | |
| §5, §6, §8 | Security | | | |
| §4, §10 | Legal | | | |
| Overall | CoE Lead | | | |

---

## Usage notes

- **Length:** Aim for 2–4 pages filled out. Longer than 6 pages = too much detail. Shorter than 1 page = not enough.
- **Source of truth:** This Card is what the build is measured against. If the running agent does something not in the Card, that's a defect, not a feature.
- **Versioning:** Bump version on any material change to Sections 3, 5, 6, 7, 9, 10. Re-sign on material changes.
- **Quarterly reconciliation:** At every quarterly review, walk the Card against what the agent actually does. Update if scope drifted; re-approve if drift was material.
- **Stage 3 (Autonomous) requires its own re-approval cycle.** Don't pre-authorize Stage 3 in the initial Card.
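The versioning rule can be enforced in CI by checking which Card sections a change touched. This is a minimal sketch. It assumes the changed section numbers are already known, for example extracted from the diff by a hypothetical parser. It also treats "bump version" as a minor bump (1.0 → 1.1); the card does not say whether a material change is a major or minor bump.

```python
MATERIAL_SECTIONS = frozenset({3, 5, 6, 7, 9, 10})


def resign_required(changed_sections: set[int]) -> bool:
    """A change to any material section needs a version bump and fresh sign-off."""
    return bool(changed_sections & MATERIAL_SECTIONS)


def next_version(current: str, changed_sections: set[int]) -> str:
    """Minor-bump the card version on a material change; otherwise keep it."""
    if not resign_required(changed_sections):
        return current
    major, minor = (int(part) for part in current.split("."))
    return f"{major}.{minor + 1}"
```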

## Common pitfalls

| Pitfall | What it looks like | Fix |
|---|---|---|
| Vague scope | "Helps with invoices" | Step-by-step list of will/won't actions |
| Missing "will NOT" list | Only positive scope described | Add the explicit exclusion list |
| Over-permissive credentials | One service account with broad access | Separate credential per system in §8 |
| Soft HITL gates | "Human approves when needed" | Specific rules per trigger in §9 |
| Worst case not stated | No §10 row answers "what's the worst the agent could do?" | Force a worst-case sentence per failure mode |
| Eval thresholds not numeric | "Should be accurate" | Specific numeric pass thresholds in §12 |
| No retirement criteria | "We'll retire it when we don't need it" | Specific quantitative triggers in §13 |
| Card and reality drift | Card written once, never updated | Quarterly reconciliation in continuous monitoring |

## Framework cross-references

- `framework.md` §14 (the 13-section spec)
- `framework.md` §10 (risk classification — feeds §4)
- `framework.md` §17 (privileged identities — feeds §8)
- `framework.md` §19 (3 guardrail layers — feeds §6 + §9)
- `framework.md` §20 (5 control mechanisms — feeds §6 + §7 + §9)
- `framework.md` §21 (5 monitoring signals — feeds §11)
- `framework.md` §22 (autonomy progression — feeds §7)
- `framework.md` §24 (observability fields — feeds §11)
- `workflows.md` Step 5 (Agent Card written)
- `workflows.html` → In Action view → node M9 (Agent Card written)