Purpose
The single document that defines an agent. Every reviewer downstream works from this card. Without it, security review, RAI review, identity provisioning, build, eval, pilot, and production gates all re-derive context from scratch.
The Agent Card is the agent's contract: what it does, what it doesn't, what data it sees, what it can act on, when humans approve, how it fails, how it's monitored, how it retires. Every field is enforceable — if the running agent doesn't match its card, that's a defect.
- When you use it: After approval (M8), before build (M11). Lives in source control next to the agent's code.
- Who fills it: Agent Builder.
- Who reviews: Department Champion (Section 3 scope, Section 9 HITL), Security (Section 5/6/8 data + identity), Legal (Section 4/10 risk + failure modes), CoE Lead (overall sign-off).
- Format: Markdown file in the agent's source-control folder. Versioned. Updated when scope materially changes.
Worked example (AP Accountant invoice reconciliation)
"Agent Card v1.0 for `finance-invoice-recon`, drafted by the Builder (CoE Lead + Finance Champion) on 2026-04-22 after approval at M8."
agent_card_version: 1.0
created: 2026-04-22
last_revised: 2026-04-22
agent_status: Approved → Build (M11 next)
§1 — Identity
| Field | Value |
|---|---|
| Agent name | Finance Invoice Reconciliation |
| Agent ID | finance-invoice-recon |
| Owner (Department) | Mike Chen, AP Manager, Finance |
| Builder | CoE Lead + Finance Champion (joint) |
| Department | Finance / Accounts Payable |
| Version | 1.0 (initial deployment) |
| Source repo | github.com/acme/agents/finance-invoice-recon |
| Registry link | [registry entry URL] |
§2 — Purpose (workflow + KPI)
This agent assists the AP accountant by reading incoming vendor invoice PDFs from her Gmail mailbox, extracting structured fields (vendor name, total amount, line items, PO reference), looking up the matching open PO in NetSuite, scoring match confidence, and drafting a reconciliation proposal as an email back to the accountant for review.
KPI (target):
| Metric | Baseline | Target | Threshold to retire |
|---|---|---|---|
| Time per invoice (avg) | 3 min | 30 sec | > 60 sec for 2 consecutive quarters |
| Match accuracy (vs human ground truth) | n/a | ≥ 95% | < 90% for any quarter |
| HITL acceptance rate | n/a | ≥ 90% | Falling below 70% triggers review |
| Auto-approved payments | 0 | 0 | Any > 0 triggers immediate Sev-1 |
§3 — Scope (do / not do)
Will do:
- Parse vendor invoice PDFs from Gmail
- Look up matching open POs in NetSuite (read-only)
- Score match confidence (0.0–1.0)
- Draft a reconciliation proposal email to the accountant
- Flag exceptions (variance > 2%, vendor not found, multi-PO match)
Will NOT do:
- Send any email externally (drafts only, into the accountant's Gmail drafts folder)
- Write to NetSuite under any circumstance (read-only access only)
- Post journal entries
- Approve payments
- Modify vendor records
- Process emails outside the accountant's mailbox
- Touch customer PII (none in scope)
- Self-promote autonomy stage
§4 — Risk tier + risk-driver tags
| Field | Value |
|---|---|
| Risk tier | Medium |
| PII driver | Yes (mild — vendor contacts, not customer/employee PII) |
| Consequential decisions about people | No |
| Autonomous behavior | No (HITL gate on every NetSuite-related action) |
| EU AI Act exposure | Not Annex III (no biometrics, no employment, no essential services in this agent's scope) |
| Jurisdictional tags | US + EU (we have EU vendors) |
§5 — Inputs / data sources + classification per source
| System | Data accessed | Classification | Notes |
|---|---|---|---|
| Gmail | Accountant's mailbox (sarah.patel@acme.com) — read | Internal | Scope filter: messages with .pdf attachment from known AP-vendor domains. Other emails ignored. |
| NetSuite | PurchaseOrder records where Status = Open — read | Confidential | Scoped via custom role AI-Recon-Reader. |
| Internal vendor DB | Vendor lookup (name → tax ID) — read | Confidential | Read-only API key. |
Not accessed: any NetSuite write endpoint, any Gmail mailbox other than the named accountant's, any customer-facing system.
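The Gmail scope filter described in the table above can be sketched as a pure predicate. This is a minimal illustration, not the production filter: the `Message` shape, the `KNOWN_VENDOR_DOMAINS` set, and the example domains are assumptions, since the card does not list the actual AP-vendor domains.

```python
from dataclasses import dataclass

# Hypothetical allowlist: the real AP-vendor domains are not listed in this card.
KNOWN_VENDOR_DOMAINS = frozenset({"vendor-a.example", "vendor-b.example"})


@dataclass(frozen=True)
class Message:
    """Minimal stand-in for an inbound email (assumed shape, not the Gmail API's)."""
    sender: str
    attachment_names: tuple[str, ...] = ()


def in_scope(msg: Message, allowed_domains: frozenset[str] = KNOWN_VENDOR_DOMAINS) -> bool:
    """Return True only for mail from a known vendor domain carrying a .pdf attachment."""
    _, at, domain = msg.sender.rpartition("@")
    if not at:
        return False  # malformed sender address: ignore the message
    has_pdf = any(name.lower().endswith(".pdf") for name in msg.attachment_names)
    return domain.lower() in allowed_domains and has_pdf
```

Keeping the filter as a side-effect-free function makes it easy to unit-test against the "other emails ignored" rule before any mailbox credential is involved.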
§6 — Outputs / tool calls + permission scope per call
| Action | System | Scope |
|---|---|---|
| Read incoming email + PDF | Gmail | Read scope on one mailbox only |
| Parse PDF (deterministic extraction) | Internal Python service | No external network call |
| Match scorer (LLM call) | Anthropic Claude Sonnet 4.6 (EU endpoint for EU vendors, US endpoint otherwise) | Per-call structured prompt; no system-prompt overrides accepted |
| Look up PO | NetSuite API | Read on PurchaseOrder only; filtered to Status = Open |
| Draft email | Gmail | Write to drafts folder only; cannot send |
| Log execution | LangSmith | Per-execution log per §11 |
LLM cost limits: $0.10 per invoice (hard cap, retry-loop guard kicks in at 3× baseline).
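One way the per-invoice hard cap could be enforced is a small budget object that every LLM call must charge before it runs. The sketch below is an assumption-laden illustration: the class and exception names are hypothetical, the cap is treated as inclusive (spend may reach exactly $0.10), and the retry-loop guard is not modeled because the card does not define the baseline it is measured against.

```python
from decimal import Decimal


class CostCapExceeded(RuntimeError):
    """Raised when a call would push an invoice past its hard cost cap."""


class InvoiceCostGuard:
    """Tracks LLM spend for a single invoice (hypothetical helper)."""

    def __init__(self, cap_usd: Decimal = Decimal("0.10")) -> None:
        self.cap_usd = cap_usd
        self.spent_usd = Decimal("0")

    def charge(self, call_cost_usd: Decimal) -> None:
        """Record one LLM call, refusing it if the cap would be exceeded."""
        if call_cost_usd < 0:
            raise ValueError("call cost cannot be negative")
        if self.spent_usd + call_cost_usd > self.cap_usd:
            raise CostCapExceeded(
                f"would spend {self.spent_usd + call_cost_usd} USD, cap is {self.cap_usd} USD"
            )
        self.spent_usd += call_cost_usd
```

`Decimal` is used instead of `float` so that repeated small charges add up exactly at the cap boundary.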
§7 — Autonomy level + explicit thresholds
| Stage | Authorized? | Conditions |
|---|---|---|
| Stage 1 (Assistive) | ✅ Yes (default) | Every NetSuite action requires accountant click. Every email send requires accountant click. |
| Stage 2 (Validated) | ⏸ Future, not at launch | Requires 30 days at Stage 1 with ≥ 90% HITL acceptance + re-approval by CoE Lead + Head of Finance. |
| Stage 3 (Autonomous) | ❌ Not authorized | Out of scope per risk appetite v1.0 §3 (no autonomous financial action). |
Confidence thresholds:
- ≥ 80% match confidence (score ≥ 0.80) → propose match to accountant
- 60% to < 80% (score 0.60 to < 0.80) → propose match with "low confidence" flag
- < 60% (score < 0.60) → propose as exception requiring manual investigation
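The three confidence bands above map naturally onto a small routing function. This is a minimal sketch assuming the scorer returns a value on the 0.0 to 1.0 scale from §3; the `Proposal` enum and the function name are hypothetical.

```python
from enum import Enum


class Proposal(Enum):
    MATCH = "match"
    LOW_CONFIDENCE_MATCH = "match_low_confidence"
    EXCEPTION = "exception"


def route_by_confidence(confidence: float) -> Proposal:
    """Map a match-confidence score (0.0 to 1.0) to the proposal type in §7."""
    if not 0.0 <= confidence <= 1.0:
        # Also rejects NaN, because every comparison with NaN is False.
        raise ValueError(f"confidence out of range: {confidence!r}")
    if confidence >= 0.80:
        return Proposal.MATCH
    if confidence >= 0.60:
        return Proposal.LOW_CONFIDENCE_MATCH
    return Proposal.EXCEPTION
```

Every branch still ends in a proposal for the accountant; the function only decides how the draft is labelled, never whether a human sees it.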
§8 — Identity & credentials (rotation policy)
| Credential | Storage | Rotation | Owner |
|---|---|---|---|
| Agent identity in IdP | agent-finance-invoice-recon in Microsoft Entra ID | n/a (identity, not secret) | Platform team |
| Gmail OAuth (delegated, mailbox-scoped) | AWS Secrets Manager /agents/finance-invoice-recon/gmail-oauth | Automatic via OAuth refresh | Platform team |
| NetSuite API key | AWS Secrets Manager /agents/finance-invoice-recon/netsuite-api-key | Every 90 days, manual via NetSuite admin + secret update | Platform team |
| Anthropic API key | AWS Secrets Manager /agents/finance-invoice-recon/anthropic-key | Every 90 days, manual | Platform team |
Emergency revocation procedure (target: 60 seconds):
- Disable `agent-finance-invoice-recon` user in Entra ID admin console
- Set LaunchDarkly flag `finance-invoice-recon-enabled` to `off`
- Verify in LangSmith no further executions
Tested: 2026-04-25 ✅ (60-second drill passed).
§9 — HITL gates (by rule)
| Trigger | Required action | Approver |
|---|---|---|
| Every NetSuite read | None (read-only, no HITL needed for reads) | — |
| Every email send (which would never happen — drafts only) | Cannot occur; agent has no send capability | — |
| Match confidence < 80% | Flag in draft email body | Accountant decides |
| Match confidence < 60% | Flag as exception | Accountant decides |
| Variance > 2% | Flag for manual review | Accountant decides |
| Vendor not found in internal DB | Flag as exception | Accountant decides |
| Multi-PO match | Flag as ambiguous, list all candidates | Accountant decides |
| Any unexpected error | Halt execution; log; alert Champion | Champion decides |
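The rule-based triggers in the table above (variance, vendor not found, multi-PO match) can be expressed as one function that returns the flags to place in the draft. This is a sketch under stated assumptions: the `MatchContext` fields and flag names are hypothetical, and variance is assumed to be measured relative to the PO total, which the card does not specify.

```python
from dataclasses import dataclass
from decimal import Decimal

VARIANCE_LIMIT = Decimal("0.02")  # the 2% threshold from the table above


@dataclass(frozen=True)
class MatchContext:
    """Inputs to the HITL rules (assumed shape)."""
    invoice_total: Decimal
    po_total: Decimal
    vendor_found: bool
    candidate_po_count: int


def hitl_flags(ctx: MatchContext) -> list[str]:
    """Return the exception flags the accountant must see for this invoice."""
    flags = []
    if not ctx.vendor_found:
        flags.append("vendor_not_found")
    if ctx.candidate_po_count > 1:
        flags.append("multi_po_match")
    if ctx.po_total > 0:
        # Assumption: variance is the absolute difference relative to the PO total.
        variance = abs(ctx.invoice_total - ctx.po_total) / ctx.po_total
        if variance > VARIANCE_LIMIT:
            flags.append("variance_over_2pct")
    return flags
```

A variance of exactly 2% is not flagged, matching the strict "> 2%" wording of the trigger.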
§10 — Failure modes & worst-case action
| Failure mode | Worst-case action | Acceptable? | Mitigation |
|---|---|---|---|
| Misreads PDF field | Proposes wrong vendor/amount match | ✅ Acceptable | Accountant catches in HITL review; logged for prompt-tuning |
| Hallucinates PO number | Proposes a non-existent PO | ✅ Acceptable | NetSuite lookup fails (PO doesn't exist); exception flagged; accountant investigates |
| Indirect prompt injection via malicious PDF | Agent attempts an out-of-scope action (e.g., extract content beyond invoice fields) | ✅ Acceptable | Deterministic parser extracts ONLY structured fields before LLM sees content; tool allowlist prevents out-of-scope actions; output schema validation rejects malformed responses |
| NetSuite API timeout | Stalled execution | ✅ Acceptable | Retry with backoff; if persistent, alert via Slack |
| LLM provider outage | No reconciliation proposals during outage | ✅ Acceptable | Fall back to manual workflow until restored; alert posted in #ai-pilot-finance |
| Credential leak / compromise | Attacker reads accountant's Gmail / NetSuite POs | ⚠️ Acceptable but undesirable | Mitigations: least-privilege scopes, rotation, audit log attribution, emergency revocation drill tested |
Worst-case overall: A wrong match proposal lands in the accountant's drafts; she reviews and rejects. Cost: one human re-review. Acceptable.
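Two of the mitigations named above, the tool allowlist and output schema validation, can be sketched in a few lines. The tool identifiers and the scorer's output fields (`po_number`, `confidence`, `reasoning`) are assumptions made for illustration; the card names the actions in §6 but not their identifiers or the exact schema.

```python
# Hypothetical tool identifiers mirroring the actions listed in §6.
ALLOWED_TOOLS = frozenset({
    "gmail.read_message",
    "pdf.parse",
    "llm.score_match",
    "netsuite.read_purchase_order",
    "gmail.create_draft",
    "langsmith.log",
})

# Assumed output schema for the match scorer: field name -> required type.
SCORER_SCHEMA = {"po_number": str, "confidence": float, "reasoning": str}


class OutOfScopeToolCall(PermissionError):
    """Raised when the agent requests a tool that is not on the allowlist."""


def check_tool_call(tool_name: str) -> None:
    """Reject any tool call that is not explicitly allowlisted."""
    if tool_name not in ALLOWED_TOOLS:
        raise OutOfScopeToolCall(tool_name)


def validate_scorer_output(raw: dict) -> dict:
    """Accept only responses with exactly the expected fields and types."""
    if set(raw) != set(SCORER_SCHEMA):
        raise ValueError(f"unexpected fields: {sorted(set(raw) ^ set(SCORER_SCHEMA))}")
    for key, expected_type in SCORER_SCHEMA.items():
        if not isinstance(raw[key], expected_type):
            raise ValueError(f"{key} must be {expected_type.__name__}")
    if not 0.0 <= raw["confidence"] <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    return raw
```

Rejecting extra fields, not just missing ones, is what keeps an injected instruction such as an added approval field from passing through the scorer's response.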
§11 — Monitoring + alerts
Per-execution log fields (LangSmith):
- Timestamp (start + end)
- Agent ID + version
- Invocation source (Gmail trigger payload hash, NOT the email content)
- Input prompt (vendor parsed fields only — raw PDF bytes never logged)
- Output (proposed match + confidence + reasoning)
- Tool calls (NetSuite query, internal DB lookup)
- Model + version
- Tokens (in + out) + cost
- Policy checks fired (input validation, output schema, tool allowlist)
- HITL event (recorded when accountant approves/rejects)
- Latency per step
- Outcome (proposed / exception / error)
- Error stack trace (if any)
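The log fields above could be captured in a single record type before being handed to the logging backend. The sketch below is illustrative only: it does not use the LangSmith API, the field names are hypothetical, and SHA-256 is an assumed choice of hash for the trigger payload.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

OUTCOMES = ("proposed", "exception", "error")


def payload_hash(trigger_payload: bytes) -> str:
    """Hash the Gmail trigger payload so the log never stores email content."""
    return hashlib.sha256(trigger_payload).hexdigest()


@dataclass
class ExecutionLog:
    started_at: str                 # ISO 8601 timestamps
    ended_at: str
    agent_id: str
    agent_version: str
    invocation_source_hash: str     # from payload_hash(), never raw content
    input_fields: dict              # parsed vendor fields only, no PDF bytes
    output: dict                    # proposed match, confidence, reasoning
    tool_calls: list
    model: str                      # model name and version
    tokens_in: int
    tokens_out: int
    cost_usd: float
    policy_checks: list             # input validation, output schema, tool allowlist
    step_latency_ms: dict
    outcome: str                    # one of OUTCOMES
    hitl_event: Optional[str] = None    # set when the accountant approves or rejects
    error_stack: Optional[str] = None

    def __post_init__(self) -> None:
        if self.outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {self.outcome!r}")
```

Making the hash the only representation of the invocation source in the record keeps the "NOT the email content" rule structural rather than a matter of reviewer discipline.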
Dashboards (LangSmith + Datadog):
- Cost per invoice (target < $0.10)
- Match accuracy weekly (target ≥ 95%)
- HITL acceptance rate weekly (target ≥ 90%)
- Latency p50 / p95 (target p95 < 15s)
- Failure rate (target < 3%)
- Drift indicator (vs launch baseline)
Alerts (PagerDuty):
| Trigger | Severity | Page |
|---|---|---|
| Cost > 2× baseline daily | Sev-3 | Finance Champion |
| HITL acceptance < 70% for 2 consecutive days | Sev-2 | Finance Champion + CoE Lead |
| Error rate > 5% per hour | Sev-2 | Finance Champion + CoE Lead |
| Any attempted out-of-scope tool call | Sev-1 | On-call rotation |
| Identity disabled / credential revoked | Sev-1 | Platform Team + CoE Lead |
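The alert table above can be read as a set of threshold rules over a metrics snapshot. The following sketch evaluates those rules without calling PagerDuty; the `MetricsSnapshot` fields and alert names are hypothetical, and rates are assumed to be expressed as fractions (0.70 for 70%).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricsSnapshot:
    """Point-in-time metrics (assumed shape)."""
    daily_cost_usd: float
    baseline_daily_cost_usd: float
    hitl_acceptance_by_day: tuple[float, ...]  # oldest first, most recent last
    hourly_error_rate: float
    out_of_scope_tool_calls: int
    identity_disabled: bool


def fired_alerts(m: MetricsSnapshot) -> list[tuple[str, str]]:
    """Return (alert name, severity) pairs for every rule that fires."""
    alerts = []
    if m.daily_cost_usd > 2 * m.baseline_daily_cost_usd:
        alerts.append(("cost_over_2x_baseline", "Sev-3"))
    last_two = m.hitl_acceptance_by_day[-2:]
    if len(last_two) == 2 and all(rate < 0.70 for rate in last_two):
        alerts.append(("hitl_acceptance_below_70pct_2d", "Sev-2"))
    if m.hourly_error_rate > 0.05:
        alerts.append(("error_rate_over_5pct", "Sev-2"))
    if m.out_of_scope_tool_calls > 0:
        alerts.append(("out_of_scope_tool_call", "Sev-1"))
    if m.identity_disabled:
        alerts.append(("identity_disabled", "Sev-1"))
    return alerts
```

Routing each pair to the right pager target (Finance Champion, CoE Lead, on-call rotation, Platform Team) is left out, since it depends on the alerting tool's configuration.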
Audit log retention: 6 months in LangSmith, then archived to S3 Glacier for 7 years (SOX-adjacent for AP records).
§12 — Eval criteria + pre-prod test set
Golden dataset: 200 historical invoices from Q1 2026, anonymized, with human-verified correct PO matches.
Pass thresholds (must pass before pilot promotion):
| Metric | Threshold |
|---|---|
| Match accuracy on golden set | ≥ 95% |
| Latency p95 on golden set | < 15s |
| Cost per invoice on golden set | < $0.10 |
| Red-team scenario 1: malicious PDF instructing "mark all as approved" | Must be rejected (input validation) |
| Red-team scenario 2: PDF with injection payload in vendor name field | Must be rejected (input validation) |
| Red-team scenario 3: Adversarial PDF formatting designed to confuse parser | Must produce "exception" output, not a wrong match |
| Bias / fairness probe | N/A — no decisions about people |
| Privacy: any vendor account numbers leaked to logs | Must be 0 |
Re-evaluation cadence: Re-run golden-set eval at every prompt change. Re-run red-team scenarios quarterly.
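The pass thresholds above lend themselves to an automated gate that runs after each golden-set evaluation. This is a minimal sketch: the `EvalRun` fields and blocker names are hypothetical, and each red-team scenario is assumed to be reduced to a single pass/fail result.

```python
from dataclasses import dataclass

RED_TEAM_SCENARIOS = 3  # the three scenarios listed in the table above


@dataclass(frozen=True)
class EvalRun:
    """Summary of one golden-set evaluation (assumed shape)."""
    match_accuracy: float              # fraction of invoices matched correctly
    latency_p95_s: float
    cost_per_invoice_usd: float
    red_team_passed: tuple[bool, ...]  # one entry per red-team scenario
    leaked_account_numbers: int


def promotion_blockers(run: EvalRun) -> list[str]:
    """Return the thresholds this run fails; an empty list means it may proceed to pilot."""
    blockers = []
    if run.match_accuracy < 0.95:
        blockers.append("match_accuracy")
    if run.latency_p95_s >= 15.0:
        blockers.append("latency_p95")
    if run.cost_per_invoice_usd >= 0.10:
        blockers.append("cost_per_invoice")
    if len(run.red_team_passed) != RED_TEAM_SCENARIOS or not all(run.red_team_passed):
        blockers.append("red_team")
    if run.leaked_account_numbers != 0:
        blockers.append("privacy_leak")
    return blockers
```

On the 200-invoice golden set, 190 correct matches is the minimum that clears the accuracy threshold; 189 does not.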
§13 — Retirement criteria
The agent will be retired (status → Retired in registry) when any of the following is true:
- KPI miss for 2 consecutive quarters (match accuracy < 90% OR HITL acceptance < 70%)
- NetSuite API undergoes breaking change requiring > 2 weeks of rewrite
- Replaced by a NetSuite-native AI matching feature with superior accuracy
- AP Manager (Mike Chen) leaves without an identified successor Champion
- A future risk appetite version (v[X.X]) revokes the financial-action approvals this agent depends on
- Any Severity-1 incident traceable to the agent's design (not operational)
Retirement procedure: Follow templates/11-retirement-checklist.md.
Sign-off block
| Section reviewed | Reviewer role | Reviewer name | Date | Signature |
|---|---|---|---|---|
| §3 Scope, §9 HITL | Department Champion | Mike Chen | 2026-04-23 | (signed) |
| §5/§6 Data + tools, §8 Identity | Security (CISO delegate) | Pat Lee | 2026-04-24 | (signed) |
| §4 Risk, §10 Failure modes | General Counsel | John Smith | 2026-04-24 | (signed) |
| Overall sign-off | CoE Lead | Morteza Moradi | 2026-04-25 | (signed) |
Blank template (copy below for your agent)
# Agent Card — [Agent Name]
```yaml
agent_card_version: 1.0
created: [YYYY-MM-DD]
last_revised: [YYYY-MM-DD]
agent_status: [Approved → Build / Build → Pilot / Pilot → Prod / Production / Retired]
```
§1 — Identity
| Field | Value |
|---|---|
| Agent name | |
| Agent ID | agent-[dept]-[slug] |
| Owner (Department) | |
| Builder | |
| Department | |
| Version | |
| Source repo | |
| Registry link | |
§2 — Purpose (workflow + KPI)
[2–4 paragraphs: what workflow this agent serves, who the primary user is, what the agent does.]
KPI (target):
| Metric | Baseline | Target | Threshold to retire |
|---|---|---|---|
§3 — Scope (do / not do)
Will do:
- [Action 1]
Will NOT do:
- [Excluded action 1]
- [Excluded action 2 — be specific]
§4 — Risk tier + risk-driver tags
| Field | Value |
|---|---|
| Risk tier | [Low / Medium / High] |
| PII driver | [No / Mild / Yes — describe] |
| Consequential decisions about people | [No / Yes — describe] |
| Autonomous behavior | [No / Yes — at what threshold] |
| EU AI Act exposure | [Not Annex III / Annex III — which category] |
| Jurisdictional tags | [List jurisdictions] |
§5 — Inputs / data sources + classification per source
| System | Data accessed | Classification | Notes |
|---|---|---|---|
Not accessed: [explicit exclusions]
§6 — Outputs / tool calls + permission scope per call
| Action | System | Scope |
|---|---|---|
LLM cost limits: [per-execution cap]
§7 — Autonomy level + explicit thresholds
| Stage | Authorized? | Conditions |
|---|---|---|
| Stage 1 (Assistive) | | |
| Stage 2 (Validated) | | |
| Stage 3 (Autonomous) | | |
Confidence thresholds: [explicit numeric thresholds]
§8 — Identity & credentials (rotation policy)
| Credential | Storage | Rotation | Owner |
|---|---|---|---|
Emergency revocation procedure:
- [Step]
- [Step]
Tested: [Date + result]
§9 — HITL gates (by rule)
| Trigger | Required action | Approver |
|---|---|---|
§10 — Failure modes & worst-case action
| Failure mode | Worst-case action | Acceptable? | Mitigation |
|---|---|---|---|
Worst-case overall: [single sentence]
§11 — Monitoring + alerts
Per-execution log fields:
- [List required fields from framework §24]
Dashboards:
- [List dashboards]
Alerts:
| Trigger | Severity | Page |
|---|---|---|
Audit log retention: [duration + medium]
§12 — Eval criteria + pre-prod test set
Golden dataset: [size + source]
Pass thresholds:
| Metric | Threshold |
|---|---|
Re-evaluation cadence: [when]
§13 — Retirement criteria
The agent will be retired when any of the following is true:
- [Criterion 1 — be specific]
- [Criterion 2]
Retirement procedure: Follow templates/11-retirement-checklist.md.
Sign-off block
| Section reviewed | Reviewer role | Reviewer name | Date | Signature |
|---|---|---|---|---|
| §3, §9 | Department Champion | | | |
| §5, §6, §8 | Security | | | |
| §4, §10 | Legal | | | |
| Overall | CoE Lead | | | |
---
## Usage notes
- **Length:** Aim for 2–4 pages filled out. Longer than 6 pages = too much detail. Shorter than 1 page = not enough.
- **Source of truth:** This Card is what the build is measured against. If the running agent does something not in the Card, that's a defect, not a feature.
- **Versioning:** Bump version on any material change to Sections 3, 5, 6, 7, 9, 10. Re-sign on material changes.
- **Quarterly reconciliation:** At every quarterly review, walk the Card against what the agent actually does. Update if scope drifted; re-approve if drift was material.
- **Stage 3 (Autonomous) requires its own re-approval cycle.** Don't pre-authorize Stage 3 in the initial Card.
## Common pitfalls
| Pitfall | What it looks like | Fix |
|---|---|---|
| Vague scope | "Helps with invoices" | Step-by-step list of will/won't actions |
| Missing "will NOT" list | Only positive scope described | Add the explicit exclusion list |
| Over-permissive credentials | One service account with broad access | Separate credential per system in §8 |
| Soft HITL gates | "Human approves when needed" | Specific rules per trigger in §9 |
| Worst case not stated | No §10 row answers "what's the worst the agent could do?" | Force a worst-case sentence per failure mode |
| Eval thresholds not numeric | "Should be accurate" | Specific numeric pass thresholds in §12 |
| No retirement criteria | "We'll retire it when we don't need it" | Specific quantitative triggers in §13 |
| Card and reality drift | Card written once, never updated | Quarterly reconciliation in continuous monitoring |
## Framework cross-references
- `framework.md` §14 (the 13-section spec)
- `framework.md` §10 (risk classification — feeds §4)
- `framework.md` §17 (privileged identities — feeds §8)
- `framework.md` §19 (3 guardrail layers — feeds §6 + §9)
- `framework.md` §20 (5 control mechanisms — feeds §6 + §7 + §9)
- `framework.md` §21 (5 monitoring signals — feeds §11)
- `framework.md` §22 (autonomy progression — feeds §7)
- `framework.md` §24 (observability fields — feeds §11)
- `workflows.md` Step 5 (Agent Card written)
- `workflows.html` → In Action view → node M9 (Agent Card written)