§28

Cost management

AI costs scale non-linearly with adoption. Without cost discipline, "successful adoption" produces a budget crisis.

Required practices:

  • Tag every agent invocation by department + use case, so that token + infra costs roll up to a named owner.
  • Real-time budget alerts at multiple thresholds, not just at 100%.
  • Token caps + rate limits per agent and per department, enforced at the AI gateway or orchestrator.
  • Model right-sizing. Don't use a frontier model for routine tasks a smaller model handles equally well. Reserve premium models for tasks that actually need them.
  • Cache responses to frequent, deterministic queries so repeat requests don't incur model cost.
  • Route deterministic tasks to deterministic logic. If an answer can come from a rule, a deterministic function, or a small model, the LLM isn't the answer.
  • Monthly review of token-per-request and cost-per-execution trends. Drift is gradual until it isn't.
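
As an illustration of how tagging, multi-threshold alerts, and token caps might fit together at a gateway, here is a minimal Python sketch. Everything in it is hypothetical: the names (CostLedger, DepartmentBudget), the 50% / 80% / 100% thresholds, and the dollar figures are assumptions for the example, not a reference to any particular gateway product. It keeps state in memory and leaves out rate limiting, persistence, and alert delivery.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Assumed alert thresholds, as fractions of the monthly budget.
ALERT_THRESHOLDS = (0.5, 0.8, 1.0)


@dataclass
class DepartmentBudget:
    monthly_budget_usd: float
    token_cap: int                      # hard cap on tokens per month
    spent_usd: float = 0.0
    tokens_used: int = 0
    alerts_fired: set = field(default_factory=set)


class CostLedger:
    """In-memory ledger a gateway could consult before and after each call."""

    def __init__(self, budgets):
        self.budgets = budgets              # department -> DepartmentBudget
        self.by_tag = defaultdict(float)    # (department, use_case) -> USD

    def authorize(self, department, estimated_tokens):
        """Return False if the call would push the department past its cap."""
        b = self.budgets[department]
        return b.tokens_used + estimated_tokens <= b.token_cap

    def record(self, department, use_case, tokens, cost_usd):
        """Attribute cost to its tag and return any newly crossed thresholds."""
        b = self.budgets[department]
        b.tokens_used += tokens
        b.spent_usd += cost_usd
        self.by_tag[(department, use_case)] += cost_usd

        crossed = []
        for threshold in ALERT_THRESHOLDS:
            over = b.spent_usd >= threshold * b.monthly_budget_usd
            if over and threshold not in b.alerts_fired:
                b.alerts_fired.add(threshold)   # fire each alert only once
                crossed.append(threshold)
        return crossed


ledger = CostLedger({"support": DepartmentBudget(100.0, token_cap=1_000_000)})
print(ledger.record("support", "ticket-triage", 200_000, 55.0))  # [0.5]
print(ledger.record("support", "ticket-triage", 300_000, 30.0))  # [0.8]
print(ledger.authorize("support", 600_000))                      # False
```

Because each threshold fires only once, the owner gets an early warning at 50% and 80% rather than a single alert once the budget is already gone, and the by_tag totals give the monthly review a per-use-case breakdown to compare over time.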