§27

Data foundation

Reliable agentic AI depends on reliable data. The foundations below are adapted from guidance by lakeFS, the Microsoft Cloud Adoption Framework (CAF), and Databricks.

| Foundation | Why it matters | Owner |
| --- | --- | --- |
| Data classification (public / internal / confidential / PII / regulated) | The basis for agent scoping and DLP | Platform / Security |
| Data lineage (where data comes from, how it transforms) | Required for audits and incident response | Platform |
| Golden datasets (curated, validated benchmarks for evals) | Lets us know whether agents are getting better or worse over time | CoE |
| Data version control (reproducibility, rollback) | Lets us roll back data, not just code, when something breaks | Platform |
| Validation gates at ingestion | Stops low-quality / biased data poisoning shared pipelines | Platform |
| Semantic consistency across systems | "Customer" must mean the same thing in CRM and ERP, or agents reason inconsistently | CoE + Data team |
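
To make the first row concrete, here is a minimal sketch of how classification labels could drive agent scoping. It is illustrative only: the `Classification` enum, the `CATALOG` mapping, and both function names are hypothetical, and treating the five levels as a strict least-to-most-sensitive ordering is a simplifying assumption (a real policy may treat PII and regulated data as separate dimensions).

```python
from enum import IntEnum


class Classification(IntEnum):
    """Levels from the table. ASSUMPTION: a strict sensitivity ordering."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    PII = 3
    REGULATED = 4


# Hypothetical catalog: dataset name -> classification label.
CATALOG = {
    "product_docs": Classification.PUBLIC,
    "sales_pipeline": Classification.CONFIDENTIAL,
    "customer_emails": Classification.PII,
}


def agent_can_read(clearance: Classification, label: Classification) -> bool:
    """Allow access only if the dataset is no more sensitive than the clearance."""
    return label <= clearance


def readable_datasets(clearance: Classification) -> list[str]:
    """Return the names of catalog datasets an agent with this clearance may read."""
    return sorted(
        name for name, label in CATALOG.items() if agent_can_read(clearance, label)
    )
```

Under these assumptions, an agent cleared for `INTERNAL` data would see only `product_docs`, while one cleared for `PII` would see all three datasets. The point of the sketch is that scoping becomes a lookup against labels rather than a per-agent judgment call.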

Agents are downstream of data. Garbage in is still garbage out, only faster, at higher volume, and with fewer human checkpoints.
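
One way to keep garbage out is the validation gate at ingestion listed in the table. The sketch below shows the general shape, not a prescribed implementation: the schema in `REQUIRED_FIELDS`, the email check, and the names `validate`, `ingestion_gate`, and `GateResult` are all hypothetical, and a real gate would likely add type, range, and bias checks.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical schema: fields every record must carry before it is shared.
REQUIRED_FIELDS = ("customer_id", "email")


@dataclass
class GateResult:
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (record, reason) pairs


def validate(record: dict) -> Optional[str]:
    """Return a rejection reason, or None if the record passes."""
    for name in REQUIRED_FIELDS:
        if not record.get(name):
            return f"missing {name}"
    if "@" not in record["email"]:
        return "malformed email"
    return None


def ingestion_gate(records: list) -> GateResult:
    """Split a batch into accepted records and rejected ones with reasons."""
    result = GateResult()
    for record in records:
        reason = validate(record)
        if reason is None:
            result.accepted.append(record)
        else:
            # Quarantine with the reason so rejections stay auditable.
            result.rejected.append((record, reason))
    return result
```

Rejected records are kept alongside their reason rather than silently dropped, which ties the gate back to the lineage and audit requirements in the table.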