Building an AI governance stack: 15 layers from identity to correlation

A reference architecture. What each layer owns, what it depends on, what fails if you skip it.

Why The Layer Model

"AI governance" as a phrase is a category, not an artefact. It can mean a board-level policy document, a procurement checklist, a runtime guard, an audit log, a correlation feed, an incident playbook, or all of those together. Without a layer model the conversation between the CISO, the platform engineer, and the compliance lead descends into category confusion within a quarter.

The 15-layer model below is the reference architecture AiEGIS uses to keep that conversation tractable. Each layer has one owner, one input from the layer below, one output to the layer above, and one characteristic failure mode: what breaks if the layer is absent.

The Bottom Six: The Load-Bearing Minimum

If your deployment claims EU AI Act Article 26 alignment, layers 1–6 are not optional. Each maps directly to a paragraph or pair of paragraphs in the Regulation.

| # | Layer | Owns | Fails if absent |
|---|-------|------|-----------------|
| 1 | Agent identity | Per-agent cryptographic identifier (did:key + Ed25519). | You cannot answer "which agent did this" to a regulator. |
| 2 | Signing & verification | Per-request signature; offline verifier path. | Audit log is a claim, not evidence. |
| 3 | Transport & auth | mTLS or signed JWT at the wire; rate limiting. | Standard web-tier compromises remain unmitigated. |
| 4 | Audit ledger | Append-only event store with storage-layer enforcement. | Article 26(6) fails on first adversarial review. |
| 5 | Policy engine | Per-request decision: allow / deny / require-step-up. | "Human oversight" is a policy document, not a runtime gate. |
| 6 | Evidence packaging | Signed bundle per period, fetchable on request. | Article 26(12) cooperation has nothing to hand over. |

The single most common deployment pattern that fails an audit walkthrough is skipping layer 4 in favour of "we have logs in Datadog". Datadog is a telemetry layer, not an audit layer. The retention model, the tamper model, and the chain-of-custody model are different.
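To make the tamper-model difference concrete, here is a minimal sketch of a hash-chained, append-only ledger. It is an illustration under assumptions, not the AiEGIS implementation: hash chaining is one common way to make tampering detectable, the class and field names (`AuditLedger`, `GENESIS`) are hypothetical, and an in-memory list provides none of the storage-layer enforcement that layer 4 calls for.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # hypothetical anchor value for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON so the same event always produces the same hash.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


@dataclass
class AuditLedger:
    """In-memory stand-in for an append-only event store (assumption)."""

    _entries: list = field(default_factory=list)

    def append(self, event: dict) -> str:
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS
        entry_hash = _digest(prev_hash, event)
        # Store a copy so later changes to the caller's dict do not alter the record.
        self._entries.append(
            {"event": dict(event), "prev": prev_hash, "hash": entry_hash}
        )
        return entry_hash

    def verify(self) -> bool:
        # Recompute every link; any edited, dropped, or reordered entry breaks the chain.
        prev_hash = GENESIS
        for entry in self._entries:
            if entry["prev"] != prev_hash:
                return False
            if _digest(prev_hash, entry["event"]) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


ledger = AuditLedger()
ledger.append({"agent": "did:key:example-agent", "action": "tool_call", "tool": "search"})
ledger.append({"agent": "did:key:example-agent", "action": "tool_call", "tool": "email"})
print(ledger.verify())  # True

# Simulate an after-the-fact edit of a stored record.
ledger._entries[0]["event"]["tool"] = "payments"
print(ledger.verify())  # False
```

A sampled, mutable telemetry pipeline gives no equivalent of `verify()`: an edited or dropped record leaves no trace, which is why it cannot stand in for the audit layer.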

The Middle Five: The Operational Spine

| # | Layer | Owns | Fails if absent |
|---|-------|------|-----------------|
| 7 | Identity rotation & revocation | Per-agent key lifecycle, revocation list, cascade. | A compromised agent stays trusted indefinitely. |
| 8 | Capability assertion | Signed claim of what each agent is permitted to do. | An agent can advertise capabilities it cannot back. |
| 9 | Guard / preflight | Input-side checks (prompt-injection patterns, PII, jailbreak signatures). | Layer 5 (policy) operates blind to known attack surface. |
| 10 | Postflight / output review | Output-side checks (PII leakage, policy violation, hallucination signals). | Bad outputs reach the consumer; remediation is per-incident, not class-based. |
| 11 | Risk & vulnerability scoring | Per-agent AIVSS score; trend over time. | Risk decisions are gut-feel; insurance and procurement cannot price the deployment. |

The Top Four: The Observability And Settlement Spine

| # | Layer | Owns | Fails if absent |
|---|-------|------|-----------------|
| 12 | Telemetry & metrics | Latency, throughput, error-rate, per-tool counters. | Operability collapses; incidents take longer to detect. |
| 13 | Tracing & replay | Per-request trace tree; deterministic replay for forensic analysis. | Post-incident investigation is anecdotal. |
| 14 | Marketplace & settlement | Cross-organisation discovery, capability cards, atomic settlement. | Each peer connection is a bespoke integration. |
| 15 | Cross-tenant correlation | Pattern detection across deployments (shared attack signatures, regression clusters). | Each tenant relearns the same attacks individually. |

What Goes Wrong When Layers Are Conflated

Three conflations show up regularly in deployments we review:

  1. Layer 4 (audit) folded into Layer 12 (telemetry). Telemetry is sampled, lossy, and mutable, and its retention is bounded by storage cost. Audit must be unsampled, lossless, and immutable, and its retention is bounded by law. Conflating them produces a system that does neither well.
  2. Layer 9 (preflight) treated as a substitute for Layer 5 (policy). A guard is a filter; a policy is a decision with context. A guard cannot say "this agent is normally allowed to use this tool, except that this user is in a category that requires step-up." Trying to encode policy logic in regex guards produces unmaintainable rule sets within two quarters.
  3. Layer 11 (scoring) bolted on after the fact. AIVSS-style scoring needs the per-request signed audit trail from Layer 4 to compute trend metrics. Adding scoring on top of an opaque deployment produces snapshot scores with no time series and no replay path. See AIVSS vs CVSS for why this matters.
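The guard-versus-policy distinction in the second conflation can be sketched as follows. This is an illustrative sketch, not the AiEGIS policy engine: the grant table, the step-up categories, and every name here (`RequestContext`, `decide`, `AGENT_TOOL_GRANTS`, `STEP_UP_CATEGORIES`) are hypothetical. The point it shows is that the guard sees only the input text, while the policy decision combines the guard result with who is asking, for which tool, on whose behalf.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_STEP_UP = "require-step-up"


# Layer 9 guard: a stateless pattern filter over the input text only.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
]


def guard_passes(prompt: str) -> bool:
    return not any(p.search(prompt) for p in INJECTION_PATTERNS)


@dataclass(frozen=True)
class RequestContext:
    agent_id: str
    tool: str
    user_category: str
    guard_ok: bool  # result of the layer 9 check, passed in as one input


# Hypothetical policy data; a real deployment would load this from configuration.
AGENT_TOOL_GRANTS = {"agent-a": {"search", "payments"}}
STEP_UP_CATEGORIES = {"regulated-client"}


def decide(ctx: RequestContext) -> Decision:
    """Layer 5 policy: a decision with context, not a text filter."""
    if not ctx.guard_ok:
        return Decision.DENY
    if ctx.tool not in AGENT_TOOL_GRANTS.get(ctx.agent_id, set()):
        return Decision.DENY
    if ctx.user_category in STEP_UP_CATEGORIES:
        return Decision.REQUIRE_STEP_UP
    return Decision.ALLOW


prompt = "Pay invoice 1234"
ctx = RequestContext(
    agent_id="agent-a",
    tool="payments",
    user_category="regulated-client",
    guard_ok=guard_passes(prompt),
)
print(decide(ctx).value)  # require-step-up
```

The step-up rule depends on `user_category`, which never appears in the prompt text, so no regex over the input could express it.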

How To Adopt Incrementally

No greenfield team builds all 15 layers at once. The defensible incremental path:

  1. Ship layers 1–4 before first production use. This is the EU AI Act Article 26 floor; everything else is harder without it.
  2. Add layers 5–6 before first external user. Without policy + evidence packaging, the deployment cannot satisfy a customer due-diligence questionnaire.
  3. Add layers 7–10 before second tenant. Multi-tenant introduces failure modes that single-tenant defaults paper over.
  4. Add layer 11 before procurement requires it (typically when annual revenue from regulated customers crosses ~€1M).
  5. Layers 12–13 are observability hygiene; add when SRE on-call cost justifies them.
  6. Layers 14–15 are unlock layers; add when you participate in a marketplace or want correlation across deployments.
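The first three steps above are ordered gates, so they can be expressed as a simple readiness check. The sketch below is one possible encoding under assumptions: the milestone names and the `missing_layers` helper are hypothetical, and steps 4 to 6 are left out because they are triggered by business conditions rather than by a fixed milestone.

```python
# Milestones in adoption order; layer ranges follow steps 1-3 of the list above.
# The milestone names are hypothetical labels, not AiEGIS terminology.
MILESTONES = [
    ("first_production_use", range(1, 5)),   # layers 1-4
    ("first_external_user", range(5, 7)),    # layers 5-6
    ("second_tenant", range(7, 11)),         # layers 7-10
]


def required_layers(milestone: str) -> set:
    """Return every layer that must be in place by the given milestone."""
    required = set()
    for name, layers in MILESTONES:
        required.update(layers)  # gates are cumulative
        if name == milestone:
            return required
    raise ValueError(f"unknown milestone: {milestone}")


def missing_layers(milestone: str, deployed: set) -> list:
    """List the layers still to ship before the milestone, in ascending order."""
    return sorted(required_layers(milestone) - set(deployed))


deployed = {1, 2, 3, 4, 5, 9}
print(missing_layers("first_production_use", deployed))  # []
print(missing_layers("second_tenant", deployed))         # [6, 7, 8, 10]
```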

Where AiEGIS Sits On The Stack

The AiEGIS platform ships implementations for layers 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 14, and 15. Layers 3, 12, and 13 are intentionally out of scope: your existing web tier, observability platform, and tracing stack already cover them, and re-implementing them inside the governance layer produces duplication rather than value.

The full architecture breakdown is at /architecture. The harness implementation that ships layers 4–11 is documented at /harness.