The canonical 7-field receipt, the two OWASP discussions, and how a vendor scores themselves. 2026-05-25.
The OWASP Artificial Intelligence Vulnerability Scoring System (AIVSS) is the AI-era analogue of CVSS. Same shape: a structured scoring vector that lets defenders compare findings across vendors and standardise the conversation about severity. The project lives at OWASP/www-project-artificial-intelligence-vulnerability-scoring-system. As of 2026-05-25 the project page is active and community discussions are open.
CVSS scores a vulnerability. AIVSS in its first cut scores a vulnerability in an AI system. Both answer "how bad is this if exploited." Neither answers a question that matters as much for AI deployments: when the vendor's policy says "this is blocked," does anything actually block it at runtime?
The pattern in 2025-2026 vendor procurement is: the vendor ships a policy document, an auditor walks through the policy, the policy is approved, and nothing in the runtime ever evaluates an inbound action against the policy. The vendor still scores well in any review that reads documents rather than running probes. The enforcement-effectiveness proposal makes the runtime check itself a graded scoring dimension rather than a binary "policy exists" tick-box.
Two issues drive the work. Both are open at time of writing and verifiable by curl against the GitHub API.
| Issue | Title | State | Comments |
|---|---|---|---|
| #31 | Discussion: Runtime Enforcement Effectiveness as a Scoring Dimension | open | 15 |
| #32 | Discussion: Multi-Agent Governance Gaps in Cross-Agent Risk Scoring | open | 33 |
Both issues were opened by @VeloGerber. The comment counts here are the values returned by the GitHub REST API at the time this post was written; they will drift up as the conversation continues.
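The table values can be re-checked against the GitHub REST API. The sketch below is one way to do that in Python rather than curl. It uses the public `GET /repos/{owner}/{repo}/issues/{number}` endpoint and reads the `state`, `comments`, and `user.login` fields from the response. The helper names (`issue_url`, `summarise_issue`, `fetch_issue`) and the `User-Agent` string are illustrative, not part of any project tooling.

```python
import json
import urllib.request

REPO = "OWASP/www-project-artificial-intelligence-vulnerability-scoring-system"


def issue_url(number: int) -> str:
    """Build the REST API URL for one issue in the AIVSS repository."""
    return f"https://api.github.com/repos/{REPO}/issues/{number}"


def summarise_issue(payload: dict) -> dict:
    """Keep only the fields that appear in the table above."""
    return {
        "number": payload["number"],
        "title": payload["title"],
        "state": payload["state"],
        "comments": payload["comments"],
        "opened_by": payload["user"]["login"],
    }


def fetch_issue(number: int) -> dict:
    """Fetch one issue over HTTPS (unauthenticated, so rate limits apply)."""
    request = urllib.request.Request(
        issue_url(number),
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "aivss-issue-check",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return summarise_issue(json.load(response))


# Usage (performs network requests):
#   for number in (31, 32):
#       print(fetch_issue(number))
```

Comment counts returned by a live call will differ from the table once the discussions move on.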
The fixture lives at aeoess/aivss-enforcement-effectiveness. The repo description (verbatim from the GitHub API): "Working text for the OWASP AIVSS enforcement_effectiveness dimension family. Co-authored with @VeloGerber per OWASP/www-project-artificial-intelligence-vulnerability-scoring-system#31." The latest substantive commit at the time of writing is 0d4f380674ad2d852b4853941735d67447714b65, with the message "v0.1 follow-up fix: enforcement_locus as 7th canonical receipt field (6.1), repoint 1.4, scope OQ2, drop TODO", committed by aeoess on 2026-05-16.
An enforcement-effectiveness receipt is what a vendor produces for each decision their system makes. The fixture proposes seven canonical fields. A receipt missing any of these is incomplete for scoring purposes.
| # | Field | What it means |
|---|---|---|
| 1 | decision_verdict | ALLOW, WARN, BLOCK, or DENY. The runtime outcome, not the documented policy. |
| 2 | layer | The layer or stage of the pipeline that produced the verdict. Tracking layer prevents single-signal-inference at audit time. |
| 3 | reason_code | A stable string identifier that maps to a single regulation or policy clause. Auditors pin findings to reason codes. |
| 4 | decision_ms | Latency from request arrival to decision emission. Real-time enforcement is structurally different from batch review. |
| 5 | evidence_hash | Cryptographic hash of the inputs and the rule that produced the verdict. Replayable. |
| 6 | policy_version | Version of the policy that was evaluated. Drift over time is the failure class. |
| 7 | enforcement_locus | Where the enforcement runs: inline (before the action), shadow (parallel to the action), post-hoc (after). Added 2026-05-16 per the v0.1 follow-up commit. |
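As a rough illustration of the completeness rule, the seven fields can be modelled as a typed record with a check that rejects incomplete receipts. This is a sketch, not the fixture's schema: the field names come from the table above, but the Python types, the string spellings of the locus values, and the helper names (`Receipt`, `missing_fields`, `parse_receipt`) are assumptions.

```python
from dataclasses import dataclass, fields

VERDICTS = {"ALLOW", "WARN", "BLOCK", "DENY"}
# Assumed spellings for the three loci named in the table.
LOCI = {"inline", "shadow", "post-hoc"}


@dataclass(frozen=True)
class Receipt:
    decision_verdict: str
    layer: str
    reason_code: str
    decision_ms: float
    evidence_hash: str
    policy_version: str
    enforcement_locus: str


CANONICAL_FIELDS = tuple(f.name for f in fields(Receipt))


def missing_fields(raw: dict) -> list[str]:
    """Return the canonical fields that are absent or empty in a raw receipt."""
    return [name for name in CANONICAL_FIELDS if raw.get(name) in (None, "")]


def parse_receipt(raw: dict) -> Receipt:
    """Build a Receipt, refusing anything incomplete for scoring purposes."""
    missing = missing_fields(raw)
    if missing:
        raise ValueError(f"incomplete receipt, missing: {', '.join(missing)}")
    if raw["decision_verdict"] not in VERDICTS:
        raise ValueError(f"unknown verdict: {raw['decision_verdict']!r}")
    if raw["enforcement_locus"] not in LOCI:
        raise ValueError(f"unknown locus: {raw['enforcement_locus']!r}")
    return Receipt(**{name: raw[name] for name in CANONICAL_FIELDS})
```

A receipt with `decision_ms` equal to 0 still counts as complete here; only absent or empty values are treated as missing.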
Every decision the AiEGIS pipeline emits at /api/protect carries the seven fields. A live response includes decision, layer, reason, decision_ms, and the broader schema_version (currently v0.7.0-15layers-2026-05-23). The layers_evaluated array lists L1 through L15 with per-layer verdicts. The full receipt, including the evidence hash and the policy version, is written to the audit log in agent_logs on the customer database and signed against the operator's per-customer Ed25519 issuer key.
```bash
curl -X POST https://aiegis.ie/api/protect \
  -H "Authorization: Bearer ${AIEGIS_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"agent_id":"test_agent","action":"read","payload":"hello"}'
# returns (schema v0.7.0-15layers-2026-05-23):
# {"decision":"ALLOW",
#  "layer":"L1-Identity",
#  "decision_ms":1,
#  "schema_version":"v0.7.0-15layers-2026-05-23",
#  "layers_evaluated":[
#    {"layer":"L1","name":"Agent Identity Protocol","verdict":"ALLOW"},
#    {"layer":"L2","name":"Agent Instruction Language","verdict":"ALLOW"},
#    ... L3 through L15 ...
#  ]}
```
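A client consuming this response might want a compact summary of the per-layer verdicts. The following Python sketch parses a response shaped like the one above and lists any layer whose verdict is not ALLOW. The sample payload is truncated to the two layers shown in the excerpt (the remaining layers are elided in the source and are not reconstructed here), and the helper names are illustrative.

```python
import json

# Truncated sample: only L1 and L2 appear in the excerpt above.
SAMPLE_RESPONSE = """
{"decision": "ALLOW",
 "layer": "L1-Identity",
 "decision_ms": 1,
 "schema_version": "v0.7.0-15layers-2026-05-23",
 "layers_evaluated": [
   {"layer": "L1", "name": "Agent Identity Protocol", "verdict": "ALLOW"},
   {"layer": "L2", "name": "Agent Instruction Language", "verdict": "ALLOW"}
 ]}
"""


def non_allow_layers(response: dict) -> list[dict]:
    """Return the per-layer entries whose verdict is anything other than ALLOW."""
    return [
        entry
        for entry in response.get("layers_evaluated", [])
        if entry.get("verdict") != "ALLOW"
    ]


def summarise_response(response: dict) -> dict:
    """Collapse a response into the fields an auditor would skim first."""
    return {
        "decision": response.get("decision"),
        "deciding_layer": response.get("layer"),
        "decision_ms": response.get("decision_ms"),
        "layers_checked": len(response.get("layers_evaluated", [])),
        "non_allow": [entry["layer"] for entry in non_allow_layers(response)],
    }


summary = summarise_response(json.loads(SAMPLE_RESPONSE))
print(summary)
```

Treating every non-ALLOW verdict as noteworthy is a simplification; a real client would likely distinguish WARN from BLOCK and DENY.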
To score a third-party vendor against the enforcement-effectiveness dimension, the auditor does three things:
evidence_hash, recompute it from the documented inputs, confirm match. A vendor that cannot replay scores zero on this dimension regardless of marketing.

From Issues #31 and #32, the open community questions include: how to normalise decision_ms across vendors with different hardware baselines; whether enforcement_locus needs a sub-grade for partial enforcement (e.g. inline-block but no human-intervention path); and how multi-agent topologies aggregate per-agent receipts into a system-level score. The OWASP discussions are the right place to follow these.
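The replay check described above (recompute the evidence_hash from the documented inputs and confirm it matches) can be sketched in a few lines of Python. The hashing scheme is an assumption: the source does not say how the hash is constructed, so this example uses SHA-256 over a canonical JSON encoding of the inputs and the rule. A real vendor's scheme would have to be taken from their documentation. The function names are illustrative.

```python
import hashlib
import json


def evidence_hash(inputs: dict, rule: dict) -> str:
    """Hash the inputs and rule over a canonical JSON encoding (assumed scheme)."""
    canonical = json.dumps(
        {"inputs": inputs, "rule": rule},
        sort_keys=True,            # key order must not change the digest
        separators=(",", ":"),     # no insignificant whitespace
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def replay_matches(receipt_hash: str, inputs: dict, rule: dict) -> bool:
    """True when the recomputed hash equals the one recorded in the receipt."""
    return receipt_hash.lower() == evidence_hash(inputs, rule)


# Hypothetical inputs and rule, for illustration only.
inputs = {"agent_id": "test_agent", "action": "read", "payload": "hello"}
rule = {"reason_code": "EXAMPLE-CODE", "policy_version": "v1"}
recorded = evidence_hash(inputs, rule)

print(replay_matches(recorded, inputs, rule))
print(replay_matches(recorded, inputs, {**rule, "policy_version": "v2"}))
```

Sorting keys before hashing keeps the digest stable when the same inputs are serialised in a different order, which is what makes the receipt replayable by a third party.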