AIVSS is the Artificial Intelligence Vulnerability Scoring System, an OWASP project that adapts CVSS-style scoring to AI-specific vulnerability classes. The project is hosted at OWASP/www-project-artificial-intelligence-vulnerability-scoring-system and is in active community development.

What is the enforcement-effectiveness dimension?

Enforcement-effectiveness is a proposed AIVSS scoring dimension that asks not 'is there a vulnerability' but 'when the system tries to enforce a control, does enforcement actually happen at the runtime where the control matters.' A vendor with a written policy but no runtime block scores low; a vendor that returns a DENY from a live decision endpoint scores high.

Where is the fixture?

The working-text repository is at aeoess/aivss-enforcement-effectiveness on GitHub. The default branch is main. The repo description names @VeloGerber as co-author and points at OWASP/www-project-artificial-intelligence-vulnerability-scoring-system Issue #31 as the parent proposal.

Where are the OWASP discussions?

Two issues. #31 is 'Discussion: Runtime Enforcement Effectiveness as a Scoring Dimension' (open, opened by @VeloGerber). #32 is 'Discussion: Multi-Agent Governance Gaps in Cross-Agent Risk Scoring' (open, opened by @VeloGerber). Both live on the OWASP/www-project-artificial-intelligence-vulnerability-scoring-system repository.

How does a vendor self-score?

The vendor produces a canonical 7-field receipt for each enforced decision: decision verdict, layer where decided, reason code, decision latency in milliseconds, evidence hash, policy version, and enforcement locus. The receipt is then signed by the vendor's issuer key. An auditor scores the vendor by examining receipts in production, not by reading marketing copy.

OWASP AIVSS Enforcement-Effectiveness Fixture

The canonical 7-field receipt, the two OWASP discussions, and how a vendor scores themselves. 2026-05-25.

What AIVSS Is

The OWASP Artificial Intelligence Vulnerability Scoring System (AIVSS) is the AI-era analogue of CVSS. Same shape: a structured scoring vector that lets defenders compare findings across vendors and standardise the conversation about severity. The project lives at OWASP/www-project-artificial-intelligence-vulnerability-scoring-system. As of 2026-05-25 the project page is active and community discussions are open.

Why Enforcement-Effectiveness Is Its Own Dimension

CVSS scores a vulnerability. AIVSS in its first cut scores a vulnerability in an AI system. Both answer "how bad is this if exploited." Neither answers a question that matters as much for AI deployments: when the vendor's policy says "this is blocked," does anything actually block it at runtime?

The pattern in 2025-2026 vendor procurement is: the vendor ships a policy document, an auditor walks through the policy, the policy is approved, and nothing in the runtime ever evaluates an inbound action against the policy. The vendor still scores well in any review that reads documents rather than running probes. Enforcement-effectiveness as a scoring dimension is the proposal to make the runtime check itself a graded dimension, not just a binary "policy exists" tick-box.

OWASP Issues #31 and #32

Two issues drive the work. Both are open at time of writing and verifiable by curl against the GitHub API.

Issue	Title	State	Comments
#31	Discussion: Runtime Enforcement Effectiveness as a Scoring Dimension	open	15
#32	Discussion: Multi-Agent Governance Gaps in Cross-Agent Risk Scoring	open	33

Both issues were opened by @VeloGerber. The comment counts here are the values returned by the GitHub REST API at the time this post was written; they will drift up as the conversation continues.
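
The table can be re-derived from the public GitHub REST API. The snippet below is a minimal sketch using only the Python standard library. The helper names `summarise_issue` and `fetch_issue` are illustrative and do not come from the fixture; the JSON keys (`number`, `title`, `state`, `comments`, `user.login`) are the ones the GitHub issues endpoint returns.

```python
import json
import urllib.request

REPO = "OWASP/www-project-artificial-intelligence-vulnerability-scoring-system"


def summarise_issue(payload: dict) -> dict:
    """Pick the columns shown in the table out of a GitHub issue object."""
    return {
        "number": payload["number"],
        "title": payload["title"],
        "state": payload["state"],
        "comments": payload["comments"],
        "opened_by": payload["user"]["login"],
    }


def fetch_issue(number: int) -> dict:
    """Fetch one issue from the public REST API (unauthenticated, rate-limited)."""
    url = f"https://api.github.com/repos/{REPO}/issues/{number}"
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return summarise_issue(json.load(resp))
```

Calling `fetch_issue(31)` and `fetch_issue(32)` should return the current state and comment count for each discussion; the counts will differ from the table as the threads grow.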

The Working-Text Repository

The fixture lives at aeoess/aivss-enforcement-effectiveness. The repo description (verbatim from the GitHub API): "Working text for the OWASP AIVSS enforcement_effectiveness dimension family. Co-authored with @VeloGerber per OWASP/www-project-artificial-intelligence-vulnerability-scoring-system#31." The latest substantive commit at the time of writing is 0d4f380674ad2d852b4853941735d67447714b65 — "v0.1 follow-up fix: enforcement_locus as 7th canonical receipt field (6.1), repoint 1.4, scope OQ2, drop TODO" by aeoess on 2026-05-16.

The 7-Field Canonical Receipt

An enforcement-effectiveness receipt is what a vendor produces for each decision their system makes. The fixture proposes seven canonical fields. A receipt missing any of these is incomplete for scoring purposes.

#	Field	What it means
1	decision_verdict	ALLOW, WARN, BLOCK, or DENY. The runtime outcome, not the documented policy.
2	layer	The layer or stage of the pipeline that produced the verdict. Tracking the layer prevents single-signal inference at audit time.
3	reason_code	A stable string identifier that maps to a single regulation or policy clause. Auditors pin findings to reason codes.
4	decision_ms	Latency from request arrival to decision emission. Real-time enforcement is structurally different from batch review.
5	evidence_hash	Cryptographic hash of the inputs and the rule that produced the verdict. Replayable.
6	policy_version	Version of the policy that was evaluated. Policy drift over time is the failure class this field exposes.
7	enforcement_locus	Where the enforcement runs: inline (before the action), shadow (parallel to the action), post-hoc (after). Added 2026-05-16 per the v0.1 follow-up commit.
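
One way to picture the completeness rule is a small validator that rejects any receipt missing one of the seven fields. This is a sketch under assumptions, not the fixture's schema: the field names come from the table above, but the Python types, the `Receipt` class, and the `parse_receipt` helper are hypothetical.

```python
from dataclasses import dataclass, fields

VERDICTS = {"ALLOW", "WARN", "BLOCK", "DENY"}
LOCI = {"inline", "shadow", "post-hoc"}


@dataclass(frozen=True)
class Receipt:
    decision_verdict: str
    layer: str
    reason_code: str
    decision_ms: float      # type is an assumption; the table only says "latency"
    evidence_hash: str
    policy_version: str
    enforcement_locus: str


CANONICAL_FIELDS = tuple(f.name for f in fields(Receipt))


def missing_fields(raw: dict) -> list[str]:
    """Return the canonical fields that are absent or empty in a raw receipt."""
    return [name for name in CANONICAL_FIELDS if raw.get(name) in (None, "")]


def parse_receipt(raw: dict) -> Receipt:
    """Build a Receipt, refusing anything incomplete for scoring purposes."""
    missing = missing_fields(raw)
    if missing:
        raise ValueError(f"incomplete receipt, missing: {missing}")
    if raw["decision_verdict"] not in VERDICTS:
        raise ValueError(f"unknown verdict: {raw['decision_verdict']!r}")
    if raw["enforcement_locus"] not in LOCI:
        raise ValueError(f"unknown locus: {raw['enforcement_locus']!r}")
    return Receipt(**{name: raw[name] for name in CANONICAL_FIELDS})
```

Extra keys in the raw dictionary are ignored, so a vendor can carry additional metadata without breaking the check.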

How AIEGIS Scores Itself

Every decision the AIEGIS pipeline emits at /api/protect carries the seven fields. A live response includes decision, layer, reason, decision_ms, and the broader schema_version (currently v0.7.0-15layers-2026-05-23). The layers_evaluated array lists L1 through L15 with per-layer verdicts. The full receipt, including the evidence hash and the policy version, is written to the agent_logs audit log on the customer database and signed with the operator's per-customer Ed25519 issuer key.

curl -X POST https://aiegis.ie/api/protect \
  -H "Authorization: Bearer ${AIEGIS_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"agent_id":"test_agent","action":"read","payload":"hello"}'

# returns (schema v0.7.0-15layers-2026-05-23):
# {"decision":"ALLOW",
#  "layer":"L1-Identity",
#  "decision_ms":1,
#  "schema_version":"v0.7.0-15layers-2026-05-23",
#  "layers_evaluated":[
#    {"layer":"L1","name":"Agent Identity Protocol","verdict":"ALLOW"},
#    {"layer":"L2","name":"Agent Instruction Language","verdict":"ALLOW"},
#    ... L3 through L15 ...
#  ]}
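
An auditor reading a live response may want to see which canonical fields it covers and which have to be fetched from the audit log. The sketch below assumes a mapping from response keys to canonical names (`decision` to `decision_verdict`, `reason` to `reason_code`); that mapping and the `coverage` helper are illustrative guesses, not something the API documents.

```python
CANONICAL = (
    "decision_verdict", "layer", "reason_code", "decision_ms",
    "evidence_hash", "policy_version", "enforcement_locus",
)

# Assumed mapping from live-response keys to canonical receipt fields.
RESPONSE_TO_CANONICAL = {
    "decision": "decision_verdict",
    "layer": "layer",
    "reason": "reason_code",
    "decision_ms": "decision_ms",
}


def coverage(response: dict) -> tuple[dict, list[str]]:
    """Split canonical fields into those present in the response and those absent."""
    present = {
        canon: response[key]
        for key, canon in RESPONSE_TO_CANONICAL.items()
        if key in response
    }
    absent = [name for name in CANONICAL if name not in present]
    return present, absent
```

Anything in the absent list would then be looked up in the signed audit-log receipt rather than inferred from the response.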

Scoring Another Vendor

To score a third-party vendor against the enforcement-effectiveness dimension, the auditor does three things:

1. Demand receipts. Pick a representative policy clause from the vendor's documentation. Ask for the runtime receipts that prove the clause fires.
2. Audit the locus. Inspect the seventh field. Inline enforcement scores highest; shadow enforcement scores middle; post-hoc-only scores low.
3. Replay the evidence. Take an evidence_hash, recompute it from the documented inputs, confirm match. A vendor that cannot replay scores zero on this dimension regardless of marketing.
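
The locus and replay steps can be sketched in a few lines. Everything numeric or cryptographic here is an assumption made for illustration: the text above fixes only the ordering (inline above shadow above post-hoc) and the zero score on failed replay. The grade values, the use of SHA-256, and the JSON canonicalisation are placeholders for whatever the vendor documents.

```python
import hashlib
import json

# Illustrative grades only; the source fixes the ordering, not the numbers.
LOCUS_GRADE = {"inline": 1.0, "shadow": 0.5, "post-hoc": 0.2}


def compute_evidence_hash(inputs: dict, rule: str) -> str:
    """Hash the inputs and the rule (assumed: SHA-256 over sorted-key compact JSON)."""
    canonical = json.dumps(
        {"inputs": inputs, "rule": rule}, sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def score_receipt(receipt: dict, inputs: dict, rule: str) -> float:
    """Return 0.0 if the evidence cannot be replayed, else the locus grade."""
    if compute_evidence_hash(inputs, rule) != receipt.get("evidence_hash"):
        return 0.0
    return LOCUS_GRADE.get(receipt.get("enforcement_locus"), 0.0)
```

Replay is checked first, so a receipt whose hash does not match scores zero whatever locus it claims.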

Open Questions in the Discussion

From Issues #31 and #32, the open community questions include: how to normalise decision_ms across vendors with different hardware baselines; whether enforcement_locus needs a sub-grade for partial enforcement (e.g. inline-block but no human-intervention path); and how multi-agent topologies aggregate per-agent receipts into a system-level score. The OWASP discussions are the right place to follow these.