What it takes for a machine decision to be operationally trustworthy.
The Cognitive Security Framework (CSF) v1.0 defines the requirements for machine-generated decisions to be considered trustworthy in operation. It provides a shared vocabulary, an assurance-level scale, a maturity model, and assessment criteria for any organization deploying AI, agents, or autonomous systems. This page is a summary; the full framework is published openly.
Seven functions, IDENTIFY → ADMIT
The framework is organized as a lifecycle. Each function builds on the last; ADMIT is only meaningful when the six functions before it are real.
- 01 IDENTIFY: Inventory all machine-generated decisions, their sources, dependencies, and impact.
- 02 CAPTURE: Record evidence, provenance, policy context, and actor identity for every decision.
- 03 VERIFY: Validate evidence integrity, reproducibility, and policy compliance.
- 04 GOVERN: Enforce admissibility requirements, trust transitions, and approval workflows.
- 05 REPLAY: Reproduce decisions deterministically from recorded inputs.
- 06 ATTEST: Generate signed, verifiable proof packages (Decision Receipts).
- 07 ADMIT: Determine whether a decision meets all requirements for operational use.
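The lifecycle's one hard rule, that ADMIT depends on the six functions before it, can be sketched as a gate. This is a minimal Python illustration, not part of the framework: the `Function` enum, `can_admit`, and `first_gap` are hypothetical names, and the sketch assumes each function is tracked simply as satisfied or not for a given decision.

```python
from __future__ import annotations

from enum import IntEnum


class Function(IntEnum):
    """The seven CSF functions, numbered in lifecycle order."""
    IDENTIFY = 1
    CAPTURE = 2
    VERIFY = 3
    GOVERN = 4
    REPLAY = 5
    ATTEST = 6
    ADMIT = 7


def can_admit(completed: set[Function]) -> bool:
    """Hypothetical gate: ADMIT is evaluated only once IDENTIFY..ATTEST are all satisfied."""
    prerequisites = {f for f in Function if f < Function.ADMIT}
    return prerequisites <= completed  # subset check


def first_gap(completed: set[Function]) -> Function | None:
    """Return the earliest function before ADMIT that is not yet satisfied, or None."""
    for f in Function:  # iterates in definition (lifecycle) order
        if f is Function.ADMIT:
            break
        if f not in completed:
            return f
    return None
```

For example, a pipeline that has only IDENTIFY, CAPTURE, and VERIFY in place cannot admit, and `first_gap` names GOVERN as the next function to build. Reporting the earliest gap, rather than only pass or fail, follows from the ordering: later functions carry little weight until the earlier ones exist.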
Cognitive Assurance Levels
CAL-1 through CAL-7 grade how much assurance a decision pipeline actually provides — from bare logging to certified admissibility. Most production AI today sits at CAL-1.
| Level | Name | Requirement |
|---|---|---|
| CAL-1 | Logged | Basic prompt/response logging |
| CAL-2 | Evidenced | Evidence capture with provenance |
| CAL-3 | Verified | Independent verification of evidence |
| CAL-4 | Replayable | Deterministic reproduction possible |
| CAL-5 | Governed | Policy enforcement with human gates |
| CAL-6 | Attested | Signed Decision Receipts with SBOM and provenance |
| CAL-7 | Admissible | Full operational admissibility certification |
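The table reads most naturally as a cumulative ladder, in which a pipeline's level is the highest rung it reaches without a gap. The sketch below assumes exactly that; this summary does not say whether, for instance, a replayable but unverified pipeline counts as CAL-4 or CAL-2. The capability flag names are hypothetical, one per table row.

```python
from __future__ import annotations

# (level, name, hypothetical capability flag), one entry per CAL table row.
CAL_REQUIREMENTS = [
    (1, "Logged", "prompt_response_logging"),
    (2, "Evidenced", "evidence_with_provenance"),
    (3, "Verified", "independent_verification"),
    (4, "Replayable", "deterministic_replay"),
    (5, "Governed", "policy_enforcement_with_human_gates"),
    (6, "Attested", "signed_receipts_with_sbom"),
    (7, "Admissible", "admissibility_certification"),
]


def assess_cal(capabilities: set[str]) -> int:
    """Return the highest CAL level reached with no gap below it.

    Assumes the scale is cumulative: a missing requirement at level N caps
    the result at N-1. Returns 0 if even basic logging (CAL-1) is absent.
    """
    level = 0
    for number, _name, flag in CAL_REQUIREMENTS:
        if flag not in capabilities:
            break
        level = number
    return level
```

Under this assumption, a pipeline with logging, provenance capture, and deterministic replay but no independent verification is graded CAL-2: having replay does not lift it past the gap at CAL-3.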
The five-level maturity model
Where the CAL scale grades a pipeline, the maturity model grades an organization. The honest first step is locating yourself on it.
Level 1: Unverified AI
- AI outputs used without governance
- No evidence trail
- No reproducibility guarantee
Level 2: Logged AI
- Prompt/response logging exists
- No formal evidence structure
- Limited auditability
Level 3: Governed AI
- Policy enforcement active
- Evidence captured per the Decision Receipt Specification
- Human review for high-risk decisions
Level 4: Verified AI
- Independent verification of all evidence
- Replay capability demonstrated
- VGDS ≥ 0.80
Level 5: Admissible AI
- Full Decision Receipt compliance
- VGDS ≥ 0.92
- Certification audit passed
- Procurement-ready
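Locating an organization on the model can be sketched the same way, numbering the levels 1 to 5 in the order listed (the closing section refers to Level 1 and Level 2). The `OrgProfile` fields are hypothetical, one per criterion above, and `vgds` is taken as an input because this summary does not define how VGDS is computed. Treating each level as requiring every level below it is an assumption of the sketch; Level 1 has no entry criteria, and "Procurement-ready" is treated as a consequence of Level 5 rather than a separate check.

```python
from dataclasses import dataclass


@dataclass
class OrgProfile:
    """Hypothetical observations about an organization; field names are illustrative."""
    prompt_logging: bool = False
    policy_enforcement: bool = False
    receipt_spec_evidence: bool = False
    human_review_high_risk: bool = False
    independent_verification: bool = False
    replay_demonstrated: bool = False
    vgds: float = 0.0  # supplied externally; its computation is defined in the full framework
    full_receipt_compliance: bool = False
    certification_audit_passed: bool = False


LEVELS = {1: "Unverified AI", 2: "Logged AI", 3: "Governed AI",
          4: "Verified AI", 5: "Admissible AI"}


def maturity_level(org: OrgProfile) -> int:
    """Climb one level at a time and stop at the first level whose criteria are unmet."""
    if not org.prompt_logging:
        return 1  # Unverified AI is the floor
    if not (org.policy_enforcement and org.receipt_spec_evidence
            and org.human_review_high_risk):
        return 2
    if not (org.independent_verification and org.replay_demonstrated
            and org.vgds >= 0.80):
        return 3
    if not (org.full_receipt_compliance and org.vgds >= 0.92
            and org.certification_audit_passed):
        return 4
    return 5
```

An organization with logging, policy enforcement, receipt-based evidence, and human review but no independent verification lands at Level 3 (Governed AI), whatever its VGDS.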
Assessment criteria
Assessments under the framework weigh seven categories. The weights are published so an assessment cannot be quietly re-balanced after the fact.
| Category | Weight | Measures |
|---|---|---|
| Evidence quality | 20% | Provenance completeness, hash integrity |
| Reproducibility | 20% | Replay success rate, determinism |
| Policy compliance | 15% | Violation rate, gate effectiveness |
| Memory safety | 15% | Poisoning resistance, lineage coverage |
| Secret safety | 15% | Non-disclosure rate, Rule of Two compliance |
| Governance | 10% | Human gate precision, audit completeness |
| Documentation | 5% | Architecture, ops, and security coverage |
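Because the weights are published and sum to 100%, anyone holding the per-category scores can recompute a composite. The summary does not state the scoring scale or aggregation rule, so this sketch assumes each category is scored from 0 to 1 and combined as a plain weighted sum; it does not claim the result is the VGDS figure used in the maturity model. Category keys are hypothetical.

```python
from __future__ import annotations

# Published category weights, in percent (integers avoid float drift in the sum).
WEIGHTS = {
    "evidence_quality": 20,
    "reproducibility": 20,
    "policy_compliance": 15,
    "memory_safety": 15,
    "secret_safety": 15,
    "governance": 10,
    "documentation": 5,
}
assert sum(WEIGHTS.values()) == 100


def composite_score(scores: dict[str, float]) -> float:
    """Weighted sum of per-category scores, each assumed to lie in [0, 1].

    Raises ValueError if a category is missing or out of range, so an
    assessment cannot silently omit one.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    total = 0.0
    for category, weight in WEIGHTS.items():
        s = scores[category]
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"{category} score {s} outside [0, 1]")
        total += weight * s
    return total / 100
```

With evidence quality 0.9, reproducibility 0.8, policy compliance 1.0, memory safety 0.6, secret safety 1.0, governance 0.5, and documentation 1.0, the composite is 0.18 + 0.16 + 0.15 + 0.09 + 0.15 + 0.05 + 0.05 = 0.83. Rejecting a missing category, rather than skipping it or defaulting it to zero, mirrors the reason for publishing the weights: an assessment cannot quietly drop or re-balance a category.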
§7 — Procurement language
The framework includes language an acquisition office can adopt verbatim. It is deliberately vendor-neutral: it works whether or not Summit is the vendor.
> Machine-generated decisions impacting operations, compliance, or safety shall produce verifiable Decision Receipts conforming to the Decision Receipt Specification v1.0, including deterministic evidence, policy evaluation results, and replay capability.
That single clause changes the default. A vendor that cannot produce conforming receipts is not disqualified by Summit — it is disqualified by its own inability to show its work. Contracting offices may incorporate the clause into RFPs and acquisition guidance without license fees or attribution beyond CC BY 4.0 requirements.
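The clause asks a receipt to carry three things: deterministic evidence, policy evaluation results, and replay capability. The Decision Receipt Specification v1.0 schema is not reproduced in this summary, so the following is only a sketch of the idea with hypothetical field names and functions. It hashes canonical JSON so the same evidence always yields the same digest, and it uses HMAC-SHA256 from the Python standard library as a stand-in for whatever signature scheme the specification requires; a shared-key HMAC can only be checked by parties holding the key, unlike a public-key signature.

```python
import hashlib
import hmac
import json


def canonical_bytes(obj) -> bytes:
    """Serialize deterministically: sorted keys, no insignificant whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sha256_hex(obj) -> str:
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


def make_receipt(decision_id: str, evidence: dict, policy_results: dict,
                 replay_inputs: dict, key: bytes) -> dict:
    """Build a hypothetical receipt: content hashes plus an HMAC over the body."""
    body = {
        "decision_id": decision_id,
        "evidence_sha256": sha256_hex(evidence),
        "policy_results": policy_results,
        "replay_inputs_sha256": sha256_hex(replay_inputs),
    }
    tag = hmac.new(key, canonical_bytes(body), hashlib.sha256).hexdigest()
    return {"body": body, "signature": tag}


def verify_receipt(receipt: dict, evidence: dict, replay_inputs: dict, key: bytes) -> bool:
    """Check the HMAC, then check that the supplied evidence and inputs match the hashes."""
    body = receipt["body"]
    expected = hmac.new(key, canonical_bytes(body), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, receipt["signature"]):
        return False
    return (body["evidence_sha256"] == sha256_hex(evidence)
            and body["replay_inputs_sha256"] == sha256_hex(replay_inputs))
```

Changing any evidence field, editing the recorded policy results, or verifying with the wrong key makes `verify_receipt` return False. Hashing the replay inputs is what would let a later REPLAY run confirm it started from exactly the recorded inputs.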
Certification against the framework
Locate yourself on the maturity model.
Most organizations discover they are at Level 1 or 2. The framework gives you the vocabulary to say so precisely — and the assessment criteria to fix it.