add advisories

This commit is contained in:
master
2025-12-01 17:50:11 +02:00
parent c11d87d252
commit 790801f329
7 changed files with 3723 additions and 0 deletions


@@ -0,0 +1,446 @@
Here's a crisp, practical way to turn StellaOps' "verifiable proof spine" into a moat, and how to measure it.
# Why this matters (in plain terms)
Security tools often say "trust me." You'll say "prove it": every finding and every "not affected" claim ships with cryptographic receipts anyone can verify.
---
# Differentiators to build in
**1) Bind every verdict to a graph hash**
* Compute a stable **Graph Revision ID** (Merkle root) over: SBOM nodes, edges, policies, feeds, scan params, and tool versions.
* Store the ID on each finding/VEX item; show it in the UI and APIs.
* Rule: any data change → new graph hash → new revisioned verdicts.
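A minimal sketch of how such a Graph Revision ID could be derived, assuming every input (SBOM node/edge, policy, feed snapshot, scan parameter set, tool version) has already been serialized to canonical JSON; the helper name and leaf layout are illustrative, not the shipped implementation:
```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class GraphRevision
{
    public static string ComputeGraphRevisionId(IEnumerable<string> canonicalJsonLeaves)
    {
        // Hash every leaf, then sort so the root does not depend on input order.
        var layer = canonicalJsonLeaves
            .Select(leaf => SHA256.HashData(Encoding.UTF8.GetBytes(leaf)))
            .OrderBy(Convert.ToHexString, StringComparer.Ordinal)
            .ToList();

        if (layer.Count == 0)
            return "sha256:" + Convert.ToHexString(SHA256.HashData(Array.Empty<byte>())).ToLowerInvariant();

        // Pairwise-combine until a single Merkle root remains; duplicate the last node on odd layers.
        while (layer.Count > 1)
        {
            var next = new List<byte[]>();
            for (var i = 0; i < layer.Count; i += 2)
            {
                var right = i + 1 < layer.Count ? layer[i + 1] : layer[i];
                next.Add(SHA256.HashData(layer[i].Concat(right).ToArray()));
            }
            layer = next;
        }

        return "sha256:" + Convert.ToHexString(layer[0]).ToLowerInvariant();
    }
}
```
Sorting the leaf hashes keeps the root independent of input ordering, which is what makes re-runs comparable across nodes.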
**2) Attach machine-verifiable receipts (in-toto/DSSE)**
* For each verdict, emit a **DSSE-wrapped in-toto statement**:
* predicateType: `stellaops.dev/verdict@v1`
* includes: graphRevisionId, artifact digests, rule id/version, inputs (CPE/CVE/CVSS), timestamps.
* Sign with your **Authority** (Sigstore key, offline mode supported).
* Keep receipts queryable and exportable; mirror to a Rekor-compatible ledger when online.
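As a sketch only (not the actual StellaOps schema), the statement carried inside that DSSE envelope could be modeled like this; the record names and property shapes are assumptions, while the `stellaops.dev/verdict@v1` predicate type and field list follow the bullets above:
```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

public sealed record VerdictSubject(string Name, Dictionary<string, string> Digest);

public sealed record VerdictPredicate(
    string GraphRevisionId,
    string RuleId,
    string RuleVersion,
    Dictionary<string, string> Inputs,   // e.g. CPE / CVE / CVSS values
    DateTimeOffset CreatedAt);

public sealed record VerdictStatement(
    [property: JsonPropertyName("_type")] string Type,   // "https://in-toto.io/Statement/v1"
    string PredicateType,                                // "stellaops.dev/verdict@v1"
    IReadOnlyList<VerdictSubject> Subject,
    VerdictPredicate Predicate);

public static class VerdictReceipts
{
    // The serialized statement is what gets base64-encoded into the DSSE envelope and signed by Authority.
    public static byte[] ToDssePayload(VerdictStatement statement) =>
        JsonSerializer.SerializeToUtf8Bytes(statement,
            new JsonSerializerOptions { PropertyNamingPolicy = JsonNamingPolicy.CamelCase });
}
```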
**3) Add reachability "call-stack slices" or binary-symbol proofs**
* For code-level reachability, store compact slices: entry → sink, with symbol names + file:line.
* For binary-only targets, include **symbol presence proofs** (e.g., Bloom filters + offsets) with executable digest.
* Compress and embed a hash of the slice/proof inside the DSSE payload.
**4) Deterministic replay manifests**
* Alongside receipts, publish a **Replay Manifest** (inputs, feeds, rule versions, container digests) so any auditor can reproduce the same graph hash and verdicts offline.
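A rough C# shape for that manifest, with field names that are assumptions rather than a fixed schema:
```csharp
using System;
using System.Collections.Generic;

// Illustrative shape for replay.manifest.json; every value is pinned so a re-run is byte-comparable.
public sealed record ReplayManifest(
    string GraphRevisionId,
    IReadOnlyDictionary<string, string> InputDigests,     // artifact name -> sha256
    IReadOnlyDictionary<string, string> FeedSnapshots,    // feed name -> snapshot id
    IReadOnlyDictionary<string, string> ToolImageDigests, // tool name -> container digest
    string RuleSetVersion,
    DateTimeOffset CapturedAtUtc);
```
Record properties serialize in declaration order; sort the dictionary keys before writing so the manifest file itself stays deterministic.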
---
# Benchmarks to publish (make them your headline KPIs)
**A) False-positive reduction vs. baseline scanners (%)**
* Method: run a public corpus (e.g., sample images + app stacks) across 3-4 popular scanners; label ground truth once; compare FP rate.
* Report: mean & p95 FP reduction.
**B) Proof coverage (% of findings with signed evidence)**
* Definition: `(# findings or VEX items carrying valid DSSE receipts) / (total surfaced items)`.
* Break out: runtime-reachable vs. unreachable, and "not affected" claims.
**C) Triage time saved (p50/p95)**
* Measure analyst minutes from “alert created” → “final disposition.”
* A/B with receipts hidden vs. visible; publish median/p95 deltas.
**D) Determinism stability**
* Re-run identical scans N times / across nodes; publish `% identical graph hashes` and drift causes when different.
---
# Minimal implementation plan (week-by-week)
**Week 1: primitives**
* Add Graph Revision ID generator in `scanner.webservice` (Merkle over normalized JSON of SBOM+edges+policies+toolVersions).
* Define `VerdictReceipt` schema (protobuf/JSON) and DSSE envelope types.
**Week 2: signing + storage**
* Wire DSSE signing in **Authority**; offline key support + rotation.
* Persist receipts in `Receipts` table (Postgres) keyed by `(graphRevisionId, verdictId)`; enable export (JSONL) and ledger mirror.
**Week 3: reachability proofs**
* Add call-stack slice capture in reachability engine; serialize compactly; hash + reference from receipts.
* Binary symbol proof module for ELF/PE: symbol bitmap + digest.
**Week 4: replay + UX**
* Emit `replay.manifest.json` per scan (inputs, tool digests).
* UI: show **"Verified"** badge, graph hash, signature issuer, and a one-click "Copy receipt" button.
* API: `GET /verdicts/{id}/receipt`, `GET /graphs/{rev}/replay`.
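A minimal-API sketch of those two read endpoints; the store interfaces, the DSSE media type, and the wiring are assumptions (the real services would be registered in the composition root):
```csharp
var builder = WebApplication.CreateBuilder(args);
// builder.Services.AddSingleton<IReceiptStore, ...>() etc. happens in the composition root.
var app = builder.Build();

// GET /verdicts/{id}/receipt -> raw DSSE envelope for the verdict
app.MapGet("/verdicts/{id}/receipt", async (string id, IReceiptStore receipts) =>
    await receipts.FindByVerdictIdAsync(id) is { } envelope
        ? Results.Content(envelope, "application/vnd.dsse.envelope.v1+json")
        : Results.NotFound());

// GET /graphs/{rev}/replay -> replay manifest for a graph revision
app.MapGet("/graphs/{rev}/replay", async (string rev, IReplayManifestStore manifests) =>
    await manifests.FindByGraphRevisionAsync(rev) is { } manifest
        ? Results.Json(manifest)
        : Results.NotFound());

app.Run();

// Hypothetical storage abstractions.
public interface IReceiptStore { Task<string?> FindByVerdictIdAsync(string verdictId); }
public interface IReplayManifestStore { Task<object?> FindByGraphRevisionAsync(string graphRevisionId); }
```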
**Week 5: benchmarks harness**
* Create `bench/` with golden fixtures and a runner:
* Baseline scanner adapters
* Ground-truth labels
* Metrics export (FP%, proof coverage, triage time capture hooks)
---
# Developer guardrails (make these non-negotiable)
* **No receipt, no ship:** any surfaced verdict must carry a DSSE receipt.
* **Schema freeze windows:** changes to rule inputs or policy logic must bump rule version and therefore the graph hash.
* **Replay-first CI:** PRs touching scanning/rules must pass a replay test that reproduces prior graph hashes on gold fixtures.
* **Clock safety:** use monotonic time inside receipts; add UTC wall-clock time separately.
---
# What to show buyers/auditors
* A short **audit kit**: sample container + your receipts + replay manifest + one command to reproduce the same graph hash.
* A one-page **benchmark readout**: FP reduction, proof coverage, and triage time saved (p50/p95), with corpus description.
---
If you want, I'll draft:
1. the DSSE `predicate` schema,
2. the Postgres DDL for `Receipts` and `Graphs`, and
3. a tiny .NET verification CLI (`stellaops-verify`) that replays a manifest and validates signatures.
Here's a focused "developer guidelines" doc just for **Benchmarks for a Testable Security Moat** in StellaOps.
---
# Stella Ops Developer Guidelines
## Benchmarks for a Testable Security Moat
> **Goal:** Benchmarks are how we *prove* StellaOps is better, not just say it is. If a "moat" claim can't be tied to a benchmark, it doesn't exist.
Everything here is about how you, as a developer, design, extend, and run those benchmarks.
---
## 1. What our benchmarks must measure
Every core product claim needs at least one benchmark:
1. **Detection quality**
* Precision / recall vs ground truth.
* False positives vs popular scanners.
* False negatives on known-bad samples.
2. **Proof & evidence quality**
* % of findings with **valid receipts** (DSSE).
* % of VEX "not affected" with attached proofs.
* Reachability proof quality:
* call-stack slice present?
* symbol proof present for binaries?
3. **Triage & workflow impact**
* Time-to-decision for analysts (p50/p95).
* Click depth and context switches per decision.
* “Verified” vs “unverified” verdict triage times.
4. **Determinism & reproducibility**
* Same inputs → same **Graph Revision ID**.
* Stable verdict sets across runs/nodes.
> **Rule:** If you add a feature that impacts any of these, you must either hook it into an existing benchmark or add a new one.
---
## 2. Benchmark assets and layout
**2.1 Repo layout (convention)**
Under `bench/` we maintain everything benchmark-related:
* `bench/corpus/`
  * `images/`: curated container images / tarballs.
  * `repos/`: sample codebases (with known vulns).
  * `sboms/`: canned SBOMs for edge cases.
* `bench/scenarios/`
  * `*.yaml`: scenario definitions (inputs + expected outputs).
* `bench/golden/`
  * `*.json`: golden results (expected findings, metrics).
* `bench/tools/`
  * adapters for baseline scanners, parsers, helpers.
* `bench/scripts/`
  * `run_benchmarks.[sh/cs]`: single entry point.
**2.2 Scenario definition (high-level)**
Each scenario yaml should minimally specify:
* **Inputs**
* artifact references (image name / path / repo SHA / SBOM file).
* environment knobs (features enabled/disabled).
* **Ground truth**
* list of expected vulns (or explicit “none”).
* for some: expected reachability (reachable/unreachable).
* expected VEX entries (affected / not affected).
* **Expectations**
* required metrics (e.g., “no more than 2 FPs”, “no FNs”).
* required proof coverage (e.g., “100% of surfaced findings have receipts”).
---
## 3. Core benchmark metrics (developer-facing definitions)
Use these consistently across code and docs.
### 3.1 Detection metrics
* `true_positive_count` (TP)
* `false_positive_count` (FP)
* `false_negative_count` (FN)
Derived:
* `precision = TP / (TP + FP)`
* `recall = TP / (TP + FN)`
* For UX: track **FP per asset** and **FP per 100 findings**.
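For example, a small helper the bench runner might use to turn raw counts into these rates (names are illustrative):
```csharp
public readonly record struct DetectionMetrics(int TruePositives, int FalsePositives, int FalseNegatives)
{
    // precision = TP / (TP + FP); treated as 1.0 when nothing was surfaced at all.
    public double Precision =>
        TruePositives + FalsePositives == 0 ? 1.0 : (double)TruePositives / (TruePositives + FalsePositives);

    // recall = TP / (TP + FN); treated as 1.0 when there was nothing to find.
    public double Recall =>
        TruePositives + FalseNegatives == 0 ? 1.0 : (double)TruePositives / (TruePositives + FalseNegatives);

    // UX-facing rate: false positives per 100 surfaced findings.
    public double FalsePositivesPer100Findings =>
        TruePositives + FalsePositives == 0 ? 0.0 : 100.0 * FalsePositives / (TruePositives + FalsePositives);
}

// Example: new DetectionMetrics(42, 3, 5) gives Precision ~0.93 and Recall ~0.89.
```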
**Developer guideline:**
* When you introduce a filter, deduper, or rule tweak, add/modify a scenario where:
* the change **helps** (reduces FP or FN); and
* a different scenario guards against regressions.
### 3.2 Moat-specific metrics
These are the ones that directly support the “testable moat” story:
1. **False-positive reduction vs baseline scanners**
* Run baseline scanners across our corpus (via adapters in `bench/tools`).
* Compute:
* `baseline_fp_rate`
* `stella_fp_rate`
* `fp_reduction = (baseline_fp_rate - stella_fp_rate) / baseline_fp_rate`.
2. **Proof coverage**
* `proof_coverage_all = findings_with_valid_receipts / total_findings`
* `proof_coverage_vex = vex_items_with_valid_receipts / total_vex_items`
* `proof_coverage_reachable = reachable_findings_with_proofs / total_reachable_findings`
3. **Triage time improvement**
* In test harnesses, simulate or record:
* `time_to_triage_with_receipts`
* `time_to_triage_without_receipts`
* Compute median & p95 deltas.
4. **Determinism**
* Re-run the same scenario `N` times:
* `% runs with identical Graph Revision ID`
* `% runs with identical verdict sets`
* On mismatch, diff and log the cause (e.g., non-stable sort, non-pinned feed).
---
## 4. How developers should work with benchmarks
### 4.1 “No feature without benchmarks”
If you're adding or changing:
* graph structure,
* rule logic,
* scanner integration,
* VEX handling,
* proof / receipt generation,
you **must** do *at least one* of:
1. **Extend an existing scenario**
* Add expectations that cover your change, or
* tighten an existing bound (e.g., lower FP threshold).
2. **Add a new scenario**
* For new attack classes / edge cases / ecosystems.
**Anti-patterns:**
* Shipping a new capability with *no* corresponding scenario.
* Updating golden outputs without explaining why metrics changed.
### 4.2 CI gates
We treat benchmarks as **blocking**:
* Add a CI job, e.g.:
* `make bench:quick` on every PR (small subset).
* `make bench:full` on main / nightly.
* CI fails if:
* Any scenario marked `strict: true` has:
* Precision or recall below its threshold.
* Proof coverage below its configured threshold.
* Global regressions above tolerance:
* e.g. total FP increases > X% without an explicit override.
**Developer rule:**
* If you intentionally change behavior:
* Update the relevant golden files.
* Include a short note in the PR (e.g., `bench-notes.md` snippet) describing:
* what changed,
* why the new result is better, and
* which moat metric it improves (FP, proof coverage, determinism, etc.).
---
## 5. Benchmark implementation guidelines
### 5.1 Make benchmarks deterministic
* **Pin everything**:
* feed snapshots,
* tool container digests,
* rule versions,
* time windows.
* Use **Replay Manifests** as the source of truth:
* `replay.manifest.json` should contain:
* input artifacts,
* tool versions,
* feed versions,
* configuration flags.
* If a benchmark depends on time:
* Inject a **fake clock** or explicit “as of” timestamp.
### 5.2 Keep scenarios small but meaningful
* Prefer many **focused** scenarios over a few huge ones.
* Each scenario should clearly answer:
* “What property of StellaOps are we testing?”
* “What moat claim does this support?”
Examples:
* `bench/scenarios/false_pos_kubernetes.yaml`
* Focus: config noise reduction vs baseline scanner.
* `bench/scenarios/reachability_java_webapp.yaml`
* Focus: reachable vs unreachable vuln proofs.
* `bench/scenarios/vex_not_affected_openssl.yaml`
* Focus: VEX correctness and proof coverage.
### 5.3 Use golden outputs, not ad-hoc assertions
* Bench harness should:
* Run StellaOps on scenario inputs.
* Normalize outputs (sorted lists, stable IDs).
* Compare to `bench/golden/<scenario>.json`.
* Golden file should include:
* expected findings (id, severity, reachable?, etc.),
* expected VEX entries,
* expected metrics (precision, recall, coverage).
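A sketch of the normalize-then-compare step, assuming golden files are plain JSON with a top-level `findings` array and .NET 8+ for `DeepClone`; the helper names are invented:
```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.Json;
using System.Text.Json.Nodes;

public static class GoldenCompare
{
    // Parse, sort the findings array by a stable key, and re-serialize with fixed formatting
    // so semantically identical outputs compare byte-for-byte.
    public static string Normalize(string rawJson)
    {
        var root = JsonNode.Parse(rawJson)!;

        if (root["findings"] is JsonArray findings)
        {
            var sorted = new JsonArray(findings
                .OrderBy(f => f?["id"]?.GetValue<string>(), StringComparer.Ordinal)
                .Select(f => f?.DeepClone())
                .ToArray());
            root["findings"] = sorted;
        }

        return root.ToJsonString(new JsonSerializerOptions { WriteIndented = true });
    }

    public static bool MatchesGolden(string actualJson, string goldenPath) =>
        Normalize(actualJson) == Normalize(File.ReadAllText(goldenPath));
}
```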
---
## 6. Moatcritical benchmark types (we must have all of these)
When you're thinking about gaps, check that we have:
1. **Cross-tool comparison**
* Same corpus, multiple scanners.
* Metrics vs baselines for FP/FN.
2. **Proof density & quality**
* Corpus where:
* some vulns are reachable,
* some are not,
* some are not present.
* Ensure:
* reachable ones have rich proofs (stack slices / symbol proofs).
* non-reachable or absent ones have:
* correct disposition, and
* clear receipts explaining why.
3. **VEX accuracy**
* Scenarios with known SBOM + known vulnerability impact.
* Check:
* VEX "affected"/"not affected" matches ground truth.
* every VEX entry has a receipt.
4. **Analyst workflow**
* Small usability corpus for internal testing:
* Measure time-to-triage with/without receipts.
* Use the same scenarios across releases to track improvement.
5. **Upgrade / drift resistance**
* Scenarios that are **expected to remain stable** across:
* rule changes that *shouldn't* affect outcomes.
* feed updates (within a given version window).
* These act as canaries for unintended regressions.
---
## 7. Developer checklist (TL;DR)
Before merging a change that touches security logic, ask yourself:
1. **Is there at least one benchmark scenario that exercises this change?**
2. **Does the change improve at least one moat metric, or is it neutral?**
3. **Have I run `make bench:quick` locally and checked diffs?**
4. **If goldens changed, did I explain why in the PR?**
5. **Did I keep benchmarks deterministic (pinned versions, fake time, etc.)?**
If any answer is “no”, fix that before merging.
---
If you'd like, next step I can sketch a concrete `bench/scenarios/*.yaml` and matching `bench/golden/*.json` example that encodes one *specific* moat claim (e.g., "30% fewer FPs than Scanner X on Kubernetes configs") so your team has a ready-to-copy pattern.


@@ -0,0 +1,287 @@
Here's a condensed **"Stella Ops Developer Guidelines"** based on the official engineering docs and dev guides.
---
## 0. Where to start
* **Dev docs index:** The main entrypoint is `Development Guides & Tooling` (docs/technical/development/README.md). It links to coding standards, test strategy, performance workbook, plugin SDK, examples, and more. ([Gitea: Git with a cup of tea][1])
* **If a term is unfamiliar:** Check the one-page *Glossary of Terms* first. ([Stella Ops][2])
* **Big picture:** Stella Ops is an SBOM-first, offline-ready container security platform; a lot of design decisions (determinism, signatures, policy DSL, SBOM delta scans) flow from that. ([Stella Ops][3])
---
## 1. Core engineering principles
From **Coding Standards & Contributor Guide**: ([Gitea: Git with a cup of tea][4])
1. **SOLID first**: especially interface & dependency inversion.
2. **100-line file rule**: if a file grows beyond 100 physical lines, split or refactor.
3. **Contracts vs runtime**: public DTOs and interfaces live in lightweight `*.Contracts` projects; implementations live in sibling runtime projects.
4. **Single composition root**: DI wiring happens in `StellaOps.Web/Program.cs` and each plugin's `IoCConfigurator`. Nothing else creates a service provider.
5. **No service locator**: constructor injection only; no global `ServiceProvider` or static service lookups.
6. **Fail-fast startup**: validate configuration *before* the web host starts listening.
7. **Hot-load compatibility**: avoid static singletons that would survive plugin unload; don't manually load assemblies outside the built-in loader.
These all serve the product goals of **deterministic, offline, explainable security decisions**. ([Stella Ops][3])
---
## 2. Repository layout & layering
From the repo layout section: ([Gitea: Git with a cup of tea][4])
* **Top-level structure (simplified):**
```text
src/
backend/
StellaOps.Web/ # ASP.NET host + composition root
StellaOps.Common/ # logging, helpers
StellaOps.Contracts/ # DTO + interface contracts
… more runtime projects
plugins-sdk/ # plugin templates & abstractions
frontend/ # Angular workspace
tests/ # mirrors src 1to1
```
* **Rules:**
* No “Module” folders or nested solution hierarchies.
* Tests mirror `src/` structure 1:1; **no test code in production projects**.
* New features follow *feature folder* layout (e.g., `Scan/ScanService.cs`, `Scan/ScanController.cs`).
---
## 3. Naming, style & language usage
Key conventions: ([Gitea: Git with a cup of tea][4])
* **Namespaces:** file-scoped, `StellaOps.*`.
* **Interfaces:** `I` prefix (`IScannerRunner`).
* **Classes/records:** PascalCase (`ScanRequest`, `TrivyRunner`).
* **Private fields:** `camelCase` (no leading `_`).
* **Constants:** `SCREAMING_SNAKE_CASE`.
* **Async methods:** end with `Async`.
* **Usings:** outside namespace, sorted, no wildcard imports.
* **File length:** keep ≤100 lines including `using` and braces (enforced by tooling).
C# feature usage: ([Gitea: Git with a cup of tea][4])
* Nullable reference types **on**.
* Use `record` for immutable DTOs.
* Prefer pattern matching over long `switch` cascades.
* `Span`/`Memory` only when you've measured that you need them.
* Use `await foreach` instead of manual iterator loops.
Formatting & analysis:
* `dotnet format` must be clean; StyleCop + security analyzers + CodeQL run in CI and are treated as gates. ([Gitea: Git with a cup of tea][4])
---
## 4. Dependency injection, async & concurrency
DI policy (core + plugins): ([Gitea: Git with a cup of tea][4])
* Exactly **one composition root** per process (`StellaOps.Web/Program.cs`).
* Plugins contribute through:
* `[ServiceBinding]` attributes for simple bindings, or
* An `IoCConfigurator : IDependencyInjectionRoutine` for advanced setups.
* Default lifetime is **scoped**. Use singletons only for truly stateless, thread-safe helpers.
* Never use a service locator or manually build nested service providers except in tests.
Async & threading: ([Gitea: Git with a cup of tea][4])
* All I/O is async; avoid `.Result` / `.Wait()`.
* Library code uses `ConfigureAwait(false)`.
* Control concurrency with channels or `Parallel.ForEachAsync`, not ad-hoc `Task.Run` loops.
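As one illustration of that rule, a bounded fan-out helper might look like this (a sketch; `scanAsync` stands in for the real scanner call):
```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class ScanFanOut
{
    public static async Task<IReadOnlyList<string>> ScanAllAsync(
        IEnumerable<string> targets,
        Func<string, CancellationToken, Task<string>> scanAsync,
        int maxParallelism,
        CancellationToken ct)
    {
        var channel = Channel.CreateUnbounded<string>();

        // Bounded parallelism instead of unbounded Task.Run loops.
        var producer = Parallel.ForEachAsync(
            targets,
            new ParallelOptions { MaxDegreeOfParallelism = maxParallelism, CancellationToken = ct },
            async (target, token) =>
            {
                var result = await scanAsync(target, token).ConfigureAwait(false);
                await channel.Writer.WriteAsync(result, token).ConfigureAwait(false);
            });

        var results = new List<string>();
        var consumer = Task.Run(async () =>
        {
            await foreach (var item in channel.Reader.ReadAllAsync(ct).ConfigureAwait(false))
                results.Add(item);
        }, ct);

        await producer.ConfigureAwait(false);
        channel.Writer.Complete();
        await consumer.ConfigureAwait(false);
        return results;
    }
}
```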
---
## 5. Tests, tooling & quality gates
The **Automated Test Suite Overview** spells out all CI layers and budgets. ([Gitea: Git with a cup of tea][5])
**Test layers (high-level):**
* Unit tests: xUnit.
* Property-based tests: FsCheck.
* Integration:
* API integration with Testcontainers.
* DB/merge flows using Mongo + Redis.
* Contracts: gRPC breakage checks with Buf.
* Frontend:
* Unit tests with Jest.
* E2E tests with Playwright.
* Lighthouse runs for performance & accessibility.
* Non-functional:
* Load tests via k6.
* Chaos experiments (CPU/OOM) using Docker tooling.
* Dependency & license scanning.
* SBOM reproducibility/attestation checks.
**Quality gates (examples):** ([Gitea: Git with a cup of tea][5])
* API unit test line coverage ≥ ~85%.
* API P95 latency ≤ ~120ms in nightly runs.
* ΔSBOM warm scan P95 ≤ ~5s on reference hardware.
* Lighthouse perf score ≥ ~90, a11y ≥ ~95.
**Local workflows:**
* Use `./scripts/dev-test.sh` for “fast” local runs and `--full` for the entire stack (API, UI, Playwright, Lighthouse, etc.). Needs Docker and modern Node. ([Gitea: Git with a cup of tea][5])
* Some suites use Mongo2Go + an OpenSSL 1.1 shim; others use a helper script to spin up a local `mongod` for deeper debugging. ([Gitea: Git with a cup of tea][5])
---
## 6. Plugins & connectors
The **Plugin SDK Guide** is your bible for schedule jobs, scanner adapters, TLS providers, notification channels, etc. ([Gitea: Git with a cup of tea][6])
**Basics:**
* Use `.NET` templates to scaffold:
```bash
dotnet new stellaops-plugin-schedule -n MyPlugin.Schedule --output src
```
* At publish time, copy **signed** artefacts to:
```text
src/backend/Stella.Ops.Plugin.Binaries/<MyPlugin>/
MyPlugin.dll
MyPlugin.dll.sig
```
* The backend:
* Verifies the Cosign signature.
* Enforces `[StellaPluginVersion]` compatibility.
* Loads plugins in isolated `AssemblyLoadContext`s.
**DI entrypoints:**
* For simple cases, mark implementations with `[ServiceBinding(typeof(IMyContract), ServiceLifetime.Scoped, …)]`.
* For more control, implement `IoCConfigurator : IDependencyInjectionRoutine` and configure services/options in `Register(...)`. ([Gitea: Git with a cup of tea][6])
**Examples:**
* **Schedule job:** implement `IJob.ExecuteAsync`, add `[StellaPluginVersion("X.Y.Z")]`, register cron with `services.AddCronJob<MyJob>("0 15 * * *")`.
* **Scanner adapter:** implement `IScannerRunner` and register via `services.AddScanner<MyAltScanner>("alt")`; document Docker sidecars if needed. ([Gitea: Git with a cup of tea][6])
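A bare-bones sketch of such a schedule job; it leans on the SDK abstractions named above (`IJob`, `[StellaPluginVersion]`, `AddCronJob`), whose exact signatures may differ, so treat it as illustrative only:
```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;

[StellaPluginVersion("1.0.0")]
public sealed class NightlyFeedRefreshJob : IJob
{
    private readonly ILogger<NightlyFeedRefreshJob> logger;

    public NightlyFeedRefreshJob(ILogger<NightlyFeedRefreshJob> logger) => this.logger = logger;

    // Assumes IJob exposes ExecuteAsync(CancellationToken), as referenced above.
    public async Task ExecuteAsync(CancellationToken cancellationToken)
    {
        logger.LogInformation("Refreshing advisory feeds");
        await RefreshFeedsAsync(cancellationToken).ConfigureAwait(false);
    }

    private static Task RefreshFeedsAsync(CancellationToken cancellationToken) =>
        Task.CompletedTask; // placeholder for the real work

    // Registration inside the plugin's IoCConfigurator (sketch):
    //   services.AddCronJob<NightlyFeedRefreshJob>("0 15 * * *");
}
```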
**Signing & deployment:**
* Publish, sign with Cosign, optionally zip:
```bash
dotnet publish -c Release -p:PublishSingleFile=true -o out
cosign sign --key $COSIGN_KEY out/MyPlugin.Schedule.dll
```
* Copy into the backend container (e.g., `/opt/plugins/`) and restart.
* Unsigned DLLs are rejected when `StellaOps:Security:DisableUnsigned=false`. ([Gitea: Git with a cup of tea][6])
**Marketplace:**
* Tag releases like `plugin-vX.Y.Z`, attach the signed ZIP, and submit metadata to the community plugin index so it shows up in the UI Marketplace. ([Gitea: Git with a cup of tea][6])
---
## 7. Policy DSL & security decisions
For policy authors and tooling engineers, the **Stella Policy DSL (stella-dsl@1)** doc is key. ([Stella Ops][7])
**Goals:**
* Deterministic: same inputs → same findings on every machine.
* Declarative: no arbitrary loops, network calls, or clocks.
* Explainable: each decision carries rule, inputs, rationale.
* Offline-friendly and reachability-aware (SBOM + advisories + VEX + reachability). ([Stella Ops][7])
**Structure:**
* One `policy` block per `.stella` file, with:
* `metadata` (description, tags).
* `profile` blocks (severity, trust, reachability adjustments).
* `rule` blocks (`when` / `then` logic).
* Optional `settings`. ([Stella Ops][7])
**Context & builtins:**
* Namespaces like `sbom`, `advisory`, `vex`, `env`, `telemetry`, `secret`, `profile.*`, etc. ([Stella Ops][7])
* Helpers such as `normalize_cvss`, `risk_score`, `vex.any`, `vex.latest`, `sbom.any_component`, `exists`, `coalesce`, and secrets-specific helpers. ([Stella Ops][7])
**Rules of thumb:**
* Always include a clear `because` when you change `status` or `severity`. ([Stella Ops][7])
* Avoid catch-all suppressions (`when true` + `status := "suppressed"`); the linter will flag them. ([Stella Ops][7])
* Use `stella policy lint/compile/simulate` in CI and locally; test in sealed (offline) mode to ensure no network dependencies. ([Stella Ops][7])
---
## 8. Commits, PRs & docs
From the commit/PR checklist: ([Gitea: Git with a cup of tea][4])
Before opening a PR:
1. Use **Conventional Commit** prefixes (`feat:`, `fix:`, `docs:`, etc.).
2. Run `dotnet format` and `dotnet test`; both must be green.
3. Keep new/changed files within the 100-line guideline.
4. Update XML doc comments for any new public API.
5. If you add/change a public contract:
* Update the relevant markdown docs.
* Update JSON schema / API descriptions as needed.
6. Ensure static analyzers and CI jobs relevant to your change are passing.
For new test layers or jobs, also update the test suite overview and metrics docs so the CI configuration stays discoverable. ([Gitea: Git with a cup of tea][5])
---
## 9. Licensing & reciprocity
Stella Ops ships under **AGPL-3.0-or-later** with a strong reciprocity clause: ([Stella Ops][8])
* You may run, study, modify, and redistribute it, including as a hosted service.
* If you run a **modified** version for others over a network, you must make that exact source code available to those users.
* Official containers are signed and include SBOMs and attestations; verify them with Cosign as described on the license/security pages. ([Stella Ops][8])
When you build extensions:
* Keep plugins compatible with AGPL expectations around combined works.
* Don't embed proprietary logic into the core without checking license implications.
---
## 10. If you just want a “first contribution” recipe
A practical path that follows the guidelines:
1. Clone the main repo; skim **Coding Standards**, **Test Suite Overview**, and the **Dev Guides & Tooling** index. ([Gitea: Git with a cup of tea][4])
2. Get `dotnet`, Docker, Node set up; run `./scripts/dev-test.sh` to make sure your environment is healthy. ([Gitea: Git with a cup of tea][5])
3. Pick a small issue (docs, small refactor, or new test), make changes respecting:
* 100-line files,
* DI patterns,
* naming & style.
4. Add/adjust tests plus any affected docs or JSON schemas.
5. Run tests + formatting locally, push, and open a PR with a conventional title and a short “how I tested this” note.
If you tell me what you're planning to work on (plugin, policy pack, core feature, or UI), I can turn this into a very concrete checklist tailored to that slice of Stella Ops.
[1]: https://git.stella-ops.org/stella-ops.org/git.stella-ops.org/src/commit/08b27b8a266c82960c7653797460e1e1d17ecd45/docs/technical/development/README.md "git.stella-ops.org/README.md at 08b27b8a266c82960c7653797460e1e1d17ecd45 - git.stella-ops.org - Gitea: Git with a cup of tea"
[2]: https://stella-ops.org/docs/14_glossary_of_terms/?utm_source=chatgpt.com "Open • Sovereign • Modular container security - Stella Ops"
[3]: https://stella-ops.org/docs/05_SYSTEM_REQUIREMENTS_SPEC/?utm_source=chatgpt.com "system requirements specification - Stella Ops Open • Sovereign ..."
[4]: https://git.stella-ops.org/stella-ops.org/git.stella-ops.org/src/commit/08b27b8a266c82960c7653797460e1e1d17ecd45/docs/18_CODING_STANDARDS.md "git.stella-ops.org/18_CODING_STANDARDS.md at 08b27b8a266c82960c7653797460e1e1d17ecd45 - git.stella-ops.org - Gitea: Git with a cup of tea"
[5]: https://git.stella-ops.org/stella-ops.org/git.stella-ops.org/src/commit/08b27b8a266c82960c7653797460e1e1d17ecd45/docs/19_TEST_SUITE_OVERVIEW.md "git.stella-ops.org/19_TEST_SUITE_OVERVIEW.md at 08b27b8a266c82960c7653797460e1e1d17ecd45 - git.stella-ops.org - Gitea: Git with a cup of tea"
[6]: https://git.stella-ops.org/stella-ops.org/git.stella-ops.org/src/commit/08b27b8a266c82960c7653797460e1e1d17ecd45/docs/10_PLUGIN_SDK_GUIDE.md "git.stella-ops.org/10_PLUGIN_SDK_GUIDE.md at 08b27b8a266c82960c7653797460e1e1d17ecd45 - git.stella-ops.org - Gitea: Git with a cup of tea"
[7]: https://stella-ops.org/docs/policy/dsl/index.html "Stella Ops Signed Reachability · Deterministic Replay · Sovereign Crypto"
[8]: https://stella-ops.org/license/?utm_source=chatgpt.com "AGPL3.0orlater - Stella Ops"


@@ -0,0 +1,585 @@
Here's a tight, practical pattern to make your scanner's vuln-DB updates rock-solid even when feeds hiccup:
# Offline, verifiable update bundles (DSSE + Rekor v2)
**Idea:** distribute DB updates as offline tarballs. Each tarball ships with:
* a **DSSE-signed** statement (e.g., in-toto style) over the bundle hash
* a **Rekor v2 receipt** proving the signature/statement was logged
* a small **manifest.json** (version, created_at, content hashes)
**Startup flow (happy path):**
1. Load latest tarball from your local `updates/` cache.
2. Verify DSSE signature against your trusted public keys.
3. Verify Rekor v2 receipt (inclusion proof) matches the DSSE payload hash.
4. If both pass, unpack/activate; record the bundle's **trust_id** (e.g., statement digest).
5. If anything fails, **keep using the last good bundle**. No service disruption.
**Why this helps**
* **Air-gap friendly:** no live network needed at activation time.
* **Tamper-evident:** DSSE + Rekor receipt proves provenance and transparency.
* **Operational stability:** feed outages become non-events; the scanner just keeps the last good state.
---
## File layout inside each bundle
```
/bundle-2025-11-29/
manifest.json # { version, created_at, entries[], sha256s }
payload.tar.zst # the actual DB/indices
payload.tar.zst.sha256
statement.dsse.json # DSSE-wrapped statement over payload hash
rekor-receipt.json # Rekor v2 inclusion/verification material
```
---
## Acceptance/Activation rules
* **Trust root:** pin one (or more) publisher public keys; rotate via a separate, out-of-band process.
* **Monotonicity:** only activate if `manifest.version > current.version` (or if trust policy explicitly allows replay for rollback testing).
* **Atomic switch:** unpack to `db/staging/`, validate, then symlink-flip to `db/active/`.
* **Quarantine on failure:** move bad bundles to `updates/quarantine/` with a reason code.
---
## Minimal .NET 10 verifier sketch (C#)
```csharp
public sealed record BundlePaths(string Dir) {
public string Manifest => Path.Combine(Dir, "manifest.json");
public string Payload => Path.Combine(Dir, "payload.tar.zst");
public string Dsse => Path.Combine(Dir, "statement.dsse.json");
public string Receipt => Path.Combine(Dir, "rekor-receipt.json");
}
public async Task<bool> ActivateBundleAsync(BundlePaths b, TrustConfig trust, string activeDir) {
var manifest = await Manifest.LoadAsync(b.Manifest);
if (!await Hashes.VerifyAsync(b.Payload, manifest.PayloadSha256)) return false;
// 1) DSSE verify (publisher keys pinned in trust)
var (okSig, dssePayloadDigest) = await Dsse.VerifyAsync(b.Dsse, trust.PublisherKeys);
if (!okSig || dssePayloadDigest != manifest.PayloadSha256) return false;
// 2) Rekor v2 receipt verify (inclusion + statement digest == dssePayloadDigest)
if (!await RekorV2.VerifyReceiptAsync(b.Receipt, dssePayloadDigest, trust.RekorPub)) return false;
// 3) Stage, validate, then atomically flip
var staging = Path.Combine(activeDir, "..", "staging");
DirUtil.Empty(staging);
await TarZstd.ExtractAsync(b.Payload, staging);
if (!await LocalDbSelfCheck.RunAsync(staging)) return false;
SymlinkUtil.AtomicSwap(source: staging, target: activeDir);
State.WriteLastGood(manifest.Version, dssePayloadDigest);
return true;
}
```
---
## Operational playbook
* **On boot & daily at HH:MM:** try `ActivateBundleAsync()` on the newest bundle; on failure, log and continue.
* **Telemetry (no PII):** reason codes (SIG_FAIL, RECEIPT_FAIL, HASH_MISMATCH, SELFTEST_FAIL), versions, last_good.
* **Keys & rotation:** keep `publisher.pub` and `rekor.pub` in a root-owned, read-only path; rotate via a separate signed "trust bundle".
* **Defense-in-depth:** verify both the **payload hash** and each file's hash listed in `manifest.entries[]`.
* **Rollback:** allow `--force-activate <bundle>` for emergency testing, but mark it as **non-monotonic** in state.
---
## What to hand your release team
* A Make/CI target that:
1. Builds `payload.tar.zst` and computes hashes
2. Generates `manifest.json`
3. Creates and signs the **DSSE statement**
4. Submits to Rekor (or your mirror) and saves the **v2 receipt**
5. Packages the bundle folder and publishes to your offline repo
* A checksum file (`*.sha256sum`) for ops to verify outofband.
---
If you want, I can turn this into a StellaOps spec page (`docs/modules/scanner/offline-bundles.md`) plus a small reference implementation (C# library + CLI) that drops right into your Scanner service.
Here's a "drop-in" Stella Ops dev guide for **DSSE-signed Offline Scanner Updates**, written in the same spirit as the existing docs and sprint files.
You can treat this as the seed for `docs/modules/scanner/development/dsse-offline-updates.md` (or similar).
---
# DSSE-Signed Offline Scanner Updates — Developer Guidelines
> **Audience**
> Scanner, Export Center, Attestor, CLI, and DevOps engineers implementing DSSE-signed offline vulnerability updates and integrating them into the Offline Update Kit (OUK).
>
> **Context**
>
> * OUK already ships **signed, atomic offline update bundles** with merged vulnerability feeds, container images, and an attested manifest.([git.stella-ops.org][1])
> * DSSE + Rekor is already used for **scan evidence** (SBOM attestations, Rekor proofs).([git.stella-ops.org][2])
> * Sprints 160/162 add **attestation bundles** with manifest, checksums, DSSE signature, and optional transparency log segments, and integrate them into OUK and CLI flows.([git.stella-ops.org][3])
These guidelines tell you how to **wire all of that together** for "offline scanner updates" (feeds, rules, packs) in a way that matches Stella Ops' determinism + sovereignty promises.
---
## 0. Mental model
At a high level, you're building this:
```text
Advisory mirrors / Feeds builders
        │
        ▼
ExportCenter.AttestationBundles
  (creates DSSE + Rekor evidence
   for each offline update snapshot)
        │
        ▼
Offline Update Kit (OUK) builder
  (adds feeds + evidence to kit tarball)
        │
        ▼
stella offline kit import / admin CLI
  (verifies Cosign + DSSE + Rekor segments,
   then atomically swaps scanner feeds)
```
Online, Rekor is live; offline, you rely on **bundled Rekor segments / snapshots** and the existing OUK mechanics (import is atomic, old feeds kept until new bundle is fully verified).([git.stella-ops.org][1])
---
## 1. Goals & non-goals
### Goals
1. **Authentic offline snapshots**
Every offline scanner update (OUK or delta) must be verifiably tied to:
* a DSSE envelope,
* a certificate chain rooted in Stella's Fulcio/KMS profile or BYO KMS/HSM,
* *and* a Rekor v2 inclusion proof or bundled log segment.([Stella Ops][4])
2. **Deterministic replay**
Given:
* a specific offline update kit (`stella-ops-offline-kit-<DATE>.tgz` + `offline-manifest-<DATE>.json`)([git.stella-ops.org][1])
* its DSSE attestation bundle + Rekor segments
every verifier must reach the *same* verdict on integrity and contents — online or fully air-gapped.
3. **Separation of concerns**
* Export Center: build attestation bundles, no business logic about scanning.([git.stella-ops.org][5])
* Scanner: import & apply feeds; verify but not generate DSSE.
* Signer / Attestor: own DSSE & Rekor integration.([git.stella-ops.org][2])
4. **Operational safety**
* Imports remain **atomic and idempotent**.
* Old feeds stay live until the new update is **fully verified** (Cosign + DSSE + Rekor).([git.stella-ops.org][1])
### Non-goals
* Designing new crypto or log formats.
* Per-feed DSSE envelopes (you can have more later, but the minimum contract is **bundle-level** attestation).
---
## 2. Bundle contract for DSSE-signed offline updates
You're extending the existing OUK contract:
* OUK already packs:
* merged vuln feeds (OSV, GHSA, optional NVD 2.0, CNNVD/CNVD, ENISA, JVN, BDU),
* container images (`stella-ops`, Zastava, etc.),
* provenance (Cosign signature, SPDX SBOM, in-toto SLSA attestation),
* `offline-manifest.json` + detached JWS signed during export.([git.stella-ops.org][1])
For **DSSE-signed offline scanner updates**, add a new logical layer:
### 2.1. Files to ship
Inside each offline kit (full or delta) you must produce:
```text
/attestations/
offline-update.dsse.json # DSSE envelope
offline-update.rekor.json # Rekor entry + inclusion proof (or segment descriptor)
/manifest/
offline-manifest.json # existing manifest
offline-manifest.json.jws # existing detached JWS
/feeds/
... # existing feed payloads
```
The exact paths can be adjusted, but keep:
* **One DSSE bundle per kit** (min spec).
* **One canonical Rekor proof file** per DSSE envelope.
### 2.2. DSSE payload contents (minimal)
Define (or reuse) a predicate type such as:
```jsonc
{
"payloadType": "application/vnd.in-toto+json",
"payload": { /* base64 */ }
}
```
Decoded payload (in-toto statement) should **at minimum** contain:
* **Subject**
* `name`: `stella-ops-offline-kit-<DATE>.tgz`
* `digest.sha256`: tarball digest
* **Predicate type** (recommendation)
* `https://stella-ops.org/attestations/offline-update/1`
* **Predicate fields**
* `offline_manifest_sha256`: SHA-256 of `offline-manifest.json`
* `feeds`: array of feed entries such as `{ name, snapshot_date, archive_digest }` (mirrors the `rules_and_feeds` style used in the moat doc).([Stella Ops][6])
* `builder`: CI workflow id / git commit / Export Center job id
* `created_at`: UTC ISO-8601
* `oukit_channel`: e.g., `edge`, `stable`, `fips-profile`
**Guideline:** this DSSE payload is the **single canonical description** of “what this offline update snapshot is”.
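If it helps to pin the shape down in code, a hedged C# model of that predicate could look like this (type names and casing are assumptions; only the field list mirrors the bullets above):
```csharp
using System;
using System.Collections.Generic;

public sealed record FeedEntry(string Name, string SnapshotDate, string ArchiveDigest);

public sealed record OfflineUpdatePredicate(
    string OfflineManifestSha256,      // offline_manifest_sha256
    IReadOnlyList<FeedEntry> Feeds,    // feeds[]
    string Builder,                    // CI workflow id / git commit / Export Center job id
    DateTimeOffset CreatedAt,          // created_at (UTC, ISO-8601 when serialized)
    string OukitChannel);              // oukit_channel: edge | stable | fips-profile
```
Serializing with a snake_case naming policy would yield the field names shown above; that mapping is an assumption, not a published schema.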
### 2.3. Rekor material
Attestor must:
* Submit `offline-update.dsse.json` to Rekor v2, obtaining:
* `uuid`
* `logIndex`
* inclusion proof (`rootHash`, `hashes`, `checkpoint`)
* Serialize that to `offline-update.rekor.json` and store it in object storage + OUK staging, so it ships in the kit.([git.stella-ops.org][2])
For fully offline operation:
* Either:
* embed a **minimal log segment** containing that entry; or
* rely on daily Rekor snapshot exports included elsewhere in the kit.([git.stella-ops.org][2])
---
## 3. Implementation by module
### 3.1 Export Center — attestation bundles
**Working directory:** `src/ExportCenter/StellaOps.ExportCenter.AttestationBundles`([git.stella-ops.org][7])
**Responsibilities**
1. **Compose attestation bundle job** (EXPORTATTEST74001)
* Input: a snapshot identifier (e.g., offline kit build id or feed snapshot date).
* Read manifest and feed metadata from the Export Center's storage.([git.stella-ops.org][5])
* Generate the DSSE payload structure described above.
* Call `StellaOps.Signer` to wrap it in a DSSE envelope.
* Call `StellaOps.Attestor` to submit DSSE → Rekor and get the inclusion proof.([git.stella-ops.org][2])
* Persist:
* `offline-update.dsse.json`
* `offline-update.rekor.json`
* any log segment artifacts.
2. **Integrate into offline kit packaging** (EXPORTATTEST74002 / 75001)
* The OUK builder (Python script `ops/offline-kit/build_offline_kit.py`) already assembles artifacts & manifests.([Stella Ops][8])
* Extend that pipeline (or add an Export Center step) to:
* fetch the attestation bundle for the snapshot,
* place it under `/attestations/` in the kit staging dir,
* ensure `offline-manifest.json` contains entries for the DSSE and Rekor files (name, sha256, size, capturedAt).([git.stella-ops.org][1])
3. **Contracts & schemas**
* Define a small JSON schema for `offline-update.rekor.json` (UUID, index, proof fields) and check it into `docs/11_DATA_SCHEMAS.md` or modulelocal schemas.
* Keep all new payload schemas **versioned**; avoid “shape drift”.
**Do / Don't**
* **Do** treat the attestation bundle job as *pure aggregation* (AOC guardrail: no modification of evidence).([git.stella-ops.org][5])
* **Do** rely on Signer + Attestor; don't hand-roll DSSE/Rekor logic in Export Center.([git.stella-ops.org][2])
* **Don't** reach out to external networks from this job; it must run with the same offline-ready posture as the rest of the platform.
---
### 3.2 Offline Update Kit builder
**Working area:** `ops/offline-kit/*` + `docs/24_OFFLINE_KIT.md`([git.stella-ops.org][1])
Guidelines:
1. **Preserve current guarantees**
* Imports must remain **idempotent and atomic**, with **old feeds kept until the new bundle is fully verified**. This now includes DSSE/Rekor checks in addition to Cosign + JWS.([git.stella-ops.org][1])
2. **Staging layout**
* When staging a kit, ensure the tree looks like:
```text
out/offline-kit/staging/
feeds/...
images/...
manifest/offline-manifest.json
attestations/offline-update.dsse.json
attestations/offline-update.rekor.json
```
* Update `offline-manifest.json` so each new file appears with:
* `name`, `sha256`, `size`, `capturedAt`.([git.stella-ops.org][1])
3. **Deterministic ordering**
* File lists in manifests must be in a stable order (e.g., lexical paths).
* Timestamps = UTC ISO-8601 only; never use local time. (Matches determinism guidance in AGENTS.md + policy/runs docs.)([git.stella-ops.org][9])
4. **Delta kits**
* For deltas (`stella-ouk-YYYY-MM-DD.delta.tgz`), DSSE should still cover:
* the delta tarball digest,
* the **logical state** (feeds & versions) after applying the delta.
* Don't shortcut by "attesting only the diff files"; the predicate must describe the resulting snapshot.
---
### 3.3 Scanner — import & activation
**Working directory:** `src/Scanner/StellaOps.Scanner.WebService`, `StellaOps.Scanner.Worker`([git.stella-ops.org][9])
Scanner already exposes admin flows for:
* **Offline kit import**, which:
* validates the Cosign signature of the kit,
* uses the attested manifest,
* keeps old feeds until verification is done.([git.stella-ops.org][1])
Add DSSE/Rekor awareness as follows:
1. **Verification sequence (happy path)**
On `import-offline-usage-kit`:
1. Validate **Cosign** signature of the tarball.
2. Validate `offline-manifest.json` with its JWS signature.
3. Verify **file digests** for all entries (including `/attestations/*`).
4. Verify **DSSE**:
* Call `StellaOps.Attestor.Verify` (or CLI equivalent) with:
* `offline-update.dsse.json`
* `offline-update.rekor.json`
* local Rekor log snapshot / segment (if configured)([git.stella-ops.org][2])
* Ensure the payload digest matches the kit tarball + manifest digests.
5. Only after all checks pass:
* swap the Scanner's feed pointer to the new snapshot,
* emit an audit event noting:
* kit filename, tarball digest,
* DSSE statement digest,
* Rekor UUID + log index.
2. **Config surface**
Add config keys (names illustrative):
```yaml
scanner:
offlineKit:
requireDsse: true # fail import if DSSE/Rekor verification fails
rekorOfflineMode: true # use local snapshots only
attestationVerifier: https://attestor.internal
```
* Mirror them via ASP.NET Core config + env vars (`SCANNER__OFFLINEKIT__REQUIREDSSE`, etc.), following the same pattern as the DSSE/Rekor operator guide.([git.stella-ops.org][2])
3. **Failure behaviour**
* **DSSE/Rekor fail, Cosign + manifest OK**
* Keep old feeds active.
* Mark import as failed; surface a `ProblemDetails` error via API/UI.
* Log structured fields: `rekorUuid`, `attestationDigest`, `offlineKitHash`, `failureReason`.([git.stella-ops.org][2])
* **Config flag to soften during rollout**
* When `requireDsse=false`, treat DSSE/Rekor failure as a warning and still allow the import (for initial observation phase), but emit alerts. This mirrors the “observe → enforce” pattern in the DSSE/Rekor operator guide.([git.stella-ops.org][2])
---
### 3.4 Signer & Attestor
You mostly **reuse** existing guidance:([git.stella-ops.org][2])
* Add a new predicate type & schema for offline updates in Signer.
* Ensure Attestor:
* can submit offlineupdate DSSE envelopes to Rekor,
* can emit verification routines (used by CLI and Scanner) that:
* verify the DSSE signature,
* check the certificate chain against the configured root pack (FIPS/eIDAS/GOST/SM, etc.),([Stella Ops][4])
* verify Rekor inclusion using either live log or local snapshot.
* For fully air-gapped installs:
* rely on Rekor **snapshots mirrored** into the Offline Kit (already recommended in the operator guide's offline section).([git.stella-ops.org][2])
---
### 3.5 CLI & UI
Extend CLI with explicit verbs (matching EXPORTATTEST sprints):([git.stella-ops.org][10])
* `stella attest bundle verify --bundle path/to/offline-kit.tgz --rekor-key rekor.pub`
* `stella attest bundle import --bundle ...` (for sites that prefer a two-step "verify then import" flow)
* Wire UI Admin → Offline Kit screen so that:
* verification status shows both **Cosign/JWS** and **DSSE/Rekor** state,
* policy banners display kit generation time, manifest hash, and DSSE/Rekor freshness.([Stella Ops][11])
---
## 4. Determinism & offline-safety rules
When touching any of this code, keep these rules front of mind (they align with the policy DSL and architecture docs):([Stella Ops][4])
1. **No hidden network dependencies**
* All verification **must work offline** given the kit + Rekor snapshots.
* Any fallback to live Rekor / Fulcio endpoints must be explicitly toggled and never on by default for “offline mode”.
2. **Stable serialization**
* DSSE payload JSON:
* stable ordering of fields,
* no float weirdness,
* UTC timestamps.
3. **Replayable imports**
* Running `import-offline-usage-kit` twice with the same bundle must be a no-op after the first time.
* The DSSE payload for a given snapshot must not change over time; if it does, bump the predicate or snapshot version.
4. **Explainability**
* When verification fails, errors must explain **what** mismatched (kit digest, manifest digest, DSSE envelope hash, Rekor inclusion) so auditors can reason about it.
---
## 5. Testing & CI expectations
Tie this into the existing CI workflows (`scanner-determinism.yml`, `attestation-bundle.yml`, `offline-kit` pipelines, etc.):([git.stella-ops.org][12])
### 5.1 Unit & integration tests
Write tests that cover:
1. **Happy paths**
* Full kit import with valid:
* Cosign,
* manifest JWS,
* DSSE,
* Rekor proof (online and offline modes).
2. **Corruption scenarios**
* Tampered feed file (hash mismatch).
* Tampered `offline-manifest.json`.
* Tampered DSSE payload (signature fails).
* Mismatched Rekor entry (payload digest doesn't match DSSE).
3. **Offline scenarios**
* No network access, only Rekor snapshot:
* DSSE verification still passes,
* Rekor proof validates against local tree head.
4. **Rollback logic**
* Import fails at DSSE/Rekor step:
* scanner DB still points at previous feeds,
* metrics/logs show failure and no partial state.
### 5.2 SLOs & observability
Reuse metrics suggested by DSSE/Rekor guide and adapt to OUK imports:([git.stella-ops.org][2])
* `offlinekit_import_total{status="success|failed_dsse|failed_rekor|failed_cosign"}`
* `offlinekit_attestation_verify_latency_seconds` (histogram)
* `attestor_rekor_success_total`, `attestor_rekor_retry_total`, `rekor_inclusion_latency`
* Dashboards: kit versions per environment, time since last kit, DSSE/Rekor health.
---
## 6. Developer checklist (TL;DR)
When you pick up a task touching DSSE-signed offline updates:
1. **Read the background**
* `docs/modules/scanner/operations/dsse-rekor-operator-guide.md`([git.stella-ops.org][2])
* `docs/24_OFFLINE_KIT.md` (and public offline kit guide).([git.stella-ops.org][1])
* Relevant sprint file (`SPRINT_160_export_evidence`, `SPRINT_162_exportcenter_i`, etc.).([git.stella-ops.org][10])
2. **Implement**
* Generate DSSE payloads in Export Center only.
* Call Signer & Attestor; persist DSSE + Rekor JSON next to manifests.
* Extend OUK builder to include attestation bundle and list it in `offline-manifest.json`.
* Update Scanner import flow to verify DSSE/Rekor before swapping feeds.
3. **Test**
* Unit tests for bundle composition & schema.
* Integration tests for import + rollback.
* Determinism tests (same inputs → same DSSE payload).
4. **Wire telemetry**
* Counters + latency histograms.
* Logs with `offlineKitHash`, `attestationDigest`, `rekorUuid`.
5. **Document**
* Update `docs/modules/export-center/architecture.md`, `docs/modules/scanner/architecture.md`, and the OUK docs where flows or contracts changed.([git.stella-ops.org][5])
---
If you tell me which module you're actually coding in next (Scanner, Export Center, CLI, or Attestor), I can turn this into a very concrete "AGENTS.md-style" section with exact file paths, class names, and a starter test layout for that module.
[1]: https://git.stella-ops.org/stella-ops.org/git.stella-ops.org/src/commit/7bf40b8589c94078e8eadb240553c02f097a5127/docs/24_OFFLINE_KIT.md "git.stella-ops.org/24_OFFLINE_KIT.md at 7bf40b8589c94078e8eadb240553c02f097a5127 - git.stella-ops.org - Gitea: Git with a cup of tea"
[2]: https://git.stella-ops.org/stella-ops.org/git.stella-ops.org/src/commit/13e4b53dda1575ba46c6188c794fd465ec6fdeec/docs/modules/scanner/operations/dsse-rekor-operator-guide.md "git.stella-ops.org/dsse-rekor-operator-guide.md at 13e4b53dda1575ba46c6188c794fd465ec6fdeec - git.stella-ops.org - Gitea: Git with a cup of tea"
[3]: https://git.stella-ops.org/stella-ops.org/git.stella-ops.org/raw/commit/61f963fd52cd4d6bb2f86afc5a82eac04c04b00e/docs/implplan/SPRINT_162_exportcenter_i.md?utm_source=chatgpt.com "https://git.stella-ops.org/stella-ops.org/git.stel..."
[4]: https://stella-ops.org/docs/07_high_level_architecture/index.html?utm_source=chatgpt.com "Open • Sovereign • Modular container security - Stella Ops"
[5]: https://git.stella-ops.org/stella-ops.org/git.stella-ops.org/src/commit/d870da18ce194c6a5f1a6d71abea36205d9fb276/docs/export-center/architecture.md?utm_source=chatgpt.com "Export Center Architecture - Stella Ops"
[6]: https://stella-ops.org/docs/moat/?utm_source=chatgpt.com "Open • Sovereign • Modular container security - Stella Ops"
[7]: https://git.stella-ops.org/stella-ops.org/git.stella-ops.org/src/commit/79b8e53441e92dbc63684f42072434d40b80275f/src/ExportCenter?utm_source=chatgpt.com "Code - Stella Ops"
[8]: https://stella-ops.org/docs/24_offline_kit/?utm_source=chatgpt.com "Offline Update Kit (OUK) — AirGap Bundle - Stella Ops Open ..."
[9]: https://git.stella-ops.org/stella-ops.org/git.stella-ops.org/src/commit/7768555f2d107326050cc5ff7f5cb81b82b7ce5f/AGENTS.md "git.stella-ops.org/AGENTS.md at 7768555f2d107326050cc5ff7f5cb81b82b7ce5f - git.stella-ops.org - Gitea: Git with a cup of tea"
[10]: https://git.stella-ops.org/stella-ops.org/git.stella-ops.org/src/commit/66cb6c4b8af58a33efa1521b7953dda834431497/docs/implplan/SPRINT_160_export_evidence.md?utm_source=chatgpt.com "git.stella-ops.org/SPRINT_160_export_evidence.md at ..."
[11]: https://stella-ops.org/about/?utm_source=chatgpt.com "Signed Reachability · Deterministic Replay · Sovereign Crypto"
[12]: https://git.stella-ops.org/stella-ops.org/git.stella-ops.org/actions/?actor=0&status=0&workflow=sdk-publish.yml&utm_source=chatgpt.com "Actions - git.stella-ops.org - Gitea: Git with a cup of tea"


@@ -0,0 +1,819 @@
Here's a crisp, opinionated storage blueprint you can hand to your StellaOps devs right now, plus zero-downtime conversion tactics so you can keep prototyping fast without painting yourself into a corner.
# Module → store map (deterministic by default)
* **Authority / OAuth / Accounts & Audit**
* **PostgreSQL** as the primary source of truth.
* Tables: `users`, `clients`, `oauth_tokens`, `roles`, `grants`, `audit_log`.
* **Row-Level Security (RLS)** on `users`, `grants`, `audit_log`; **STRICT FK + CHECK** constraints; **immutable UUID PKs**.
* **Audit**: `audit_log(actor_id, action, entity, entity_id, at timestamptz default now(), diff jsonb)`.
* **Why**: ACID + RLS keeps authz decisions and audit trails deterministic and reviewable.
* **VEX & Vulnerability Writes**
* **PostgreSQL** with **JSONB facts + relational decisions**.
* Tables: `vuln_fact(jsonb)`, `vex_decision(package_id, vuln_id, status, rationale, proof_ref, updated_at)`.
* **Materialized views** for triage queues, e.g. `mv_triage_hotset` (refresh on commit or scheduled).
* **Why**: JSONB lets you ingest vendorshaped docs; decisions stay relational for joins, integrity, and explainability.
* **Routing / Feature Flags / Rate limits**
* **PostgreSQL** (truth) + **Redis** (cache).
* Tables: `feature_flag(key, rules jsonb, version)`, `route(domain, service, instance_id, last_heartbeat)`, `rate_limiter(bucket, quota, interval)`.
* Redis keys: `flag:{key}:{version}`, `route:{domain}`, `rl:{bucket}` with short TTLs.
* **Why**: one canonical RDBMS for consistency; Redis for hot path latency.
* **Unknowns Registry (ambiguity tracker)**
* **PostgreSQL** with **temporal tables** (bitemporal pattern via `valid_from/valid_to`, `sys_from/sys_to`).
* Table: `unknowns(subject_hash, kind, context jsonb, valid_from, valid_to, sys_from default now(), sys_to)`.
* Views: `unknowns_current` where `valid_to is null`.
* **Why**: preserves how/when uncertainty changed (critical for proofs and audits).
* **Artifacts / SBOM / VEX files**
* **OCI-compatible CAS** (e.g., a self-hosted registry or MinIO bucket used as a content-addressable store).
* Keys by **digest** (`sha256:...`), metadata in a Postgres `artifact_index` table with `digest`, `media_type`, `size`, `signatures`.
* **Why**: blobs don't belong in your RDBMS; use CAS for scale + cryptographic addressing.
---
# PostgreSQL implementation essentials (copy/paste starters)
* **RLS scaffold (Authority)**:
```sql
alter table audit_log enable row level security;
create policy p_audit_read_self
on audit_log for select
using (actor_id = current_setting('app.user_id')::uuid or
exists (select 1 from grants g where g.user_id = current_setting('app.user_id')::uuid and g.role = 'auditor'));
```
* **JSONB facts + relational decisions**:
```sql
create table vuln_fact (
id uuid primary key default gen_random_uuid(),
source text not null,
payload jsonb not null,
received_at timestamptz default now()
);
create table vex_decision (
package_id uuid not null,
vuln_id text not null,
status text check (status in ('not_affected','affected','fixed','under_investigation')),
rationale text,
proof_ref text,
decided_at timestamptz default now(),
primary key (package_id, vuln_id)
);
```
* **Materialized view for triage**:
```sql
create materialized view mv_triage_hotset as
select v.id as fact_id, v.payload->>'vuln' as vuln, v.received_at
from vuln_fact v
where (now() - v.received_at) < interval '7 days';
-- refresh concurrently via job
```
* **Temporal pattern (Unknowns)**:
```sql
create table unknowns (
id uuid primary key default gen_random_uuid(),
subject_hash text not null,
kind text not null,
context jsonb not null,
valid_from timestamptz not null default now(),
valid_to timestamptz,
sys_from timestamptz not null default now(),
sys_to timestamptz
);
create view unknowns_current as
select * from unknowns where valid_to is null;
```
---
# Conversion (not migration): zero-downtime, prototype-friendly
Even if you're "not migrating anything yet," set these rails now so cutting over later is painless.
1. **Encode Mongo-shaped docs into JSONB with versioned schemas**
* Ingest pipeline writes to `*_fact(payload jsonb, schema_version int)`.
* Add a **`validate(schema_version, payload)`** step in your service layer (JSON Schema or SQL checks).
* Keep a **forward-compatible view** that projects stable columns from JSONB (e.g., `payload->>'id' as vendor_id`) so downstream code doesn't break when the payload evolves.
2. **Outbox pattern for exactly-once side-effects** (a dispatcher sketch follows this list)
* Add `outbox(id, topic, key, payload jsonb, created_at, dispatched bool default false)`.
* On the same transaction as your write, insert the outbox row.
* A background dispatcher reads `dispatched=false`, publishes to MQ/Webhook, then marks `dispatched=true`.
* Guarantees: no lost events, no duplicates to external systems.
3. **Parallel read adapters behind feature flags**
* Keep old readers (e.g., Mongo driver) and new Postgres readers in the same service.
* Gate by `feature_flag('pg_reads')` per tenant or env; flip gradually.
* Add a **read-diff monitor** that compares results and logs mismatches to `audit_log(diff)`.
4. **CDC for analytics without coupling**
* Enable **logical replication** (pgoutput) on your key tables.
* Stream changes into analyzers (reachability, heuristics) without hitting primaries.
* This lets you keep OLTP clean and still power dashboards/tests.
5. **Materialized views & job cadence**
* Refresh `mv_*` on a fixed cadence (e.g., every 2-5 minutes) or post-commit for hot paths.
* Keep **“cold path”** analytics in separate schemas (`analytics.*`) sourced from CDC.
6. **Cutover playbook (phased)**
* Phase A (Dark Read): write Postgres, still serve from Mongo; compare results silently.
* Phase B (Shadow Serve): 5-10% of traffic from Postgres via flag; auto-rollback switch.
* Phase C (Authoritative): Postgres becomes the source; the Mongo path is left for emergency read-only use.
* Phase D (Retire): freeze Mongo, back up, remove writes, delete code paths after 2 stable sprints.
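Referring back to item 2, a minimal dispatcher sketch, assuming the `outbox` table shape above and Npgsql; `publishAsync` is a stand-in for your MQ/webhook client:
```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Npgsql;

public static class OutboxDispatcher
{
    // Poll undispatched rows, publish them, then mark them dispatched.
    // Rows whose publish fails stay dispatched=false and are retried on the next poll.
    public static async Task DispatchPendingAsync(
        NpgsqlDataSource db,
        Func<string, string, string, CancellationToken, Task> publishAsync, // (topic, key, payloadJson)
        CancellationToken ct)
    {
        await using var conn = await db.OpenConnectionAsync(ct);

        var pending = new List<(Guid Id, string Topic, string Key, string Payload)>();
        await using (var select = new NpgsqlCommand(
            "select id, topic, key, payload::text from outbox where dispatched = false order by created_at limit 100", conn))
        await using (var reader = await select.ExecuteReaderAsync(ct))
        {
            while (await reader.ReadAsync(ct))
                pending.Add((reader.GetGuid(0), reader.GetString(1), reader.GetString(2), reader.GetString(3)));
        }

        foreach (var row in pending)
        {
            await publishAsync(row.Topic, row.Key, row.Payload, ct);

            await using var update = new NpgsqlCommand("update outbox set dispatched = true where id = @id", conn);
            update.Parameters.AddWithValue("id", row.Id);
            await update.ExecuteNonQueryAsync(ct);
        }
    }
}
```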
---
# Rate limits & flags: single truth, fast edges
* **Truth in Postgres** with versioned flag docs:
```sql
create table feature_flag (
key text primary key,
rules jsonb not null,
version int not null default 1,
updated_at timestamptz default now()
);
```
* **Edge cache** in Redis:
* `SETEX flag:{key}:{version} <ttl> <json>`
* On update, bump `version`; readers compose the cache key with the version (cache-busting without deletes).
* **Rate limiting**: Persist quotas in Postgres; counters in Redis (`INCR rl:{bucket}:{window}`), with periodic reconciliation jobs writing summaries back to Postgres for audits.
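A sketch of the hot-path counter with StackExchange.Redis, assuming a fixed-window key shaped like the `rl:{bucket}:{window}` pattern above; quota and window would come from the Postgres `rate_limiter` row:
```csharp
using System;
using System.Threading.Tasks;
using StackExchange.Redis;

public sealed class FixedWindowRateLimiter
{
    private readonly IDatabase redis;

    public FixedWindowRateLimiter(IDatabase redis) => this.redis = redis;

    // Returns true if the call is allowed within the current window.
    public async Task<bool> TryAcquireAsync(string bucket, long quota, TimeSpan window)
    {
        var windowId = DateTimeOffset.UtcNow.ToUnixTimeSeconds() / (long)window.TotalSeconds;
        var key = $"rl:{bucket}:{windowId}";

        var count = await redis.StringIncrementAsync(key);
        if (count == 1)
        {
            // First hit in this window: set an expiry so stale counters clean themselves up.
            await redis.KeyExpireAsync(key, window);
        }
        return count <= quota;
    }
}
```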
---
# CAS for SBOM/VEX/attestations
* Push blobs to OCI/MinIO by digest; store only pointers in Postgres:
```sql
create table artifact_index (
digest text primary key,
media_type text not null,
size bigint not null,
created_at timestamptz default now(),
signature_refs jsonb
);
```
* Benefits: immutable, deduped, easy to mirror into offline kits.
---
# Guardrails your team should follow
* **Always** wrap multitable writes (facts + outbox + decisions) in a single transaction.
* **Prefer** `jsonb_path_query` for targeted reads; **avoid** scanning entire payloads.
* **Enforce** RLS + leastprivilege roles; application sets `app.user_id` at session start.
* **Version everything**: schemas, flags, materialized views; never “change in place” without bumping version.
* **Observability**: expose `pg_stat_statements`, refresh latency for `mv_*`, outbox lag, Redis hit ratio, and RLS policy hits.
---
If you want, I can turn this into:
* ready-to-run **EF Core 10** migrations,
* a **/docs/architecture/store-map.md** for your repo,
* and a tiny **dev seed** (Docker + sample data) so the team can poke it immediately.
Here's a focused "PostgreSQL patterns per module" doc you can hand straight to your StellaOps devs.
---
# StellaOps PostgreSQL Patterns per Module
**Scope:** How each StellaOps module should use PostgreSQL: schema patterns, constraints, RLS, indexing, and transaction rules.
---
## 0. Cross-cutting PostgreSQL Rules
These apply everywhere unless explicitly overridden.
### 0.1 Core conventions
* **Schemas**
* Use **one logical schema** per module: `authority`, `routing`, `vex`, `unknowns`, `artifact`.
* Shared utilities (e.g., `outbox`) live in a `core` schema.
* **Naming**
* Tables: `snake_case`, singular: `user`, `feature_flag`, `vuln_fact`.
* PK: `id uuid primary key`.
* FKs: `<referenced_table>_id` (e.g., `user_id`, `tenant_id`).
* Timestamps:
* `created_at timestamptz not null default now()`
* `updated_at timestamptz not null default now()`
* **Multi-tenancy**
* All tenant-scoped tables must have `tenant_id uuid not null`.
* Enforce tenant isolation with **RLS** on `tenant_id`.
* **Time & timezones**
* Always `timestamptz`, always store **UTC**, let the DB default `now()`.
### 0.2 RLS & security
* RLS must be **enabled** on any table reachable from a user-initiated path.
* Every session must set:
```sql
select set_config('app.user_id', '<uuid>', false);
select set_config('app.tenant_id', '<uuid>', false);
select set_config('app.roles', 'role1,role2', false);
```
* RLS policies:
* Base policy: `tenant_id = current_setting('app.tenant_id')::uuid`.
* Extra predicates for per-user privacy (e.g., only see own tokens, only own API clients).
* DB users:
* Each module's service has its **own role** with access only to its schema + `core.outbox`.
### 0.3 JSONB & versioning
* Any JSONB column must have:
* `payload jsonb not null`,
* `schema_version int not null`.
* Always index:
* by source (`source` / `origin`),
* by a small set of projected fields used in WHERE clauses.
### 0.4 Migrations
* All schema changes via migrations, forward-only.
* Backwards-compat pattern (see the sketch after this list):
1. Add new columns / tables.
2. Backfill.
3. Flip code to use new structure (behind a feature flag).
4. After stability, remove old columns/paths.
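A minimal sketch of steps 1, 2, and 4, using a hypothetical `license` column added to `vex.package` purely as an illustration; the flip in step 3 happens in application code behind a feature flag:
```sql
-- 1. Expand: add the new column as nullable so existing writers keep working.
alter table vex.package add column if not exists license text;

-- 2. Backfill in small batches to avoid long locks (repeat until 0 rows updated).
update vex.package
set license = 'UNKNOWN'
where id in (select id from vex.package where license is null limit 10000);

-- 3. Flip readers/writers to the new column behind a feature flag (app code).

-- 4. Contract: once stable, enforce the new shape and drop old paths.
alter table vex.package alter column license set not null;
```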
---
## 1. Authority Module (auth, accounts, audit)
**Schema:** `authority.*`
**Mission:** identity, OAuth, roles, grants, audit.
### 1.1 Core tables & patterns
* `authority.user`
```sql
create table authority."user" (  -- "user" is a reserved word in PostgreSQL, so quote it
id uuid primary key default gen_random_uuid(),
tenant_id uuid not null,
email text not null,
display_name text not null,
is_disabled boolean not null default false,
created_at timestamptz not null default now(),
updated_at timestamptz not null default now(),
unique (tenant_id, email)
);
```
* Never hard-delete users: use `is_disabled` (and optionally `disabled_at`).
* `authority.role`
```sql
create table authority.role (
id uuid primary key default gen_random_uuid(),
tenant_id uuid not null,
name text not null,
description text,
created_at timestamptz not null default now(),
updated_at timestamptz not null default now(),
unique (tenant_id, name)
);
```
* `authority.grant`
```sql
create table authority."grant" (  -- "grant" is also reserved, so quote it
id uuid primary key default gen_random_uuid(),
tenant_id uuid not null,
user_id uuid not null references authority."user"(id),
role_id uuid not null references authority.role(id),
created_at timestamptz not null default now(),
unique (tenant_id, user_id, role_id)
);
```
* `authority.oauth_client`, `authority.oauth_token`
* Enforce token uniqueness:
```sql
create table authority.oauth_token (
id uuid primary key default gen_random_uuid(),
tenant_id uuid not null,
user_id uuid not null references authority."user"(id),
client_id uuid not null references authority.oauth_client(id),
token_hash text not null, -- hash, never raw
expires_at timestamptz not null,
created_at timestamptz not null default now(),
revoked_at timestamptz,
unique (token_hash)
);
```
### 1.2 Audit log pattern
* `authority.audit_log`
```sql
create table authority.audit_log (
id uuid primary key default gen_random_uuid(),
tenant_id uuid not null,
actor_id uuid, -- null for system
action text not null,
entity_type text not null,
entity_id uuid,
at timestamptz not null default now(),
diff jsonb not null
);
```
* Insert audit rows in the **same transaction** as the change.
### 1.3 RLS patterns
* Base RLS:
```sql
alter table authority."user" enable row level security;
create policy p_user_tenant on authority."user"
for all using (tenant_id = current_setting('app.tenant_id')::uuid);
```
* Extra policies:
* Audit log is visible only to:
* actor themself, or
* users with an `auditor` or `admin` role.
---
## 2. Routing & Feature Flags Module
**Schema:** `routing.*`
**Mission:** where instances live, what features are on, rate-limit configuration.
### 2.1 Feature flags
* `routing.feature_flag`
```sql
create table routing.feature_flag (
id uuid primary key default gen_random_uuid(),
tenant_id uuid not null,
key text not null,
rules jsonb not null,
version int not null default 1,
is_enabled boolean not null default true,
created_at timestamptz not null default now(),
updated_at timestamptz not null default now(),
unique (tenant_id, key)
);
```
* **Immutability by version**:
* On update, **increment `version`**, don't overwrite historical data.
* Mirror changes into a history table via a trigger (table below; a trigger sketch follows it):
```sql
create table routing.feature_flag_history (
id uuid primary key default gen_random_uuid(),
feature_flag_id uuid not null references routing.feature_flag(id),
tenant_id uuid not null,
key text not null,
rules jsonb not null,
version int not null,
changed_at timestamptz not null default now(),
changed_by uuid
);
```
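A minimal trigger sketch for the mirroring described above; the function and trigger names are illustrative, and `changed_by` is read from the same session setting used for RLS. The application still bumps `version` as part of its update:
```sql
create or replace function routing.feature_flag_audit()
returns trigger language plpgsql as $$
begin
  -- Record the previous state before the update lands.
  insert into routing.feature_flag_history
    (feature_flag_id, tenant_id, key, rules, version, changed_by)
  values
    (old.id, old.tenant_id, old.key, old.rules, old.version,
     nullif(current_setting('app.user_id', true), '')::uuid);
  return new;
end;
$$;

create trigger trg_feature_flag_history
  before update on routing.feature_flag
  for each row execute function routing.feature_flag_audit();
```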
### 2.2 Instance registry
* `routing.instance`
```sql
create table routing.instance (
id uuid primary key default gen_random_uuid(),
tenant_id uuid not null,
instance_key text not null,
domain text not null,
last_heartbeat timestamptz not null default now(),
status text not null check (status in ('active','draining','offline')),
created_at timestamptz not null default now(),
updated_at timestamptz not null default now(),
unique (tenant_id, instance_key),
unique (tenant_id, domain)
);
```
* Pattern:
* Heartbeats use `update ... set last_heartbeat = now()` without touching other fields.
* Routing logic filters by `status='active'` and heartbeat recency.
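A sketch of the routing read path implied by the pattern above; the 30-second freshness window is an assumption, not a fixed requirement:
```sql
-- Only instances that are active and have heartbeated recently are eligible.
select instance_key, domain
from routing.instance
where tenant_id = current_setting('app.tenant_id')::uuid
  and status = 'active'
  and last_heartbeat > now() - interval '30 seconds'
order by last_heartbeat desc;
```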
### 2.3 Rate-limit configuration
* Config in Postgres, counters in Redis:
```sql
create table routing.rate_limit_config (
id uuid primary key default gen_random_uuid(),
tenant_id uuid not null,
key text not null,
limit_per_interval int not null,
interval_seconds int not null,
created_at timestamptz not null default now(),
updated_at timestamptz not null default now(),
unique (tenant_id, key)
);
```
---
## 3. VEX & Vulnerability Module
**Schema:** `vex.*`
**Mission:** ingest vulnerability facts, keep decisions & triage state.
### 3.1 Facts as JSONB
* `vex.vuln_fact`
```sql
create table vex.vuln_fact (
id uuid primary key default gen_random_uuid(),
tenant_id uuid not null,
source text not null, -- e.g. "nvd", "vendor_x_vex"
external_id text, -- e.g. CVE, advisory id
payload jsonb not null,
schema_version int not null,
received_at timestamptz not null default now()
);
```
* Index patterns:
```sql
create index on vex.vuln_fact (tenant_id, source);
create index on vex.vuln_fact (tenant_id, external_id);
create index vuln_fact_payload_gin on vex.vuln_fact using gin (payload);
```
### 3.2 Decisions as relational data
* `vex.package`
```sql
create table vex.package (
id uuid primary key default gen_random_uuid(),
tenant_id uuid not null,
name text not null,
version text not null,
ecosystem text not null, -- e.g. "pypi", "npm"
created_at timestamptz not null default now(),
unique (tenant_id, name, version, ecosystem)
);
```
* `vex.vex_decision`
```sql
create table vex.vex_decision (
id uuid primary key default gen_random_uuid(),
tenant_id uuid not null,
package_id uuid not null references vex.package(id),
vuln_id text not null,
status text not null check (status in (
'not_affected', 'affected', 'fixed', 'under_investigation'
)),
rationale text,
proof_ref text, -- CAS digest or URL
decided_by uuid,
decided_at timestamptz not null default now(),
created_at timestamptz not null default now(),
updated_at timestamptz not null default now(),
unique (tenant_id, package_id, vuln_id)
);
```
* For history:
* Keep current state in `vex_decision`.
* Mirror previous versions into `vex_decision_history` table (similar to feature flags).
### 3.3 Triage queues with materialized views
* Example triage view:
```sql
create materialized view vex.mv_triage_queue as
select
d.tenant_id,
p.name,
p.version,
d.vuln_id,
d.status,
d.decided_at
from vex.vex_decision d
join vex.package p on p.id = d.package_id
where d.status = 'under_investigation';
```
* Refresh options:
* Scheduled refresh (cron/worker).
* Or **incremental** via triggers (more complex; use only when needed).
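For the scheduled option, a minimal refresh sketch: `REFRESH ... CONCURRENTLY` avoids blocking readers but requires a unique index on the view, and the uniqueness of `(tenant_id, name, version, vuln_id)` is an assumption here (it ignores ecosystem):
```sql
-- One-time: CONCURRENTLY needs a unique index on the materialized view.
create unique index if not exists mv_triage_queue_uq
  on vex.mv_triage_queue (tenant_id, name, version, vuln_id);

-- Run from the scheduled worker/cron job:
refresh materialized view concurrently vex.mv_triage_queue;
```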
### 3.4 RLS for VEX
* All tables scoped by `tenant_id`.
* Typical policy:
```sql
alter table vex.vex_decision enable row level security;
create policy p_vex_tenant on vex.vex_decision
for all using (tenant_id = current_setting('app.tenant_id')::uuid);
```
---
## 4. Unknowns Module
**Schema:** `unknowns.*`
**Mission:** represent uncertainty and how it changes over time.
### 4.1 Bitemporal unknowns table
* `unknowns.unknown`
```sql
create table unknowns.unknown (
id uuid primary key default gen_random_uuid(),
tenant_id uuid not null,
subject_hash text not null, -- stable identifier for "thing" being reasoned about
kind text not null, -- e.g. "reachability", "version_inferred"
context jsonb not null, -- extra info: call graph node, evidence, etc.
valid_from timestamptz not null default now(),
valid_to timestamptz,
sys_from timestamptz not null default now(),
sys_to timestamptz,
created_at timestamptz not null default now()
);
```
* “Exactly one open unknown per subject/kind” pattern:
```sql
create unique index unknown_one_open_per_subject
on unknowns.unknown (tenant_id, subject_hash, kind)
where valid_to is null;
```
### 4.2 Closing an unknown
* Close by setting `valid_to` and `sys_to`:
```sql
update unknowns.unknown
set valid_to = now(), sys_to = now()
where id = :id and valid_to is null;
```
* Never hard-delete; keep all rows for audit/explanation.
### 4.3 Convenience views
* Current unknowns:
```sql
create view unknowns.current as
select *
from unknowns.unknown
where valid_to is null;
```
### 4.4 RLS
* Same tenant policy as other modules; unknowns are tenant-scoped.
---
## 5. Artifact Index / CAS Module
**Schema:** `artifact.*`
**Mission:** index of immutable blobs stored in OCI / S3 / MinIO etc.
### 5.1 Artifact index
* `artifact.artifact`
```sql
create table artifact.artifact (
digest text primary key, -- e.g. "sha256:..."
tenant_id uuid not null,
media_type text not null,
size_bytes bigint not null,
created_at timestamptz not null default now(),
created_by uuid
);
```
* Validate digest shape with a CHECK:
```sql
alter table artifact.artifact
add constraint chk_digest_format
check (digest ~ '^sha[0-9]+:[0-9a-fA-F]{32,}$');
```
### 5.2 Signatures and tags
* `artifact.signature`
```sql
create table artifact.signature (
id uuid primary key default gen_random_uuid(),
tenant_id uuid not null,
artifact_digest text not null references artifact.artifact(digest),
signer text not null,
signature_payload jsonb not null,
created_at timestamptz not null default now()
);
```
* `artifact.tag`
```sql
create table artifact.tag (
id uuid primary key default gen_random_uuid(),
tenant_id uuid not null,
name text not null,
artifact_digest text not null references artifact.artifact(digest),
created_at timestamptz not null default now(),
unique (tenant_id, name)
);
```
### 5.3 RLS
* Ensure that tenants cannot see each other's digests, even if the CAS backing store is shared:
```sql
alter table artifact.artifact enable row level security;
create policy p_artifact_tenant on artifact.artifact
for all using (tenant_id = current_setting('app.tenant_id')::uuid);
```
---
## 6. Shared Outbox / Event Pattern
**Schema:** `core.*`
**Mission:** reliable events for external side-effects.
### 6.1 Outbox table
* `core.outbox`
```sql
create table core.outbox (
id uuid primary key default gen_random_uuid(),
tenant_id uuid,
aggregate_type text not null, -- e.g. "vex_decision", "feature_flag"
aggregate_id uuid,
topic text not null,
payload jsonb not null,
created_at timestamptz not null default now(),
dispatched_at timestamptz,
dispatch_attempts int not null default 0,
error text
);
```
### 6.2 Usage rule
* For anything that must emit an event (webhook, Kafka, notifications):
* In the **same transaction** as the change:
* write primary data (e.g. `vex.vex_decision`),
* insert an `outbox` row.
* A background worker:
* pulls undelivered rows,
* sends to external system,
* updates `dispatched_at`/`dispatch_attempts`/`error`.
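A sketch of the worker's claim/ack queries, assuming plain SQL polling against `core.outbox`; `FOR UPDATE SKIP LOCKED` lets several workers share the table, and the batch size and retry cap are illustrative. Claiming and marking happen in separate, short transactions so the external call never runs inside one:
```sql
-- Claim a batch of undelivered events and bump the attempt counter.
with batch as (
  select id
  from core.outbox
  where dispatched_at is null
    and dispatch_attempts < 10
  order by created_at
  limit 100
  for update skip locked
)
update core.outbox o
set dispatch_attempts = o.dispatch_attempts + 1
from batch
where o.id = batch.id
returning o.id, o.topic, o.payload;

-- After the external system has acknowledged the event:
update core.outbox set dispatched_at = now() where id = :id;

-- On failure, record the reason for the next retry pass:
update core.outbox set error = :reason where id = :id;
```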
---
## 7. Indexing & Query Patterns per Module
### 7.1 Authority
* Index:
* `user(tenant_id, email)`
* `grant(tenant_id, user_id)`
* `oauth_token(token_hash)`
* Typical query patterns:
* Look up user by `tenant_id + email`.
* All roles/grants for a user; design composite indexes accordingly.
### 7.2 Routing & Flags
* Index:
* `feature_flag(tenant_id, key)`
* partial index on enabled flags:
```sql
create index on routing.feature_flag (tenant_id, key)
where is_enabled;
```
* `instance(tenant_id, status)`, `instance(tenant_id, domain)`.
### 7.3 VEX
* Index:
* `package(tenant_id, name, version, ecosystem)`
* `vex_decision(tenant_id, package_id, vuln_id)`
* GIN on `vuln_fact.payload` for flexible querying.
### 7.4 Unknowns
* Index:
* unique open unknown per subject/kind (shown above).
* `unknown(tenant_id, kind)` for filtering by kind.
### 7.5 Artifact
* Index:
* PK on `digest`.
* `signature(tenant_id, artifact_digest)`.
* `tag(tenant_id, name)`.
---
## 8. Transaction & Isolation Guidelines
* Default isolation: **READ COMMITTED**.
* For critical sequences (e.g., provisioning a tenant, bulk role updates):
* consider **REPEATABLE READ** or **SERIALIZABLE** and keep transactions short.
* Pattern:
* One transaction per logical user action (e.g., “set flag”, “record decision”).
* Never do long-running external calls inside a database transaction.
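A sketch of "one transaction per logical user action" for recording a VEX decision together with its outbox event, using the `vex.vex_decision` and `core.outbox` tables above; parameter names are placeholders:
```sql
begin;

-- 1. The primary change: record or update the decision (idempotent upsert).
insert into vex.vex_decision
  (tenant_id, package_id, vuln_id, status, rationale, decided_by)
values
  (:tenant_id, :package_id, :vuln_id, 'not_affected', :rationale, :user_id)
on conflict (tenant_id, package_id, vuln_id)
do update set status     = excluded.status,
              rationale  = excluded.rationale,
              decided_by = excluded.decided_by,
              decided_at = now(),
              updated_at = now();

-- 2. The outbox row that drives webhooks/notifications, in the same transaction.
insert into core.outbox (tenant_id, aggregate_type, aggregate_id, topic, payload)
select :tenant_id, 'vex_decision', d.id, 'vex.decision.updated',
       jsonb_build_object('vulnId', d.vuln_id, 'status', d.status)
from vex.vex_decision d
where d.tenant_id = :tenant_id
  and d.package_id = :package_id
  and d.vuln_id = :vuln_id;

commit;
```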
---
If you'd like, next step I can turn this into:
* concrete `CREATE SCHEMA` + `CREATE TABLE` migration files, and
* a short “How to write queries in each module” cheat sheet for devs (with example SELECT/INSERT/UPDATE patterns).
---
Here's a tight, practical pattern to make your scanner's vuln-DB updates rock-solid even when feeds hiccup:
# Offline, verifiable update bundles (DSSE + Rekor v2)
**Idea:** distribute DB updates as offline tarballs. Each tarball ships with:
* a **DSSE-signed** statement (e.g., in-toto style) over the bundle hash
* a **Rekor v2 receipt** proving the signature/statement was logged
* a small **manifest.json** (version, created_at, content hashes)
**Startup flow (happy path):**
1. Load latest tarball from your local `updates/` cache.
2. Verify DSSE signature against your trusted public keys.
3. Verify Rekor v2 receipt (inclusion proof) matches the DSSE payload hash.
4. If both pass, unpack/activate; record the bundle's **trust_id** (e.g., statement digest).
5. If anything fails, **keep using the last good bundle**. No service disruption.
**Why this helps**
* **Air-gap friendly:** no live network needed at activation time.
* **Tamper-evident:** DSSE + Rekor receipt proves provenance and transparency.
* **Operational stability:** feed outages become non-events; the scanner just keeps the last good state.
---
## File layout inside each bundle
```
/bundle-2025-11-29/
manifest.json # { version, created_at, entries[], sha256s }
payload.tar.zst # the actual DB/indices
payload.tar.zst.sha256
statement.dsse.json # DSSE-wrapped statement over payload hash
rekor-receipt.json # Rekor v2 inclusion/verification material
```
---
## Acceptance/Activation rules
* **Trust root:** pin one (or more) publisher public keys; rotate via a separate, out-of-band process.
* **Monotonicity:** only activate if `manifest.version > current.version` (or if trust policy explicitly allows replay for rollback testing).
* **Atomic switch:** unpack to `db/staging/`, validate, then symlink-flip to `db/active/`.
* **Quarantine on failure:** move bad bundles to `updates/quarantine/` with a reason code.
---
## Minimal .NET 10 verifier sketch (C#)
```csharp
public sealed record BundlePaths(string Dir) {
public string Manifest => Path.Combine(Dir, "manifest.json");
public string Payload => Path.Combine(Dir, "payload.tar.zst");
public string Dsse => Path.Combine(Dir, "statement.dsse.json");
public string Receipt => Path.Combine(Dir, "rekor-receipt.json");
}
public async Task<bool> ActivateBundleAsync(BundlePaths b, TrustConfig trust, string activeDir) {
var manifest = await Manifest.LoadAsync(b.Manifest);
if (!await Hashes.VerifyAsync(b.Payload, manifest.PayloadSha256)) return false;
// 1) DSSE verify (publisher keys pinned in trust)
var (okSig, dssePayloadDigest) = await Dsse.VerifyAsync(b.Dsse, trust.PublisherKeys);
if (!okSig || dssePayloadDigest != manifest.PayloadSha256) return false;
// 2) Rekor v2 receipt verify (inclusion + statement digest == dssePayloadDigest)
if (!await RekorV2.VerifyReceiptAsync(b.Receipt, dssePayloadDigest, trust.RekorPub)) return false;
// 3) Stage, validate, then atomically flip
var staging = Path.Combine(activeDir, "..", "staging");
DirUtil.Empty(staging);
await TarZstd.ExtractAsync(b.Payload, staging);
if (!await LocalDbSelfCheck.RunAsync(staging)) return false;
SymlinkUtil.AtomicSwap(source: staging, target: activeDir);
State.WriteLastGood(manifest.Version, dssePayloadDigest);
return true;
}
```
---
## Operational playbook
* **On boot & daily at HH:MM:** try `ActivateBundleAsync()` on the newest bundle; on failure, log and continue.
* **Telemetry (no PII):** reason codes (SIG_FAIL, RECEIPT_FAIL, HASH_MISMATCH, SELFTEST_FAIL), versions, last_good.
* **Keys & rotation:** keep `publisher.pub` and `rekor.pub` in a root-owned, read-only path; rotate via a separate signed “trust bundle”.
* **Defense-in-depth:** verify both the **payload hash** and each file's hash listed in `manifest.entries[]`.
* **Rollback:** allow `--force-activate <bundle>` for emergency testing, but mark it as **non-monotonic** in state.
---
## What to hand your release team
* A Make/CI target that:
1. Builds `payload.tar.zst` and computes hashes
2. Generates `manifest.json`
3. Creates and signs the **DSSE statement**
4. Submits to Rekor (or your mirror) and saves the **v2 receipt**
5. Packages the bundle folder and publishes to your offline repo
* A checksum file (`*.sha256sum`) for ops to verify out-of-band.
---
If you want, I can turn this into a StellaOps spec page (`docs/modules/scanner/offline-bundles.md`) plus a small reference implementation (C# library + CLI) that drops right into your Scanner service.
Here's a “drop-in” Stella Ops dev guide for **DSSE-signed Offline Scanner Updates** — written in the same spirit as the existing docs and sprint files.
You can treat this as the seed for `docs/modules/scanner/development/dsse-offline-updates.md` (or similar).
---
# DSSE-Signed Offline Scanner Updates — Developer Guidelines
> **Audience**
> Scanner, Export Center, Attestor, CLI, and DevOps engineers implementing DSSE-signed offline vulnerability updates and integrating them into the Offline Update Kit (OUK).
>
> **Context**
>
> * OUK already ships **signed, atomic offline update bundles** with merged vulnerability feeds, container images, and an attested manifest.([git.stella-ops.org][1])
> * DSSE + Rekor is already used for **scan evidence** (SBOM attestations, Rekor proofs).([git.stella-ops.org][2])
> * Sprints 160/162 add **attestation bundles** with manifest, checksums, DSSE signature, and optional transparency log segments, and integrate them into OUK and CLI flows.([git.stella-ops.org][3])
These guidelines tell you how to **wire all of that together** for “offline scanner updates” (feeds, rules, packs) in a way that matches Stella Ops determinism + sovereignty promises.
---
## 0. Mental model
At a high level, you're building this:
```text
Advisory mirrors / Feeds builders
        |
        v
ExportCenter.AttestationBundles
  (creates DSSE + Rekor evidence
   for each offline update snapshot)
        |
        v
Offline Update Kit (OUK) builder
  (adds feeds + evidence to kit tarball)
        |
        v
stella offline kit import / admin CLI
  (verifies Cosign + DSSE + Rekor segments,
   then atomically swaps scanner feeds)
```
Online, Rekor is live; offline, you rely on **bundled Rekor segments / snapshots** and the existing OUK mechanics (import is atomic, old feeds kept until new bundle is fully verified).([git.stella-ops.org][1])
---
## 1. Goals & nongoals
### Goals
1. **Authentic offline snapshots**
Every offline scanner update (OUK or delta) must be verifiably tied to:
* a DSSE envelope,
* a certificate chain rooted in Stella's Fulcio/KMS profile or BYO KMS/HSM,([Stella Ops][4])
* *and* a Rekor v2 inclusion proof or bundled log segment.([Stella Ops][4])
2. **Deterministic replay**
Given:
* a specific offline update kit (`stella-ops-offline-kit-<DATE>.tgz` + `offline-manifest-<DATE>.json`)([git.stella-ops.org][1])
* its DSSE attestation bundle + Rekor segments
every verifier must reach the *same* verdict on integrity and contents — online or fully air-gapped.
3. **Separation of concerns**
* Export Center: build attestation bundles, no business logic about scanning.([git.stella-ops.org][5])
* Scanner: import & apply feeds; verify but not generate DSSE.
* Signer / Attestor: own DSSE & Rekor integration.([git.stella-ops.org][2])
4. **Operational safety**
* Imports remain **atomic and idempotent**.
* Old feeds stay live until the new update is **fully verified** (Cosign + DSSE + Rekor).([git.stella-ops.org][1])
### Nongoals
* Designing new crypto or log formats.
* Per-feed DSSE envelopes (you can have more later, but the minimum contract is **bundle-level** attestation).
---
## 2. Bundle contract for DSSE-signed offline updates
You're extending the existing OUK contract:
* OUK already packs:
* merged vuln feeds (OSV, GHSA, optional NVD 2.0, CNNVD/CNVD, ENISA, JVN, BDU),
* container images (`stella-ops`, Zastava, etc.),
* provenance (Cosign signature, SPDX SBOM, in-toto SLSA attestation),
* `offline-manifest.json` + detached JWS signed during export.([git.stella-ops.org][1])
For **DSSE-signed offline scanner updates**, add a new logical layer:
### 2.1. Files to ship
Inside each offline kit (full or delta) you must produce:
```text
/attestations/
offline-update.dsse.json # DSSE envelope
offline-update.rekor.json # Rekor entry + inclusion proof (or segment descriptor)
/manifest/
offline-manifest.json # existing manifest
offline-manifest.json.jws # existing detached JWS
/feeds/
... # existing feed payloads
```
The exact paths can be adjusted, but keep:
* **One DSSE bundle per kit** (min spec).
* **One canonical Rekor proof file** per DSSE envelope.
### 2.2. DSSE payload contents (minimal)
Define (or reuse) a predicate type such as:
```jsonc
{
"payloadType": "application/vnd.in-toto+json",
"payload": { /* base64 */ }
}
```
Decoded payload (in-toto statement) should **at minimum** contain:
* **Subject**
* `name`: `stella-ops-offline-kit-<DATE>.tgz`
* `digest.sha256`: tarball digest
* **Predicate type** (recommendation)
* `https://stella-ops.org/attestations/offline-update/1`
* **Predicate fields**
* `offline_manifest_sha256`: SHA-256 of `offline-manifest.json`
* `feeds`: an array of feed entries such as `{ name, snapshot_date, archive_digest }` (mirrors the `rules_and_feeds` style used in the moat doc).([Stella Ops][6])
* `builder`: CI workflow id / git commit / Export Center job id
* `created_at`: UTC ISO-8601 timestamp
* `oukit_channel`: e.g., `edge`, `stable`, `fips-profile`
**Guideline:** this DSSE payload is the **single canonical description** of “what this offline update snapshot is”.
### 2.3. Rekor material
Attestor must:
* Submit `offline-update.dsse.json` to Rekor v2, obtaining:
* `uuid`
* `logIndex`
* inclusion proof (`rootHash`, `hashes`, `checkpoint`)
* Serialize that to `offline-update.rekor.json` and store it in object storage + OUK staging, so it ships in the kit.([git.stella-ops.org][2])
For fully offline operation:
* Either:
* embed a **minimal log segment** containing that entry; or
* rely on daily Rekor snapshot exports included elsewhere in the kit.([git.stella-ops.org][2])
---
## 3. Implementation by module
### 3.1 Export Center — attestation bundles
**Working directory:** `src/ExportCenter/StellaOps.ExportCenter.AttestationBundles`([git.stella-ops.org][7])
**Responsibilities**
1. **Compose attestation bundle job** (EXPORTATTEST74001)
* Input: a snapshot identifier (e.g., offline kit build id or feed snapshot date).
* Read manifest and feed metadata from the Export Center's storage.([git.stella-ops.org][5])
* Generate the DSSE payload structure described above.
* Call `StellaOps.Signer` to wrap it in a DSSE envelope.
* Call `StellaOps.Attestor` to submit DSSE → Rekor and get the inclusion proof.([git.stella-ops.org][2])
* Persist:
* `offline-update.dsse.json`
* `offline-update.rekor.json`
* any log segment artifacts.
2. **Integrate into offline kit packaging** (EXPORTATTEST74002 / 75001)
* The OUK builder (Python script `ops/offline-kit/build_offline_kit.py`) already assembles artifacts & manifests.([Stella Ops][8])
* Extend that pipeline (or add an Export Center step) to:
* fetch the attestation bundle for the snapshot,
* place it under `/attestations/` in the kit staging dir,
* ensure `offline-manifest.json` contains entries for the DSSE and Rekor files (name, sha256, size, capturedAt).([git.stella-ops.org][1])
3. **Contracts & schemas**
* Define a small JSON schema for `offline-update.rekor.json` (UUID, index, proof fields) and check it into `docs/11_DATA_SCHEMAS.md` or modulelocal schemas.
* Keep all new payload schemas **versioned**; avoid “shape drift”.
**Do / Don't**
* **Do** treat the attestation bundle job as *pure aggregation* (AOC guardrail: no modification of evidence).([git.stella-ops.org][5])
* **Do** rely on Signer + Attestor; don't hand-roll DSSE/Rekor logic in Export Center.([git.stella-ops.org][2])
* **Don't** reach out to external networks from this job — it must run with the same offline-ready posture as the rest of the platform.
---
### 3.2 Offline Update Kit builder
**Working area:** `ops/offline-kit/*` + `docs/24_OFFLINE_KIT.md`([git.stella-ops.org][1])
Guidelines:
1. **Preserve current guarantees**
* Imports must remain **idempotent and atomic**, with **old feeds kept until the new bundle is fully verified**. This now includes DSSE/Rekor checks in addition to Cosign + JWS.([git.stella-ops.org][1])
2. **Staging layout**
* When staging a kit, ensure the tree looks like:
```text
out/offline-kit/staging/
feeds/...
images/...
manifest/offline-manifest.json
attestations/offline-update.dsse.json
attestations/offline-update.rekor.json
```
* Update `offline-manifest.json` so each new file appears with:
* `name`, `sha256`, `size`, `capturedAt`.([git.stella-ops.org][1])
3. **Deterministic ordering**
* File lists in manifests must be in a stable order (e.g., lexical paths).
* Timestamps = UTC ISO-8601 only; never use local time. (Matches determinism guidance in AGENTS.md + policy/runs docs.)([git.stella-ops.org][9])
4. **Delta kits**
* For deltas (`stella-ouk-YYYY-MM-DD.delta.tgz`), DSSE should still cover:
* the delta tarball digest,
* the **logical state** (feeds & versions) after applying the delta.
* Don't shortcut by “attesting only the diff files” — the predicate must describe the resulting snapshot.
---
### 3.3 Scanner — import & activation
**Working directory:** `src/Scanner/StellaOps.Scanner.WebService`, `StellaOps.Scanner.Worker`([git.stella-ops.org][9])
Scanner already exposes admin flows for:
* **Offline kit import**, which:
* validates the Cosign signature of the kit,
* uses the attested manifest,
* keeps old feeds until verification is done.([git.stella-ops.org][1])
Add DSSE/Rekor awareness as follows:
1. **Verification sequence (happy path)**
On `import-offline-usage-kit`:
1. Validate **Cosign** signature of the tarball.
2. Validate `offline-manifest.json` with its JWS signature.
3. Verify **file digests** for all entries (including `/attestations/*`).
4. Verify **DSSE**:
* Call `StellaOps.Attestor.Verify` (or CLI equivalent) with:
* `offline-update.dsse.json`
* `offline-update.rekor.json`
* local Rekor log snapshot / segment (if configured)([git.stella-ops.org][2])
* Ensure the payload digest matches the kit tarball + manifest digests.
5. Only after all checks pass:
* swap Scanners feed pointer to the new snapshot,
* emit an audit event noting:
* kit filename, tarball digest,
* DSSE statement digest,
* Rekor UUID + log index.
2. **Config surface**
Add config keys (names illustrative):
```yaml
scanner:
offlineKit:
requireDsse: true # fail import if DSSE/Rekor verification fails
rekorOfflineMode: true # use local snapshots only
attestationVerifier: https://attestor.internal
```
* Mirror them via ASP.NET Core config + env vars (`SCANNER__OFFLINEKIT__REQUIREDSSE`, etc.), following the same pattern as the DSSE/Rekor operator guide.([git.stella-ops.org][2])
3. **Failure behaviour**
* **DSSE/Rekor fail, Cosign + manifest OK**
* Keep old feeds active.
* Mark import as failed; surface a `ProblemDetails` error via API/UI.
* Log structured fields: `rekorUuid`, `attestationDigest`, `offlineKitHash`, `failureReason`.([git.stella-ops.org][2])
* **Config flag to soften during rollout**
* When `requireDsse=false`, treat DSSE/Rekor failure as a warning and still allow the import (for initial observation phase), but emit alerts. This mirrors the “observe → enforce” pattern in the DSSE/Rekor operator guide.([git.stella-ops.org][2])
---
### 3.4 Signer & Attestor
You mostly **reuse** existing guidance:([git.stella-ops.org][2])
* Add a new predicate type & schema for offline updates in Signer.
* Ensure Attestor:
* can submit offlineupdate DSSE envelopes to Rekor,
* can emit verification routines (used by CLI and Scanner) that:
* verify the DSSE signature,
* check the certificate chain against the configured root pack (FIPS/eIDAS/GOST/SM, etc.),([Stella Ops][4])
* verify Rekor inclusion using either live log or local snapshot.
* For fully air-gapped installs:
* rely on Rekor **snapshots mirrored** into the Offline Kit (already recommended in the operator guide's offline section).([git.stella-ops.org][2])
---
### 3.5 CLI & UI
Extend CLI with explicit verbs (matching EXPORTATTEST sprints):([git.stella-ops.org][10])
* `stella attest bundle verify --bundle path/to/offline-kit.tgz --rekor-key rekor.pub`
* `stella attest bundle import --bundle ...` (for sites that prefer a two-step “verify then import” flow)
* Wire UI Admin → Offline Kit screen so that:
* verification status shows both **Cosign/JWS** and **DSSE/Rekor** state,
* policy banners display kit generation time, manifest hash, and DSSE/Rekor freshness.([Stella Ops][11])
---
## 4. Determinism & offline-safety rules
When touching any of this code, keep these rules front of mind (they align with the policy DSL and architecture docs):([Stella Ops][4])
1. **No hidden network dependencies**
* All verification **must work offline** given the kit + Rekor snapshots.
* Any fallback to live Rekor / Fulcio endpoints must be explicitly toggled and never on by default for “offline mode”.
2. **Stable serialization**
* DSSE payload JSON:
* stable ordering of fields,
* no float weirdness,
* UTC timestamps.
3. **Replayable imports**
* Running `import-offline-usage-kit` twice with the same bundle must be a no-op after the first time.
* The DSSE payload for a given snapshot must not change over time; if it does, bump the predicate or snapshot version.
4. **Explainability**
* When verification fails, errors must explain **what** mismatched (kit digest, manifest digest, DSSE envelope hash, Rekor inclusion) so auditors can reason about it.
---
## 5. Testing & CI expectations
Tie this into the existing CI workflows (`scanner-determinism.yml`, `attestation-bundle.yml`, `offline-kit` pipelines, etc.):([git.stella-ops.org][12])
### 5.1 Unit & integration tests
Write tests that cover:
1. **Happy paths**
* Full kit import with valid:
* Cosign,
* manifest JWS,
* DSSE,
* Rekor proof (online and offline modes).
2. **Corruption scenarios**
* Tampered feed file (hash mismatch).
* Tampered `offline-manifest.json`.
* Tampered DSSE payload (signature fails).
* Mismatched Rekor entry (payload digest doesn't match DSSE).
3. **Offline scenarios**
* No network access, only Rekor snapshot:
* DSSE verification still passes,
* Rekor proof validates against local tree head.
4. **Rollback logic**
* Import fails at DSSE/Rekor step:
* scanner DB still points at previous feeds,
* metrics/logs show failure and no partial state.
### 5.2 SLOs & observability
Reuse metrics suggested by DSSE/Rekor guide and adapt to OUK imports:([git.stella-ops.org][2])
* `offlinekit_import_total{status="success|failed_dsse|failed_rekor|failed_cosign"}`
* `offlinekit_attestation_verify_latency_seconds` (histogram)
* `attestor_rekor_success_total`, `attestor_rekor_retry_total`, `rekor_inclusion_latency`
* Dashboards: kit versions per environment, time since last kit, DSSE/Rekor health.
---
## 6. Developer checklist (TL;DR)
When you pick up a task touching DSSEsigned offline updates:
1. **Read the background**
* `docs/modules/scanner/operations/dsse-rekor-operator-guide.md`([git.stella-ops.org][2])
* `docs/24_OFFLINE_KIT.md` (and public offline kit guide).([git.stella-ops.org][1])
* Relevant sprint file (`SPRINT_160_export_evidence`, `SPRINT_162_exportcenter_i`, etc.).([git.stella-ops.org][10])
2. **Implement**
* Generate DSSE payloads in Export Center only.
* Call Signer & Attestor; persist DSSE + Rekor JSON next to manifests.
* Extend OUK builder to include attestation bundle and list it in `offline-manifest.json`.
* Update Scanner import flow to verify DSSE/Rekor before swapping feeds.
3. **Test**
* Unit tests for bundle composition & schema.
* Integration tests for import + rollback.
* Determinism tests (same inputs → same DSSE payload).
4. **Wire telemetry**
* Counters + latency histograms.
* Logs with `offlineKitHash`, `attestationDigest`, `rekorUuid`.
5. **Document**
* Update `docs/modules/export-center/architecture.md`, `docs/modules/scanner/architecture.md`, and the OUK docs where flows or contracts changed.([git.stella-ops.org][5])
---
If you tell me which module you're actually coding in next (Scanner, Export Center, CLI, or Attestor), I can turn this into a very concrete “AGENTS.md-style” section with exact file paths, class names, and a starter test layout for that module.
[1]: https://git.stella-ops.org/stella-ops.org/git.stella-ops.org/src/commit/7bf40b8589c94078e8eadb240553c02f097a5127/docs/24_OFFLINE_KIT.md "git.stella-ops.org/24_OFFLINE_KIT.md at 7bf40b8589c94078e8eadb240553c02f097a5127 - git.stella-ops.org - Gitea: Git with a cup of tea"
[2]: https://git.stella-ops.org/stella-ops.org/git.stella-ops.org/src/commit/13e4b53dda1575ba46c6188c794fd465ec6fdeec/docs/modules/scanner/operations/dsse-rekor-operator-guide.md "git.stella-ops.org/dsse-rekor-operator-guide.md at 13e4b53dda1575ba46c6188c794fd465ec6fdeec - git.stella-ops.org - Gitea: Git with a cup of tea"
[3]: https://git.stella-ops.org/stella-ops.org/git.stella-ops.org/raw/commit/61f963fd52cd4d6bb2f86afc5a82eac04c04b00e/docs/implplan/SPRINT_162_exportcenter_i.md "https://git.stella-ops.org/stella-ops.org/git.stel..."
[4]: https://stella-ops.org/docs/07_high_level_architecture/index.html "Open • Sovereign • Modular container security - Stella Ops"
[5]: https://git.stella-ops.org/stella-ops.org/git.stella-ops.org/src/commit/d870da18ce194c6a5f1a6d71abea36205d9fb276/docs/export-center/architecture.md "Export Center Architecture - Stella Ops"
[6]: https://stella-ops.org/docs/moat/ "Open • Sovereign • Modular container security - Stella Ops"
[7]: https://git.stella-ops.org/stella-ops.org/git.stella-ops.org/src/commit/79b8e53441e92dbc63684f42072434d40b80275f/src/ExportCenter "Code - Stella Ops"
[8]: https://stella-ops.org/docs/24_offline_kit/ "Offline Update Kit (OUK) — Air-Gap Bundle - Stella Ops Open ..."
[9]: https://git.stella-ops.org/stella-ops.org/git.stella-ops.org/src/commit/7768555f2d107326050cc5ff7f5cb81b82b7ce5f/AGENTS.md "git.stella-ops.org/AGENTS.md at 7768555f2d107326050cc5ff7f5cb81b82b7ce5f - git.stella-ops.org - Gitea: Git with a cup of tea"
[10]: https://git.stella-ops.org/stella-ops.org/git.stella-ops.org/src/commit/66cb6c4b8af58a33efa1521b7953dda834431497/docs/implplan/SPRINT_160_export_evidence.md "git.stella-ops.org/SPRINT_160_export_evidence.md at ..."
[11]: https://stella-ops.org/about/ "Signed Reachability · Deterministic Replay · Sovereign Crypto"
[12]: https://git.stella-ops.org/stella-ops.org/git.stella-ops.org/actions/?actor=0&status=0&workflow=sdk-publish.yml "Actions - git.stella-ops.org - Gitea: Git with a cup of tea"
---
Here's a simple metric that will make your security UI (and teams) radically better: **Time-to-Evidence (TTE)** — the time from opening a finding to seeing *raw proof* (a data-flow edge, an SBOM line, or a VEX note), not a summary.
---
### What it is
* **Definition:** TTE = `t_first_proof_rendered - t_open_finding`.
* **Proof =** the exact artifact or path that justifies the claim (e.g., `package-lock.json: line 214 → openssl@1.1.1`, `reachability: A → B → C sink`, or `VEX: not_affected due to unreachable code`).
* **Target:** **P95 ≤ 15s** (stretch: P99 ≤ 30s). If 95% of findings show proof within 15 seconds, the UI stays honest: evidence before opinion, low noise, fast explainability.
---
### Why it matters
* **Trust:** People accept decisions they can *verify* quickly.
* **Triage speed:** Proof-first UIs cut back-and-forth and guesswork.
* **Noise control:** If you can't surface proof fast, you probably shouldn't surface the finding yet.
---
### How to measure (engineering-ready)
* Emit two stamps per finding view:
* `t_open_finding` (on route enter or modal open).
* `t_first_proof_rendered` (first DOM paint of SBOM line / path list / VEX clause).
* Store as `tte_ms` in a lightweight events table (Postgres) with tags: `tenant`, `finding_id`, `proof_kind` (`sbom|reachability|vex`), `source` (`local|remote|cache`).
* Nightly rollup: compute P50/P90/P95/P99 by proof_kind and page.
* Alert when **P95 > 15s** for 15 minutes.
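A sketch of the `tte_events` table the hourly rollup below reads from; column names follow the tags listed above, and the exact types are an assumption:
```sql
create table tte_events (
  id         bigint generated always as identity primary key,
  ts         timestamptz not null default now(),
  tenant     text        not null,
  finding_id text        not null,
  proof_kind text        not null check (proof_kind in ('sbom','reachability','vex')),
  source     text        not null check (source in ('local','remote','cache')),
  tte_ms     integer     not null
);

create index on tte_events (ts);
create index on tte_events (proof_kind, ts);
```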
---
### UI contract (keeps the UX honest)
* **Above the fold:** always show a compact **Proof panel** first (not hidden behind tabs).
* **Skeletons over spinners:** reserve space; render partial proof as soon as any piece is ready.
* **Plain text copy affordance:** “Copy SBOM line / path” button right next to the proof.
* **Defer non-proof widgets:** CVSS badges, remediation prose, and charts load *after* proof.
* **Empty-state truth:** if no proof exists, say “No proof available yet” and show the loader for *that* proof type only (don't pretend with summaries).
---
### Backend rules of thumb
* **Pre-index for first paint:** cache top N proof items per hot finding (e.g., first SBOM hit + shortest path).
* **Bound queries:** proof queries must be *O(log n)* on indexed columns (pkg name@version, file hash, graph node id).
* **Chunked streaming:** send the first proof chunk <200ms after backend hit; don't hold for the full set.
* **Timeout budget:** 12s backend budget + 3s UI/render margin = 15s P95.
---
### Minimal contract to add in your code
```ts
// Frontend: fire on open
metrics.emit('finding_open', { findingId, t: performance.now() });
// When the first real proof node/line hits the DOM:
metrics.emit('proof_rendered', { findingId, proofKind, t: performance.now() });
```
```sql
-- Rollup (hourly)
SELECT
proof_kind,
percentile_cont(0.95) WITHIN GROUP (ORDER BY tte_ms) AS p95_ms
FROM tte_events
WHERE ts >= now() - interval '1 hour'
GROUP BY proof_kind;
```
---
### What to put on the team dashboard
* **TTE P95 by page** (Findings list, Finding details).
* **TTE P95 by proof_kind** (sbom / reachability / vex).
* **Error budget burn**: minutes over target per day.
* **Top regressions**: last 7 days vs prior 7.
---
### Acceptance checklist for any finding view
* [ ] First paint shows a real proof snippet (not a summary).
* [ ] Copy proof button works within 1 click.
* [ ] TTE P95 in staging ≤ 10s; in prod ≤ 15s.
* [ ] If proof missing, explicit emptystate + retry path.
* [ ] Telemetry sampled for ≥ 50% of sessions (or 100% for internal).
---
If you want, I can turn this into a **`docs/UX/tte.md`** spec plus a **Grafana panel JSON** and a tiny **.NET middleware** to emit the metrics; just say the word and I'll drop in ready-to-use snippets.
Perfect, got it: let's turn **Time-to-Evidence (TTE)** into a proper set of *“how we track UX health”* guidelines.
I'll treat this like something you can drop into an internal engineering/UX doc.
---
## 1. What exactly is Time-to-Evidence?
**Definition**
> **TTE = t_first_proof_rendered - t_open_finding**
* **t_open_finding**: when the user first opens a finding / detail view (e.g., vulnerability, alert, ticket, log event).
* **t_first_proof_rendered**: when the UI first paints **actual evidence** that backs the finding, for example:
* The SBOM row showing `package@version`.
* The callgraph/dataflow path to a sink.
* A VEX note explaining why something is (not) affected.
* A raw log snippet that the alert is based on.
**Key principle:**
TTE measures **how long users have to trust you blindly** before they can see proof with their own eyes.
---
## 2. UX health goals & targets
Treat TTE like latency SLOs:
* **Primary SLO**:
* **P95 TTE ≤ 15s** for all findings in normal conditions.
* **Stretch SLO**:
* **P99 TTE ≤ 30s** for heavy cases (big graphs, huge SBOMs, cold caches).
* **Guardrail**:
* P50 TTE should be **< 3s**. If the median creeps up, you're in trouble even if P95 looks OK.
You can refine by feature:
* Simple proof (single SBOM row, small payload):
* P95 ≤ 5s.
* Complex proof (reachability graph, crossrepo joins):
* P95 ≤ 15s.
**UX rule of thumb**
* < 2s: feels instant.
* 2–10s: acceptable if clearly loading something heavy.
* > 10s: needs **strong** feedback (progress, partial results, explanations).
* > 30s: the system should probably **offer fallback** (e.g., “download raw evidence” or “retry”).
---
## 3. Instrumentation guidelines
### 3.1 Event model
Emit two core events per finding view:
1. **`finding_open`**
* When user opens the finding details (route enter / modal open).
* Must include:
* `finding_id`
* `tenant_id` / `org_id`
* `user_role` (admin, dev, triager, etc.)
* `entry_point` (list, search, notification, deep link)
* `ui_version` / `build_sha`
2. **`proof_rendered`**
* First time *any* qualifying proof element is painted.
* Must include:
* `finding_id`
* `proof_kind` (`sbom | reachability | vex | logs | other`)
* `source` (`local_cache | backend_api | 3rd_party`)
* `proof_height` (e.g., pixel offset from top) to ensure it's actually above the fold or very close.
**Derived metric**
Your telemetry pipeline should compute:
```text
tte_ms = proof_rendered.timestamp - finding_open.timestamp
```
If there are multiple `proof_rendered` events for the same `finding_open`, use:
* **TTE (first proof)**: the minimum timestamp; this is the primary SLO.
* Optionally: **TTE (full evidence)**: the last proof in a defined “bundle” (e.g., path + SBOM row).
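A sketch of the pairing step, assuming the raw events land in a single hypothetical `ui_events` table with `session_id`, `finding_id`, `event`, and a monotonic `ts_ms` column; "TTE (first proof)" is the earliest proof paint per opened view:
```sql
-- Pair each finding_open with the first proof_rendered for the same view.
select o.session_id,
       o.finding_id,
       min(p.ts_ms) - o.ts_ms as tte_ms
from ui_events o
join ui_events p
  on p.session_id = o.session_id
 and p.finding_id = o.finding_id
 and p.event = 'proof_rendered'
 and p.ts_ms >= o.ts_ms
where o.event = 'finding_open'
group by o.session_id, o.finding_id, o.ts_ms;
```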
### 3.2 Implementation notes
**Frontend**
* Emit `finding_open` as soon as:
* The route is confirmed and
* You know which `finding_id` is being displayed.
* Emit `proof_rendered`:
* **Not** when you *fetch* data, but when at least one evidence component is **visibly rendered**.
* Easiest approach: hook into component lifecycle / intersection observer on the evidence container.
Pseudo-example:
```ts
// On route/mount:
metrics.emit('finding_open', {
findingId,
entryPoint,
userRole,
uiVersion,
t: performance.now()
});
// In EvidencePanel component, after first render with real data:
if (!hasEmittedProof && hasRealEvidence) {
metrics.emit('proof_rendered', {
findingId,
proofKind: 'sbom',
source: 'backend_api',
t: performance.now()
});
hasEmittedProof = true;
}
```
**Backend**
* No special requirement beyond:
* Stable IDs (`finding_id`).
* Knowing which API endpoints respond with evidence payloads — you'll want to correlate backend latency with TTE later.
---
## 4. Data quality & sampling
If you want TTE to drive decisions, the data must be boringly reliable.
**Guidelines**
1. **Sample rate**
* Start with **100%** in staging.
* In production, aim for **≥ 25% of sessions** for TTE events at minimum; 100% is ideal if volume is reasonable.
2. **Clock skew**
* Prefer **frontend timestamps** using `performance.now()` for TTE; they're monotonic within a tab.
* Don't mix backend clocks into the TTE calculation.
3. **Bot / synthetic traffic**
* Tag synthetic tests (`is_synthetic = true`) and exclude them from UX health dashboards.
4. **Retry behavior**
* If the proof fails to load and user hits “retry”:
* Treat it as a separate measurement (`retry = true`) or
* Log an additional `proof_error` event with error class (timeout, 5xx, network, parse, etc.).
---
## 5. Dashboards: how to watch TTE
You want a small, opinionated set of views that answer:
> “Is UX getting better or worse for people trying to understand findings?”
### 5.1 Core widgets
1. **TTE distribution**
* P50 / P90 / P95 / P99 per day (or per release).
* Split by `proof_kind`.
2. **TTE by page / surface**
* Finding list → detail.
* Deep links from notifications.
* Direct URLs / bookmarks.
3. **TTE by user segment**
* New users vs power users.
* Different roles (security engineer vs application dev).
4. **Error budget panel**
* “Minutes over SLO per day”: e.g., the sum of all user-minutes where TTE > 15s.
* Use this to prioritize work.
5. **Correlation with engagement**
* Scatter: TTE vs session length, or TTE vs “user clicked ignore / snooze”.
* Aim to confirm the obvious: **long TTE → worse engagement/completion**.
### 5.2 Operational details
* Update granularity: **real-time or ≤ 15 min** for on-call/ops panels.
* Retention: at least **90 days** to see trends across big releases.
* Breakdowns:
* `backend_region` (to catch regional issues).
* `build_version` (to spot regressions quickly).
---
## 6. UX & engineering design rules anchored in TTE
These are the **behavior rules** for the product that keep TTE healthy.
### 6.1 “Evidence first” layout rules
* **Evidence above the fold**
* At least *one* proof element must be visible **without scrolling** on a typical laptop viewport.
* **Summary second**
* CVSS scores, severity badges, long descriptions: all secondary. Evidence should come *before* opinion.
* **No fake proof**
* Don't use placeholders that *look* like evidence but aren't (e.g., “example path” or generic text).
* If evidence is still loading, show a clear skeleton/loader with “Loading evidence…”.
### 6.2 Loading strategy rules
* Start fetching evidence **as soon as navigation begins**, not after the page is fully mounted.
* Use **lazy loading** for non-critical widgets until after proof is shown.
* If a call is known to be heavy:
* Consider **precomputing** and caching the top evidence (shortest path, first SBOM hit).
* Stream results: render the first proof item as soon as it arrives; don't wait for the full list.
### 6.3 Empty / error state rules
* If there is genuinely no evidence:
* Explicitly say **“No supporting evidence available yet”** and treat TTE as:
* Either “no value” (excluded), or
* A special bucket `proof_kind = "none"`.
* If loading fails:
* Show a clear error and a **retry** that re-emits `proof_rendered` when successful.
* Log `proof_error` with reason; track error rate alongside TTE.
---
## 7. How to *use* TTE in practice
### 7.1 For releases
For any change that affects findings UI or evidence plumbing:
* Add a release checklist item:
* “No regression on TTE P95 for [pages X, Y].”
* During rollout:
* Compare **pre- vs post-release** TTE P95 by `ui_version`.
* If regression > 20%:
* Roll back, or
* Add a follow-up ticket explicitly tagged with the regression.
### 7.2 For experiments / A/B tests
When running UI experiments around findings:
* Always capture TTE per variant.
* Compare:
* TTE P50/P95.
* Task completion rate (e.g., “user changed status”).
* Subjective UX (CSAT) if you have it.
You're looking for patterns like:
* Variant B: **+5% completion**, **+8% TTE** → maybe OK.
* Variant C: **+2% completion**, **+70% TTE** → probably not acceptable.
### 7.3 For prioritization
Use TTE as a lever in planning:
* If P95 TTE is healthy and stable:
* More room for new features / experiments.
* If P95 TTE is trending up for 2+ weeks:
* Time to schedule a “TTE debt” story: caching, query optimization, UI relayout, etc.
---
## 8. Quick “TTEready” checklist
You're “tracking UX health with TTE” if you can honestly tick these:
1. **Instrumentation**
* [ ] `finding_open` + `proof_rendered` events exist and are correlated.
* [ ] TTE computed in a stable pipeline (joins, dedupe, etc.).
2. **Targets**
* [ ] TTE SLOs defined (P95, P99) and agreed by UX + engineering.
3. **Dashboards**
* [ ] A dashboard shows TTE by proof kind, page, and release.
* [ ] On-call / ops can see TTE in near real time.
4. **UX rules**
* [ ] Evidence is visible above the fold for all main finding types.
* [ ] Non-critical widgets load after evidence.
* [ ] Empty/error states are explicit about evidence availability.
5. **Process**
* [ ] Major UI changes check TTE pre vs post as part of release acceptance.
* [ ] Regressions in TTE create real tickets, not just “we'll watch it”.
---
If you tell me what stack you're on (e.g., React + Next.js + OpenTelemetry + X observability tool), I can turn this into concrete code snippets and an example dashboard spec (fields, queries, charts) tailored exactly to your setup.
---
Here's a tight, practical blueprint to turn your SBOM→VEX links into an auditable “proof spine” — using signed DSSE statements and a per-dependency trust anchor — so every VEX verdict can be traced, verified, and replayed.
# What this gives you
* A **chain of evidence** from each SBOM entry → analysis → VEX verdict.
* **Tamper-evident**, DSSE-signed records (offline-friendly).
* **Deterministic replay**: same inputs → same verdicts (great for audits/regulators).
# Core objects (canonical IDs)
* **ArtifactID**: digest of package/container (e.g., `sha256:…`).
* **SBOMEntryID**: stable ID for a component in an SBOM (`sbomDigest:package@version[:purl]`).
* **EvidenceID**: hash of raw evidence (scanner JSON, reachability, exploit intel).
* **ReasoningID**: hash of normalized reasoning (rules/lattice inputs used).
* **VEXVerdictID**: hash of the final VEX statement body.
* **ProofBundleID**: merkle root of {SBOMEntryID, EvidenceID[], ReasoningID, VEXVerdictID}.
* **TrustAnchorID**: per-dependency anchor (public key + policy) used to validate the above.
# Signed DSSE envelopes you'll produce
1. **Evidence Statement** (per evidence item)
* `subject`: SBOMEntryID
* `predicateType`: `evidence.stella/v1`
* `predicate`: source, tool version, timestamps, EvidenceID
* **Signers**: scanner/ingestor key
2. **Reasoning Statement**
* `subject`: SBOMEntryID
* `predicateType`: `reasoning.stella/v1` (your lattice/policy inputs + ReasoningID)
* **Signers**: “Policy/Lattice Engine” key (Authority)
3. **VEX Verdict Statement**
* `subject`: SBOMEntryID
* `predicateType`: CycloneDX or CSAF VEX body + VEXVerdictID
* **Signers**: VEXer key (or vendor key if you have it)
4. **Proof Spine Statement** (the spine itself)
* `subject`: SBOMEntryID
* `predicateType`: `proofspine.stella/v1`
* `predicate`: EvidenceID[], ReasoningID, VEXVerdictID, ProofBundleID
* **Signers**: Authority key
# Trust model (per-dependency anchor)
* **TrustAnchor** (per package/purl): { TrustAnchorID, allowed signers (KMS refs, PKs), accepted predicateTypes, policy version, revocation list }.
* Store anchors in **Authority** and pin them in your graph by SBOMEntryID→TrustAnchorID.
* Optional: PQC mode (Dilithium/Falcon) for long-term archives.
# Verification pipeline (deterministic)
1. Resolve SBOMEntryID → TrustAnchorID.
2. Verify every DSSE envelope's signature **against the anchor's allowed keys**.
3. Recompute EvidenceID/ReasoningID/VEXVerdictID from raw content; compare hashes.
4. Recompute ProofBundleID (merkle root) and compare to the spine.
5. Emit a **Receipt**: {ProofBundleID, verification log, tool digests}. Cache it.
# Storage layout (Postgres + blob store)
* `sbom_entries(entry_id PK, bom_digest, purl, version, artifact_digest, trust_anchor_id)`
* `dsse_envelopes(env_id PK, entry_id, predicate_type, signer_keyid, body_hash, envelope_blob_ref, signed_at)`
* `spines(entry_id PK, bundle_id, evidence_ids[], reasoning_id, vex_id, anchor_id, created_at)`
* `trust_anchors(anchor_id PK, purl_pattern, allowed_keyids[], policy_ref, revoked_keys[])`
* Blobs (immutable): raw evidence, normalized reasoning JSON, VEX JSON, DSSE bytes.
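A minimal DDL sketch for three of the tables in the shorthand above; types and constraints are illustrative, not a fixed schema:
```sql
create table trust_anchors (
  anchor_id      text primary key,
  purl_pattern   text not null,
  allowed_keyids text[] not null,
  policy_ref     text not null,
  revoked_keys   text[] not null default '{}'
);

create table sbom_entries (
  entry_id        text primary key,
  bom_digest      text not null,
  purl            text not null,
  version         text not null,
  artifact_digest text not null,
  trust_anchor_id text not null references trust_anchors(anchor_id)
);

create table spines (
  entry_id     text primary key references sbom_entries(entry_id),
  bundle_id    text not null,               -- ProofBundleID (merkle root)
  evidence_ids text[] not null,
  reasoning_id text not null,
  vex_id       text not null,
  anchor_id    text not null references trust_anchors(anchor_id),
  created_at   timestamptz not null default now()
);
```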
# API surface (clean and small)
* `POST /proofs/:entry/spine` → submit or update spine (idempotent by ProofBundleID)
* `GET /proofs/:entry/receipt` → full verification receipt (JSON)
* `GET /proofs/:entry/vex` → the verified VEX body
* `GET /anchors/:anchor` → fetch trust anchor (for offline kits)
# Normalization rules (so hashes are stable)
* Canonical JSON (UTF-8, sorted keys, no insignificant whitespace).
* Strip volatile fields (timestamps that aren't part of the semantic claim).
* Version your schemas: `evidence.stella/v1`, `reasoning.stella/v1`, etc.
# Signing keys & rotation
* Keep keys in your **Authority** module (KMS/HSM; offline export for air-gap).
* Publish key material via an **attestation feed** (or Rekor mirror) for third-party audit.
* Rotate by **adding** new allowed_keyids in the TrustAnchor; never mutate old envelopes.
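Rotation as a data change, per the rule above; a sketch against the `trust_anchors` table, with `:new_keyid` / `:anchor_id` as placeholders:
```sql
-- Add a newly allowed key id; existing envelopes stay untouched.
update trust_anchors
set allowed_keyids = array_append(allowed_keyids, :new_keyid)
where anchor_id = :anchor_id
  and not (:new_keyid = any(allowed_keyids));
```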
# CI/CD hooks
* On SBOM ingest → create/refresh SBOMEntry rows + attach TrustAnchor.
* On scan completion → produce Evidence Statements (DSSE) immediately.
* On policy evaluation → produce Reasoning + VEX, then assemble Spine.
* Gate releases on `GET /proofs/:entry/receipt` == PASS.
# UX (auditorfriendly)
* **Proof timeline** per entry: SBOM → Evidence tiles → Reasoning → VEX → Receipt.
* One-click “Recompute & Compare” to show deterministic replay passes.
* Red/amber flags when a signature no longer matches a TrustAnchor or a key is revoked.
# Minimal dev checklist
* [ ] Implement canonicalizers (Evidence, Reasoning, VEX).
* [ ] Implement DSSE sign/verify (ECDSA + optional PQC).
* [ ] TrustAnchor registry + resolver by purl pattern.
* [ ] Merkle bundling to get ProofBundleID.
* [ ] Receipt generator + verifier.
* [ ] Postgres schema + blob GC (content-addressed).
* [ ] CI gates + API endpoints above.
* [ ] Auditor UI: timeline + diff + receipts download.
If you want, I can drop in a ready-to-use JSON schema set (`evidence.stella/v1`, `reasoning.stella/v1`, `proofspine.stella/v1`) and sample DSSE envelopes wired to your .NET 10 stack.
Heres a focused **Stella Ops Developer Guidelines** doc, specifically for the pipeline that turns **SBOM data into verifiable proofs** (your SBOM → Evidence → Reasoning → VEX → Proof Spine).
Feel free to paste this into your internal handbook and tweak names to match your repos/services.
---
# Stella Ops Developer Guidelines
## Turning SBOM Data Into Verifiable Proofs
---
## 1. Mental Model: What You're Actually Building
For every component in an SBOM, Stella must be able to answer, *“Why should anyone trust our VEX verdict for this dependency, today and ten years from now?”*
We do that with a pipeline:
1. **SBOM Ingest**
Raw SBOM → validated → normalized → `SBOMEntryID`.
2. **Evidence Collection**
Scans, feeds, configs, reachability, etc. → canonical evidence blobs → `EvidenceID` → DSSE-signed.
3. **Reasoning / Policy**
Policy + evidence → deterministic reasoning → `ReasoningID` → DSSE-signed.
4. **VEX Verdict**
VEX statement (CycloneDX / CSAF) → canonicalized → `VEXVerdictID` → DSSE-signed.
5. **Proof Spine**
`{SBOMEntryID, EvidenceIDs[], ReasoningID, VEXVerdictID}` → merkle bundle → `ProofBundleID` → DSSE-signed.
6. **Verification & Receipts**
Re-run verification → `Receipt` that proves everything above is intact and anchored to trusted keys.
Everything you do in this area should keep this spine intact and verifiable.
---
## 2. NonNegotiable Invariants
These are the rules you don't break without an explicit, company-level decision:
1. **Immutability of Signed Facts**
* DSSE envelopes (evidence, reasoning, VEX, spines) are append-only.
* You never edit or delete content inside a previously signed envelope.
* Corrections are made by **superseding** (new statement pointing at the old one).
2. **Determinism**
* Same `{SBOMEntryID, Evidence set, policyVersion}` ⇒ same `{ReasoningID, VEXVerdictID, ProofBundleID}`.
* No non-deterministic inputs (e.g., “current time”, random IDs) in anything that affects IDs or verdicts.
3. **Traceability**
* Every VEX verdict must be traceable back to:
* The precise SBOM entry
* Concrete evidence blobs
* A specific policy & reasoning snapshot
* A trust anchor defining allowed signers
4. **Least Trust / Least Privilege**
* Services only know the keys and data they need.
* Trust is always explicit: through **TrustAnchors** and signature verification, never “because its in our DB”.
5. **Backwards Compatibility**
* New code must continue to verify **old proofs**.
* New policies must **not rewrite history**; they produce *new* spines, leaving old ones intact.
---
## 3. SBOM Ingestion Guidelines
**Goal:** Turn arbitrary SBOMs into stable, addressable `SBOMEntryID`s and safe internal models.
### 3.1 Inputs & Formats
* Support at least:
* CycloneDX (JSON)
* SPDX (JSON / Tag-Value)
* For each ingested SBOM, store:
* Raw SBOM bytes (immutable, content-addressed)
* A normalized internal representation (your own model)
### 3.2 IDs
* Generate:
* `sbomDigest` = hash(raw SBOM, canonical form)
* `SBOMEntryID` = `sbomDigest + purl + version` (or equivalent stable tuple)
* `SBOMEntryID` must:
* Not depend on ingestion time or database IDs.
* Be reproducible from SBOM + deterministic normalization.
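As a concrete illustration of that reproducibility requirement, here is a minimal sketch in Python (the production code lives in the .NET services, so treat this as an illustration, not the normative encoding; the canonicalization shortcut and the `sha256:` prefix format are assumptions for the example):
```python
import hashlib
import json

def sbom_digest(raw_sbom_bytes: bytes) -> str:
    # Approximate canonical form: re-serialize the parsed JSON with sorted keys
    # and no extra whitespace. The real normalization rules belong to the ingest layer.
    parsed = json.loads(raw_sbom_bytes)
    canonical = json.dumps(parsed, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "sha256:" + hashlib.sha256(canonical).hexdigest()

def sbom_entry_id(sbom_digest_value: str, purl: str, version: str) -> str:
    # Stable tuple -> stable ID: no timestamps, no database sequence numbers.
    tuple_form = "\n".join([sbom_digest_value, purl, version]).encode("utf-8")
    return "sha256:" + hashlib.sha256(tuple_form).hexdigest()
```
Re-running ingestion on the same SBOM bytes and the same `(purl, version)` tuple always yields the same `SBOMEntryID`, which is exactly the property the rules above demand.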
### 3.3 Validation & Errors
* Validate:
* Syntax (JSON, schema)
* Core semantics (package identifiers, digests, versions)
* If invalid:
* Reject the SBOM **but** record a small DSSE “failure attestation” explaining:
* Why it failed
* Which file
* Which system version
* This still gives you a proof trail for “we tried and it failed”.
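For illustration only, a failure attestation payload might look like the sketch below; the predicate type, field names, and values are invented for the example, not a fixed schema:
```python
# Illustrative payload; "ingest-failure.stella/v1" and all values are hypothetical.
failure_attestation = {
    "predicateType": "ingest-failure.stella/v1",
    "subject": {"sbomFile": "app-image.cdx.json"},          # which file failed
    "predicate": {
        "reason": "schema-validation-failed",                # why it failed
        "details": "missing required field: components[3].purl",
        "ingestServiceVersion": "1.4.2",                     # which system version
    },
}
# Canonicalize, hash, and DSSE-sign this exactly like regular evidence so the
# "we tried and it failed" trail is itself verifiable.
```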
---
## 4. Evidence Collection Guidelines
**Goal:** Capture all inputs that influence the verdict in a canonical, signed form.
Typical evidence types:
* SCA / vuln scanner results
* CVE feeds & advisory data
* Reachability / call graph analysis
* Runtime context (where this component is used)
* Manual assessments (e.g., security engineer verdicts)
### 4.1 Evidence Canonicalization
For every evidence item:
* Normalize to a schema like `evidence.stella/v1` with fields such as:
* `source` (scanner name, feed)
* `sourceVersion` (tool version, DB version)
* `collectionTime`
* `sbomEntryId`
* `vulnerabilityId` (if applicable)
* `rawFinding` (or pointer to it)
* Canonical JSON rules:
* Sorted keys
* UTF-8, no extraneous whitespace
* No volatile fields beyond what's semantically needed (e.g., if you include `collectionTime`, be aware that it changes the hash and treat that trade-off deliberately).
Then:
* Compute `EvidenceID = hash(canonicalEvidenceJson)`.
* Wrap in DSSE:
* `subject`: `SBOMEntryID`
* `predicateType`: `evidence.stella/v1`
* `predicate`: canonical evidence + `EvidenceID`.
* Sign with **evidence-ingestor key** (per environment).
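A minimal sketch of the canonicalize → hash → DSSE-wrap steps, assuming the field list above. The envelope shape is simplified (real DSSE signs the PAE encoding of payload type and payload), the media type and all values are placeholders, and signing is delegated to whatever KMS/HSM-backed signer the environment provides:
```python
import base64
import hashlib
import json

def canonical_json(obj: dict) -> bytes:
    # Section 4.1 rules: sorted keys, UTF-8, no extraneous whitespace.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")

def evidence_id(evidence_body: dict) -> str:
    return "sha256:" + hashlib.sha256(canonical_json(evidence_body)).hexdigest()

def dsse_envelope(statement: dict, sign) -> dict:
    # Simplified envelope; real DSSE signs PAE(payloadType, payload).
    payload = canonical_json(statement)
    return {
        "payloadType": "application/vnd.stella.statement+json",  # placeholder media type
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [sign(payload)],
    }

evidence_body = {                                   # illustrative values throughout
    "source": "example-scanner",
    "sourceVersion": "1.2.3",
    "collectionTime": "2025-11-30T10:00:00Z",       # included deliberately; it affects the hash
    "sbomEntryId": "sha256:<entry id from 3.2>",
    "vulnerabilityId": "CVE-2025-0001",
    "rawFinding": {"severity": "HIGH"},
}
statement = {
    "subject": evidence_body["sbomEntryId"],
    "predicateType": "evidence.stella/v1",
    "predicate": {**evidence_body, "evidenceId": evidence_id(evidence_body)},
}
envelope = dsse_envelope(statement, sign=lambda p: {"keyid": "<evidence-ingestor key>", "sig": "<sig>"})
```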
### 4.2 Ops Rules
* **Idempotency:**
Re-running the same scan with same inputs should produce the same evidence object and `EvidenceID`.
* **Tool changes:**
When the tool version or configuration changes, that's a *new* evidence statement with a new `EvidenceID`. Do not overwrite old evidence.
* **Partial failure:**
If a scan fails, produce a minimal failure evidence record (with error details) instead of “nothing”.
---
## 5. Reasoning & Policy Engine Guidelines
**Goal:** Turn evidence into a defensible, replayable reasoning step with a clear policy version.
### 5.1 Reasoning Object
Define a canonical reasoning schema, e.g. `reasoning.stella/v1`:
* `sbomEntryId`
* `evidenceIds[]` (sorted)
* `policyVersion`
* `inputs`: normalized form of all policy inputs (severity thresholds, lattice rules, etc.)
* `intermediateFindings`: optional but useful — e.g., “reachable vulns = …”
Then:
* Canonicalize JSON and compute `ReasoningID = hash(canonicalReasoning)`.
* Wrap in DSSE:
* `subject`: `SBOMEntryID`
* `predicateType`: `reasoning.stella/v1`
* `predicate`: canonical reasoning + `ReasoningID`.
* Sign with **Policy/Authority key**.
### 5.2 Determinism
* Reasoning functions must be **pure**:
* Inputs: SBOMEntryID, evidence set, policy version, configuration.
* No hidden calls to external APIs at decision time (fetch feeds earlier and record them as evidence).
* If you need “current time” in policy:
* Treat it as **explicit input** and record it inside reasoning under `inputs.currentEvaluationTime`.
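A sketch of what “pure” looks like in code: every input the verdict depends on arrives as an argument, including evaluation time. The function name, dataclass, and the toy reachability rule are placeholders, not the real policy engine API:
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReasoningInputs:
    sbom_entry_id: str
    evidence_ids: tuple[str, ...]   # sorted by the caller
    policy_version: str
    current_evaluation_time: str    # explicit input, never read from the clock inside

def evaluate(inputs: ReasoningInputs, evidence_by_id: dict[str, dict]) -> dict:
    # Pure function: the result depends only on its arguments. No feed fetches,
    # no clock reads, no randomness; those happened earlier and arrive as evidence.
    reachable = sorted(
        ev["vulnerabilityId"]
        for eid in inputs.evidence_ids
        for ev in (evidence_by_id[eid],)
        if ev.get("rawFinding", {}).get("reachable") is True
    )
    return {
        "sbomEntryId": inputs.sbom_entry_id,
        "evidenceIds": list(inputs.evidence_ids),
        "policyVersion": inputs.policy_version,
        "inputs": {"currentEvaluationTime": inputs.current_evaluation_time},
        "intermediateFindings": {"reachableVulns": reachable},
    }
```
Canonicalizing and hashing the returned object gives `ReasoningID`; because the function is pure, any replay with the same inputs reproduces the same ID.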
### 5.3 Policy Evolution
* When policy changes:
* Bump `policyVersion`.
* New evaluations produce new `ReasoningID` and new VEX/spines.
* Don't retroactively apply new policy to old reasoning objects; generate new ones alongside.
---
## 6. VEX Verdict Guidelines
**Goal:** Generate VEX statements that are strongly tied to SBOM entries and your reasoning.
### 6.1 Shape
* Target standard formats:
* CycloneDX VEX
* or CSAF
* Required linkages:
* Component reference = `SBOMEntryID` or a resolvable component identifier from your SBOM normalization layer.
* Vulnerability IDs (CVE, GHSA, internal IDs).
* Status (`not_affected`, `affected`, `fixed`, etc.).
* Justification & impact.
### 6.2 Canonicalization & Signing
* Define a canonical VEX body schema (subset of the standard + internal metadata):
* `sbomEntryId`
* `vulnerabilityId`
* `status`
* `justification`
* `policyVersion`
* `reasoningId`
* Canonicalize JSON → `VEXVerdictID = hash(canonicalVexBody)`.
* DSSE-envelope:
* `subject`: `SBOMEntryID`
* `predicateType`: e.g. `cdx-vex.stella/v1`
* `predicate`: canonical VEX + `VEXVerdictID`.
* Sign with **VEXer key** or vendor key (depending on trust anchor).
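The same canonicalize-and-hash pattern from section 4 applies to the VEX body; a short sketch, with all values purely illustrative:
```python
import hashlib
import json

vex_body = {                                      # illustrative values
    "sbomEntryId": "sha256:<entry id>",
    "vulnerabilityId": "CVE-2025-0001",
    "status": "not_affected",
    "justification": "code_not_reachable",
    "policyVersion": "2025.11",
    "reasoningId": "sha256:<reasoning id>",
}
canonical = json.dumps(vex_body, sort_keys=True, separators=(",", ":")).encode("utf-8")
vex_verdict_id = "sha256:" + hashlib.sha256(canonical).hexdigest()
# The DSSE predicate then carries vex_body plus vex_verdict_id, under
# predicateType "cdx-vex.stella/v1", signed with the VEXer (or vendor) key.
```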
### 6.3 External VEX
* When importing vendor VEX:
* Verify the signature against the vendor's TrustAnchor.
* Canonicalize to your internal schema but preserve:
* Original document
* Original signature material
* Record “source = vendor” vs “source = stella” so auditors see origin.
---
## 7. Proof Spine Guidelines
**Goal:** Build a compact, tamper-evident “bundle” that ties everything together.
### 7.1 Structure
For each `SBOMEntryID`, gather:
* `EvidenceIDs[]` (sorted lexicographically).
* `ReasoningID`.
* `VEXVerdictID`.
Compute:
* Merkle tree root (or deterministic hash) over:
* `sbomEntryId`
* sorted `EvidenceIDs[]`
* `ReasoningID`
* `VEXVerdictID`
* Result is `ProofBundleID`.
Create a DSSE “spine”:
* `subject`: `SBOMEntryID`
* `predicateType`: `proofspine.stella/v1`
* `predicate`:
* `evidenceIds[]`
* `reasoningId`
* `vexVerdictId`
* `policyVersion`
* `proofBundleId`
* Sign with **Authority key**.
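One possible way to compute `ProofBundleID` is a small Merkle root over the sorted leaves, sketched below; whether you use a true Merkle tree or a flat deterministic hash matters less than fixing the leaf order and documenting it (the function name and encoding here are assumptions):
```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def proof_bundle_id(sbom_entry_id: str,
                    evidence_ids: list[str],
                    reasoning_id: str,
                    vex_verdict_id: str) -> str:
    # Leaves in a fixed, documented order: entry, sorted evidence, reasoning, VEX.
    leaves = [sbom_entry_id, *sorted(evidence_ids), reasoning_id, vex_verdict_id]
    nodes = [_h(leaf.encode("utf-8")) for leaf in leaves]
    # Pairwise-combine until one root remains (duplicate the last node on odd levels).
    while len(nodes) > 1:
        if len(nodes) % 2 == 1:
            nodes.append(nodes[-1])
        nodes = [_h(nodes[i] + nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return "sha256:" + nodes[0].hex()
```
Any change to any leaf (new evidence, new policy version, different verdict) changes the root, which is exactly why spines are append-only.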
### 7.2 Ops Rules
* Spine generation is idempotent:
* Same inputs → same `ProofBundleID`.
* Never mutate existing spines; new policy or new evidence ⇒ new spine.
* Keep a clear API contract:
* `GET /proofs/:entry` returns **all** spines, each labeled with `policyVersion` and timestamps.
---
## 8. Storage & Schema Guidelines
**Goal:** Keep proofs queryable forever without breaking verification.
### 8.1 Tables (conceptual)
* `sbom_entries`: `entry_id`, `bom_digest`, `purl`, `version`, `artifact_digest`, `trust_anchor_id`.
* `dsse_envelopes`: `env_id`, `entry_id`, `predicate_type`, `signer_keyid`, `body_hash`, `envelope_blob_ref`, `signed_at`.
* `spines`: `entry_id`, `proof_bundle_id`, `policy_version`, `evidence_ids[]`, `reasoning_id`, `vex_verdict_id`, `anchor_id`, `created_at`.
* `trust_anchors`: `anchor_id`, `purl_pattern`, `allowed_keyids[]`, `policy_ref`, `revoked_keys[]`.
### 8.2 Schema Changes
Always follow:
1. **Expand**
* Add new columns/tables.
* Make new code tolerant of old data.
2. **Backfill**
* Idempotent jobs that fill in new IDs/fields without touching old DSSE payloads.
3. **Contract**
* Only after all code uses the new model.
* Never drop the raw DSSE or raw SBOM blobs.
---
## 9. Verification & Receipts
**Goal:** Make it trivial (for you, customers, and regulators) to recheck everything.
### 9.1 Verification Flow
Given `SBOMEntryID` or `ProofBundleID`:
1. Fetch spine and trust anchor.
2. Verify:
* Spine DSSE signature against the TrustAnchor's allowed keys.
* VEX, reasoning, and evidence DSSE signatures.
3. Recompute:
* `EvidenceIDs` from stored canonical evidence.
* `ReasoningID` from reasoning.
* `VEXVerdictID` from VEX body.
* `ProofBundleID` from the above.
4. Compare to stored IDs.
Emit a **Receipt**:
* `proofBundleId`
* `verifiedAt`
* `verifierVersion`
* `anchorId`
* `result` (pass/fail, with reasons)
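A sketch of the verification flow plus receipt emission. `verify_dsse`, `recompute_id`, and `recompute_bundle_id` are stand-ins for the real signature-check and canonicalization routines from sections 4–7, and the receipt fields mirror the list above:
```python
from datetime import datetime, timezone

def verify_spine(spine: dict, envelopes: dict, trust_anchor: dict,
                 verify_dsse, recompute_id, recompute_bundle_id) -> dict:
    reasons = []

    # Steps 1-2: verify every envelope against the TrustAnchor's allowed keys.
    to_check = [envelopes["spine"], envelopes["vex"], envelopes["reasoning"]]
    to_check += [envelopes["evidence"][eid] for eid in spine["evidenceIds"]]
    for env in to_check:
        if not verify_dsse(env, trust_anchor["allowed_keyids"]):
            reasons.append(f"signature check failed for {env['payloadType']}")

    # Steps 3-4: recompute IDs from the stored canonical bodies and compare.
    for eid in spine["evidenceIds"]:
        if recompute_id(envelopes["evidence"][eid]) != eid:
            reasons.append(f"EvidenceID mismatch: {eid}")
    if recompute_id(envelopes["reasoning"]) != spine["reasoningId"]:
        reasons.append("ReasoningID mismatch")
    if recompute_id(envelopes["vex"]) != spine["vexVerdictId"]:
        reasons.append("VEXVerdictID mismatch")
    if recompute_bundle_id(spine) != spine["proofBundleId"]:
        reasons.append("ProofBundleID mismatch")

    # The receipt may carry wall-clock time because it is not part of the
    # deterministic ID chain, only a record of this particular check.
    return {
        "proofBundleId": spine["proofBundleId"],
        "verifiedAt": datetime.now(timezone.utc).isoformat(),
        "verifierVersion": "verifier/0.1.0",          # illustrative
        "anchorId": trust_anchor["anchor_id"],
        "result": "pass" if not reasons else "fail",
        "reasons": reasons,
    }
```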
### 9.2 Offline Kit
* Provide a minimal CLI (`stella verify`) that:
* Accepts a bundle export (SBOM + DSSE envelopes + anchors).
* Verifies everything without network access.
Developers must ensure:
* Export format is documented and stable.
* All fields required for verification are included.
---
## 10. Security & Key Management (for Devs)
* Keys live in **KMS/HSM**, not env vars or config files.
* Separate keysets:
* `dev`, `staging`, `prod`
* Authority vs VEXer vs Evidence Ingestor.
* TrustAnchors:
* Edit via Authority service only.
* Every change:
* Requires a code-reviewed change.
* Writes an audit log entry.
Never:
* Log private keys.
* Log full DSSE envelopes in plaintext logs (log IDs and hashes instead).
---
## 11. Observability & On-Call Expectations
### 11.1 Metrics
For the SBOM→Proof pipeline, expose:
* `sboms_ingested_total`
* `sbom_ingest_errors_total{reason}`
* `evidence_statements_created_total`
* `reasoning_statements_created_total`
* `vex_statements_created_total`
* `proof_spines_created_total`
* `proof_verifications_total{result}` (pass/fail reason)
* Latency histograms per stage (`_duration_seconds`)
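Roughly how those counters map onto a Prometheus client, shown in Python purely for illustration (the metric names come from the list above; the histogram name, label values, and call sites are assumptions, and the real services would instrument via their own .NET metrics stack):
```python
from prometheus_client import Counter, Histogram

SBOMS_INGESTED = Counter("sboms_ingested_total", "SBOMs successfully ingested")
INGEST_ERRORS = Counter("sbom_ingest_errors_total", "SBOM ingest failures", ["reason"])
VERIFICATIONS = Counter("proof_verifications_total", "Proof verification outcomes", ["result"])
STAGE_DURATION = Histogram("proof_stage_duration_seconds", "Per-stage latency", ["stage"])

# Example call sites
SBOMS_INGESTED.inc()
INGEST_ERRORS.labels(reason="schema-validation-failed").inc()
VERIFICATIONS.labels(result="pass").inc()
with STAGE_DURATION.labels(stage="evidence").time():
    pass  # run the evidence-collection stage here
```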
### 11.2 Logging
Include in structured logs wherever relevant:
* `sbomEntryId`
* `proofBundleId`
* `anchorId`
* `policyVersion`
* `requestId` / `traceId`
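One way to keep these fields consistent is to attach them as structured fields at the logging layer; the formatter and logger name below are an illustrative, stdlib-only sketch, not the real logging setup:
```python
import json
import logging

class JsonFormatter(logging.Formatter):
    # Emits one JSON object per line, carrying the correlation fields from 11.2.
    FIELDS = ("sbomEntryId", "proofBundleId", "anchorId", "policyVersion", "traceId")

    def format(self, record: logging.LogRecord) -> str:
        entry = {"level": record.levelname, "message": record.getMessage()}
        for field in self.FIELDS:
            if hasattr(record, field):
                entry[field] = getattr(record, field)
        return json.dumps(entry, sort_keys=True)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("stella.proofs")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("spine created", extra={"sbomEntryId": "sha256:<entry>", "policyVersion": "2025.11"})
```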
### 11.3 Runbooks
You should maintain runbooks for at least:
* “Pipeline is stalled” (backlog of SBOMs, evidence, or spines).
* “Verification failures increased”.
* “Trust anchor or key issues” (rotation, revocation, misconfiguration).
* “Backfill gone wrong” (how to safely stop, resume, and audit).
---
## 12. Dev Workflow & PR Checklist (SBOM→Proof Changes Only)
When your change touches SBOM ingestion, evidence, reasoning, VEX, or proof spines, check:
* [ ] IDs (`SBOMEntryID`, `EvidenceID`, `ReasoningID`, `VEXVerdictID`, `ProofBundleID`) remain **deterministic** and fully specified.
* [ ] No mutation of existing DSSE envelopes or historical proof data.
* [ ] Schema changes follow **expand → backfill → contract**.
* [ ] New/updated TrustAnchors reviewed by Authority owner.
* [ ] Unit tests cover:
* Canonicalization for any new/changed predicate.
* ID computation.
* [ ] Integration test covers:
* SBOM → Evidence → Reasoning → VEX → Spine → Verification → Receipt.
* [ ] Observability updated:
* New paths emit logs & metrics.
* [ ] Rollback plan documented (especially for migrations & policy changes).
---
If you tell me which microservices/repos map to these stages (e.g. `stella-sbom-ingest`, `stella-proof-authority`, `stella-vexer`), I can turn this into a more concrete, service-by-service checklist with example API contracts and class/interface sketches.