
Here's the full writeup you can drop into your repo as the canonical reference for Epic 1. It's written in clean product-doc style so it's safe to check in as Markdown. No fluff, just everything you need to build, test, and police it.


Epic 1: Aggregation-Only Contract (AOC) Enforcement

Short name: AOC enforcement
Services touched: Conseiller (advisory ingestion), Excitator (VEX ingestion), Web API, Workers, Policy Engine, CLI, Console, Authority
Data stores: MongoDB primary, optional Redis/NATS for jobs


1) What it is

Aggregation-Only Contract (AOC) is the ingestion covenant for StellaOps. It defines a hard boundary between collection and interpretation:

  • Ingestion (Conseiller/Excitator) only collects data and preserves it as immutable raw facts with provenance. It does not decide, merge, normalize, prioritize, or assign severity. It may compute links that help future joins (aliases, PURLs, CPEs), but never derived judgments.
  • Policy evaluation is the only place where merges, deduplication, consensus, severity computation, and status folding are allowed. Policy evaluation is reproducible and traceable.

The AOC establishes:

  • Immutable raw stores: advisory_raw and vex_raw documents with full provenance, signatures, checksums, and upstream identifiers.
  • Linksets: machine-generated join hints (aliases, PURLs, CPEs, CVE/GHSA IDs) that never change the underlying source content.
  • Invariants: a strict set of “never do this in ingestion” rules enforced by schema validation, runtime guards, and CI checks.
  • AOC Verifier: a build-time and runtime watchdog that blocks non-compliant code and data writes.

This epic delivers: schemas, guards, error codes, APIs, tests, migration, docs, and ops dashboards to make AOC non-negotiable across the platform.


2) Why

AOC makes results auditable, deterministic, and organization-specific. Source vendors disagree; your policies decide. By removing hidden heuristics from ingestion, we avoid unexplainable risk changes, race conditions between collectors, and vendor bias. Policy-time evaluation yields reproducible deltas with complete “why” traces.


3) How it should work (deep details)

3.1 Core invariants

The following must be true for every write to advisory_raw and vex_raw and for every ingestion pipeline:

  1. No severity in ingestion

    • Forbidden fields: severity, cvss, cvss_vector, effective_status, effective_range, merged_from, consensus_provider, reachability, asset_criticality, risk_score.
  2. No merges or dedups in ingestion

    • No combining two upstream advisories into one. No picking a single truth when multiple VEX statements exist.
  3. Provenance is mandatory

    • Every raw doc includes provenance and signature/checksum.
  4. Idempotent upserts

    • Same upstream document (by upstream_id + source + content_hash) must not create duplicates.
  5. Append-only versioning

    • Revisions from the source create new immutable documents with supersedes pointers; no in-place edits.
  6. Linkset only

    • Ingestion can compute and store a linkset for join performance. Linkset does not alter or infer severity/status.
  7. Policy-time only for effective findings

    • Only the Policy Engine can write effective_finding_* materializations.
  8. Schema safety

    • Strict JSON schema validation at DB level; writes containing unknown fields are rejected.
  9. Clock discipline

    • Timestamps are UTC, monotonic within a batch; collectors record fetched_at and received_at.
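
A minimal sketch of how invariants 1 and 3 might be checked before a write, assuming the candidate document is a plain dictionary. The function name `check_invariants` and the `REQUIRED_PROVENANCE` tuple are illustrative assumptions; the forbidden field list and error codes are the ones given in this document.

```python
from typing import Any

# Forbidden fields listed under invariant 1.
FORBIDDEN_KEYS = frozenset({
    "severity", "cvss", "cvss_vector", "effective_status", "effective_range",
    "merged_from", "consensus_provider", "reachability", "asset_criticality",
    "risk_score",
})

# Assumption: the minimum provenance this sketch insists on (invariant 3).
REQUIRED_PROVENANCE = ("upstream_id", "content_hash", "fetched_at", "received_at")


def check_invariants(doc: dict[str, Any]) -> list[str]:
    """Return the AOC error codes a candidate raw document would trigger."""
    errors: list[str] = []
    # Invariant 1: no derived fields at the top level of the document.
    if FORBIDDEN_KEYS.intersection(doc):
        errors.append("ERR_AOC_001")
    # Invariant 3: source and upstream provenance must be present.
    upstream = doc.get("upstream") or {}
    if "source" not in doc or any(k not in upstream for k in REQUIRED_PROVENANCE):
        errors.append("ERR_AOC_004")
    return errors
```

Only top-level keys are inspected here, so a CVSS value that sits inside content.raw passes untouched, which matches the rule that upstream data is preserved but never promoted.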

3.2 Data model

3.2.1 advisory_raw (Mongo collection)

{
  "_id": "advisory_raw:osv:GHSA-xxxx-....:v3",
  "source": {
    "vendor": "OSV",
    "stream": "github", 
    "api": "https://api.osv.dev/v1/.../GHSA-...",
    "collector_version": "conseiller/1.7.3"
  },
  "upstream": {
    "upstream_id": "GHSA-xxxx-....",
    "document_version": "2024-09-01T12:13:14Z",
    "fetched_at": "2025-01-02T03:04:05Z",
    "received_at": "2025-01-02T03:04:06Z",
    "content_hash": "sha256:...",
    "signature": {
      "present": true,
      "format": "dsse",
      "key_id": "rekor:.../key/abc",
      "sig": "base64..."
    }
  },
  "content": {
    "format": "OSV",
    "spec_version": "1.6",
    "raw": { /* full upstream JSON, unmodified */ }
  },
  "identifiers": {
    "cve": ["CVE-2023-1234"],
    "ghsa": ["GHSA-xxxx-...."],
    "aliases": ["CVE-2023-1234", "GHSA-xxxx-...."]
  },
  "linkset": {
    "purls": ["pkg:npm/lodash@4.17.21", "pkg:maven/..."],
    "cpes": ["cpe:2.3:a:..."],
    "references": [
      {"type":"advisory","url":"https://..."},
      {"type":"fix","url":"https://..."}
    ],
    "reconciled_from": ["content.raw.affected.ranges", "content.raw.pkg"]
  },
  "supersedes": "advisory_raw:osv:GHSA-xxxx-....:v2",
  "tenant": "default"
}

Note: No severity, no cvss, no effective_*. If the upstream payload includes CVSS, it stays inside content.raw and is not promoted or normalized at ingestion.
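
As a rough illustration of a deterministic linkset mapper, the sketch below reads PURLs and references from an OSV-style payload (`affected[].package.purl` and `references[]` are fields of the OSV schema). The function name `build_linkset` is hypothetical, CPE extraction is omitted, and reference types are passed through as the upstream wrote them; whether they should be lowercased as in the example document is left open here.

```python
from typing import Any


def build_linkset(raw: dict[str, Any]) -> dict[str, list]:
    """Derive join hints from an OSV-style payload without modifying it."""
    # A set removes duplicate PURLs that appear under several affected entries.
    purls = {
        affected["package"]["purl"]
        for affected in raw.get("affected", [])
        if affected.get("package", {}).get("purl")
    }
    references = [
        {"type": ref.get("type", ""), "url": ref["url"]}
        for ref in raw.get("references", [])
        if "url" in ref
    ]
    return {
        # Sorting keeps the output identical across runs for the same input.
        "purls": sorted(purls),
        "references": sorted(references, key=lambda r: (r["type"], r["url"])),
    }
```

The mapper never reads the upstream severity or CVSS data, so nothing of that kind can leak into the linkset.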

3.2.2 vex_raw (Mongo collection)

{
  "_id": "vex_raw:vendorX:doc-123:v4",
  "source": {
    "vendor": "VendorX",
    "stream": "vex",
    "api": "https://.../vex/doc-123",
    "collector_version": "excitator/0.9.2"
  },
  "upstream": {
    "upstream_id": "doc-123",
    "document_version": "2025-01-15T08:09:10Z",
    "fetched_at": "2025-01-16T01:02:03Z",
    "received_at": "2025-01-16T01:02:03Z",
    "content_hash": "sha256:...",
    "signature": { "present": true, "format": "cms", "key_id": "kid:...", "sig": "..." }
  },
  "content": {
    "format": "CycloneDX-VEX",   // or "CSAF-VEX"
    "spec_version": "1.5",
    "raw": { /* full upstream VEX */ }
  },
  "identifiers": {
    "statements": [
      {
        "advisory_ids": ["CVE-2023-1234","GHSA-..."],
        "component_purls": ["pkg:deb/openssl@1.1.1"],
        "status": "not_affected",
        "justification": "component_not_present"
      }
    ]
  },
  "linkset": {
    "purls": ["pkg:deb/openssl@1.1.1"],
    "cves": ["CVE-2023-1234"],
    "ghsas": ["GHSA-..."]
  },
  "supersedes": "vex_raw:vendorX:doc-123:v3",
  "tenant": "default"
}

VEX statuses remain as raw facts. No cross-provider consensus is computed here.

3.3 Database validation

  • MongoDB JSON Schema validators on both collections:

    • Reject forbidden fields at the top level.
    • Enforce presence of source, upstream, content, linkset, tenant.
    • Enforce string formats for timestamps and hashes.
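
One possible shape for the advisory_raw validator, written as a Python dictionary so it can be inspected and tested. The field list follows the example document in 3.2.1; the regular expressions, the optional status of identifiers and supersedes, and the constant name `ADVISORY_RAW_VALIDATOR` are assumptions of this sketch.

```python
# Sketch of a MongoDB $jsonSchema validator for advisory_raw.
ADVISORY_RAW_VALIDATOR = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["_id", "source", "upstream", "content", "linkset", "tenant"],
        # With additionalProperties disabled, forbidden and unknown top-level
        # fields are rejected alike; _id must then be declared explicitly.
        "additionalProperties": False,
        "properties": {
            "_id": {"bsonType": "string"},
            "source": {"bsonType": "object"},
            "upstream": {
                "bsonType": "object",
                "required": ["upstream_id", "fetched_at", "received_at", "content_hash"],
                "properties": {
                    # Assumption: UTC timestamps with second precision.
                    "fetched_at": {"bsonType": "string",
                                   "pattern": r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$"},
                    "received_at": {"bsonType": "string",
                                    "pattern": r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$"},
                    # Assumption: hashes are hex-encoded SHA-256 digests.
                    "content_hash": {"bsonType": "string",
                                     "pattern": r"^sha256:[0-9a-f]{64}$"},
                },
            },
            "content": {"bsonType": "object", "required": ["format", "raw"]},
            "identifiers": {"bsonType": "object"},
            "linkset": {"bsonType": "object"},
            "supersedes": {"bsonType": "string"},
            "tenant": {"bsonType": "string"},
        },
    }
}
```

A dictionary like this would be supplied as the collection's `validator` option when the collection is created or modified. Because no forbidden field appears under `properties`, the schema alone is enough to reject them at the top level.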

3.4 Write paths

  1. Collector fetches upstream

    • Normalize transport (gzip/json), compute content_hash, verify signature if available.
  2. Build raw doc

    • Populate source, upstream, content.raw, identifiers, linkset.
  3. Idempotent upsert

    • Lookup by (source.vendor, upstream.upstream_id, upstream.content_hash). If exists, skip; if new content hash, insert new revision with supersedes.
  4. AOC guard

    • Runtime interceptor inspects write payload; if any forbidden field detected, reject with ERR_AOC_001.
  5. Metrics

    • Emit ingestion_write_total{result=ok} or ingestion_write_total{result=reject}, with the reason code on rejects (metric names as listed in 3.11).
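
The idempotent upsert in step 3 can be sketched with an in-memory list standing in for the collection. `RawStore`, its method names, and the way the `_id` is assembled are assumptions made for illustration; a real implementation would more likely lean on the unique index from 3.8 than on a scan.

```python
import hashlib
import json
from typing import Any


class RawStore:
    """In-memory stand-in for advisory_raw, used only to illustrate the flow."""

    def __init__(self) -> None:
        self.docs: list[dict[str, Any]] = []

    def upsert(self, vendor: str, upstream_id: str, payload: bytes) -> str:
        """Insert a new revision, or skip when identical content already exists."""
        content_hash = "sha256:" + hashlib.sha256(payload).hexdigest()
        revisions = [
            d for d in self.docs
            if d["source"]["vendor"] == vendor
            and d["upstream"]["upstream_id"] == upstream_id
        ]
        # Same (vendor, upstream_id, content_hash): nothing to write.
        if any(d["upstream"]["content_hash"] == content_hash for d in revisions):
            return "skipped"
        doc: dict[str, Any] = {
            # Assumption: version suffix is the revision count for this upstream doc.
            "_id": f"advisory_raw:{vendor.lower()}:{upstream_id}:v{len(revisions) + 1}",
            "source": {"vendor": vendor},
            "upstream": {"upstream_id": upstream_id, "content_hash": content_hash},
            "content": {"raw": json.loads(payload)},
        }
        # New content for a known upstream doc: append, pointing at the previous revision.
        if revisions:
            doc["supersedes"] = revisions[-1]["_id"]
        self.docs.append(doc)
        return "inserted"
```

Replaying the same payload is a no-op, while changed content produces a new document whose supersedes field points at the prior revision; existing documents are never edited.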

3.5 Read paths (ingestion scope)

  • Allow only listing, getting raw docs, and searching by linkset. No endpoints return “effective findings” from ingestion services.

3.6 Error codes

| Code | Meaning | HTTP |
| --- | --- | --- |
| ERR_AOC_001 | Forbidden field present (severity/consensus/normalized data) | 400 |
| ERR_AOC_002 | Merge attempt detected (multiple upstreams fused) | 400 |
| ERR_AOC_003 | Idempotency violation (duplicate without supersedes) | 409 |
| ERR_AOC_004 | Missing provenance fields | 422 |
| ERR_AOC_005 | Signature/checksum mismatch | 422 |
| ERR_AOC_006 | Attempt to write effective findings from ingestion context | 403 |
| ERR_AOC_007 | Unknown top-level fields (schema violation) | 400 |

3.7 AOC Verifier

A build-time and runtime safeguard:

  • Static checks (CI)

    • Block imports of *.Policy* or *.Merge* from ingestion modules.
    • AST lint rule: any write to advisory_raw or vex_raw setting a forbidden key fails the build.
  • Runtime checks

    • Repository layer interceptor inspects documents before insert/update; rejects forbidden fields and multi-source merges.
  • Drift detection job

    • Nightly job scans newest N docs; if violation found, pages ops and blocks new pipeline runs.
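
To illustrate the AST lint idea, here is a small sketch that uses Python's standard `ast` module on Python source. The ingestion services may well be written in another language, so treat this as a demonstration of the technique rather than the project's linter; `find_forbidden_writes` is a hypothetical name, and the check assumes Python 3.9 or later.

```python
import ast

FORBIDDEN_KEYS = {
    "severity", "cvss", "cvss_vector", "effective_status", "effective_range",
    "merged_from", "consensus_provider", "reachability", "asset_criticality",
    "risk_score",
}


def find_forbidden_writes(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, key) pairs where a forbidden key is written."""
    hits: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Dict):
            # Dict literals such as {"severity": "high"}.
            keys = node.keys
        elif isinstance(node, ast.Subscript) and isinstance(node.ctx, ast.Store):
            # Subscript assignments such as doc["cvss"] = 9.8.
            keys = [node.slice]
        else:
            continue
        for key in keys:
            if isinstance(key, ast.Constant) and key.value in FORBIDDEN_KEYS:
                hits.append((key.lineno, key.value))
    return sorted(hits)
```

The rule is deliberately conservative: it flags any dictionary literal or subscript store with a forbidden key, whether or not the value ends up in advisory_raw or vex_raw, and it ignores reads. A CI stage would fail the build when the returned list is non-empty for any file in an ingestion module.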

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

3.8 Indexing strategy

  • advisory_raw:

    • { "identifiers.cve": 1 }, { "identifiers.ghsa": 1 }, { "linkset.purls": 1 }, { "source.vendor": 1, "upstream.upstream_id": 1, "upstream.content_hash": 1 } (unique), { "tenant": 1 }.
  • vex_raw:

    • { "identifiers.statements.advisory_ids": 1 }, { "linkset.purls": 1 }, { "source.vendor": 1, "upstream.upstream_id": 1, "upstream.content_hash": 1 } (unique), { "tenant": 1 }.

3.9 Interaction with Policy Engine

  • Policy Engine pulls raw docs by identifiers/linksets and computes:

    • Dedup/merge per policy
    • Consensus for VEX statements
    • Severity normalization and risk scoring
  • Writes only to effective_finding_{policyId} collections.

A dedicated write guard refuses effective_finding_* writes from any caller that is not the Policy Engine service identity.
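
A minimal sketch of such a write guard, assuming the caller's service identity is available as a string at the repository layer. The identity value `"policy-engine"`, the `AocError` class, and the function name are hypothetical; only the ERR_AOC_006 code and the collection prefix come from this document.

```python
class AocError(Exception):
    """Error carrying one of the ERR_AOC_00x codes."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


# Assumption: the identity string carried by the Policy Engine's service token.
POLICY_ENGINE_IDENTITY = "policy-engine"


def guard_collection_write(collection: str, caller_identity: str) -> None:
    """Raise ERR_AOC_006 when a non-Policy-Engine caller targets effective findings."""
    if (collection.startswith("effective_finding_")
            and caller_identity != POLICY_ENGINE_IDENTITY):
        raise AocError("ERR_AOC_006",
                       f"{caller_identity!r} may not write {collection}")
```

Calling `guard_collection_write("effective_finding_P-7", "conseiller")` raises, while the same call with the Policy Engine identity, or any write to advisory_raw, returns silently.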

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

3.10 Migration plan

  1. Freeze ingestion writes except raw passthrough.
  2. Backfill: copy existing ingestion collections to _backup_*.
  3. Strip forbidden fields from raw copies, move them into a temporary advisory_view_legacy used only by Policy Engine for parity.
  4. Enable DB schema validators.
  5. Run collectors in dry-run; ensure only allowed keys land.
  6. Switch Policy Engine to pull exclusively from *_raw and to compute everything else.
  7. Delete legacy normalized fields in ingestion codepaths.
  8. Enable runtime guards and CI lint.

3.11 Observability

  • Metrics:

    • aoc_violation_total{code=...}, ingestion_write_total{result=ok|reject}, ingestion_signature_verified_total{result=ok|fail}, ingestion_latency_seconds, advisory_revision_count.
  • Tracing: span ingest.fetch, ingest.transform, ingest.write, aoc.guard.

  • Logs: include tenant, source.vendor, upstream.upstream_id, content_hash, correlation_id.

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

3.12 Security and tenancy

  • Every raw doc carries a tenant field.
  • Authority enforces advisory:ingest and vex:ingest scopes for ingestion endpoints.
  • Cross-tenant reads/writes are blocked by default.
  • Secrets never logged; signatures verified with pinned trust stores.

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

3.13 CLI and Console behavior

  • CLI

    • stella sources ingest --dry-run prints would-write payload and explicitly shows that no severity/status fields are present.
    • stella aoc verify scans last K documents and reports violations with exit codes.
  • Console

    • Sources dashboard shows AOC pass/fail per job, most recent violation codes, and a drill-down to the offending document.

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.


4) API surface (ingestion scope)

4.1 Conseiller (Advisories)

  • POST /ingest/advisory

    • Body: raw upstream advisory with metadata; server constructs document, not the client.
    • Rejections: ERR_AOC_00x per table above.
  • GET /advisories/raw/{id}

  • GET /advisories/raw?cve=CVE-...&purl=pkg:...&tenant=...

  • GET /advisories/raw/{id}/provenance

  • POST /aoc/verify?since=ISO8601 returns summary stats and first N violations.

4.2 Excitator (VEX)

  • POST /ingest/vex
  • GET /vex/raw/{id}
  • GET /vex/raw?advisory_id=CVE-...&purl=pkg:...
  • POST /aoc/verify?since=ISO8601

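All endpoints require a tenant-scoped token and the matching scope from 6.5: advisory:ingest or vex:ingest for writes, advisory:read or vex:read for reads, and aoc:verify for verification.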

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.


5) Example: end-to-end flow

  1. Collector fetches GHSA-1234 from OSV.
  2. Build advisory_raw with linkset PURLs.
  3. Insert; AOC guard approves.
  4. Policy Engine later evaluates SBOM S-42 under policy P-7, reads raw advisory and any VEX raw docs, computes effective findings, and writes to effective_finding_P-7.
  5. CLI stella aoc verify --since 24h returns 0 violations.

6) Implementation tasks

Breakdown by component with exact work items. Each section ends with the imposed rule.

6.1 Conseiller (advisory ingestion, WS + Worker)

  • Add Mongo JSON schema validation for advisory_raw.
  • Implement repository layer with write interceptors that reject forbidden fields.
  • Compute linkset from upstream using deterministic mappers.
  • Enforce idempotency by unique index on (source.vendor, upstream.upstream_id, upstream.content_hash, tenant).
  • Remove any normalization pipelines; relocate to Policy Engine.
  • Add POST /ingest/advisory and GET /advisories/raw* endpoints with Authority scope checks.
  • Emit observability metrics and traces.
  • Unit tests: schema violations, idempotency, supersedes chain, forbidden fields.
  • Integration tests: large batch ingest, linkset correctness against golden fixtures.

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

6.2 Excitator (VEX ingestion, WS + Worker)

  • Add Mongo JSON schema validation for vex_raw.
  • Implement repository layer guard identical to Conseiller.
  • Deterministic linkset extraction for advisory IDs and PURLs.
  • Endpoints POST /ingest/vex, GET /vex/raw* with scopes.
  • Remove any consensus or folding logic; leave VEX statements as raw.
  • Tests as per Conseiller, with rich fixtures for CycloneDX-VEX and CSAF.

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

6.3 Web API shared library

  • Define AOCForbiddenKeys and export for both services.
  • Provide AOCWriteGuard middleware and AOCError types.
  • Provide ProvenanceBuilder utility.
  • Provide SignatureVerifier and Checksum helpers.

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

6.4 Policy Engine

  • Block any import/use from ingestion modules by lint rule.
  • Add hard gate on effective_finding_* writes that verifies caller identity is Policy Engine.
  • Update readers to pull fields only from content.raw, identifiers, linkset, not any legacy normalized fields.

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

6.5 Authority

  • Introduce scopes: advisory:ingest, advisory:read, vex:ingest, vex:read, aoc:verify.
  • Add tenant claim propagation to ingestion services.

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

6.6 CLI

  • stella sources ingest --dry-run and stella aoc verify commands.
  • Exit codes mapping to ERR_AOC_00x.
  • JSON output schema including violation list.
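
The document does not fix the numeric exit codes or the JSON layout, so the following is only one way the mapping could look. The 11 to 17 range, the report field names, and `verify_report` itself are assumptions of this sketch.

```python
import json

# Assumption: exit codes 11-17 mirror ERR_AOC_001 through ERR_AOC_007.
EXIT_CODES = {f"ERR_AOC_{n:03d}": 10 + n for n in range(1, 8)}


def verify_report(violations: list[dict]) -> tuple[int, str]:
    """Return (exit_code, json_report) for violations shaped like
    {"code": "ERR_AOC_001", "document_id": "..."}."""
    counts: dict[str, int] = {}
    for violation in violations:
        counts[violation["code"]] = counts.get(violation["code"], 0) + 1
    # Exit with the lowest-numbered violated code so the status is deterministic;
    # a clean dataset exits with 0.
    exit_code = min((EXIT_CODES[code] for code in counts), default=0)
    report = json.dumps(
        {"violation_count": len(violations), "by_code": counts, "violations": violations},
        sort_keys=True,
    )
    return exit_code, report
```

A single process exit status cannot carry several codes at once, which is why the sketch picks one and leaves the full breakdown to the JSON report.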

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

6.7 Console

  • Sources dashboard tiles: last run, AOC violations, top error codes.
  • Drill-down page rendering offending doc with highlight on forbidden keys.
  • “Verify last 24h” action calling the AOC Verifier endpoint.

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.

6.8 CI/CD

  • AST linter to forbid writes of banned keys in ingestion modules.
  • Unit test coverage gates for AOC guard code.
  • Pipeline stage that runs stella aoc verify against seeded DB snapshots.

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.


7) Documentation changes (create/update these files)

  1. /docs/ingestion/aggregation-only-contract.md

    • Add: philosophy, invariants, schemas for advisory_raw/vex_raw, error codes, linkset definition, examples, idempotency rules, supersedes, API references, migration steps, observability, security.
  2. /docs/architecture/overview.md

    • Update system diagram to show AOC boundary and raw stores; add sequence diagram: fetch → guard → raw insert → policy evaluation.
  3. /docs/architecture/policy-engine.md

    • Clarify ingestion boundary; list inputs consumed from raw; note that any severity/consensus is policy-time only.
  4. /docs/ui/console.md

    • Add Sources dashboard section: AOC tiles and violation drill-down.
  5. /docs/cli/cli-reference.md

    • Add stella aoc verify and stella sources ingest --dry-run usage and exit codes.
  6. /docs/observability/observability.md

    • Document new metrics, traces, logs keys for AOC.
  7. /docs/security/authority-scopes.md

    • Add new scopes and tenancy enforcement for ingestion endpoints.
  8. /docs/deploy/containers.md

    • Note DB validators must be enabled; environment flags for AOC guards; read-only user for verify endpoint.

Each file should include a “Compliance checklist” subsection for AOC.

Imposed rule: Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.


8) Acceptance criteria

  • DB validators are active and reject writes with forbidden fields.
  • AOC runtime guards log and reject violations with correct error codes.
  • CI linter prevents shipping code that writes forbidden keys to raw stores.
  • Ingestion of known fixture sets results in zero normalized fields outside content.raw.
  • Policy Engine is the only writer of effective_finding_* materializations.
  • CLI stella aoc verify returns success on clean datasets and nonzero on seeded violations.
  • Console shows AOC status and violation drill-downs.

9) Risks and mitigations

  • Collector drift: new upstream fields tempt developers to normalize.

    • Mitigation: CI linter + guard + schema validators; require RFC to extend linkset.
  • Performance impact: extra validation on write.

    • Mitigation: guard is O(number of keys) and schema check is bounded; indexes sized appropriately.
  • Migration complexity: moving legacy normalized fields out.

    • Mitigation: temporary advisory_view_legacy for parity; stepwise cutover.
  • Tenant leakage: missing tenant on write.

    • Mitigation: schema requires tenant; middleware injects and asserts.

10) Test plan

  • Unit tests

    • Guard rejects forbidden keys; idempotency; supersedes chain; provenance required.
    • Signature verification paths: good, bad, absent.
  • Property tests

    • Randomized upstream docs never produce forbidden keys at top level.
  • Integration tests

    • Batch ingest of 50k advisories: throughput, zero violations.
    • Mixed VEX sources with contradictory statements remain separate in raw.
  • Contract tests

    • Policy Engine refuses to run without raw inputs; writes only to effective_finding_*.
  • End-to-end

    • Seed SBOM + advisories + VEX; ensure findings are identical pre/post migration.

11) Developer checklists

Definition of Ready

  • Upstream spec reference attached.
  • Linkset mappers defined.
  • Example fixtures added.

Definition of Done

  • DB validators deployed and tested.
  • Runtime guards enabled.
  • CI linter merged and enforced.
  • Docs updated (files in section 7).
  • Metrics visible on dashboard.
  • CLI verify passes.

12) Glossary

  • Raw document: exact upstream content plus provenance, with join hints.
  • Linkset: PURLs/CPEs/IDs extracted to accelerate joins later.
  • Supersedes: pointer from a newer raw doc to the previous revision of the same upstream doc.
  • Policy-time: evaluation phase where merges, consensus, and severity are computed.
  • AOC: Aggregation-Only Contract.

Final imposed reminder

Work of this type or tasks of this type on this component must also be applied everywhere else it should be applied.