git.stella-ops.org/docs/modules/platform/architecture-overview.md
StellaOps Bot 808ab87b21 up
2025-11-30 21:01:00 +02:00


StellaOps Architecture Overview (Sprint 19)

Ownership: Architecture Guild • Docs Guild
Audience: Service owners, platform engineers, solution architects
Related: High-Level Architecture, Concelier Architecture, Policy Engine Architecture, Aggregation-Only Contract

This dossier summarises the end-to-end runtime topology after the Aggregation-Only Contract (AOC) rollout. It highlights where raw facts live, how ingest services enforce guardrails, and how downstream components consume those facts to derive policy decisions and user-facing experiences.


Need a quick orientation? The Developer Quickstart (29-Nov-2025 advisory) captures the core repositories, determinism checks, DSSE conventions, and starter tasks that explain how the platform pieces fit together.

Planner note: the [SBOM→VEX proof blueprint](../product-advisories/29-Nov-2025 - SBOM to VEX Proof Pipeline Blueprint.md) shows the DSSE → Rekor v2 tiles → VEX linkage, so threat-model and compliance teams can copy the capture/verification checkpoints.

Working on a feature? Check the [Implementor Guidelines](../product-advisories/30-Nov-2025 - Implementor Guidelines for Stella Ops.md) to align with the SRS + release playbook checklist before you merge anything into main.

Need to prove Rekor receipts? The [Rekor Receipt Checklist](../product-advisories/30-Nov-2025 - Rekor Receipt Checklist for Stella Ops.md) maps each field to a module owner and explains offline metadata for deterministic re-verification.

Taming unknowns? The [Unknowns Decay & Triage Heuristics](../product-advisories/30-Nov-2025 - Unknowns Decay & Triage Heuristics.md) explains the confidence decay card, triage queue view, and the daily export artifact for planning.

Check the [Ecosystem Reality Test Cases](../product-advisories/30-Nov-2025 - Ecosystem Reality Test Cases for StellaOps.md) for reproducible acceptance tests based on credential leaks, offline DB schema issues, SBOM parity drift, and scanner version divergence.

Need unblocker tasks? The [Standup Sprint Kickstarters](../product-advisories/30-Nov-2025 - Standup Sprint Kickstarters.md) lists three day-0 wins (scanner regressions, Postgres slice, DSSE/Rekor sweep) plus ready-to-copy ticket names.

Comparing evidence, suppression, and audit flows across tools? The [Comparative Evidence Patterns](../product-advisories/30-Nov-2025 - Comparative Evidence Patterns for Stella Ops.md) brief covers how Snyk, GitHub, Aqua, Anchore/Grype, and Prisma Cloud handle them, along with the UX trade-offs.

Evaluating public scanner incidents? The [Ecosystem Test Cases](../product-advisories/30-Nov-2025 - Ecosystem Test Cases for StellaOps.md) document five hardened regressions that you can turn into acceptance tests today. Examples include a Grype credential leak, a Trivy offline schema issue, SBOM parity drift, and Grype instability.

1 · System landscape

```mermaid
graph TD
    subgraph Edge["Clients & Automation"]
        CLI[stella CLI]
        UI[Console SPA]
        APIClients[CI / API Clients]
    end
    Gateway["API Gateway<br/>(JWT + DPoP scopes)"]
    subgraph Scanner["Fact Collection"]
        ScannerWeb[Scanner.WebService]
        ScannerWorkers[Scanner.Workers]
        Agent[Agent Runtime]
    end
    subgraph Ingestion["Aggregation-Only Ingestion (AOC)"]
        Concelier[Concelier.WebService]
        Excititor[Excititor.WebService]
        RawStore[(MongoDB<br/>advisory_raw / vex_raw)]
    end
    subgraph Derivation["Policy & Overlay"]
        Policy[Policy Engine]
        Scheduler[Scheduler Services]
        Notify[Notifier]
    end
    subgraph Experience["UX & Export"]
        UIService[Console Backend]
        Exporters[Export / Offline Kit]
    end
    Observability[Telemetry Stack]

    CLI --> Gateway
    UI --> Gateway
    APIClients --> Gateway
    Gateway --> ScannerWeb
    ScannerWeb --> ScannerWorkers
    ScannerWorkers --> Concelier
    ScannerWorkers --> Excititor
    Concelier --> RawStore
    Excititor --> RawStore
    RawStore --> Policy
    Policy --> Scheduler
    Policy --> Notify
    Policy --> UIService
    Scheduler --> UIService
    UIService --> Exporters
    Exporters --> CLI
    Exporters --> Offline[Offline Kit]
    Observability -.-> ScannerWeb
    Observability -.-> Concelier
    Observability -.-> Excititor
    Observability -.-> Policy
    Observability -.-> Scheduler
    Observability -.-> Notify
```

Key boundaries:

  • AOC border. Everything inside the Ingestion subgraph writes only immutable raw facts plus link hints. Derived severity, consensus, and risk remain outside the border.
  • Policy-only derivation. Policy Engine materialises effective_finding_* collections and emits overlays; other services consume but never mutate them.
  • Tenant enforcement. Authority-issued DPoP scopes flow through Gateway to every service. Every raw-store document and every overlay carries a mandatory tenant field.
  • Hybrid reachability attestations. Scanner/Attestor always publish graph-level DSSE for reachability graphs; optional edge-bundle DSSEs capture high-risk/runtime/init edges. Policy/Signals consume both, with graph DSSE as the minimum bar and edge bundles used for quarantine/dispute flows.
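The hybrid reachability rule can be made concrete with a short example. The Python sketch below shows one way a consumer such as Policy/Signals might gate decisions on the two attestation tiers.

- `ReachabilityEvidence`, `Edge`, and both helper functions are hypothetical names.
- The sketch assumes DSSE signature verification has already run upstream; it records only the outcome.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Tuple

Edge = Tuple[str, str]  # (caller symbol, callee symbol); hypothetical edge identity


@dataclass(frozen=True)
class ReachabilityEvidence:
    """Hypothetical summary of the reachability attestations attached to one graph."""

    graph_dsse_verified: bool  # graph-level DSSE, always published by Scanner/Attestor
    # Edges covered by optional edge-bundle DSSEs (high-risk, runtime, or init edges).
    edge_bundle_edges: FrozenSet[Edge] = field(default_factory=frozenset)


def meets_minimum_bar(ev: ReachabilityEvidence) -> bool:
    """The verified graph-level DSSE is the minimum bar for any reachability claim."""
    return ev.graph_dsse_verified


def can_quarantine_or_dispute(ev: ReachabilityEvidence, edge: Edge) -> bool:
    """Edge-level flows also need a signed edge bundle that covers this specific edge."""
    return meets_minimum_bar(ev) and edge in ev.edge_bundle_edges


if __name__ == "__main__":
    ev = ReachabilityEvidence(True, frozenset({("app.main", "lib.parse")}))
    print(can_quarantine_or_dispute(ev, ("app.main", "lib.parse")))  # True
    print(can_quarantine_or_dispute(ev, ("app.main", "lib.other")))  # False
```

The edge-level check is deliberately a strict superset of the graph-level check. This mirrors the rule that the graph DSSE is the minimum bar and edge bundles only add to it.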

2 · Aggregation-Only Contract focus

2.1 Responsibilities at the boundary

| Area | Services | Responsibilities under AOC | Forbidden under AOC |
| --- | --- | --- | --- |
| Ingestion (Concelier / Excititor) | `StellaOps.Concelier.WebService`, `StellaOps.Excititor.WebService` | Fetch upstream advisories/VEX, verify signatures, compute linksets, append immutable documents to `advisory_raw` / `vex_raw`, emit observability signals, expose raw read APIs. | Computing severity, consensus, suppressions, or policy hints; merging upstream sources into a single derived record; mutating existing documents. |
| Policy & Overlay | `StellaOps.Policy.Engine`, Scheduler | Join SBOM inventory with raw advisories/VEX, evaluate policies, issue `effective_finding_*` overlays, drive remediation workflows. | Writing to raw collections; bypassing guard scopes; running without recorded provenance. |
| Experience layers | Console, CLI, Exporters | Surface raw facts + policy overlays; run `stella aoc verify`; render AOC dashboards and reports. | Accepting ingestion payloads that lack provenance or violate guard results. |

2.2 Raw stores

| Collection | Purpose | Key fields | Notes |
| --- | --- | --- | --- |
| `advisory_raw` | Immutable vendor/ecosystem advisory documents. | `_id`, `tenant`, `source.*`, `upstream.*`, `content.raw`, `linkset`, `supersedes` | Idempotent by (`source.vendor`, `upstream.upstream_id`, `upstream.content_hash`). |
| `vex_raw` | Immutable vendor VEX statements. | Mirrors `advisory_raw`; `identifiers.statements` summarises affected components. | Maintains a supersedes chain identical to the advisory flow. |
| Change streams (`advisory_raw_stream`, `vex_raw_stream`) | Feed Policy Engine and Scheduler. | `operationType`, `documentKey`, `fullDocument`, `tenant`, `traceId` | Filtered per tenant before delivery. |
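The idempotency key can be illustrated with a small in-memory stand-in for `advisory_raw`. This is a sketch, not the Concelier implementation.

- `RawDoc`, `AppendOnlyRawStore`, and the `_id` format are hypothetical.
- The sketch assumes `content_hash` is a SHA-256 over the raw upstream bytes.
- It also assumes uniqueness is enforced within a tenant.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class RawDoc:
    tenant: str
    source_vendor: str
    upstream_id: str
    content_raw: bytes  # upstream payload kept byte-for-byte
    supersedes: Optional[str] = None  # _id of the revision this one replaces

    @property
    def content_hash(self) -> str:
        return "sha256:" + hashlib.sha256(self.content_raw).hexdigest()

    @property
    def idempotency_key(self) -> Tuple[str, str, str]:
        # (source.vendor, upstream.upstream_id, upstream.content_hash)
        return (self.source_vendor, self.upstream_id, self.content_hash)


class AppendOnlyRawStore:
    """In-memory stand-in for advisory_raw: inserts only, no update or delete API."""

    def __init__(self) -> None:
        self._docs: Dict[str, RawDoc] = {}
        # Assumption: the idempotency key is enforced per tenant.
        self._ids_by_key: Dict[Tuple[str, str, str, str], str] = {}

    def append(self, doc: RawDoc) -> Tuple[str, bool]:
        """Return (_id, inserted); re-ingesting identical content is a no-op."""
        key = (doc.tenant,) + doc.idempotency_key
        existing = self._ids_by_key.get(key)
        if existing is not None:
            return existing, False
        # Hypothetical _id format built from the key fields and the full digest.
        doc_id = f"{doc.tenant}/{doc.source_vendor}/{doc.upstream_id}/{doc.content_hash[7:]}"
        self._docs[doc_id] = doc
        self._ids_by_key[key] = doc_id
        return doc_id, True
```

A connector re-running over an unchanged feed therefore creates no new documents. A changed payload yields a new document that can point back to its predecessor via `supersedes`.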

2.3 Guarded ingestion sequence

```mermaid
sequenceDiagram
    participant Upstream as Upstream Source
    participant Connector as Concelier/Excititor Connector
    participant Guard as AOCWriteGuard
    participant Mongo as MongoDB (advisory_raw / vex_raw)
    participant Stream as Change Stream
    participant Policy as Policy Engine

    Upstream-->>Connector: CSAF / OSV / VEX document
    Connector->>Connector: Normalize transport, compute content_hash
    Connector->>Guard: Candidate raw doc (source + upstream + content + linkset)
    alt Guard violation
        Guard-->>Connector: ERR_AOC_00x
    else Document passes guard
        Guard->>Mongo: Append immutable document (with tenant & supersedes)
        Mongo-->>Stream: Change event (tenant scoped)
        Stream->>Policy: Raw delta payload
        Policy->>Policy: Evaluate policies, compute effective findings
    end
```
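A guard of the kind shown in the sequence can be sketched as a pure function that either accepts a candidate or raises. Everything below is an assumption for illustration:

- The field lists, the `AocViolation` type, and the symbolic violation names are hypothetical.
- The real guard reports numbered `ERR_AOC_00x` codes, and this overview does not list their assignments.

```python
from typing import Any, Dict, Set

# Hypothetical guard configuration; the real lists ship with the ingestion services.
REQUIRED_FIELDS = ("tenant", "source", "upstream", "content", "linkset")
FORBIDDEN_FIELDS = ("severity", "consensus", "suppressions", "policy_hints")  # derived data


class AocViolation(Exception):
    """Carries a symbolic code standing in for an ERR_AOC_00x value."""

    def __init__(self, code: str, detail: str) -> None:
        super().__init__(f"{code}: {detail}")
        self.code = code
        self.detail = detail


def guard(candidate: Dict[str, Any], existing_ids: Set[str]) -> None:
    """Raise AocViolation on the first broken rule; return None if the append may proceed."""
    for name in REQUIRED_FIELDS:
        if name not in candidate:
            raise AocViolation("missing_field", name)
    if not candidate["tenant"]:
        raise AocViolation("missing_tenant", "tenant")
    for name in FORBIDDEN_FIELDS:
        if name in candidate:
            raise AocViolation("forbidden_field", name)
    if not candidate["upstream"].get("content_hash"):
        raise AocViolation("missing_provenance", "upstream.content_hash")
    if candidate.get("_id") in existing_ids:
        # Append-only: an existing document may never be overwritten.
        raise AocViolation("append_only", str(candidate["_id"]))
    supersedes = candidate.get("supersedes")
    if supersedes is not None and supersedes not in existing_ids:
        raise AocViolation("dangling_supersedes", supersedes)
```

Keeping the guard free of side effects makes it straightforward to exercise with seeded violation fixtures.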

2.4 Authority scopes & tenancy

| Scope | Holder | Purpose | Notes |
| --- | --- | --- | --- |
| `advisory:ingest` / `vex:ingest` | Concelier / Excititor collectors | Append raw documents through ingestion endpoints. | Paired with tenant claims; requests without a tenant are rejected. |
| `advisory:read` / `vex:read` | DevOps verify identity, CLI | Run `stella aoc verify` or call `/aoc/verify`. | Read-only; cannot mutate raw documents. |
| `effective:write` | Policy Engine | Materialise `effective_finding_*` overlays. | Only the Policy Engine identity may hold this scope; ingestion contexts that attempt these writes receive `ERR_AOC_006`. |
| `findings:read` | Console, CLI, exports | Consume derived findings. | Enforced by Gateway and downstream services. |
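The scope table translates into a simple authorization check. Two behaviours come from the table: the `ERR_AOC_006` outcome for unauthorized overlay writes, and the rejection of requests without a tenant. Everything else in this sketch is hypothetical, including `Caller`, `REQUIRED_SCOPE`, the action names, and the other rejection strings.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Scope required per (hypothetical) action name, following the scope table.
REQUIRED_SCOPE = {
    "ingest_advisory": "advisory:ingest",
    "ingest_vex": "vex:ingest",
    "verify_advisory": "advisory:read",
    "verify_vex": "vex:read",
    "write_effective": "effective:write",
    "read_findings": "findings:read",
}


@dataclass(frozen=True)
class Caller:
    tenant: Optional[str]  # tenant claim from the Authority-issued token
    scopes: FrozenSet[str]


def authorize(caller: Caller, action: str) -> Optional[str]:
    """Return None when allowed, otherwise a rejection reason."""
    if not caller.tenant:
        return "rejected: missing tenant claim"
    needed = REQUIRED_SCOPE[action]
    if needed in caller.scopes:
        return None
    if action == "write_effective":
        # Only the Policy Engine identity holds effective:write.
        return "ERR_AOC_006"
    return f"forbidden: requires {needed}"
```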

3 · Data & control flow highlights

  1. Ingestion: Concelier / Excititor connectors fetch upstream documents, compute linksets, and hand payloads to AOCWriteGuard. Guards validate schema, provenance, forbidden fields, supersedes pointers, and append-only rules before writing to Mongo.
  2. Verification: stella aoc verify (CLI/CI) and /aoc/verify endpoints replay guard checks against stored documents, mapping ERR_AOC_00x codes to exit codes for automation.
  3. Policy evaluation: Mongo change streams deliver tenant-scoped raw deltas. Policy Engine joins SBOM inventory (via BOM Index), executes deterministic policies, writes overlays, and emits events to Scheduler/Notify.
  4. Experience surfaces: Console renders an AOC dashboard showing ingestion latency, guard violations, and supersedes depth. CLI exposes raw-document fetch helpers for auditing. Offline Kit bundles raw collections alongside guard configs to keep air-gapped installs verifiable.
  5. Observability: All services emit ingestion_write_total, aoc_violation_total{code}, ingestion_latency_seconds, and trace spans ingest.fetch, ingest.transform, ingest.write, aoc.guard. Logs correlate via traceId, tenant, source.vendor, and content_hash.
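Step 2 can be sketched as a replay loop with two outputs. It tallies violations per code (the same breakdown `aoc_violation_total{code}` exposes) and derives a single exit code for CI. The following are all assumptions, since this overview does not document the CLI's actual mapping:

- the `verify` function itself;
- the "highest mapped exit code wins" rule;
- the fallback exit code 1;
- the example code-to-exit mapping.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Optional, Tuple


def verify(
    docs: Iterable[dict],
    check: Callable[[dict], Optional[str]],
    exit_codes: Dict[str, int],
) -> Tuple[int, Counter]:
    """Replay `check` over stored documents.

    Returns (exit code, violation count per code). Exit code 0 means no violations;
    otherwise the highest mapped exit code wins and unmapped codes fall back to 1.
    """
    counts: Counter = Counter()
    for doc in docs:
        code = check(doc)
        if code is not None:
            counts[code] += 1
    if not counts:
        return 0, counts
    return max(exit_codes.get(code, 1) for code in counts), counts


if __name__ == "__main__":
    # Placeholder codes and exit values, for illustration only.
    stored = [{"err": None}, {"err": "ERR_AOC_001"}, {"err": "ERR_AOC_006"}]
    rc, counts = verify(stored, lambda d: d["err"], {"ERR_AOC_001": 11, "ERR_AOC_006": 16})
    print(rc, dict(counts))  # 16 {'ERR_AOC_001': 1, 'ERR_AOC_006': 1}
```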

4 · Offline & disaster readiness

  • Offline Kit: Packages raw Mongo snapshots (advisory_raw, vex_raw) plus guard configuration and CLI verifier binaries so air-gapped sites can re-run AOC checks before promotion.
  • Recovery: Supersedes chains allow rollback to prior revisions without mutating documents. Disaster exercises must rehearse restoring from snapshot, replaying change streams into Policy Engine, and re-validating guard compliance.
  • Migration: Legacy normalised fields are moved to temporary views during cutover. The ingestion runtime stops writing them once the guard-enforced path is live (see the Migration playbook).
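Rolling back via supersedes chains never rewrites a document; it only changes which revision a reader selects. The minimal sketch below assumes each document is reduced to an `_id → supersedes` mapping. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Set

# _id -> _id of the revision it supersedes (None for the first revision).
Chain = Dict[str, Optional[str]]


def heads(docs: Chain) -> Set[str]:
    """Latest revisions: documents that no other document supersedes."""
    superseded = {prev for prev in docs.values() if prev is not None}
    return set(docs) - superseded


def history(docs: Chain, head: str) -> List[str]:
    """Walk supersedes pointers from `head` back to the first revision (newest first)."""
    out: List[str] = []
    seen: Set[str] = set()
    cur: Optional[str] = head
    while cur is not None:
        if cur in seen:
            raise ValueError(f"supersedes cycle at {cur}")
        seen.add(cur)
        out.append(cur)
        cur = docs[cur]
    return out


def rollback_target(docs: Chain, head: str, steps_back: int) -> str:
    """Pick an earlier revision to serve; no stored document is modified."""
    return history(docs, head)[steps_back]
```

Cycle detection matters here because a corrupted pointer would otherwise loop forever during a recovery drill.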

5 · Replay CAS & deterministic bundles

  • Replay CAS: Content-addressed storage lives under `cas://replay/<sha256-prefix>/<digest>.tar.zst`. Writers must use StellaOps.Replay.Core helpers to ensure lexicographic file ordering, POSIX mode normalisation (0644/0755), LF newlines, zstd level 19 compression, and shard-by-prefix CAS URIs (BuildCasUri). Bundle metadata (size, hash, created) feeds the platform-wide replay_bundles collection defined in docs/data/replay_schema.md.
  • Artifacts: Each recorded scan stores three bundles:
    1. manifest.json (canonical JSON, hashed and signed via DSSE).
    2. inputbundle.tar.zst (feeds, policies, tools, environment snapshot).
    3. outputbundle.tar.zst (SBOM, findings, VEX, logs, Merkle proofs).
    Every artifact is signed with multi-profile keys (FIPS, GOST, SM, etc.) managed by Authority. See docs/replay/DETERMINISTIC_REPLAY.md §2-§5 for the full schema.
  • Reachability subtree: When reachability recording is enabled, Scanner uploads graphs & runtime traces under `cas://replay/<scan-id>/reachability/graphs/` and `cas://replay/<scan-id>/reachability/traces/`. Manifest references (StellaOps.Replay.Core) bind these URIs along with analyzer hashes so Replay + Signals can rehydrate explainability evidence deterministically.
  • Storage tiers: Primary storage is Mongo (replay_runs, replay_subjects) plus the CAS bucket. Evidence Locker mirrors bundles for long-term retention and legal hold workflows (docs/modules/evidence-locker/architecture.md). Offline kits package bundles under `offline/replay/<scan-id>` with detached DSSE envelopes for air-gapped verification.
  • APIs & ownership: Scanner WebService produces the bundles via record mode, Scanner Worker emits Merkle metadata, Signer/Authority provide DSSE signatures, Attestor anchors manifests to Rekor, CLI/Evidence Locker handle retrieval, and Docs Guild maintains runbooks. Responsibilities are tracked in docs/implplan/SPRINT_185_shared_replay_primitives.md through SPRINT_187_evidence_locker_cli_integration.md.
  • Operational policies: Retention defaults to 180 days for hot CAS storage and 2 years for cold Evidence Locker copies. Rotation and pruning follow the checklist in docs/runbooks/replay_ops.md.
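The determinism rules for replay bundles can be approximated with the Python standard library. This sketch is not `StellaOps.Replay.Core`, and it differs from the real pipeline in several assumed ways:

- `build_deterministic_tar`, `build_cas_uri`, `canonical_json`, and `dsse_pae` are hypothetical helpers.
- zstd level 19 compression is omitted to keep the sketch dependency-free.
- The two-character shard prefix is an assumption.
- `canonical_json` is a simplified sorted-key encoding, not a full canonicalization scheme.

`dsse_pae` follows the DSSE v1 pre-authentication encoding, the byte string a signer computes over the manifest before signing it.

```python
import hashlib
import io
import json
import tarfile
from typing import Dict, FrozenSet


def build_deterministic_tar(files: Dict[str, bytes], executables: FrozenSet[str] = frozenset()) -> bytes:
    """Uncompressed tar whose bytes depend only on file names, contents, and exec bits."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        for name in sorted(files):  # lexicographic file ordering
            data = files[name].replace(b"\r\n", b"\n")  # LF newlines; assumes text payloads
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mode = 0o755 if name in executables else 0o644  # POSIX mode normalisation
            info.mtime = 0  # no wall-clock timestamps
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def build_cas_uri(bundle: bytes, prefix_len: int = 2) -> str:
    """cas://replay/<sha256-prefix>/<digest>.tar.zst for the final bundle bytes.

    In the real pipeline `bundle` would be the zstd-compressed archive.
    """
    digest = hashlib.sha256(bundle).hexdigest()
    return f"cas://replay/{digest[:prefix_len]}/{digest}.tar.zst"


def canonical_json(obj: object) -> bytes:
    """Simplified canonical form: sorted keys, no insignificant whitespace, UTF-8."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def dsse_pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE v1 pre-authentication encoding: "DSSEv1" SP len(type) SP type SP len(body) SP body."""
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(t), t, len(payload), payload)
```

Consider bundling the same inputs twice, once with a different dictionary order and once with CRLF instead of LF line endings. Both runs yield the same CAS URI, which is the property replay depends on.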

6 · References


7 · Compliance checklist

  • AOC guard enabled for all Concelier and Excititor write paths in production.
  • Mongo schema validators deployed for advisory_raw and vex_raw; change streams scoped per tenant.
  • Authority scopes (advisory:*, vex:*, effective:*) configured in Gateway and validated via integration tests.
  • stella aoc verify wired into CI/CD pipelines with seeded violation fixtures.
  • Console AOC dashboard and CLI documentation reference the new ingestion contract.
  • Offline Kit bundles include guard configs, verifier tooling, and documentation updates.
  • Observability dashboards include violation, latency, and supersedes depth metrics with alert thresholds.

Last updated: 2025-11-03 (Replay planning refresh).