# Data Schemas & Persistence Contracts

*Audience* – backend developers, plug‑in authors, DB admins.

*Scope* – describes the **Redis**, **MongoDB** (optional), and on‑disk blob shapes that power Stella Ops.

---

## 0 Document Conventions

* **camelCase** for JSON keys.
* All timestamps are **RFC 3339 / ISO 8601** with `Z` (UTC).
* `⭑` = planned but *not* shipped yet (kept on Feature Matrix “To Do”).

---

## 1 SBOM Wrapper Envelope

Every SBOM blob (regardless of format) is stored on disk or in object storage with a *sidecar* JSON file that indexes it for the scanners.

#### 1.1 JSON Shape

```jsonc
{
  "id": "sha256:417f…",          // digest of the SBOM *file* itself
  "imageDigest": "sha256:e2b9…", // digest of the original container image
  "created": "2025-07-14T07:02:13Z",
  "format": "trivy-json-v2",     // NEW enum: trivy-json-v2 | spdx-json | cyclonedx-json
  "layers": [
    "sha256:d38b…",              // layer digests (ordered)
    "sha256:af45…"
  ],
  "partial": false,              // true => delta SBOM (only some layers)
  "provenanceId": "prov_0291"    // ⭑ link to SLSA attestation (Q1‑2026)
}
```

*`format`* **NEW** – added to support **multiple SBOM formats**.

*`partial`* **NEW** – true when generated via the **delta SBOM** flow (§1.3).

#### 1.2 File‑system Layout

```
blobs/
 ├─ 417f…                # digest prefix
 │   ├─ sbom.json        # payload (any format)
 │   └─ sbom.meta.json   # wrapper (shape above)
```

> **Note** – blob storage can point at S3, MinIO, or plain disk; driver plug‑ins adapt.

#### 1.3 Delta SBOM Extension

When `partial: true`, *only* the missing layers have been scanned. Merging logic inside the `scanning` module stitches the new data onto the cached full SBOM in Redis.
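The delta flow described above can be sketched as follows. This is a minimal illustration, not the `scanning` module's actual logic: `plan_delta_scan` and the layer digests are hypothetical, a plain Python set stands in for the Redis delta cache, and the rule used for `partial` (some but not all layers needed scanning) is an assumption.

```python
from typing import Dict, Iterable, List, Set, Union


def plan_delta_scan(
    image_layers: Iterable[str], cached_layers: Set[str]
) -> Dict[str, Union[List[str], bool]]:
    """Work out which layers still need an SBOM, preserving image order."""
    ordered = list(image_layers)
    missing = [digest for digest in ordered if digest not in cached_layers]
    # Assumption: a scan counts as "partial" only when some, but not all,
    # of the image's layers had to be scanned.
    partial = 0 < len(missing) < len(ordered)
    return {"missing": missing, "partial": partial}


# Hypothetical digests; the set plays the role of the cached-layer lookup.
plan = plan_delta_scan(
    ["sha256:layer-a", "sha256:layer-b", "sha256:layer-c"],
    cached_layers={"sha256:layer-a", "sha256:layer-b"},
)
print(plan)  # {'missing': ['sha256:layer-c'], 'partial': True}
```

Keeping the layer order intact matters because the wrapper stores `layers` as an ordered list.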
---

## 2 Redis Keyspace

| Key pattern             | Type    | TTL                       | Purpose                                                          |
|-------------------------|---------|---------------------------|------------------------------------------------------------------|
| `scan:<digest>`         | string  | ∞                         | Last scan JSON result (as returned by `/scan`)                   |
| `layers:<digest>`       | set     | 90d                       | Layers already possessing SBOMs (delta cache)                    |
| `policy:active`         | string  | ∞                         | YAML **or** Rego ruleset                                         |
| `quota:<token>`         | string  | *until next UTC midnight* | Per‑token scan counter for Free tier ({{ quota_token }} scans).  |
| `policy:history`        | list    | ∞                         | Change audit IDs (see Mongo)                                     |
| `feed:nvd:json`         | string  | 24h                       | Normalised feed snapshot                                         |
| `locator:<imageDigest>` | string  | 30d                       | Maps image digest → sbomBlobId                                   |
| `metrics:…`             | various | —                         | Prom / OTLP runtime metrics                                      |

> **Delta SBOM** uses `layers:*` to skip work in <20 ms.
>
> **Quota enforcement** increments `quota:<token>` atomically; once the {{ quota_token }}‑scan limit is exceeded, the API returns **429**.

---

## 3 MongoDB Collections (Optional)

Only enabled when `MONGO_URI` is supplied (for long‑term audit).

| Collection        | Shape (summary)                                    | Indexes                     |
|-------------------|----------------------------------------------------|-----------------------------|
| `sbom_history`    | Wrapper JSON + `replaceTs` on overwrite            | `{imageDigest}` `{created}` |
| `policy_versions` | `{_id, yaml, rego, authorId, created}`             | `{created}`                 |
| `attestations` ⭑  | SLSA provenance doc + Rekor log pointer            | `{imageDigest}`             |
| `audit_log`       | Fully rendered RFC 5424 entries (UI & CLI actions) | `{userId}` `{ts}`           |

Scheduler samples live under `samples/api/scheduler/` (e.g., `schedule.json`, `run.json`, `impact-set.json`, `audit.json`) and mirror the canonical serializer output shown in §3.1.

Schema detail for **policy_versions**:
```jsonc
{
  "_id": "6619e90b8c5e1f76",
  "yaml": "version: 1.0\nrules:\n - …",
  "rego": null,                 // filled when Rego uploaded
  "authorId": "u_1021",
  "created": "2025-07-14T08:15:04Z",
  "comment": "Imported via API"
}
```

### 3.1 Scheduler Sprint 16 Artifacts

**Collections.** `schedules`, `runs`, `impact_snapshots`, `audit` (module‑local). All documents reuse the canonical JSON emitted by `StellaOps.Scheduler.Models` so agents and fixtures remain deterministic.

#### 3.1.1 Schedule (`schedules`)

```jsonc
{
  "_id": "sch_20251018a",
  "tenantId": "tenant-alpha",
  "name": "Nightly Prod",
  "enabled": true,
  "cronExpression": "0 2 * * *",
  "timezone": "UTC",
  "mode": "analysis-only",
  "selection": {
    "scope": "by-namespace",
    "namespaces": ["team-a", "team-b"],
    "repositories": ["app/service-api"],
    "includeTags": ["canary", "prod"],
    "labels": [{"key": "env", "values": ["prod", "staging"]}],
    "resolvesTags": true
  },
  "onlyIf": {"lastReportOlderThanDays": 7, "policyRevision": "policy@42"},
  "notify": {"onNewFindings": true, "minSeverity": "high", "includeKev": true},
  "limits": {"maxJobs": 1000, "ratePerSecond": 25, "parallelism": 4},
  "subscribers": ["notify.ops"],
  "createdAt": "2025-10-18T22:00:00Z",
  "createdBy": "svc_scheduler",
  "updatedAt": "2025-10-18T22:00:00Z",
  "updatedBy": "svc_scheduler"
}
```

*Constraints*: arrays are alphabetically sorted; `selection.tenantId` is optional but, when present, must match `tenantId`. Cron expressions are validated for embedded newlines and length, and timezones are validated via `TimeZoneInfo`.
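The constraints above can be sketched as a small normalisation pass. This is only an illustration in Python, not the `StellaOps.Scheduler.Models` serializer: `normalize_schedule` and `MAX_CRON_LENGTH` are hypothetical, the choice of which arrays to sort is an assumption, and timezone validation is omitted.

```python
import copy

MAX_CRON_LENGTH = 128  # hypothetical bound; the real limit is not stated here


def normalize_schedule(doc: dict) -> dict:
    """Return a copy of a schedule document with the constraints applied."""
    out = copy.deepcopy(doc)
    selection = out.get("selection", {})

    # selection.tenantId is optional, but must agree with tenantId when set.
    scoped_tenant = selection.get("tenantId")
    if scoped_tenant is not None and scoped_tenant != out.get("tenantId"):
        raise ValueError("selection.tenantId must match tenantId")

    # Reject cron expressions that contain newlines or are overly long.
    cron = out.get("cronExpression", "")
    if "\n" in cron or "\r" in cron or len(cron) > MAX_CRON_LENGTH:
        raise ValueError("cronExpression contains a newline or is too long")

    # Sort string arrays so serialisation is deterministic (assumed set of arrays).
    for key in ("namespaces", "repositories", "includeTags"):
        if key in selection:
            selection[key] = sorted(selection[key])
    if "labels" in selection:
        for label in selection["labels"]:
            label["values"] = sorted(label.get("values", []))
        selection["labels"] = sorted(selection["labels"], key=lambda item: item["key"])
    if "subscribers" in out:
        out["subscribers"] = sorted(out["subscribers"])
    return out


schedule = normalize_schedule({
    "tenantId": "tenant-alpha",
    "cronExpression": "0 2 * * *",
    "selection": {"namespaces": ["team-b", "team-a"], "includeTags": ["prod", "canary"]},
    "subscribers": ["notify.ops"],
})
print(schedule["selection"])
```

Working on a deep copy keeps the caller's document untouched, which makes the function safe to use in fixtures.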
#### 3.1.2 Run (`runs`)

```jsonc
{
  "_id": "run_20251018_0001",
  "tenantId": "tenant-alpha",
  "scheduleId": "sch_20251018a",
  "trigger": "feedser",
  "state": "running",
  "stats": {
    "candidates": 1280,
    "deduped": 910,
    "queued": 624,
    "completed": 310,
    "deltas": 42,
    "newCriticals": 7,
    "newHigh": 11,
    "newMedium": 18,
    "newLow": 6
  },
  "reason": {"feedserExportId": "exp-20251018-03"},
  "createdAt": "2025-10-18T22:03:14Z",
  "startedAt": "2025-10-18T22:03:20Z",
  "finishedAt": null,
  "error": null,
  "deltas": [
    {
      "imageDigest": "sha256:a1b2c3",
      "newFindings": 3,
      "newCriticals": 1,
      "newHigh": 1,
      "newMedium": 1,
      "newLow": 0,
      "kevHits": ["CVE-2025-0002"],
      "topFindings": [
        {
          "purl": "pkg:rpm/openssl@3.0.12-5.el9",
          "vulnerabilityId": "CVE-2025-0002",
          "severity": "critical",
          "link": "https://ui.internal/scans/sha256:a1b2c3"
        }
      ],
      "attestation": {"uuid": "rekor-314", "verified": true},
      "detectedAt": "2025-10-18T22:03:21Z"
    }
  ]
}
```

Counters are clamped to ≥0, timestamps are converted to UTC, and delta arrays are sorted (critical → info severity precedence, then vulnerability id). Missing `deltas` implies "no change" snapshots.

#### 3.1.3 Impact Snapshot (`impact_snapshots`)

```jsonc
{
  "selector": {
    "scope": "all-images",
    "tenantId": "tenant-alpha"
  },
  "images": [
    {
      "imageDigest": "sha256:f1e2d3",
      "registry": "registry.internal",
      "repository": "app/api",
      "namespaces": ["team-a"],
      "tags": ["prod"],
      "usedByEntrypoint": true,
      "labels": {"env": "prod"}
    }
  ],
  "usageOnly": true,
  "generatedAt": "2025-10-18T22:02:58Z",
  "total": 412,
  "snapshotId": "impact-20251018-1"
}
```

Images are deduplicated and sorted by digest. Label keys are normalised to lowercase to avoid case‑sensitive duplicates during reconciliation. `snapshotId` enables run planners to compare subsequent snapshots for drift.
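The image normalisation rules above can be illustrated with a short sketch. It is not the canonical serializer: `normalize_images` is a hypothetical helper, and the tie-breaking choices (the first occurrence of a digest wins, the first casing of a label key wins) are assumptions.

```python
from typing import Dict, List


def normalize_images(images: List[dict]) -> List[dict]:
    """Deduplicate images by digest, lowercase label keys, sort by digest."""
    by_digest: Dict[str, dict] = {}
    for image in images:
        digest = image["imageDigest"]
        if digest in by_digest:
            continue  # assumption: the first occurrence of a digest wins
        labels: Dict[str, str] = {}
        for key, value in image.get("labels", {}).items():
            # assumption: when keys differ only by case, the first one wins
            labels.setdefault(key.lower(), value)
        by_digest[digest] = {**image, "labels": labels}
    return [by_digest[digest] for digest in sorted(by_digest)]


# Hypothetical digests for illustration.
images = normalize_images([
    {"imageDigest": "sha256:f1e2d3", "labels": {"Env": "prod", "env": "staging"}},
    {"imageDigest": "sha256:a1b2c3", "labels": {"team": "a"}},
    {"imageDigest": "sha256:f1e2d3", "labels": {"env": "dev"}},
])
print([image["imageDigest"] for image in images])
```

Sorting by digest after deduplication gives two snapshots of the same image set an identical `images` array, which is what makes drift comparison by `snapshotId` meaningful.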
#### 3.1.4 Audit (`audit`)

```jsonc
{
  "_id": "audit_169754",
  "tenantId": "tenant-alpha",
  "category": "scheduler",
  "action": "pause",
  "occurredAt": "2025-10-18T22:10:00Z",
  "actor": {"actorId": "user_admin", "displayName": "Cluster Admin", "kind": "user"},
  "scheduleId": "sch_20251018a",
  "correlationId": "corr-123",
  "metadata": {"details": "schedule paused", "reason": "maintenance"},
  "message": "Paused via API"
}
```

Metadata keys are lowercased, first‑writer wins (duplicates with different casing are ignored), and optional IDs (`scheduleId`, `runId`) are trimmed when empty. Use the canonical serializer when emitting events so audit digests remain reproducible.

#### 3.1.5 Run Summary (`run_summaries`)

Materialized view powering the Scheduler UI dashboards. Stores the latest roll-up per schedule/tenant, enabling quick “last run” banners and sparkline counters without scanning the full `runs` collection.

```jsonc
{
  "tenantId": "tenant-alpha",
  "scheduleId": "sch_20251018a",
  "updatedAt": "2025-10-18T22:10:10Z",
  "lastRun": {
    "runId": "run_20251018_0001",
    "trigger": "feedser",
    "state": "completed",
    "createdAt": "2025-10-18T22:03:14Z",
    "startedAt": "2025-10-18T22:03:20Z",
    "finishedAt": "2025-10-18T22:08:45Z",
    "stats": {
      "candidates": 1280,
      "deduped": 910,
      "queued": 0,
      "completed": 910,
      "deltas": 42,
      "newCriticals": 7,
      "newHigh": 11,
      "newMedium": 18,
      "newLow": 6
    },
    "error": null
  },
  "recent": [
    {
      "runId": "run_20251018_0001",
      "trigger": "feedser",
      "state": "completed",
      "createdAt": "2025-10-18T22:03:14Z",
      "startedAt": "2025-10-18T22:03:20Z",
      "finishedAt": "2025-10-18T22:08:45Z",
      "stats": {
        "candidates": 1280,
        "deduped": 910,
        "queued": 0,
        "completed": 910,
        "deltas": 42,
        "newCriticals": 7,
        "newHigh": 11,
        "newMedium": 18,
        "newLow": 6
      },
      "error": null
    },
    {
      "runId": "run_20251017_0003",
      "trigger": "cron",
      "state": "error",
      "createdAt": "2025-10-17T22:01:02Z",
      "startedAt": "2025-10-17T22:01:08Z",
      "finishedAt": "2025-10-17T22:04:11Z",
      "stats": {
        "candidates": 1040,
        "deduped": 812,
        "queued": 0,
        "completed": 640,
        "deltas": 18,
        "newCriticals": 2,
        "newHigh": 4,
        "newMedium": 7,
        "newLow": 3
      },
      "error": "scanner timeout"
    }
  ],
  "counters": {
    "total": 3,
    "planning": 0,
    "queued": 0,
    "running": 0,
    "completed": 1,
    "error": 1,
    "cancelled": 1,
    "totalDeltas": 60,
    "totalNewCriticals": 9,
    "totalNewHigh": 15,
    "totalNewMedium": 25,
    "totalNewLow": 9
  }
}
```

- `_id` combines `tenantId` and `scheduleId` (`tenant:schedule`).
- `recent` contains the 20 most recent runs ordered by `createdAt` (UTC). Updates replace the existing entry for a run to respect state transitions.
- `counters` aggregate over the retained window (20 runs) for quick trend indicators. Totals are recomputed after every update.
- Schedulers should call the projection service after every run state change so the cache mirrors planner/runner progress.

Sample file: `samples/api/scheduler/run-summary.json`.

---

## 4 Policy Schema (YAML v1.0)

Minimal viable grammar (subset of OSV‑SCHEMA ideas).

```yaml
version: "1.0"
rules:
  - name: Block Critical
    severity: [Critical]
    action: block
  - name: Ignore Low Dev
    severity: [Low, None]
    environments: [dev, staging]
    action: ignore
    expires: "2026-01-01"
  - name: Escalate RegionalFeed High
    sources: [NVD, CNNVD, CNVD, ENISA, JVN, BDU]
    severity: [High, Critical]
    action: escalate
```

Validation is performed by `policy:mapping.yaml` JSON‑Schema embedded in backend.

Canonical schema source: `src/Policy/__Libraries/StellaOps.Policy/Schemas/policy-schema@1.json` (embedded into `StellaOps.Policy`).
`PolicyValidationCli` (see `src/Policy/__Libraries/StellaOps.Policy/PolicyValidationCli.cs`) provides the reusable command handler that the main CLI wires up; in the interim it can be invoked from a short host like:

```csharp
await new PolicyValidationCli().RunAsync(new PolicyValidationCliOptions
{
    Inputs = new[] { "policies/root.yaml" },
    Strict = true,
});
```

### 4.1 Rego Variant (Advanced – TODO)

*Accepted but stored as‑is in `rego` field.* Evaluated via internal **OPA** side‑car once feature graduates from TODO list.

### 4.2 Policy Scoring Config (JSON)

*Schema id.* `https://schemas.stella-ops.org/policy/policy-scoring-schema@1.json`

*Source.* `src/Policy/__Libraries/StellaOps.Policy/Schemas/policy-scoring-schema@1.json` (embedded in `StellaOps.Policy`), default fixture at `src/Policy/__Libraries/StellaOps.Policy/Schemas/policy-scoring-default.json`.

```jsonc
{
  "version": "1.0",
  "severityWeights": {"Critical": 90, "High": 75, "Unknown": 60, "...": 0},
  "quietPenalty": 45,
  "warnPenalty": 15,
  "ignorePenalty": 35,
  "trustOverrides": {"vendor": 1.0, "distro": 0.85},
  "reachabilityBuckets": {"entrypoint": 1.0, "direct": 0.85, "runtime": 0.45, "unknown": 0.5},
  "unknownConfidence": {
    "initial": 0.8,
    "decayPerDay": 0.05,
    "floor": 0.2,
    "bands": [
      {"name": "high", "min": 0.65},
      {"name": "medium", "min": 0.35},
      {"name": "low", "min": 0.0}
    ]
  }
}
```

Validation occurs alongside policy binding (`PolicyScoringConfigBinder`), producing deterministic digests via `PolicyScoringConfigDigest`. Bands are ordered descending by `min` so consumers can resolve confidence tiers deterministically. Reachability buckets are case-insensitive keys (`entrypoint`, `direct`, `indirect`, `runtime`, `unreachable`, `unknown`) with numeric multipliers (default ≤1.0).

**Runtime usage**

- `trustOverrides` are matched against `finding.tags` (`trust:`) first, then `finding.source`/`finding.vendor`; missing keys default to `1.0`.
- `reachabilityBuckets` consume `finding.tags` with prefix `reachability:` (fallback `usage:` or `unknown`). Missing buckets fall back to `unknown` weight when present, otherwise `1.0`.
- Policy verdicts expose scoring inputs (`severityWeight`, `trustWeight`, `reachabilityWeight`, `baseScore`, penalties) plus unknown-state metadata (`unknownConfidence`, `unknownAgeDays`, `confidenceBand`) for auditability.

See `samples/policy/policy-preview-unknown.json` and `samples/policy/policy-report-unknown.json` for offline reference payloads validated against the published schemas below.

Validate the samples locally with **Ajv** before publishing changes:

```bash
# install once per checkout (offline-safe):
npm install --no-save ajv-cli@5 ajv-formats@2

npx ajv validate --spec=draft2020 -c ajv-formats \
  -s docs/schemas/policy-preview-sample@1.json \
  -d samples/policy/policy-preview-unknown.json

npx ajv validate --spec=draft2020 -c ajv-formats \
  -s docs/schemas/policy-report-sample@1.json \
  -d samples/policy/policy-report-unknown.json
```

- Unknown confidence derives from `unknown-age-days:` (preferred) or `unknown-since:` + `observed-at:` tags; with no hints the engine keeps `initial` confidence. Values decay by `decayPerDay` down to `floor`, then resolve to the first matching `bands[].name`.

---

## 5 SLSA Attestation Schema ⭑

Planned for Q1‑2026 (kept here for early plug‑in authors).
```jsonc
{
  "id": "prov_0291",
  "imageDigest": "sha256:e2b9…",
  "buildType": "https://slsa.dev/container/v1",
  "builder": {
    "id": "https://git.stella-ops.ru/ci/stella-runner@sha256:f7b7…"
  },
  "metadata": {
    "invocation": {
      "parameters": {"GIT_SHA": "f6a1…"},
      "buildStart": "2025-07-14T06:59:17Z",
      "buildEnd": "2025-07-14T07:01:22Z"
    },
    "completeness": {"parameters": true}
  },
  "materials": [
    {"uri": "git+https://git…", "digest": {"sha1": "f6a1…"}}
  ],
  "rekorLogIndex": 99817          // entry in local Rekor mirror
}
```

---

## 6 Notify Foundations (Rule · Channel · Event)

*Sprint 15 target* – canonically describe the Notify data shapes that UI, workers, and storage consume. JSON Schemas live under `docs/modules/notify/resources/schemas/` and deterministic fixtures under `docs/modules/notify/resources/samples/`.

| Artifact | Schema | Sample |
|----------|--------|--------|
| **Rule** (catalogued routing logic) | `docs/modules/notify/resources/schemas/notify-rule@1.json` | `docs/modules/notify/resources/samples/notify-rule@1.sample.json` |
| **Channel** (delivery endpoint definition) | `docs/modules/notify/resources/schemas/notify-channel@1.json` | `docs/modules/notify/resources/samples/notify-channel@1.sample.json` |
| **Template** (rendering payload) | `docs/modules/notify/resources/schemas/notify-template@1.json` | `docs/modules/notify/resources/samples/notify-template@1.sample.json` |
| **Event envelope** (Notify ingest surface) | `docs/modules/notify/resources/schemas/notify-event@1.json` | `docs/modules/notify/resources/samples/notify-event@1.sample.json` |

### 6.1 Rule highlights (`notify-rule@1`)

* Keys are lower‑cased camelCase. `schemaVersion` (`notify.rule@1`), `ruleId`, `tenantId`, `name`, `match`, `actions`, `createdAt`, and `updatedAt` are mandatory.
* `match.eventKinds`, `match.verdicts`, and other array selectors are pre‑sorted and case‑normalized (e.g. `scanner.report.ready`).
* `actions[].throttle` serialises as ISO 8601 duration (`PT5M`), mirroring worker backoff guardrails.
* `vex` gates let operators exclude accepted/not‑affected justifications; omit the block to inherit default behaviour.
* Use `StellaOps.Notify.Models.NotifySchemaMigration.UpgradeRule(JsonNode)` when deserialising legacy payloads that might lack `schemaVersion` or retain older revisions.
* Soft deletions persist `deletedAt` in Mongo (and disable the rule); repository queries automatically filter them.

### 6.2 Channel highlights (`notify-channel@1`)

* `schemaVersion` is pinned to `notify.channel@1` and must accompany persisted documents.
* `type` matches plug‑in identifiers (`slack`, `teams`, `email`, `webhook`, `custom`).
* `config.secretRef` stores an external secret handle (Authority, Vault, K8s). Notify never persists raw credentials.
* Optional `config.limits.timeout` uses ISO 8601 durations identical to rule throttles; concurrency/RPM defaults apply when absent.
* `StellaOps.Notify.Models.NotifySchemaMigration.UpgradeChannel(JsonNode)` backfills the schema version when older documents omit it.
* Channels share the same soft-delete marker (`deletedAt`) so operators can restore prior configuration without purging history.

### 6.3 Event envelope (`notify-event@1`)

* Aligns with the platform event contract: `eventId` UUID, RFC 3339 `ts`, tenant isolation enforced.
* Enumerated `kind` covers the initial Notify surface (`scanner.report.ready`, `scheduler.rescan.delta`, `zastava.admission`, etc.).
* `scope.labels`/`scope.attributes` and top-level `attributes` mirror the metadata dictionaries workers surface for templating and audits.
* Notify workers use the same migration helper to wrap event payloads before template rendering, so schema additions remain additive.

### 6.4 Template highlights (`notify-template@1`)

* Carries the presentation key (`channelType`, `key`, `locale`) and the raw template body; `schemaVersion` is fixed to `notify.template@1`.
* `renderMode` enumerates supported engines (`markdown`, `html`, `adaptiveCard`, `plainText`, `json`) aligning with `NotifyTemplateRenderMode`.
* `format` signals downstream connector expectations (`slack`, `teams`, `email`, `webhook`, `json`).
* Upgrade legacy definitions with `NotifySchemaMigration.UpgradeTemplate(JsonNode)` to auto-apply the new schema version and ordering.
* Templates also record soft deletes via `deletedAt`; UI/API skip them by default while retaining revision history.

**Validation loop:**

```bash
# Validate Notify schemas and samples (matches Docs CI)
for schema in docs/modules/notify/resources/schemas/*.json; do
  npx ajv compile -c ajv-formats -s "$schema"
done

for sample in docs/modules/notify/resources/samples/*.sample.json; do
  schema="docs/modules/notify/resources/schemas/$(basename "${sample%.sample.json}").json"
  npx ajv validate -c ajv-formats -s "$schema" -d "$sample"
done
```

Integration tests can embed the sample fixtures to guarantee deterministic serialisation from the `StellaOps.Notify.Models` DTOs introduced in Sprint 15.

---

## 6 Validator Contracts

* For SBOM wrapper – `ISbomValidator` (DLL plug‑in) must return a *typed* error list.
* For YAML policies – JSON‑Schema at `/schemas/policy-v1.json`.
* For Rego – OPA `opa eval --fail-defined` under the hood.
* For **Free‑tier quotas** – `IQuotaService` integration tests ensure `quota:` resets at UTC midnight and produces correct `Retry-After` headers.

---

## 7 Migration Notes

1. **Add `format` column** to existing SBOM wrappers; default to `trivy-json-v2`.
2. **Populate `layers` & `partial`** via backfill script (ships with the `stellopsctl migrate` wizard).
3. Policy YAML previously stored in Redis → copy to Mongo if persistence enabled.
4. Prepare `attestations` collection (empty) – safe to create in advance.

---

## 8 Open Questions / Future Work

* How to de‑duplicate *identical* Rego policies differing only in whitespace?
* Embed *GOST 34.11‑2018* digests when users enable Russian crypto suite?
* Should enterprise tiers share the same Redis quota keys or switch to JWT claim `tier != Free` bypass?
* Evaluate sliding‑window quota instead of strict daily reset.
* Consider rate‑limit for `/layers/missing` to avoid brute‑force enumeration.

---

## 9 Change Log

| Date       | Note                                                                         |
|------------|------------------------------------------------------------------------------|
| 2025‑07‑14 | **Added:** `format`, `partial`, delta cache keys, YAML policy schema v1.0.   |
| 2025‑07‑12 | **Initial public draft** – SBOM wrapper, Redis keyspace, audit collections.  |

---