# Data Schemas & Persistence Contracts

Audience: backend developers, plugin authors, DB admins.
Scope: describes Redis, MongoDB (optional), and on-disk blob shapes that power StellaOps.


## 0 Document Conventions

  • camelCase for JSON keys.
  • All timestamps are RFC 3339 / ISO 8601 with Z (UTC).
  • ⭑ = planned but not shipped yet (kept on Feature Matrix “To Do”).

## 1 SBOM Wrapper Envelope

Every SBOM blob (regardless of format) is stored on disk or in object storage with a sidecar JSON file that indexes it for the scanners.

#### 1.1 JSON Shape

{
  "id": "sha256:417f…",          // digest of the SBOM *file* itself
  "imageDigest": "sha256:e2b9…", // digest of the original container image
  "created": "2025-07-14T07:02:13Z",
  "format": "trivy-json-v2",     // NEW enum: trivy-json-v2 | spdx-json | cyclonedx-json
  "layers": [
    "sha256:d38b…",              // layer digests (ordered)
    "sha256:af45…"
  ],
  "partial": false,              // true => delta SBOM (only some layers)
  "provenanceId": "prov_0291"    // ⭑ link to SLSA attestation (Q1 2026)
}

format (NEW): added to support multiple SBOM formats.
partial (NEW): true when generated via the delta SBOM flow (§1.3).

#### 1.2 Filesystem Layout

blobs/
 ├─ 417f…                # digest prefix
 │   ├─ sbom.json        # payload (any format)
 │   └─ sbom.meta.json   # wrapper (shape above)

Note: blob storage can point at S3, MinIO, or plain disk; driver plugins adapt.

#### 1.3 Delta SBOM Extension

When partial: true, only the missing layers have been scanned.
The merging logic inside the scanning module stitches the new data onto the cached full SBOM in Redis.
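
The delta flow can be pictured as a set difference over layer digests. The following is a minimal sketch, not the scanner's actual code: it assumes the cached digests have already been read from the layers:<digest> set described in §2, and the function name `missing_layers` and the digests are hypothetical.

```python
def missing_layers(image_layers, cached_layers):
    """Return the layers that still need scanning, preserving image order."""
    cached = set(cached_layers)
    return [layer for layer in image_layers if layer not in cached]


layers = ["sha256:aaa", "sha256:bbb", "sha256:ccc"]  # illustrative digests
todo = missing_layers(layers, {"sha256:aaa"})

# Assumption: a wrapper is "partial" when some, but not all, layers were rescanned.
partial = 0 < len(todo) < len(layers)
```

Only the digests in `todo` would be handed to the scanner; how the results are merged into the cached SBOM is left to the scanning module.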


## 2 Redis Keyspace

| Key pattern | Type | TTL | Purpose |
|---|---|---|---|
| scan:<digest> | string | | Last scan JSON result (as returned by /scan) |
| layers:<digest> | set | 90d | Layers already possessing SBOMs (delta cache) |
| policy:active | string | | YAML or Rego ruleset |
| quota:<token> | string | until next UTC midnight | Per-token scan counter for Free tier ({{ quota_token }} scans). |
| policy:history | list | | Change audit IDs (see Mongo) |
| feed:nvd:json | string | 24h | Normalised feed snapshot |
| locator:<imageDigest> | string | 30d | Maps image digest → sbomBlobId |
| metrics:… | various | | Prom / OTLP runtime metrics |

Delta SBOM uses layers:* to skip work in <20 ms. Quota enforcement increments quota:<token> atomically; when the counter exceeds {{ quota_token }}, the API returns 429.
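
The quota behaviour can be sketched as a counter that expires at the next UTC midnight. This is an illustration under stated assumptions rather than the IQuotaService implementation: a plain dict stands in for Redis (a real deployment would rely on an atomic increment plus a key expiry), and `check_quota`, `seconds_until_utc_midnight`, and the `limit` parameter are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_utc_midnight(now):
    """Seconds from `now` (timezone-aware) until the next UTC midnight."""
    now = now.astimezone(timezone.utc)
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return int((next_midnight - now).total_seconds())


def check_quota(counters, token, limit, now):
    """Increment the per-token counter and return (http_status, retry_after).

    `counters` is an in-memory stand-in for the quota:<token> keys.
    """
    key = f"quota:{token}"
    counters[key] = counters.get(key, 0) + 1
    if counters[key] > limit:
        # The caller may retry once the daily counter has been reset.
        return 429, seconds_until_utc_midnight(now)
    return 200, None
```

The second tuple element is what a Retry-After header would carry when the request is rejected.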


## 3 MongoDB Collections (Optional)

Only enabled when MONGO_URI is supplied (for long-term audit).

| Collection | Shape (summary) | Indexes |
|---|---|---|
| sbom_history | Wrapper JSON + replaceTs on overwrite | {imageDigest} {created} |
| policy_versions | {_id, yaml, rego, authorId, created} | {created} |
| attestations | SLSA provenance doc + Rekor log pointer | {imageDigest} |
| audit_log | Fully rendered RFC 5424 entries (UI & CLI actions) | {userId} {ts} |

Schema detail for policy_versions:

{
  "_id": "6619e90b8c5e1f76",
  "yaml": "version: 1.0\nrules:\n  - …",
  "rego": null,                    // filled when Rego uploaded
  "authorId": "u_1021",
  "created": "2025-07-14T08:15:04Z",
  "comment": "Imported via API"
}

### 3.1 Scheduler Sprints 16 Artifacts

Collections. schedules, runs, impact_snapshots, audit (module-local). All documents reuse the canonical JSON emitted by StellaOps.Scheduler.Models so agents and fixtures remain deterministic.

Samples live under samples/api/scheduler/ (e.g., schedule.json, run.json, impact-set.json, audit.json) and mirror the canonical serializer output shown below.

#### 3.1.1 Schedule (schedules)

{
  "_id": "sch_20251018a",
  "tenantId": "tenant-alpha",
  "name": "Nightly Prod",
  "enabled": true,
  "cronExpression": "0 2 * * *",
  "timezone": "UTC",
  "mode": "analysis-only",
  "selection": {
    "scope": "by-namespace",
    "namespaces": ["team-a", "team-b"],
    "repositories": ["app/service-api"],
    "includeTags": ["canary", "prod"],
    "labels": [{"key": "env", "values": ["prod", "staging"]}],
    "resolvesTags": true
  },
  "onlyIf": {"lastReportOlderThanDays": 7, "policyRevision": "policy@42"},
  "notify": {"onNewFindings": true, "minSeverity": "high", "includeKev": true},
  "limits": {"maxJobs": 1000, "ratePerSecond": 25, "parallelism": 4},
  "subscribers": ["notify.ops"],
  "createdAt": "2025-10-18T22:00:00Z",
  "createdBy": "svc_scheduler",
  "updatedAt": "2025-10-18T22:00:00Z",
  "updatedBy": "svc_scheduler"
}

Constraints: arrays are alphabetically sorted; selection.tenantId is optional but, when present, must match tenantId. Cron expressions are validated for newlines and length; timezones are validated via TimeZoneInfo.
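
As a rough illustration of these constraints (not the StellaOps.Scheduler.Models code), the sketch below sorts the selection arrays and rejects a mismatched tenant or a malformed cron string. The function name and the `max_cron_length` limit are hypothetical, since the real limit is not stated here, and timezone validation is omitted.

```python
def normalize_schedule(schedule, max_cron_length=128):
    """Return a copy with sorted arrays; raise ValueError on a violated constraint."""
    cron = schedule["cronExpression"]
    if "\n" in cron or "\r" in cron or len(cron) > max_cron_length:
        raise ValueError("invalid cronExpression")

    selection = dict(schedule.get("selection", {}))
    selection_tenant = selection.get("tenantId")
    if selection_tenant is not None and selection_tenant != schedule["tenantId"]:
        raise ValueError("selection.tenantId must match tenantId")

    # Sort the plain string arrays so serialised output is deterministic.
    for key in ("namespaces", "repositories", "includeTags"):
        if key in selection:
            selection[key] = sorted(selection[key])

    result = dict(schedule)
    result["selection"] = selection
    if "subscribers" in result:
        result["subscribers"] = sorted(result["subscribers"])
    return result
```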

#### 3.1.2 Run (runs)

{
  "_id": "run_20251018_0001",
  "tenantId": "tenant-alpha",
  "scheduleId": "sch_20251018a",
  "trigger": "feedser",
  "state": "running",
  "stats": {
    "candidates": 1280,
    "deduped": 910,
    "queued": 624,
    "completed": 310,
    "deltas": 42,
    "newCriticals": 7,
    "newHigh": 11,
    "newMedium": 18,
    "newLow": 6
  },
  "reason": {"feedserExportId": "exp-20251018-03"},
  "createdAt": "2025-10-18T22:03:14Z",
  "startedAt": "2025-10-18T22:03:20Z",
  "finishedAt": null,
  "error": null,
  "deltas": [
    {
      "imageDigest": "sha256:a1b2c3",
      "newFindings": 3,
      "newCriticals": 1,
      "newHigh": 1,
      "newMedium": 1,
      "newLow": 0,
      "kevHits": ["CVE-2025-0002"],
      "topFindings": [
        {
          "purl": "pkg:rpm/openssl@3.0.12-5.el9",
          "vulnerabilityId": "CVE-2025-0002",
          "severity": "critical",
          "link": "https://ui.internal/scans/sha256:a1b2c3"
        }
      ],
      "attestation": {"uuid": "rekor-314", "verified": true},
      "detectedAt": "2025-10-18T22:03:21Z"
    }
  ]
}

Counters are clamped to ≥0, timestamps are converted to UTC, and delta arrays are sorted (critical → info severity precedence, then vulnerability id). A missing deltas array implies a "no change" snapshot.
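
The clamping and ordering rules can be sketched as follows. This is a hedged illustration, not the canonical serializer: the intermediate severity ranks between critical and info are an assumption, and applying the sort to finding entries such as topFindings is only one reading of "delta arrays".

```python
# Assumed precedence; the text above only names the two end points.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}


def clamp_counters(stats):
    """Clamp every counter to >= 0."""
    return {name: max(0, value) for name, value in stats.items()}


def sort_findings(findings):
    """Order by severity precedence (critical first), then vulnerability id."""
    def sort_key(finding):
        rank = SEVERITY_ORDER.get(finding["severity"].lower(), len(SEVERITY_ORDER))
        return (rank, finding["vulnerabilityId"])

    return sorted(findings, key=sort_key)
```

Unknown severities sort after info in this sketch so that they never displace a known level.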

#### 3.1.3 Impact Snapshot (impact_snapshots)

{
  "selector": {
    "scope": "all-images",
    "tenantId": "tenant-alpha"
  },
  "images": [
    {
      "imageDigest": "sha256:f1e2d3",
      "registry": "registry.internal",
      "repository": "app/api",
      "namespaces": ["team-a"],
      "tags": ["prod"],
      "usedByEntrypoint": true,
      "labels": {"env": "prod"}
    }
  ],
  "usageOnly": true,
  "generatedAt": "2025-10-18T22:02:58Z",
  "total": 412,
  "snapshotId": "impact-20251018-1"
}

Images are deduplicated and sorted by digest. Label keys are normalised to lowercase to avoid case-sensitive duplicates during reconciliation. snapshotId enables run planners to compare subsequent snapshots for drift.
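
A minimal sketch of that normalisation, assuming the first occurrence of a digest (and of a label key) is the one that is kept; the function name is hypothetical and the real reconciliation logic may differ.

```python
def normalize_images(images):
    """Deduplicate by imageDigest, lowercase label keys, and sort by digest."""
    by_digest = {}
    for image in images:
        digest = image["imageDigest"]
        if digest in by_digest:
            continue  # assumption: keep the first occurrence of each digest
        labels = {}
        for key, value in image.get("labels", {}).items():
            labels.setdefault(key.lower(), value)
        by_digest[digest] = {**image, "labels": labels}
    return [by_digest[digest] for digest in sorted(by_digest)]
```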

#### 3.1.4 Audit (audit)

{
  "_id": "audit_169754",
  "tenantId": "tenant-alpha",
  "category": "scheduler",
  "action": "pause",
  "occurredAt": "2025-10-18T22:10:00Z",
  "actor": {"actorId": "user_admin", "displayName": "Cluster Admin", "kind": "user"},
  "scheduleId": "sch_20251018a",
  "correlationId": "corr-123",
  "metadata": {"details": "schedule paused", "reason": "maintenance"},
  "message": "Paused via API"
}

Metadata keys are lowercased, first-writer wins (duplicates with different casing are ignored), and optional IDs (scheduleId, runId) are trimmed when empty. Use the canonical serializer when emitting events so audit digests remain reproducible.
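
The metadata rule can be sketched like this. It is an illustration only: the function name is hypothetical, and "trimmed when empty" is interpreted here as dropping an optional ID that is missing, null, or blank.

```python
def normalize_audit(entry):
    """Lowercase metadata keys (first writer wins) and drop empty optional IDs."""
    metadata = {}
    for key, value in entry.get("metadata", {}).items():
        # setdefault keeps the first value seen for a case-folded key.
        metadata.setdefault(key.lower(), value)

    result = {**entry, "metadata": metadata}
    for optional_id in ("scheduleId", "runId"):
        value = result.get(optional_id)
        if value is None or not value.strip():
            result.pop(optional_id, None)
    return result
```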

#### 3.1.5 Run Summary (run_summaries)

Materialized view powering the Scheduler UI dashboards. Stores the latest roll-up per schedule/tenant, enabling quick “last run” banners and sparkline counters without scanning the full runs collection.

{
  "tenantId": "tenant-alpha",
  "scheduleId": "sch_20251018a",
  "updatedAt": "2025-10-18T22:10:10Z",
  "lastRun": {
    "runId": "run_20251018_0001",
    "trigger": "feedser",
    "state": "completed",
    "createdAt": "2025-10-18T22:03:14Z",
    "startedAt": "2025-10-18T22:03:20Z",
    "finishedAt": "2025-10-18T22:08:45Z",
    "stats": {
      "candidates": 1280,
      "deduped": 910,
      "queued": 0,
      "completed": 910,
      "deltas": 42,
      "newCriticals": 7,
      "newHigh": 11,
      "newMedium": 18,
      "newLow": 6
    },
    "error": null
  },
  "recent": [
    {
      "runId": "run_20251018_0001",
      "trigger": "feedser",
      "state": "completed",
      "createdAt": "2025-10-18T22:03:14Z",
      "startedAt": "2025-10-18T22:03:20Z",
      "finishedAt": "2025-10-18T22:08:45Z",
      "stats": {
        "candidates": 1280,
        "deduped": 910,
        "queued": 0,
        "completed": 910,
        "deltas": 42,
        "newCriticals": 7,
        "newHigh": 11,
        "newMedium": 18,
        "newLow": 6
      },
      "error": null
    },
    {
      "runId": "run_20251017_0003",
      "trigger": "cron",
      "state": "error",
      "createdAt": "2025-10-17T22:01:02Z",
      "startedAt": "2025-10-17T22:01:08Z",
      "finishedAt": "2025-10-17T22:04:11Z",
      "stats": {
        "candidates": 1040,
        "deduped": 812,
        "queued": 0,
        "completed": 640,
        "deltas": 18,
        "newCriticals": 2,
        "newHigh": 4,
        "newMedium": 7,
        "newLow": 3
      },
      "error": "scanner timeout"
    }
  ],
  "counters": {
    "total": 3,
    "planning": 0,
    "queued": 0,
    "running": 0,
    "completed": 1,
    "error": 1,
    "cancelled": 1,
    "totalDeltas": 60,
    "totalNewCriticals": 9,
    "totalNewHigh": 15,
    "totalNewMedium": 25,
    "totalNewLow": 9
  }
}
  • _id combines tenantId and scheduleId (tenant:schedule).
  • recent contains the 20 most recent runs ordered by createdAt (UTC). Updates replace the existing entry for a run to respect state transitions.
  • counters aggregate over the retained window (20 runs) for quick trend indicators. Totals are recomputed after every update.
  • Schedulers should call the projection service after every run state change so the cache mirrors planner/runner progress.

Sample file: samples/api/scheduler/run-summary.json.
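
The projection described in the bullets above can be sketched as an upsert followed by a full recount. This is a simplified stand-in for the projection service, not its implementation: `project_run` is a hypothetical name, newest-first ordering is inferred from the sample, and the state names are taken from the counters object.

```python
WINDOW = 20  # retained runs per schedule, as stated above
STATES = ("planning", "queued", "running", "completed", "error", "cancelled")
TOTALS = (("totalDeltas", "deltas"), ("totalNewCriticals", "newCriticals"),
          ("totalNewHigh", "newHigh"), ("totalNewMedium", "newMedium"),
          ("totalNewLow", "newLow"))


def project_run(summary, run):
    """Upsert `run` into summary['recent'] and recompute the counters."""
    recent = [r for r in summary.get("recent", []) if r["runId"] != run["runId"]]
    recent.append(run)
    # Same-format RFC 3339 UTC strings sort chronologically as plain text.
    recent.sort(key=lambda r: r["createdAt"], reverse=True)
    recent = recent[:WINDOW]

    counters = {"total": len(recent)}
    for state in STATES:
        counters[state] = sum(1 for r in recent if r["state"] == state)
    for total_key, stat_key in TOTALS:
        counters[total_key] = sum(r["stats"].get(stat_key, 0) for r in recent)

    return {**summary, "recent": recent, "lastRun": recent[0], "counters": counters}
```

Calling it again with the same runId and a new state replaces the earlier entry, which is how a run moving from running to completed would be reflected.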


## 4 Policy Schema (YAML v1.0)

Minimal viable grammar (subset of OSV-SCHEMA ideas).

version: "1.0"
rules:
  - name: Block Critical
    severity: [Critical]
    action: block
  - name: Ignore Low Dev
    severity: [Low, None]
    environments: [dev, staging]
    action: ignore
    expires: "2026-01-01"
  - name: Escalate RegionalFeed High
    sources: [NVD, CNNVD, CNVD, ENISA, JVN, BDU]
    severity: [High, Critical]
    action: escalate

Validation is performed by the policy:mapping.yaml JSON-Schema embedded in the backend.

Canonical schema source: src/Policy/__Libraries/StellaOps.Policy/Schemas/policy-schema@1.json (embedded into StellaOps.Policy).
PolicyValidationCli (see src/Policy/__Libraries/StellaOps.Policy/PolicyValidationCli.cs) provides the reusable command handler that the main CLI wires up; in the interim it can be invoked from a short host like:

await new PolicyValidationCli().RunAsync(new PolicyValidationCliOptions
{
    Inputs = new[] { "policies/root.yaml" },
    Strict = true,
});

### 4.1 Rego Variant (Advanced, TODO)

Accepted but stored as-is in the rego field.
Evaluated via an internal OPA sidecar once the feature graduates from the TODO list.

### 4.2 Policy Scoring Config (JSON)

Schema id. https://schemas.stella-ops.org/policy/policy-scoring-schema@1.json
Source. src/Policy/__Libraries/StellaOps.Policy/Schemas/policy-scoring-schema@1.json (embedded in StellaOps.Policy), default fixture at src/Policy/__Libraries/StellaOps.Policy/Schemas/policy-scoring-default.json.

{
  "version": "1.0",
  "severityWeights": {"Critical": 90, "High": 75, "Unknown": 60, "...": 0},
  "quietPenalty": 45,
  "warnPenalty": 15,
  "ignorePenalty": 35,
  "trustOverrides": {"vendor": 1.0, "distro": 0.85},
  "reachabilityBuckets": {"entrypoint": 1.0, "direct": 0.85, "runtime": 0.45, "unknown": 0.5},
  "unknownConfidence": {
    "initial": 0.8,
    "decayPerDay": 0.05,
    "floor": 0.2,
    "bands": [
      {"name": "high", "min": 0.65},
      {"name": "medium", "min": 0.35},
      {"name": "low", "min": 0.0}
    ]
  }
}

Validation occurs alongside policy binding (PolicyScoringConfigBinder), producing deterministic digests via PolicyScoringConfigDigest. Bands are ordered descending by min so consumers can resolve confidence tiers deterministically. Reachability buckets are case-insensitive keys (entrypoint, direct, indirect, runtime, unreachable, unknown) with numeric multipliers (default ≤1.0).

Runtime usage

  • trustOverrides are matched against finding.tags (trust:<key>) first, then finding.source/finding.vendor; missing keys default to 1.0.
  • reachabilityBuckets consume finding.tags with prefix reachability: (fallback usage: or unknown). Missing buckets fall back to unknown weight when present, otherwise 1.0.
  • Policy verdicts expose scoring inputs (severityWeight, trustWeight, reachabilityWeight, baseScore, penalties) plus unknown-state metadata (unknownConfidence, unknownAgeDays, confidenceBand) for auditability. See samples/policy/policy-preview-unknown.json and samples/policy/policy-report-unknown.json for offline reference payloads validated against the published schemas below.
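
The lookup order for the two weights can be sketched as below. This is an approximation of the behaviour described above, not the engine's code: the finding is modelled as a plain dict with `tags`, `source`, and `vendor` keys, and all function names are hypothetical.

```python
def tag_value(tags, prefix):
    """Return the text after `prefix` for the first matching tag, else None."""
    for tag in tags:
        if tag.startswith(prefix):
            return tag[len(prefix):]
    return None


def trust_weight(finding, trust_overrides):
    """Try the trust:<key> tag first, then source, then vendor; default 1.0."""
    tags = finding.get("tags", [])
    candidates = (tag_value(tags, "trust:"), finding.get("source"), finding.get("vendor"))
    for key in candidates:
        if key is not None and key in trust_overrides:
            return trust_overrides[key]
    return 1.0


def reachability_weight(finding, buckets):
    """Resolve the bucket from reachability: or usage: tags, else 'unknown'."""
    tags = finding.get("tags", [])
    bucket = tag_value(tags, "reachability:") or tag_value(tags, "usage:") or "unknown"
    lowered = {name.lower(): weight for name, weight in buckets.items()}
    # A bucket missing from the config falls back to 'unknown', then to 1.0.
    return lowered.get(bucket.lower(), lowered.get("unknown", 1.0))
```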

Validate the samples locally with Ajv before publishing changes:

# install once per checkout (offline-safe):
npm install --no-save ajv-cli@5 ajv-formats@2

npx ajv validate --spec=draft2020 -c ajv-formats \
  -s docs/schemas/policy-preview-sample@1.json \
  -d samples/policy/policy-preview-unknown.json

npx ajv validate --spec=draft2020 -c ajv-formats \
  -s docs/schemas/policy-report-sample@1.json \
  -d samples/policy/policy-report-unknown.json
  • Unknown confidence derives from unknown-age-days: (preferred) or unknown-since: + observed-at: tags; with no hints the engine keeps initial confidence. Values decay by decayPerDay down to floor, then resolve to the first matching bands[].name.
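
The decay and band resolution can be illustrated with a few lines of arithmetic. The sketch assumes linear decay (initial minus decayPerDay times the age in days, never below floor), which the text implies but does not state outright; `unknown_confidence` is a hypothetical name.

```python
def unknown_confidence(config, age_days=None):
    """Return (confidence, band name) for an unknown-state finding."""
    confidence = config["initial"]
    if age_days is not None:  # with no age hint the initial confidence is kept
        decayed = confidence - config["decayPerDay"] * age_days
        confidence = max(config["floor"], decayed)
    # Bands are checked in descending `min` order; the first match wins.
    for band in sorted(config["bands"], key=lambda b: b["min"], reverse=True):
        if confidence >= band["min"]:
            return confidence, band["name"]
    return confidence, None


config = {
    "initial": 0.8, "decayPerDay": 0.05, "floor": 0.2,
    "bands": [{"name": "high", "min": 0.65},
              {"name": "medium", "min": 0.35},
              {"name": "low", "min": 0.0}],
}
```

With the default config shown in this section, a finding with no age hint stays at 0.8 (high), five days of age gives roughly 0.55 (medium), and thirty days bottoms out at the 0.2 floor (low).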

## 5 SLSA Attestation Schema ⭑

Planned for Q1 2026 (kept here for early plugin authors).

{
  "id": "prov_0291",
  "imageDigest": "sha256:e2b9…",
  "buildType": "https://slsa.dev/container/v1",
  "builder": {
    "id": "https://git.stella-ops.ru/ci/stella-runner@sha256:f7b7…"
  },
  "metadata": {
    "invocation": {
      "parameters": {"GIT_SHA": "f6a1…"},
      "buildStart": "2025-07-14T06:59:17Z",
      "buildEnd": "2025-07-14T07:01:22Z"
    },
    "completeness": {"parameters": true}
  },
  "materials": [
    {"uri": "git+https://git…", "digest": {"sha1": "f6a1…"}}
  ],
  "rekorLogIndex": 99817    // entry in local Rekor mirror
}

## 6 Notify Foundations (Rule · Channel · Event)

Sprint 15 target: canonically describe the Notify data shapes that UI, workers, and storage consume. JSON Schemas live under docs/modules/notify/resources/schemas/ and deterministic fixtures under docs/modules/notify/resources/samples/.

| Artifact | Schema | Sample |
|---|---|---|
| Rule (catalogued routing logic) | docs/modules/notify/resources/schemas/notify-rule@1.json | docs/modules/notify/resources/samples/notify-rule@1.sample.json |
| Channel (delivery endpoint definition) | docs/modules/notify/resources/schemas/notify-channel@1.json | docs/modules/notify/resources/samples/notify-channel@1.sample.json |
| Template (rendering payload) | docs/modules/notify/resources/schemas/notify-template@1.json | docs/modules/notify/resources/samples/notify-template@1.sample.json |
| Event envelope (Notify ingest surface) | docs/modules/notify/resources/schemas/notify-event@1.json | docs/modules/notify/resources/samples/notify-event@1.sample.json |

### 6.1 Rule highlights (notify-rule@1)

  • Keys are lowercased camelCase. schemaVersion (notify.rule@1), ruleId, tenantId, name, match, actions, createdAt, and updatedAt are mandatory.
  • match.eventKinds, match.verdicts, and other array selectors are pre-sorted and case-normalized (e.g. scanner.report.ready).
  • actions[].throttle serialises as an ISO 8601 duration (PT5M), mirroring worker backoff guardrails.
  • vex gates let operators exclude accepted/not-affected justifications; omit the block to inherit default behaviour.
  • Use StellaOps.Notify.Models.NotifySchemaMigration.UpgradeRule(JsonNode) when deserialising legacy payloads that might lack schemaVersion or retain older revisions.
  • Soft deletions persist deletedAt in Mongo (and disable the rule); repository queries automatically filter them.
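
For consumers outside .NET, a throttle value such as PT5M can be read with a small parser. The sketch below is a hypothetical helper, not part of StellaOps.Notify.Models, and it deliberately handles only the day and time components of ISO 8601 durations (no years, months, weeks, or fractions).

```python
import re
from datetime import timedelta

_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_throttle(value):
    """Parse a day/time ISO 8601 duration such as PT5M into a timedelta."""
    match = _DURATION.match(value)
    # Reject strings with no components at all, e.g. "P", "PT" or "P1DT".
    if not match or value == "P" or value.endswith("T"):
        raise ValueError(f"unsupported duration: {value!r}")
    parts = {name: int(number) for name, number in match.groupdict().items() if number}
    return timedelta(**parts)
```

`parse_throttle("PT5M")` yields a five-minute timedelta, which a worker could compare against the time since the last delivery for the same rule.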

### 6.2 Channel highlights (notify-channel@1)

  • schemaVersion is pinned to notify.channel@1 and must accompany persisted documents.
  • type matches plugin identifiers (slack, teams, email, webhook, custom).
  • config.secretRef stores an external secret handle (Authority, Vault, K8s). Notify never persists raw credentials.
  • Optional config.limits.timeout uses ISO 8601 durations identical to rule throttles; concurrency/RPM defaults apply when absent.
  • StellaOps.Notify.Models.NotifySchemaMigration.UpgradeChannel(JsonNode) backfills the schema version when older documents omit it.
  • Channels share the same soft-delete marker (deletedAt) so operators can restore prior configuration without purging history.

### 6.3 Event envelope (notify-event@1)

  • Aligns with the platform event contract: eventId UUID, RFC 3339 ts, tenant isolation enforced.
  • Enumerated kind covers the initial Notify surface (scanner.report.ready, scheduler.rescan.delta, zastava.admission, etc.).
  • scope.labels/scope.attributes and top-level attributes mirror the metadata dictionaries workers surface for templating and audits.
  • Notify workers use the same migration helper to wrap event payloads before template rendering, so schema additions remain additive.

### 6.4 Template highlights (notify-template@1)

  • Carries the presentation key (channelType, key, locale) and the raw template body; schemaVersion is fixed to notify.template@1.
  • renderMode enumerates supported engines (markdown, html, adaptiveCard, plainText, json) aligning with NotifyTemplateRenderMode.
  • format signals downstream connector expectations (slack, teams, email, webhook, json).
  • Upgrade legacy definitions with NotifySchemaMigration.UpgradeTemplate(JsonNode) to auto-apply the new schema version and ordering.
  • Templates also record soft deletes via deletedAt; UI/API skip them by default while retaining revision history.

Validation loop:

# Validate Notify schemas and samples (matches Docs CI)
for schema in docs/modules/notify/resources/schemas/*.json; do
  npx ajv compile -c ajv-formats -s "$schema"
done

for sample in docs/modules/notify/resources/samples/*.sample.json; do
  schema="docs/modules/notify/resources/schemas/$(basename "${sample%.sample.json}").json"
  npx ajv validate -c ajv-formats -s "$schema" -d "$sample"
done

Integration tests can embed the sample fixtures to guarantee deterministic serialisation from the StellaOps.Notify.Models DTOs introduced in Sprint 15.


## 6 Validator Contracts

  • For the SBOM wrapper: ISbomValidator (DLL plugin) must return a typed error list.
  • For YAML policies: JSON-Schema at /schemas/policyv1.json.
  • For Rego: OPA (opa eval --fail-defined under the hood).
  • For Free-tier quotas: IQuotaService integration tests ensure quota:<token> resets at UTC midnight and produces correct Retry-After headers.

## 7 Migration Notes

  1. Add format column to existing SBOM wrappers; default to trivy-json-v2.
  2. Populate layers & partial via backfill script (ships with the stellopsctl migrate wizard).
  3. Policy YAML previously stored in Redis → copy to Mongo if persistence is enabled.
  4. Prepare the attestations collection (empty); it is safe to create in advance.

## 8 Open Questions / Future Work

  • How to deduplicate identical Rego policies differing only in whitespace?
  • Embed GOST 34.11-2018 digests when users enable the Russian crypto suite?
  • Should enterprise tiers share the same Redis quota keys or switch to a JWT claim tier != Free bypass?
  • Evaluate a sliding-window quota instead of a strict daily reset.
  • Consider a rate-limit for /layers/missing to avoid brute-force enumeration.

## 9 Change Log

| Date | Note |
|---|---|
| 2025-07-14 | Added: format, partial, delta cache keys, YAML policy schema v1.0. |
| 2025-07-12 | Initial public draft: SBOM wrapper, Redis keyspace, audit collections. |