# Data Schemas & Persistence Contracts
*Audience:* backend developers, plugin authors, DB admins.
*Scope:* describes **Redis**, **MongoDB** (optional), and on-disk blob shapes that power StellaOps.
---
## 0 Document Conventions
* **camelCase** for JSON.
* All timestamps are **RFC 3339 / ISO 8601** with `Z` (UTC).
* `⭑` = planned but *not* shipped yet (kept on Feature Matrix “To Do”).
---
## 1 SBOM Wrapper Envelope
Every SBOM blob (regardless of format) is stored on disk or in object storage with a *sidecar* JSON file that indexes it for the scanners.
#### 1.1 JSON Shape
```jsonc
{
  "id": "sha256:417f…",               // digest of the SBOM *file* itself
  "imageDigest": "sha256:e2b9…",      // digest of the original container image
  "created": "2025-07-14T07:02:13Z",
  "format": "trivy-json-v2",          // NEW enum: trivy-json-v2 | spdx-json | cyclonedx-json
  "layers": [
    "sha256:d38b…",                   // layer digests (ordered)
    "sha256:af45…"
  ],
  "partial": false,                   // true => delta SBOM (only some layers)
  "provenanceId": "prov_0291"         // ⭑ link to SLSA attestation (Q1 2026)
}
```
* *`format`*: **NEW**, added to support **multiple SBOM formats**.
* *`partial`*: **NEW**, true when generated via the **delta SBOM** flow (§1.3).
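
As an illustration of the wrapper shape above, here is a minimal Python sketch that assembles the sidecar document for an SBOM payload. The function name `build_wrapper` and its parameters are hypothetical and not part of StellaOps; the sketch assumes `id` is the SHA-256 of the SBOM file bytes, as the field comment suggests, and omits the planned `provenanceId` field.

```python
import hashlib
from datetime import datetime, timezone

# Enum values taken from the `format` comment in the JSON shape above.
ALLOWED_FORMATS = {"trivy-json-v2", "spdx-json", "cyclonedx-json"}


def build_wrapper(sbom_bytes, image_digest, layers, sbom_format, partial=False, now=None):
    """Build a wrapper dict for an SBOM payload (hypothetical helper)."""
    if sbom_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported SBOM format: {sbom_format}")
    now = now or datetime.now(timezone.utc)
    return {
        # Assumption: `id` is the SHA-256 digest of the SBOM file itself.
        "id": "sha256:" + hashlib.sha256(sbom_bytes).hexdigest(),
        "imageDigest": image_digest,
        # RFC 3339 timestamp in UTC with a trailing `Z`.
        "created": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "format": sbom_format,
        "layers": list(layers),  # layer order is preserved
        "partial": partial,
    }
```

Serialising the result with `json.dumps` would produce the content of `sbom.meta.json`.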
#### 1.2 Filesystem Layout
```
blobs/
├─ 417f… # digest prefix
│   ├─ sbom.json # payload (any format)
│   └─ sbom.meta.json # wrapper (shape above)
```
> **Note** blob storage can point at S3, MinIO, or plain disk; driver plugins adapt.
#### 1.3 Delta SBOM Extension
When `partial: true`, *only* the missing layers have been scanned.
Merging logic inside the `scanning` module stitches the new data onto the cached full SBOM in Redis.
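
The layer selection behind a delta scan can be sketched as follows. This is an illustrative Python example, not the `scanning` module's actual logic: `plan_delta_scan` is a hypothetical name, and the cached layers are passed in as a plain collection standing in for the Redis `layers:<digest>` set.

```python
def plan_delta_scan(image_layers, cached_layers):
    """Return (layers_to_scan, partial) for an image.

    image_layers  -- ordered layer digests of the image
    cached_layers -- digests that already have SBOM data (stand-in for the Redis set)
    """
    cached = set(cached_layers)
    # Keep the original layer order for the layers that still need scanning.
    missing = [layer for layer in image_layers if layer not in cached]
    # Assumption: the result is "partial" only when some, but not all, layers are scanned.
    partial = 0 < len(missing) < len(image_layers)
    return missing, partial
```

For example, `plan_delta_scan(["a", "b", "c"], {"a"})` returns `(["b", "c"], True)`: only the two uncached layers are scanned and the resulting wrapper would carry `partial: true`.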
---
## 2 Redis Keyspace
| Key pattern | Type | TTL | Purpose |
|-------------------------------------|---------|------|--------------------------------------------------|
| `scan:<digest>` | string | ∞ | Last scan JSON result (as returned by `/scan`) |
| `layers:<digest>` | set | 90d | Layers already possessing SBOMs (delta cache) |
| `policy:active` | string | ∞ | YAML **or** Rego ruleset |
| `quota:<token>` | string | *until next UTC midnight* | Per-token scan counter for Free tier ({{ quota_token }} scans). |
| `policy:history` | list | ∞ | Change audit IDs (see Mongo) |
| `feed:nvd:json` | string | 24h | Normalised feed snapshot |
| `locator:<imageDigest>` | string | 30d | Maps image digest → sbomBlobId |
| `metrics:…` | various | — | Prom / OTLP runtime metrics |
> **Delta SBOM** uses `layers:*` to skip work in under 20 ms.
>
> **Quota enforcement** increments `quota:<token>` atomically; when the counter exceeds {{ quota_token }}, the API returns **429**.
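
The quota behaviour described above can be sketched in Python as follows. This is an in-memory stand-in for the Redis counter, not the real `IQuotaService`: the class name `InMemoryQuota`, the `limit` parameter (the document leaves the actual value as `{{ quota_token }}`), and the shape of the return value are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_utc_midnight(now):
    """Whole seconds from `now` until the next UTC midnight."""
    now = now.astimezone(timezone.utc)
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return int((next_midnight - now).total_seconds())


class InMemoryQuota:
    """Hypothetical stand-in for the `quota:<token>` counter kept in Redis."""

    def __init__(self, limit):
        self.limit = limit   # assumed daily limit; the real value is configured elsewhere
        self.counters = {}   # key -> (count, expires_at)

    def hit(self, token, now):
        key = f"quota:{token}"
        count, expires_at = self.counters.get(key, (0, None))
        if expires_at is None or now >= expires_at:
            # First hit of the day: start a counter that expires at UTC midnight.
            count = 0
            expires_at = now + timedelta(seconds=seconds_until_utc_midnight(now))
        count += 1
        self.counters[key] = (count, expires_at)
        if count > self.limit:
            return 429, {"Retry-After": str(seconds_until_utc_midnight(now))}
        return 200, {}
```

With a real Redis backend the increment and expiry would need to be applied atomically; the dictionary here only mirrors the key's lifetime.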
---
## 3 MongoDB Collections (Optional)
Only enabled when `MONGO_URI` is supplied (for long-term audit).
| Collection | Shape (summary) | Indexes |
|--------------------|------------------------------------------------------------|-------------------------------------|
| `sbom_history` | Wrapper JSON + `replaceTs` on overwrite | `{imageDigest}` `{created}` |
| `policy_versions` | `{_id, yaml, rego, authorId, created}` | `{created}` |
| `attestations` ⭑ | SLSA provenance doc + Rekor log pointer | `{imageDigest}` |
| `audit_log` | Fully rendered RFC 5424 entries (UI & CLI actions) | `{userId}` `{ts}` |
Samples for the scheduler collections (§3.1) live under `samples/api/scheduler/` (e.g., `schedule.json`, `run.json`, `impact-set.json`, `audit.json`) and mirror the canonical serializer output shown in that section.

Schema detail for **policy_versions**:
```jsonc
{
  "_id": "6619e90b8c5e1f76",
  "yaml": "version: 1.0\nrules:\n - …",
  "rego": null,                        // filled when Rego uploaded
  "authorId": "u_1021",
  "created": "2025-07-14T08:15:04Z",
  "comment": "Imported via API"
}
```
### 3.1 Scheduler Sprints 16 Artifacts
**Collections.** `schedules`, `runs`, `impact_snapshots`, `audit` (module-local). All documents reuse the canonical JSON emitted by `StellaOps.Scheduler.Models` so agents and fixtures remain deterministic.
#### 3.1.1 Schedule (`schedules`)
```jsonc
{
  "_id": "sch_20251018a",
  "tenantId": "tenant-alpha",
  "name": "Nightly Prod",
  "enabled": true,
  "cronExpression": "0 2 * * *",
  "timezone": "UTC",
  "mode": "analysis-only",
  "selection": {
    "scope": "by-namespace",
    "namespaces": ["team-a", "team-b"],
    "repositories": ["app/service-api"],
    "includeTags": ["canary", "prod"],
    "labels": [{"key": "env", "values": ["prod", "staging"]}],
    "resolvesTags": true
  },
  "onlyIf": {"lastReportOlderThanDays": 7, "policyRevision": "policy@42"},
  "notify": {"onNewFindings": true, "minSeverity": "high", "includeKev": true},
  "limits": {"maxJobs": 1000, "ratePerSecond": 25, "parallelism": 4},
  "subscribers": ["notify.ops"],
  "createdAt": "2025-10-18T22:00:00Z",
  "createdBy": "svc_scheduler",
  "updatedAt": "2025-10-18T22:00:00Z",
  "updatedBy": "svc_scheduler"
}
```
*Constraints*: arrays are alphabetically sorted; `selection.tenantId` is optional, but when present it must match `tenantId`. Cron expressions are validated for embedded newlines and length; timezones are validated via `TimeZoneInfo`.
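
A rough Python sketch of these constraints is shown below. It is not the `StellaOps.Scheduler.Models` implementation: the function name `normalize_schedule`, the `MAX_CRON_LENGTH` value, and the choice of which arrays to sort are assumptions for illustration, and timezone validation is left out.

```python
MAX_CRON_LENGTH = 256  # hypothetical limit; the real bound is not stated in this document


def normalize_schedule(schedule):
    """Return a copy of a schedule document with the constraints applied."""
    doc = dict(schedule)
    selection = dict(doc.get("selection", {}))

    # selection.tenantId is optional, but must agree with the top-level tenantId.
    selection_tenant = selection.get("tenantId")
    if selection_tenant is not None and selection_tenant != doc.get("tenantId"):
        raise ValueError("selection.tenantId must match tenantId")

    # Assumption: plain string arrays are sorted with ordinal (code point) ordering.
    for field in ("namespaces", "repositories", "includeTags"):
        if field in selection:
            selection[field] = sorted(selection[field])
    doc["selection"] = selection
    if "subscribers" in doc:
        doc["subscribers"] = sorted(doc["subscribers"])

    # Cron expressions must be single-line and of bounded length.
    cron = doc.get("cronExpression", "")
    if "\n" in cron or "\r" in cron or len(cron) > MAX_CRON_LENGTH:
        raise ValueError("cronExpression contains a newline or is too long")
    return doc
```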
#### 3.1.2 Run (`runs`)
```jsonc
{
  "_id": "run_20251018_0001",
  "tenantId": "tenant-alpha",
  "scheduleId": "sch_20251018a",
  "trigger": "feedser",
  "state": "running",
  "stats": {
    "candidates": 1280,
    "deduped": 910,
    "queued": 624,
    "completed": 310,
    "deltas": 42,
    "newCriticals": 7,
    "newHigh": 11,
    "newMedium": 18,
    "newLow": 6
  },
  "reason": {"feedserExportId": "exp-20251018-03"},
  "createdAt": "2025-10-18T22:03:14Z",
  "startedAt": "2025-10-18T22:03:20Z",
  "finishedAt": null,
  "error": null,
  "deltas": [
    {
      "imageDigest": "sha256:a1b2c3",
      "newFindings": 3,
      "newCriticals": 1,
      "newHigh": 1,
      "newMedium": 1,
      "newLow": 0,
      "kevHits": ["CVE-2025-0002"],
      "topFindings": [
        {
          "purl": "pkg:rpm/openssl@3.0.12-5.el9",
          "vulnerabilityId": "CVE-2025-0002",
          "severity": "critical",
          "link": "https://ui.internal/scans/sha256:a1b2c3"
        }
      ],
      "attestation": {"uuid": "rekor-314", "verified": true},
      "detectedAt": "2025-10-18T22:03:21Z"
    }
  ]
}
```
Counters are clamped to ≥0, timestamps are converted to UTC, and delta arrays are sorted (critical → info severity precedence, then vulnerability id). Missing `deltas` implies "no change" snapshots.
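
The clamping and ordering rules can be illustrated with a short Python sketch. The helper names and the `SEVERITY_RANK` table are hypothetical, and the sketch assumes the severity ordering applies to finding entries such as those in `topFindings`.

```python
# Assumed precedence, from most to least severe.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}


def clamp_stats(stats):
    """Clamp every counter to be at least zero."""
    return {name: max(0, value) for name, value in stats.items()}


def sort_findings(findings):
    """Order findings by severity precedence, then by vulnerability id."""
    def sort_key(finding):
        # Unrecognised severities sort after all known ones.
        rank = SEVERITY_RANK.get(finding["severity"].lower(), len(SEVERITY_RANK))
        return (rank, finding["vulnerabilityId"])

    return sorted(findings, key=sort_key)
```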
#### 3.1.3 Impact Snapshot (`impact_snapshots`)
```jsonc
{
  "selector": {
    "scope": "all-images",
    "tenantId": "tenant-alpha"
  },
  "images": [
    {
      "imageDigest": "sha256:f1e2d3",
      "registry": "registry.internal",
      "repository": "app/api",
      "namespaces": ["team-a"],
      "tags": ["prod"],
      "usedByEntrypoint": true,
      "labels": {"env": "prod"}
    }
  ],
  "usageOnly": true,
  "generatedAt": "2025-10-18T22:02:58Z",
  "total": 412,
  "snapshotId": "impact-20251018-1"
}
```
Images are deduplicated and sorted by digest. Label keys are normalised to lowercase to avoid case-sensitive duplicates during reconciliation. `snapshotId` enables run planners to compare subsequent snapshots for drift.
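
A minimal Python sketch of the image normalisation described above follows. `normalize_images` is a hypothetical helper; keeping the first entry per digest and the first value per lowercased label key are assumptions, since the document does not say which duplicate wins.

```python
def normalize_images(images):
    """Deduplicate images by digest, lowercase label keys, and sort by digest."""
    by_digest = {}
    for image in images:
        digest = image["imageDigest"]
        if digest in by_digest:
            continue  # assumption: the first occurrence of a digest is kept
        normalized = dict(image)
        labels = {}
        for key, value in image.get("labels", {}).items():
            # Assumption: on a case-insensitive key clash the first value is kept.
            labels.setdefault(key.lower(), value)
        normalized["labels"] = labels
        by_digest[digest] = normalized
    return [by_digest[digest] for digest in sorted(by_digest)]
```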
#### 3.1.4 Audit (`audit`)
```jsonc
{
  "_id": "audit_169754",
  "tenantId": "tenant-alpha",
  "category": "scheduler",
  "action": "pause",
  "occurredAt": "2025-10-18T22:10:00Z",
  "actor": {"actorId": "user_admin", "displayName": "Cluster Admin", "kind": "user"},
  "scheduleId": "sch_20251018a",
  "correlationId": "corr-123",
  "metadata": {"details": "schedule paused", "reason": "maintenance"},
  "message": "Paused via API"
}
```
Metadata keys are lowercased, first-writer wins (duplicates with different casing are ignored), and optional IDs (`scheduleId`, `runId`) are trimmed when empty. Use the canonical serializer when emitting events so audit digests remain reproducible.
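
The metadata rule can be sketched in Python as below. Both helpers are hypothetical; in particular, the sketch reads "trimmed when empty" as "omitted from the document", which is an interpretation rather than something this document states.

```python
def normalize_metadata(pairs):
    """Lowercase metadata keys; the first writer wins on case-insensitive clashes."""
    result = {}
    for key, value in pairs:
        result.setdefault(key.lower(), value)
    return result


def drop_empty_optional_ids(doc, fields=("scheduleId", "runId")):
    """Omit optional id fields that are missing, None, or blank (assumed meaning of 'trimmed')."""
    return {
        key: value
        for key, value in doc.items()
        if not (key in fields and (value is None or not str(value).strip()))
    }
```

For example, `normalize_metadata([("Reason", "maintenance"), ("reason", "other")])` keeps only `{"reason": "maintenance"}`.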
---
## 4 Policy Schema (YAML v1.0)
Minimal viable grammar (subset of OSV-SCHEMA ideas).
```yaml
version: "1.0"
rules:
  - name: Block Critical
    severity: [Critical]
    action: block
  - name: Ignore Low Dev
    severity: [Low, None]
    environments: [dev, staging]
    action: ignore
    expires: "2026-01-01"
  - name: Escalate RegionalFeed High
    sources: [NVD, CNNVD, CNVD, ENISA, JVN, BDU]
    severity: [High, Critical]
    action: escalate
Validation is performed by the `policy:mapping.yaml` JSON-Schema embedded in the backend.
Canonical schema source: `src/StellaOps.Policy/Schemas/policy-schema@1.json` (embedded into `StellaOps.Policy`).
`PolicyValidationCli` (see `src/StellaOps.Policy/PolicyValidationCli.cs`) provides the reusable command handler that the main CLI wires up; in the interim it can be invoked from a short host like:
```csharp
await new PolicyValidationCli().RunAsync(new PolicyValidationCliOptions
{
    Inputs = new[] { "policies/root.yaml" },
    Strict = true,
});
```
### 4.1 Rego Variant (Advanced, TODO)
*Accepted but stored as-is in `rego` field.*
Evaluated via internal **OPA** sidecar once feature graduates from TODO list.
### 4.2 Policy Scoring Config (JSON)
*Schema id.* `https://schemas.stella-ops.org/policy/policy-scoring-schema@1.json`
*Source.* `src/StellaOps.Policy/Schemas/policy-scoring-schema@1.json` (embedded in `StellaOps.Policy`), default fixture at `src/StellaOps.Policy/Schemas/policy-scoring-default.json`.
```jsonc
{
  "version": "1.0",
  "severityWeights": {"Critical": 90, "High": 75, "Unknown": 60, "...": 0},
  "quietPenalty": 45,
  "warnPenalty": 15,
  "ignorePenalty": 35,
  "trustOverrides": {"vendor": 1.0, "distro": 0.85},
  "reachabilityBuckets": {"entrypoint": 1.0, "direct": 0.85, "runtime": 0.45, "unknown": 0.5},
  "unknownConfidence": {
    "initial": 0.8,
    "decayPerDay": 0.05,
    "floor": 0.2,
    "bands": [
      {"name": "high", "min": 0.65},
      {"name": "medium", "min": 0.35},
      {"name": "low", "min": 0.0}
    ]
  }
}
```
Validation occurs alongside policy binding (`PolicyScoringConfigBinder`), producing deterministic digests via `PolicyScoringConfigDigest`. Bands are ordered descending by `min` so consumers can resolve confidence tiers deterministically. Reachability buckets are case-insensitive keys (`entrypoint`, `direct`, `indirect`, `runtime`, `unreachable`, `unknown`) with numeric multipliers (default ≤1.0).
**Runtime usage**
- `trustOverrides` are matched against `finding.tags` (`trust:<key>`) first, then `finding.source`/`finding.vendor`; missing keys default to `1.0`.
- `reachabilityBuckets` consume `finding.tags` with prefix `reachability:` (fallback `usage:` or `unknown`). Missing buckets fall back to `unknown` weight when present, otherwise `1.0`.
- Policy verdicts expose scoring inputs (`severityWeight`, `trustWeight`, `reachabilityWeight`, `baseScore`, penalties) plus unknown-state metadata (`unknownConfidence`, `unknownAgeDays`, `confidenceBand`) for auditability. See `samples/policy/policy-preview-unknown.json` for an end-to-end preview payload.
- Unknown confidence derives from `unknown-age-days:` (preferred) or `unknown-since:` + `observed-at:` tags; with no hints the engine keeps `initial` confidence. Values decay by `decayPerDay` down to `floor`, then resolve to the first matching `bands[].name`.
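
To make the unknown-confidence rule concrete, here is a small Python sketch using the values from the config above. It assumes the decay is linear in the age in days, which the text suggests but does not state outright; the function names are hypothetical and this is not the policy engine's code.

```python
UNKNOWN_CONFIDENCE = {
    "initial": 0.8,
    "decayPerDay": 0.05,
    "floor": 0.2,
    "bands": [
        {"name": "high", "min": 0.65},
        {"name": "medium", "min": 0.35},
        {"name": "low", "min": 0.0},
    ],
}


def unknown_confidence(config, age_days=None):
    """Confidence after `age_days`; with no age hint the initial value is kept."""
    if age_days is None:
        return config["initial"]
    # Assumption: linear decay per day, never dropping below the floor.
    decayed = config["initial"] - config["decayPerDay"] * max(0.0, age_days)
    return max(config["floor"], decayed)


def resolve_band(config, confidence):
    """Return the first band, in descending `min` order, that the confidence reaches."""
    for band in sorted(config["bands"], key=lambda b: b["min"], reverse=True):
        if confidence >= band["min"]:
            return band["name"]
    return None
```

Under these assumptions a finding that has been unknown for 5 days has confidence of about 0.8 − 5 × 0.05 = 0.55, which resolves to the `medium` band; after 20 days the value is held at the floor of 0.2, which resolves to `low`.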
---
## 5 SLSA Attestation Schema
Planned for Q1 2026 (kept here for early plugin authors).
```jsonc
{
  "id": "prov_0291",
  "imageDigest": "sha256:e2b9…",
  "buildType": "https://slsa.dev/container/v1",
  "builder": {
    "id": "https://git.stella-ops.ru/ci/stella-runner@sha256:f7b7…"
  },
  "metadata": {
    "invocation": {
      "parameters": {"GIT_SHA": "f6a1…"},
      "buildStart": "2025-07-14T06:59:17Z",
      "buildEnd": "2025-07-14T07:01:22Z"
    },
    "completeness": {"parameters": true}
  },
  "materials": [
    {"uri": "git+https://git…", "digest": {"sha1": "f6a1…"}}
  ],
  "rekorLogIndex": 99817               // entry in local Rekor mirror
}
```
---
## 6 Notify Foundations (Rule · Channel · Event)
*Sprint 15 target:* canonically describe the Notify data shapes that UI, workers, and storage consume. JSON Schemas live under `docs/notify/schemas/` and deterministic fixtures under `docs/notify/samples/`.
| Artifact | Schema | Sample |
|----------|--------|--------|
| **Rule** (catalogued routing logic) | `docs/notify/schemas/notify-rule@1.json` | `docs/notify/samples/notify-rule@1.sample.json` |
| **Channel** (delivery endpoint definition) | `docs/notify/schemas/notify-channel@1.json` | `docs/notify/samples/notify-channel@1.sample.json` |
| **Template** (rendering payload) | `docs/notify/schemas/notify-template@1.json` | `docs/notify/samples/notify-template@1.sample.json` |
| **Event envelope** (Notify ingest surface) | `docs/notify/schemas/notify-event@1.json` | `docs/notify/samples/notify-event@1.sample.json` |
### 6.1 Rule highlights (`notify-rule@1`)
* Keys are lowercased camelCase. `schemaVersion` (`notify.rule@1`), `ruleId`, `tenantId`, `name`, `match`, `actions`, `createdAt`, and `updatedAt` are mandatory.
* `match.eventKinds`, `match.verdicts`, and other array selectors are pre-sorted and case-normalized (e.g. `scanner.report.ready`).
* `actions[].throttle` serialises as an ISO 8601 duration (`PT5M`), mirroring worker backoff guardrails.
* `vex` gates let operators exclude accepted/not-affected justifications; omit the block to inherit default behaviour.
* Use `StellaOps.Notify.Models.NotifySchemaMigration.UpgradeRule(JsonNode)` when deserialising legacy payloads that might lack `schemaVersion` or retain older revisions.
* Soft deletions persist `deletedAt` in Mongo (and disable the rule); repository queries automatically filter them.
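
As an aside on the throttle format, the following Python sketch parses the time-only subset of ISO 8601 durations (such as `PT5M`) into a `timedelta`. The helper `parse_throttle` is hypothetical and deliberately ignores date components and fractional values; it is not how the Notify workers parse the field.

```python
import re
from datetime import timedelta

# Time-only ISO 8601 durations: PT<hours>H<minutes>M<seconds>S, each part optional.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def parse_throttle(value):
    """Parse a duration such as 'PT5M' or 'PT1H30M' into a timedelta."""
    match = _DURATION.match(value)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {value!r}")
    hours, minutes, seconds = (int(part) if part else 0 for part in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)
```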
### 6.2 Channel highlights (`notify-channel@1`)
* `schemaVersion` is pinned to `notify.channel@1` and must accompany persisted documents.
* `type` matches plugin identifiers (`slack`, `teams`, `email`, `webhook`, `custom`).
* `config.secretRef` stores an external secret handle (Authority, Vault, K8s). Notify never persists raw credentials.
* Optional `config.limits.timeout` uses ISO 8601 durations identical to rule throttles; concurrency/RPM defaults apply when absent.
* `StellaOps.Notify.Models.NotifySchemaMigration.UpgradeChannel(JsonNode)` backfills the schema version when older documents omit it.
* Channels share the same soft-delete marker (`deletedAt`) so operators can restore prior configuration without purging history.
### 6.3 Event envelope (`notify-event@1`)
* Aligns with the platform event contract: `eventId` UUID, RFC 3339 `ts`, tenant isolation enforced.
* Enumerated `kind` covers the initial Notify surface (`scanner.report.ready`, `scheduler.rescan.delta`, `zastava.admission`, etc.).
* `scope.labels`/`scope.attributes` and top-level `attributes` mirror the metadata dictionaries workers surface for templating and audits.
* Notify workers use the same migration helper to wrap event payloads before template rendering, so schema additions remain additive.
### 6.4 Template highlights (`notify-template@1`)
* Carries the presentation key (`channelType`, `key`, `locale`) and the raw template body; `schemaVersion` is fixed to `notify.template@1`.
* `renderMode` enumerates supported engines (`markdown`, `html`, `adaptiveCard`, `plainText`, `json`) aligning with `NotifyTemplateRenderMode`.
* `format` signals downstream connector expectations (`slack`, `teams`, `email`, `webhook`, `json`).
* Upgrade legacy definitions with `NotifySchemaMigration.UpgradeTemplate(JsonNode)` to auto-apply the new schema version and ordering.
* Templates also record soft deletes via `deletedAt`; UI/API skip them by default while retaining revision history.
**Validation loop:**
```bash
# Validate Notify schemas and samples (matches Docs CI)
for schema in docs/notify/schemas/*.json; do
  npx ajv compile -c ajv-formats -s "$schema"
done
for sample in docs/notify/samples/*.sample.json; do
  schema="docs/notify/schemas/$(basename "${sample%.sample.json}").json"
  npx ajv validate -c ajv-formats -s "$schema" -d "$sample"
done
```
Integration tests can embed the sample fixtures to guarantee deterministic serialisation from the `StellaOps.Notify.Models` DTOs introduced in Sprint 15.
---
## 6 Validator Contracts
* For the SBOM wrapper: `ISbomValidator` (DLL plugin) must return a *typed* error list.
* For YAML policies: JSON-Schema at `/schemas/policyv1.json`.
* For Rego: OPA `opa eval --fail-defined` under the hood.
* For **Free-tier quotas**: `IQuotaService` integration tests ensure `quota:<token>` resets at UTC midnight and produces correct `Retry-After` headers.
---
## 7 Migration Notes
1. **Add `format` column** to existing SBOM wrappers; default to `trivy-json-v2`.
2. **Populate `layers` & `partial`** via backfill script (ship with `stellopsctl migrate` wizard).
3. Policy YAML previously stored in Redis → copy to Mongo if persistence enabled.
4. Prepare the `attestations` collection (empty); it is safe to create in advance.
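
Step 1 above can be pictured with a very small Python sketch. `backfill_format` is a hypothetical helper, not part of the migrate wizard; it only shows the defaulting rule for wrappers that predate the `format` field and does not attempt the `layers`/`partial` backfill from step 2.

```python
def backfill_format(wrapper, default_format="trivy-json-v2"):
    """Return a copy of a wrapper with `format` set when it was missing."""
    migrated = dict(wrapper)
    # Existing values are left untouched; only legacy wrappers get the default.
    migrated.setdefault("format", default_format)
    return migrated
```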
---
## 8 Open Questions / Future Work
* How to deduplicate *identical* Rego policies differing only in whitespace?
* Embed *GOST 34.11-2018* digests when users enable the Russian crypto suite?
* Should enterprise tiers share the same Redis quota keys or switch to a JWT claim `tier != Free` bypass?
* Evaluate a sliding-window quota instead of a strict daily reset.
* Consider a rate limit for `/layers/missing` to avoid brute-force enumeration.
---
## 9 Change Log
| Date | Note |
|------------|--------------------------------------------------------------------------------|
| 2025-07-14 | **Added:** `format`, `partial`, delta cache keys, YAML policy schema v1.0. |
| 2025-07-12 | **Initial public draft:** SBOM wrapper, Redis keyspace, audit collections. |
---